AWS Public Sector Blog
Customizing isolated JupyterLab environments in Amazon SageMaker Studio
To provide robust security and compliance measures, organizations face the critical task of isolating and customizing their JupyterLab environments within Amazon SageMaker Studio. This challenge is particularly relevant in two common scenarios: when data scientists need to work within isolated JupyterLab environments while maintaining strict security controls and when educators need to incorporate proprietary datasets into artificial intelligence and machine learning (AI/ML) courses while preventing any potential data exfiltration.
In Amazon SageMaker Studio, users leverage JupyterLab as their Integrated Development Environment (IDE) for ML development, data analysis, and visualization. When working with these JupyterLab environments in secure settings, controlling data movement becomes essential for maintaining security compliance and protecting sensitive data. An isolated JupyterLab environment typically refers to a deployment within a private subnet of an Amazon Virtual Private Cloud (Amazon VPC) without internet access—meaning there is no internet gateway or NAT gateway configured. This setup is referred to as “air-gapped,” meaning that data and resources remain strictly within the controlled environment and can’t communicate with external networks.
To further strengthen this security posture and prevent potential data exfiltration, this technical post demonstrates how to enhance security and compliance in an isolated SageMaker JupyterLab environment by implementing two key customizations:
- Configuring download restrictions
- Implementing secure Python package installation through AWS CodeArtifact
These solutions are specifically designed to meet the stringent security requirements and compliance standards of public sector organizations, from government research laboratories to public educational institutions.
Solution one: Configuring download restrictions
As part of enhancing the security posture of an isolated JupyterLab environment and preventing potential data exfiltration, this solution uses a SageMaker AI lifecycle configuration script to disable the following JupyterLab extensions that could otherwise allow file downloads:
`@jupyterlab/docmanager-extension:download`
`@jupyterlab/filebrowser-extension:download`
`@jupyterlab/filebrowser-extension:open-browser-tab`
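As a rough illustration of what such a lifecycle configuration might contain, the following sketch disables the three plugins with the `jupyter labextension disable` command. The function name `disable_download_plugins` is hypothetical, and the script published in the AWS Samples repository may take a different approach, so treat this as a minimal sketch rather than the reference implementation.

```shell
# Hypothetical helper: disables the three download-related JupyterLab plugins
# listed above, stopping at the first failure.
disable_download_plugins() {
  for plugin in \
    "@jupyterlab/docmanager-extension:download" \
    "@jupyterlab/filebrowser-extension:download" \
    "@jupyterlab/filebrowser-extension:open-browser-tab"
  do
    echo "Disabling ${plugin}"
    jupyter labextension disable "${plugin}" || return 1
  done
}

# A lifecycle configuration script would call the helper once at startup:
# disable_download_plugins
```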
Prerequisites
Before implementing this solution, you need to have:
- An Amazon SageMaker AI domain
- Appropriate AWS Identity and Access Management (IAM) permissions to create and modify SageMaker AI lifecycle configurations
- Downloaded the SageMaker AI lifecycle configuration script `lifecycle-config.sh` from an AWS Samples GitHub repository
Implementation steps
Follow these steps to configure download restrictions.
To create and attach a lifecycle configuration:
- Navigate to the SageMaker AI console.
- Choose and open the SageMaker AI domain you want to work with.
- Go to the Environment tab.
- In the Lifecycle configuration for personal Studio apps section, choose Attach.
- For Source, choose New configuration.
- For Select application type, choose JupyterLab.
- Give the lifecycle configuration a name, for example, `disable-download`.
- In the Scripts field, paste the contents from the `lifecycle-config.sh` file you downloaded from the GitHub repository.
- Choose Attach to domain.
You can also create and attach a lifecycle configuration to an individual user profile in a domain.
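The console steps above can also be approximated with the AWS CLI. The sketch below assumes the script is saved locally as `lifecycle-config.sh`. The helper names `create_jupyterlab_lcc` and `attach_lcc_to_domain` are hypothetical, while the `aws sagemaker` subcommands are the standard ones for Studio lifecycle configurations.

```shell
# Hypothetical helper: registers a script as a JupyterLab lifecycle configuration
# and prints the ARN returned by the API.
create_jupyterlab_lcc() {
  lcc_name="$1"      # for example, disable-download
  script_path="$2"   # for example, lifecycle-config.sh

  # The API expects the script content base64-encoded on a single line.
  content="$(base64 < "${script_path}" | tr -d '\n')"

  aws sagemaker create-studio-lifecycle-config \
    --studio-lifecycle-config-name "${lcc_name}" \
    --studio-lifecycle-config-app-type JupyterLab \
    --studio-lifecycle-config-content "${content}" \
    --query StudioLifecycleConfigArn \
    --output text
}

# Hypothetical helper: attaches the lifecycle configuration to the domain's
# default user settings for JupyterLab.
attach_lcc_to_domain() {
  domain_id="$1"
  lcc_arn="$2"

  aws sagemaker update-domain \
    --domain-id "${domain_id}" \
    --default-user-settings \
      "{\"JupyterLabAppSettings\": {\"LifecycleConfigArns\": [\"${lcc_arn}\"]}}"
}

# Example usage (placeholders are not real identifiers):
# arn="$(create_jupyterlab_lcc disable-download lifecycle-config.sh)"
# attach_lcc_to_domain "<domain-id>" "${arn}"
```

If the domain already has lifecycle configurations attached, check whether their ARNs need to be included in the same list before running the update.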
To apply the lifecycle configuration:
- Launch SageMaker Studio and select the JupyterLab application.
- Create a new JupyterLab private space or select an existing one.
- Before running the JupyterLab space, navigate to Space Settings.
- In the Lifecycle Configuration field, choose the lifecycle configuration created in the previous step, as shown in the following figure.
To apply a lifecycle configuration to a JupyterLab shared space, you must specify the lifecycle configuration when creating or updating the shared space using the AWS CLI. For example, the following command creates a new JupyterLab shared space with a default lifecycle configuration.
aws sagemaker create-space \
  --domain-id <domain-id> \
  --space-name <space-name> \
  --region <region> \
  --space-settings '{
    "AppType": "JupyterLab",
    "JupyterLabAppSettings": {
      "DefaultResourceSpec": {
        "InstanceType": "ml.t3.medium",
        "LifecycleConfigArn": "<lifecycle-configuration-arn>"
      }
    }
  }' \
  --space-sharing-settings '{
    "SharingType": "Shared"
  }' \
  --ownership-settings '{
    "OwnerUserProfileName": "<user-profile-name>"
  }'
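For a shared space that already exists, the update path might look like the following sketch. It uses `aws sagemaker update-space`. The helper name `set_space_lcc` is hypothetical, and the exact settings your space requires may differ.

```shell
# Hypothetical helper: points an existing shared JupyterLab space at a
# lifecycle configuration by updating its default resource spec.
set_space_lcc() {
  domain_id="$1"
  space_name="$2"
  lcc_arn="$3"

  aws sagemaker update-space \
    --domain-id "${domain_id}" \
    --space-name "${space_name}" \
    --space-settings \
      "{\"JupyterLabAppSettings\": {\"DefaultResourceSpec\": {\"LifecycleConfigArn\": \"${lcc_arn}\"}}}"
}

# Example usage (placeholders are not real identifiers):
# set_space_lcc "<domain-id>" "<space-name>" "<lifecycle-configuration-arn>"
```

Because a lifecycle configuration runs when the space starts, a space that is already running needs a restart before the restrictions apply.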
Verification process
After applying the lifecycle configuration, run the JupyterLab space. In the opened JupyterLab environment, verify that the download capabilities are disabled by attempting the following actions:
- In the file browser panel:
  - Right-click any file to verify that the Download option is unavailable.
  - Right-click any file to verify that the Open in Browser Tab option is unavailable.
- In the top menu bar:
  - Choose File to verify that the Download option is disabled or hidden.
  - Choose File to verify that the Save and Export Notebook As options are disabled.
These restrictions make sure that files remain securely within the JupyterLab environment and can’t be downloaded to local devices. The complete solution, including additional debugging information, is available in the sample-disable-sagemaker-jupyter-download AWS Samples GitHub repository.
Solution two: Python package installation using AWS CodeArtifact
In an isolated SageMaker JupyterLab environment with restricted internet access, data scientists face a significant challenge: they can’t directly access public repositories like PyPI to install required Python packages. This network isolation, while essential for security, can potentially impede their workflow.
This solution demonstrates how to configure AWS CodeArtifact as a private package repository and integrate it with SageMaker JupyterLab through virtual private cloud (VPC) endpoints and AWS PrivateLink. The implementation enables data scientists to securely install Python packages from PyPI without requiring direct internet access.
Architecture overview
The following architecture diagram illustrates how Python packages can be securely installed in an isolated SageMaker JupyterLab environment. AWS CodeArtifact serves as a private repository that connects to the isolated JupyterLab environment through VPC endpoints and AWS PrivateLink, while also maintaining a connection to public PyPI repositories.
Implementation steps
Follow these steps to configure AWS CodeArtifact as a private package repository and integrate it with SageMaker JupyterLab.
To create an AWS CodeArtifact domain:
- Navigate to the AWS CodeArtifact console.
- Choose Domains in the navigation pane.
- Choose Create domain.
- Give the domain a name.
- Choose Create domain.
To create an AWS CodeArtifact repository:
- Choose Repositories in the navigation pane.
- Choose Create repository.
- Give the repository a name and an optional repository description.
- For Public upstream repositories, choose pypi-store. Choose Next.
- For AWS account, choose This AWS account.
- For Domain, choose the domain created in the previous step.
- Choose Next.
- Choose Create repository.
Upon completion, you have established two CodeArtifact repositories: a `pypi-store` repository that serves as a connection to the external public PyPI repository and a private repository that uses `pypi-store` as its upstream source. This configuration enables secure package access from the isolated SageMaker JupyterLab environment while maintaining control over external dependencies.
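The same domain and repository layout can be sketched with the AWS CLI. The helper name `create_pypi_repositories` and the example names in the comments are assumptions for illustration. The console may configure additional defaults that this sketch does not reproduce.

```shell
# Hypothetical helper: creates a CodeArtifact domain, a pypi-store repository
# connected to public PyPI, and a private repository with pypi-store upstream.
create_pypi_repositories() {
  domain="$1"        # for example, my-pypi-domain
  repository="$2"    # for example, my-pypi-repository

  aws codeartifact create-domain --domain "${domain}" || return 1

  # Repository that holds the external connection to public PyPI.
  aws codeartifact create-repository \
    --domain "${domain}" --repository pypi-store || return 1
  aws codeartifact associate-external-connection \
    --domain "${domain}" --repository pypi-store \
    --external-connection public:pypi || return 1

  # Private repository that data scientists install from.
  aws codeartifact create-repository \
    --domain "${domain}" --repository "${repository}" \
    --upstreams repositoryName=pypi-store
}

# Example usage:
# create_pypi_repositories my-pypi-domain my-pypi-repository
```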
To copy the CodeArtifact login AWS Command Line Interface (AWS CLI) command:
- Open the private repository and, in the Packages section, choose View connection instructions.
- For Operating system, choose Mac & Linux.
- For Choose a package manager client, choose pip under Python.
- For Select a configuration method, choose Configure using AWS CLI.
- Under Configure your pip client using the AWS CLI CodeArtifact command, choose Copy.
The command you copied is an AWS CLI command that authenticates your pip client with CodeArtifact. It follows this format:
aws codeartifact login --tool pip --repository my-pypi-repository --domain my-pypi-domain --domain-owner 111222333444 --region us-east-1
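To make clearer what the login command does, the following sketch performs a roughly equivalent manual configuration: it requests an authorization token and points pip's global index URL at the repository endpoint. The helper names are hypothetical, and the URL layout mirrors the repository endpoint format used elsewhere in this post with the `simple/` suffix that pip expects.

```shell
# Hypothetical helper: builds the pip index URL for a CodeArtifact repository.
codeartifact_index_url() {
  token="$1"; domain="$2"; owner="$3"; region="$4"; repository="$5"
  printf 'https://aws:%s@%s-%s.d.codeartifact.%s.amazonaws.com/pypi/%s/simple/\n' \
    "${token}" "${domain}" "${owner}" "${region}" "${repository}"
}

# Hypothetical helper: fetches a token and writes the index URL to pip's config.
configure_pip_manually() {
  domain="$1"; owner="$2"; region="$3"; repository="$4"

  token="$(aws codeartifact get-authorization-token \
    --domain "${domain}" --domain-owner "${owner}" --region "${region}" \
    --query authorizationToken --output text)" || return 1

  pip config set global.index-url \
    "$(codeartifact_index_url "${token}" "${domain}" "${owner}" "${region}" "${repository}")"
}

# Example usage:
# configure_pip_manually my-pypi-domain 111222333444 us-east-1 my-pypi-repository
```

The token is temporary, so either approach has to be repeated when it expires.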
To configure CodeArtifact VPC interface endpoints:
- Navigate to the Amazon VPC console.
- Choose Endpoints in the navigation pane.
- Choose Create endpoint.
- Give the endpoint a name.
- For Type, choose AWS services.
- In the Services search box, search for `codeartifact` and choose the Service Name `com.amazonaws.[region].codeartifact.repositories`, as shown in the following screenshot.
- For VPC, choose the VPC where you host the isolated SageMaker AI domain.
- For Subnets, choose the subnets where you host the isolated SageMaker AI domain.
- For Security groups, choose a security group that allows inbound traffic from the VPC hosting the SageMaker AI domain.
- Choose Create endpoint.
- Repeat the same process to create a second VPC interface endpoint, but this time select the Service Name `com.amazonaws.[region].codeartifact.api`, as shown in the following screenshot.
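The two endpoints can also be created from the AWS CLI, as in the sketch below. The helper name `create_codeartifact_endpoints` is hypothetical, a single subnet and security group are assumed for brevity, and the `--private-dns-enabled` flag is an assumption that private DNS should be turned on for both endpoints. Verify these choices against your own network design.

```shell
# Hypothetical helper: creates interface endpoints for the CodeArtifact
# repositories and API services in the given VPC.
create_codeartifact_endpoints() {
  region="$1"; vpc_id="$2"; subnet_id="$3"; security_group_id="$4"

  for service in repositories api; do
    aws ec2 create-vpc-endpoint \
      --region "${region}" \
      --vpc-id "${vpc_id}" \
      --vpc-endpoint-type Interface \
      --service-name "com.amazonaws.${region}.codeartifact.${service}" \
      --subnet-ids "${subnet_id}" \
      --security-group-ids "${security_group_id}" \
      --private-dns-enabled || return 1
  done
}

# Example usage (placeholders are not real identifiers):
# create_codeartifact_endpoints us-east-1 "<vpc-id>" "<subnet-id>" "<security-group-id>"
```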
To add CodeArtifact permissions to the SageMaker execution role:
To enable SageMaker JupyterLab notebooks to access CodeArtifact repositories, you need to attach appropriate permissions to the SageMaker execution role. The recommended approach is to use the AWS managed policy `AWSCodeArtifactReadOnlyAccess`. This can be configured at either the domain level or the user profile level.
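Attaching the managed policy from the AWS CLI could look like the following sketch. The helper name `grant_codeartifact_read_access` is hypothetical, and the role name is whatever execution role your domain or user profile uses.

```shell
# Hypothetical helper: attaches the AWS managed read-only CodeArtifact policy
# to the given SageMaker execution role.
grant_codeartifact_read_access() {
  role_name="$1"

  aws iam attach-role-policy \
    --role-name "${role_name}" \
    --policy-arn arn:aws:iam::aws:policy/AWSCodeArtifactReadOnlyAccess
}

# Example usage (placeholder is not a real role name):
# grant_codeartifact_read_access "<sagemaker-execution-role-name>"
```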
Verification process
To verify the CodeArtifact connection in the isolated SageMaker JupyterLab environment, follow these steps:
- Open a cell in a JupyterLab notebook within the isolated SageMaker environment.
- Copy and paste the AWS CodeArtifact AWS CLI login command from your repository’s connection instructions. For example:
!aws codeartifact login --tool pip --repository my-pypi-repository --domain my-pypi-domain --domain-owner 111222333444 --region us-east-1
- Execute the notebook cell. Upon successful authentication, you receive a confirmation message similar to:
Successfully configured pip to use AWS CodeArtifact repository https://my-pypi-domain-111222333444.d.codeartifact.us-east-1.amazonaws.com/pypi/my-pypi-repository/
After authentication, you can install Python packages in the JupyterLab notebook through the private CodeArtifact repository without internet access. For instance, to install the NumPy package, run the command `!pip install numpy` in a notebook cell. This way, you can maintain a secure development environment while still accessing the Python packages your projects require.
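As an optional extra check from a JupyterLab terminal, the sketch below inspects pip's configured index URL and reports whether it points at a CodeArtifact PyPI endpoint. It assumes that the login command stored the URL under pip's `global.index-url` setting. The helper names are hypothetical.

```shell
# Hypothetical helper: succeeds when the URL looks like a CodeArtifact PyPI endpoint.
is_codeartifact_index() {
  case "$1" in
    https://*.d.codeartifact.*.amazonaws.com/pypi/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: reads pip's global index URL and reports the result
# without printing the URL, because it embeds the authorization token.
check_pip_index() {
  index_url="$(pip config get global.index-url 2>/dev/null)" || index_url=""
  if is_codeartifact_index "${index_url}"; then
    echo "pip is configured for CodeArtifact"
  else
    echo "pip is NOT configured for CodeArtifact"
    return 1
  fi
}

# Example usage:
# check_pip_index
```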
Cleanup
To avoid unnecessary costs, delete the resources created in this tutorial if you no longer need them. Before deleting any resources, make sure that all important notebooks and data have been backed up and no other users or processes are actively using these resources.
To delete AWS CodeArtifact resources:
- Delete the private repositories you created.
- Delete the CodeArtifact domain.
- Delete the VPC endpoints for CodeArtifact repositories and API.
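The CodeArtifact cleanup steps above can be sketched with the AWS CLI as follows. The helper name `delete_codeartifact_resources` is hypothetical, and the sketch assumes the upstream repository is named `pypi-store` as described earlier. Deletion is permanent, so confirm the names before running anything similar.

```shell
# Hypothetical helper: deletes the private repository, the pypi-store
# repository, the domain, and any VPC endpoint IDs passed as extra arguments.
delete_codeartifact_resources() {
  domain="$1"
  repository="$2"
  shift 2   # remaining arguments are VPC endpoint IDs

  aws codeartifact delete-repository \
    --domain "${domain}" --repository "${repository}" || return 1
  aws codeartifact delete-repository \
    --domain "${domain}" --repository pypi-store || return 1
  aws codeartifact delete-domain --domain "${domain}" || return 1

  if [ "$#" -gt 0 ]; then
    aws ec2 delete-vpc-endpoints --vpc-endpoint-ids "$@"
  fi
}

# Example usage (endpoint IDs are placeholders):
# delete_codeartifact_resources my-pypi-domain my-pypi-repository "<vpce-id-1>" "<vpce-id-2>"
```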
To delete SageMaker resources:
- Remove the lifecycle configuration from the domain or user profile if no longer needed.
- If you created other SageMaker resources for this tutorial, you can delete unused resources to avoid incurring additional costs.
Conclusion
The customizations described in this post enhance data security by preventing unauthorized file downloads and controlling package management in isolated SageMaker JupyterLab environments. These controls can be part of a broader security strategy when working with sensitive data. These solutions use built-in AWS services, minimizing additional costs and operational overhead. This approach allows organizations to scale their secure JupyterLab environments efficiently without compromising security.
Real-world implementation has delivered significant benefits:
- Professors can confidently incorporate real-world, sensitive datasets into their curriculum.
- Students can work with production-grade data while maintaining security compliance.
- Institutions maintain regulatory compliance without sacrificing educational quality.
- The solution scales cost-effectively across different courses and departments.
Other resources
To learn more about securing SageMaker AI environments:
- AWS Machine Learning Blog: Building secure machine learning environments with Amazon SageMaker
- Amazon SageMaker Developer Guide: Configure security in Amazon SageMaker AI
- CodeArtifact User Guide: What is AWS CodeArtifact?
- Amazon SageMaker AI workshop: Managing Administrative Tasks in Amazon SageMaker AI