AWS Public Sector Blog

How NIH scientists unlocked cardiovascular disease insights using AWS


Scientists at the National Institutes of Health (NIH) recently uncovered how a structure known as low-density lipoprotein (LDL), which transports “bad” cholesterol through the bloodstream, interacts with its receptor molecule to enter cells—information that has eluded researchers for decades. The findings could lead to more personalized treatments for cardiovascular disease and were enabled by cutting-edge high performance computing (HPC) infrastructure from Amazon Web Services (AWS).

The challenge

Scientists use cryogenic electron microscopy (cryo-EM) to determine the 3D structure of biomolecules at near-atomic resolution. In cryo-EM, protein samples are flash-frozen in a thin layer of vitreous ice and imaged with a transmission electron microscope, and the images are computationally reconstructed into 3D digital representations of the molecules in near-native states. Each sample produces terabytes (TB) of data, and the data resolution increases with each new generation of instruments. The data must be processed on HPC resources using a complex pipeline accelerated by one or more graphics processing units (GPUs). Because cryo-EM requires iterative processing of large datasets, reducing costs and operational overhead while increasing processing speed is critical to the quality of structural biology research.

Understanding how LDL interacts with its receptor (LDLR) has also been a major challenge in cardiovascular research for decades. Previous attempts to visualize this interaction were limited by LDL’s large size and structural complexity, as well as by inadequate computing power and data storage.

The process

The cryo-EM structure determination workflow is shown in the following figure and is discussed in detail in Cryo-electron microscopy: A primer for the non-microscopist by Milne et al.

Figure 1. Cryo-EM structure determination workflow

  1. During the cryo-EM process, a homogeneous, highly pure protein sample is applied to cryo-EM grids. The sample is rapidly frozen in liquid ethane in a thin layer of vitreous ice.
  2. Images are recorded as movies on a transmission electron microscope.
  3. Movie frames are then aligned to reduce the effects of drift, that is, the inability of the microscope to maintain the selected focal plane over an extended period of time.
  4. Particles are picked from each micrograph.
  5. Particles representing the same view are then grouped together to create averaged 2D images.
  6. 2D images are then computationally aligned to generate a 3D map.
  7. Using sophisticated modeling, 3D classification can identify different conformational states of the protein, that is, changes in the shape of the macromolecule.
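
In practice, these steps are chained into a pipeline in which each stage consumes the output of the previous one. The minimal Python sketch below illustrates that orchestration pattern only; the stage names and command-line tools are hypothetical placeholders, not the commands used by the NIH team or by any specific cryo-EM package.

```python
# Illustrative sketch: chain the cryo-EM stages described above.
# The commands are hypothetical placeholders, not real cryo-EM CLIs.
import subprocess
from pathlib import Path

STAGES = [
    ("motion_correction", "align_movies --movies {movies} --out {work}/aligned"),
    ("particle_picking",  "pick_particles --in {work}/aligned --out {work}/particles"),
    ("classification_2d", "classify_2d --in {work}/particles --out {work}/class2d"),
    ("reconstruction_3d", "refine_3d --in {work}/class2d --out {work}/map3d"),
]

def run_pipeline(movies_dir: str, work_dir: str) -> None:
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    for name, template in STAGES:
        cmd = template.format(movies=movies_dir, work=work)
        print(f"[{name}] {cmd}")
        # In production, these stages run on GPU-accelerated cluster queues
        # rather than on the local machine.
        subprocess.run(cmd.split(), check=True)

if __name__ == "__main__":
    run_pipeline("/data/ldl_movies", "/scratch/ldl_dataset")
```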

The solution

NIH researchers needed powerful computational resources to process massive amounts of imaging data for their research. The cryo-EM dataset used to determine the LDL structure contained over 35,000 movies. A typical movie is about 0.5 gigabytes (GB), so a single dataset amounts to approximately 17.5 terabytes (TB) of raw data, and processing increases that volume by roughly five times. In addition to storage, cryo-EM data processing is computationally intensive: determining the structure of a yeast spliceosomal complex required more than half a million CPU hours of classification and high-resolution refinement, as described in eLife. The adoption of GPUs to relieve this computational bottleneck has transformed the cryo-EM field. Many common cryo-EM software packages have been redesigned to take advantage of recent advances in GPU technology and can now run many independent tasks simultaneously.
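
The storage math behind those figures is straightforward, and the short sketch below reproduces it. The 0.5 GB movie size and fivefold expansion factor are the approximate values quoted above, not exact measurements.

```python
# Back-of-the-envelope sizing for one cryo-EM dataset, using the figures above.
movies_per_dataset = 35_000
gb_per_movie = 0.5            # approximate size of a single raw movie
processing_expansion = 5      # processed data grows to roughly 5x the raw size

raw_tb = movies_per_dataset * gb_per_movie / 1_000    # ~17.5 TB of raw movies
total_tb = raw_tb * processing_expansion              # ~87.5 TB after processing

print(f"Raw data per dataset: ~{raw_tb:.1f} TB")
print(f"Footprint after processing: ~{total_tb:.1f} TB")
```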

As an outcome of this research, NIH scientists were able to show for the first time how ApoB100, the main structural protein of LDL, binds to its receptor, the process that initiates the clearance of LDL from the blood, and what happens when that process is impaired in familial hypercholesterolemia, a disease that often leads to early heart disease. To complete the research, the NIH team leveraged several AWS services and capabilities. The following figure shows the AWS HPC environment used by the NIH research team.

Figure 2. Architecture diagram of the compute-intensive HPC NIH workloads

 

The impact to cardiovascular research

The AWS infrastructure has transformed NIH’s research capabilities, enabling the processing of more than 35,000 molecular movies per dataset while efficiently managing 17.5 TB of raw data per experiment, which typically expands by 3-5x during processing. The move to AWS has dramatically accelerated project completion: 20 new structures were determined in just 12 months, compared with the traditional two-to-three-year timeline of on-premises implementations. AWS compute has also proven far faster, delivering up to 12 times the processing speed of the previous on-premises systems. Additionally, the cloud infrastructure has significantly enhanced collaboration among researchers, making it easier for multiple teams to work simultaneously on complex datasets and share findings in real time, ultimately accelerating the pace of scientific discovery.

The results of this groundbreaking research are published in the journal Nature and could lead to more precisely targeted drugs for reducing blood cholesterol. The computational infrastructure established on AWS continues to support research initiatives at NIH, demonstrating the power of cloud computing in advancing biomedical research.

Additional benefits of the AWS HPC solution

The AWS environment supports both burst computing needs and sustained HPC workloads while maintaining security and performance requirements. This architecture is used in the cryo-EM research to accelerate:

  • Data collection and processing:
    • High-speed data ingestion from electron microscopes through AWS Direct Connect
    • Raw image data can be initially stored in Amazon FSx for Lustre for immediate processing
  • GPU queues (G4, G5, and G6 instance families), which are crucial for compute-intensive image processing and 3D reconstruction
  • Storage management:
    • Active datasets stored in Amazon S3 Standard for frequent access
    • Completed projects moved to S3 Intelligent-Tiering for cost optimization (see the lifecycle sketch after this list)
    • FSx for Lustre provides high-performance shared storage for processing pipelines
    • A data repository association (DRA) between FSx for Lustre and Amazon S3 ensures seamless data movement between storage tiers
  • Computational workflows:
    • Multiple processing queues support different computational needs:
      • CPU queue for basic preprocessing
      • GPU queue for intensive image processing
      • Multi-GPU queue for complex 3D reconstructions
    • Parallel processing capabilities for handling large datasets
  • Research collaboration:
    • Secure VPC environment for data protection
    • Shared volumes enable team collaboration
  • Cost management:
    • Scalable resources based on processing demands
    • Storage tiering optimizes costs for long-term data retention
    • Pay-per-use model for compute resources
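
As one concrete example of the storage tiering above, an S3 lifecycle rule can move completed project data into S3 Intelligent-Tiering automatically. The boto3 sketch below shows what such a rule might look like; the bucket name, prefix, and 30-day threshold are illustrative assumptions, not the NIH team’s actual configuration.

```python
# Hypothetical sketch: transition objects under a "completed/" prefix to
# S3 Intelligent-Tiering 30 days after creation. The bucket name, prefix,
# and threshold are assumptions for illustration only.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-cryoem-projects",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "completed-projects-to-intelligent-tiering",
                "Filter": {"Prefix": "completed/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    },
)
```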

The solution can also be integrated with common cryo-EM software packages and scaled according to research requirements.
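
Because the cluster exposes separate CPU, GPU, and multi-GPU queues, each processing step can be routed to the queue that matches its resource profile. The sketch below shows one way to generate such submissions, assuming a Slurm-based scheduler, which is a common choice for AWS HPC clusters; the partition names, GPU counts, and the example reconstruction command are hypothetical assumptions rather than the actual cluster or software configuration.

```python
# Hypothetical sketch: route a processing step to the CPU, GPU, or multi-GPU
# queue by generating a Slurm batch script. Partition names, GPU counts, and
# the example command are illustrative assumptions.
import subprocess

QUEUES = {
    "cpu":       {"partition": "cpu",       "gpus": 0},
    "gpu":       {"partition": "gpu",       "gpus": 1},
    "multi-gpu": {"partition": "multi-gpu", "gpus": 4},
}

def submit(job_name: str, command: str, queue: str) -> None:
    """Write a minimal sbatch script for the chosen queue and submit it."""
    q = QUEUES[queue]
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={q['partition']}",
    ]
    if q["gpus"]:
        lines.append(f"#SBATCH --gres=gpu:{q['gpus']}")
    lines.append(command)
    script_path = f"{job_name}.sbatch"
    with open(script_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    subprocess.run(["sbatch", script_path], check=True)

# Example: a complex 3D reconstruction belongs on the multi-GPU queue.
if __name__ == "__main__":
    submit("ldl_refine3d", "refine_3d --in class2d --out map3d", "multi-gpu")
```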

The key components of the solution, shown in Figure 2, include AWS Direct Connect for data ingestion, Amazon FSx for Lustre and Amazon S3 for storage, GPU-accelerated compute queues, and a secure Amazon VPC.

Curious to learn more?

Learn more about how AWS can help you build game-changing GPU-enabled cryo-EM workflows. Check out how AWS services provide an agile, modular, and scalable architecture that optimizes cryo-EM workflows, providing a pathway to wider adoption of cryo-EM as a standard tool for structural biology.

Learn more about AWS solutions for healthcare and life sciences.

Gargi Singh Chhatwal

Gargi is a senior solutions architect with expertise in AI/ML and HPC, supporting the US National Institutes of Health (NIH). He has eight years of experience helping public sector customers leverage AWS technology to architect, design, and build solutions for healthcare, life sciences, and scientific research.

Evan Bollig

Evan is a principal specialist solutions architect for high performance computing (HPC) with AWS and a senior member of the Institute of Electrical and Electronics Engineers (IEEE). His experience in HPC spans all roles and workloads, from performance-tuning multi-GPU physics codes for large-scale systems to developing secure cloud-native infrastructure for production clinical genomics pipelines. Evan champions the greater good in the AWS community through open science and open source.

Tom Fonseca

Tom is a lead customer solutions manager at AWS supporting U.S. federal customers with healthcare and biomedical research missions. He has over 25 years of experience leading enterprise-scale operations, transformations, and migrations for customers in the public and private sectors, including manufacturing, energy production and transmission, and financial services.

Jon Lemon

Jon is a senior customer solutions manager at AWS supporting the National Institutes of Health. He has over 20 years of experience in the field with expertise in implementing advanced analytics, machine learning, and artificial intelligence solutions that enable federal government organizations to leverage data for more efficient, timely, and cost-effective decision-making.

Tyler Willis

Tyler is the principal account manager for the National Institutes of Health (NIH) at AWS. He has 20 years of experience helping public sector customers adopt innovative technology. He specializes in the healthcare and life sciences industry.