Hardware

High-Performance Computing Breaks the Genomics Bottleneck

HPC enables accurate and rapid analysis for sequencing centers, clinical teams, genomic researchers and developers of sequencing equipment.

Nathan Eddy

Nathan Eddy works as an independent filmmaker and journalist based in Berlin, specializing in architecture, business technology and healthcare IT. He is a graduate of Northwestern University’s Medill School of Journalism.

Sequencing our 3.2 billion DNA base pairs is becoming increasingly crucial as genomic testing gains widespread acceptance.

Advancements in genomics are improving the detection of mutations that can lead to illnesses, with the potential to revolutionize personalized medicine by enabling the development of more effective treatments for genetic disorders.

High-performance computing is revolutionizing the field of genomics by accelerating the speed of analysis and processing of large-scale gene sequencing data sets.

However, genomics is facing a massive Big Data problem. Scientists are struggling to process a growing volume of data as precision medicine turns to gene sequencing for individual patients.

By leveraging the compute power of graphics processing units, geneticists can speed up analysis and reduce the cost of processing the huge amounts of data produced by gene sequencing.

NVIDIA is among the companies offering GPU-based HPC solutions.

NVIDIA’s Clara Parabricks is a GPU-accelerated computational genomics toolkit that supports analytical workflows for next-generation sequencing, including short- and long-read applications. The toolkit is designed for analyzing genomic data after it comes off the sequencer and turning it into interpretable data.

EXPLORE: How NVIDIA helps healthcare organizations unlock patient data.

“One of the main benefits of the software is that it is based on industry-standard tools, so a lot of what would be run on CPU can now be run on GPU,” explains Harry Clifford, NVIDIA’s head of genomics product. “There is a huge acceleration factor with it being on GPU.”

That translates into more than 80 times the acceleration on some of those industry standard tools, he notes, adding that the software is also scalable.

“It’s fully compatible with all the workflow managers genomics researchers are using,” Clifford says. “There’s also the improved accuracy point, which is provided by artificial intelligence-based deep learning and high accuracy approaches included in the toolkit.”

Volume, Velocity and Variety of Data Pose Challenges in Genomics

Clifford points out that Big Data analysis can be split broadly into three pillars: the amount of data (volume), the speed of processing (velocity) and the number of data types (variety).

“First off, we have this huge explosion of data, this volume problem in genomics, and that’s why you need HPC solutions,” he says.

The second aspect of the Big Data challenge is velocity, as each sample that is run through a sequencer must be run through a sequencing process, a wet lab process and then through the computational analysis process.

“Those sequencers are now running so quickly that compute is the new bottleneck in genomics,” Clifford explains.

First off, we have this huge explosion of data, this volume problem in genomics, and that’s why you need HPC solutions.”

Harry Clifford Head of Genomics Product, NVIDIA

The need to handle different sequencing experiments, from RNA to DNA sequencing or tumor sequencing, means there’s a huge challenge of data variety as well.

“That’s where deploying AI solutions and more adaptable solutions actually becomes really important,” Clifford says.

He adds that AI is now vital to genomics by driving higher accuracy and entirely novel insights, which is being done on GPUs at high speed and low cost.

“We have that in our Clara Parabricks genomics analysis software with AI-led, neural network–based solutions for high accuracy, as well as downstream in a lot of the drug discovery work with large language models driving new insights in the field,” he says.

The Benefits of a GPU-Based Approach for Genomics Analysis

The GPU-based approach allows for acceleration in the processing of various types of data.

“If you were to compare the run times on a CPU with a GPU for analysis of a single sample end to end, you’re looking at somewhere on the order of 24 hours-plus on a CPU, whereas on the GPU we have that down to less than 25 minutes on our DGX systems,” Clifford says. “That’s a huge acceleration of the analysis.”

He says the second benefit to increased processing power and reduced analysis times is lower costs.

“If you need to run this in the cloud, for example, where time is money, then that reduced time is saving you a huge amount in costs as well,” he says.

Clifford explains that NVIDIA’s full-stack approach to solutions means it’s getting easier for healthcare organizations to tap the power of HPC.

“You’re able to program these chips, to use so many different libraries for the data science steps and for the genomics itself with Clara Parabricks,” he explains. “The tools are there, and all of this analysis can now be brought on the GPU.”

He points to the next generation of chips, including the recently released H100 Tensor Core GPU, which he describes as “very well suited to genomics analysis.” It boasts a transformer engine that determines whether 8-bit and 16-bit floating point calculations are appropriate.

“The H100 also features a dynamic programing core, which accelerates routing pattern algorithms,” says Clifford. “This is incredibly useful for the alignment step of rebuilding a genome during rapid sequencing of DNA and RNA.”

All these components mean it will work well with large language models and some of the latest transformer-based deep learning architectures.

“These very large models give us a new ability to interpret data and understand biological meaning,” he says. “That’s an area of the field that is just getting started and benefits hugely from HPC and GPU.”

Brought to you by: