The Role of Accelerated Computing in Genomics
The National Institute of Standards and Technology hosts a public-private consortium called Genome in a Bottle to develop conventions and resources on human genome sequencing for clinical practice. Justin Zook, co-leader of Biomarker and Genomic Sciences Group at NIST, discussed unexpected use cases from the program’s data.
“One of the use cases that we didn’t really envision when we were starting Genome in a Bottle 12 or 13 years ago is that these can also be used to train machine learning models or deep learning models,” Zook said.
For example, DeepVariant, which was developed by Google and implemented in NVIDIA Parabricks, could train and test on GIAB benchmarks in an easier and faster way. “One of the things the deep learning methods have really helped with is the adoption of new technologies,” he added.
PREPARE: Expert guidance helps healthcare organizations achieve meaningful transformation with AI.
Laura Egolf, a computational scientist at the National Cancer Institute’s Frederick National Laboratory for Cancer Research, discussed how modernized workflows are supporting major studies on COVID-19 and other diseases.
“Newer technology allows us to conduct whole genome sequencing, which can capture more variation and enables us to study greater variants,” Egolf said. “Fortunately, the cost of whole genome sequencing has decreased substantially since the first human genome was sequenced in the early 2000s.”
COVNET is one such large-scale genomic study hoping to identify common and rare genetic variants associated with COVID-19 severity across different individuals and populations, she said.
The study has collected thousands of samples, which has raised issues concerning the cost and availability of storage and computational power to analyze all of the data. Data processed with standard CPU-based pipelines could take a day or longer for each sample.
With the development of an accelerated, cloud-portable pipeline based on Parabricks, run times for certain steps could be reduced to just three to four hours.
“The growing scale of genetic data requires accelerated, cloud-compatible solutions,” Egolf said. “These accelerated pipelines are enabling large-scale genetic studies of COVID-19, pediatric cancer, radiation exposure and other areas of human health.”