• Home
  • About Us
  • Contact Us
  • Privacy Policy
  • Special Offers
Business Intelligence Info
  • Business Intelligence
    • BI News and Info
    • Big Data
    • Mobile and Cloud
    • Self-Service BI
  • CRM
    • CRM News and Info
    • InfusionSoft
    • Microsoft Dynamics CRM
    • NetSuite
    • OnContact
    • Salesforce
    • Workbooks
  • Data Mining
    • Pentaho
    • Sisense
    • Tableau
    • TIBCO Spotfire
  • Data Warehousing
    • DWH News and Info
    • IBM DB2
    • Microsoft SQL Server
    • Oracle
    • Teradata
  • Predictive Analytics
    • FICO
    • KNIME
    • Mathematica
    • Matlab
    • Minitab
    • RapidMiner
    • Revolution
    • SAP
    • SAS/SPSS
  • Humor

Nvidia and Harvard develop AI tool that speeds up genome analysis

March 8, 2021   Big Data
 Nvidia and Harvard develop AI tool that speeds up genome analysis

The power of audio

From podcasts to Clubhouse, branded audio is more important than ever. Learn how brands are increasing customer loyalty and personalization with these best practices.

Register Now


Join Transform 2021 for the most important themes in enterprise AI & Data. Learn more.


Researchers affiliated with Nvidia and Harvard today detailed AtacWorks, a machine learning toolkit designed to bring down the cost and time needed for rare and single-cell experiments. In a study published in the journal Nature Communications, the coauthors showed that AtacWorks can run analyses on a whole genome in just half an hour compared with the multiple hours traditional methods take.

Most cells in the body carry around a complete copy of a person’s DNA, with billions of base pairs crammed into the nucleus. But an individual cell pulls out only the subsection of genetic components that it needs to function, with cell types like liver, blood, or skin cells using different genes. The regions of DNA that determine a cell’s function are easily accessible, more or less, while the rest are shielded around proteins.

AtacWorks, which is available from Nvidia’s NGC hub of GPU-optimized software, works with ATAC-seq, a method for finding open areas in the genome in cells pioneered by Harvard professor Jason Buenrostro, one of the paper’s coauthors. ATAC-seq measures the intensity of a signal at every spot on the genome. Peaks in the signal correspond to regions with DNA such that the fewer cells available, the noisier the data appears, making it difficult to identify which areas of the DNA are accessible.

ATAC-seq typically requires tens of thousands of cells to get a clean signal. Applying AtacWorks produces the same quality of results with just tens of cells, according to the coauthors.

AtacWorks was trained on labeled pairs of matching ATAC-seq datasets, one high-quality and one noisy. Given a downsampled copy of the data, the model learned to predict an accurate high-quality version and identify peaks in the signal. Using AtacWorks, the researchers found that they could spot accessible chromatin, a complex of DNA and protein whose primary function is packaging long molecules into more compact structures, in a noisy sequence of 1 million reads nearly as well as traditional methods did with a clean dataset of 50 million reads.

AtacWorks could allow scientists to conduct research with a smaller number of cells, reducing the cost of sample collection and sequencing. Analysis, too, could become faster and cheaper. Running on Nvidia Tensor Core GPUs, AtacWorks took under 30 minutes for inference on a genome, a process that would take 15 hours on a system with 32 CPU cores.

In the Nature Communications paper, the Harvard researchers applied AtacWorks to a dataset of stem cells that produce red and white blood cells — rare subtypes that couldn’t be studied with traditional methods. With a sample set of only 50 cells, the team was able to use AtacWorks to identify distinct regions of DNA associated with cells that develop into white blood cells, and separate sequences that correlate with red blood cells.

“With very rare cell types, it’s not possible to study differences in their DNA using existing methods,” Nvidia researcher Avantika Lal, first author on the paper, said. “AtacWorks can help not only drive down the cost of gathering chromatin accessibility data, but also open up new possibilities in drug discovery and diagnostics.”

VentureBeat

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.

Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:

  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform
  • networking features, and more

Become a member

Let’s block ads! (Why?)

Big Data – VentureBeat

Analysis, Develop, genome, Harvard, Nvidia, Speeds, tool
  • Recent Posts

    • Accelerate Your Data Strategies and Investments to Stay Competitive in the Banking Sector
    • SQL Server Security – Fixed server and database roles
    • Teradata Named a Leader in Cloud Data Warehouse Evaluation by Independent Research Firm
    • Derivative of a norm
    • TODAY’S OPEN THREAD
  • Categories

  • Archives

    • April 2021
    • March 2021
    • February 2021
    • January 2021
    • December 2020
    • November 2020
    • October 2020
    • September 2020
    • August 2020
    • July 2020
    • June 2020
    • May 2020
    • April 2020
    • March 2020
    • February 2020
    • January 2020
    • December 2019
    • November 2019
    • October 2019
    • September 2019
    • August 2019
    • July 2019
    • June 2019
    • May 2019
    • April 2019
    • March 2019
    • February 2019
    • January 2019
    • December 2018
    • November 2018
    • October 2018
    • September 2018
    • August 2018
    • July 2018
    • June 2018
    • May 2018
    • April 2018
    • March 2018
    • February 2018
    • January 2018
    • December 2017
    • November 2017
    • October 2017
    • September 2017
    • August 2017
    • July 2017
    • June 2017
    • May 2017
    • April 2017
    • March 2017
    • February 2017
    • January 2017
    • December 2016
    • November 2016
    • October 2016
    • September 2016
    • August 2016
    • July 2016
    • June 2016
    • May 2016
    • April 2016
    • March 2016
    • February 2016
    • January 2016
    • December 2015
    • November 2015
    • October 2015
    • September 2015
    • August 2015
    • July 2015
    • June 2015
    • May 2015
    • April 2015
    • March 2015
    • February 2015
    • January 2015
    • December 2014
    • November 2014
© 2021 Business Intelligence Info
Power BI Training | G Com Solutions Limited