Alex’s Lemonade Stand Foundation and the Childhood Cancer Data Lab are cleaning up genome data and speeding up the research process.
Alex’s Lemonade Stand Foundation used a grant from Amazon Web Services to clean up medical research data and build a pipeline of analysis expertise among cancer researchers.
The nonprofit used an Imagine Grant from AWS to expand the Childhood Cancer Data Lab and make more than 1.3 million genome-wide samples available to researchers.
Liz Scott, co-executive director of the foundation, said that the organization’s approach has always been to look for critical gaps in childhood cancer research. She recently discovered that the ability to handle large datasets was one of those gaps. In talking with researchers, Scott realized that there was not enough funding for data analysis and not enough young researchers in the field to do even basic data analysis.
“Several years ago, we started hearing more and more from scientists, ‘What are we going to do with this data and who are we going to get to work on this project?’” she said.
She said the foundation couldn’t even use a grant program to close the gap because there were not enough people with expertise in pediatric oncology to apply. Workshops were not enough, either, to get the momentum the foundation wanted.
So, Scott started the Childhood Cancer Data Lab (CCDL) to build tools and training programs that make it easier for researchers to use large datasets. The organization also runs refine.bio, a repository of uniformly processed and normalized, ready-to-use transcriptome data from publicly available sources.
Refine.bio has processed 756.9 terabytes of raw data. So far, 17,000 visitors have used the site to download 1,441 datasets. Based on user testing and feedback, CCDL found that each download saves researchers about two weeks of time that would have been spent cleaning up and organizing the raw data.
Scott said that the organization had no data analysis expertise of its own, even on its scientific advisory board, which includes oncologists.
“Finding the right people to lead this effort was the greatest thing we could do to make this successful,” she said.
Jaclyn Taroni, Ph.D., is the principal data scientist at the Childhood Cancer Data Lab. The team also includes several data scientists and engineers, a software engineer, a UX designer, and two biological data analysts.
Taroni said scientists who wanted to use the large research investment in the biomedical space had to spend lots of time finding the data and cleaning it before they could do any analysis.
One of the team’s first goals was to organize petabytes of genome sequencing data and provide access to summary level data that researchers could use right away.
Summary level data provides a spreadsheet with measurements for genes on a per sample basis.
“It’s the main unit that we can use to dig into certain biological processes,” she said. “When a childhood cancer researcher has a biological question to use these data to answer, this is the starting point that they would like to be at.”
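To make the idea of a per-gene, per-sample spreadsheet concrete, here is a minimal Python sketch of what working with such a table might look like. It is an illustration only, not refine.bio's actual file format or tooling: the tab-separated layout, the gene and sample names, and every value are invented, and `load_summary` and `mean_expression` are hypothetical helpers.

```python
import csv
import io

# Hypothetical summary-level table: one row per gene, one column per sample.
# Gene IDs, sample IDs, and values are invented for illustration only.
SUMMARY_TSV = """gene\tsample_1\tsample_2\tsample_3
GENE_A\t5.0\t7.0\t6.0
GENE_B\t1.0\t1.5\t2.0
"""


def load_summary(text):
    """Parse a gene-by-sample TSV into {gene: {sample: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        gene = row.pop("gene")  # remaining columns are the samples
        table[gene] = {sample: float(value) for sample, value in row.items()}
    return table


def mean_expression(table, gene):
    """Average one gene's measurement across all samples."""
    values = table[gene].values()
    return sum(values) / len(values)


table = load_summary(SUMMARY_TSV)
print(mean_expression(table, "GENE_A"))
```

With a table already in this shape, a researcher can go straight to questions such as how one gene's measurements compare across samples, rather than first processing raw sequencing files.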
Summary level data makes analysis go much faster than starting with raw data. Researchers can use the refine.bio site to find and download datasets and samples from childhood cancer research as well as animal models.
“Cloud computing allows us to process the data and make the data discoverable,” she said.
Cloud services from AWS provide the power to scale the research and process millions of samples. Taroni said that the lab’s work is helping to unlock billions of dollars in research investment at a far lower cost in compute.
“Part of what we need to do to make the data most useful is to make it searchable and AWS elastic search comes into play to do that,” she said.
Taroni’s doctorate is in genetics, with a focus on computational biology, and she runs the data science team. Her team figures out how to process research data, and the engineering team and UX designer are responsible for implementation.
“My team also does short workshops targeted at building analytical capacity in pediatric researchers,” she said.
Taroni encouraged people interested in supporting childhood cancer research to check out refine.bio for volunteer opportunities.
“Our products are open source, so there are ways to get involved now,” she said. “At our GitHub page you can see what’s happening now and find a way to contribute.”
Scott said the AWS grant helped the foundation scale up the lab. She will continue to fund the work after the grant ends.
“The credibility from a grant like this will enable us to get future funding for this and have it fully funded as its own entity,” she said.
The foundation has funded more than 1,000 grants at more than 150 institutions. Its namesake, Alexandra Scott, raised $2,000 with a lemonade stand when she was four. She raised $1 million before dying at age eight from neuroblastoma, a type of childhood cancer.