Veritas Genetics Scoops Up an AI Company to Sort Out Its DNA – WIRED

Posted: August 8, 2017 at 3:47 am

Genes carry the information that makes you you. So it's fitting that, when sequenced and stored in a computer, your genome takes up gobs of memory: up to 150 gigabytes. Multiply that across all the people who have gotten sequenced, and you're looking at some serious storage issues. If that's not enough, mining those genomes for useful insight means comparing them all to each other, to medical histories, and to the millions of scientific papers about genetics.

Sorting all that out is a perfect task for artificial intelligence. And plenty of AI startups have bent their efforts in that direction. On August 3, sequencing company Veritas Genetics bought one of the most influential: seven-year-old Curoverse. Veritas thinks AI will help interpret the genetic risk of certain diseases and scour the ever-growing databases of genomic, medical, and scientific research. In a step forward, the company also hopes to use things like natural language processing and deep learning to help customers query their genetic data on demand.

It's not totally surprising that Veritas bought up Curoverse. Both companies spun out of George Church's prolific Harvard lab. Several years ago, Church started something called the Personal Genome Project, with the goal of sequencing 100,000 human genomes and linking each one to participants' health information. Veritas' founders helped lead the sequencing part (starting as a prenatal testing service and launching a $1,000 full genome product in 2015), while Curoverse worked on academic strategies to store and sort through all the data.

But more broadly, genomics and AI practically call out for one another. As a raw data format, a single person's genome takes up about 150 gigabytes. How!?! OK so, yes, storing a single base pair only takes up around two bits. Multiply that by roughly 3 billion (the total number of base pairs in your 23 chromosome pairs) and you wind up with around 750 megabytes. But genetic sequencing isn't perfect. Mirza Cifric, Veritas Genetics cofounder and CEO, says his company reads each part of the genome at least 30 times in order to make sure their results are statistically significant. "And you gotta keep all that data, so you can refer back to it over time," says Cifric.
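The arithmetic above can be checked with a short back-of-envelope sketch. The two-bit encoding, the 3 billion base pairs, and the 30 reads come from the figures quoted here. The `bytes_per_base=2` default is an assumption that is not in the article: it stands for raw read files that store each base as one text character plus one quality-score character. The function names are made up for illustration.

```python
# Back-of-envelope genome storage estimate (illustrative, not Veritas's method).
BASE_PAIRS = 3_000_000_000   # roughly 3 billion base pairs, per the article
BITS_PER_BASE = 2            # four possible bases fit in two bits
COVERAGE = 30                # each position is read at least 30 times


def packed_bytes(base_pairs=BASE_PAIRS, bits_per_base=BITS_PER_BASE):
    """Size of one genome copy when every base is packed into two bits."""
    return base_pairs * bits_per_base // 8


def raw_read_bytes(base_pairs=BASE_PAIRS, coverage=COVERAGE, bytes_per_base=2):
    """Size of the raw reads, assuming (hypothetically) two bytes per base:
    one text character for the base and one for its quality score."""
    return base_pairs * coverage * bytes_per_base


print(packed_bytes() / 1e6, "MB for one packed copy")
print(packed_bytes() * COVERAGE / 1e9, "GB for 30 packed copies")
print(raw_read_bytes() / 1e9, "GB for 30x raw text reads")
```

Under those assumptions, one packed copy is 750 MB, 30 packed copies come to 22.5 GB, and uncompressed text reads land in the same range as the article's 150 gigabyte figure. Real files also carry read headers and are usually compressed, so actual sizes vary.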

That's just storage. "Everything after that is going to specific areas and asking questions: There's a variant at this location, a substitution of this base, a deletion here, or multiple copies of this same gene here, here, and here," says Cifric. Now, interpret all that. Oh, and do it across a thousand, hundred thousand, or million genomes. Querying all those genetic variations is how scientists get leads to find new drugs, or figure out how existing drugs work differently on different people.

But cross-referencing all those genomes is just the beginning. Curoverse, which was focusing on projects to store and sort genomic data, also has its work cut out for it in searching through the 6 million (and counting) jargon-filled academic papers detailing gene behavior, including visual information found in charts, graphs, and illustrations.

That's pretty ambitious. Natural language processing is one of the stickiest problems in AI. "Look, I am a computer scientist, I love AI and machine learning, and no amount of coding makes sense to solve this," says Atul Butte, the director of UCSF's Institute of Computational Health Sciences. At his former job at Stanford University, Butte actually tried to do the same thing: use AI to dig through genetics research. He says in the end, it was way cheaper to hire people to read the papers and input the findings into his database manually.

But hey, never say never, right? However they accomplish it, Veritas wants to move past what companies like 23andMe and Color offer: genetic risk based on single-variant diseases. Some of America's biggest dangers come from diseases like diabetes and heart disease, which are activated by interactions between multiple genes, in addition to environmental factors like diet and exercise. With AI, Cifric believes Veritas will be able to not only dig up these various genetic contributors, but also assign each a statistical score showing how much it contributes to the overall risk.
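The article does not say how Veritas would compute those scores. One common way to express a multi-gene risk is an additive polygenic score: a weighted sum over variants, from which each variant's share of the total follows directly. The sketch below assumes that additive form. The variant names, weights, and the `risk_contributions` function are all hypothetical placeholders, not real genetic data or any company's method.

```python
# Minimal additive polygenic score sketch (illustrative assumptions throughout).
def risk_contributions(allele_counts, weights):
    """Return (total_score, share_per_variant).

    allele_counts: variant name -> copies of the risk allele carried (0, 1, or 2)
    weights:       variant name -> hypothetical effect weight for that variant
    """
    # Each variant contributes its weight times the number of risk alleles.
    raw = {v: allele_counts.get(v, 0) * w for v, w in weights.items()}
    total = sum(raw.values())
    # Share of the total score attributable to each variant.
    shares = {v: (c / total if total else 0.0) for v, c in raw.items()}
    return total, shares


# Made-up example values, chosen only to show the mechanics.
weights = {"variant_a": 0.25, "variant_b": 0.5, "variant_c": 0.25}
counts = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score, shares = risk_contributions(counts, weights)
print(score, shares)
```

A purely additive score ignores interactions between genes and environmental factors like diet and exercise, which is part of why the real problem is harder than this sketch suggests.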

Again, Butte hates to be a spoilsport, but ... there are all sorts of problems with doing predictive diagnostics with genetic data. He points to a 2013 study that used polygenic testing to predict heart disease using the Framingham Heart Study data, which is about as good as you can get when it comes to health data and heart disease. "The authors showed that yes, given polygenic risk score, and blood levels, and lipid levels, and family history, you can predict within 10 years if someone will develop heart disease," says Butte. "But doctors could do the same thing without using the genome!"

He says the problems come down to just how messy it is trying to square up all the different research on each gene alongside the environmental risks, and all the other compounding factors that come up when you try to peer into the future. "It's been the holy grail for a long time, structured genome reporting," says Butte. Even attempts to get researchers to write and report data in a standard, machine-readable way have fallen flat. "You get into questions that never go away. One researcher defines autism different from another one, or high blood pressure, or any number of things," he says.

Butte isn't a total naysayer. He says partnerships like the one between Veritas and Curoverse are becoming more common (like the data processing deal between genetic sequencing giant Illumina and IBM Watson) because there's a clear need for new computing methods in this area. "You want to get to a point where you are developing stuff that improves clinical care," he says.

Or how about directly to the owners of the genomes? Cifric hopes the merger will improve the consumer experience of using genetic data, even seamlessly integrating it into daily life. For instance, linking your genome and health records to your digital assistant. "Alexa, should I eat this last piece of pizza?" "Maybe you should skip it, depending on your baseline genetic risk for cholesterol and latest blood test results." Diet isn't the only area where genomics could help improve your day-to-day life. Some people are more or less sensitive to over-the-counter drugs. A quick query might tell you whether you should take a little less Tylenol than is recommended.

Cifric thinks this acquisition could position Veritas as a global powerhouse of genomic data. "Apple recently announced that they had shipped 41 million iPhones in a quarter, right? I think in not too distant future, we'll be doing 41 million genomes in a quarter," he says. That might seem ambitious, given that the cost to consumers is nearly $1,000. But that cost is bound to come down. And artificial intelligence will make paying for the genome a matter of common sense.

This story has been updated to reflect that the company is named Veritas Genetics, not Veritas Genomics.
