AI algorithms are helping scientists map ten billion cells from the human body in an attempt to unlock the mysteries of how life emerges from the embryo, or how diseases like cancer manifest.
Dana Peer, chair and professor of computational and systems biology at the Memorial Sloan Kettering Cancer Center, a research lab focused on cancer treatment in New York, described machine learning as a "toolbox" for building the Human Cell Atlas. The project aims to turn data from billions of tissue sample cells into 3D maps so scientists can visualize our bodies down to their smallest units.
Life develops from a single fertilized egg cell. How does that initial cell go on to produce nearly 40 trillion cells to form a human body that is able to move and think?
The process can be largely described by cell differentiation, where a stem cell morphs into a specialized unit that carries out a vital function for a particular organ. Genetic information stored in DNA and encoded into every cell carries instructions on how to build different cell types for the body.
Although scientists broadly understand the process, they're still perplexed at how it works down at the cellular level.
"Cells are like tiny computers that get input from their environment, signals [and nutrients] from other cells," Peer said on stage during the Conference on Neural Information Processing Systems in Vancouver on Tuesday.
"They have all sorts of proteins that are their processing devices. They make decisions, they interact with each other due to their biochemistry and molecular biology, and decide whether they're going to proliferate, make more copies of themselves, differentiate, enter a new cell type, activate, or release some molecule to talk with another cell. They're really like little computers and we want to know how they work."
The problem with studying cells is, however, the sheer amount of data they produce. The genetic code describing the RNA in cells from one tissue sample is represented as a series of numbers in a giant matrix. At first glance, these matrices don't make much sense, but they can be turned into 3D maps with the help of AI algorithms.
Common machine learning techniques and models like t-SNE, k-nearest neighbors, Markov chains, or even deep learning have allowed biologists to visualize the behavior of cells. The jumbled stream of numbers describing a cell can now be represented as a clear graph that clusters the data by cell type and function.
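To give a rough sense of how a matrix of numbers can be grouped by cell type, here is a minimal sketch of one of the techniques named above, k-nearest neighbors. It is an illustration only, not the method used by Peer's lab or the Human Cell Atlas. The toy data, the gene profiles, and the function names (`make_toy_cells`, `knn_graph`, `connected_components`) are all assumptions made up for this example. Each row stands for a cell and each column for a gene; cells are linked to their closest neighbors, and groups of linked cells are treated as clusters.

```python
import math
import random


def make_toy_cells(seed=0, cells_per_type=20):
    """Build a made-up cells-by-genes matrix with two hypothetical cell types."""
    rng = random.Random(seed)
    # Invented mean expression levels across five genes for each cell type.
    type_profiles = [
        [10, 0, 0, 10, 0],
        [0, 10, 10, 0, 10],
    ]
    cells = []
    for profile in type_profiles:
        for _ in range(cells_per_type):
            # Add a little noise so no two cells are identical.
            cells.append([level + rng.uniform(-1, 1) for level in profile])
    return cells


def knn_graph(matrix, k):
    """For each row (cell), list the indices of its k nearest rows (Euclidean)."""
    neighbors = []
    for i, row in enumerate(matrix):
        distances = sorted(
            (math.dist(row, other), j)
            for j, other in enumerate(matrix)
            if j != i
        )
        neighbors.append([j for _, j in distances[:k]])
    return neighbors


def connected_components(neighbors):
    """Give the same label to cells that are linked through neighbor edges."""
    n = len(neighbors)
    # Treat every neighbor link as an undirected edge.
    adjacency = [set(ns) for ns in neighbors]
    for i, ns in enumerate(neighbors):
        for j in ns:
            adjacency[j].add(i)

    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adjacency[node]:
                if labels[nb] == -1:
                    labels[nb] = next_label
                    stack.append(nb)
        next_label += 1
    return labels


cells = make_toy_cells()
labels = connected_components(knn_graph(cells, k=10))
print(labels)
```

On this toy data the two invented cell types end up in separate groups, because every cell's nearest neighbors belong to its own type. Real single-cell data is far noisier and has thousands of genes per cell, which is why methods such as t-SNE are typically used to compress it before it is drawn as a map.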
Scientists have managed to trace the source of acute lymphoblastic leukemia, the most common cancer in children, to a rare cell type that only crops up in seven out of 10,000 cells. Peer described how data visualization has also allowed researchers to discover how a single mutation in a pancreatic cell can lead to cancer.
The mutation tricks the immune system and it can no longer defend our bodies against cancer. All the knowledge gleaned from these visualizations can help scientists develop new drugs and methods that target diseases like cancer and speed up the process of clinical trials.
Peer hopes the Human Cell Atlas, once built, will serve as a healthy reference for mapping disease. She called it a "candy land playground for biologists." But although machine learning algorithms have already had a huge impact, the techniques are more successful at modelling common patterns in data than at highlighting anomalous behaviors.
"Our goal is not to predict but understand, and in biology, the outlier is often the most important."