Researchers review the state-of-the-art text mining technologies for chemistry – Phys.Org

June 22, 2017

In a recent Chemical Reviews article, Spanish researchers have published the first exhaustive review of the state-of-the-art methodologies underlying chemical search engines, named entity recognition and text mining systems.

The rapidly growing field of big data applications in biomedical research, together with the use of machine learning and artificial intelligence technologies for text data mining, has resulted in promising tools. The authors write, "This review is organised to serve as a practical guide to researchers entering in this field but also to help them to envision the next steps in this emerging data science field."

"Through the release of Gold Standard datasets and the organisation of several community challenge benchmark events, the Biological Text Mining Unit has played a critical role in the development and evaluation of current chemical text mining systems, as highlighted in this article," explains Martin Krallinger, head of the unit and co-first author of the review.

A huge amount of unstructured data

A considerable fraction of biomedically relevant data is available only in unstructured form. Such data include the rapidly growing scientific literature, medicinal chemistry patents, electronic health records and clinical trial documents. In fact, every year, over 20,000 new compounds are published in medicinal and biological chemistry journals.

Being able to transform unstructured biomedical research data into structured databases that can be more efficiently processed by machines or queried by humans is critical for a range of heterogeneous applications. These include the identification of new drug targets and of chemical probes to validate or discard those potential targets, the repurposing of approved drugs, the identification of adverse drug events, and the retrieval of systems biology information associated with chemical-disease or chemical-gene networks.

As therapeutic agents for treating medical needs, chemical compounds are an entity type of critical relevance for biomedical research. "The construction of large chemical knowledge bases, integrating chemical information with biological and clinical data, is crucial to identify and validate new therapeutic targets for unmet medical needs as well as to speed up the drug discovery process," says Julen Oyarzabal, director of Translational Sciences at CIMA and co-leader of this report.

More information: Martin Krallinger et al, Information Retrieval and Text Mining Technologies for Chemistry, Chemical Reviews (2017). DOI: 10.1021/acs.chemrev.6b00851

Journal reference: Chemical Reviews

Provided by: Centro Nacional de Investigaciones Oncológicas (CNIO)

