For John Snow Labs, doing good with NLP is in their DNA (and yours) – Diginomica

Posted: March 3, 2021 at 2:01 am


Why was Dr. John Snow designated the "Father of Epidemiology?" His painstaking investigations of the outbreaks of deadly cholera in London in the 1850s led him to conclude that the disease was caused by contaminated water. His meticulous data gathering pinpointed the source at a single water pump.

No one had ever mapped the incidence of deaths that way before, and the "germ theory" of disease was not yet accepted. It took almost twenty years for the scientific and medical professions to accept his premise, but once the water pump was disabled, the cholera outbreak subsided. (See map at end of the piece.)

Why did a commercial NLP company, John Snow Labs, choose this name? Though the company does not exclusively produce models for healthcare and life sciences, that is a significant part of its business. I've had the chance to speak with them on several occasions, and they are a remarkable organization in many ways, which I'll explain. But first, let's review how at least some aspects of NLP work.

By now, everyone is familiar with conversational NLP like Siri. For augmented analytics, the conversation may be, "Download the latest pricing analysis to my phone." The critical thing to remember is that the computer does not understand what you are saying, nor does it understand what it is saying back. It can process your words and answer, but make no mistake: it's all done with math.
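To make "it's all done with math" concrete, here is a toy sketch of how models compare words: each word becomes a vector of numbers, and "similar meaning" is just "vectors pointing in a similar direction." The three-dimensional vectors below are invented purely for illustration; real models learn hundreds of dimensions from data.

```python
import math

# Toy "embeddings": each word is just a list of numbers.
# These values are made up for illustration, not learned from data.
embeddings = {
    "price":   [0.90, 0.10, 0.20],
    "pricing": [0.85, 0.15, 0.25],
    "weather": [0.10, 0.90, 0.30],
}

def cosine(a, b):
    """Cosine similarity: the standard way to compare word vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Related words point in nearly the same direction; unrelated ones do not.
print(cosine(embeddings["price"], embeddings["pricing"]))  # close to 1.0
print(cosine(embeddings["price"], embeddings["weather"]))  # much lower
```

Nothing in that computation "knows" what a price is; the model only measures geometry, which is exactly the point.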

Organizations that offer NLP capabilities do not start from scratch. There are open-source libraries they can slot in and wrap their software around, such as Spark NLP from John Snow Labs, or other open-source Python libraries such as spaCy, textacy, or NLTK. Just to be clear, satisfying your question is not the job of one "parse my sentence" model. It is a pipeline, and each step - tokenizing the text, tagging parts of speech, parsing the sentence, recognizing named entities - is a different model. I'm oversimplifying here, but that is roughly how a pipeline comes to "understand" a sentence.
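The staged pipeline described above can be sketched in a few lines. To keep this self-contained, each stage below is a crude rule-based stand-in (a regex tokenizer, a tiny verb list, a hard-coded entity gazetteer) rather than a trained model; in a real library like Spark NLP or spaCy, every stage is a separately trained model chained the same way. All function names here are hypothetical.

```python
import re

def tokenize(text):
    """Stage 1: split raw text into tokens."""
    return re.findall(r"[A-Za-z']+|\d+", text)

def tag_pos(tokens):
    """Stage 2: assign a (very rough) part-of-speech tag per token."""
    verbs = {"download", "send", "open"}
    return [(t, "VERB" if t.lower() in verbs else "OTHER") for t in tokens]

def recognize_entities(tokens):
    """Stage 3: mark tokens found in a tiny hard-coded gazetteer."""
    gazetteer = {"phone": "DEVICE", "pricing": "REPORT"}
    return {t: gazetteer[t.lower()] for t in tokens if t.lower() in gazetteer}

def interpret(text):
    """Chain the stages, as a real pipeline chains its models."""
    tokens = tokenize(text)
    return {
        "tokens": tokens,
        "pos": tag_pos(tokens),
        "entities": recognize_entities(tokens),
    }

result = interpret("Download the latest pricing analysis to my phone")
print(result["entities"])  # {'pricing': 'REPORT', 'phone': 'DEVICE'}
```

The output of each stage feeds the next, which is why an error early in the pipeline (a bad tokenization, say) propagates into every downstream model.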

Consider that John Snow Labs offers a community version (free) of its Spark NLP that supports an astounding 375 languages, some of which have fewer than 10,000 native speakers. The first question is how, and the second question is why. The how is pretty complicated - it involves training with deep-learning techniques - and I'll save it for another article. The why is pretty compelling.

John Snow Labs is a commercial company focused on life sciences, genomics, and healthcare. Unlike IBM, which proclaimed ten years ago that Watson would cure cancer (and failed), John Snow Labs set out to use NLP technology to help practitioners assemble credible medical records that are, to this day, scattered, siloed, and inconsistent.

Particularly with oncology, this is crucial, because cancer treatment is still very complicated and practitioners need all the data they can get. When data is cloistered in multiple EMRs, John Snow Labs frees it. But why 375 languages? As David Talby, the company's founder and CTO, said to me recently, accuracy in B2C transactions is useful, but oncology is not a matter of statistics. Every single person is important, whether they're at Mount Sinai Hospital or a Doctors Without Borders camp.

You may wonder, if these models aren't "smart" in any human-intelligence fashion, how can you trust them? After all, human language is very complex, often ambiguous if not nonsensical. The answer is that a few years ago, the accuracy of NLP models hovered around 50%. Today, Spark NLP achieves better than 95% accuracy in academic peer-reviewed results.

We have lots of problems with "AI" companies, especially venture-funded ones expected to exhibit the growth their investors demand. As a result, ethical considerations about the products they produce take a severe hit. John Snow Labs is not in that category:

Why is it so crucial for John Snow Labs to have these policies and enforce them? The AI industry is riddled with ethical problems. Many companies engage in a sinister practice, "ethics washing" - fabricating or exaggerating their commitment to equitable AI. It's inauthentic and distracts from whether actual steps are being taken toward building a world where professional standards demand AI that works just as well for women, people of color, and young people as it does for the white men who make up the majority of people building AI systems.

Training in ethics has not been very effective, at least partly because it has been aimed at the AI developers and researchers who make important determinations that can harm people, yet who need to understand when the technology benefits and when it harms. It is clear that better testing and engineering practices, grounded in concern for AI's implications, are urgently needed.

However, focusing on engineers without accounting for the broader political economy within which AI is produced and deployed risks placing responsibility on individual actors within a much larger system, erasing very real power asymmetries. Those at the top of corporate hierarchies have much more power to set direction and shape ethical decision-making than individual researchers and developers. Racism and misogyny are treated as "invisible" symptoms latent in individuals, not as structural problems that manifest in material inequities. These formulations ignore that engineers are often not at the center of the decisions that lead to harm and may not even know about them. For example, some engineers working on Google's Project Maven weren't aware that they were building a military drone surveillance system. Indeed, such obscurity is often by design, with sensitive projects being split into sections, making it impossible for any one developer or team to understand the ultimate shape of what they are building and where it might be applied.

In January 2021, John Snow Labs released NLU 1.1, which integrates 720+ new models from the latest Spark NLP 2.7 release. These include state-of-the-art results with sequence-to-sequence transformers on problems like text summarization, question answering, and translation between 192+ languages, as well as named entity extraction in right-to-left languages like Arabic, Persian, Urdu, and Hebrew, and in languages that require segmentation, like Korean, Japanese, and Chinese, all in one line of code. These new features are possible because of the integration of Google's T5 models and Microsoft's Marian models.

NLU 1.1 has over 1,000 pretrained models. In addition, NLU 1.1 comes with nine new notebooks showcasing training classifiers for various review and sentiment datasets, and seven notebooks for the new features and models. You can browse the complete list of models in this release.

I'll sum it up this way. Facebook is the world's largest deliberate purveyor of disinformation, a company with, in my estimation, no soul. John Snow Labs is a small commercial NLP company of roughly 75 employees that provides an open-source library with hundreds of pretrained models, including, in contrast to Facebook, tools for detecting disinformation.

John Snow's original cholera data points map.

