Can Artificial Intelligence Like IBM's Watson Do Investigative Journalism?

Two years ago, the two greatest Jeopardy champions of all time got obliterated by a computer called Watson. It was a great victory for artificial intelligence--the system racked up more than three times the earnings of its next meat-brained competitor. For IBM's Watson, the successor to Deep Blue, which famously defeated chess champion Garry Kasparov, becoming a Jeopardy champion was a modest proof of concept. The big challenge for Watson, and the goal for IBM, is to adapt the core question-answering technology to more significant domains, like health care.

WatsonPaths, IBM's medical-domain offshoot announced last month, is able to derive medical diagnoses from a description of symptoms. From the chain of evidence behind a diagnosis, it's able to present an interactive visualization to doctors, who can interrogate the data, further question the evidence, and better understand the situation. It's an essential feedback loop used by diagnosticians to help decide which information is extraneous and which is essential, thus making it possible to home in on a most-likely diagnosis.

WatsonPaths scours millions of unstructured texts, like medical textbooks, dictionaries, and clinical guidelines, to develop a set of ranked hypotheses. The doctors' feedback is added back into the brute-force information retrieval capabilities to help further train the system. That's the AI part, which also provides transparency for the system's diagnosis. Eventually, this knowledge will be used to articulate uncertainty, identifying information gaps and asking questions to help it gather more evidence.

Health care is just the beginning for Watson. Other disciplines that rely on evidentiary reasoning from unstructured documents or the Deep Web, including law, education, and finance, are also on the road map. But let's consider another potential domain here, perhaps less lucrative than the others, but nonetheless important: news and journalism.

Media startup Vocativ identifies hot news stories by trawling the depths of the web, data-mining the vast seas of unindexed documents for information that might point to a story lead. Often journalists pair up with analysts, manually exploring data from different perspectives. The Associated Press's Overview Project aims to build better visualization and analysis tools for investigative journalists to make sense of huge document sets.

What if much of this could be automated? A cognitive computer, like Watson, could search reams of evidence, generate hypotheses, and collect supporting and contradicting evidence. Potential news stories would be presented to journalists and analysts, who would weigh the evidence, assess its accuracy, and decide which story ideas to pass on to an editor for further pursuit. In this scenario, Watson would be providing a well-sourced tip.

Adapting Watson to new domains isn't easy. According to a paper from IBM Research that describes the application of Watson in health care, the system has to be able to parse and understand the format of a variety of domain-specific documents. Then it needs to be re-trained so that it learns how to weigh different sources of evidence, and any special-purpose taxonomies or logic that drive the domain also need to be accessible to the system. For investigative journalism, documents might include interview transcripts, legal codes and statutes, social networks, other news articles, PDFs obtained through Freedom of Information Act (FOIA) requests, or even document dumps from sources like WikiLeaks. Through an iterative process, the system would have to be trained, going back and forth with editors as it suggested stories and was told yay or nay, each new vote modulating how the system weighs and integrates evidence.
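To make the yay-or-nay loop concrete, here is a minimal sketch of how one editor vote could nudge the weights the system places on different evidence sources. It is not how Watson is actually trained; it swaps in a single logistic-regression gradient step as a stand-in, and the source names (`foia_pdf`, `news_article`), the `score` and `update` functions, and the learning rate are all hypothetical.

```python
import math

def score(weights, features):
    """Probability that a lead is worth pursuing, given per-source evidence."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def update(weights, features, vote, lr=0.5):
    """One gradient step of logistic regression; vote is 1 (yay) or 0 (nay)."""
    error = vote - score(weights, features)
    names = set(weights) | set(features)
    return {
        name: weights.get(name, 0.0) + lr * error * features.get(name, 0.0)
        for name in names
    }

# Hypothetical evidence sources, all starting with zero weight.
weights = {"foia_pdf": 0.0, "news_article": 0.0}
# A suggested lead that is supported only by a FOIA document.
lead = {"foia_pdf": 1.0, "news_article": 0.0}

before = score(weights, lead)
weights = update(weights, lead, vote=1)  # the editor says "yay"
after = score(weights, lead)
```

After the approving vote, the system trusts FOIA-backed leads slightly more, while the weight on sources that played no part in the lead is left unchanged.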

Given a lot of re-engineering for Watson, how might an acumen for investigative reporting play out in a real-world news scenario? Earlier this year the International Consortium of Investigative Journalists (ICIJ) published a database of 2.5 million leaked documents about the offshore holdings and accounts of more than 100,000 entities, including emails, PDFs, spreadsheets, images, and four large databases packed with information about offshore companies, trusts, intermediaries, and other individuals involved with those companies. Undaunted, 112 reporters spent 15 months analyzing the data--a lot of human time and effort.

For Watson, ingesting all 2.5 million unstructured documents is the easy part. From these, it would extract references to real-world entities, like corporations and people, and start looking for relationships between them, essentially building up context around each entity. These entities could be linked to open-entity databases like Freebase to provide even more context. A journalist might orient the system's attention by indicating which politicians or tax-dodging tycoons might be of most interest. Other texts, like relevant legal codes in the target jurisdiction or news reports mentioning the entities of interest, could also be ingested and parsed.
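As a rough illustration of that entity-and-relationship step, the sketch below counts how often pairs of entities are mentioned in the same document. It assumes a hand-supplied list of entity names and plain substring matching in place of real entity extraction; the names, the sample documents, and the `extract_entities` and `cooccurrence` functions are invented for the example and do not come from the ICIJ data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entities a journalist has flagged as interesting.
KNOWN_ENTITIES = ["Acme Holdings", "Jane Doe", "Blue Reef Trust"]

# Hypothetical snippets standing in for leaked documents.
DOCS = [
    "Jane Doe is listed as director of Acme Holdings.",
    "Acme Holdings transferred shares to Blue Reef Trust.",
    "Jane Doe signed on behalf of Acme Holdings.",
]

def extract_entities(text, known=KNOWN_ENTITIES):
    """Return the known entities mentioned in the text, in sorted order."""
    return sorted(name for name in known if name in text)

def cooccurrence(docs):
    """Count how many documents mention each pair of entities together."""
    counts = Counter()
    for doc in docs:
        for pair in combinations(extract_entities(doc), 2):
            counts[pair] += 1
    return counts

edges = cooccurrence(DOCS)
```

Each counted pair is an edge in a simple relationship graph: the more documents two entities share, the stronger the case for a reporter to look at how they are connected.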

Watson would then draw on its domain-adapted logic to generate evidence, like IF corporation A is associated with offshore tax-free account B, AND the owner of corporation A is married to an executive of corporation C, THEN add a tiny bit of inference of tax evasion by corporation C. There would be many of these types of rules, perhaps hundreds, and probably written by the journalists themselves to help the system identify meaningful and newsworthy relationships. Other rules might be garnered from common-sense reasoning databases, like MIT's ConceptNet. At the end of the day (or probably just a few seconds later), Watson would spit out 100 leads for reporters to follow. The first step would be to peer behind those leads to see the relevant evidence, rate its accuracy, and further train the algorithm. Sure, those follow-ups might still take months, but it wouldn't be hard to beat the 15 months the ICIJ took in its investigation.
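The IF/AND/THEN rule above can be sketched as a small function over a set of facts. This is only an illustration of the idea, not Watson's rule engine: the triple format, the relation names, the 0.05 weight, and the `rule_spouse_offshore` and `rank_leads` functions are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical facts as (subject, relation, object) triples.
FACTS = {
    ("CorpA", "associated_with_offshore_account", "AccountB"),
    ("Alice", "owns", "CorpA"),
    ("Alice", "married_to", "Bob"),
    ("Bob", "executive_of", "CorpC"),
}

def rule_spouse_offshore(facts, weight=0.05):
    """IF corp A has an offshore account AND A's owner is married to an
    executive of corp C, THEN add a small suspicion score to corp C."""
    offshore = {s for s, r, o in facts if r == "associated_with_offshore_account"}
    owners = {(s, o) for s, r, o in facts if r == "owns"}
    married = {(s, o) for s, r, o in facts if r == "married_to"}
    married |= {(b, a) for a, b in married}  # marriage is symmetric
    execs = {(s, o) for s, r, o in facts if r == "executive_of"}

    scores = defaultdict(float)
    for owner, corp in owners:
        if corp not in offshore:
            continue
        for person, spouse in married:
            if person != owner:
                continue
            for executive, other_corp in execs:
                if executive == spouse:
                    scores[other_corp] += weight
    return scores

def rank_leads(facts, rules):
    """Sum the evidence from every rule and sort entities by total score."""
    totals = defaultdict(float)
    for rule in rules:
        for entity, value in rule(facts).items():
            totals[entity] += value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

leads = rank_leads(FACTS, [rule_spouse_offshore])
```

With hundreds of such rules, each contributing a small weight, the ranked list would play the role of the leads handed to reporters, and each score could be traced back to the facts that triggered it.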
