
Thursday, 23 July 2015

Can Big Data Algorithms Tell Better Stories Than Humans? - Bernard Marr

What if computer algorithms could tell more compelling stories than journalists, writers or business analysts? This is increasingly becoming a reality: a new generation of Big Data tools is being put to work automating storytelling.
The ideas behind this application of analytics were first used to generate automated news reports covering sports and financial stories. Take the recent Wimbledon tennis championships as an example. The Slamtracker system developed by IBM monitors each game using sensors and cameras, generating millions of real-time data points covering speed of serve, forced and unforced errors, and even the social media sentiment surrounding each game. This data can then be turned into automated stories or Twitter messages to ensure Wimbledon are the first to break news stories about the results.
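To make the step from match data to a short automated message a little more concrete, here is a minimal template-based sketch. The `MatchStats` fields, the thresholds for what counts as "notable" and the player names are all hypothetical, and this is not how IBM's Slamtracker actually works; it simply shows the general idea of picking the most newsworthy statistic and slotting it into a sentence.

```python
from dataclasses import dataclass


@dataclass
class MatchStats:
    # Hypothetical fields; a real tracking system records far more data points.
    winner: str
    loser: str
    score: str
    winner_aces: int
    fastest_serve_kmh: int
    loser_unforced_errors: int


def match_tweet(stats: MatchStats, max_len: int = 140) -> str:
    """Render match statistics as a short, template-based news message."""
    # Pick the most notable statistic to headline the message.
    # The cut-offs (10 aces, 30 unforced errors) are illustrative assumptions.
    if stats.winner_aces >= 10:
        detail = f"firing {stats.winner_aces} aces"
    elif stats.loser_unforced_errors >= 30:
        detail = f"as {stats.loser} made {stats.loser_unforced_errors} unforced errors"
    else:
        detail = f"with a fastest serve of {stats.fastest_serve_kmh} km/h"
    text = f"{stats.winner} beats {stats.loser} {stats.score}, {detail}."
    # Truncate rather than exceed the limit (Twitter allowed 140 characters in 2015).
    if len(text) > max_len:
        text = text[: max_len - 1] + "…"
    return text


if __name__ == "__main__":
    stats = MatchStats("Player A", "Player B", "6-4 6-3", 12, 201, 25)
    print(match_tweet(stats))  # Player A beats Player B 6-4 6-3, firing 12 aces.
```

A fixed template like this is the simplest possible form of natural language generation; the platforms discussed below go much further in choosing structure and wording.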

Journalists have already expressed worries that technology like this could put them out of work. But the truth is that if the process of structuring data into a narrative can be taught to a human, it can be taught to a computer too.
Kris Hammond, co-founder and chief scientist at Narrative Science, which has created the Quill natural language generation platform, realized early on that technology could be used to turn information into easy-to-understand narratives. In fact, Quill is a regular contributor to Forbes, just like me.
Quill and competing platforms such as those from Automated Insights are used by other media outlets, but because little is known about how trustworthy readers consider algorithm-written reports to be, many news publishers may be reluctant to admit whether their stories, or parts of them, are generated by computers.

The implications of this technology go further than putting journalists out of work, however. Hammond concedes that Quill isn’t yet great at finding news stories; its strengths lie in assembling stories from specific data sources. Narrative Science is currently running one application that reads the stock market and attempts to spot when unusual highs, lows or volume spikes could have important implications, but Hammond calls this a “very controlled” instance of Quill digging up its own stories. He stands by his claim, made in 2012, that a computer would be able to write Pulitzer Prize-quality journalism within five years, although he admits the clock is ticking!
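Narrative Science has not said how its stock market application works, so the following is only a rough illustration of what "spotting unusual volume spikes" can mean in practice. This sketch flags days whose trading volume sits more than a chosen number of standard deviations above the trailing average (a simple z-score test) and turns each flagged day into a one-line candidate story lead. The 20-day window, the threshold of 3 and the ticker are arbitrary assumptions.

```python
from statistics import mean, stdev


def volume_spikes(volumes: list[float], window: int = 20, threshold: float = 3.0) -> list[int]:
    """Return indices of days whose volume is unusually high versus the trailing window."""
    spikes = []
    for i in range(window, len(volumes)):
        history = volumes[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # A perfectly flat history has no spread, so a z-score is undefined; skip it.
        if sigma == 0:
            continue
        z = (volumes[i] - mu) / sigma
        if z > threshold:
            spikes.append(i)
    return spikes


def describe_spikes(ticker: str, volumes: list[float],
                    window: int = 20, threshold: float = 3.0) -> list[str]:
    """Turn each flagged day into a one-line candidate story lead."""
    leads = []
    for i in volume_spikes(volumes, window, threshold):
        ratio = volumes[i] / mean(volumes[i - window:i])
        leads.append(f"{ticker}: volume of {volumes[i]:,.0f} on day {i} "
                     f"was {ratio:.1f}x the trailing {window}-day average")
    return leads


if __name__ == "__main__":
    # Twenty ordinary trading days followed by one very busy one (made-up numbers).
    volumes = [100, 110, 90, 105, 95] * 4 + [400]
    for lead in describe_spikes("XYZ", volumes):
        print(lead)  # XYZ: volume of 400 on day 20 was 4.0x the trailing 20-day average
```

A statistical flag like this only finds candidates; deciding whether a spike "could have important implications" is the harder, more controlled part Hammond describes.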

No, the real value, Hammond says, is not in the scattershot approach of news publishing, where one article is created for a vast audience in the hope that some will find it interesting or useful. Natural language generation and automated narrative creation mean that one dataset can be interpreted in multiple ways, giving each targeted audience segment precisely what it needs to know, without any confusing background noise.

This makes it ideal for corporate communications, where, for example, a company’s financial, customer and operations data can be interpreted and the insights reported directly to whoever in the organization is best placed to make a change.
So, for example, if an algorithm running at a manufacturing company were to detect that a bottleneck in the production of one component was leading to an overall loss in revenue, it could create tailored reports for every department involved in the process, explaining the situation and the best course of action to correct it. Doing this manually would be a very time-consuming undertaking.
Just as with other high-tech developments of today (driverless cars spring to mind), earning the trust of humans is essential. The algorithms must allow for full sourcing and accountability. This is why, although natural language generation is the foundation of this sort of technology, the data and analytics that underpin it are just as important.
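The "one dataset, many audiences" idea can be illustrated with a small sketch: a single finding about a production bottleneck is rendered differently for each department, with each report emphasizing only the facts that department can act on. The `BottleneckFinding` fields, the department names, the recommended action and all the figures are hypothetical; real platforms such as Quill are far more sophisticated than a set of hand-written templates.

```python
from dataclasses import dataclass


@dataclass
class BottleneckFinding:
    # Hypothetical output of an upstream analytics step.
    component: str
    line_utilization_pct: float   # how saturated the constrained production line is
    units_short_per_week: int     # shortfall against planned output
    revenue_lost_per_week: float  # estimated revenue impact in dollars


def report_for(department: str, f: BottleneckFinding) -> str:
    """Render the same finding with the emphasis each audience needs."""
    if department == "production":
        return (f"The {f.component} line is running at {f.line_utilization_pct:.0f}% "
                f"utilization and is {f.units_short_per_week} units short each week. "
                f"Adding capacity on this line is the priority.")
    if department == "finance":
        return (f"A bottleneck in {f.component} production is costing an estimated "
                f"${f.revenue_lost_per_week:,.0f} in revenue per week.")
    if department == "sales":
        return (f"Expect delays on orders that depend on {f.component}; output is "
                f"{f.units_short_per_week} units below plan per week.")
    raise ValueError(f"No report template for department: {department}")


if __name__ == "__main__":
    finding = BottleneckFinding("gearbox housing", 98.0, 120, 45000.0)
    for dept in ("production", "finance", "sales"):
        print(f"[{dept}] {report_for(dept, finding)}")
```

The key design point is that the analysis happens once and the wording is chosen per audience afterwards, so every department's report stays consistent with the same underlying numbers.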

Users have to be able to look “underneath” the language and see the data it is built on. Quill allows for this: the logic behind the use of every word can be interrogated manually. For example, it might choose the phrase “lion’s share” to indicate a majority, based on mathematical analysis. It does this because being told that something has the “lion’s share” of a market is immediately more meaningful to a human than being told, in isolation from other facts, that it holds 80% of the market.

The user would still be able to delve into the logic of the algorithm, though, and find out that it used the phrase “lion’s share” based on the statistic of 80%, as well as which drivers and factors it analyzed to arrive at that 80% figure in the first place.

“All of that goes into play before the language ever shows up. Language is, even for us, the last mile. The structure of the story and the narrative comes first,” Hammond tells me.
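Here is a minimal sketch of traceable word choice as described above: a function picks a descriptive phrase for a market share and returns, alongside the phrase, the rule that selected it and the figures the rule was applied to, so a reader can look “underneath” the wording. The phrase list and thresholds (for instance, treating 75% or more as the “lion’s share”) are arbitrary assumptions, not Quill’s actual rules.

```python
from dataclasses import dataclass


@dataclass
class PhraseChoice:
    phrase: str
    rule: str     # human-readable rule that selected the phrase
    inputs: dict  # the figures the rule was applied to, kept for auditing


# Ordered from most to least dominant; thresholds are illustrative assumptions.
SHARE_PHRASES = [
    (0.75, "the lion's share"),
    (0.50, "a large share"),
    (0.25, "a sizeable share"),
    (0.00, "a small share"),
]


def describe_share(company: str, company_sales: float, market_sales: float) -> PhraseChoice:
    """Choose a phrase for a market share and record why it was chosen."""
    share = company_sales / market_sales
    for threshold, phrase in SHARE_PHRASES:
        if share >= threshold:
            return PhraseChoice(
                phrase=phrase,
                rule=f"share >= {threshold:.0%}",
                inputs={"company": company, "company_sales": company_sales,
                        "market_sales": market_sales, "share": share},
            )
    raise ValueError("market share must be non-negative")


if __name__ == "__main__":
    choice = describe_share("Company X", 80.0, 100.0)
    print(f"Company X holds {choice.phrase} of the market.")
    # The audit trail: which rule fired and on what numbers.
    print(choice.rule, choice.inputs)
```

Keeping the rule and inputs next to the generated phrase is what makes the language accountable: the sentence can be shown to readers, while the record behind it remains available to anyone who wants to check it.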

At the moment, automated narratives generally work well with structured data: information such as numbers and measurements that fits neatly into a spreadsheet and can be compared quantitatively. In the future, I expect to see more of the messy, unstructured data we are generating and collecting in ever greater volumes included in these processes. For example, video data could be analyzed and interpreted to add color and insight to reports. Going back to news reporting, CCTV footage could tell us whether streets were empty or crowded at the time of an armed robbery. At the same time, social media analysis could bolster reports with an ad hoc assessment of public sentiment towards any relevant issue.

Might we even reach a time when Big Data can produce fiction that humans find entertaining to read? Or even personalized fiction, which crunches data to come up with a perfect plotline guaranteed to keep you, personally, turning pages? You won’t be surprised to hear that research is already under way in that area, and statisticians have already used data to study how fictional stories are built.

Narratives are one of the most important tools we have. Humans have always told stories (fictional, real or somewhere in between) as a way of passing on information and influencing events. Giving that power to computers may, to some, seem a step too far. But don’t we already often distrust the concept of “narrative”? The word is commonly used interchangeably with “spin” to suggest that someone is tailoring their depiction of events to suit their own needs. Computers can’t “spin” (unless they are programmed to, of course), so for news reporting, or conveying hard facts about a business, couldn’t they be seen as more trustworthy than humans?
