
Making Big Data Fast

Imagine for a moment the classic I Love Lucy scene that takes place in Kramer’s Kandy Kitchen. Lucy and Ethel are trying to wrap candy, but the conveyor belt is delivering it too quickly. Some of the candy gets wrapped, some of it gets through unwrapped, and some of it ends up in hilarious places better left unmentioned. And then venerable character actress Elvia Allman yells, “Speed it up a little!”

Now imagine that you’re Lucy and every piece of candy is actually a piece of data, coming at you with overwhelming speed. It’s a horrifying thought to have to wrap it all – or, in this case, to identify each piece, figure out whether it’s important enough to analyze, and determine how it should be analyzed. Yet such scenarios exist in real 21st-century life: analyzing security data, video feed data, health-care data – any scenario where finding an answer faster will fight crime, reduce costs, or save lives.



Ever since I wrote about the concept of fast data for Computerworld a few months ago, I’ve found the topic fascinating. That’s why I was so gratified to see it pop up on more radar screens recently.

Perhaps the most relevant take on fast data appeared recently in the Wall Street Journal, where consultant Randy Bean noted, “In contrast [to big data’s historical focus], Fast Data is about ‘data in motion’ and immediate response and action. It’s the velocity component of the Big Data triad. While large corporations have been focused on the variety and volume of data they manage, Fast Data applications are being developed to seize on the opportunities presented by data velocity.”

Emagine CEO David Peters wrote in TechRadar earlier this month about the importance of fast data for communications service providers – much of his argument applies to other industries as well – but since his company sells a “real-time event decisioning platform,” consider the source.

In InfoWorld, also earlier this month, consultant Yves de Montcheuil wrote about the drive toward real-time business intelligence, nicely encapsulating the issues behind real-time data collection, processing, and insight availability. Rather than glossing over the concept’s difficulties, de Montcheuil addresses some of the pitfalls as well:

“Beyond the technical challenges of running smaller/faster batches and streaming/messaging technology, the major difficulty of real-time business intelligence resides in the differences in velocity between the different types and sources of data. Because not all data enters the data warehouse/mart/lake at the same time, and because it is not refreshed at the same frequency, updating the data analysis and reference structures becomes very challenging.”
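
De Montcheuil’s velocity-mismatch point is easy to see in miniature. Below is a hedged Python sketch of what he’s describing – the refresh interval, table contents, and field names are all hypothetical, not drawn from his article – showing what happens when a fast-moving event stream has to be enriched with reference data that’s refreshed only in periodic batches:

```python
import time

# Sketch of the velocity mismatch: a fast event stream joined against
# reference data that is refreshed only in periodic batches.
# All names, fields, and intervals here are hypothetical.

REFERENCE_REFRESH_SECS = 3600   # assume the dimension table reloads hourly

reference = {}      # e.g. customer_id -> segment, extracted from the warehouse
last_refresh = 0.0

def load_reference():
    """Stand-in for a batch extract from the data warehouse/mart/lake."""
    return {"c1": "premium", "c2": "trial"}

def enrich(event):
    """Tag a fast-moving event with slow-moving reference data."""
    global reference, last_refresh
    now = time.time()
    if now - last_refresh > REFERENCE_REFRESH_SECS:
        reference = load_reference()    # may already be stale vs. reality
        last_refresh = now
    # Any customer added since the last batch refresh falls through here:
    event["segment"] = reference.get(event["customer_id"], "UNKNOWN")
    return event

print(enrich({"customer_id": "c3", "event": "login"}))
# -> {'customer_id': 'c3', 'event': 'login', 'segment': 'UNKNOWN'}
```

Until the next refresh lands, every event that references a row added since the last batch falls into the UNKNOWN bucket – exactly the staleness problem de Montcheuil describes.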

InfoWorld continued the fast-data conversation last week with Matt Asay’s look at the streaming future of big data, driven by newer technologies such as Spark, Storm, and Kafka (man, what a literary nightmare those names connote). Though much of his article is a volley between those who believe in batch processing and those who prefer real-time, he rightly notes that there’s room for both. As one of his sources says, “Batch isn’t going anywhere as there will always be a place for large-scale analytics with gobs of data.” The same source, he adds, has “a ton of interest in streaming analytics [but adds that] it’s ‘way too early to say’ how it will all shake out.”
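
To make the streaming side of that volley concrete, here’s a minimal sketch using the kafka-python client. The topic name, broker address, event fields, and anomaly threshold are all illustrative assumptions on my part, not anything from Asay’s article:

```python
import json
from kafka import KafkaConsumer   # pip install kafka-python

# Minimal streaming sketch: consume events as they arrive and keep a
# running count per source. Topic, broker, fields, and threshold are
# hypothetical, chosen purely for illustration.
consumer = KafkaConsumer(
    "security-events",                   # hypothetical topic name
    bootstrap_servers="localhost:9092",  # assumed local broker
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

counts = {}
for message in consumer:                 # blocks, yielding records in real time
    event = message.value
    source = event.get("source_ip", "unknown")
    counts[source] = counts.get(source, 0) + 1
    if counts[source] > 100:             # arbitrary alerting threshold
        print(f"possible anomaly from {source}")
```

A batch job would compute the same counts hours later over a day’s worth of stored events; the streaming version answers while the conveyor belt is still moving, which is the whole appeal – and, as Asay’s sources caution, neither mode replaces the other.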

It may be “too early” for that kind of decision, but it’s never too early to start thinking about data sources you have that currently move way too fast for you to analyze accurately. You may not be able to slow the conveyor belt down, but you can get a whole lot better at wrapping the candy.


[Image credit: Forbes]
