Perhaps
the biggest reason is that Big Data technologies are still way too hard
to use—and sometimes insufficient for the kinds of data enterprises
want to put to work.
But that's changing. Last week I sat down with Justin Langseth, CEO of Zoomdata,
to drill into the future of big data, and to better understand the
intersection between batch-oriented technologies like Hadoop's MapReduce
and Spark, a real-time processing engine. While I published excerpts from that conversation in an earlier ReadWrite piece, it's really worth reading Langseth's observations in full.
Real Time Gets Real
ReadWrite: Hadoop
has been all about batch processing, but the new world of streaming
analytics is all about real time and involves a different stack of
technologies.
Langseth: Yes;
however, I would not entangle the concepts of real-time and streaming.
Real-time data is obviously best handled as a stream. But it’s possible
to stream historical data as well, just as your DVR can stream Gone with the Wind or last week’s American Idol to your TV.
This
distinction is important, as we at Zoomdata believe that analyzing data
as a stream adds huge scalability and flexibility benefits, regardless
of whether the data is real-time or historical.
RW: So what are the components of this new stack? And how is this new big data stack impacting enterprise plans?
JL: The new stack is in some ways an extension of the old stack, and in some ways really new.
Data
has always started its life as a stream. A stream of transactions in a
point of sale system. A stream of stocks being bought and sold. A
stream of agricultural goods being traded for valuable metals in
Mesopotamia.
Traditional ETL processes
would batch that data up and kill its stream nature. They did so
because the data could not be transported as a stream; it needed to be
loaded onto removable disks and tapes to be transported from place to
place.
But now it is possible to take streams
from their sources, through any enrichment or transformation processes,
through analytical systems, and into the data’s “final resting
place”—all as a stream. There is no real need to batch up data given
today’s modern architectures such as Kafka and Kinesis, modern data
stores such as MongoDB, Cassandra, HBase, and DynamoDB (which can accept
and store data as a stream), and modern business intelligence tools
like the ones we make at Zoomdata that are able to process and visualize
these streams as well as historical data, in a very seamless way.
Just
like your home DVR can play live TV, rewind a few minutes or hours, or
play movies from last century, the same is possible with data analysis
tools like Zoomdata that treat time as a fluid.
Throw That Batch In The Stream
Also,
we believe that those who have proposed a “Lambda Architecture,”
effectively separating paths for real-time and batched data, are
espousing an unnecessary trade-off, optimized for legacy tooling that
simply wasn’t engineered to handle streams of data, be they historical or
real-time.
At Zoomdata we believe that it is
not necessary to separate-track real-time and historical, as there is
now end-to-end tooling that can handle both from sourcing, to transport,
to storage, to analysis and visualization.
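As a rough illustration of the single-path idea Langseth describes, here is a minimal Python sketch in which one processing function consumes historical and live records through the same iterator interface. It is not Zoomdata's implementation; the names `windowed_totals` and `replay`, the `(timestamp, key, value)` record shape, and the sample data are all assumptions made up for this example.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (timestamp_in_seconds, key, value).
Event = Tuple[int, str, float]


def windowed_totals(
    events: Iterable[Event], window_seconds: int = 60
) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Yield (window_start, totals_by_key) each time a time window closes.

    Assumes events arrive in timestamp order. The function does not know
    or care whether the events are replayed from storage or arriving live.
    """
    current_window = None
    totals: Dict[str, float] = defaultdict(float)
    for timestamp, key, value in events:
        window = timestamp - timestamp % window_seconds
        if current_window is not None and window != current_window:
            yield current_window, dict(totals)
            totals = defaultdict(float)
        current_window = window
        totals[key] += value
    if current_window is not None:
        # Flush the final, still-open window when the stream ends.
        yield current_window, dict(totals)


def replay(stored: List[Event]) -> Iterator[Event]:
    """Stream stored (historical) records in timestamp order."""
    yield from sorted(stored)


# Made-up sample data: stored history plus a stand-in for a live feed.
historical = [(0, "store-a", 10.0), (30, "store-b", 5.0), (65, "store-a", 2.5)]
live = iter([(70, "store-b", 1.0), (125, "store-a", 4.0)])

# "Rewind" through history, then continue into live data, using one code path.
results = list(windowed_totals(chain(replay(historical), live)))
```

In a real deployment the `live` iterator would be replaced by a consumer reading from a transport such as Kafka or Kinesis; the point of the sketch is only that the aggregation logic stays the same either way.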
RW: So this shift toward streaming data is real, and not hype?
JL:
It's real. It's affecting modern deployments right now, as architects
realize that it isn’t necessary to ever batch up data, at all, if it can
be handled as a stream end-to-end. This massively simplifies Big Data
architectures if you don’t need to worry about batch windows, recovering
from batch process failures, etc.
So again,
even if you don’t need to analyze data from five seconds or even five
minutes ago to make business decisions, it still may be simplest and
easiest to handle the data as a stream. This is a radical departure
from the way things in big data have been done before, as Hadoop
encouraged batch thinking.
But it is much easier
to just handle data as a stream, even if you don’t care at all—or
perhaps not yet—about real-time analysis.
RW: So is streaming analytics what Big Data really means?
JL: Yes.
Data is just like water, or electricity. You can put water in
bottles, or electricity in batteries, and ship them around the world by
planes, trains, and automobiles. For some liquids, such as Dom Perignon,
this makes sense. For other liquids, and for electricity, it makes
sense to deliver them as a stream through wires or pipes. It’s simply
more efficient if you don’t need to worry about batching it up and
dealing with it in batches.
Data is very similar. It’s easier to stream big data end-to-end than it is to bottle it up.
Credit: Readwrite.com