The enterprise is rightly concerned about Big Data, but it isn't so
much the size of the data volumes that should cause consternation as
the speed at which they need to be ingested and analyzed.
With knowledge workers increasingly gravitating toward the on-demand
world of mobile computing, the enterprise needs to replicate that
experience if it hopes to maintain control of its data. And since
current architectures are already straining under the load generated by
today's applications and services, the enterprise will need an entirely
new approach to deliver real-time performance under Big Data loads.
Fortunately, this is not as daunting as it seems, although the
challenges are still formidable. According to Infostructure Associates’
Wayne Kernochan, many of the tools already exist to provide high-speed data services;
all the enterprise needs to do is find an effective way to deploy them.
A key strategy will be to push Flash memory throughout the data
architecture, essentially using it as a tiered extension of main memory.
This, along with the integration of existing databases into the Big Data
architecture, will allow the enterprise to better handle both speed and
volume.
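As a rough illustration of the tiering idea (not Kernochan's design), a read path can check DRAM first, fall back to a flash cache, and only then touch disk, promoting hot data upward as it goes. A minimal Python sketch, with the flash and disk stores left as hypothetical interfaces:

```python
# Illustrative sketch of a tiered read path: DRAM -> flash -> disk.
# The flash_store and disk_store objects are hypothetical; any
# key-value store exposing get() and put() would do.

class TieredStore:
    def __init__(self, dram_capacity, flash_store, disk_store):
        self.dram = {}                  # hot tier: plain dict standing in for DRAM
        self.dram_capacity = dram_capacity
        self.flash = flash_store        # warm tier: e.g., an SSD-backed store
        self.disk = disk_store          # cold tier: the system of record

    def get(self, key):
        if key in self.dram:            # fastest path: main memory
            return self.dram[key]
        value = self.flash.get(key)     # next: flash, used as an extension of RAM
        if value is None:
            value = self.disk.get(key)  # slowest path: disk
            self.flash.put(key, value)  # promote into flash on the way back up
        self._promote(key, value)
        return value

    def _promote(self, key, value):
        if len(self.dram) >= self.dram_capacity:
            # naive FIFO eviction; a real cache would use LRU or similar
            self.dram.pop(next(iter(self.dram)))
        self.dram[key] = value
```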
You also won’t have much success with Big Data without Hadoop,
says Nathan Nickels of MetaScale. The platform that essentially kicked
off the Big Data revolution is still the best solution when it comes to
accessing, processing and analyzing large volumes quickly. Hadoop
should, in fact, become the basis of a new centralized data hub, or data
lake, that enables interactive, sub-second reporting and other
capabilities. This, along with the scale-out nature of NoSQL, should
provide the rapid-fire reporting and query services that organizations
will need to compete in a data-driven economy.
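To make the data-lake idea concrete, here is a hedged sketch of the kind of query such a centralized hub would serve: a PySpark job reading Parquet files from HDFS and running an interactive aggregation. The path and column names are made up for illustration, and the example assumes a working Spark installation:

```python
# Hypothetical example: querying a Hadoop-backed data lake with Spark SQL.
# The HDFS path and column names are illustrative, not from the article.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-lake-report").getOrCreate()

# Read raw events landed in the lake as Parquet on HDFS.
events = spark.read.parquet("hdfs:///datalake/events/")

# The kind of interactive aggregation a centralized hub should serve quickly.
daily_counts = (
    events
    .groupBy("event_date", "business_unit")
    .count()
    .orderBy("event_date")
)
daily_counts.show()
spark.stop()
```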
Even a centralized data hub won’t be enough to place the entire
enterprise environment on a real-time footing, however. For that, we
need to integrate the data center under a single operating system like Mesosphere's,
says eWeek’s Chris Preimesberger. The startup is already out with its
Infinity release, which it bills as a real-time, enterprise-ready, open
source data engine designed for heavy data loads and multiple levels of
streaming traffic. The platform is built on the Apache Mesos abstraction layer
and offers compatibility with many of the leading Big Data tools within
the Apache ecosystem, including Spark, Kafka, Cassandra and Akka. And
Intel has already validated the entire stack for deployment on its
hardware portfolio.
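For a flavor of how those pieces compose, consider a hedged sketch of a Spark Structured Streaming job that consumes a Kafka topic and maintains a running count. The broker address and topic name are hypothetical, and the job itself is scheduler-agnostic: it would run on Mesos as readily as anywhere else.

```python
# Illustrative Spark Structured Streaming job consuming a Kafka topic.
# Broker address and topic name are hypothetical; requires the
# spark-sql-kafka connector package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("stream-report").getOrCreate()

# Subscribe to a stream of events from Kafka.
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Count events per one-minute window, keyed on the Kafka message key.
counts = (
    stream
    .withColumn("key", col("key").cast("string"))
    .groupBy(window(col("timestamp"), "1 minute"), col("key"))
    .count()
)

# Emit updated counts to the console as they arrive.
query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```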
Some might wonder whether such a broad-based transformation is really
necessary, and whether Big Data should simply reside on its own
architecture. But as IDC pointed out in a recent white paper, a
fractured data environment is a fractured enterprise, one that
ultimately inhibits the ability to leverage the full value of data
assets and to assess the costs and risks of data-driven ventures. Even
in today's organizations, the vast majority of data management is
handled at the departmental level, which leads to disconnects and
miscommunication between business units. Factor in not only the growing
data load but also the critical role that data and analysis play in
successful outcomes, and it is clear that a unified data environment
will be not merely desirable going forward, but vital.
All of this points to the fact that while Big Data should not be
feared, neither should it be ignored. The coming changes are significant
but not insurmountable, and few organizations have the knowledge or the
budget to make the transition overnight – although start-ups that
leverage Big Data infrastructure from the start have a way of becoming
behemoths in relatively short order (cough, Uber).
The good news is that the technology for real-time processing and
analysis is already available and will be continually refined as the
market expands. All that is needed is a plan to put it into action.
Have your say in the comments section below.