The enterprise is rightly concerned about Big Data, but it isn't so
much the size of the data volumes that should cause consternation as
the speed at which they need to be ingested and analyzed.
With knowledge workers increasingly gravitating toward the on-demand
world of mobile computing, the enterprise needs to replicate that
experience if it hopes to maintain control of its data. And since
current architectures are already straining under the load generated by
today's applications and services, the enterprise will need an entirely
new approach to deliver real-time performance under Big Data loads.
Fortunately, this is not as daunting as it seems, although the
challenges are still formidable. According to Infostructure Associates’
Wayne Kernochan, many of the tools already exist to provide high-speed data services;
all the enterprise needs to do is find an effective way to deploy them.
A key strategy will be to push Flash memory throughout the data
architecture, essentially using it as a tiered extension of main memory.
This, along with the integration of existing databases into the Big Data
architecture, will allow the enterprise to better handle both speed and
volume.
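As a rough illustration of the tiering idea (not Kernochan's design), a read path can check DRAM first, fall back to a flash cache, and only then touch disk, promoting hot data upward as it goes. A minimal Python sketch, with the flash and disk stores left as hypothetical interfaces:

```python
# Illustrative sketch of a tiered read path: DRAM -> flash -> disk.
# The flash_store and disk_store objects are hypothetical; any
# key-value store exposing get() and put() would do.

class TieredStore:
    def __init__(self, dram_capacity, flash_store, disk_store):
        self.dram = {}                  # hot tier: plain dict standing in for DRAM
        self.dram_capacity = dram_capacity
        self.flash = flash_store        # warm tier: e.g., an SSD-backed store
        self.disk = disk_store          # cold tier: the system of record

    def get(self, key):
        if key in self.dram:            # fastest path: main memory
            return self.dram[key]
        value = self.flash.get(key)     # next: flash, used as an extension of RAM
        if value is None:
            value = self.disk.get(key)  # slowest path: disk
            self.flash.put(key, value)  # promote into flash on the way back up
        self._promote(key, value)
        return value

    def _promote(self, key, value):
        if len(self.dram) >= self.dram_capacity:
            # naive FIFO eviction; a real cache would use LRU or similar
            self.dram.pop(next(iter(self.dram)))
        self.dram[key] = value
```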
You also won’t have much success with Big Data without Hadoop,
says Nathan Nickels of MetaScale. The platform that essentially kicked
off the Big Data revolution is still the best solution when it comes to
accessing, processing and analyzing large volumes quickly. Hadoop
should, in fact, become the basis of a new centralized data hub, or data
lake, that enables interactive, sub-second reporting and other
capabilities. This, along with the scale-out nature of NoSQL, should
provide the rapid-fire reporting and query services that organizations
will need to compete in a data-driven economy.
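To make the data-lake idea concrete, here is a hedged sketch of the kind of query such a centralized hub would serve: a PySpark job reading Parquet files from HDFS and running an interactive aggregation. The path and column names are made up for illustration, and the example assumes a working Spark installation:

```python
# Hypothetical example: querying a Hadoop-backed data lake with Spark SQL.
# The HDFS path and column names are illustrative, not from the article.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-lake-report").getOrCreate()

# Read raw events landed in the lake as Parquet on HDFS.
events = spark.read.parquet("hdfs:///datalake/events/")

# The kind of interactive aggregation a centralized hub should serve quickly.
daily_counts = (
    events
    .groupBy("event_date", "business_unit")
    .count()
    .orderBy("event_date")
)
daily_counts.show()
spark.stop()
```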
Even a centralized data hub won’t be enough to place the entire
enterprise environment on a real-time footing, however. For that, we
need to integrate the data center under a single operating system like Mesosphere's,
says eWeek’s Chris Preimesberger. The startup is already out with its
Infinity release, which it bills as a real-time, enterprise-ready, open
source data engine designed for heavy data loads and multiple levels of
streaming traffic. The platform is built on the Apache Mesos abstraction layer
and offers compatibility with many of the leading Big Data tools within
the Apache ecosystem, including Spark, Kafka, Cassandra and Akka. And
Intel has already validated the entire stack for deployment on its
hardware portfolio.
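For a flavor of how those pieces compose, consider a hedged sketch of a Spark Structured Streaming job that consumes a Kafka topic and maintains a running count. The broker address and topic name are hypothetical, and the job itself is scheduler-agnostic: it would run on Mesos as readily as anywhere else.

```python
# Illustrative Spark Structured Streaming job consuming a Kafka topic.
# Broker address and topic name are hypothetical; requires the
# spark-sql-kafka connector package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("stream-report").getOrCreate()

# Subscribe to a stream of events from Kafka.
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Count events per one-minute window, keyed on the Kafka message key.
counts = (
    stream
    .withColumn("key", col("key").cast("string"))
    .groupBy(window(col("timestamp"), "1 minute"), col("key"))
    .count()
)

# Emit updated counts to the console as they arrive.
query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```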
Some might wonder whether such a broad-based transformation is really
necessary, and whether Big Data should simply reside on its own
architecture. But as IDC pointed out in a recent white paper, a
fractured data environment is a fractured enterprise, one that
ultimately inhibits the ability to leverage the full value of data
assets and to assess the costs and risks of data-driven ventures. Even
in today's organizations, the vast majority of data management is
handled at the departmental level, which leads to disconnects and
miscommunication between business units. Factor in not only the growing
data load but also the critical role that data and analysis play in
successful outcomes, and it is clear that a unified data environment
will be not merely desirable going forward, but vital.
All of this points to the fact that while Big Data should not be
feared, neither should it be ignored. The coming changes are significant
but not insurmountable, and few organizations have the knowledge or the
budget to make the transition overnight – although start-ups that
leverage Big Data infrastructure from the start have a way of becoming
behemoths in relatively short order (cough, Uber).
The good news is that the technology for real-time processing and
analysis is already available and will be continually refined as the
market expands. All that is needed is a plan to put it into action.
Have your say in the comments section below.