Spark began life as a project at UC Berkeley in California, quickly delivering in-memory performance as much as 100 times that of the MapReduce framework that originally underpinned Apache Hadoop. Hadoop has moved on since then, adopting other (faster and more flexible) ways of working. Spark has also progressed, promoting increasingly capable disk-based performance to complement its in-memory strengths, and establishing itself as a strong contender, particularly for machine learning tasks.
Spark moved to the Apache Software Foundation in 2013, becoming a top-level project in 2014. In 2013, members of the original Berkeley team established the company now known as Databricks to build a business around Spark. The company launched with almost $14 million from Andreessen Horowitz and others, and secured a further $33 million a year ago.
And yet Spark is not without competitors of its own. Flink, which is also a top-level project of the Apache Software Foundation, has recently begun to attract many of the same admiring comments directed Spark’s way 12-18 months ago. Despite sound technical credentials, ongoing development, big investments, and today’s high-profile endorsement from IBM, it would be unwise (and implausible) to crown Spark as the winner just yet.
IBM announced a number of initiatives today, aligning with what the company PR machine calls ‘potentially the most significant open source project of the next decade.’ These include:
- deepening the integration between Apache Spark and existing IBM products like the Watson Health Cloud;
- open sourcing IBM’s existing SystemML machine learning technology;
- tasking 3,500 IBM engineers to work on Spark-related projects, including those at a new Spark Technology Center in San Francisco;
- offering Spark as a Service, hosted on IBM Bluemix;
- partnering with AMPLab and others to ‘educate more than 1 million data scientists and data engineers on Spark.’
In the enterprise market, where IBM remains a powerful force, Spark is almost unheard of. As Gartner’s Nick Heudecker told VentureBeat, ‘In the enterprise, I’m seeing almost no Spark adoption.’
There, Flink is also effectively invisible. Hadoop has much of the mindshare, whether it’s the right tool for the job or not. Startups like Cloudera, Hortonworks and MapR make money supporting those enterprise adoptions, as do the big data operations of established vendors like HP, EMC and IBM.
IBM’s very public backing for Spark will open enterprise doors. And, if startups like Databricks are smart, it will open doors for them almost as much as it does for IBM.
Andreessen Horowitz’s millions got Silicon Valley’s cool kids to sit up and pay attention. IBM’s posturing might do just the same in the uncool boardrooms of the Fortune 1000.
(Forbes)