-->

In the world of big data, Spark lights up new hope

If data is the new oil, a new software framework called Spark promises a revolutionary way of refining that commodity into fuel faster than ever before. The growing influence of Spark came into sharp focus earlier this week when IBM threw its weight behind this open-source data analytics framework.

So what is Spark and what is in it for Indian startups?

The technology was born out of a project at the University of California, Berkeley, and is particularly useful for machine learning, where algorithms continuously learn from, and make predictions on, the same set of data.

"Spark is setting the big data ecosystem on hyperdrive," said Sundara Raghavan Sankaran, a programmer at Chennai-based data analytics firm Crayon Data, whose product is used by companies in sectors such as banking, retail, hospitality and telecom.

While Crayon is considering integration with Spark, mobile ad network InMobi started using Spark last year and data analytics provider Indix has been doing so for 18 months.

Technologies such as Spark are vital because they can be the key to competitive advantage for companies. They can even help transform entire industries because of new insights gained by processing vast amounts of data. IDC predicts that the data analytics industry will be worth $125 billion (Rs 8 lakh crore) in 2015.

Before Spark, all the buzz had been around Hadoop, another open-source framework that processes vast amounts of data using cheap off-the-shelf hardware. Spark processes data much like Hadoop does, but about 100 times faster. The main reason is MapReduce, an important component that underpins Hadoop and makes data processing relatively cumbersome compared with Spark.

"Spark is beautiful. With Hadoop, it would take us six-seven months to develop a machine learning model. Now, we can do about four models a day," said Rajiv Bhat, senior vice president of data sciences and marketplace at InMobi. InMobi, for its part, has more than half of its data science team working on Spark.

Earlier this week, IBM threw its weight behind the open source project, with plans to embed it into its analytics and commerce platforms. The company said it will commit more than 3,500 researchers and developers to work on Spark-related projects, and educate more than 1 million data scientists and data engineers through online courses.

"The ecosystem wasn't very mature when we first started evaluating Spark. Things have improved a lot since then with major Hadoop service providers like Cloudera and Hortonworks supporting Spark too," said Rajesh Muppalla, cofounder and director of engineering at Indix. Indix is creating the Google for product catalogues. It has so far catalogued 700 million consumer products globally.

"I think more and more people will be using Spark going forward. The landscape is definitely going to shift," said Muppalla.

As machine learning becomes the order of the day, the demand for Spark has surged.

Sigmoid, a two-year-old Bangalore-based startup whose full stack of analytics is based on Spark, counts Bangalore-based Capillary Technologies as its customer. Another firm, Qubole, has companies such as cab-hailing app Ola on its roster.

"We have quite a few customers on Spark and are really seeing great rates of adoption. We see actually now a cross-section of customers, about 50-60 of them, trying out both of these technologies (Spark and Hadoop)," said Joydeep Sen Sarma, head of Qubole's India operations.

But will Spark mean the end of Hadoop?

Unlikely, say experts. Hadoop's strength is in storage, while Spark's lies in processing, and the two will likely continue to coexist until a better framework comes along.

"Even if Spark is a big winner, unless there is new distributed file system, we will be using Hadoop alongside Spark for a full big data package," said Sankaran of Crayon Data.

(Economic Times)
