
Big Data Problems Solved Fast On An Open Source Platform - Tom Groenfeldt

“If you have a big problem like fraud you don’t want to wait three years until you can act on your insights,” said Ingo Mierswa, founder and CTO of RapidMiner, a predictive analytics firm that ranks in Gartner’s Magic Quadrant for Advanced Analytics Platforms. RapidMiner grew out of research at the Artificial Intelligence Unit of the Technical University of Dortmund in Germany, became a company in 2006 and set up its headquarters in Boston in 2013. It raised a $15 million Series B round of funding in February.

The platform is designed to be easy to use and understand for business decision makers who don’t necessarily have advanced degrees in mathematics or physics. Because of its open source heritage, RapidMiner has a community of 250,000 users and about 600 academics, said Mierswa, so users facing a problem can often learn how others solved it.
 
“It’s like having a small data scientist sitting on your shoulder; you can tap a community of experts,”  he said.
Time to implementation is one of the company’s bragging points.

“We had Hitachi, a consultancy working with a major bank in Japan, come for a week or 10 days of training in RapidMiner,” Mierswa recounted. “We trained them in predictive analytics solutions — the training time is nothing compared to R (a programming language for statistical analysis often used with big data). The Hitachi people went to the Tokyo bank and created a few fraud detection systems based on RapidMiner. It took them three weeks. The old system took three years. We see that constantly, a speedup of 10x to 60x from detection of needs to full implementation of the models.”

A data scientist with a Ph.D. in mathematics, Mierswa describes standard analytics as akin to driving by looking in the rearview mirror. “If you figure out from business intelligence that you lost 20,000 customers, it is too late. It’s important to look beyond the next curve and see what will happen. With predictive analytics, you can figure out which customers you are in danger of losing and when they will shift. Then you can be proactive. It’s a paradigm shift; if you make the predictive analytics work, you can automate and change your business processes.”

Banks are using RapidMiner for churn prediction and fraud prevention, he said.

“Fraud detection is an interesting topic because typically you stop transactions while they are executing. That’s not really predictive because it occurs only after the transaction has started and the only thing you can do is interject and say that it looks suspicious and stop it.”

A better practice is to predict fraud earlier so you can be more proactive, he explained.

“You can flag this and say let’s stop all transactions on this card because it’s likely a fraudulent card. However, if you overdo this it can be very annoying to your customers, and you certainly don’t want to stop transactions which are valid.”

Here’s where machine learning can help, he added. Traditional fraud systems are rule-based and not very flexible.

“You need adaptive machine learning systems to change the alerts.”

Predictive analytics requires the skills of data scientists, but not necessarily Ph.D.-holding data scientists themselves, who are in short supply. The U.S. produces only 3,000 data scientists a year, so the gap between demand and supply will never be closed, he added.

“With the right platform you can empower the business analysts to do the work of data scientists without coding, in a visual way that everyone can understand.”

That’s critical, Mierswa said. C-level executives won’t trust a system of analysis that they can’t understand, and if it can’t be explained to them it probably will never get into production.

“When a 28-year-old analyst tells the 56-year-old executive he is wrong, the analyst may be right, but now we are back to issues of collaboration and communication. If the analyst cannot communicate the results and explain why something should be done, you are in trouble. For collaboration you need a common platform. What’s the point if you can’t talk about the results? You need a common language and R can never be a common language.”

Business executives still work with Microsoft Excel because they can work with their data and see the results, he added. RapidMiner lets users see the background model so they can check what happened with the data they are working on and see a visual representation of the workflow. That adds to trust. The RapidMiner workflows support integration with a firm’s IT infrastructure to provide a fully automated environment.
