

Monday, 22 June 2015

Flash! Ways To Make Big Data Run Faster - Howard Baldwin

After all these weeks of writing about software, I thought I’d wrap up this column with an offbeat target: hardware.

One of the things that fascinates me about technology is how it works best when multiple capabilities come together to create a whole that’s greater than the sum of its parts. Anybody working with big data knows this – you’re taking data from structured databases and aggregating it with data from unstructured data sources.

Take mobility. Smartphones are fantastic, but they wouldn’t be without commensurate advances in application and user interface development, networks, batteries, and processors. What’s the parallel for big data? The hardware it runs on and the performance it delivers.

I’ve been intermittently covering flash (aka solid-state) technology since the 20th century, when it got a bad rap for being too expensive and having too short a life span. It’s come a long way since then, and I’ve been noticing a lot of buzz about how big data and flash can themselves create a whole that’s greater than the sum of their parts.

This headline in GCN not too long ago says it all: “Big data must haves: Capacity, compute, collaboration.” As writer Mark Pomerleau noted, “While big data researchers are pushing the boundaries of science, the less glamorous side of big data research – the network, computing and cloud architecture required to support their work – must be at the forefront of their minds.”

Alex Woodie made the connection between flash and big data even stronger when he wrote in Datanami, a newsletter focusing on big data, a couple of weeks ago: “The amount of data we’re generating doubles every two years, which is creating crushing pressures on storage professionals to keep it all manageable and affordable. The combination of SSAs [solid state arrays] at the hardware level and SDS [software-defined storage] at the software level will be instrumental in helping data professionals to avoid being caught in the undertow of big data.”

Check out what Erik Vynckier, chief investment officer at asset management firm AllianceBernstein, said at a recent conference, as noted in a Computing article last month about the quickly dropping cost of flash: “We find we have a lot of big data sets and we’re looking at how to make sense of that big data … What we’re trying to do now is harvest the microeconomic data from the internet, from live data. … The issue isn’t only about HPC, but how to handle large datasets … Flash comes up in this context. It could help to provide for better investment strategies. Big data can bring us to that level with help of flash.” And that’s a business guy, not a techie.

And Information Management noted – in the introduction to a slide show of in-memory databases – “Amid the big data boom, the in-memory database market will enjoy a 43 percent compound annual growth rate (CAGR) – leaping from $2.21 billion in 2013 to $13.23 billion in 2018, predicts Markets and Markets, a global research firm.” The impetus? As research firm Gartner has noted, “in-memory databases allow real-time analytics and situation awareness on ‘live’ transaction data – rather than after-the-fact analysis on ‘stale data.’”
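
For readers who want to check the arithmetic behind that forecast, here is a minimal Python sketch. It assumes the standard compound-growth formula and a five-year span from 2013 to 2018; the helper name `cagr` is my own illustration, not anything from the Markets and Markets report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.43 means 43 percent)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above, in billions of dollars: 2.21 (2013) and 13.23 (2018).
rate = cagr(2.21, 13.23, 2018 - 2013)
print(f"Implied annual growth: {rate:.1%}")
```

Under those assumptions the implied rate comes out at roughly 43 percent, which matches the quoted figure.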

That’s a scenario that has big data written all over it. Why? Because a lot of big data relies on fast answers – think about near-real-time security applications looking to catch malware before it causes too much damage – and the speed of those answers increases thanks to the performance that flash storage brings.

Flash isn’t for everything, and it probably never will be, but for certain mission-critical applications – such as big data analysis – it’s a match made in technology heaven.

(Forbes)
