Nate Silver says more data often means more problems
Big data can be problematic.
Think about it this way: A small amount of data can be easy to manage,
and straightforward analysis can be gleaned from it. After all, there’s
only so much data to consider.
But when data gets big, big problems can arise. That’s the
message from Nate Silver, who works with data a lot. Silver is the
founder of data-driven journalism site FiveThirtyEight (now owned by
ESPN); he spoke at the HP Big Data Conference in Boston recently,
outlining some of the problems that can come along with big data.
Silver says that
even small and medium amounts of data can be difficult to manage, both
in terms of how to store it and in terms of how to analyze it.
So the more data companies have, the more complex the problems of
managing it can become. Do you buy hardware? Do you store it in the
cloud? How often will you need to access it? Can you deal with latency?
These are the types of questions that will help you decide how to manage
your data.
One issue with a lot of data is that it can create
bias. Let’s say you have two polls. It can be pretty easy to decipher
what those polls are saying. Now if you’re analyzing 100 surveys, there
can be much more nuanced issues within that data. Ever heard the
saying that you can make statistics say whatever you want? Well, the more
data you have, the more wiggle room there can be to sway the stats.
Silver referenced a book by Daniel Kahneman titled
“Thinking, Fast and Slow,” the point of which is that sometimes people
rush decisions based on a subset of data (thinking fast). A better
practice is to “think slow” and really reason through the data. With big
data, thinking fast (not analyzing the data fully) can lead to false
positives.
Silver calls this extracting the “signal from the
noise.” Said another way, it’s the problem of finding the needle in the
haystack. Sometimes, the more data you have, the harder it can be to find
true value in it.
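To make the false-positive risk concrete, here is a minimal sketch in Python. It is not from Silver's talk. It assumes each comparison is independent and uses a conventional 5% chance threshold, and the function name is made up for illustration. It estimates how likely it is that at least one comparison looks meaningful purely by chance, for 2 comparisons versus 100.

```python
def chance_of_false_positive(num_comparisons, alpha=0.05):
    """Probability that at least one of `num_comparisons` independent
    comparisons crosses the `alpha` threshold by chance alone.

    Assumptions (not from the article): comparisons are independent,
    and alpha=0.05 is the cutoff for calling a result "significant".
    """
    return 1 - (1 - alpha) ** num_comparisons


# Compare a handful of comparisons with a big pile of them.
for n in (2, 100):
    print(f"{n} comparisons: {chance_of_false_positive(n):.1%}")
```

Under these assumptions, two comparisons carry roughly a 10% chance of a spurious result, while 100 comparisons make at least one spurious result nearly certain. That is one way more data leaves more room for noise to pass as signal.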
- That’s not what I was looking for…
Imagine Google Maps giving you directions and
suggesting an alternate, “faster” route. You take it only to find that
it’s a dirt road under construction. Sometimes big data systems think
they have found a shortcut, but in reality, it’s not exactly what the
user was looking for.