
Big Data is dead. Data is “Just Data,” regardless of quantity, structure, or speed - Alexander Thamm

Innovation cycles aren’t the only things getting shorter and shorter. The hype around certain terms also comes and goes more and more quickly, especially in the environment of new technologies and digital business models. The IT consulting and market research firm Gartner removed the term “Big Data” from its widely followed Hype Cycle back in 2015. A look at Google Trends shows how precisely that call matched the end of the hype around Big Data. Here, for instance, is a direct comparison of the search terms “Machine Learning” and “Big Data”:


Just as the catchphrase “Big Data” has finally made it into the consciousness of many decision makers and boardrooms, it has to be stated clearly: Big Data is “dead.” Like Gartner, we at Alexander Thamm GmbH have found that what matters in data science projects is something else entirely: for us, Big Data, Small Data, Little Data, Fast Data, and Smart Data are all “Just Data.” The critical success factors for the use of data do not depend on its quantity, structure, or speed; what counts is using data to create real added value!

Successful data science projects without any Big Data

In our everyday practice we see that data science projects can be successful without any Big Data. When a premium automotive manufacturer came to us with the task of increasing the repurchase rate in its leasing business, we faced the challenge of predicting the time of repurchase. The problem until then had been that the retailer often approached customers at the wrong time.

To increase the accuracy of the forecast, we did not simply increase the volume of data. In fact, during the analysis we noticed that the existing data pool itself was responsible for the inaccurate predictions. Our model, based on diagnostic and vehicle data, not only led the manufacturer to correct 25 percent of implausible entries and to approach customers at the right time; it also made it possible to identify unreliable retailers and to sustainably improve their processes using the best practices of the top retailers. This case shows that forecast quality does not depend on the volume of the data. Just Data means, above all, that the right data needs to be incorporated into the analysis.
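To make the idea of “the right data” concrete, here is a minimal, purely illustrative Python sketch: flag implausible diagnostic entries before fitting a simple model that predicts the time until repurchase. The file name, column names, plausibility thresholds, and model choice are assumptions for the example, not the setup actually used in the project.

# Hypothetical sketch: basic plausibility checks before modelling repurchase timing.
# File name, column names, and thresholds are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("leasing_vehicles.csv")  # assumed input with diagnostic and contract data

# Flag implausible entries instead of simply adding more data.
implausible = (
    (df["mileage_km"] < 0)
    | (df["mileage_km"] > 500_000)
    | (df["contract_months"] <= 0)
)
clean = df.loc[~implausible]

features = ["mileage_km", "contract_months", "vehicle_age_months", "service_visits"]
X_train, X_test, y_train, y_test = train_test_split(
    clean[features], clean["months_to_repurchase"], test_size=0.2, random_state=42
)

model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)
print(f"Implausible rows flagged: {implausible.mean():.1%}")
print(f"R^2 on held-out data: {model.score(X_test, y_test):.2f}")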

Just Data mindset facilitates focus on the relevant data

Another case involved improving the accuracy of an existing forecast model for a customer from the energy sector. Energy producers need to know very precisely how high the current load is in order to match the power they feed in to demand as exactly as possible. Feeding in too little or too much power can result in fines for the supplier, so these penalties need to be kept as low as possible.

Our solution used a deep learning algorithm to improve the forecast model. In the previous model, temperature was the only weather parameter taken into account. We expanded the weather data with additional parameters such as humidity, air pressure, and solar intensity. In this way we achieved significant improvements in the forecast and a high degree of automation. Had we instead expanded the data sets for the current load and used minute-by-minute readings recorded over the past 30 years, the model would have taken far too long to compute and the quality of the forecast would have improved only marginally.
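As a rough illustration of this approach, the sketch below trains a small feed-forward network on an expanded weather feature set. It is a simplified stand-in for the deep learning model described above; the file names, feature layout, and network size are assumptions made for the example.

# Simplified stand-in for the load-forecast model: a small feed-forward network
# that consumes several weather parameters instead of temperature alone.
# File names, feature layout, and network size are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed inputs: one row per time step with the expanded weather parameters
# (temperature, humidity, air pressure, solar intensity); target is grid load in MW.
X = np.load("weather_features.npy")
y = np.load("grid_load_mw.npy")

# Keep the split time-ordered: train on older data, test on the most recent 20 percent.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = make_pipeline(
    StandardScaler(),  # the weather parameters live on very different scales
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0),
)
model.fit(X_train, y_train)
print(f"R^2 on the most recent 20% of the data: {model.score(X_test, y_test):.2f}")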

The following graphic shows, as an example, how the accuracy of a model increases only minimally beyond a certain point as more data is added, while disproportionately high costs for the corresponding computing capacity accrue to process these larger data volumes. In many cases it is therefore not worthwhile to chase higher model accuracy by expanding the existing data sets.
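The same diminishing-returns pattern can be reproduced with a simple learning-curve experiment. The sketch below uses scikit-learn on synthetic data, so the exact numbers are purely illustrative; the point is only that the test score flattens out while the training cost keeps rising.

# Illustrative learning curve on synthetic data: accuracy gains flatten out
# as the training set grows, while computation keeps getting more expensive.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

X, y = make_regression(n_samples=10_000, n_features=20, noise=10.0, random_state=0)

train_sizes, _, test_scores = learning_curve(
    RandomForestRegressor(n_estimators=50, random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.05, 1.0, 8),
    cv=3,
    scoring="r2",
)

for n, score in zip(train_sizes, test_scores.mean(axis=1)):
    print(f"{n:>6} samples -> mean R^2 = {score:.3f}")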

On the origin and the meaningfulness of the term “Big Data”

The term “Big Data” emerged at a time when it was becoming more and more difficult to process the exponentially growing volume of data with the hardware then available. From the beginning, the Big Data phenomenon comprised more than just the volume of data; it designated an entire ecosystem. This is why talk of the “Vs” of Big Data became established. Over time the concept was refined further and further. Initially, the Big Data ecosystem was described with three Vs: Volume, Variety, and Velocity. This concept was very quickly expanded, so that there were soon four Vs, then five, then seven, nine, and finally ten.

At this point the question has to be asked whether the term “Big Data” still actually makes sense, or whether the concept has long since become completely watered down and indistinct. The variants Small Data, Little Data, and Smart Data are only rescue attempts for a concept that is no longer really needed today. Now is the time to fundamentally reconsider the term “Big Data” and its variants and, because their definitions have become inconsistent, unclear, and unnecessary, to throw them overboard. This raises the crucial question of what the essential core of Big Data is, or was, and which part of it is really relevant.

What actually is Big Data, at its core?

As already mentioned, Big Data was never really about the largest possible quantities of data. It was about selecting the data relevant to the respective use case, cleaning it up, and evaluating it with appropriate methods. Admittedly, the resulting data volumes are often large, but that is not automatically the deciding factor for successful data science projects. In many cases, companies only have access to such large data volumes in the first place because they gather data at any cost. Their hope is to derive strategic advantages from seemingly unrelated masses of data, much like Google, Amazon, Facebook & Co., or even the NSA. The result is gigantic data lakes in which companies collect all possible structured and unstructured data.

However, concentrating on the quantity of data often obscures the real essence of Big Data projects: the analytical handling of data, namely “Just Data.” Those who dedicate themselves to this task, broken down to its essentials, will very quickly realize that the factors critical to the success of such projects aren’t exclusively technological. To transform data into valuable information, companies also need a corresponding mindset, one that concerns the entire corporate culture.

Just data: Independent of the quantity, structure, and speed

Regardless of its quantity, structure, and speed, data is simply “Just Data.” Much more important than the idiosyncrasies of the data itself is properly defining the business case, embedding analysis projects in the organization’s environment, and selecting the right analytical method. This is why we’ve developed the Data Compass for carrying out data science projects.

The success of data projects frequently depends on factors that aren’t technical in nature. Companies need a certain learning culture in order to better understand complex contexts through open and inclusive (learning) processes. It sounds paradoxical, but Big Data may be dead, and precisely that fact represents a major opportunity for data science projects. When we shift our focus away from the buzz phrase “Big Data,” we arrive at the really crucial question: How can companies and organizations create added value from data?
