1: Big data projects are iterative in nature and require agile and prototypical approaches
Not only do new sources of big data constantly emerge, but the questions that business decision makers want to ask of this data are continuously evolving. This is why sandbox environments that enable data analysts and scientists to quickly query big data and then publish the results are critical to the big data value process.
The data that these queries operate on is not neatly structured into fixed record length systems of record (SOR) where you know that the end product will be an order, a customer, or a part record, so you need an iterative process that can operate in this unpredictable data environment.
2: Consider a productive role for the cloud in your big data strategy
Large enterprises in particular tend to shy away from the use of public clouds because they are nervous about security and governance. However, in many cases public clouds are ideal environments for rapid big data analytics prototyping, as long as you move your prototypes off the public cloud as soon as you are through running them.
Public clouds can also be economical places to stash and to archive your raw big data. Public clouds, and how you choose to use them, should be clearly articulated in your IT policy.
3: Use your SOR data as a matrix for big data
One of the greatest challenges in big data projects is finding ways to organize the data for best results. Many companies have discovered that they already have an organizational framework for their big data in their SOR data. For this reason, they take the data vectors from their SOR data and simply overlay these organizational frameworks on their big data.
Customer data is a prime example. Within the SOR customer master file record you already have the customer's name, address, and possibly other demographics. If you later choose to add web storefront usage patterns and propensities for this customer, you can append the web-based big data to the SOR data for a more complete picture of the customer and how that customer is interacting with your company.
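As a rough illustration of the customer example, the sketch below attaches web storefront activity to SOR customer master records by a shared customer key. The field names (customer_id, page, seconds), the sample records, and the function overlay_web_activity are all hypothetical, invented for this example; a real project would more likely do this join in a database or a data processing framework.

```python
from collections import defaultdict

# Hypothetical SOR customer master records, keyed by customer_id.
sor_customers = {
    "C001": {"name": "Ada Jones", "city": "Austin"},
    "C002": {"name": "Raj Patel", "city": "Denver"},
}

# Hypothetical web storefront events (the "big data" side).
web_events = [
    {"customer_id": "C001", "page": "/shoes", "seconds": 40},
    {"customer_id": "C001", "page": "/checkout", "seconds": 95},
    {"customer_id": "C003", "page": "/home", "seconds": 5},
]


def overlay_web_activity(customers, events):
    """Append the pages each known customer viewed to that customer's SOR record."""
    pages_by_customer = defaultdict(list)
    for event in events:
        pages_by_customer[event["customer_id"]].append(event["page"])

    enriched = {}
    for customer_id, record in customers.items():
        # The SOR record supplies the structure; web data is added as one extra field.
        enriched[customer_id] = {
            **record,
            "pages_viewed": pages_by_customer.get(customer_id, []),
        }
    return enriched


profiles = overlay_web_activity(sor_customers, web_events)
```

Because the SOR records drive the loop, events for an ID that is not in the customer master (C003 here) are simply left out of the result. Whether to discard, quarantine, or investigate such unmatched events is a design decision this sketch does not address.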
4: Prune your big data as soon as possible
There is a tendency for companies to maintain all of their incoming big data in raw form, even though much of it may never be used. The concern is that future queries might require big data that is not being used today, so IT plays it safe by keeping all of the data.
However, there is an equally strong argument for sizing down the amount of big data that you accumulate. Some of this data, such as jitter from network and machine handoffs, is likely never to be used. The same goes for overhead data from website interactions.
Developing criteria and methods for stripping away data that you strategically consider unimportant, for both the short and the long term, is one way to control the data deluge and the cost of storing it all.
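One possible way to express pruning criteria is as an explicit, reviewable rule that is applied before data reaches long-term storage. The sketch below is only an assumption about how that might look: the record types in DROP_TYPES, the sample records, and the functions should_keep and prune are hypothetical, and the actual criteria would have to come from your own business and IT stakeholders.

```python
# Hypothetical record types that the business has decided it will never query.
DROP_TYPES = {"handoff_jitter", "heartbeat", "page_overhead"}


def should_keep(record):
    """Return True if the record passes the pruning criteria."""
    return record.get("type") not in DROP_TYPES


def prune(records):
    """Keep only records worth storing; everything else is discarded."""
    return [record for record in records if should_keep(record)]


# Hypothetical incoming records from machines and the web storefront.
incoming = [
    {"type": "sensor_reading", "value": 71.3},
    {"type": "handoff_jitter", "value": 0.002},
    {"type": "page_view", "page": "/shoes"},
    {"type": "page_overhead", "bytes": 512},
]

kept = prune(incoming)
print(f"kept {len(kept)} of {len(incoming)} records")
```

Keeping the drop list in one named place makes the criteria easy to review and change as the questions asked of the data evolve.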