Small data may be good enough, without the BIG investment
Nowadays there is a very popular trend: big data (see Gartner's three-part definition of big data: http://www.forbes.com/sites/gartnergroup/2013/03/27/gartners-big-data-definition-consists-of-three-parts-not-to-be-confused-with-three-vs/ ). The recommendations and approaches offered by big data proponents involve gathering huge amounts of both structured and unstructured data, and then integrating, connecting and analyzing all of that information. That would seem to fit the future of data collection, given estimates that we will have about 15 PB of data in total by 2015. (Though there are already dissenting opinions: some experts believe that after collecting so much data we will begin to cut back, making big data not so big. But that is definitely a topic for another post.)
Right now social networks like Facebook or Twitter, big retailers, mass marketers, and large service organizations are collecting, collating and coping with huge amounts of data. The biggest of these can afford the people, software and hardware to actually make use of it, distributing tasks across multiple nodes of multiple clusters to analyze their data in portions. Fortunately for the multitude of smaller but still data-rich organizations, there is another approach. This is what we at Coherent Solutions are dubbing small data.
Small data can produce big knowledge
Your organization has probably already employed small data to direct decisions on production, sales, marketing and more. And most of these are good decisions: your sales are increasing, production is improving and the business is growing. Why fix something that isn't broken?
The easy answer to that is: DON'T! What you can do is add to your processes and knowledge by making better use of the data you already have. And you don't have to employ a battalion of IT workers or invest in your own "cloud" to do it. Sophisticated modeling using data sampling will yield statistically significant information that can guide the vast majority of decisions. Research companies have known this for years: the accuracy of a random sample depends on the size of the sample, not on the size of the population it is drawn from. Analyzing a randomly selected terabyte of data out of twenty terabytes will yield results accurate to within 5%. In other words, you don't NEED to know every detail about every customer to guide your decisions.
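To make the sampling claim concrete, here is a minimal Python sketch. The transaction amounts are simulated, not real client data, and the 1-in-20 sample mirrors the one-terabyte-out-of-twenty example above:

```python
import random
import statistics

# Simulate a large "population" of transaction amounts
# (a stand-in for the full, detailed data set).
random.seed(42)
population = [random.lognormvariate(3, 1) for _ in range(1_000_000)]

# Randomly sample a twentieth of it -- the "one terabyte out of twenty".
sample = random.sample(population, k=len(population) // 20)

true_mean = statistics.mean(population)
est_mean = statistics.mean(sample)

print(f"True mean:       {true_mean:.2f}")
print(f"Sample estimate: {est_mean:.2f}")
print(f"Relative error:  {abs(est_mean - true_mean) / true_mean:.2%}")
```

Re-run it with different seeds and the relative error stays well under 5%, even though the sample is only a twentieth of the data.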
At Coherent Solutions, we are working with a number of clients to build their decision-making on small data principles. For instance, we transformed some very detailed source data into a high-level, representative set, which we then loaded into a staging database for further processing into a data warehouse. With this information, our client was able to make quick, highly accurate comparisons between the current and previous years' business, with a small enough margin of error to feel confident about their decisions. By reducing their big data down to manageable numbers and reliable estimates, the company avoided a substantial investment in new hardware, architecture and human resources.
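As an illustration of that staging step, here is a hedged sketch using an in-memory SQLite database. The table and column names (orders, stg_sales_summary, region, amount) are hypothetical stand-ins, not the client's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A toy stand-in for the very detailed source data.
conn.execute("CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("2012-06-03", "East", 120.00), ("2012-06-19", "East", 75.50),
     ("2013-06-07", "East", 140.00), ("2013-06-21", "West", 93.25)],
)

# Collapse detail rows into one summary row per year/month/region --
# the compact, representative set that lands in the staging database.
conn.execute("""
    CREATE TABLE stg_sales_summary AS
    SELECT strftime('%Y', order_date) AS sales_year,
           strftime('%m', order_date) AS sales_month,
           region,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM orders
    GROUP BY sales_year, sales_month, region
""")

# Year-over-year comparisons run against the summary, not the raw detail.
for year, total in conn.execute(
    "SELECT sales_year, SUM(total_amount) FROM stg_sales_summary GROUP BY sales_year"
):
    print(year, total)
```

The warehouse then works from compact rows like these, which is what keeps the hardware and staffing footprint small.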
As a mathematician with a data science perspective, I would love to be part of a team creating, updating and optimizing techniques and modern data models to utilize big data. But as a BI consultant working with companies that have real-world challenges and budgets, I will probably not be doing that anytime soon. Rather, I expect to be helping clients supplement their current technology and processes to improve their data analysis, yielding big results at significantly less cost.