Making sense of it all:
To put all of these sources into some kind of usable, understandable order is difficult, and opinions abound. Leading the charge in servicing this potential new ‘big thing’, however, is Oracle, which IDC believes has the most balanced and most comprehensive offerings in both storage and analytics. While there are clear competitors at every level, it therefore makes sense to use the Oracle definitions at this time.
The McKinsey Global Institute estimates that data volume is growing at 40% per year, and will grow 44-fold between 2009 and 2020. But while it’s often the most visible parameter, volume of data is not the only characteristic that matters.
To clarify matters, the three Vs of Volume, Velocity and Variety are commonly used to characterise different aspects of big data. They’re a helpful prism through which to view and understand the nature of the data, but they are incomplete: from the ECA perspective there are a further two Vs that are just as important, Value and Veracity. Veracity is what we would also term ‘Integrity’, a concept that has been around for a very long time, where Integrity = Accuracy; if data is not accurate, its commercial use is limited.
- Volume. Machine-generated data is produced in much larger quantities than non-traditional data. For instance, a single jet engine management system can generate 10 terabytes (TB) of data in 30 minutes. With more than 25,000 airline flights per day, the daily volume of just this single data source runs into the petabytes (one petabyte being roughly a million gigabytes). Smart meters and heavy industrial equipment such as oil refineries and drilling rigs generate similar data volumes, compounding the problem. (Whilst this is impressive from a technical viewpoint, there would, on the face of it, seem to be little of commercial interest in these vast volumes of data, which are, or should be, subject to tight commercial confidentiality, even IPR.) What is also true, however, is that increasing data volume often beats improving your modelling: given modern analytics and the ease with which analytical algorithms can be generated, 600 data points would produce a better forecast than a dozen and would, for example, predict demand more accurately.
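The claim that 600 data points out-forecast a dozen can be sketched in a few lines. This is an illustrative toy, not anything from the article: the ‘true’ demand figure, the noise level and the trial counts are all invented, and the ‘model’ is nothing cleverer than a sample mean.

```python
# Toy illustration: a larger sample typically beats a small one for the same
# simple model. All numbers here are invented for the sketch.
import random
import statistics

random.seed(42)
TRUE_DEMAND = 100.0  # hypothetical "true" average daily demand


def typical_error(n_points, trials=200):
    """Average absolute error of a plain mean-based forecast over many trials."""
    errors = []
    for _ in range(trials):
        sample = [random.gauss(TRUE_DEMAND, 15.0) for _ in range(n_points)]
        errors.append(abs(statistics.mean(sample) - TRUE_DEMAND))
    return statistics.mean(errors)


err_small = typical_error(12)
err_large = typical_error(600)
print(f"typical forecast error with  12 points: {err_small:.2f}")
print(f"typical forecast error with 600 points: {err_large:.2f}")
```

Run as-is, the 600-point estimate lands far closer to the true figure than the 12-point one, with no change to the model at all.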
- Velocity. Social media data streams, while not as massive as machine-generated data, produce a large influx of opinions and relationships valuable to customer relationship management. Even at only 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes (over 8 TB per day). The Internet and mobile era means that the way we deliver and consume products and services is increasingly instrumented, generating a data flow back to the provider that can be harvested as part of the sea of ‘Big Data’. Online retailers are able to compile large histories of customers’ every click and interaction, not just the final sales. Those who are able to quickly utilise that information, by recommending additional purchases, for instance, gain significant competitive advantage.
- Variety. It is self-evident that data will not always be present in a way that is perfectly formatted for use or analysis. Traditional data formats tend to be relatively well described and change slowly. In contrast, non-traditional data formats exhibit a bewildering rate of change. A common thread in Big Data is that data sources are diverse and don’t fall into neat relational structures; a common use for Big Data analytics platforms is therefore to extract ordered meaning out of unstructured mass data.
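‘Extracting ordered meaning out of unstructured mass data’ can be shown at miniature scale. The free-text feedback lines, product codes and field names below are all invented for the sketch; the point is simply that irregular text is coerced into relational-style rows, and lines that won’t parse are dropped rather than allowed to pollute the result.

```python
# Minimal sketch: turn unstructured text into structured records.
# The feedback lines, product codes and field names are invented.
import re

raw_feedback = [
    "Ordered the XR-200 kettle on 03/11, arrived fast, 5 stars!",
    "The XR-200 broke after a week. 1 star. Ordered 05/11.",
    "Love my new TF-9 toaster, ordered 07/11, would give 4 stars.",
]

# Look for a product code (letters-digits) followed somewhere by a star rating.
pattern = re.compile(r"(?P<product>[A-Z]+-\d+).*?(?P<stars>\d) star")

records = []
for line in raw_feedback:
    match = pattern.search(line)
    if match:  # unstructured input: not every line will parse cleanly
        records.append({"product": match.group("product"),
                        "stars": int(match.group("stars"))})

print(records)
```

The output is a tidy list of rows that could be loaded straight into a conventional analytics tool, which is the essence of what the larger platforms do at scale.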
- Value. The economic value of different data varies significantly. Typically there is good information hidden amongst a larger body of non-traditional data; the challenge is identifying what is valuable and then transforming and extracting that data for analysis.
Jeff Jonas, chief scientist at IBM’s Entity Analytics group, says: “The value of data is proportional to the context it’s in. Making better sense of the observable space and reacting faster [allows for] the best edge.” Traditionally in military or intelligence circles, mass data is not intelligence until it is analysed, with context and association added, at which point it becomes valuable.
- Veracity. All of the above means very little if there is doubt about the accuracy of the data being used, and such doubt can arise in various ways: incomplete data, entry errors, processing faults, sensor noise, social media, latency of information, deception, modelling approximations, and so on. Incorporating inaccurate data into the analytical environment introduces unknown variations that propagate through any management initiatives. When data is spread across multiple systems with differing data standards, formats and the like, the error factor is potentially magnified exponentially. Accurate and relevant contextual data that is reliable and delivered quickly confers huge competitive advantage. To quote the old military adage, commonly attributed to the American Civil War general Nathan Bedford Forrest: “he who gets there firstest, with the mostest, usually wins”!
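A first line of defence for veracity is to screen records for obvious integrity failures before they ever reach the analytical environment. The sketch below is hypothetical: the field names, accepted source systems and rules are invented, and a real pipeline would carry far richer checks, but the shape of the idea is the same.

```python
# Hedged sketch of integrity screening. Field names ("customer_id", "amount",
# "source") and the accepted source systems are invented for illustration.
def validate(record):
    """Return a list of integrity problems found in a single record."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    amount = record.get("amount")
    if amount is None:
        problems.append("missing amount")
    elif amount < 0:
        problems.append("negative amount")
    if record.get("source") not in {"pos", "web", "meter"}:
        problems.append("unknown source system")
    return problems


records = [
    {"customer_id": "C1", "amount": 42.0, "source": "web"},
    {"customer_id": "", "amount": -5.0, "source": "fax"},
]

for r in records:
    issues = validate(r)
    print("ok" if not issues else "; ".join(issues))
```

Records that fail are flagged rather than silently incorporated, which is exactly the ‘unknown variation’ the article warns against.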
What does this mean for potential commercial clients?
Commercial organisations are being courted with the notion that using the transactional data they have been storing for decades, and analysing it alongside the ‘treasure trove’ of newly available unstructured data, will yield in-market competitive advantage.
As a result, more and more companies are looking to include non-traditional yet potentially very valuable data alongside their traditional enterprise data in their business intelligence analysis.
The commercial challenge now is to work out which data is potentially valuable and which is not, then how to make sense of it all and what to do with it to gain a commercial edge over competitors – who will probably be trying to do the same thing!
How companies go about that will naturally depend on which industry they are in; the process, however, remains largely the same. Companies will need the ability to acquire the necessary data, organise it into usable chunks, analyse it, and then reach implementation decisions, whilst not falling foul of the regulator, if there is one (regulators recognise a ‘duty of care’ and can determine the jurisdictional challenge).
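The acquire, organise, analyse, decide cycle just described can be sketched as four chained functions. Everything here is invented for illustration: the raw event strings, the field names and the trivial ‘decision’ stand in for whatever a real business would plug into each stage.

```python
# The acquire -> organise -> analyse -> decide cycle as plain functions.
# Data, field names and the decision rule are all invented for the sketch.
def acquire():
    """Stand-in for pulling raw data from diverse, messy sources."""
    return ["web:click:home", "web:click:checkout", "meter:read:1043", "junk"]


def organise(raw):
    """Split raw strings into usable chunks, discarding what will not parse."""
    rows = []
    for item in raw:
        parts = item.split(":")
        if len(parts) == 3:
            rows.append({"source": parts[0], "event": parts[1], "value": parts[2]})
    return rows


def analyse(rows):
    """Trivial analysis: count events per source system."""
    counts = {}
    for row in rows:
        counts[row["source"]] = counts.get(row["source"], 0) + 1
    return counts


def decide(counts):
    """Turn the analysis into an (invented) implementation decision."""
    return max(counts, key=counts.get)  # focus effort on the busiest channel


print(decide(analyse(organise(acquire()))))
```

Each stage can be swapped out independently: a different acquisition source, a richer organisation step, or a more sophisticated analysis, without the overall shape of the process changing, which is why the article can say the process remains largely the same across industries.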
This is the second instalment of a three-part story on Big Data – part one can be found here, and part three will be uploaded on December 19th.