Breaking Down the Basics of Big Data
Whenever a concept or topic breaks into the mainstream, there is a tendency for public and media discussion to oversimplify it. This is true in the case of Big Data, which is widely mentioned but remains poorly defined and understood.
It goes without saying that Big Data is not a single entity existing on a server farm in some remote locale. Instead, Big Data is a phenomenon, referring to the rising tide of traditional and digital information. Yet, what does this really mean? Back in 2001, Doug Laney described the three Vs of Big Data: volume, velocity and variety.
- Volume – We produce 2.5 quintillion bytes of data every day (a quintillion is 10^18). For comparison, all of the data stored online by the Library of Congress accounts for just 1 trillion bytes. Furthermore, in antiquity, the Library of Alexandria in Egypt held most (if not all) major pieces of the world’s history and knowledge. Today, all of that knowledge would fit onto a single flash drive.
- Velocity – We create as much data every 2 days as was created from the birth of civilization until 2003. Given this, about 90% of all of the world’s data was created in the last 2 years.
- Variety – Data comes in many types. There is the structured numerical information from customer transactions, website usage and other data streams; there is also unstructured data, like text-heavy digital communications, blog posts, pictures, and much more that is not easily quantified; and there is multi-structured data, where there is a mix of structured and unstructured data.
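To put the Volume figures in perspective, here is a minimal arithmetic sketch in Python. It takes the two numbers quoted above at face value (2.5 quintillion bytes per day and 1 trillion bytes for the Library of Congress online collection). The variable names are purely illustrative, and the decimal units (1 terabyte = 10^12 bytes, 1 exabyte = 10^18 bytes) are an assumption.

```python
# Figures quoted in the article, taken at face value.
BYTES_PER_DAY = 25 * 10**17          # 2.5 quintillion bytes
LIBRARY_OF_CONGRESS_BYTES = 10**12   # 1 trillion bytes

# Assumed decimal (SI) units.
TERABYTE = 10**12
EXABYTE = 10**18
SECONDS_PER_DAY = 24 * 60 * 60

# Daily volume expressed in exabytes.
daily_exabytes = BYTES_PER_DAY / EXABYTE

# Average rate if the data were produced evenly over the day.
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / TERABYTE

# How many Library of Congress online collections fit in one day of data.
loc_multiples = BYTES_PER_DAY // LIBRARY_OF_CONGRESS_BYTES

print(f"{daily_exabytes} exabytes per day")
print(f"about {terabytes_per_second:.1f} terabytes per second")
print(f"{loc_multiples:,} times the Library of Congress figure")
```

Under these assumptions, a single day of data equals 2.5 exabytes, or 2.5 million times the Library of Congress figure.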
The 3 Vs of Big Data help explain the essence of data and some of the challenges in working with it, but they fall short of detailing exactly what all this data is. Higinio Maycotte, founder and CEO of Umbel (a data company), writes in Wired about three ways to parse what constitutes Big Data.
- Smart Data – Actionable, structured data that can be segmented and analyzed with technological tools. When a larger data set is broken up into manageable silos, smart data can reveal trends and inform business operations.
- Identity Data – The data points that make up digital identity, such as purchasing habits, lifestyle preferences, and physical and e-mail addresses. This data is used in predictive modeling and machine learning.
- People Data – Targeted customer profiles built up over time by aggregating social data. Knowing what URLs people click on, the social media posts they like, their time on site and a range of other customer profile data points, organizations can better engage consumers in the digital space.
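As a rough illustration of how People Data might be assembled, the sketch below aggregates a handful of click events into per-user profiles. Everything in it is hypothetical: the event fields (`user`, `url`, `seconds_on_site`), the sample records, and the `build_profiles` helper are invented for illustration and do not come from Maycotte's article or any particular product.

```python
from collections import defaultdict

# Hypothetical click events; the field names and values are made up.
events = [
    {"user": "u1", "url": "/shoes", "seconds_on_site": 40},
    {"user": "u2", "url": "/hats", "seconds_on_site": 15},
    {"user": "u1", "url": "/shoes/sale", "seconds_on_site": 65},
]


def build_profiles(events):
    """Group events by user, collecting visited URLs and total time on site."""
    profiles = defaultdict(lambda: {"urls": [], "total_seconds": 0})
    for event in events:
        profile = profiles[event["user"]]
        profile["urls"].append(event["url"])
        profile["total_seconds"] += event["seconds_on_site"]
    return dict(profiles)


profiles = build_profiles(events)
for user, profile in profiles.items():
    print(user, profile)
```

A real pipeline would draw on many more signals, such as liked social media posts, but the underlying idea of aggregating individual data points into a profile over time is the same.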
This gives a better sense of the kinds of information within the Big Data phenomenon. Yet there remains a good deal of confusion and uncertainty, even among people at the same organization. As data expert Lisa Arthur asks in Forbes, “CMOs, when you talk about ‘big data’ in the C-suite, do you know if everyone’s on the same page?”
Arthur suggests that the absence of a definition of Big Data leads to divergent concepts of data and how it can help an organization. She offers a general definition of Big Data: “a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.”
This definition is a good start, but perhaps even more important than definition is understanding. What is needed is widespread awareness of how data can be used to drive innovation, economic growth and business success. Rather than just asking “what is Big Data?”, perhaps we might better ask, “how can I access the massive potential in Big Data and put it to work for my company?”