Big Data and the Change It Brings

General Foundation

Introduction

In antiquity, much of the world's recorded knowledge was located in one spot: the Library of Alexandria in Egypt. Nearly every culture and empire of that era recognized the library as the epicenter of scholarship. Today, however, information is increasingly decentralized. Alexandria's vast repositories of papyrus scrolls could now fit onto a single flash drive. In the 21st century, digital information is being created, analyzed, and stored at an astonishing rate. Consider that 90% of the world’s data has been produced in just the last two years. This explosion of information is known as Big Data, and it portends a radical paradigm shift in business, much like that of the personal computer and the Internet.

Big Data is already an integral part of every sector in the global economy, as essential a factor of production as physical and human capital. Much of our modern economic activity simply could not function without it. In fact, prominent research consultancy firm Gartner predicts that 6 million jobs in the United States will be generated by the Big Data-driven information economy over the next four years. The effect of Big Data is already evident in manufacturing, finance, and especially retail, where companies have been assembling increasingly sophisticated consumer profiles. British retailer Tesco collects 1.5 billion pieces of data on its customers’ shopping habits every month and uses that information to adjust prices and send targeted coupons. Even the recent election was shaped in large part by Big Data. Both the Obama and Romney campaigns relied on voter profile data sets to guide their ad buys and get-out-the-vote efforts. The Obama data team was even able to analyze the television viewing habits of critical voting blocs, and it used that information to determine where and when the campaign could advertise to voters at the lowest cost.

The economic and business possibilities of Big Data and its broader significance for social and technological advances are critical issues that business leaders should be out in front of, lest they be left behind. Yet because Big Data is so, well, big, it can be a difficult concept to comprehend, let alone explain. Press accounts often conflate the topic with different data trends and information technology (IT) issues.

This report aims to bring some clarity to the matter by defining the concept, identifying its implications for businesses and entrepreneurs, and outlining industry best practices and actionable steps for integrating Big Data into existing operations. Finally, it identifies some Big Data issues on the horizon that may arrive sooner than we think. Mobile users will soon have their hands on data engines as powerful as IBM's Watson, a development that could disrupt several industries at once. The ability to generate value by extracting useful information from the resulting digital detritus will be a key indicator of business longevity.

 

What Is Big Data? 

Big Data is a catch-all term that is often misunderstood and frequently misapplied. A Harris Interactive survey asking companies to define the term produced answers that were all over the map. Twenty-eight percent thought it meant a "massive growth of transaction data." Twenty-four percent thought it referred to new technologies for managing massive data, and 19% defined it as the "requirement to store and archive data for regulatory compliance." In fact, the first two responses are both correct, but incomplete, while the third suggests some real puzzlement. This conceptual fuzziness has led TechCrunch editor Leena Rao to call for Big Data’s banishment from the business lexicon.

Such a rash move would be both premature and unnecessary. It's true that Big Data is more of a dynamic than a "thing"—but it's a dynamic that can be described as large pools of data that can be captured, communicated, aggregated, stored, and analyzed. That definition comes from an oft-cited McKinsey Global Institute (MGI) report on the implications of Big Data for different domains. We might also add that this data is often so large that it exceeds the processing capacity of database systems and analytics software. Three Vs—volume, velocity, and variety—are commonly used to characterize the different dimensions of Big Data. IBM includes a fourth: veracity.

Defining these terms in reference to Big Data provides a more complete picture of the phenomenon:

      Volume: The sum total of data of all types, including financial records, Facebook posts, and cell phone GPS signals.

      Velocity: The speed of data communication. For time-sensitive processes (such as catching fraud or monitoring emerging trends), Big Data must be analyzed as it streams in order to maximize value.

      Variety: The range of formats, including both structured and unstructured data sets.

      Veracity: The accuracy of the data. 

 
Research from both MGI and IBM makes it clear that the main driver of Big Data is the nearly incalculable amount of transactional data being produced by companies, financial institutions, and online intermediaries. This includes trillions of bytes of information about buyers, suppliers, and operations of critical interest to businesses and financial analysts. In the physical world, a proliferation of sensors embedded in automobiles, energy meters, and smartphones creates additional metadata that is sent back to corporate servers. Add in the millions of individuals around the globe creating personalized content on social media sites, the expanding wealth of knowledge available on Google Books and Wikipedia, and the massive information architecture supporting the entire digital world, and there is a torrent of information cascading through the universe.

 

Not only is this data being generated at unprecedented rates (some 2.5 quintillion bytes each day), but a significant portion of it is captured and stored on massive servers. Digital storage has become increasingly cheap over the past decade with advances in both hardware (magnetic disk technology) and online storage models (cloud computing), but our ability to process those vaults of data has not kept pace. Normal data becomes Big Data when it exceeds our ability to process and analyze it with traditional tools. The size of Big Data varies across sectors, ranging from a few dozen terabytes to multiple petabytes (thousands of terabytes). How big is that? Even the smallest Big Data sets dwarf the aforementioned Library of Alexandria, which held only about 64 gigabytes' worth of books, research, and translations, a small fraction of a single terabyte.
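To put those magnitudes side by side, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes decimal (SI) units, where a terabyte is 1,000 gigabytes, and the helper name share_of is invented here.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary
# units such as GiB and TiB would give slightly different figures).
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15


def share_of(size_bytes, reference_bytes):
    """Return size_bytes as a fraction of reference_bytes."""
    return size_bytes / reference_bytes


# Figures quoted in the text above.
alexandria_bytes = 64 * GIGABYTE      # estimated holdings of the Library of Alexandria
daily_bytes = 25 * 10**17             # 2.5 quintillion bytes generated per day

alexandria_share_of_tb = share_of(alexandria_bytes, TERABYTE)
daily_petabytes = daily_bytes // PETABYTE

print(f"Alexandria as a share of one terabyte: {alexandria_share_of_tb:.1%}")
print(f"Daily data volume in petabytes: {daily_petabytes:,}")
```

Under these assumptions, the library's holdings come to roughly 6% of one terabyte, while a single day of global data generation amounts to about 2,500 petabytes.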

 

How Is Big Data Being Used?

Companies that can effectively utilize Big Data will have a huge comparative advantage over less-tech-savvy rivals. The Economist argues that the information revolution is already disrupting established industries and business models. For example, Big Data is improving the health care market through services like Microsoft's HealthVault, a cloud-based platform that enables patients to compile and store personal health information from multiple sources in a single online depository.

Big Data is transforming the manufacturing sector into a service industry through sensors that enable companies like BMW to monitor their products and alert customers when they need repairing. Big Data is also producing innovation in government. The Swedish government has been analyzing a wide variety of real-time traffic data (including GPS information from vehicles, speeds and flows from sensors on motorways, congestion charging rates, and detailed weather reports) to manage the highway system more efficiently. The result has been a decrease in traffic congestion and accidents.

The retail industry remains the front line for the Big Data business revolution. This shouldn't seem surprising: Retailers have long sought price advantages over competitors by tracking customer purchasing habits to forecast seasonal buying patterns and optimize inventory levels. In the 1970s, Wal-Mart visionary Sam Walton was able to implement his radical cost-cutting philosophy through a partnership with IT-savvy executive Roy Mayer and his data processing protégé Royce Chambers. Both Mayer and Chambers convinced Walton to overhaul the company’s logistics and upgrade the computer system that tracked merchandise sales and orders. This investment in IT gave Wal-Mart a critical edge over less tech-adept rivals and led to unprecedented efficiency and huge profits. It also foreshadowed a wider adoption of retail forecasting technology, especially as more advanced tools became available in the 1980s and 1990s.

The abundance of real-time customer and interaction data in today's retail environment (both physical and online) can be transformed into new revenue streams and market opportunities through Big Data analytical tools. MGI estimates that retailers effectively utilizing Big Data can potentially increase operating margins by more than 60%. The possible applications are nearly endless. For example, Web logs can be analyzed to show how consumers navigate through Internet storefronts. This data can be combined with sales figures to compare the volume of website traffic for a particular product against the product's sales. Heavy traffic combined with low sales can signal uncompetitive pricing. In the past, such a low-selling item might simply have been discarded. Retailers like American Apparel and Mont Blanc have successfully utilized such analytic services to improve the layout of their stores and increase in-store sales.
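The traffic-versus-sales comparison described above can be sketched in Python. This is a minimal illustration, not a description of any retailer's actual system: the product names, counts, and thresholds are all hypothetical, and a real pipeline would first have to derive page views from raw Web logs.

```python
from dataclasses import dataclass


@dataclass
class ProductStats:
    name: str
    page_views: int   # visits to the product page, taken from Web logs
    units_sold: int   # units sold over the same period


def conversion_rate(stats):
    """Units sold per page view; 0.0 when the page had no traffic."""
    if stats.page_views == 0:
        return 0.0
    return stats.units_sold / stats.page_views


def flag_high_traffic_low_sales(products, min_views, max_conversion):
    """Return names of products that draw traffic but convert poorly."""
    return [
        p.name
        for p in products
        if p.page_views >= min_views and conversion_rate(p) < max_conversion
    ]


# Hypothetical catalog data, invented for illustration only.
catalog = [
    ProductStats("canvas tote", page_views=12000, units_sold=60),
    ProductStats("wool scarf", page_views=3000, units_sold=240),
    ProductStats("leather belt", page_views=9000, units_sold=450),
]

flagged = flag_high_traffic_low_sales(catalog, min_views=5000, max_conversion=0.01)
print(flagged)  # ['canvas tote']
```

A flagged item is a candidate for a pricing review rather than automatic removal; the thresholds would need tuning for each store and product category.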

Tracking and analyzing a customer's purchasing history is one of the most basic steps a retailer can take to accumulate a “big” dataset. Loyalty cards such as those issued by Safeway, Tesco, and many other companies are ostensibly used to reward frequent shoppers with discounts and allow them to accumulate points for future bonuses. Yet each swipe of the card sends transaction data (what was bought, where it was purchased, and how it was paid for) into a data bank profile of each customer’s purchasing history that also includes phone numbers, home addresses, and other personal information that was given when individuals signed up for the card. There are many mutually beneficial aspects to this system. Retailers can identify when customers need a refill on a product, and then they send out a timely coupon or other promotional material urging them to buy the refill. However, the insights companies can gain from such profiles often seem too personal. For example, Target is able to model a female shopper's characteristics accurately enough to identify not only when she is pregnant, but what trimester she is in. One Minneapolis teen discovered this the hard way when her father became suspicious after the company began sending her coupons for baby clothes and cribs.

Suffice it to say, the privacy implications (and related liability issues) of Big Data are serious. For now though, let's focus on a few more positive aspects. One of the more innovative uses of Big Data is what's called “now-casting.” The term refers to use of real-time data to describe contemporaneous activities before "official" data sources become available. One of the most exciting—and informative—examples is Google Flu Trends, a Google spin-off service that can identify possible flu outbreaks up to two weeks earlier than official health reports by tracking the incidence of flu-related search terms. When the Google data is correlated with Centers for Disease Control (CDC) reports, the estimates it offers are 97% to 98% accurate. A similar approach can also be used to indicate economic trends. A decrease in Google search queries related to unemployment claims can signal the end of a recession much sooner than official government statistics and can provide an accurate snapshot of consumer sentiment, corporate health, and social interests. 

Big Data, Big Decisions

As the previous examples make clear, the data revolution has already disrupted many sectors of the economy. As technology trends accelerate and converge, the entire business environment will be transformed. Unlocking the potential of Big Data is a puzzle for business executives and entrepreneurs, but it is also an opportunity. Large data sets and sophisticated analytics can create new products, enhance existing services, significantly improve decision making, mitigate and minimize risks, and produce valuable insights about operations and consumer sentiment.

All the same, working with Big Data can be intimidating, even for tech veterans and savvy startups. Sorting through reams of unstructured data and strange-looking statistical categories can leave one “lost in a data hole.” Here are several lessons that can help businesses approach Big Data and better harness its potential.

 

Adopt Early

As the Wal-Mart example previously cited shows, companies that develop and integrate a new technology ahead of their rivals can gain a clear competitive advantage. This is especially true for smaller companies, which often have greater organizational flexibility and are thus better able to experiment with new marketing opportunities. Big Data has traditionally been associated with big businesses boasting large data centers and full-time data specialists. Yet software firms like Intuit and SiSense are developing analytics applications that small and midsize businesses can afford and immediately utilize. Even existing programs like Google Analytics, Facebook Insights, and Quantcast can capture and analyze data from digital media and generate a wealth of valuable information.

Of course, the prerequisite here is to establish a digital footprint and be proactive in social media outreach and website development. Online and social media assets are small businesses' best tools for gathering customer intelligence. Businesses can easily identify the search queries that their customers most frequently use to reach each page on their sites, how customers move around their sites when they get there, and what Twitter and Facebook posts they respond to. All this information can be used to optimize online operations and increase sales.

Craft a Goal-Oriented Plan

Carefully defined goals are critically important both for new adopters formulating their first Big Data dive and for experienced data analysts recalibrating existing operations. Without well-articulated goals, the sheer volume of information available can prove overwhelming.

It may be helpful to first identify two or three pressing business problems, and then consider how Big Data can help in mitigating or solving them. With that outcome in mind, the task of sorting through available data becomes more manageable as it shifts focus to only the most relevant numbers and statistics. For example, a common goal of small and large businesses alike is increasing website traffic. Yet any web marketer will say that traffic alone isn't a very valuable metric. Data analysis results will be more useful with more narrowly defined and relevant key performance indicators. How many visitors to company sites are signing up for newsletters or purchasing products? Analytics software, such as the aforementioned Google Analytics, can help compare traffic against these goals.
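As a rough sketch of what comparing traffic against goals can look like, the Python snippet below turns raw counts into narrowly defined conversion rates and checks each one against a target. The function name kpi_report, the visitor counts, and the targets are all hypothetical; an analytics product such as Google Analytics reports comparable figures through its own interface.

```python
def kpi_report(visitors, outcomes, targets):
    """Compare each outcome's conversion rate with its target rate.

    visitors: total site visitors in the period
    outcomes: mapping of KPI name to the number of visitors who completed it
    targets:  mapping of KPI name to the desired conversion rate
    """
    report = {}
    for name, count in outcomes.items():
        rate = count / visitors if visitors else 0.0
        report[name] = {
            "rate": rate,
            "target": targets[name],
            "met": rate >= targets[name],
        }
    return report


# Hypothetical monthly figures, invented for illustration only.
report = kpi_report(
    visitors=20000,
    outcomes={"newsletter_signups": 900, "purchases": 300},
    targets={"newsletter_signups": 0.04, "purchases": 0.02},
)

for name, row in report.items():
    status = "met" if row["met"] else "missed"
    print(f"{name}: {row['rate']:.1%} vs. target {row['target']:.1%} ({status})")
```

In this made-up example the site meets its newsletter goal but misses its purchase goal, a distinction that the raw traffic figure alone would hide.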

Of course, Big Data can produce much more interesting results than just Web traffic—it can inform important operational decisions. For instance, a small but growing business would be well served by knowing when its enterprise is large enough to justify hiring new employees. These sorts of decisions used to be made from the “gut.” Today, the Intuit Small Business Index can provide an insightful snapshot of regional small business trends to help inform that decision. The index is a measure of employment for firms with 20 or fewer employees derived from aggregate and anonymous online employment data from a statistical sample of employers using QuickBooks and its online payroll software. An analysis of the index lets companies see when local labor costs are increasing or decreasing, which may inform their hiring decisions. 

Collect, Manage, and Store Data

The necessity of collecting data for analysis may seem self-evident, but the need for good data can't be stressed enough. According to IBM, one in three business leaders doesn’t trust the information they use to make decisions. That's an incredible statistic. Information is useless if it's not valid or not perceived to be valid. Just as companies aim to hire the best and the brightest employees, they must also seek the most up-to-date and accurate data available. Accept no substitutes. Think of data as intelligence that guides decision making throughout all areas of business.

As the data accumulates, it can quickly overwhelm one person’s ability to manage and store it. The problem is amplified for medium-to-large businesses that move beyond point-of-purchase financial data and basic website analytics and begin recording and coding customer-support calls, analyzing shifts in inventory levels, and tracking online ad revenue. For large operations, storing massive data sets is a complex task, which can require special techniques and algorithms. The logistics of this process and necessary IT infrastructure need to be outlined up front. 

Hire the Right People

Trying to process and understand Big Data without the right support is a daunting task. As a business expands, it may make sense to hire a full-time data director who can produce insightful and accurate analysis in a timely fashion. Consider picking up recent graduates; their tech acumen can compensate for lack of experience. Carnegie Mellon University, California Polytechnic State University, and the University of California at Berkeley offer degree programs in Big Data-oriented disciplines. Furthermore, “business analytics” is gaining popularity as a curriculum focus within M.B.A. programs at Yale, Wharton, and the Kelley School of Business at Indiana University. If a company is interested in hiring, it should act quickly: Statistician is the new "it" job, but there are not enough Americans with Big Data and analytics skills to meet the growing demand. Tech leaders have called on Washington to reform high-skilled immigration policy to supplement America's growing tech needs and stop the ongoing brain drain.

Tell Stories

Big Data can reveal significant information about a company’s audience. If used effectively, it can also tell that audience an interesting and compelling story about that company or business. This approach to marketing represents an important new way to reach customers. Until recently, data in advertising was little more than nutritional information on the back of food packages, easily ignored and essentially powerless in shaping consumer choices. That has changed in recent years as customers have become more data-empowered through instant access to product reviews and social media-derived recommendations, all of which inform their purchasing decisions. The smartest of the “smart” consumers know everything about a product before they step inside a store, and they make their buying decisions based on which retailers and brands have earned their trust.

Companies should aim not only to provide smart customers with the best and most up-to-date information about their products and services, but to package that information as a story that creates a stronger connection to their brand. During my time at JESS3, the data visualization firm that I co-founded, we established a niche by specializing in data-driven storytelling. Companies came to us with dense charts and spreadsheets depicting trends and inflection points. We transformed those numbers into visual narratives accessible and engaging to a large audience. 

Competitive brands want to (1) understand what their data says about them, (2) use that data to help tell their story, and (3) do so in visually arresting ways. For example, take Nike’s “A Better World” campaign. This project, which was built around internal data, focused on the company’s efforts toward sustainability. Its efforts earned favorable responses even from skeptics, like those at TreeHugger.com. Nike was quick to utilize its own data as a means of shaping its public image. 

Nike is not the only business at the forefront of the data movement. EnergyHub, an energy software startup that provides users with detailed data on their energy consumption habits, is collecting previously unheard-of data and presenting it in creative ways that attract not only media attention but also new customers. By harnessing the power of data, companies can step out in front of their competitors, both at home and abroad.

Make the Right Privacy Choices

A White Plains, New York, newspaper stirred a national furor after it published an online map identifying the home addresses of 44,000 pistol permit holders in three counties in New York state. The information was obtained legally under a Freedom of Information Act (FOIA) request, but the decision to publish the data was highly controversial. The incident highlighted the accessibility of Big Data and likely increased awareness of related privacy concerns. 

Yet even more personal and private data is collected about consumers all the time with little to no publicity—leading some observers to cite the specter of Orwell’s Big Brother. Digital skeptic Nicholas Carr argues that Big Data could soon produce a “nightmare world” where “moment-by-moment behavior of human beings—analog resources—is tracked by sensors and engineered by central authorities to create optimal statistical outcomes.” Alistair Croll at O'Reilly Radar makes the case that Big Data is a civil rights issue: The same technology that can personalize marketing can also be used to discriminate against particular demographics. As one might suspect, this could eventually result in restrictive federal regulations.

Certainly there is historical precedent for such policy. In the 1960s, Congress was forced to take action in response to public outcry about “redlining,” the practice in which banks marked a red line on a map to delineate neighborhoods where they would not invest. The Fair Housing Act of 1968 prohibited redlining based on race, religion, sex, familial status, disability, or ethnic origin. Complaints concerning telemarketing calls to homes led to the establishment of the National Do Not Call Registry in 2004. Three years later, 72% of Americans had registered their phone numbers on the list, decimating the telemarketing industry. Companies dependent on Big Data could experience a similar decline should data collection and application become entangled in red tape and litigation.

Businesses would be well advised to stay in the good graces of both the public and legislators by developing internal privacy best practices that aim to keep their databases secure, limit the exposure of customer information, and make sure that the information collected on consumers doesn't turn into a public relations disaster. It’s a lesson that some IT companies have learned the hard way: When Libyan rebels finally seized control of the country in 2011, they discovered that the Qaddafi regime had been using sophisticated surveillance and analytics technology supplied by prominent French firm Bull SA to track nearly all the online activities of the country’s 100,000 Internet users. Following a legal complaint from human rights organizations, one of Bull's subsidiaries is currently the subject of a judicial inquiry in Paris. 

Commit to Operating Business Differently

A recent Harvard Business Review article identified three mutually supportive abilities that companies need to have in place to fully capitalize on Big Data. First, they must sort and manage multiple sources of data. Second, they need to build advanced analytics models for predicting and optimizing outcomes. Third, they require support from management to restructure the organization so that data and analytics produce better decisions. All of these points suggest that transforming a company into a data-driven enterprise requires a commitment by the entire organization to establish technical architecture and maintain operational support. For large businesses especially, the organizational implications of Big Data necessitate executive-level attention and cross-department coordination. 

Forecasting the Future: Social, Mobile, Cloud

The next few years will represent a big tipping point for Big Data in terms of technology, market share, and economic impact. Mobile sensors and similar technology will soon be ubiquitous, straining the capacity of traditional IT and the business infrastructure it supports. Companies already adept at data accumulation and analysis will secure their market share, while startups and disruptors will push aside also-rans. The research firm IDC expects the Big Data market to grow to $16.9 billion by 2015, which represents a compound annual growth rate of 40%. Gartner predicts that this Big Data expansion will produce 4.4 million tech jobs globally. Although the salaries for different tech positions vary tremendously, the median salary for a data scientist is reported to be $98,600. The employment increase will be amplified by a multiplier effect: Each IT job created by Big Data will generate three more non-IT positions.
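The growth and employment figures above rest on two simple formulas, sketched here in Python. Several inputs are assumptions made for illustration: the five-year horizon is hypothetical (the text gives only the 2015 figure and the 40% rate, not a base year), and applying the three-to-one multiplier to the global 4.4 million figure is likewise an assumption. The function names are invented.

```python
def project_value(start_value, annual_rate, years):
    """Compound a starting value forward at a fixed annual growth rate."""
    return start_value * (1 + annual_rate) ** years


def implied_start_value(end_value, annual_rate, years):
    """Back out the starting value implied by an end value and a growth rate."""
    return end_value / (1 + annual_rate) ** years


def induced_jobs(it_jobs, non_it_per_it_job):
    """Non-IT positions generated by a given number of IT jobs."""
    return it_jobs * non_it_per_it_job


# Figures from the text: $16.9 billion by 2015 at a 40% compound annual rate.
# ASSUMPTION: a five-year horizon, chosen only to illustrate the formula.
end_market_billions = 16.9
growth_rate = 0.40
horizon_years = 5

start_market_billions = implied_start_value(end_market_billions, growth_rate, horizon_years)
print(f"Implied starting market size: ${start_market_billions:.1f} billion")

# ASSUMPTION: the three-to-one multiplier applies to all 4.4 million IT jobs.
non_it_jobs = induced_jobs(4_400_000, 3)
print(f"Additional non-IT positions: {non_it_jobs:,}")
```

Under those assumptions, a 40% compound rate multiplies the market a little more than fivefold over five years, and the multiplier adds 13.2 million non-IT positions on top of the 4.4 million IT jobs.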

"Every budget is an IT budget," Gartner's Peter Sondergaard argued during his keynote address at the company's annual symposium. "Technology is embedded in every product."

Sondergaard believes that this transformation is being driven by the convergence and mutual reinforcement of social, mobile, and cloud technologies. Social networks such as Facebook are reaching the boundaries of audience growth, but businesses are only now integrating social media into the core of their operations. Gartner predicts that 10 organizations will each spend more than $1 billion on social media over the next three years. Traders already utilize social-media algorithms to gauge investor confidence in the market and identify emerging trends in real time. The combination of cloud storage, high-tech closed-circuit cameras, and Big Data analytics software may completely revolutionize security operations over the next decade. Such technology was on display at the 2012 Summer Olympic Games in London, and casinos are hoping it will let them better identify card counters and automatically assess player skill level and strategy with minimal operator oversight.

Mobile devices are increasingly the point of entry for social networks and many other applications. The Economist says that the number of mobile phones and tablets surpassed the number of laptops and PCs for the first time in 2011. Gartner says that in 2016, two-thirds of the mobile workforce will own a smartphone, and 40% of the workforce will be mobile. By 2020, the number of mobile-connected devices operating across the globe should reach 10 billion. The data emanating from mobile phones holds particular interest for businesses because it is easier to link directly to individual users and can be used to provide a personalized, contextual user experience. The linking of mobile devices with financial transactions also represents a coming sea change. Apple, Google, and Microsoft are all making investments in near field communication (NFC), a short-range wireless technology that allows for the automatic transmission of data between devices in close proximity. NFC-enabled smartphones function like mobile wallets, and retailers and credit card companies want to use the technology to facilitate in-store shopping apps and mobile payments.

Cloud computing is a natural complement to Big Data since it offers storage space that can be expanded on demand. Cloud users do not have to commit to expensive software or a hardware infrastructure expansion; they can simply rent the resources they need as they go, paying only for what they use when they use it. Cloud computing usage has grown rapidly over the last few years. According to an IBM survey, most companies anticipate the cloud will replace on-premises servers for most or all of their needs by 2015. Most important, the combination of mobile devices and cloud servers makes Big Data completely decentralized. Big Data is now truly everywhere, and on the move. Sondergaard says:

We are just at the beginning of realizing the cost benefits of cloud, but organizations moving to the cloud are also attracted by the new capabilities they do not get today. It is bringing new approaches to designing applications, specifically for the cloud, and providing more resilience by architecturing [sic] failure as a design concept. Cloud also teaches us about services and service levels, and the contrast between what the business wants for outcomes versus IT’s old methods of getting there. 

Along with increased mobility comes context-aware computing—a mix of location awareness, social network integration, and smartphone hardware such as cameras and microphones. These components currently operate separately, but the next leap in mobile computing will occur as they begin to interoperate and connect. This new wave of tech development will enhance the way we interact with all digital devices and dramatically amplify existing privacy and security concerns. Big Data will only grow bigger in the years ahead. Today, a mere 1% of all physical objects are connected to the digital network. Increasing that number to even 5%, 8%, or 10% will completely change the way we conceptualize the divide between the physical and virtual world.

 

Conclusion

Much of the business press has hyped Big Data as the “next big thing” without properly defining the concept or specifying actionable steps businesses can take to utilize it effectively. What the business press needs to do is better explain the rewards, risks, implications, and opportunities Big Data presents and advance a larger dialogue about its economic impact and social implications. Business leaders and entrepreneurs should continue to contribute to this discussion by detailing how they put Big Data to work for their organizations.

While the significance of Big Data is well established, it is worth reemphasizing that without vision and careful planning, even a treasure trove of Big Data will only serve to distract rather than drive business success. Indeed, the problems previously associated with "small" data (unreliability, lack of context, relative value) are only amplified as data sets grow larger. As The Economist has sagely observed, Big Data is ultimately "no substitute for sound intuition and wise judgment." Business leadership, then, is the critical link between data and data-driven success. Skillful executives will seek out the best tools, the best people, and the best information. With careful planning, today's companies can become tomorrow's data leaders.

 


Further Reading 

Barton, Dominic. Court, David. “Making Advanced Analytics Work for You.” Harvard Business Review (Magazine). October 2012. 

Barton, Dominic. Court, David. “Get Started with Big Data: Tie Strategy to Performance.” Harvard Business Review (Blog). 1 October 2012.

Bay, Monica. “Defending Big Data.” Law Technology News. October 2012. 

“Building with Big Data: The Data Revolution is Changing the Landscape of Business.” The Economist. 26 May 2011.

Chen, Hsinchun, et al. “Business Intelligence and Analytics: From Big Data to Big Impact.” MIS Quarterly. December 2012. 

Chui, Michael. Manyika, James. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute. May 2011.

Croll, Alistair. “Big Data is Our Generation’s Civil Rights Issue, and We Don’t Know It.” O'Reilly Radar. 2 August 2012. 

Decoding Big Data. New York: Portfolio Penguin. 2013. E-Book. 

Duhigg, Charles. “How Companies Learn Your Secrets.” The New York Times Magazine. 16 February 2012. 

Dumbill, Edd. “What is Big Data? An Introduction to the Big Data landscape.” O'Reilly Media. 11 January 2012. 

Ingram, Mathew. “The Increasingly Blurry Line Between Big Data and Big Brother.” Gigaom. 1 February 2013. 

Lohr, Steve. “The Origins of ‘Big Data’: An Etymological Detective Story.” The New York Times. 1 February 2013. 

Rao, Leena. “Why We Need To Kill ‘Big Data.’” TechCrunch. 5 January 2013. 

Savitz, Eric. “Big Data Analytics: Not Just for Big Business Anymore.” Forbes. 28 December 2012.