By Joel Gurin, Founder of OpenDataNow.com
Last year, the Los Altos, California, Historical Commission officially designated a new historic site: the garage where Steve Jobs and Steve Wozniak built the first Apple computers. At this writing, a motherboard from one of those machines, slated to go to auction in late October, is expected to fetch about $400,000.[i] Tech aficionados are fascinated by the artifacts left behind by the tinkering geniuses who launched the computer revolution late in the last century.
But many of today’s innovators won’t leave behind such physical traces. They’re working with less tangible but equally powerful resources. Rather than motherboards and CRT screens, their raw material is data—pure information, housed invisibly in the cloud, but able to have a real impact in the real world.
While everyone knows by now that we’re living in the midst of a data revolution, even the experts are finding it hard to figure out what it means. The most widely discussed trend has been Big Data, a concept that elicits a range of reactions. To some, Big Data means a world that looks like George Orwell’s 1984 on steroids: a world where every citizen’s activities, finances, movements, and private conversations are tracked and analyzed by governments, large corporations, or both. To others, Big Data is a boon, an explosion of information that can be mined for commercial applications, marketing insights, and rapid scientific progress.
Beyond reconciling these two worldviews—which are not necessarily incompatible—even defining what Big Data is can be difficult. A recent informal survey for the University of California, Berkeley School of Information came up with 43 different definitions from experts in different aspects of data science.[ii] My own attempt at a definition was that “Big Data describes datasets that are so large, complex, or rapidly changing that they push the very limits of our analytical capability. It’s a subjective term: What seems ‘big’ today may seem modest in a few years when our analytic capacity has improved.”
Whatever definition you choose, the use of Big Data depends on two basic factors: who owns it and how they control it. Large retailers may collect Big Data on their customers, and government agencies collect data related to national security, but these kinds of Big Data are not available for public use. While such data can help the organizations that collect them develop new products, strategies, and insights, they are not tools for widespread innovation.
In contrast, Open Data—a different but related resource—can be used by innovators of all kinds. Unlike Big Data, Open Data is characterized simply by the fact that it is available to all. In my book, Open Data Now, I described it as “accessible public data that people, companies, and organizations can use to launch new ventures, analyze patterns and trends, make data-driven decisions, and solve complex problems.”[iii]
Governments at the federal, state, and local level are making scientific, demographic, financial, healthcare, and environmental data, among other kinds, available at little or no charge for the private sector and the public to use. At the Governance Lab (GovLab) at New York University, where I am a senior advisor, I direct the Open Data 500—a comprehensive study to identify, categorize, and analyze more than 500 U.S.-based companies that use Open Data as a key business resource. They’re demonstrating a business paradox: this free resource can be used to launch companies that develop millions or even hundreds of millions of dollars in value.
The Open Data 500 includes companies across business sectors, and their numbers are growing rapidly. (An accompanying table lists 25 Open Data companies founded in 2010 or later, organized by business category.)
The largest category in the Open Data 500, interestingly enough, includes companies that are not focused on any one specific area but are in the business of making it easier for other businesses to use Open Data. These “data/technology” companies provide platforms and services that make open government data easier to find, understand, and access. Companies like this will have a multiplier effect: their success will help make many other data-driven companies successful as well. Some of the most active sectors for Open Data companies include:
Business and Legal Services: Companies are managing, analyzing, and providing Open Data for business intelligence and business operations, including the use of patent data, data for competitive intelligence, and data to facilitate international trade.
Education: Several startups are using data about the cost and value of educational institutions. Most colleges are now required to disclose the amount an individual student is likely to pay after the college’s financial aid options are taken into account. New education websites are using this kind of data to help students assess their options.
Energy: Data-driven companies are helping improve energy efficiency in commercial buildings, making solar energy and other forms of renewable energy more practical, and allowing homeowners to monitor and reduce their energy consumption.
Finance and Investment: Data from the Securities and Exchange Commission (SEC) has powered investment firms for decades, and some new companies are combining SEC data with other data sources for faster, more accurate, and more usable analysis. Others use different kinds of data to evaluate pension plans, offer comparisons of credit cards and financial services, or help consumers avoid fraudulent charges. And some are helping small- and medium-size enterprises (SMEs) get the capital they need by using Open Data to do due diligence on those SMEs for potential lenders.
Food and Agriculture: Several companies are practicing “precision agriculture”—using Open Data to help farmers adapt to climate change and increase the profitability of their farms. The Climate Corporation, which was sold to Monsanto in the fall of 2013 for about $1 billion, has become a leading example of a successful Open Data company.
Governance: Companies are springing up to help local governments organize and publish data on finances, city operations, and more with easy-to-use software that helps their residents understand city programs and helps city managers assess their success.
Healthcare: The diverse uses of Open Data in healthcare may transform the entire healthcare system. Open Data companies are helping individuals find low-cost, high-quality care and manage their health better with access to their medical records. Beyond these practical applications, data will enable new approaches to diagnosis and treatment, using predictive analytics to determine which kinds of patients will respond best to which medical interventions and under what circumstances. New companies are launching to put all this data to work. Venture capitalists reportedly invested more than $2 billion in digital health startups in the first half of 2014.[iv]
Housing and Real Estate: Real estate websites aggregate large numbers of listings from around the country and do much more as well. They now offer data on schools, neighborhood “walkability,” crime rates, and many other factors that can affect a potential homeowner’s decisions. They can combine federal Open Data with state and local data to provide an in-depth view of cities, neighborhoods, and individual houses.
Transportation: The availability of new, usable transportation data is transforming this sector as well. Different companies provide detailed directions and traffic advisories, traffic analytics to help transportation planners, and safety data to improve the trucking industry.
The Open Data 500 study has found that all these companies use a wide variety of business models, serve diverse kinds of customers, and earn revenue in many different ways. The graphic below describes and gives examples of six revenue models: licensing, advertising, consulting fees, analytics fees, subscription models, and lead generation.
All these business opportunities will expand as the amount and quality of federal, state, and local Open Data improve. The Obama administration established an Open Data Policy in May 2013 to make more government data available and useful, with a special focus on business needs.[v] When he announced the policy at a technology center in Austin, Texas, President Obama said, “Starting today, we’re making even more government data available online, which will help launch even more new startups. And we’re making it easier for people to find the data and use it, so that entrepreneurs can build products and services we haven’t even imagined yet.”[vi]
The Open Data Policy requires government agencies to get feedback on ways to improve from the companies and organizations that use their data. To facilitate that process, the GovLab has begun a series of Open Data Roundtables that bring together staff and officials from federal agencies with their diverse data “customers.” The GovLab released a public report on the first of these roundtables, which was held with the U.S. Department of Commerce in October. The GovLab has also held a roundtable with the USDA and has events planned with the Departments of Energy, Education, Labor, and Transportation, among others. The Commerce Roundtable highlighted seven ways that the Commerce Department, as well as other federal agencies, can improve the data they provide:
- Make it easier to discover and find data, for example, by publishing full inventories of the data that an agency has available;
- Improve access to data by making it easier to download and by providing data in different formats for different users’ needs;
- Improve data quality by making the data more complete, valid, and accurate;
- Collect data more frequently and from more sources, and share the data widely through both government programs and public-private partnerships;
- Make data interoperable, that is, make it easier to combine and compare data from one government agency with data from another;
- Use new strategies to store and disseminate data, including public-private partnerships, to make it more widely available; and
- Treat data users as customers by engaging with them and getting their input and feedback on a regular basis.
All of these improvements will be needed to make open government data as useful as it can be. The ultimate goal is to make all government data open by default—to set it free unless there is a specific privacy, security, or other reason not to. One of the paradoxes of Open Data is that you don’t know exactly how it will be used until it’s released and people start experimenting with it. That makes its uses and its value hard to predict. But for today’s data entrepreneurs, that’s also the beauty of Open Data. It presents a huge range of possibilities to discover, develop, and explore. And you don’t even need a garage.
Joel Gurin is a leading expert on Open Data: accessible public data that can drive new company development, business strategies, scientific innovation, and ventures for the public good. He is the author of the book Open Data Now and senior advisor at the Governance Lab at New York University, where he directs the Open Data 500 project.
[i] Todd Wasserman, “Rare Apple Computer from 1976 to Be Auctioned for About $400,000,” Mashable, October 7, 2014.
[ii] Jenna Dutcher, “What Is Big Data?” datascience@berkeley Blog, September 3, 2014.
[iii] Joel Gurin, Open Data Now (New York: McGraw-Hill, 2014), 9.
[iv] Eric Whitney, “Power to the Health Data Geeks,” NPR Morning Edition, June 16, 2014.
[v] Executive Office of the President, Office of Management and Budget, “Open Data Policy – Managing Information as an Asset” (Memorandum for the Heads of Executive Departments and Agencies), May 9, 2013.
[vi] The White House, Office of the Press Secretary, “Obama Administration Releases Historic Open Data Rules to Enhance Government Efficiency and Fuel Economic Growth,” White House blog, May 9, 2013.