Did Data Predict the Ebola Outbreak?

To see how corporations are responding to the Ebola virus, check the Corporate Aid Tracker monitored by the U.S. Chamber of Commerce Foundation's Corporate Citizenship Center.


In recent weeks, the deadly hemorrhagic virus Ebola has ravaged large swaths of West Africa, infecting thousands of people, including many caregivers. The advance of this disease is as tragic as it was predictable. The strength and spread of deadly zoonotic viruses closely track how people interact with nature and how modern medicines are used.

Half of the answer to the threat of pandemics is an arms race of sorts for the latest drugs to counter the newest mutations. The other half is knowing when and where a pandemic may strike well in advance, which is where Big Data comes in. Traditional epidemiological surveillance networks are a crucial part of the Centers for Disease Control and Prevention’s (CDC’s) response to pandemics, but by combining them with better real-time data, we can hope to predict and counter horrific diseases like Ebola before they do more damage.

I recently spoke with Qing Wu, a senior economist at Google and a panelist at our upcoming data-driven innovation event, about his company’s work on public health. Google handles more than 7 billion queries a day, and that data pours in from across the world in timely, granular form. Data from different parts of the world used to vary dramatically in quality and relevance, but those concerns have faded as the number of mobile users worldwide has grown, reaching 4.5 billion this year. Some selection bias remains in who has an internet connection and where, but those biases have also become easier to correct. As Qing has noted before, Google is fast becoming a “barometer of the world.”

That’s why Google can be remarkably useful for anticipating and managing infectious diseases, particularly in resource-poor environments. Knowledge and time are what public health officials need most, yet both are often in short supply when a disease first starts to spread. Official data suffer from a time lag, and rumors often run rampant just when a disease is easiest to contain. Google can look at what users are searching for in a moment of crisis and see what correlations emerge. With enough analysis, and by combining multiple datasets with triangulated social media data, Google can provide the right information for the right area at the right time.

Google Flu Trends is the best-known example of the company’s efforts in digital disease detection. Flu Trends is a statistical model that aims to “now-cast” flu hot spots each day. Its algorithms crunch numbers on more than 40 search terms that Google has found to be correlated with flu prevalence. When Flu Trends first appeared in 2008, it was a revelation. The chief of the epidemiology and prevention branch in the CDC's influenza division publicly said that he was “excited about the future of using different technologies, including technology like this, in trying to figure out if there's better ways to do surveillance for outbreaks of influenza or any other diseases in the United States.”
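
To make the idea of “now-casting” a little more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that flu-related searches are collapsed into a single daily query share and related to flu prevalence with a one-variable regression on the log-odds scale. This is not Google’s actual Flu Trends model; the function names and the synthetic history are hypothetical. The idea is to learn a historical relationship between search activity and official flu rates, then apply it to today’s searches before the official numbers arrive.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_nowcast(query_share, flu_rate):
    """Ordinary least squares fit of logit(flu_rate) on logit(query_share).

    Hypothetical helper: returns (intercept, slope) on the log-odds scale.
    """
    xs = [logit(q) for q in query_share]
    ys = [logit(p) for p in flu_rate]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def nowcast(intercept, slope, todays_query_share):
    """Estimate today's flu rate from today's (hypothetical) query share."""
    return inv_logit(intercept + slope * logit(todays_query_share))


# Synthetic history generated from a known relationship (intercept -1.0,
# slope 0.8) only to exercise the code; these are NOT real Google or CDC figures.
history_q = [0.001, 0.002, 0.004, 0.008, 0.016]
history_p = [inv_logit(-1.0 + 0.8 * logit(q)) for q in history_q]

b0, b1 = fit_nowcast(history_q, history_p)
print(round(b0, 3), round(b1, 3))  # noise-free data, so the fit recovers the known values
print(f"estimated flu rate today: {nowcast(b0, b1, 0.005):.4f}")
```

Because a fit like this is only a correlation learned from past seasons, it can drift when search behavior changes, which is why such models need regular recalibration.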


Recently, Google Flu Trends has been criticized for overestimating flu incidence compared with the CDC’s lagging data. The researchers who found the disconnect did so to point to ways Google could keep improving Flu Trends, as it does every year. In other words, they endorsed Google’s aim of regularly recalibrating its model. While commentators have used these errors to proclaim Big Data’s demise, I see them as evidence of its success. This is how the Big Data era is supposed to work, after all: constantly learning and adapting, and correcting course after the inevitable missteps along the way. And the same researchers who criticized Google’s work also found that, combined with the CDC’s official statistics, Flu Trends offered better monitoring results than the CDC’s data alone.

Now, with the alarming spread of Ebola, Google’s pathbreaking Flu Trends approach is finding new life. HealthMap, a data analytics project started in Boston in 2006, has tracked Ebola’s rise with remarkable precision. Researchers at Harvard Medical School and Boston Children’s Hospital combine data from Google News, Twitter, GeoSentinel, and the World Health Organization to map Ebola outbreaks across the world. HealthMap was one of the first groups to detect Ebola’s spread and, in the weeks that followed, even beat the World Health Organization’s tracking efforts by two weeks.


You get the sense that we’re just in the early stages of seeing what better data gathering and analysis can do for spotting, managing, and preventing deadly diseases. Google Flu Trends was the tip of the iceberg, and HealthMap shows how much further the approach can go.

When I spoke with Qing later, he said that applying Google’s know-how to public and personal health is what excites him most. The potential is enormous, as is the need.

