Big Data in the Time of Cholera
On a summer morning in 1854, cholera came to London. Within two days, panic set in as the Vibrio cholerae bacterium consumed its victims, taking over half to the grave. Six hundred and sixteen soon lay dead, most felled within the first week. Central London became a ghost town as residents frantically poured out through sewage and fetid air to outrun death. And then, a miracle happened. Barely a week after the epidemic started, Dr. John Snow entered St. James parish and presented before its board a carefully plotted map showing each of the first 83 cholera-related deaths. The result was electric. The tight cluster of dots made it immediately clear that nearly all of the deaths were centered on a single water pump. The very next morning, the pump’s handle was broken, and so was the epidemic.
Dr. Snow did much more than save countless lives that day. He showed the world’s cities how vital it is for public health that drinking water and wastewater be housed in separate systems. Snow also provided one of the earliest examples of Big Data at work. His example offers a number of surprising and important lessons for our data-driven future.
As Alon Peled finds in a new paper, “Snow possessed a Big Data state of mind.” He began his work on cholera long before the outbreak hit. What leads to cholera, Snow asked, and what might be correlated with its rise? He was skeptical of the popular notion that cholera spread through a “miasma” of bad air, and said so in a pamphlet published nearly five years to the month before the 1854 outbreak. With this mission established, he set out to systematically gather all of the open data on cholera’s incidence in London from the General Register Office. After finding the mortality data, he linked it to the location of death in order to get what Peled calls “two-dimensional data” that Snow could then correlate and visualize. It was a simple matter then to plot a basic map that allowed the data to tell the story of cholera’s spread.
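The linking step described above, joining each mortality record to a location and then looking for a cluster, can be sketched in a few lines of Python. This is only an illustration: the records, street names, coordinates, and pump positions below are invented for the example and are not Snow's data, and assigning each death to the nearest pump by straight-line distance is a simplifying assumption rather than a reconstruction of his method.

```python
from collections import Counter
from math import hypot

# Hypothetical mortality records: (record id, address). Not Snow's data.
deaths = [
    (1, "A Street"),
    (2, "A Street"),
    (3, "B Lane"),
    (4, "C Row"),
    (5, "B Lane"),
]

# Hypothetical address coordinates on an arbitrary grid.
locations = {
    "A Street": (1.0, 1.0),
    "B Lane": (1.5, 0.5),
    "C Row": (8.0, 9.0),
}

# Hypothetical pump positions on the same grid.
pumps = {
    "Pump 1": (1.2, 0.8),
    "Pump 2": (8.5, 8.5),
}


def nearest_pump(point, pumps):
    """Return the name of the pump closest to point (straight-line distance)."""
    x, y = point
    return min(pumps, key=lambda name: hypot(pumps[name][0] - x, pumps[name][1] - y))


def deaths_per_pump(deaths, locations, pumps):
    """Join each death to its coordinates, then count deaths by nearest pump."""
    counts = Counter()
    for _record_id, address in deaths:
        counts[nearest_pump(locations[address], pumps)] += 1
    return counts


counts = deaths_per_pump(deaths, locations, pumps)
for name, total in counts.most_common():
    print(name, total)
```

With these made-up inputs, four of the five deaths fall nearest to "Pump 1", which is the kind of lopsided tally that a dot map makes visible at a glance.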
Once the cholera epidemic hit and he realized the value of his preparatory work, Snow immediately set out to deepen his data set by visiting the houses of those first 83 who died. This was not an easy task. Some of the victims came from beyond the confines of London, which at the time of Snow’s research was still in turmoil. Nevertheless, he traveled to relatives’ houses to personally interview them on the specific activities of the deceased in the days leading up to their deaths. As these first-hand accounts came in, Snow situated these findings within his existing work and began to find that they aligned.
There is something worth applauding too about how Snow went about his work. Peled points out that “his sense of purpose, ingenuity, clever data design, collaboration, humility, and humanity are the kind of qualities that must characterize Big Data scientists” [emphasis added]. He took pains to humbly share credit where it was due. He also thanked the family members of those who had died. He saw not just dots on a map, after all, but real people whose deaths would ultimately save far more lives than Snow could have imagined.
Here is where we get to the 7 key lessons that Dr. John Snow and his cholera map can teach us today:
1. “Treat data subjects as individuals” – Snow was consistent and rigorous in his data collection in order to account for human complexity. That focus also led him to be notably humane in his approach. As Peled notes, this is why “he is considered the father of modern epidemiology.”
2. “Completeness is more important than ‘bigness’” – Snow recognized that he faced limits in how much data he could gather on the epidemic’s victims. He chose to focus on a representative sample of individual cases, and to then gather as much information as possible on them. Snow continually kept in mind his mission of finding the cause of cholera’s spread (even if he never understood what bacteria were), as completeness requires focus and a clear set of goals.
3. Collaborate across disciplines – Snow “invited other scholars and laymen to contribute insights and data to his project,” Peled finds, and he recognized that they had the potential to see things that he didn’t. Moreover, he needed the protection and support of the gatekeepers in his pool of study: in this case, the church. Snow collaborated with the St. James parish leadership throughout the process of collecting data and visiting families, and then in acting on his findings to cut off access to the infected water.
4. Be relentless – Snow continued collecting data long after the outbreak subsided. He went on to show that cholera was up to ten times more likely around the water pumps owned by a particular company that drew its water from the most polluted sections of the River Thames. The heavy amounts of fecal matter contained in the sewage, along with the state of the victims’ digestive systems, aligned precisely with Snow’s initial argument that cholera spread through contaminated drinking water. These findings would go on to form the basis for most public health work in the coming decades.
5. There will be resistance, but forge on – Snow’s findings did not align with the accepted view of the medical community, that cholera spread through miasma, and he received tremendous pushback as a result. The editor of the well-respected medical journal The Lancet asked at the time, “Has he any facts to show in proof? No!” and declared that he had “fallen down through a gully-hole.” The Board of Health even commissioned a study to debunk Snow’s theories of water-borne contagion. Moreover, Snow’s work waded directly into a nexus of political and economic interests that rapidly organized against him. Water delivery workers lobbied against his call to shut down particular water sources, and commentators wrote that his views on contagion were a threat to the “natural” ordering of society and the economy.
6. Data visualization enables action (and can yield surprises) – Snow’s dot maps were a profoundly creative use of his data, all the more so considering the reclusive doctor behind them. He never thought of his graphics as significant research tools, but he did see in them a useful way to help decision makers understand and talk about his work. Peled points out that “his 1854 map was merely the ‘marketing vehicle’ that he developed to promote this Big Data project.” Florence Nightingale similarly plotted out the causes of death among soldiers in the Crimean War in order to successfully argue for more sanitary conditions for soldiers. Still, when Snow plotted out the spots where Londoners were dying from cholera, he found a few significant outliers that supported his initial hypothesis. A brewery and workhouse near the infected water pump were surrounded by little dots on Snow’s map, yet somehow they escaped the epidemic. Upon visiting these locations in person, Snow found the one thing that made these places unique: they enjoyed an independent water source.
7. “Information is power and power corrupts” – Snow understood that his theories on the spread of cholera had to be consistently questioned and that his dataset would never be fully complete. He therefore looked for ways to verify his findings, especially using outside sources. Yet attempts at falsifying his theories and findings did not sway him from continuing to operate under a central mission focused on a single idea.
Dr. John Snow is more than the father of epidemiology. He is the forebear of Big Data. His life is a lesson for all who walk in his footsteps.
Note: The Guardian has an interactive version of Dr. Snow’s groundbreaking map.