Genealogy and Big Data: How Are They Connected?

September 4, 2014

Many companies doing business on the Internet collect data from customers. They use that data to suggest things to buy based on your viewing or purchase history, or to predict what word you will type next on your phone. While most data collection is considered useful only to the company, it can also be useful to the consumer.

Big Data is defined as “datasets that are too large and complex to manipulate or interrogate with standard methods or tools.” What does that mean to me as my family’s historian, especially as I try to uncover our roots in America? Several years ago, it would have meant spending hours poring over hundreds of years of paper documents, going to courthouses to look at birth, marriage, and death records, and visiting cemeteries to find one particular person. Today, it means going to websites like or , typing in a name with a birth date, location, and death date, clicking search records, and up come birth and death certificates, naturalization records, and census records dating back to 1790, when the first U.S. Federal Census took place. With the Internet, discovering information about your family takes little more than a few words, a few numbers, and a click of the search button.

As you research your family tree on various websites, hints will appear as the sites search their record databases for entries that may pertain to a person. These hints are not always correct, but when they are, you discover another piece of data about someone in your family, whether it is the town they lived in during a census or the branch of the military they served in.

Family history data need not come only from the Internet. It includes sources found offline as well:

  • Family Bibles or other religious books might have birth, marriage, and death dates recorded in them;
  • Cemetery records can provide names as well as death and burial dates;
  • Military headstones can show which conflict a person served in, their military branch and rank, or even the name of the unit they served in; and
  • Photographs can also show where a person lived and, based on what they were wearing, the time period in which they lived.

Sometimes the best approach is to go back to the way genealogical research was done before the age of the Internet. Data can’t replace the human factor, and by examining records ourselves, we can find something we would have missed if we had only looked at a computer screen.

DNA testing has become the newest trend in genealogy research, and by taking a test, you can find all sorts of data about yourself that you otherwise might not have learned. One of the many DNA testing kits available is from In this test:

  • A saliva sample is sent in to the company to be analyzed;
  • SNPs (single-nucleotide polymorphisms, often pronounced "snips") from the user’s genome are “then sent to for genetic matching against their genetic database.” (Baker 2014); and
  • An algorithm is run to estimate the ethnicity of the DNA, which is then matched against the existing DNA database the company has on file.

Through this test, users can discover their ethnicity and connect with possible distant cousins who share family members from one or two generations back.

In the words of Dr. Ken Chahine, Senior Vice President and General Manager for DNA, Big data allows companies like to compare 700,000 DNA letters for a single individual against the 700,000 DNA letters of several hundred thousand other test takers to find genetic cousins. That’s a lot of computational power, and the problem grows exponentially. To make all of this possible, big data and statistical analytics tools, such as Hadoop and HBase, are used to reduce the time associated with processing DNA results.
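To get a feel for the scale described above, here is a rough back-of-the-envelope sketch in Python. It is not the method any testing company actually uses. It assumes a naive approach in which every test taker is compared against every other one, letter by letter, and the function names are made up for illustration. Under that assumption, the number of pairs grows roughly with the square of the number of test takers, which is why distributed tools become necessary.

```python
from math import comb

# Figure taken from the article: DNA letters compared per individual.
MARKERS_PER_PERSON = 700_000


def pairwise_comparisons(num_people: int) -> int:
    """Number of distinct pairs among num_people test takers (naive all-pairs model)."""
    return comb(num_people, 2)


def letter_comparisons(num_people: int, markers: int = MARKERS_PER_PERSON) -> int:
    """Total letter-by-letter comparisons if every pair is compared at every marker."""
    return pairwise_comparisons(num_people) * markers


def shared_fraction(a: str, b: str) -> float:
    """Toy similarity score: fraction of positions where two equal-length strings agree."""
    if len(a) != len(b):
        raise ValueError("marker strings must have the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Show how quickly the naive workload grows as the database grows.
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7,} people -> {letter_comparisons(n):,} letter comparisons")
```

Running the script prints the total number of letter comparisons for databases of different sizes. The toy shared_fraction function only counts matching positions; real genetic matching is considerably more sophisticated.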

Big Data has made big connections for both businesses and consumers. These connections can do many things for a person: reveal their search habits, recommend a restaurant, and map out their family tree.

Alexandra Cooper is a junior at Saint John Paul the Great High School in Dumfries, VA, and lives in Alexandria, VA, with her genealogically well-researched family.

