Reading List: Big Data and the Sciences

August 4, 2014
General Foundation

The U.S. Chamber of Commerce Foundation has compiled a reading list for those interested in topics related to Big Data and data-driven innovation. This list includes articles from newspapers, magazines, websites, and academic journals. Many of the more notable articles are annotated.

The reading list is divided into 13 sections. (Read the full list here.)

 

The section below includes items offering an overview of Big Data and the sciences. To add to the list, email tlemke@uschamber.com.

Big Data and the Sciences

Aiden, E. and Michel, J-P. (2013) Uncharted: Big Data as a Lens on Human Culture, New York: Riverhead Books.

Akil, H. et al. (2011) “Challenges and Opportunities in Mining Neuroscience Data,” Science, 331, pp. 708-712, February.

The authors discuss how and why neuroscience requires the acquisition and integration of vast amounts of data of many types, arguing for a neuroinformatics approach to the study of the brain. The opportunities and challenges of data mining across multiple tiers of neuroscience information are discussed, and a case is made that cultural and infrastructural adaptation is necessary to profit from this approach.

Anderson, C. (2008) “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” Wired Magazine 16.07, June.

Wired’s Chris Anderson argues that Big Data renders models and, correspondingly, the scientific method, superfluous. In particular, large data sets (coupled with our ability to parse and analyze them) allow us to find novel correlations between otherwise diverse data sets; this, accordingly, allows us to dispense with causal or semantic questions in favor of dependable relations of correlation and prediction. Some examples of this are considered, such as J. Craig Venter’s biological research into bacteria.

Blair, A. M. (2011) Too Much to Know: Managing Scholarly Information before the Modern Age, New Haven: Yale University Press.

Curry, A. (2011) “Rescue of Old Data Offers Lesson for Particle Physics,” Science, 331, pp. 694-95, February.

Frankel, F. and Reid, R. (2008) “Big Data: Distilling Meaning from Data,” Nature, 455(7209), p. 30.

King, G. (2011) “Ensuring the Data-Rich Future of the Social Sciences,” Science, 331(6018).

Kitching, T. D., Rhodes, J., and Heymans, C. (2012) “Image Analysis for Cosmology: Shape Measurement Challenge Review & Results from the Mapping Dark Matter Challenge,” April.

Koonin, S. E., Dobler, G., and Wurtele, J. S. (2014) “Urban Physics,” American Physical Society News, March.

Lemonick, M. D. (2008) “Watching the Skies: Space is Really Big – But Not Too Big to Map,” Wired Magazine, June.

Einav, L. and Levin, J. (2013) “The Data Revolution and Economic Analysis,” Working Paper, No. 19035, National Bureau of Economic Research.

This article discusses the ways in which big data will transform economic policy and economic research, focusing on how large-scale datasets can improve the measurement and monitoring of economic activity and the potential benefits of predictive modeling techniques. The challenges to accessing the relevant data are also discussed.

McCulloch, E. S. (2013) “Harnessing the Power of Big Data in Biological Research,” American Institute of Biological Sciences, Washington Watch, September.

Norvig, P. (2009) “All We Want Are the Facts Ma’am,” Norvig.com

Nuzzo, R. (2014) “Statistical Errors: P values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume,” Nature, Vol. 506, 13 February.

This piece argues that the p-value, the traditional measure of statistical validity, is routinely put to uses for which it was never intended. Researchers must therefore attend not only to the statistical significance of the phenomenon studied (as indicated by the p-value), but also to the plausibility and general significance of the hypothesis on which the research rests. Partial solutions are considered, from Bayesian analyses to a more pluralistic methodology, with particular emphasis on methodological transparency and broader scientific discussion rather than forcing the numbers to do all the talking. These issues are relevant to big data insofar as the human element of sound judgment is shown to be indispensable for contextualizing, framing, and evaluating data.

Overpeck, J. T. et al. (2011) “Climate Data Challenges in the 21st Century,” Science, 331, pp. 700-702, February.

Saleem, M. et al. (2013) “Fostering Serendipity through Big Linked Data,” Semantic Web Challenge.

Siegfried, T. (2013) “Why Big Data is Bad for Science,” Magazine of the Society for Science and the Public, November.

Simon, H. A. (2002) “Science Seeks Parsimony, Not Simplicity: Searching for Pattern in Phenomena,” in Zellner, A., Keuzenkamp, H. A. and McAleer, M. (eds.), Simplicity, Inference and Modelling: Keeping It Sophisticatedly Simple (pp. 32–72). Cambridge UP.

Von Baeyer, H. C. (2005) Information: The New Language of Science. Harvard UP.

Vardi, M. (2012) “The Consequences of Machine Intelligence,” The Atlantic, October.

The author argues that technology has been destroying jobs since the beginning of the Industrial Revolution. However, unlike previous technological advances, the Artificial Intelligence revolution is not continually creating new jobs. Instead, as smart machines compete not with “human brawn” but with the “human brain,” fewer jobs will remain. By 2045, it is suggested, technological progress will be such that little employment will be left for humans.

Weidman, S. and Arrison, T. (2010) Steps toward Large Scale Data Integration in the Sciences, National Research Council.

This is a summary of a workshop held to outline best practices for large-scale data integration in the sciences. The report discusses case studies in various scientific fields, from astronomy to biology, as well as the relevant technologies.