Wonder, Junk and Innovation

April 18, 2014

Thomas A. Edison once opined, “To invent, you need a good imagination and a pile of junk.”

Indeed, there’s value in wading through junk – or through venues historically viewed as all the “wrong places” – to realize breakthrough insights and innovations. But when it comes to innovation in genomics, a “switch” in orientation toward a quote from Socrates might better represent the opportunities: "Wisdom begins in wonder."  

Significant innovations are emerging out of global, big-data-enabled DNA research collaboration. In 2012, the National Human Genome Research Institute announced the results of the Encyclopedia of DNA Elements (ENCODE), a five-year international study of the regulation and organization of the human genome. The goal was to build an exhaustive list of functional elements in the human genome, and to delineate the regulatory elements that control the cells and circumstances in which a gene is active. The analysis was daunting, involving 442 consortium members, 32 research institutes and 1,650 individual experiments. The researchers generated 15 trillion bytes of raw data.

The results refuted the then-prevalent scientific belief that all but a small percentage of our DNA is “junk.” For years before the ENCODE study, the vast stretches of DNA between our approximately 20,000 protein-coding genes (more than 98% of the genetic sequence inside our cells) were written off as junk DNA. It was “Big Data” that helped reveal that the components between the genes were not junk at all.

Importantly, had this widely held belief not been refuted by Project ENCODE and consigned to the history books, the world would be missing out on significant medical and other contributions. Now, future opportunities are wide open to discovery, and the resulting innovations are coming fast. Project ENCODE involved deploying a new people, process and technology framework for massive, collaborative research and for effecting the efficient dissemination of findings to rapidly yield breakthrough innovations.

One key ENCODE finding has truly been a watershed development: those elements of DNA that are referred to as genes are but a very small piece of what makes our cells and our bodies work. In contrast, what had been hidden and became so valuable after all the analysis was the extensive presence of critical “switches” within DNA. The resulting biomedical approaches and technologies owing to this discovery promise vast societal impact. 

The breakthrough was described in a cancerfocus.net forum discussing the practical applications derived from ENCODE. From an applied science perspective, most of the changes to cells that affect cancer don't occur in our genes but within the 40 million different “switches” that control the genes, switching them on and off in complicated and nuanced ways.

Researchers have also linked gene switches to an array of human diseases, such as multiple sclerosis, lupus, rheumatoid arthritis, Crohn’s disease, and celiac disease. New innovation-enabling approaches include catalyzing basic science studies to identify the genetic basis of complex diseases, such as diabetes, cardiovascular disease, and neurological disorders. This will in turn advance other innovations, such as enhanced biopharmaceutical and agricultural production. Indeed, recent genome modifications have been used in applications as diverse as correcting mutations that cause genetic disease, inhibiting HIV infection of human cells, engineering cell lines to produce biopharmaceuticals, and even generating pesticide-resistant crops.


A Collaborative Approach to Mega-Data

Beyond breakthrough findings, ENCODE also innovated the process for R&D itself. ENCODE redefined how people worked together on massive research and applications. Pardis Sabeti, an assistant professor at Harvard University, said, "You need the large projects that really galvanize effort [and] get people working with each other across groups to create these resources that would not be possible in any one lab alone."

A fundamental aspect of ENCODE, for example, was that all data generated has been and will continue to be rapidly released into public databases. This has radically changed the research publishing model. Rather than just publishing papers, ENCODE enables access to collaborative “threads” (like the Nature ENCODE Explorer) and wiki management. ENCODE outputs include a vast analytics portal, 741 Wiki collaborative-content pages, threads and more. Today, ENCODE results can be freely accessed by anyone on the Internet via ENCODE’s portal, or at the University of California, Santa Cruz Genome Browser, the National Center for Biotechnology Information, and the European Bioinformatics Institute.

A recent genomeweb.com article looked at the success of the Human Genome Project overall, including ENCODE’s mega-scale approach. It reviewed a debate in the research community about balancing funding for consortium programs with individual grants. The article noted, “The real advantage of these large-scale programs lies in the databases and other infrastructure they generate. Done properly, these resources are so enabling that they can change what's possible for the rest of the scientific community.”

Beyond simply fostering large-scale efforts that can yield vast impacts, ENCODE also suggests that purpose and passion might even trump prestige, power and profit in understanding what truly drives breakthrough innovation. With groups of scientists worldwide pitching in on ENCODE, there was no guarantee that every scientist would get credit for his or her work. People recognized that even truly great contributions could get lost in the collaborative work producing an outcome. Yet, they persevered, using data for public (rather than personal) good.

Matthew Meyerson, an associate professor at Dana-Farber Cancer Institute and member of The Cancer Genome Atlas, said, "To do this, I think you just have to say, 'I'm going to go ahead and do this because it interests me, and I think the results are going to be important and I'm not going to worry too much about its impact on my career.'"

While the yield from ENCODE is enormous and growing, perhaps the most valuable outcome is the realization that there is a new way of innovating: harnessing very large data sets in a manner that sets the benchmark and lays the foundation for rapid global impact. Given the impacts that we’re already beginning to experience from Project ENCODE, we should establish a new model of innovation management to which we aspire. As with ENCODE, the use of Big Data, generated by substantial collaborations, can create significant impacts. Through this approach, we can drive purposeful and passionate innovation planning, funding and operations.