How Big Data Could Save the Newspaper Industry

June 18, 2014
General Foundation

For four years growing up, I had a job that a lot of young men had: I was a “paperboy” for the local newspaper, in this case The Pittsburgh Press. This was back in the Stone Age, as my kids would call it, before the Internet, when the news actually came to your front door and not to your fingertips via an electronic screen.

My old first job has certainly changed as larger delivery services have replaced the local paperboy/girl routes, but probably no industry has encountered greater change and disruption than the newspaper business. The reason: the Internet.

For decades, delivering the news required huge investments and a broad infrastructure of journalists, printers and distributors. The Internet changed all of that by stripping away the financial barriers to entry, allowing almost anyone to write and publish. Because so much digital content is free, the old business models fell apart, leading to diminishing profits, layoffs of thousands of journalists (and paperboys/girls) and even the shuttering of some of America’s oldest news publications, including my old paper.

As any reporter or news junkie will tell you, news travels fast, and people want to be kept in the loop on world events, sports scores or the birth announcement from the family down the street. As such, newspapers can no longer assume readers will find and select their product from a narrow set of news outlets. There is so much content being produced so quickly that everyone must compete for attention, from the “Gray Lady” in New York to the blogger sitting in his basement.

While there are myriad tactics and strategies for capturing attention online, targeting readers through Big Data is increasingly giving media outlets a way back to profitably compete in a dynamic, digital world.

Newspapers rely primarily on advertising revenue. One consequence of the digital age is that readers increasingly consume their news online, not in a printed paper, so a reader’s eyes no longer see the full-page ad for the big sale on cars or mattresses. Moreover, classified ads went by the wayside with the growth of Craigslist and similar websites. This trend is matched by falling subscriptions, which also take a chunk out of print ad profitability. Since 2005, print ad revenue for the newspaper industry has fallen 58 percent, reports the Wall Street Journal. For the New York Times Co.’s media group, ad revenue dropped from $1.26 billion to just $666.7 million during the last decade.

One way to address falling profits is by increasing (or at least maintaining) subscriptions. The New York Times announced in February that it hired its first chief data scientist to help the newspaper better adapt to the changing times and a hypercompetitive environment. He is leading a team that uses machine learning to predict which readers may unsubscribe, allowing marketing and sales teams to target these readers with messages and offers that may keep them as subscribers. For the industry overall, finding ways to grow subscribership is a big priority.

“Decades ago, nearly any new subscriber would lead to a favorable return and high retention rate,” writes Sean O’Leary, Director of Communication for the Newspaper Association of America. “Those same principles simply do not exist anymore. And while technology has expanded the ways for readers to consume information, it has also provided new methods for newspapers to identify and ‘micro-target’ new subscribers based on a wealth of information.”

The data-based concept of microtargeting has been used successfully in other fields, most famously in politics by the campaigns of Presidents Bush and Obama. For the news industry, it means breaking the overall potential readership into segments that can be reached with targeted content and advertising. For example, figuring out which news stories are most popular (and why) allows newspapers to tailor their content to their readership.

Researchers at the University of Bristol conducted a study on how to statistically identify the kinds of news stories and writing that most appeal to a news outlet’s readership. The researchers used a machine learning technique (Ranking Support Vector Machine) to identify why popular news stories are popular. They paired articles on a large scale, matching one popular article and one non-popular article. The machine learning algorithm then identified the words that frequently occurred in the popular articles. Using this technique, the researchers were able to predict which of the articles would perform better about 70% of the time. 
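To make the pairing idea concrete, here is a minimal sketch of pairwise ranking on word counts. It is not the Bristol researchers' code: it swaps in a simple hinge-loss update in place of a full Ranking SVM solver, and the headlines, function names and parameter values are all made up for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to word counts (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def difference(popular, unpopular):
    """Word-count difference (popular minus unpopular) for one article pair."""
    diff = Counter(bag_of_words(popular))
    diff.subtract(bag_of_words(unpopular))
    return {word: count for word, count in diff.items() if count != 0}


def train_pairwise(pairs, epochs=20, lr=0.1, reg=0.01):
    """Learn word weights so popular articles outscore their paired ones.

    Uses a hinge loss on each pair: update only when the popular article
    does not beat the unpopular one by a margin of at least 1.
    """
    weights = {}
    for _ in range(epochs):
        for popular, unpopular in pairs:
            diff = difference(popular, unpopular)
            margin = sum(weights.get(w, 0.0) * c for w, c in diff.items())
            # Mild L2 shrinkage keeps the weights from growing without bound.
            for word in weights:
                weights[word] *= 1.0 - lr * reg
            if margin < 1.0:
                for word, count in diff.items():
                    weights[word] = weights.get(word, 0.0) + lr * count
    return weights


def score(weights, text):
    """Appeal score of a text: weighted sum of its word counts."""
    return sum(weights.get(w, 0.0) * c for w, c in bag_of_words(text).items())


def predict_more_popular(weights, a, b):
    """Return whichever of the two texts the model scores higher."""
    return a if score(weights, a) >= score(weights, b) else b


# Invented (popular, unpopular) headline pairs, purely for illustration.
toy_pairs = [
    ("royal wedding photos revealed", "council budget meeting minutes"),
    ("celebrity scandal photos leaked", "quarterly budget report released"),
    ("shocking celebrity wedding video", "zoning council meeting schedule"),
]

weights = train_pairwise(toy_pairs)
print(predict_more_popular(weights, "celebrity photos surprise", "budget council update"))
```

Words that show up mostly on the popular side of the pairs end up with positive weights, and words from the other side end up negative, which is roughly how such a model can surface the vocabulary readers respond to. A real system would need far more pairs and a held-out test set before any accuracy figure meant anything.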

These kinds of data-based predictions could be used to guide the kinds and style of content a news outlet produces, better targeting its readers’ interests (and increasing its advertising and subscription revenue as a consequence). What is more, newspapers can monitor how an article performs in real time, allowing editors and publishers to continually adjust content and marketing tactics to capture more readers and ad dollars. Data flows can also be broken down along subject matter lines, which would help newspapers better target silos of readers, tailoring advertising, marketing and content at a more granular level.

Print news aside, there are a host of online tools that can leverage reader information in massive databases to put the right content in front of the right person at the right time. Take Outbrain, a “content discovery platform.” It is a paid service that companies use to promote their content. We’ve all seen news stories that conclude with a list of other articles or media. While some website widgets simply list related content, Outbrain and other tools like it use behavioral targeting, drawing on what the Internet user has read or bought in the past to determine the kinds of content that may be relevant to the reader.

Behavioral targeting tools help cultivate readership, but challenges remain, in part because of the kind of data available. For example, imagine a household computer where a child reads about a new Disney movie (and in doing so, allows a “cookie” to be stored in the browser). Then the child’s mother takes to the Web looking for news about the U.S. economy. Data tools cannot tell the difference between the child and the mother, because they are limited to analyzing the sites that have been visited, not the person who visited them. Thus, the child may be shown content about the economy, or the mother may be shown ads for cartoon products. This can lead to wasted advertising.
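The shared-computer problem is easy to see in a small sketch. The class below is a hypothetical stand-in for a behavioral targeting tool, not any real vendor's API: it keys everything on a cookie identifier, so it has no way of knowing who is actually at the keyboard. The cookie name and topics are invented.

```python
from collections import Counter, defaultdict


class CookieProfiles:
    """Toy interest tracker keyed by browser cookie, not by person."""

    def __init__(self):
        # cookie_id -> how many times each topic was visited
        self._topics = defaultdict(Counter)

    def record_visit(self, cookie_id, topic):
        """Log one page view for the browser that holds this cookie."""
        self._topics[cookie_id][topic] += 1

    def recommend_topic(self, cookie_id):
        """Return the most-visited topic for the cookie, or None if unseen."""
        counts = self._topics.get(cookie_id)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


profiles = CookieProfiles()

# The child reads about a movie three times on the family computer.
for _ in range(3):
    profiles.record_visit("family-pc", "disney movies")

# The mother then looks up economic news once, in the same browser.
profiles.record_visit("family-pc", "us economy")

# Whoever sits down next gets the same suggestion.
print(profiles.recommend_topic("family-pc"))
```

Both people's visits land in one profile, so the mother's next session is steered toward the child's interests. Telling them apart would take a signal this sketch does not have, such as separate logins.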

As organizations improve their ability to target and acquire readers, we see yet again the overwhelming good that comes from data. Readers are receiving more relevant content at a faster pace and, often, at a better price. Meanwhile, even as there are significant growing pains for the industry, data is helping newspapers begin to recover from the Internet’s massive disruption.

 

Now if we could just help the paperboys/girls out….