For the most part, traditional news outlets lead and blogs follow, typically by 2.5 hours, according to a new computer analysis of news articles and commentary on the Web during the last three months of last year’s US presidential campaign.
The finding was one of several in a study that Internet experts say is the first time the Web has been used to track — and try to measure — the news cycle, the process by which information becomes news, competes for attention and fades.
Researchers at Cornell, using powerful computers and clever algorithms, studied the news cycle by looking for repeated phrases and tracking their appearances on 1.6 million mainstream media sites and blogs. Some 90 million articles and blog posts, which appeared from August through October, were scrutinized with the researchers’ phrase-finding software.
Frequently repeated short phrases, according to the researchers, are the equivalent of “genetic signatures” for ideas, or memes, and story lines. The biggest text-snippet surge in the study was generated by “lipstick on a pig.”
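As a rough illustration of the phrase-tracking idea described above, the Python sketch below counts how often a short phrase appears per hour in each kind of source, then compares the hour at which it peaks in mainstream media with the hour at which it peaks in blogs. This is not the Cornell team's software, whose details are not given here. The function names, the fixed four-word phrase length, the `"media"` and `"blog"` labels and the input format are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def ngrams(text, n=4):
    """Return every run of n consecutive words in text, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def track_phrases(docs, n=4, min_docs=2):
    """Build an hourly timeline for each repeated phrase.

    docs is an iterable of (hour, source_type, text) tuples, a
    hypothetical input format. The result maps each phrase seen in at
    least min_docs documents to {source_type: {hour: document count}}.
    """
    timeline = defaultdict(lambda: defaultdict(Counter))
    doc_count = Counter()
    for hour, source_type, text in docs:
        # Use a set so a phrase counts once per document.
        for phrase in set(ngrams(text, n)):
            timeline[phrase][source_type][hour] += 1
            doc_count[phrase] += 1
    return {
        phrase: {src: dict(counts) for src, counts in by_source.items()}
        for phrase, by_source in timeline.items()
        if doc_count[phrase] >= min_docs
    }


def peak_lag(by_source, lead="media", follow="blog"):
    """Hours between the lead source's peak and the follower's peak.

    Returns None if the phrase never appeared in one of the two sources.
    Ties between hours are broken in favor of the earlier hour.
    """
    if lead not in by_source or follow not in by_source:
        return None

    def peak(counts):
        return min(counts, key=lambda hour: (-counts[hour], hour))

    return peak(by_source[follow]) - peak(by_source[lead])
```

Given a list of timestamped articles and posts, `track_phrases(docs)` returns the timelines and `peak_lag(timelines[phrase])` gives the lag for one phrase; a positive value means the blogs peaked later than the mainstream outlets. A study at Web scale would also need to group near-identical variants of a phrase, which this sketch does not attempt.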
The phrase originated in US President Barack Obama’s colorful put-down of the claim by Senator John McCain and Alaskan Governor Sarah Palin that they were the genuine voices for change in the campaign. Associates of McCain suggested that the remark was meant as an insult to Palin.
The researchers’ data points to an evolving model of news media. While most news flowed from traditional media to blogs, the study found that 3.5 percent of story lines originated in blogs and later made their way to traditional media. For example, when Obama said that the question of when life begins after conception was “above my pay grade,” the remark was first reported extensively in blogs.
And though the blogosphere as a whole lags behind, a relative handful of blog sites are the quickest to pick up on things that later gain wide attention on the Web, led by Hot Air and Talking Points Memo.
The Cornell research, like so much of the data mining on the Web, does raise the issue of whether something is necessarily significant just because it can be measured by a computer — especially when mouse clicks are assumed to represent broad patterns of human behavior.
“You can see this kind of research as further elevating the role of sound bites,” said Jon Kleinberg, a professor of computer science at Cornell and a co-author of a paper on the research that was presented two weeks ago at a conference in Paris. “But what we’re doing is more using them as the approximation for ideas and story lines.”
“We don’t view quotes as the most important object, but algorithms can capture quotes,” Kleinberg said. “And we see this research as using a rich data set as a step toward understanding why certain points of view and story lines win out, and others don’t.”
The paper, “Meme-tracking and the Dynamics of the News Cycle,” was also written by Jure Leskovec, a postgraduate researcher at Cornell, who this summer will become an assistant professor at Stanford, and Lars Backstrom, a doctoral student at Cornell, who is joining Facebook. The team has set up interactive displays of their findings at memetracker.org.
Social scientists and media analysts have long examined news cycles, though focusing mainly on case studies instead of working with large Web data sets. And computer scientists have developed tools for clustering and tracking articles and blog posts, typically by subject or political leaning.
But the Cornell research, experts say, goes further in trying to track the phenomenon of news ideas rising and falling.
“This is a landmark piece of work on the flow of news through the world,” said Eric Horvitz, a researcher at Microsoft and president of the Association for the Advancement of Artificial Intelligence. “And the study shows how Web-scale analytics can serve as powerful sociological laboratories.”
Sreenath Sreenivasan, a professor specializing in new media at the Columbia University Graduate School of Journalism, said the research was an ambitious effort to measure a social phenomenon that is not easily quantified.
“To the extent this kind of approach could open the door to a new understanding of the news cycle, that is very interesting,” he said.
A challenge in this kind of research, Sreenivasan said, will be to account for and model how quickly online news sources and distribution networks are changing. Sreenivasan pointed to social media, especially the rapidly rising Twitter, as an informal but highly influential news recommendation and distribution network.
“Even from last fall to today, the dynamics of the news cycle are very different because of Twitter,” he said.