Twitter ‘can be used to predict flu outbreaks’

Washington: A computer science expert at Southeastern Louisiana University has found that monitoring a social network such as Twitter could make tracking disease trends, such as influenza outbreaks, far quicker and cheaper than traditional methods of disease surveillance.

A process called syndromic surveillance uses health-related data to alert health officials to the likelihood of a disease outbreak, typically of influenza or another contagious disease. The traditional technique involves collecting data from hospitals, clinics and other sources, a labour-intensive and time-consuming approach. By monitoring a social network such as Twitter, researchers can instead capture status messages in which people mention having the flu.

“A micro-blogging service such as Twitter is a promising new data source for Internet-based surveillance because of the volume of messages, their frequency and public availability,” said Aron Culotta, assistant professor of computer science. “This approach is much cheaper and faster than having thousands of hospitals and health care providers fill out forms each week.”

“The Centers for Disease Control produces weekly estimates,” he added, “but those reports typically lag a week or two behind. This approach produces estimates daily.”

Culotta and two student assistants analysed more than 500 million Twitter messages over the eight-month period of August 2009 to May 2010, collected using Twitter’s application programming interface (API). By using a small number of keywords to track rates of influenza-related messages on Twitter, the team was able to forecast future influenza rates.
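
The keyword-tracking step can be sketched as counting, for each day, the fraction of messages that contain at least one flu-related word. This is a minimal illustration, not the team's code: the keyword list, the function names and the sample messages are all hypothetical, since the article does not say which keywords were used.

```python
import re
from collections import defaultdict

# Hypothetical keyword list, for illustration only; the study's actual
# keywords are not given in this article.
FLU_KEYWORDS = {"flu", "influenza", "fever", "cough"}


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def mentions_flu(text, keywords=FLU_KEYWORDS):
    """Return True if any word in the message is one of the keywords."""
    return any(token in keywords for token in tokenize(text))


def daily_flu_rates(messages, keywords=FLU_KEYWORDS):
    """Map each date to the fraction of that day's messages that mention a keyword.

    `messages` is an iterable of (date, text) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for date, text in messages:
        totals[date] += 1
        if mentions_flu(text, keywords):
            hits[date] += 1
    return {date: hits[date] / totals[date] for date in totals}


# Made-up messages, used only to exercise the functions.
sample = [
    ("2009-10-01", "Home sick with the flu again"),
    ("2009-10-01", "Great game last night"),
    ("2009-10-01", "Lunch time"),
    ("2009-10-01", "Raining again"),
    ("2009-10-02", "Fever and cough, staying in bed"),
    ("2009-10-02", "Got my flu shot today"),
    ("2009-10-02", "Coffee first"),
    ("2009-10-02", "Traffic is terrible"),
]
print(daily_flu_rates(sample))  # {'2009-10-01': 0.25, '2009-10-02': 0.5}
```

Plain keyword matching also counts messages such as "Got my flu shot today" from people who are not ill, which is one reason raw keyword counts are noisy.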

“Once the program is running, it’s actually neither time consuming nor expensive,” he said. “It’s entirely automated because we’re running software that samples each day’s messages, analyses them and produces an estimate of the current proportion of people with the flu.”
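
The article does not describe how a day's keyword rate is turned into an estimate of the proportion of people with the flu. One form used in published query-based flu tracking, including Google's Flu Trends model, is a linear regression on log-odds: logit(rate) = slope × logit(keyword fraction) + intercept, fitted against historical official figures. The sketch below assumes that form purely for illustration; the function names, the 10,000-message daily sample size and the synthetic history are hypothetical, and it is not presented as the study's own model.

```python
import math
import random


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a proportion."""
    return 1 / (1 + math.exp(-x))


def fit_log_odds_model(twitter_fracs, official_rates):
    """Least-squares fit of logit(rate) = slope * logit(frac) + intercept."""
    xs = [logit(f) for f in twitter_fracs]
    ys = [logit(r) for r in official_rates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_today(messages, is_flu_related, model, sample_size=10_000, seed=None):
    """Sample one day's messages, measure the flu-related fraction, and map it
    through the fitted model to an estimated proportion of people with flu."""
    rng = random.Random(seed)
    sample = rng.sample(messages, min(sample_size, len(messages)))
    frac = sum(1 for m in sample if is_flu_related(m)) / len(sample)
    frac = min(max(frac, 1e-6), 1 - 1e-6)  # keep the log-odds finite
    slope, intercept = model
    return inv_logit(slope * logit(frac) + intercept)


# Synthetic history generated from slope 1.5 and intercept 4.0 (made up).
past_fracs = [0.001, 0.002, 0.003, 0.004]
past_rates = [inv_logit(1.5 * logit(f) + 4.0) for f in past_fracs]
model = fit_log_odds_model(past_fracs, past_rates)

# A made-up day of 1,000 messages, 4 of which mention the flu.
today = ["got the flu"] * 4 + ["nice weather"] * 996
estimate = estimate_today(today, lambda m: "flu" in m.split(), model, seed=0)
print(model, estimate)
```

Working in log-odds keeps the estimate between 0 and 1 however extreme the keyword fraction, which is one reason this form is popular for proportions.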

Southeastern’s group obtained a correlation of 0.95 with the national health statistics collected by the CDC. The results were also comparable to figures from Google’s Flu Trends service, which tracks influenza rates by analysing trends in search query terms.
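
The reported correlation presumably refers to a correlation coefficient of about 0.95 between the Twitter-based estimates and the CDC figures over the same weeks. The article does not name the statistic, so the sketch below assumes Pearson's r; the function name and the toy series are hypothetical, not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two root sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return sxy / (sd_x * sd_y)


# Toy weekly series (made-up numbers, not the study's data).
twitter_estimates = [1.1, 1.8, 2.9, 4.2, 3.5, 2.0]
cdc_rates = [1.0, 2.0, 3.1, 3.9, 3.6, 2.2]
print(round(pearson_r(twitter_estimates, cdc_rates), 3))
```

A value near 1 means the two series rise and fall together; it says nothing about whether the Twitter estimates match the CDC figures in absolute level.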

Culotta said Twitter has an advantage over Google because the high volume and frequency of Twitter posts allow up-to-the-minute analysis of an outbreak. Twitter, he said, reports more than 105 million users posting nearly 65 million messages a day, with approximately 300,000 new users added daily.

“Despite the fact that Twitter appears targeted at a young demographic, it does in fact have quite a diverse set of users,” he said. “The majority of Twitter’s nearly 10 million unique visitors in February 2009 were over 35 years old, and nearly the same percentage of users were between the ages of 55 and 64 as between 18 and 24.”

Culotta’s research was presented at the 2010 Workshop on Social Media Analytics at the Conference on Knowledge Discovery and Data Mining in Washington, DC. The work was funded in part by the Louisiana Board of Regents.