Now, an algorithm to predict Twitter trends
Washington: Indian-origin researchers have come up with a new algorithm that predicts which Twitter topics will trend hours in advance and offers a new technique for analyzing data that fluctuate over time.
Twitter’s home page features a regularly updated list of topics that are “trending,” meaning that tweets about them have suddenly exploded in volume.
A position on the list is highly coveted as a source of free publicity, but the selection of topics is automatic, based on a proprietary algorithm that factors in both the number of tweets and recent increases in that number.
At the Interdisciplinary Workshop on Information and Decision in Social Networks at MIT in November, Associate Professor Devavrat Shah and his student, Stanislav Nikolov, will present a new algorithm that can, with 95 percent accuracy, predict which topics will trend an average of an hour and a half before Twitter’s algorithm puts them on the list — and sometimes as much as four or five hours before.
The algorithm could be of great interest to Twitter, which could charge a premium for ads linked to popular topics, but it also represents a new approach to statistical analysis that could, in theory, apply to any quantity that varies over time: the duration of a bus ride, ticket sales for films, maybe even stock prices.
Like all machine-learning algorithms, Shah and Nikolov’s needs to be “trained”: it combs through data in a sample set — in this case, data about topics that previously did and did not trend — and tries to find meaningful patterns.
What distinguishes it is that it’s nonparametric, meaning that it makes no assumptions about the shape of patterns.
In the standard approach to machine learning, Shah explains, researchers would posit a “model” — a general hypothesis about the shape of the pattern whose specifics need to be inferred.
“You’d say, ‘Series of trending things … remain small for some time and then there is a step,’” Shah said.
“This is a very simplistic model. Now, based on the data, you try to train for when the jump happens, and how much of a jump happens.
“The problem with this is, I don’t know that things that trend have a step function.
“There are a thousand things that could happen,” he said.
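To make the contrast concrete, here is a minimal sketch of the kind of parametric model Shah describes: assume a tweet-count series stays at one level and then jumps to another, and “train” only two things, when the jump happens and how big it is. The function name `fit_step`, the least-squares criterion and the toy series are illustrative assumptions, not details from the article.

```python
def fit_step(counts):
    """Fit a single step function to a series by least squares.

    Hypothetical illustration of a parametric model: the shape (one step)
    is assumed up front; only the jump position and the two levels are learned.
    Returns (jump_index, level_before, level_after).
    """
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    best = None
    for k in range(1, len(counts)):
        before, after = counts[:k], counts[k:]
        mu_before = sum(before) / len(before)
        mu_after = sum(after) / len(after)
        # Squared error of approximating each segment by its own mean
        sse = sum((x - mu_before) ** 2 for x in before) + sum(
            (x - mu_after) ** 2 for x in after
        )
        if best is None or sse < best[0]:
            best = (sse, k, mu_before, mu_after)
    _, k, mu_before, mu_after = best
    return k, mu_before, mu_after


# Toy tweet counts per interval: flat, then a sudden burst
series = [3, 4, 3, 5, 4, 40, 42, 39, 41]
k, before_level, after_level = fit_step(series)
print(k, after_level - before_level)
```

The weakness Shah points to is built into the structure: if a topic’s real growth is gradual or comes in several bursts, this model will still force exactly one jump onto it.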
So instead, he says, he and Nikolov “just let the data decide.”
In particular, their algorithm compares changes over time in the number of tweets about each new topic to the changes over time of every sample in the training set.
Samples whose statistics resemble those of the new topic are given more weight in predicting whether the new topic will trend or not. In effect, Shah explains, each sample “votes” on whether the new topic will trend, but some samples’ votes count more than others’.
The weighted votes are then combined, giving a probabilistic estimate of the likelihood that the new topic will trend.
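The weighted-vote idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Shah and Nikolov’s published code: it assumes each topic is represented as an equal-length list of tweet counts per time interval, measures similarity by Euclidean distance, and turns distance into a vote weight with exp(-gamma * distance). The names `trend_probability` and `gamma` and the toy training data are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length count series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def trend_probability(new_series, training, gamma=0.1):
    """Estimate the probability that new_series will trend.

    training is a list of (series, trended) pairs. Each sample votes for its
    own label with weight exp(-gamma * distance), so samples that resemble the
    new topic count more. This weighting scheme is an assumption.
    """
    yes = no = 0.0
    for series, trended in training:
        weight = math.exp(-gamma * distance(new_series, series))
        if trended:
            yes += weight
        else:
            no += weight
    total = yes + no
    # Combine the weighted votes into a score between 0 and 1
    return yes / total if total > 0 else 0.5


training = [
    ([1, 2, 4, 8, 16], True),    # toy series that trended
    ([2, 3, 6, 12, 24], True),
    ([5, 5, 6, 5, 5], False),    # toy series that stayed flat
    ([3, 4, 3, 4, 3], False),
]
p = trend_probability([1, 3, 5, 9, 18], training)
print(f"estimated probability of trending: {p:.2f}")
```

No shape is assumed anywhere: a new topic is judged only by which past examples it resembles. The parameter `gamma` controls how sharply votes fall off with distance, and the cost is one distance computation per training sample.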
In Shah and Nikolov’s experiments, the training set consisted of data on 200 Twitter topics that did trend and 200 that didn’t. In real time, they set their algorithm loose on live tweets, predicting trending with 95 percent accuracy and a 4 percent false-positive rate.
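The two figures measure different things. Accuracy is the share of all topics classified correctly, while the false-positive rate is the share of topics that did not trend but were flagged anyway. The helper below, a hypothetical illustration, computes both from lists of predicted and actual labels; the sample labels are made up and are not the researchers’ data.

```python
def accuracy_and_fpr(predicted, actual):
    """Return (accuracy, false_positive_rate) for boolean labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    # Predictions made for topics that did not actually trend
    negatives = [p for p, a in zip(predicted, actual) if not a]
    false_positives = sum(negatives)  # flagged as trending, but did not trend
    fpr = false_positives / len(negatives) if negatives else 0.0
    return correct / len(actual), fpr


# Made-up labels: True means "trended" (actual) or "predicted to trend"
predicted = [True, True, False, False, True, False]
actual = [True, True, False, False, False, True]
acc, fpr = accuracy_and_fpr(predicted, actual)
print(acc, fpr)
```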
Shah expects the system’s accuracy to improve further as the training set grows. “The training sets are very small,” he says, “but we still get strong results.”
Of course, the larger the training set, the greater the computational cost of executing Shah and Nikolov’s algorithm. Indeed, Shah says, curbing computational complexity is the reason that machine-learning algorithms typically employ parametric models in the first place.
“Our computation scales proportionately with the data,” Shah said.
But on the Web, he adds, computational resources scale with the data, too: as Facebook or Google add customers, they also add servers. So the algorithm he and Nikolov built is designed so that its execution can be split up among separate machines.
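A weighted-vote scheme splits easily because each training sample’s vote depends only on that sample and the new topic. The training set can therefore be sharded, each shard can compute its own partial vote totals, and the totals can be added at the end. The sketch below imitates this on one computer, with threads from Python’s `concurrent.futures` standing in for separate machines; the sharding scheme, the exp(-gamma * distance) weighting and the names `partial_votes` and `distributed_trend_probability` are assumptions for illustration, not details from the article.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_votes(job):
    """Return (trend_weight, no_trend_weight) summed over one shard."""
    new_series, shard, gamma = job
    yes = no = 0.0
    for series, trended in shard:
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(new_series, series)))
        weight = math.exp(-gamma * d)  # assumed distance-based vote weight
        if trended:
            yes += weight
        else:
            no += weight
    return yes, no


def distributed_trend_probability(new_series, training, n_shards=2, gamma=0.1):
    """Shard the training set, score shards concurrently, then merge totals.

    Threads stand in for separate machines: the point is the decomposition,
    not a speedup on one computer.
    """
    shards = [training[i::n_shards] for i in range(n_shards)]
    jobs = [(new_series, shard, gamma) for shard in shards]
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        results = list(pool.map(partial_votes, jobs))
    # Adding per-shard totals reproduces the single-pass totals (up to
    # floating-point rounding), so no shard needs another shard's data.
    yes = sum(r[0] for r in results)
    no = sum(r[1] for r in results)
    total = yes + no
    return yes / total if total > 0 else 0.5


training = [
    ([1, 2, 4, 8, 16], True),
    ([2, 3, 6, 12, 24], True),
    ([5, 5, 6, 5, 5], False),
    ([3, 4, 3, 4, 3], False),
]
p = distributed_trend_probability([1, 3, 5, 9, 18], training)
print(f"estimated probability of trending: {p:.2f}")
```

On a real cluster only the small (yes, no) pairs travel between machines, which is why adding servers as the training data grows keeps the work manageable.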
“It is perfectly suited to the modern computational framework,” he said.
In principle, Shah says, the new algorithm could be applied to any sequence of measurements performed at regular intervals. But the correlation between historical data and future events may not always be as clear-cut as in the case of Twitter posts.
Filtering out all the noise in the historical data might require such enormous training sets that the problem becomes computationally intractable even for a massively distributed program. But if the right subset of training data can be identified, Shah says, “It will work.”
“People go to social-media sites to find out what’s happening now,” said Ashish Goel of Stanford University, a member of Twitter’s technical advisory board.
“So in that sense, speeding up the process is something that is very useful,” Goel added.