How can machine learning algorithms find drunk Twitter users?
A University of Rochester study developed a system for identifying and tracking Twitter users' drinking habits based on their tweets.
Using Twitter to follow trends is nothing new; the social media platform is known for actively tracking popular topics and highlighting them on its website. But a new algorithm may be able to detect a different type of pattern among its users: drinking habits.
Twitter keeps track of what its users post, when they post, and where they post from, and with that data a team of University of Rochester researchers was able to develop a method for evaluating how and where Twitter users drink alcohol.
“Analysis of Twitter has become a widespread approach for geo-spatial studies of human behavior, such as alcohol consumption and exercise, and human latent states, such as sickness and depression,” the researchers wrote in a summary of their study.
“However, nearly all prior work … does not attempt to distinguish mere mentions of activities or states from self-reports of activity. Moreover, no attempt has been made to distinguish reports about future or past activities and in-the-moment reports that provide finer details when geo-tagged tweets are used to map specific locations of activities,” they added, outlining the gaps their investigation set out to address.
To track regional drinking habits through Twitter, the team first needed a way to identify relevant tweets. The Rochester researchers devised a series of three questions to determine whether a tweet came from someone who was drinking: Does the tweet mention alcoholic beverages, using words such as “drunk,” “beer,” or “alcohol”? Is the tweet about the tweeter consuming such beverages? And is it likely the tweet was sent while the tweeter was drinking?
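To make the three-question filter concrete, here is a minimal Python sketch that applies the questions in order and stops at the first “no.” The keyword lists and helper names (`mentions_alcohol`, `about_self`, `drinking_now`) are hypothetical, and the simple word-matching rules standing in for questions two and three are illustrative assumptions, not how the study answered them.

```python
import re

# Hypothetical keyword list for question 1 (the study's full lexicon is not given).
ALCOHOL_WORDS = {"drunk", "beer", "alcohol", "wine", "vodka", "shots"}
# Toy stand-ins for questions 2 and 3; illustrative assumptions only.
FIRST_PERSON = {"i", "im", "i'm", "me", "my", "we", "our"}
NOW_CUES = {"now", "tonight", "currently", "right"}


def tokens(text: str) -> list[str]:
    """Lowercase the tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mentions_alcohol(words):  # Question 1: does it mention alcohol?
    return any(w in ALCOHOL_WORDS for w in words)


def about_self(words):  # Question 2 (toy rule): is it about the tweeter?
    return any(w in FIRST_PERSON for w in words)


def drinking_now(words):  # Question 3 (toy rule): sent while drinking?
    return any(w in NOW_CUES for w in words)


def classify(text: str) -> str:
    """Apply the three questions in order, stopping at the first 'no'."""
    words = tokens(text)
    if not mentions_alcohol(words):
        return "not alcohol-related"
    if not about_self(words):
        return "mentions alcohol, not about the user"
    if not drinking_now(words):
        return "user drinking, but not in the moment"
    return "user drinking now"
```

For example, `classify("I had a beer last week")` passes the first two questions but fails the third. The ordering matters: each question only needs to be asked of tweets that passed the previous one.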
The study used workers on Amazon's Mechanical Turk (an online marketplace where “requesters” post tasks to be completed by human “turkers”) to label sample tweets according to these questions. Using those human-labeled examples, the team trained a support vector machine to follow the same line of inquiry as the human annotators and accurately find relevant tweets.
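A support vector machine learns feature weights that separate positive from negative examples with as wide a margin as possible. The sketch below trains a linear SVM on bag-of-words features using the Pegasos stochastic subgradient method. The solver, the features, the regularization value `lam`, and the six example tweets and their labels are all assumptions made for illustration, not details from the study.

```python
import random
from collections import Counter


def featurize(text):
    """Bag-of-words counts (an assumed, deliberately simple feature set)."""
    return Counter(text.lower().split())


def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(examples, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the regularized hinge loss.

    examples: list of (text, label) pairs with label +1 (drinking) or -1.
    """
    rng = random.Random(seed)
    data = [(featurize(text), y) for text, y in examples]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)          # decreasing step size
            margin = y * dot(w, x)         # margin under the current weights
            for f in w:                    # regularization: shrink all weights
                w[f] *= 1.0 - eta * lam
            if margin < 1:                 # hinge loss active: step toward y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 if the tweet is classified as drinking-related, else -1."""
    return 1 if dot(w, featurize(text)) >= 0 else -1


# Hypothetical crowd-labeled tweets (+1: user drinking now, -1: otherwise).
labeled = [
    ("drinking a beer at the bar now", 1),
    ("so drunk right now", 1),
    ("having wine with dinner tonight", 1),
    ("beer ads are everywhere", -1),
    ("article about alcohol policy", -1),
    ("my friend was drunk last year", -1),
]
weights = train_linear_svm(labeled)
```

New tweets are then scored with `predict(weights, text)`. A real system would need far more labeled data and richer features; in principle each of the three questions could get its own classifier trained on the crowd-sourced labels, while this sketch trains only one.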
Using that classifier, along with further machine learning algorithms to estimate tweeters’ locations, the researchers compiled an analysis of Twitter users’ alcohol consumption habits. All tweets in the study came from the New York City metropolitan area, and the results compare drinking in the city with drinking in the suburbs, and drinking at home with drinking away from home.
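Deciding whether a drinking tweet was sent from home requires some estimate of where the user lives. The article does not spell out the study's location models, so the sketch below swaps in a simple, hypothetical heuristic rather than a machine learning model: take the most common grid cell (roughly 100 m across) among a user's geotagged tweets posted late at night as “home,” then treat any tweet within an assumed 100 m radius of it as sent from home. The night-time window, grid size, and radius are all assumptions.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def estimate_home(tweets, night_hours=range(0, 7)):
    """Hypothetical heuristic: the most common grid cell (coordinates rounded
    to 3 decimal places) among tweets posted during night_hours.

    tweets: list of (local_hour, lat, lon) tuples.
    """
    cells = Counter(
        (round(lat, 3), round(lon, 3))
        for hour, lat, lon in tweets
        if hour in night_hours
    )
    if not cells:
        return None  # no night-time tweets, so no estimate
    return cells.most_common(1)[0][0]


def sent_from_home(home, lat, lon, radius_km=0.1):
    """True if a tweet's location lies within radius_km of the estimated home."""
    return haversine_km(home[0], home[1], lat, lon) <= radius_km


# Hypothetical geotagged tweets for one user: (hour, lat, lon).
history = [
    (1, 40.7128, -74.0060),
    (2, 40.7129, -74.0061),
    (3, 40.7128, -74.0059),
    (22, 40.7580, -73.9855),
]
home = estimate_home(history)
```

Here the three late-night tweets fall in the same cell, so that cell becomes the estimated home, and the 10 p.m. tweet from several kilometres away would count as drinking away from home.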
The Rochester team found that most drinkers, whether urban or suburban, stay relatively close to home when imbibing, although suburban drinkers are more likely to travel farther. The researchers also found a positive correlation between the density of “alcohol outlets,” such as liquor stores and bars, and the number of tweets about drinking. While the paper notes that “correlation does not necessarily imply causation,” it cites several previous studies that reached similar conclusions about alcohol availability and drinking.
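The outlet-density finding is a correlation. A minimal sketch of computing Pearson's correlation coefficient between outlet density and drinking-tweet counts across areas is below; the five neighborhoods and their numbers are invented purely for illustration and do not come from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Invented values for five hypothetical neighborhoods (illustration only).
outlets_per_km2 = [2, 5, 9, 14, 20]
drinking_tweets = [3, 8, 10, 19, 25]
r = pearson(outlets_per_km2, drinking_tweets)
print(round(r, 2))  # 0.99
```

A coefficient near +1, as with these toy values, means areas with more outlets also tend to have more drinking tweets; as the paper cautions, it says nothing about which causes which.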
The final results painted an interesting picture of New York’s drinking habits, but the researchers also suggested that similar algorithms and research methods could “help to create a tool for improving a community’s health, given social networks can become a resource to spread positive health behaviour.” They did, however, note one significant bias: young and minority users are overrepresented on Twitter. But they said that studies in all fields face similar sampling problems, which can be corrected by weighting, and that their approach was fairly successful in analyzing the New York drinking scene, with high potential for complementary Twitter-based systematic studies in the future.
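The weighting the researchers mention can be illustrated with post-stratification, one standard way to correct a sample that over-represents some groups: each group gets a weight equal to its population share divided by its share of the sample. The group labels, counts, shares, and rates below are hypothetical, and the article does not describe which correction, if any, the study itself applied.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {g: population_shares[g] / (c / n) for g, c in sample_counts.items()}


def weighted_rate(rates, sample_counts, weights):
    """Weighted average of per-group rates, using group size times weight."""
    num = sum(rates[g] * sample_counts[g] * weights[g] for g in rates)
    den = sum(sample_counts[g] * weights[g] for g in rates)
    return num / den


# Hypothetical numbers: younger users are over-represented in the sample.
sample_counts = {"18-29": 600, "30+": 400}
population_shares = {"18-29": 0.25, "30+": 0.75}
drinking_rates = {"18-29": 0.10, "30+": 0.04}  # share posting drinking tweets

weights = poststratification_weights(sample_counts, population_shares)
estimate = weighted_rate(drinking_rates, sample_counts, weights)
```

With these toy numbers the weighted estimate is 0.055, versus an unweighted 0.076, because the over-represented younger group is scaled down to its population share.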
“Our results demonstrate that tweets can provide powerful and fine-grained cues of activities going on in cities,” the team said.