We are pleased to have community member Greg Toth present this event review. Greg is a consultant and entrepreneur in the Washington DC area. As a consultant, he helps clients design and build large-scale information systems, process and analyze data, and solve business and technical problems. As an entrepreneur, he connects the dots between what’s possible and what’s needed, and brings people together to pursue new business opportunities. Greg is the president of Tricarta Corporation and the CTO of EIC Data Systems, Inc.
The March 2013 meetup of Data Science DC generated quite a buzz! Well over a hundred data scientists and practitioners gathered in Chevy Chase to hear Prof. Jennifer Golbeck from the Univ. of Maryland give a very interesting – and at times somewhat startling – talk about how hidden information can be uncovered from people’s online social media activities.
Prof. Golbeck develops methods for discovering things about people online. She opened her talk with a brief example of how bees reveal specific information to their hive’s social network through the characteristics of their “waggle dance.” The figure eight patterns of the waggle dance convey distance and direction to pollen sources and water to the rest of the hive – which is a large social network.
Facebook Information Sharing
From there the discussion turned to how Facebook’s information sharing defaults have evolved from 2005 through 2010. In 2005, Facebook’s default settings shared a relatively narrow set of your personal data with friends and other Facebook users. At this point none of your information was – by default – shared with the entire Internet.
In subsequent years the default settings changed annually, sharing more and more information with a wider and wider audience. By 2009, several pieces of your information were being shared openly with anyone on the Internet unless you had changed the default settings. By 2010 the default settings were sharing significant amounts of information with a large swath of other people, including people you don’t even know.
The Facebook sharing information Prof. Golbeck described came from Matt McKeon’s work, which can be found here: http://mattmckeon.com/facebook-privacy/
This ever-increasing amount of shared information has opened up new avenues for people to find out things about you, and many people may be shocked at what's possible. Prof. Golbeck gave a live demonstration of a web site called Take This Lollipop, using her own Facebook account. I won’t spoil things by telling you what it does, but suffice it to say it was quite startling. If this piques your interest, check out www.takethislollipop.com.
Predicting Personality Traits
From there the discussion shifted to a research project intended to determine whether it's possible to predict people's personality traits by analyzing what they put on social media. First, a group of research participants were asked to identify their core personality traits by going through a standardized psychological evaluation. The Big Five factors that they measured are openness, conscientiousness, extraversion, agreeableness, and neuroticism.
Next the research team gathered information from these people’s Facebook and Twitter accounts, including language features (e.g. words they use in posts), personal information, activities and preferences, internal Facebook stats, and other factors. Tweets were processed in an application called LIWC, which stands for Linguistic Inquiry and Word Count. LIWC is a text analysis program that examines a piece of text and the individual words it contains, and computes numeric values for positive and negative emotions as well as several other factors.
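To make the word-counting idea concrete, here is a minimal sketch of a LIWC-style scorer. It is not LIWC itself: the tiny lexicon, the category names, and the function name are all hypothetical stand-ins, and the real program uses a much larger dictionary and many more categories.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; this is NOT the LIWC dictionary.
TOY_LEXICON = {
    "positive_emotion": {"happy", "love", "great", "good"},
    "negative_emotion": {"sad", "hate", "awful", "bad"},
}


def category_percentages(text, lexicon=TOY_LEXICON):
    """Return, per category, the percentage of words in `text` found in that category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {category: 0.0 for category in lexicon}

    counts = Counter()
    for word in words:
        for category, vocabulary in lexicon.items():
            if word in vocabulary:
                counts[category] += 1

    return {
        category: 100.0 * counts[category] / len(words)
        for category in lexicon
    }


# Example: score a single made-up post.
scores = category_percentages("I love this great talk but hate the awful traffic")
```

Each post or tweet becomes a small vector of numbers, which is the kind of language feature a prediction model can consume alongside profile data.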
The data gathered from Twitter and Facebook was fed into a personality prediction algorithm developed by the research team and implemented using the Weka machine learning toolkit. Predicted personality trait values from the algorithm were compared to the original Big Five assessment results to evaluate how well the prediction model performed. Overall, the difference between predicted and measured personality traits was roughly 10 to 12% for Facebook (considered very good) and roughly 12 to 18% for Twitter (not quite as good). The overall conclusion was that yes, it is possible to predict personality traits by analyzing what people put on social media.
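One plausible way to express a "difference between predicted and measured" figure as a percentage is mean absolute error divided by the width of the trait scale. The sketch below assumes that reading; the talk summary does not spell out the exact metric, and the 1-to-5 scale, the function name, and the sample scores are all made up for illustration.

```python
def mean_absolute_error_pct(predicted, measured, scale_min=1.0, scale_max=5.0):
    """Mean absolute error as a percentage of the scale range.

    The 1-to-5 scale is an assumption for this sketch, not taken from the study.
    """
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("predicted and measured must be non-empty and the same length")

    scale_range = scale_max - scale_min
    total_error = sum(abs(p - m) for p, m in zip(predicted, measured))
    return 100.0 * total_error / (len(predicted) * scale_range)


# Made-up scores for one trait across four hypothetical participants.
measured_openness = [4.0, 3.5, 2.0, 4.5]
predicted_openness = [3.6, 3.9, 2.5, 4.0]

error_pct = mean_absolute_error_pct(predicted_openness, measured_openness)
```

With these invented numbers the error comes out at roughly 11% of the scale, which would sit in the range reported for Facebook.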
Predicting Political Preferences
The second research project was about computing political preference in Twitter audiences. Originally this project started with the intention of looking at the Twitter feeds of news media outlets and trying to predict media bias. However, the topic of media bias in general was deemed too problematic and controversial, so the team decided instead to focus on predicting the political preferences of the media audiences.
The objective was to come up with a method for computing the political orientation of people who followed popular news media outlets on Twitter. To do this, the team computed the political preference of about 1 million Twitter users by finding which Congresspeople they followed on Twitter, and looking at the liberal to conservative ratings of those Congresspeople. A key assumption was that people's political preferences will, on average, reflect those of the Congresspeople they follow.
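The key assumption above can be sketched in a few lines: a user's score is taken as the average rating of the Congresspeople they follow. The account names, the ratings, and the 0-to-1 scale (0 = most liberal, 1 = most conservative) are hypothetical, and a plain mean is an assumption rather than the team's confirmed method.

```python
from statistics import mean

# Hypothetical ratings on an assumed scale: 0.0 = most liberal, 1.0 = most conservative.
LEGISLATOR_SCORES = {
    "rep_a": 0.9,
    "rep_b": 0.7,
    "sen_c": 0.2,
}


def user_preference(followed_accounts, legislator_scores=LEGISLATOR_SCORES):
    """Average the ratings of the rated legislators a user follows.

    Returns None when the user follows no rated legislators, since no
    preference can be inferred in that case.
    """
    scores = [
        legislator_scores[account]
        for account in followed_accounts
        if account in legislator_scores
    ]
    return mean(scores) if scores else None


# A made-up user who follows two rated legislators and one unrelated account.
preference = user_preference(["rep_a", "rep_b", "some_celebrity"])
```

Accounts that are not rated legislators are simply ignored, so only the Congresspeople a user follows influence the result.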
From there, the team looked at 20 different Twitter news outlets and identified who followed each one. The political preferences of each media outlet's followers were composited together to compute an overall audience political preference factor ranging from heavily conservative to heavily liberal at the two extremes, with moderate ranges in the middle. The results showed that Fox News had the most conservative audience, NPR Morning Edition had the most liberal audience, and Good Morning America was in the middle with a balanced mix of both conservative and liberal followers. Further details on the results can be found in the paper here.
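The compositing step might look like the sketch below, assuming each outlet's audience score is a simple mean of its followers' individual scores. The review does not say exactly how the scores were combined, so the averaging, the outlet names, the user IDs, and the 0-to-1 scale are all illustrative assumptions.

```python
from statistics import mean


def audience_preference(outlet_followers, user_scores):
    """Average follower scores per outlet, skipping followers with no score.

    Outlets with no scored followers are left out of the result.
    """
    result = {}
    for outlet, followers in outlet_followers.items():
        scored = [user_scores[user] for user in followers if user in user_scores]
        if scored:
            result[outlet] = mean(scored)
    return result


# Made-up per-user scores on an assumed scale (0.0 = most liberal, 1.0 = most conservative).
user_scores = {"u1": 0.1, "u2": 0.3, "u3": 0.9, "u4": 0.7}

# Made-up follower lists; "u5" has no score and is ignored.
outlet_followers = {
    "outlet_x": ["u1", "u2", "u5"],
    "outlet_y": ["u3", "u4"],
    "outlet_z": ["u5"],
}

audience = audience_preference(outlet_followers, user_scores)

# Order outlets from most liberal to most conservative audience.
ranked = sorted(audience, key=audience.get)
```

Sorting the outlets by their audience score gives a spectrum like the one described above, with moderate audiences landing in the middle.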
Summary & Wrap-up
An awful lot of things about you can be figured out by looking at public information in your social media streams. Personality traits and political preferences are but two examples. Sometimes this information can be used for beneficial purposes, such as showing you useful recommendations. It can also be used in ways you might not expect: a future employer, for example, could use this kind of information to form opinions during the hiring process. People don't always think about this (or necessarily even realize what's possible) when they post things to social media.
Overall Prof. Golbeck’s presentation was well received and generated a number of questions and conversations after the talk. The key takeaway was that “We know who you are and what you are thinking” and that information can be used for a variety of purposes – in most cases without you even being aware. The situation was summed up pretty well in one of Prof. Golbeck’s opening slides:
I develop methods for discovering things about people online.
I never want anyone to use those methods on me.
-- Jennifer Golbeck
For those who want to delve deeper, several resources are available:
- Dr. Golbeck's presentation slides and audio from the event (MP3, ~15MB)
- Dr. Golbeck’s home page and research papers
- Dr. Golbeck’s new book titled Analyzing the Social Web
- Univ. of Maryland’s Human-Computer Interaction Laboratory 30th Annual Symposium, May 22-23, 2013
- Dr. Golbeck recommended a recent similar paper by Berkeley researchers that got some press: Private traits and attributes are predictable from digital records of human behavior.
Overall I found this presentation to be very worthwhile and thought-provoking. Prof. Golbeck was an engaging speaker who was both informative and entertaining. She provided a number of useful references, links and papers for delving deeper into the topics covered. The venue and logistics were great and there were plenty of opportunities for networking and talking with colleagues both before and after the presentation.
The topic of predicting people's traits and behaviors is very relevant, particularly in the realm of politics. At least one other Data Science DC meetup held within the last few months focused on how data science was used in the last presidential election and the tremendous impact it had. That trend is sure to continue, fueled by research like this coupled with the availability of data, more sophisticated tools, and the right kinds of data scientists to connect the dots and put it all together.
If you have the time, I would recommend listening to the audio recording and following along with the slide deck. There were many more interesting details in the talk than I could cover here.
My personal opinion is that too few people realize the data footprint they leave when using social media. That footprint has a long memory and can be used for many purposes, including purposes that haven't even been invented yet. Many people seem to think either that the data they put on social media is trivial and doesn't reveal anything, or that no one cares and it's just "personal stuff." But as we've seen in this talk, people can discover a lot more than you may think.