Analyzing Big Data with Twitter

Take a crash course in Twitter’s big-data practices, just like the students in Professor Marti Hearst’s class at the University of California at Berkeley, who worked with data sets straight from the Twitter fire hose in a course called I290: “Analyzing Big Data with Twitter.”

Here is the course description:

How to store, process, analyze and make sense of Big Data is of increasing interest and importance to technology companies, a wide range of industries, and academic institutions. In this course, UC Berkeley professors and Twitter engineers will lecture on the most cutting-edge algorithms and software tools for data analytics as applied to Twitter microblog data. Topics will include applied natural language processing algorithms such as sentiment analysis, large scale anomaly detection, real-time search, information diffusion and outbreak detection, trend detection in social streams, recommendation algorithms, and advanced frameworks for distributed computing. Social science perspectives on analyzing social media will also be covered.

Basically, the engineering and math students spent their time in a lecture hall listening to a host of guest product managers and engineers from Twitter, who explained the company’s approaches to analyzing and dealing with the massive amount of data that flows through Twitter’s pipes every single moment.

At the end of the course, 40 students from Professor Hearst’s class visited Twitter’s headquarters in San Francisco and presented the things they created using the Twitter data. One of my favorite assignments they completed was analyzing and comparing a portion of the Twitter “conversation graph” and the “interest graph”. Conversations were found by looking for Twitter “@mentions”, and the interest graph by looking at a user’s friend/follow graphs. One graph that caught my attention was made by Achal Soni, who used Java and the Twitter4J library to obtain 3000 tweets for four rappers (Drake, Kendrick Lamar, J Cole, and Big Sean). He extracted @mentions from these tweets and created a graph recording edges between the celebrities and the people they were conversing with. Read more on this project over here.
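To make the conversation-graph idea concrete, here is a minimal sketch in Python rather than the Java and Twitter4J setup the project used. It is not Achal Soni's code. It skips the download step entirely and assumes the tweets are already available as (author, text) pairs. The handles and tweet texts are invented placeholders, and the helper names `extract_mentions` and `build_conversation_graph` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a handle is 1-15 letters, digits, or underscores after "@".
# The lookbehind skips things like email addresses ("a@b.com").
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")


def extract_mentions(text):
    """Return the lowercased handles @mentioned in a tweet's text."""
    return [handle.lower() for handle in MENTION_RE.findall(text)]


def build_conversation_graph(tweets):
    """Count directed edges (author -> mentioned user) over all tweets.

    `tweets` is an iterable of (author, text) pairs. Self-mentions are skipped.
    """
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        for target in extract_mentions(text):
            if target != source:
                edges[(source, target)] += 1
    return edges


# Invented sample data, standing in for tweets fetched from the API.
sample_tweets = [
    ("artist_a", "Studio tonight with @artist_b and @fan_1"),
    ("artist_b", "Thanks @artist_a, see you there"),
    ("artist_a", "Shout out to @artist_b again"),
    ("artist_c", "No mentions here"),
]

graph = build_conversation_graph(sample_tweets)
for (source, target), weight in sorted(graph.items()):
    print(f"{source} -> {target} (mentions: {weight})")
```

The edge weights count how often one account mentions another, which is one simple way to decide which conversations to draw most prominently. An interest graph could be stored the same way, with edges taken from follow relationships instead of @mentions, so the two graphs can be compared edge by edge.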

Professor Hearst posted video lectures from the semester on her blog at the UC Berkeley School of Information, so we can all join the course at some level. It’s also a great opportunity to see how Twitter thinks and talks about their data and their data-driven plan for future growth. I have embedded the lecture on how Twitter computes its trending topics below; all the lectures can be found over here.
