Big Data in the Big Apple: understanding New York using millions of Foursquare check-ins

What can we learn from millions of check-ins on Foursquare? An unprecedented view into the behavior of cities. How big data is making cities easier to use.

Foursquare began its life as a way to see what your friends are up to, but it has quickly evolved into an artificial intelligence / recommendation engine that knows you and your surroundings. Location and the detection of patterns in socially generated data are key to the company. At DataGotham, ‘a celebration of New York City’s data community’, Foursquare data scientist Blake Shaw talked about what we can learn about New York City by aggregating the check-ins of millions of New Yorkers.

Foursquare already has 2.5 billion check-ins from over 20 million people (including me and Sander Duivestein) and generates about 80 new check-ins every second. 135 million of those check-ins are in New York. Visualizing this data creates a real-time activity stream that shows the beat of the city.

One question we can ask of this kind of data is whether we can see an activity stream per neighborhood. The answer is yes. The image on the right shows the activity stream of Soho versus the East Village. More interesting still, this makes it possible to define similarities and differences between parts of the city based on activity. That opens up a whole new data stream for city planners or for entrepreneurs looking for a location for a new bar or diner.
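To make "similarity based on activity" a bit more concrete, here is a minimal sketch of one way it could be done. It is not the method from the talk: it assumes each check-in is reduced to the hour of day it happened, builds a 24-hour activity profile per neighborhood, and compares profiles with cosine similarity. The neighborhood names and check-in hours are made-up toy values, and `hourly_profile` and `cosine_similarity` are hypothetical helper names.

```python
from collections import Counter
from math import sqrt


def hourly_profile(checkin_hours):
    """Return a 24-element list with the share of check-ins in each hour."""
    if not checkin_hours:
        raise ValueError("need at least one check-in")
    counts = Counter(checkin_hours)
    total = len(checkin_hours)
    return [counts.get(hour, 0) / total for hour in range(24)]


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles (1.0 = same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy data: hour of day (0-23) of each check-in per neighborhood.
# These are NOT real Foursquare numbers.
toy_checkins = {
    "daytime_area": [11, 12, 12, 13, 13, 14, 15, 18],
    "nightlife_area": [19, 20, 21, 22, 22, 23, 23, 0],
    "lunch_area": [11, 12, 12, 13, 13, 13, 14, 15],
}

profiles = {name: hourly_profile(hours) for name, hours in toy_checkins.items()}

day_vs_night = cosine_similarity(profiles["daytime_area"], profiles["nightlife_area"])
day_vs_lunch = cosine_similarity(profiles["daytime_area"], profiles["lunch_area"])

print(f"daytime vs nightlife: {day_vs_night:.2f}")
print(f"daytime vs lunch:     {day_vs_lunch:.2f}")
```

In this toy data the two daytime-heavy areas come out far more alike than the daytime and nightlife areas, which is the kind of signal someone scouting a location for a new bar or diner might look for. Real check-in data would also need day of week, venue category and a lot more volume.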

That example is only one of the questions you could ask. How is activity related to the weather, or how are social dynamics related to activity? What makes Foursquare data different from more traditional location data is that Foursquare also knows who your friends are and who they are friends with. How social dynamics affect the virality of a specific location is a question that a city will be able to answer in the future. Take a look at the social graph of a single coffee bar:

The data aggregated from the millions of check-ins provides cities with new insights into how the city works. It works like a microscope, enabling one to zoom in and out on a specific location. This can be a useful addition to the data that cities already use. You can watch the entire talk by Blake on Foursquare’s engineering blog, and I strongly recommend watching the data visualisation of New York’s activity stream starting at 02:14. If you have any suggestions for new questions we could ask of this kind of data, please feel free to share them with us in the comments.
