We talk a lot about the (exponential) growth of data. But with all this data, it seems unhealthy that only people with programming skills can get value out of it by mining and analyzing it. The trend of opening up this process with tools that let non-programmers get their hands dirty with data science has been called the democratization of data. Last night I spent some time with a tool that aims to do just that: ScraperWiki.
ScraperWiki is a web-scraping service that has been around for a while. So far the focus has been on users with some coding chops, or on data journalists willing to pay to have someone scrape data sets for them. A new feature, currently in beta, makes it possible for anyone to scrape Twitter and create a custom data set without having to write a single line of code. And that is awesome.
Of course, the tool is less about analyzing data than about generating it. But generating a data set is the first barrier non-programmers face when they want to do something with public data, from Twitter for instance.
I used the Twitter Scraper (which uses Twitter Search) to collect tweets containing the word ‘glasshole’ (a term that gained traction over the last couple of weeks to refer to someone wearing Google Glass). It quickly found over 600 tweets going back to May 6, and for each one it showed me the ID string, URL, date and time, language, number of retweets, whether it was part of a conversation, and any attached media.
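For readers who do want to poke at a data set like this with a few lines of code afterwards, here is a minimal Python sketch of the kind of summary you could compute. Everything in it is an assumption for illustration: the column names (`id_str`, `created_at`, `lang`, `retweet_count`), the timestamp format, and the three sample rows are made up, and they are not ScraperWiki's actual export format or real tweets.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows: column names and values are placeholders for illustration,
# not ScraperWiki's real export format and not real tweets.
SAMPLE_ROWS = [
    {"id_str": "1", "created_at": "2013-05-06 09:15:00", "lang": "en", "retweet_count": "3"},
    {"id_str": "2", "created_at": "2013-05-06 17:40:00", "lang": "en", "retweet_count": "0"},
    {"id_str": "3", "created_at": "2013-05-07 08:05:00", "lang": "nl", "retweet_count": "1"},
]


def tweets_per_day(rows):
    """Count tweets per calendar day, returned in chronological order."""
    counts = Counter()
    for row in rows:
        # Assumed timestamp format; adjust to whatever the export really uses.
        day = datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S").date()
        counts[day.isoformat()] += 1
    return dict(sorted(counts.items()))


def tweets_per_language(rows):
    """Count tweets per language code."""
    return dict(Counter(row["lang"] for row in rows))


def total_retweets(rows):
    """Sum the retweet counts, which are assumed to arrive as strings."""
    return sum(int(row["retweet_count"]) for row in rows)


if __name__ == "__main__":
    print(tweets_per_day(SAMPLE_ROWS))
    print(tweets_per_language(SAMPLE_ROWS))
    print(total_retweets(SAMPLE_ROWS))
```

With a real export you would swap `SAMPLE_ROWS` for rows read from the downloaded file, for example with `csv.DictReader`, and adjust the column names to match.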
After the scraping is done, it offers you several options, one of them being a tool that automatically searches for trends and patterns. So a basic level of analysis is just one click away. Here are some screenshots of the results:
I know that there are a lot of tools out there that do much more than what I just showed you. But we need to understand the bigger story that this is part of. Getting these results took me less than five minutes and I didn’t have to write a single line of code. Plus, it’s free.
Opening up the data science space is a good thing. I will be looking for more tools like ScraperWiki and watching how they evolve. If you know of similar tools, feel free to share them in the comments.