One important part of the discussion on Big Data focuses on the role of data scientists: the people we need to turn all kinds of data into actionable information. Some call it the sexiest job of the next ten years, and the well-known report by McKinsey estimates a shortage of 200,000 skilled data managers in the U.S. by 2016. However, solutions for this problem are popping up everywhere.
The first piece of evidence comes from Kaggle, a platform where companies and organizations award prizes for the best solutions to their predictive-modeling needs. I wrote about this platform a while back, and Kaggle is signing up new companies and users every day. It is a sign that data scientists might not have to be an in-house must if we can tap into a community that has the knowledge and power to come up with results when needed.
Another exciting new practice within the data science field comes from online education tools like Coursera, an educational technology company founded by computer science professors Andrew Ng and Daphne Koller from Stanford University. Coursera partners with various universities and makes a few of their courses available online, free of charge, to a large audience. As of August 16, 2012, more than 1,080,000 students from 196 countries had enrolled in at least one course. As for the need for data scientists, Coursera offers 14 courses related to statistics and data analysis. Free and open education in new fields of expertise is great, and free access to the best teachers around doesn’t hurt either. Here is Stanford professor and Coursera co-founder Andrew Ng introducing his machine-learning class:
And it’s not just a good initiative; it is already showing results in the data science field. Several novice programmers who signed up for a free machine-learning class on Coursera have recently gone on to win predictive-modeling competitions. GigaOM reports on a student who took a handful of free online classes on Coursera last year and recently scored his first victory in a Kaggle competition hosted by the Hewlett Foundation, where he came up with a model for accurately grading short-answer questions on exams. The second- and third-place finishers in the Heritage Foundation competition also learned machine learning on Coursera.
Naturally, Coursera (and similar online education tools) and Kaggle are not the primary solution to the shortage of data scientists. They are, however, signs of a larger movement towards dealing with big data in new ways. By educating more people and crowdsourcing solutions in communities of experts, companies gain more options for handling big data. And just like free access to the best teachers around doesn’t hurt, access to more solutions doesn’t hurt either.