It’s 2019, and data science is one of the most attractive career options out there. Data scientists get to spend their days working with bleeding-edge tools building models and performing analyses for everything from predicting rainfall to trading the stock market. The typical data science team is staffed with a diverse array of people, from talented newcomers to grizzled stats PhDs, lending a kind of hybrid vigor to the whole field.
And there are more books, tutorials, courses, and bootcamps for data science than you can shake a stick at. But the fact remains that getting into data science can be an intimidating prospect. For this reason, the more possible entry points you have, the better. You can never know what explanation or demonstration will finally bring home a concept you’ve been struggling to understand.
One resource for learning and practicing data science which I’ve used myself, and which I’ve heard of other people using, is Kaggle. But since I’ve never seen anyone write up an explanation of how to use Kaggle as a learning tool, I decided to create my own.
What’s a ‘Kaggle’?
Kaggle is essentially a massive data science platform. It gathers in one place a huge number of public datasets, most of which have been sanitized and made ready for use in analysis. By itself this is pretty significant, as data gathering and cleaning is a huge part of the data science workflow.
Kaggle also hosts a number of ongoing competitions, where individuals and teams work to solve data science or machine learning problems under a set of constraints. Winning or just placing highly in one of these contests has become a big enough deal that people routinely put it on their resumes and LinkedIn profiles.
Because Kaggle users publish notebooks that are freely available for anyone to browse, adapt, and use, it has become an extraordinarily rich source of code for data science and machine learning projects.
And while Kaggle is certainly not a social network in the usual sense, it has grown popular enough to support many active message boards and a substantial community. It’s a great place to pose questions and to hash out ideas.
Learning By Imitating Good Kaggle Projects
One of the things I did immediately after graduating from the Galvanize Data Science Immersive was to spend some time each day exploring and carefully studying Kaggle notebooks which utilized data science skills I wanted to master.
There were a few things I was looking for. I chose notebooks that:
- elegantly solved a common problem, like string matching with regular expressions;
- deployed an important algorithm, like a random forest or gradient boosting;
- created beautiful visualizations of data;
- otherwise took my skills to a higher level.
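To give a flavor of the first item, here is a minimal sketch of the kind of regular-expression string matching you might find in a notebook. It is not taken from any particular Kaggle notebook: the product names, the `storage_pattern` pattern, and the `extract_storage_gb` helper are all made up for illustration.

```python
import re

# Hypothetical messy product names; not taken from any Kaggle dataset.
raw_names = ["iPhone-11 (64GB)", "iphone 11 64 gb", "Galaxy S10 128GB"]

# Match a storage size such as "64GB" or "64 gb", ignoring case.
storage_pattern = re.compile(r"(\d+)\s*gb", re.IGNORECASE)


def extract_storage_gb(name):
    """Return the storage size in GB as an int, or None if none is found."""
    match = storage_pattern.search(name)
    return int(match.group(1)) if match else None


# Apply the helper to every name to get one clean numeric column.
storage_sizes = [extract_storage_gb(name) for name in raw_names]
```

Compiling the pattern once and wrapping it in a small function is one common way to turn inconsistent text into a clean numeric field. Real notebooks will use patterns tailored to their own data.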
My basic process for finding notebooks I liked went something like this:
- Go to Kaggle’s website.
- Go to either ‘Datasets’ or ‘Notebooks’ (both are on the menu at the top of the screen).
- Find something that looks interesting.
- Either read it carefully or duplicate it entirely.
This gives you two ways of tracking down learning materials. Either you can find notebooks associated with cool datasets, or you can find cool notebooks. Notebooks can be sorted by ‘hotness’ (how popular they currently are), by upvotes, and by their relevance to a particular thing you’re interested in at that moment.
Learning With Kaggle’s Courses
You can now also take courses on Kaggle. As far as I can see, these come in two flavors. First, individual Kaggle users will sometimes put together notebooks which are as structured and extensive as a course.
These aren’t just analyses or solutions to one-off problems; they’re meant to be self-contained expositions of a topic in data science, machine learning, or AI.
Second, Kaggle itself now hosts ‘micro-courses’ on Python, SQL, Deep Learning, Pandas, and numerous other topics. These consist of a series of notebooks containing explanations and exercises, complete with progress tracking.
Kaggle has come a long way since its inception, and has begun to emerge as one of the best ways to truly grow your data science skills. Be sure you’re utilizing it to its full potential.