If you haven’t heard of data science by now, I hope you’ll tell me who sold you your isolated wilderness cabin so I can get one too. Data science and related fields like big data and machine learning are so popular at the moment that companies are scrambling to get their hands on people with the relevant skills.
There are lots of ways you can get into data science. You can learn on your own, go to a data science bootcamp, or go to a traditional college.
Whichever route you choose, I’d recommend you cut your teeth doing some simple beginner data science projects. This way you’ll begin building an understanding of the material and learning to troubleshoot the inevitable problems that emerge.
Learning Python is a great way to get into data science, as there are tons of fun beginner Python projects out there. Enrolling in an online coding bootcamp or web development bootcamp is a great way to build the skills expected of a professional data scientist. In the best coding bootcamps, students learn web design, data visualization and coding skills.
Get Your Start with These Data Science Project Ideas
There is an unlimited number of things you could put on a list like this one. I tried to limit myself to truly basic tasks which don’t overlap with machine learning (no regressors or classifiers).
Below are some basic data science projects, with links to relevant datasets. I’ve assumed that you have a basic familiarity with the command line and at least one data science package, like Pandas.
- Data Gathering – This project could be one of the simplest or one of the most complicated parts of the data science pipeline. Since this is an article for beginners, I’m going to recommend that you do something relatively straightforward. Head over to Kaggle/datasets or some other place with many available datasets, download one, and load it into your environment.
- Data Cleaning – What you need to learn to do here is get a dataset, discover its flaws, and correct them. The simplest datasets I could find were two Excel spreadsheets (here and here) containing data on children’s gifted and talented scores, reported by their parents. You’ll need to learn how to load in Excel files, fill in missing data, handle inconsistent formatting, and all the other things that make data cleaning so notoriously amusing.
- Exploratory Data Analysis – Take a dataset from the previous two steps or one of your own choosing and explore it. This will require finding out how large the dataset is, whether it contains missing values, and what data types are represented (e.g. the number 19 could come through as text or as an integer). You will also have to do some simple plotting to get a feel for any patterns that may exist.
- Statistical Analysis – Finally, do some actual data science! It doesn’t take long for statistical analysis to get pretty complicated, and what analysis is possible depends on the dataset you’re using. You can always start with figuring out the mean, median, and quartile values of whatever data you have. Calculating z-scores and identifying values that are several standard deviations away from the mean is a great next step.
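To make the four steps above a little more concrete, here is a minimal sketch in Python with Pandas. Everything specific in it is an assumption on my part: the tiny made-up table, the column names (`name`, `age`, `score`), the helper functions, the choice to fill gaps with the median, and the z-score cutoff of 1.5. With a real download you would swap the in-memory text for `pd.read_csv` or `pd.read_excel` on your own file, and you would probably pick different cleaning rules.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded dataset. With a real file you would
# call pd.read_csv("your_file.csv") or pd.read_excel("your_file.xlsx").
RAW_CSV = """name,age,score
Ana,19,88
Ben,,92
Cy,21,
Dee,20,90
Eli,22,150
"""


def load_data(text):
    """Data gathering: read CSV text into a DataFrame."""
    return pd.read_csv(io.StringIO(text))


def clean(df):
    """Data cleaning: fill missing numeric values with the column median.

    Median filling is only one possible choice; dropping rows or using
    another estimate may suit your data better.
    """
    out = df.copy()
    for column in ["age", "score"]:
        out[column] = out[column].fillna(out[column].median())
    return out


def add_z_scores(df, column):
    """Statistical analysis: add a z-score column for `column`."""
    out = df.copy()
    mean = out[column].mean()
    std = out[column].std()  # sample standard deviation (ddof=1)
    out[column + "_z"] = (out[column] - mean) / std
    return out


# Exploratory data analysis: size, data types, and missing values.
df = load_data(RAW_CSV)
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Mean, quartiles (the 50% row is the median), min, and max in one call.
cleaned = clean(df)
print(cleaned["score"].describe())

# Flag unusual values. The 1.5 cutoff is an illustrative assumption: a
# five-row sample cannot produce very large z-scores, so a real dataset
# would usually call for a higher threshold.
scored = add_z_scores(cleaned, "score")
outliers = scored[scored["score_z"].abs() > 1.5]
print(outliers)
```

For the plotting part of exploratory analysis, `cleaned["score"].plot(kind="hist")` is a reasonable first look if you have matplotlib installed.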
It can be tough to know how to even begin something as ambitious as learning data science. But with these project ideas, you’ll be well prepared to start learning and testing your skills as you progress.