You can hardly open your RSS feed these days without seeing some breathtaking new accomplishment in the field of machine learning. Everything from neural networks to hidden Markov models to Kalman filters is being used to spot tumors in X-rays, tutor students in English, recommend movies, predict the stock market, and drive cars.
This is probably why machine learning engineers are in such high demand. If working on the cutting edge of human knowledge and getting paid handsomely for it sounds good to you, get your start with these machine learning projects for beginners.
Who Should Study Machine Learning?
With the right training and enough work, almost anyone can master the basics of machine learning. Even if you don’t intend to become a professional machine learning engineer, being able to build simple regressors and classifiers will sharpen your math and programming skills while exposing you to a broad range of ideas that could be useful in your work.
This is especially true for software developers, engineers, and data scientists, but I see no reason why web developers and everyone else shouldn’t get in on the fun.
To understand machine learning algorithms you’ll need a non-trivial amount of mathematical knowledge. It’s especially useful to understand the basics of summations, probability, and vector operations.
Surprisingly, you don’t actually need to be that good at programming to do basic machine learning work. Modern machine learning libraries, like scikit-learn (sklearn) or statsmodels, take a lot of really complicated machinery and hide it behind a slick, intuitive interface.
Other than bragging rights there’s not much to be gained by programming the algorithms yourself, unless you’re just really hardcore or you want to focus on becoming a professional machine learning engineer.
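To give a sense of how much those libraries hide, here is a minimal sketch of scikit-learn's fit/predict pattern. The data is synthetic and invented for illustration (a noisy line with slope 3 and intercept 2); nothing about it comes from a real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, made up for this example: y = 3x + 2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))  # 100 samples, 1 feature
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# The least-squares fit happens inside .fit(); we never write the math.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
prediction = model.predict(np.array([[4.0]]))[0]

print(f"slope={slope:.2f} intercept={intercept:.2f} f(4)={prediction:.2f}")
```

Most scikit-learn estimators follow the same create, fit, predict sequence, which is why swapping one model for another usually means changing a single line.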
Projects to Get You Started in Machine Learning
I’m going to assume you’re using sklearn, which ships with some great toy datasets to practice on.
- Start with a housing-price regression. The Boston house-prices dataset was the classic choice, but scikit-learn removed it in version 1.2, so on a current install use the California housing dataset instead. Build a linear regression between one predictor variable (such as median income, or per capita crime rate in the Boston data) and one target variable (housing price). Split your data into a training set and a test set so you can check for overfitting. Once you’ve fitted the model, use the appropriate method to generate housing price predictions. If this is easy, do the same thing again with more predictor variables.
- With the Wisconsin breast cancer dataset, build a logistic regression classifier to predict whether a tumor is malignant or benign. As before, start with one predictor variable and move on to more afterwards.
- Load up the wine dataset and make sure you know how to build some of the other basic classifiers. I’d start with a K-Nearest Neighbors model, and then a random forest classifier. Don’t forget to make a training and test set. Once your model has been built, find out what methods exist for printing out its evaluation metrics.
- Use the iris dataset to try some more sophisticated techniques. Make a K-means clustering model (an unsupervised method, so it never sees the labels), experimenting with different numbers of clusters to see what happens. If you’re feeling ambitious, take a stab at principal component analysis (PCA) to see if you can reduce the number of features your model takes. What happens to how well the clusters line up with the true species labels?
- You could do the same thing with the handwritten digits dataset, or you could use TensorFlow + Keras to see if you can’t build a simple neural network to do the job. Don’t worry, this isn’t as hard as it sounds; there are lots of network architectures online you can use.
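The first four projects can be sketched roughly as follows. This is one possible starting point, not a reference solution: the function names, the fixed seed, the 25% test split, the feature scaling, and the choice of metrics are all my own assumptions. The regression uses the bundled diabetes dataset purely as a stand-in that needs no download, so swap in a housing dataset for the real exercise. The neural network project is left out to keep the example dependent on scikit-learn alone.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer, load_diabetes, load_iris, load_wine
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import adjusted_rand_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 0  # arbitrary, fixed so that runs are repeatable


def regression_scores():
    """Return test-set R^2 for a one-predictor and an all-predictor model."""
    # Stand-in data: the bundled diabetes set, not a housing dataset.
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=SEED
    )
    bmi = [2]  # column index of the body-mass-index feature
    single = LinearRegression().fit(X_train[:, bmi], y_train)
    full = LinearRegression().fit(X_train, y_train)
    return single.score(X_test[:, bmi], y_test), full.score(X_test, y_test)


def tumor_classifier_accuracy():
    """Logistic regression on the Wisconsin breast cancer data."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=SEED, stratify=y
    )
    # Scaling first helps the solver converge; max_iter is a generous guess.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


def wine_classifier_accuracies(verbose=False):
    """Fit K-Nearest Neighbors and a random forest on the wine data."""
    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=SEED, stratify=y
    )
    models = {
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=SEED),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # plain accuracy
        if verbose:
            # Per-class precision, recall, and F1 on the held-out data.
            print(name)
            print(classification_report(y_test, model.predict(X_test)))
    return scores


def iris_clustering_scores():
    """Cluster iris with K-means, before and after reducing to 2 PCA components."""
    X, y = load_iris(return_X_y=True)

    def agreement(features):
        labels = KMeans(n_clusters=3, n_init=10, random_state=SEED).fit_predict(features)
        # Adjusted Rand index: 1.0 means the clusters match the species exactly.
        return adjusted_rand_score(y, labels)

    X_2d = PCA(n_components=2).fit_transform(X)
    return agreement(X), agreement(X_2d)


if __name__ == "__main__":
    print("regression R^2 (one predictor, all):", regression_scores())
    print("tumor classifier accuracy:", tumor_classifier_accuracy())
    print("wine accuracies:", wine_classifier_accuracies(verbose=True))
    print("iris ARI (raw, PCA):", iris_clustering_scores())
```

Because K-means never sees the labels, the clustering is scored with the adjusted Rand index rather than accuracy. Try changing `n_clusters` or `n_components` and watch how that number moves.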
With the ease of modern software packages, building out basic machine learning models isn’t nearly as hard as it once was. Some of the projects in this list would’ve been PhD dissertations 15 years ago, and now they’ll take less than a month if you work diligently.
Such is the power of progress!