Data science is easily one of the most exciting, rewarding, and fastest-growing fields today. With so many resources available for learning data science and the supply of data scientists not meeting demand, it’s a great time to pick up the skill set and consider switching careers.
But it can be a little tough to know how to start. This is especially true if you don’t have any coding experience. Since learning to code requires significant time and energy, it’s good to have guidance as to which language is best.
How to Learn Data Science
Like music, cooking, or bear wrestling, you have to learn data science by doing data science. What this means is that you need to spend time solving data problems with the appropriate tools.
I recommend taking a project-based approach which broadly covers data cleaning, analysis, and visualization. Popular data science libraries like sklearn feature built-in datasets with which you can practice doing exploratory data analysis, predictive modeling, and visualization. Or you can always find a dataset somewhere online to work with!
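As a minimal sketch of what a first practice session might look like, the snippet below loads the iris dataset that ships with sklearn and runs a few basic exploratory checks. It assumes scikit-learn and pandas are installed; the variable names (`df`, `means`) are purely illustrative, and this is one possible starting point rather than a prescribed workflow.

```python
from sklearn.datasets import load_iris

# Load the built-in iris dataset as a pandas DataFrame
# (as_frame=True requires pandas to be installed).
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus a "target" column

# Data cleaning check: count missing values across the whole table.
missing = int(df.isna().sum().sum())
print("rows, columns:", df.shape)
print("missing values:", missing)

# Basic analysis: summary statistics for every column.
print(df.describe())

# Compare the average measurements for each species (target class).
means = df.groupby("target").mean()
print(means)
```

From here, a natural next step is to plot a couple of the columns against each other or fit a simple model on the features.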
The most important thing is to find something you’re interested in accomplishing and take a data-centric approach to it.
Which Languages Are Good for Data Science?
If you dig into data science a little bit, you’ll see that there are two major languages and several more minor languages which nearly everyone agrees are the standard toolkit.
Python is by far the most popular language in the industry; I’m a professional data scientist who went to a data science bootcamp, and every single person I know in this field uses Python every day.
R is a popular open source language used in academia, machine learning, and statistical computing. There are quite a lot of books and tutorials written with R, and many of the best books on forecasting have all of their coding examples in R.
SQL, or Structured Query Language, is a must for data science practitioners. SQL is less like Python or R and more of a way of talking to relational databases. It lets you grab the rows and columns of data you need from a database like PostgreSQL or MySQL, perform simple transformations on the data, and otherwise do much of what is considered ‘data engineering’.
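To give a feel for what ‘talking to a database’ looks like, here is a small sketch that runs a SQL query from Python. It uses an in-memory SQLite database from Python’s standard library as a stand-in for a production database, and the `orders` table and its rows are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a real database server.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example.
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 14.5), (4, "carol", 50.0)],
)

# Select only the rows and columns we need, then apply a simple
# transformation: total spend per customer on orders above 15.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 15
    GROUP BY customer
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)

conn.close()
```

The same `SELECT ... WHERE ... GROUP BY` pattern carries over to other SQL databases, though each one has its own dialect quirks.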
Scala is a popular alternative to Python. Being general-purpose, it’s flexible enough to do anything you’ll need in your career as a data scientist, and being open-source, you know there’s a large and robust community around it. This matters a lot when you have questions about a problem, which will be every day.
As its name (short for ‘scalable language’) suggests, Scala is designed to scale, and it is widely used for processing large datasets; Apache Spark, for example, is written in Scala. As ‘big’ data gets even bigger, this will become more important.
Which Language Should I Learn for Data Science?
This depends a lot on what you’re trying to do, but it’s hard to go wrong with learning Python and SQL first. Once you’ve picked those two up, you can consider adding R and Scala. Not only will you be equipped with the languages used by most of the industry, but having already learned two languages will also make learning the other two easier.