Data science is one of the most attractive career options of the past couple of years, with people from all walks of life transitioning into positions that combine analysis, statistics, machine learning, programming, and computer science to draw insights out of numbers.
If you’ve been looking into data science, you probably have some questions. There’s a lot of confusion surrounding what data science is and what it entails, how it differs from fields like machine learning, and what the journey to becoming a data scientist involves.
One common source of misunderstanding is just how data science differs from statistics, with which it has much in common.
We’re going to address this issue right now.
What Is Statistics?
Statistics is the field concerned with how to properly collect, process, and draw conclusions from data.
It’s a highly interdisciplinary field that at times resembles mathematics and at times philosophy, and it influences everything from psephology (the analysis of election numbers) to betting on baseball games to investing in the stock market.
There are a number of important ideas that have made their way into the popular imagination via statistics, including the difference between risk and uncertainty and the consequences of rare, high-impact ‘black swan’ events.
What Is Data Science?
Data science is a field lying at the intersection of computer science, programming, and statistics. Its focus is on using tools like cloud computing, parallel processing, and machine learning to guide business decisions and otherwise make sense of the world.
The term ‘data science’ is notoriously hard to pin down, and the problems a data scientist solves depend enormously on where they work. A lot of what I do is building financial models to spot trends in the price of cryptoassets and seeing if these prices correlate with other time series data. I have a friend whose entire job is building different kinds of neural networks with TensorFlow and Keras, and another who uses so-called ‘unsupervised learning’ algorithms to find hidden patterns in legacy data.
But we’re all considered data scientists.
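To make the first of those examples a little more concrete, here is a minimal sketch of what ‘seeing if prices correlate with other time series data’ can look like. It is an illustration under stated assumptions, not my actual workflow: the numbers are made up, the names `pct_changes` and `pearson` are hypothetical helpers written for this post, and comparing period-over-period changes rather than raw prices is one common choice among several.

```python
import math


def pct_changes(series):
    """Convert a list of prices into period-over-period fractional changes."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative numbers, not real market data.
asset_price = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
other_series = [50.0, 51.5, 51.0, 53.0, 54.5, 54.0]

# Correlate the changes, since raw prices that both trend upward
# can look correlated even when they are unrelated.
r = pearson(pct_changes(asset_price), pct_changes(other_series))
print(f"correlation of returns: {r:.3f}")
```

A value near 1 or -1 suggests the two series tend to move together or in opposite directions; a value near 0 suggests no linear relationship. Either way, correlation alone says nothing about cause.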
How Are Statistics and Data Science Different?
As the previous two sections suggest, data science and statistics have a lot in common but are ultimately different fields.
Statistics is primarily a theoretical discipline that builds tools for making sense of data and acting under uncertainty. Data science is a broader term covering the application of techniques from statistics, computer science, and programming to practical problems, such as guiding business decisions.
Hopefully, with this distinction in mind, you’ll be one step closer to becoming a data scientist (or at least less confused in conversations about data science)!