The two fields getting the most buzz today in the mainstream press are probably data science and machine learning. You’ll see them both referred to often in articles on business, data security, politics, and concerns about social media. For most of the public, they’re synonymous with Big Brother technology that knows what you want and shapes your decisions. But is that what they’re all about? And what do the terms ‘data science’ and ‘machine learning’ even mean?
It turns out that the two fields are related, and each draws on tools and research from the other, but they are not the same thing. Let’s start with a description of what’s involved in each one.
Data Science: Understanding the World
There is a staggering amount of data created and available today. More data is created in one day than had been available before the twentieth century. Part of the reason is the near ubiquity of computers in a variety of forms. Between laptops, phones, and devices connected via the Internet of Things, there are millions and millions of computing devices taking in and generating data.
Data science is just what its name states: the science of processing and learning from the ecosystem of data. This involves working with math (specifically statistics), computer programming, human behavior, and some subject knowledge of whatever domain the data pertains to.
As a data scientist, you’ll gather data from multiple sources, do some predictive analysis on that data, perform tests to see whether your hypotheses hold up, and present the results in a way that makes sense to stakeholders, possibly through data visualizations. That’s a lot of work across a range of fields, each of which requires a good amount of skill.
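To make that workflow a little more concrete, here is a minimal sketch in Python using only the standard library. It summarizes two groups of numbers and then runs a simple permutation test to check whether the difference between them could plausibly be chance. The data, the function names `summarize` and `permutation_test`, and the page-design scenario are all made up for illustration; a real project would likely lean on dedicated statistics packages.

```python
import random
import statistics


def summarize(values):
    """Descriptive statistics for one group of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def permutation_test(group_a, group_b, trials=10_000, seed=0):
    """Estimate how often a mean difference at least as large as the
    observed one shows up when group labels are shuffled at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / trials


# Hypothetical data: minutes spent on a site under two page designs.
design_a = [12.1, 9.8, 11.4, 10.9, 13.0, 10.2, 11.7, 12.5]
design_b = [13.9, 12.8, 14.4, 13.1, 15.0, 12.6, 14.2, 13.7]

print(summarize(design_a))
print(summarize(design_b))
print("estimated p-value:", permutation_test(design_a, design_b))
```

A small estimated p-value suggests the gap between the two designs is unlikely to be a fluke. Deciding what that means for the business is still the data scientist’s job.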
What do you need to know?
There are a variety of skill sets needed for being a good data scientist. Let’s break them down further.
Programming skills are crucial for managing data sets, running simulations, and a variety of other tasks. The languages of choice are R and Python for writing scripts to sort through data. Other important languages, software packages, and tools to use include MapReduce, Apache Pig, Apache Hive, and Apache Hadoop.
A statistical background is also crucial for data science work. This is the mathematics of working with large numbers and making them understandable and predictable. Specifically, within statistics, you’ll want to have a good understanding of descriptive statistics, probability theory, and Bayesian thinking.
Don’t think the only requirements for being a data scientist are based in the hard sciences. Yes, a programming and math background is important, but a fundamental part of the data scientist’s role is to interpret the data they see. That means using some creative thinking to process, model, and speculate on rows of numbers.
If this weren’t necessary, the processing work could all be done by machines spitting out values. A data scientist looks at human trends and draws inferences about human behavior from the information they see. This requires knowledge about human behavior and the reasoning behind people’s actions.
Today it’s possible to receive an advanced degree in data science. This might be necessary for some positions in the workforce, but certainly not all of them. It’s a new enough field (at least as far as the business world is concerned) that there are a lot of possibilities for education.
One important tool in the data scientist’s toolbox is using machine learning to make predictions about data. So what is machine learning?
Machine Learning: Teaching Computers to Live in the World
Machine learning is the process of having computers learn from data that is given to them. Its goal is to have machines make decisions based on that data with minimal input from humans, saving time and effort. These techniques can also be used for attacking immensely complex problems that would take human minds years to solve. Think of machine learning as a sort of super-powered exoskeleton for the human brain, extending its reach and power.
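What does “learning from data” look like in practice? The sketch below is one small illustration rather than a picture of how production systems work. It starts with no idea of the relationship between two quantities and adjusts two numbers, step by step, until its predictions fit the examples it has been given. The technique is gradient descent on a straight-line model; the function name `fit_line`, the training data, and the learning-rate settings are assumptions chosen for the example.

```python
def fit_line(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start with no knowledge of the relationship
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge both parameters in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated by the rule y = 2x + 1.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 11]

w, b = fit_line(xs, ys)
print(f"learned w={w:.2f}, b={b:.2f}")
print("prediction for x=10:", round(w * 10 + b, 2))
```

Nobody told the program that the rule was “double it and add one”. It should end up very close to that rule purely from the examples, which is the core idea behind much larger machine learning systems.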
So how is this work applied to businesses and organizations today? Lots of ways.
A machine learning algorithm can quickly optimize industrial processes to cut costs, finding small changes that would otherwise take a great deal of computation and coordination across numerous parts. Financial data can be analyzed across large markets and long spans of time to find patterns of human behavior that would be nearly impossible for human analysts to find on their own.
One industry that uses machine learning extensively is advertising on social media. The learning algorithms on a site like Facebook or Instagram gather data on what a user looks at, who they’re connected to, and what their browsing patterns are like.
That data is used to instruct the algorithm as to what would be worthwhile advertising to display. Similar processes are used when Amazon recommends products or when Netflix or Hulu recommends programming based on what you’ve already watched.
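The recommendation idea can be sketched in a few lines. The example below is a toy: it is not how Amazon, Netflix, or Hulu actually build their systems, and the viewers, titles, and function names (`jaccard`, `recommend`) are invented. It scores each unseen title by how much the people who watched it overlap with you in viewing history.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two sets, from 0 (nothing shared) to 1 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(user, histories, top_n=3):
    """Suggest unseen titles, weighted by how similar other viewers are."""
    seen = histories[user]
    scores = Counter()
    for other, titles in histories.items():
        if other == user:
            continue
        similarity = jaccard(seen, titles)
        for title in titles - seen:  # only titles the user has not watched
            scores[title] += similarity
    return [title for title, score in scores.most_common(top_n) if score > 0]


# Hypothetical viewing histories.
histories = {
    "ana": {"Space Docs", "Robot Wars", "Deep Sea"},
    "ben": {"Space Docs", "Robot Wars", "Mars Diaries"},
    "cat": {"Baking Duel", "Deep Sea"},
    "dev": {"Baking Duel", "Garden Hour"},
}

print(recommend("ana", histories))  # ['Mars Diaries', 'Baking Duel']
```

Here “ben” shares the most history with “ana”, so the title only he has watched ranks first. Real systems use far more signals, but the underlying logic of learning from past behavior is the same.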
The number of fields benefiting from machine learning research is vast. Health care, manufacturing, planning, and I.T. have all made use of machine learning to solve complex, sprawling problems.
This sounds like a powerful force: machines that can think and process faster than humans, solving problems that would take us decades. But like the machines humanity has made for millennia, the process is potentially as fallible as the humans behind it. Human modeling and design are a necessary part of any machine learning process. That’s where skilled practitioners of machine learning come in.
What do you need to know?
The skills you need for machine learning are very similar to those needed for data science.
Programming is crucial, as your work will involve making computers take in data and change their behavior in accordance with it. A strong computer science background is good to have, since this involves a lot of work with algorithm analysis and development. Python is a leading language to learn for this work.
Just as with data science, statistics is important for any work in this field. You’ll want to know probability theory and especially Bayesian thinking, which is concerned with how beliefs and decisions should change as new information comes in.
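Bayesian thinking is easier to grasp with numbers in front of you. The sketch below applies Bayes’ rule to a made-up spam filter: it starts from a prior belief that a message is spam and updates that belief each time a new clue arrives. The rates are invented, the helper `bayes_update` is hypothetical, and the two clues are assumed to be independent of each other, which real data rarely guarantees.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(hypothesis | evidence) using Bayes' rule."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


# Hypothetical rates: 20% of mail is spam; a given clue shows up in
# 60% of spam messages and in 5% of legitimate ones.
belief = 0.20
for clue_number in (1, 2):
    # Each clue is assumed independent and to have the same rates.
    belief = bayes_update(belief, 0.60, 0.05)
    print(f"after clue {clue_number}: P(spam) = {belief:.3f}")
```

One clue moves the estimate from 20% to 75%, and a second pushes it to about 97%. That is the “added information” at work: the same starting belief leads to a very different decision once evidence accumulates.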
Since you’re looking at the results of human behavior and modeling it in the digital realm, some background in cognitive psychology can be very useful in your training.
How do Data Science and Machine Learning Connect?
So how do these two fields compare? Generally speaking, data science includes the field of machine learning: it is the broader term covering all work with data, whereas machine learning is the more specific work described above.
The terms ‘data science’ and ‘data scientist’ get used for a variety of jobs. You’ll also see titles like ‘data analyst’, ‘AI expert’, or ‘expert systems’. It’s a bit of a wild west. So when looking at job postings, make sure to drill down into the specifics.
Employment possibilities in both data science and machine learning are growing and show no sign of slowing down. A recent report by IBM states that positions in those fields will increase by 28% by 2020. These jobs currently pay an average of $105,000 for data scientists and $114,000 for machine learning positions. Most of these positions are with finance or IT companies. Clearly, there’s gold to be had. But, as illustrated above, these jobs require a lot of skill and knowledge.
Both data science and machine learning require some knowledge of statistics. If you don’t have a math background, don’t worry. Some coursework or reading on your own can get you up to speed. There are a number of online statistics courses available as well.
A computer science background is also needed for both fields. You’ll specifically want to learn about algorithms, data modeling, databases, and natural language processing. Again, many courses, books, and online tutorials are available to help get you up to speed.
In any of these areas, a coding bootcamp can help you get started on the fundamentals and show you what’s involved in working in these fields. Many bootcamps have an introduction to data science course, and almost all offer coding in Python. Get the data science fundamentals you need through your local bootcamp.