Statistics is one of those fields that everyone vaguely knows they should learn more about. You can hardly open a newspaper, read a nonfiction book, or turn on the TV without encountering someone citing one statistic or another. In some unfortunate circumstances the same statistic is tossed around by people on opposing sides of an issue.
Statistics plays a role in political debates ranging from gun control to immigration to climate change, in economic arguments about the effect of monetary policy, in questions about the usefulness of drugs in psychology, and almost everywhere else.
For this reason, a working knowledge of statistics has become of paramount importance to anyone wanting to navigate the modern world.
But for all that, such an understanding is distressingly rare. Indeed, it’s fair to say that most people couldn’t give a coherent definition of statistics, or distinguish it from related disciplines like probability and data science, or tell when statistics is being abused for the purpose of deception.
This piece will try to take a few steps towards alleviating this issue among Career Karma’s audience. We’ll begin by trying to define statistics while separating it out from fields with which it’s commonly confused. Then we’ll enumerate the many reasons statistics is important, offer some comments on how to spot deception when statistics are being used, and finish with a guide to learning more about this important subject.
It’s easy to imagine that statistics is a dry subject to be relegated to dusty old books that only math nerds read. But it’s actually exceptionally lively, with more schools of thought, heated debates, and long-standing rivalries than you’ll find outside of Game of Thrones.
So let’s explore what statistics has to offer.
There are a few different ways of defining statistics. According to the statistics department at the University of California at Irvine, statistics is “…the science concerned with developing and studying methods for collecting, analyzing, interpreting and presenting empirical data.”
Each part of the definition is important, so let’s examine each in turn:
Developing and studying methods
As a field, statistics contains many different techniques for processing and understanding data, with new ones being developed all the time. If your knowledge of statistics comes entirely from one or two classes in college, it might be easy to miss how active statistics is.
Collecting data
You might imagine that collecting data would be relatively straightforward, but it can be unexpectedly tricky. Say that you want to study happiness, because you’re interested in whether people are happier when they live in places with more sunshine. In order to do that you need to come up with a way of measuring happiness. If you want to go the easy route and just ask people on a survey, then you need to come up with some way of making sure people mean the same thing by ‘happiness’. If you don’t, you’ll wind up with misleading results.
Analyzing data
When most people think about statistics, what they usually imagine is the ‘analyzing data’ part. It’s true that a lot of the meat of statistics consists of activities like determining whether two groups of people are really different from each other, and this is almost always done with some kind of statistical test.
Interpreting data
All the data and fancy math in the world doesn’t do much good if you don’t understand what any of it means. Interpreting the results of statistical tests is almost as important as correctly running them in the first place. This is even more true when you are working with non-technical people who need statistical information to make good decisions in business or in life.
Presenting data
There’s an entire subfield of statistics concerned with visualizing results. I’m constantly amazed at how difficult it can be to create charts and graphs that are accurate and don’t lend themselves to being misinterpreted.
How Are Statistics and Probability Different?
Statistics and probability are commonly confused, but they’re actually distinct subjects. Probability addresses the problem of determining how likely something is to occur. In simple cases where every outcome is equally likely, this is done by dividing the number of ways an event can occur by the size of the set of all possible outcomes (called the ‘sample space’, or ‘possibility space’, if you want to impress your friends).
To illustrate: if you roll two dice, what is the probability of getting the same number on both dice? First, we enumerate the ways in which we can get the same number on two dice; I count six: one-one, two-two, three-three, four-four, five-five, six-six. Second, we figure out how big the possibility space is. With two six-sided dice there are 6 × 6 = 36 possible outcomes.
So the probability of getting the same number on two dice is 6/36, or 1/6.
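If you’d rather let a computer do the counting, here is a minimal Python sketch of the same calculation. It assumes two fair six-sided dice, and the variable names are my own, chosen just for illustration.

```python
from fractions import Fraction
from itertools import product

# Every ordered outcome of rolling two six-sided dice: (1, 1), (1, 2), ..., (6, 6).
outcomes = list(product(range(1, 7), repeat=2))

# The outcomes we care about: both dice show the same number.
doubles = [(a, b) for a, b in outcomes if a == b]

# With equally likely outcomes, probability = favorable outcomes / all outcomes.
probability = Fraction(len(doubles), len(outcomes))

print(len(doubles), len(outcomes), probability)  # 6 36 1/6
```

Using Fraction keeps the answer as an exact 1/6 rather than a rounded decimal.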
This is kind of a toy example, but it’s still helpful. Think about it until it makes sense, then reread the definition of statistics provided above. It should be clear that while probability and statistics are related, they aren’t the same thing.
How Are Statistics and Data Science Different?
It’s also pretty common to confuse statistics and data science, especially because data science is a famously slippery concept. In broad terms, data science is a field which combines statistics, machine learning, computer science, and programming for the purpose of drawing actionable insights out of data. It can be applied to problems like:
- Teaching computers to correctly diagnose patients from X-rays.
- Building algorithms to trade stocks in financial markets.
- Recommending books, movies, and even perfumes based on other things you like.
Data science relies heavily on statistics for its analytical methods, but they’re not the same thing.
The Importance of Statistics
There are several reasons everyone should have some understanding of statistics and how it works.
Statistics in Politics
Statistics is used almost constantly by politicians and by the commentators who discuss political activity. If one politician wants to raise the minimum wage, they’ll probably cite figures to the effect that the average wage doesn’t cover the expenditures required for living comfortably today. If another wants to prevent the raise, they’re likely to talk about the statistical relationship between minimum wage raises and unemployment.
Statistics in Health
Does eating red meat contribute to cancer? Does drinking red wine contribute to longevity? For any credible diet or exercise routine there’ll be a mountain of information and corresponding statistical claims that will need to be scrutinized. Often it is difficult to draw firm connections between activity A and outcome B. There were bitter debates throughout the 20th century on whether or not smoking causes cancer, for example.
Statistics in Education
It’s well known that a good education is key to a good future. But as with everything else we’ve discussed, figuring out what exactly works in an educational setting can be tough. If we’ve used a new teaching strategy with one group of kids and their test scores are better than another group’s, can we be sure it’s the strategy causing the difference? Perhaps it’s just that we’ve accidentally put the smarter children in one group, and when we reshuffle the groups the effect will go away.
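The reshuffling idea can actually be tried out on a computer. What follows is a rough sketch of a permutation test, which is one common way of doing it and not the only one. The test scores are made up for illustration, and the function name permutation_p_value and its parameters are my own invention.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Share of random regroupings whose gap in means is at least the observed gap."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_shuffles):
        # Reshuffle the children into two new groups of the original sizes.
        rng.shuffle(pooled)
        new_a = pooled[:len(group_a)]
        new_b = pooled[len(group_a):]
        if abs(mean(new_a) - mean(new_b)) >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical test scores, invented purely for this example.
new_strategy = [78, 85, 90, 72, 88, 95, 81, 84]
old_strategy = [70, 75, 82, 68, 79, 74, 77, 73]

p = permutation_p_value(new_strategy, old_strategy)
print(f"Share of reshuffles with a gap this large: {p:.3f}")
```

If random regroupings rarely produce a gap as big as the one we saw, chance alone is a poor explanation for it. Note that this still says nothing about whether the groups were comparable to begin with.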
Statistics in Machine Learning
Given that machines will soon be matching us with date partners, setting up the meeting time, driving us to the restaurant, and paying the tab, it’s good to at least be acquainted with how all this is going to work under the hood. Machine learning and AI become a lot less mysterious when you realize they’re usually based on statistics.
To my mind, there’s no escaping the fact that statistical knowledge is rapidly becoming like the ability to read or use the internet: an indispensable skill without which living in the modern world becomes vastly more difficult.
A Brief Guide on Not Being Lied to with Statistics
I could build an entire writing career out of tackling this one subject, so the advice in this section will necessarily have to be brief. What I want to do is mostly to give you a high-level sense of the ways in which statistics can be misused, intentionally or not.
Beware of measures of central tendency
You might think that the concept of the ‘average’ is pretty straightforward, but it can conceal a lot of information. For example, consider that there are at least two ways to accurately say ‘the average height in that group is six feet five inches’. You might have a bunch of people who are almost all exactly that height. Or you might have a group of people who are much shorter, with one person who is eight feet tall. The average is the same in both cases, but the groups look nothing alike. When you hear a claim like ‘philosophy majors earn more over their lifetimes than people studying other subjects’, make sure this isn’t just because there are a few billionaires who studied philosophy in college.
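Here is a small Python sketch of the height example. The heights are invented (in inches, so 77 is six feet five), and comparing the mean with the median is my suggestion for one simple way to catch this kind of thing.

```python
from statistics import mean, median

# Hypothetical heights in inches; 77 inches is six feet five.
uniform_group = [77, 77, 77, 77, 77]
skewed_group = [72, 72, 72, 73, 96]  # one person is eight feet (96 inches) tall

for name, heights in [("uniform", uniform_group), ("skewed", skewed_group)]:
    # The mean is pulled upward by the outlier; the median is not.
    print(name, "mean:", mean(heights), "median:", median(heights))
```

Both groups have a mean of 77 inches, but the median of the second group is only 72. When a mean and a median disagree this much, a few extreme values are probably doing a lot of the work.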
Remember that causality is hard to establish
Verifying that A caused B is famously difficult, but that doesn’t stop people from structuring their arguments in such a way that this claim is quietly smuggled in. Just because two things are correlated doesn’t mean one caused the other.
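To see how two things can be strongly correlated without either causing the other, here is a toy simulation. Everything in it is an assumption made up for illustration: a hidden third factor (temperature) drives both ice cream sales and swimming accidents, and the numbers are arbitrary.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# The hidden common cause: daily temperature.
temperature = [rng.uniform(10, 35) for _ in range(500)]

# Both quantities depend on temperature plus noise, and not on each other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
swimming_accidents = [0.5 * t + rng.gauss(0, 2) for t in temperature]

r = pearson(ice_cream_sales, swimming_accidents)
print(f"correlation: {r:.2f}")
```

The correlation comes out strongly positive even though, by construction, ice cream has no effect on swimming accidents. Real data rarely announces its hidden third factor this clearly.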
Sometimes differences are just due to chance
As I noted in an earlier section, gathering valid statistical data can be tough. Part of the reason for this is that the world is a big place and lots of random stuff happens in it. Imagine you want to test a new drug for treating depression. The obvious thing to do is to get two groups of people who are depressed and give the drug to only one group, then see if this group improves more than the one not given the drug. One of the problems here is that it is extremely easy to end up, purely by chance, with two groups that differ in a subtle way. We might not care that one group contains more women than men, for example, unless it turns out that susceptibility to depression differs between men and women, in which case the imbalance alone could produce a difference in outcomes. As with the previous point, you have to remember that randomness contributes a lot to where people end up in life.
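How easy is it to get lopsided groups by chance? Here is a rough simulation, assuming a hypothetical study of 20 women and 20 men split at random into two groups of 20. The function name imbalance_rate, the group sizes, and the threshold of four are all my own choices for illustration.

```python
import random

def imbalance_rate(n_women=20, n_men=20, min_gap=4, n_trials=10_000, seed=1):
    """Share of random splits where the two groups differ by at least min_gap women."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    people = ["W"] * n_women + ["M"] * n_men
    half = len(people) // 2
    lopsided = 0
    for _ in range(n_trials):
        # Randomly assign half of the participants to receive the drug.
        rng.shuffle(people)
        women_treated = people[:half].count("W")
        women_control = n_women - women_treated
        if abs(women_treated - women_control) >= min_gap:
            lopsided += 1
    return lopsided / n_trials

rate = imbalance_rate()
print(f"Share of random splits with a gap of 4+ women: {rate:.2f}")
```

With these numbers, roughly a third of random splits leave one group with at least four more women than the other. Nobody did anything wrong in those cases; that is simply what chance looks like in small groups.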
It’s crucial to maintain a skeptical stance when you’re being given statistical justifications for an argument. Whatever the issue, it’s up to you to judge the facts for yourself, keeping an eye out for anything that doesn’t look right.
Learning More about Statistics
If you want to extend your knowledge even further, here are some resources for continuing your statistics education.
- Khan Academy was one of the first major online learning platforms, and it deserves its sterling reputation.
- Coursera has a number of stats courses, with varying degrees of specificity and focus on programming.
- If you don’t mind learning R at the same time, consider a DataCamp course.
- There is a near-infinite number of books on statistics. I like “The Elements of Statistical Learning”, “Think Stats”, and “All of Statistics”.
There’s a lot that can be said about statistics. Hopefully this short piece has convinced you of the subject’s importance and motivated you to learn more.