Descriptive & Inferential Statistics Normal Distribution
Welcome
What will you learn?
Populations
The Normal Distribution
The Normal Distribution
The Normal Distribution
Checkpoint
Probabilities
Probabilities
Probabilities
Z-Table
Z-Scores
Calculating Z-Scores
Calculating Z-Scores
Calculating Z-Scores
Checkpoint
Summary
Summary
Hello, and welcome to the module on the Normal Distribution.
By the end of this module you should be able to:
● Define population in statistical terms
● Describe the features of the normal distribution
● Calculate probabilities (using the z-table)
● Describe and comprehend the properties of empirical and theoretical probability density functions
● Use probability density functions to determine probabilities
● Define and calculate the standard (z) score and explain its utility
In this module, we will discuss how data may be distributed and how these distributions, through probabilities, relate to the questions we want to answer.
Recall that a population is the complete set of items you are interested in that share at least one characteristic.
For example, if I were interested in the IQ scores of Canadian citizens, the population of interest would be “all Canadians”.
In order to say this is a population, I would need to have every single IQ score.
The values that summarize our population data are referred to as parameters. In this example, the mean Canadian IQ is the parameter of interest.
When thinking about how groups might be the same or different, based on a given parameter, it is important to talk about populations.
If we want to determine how Canadian IQ compares with IQ scores from another geographical region, we would compare these two separate populations. The parameter of interest would be IQ.
Population data are often displayed graphically as distributions of scores, similar to the distributions that we examine in our module about central tendency.
In this module, we will spend a lot of time talking about a special type of distribution called the normal distribution. You might be more familiar with the name ‘bell-curve’.
The normal distribution, whose standardized form is sometimes called the zed-distribution, differs from the distributions we’ve seen so far in a couple of ways.
First, the normal distribution is a density distribution. All the distributions we’ve seen so far have been discrete, meaning they take on only whole number values. A density distribution is continuous instead: it can take on any real value, not just whole numbers. Because of this, it is drawn as a smooth curve, and there is an important rule about the curve of a density distribution: the total area under the curve is always equal to 1. You’ll see why that’s important in a moment.
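To see that area-under-the-curve rule in action, here is a small numerical sketch (ours, not from the module): we approximate the area under the standard normal density with a simple Riemann sum and confirm it comes out to 1.

```python
from math import exp, pi, sqrt

# A numerical sketch: approximate the area under the standard normal
# density curve with a Riemann sum over a wide grid of x values.
def normal_pdf(x, mu=0.0, sigma=1.0):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

n = 200_000
lo, hi = -10.0, 10.0                  # wide enough to capture the tails
dx = (hi - lo) / n
area = sum(normal_pdf(lo + i * dx) for i in range(n + 1)) * dx
print(round(area, 4))                 # 1.0
```

The same check works for any valid density curve: whatever its shape, the total area must be 1.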
So, let’s take a look at a histogram of car prices. As expected, we have price on the x-axis and count on the y-axis.
Now let’s have our computer fit a density distribution to these data.
While at this point it’s not important to understand exactly how this works, the computer has created a curve that fits our data, and made it so the area inside the red curve is equal to one.
Notice that this curve only fits the data approximately - it’s an estimate of sorts. You can see that some bars stick out and others don’t quite reach the line.
Also, note that while the x-axis on this plot is the same, the y-axis now says density instead of count and the numbers are much smaller. For now, I’ll just remind you that density curves are all about area under the curve, and not about absolute height. We’ll get back to this idea shortly.
In the module about Central Tendency and Variability, you were introduced to ways to describe distributions of data using measures of central tendency, variability and shape (like skewness and modality). All of these important concepts can also describe density distributions.
In addition to the fact that a normal distribution is continuous instead of discrete, the second reason the normal distribution is different from distributions we’ve looked at previously is that it’s a theoretical distribution. We don’t derive it from any data that exist in the real world. Instead we calculate it from an equation that looks like this. The notation is complex, but don’t worry, that’s the last you’ll see of the equation itself!
When we draw the curve that represents this equation, it looks like this.
Notice that the normal distribution is symmetrical about the mean, and has the same mean, median, and mode values.
The important point to extract from the formula is that the shape of the curve is determined by two values: the mean, mu, and the standard deviation, sigma.
The purple curve you are seeing now is the normal curve, given a mu value of 0 and a sigma value of 1. This particular version of the normal curve is important and has its own name: the standard normal curve. We’ll leave it up for reference.
Now watch what happens to the curve as we begin to change mu. Notice that the blue distribution shifts away from the purple one but retains the same shape. Now let’s change sigma. Notice that the shape of the curve is changing; it gets wider and flatter as sigma increases, and it gets taller and thinner as sigma decreases. Although the parameters of these curves are different, each of them represents a type of normal density curve.
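The equation on the slide is the normal density f(x) = 1/(σ√(2π)) · e^(−(x−μ)²/(2σ²)). A short sketch (the function name is ours) makes the point about the two parameters concrete: mu only moves the curve, while sigma controls its height and width.

```python
from math import exp, pi, sqrt

# The normal density written out, to show that its shape depends
# only on the two parameters mu and sigma.
def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Peak height is 1 / (sigma * sqrt(2 * pi)): lower and wider for large sigma.
print(round(normal_pdf(0, mu=0, sigma=1), 4))   # standard normal peak: 0.3989
print(round(normal_pdf(0, mu=0, sigma=2), 4))   # doubling sigma halves the peak: 0.1995
print(round(normal_pdf(2, mu=2, sigma=1), 4))   # changing mu only moves the peak: 0.3989
```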
This normal distribution is important and useful when thinking about statistics, for a number of reasons.
First, things all around us can be characterized by values that are normally distributed. Height, weight, and the velocity of ideal gas molecules all follow a normal distribution, with a peak in the centre where the majority of cases occur, and symmetrical tails on either side.
Second, the mean and the standard deviation are an excellent summary of the shape of a normal distribution.
Third, the normal distribution is central to a great many inferential statistical techniques, including most of the techniques you’ll learn about in these modules.
Lastly, the normal distribution has a special relationship to sampling, which we will talk about in the module on Populations and Sampling.
We already saw that it is possible to start with a histogram of our data and generate a density distribution that describes its shape. We can also do the reverse. To work backward, we would start with a theoretical density distribution, like the normal distribution, and examine how well it matches the shape of our data.
Consider this histogram. These data are distributed very similarly to a normal distribution. This allows us to fit a normal curve over the data, without losing too much information. That red curve you see is the normal curve plotted over the actual dataset. Although it doesn’t fit perfectly, it’s very close.
We can also adjust the bin size of the histogram so that it is smaller or larger than what is shown here. Let’s say we make the bin size of the histogram smaller. This allows us to see more precisely where the scores in the distribution lie.
Also note that the more samples we have, the more closely the distribution of our data will match the theoretical normal. We’ll talk more about this important fact later on.
We’ve said that we can use the normal distribution to calculate probabilities. Before we can talk about calculating probabilities, though, we need to define exactly what a probability is. You’ve almost definitely heard the term before and have probably used the concept in your daily life when you wonder things like “what are the chances of this event?” or “how likely does that seem?”
Loosely speaking, probability is the frequency with which something would occur over many repetitions. In statistics, we express probability as a number between 0 and 1, where 0 means something would never occur, and 1 means it would occur every single time.
So, for example, if you flip a coin there are 2 possible outcomes: heads or tails. If you repeated a coin toss 1000 times, you’d expect each side of the coin to appear half of the time, or very close to it. This is expressed as a 1-to-1 ratio, or a 50-50 chance. Stated as a probability with a value between 0 and 1, a single coin toss has a 0.5 probability of coming up on a given side.
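The coin-toss example can be simulated in a few lines. This is an illustrative sketch, not part of the module; the seed and the number of flips are arbitrary choices, made only so the run is reproducible.

```python
import random

# Estimate the probability of heads by repeating a fair coin toss many times.
random.seed(1)  # fixed seed for a reproducible run
flips = [random.choice(["heads", "tails"]) for _ in range(100_000)]
p_heads = flips.count("heads") / len(flips)
print(p_heads)   # very close to 0.5
```

The more repetitions we simulate, the closer the observed frequency tends to get to the true probability of 0.5.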
Density distributions have an area of one under the curve because we use the area to represent probability. Both area under the normal curve and a probability can range between 0 and 1.
So, why do we care about the shape of a distribution?
If we know the shape of a distribution precisely, we can use this information to describe probabilities. We can think about a population distribution as a distribution representing the probability of an event. If that’s the case, we can use the distribution to calculate the probability that a particular event occurs.
Let’s start with the uniform distribution between 0 and 1 as an example. The uniform distribution has constant probability, meaning the probability is the same at all points. It looks like this. Although its shape is different from the normal distribution’s, it is a well-defined density distribution whose area still equals 1. We can calculate the area easily because we know that the area of a rectangle is equal to the length times the width.
Here, x is one and y is one so our shaded area is one.
If we shaded up to x = .5, then the shaded area would be 1 * .5 which equals .5
It’s also true that if we selected a value of x between 0 and 1 at random, we would have a 50% chance of having the value fall in the shaded area, and 50% chance of being in the unshaded area.
That would also be true here. We can confirm that by subtracting .25, our minimum, from .75, our maximum, and seeing that the length of the shaded area is still .5. And .5 * 1 is still .5.
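The rectangle arithmetic above can be written as a small helper. This is our own sketch (the function name is ours); it assumes a uniform distribution between `low` and `high`.

```python
# Probability that a uniform value falls in an interval:
# area = interval length * constant density height.
def uniform_prob(a, b, low=0.0, high=1.0):
    height = 1.0 / (high - low)          # constant density height
    a, b = max(a, low), min(b, high)     # clip the interval to the support
    return max(b - a, 0.0) * height      # length times height

print(uniform_prob(0.0, 0.5))    # 0.5
print(uniform_prob(0.25, 0.75))  # 0.5
```

Both intervals from the narration have length .5, so both give a probability of .5, just as the shaded rectangles did.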
Another way of talking about these probabilities is using percentiles, a finer-grained version of the quartiles discussed in the Central Tendency & Variability module. If we split the range from 0 to 1 into 100 equal parts, we would call those parts percentiles. So a probability of 0.5 corresponds to the 50th percentile. In other words, .5 is higher than 50 percent of the scores in this distribution. Finding what proportion of scores fall below a certain score is the same as identifying a percentile.
Probabilities from the uniform distribution are straightforward to calculate. However, in statistics we rarely work with uniform distributions. The methods we will discuss in the rest of this module will show you how to calculate probabilities using a normally distributed population of scores.
For curves, we can use calculus to find the area between two points in a distribution.
That being said, we don’t have to perform any calculus to understand the concepts in this module. In practice, statisticians use tables or computers to look up pre-calculated values instead of using calculus to calculate probabilities manually every time they do a statistical test.
Look at this distribution, which represents the population distribution of IQ scores for all Canadians. You might want to know where your score lies on this distribution.
Does your score lie above the midpoint?
Below the midpoint?
Is it within one standard deviation of the mean?
To find event probabilities from the normal distribution, like how likely our IQ of 124 is, we consider the area under the curve. We use the area under the curve to find the probability of a set of events from a normal distribution. Fortunately, we can simply use a table to look up the corresponding values, or use software such as Excel.
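For the IQ example, "using software" might look like the following sketch with Python's standard library. The parameters mu = 100 and sigma = 15 are a common convention for IQ scores, assumed here for illustration rather than given in the module.

```python
from statistics import NormalDist

# NormalDist.cdf gives the area under the curve to the left of a value.
iq = NormalDist(mu=100, sigma=15)   # assumed IQ parameterization
print(round(iq.cdf(100), 4))        # 0.5: half the area lies below the mean
print(round(iq.cdf(124), 4))        # area below an IQ of 124: 0.9452
```

So under these assumed parameters, an IQ of 124 sits above roughly 95% of the population.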
This is a small excerpt from a table that allows us to know the area under the normal curve without resorting to calculus.
Our reference table, called a Z table, tells us two things: 1) the probability of getting a score equal to or less than our calculated z-score, and 2) the probability of getting our score or greater [highlight]. Note that these two probabilities sum to 1.
Z tables often only show positive numbers. Because the z-distribution is perfectly symmetric, we can find any negative z value by taking its absolute value and looking up the positive value.
So, when looking up a positive z, the column marked “Larger area” [highlight] is the area less than your value, and the one marked “Smaller area” is the area greater than your z-value.
But, that reverses when you are using negative z values - because we’re using the other side of the distribution!
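The symmetry rule can be verified with a quick sketch (ours, using Python's standard library): the area below −z equals the area above +z, which is why a table of positive values covers every negative z as well.

```python
from statistics import NormalDist

# Symmetry of the standard normal: P(Z <= -z) equals P(Z >= +z).
z = NormalDist()                     # standard normal: mean 0, sd 1
below_neg = z.cdf(-0.39)             # P(Z <= -0.39)
above_pos = 1 - z.cdf(0.39)          # P(Z >= +0.39)
print(round(below_neg, 4), round(above_pos, 4))   # both 0.3483
```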
A z-score is calculated by taking one score, X, subtracting the mean of the population that score came from, μ, and dividing by the standard deviation of that population, σ. In symbols: z = (X − μ) / σ.
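That formula is a one-line helper in code; the function name and the example numbers below are ours, chosen only for illustration.

```python
# z = (X - mu) / sigma: how many standard deviations X lies from the mean.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

print(z_score(115, mu=100, sigma=15))   # one standard deviation above the mean: 1.0
```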
But the table only gives us the values for normal distributions with a mean of 0 and sigma of 1. What do we do when we have normal distributions with different parameters? Although different normal distributions may differ from one another in scale (that is, variance), they share the same overall shape. We can transform a given normal distribution into the standard normal distribution by using some very simple math, as we’ll soon see. The standard normal distribution has a mean of 0 and a standard deviation of 1.
These transformed values are useful when comparing scores to one another in a standardized way. It allows us to extract meaningful information about our normally distributed data without modifying important properties of the data.
Once we identify where our points fall in the standard normal distribution, we can tie this back to our original data. For example, if I tell you a car has a price of 0.5 on the standard normal distribution, this doesn’t tell you much about how much the car actually costs unless we translate it back to the car prices in our original distribution.
Using the z-distributions we just discussed, we can convert between z-scores and probabilities. But, first, we should better understand what a z-score is.
A zed-score measures how far a single observation is from the population mean. A z-score of 0 indicates that the score is exactly equal to the mean. A positive z-score indicates the observed score is above the mean. A negative z-score indicates the score is below the mean.
Zed-scores translate into points on the standard normal distribution. The standard normal distribution has a standard deviation of 1. Therefore, a z-score represents how far the score is from the mean in standard deviations.
So, when we obtain a z score of 1, we are talking about a score that is 1 standard deviation above the mean. You can see here that the probability of obtaining scores between 0 (the mean) and 1 (our score) is 34.1%. The z-score allows us to use a single number to describe how a score relates to a distribution, in standardized units that are easy to understand.
In a standard normal distribution, we have a general rule of thumb called the ‘empirical rule’ that describes what proportion of the scores fall in specific ranges. The ranges are measured in standard deviations, as you can see on this graph. Note that this graph shows only 3 standard deviations on either side of the mean. Any point that falls more than 3 standard deviations from the mean is typically considered an outlier.
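The empirical rule is usually quoted as 68-95-99.7. We can check those rounded figures against the exact normal areas with a short verification sketch (ours, not part of the module's slides):

```python
from statistics import NormalDist

# Area within k standard deviations of the mean, for k = 1, 2, 3.
z = NormalDist()
for k in (1, 2, 3):
    inside = z.cdf(k) - z.cdf(-k)   # P(-k <= Z <= +k)
    print(k, round(inside, 4))      # 0.6827, 0.9545, 0.9973
```

The exact areas round to 68.27%, 95.45%, and 99.73%, which is where the rule of thumb comes from.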
There are a few ways we could describe how a score differs from a population or population mean:
How many standard deviations away from the mean is your score?
With respect to the entire population, what is the probability of obtaining either your score or a lower score?
What is the probability of obtaining your score or higher?
What proportion of observations lie between your score and the mean?
You may have noticed that each of the questions involves transforming your score into a standard measure that is common across distributions. We will examine each of these questions by using z-scores in the following example.
To illustrate how z-scores can help us draw conclusions about our distributions, we will refer back to our cars dataset. Let’s imagine we are contracted to design a new bridge for cars to cross a river. As an engineer, you would probably be very interested to know how large the wheelbases are for the different vehicles that will be traversing the bridge once it is built.
If we extract the data pertaining to wheel base measurements of each car, we see that the scores for this variable are normally distributed.
We can draw a normal curve over the histogram of wheelbase data to better visualize the distribution for this population of cars.
This population has a mean wheel base of 108.2 inches and a standard deviation of 8.3 inches.
Now that we know about the population, let’s say we want to know the z-score for the Honda Accord LX, for which the wheelbase is 105 inches.
Let’s think about what we know about this particular score. We can see that the Accord wheelbase is below the mean for our data, but it is still a very probable score.
Now, using the z equation we just learned, we can fill in the values that we know: the score, X, for the Honda Accord LX is 105, the population mean, or mu, is 108.2, and the standard deviation, or sigma, is 8.3. This gives us a z-score of -0.39.
Therefore, our z-score tells us that the wheel base of 105 inches is 0.39 standard deviations below the mean of a normally distributed population whose mean is 108.2 inches and whose standard deviation is 8.3 inches.
Given a zed-score of -0.39, how do we figure out the proportion of cars that have a wheel base smaller than the Honda Accord LX?
After calculating a z-score, we consult the look-up table for standardized z-scores to answer: what is the probability of obtaining that value, or less?
We can see that graphically here.
Because the value is below the mean, the zed-score is negative. That means we will have to look at the smaller portion of the curve and, thus, the smaller area on the table. The larger portion of the curve would give us the probability of obtaining a z-score of -0.39 or more. Knowing this, we can look up our z-score in the table and find the value.
Here, the probability of obtaining a zed-score of -0.39 or less is 0.3483. Our original value of 105 inches is only slightly below the mean in our distribution, which is 108.2 inches.
Recall how we can think about this probability in terms of percentiles. Our score would be at a percentile of 34.8. In other words, about 35 percent of scores in the distribution will be lower than this score, which is to say that approximately 35% of the cars in our population have a smaller wheelbase than the Honda Accord LX.
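The whole worked example can be reproduced in code. This is a sketch using Python's standard library in place of a printed z-table; the numbers come straight from the narration.

```python
from statistics import NormalDist

# Wheelbase population: mean 108.2 in, sd 8.3 in; Honda Accord LX: 105 in.
mu, sigma, x = 108.2, 8.3, 105
z = (x - mu) / sigma                 # the z-score formula
p = NormalDist().cdf(z)              # area to the left: P(Z <= z)
print(round(z, 2))                   # -0.39
print(round(p, 4))                   # close to the table value of 0.3483
```

The exact area (about 0.35) differs slightly from the table's 0.3483 because the table rounds the z-score to two decimal places before looking it up.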
You can also use the z-scores of two separate values to estimate the probability of obtaining a score that lies between those two values.
Let’s say I want to find the probability that a car has a wheelbase between a Honda Accord’s and the mean. The Honda Accord has a zed-score of -0.39, and the mean has a zed-score of 0.
We can see it graphically here.
The trick here is to find the larger value and subtract the smaller value. We know that the mean of zero must be a probability of .5, since it’s the middle of the distribution. And we know that -.39 will be below that, so let’s subtract the values we get from our table.
0.5 - 0.3483 = 0.1517
Therefore, the proportion of scores that fall between z-scores of 0 and -.39 is 0.1517, and this is also the probability. Equivalently, 15.17% of cars have a wheelbase between that of a Honda Accord and the mean wheelbase of cars in our dataset.
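The "subtract the smaller area from the larger" trick looks like this in code (a sketch using the rounded z-score from the example):

```python
from statistics import NormalDist

# Probability of landing between two z-scores: difference of the two areas.
z = NormalDist()
between = z.cdf(0) - z.cdf(-0.39)   # P(-0.39 <= Z <= 0)
print(round(between, 4))            # 0.1517
```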
Let’s summarize by bringing all of these ideas together.
Here we have two distributions. The one on the left is our standard normal distribution, with a mean of 0 and a standard deviation of 1. The one on the right is a different population, with a mean of 15 and a standard deviation of 2. Recall that we normalize our data and compare the data to the standard normal distribution, because we can easily access the probabilities in our normal distribution using the z-table.
In this animation, x is a single score that we are comparing to our entire population. By modifying mu in the population on the right, we can see how the population distribution changes along a continuum. If we modify sigma, we can see how this makes the population distribution steeper or shallower. As we do this, the zed-score stays the same.
However, if we change the value of our observed score, x, but keep mu and sigma the same, we see how this changes the z-score in the standard normal distribution on the left. We also see how changing x and, therefore, changing our z-score, modifies the area under each portion of the curve. These areas are shown in light and dark blue. This demonstrates the correspondence between our population distribution and our standard normal distribution. We can see how modifying x, our single score of interest, changes both our z-score and the proportions of the two areas under the curve, accordingly.
To conclude, in this module, we discussed populations and described the features and importance of normal distributions.
We discussed the power of probability density functions
We also defined the standard zed score, and explained how to use a z-table to determine the probability of obtaining scores in a given dataset
Finally, we interpreted what our calculated z-score means in terms of the overall population.