Hello, and welcome to the second of two modules on t-tests.
Before we begin, let’s review a few key concepts from the previous module:
1. The sampling distribution of the mean is the distribution of sample means over repeated sampling from the population.
2. The standard error of the mean is the standard deviation of the sampling distribution of the mean.
3. A t-distribution is a sampling distribution of the t-scores when the null hypothesis is true.
4. Degrees of freedom (df) are an adjusted value of the sample size, often equal to N-1.
5. A confidence interval is an interval that has a specified probability of including the parameter being estimated. Often this parameter is the true population mean.
6. The effect size (as characterized by Cohen’s d) is a measure of how big an effect is in terms of units of standard deviation.
Recall the paired t-test we learned in the previous module.
In the example, we compared the fuel efficiency of 28 Toyota cars when driven in the city versus on the highway. The city and highway data came from the same cars, so they were correlated. That is, the cars that were fuel-efficient in the city were also likely to be fuel-efficient on the highway. We paired the city and highway data by car and calculated the difference score for each pair of fuel efficiency measures. By doing so, the variability in efficiency due to individual idiosyncrasies was canceled out, so any remaining pattern could no longer be explained by these individual differences. Instead, we can assume that the pattern we see is due to the driving location itself. We then conducted a paired t-test using only the difference scores, testing against the null hypothesis that the mean difference was zero.
We calculated the observed t-score using the following formula: t = D_bar / S_Dbar, where D_bar is the mean of the difference scores and S_Dbar = S_D / √N is the standard error of the mean difference.
We then compared the observed t-score to a critical t-value to determine whether we should reject the null hypothesis.
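If you would like to see this as code, here is a minimal Python sketch of the paired test, using hypothetical city and highway MPG values for three cars (the module's actual data are not reproduced here); scipy's ttest_rel carries out the same test.

```python
# A minimal sketch of the paired t-test, assuming hypothetical data.
import numpy as np
from scipy import stats

city = np.array([22.0, 25.0, 20.0])     # hypothetical city MPG, one value per car
highway = np.array([30.0, 33.0, 27.0])  # hypothetical highway MPG, same cars

diff = highway - city                       # difference score for each pair
se = diff.std(ddof=1) / np.sqrt(len(diff))  # standard error of the mean difference
t_obs = diff.mean() / se                    # t = D_bar / S_Dbar

t_check, p = stats.ttest_rel(highway, city) # scipy computes the same t-score
print(t_obs, t_check, p)
```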
In “t-tests part 1”, we learned how to use the paired t-test to compare two means that come from related groups. In “t-tests part 2”, we will learn about the unpaired t-test and use it to compare two means that come from independent groups. You need to learn both part 1 and part 2 to properly understand t-tests. Please make sure you have done the part 1 module before beginning part 2, and when you finish this module we strongly recommend that you revisit part 1 to put the whole picture together.
By the end of t-tests part 2 you will be able to:
Understand the differences between independent and related samples
Conduct an unpaired t-test
Calculate the confidence interval and the effect size for an unpaired t-test
Identify pros and cons of using related samples
Identify the conditions necessary to conduct any t-test (be it one-sample, paired, or unpaired)
Using related samples has some advantages, but often we need to compare two groups where none of the observations in the first group are related to observations in the second group in any obvious way.
For example, we may want to compare the fuel efficiency of two separate groups of cars from different manufacturers. Knowing the fuel efficiency of a car in group 1 doesn’t tell us anything informative about the fuel efficiency of any car in group 2. We call these two groups of samples “independent samples”.
Independent samples are very common in research. Examples of experiments that might employ an independent-samples design include studies of helping behaviour in boys versus girls, memory retention in younger versus older adults, symptom improvement in patients receiving a pill versus a placebo, and statistical savvy of students taught using one pedagogical method versus another.
When the grouping variable is a predetermined category like manufacturer, gender, or age, it usually follows that the groups are mutually exclusive and consist of different, unrelated individuals.
When the grouping variable is an experimental condition, like medical treatment or learning method, researchers often randomly assign individuals to different conditions, so the samples are independent between conditions. Using independent samples prevents the effects of one experimental condition from carrying over to the other condition.
For example, you cannot teach the same group of students statistics one way, then another way, and compare which one was more effective.
To compare the means of two independent samples we can conduct an unpaired t-test, also known as an independent-samples t-test. The unpaired t-test differs from the paired t-test in that there is no appreciable correlation between the observations in the two groups. That is, knowing the values in one group doesn’t tell us anything about the values in the other group. Importantly, we’re going to be looking at differences between the two samples as a whole, instead of differences between specific pairs. Because of this, the unpaired t-test involves different calculations than those used for the paired t-test.
--
* Mercedes-Benz photo is used under Creative Commons License from: http://media.daimler.com/dcmedia-ca/0-981-710736-1-849530-1-0-3-0-0-1-13006-0-0-3842-0-0-0-0-0.html?TS=1441820887980
* Chevrolet photo is used under Creative Commons License from: http://media.chevrolet.com/media/us/en/chevrolet/press_kits.detail.html/content/Pages/news/us/en/2015/feb/chicago/0212-equinox-2016.html
Let’s use the car example to demonstrate the unpaired t-test. Suppose you want to purchase a car and have narrowed your options to either a basic Chevrolet or a more luxurious Mercedes-Benz. You drive in the city often and saving money on gas is a big concern for you. As much as you dream of owning a Mercedes, you think a Chevrolet may be more sensible not only because it’s cheaper, but also because you’ve heard that Chevys are more fuel-efficient than Mercedes.
In order to compare the two manufacturers based on fuel efficiency, the dealer pulls together the miles per gallon (MPG) data of 27 Chevrolet cars and 26 Mercedes-Benz cars sold within the previous year. You want to quickly do some analysis of these data in order to confirm that Chevrolet is in fact more fuel-efficient than Mercedes, which would justify your decision to buy a Chevy. If, however, you find that Chevrolet is less fuel-efficient than Mercedes, then perhaps the high price of a Mercedes is justifiable because of reduced fuel costs in the long run. Here, the Chevrolet and Mercedes samples come from two separate populations, and the two samples are unrelated to each other. The fuel efficiency of any Chevrolet car does not tell us anything about the fuel efficiency of any Mercedes-Benz car, so it wouldn’t make sense to pair any Chevrolet with any Mercedes to compare fuel efficiency between car types. We want to infer whether Chevrolet cars as a whole differ from Mercedes cars in their average fuel efficiency. Therefore, our research question is: when driving in cities, do Chevrolets and Mercedes have different fuel efficiencies, as measured by miles per gallon? We can conduct an unpaired t-test to compare the two population means.
First, let’s convert the research question to null and alternative hypotheses. Let’s denote the mean MPG of the Chevrolet population as μ1, and the mean MPG of the Mercedes population as μ2. One way to set up the null hypothesis is μ1 – μ2 = 0, or equivalently μ1 = μ2. In this case we are doing a two-tailed test against the null hypothesis that, on average, the Chevrolet MPG is not different from the Mercedes MPG. Correspondingly, our alternative hypothesis would be μ1 – μ2 ≠ 0, or μ1 ≠ μ2. We choose a two-tailed test because we care about the possibility that Chevrolet differs from Mercedes in either direction: Chevrolet could be more fuel-efficient or less fuel-efficient. However, if we were only interested in whether a Chevrolet is more fuel-efficient than a Mercedes, then we could do a one-tailed test with the null hypothesis μ1 – μ2 ≤ 0, or μ1 ≤ μ2. That is, Chevy is NOT more fuel-efficient than Mercedes. We would test against this null hypothesis in support of the alternative hypothesis μ1 – μ2 > 0, or μ1 > μ2. However, for the purposes of this example we will do a two-tailed test.
The data for MPG in the city come from the Chevrolet sample, where N1 = 27, and from the Mercedes-Benz sample, where N2 = 26.
First, before we get to our statistical test, let’s visualize the data using a bar graph. Car manufacturer is plotted on the x-axis, and fuel efficiency in miles per gallon is on the y-axis. The height of each bar represents the mean MPG of the group. The vertical bars are the error bars, which represent one standard error of the mean both above and below the mean. Recall that standard error of the mean is calculated using standard deviation divided by the square root of N.
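If you want to draw this kind of graph yourself, here is a sketch using Python's matplotlib; the means and standard errors below are placeholders, since the raw MPG values are not reproduced in this transcript.

```python
# A sketch of a bar graph with +/- 1 SEM error bars, assuming placeholder values.
import matplotlib.pyplot as plt

groups = ["Chevrolet", "Mercedes-Benz"]
means = [24.0, 21.7]  # placeholder group means (city MPG)
sems = [1.0, 1.0]     # placeholder standard errors of the mean

plt.bar(groups, means, yerr=sems, capsize=8, color=["red", "blue"])
plt.xlabel("Car manufacturer")
plt.ylabel("Fuel efficiency (MPG)")
plt.show()
```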
Judging from the bars, the Chevrolet MPG seems higher than the Mercedes MPG, but is this difference large enough to be statistically significant?
In any t-test, to determine whether a difference is large enough to be considered statistically significant, we need to compare it to the standard error, which is a measure of variability. If there is no true difference between the population means, then the observed difference is simply due to chance, and it will tend to be small relative to the spread of the data. Referring back to our t-score formula, we can see that if the numerator is small and the denominator is large, the t-score will be close to zero. Alternatively, if the t-score is large, then it’s likely that there is a real difference between the population means.
The t-formula for different t-tests may appear to be quite different, but it always has the same basic layout: t equals the difference between the sample mean (our statistic) and the population mean (our parameter) given by the null hypothesis (H0), divided by the estimated standard error of the sample mean. Schematically, t = (statistic – parameter under H0) / estimated standard error.
The one-sample t-test has the following formula: t = (Xbar – μ_Xbar) / S_Xbar, where Xbar is the mean of the single sample, μ_Xbar is the population mean specified in the null hypothesis, and the denominator S_Xbar = S / √N is the standard error of the sample mean.
For related samples, we take the difference between each pair and deal with one set of scores, the difference scores. We compare the sample mean difference, D_bar, to the population mean difference in the numerator. If the null hypothesis is true, then the population mean difference is zero, so the t-formula reduces to t = D_bar / S_Dbar: D_bar over the standard error of the mean difference.
For independent samples, we take the difference between the sample means and compare it to the difference between the population means. This difference is zero under the null hypothesis, so the numerator collapses to X1bar – X2bar, and the formula becomes t = (X1bar – X2bar) / S_(X1bar – X2bar), where the denominator is the standard error of the differences between sample means.
Why is the standard error presented this way? Let’s take a closer look.
Suppose we have two population distributions. The first population is the distribution of variable X1, and the second population is the distribution of variable X2. Population X1 has a mean of μ1 and a variance of σ1^2. Population X2 has a mean of μ2 and a variance of σ2^2. If we repeatedly sample N1 observations from population X1 and calculate the sample mean X1bar each time, then we can plot the sampling distribution of the sample mean X1bar.
According to the Central Limit Theorem, the sampling distribution has a variance that’s equal to σ1^2 over the sample size N1. Similarly, the sampling distribution of the sample mean X2bar has a variance that’s equal to the population variance σ2^2 over the sample size N2.
Now suppose we take a sample from the first population and a sample from the second population, and calculate the difference between the two sample means, X1bar – X2bar. If we repeatedly do so, we will obtain a distribution of the differences between means, that is, the sampling distribution of (X1bar – X2bar). The mean of this distribution will be equal to μ1-μ2, the difference between the two population means.
What’s the variance of this distribution? According to the Variance Sum Law, the variance of the difference of two independent variables equals the sum of their variances.
The variance of the sampling distribution of X1bar is (σ1^2)/N1, and the variance of the sampling distribution of X2bar is (σ2^2)/N2, so the variance of the sampling distribution of (X1bar – X2bar) is (σ1^2)/N1 + (σ2^2)/N2.
Take the square root of this and we will have the standard deviation of the sampling distribution of (X1bar – X2bar). The standard deviation of the sampling distribution of the differences between means is therefore √((σ1^2)/N1 + (σ2^2)/N2). Remember that the standard deviation of a sampling distribution is usually referred to as a standard error, so this is called the standard error of the differences between sample means.
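If you want to convince yourself of this result, here is a quick simulation sketch in Python with assumed population parameters; the empirical variance of the simulated differences should come out very close to (σ1^2)/N1 + (σ2^2)/N2.

```python
# Simulate the sampling distribution of X1bar - X2bar and check its variance
# against the Variance Sum Law. The population parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
mu1, sigma1, n1 = 24.0, 4.0, 27   # assumed parameters for population X1
mu2, sigma2, n2 = 22.0, 3.5, 26   # assumed parameters for population X2

diffs = [rng.normal(mu1, sigma1, n1).mean() - rng.normal(mu2, sigma2, n2).mean()
         for _ in range(100_000)]

print(np.var(diffs))                    # empirical variance of the differences
print(sigma1**2 / n1 + sigma2**2 / n2)  # value predicted by the Variance Sum Law
```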
Because we don’t know the population standard deviations, we use the sample standard deviations to estimate them in the formula, which gives the estimated standard error √((S1^2)/N1 + (S2^2)/N2).
This estimation is quite good, but we can improve it. Let’s return to our fuel efficiency example. Since under the null hypothesis we assume that the fuel efficiency of Chevrolet and Mercedes cars is not different, this suggests that their MPGs all come from the same distribution.
Let’s call the variance of this distribution σ^2.
In this case, both the Chevrolet variance and the Mercedes variance are estimates of σ^2.
To get our best estimate of σ^2, we can take the weighted average of the two sample variances: Sp^2 = ((N1 – 1)S1^2 + (N2 – 1)S2^2) / (N1 + N2 – 2).
We call this the pooled variance, or S-sub-p squared.
Each sample variance is weighted by the sample’s degrees of freedom, which equals the sample size N minus 1.
Why are we taking the weighted average? When we have an unequal number of observations in the groups, we consider the group that has more observations a more reliable source for estimating σ^2. We weight that group more heavily in the calculation so that its sample variance contributes more to the estimate. In this example, we have 27 Chevrolet data points and only 26 Mercedes data points, so the Chevrolet data carry slightly more weight when estimating σ^2.
By replacing S1^2 and S2^2 in the standard error formula with the pooled variance Sp^2, we get the following formula: √((Sp^2)/N1 + (Sp^2)/N2), which can also be written as √(Sp^2 × (1/N1 + 1/N2)).
This is the denominator of the t-score formula for independent samples.
Returning to the example of the fuel efficiency of Chevrolet vs. Mercedes-Benz, let’s calculate the observed t-score step by step in Excel. You can use the following steps in Excel 2010 or newer, since the VAR.S and STDEV.S functions used below were introduced in that version.
Put the Chevrolet and Mercedes data points in two columns. There are 27 Chevrolets, so we can just type in 27 for N1. Another way to get the sample size is to use the COUNT function. We’ll do that for N2, which is 26.
We calculate the two sample means using the function AVERAGE.
And then we have X1bar – X2bar = 2.3205. This is the numerator of the t-formula.
Next, we’ll calculate the two sample variances. We use the function VAR.S for sample variance in Excel. Alternatively, we can use STDEV.S to calculate the sample standard deviations first, and then square the standard deviations to obtain the variances.
Next, we’ll calculate Sp^2, with the pooled variance formula on the right. We need to plug in N1, N2, and the two sample variances. Pay careful attention to where the brackets go in the Excel equation. Then we can calculate the denominator of the t-formula, that is, the standard error. We’ll use the SQRT function to calculate the square root.
Finally, we can calculate the observed t-value, which equals (X1bar – X2bar) over the standard error. t_observed equals 2.270.
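If you would rather work in code than in a spreadsheet, here is a Python sketch of the same steps. The raw MPG values are not reproduced in this transcript, so the x1 and x2 arrays below are placeholders; with the module's 27 Chevrolet and 26 Mercedes values, the function should return t ≈ 2.270.

```python
# A sketch of the unpaired t-test calculation, mirroring the Excel steps above.
import numpy as np
from scipy import stats

def unpaired_t(x1, x2):
    n1, n2 = len(x1), len(x2)   # sample sizes (Excel's COUNT)
    s1_sq = np.var(x1, ddof=1)  # sample variance 1 (Excel's VAR.S)
    s2_sq = np.var(x2, ddof=1)  # sample variance 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    se = np.sqrt(sp_sq / n1 + sp_sq / n2)    # standard error (Excel's SQRT step)
    return (np.mean(x1) - np.mean(x2)) / se  # t = (X1bar - X2bar) / SE

x1 = np.array([25.1, 23.4, 26.0, 22.8])  # placeholder Chevrolet MPG values
x2 = np.array([21.9, 23.0, 20.5, 22.1])  # placeholder Mercedes MPG values
print(unpaired_t(x1, x2))
print(stats.ttest_ind(x1, x2, equal_var=True).statistic)  # same result from scipy
```

As a cross-check, scipy's ttest_ind with equal_var=True uses this same pooled-variance formula, so the two printed values should match.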
To explore an additional way to conduct t-tests in Excel, you can visit the following link for instructions.
At this point we want to know whether the observed t-score of 2.27 is large enough to justify rejecting the null hypothesis. To determine this we need to compare it to a critical t-value. Just like in part one, we need three pieces of information to find the critical value: the degrees of freedom (df), whether the test is two-tailed or one-tailed, and the significance level α.
The degrees of freedom for two independent samples equal the sum of the two samples’ degrees of freedom: df = (N1 – 1) + (N2 – 1) = N1 + N2 – 2.
In our example, the degrees of freedom are 27 + 26 – 2, which equals 51. Why do we subtract 2? Because every time we estimate a parameter with a sample statistic we lose 1 degree of freedom. Here we are estimating two parameters, σ1 and σ2, with the sample standard deviations when we calculate the standard error, so we lose 2 degrees of freedom.
We are doing a two-tailed test with an α of 0.05, so we look for the critical t-value at the intersection of α = 0.05 and df = 51 under “two-tailed”. Most t-tables don’t have a row for df of 51. In this case we can use the row df = 50 to approximate and find the critical value 2.009. If the table doesn’t have df = 50 either, use the df in the table that’s closest to 51 and SMALLER than 51, for example, df = 30. A smaller df will give us a bigger critical t-value and make our analysis more conservative. That means it’s more difficult to reject the null, so we are more protected from making a Type I error.
Alternatively, you can search for online tools that can calculate the exact critical t-value for you given df, α, and two-tailed or one-tailed.
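For instance, here is a sketch using Python's scipy library, one such tool; notice that the exact critical value for df = 51 is slightly smaller than the table value of 2.009 for df = 50, consistent with the point about conservatism above.

```python
# Exact critical t-values for alpha = 0.05 and df = 51.
from scipy import stats

alpha = 0.05
df = 27 + 26 - 2                       # 51
print(stats.t.ppf(1 - alpha / 2, df))  # two-tailed critical value, about 2.008
print(stats.t.ppf(1 - alpha, df))      # one-tailed critical value, about 1.675
```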
Since our observed t-score of 2.270 is greater than the critical value of 2.009, we reject the null hypothesis and conclude that Chevrolet and Mercedes-Benz have significantly different fuel efficiencies in cities. Moreover, because 2.270 is a positive value and we obtained it by subtracting the Mercedes MPG from the Chevrolet MPG, we can conclude that on average Chevrolet is more fuel-efficient than Mercedes in city driving.
So returning to our original scenario, the conclusion that Chevys are more fuel efficient might convince you to buy a Chevrolet car instead of a Mercedes after all.
Despite this result, you’re still wishing you had a reason to buy the more luxurious Mercedes. Now you’re wondering: how many more miles per gallon does a Chevy get compared to a Mercedes? One way to answer this is to calculate the confidence interval of the true difference between the Chevrolet and Mercedes MPG ratings.
We calculate the 95% confidence interval as follows: CI = (X1bar – X2bar) ± t_critical × S_(X1bar – X2bar). That is, the confidence interval equals the difference between the sample means, X1bar – X2bar, plus or minus the product of the two-tailed critical t-value and the standard error of the difference between means. Remember to use the two-tailed critical t-value to calculate the confidence interval, regardless of whether the test you just conducted was two-tailed or one-tailed.
In our example, the 95% confidence interval ranges from about 0.27 to 4.37 MPG. This means we are 95% confident that the true difference between the Chevrolet and Mercedes population means lies somewhere between these numbers. Clearly, this interval does not include 0, which confirms our earlier conclusion that the populations of Chevrolet and Mercedes cars differ in gas mileage.
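Here is a sketch of the interval calculation in Python, recovering the standard error from the quantities reported earlier in the module.

```python
# 95% CI for the difference between means, using the module's reported values.
from scipy import stats

mean_diff = 2.3205               # X1bar - X2bar from the Excel calculation
se = mean_diff / 2.270           # standard error recovered from t_observed
t_crit = stats.t.ppf(0.975, 51)  # exact two-tailed critical value for df = 51

print(mean_diff - t_crit * se, mean_diff + t_crit * se)  # about (0.27, 4.37)
```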
Now we know that, on average, Chevys are more fuel-efficient than Mercedes in cities. We also now have a range where we are pretty sure that the true population difference lies. Now, we want to figure out how meaningful the mileage surplus is. Is the surplus sizeable enough to be considered a “large” difference?
To answer this, we can calculate the effect size using Cohen’s d: d = (X1bar – X2bar) / Sp.
Essentially, we are comparing the difference between the sample means to the standard deviation of the underlying populations, which the null hypothesis assumes to be the same for both. We don’t know this population standard deviation, so we estimate it using the pooled standard deviation of the two samples.
The pooled standard deviation simply equals the square root of the pooled variance: Sp = √(Sp^2).
In our example, the effect size is 0.62. This means the two population means differ by more than half of a standard deviation. This is a medium effect size based on Cohen’s guidelines, which state that a d of 0.2 is a small effect, a d of 0.5 is a medium effect, and a d of 0.8 or greater is a large effect.
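As a code sketch, here is the same effect size calculation in Python; the two arrays are placeholders, but applied to the module's 27 Chevrolet and 26 Mercedes values this would give d ≈ 0.62.

```python
# Cohen's d for independent samples: mean difference over the pooled SD.
import numpy as np

def cohens_d(x1, x2):
    n1, n2 = len(x1), len(x2)
    sp_sq = ((n1 - 1) * np.var(x1, ddof=1) + (n2 - 1) * np.var(x2, ddof=1)) \
            / (n1 + n2 - 2)                              # pooled variance
    return (np.mean(x1) - np.mean(x2)) / np.sqrt(sp_sq)  # d = mean diff / Sp

x1 = [25.1, 23.4, 26.0, 22.8]  # placeholder samples; substitute the real data
x2 = [21.9, 23.0, 20.5, 22.1]
print(cohens_d(x1, x2))
```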
We’ve learned about 3 types of t-tests: one-sample, paired, and unpaired. All t-tests are used to analyze means of a continuous variable, such as gas mileages, test scores, or blood pressure. This requires the data to be on an interval or ratio scale of measurement. The grouping variable, which determines the group membership of the data, is nominal. Grouping variables include characteristics such as manufacturer, gender, or treatment.
All t-tests are parametric tests, as are the z-test you have already learned and the Analysis of Variance (ANOVA), which we’ll discuss in a later module. In order to be valid, all parametric tests require that the underlying populations have certain characteristics, and that the samples are drawn under certain conditions. These necessary characteristics and conditions are called the assumptions of the test. In order to make valid inferences with ANY type of t-test, the following two assumptions must be met:
The first assumption is independent observations. We assume that individual observations are independently and randomly sampled from the population, so that sampling one observation does not affect the probability of sampling any other. In the case of one-sample t-tests and unpaired t-tests, this means that the individual scores are independent of one another. In the case of a paired t-test, this means that the PAIRS of scores are independent of one another.
The second assumption is the Normality assumption. We assume the population or populations where the samples came from are normally distributed. We can give this a quick check by plotting the samples in a density graph or a histogram. If the samples look roughly normal, then the underlying population or populations are likely to be normal as well.
For example, we can plot the mileage for our 27 Chevrolet and 26 Mercedes cars in a density graph or a histogram. The dashed lines in the density graph indicate normal distributions. We can see that both samples are roughly bell-shaped and symmetrical around the means, so the normality assumption is met.
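Here is a sketch of how you might draw such a check in Python; because the raw MPG values are not reproduced in this transcript, the two samples below are randomly generated stand-ins with assumed means and standard deviations.

```python
# Histograms with dashed normal curves as a quick visual normality check.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x1 = rng.normal(24.0, 4.0, 27)  # stand-in for the 27 Chevrolet MPGs
x2 = rng.normal(22.0, 3.5, 26)  # stand-in for the 26 Mercedes MPGs

for sample, label, color in [(x1, "Chevrolet", "red"), (x2, "Mercedes-Benz", "blue")]:
    plt.hist(sample, bins=8, density=True, alpha=0.4, color=color, label=label)
    xs = np.linspace(sample.min() - 3, sample.max() + 3, 200)
    # dashed curve: the normal distribution with the sample's mean and SD
    plt.plot(xs, stats.norm.pdf(xs, sample.mean(), sample.std(ddof=1)),
             linestyle="--", color=color)

plt.xlabel("MPG (city)")
plt.ylabel("Density")
plt.legend()
plt.show()
```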
Fortunately, t-tests are robust to minor violations of the normality assumption. This means that even if the sample distributions deviate slightly from normal distributions, this won’t introduce serious errors into the t-tests, especially with large sample sizes. However, if the data are highly skewed, then the normality assumption is likely to be violated.
In addition to the two assumptions required for any t-test, the independent-samples t-test requires a third assumption, called homogeneity of variance. It requires that the two underlying population distributions have equal variances.
Again, we can plot a histogram of the sample data to quickly check whether this assumption is met. In the histogram, where Chevrolet mileages are shown in red and Mercedes mileages in blue, you can see that the variability around the middle score is about the same for both manufacturers. A common rule of thumb suggests that if the larger sample variance is greater than 4 times the smaller sample variance then the assumption is violated. It would also be equivalent to say that if the larger sample standard deviation is more than twice the smaller sample standard deviation, then the assumption is violated.
The problem is worse when sample sizes are not equal between groups.
However, in the Chevrolet versus Mercedes example, comparing the two variances or standard deviations, we can see that the equal variance assumption is NOT violated, despite unequal sample sizes.
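The rule-of-thumb check is easy to automate; here is a minimal Python sketch with placeholder samples.

```python
# Rule of thumb: the larger sample variance should be less than 4x the smaller.
import numpy as np

def variances_roughly_equal(x1, x2, max_ratio=4.0):
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    return max(v1, v2) / min(v1, v2) < max_ratio

x1 = [25.1, 23.4, 26.0, 22.8]  # placeholder samples
x2 = [21.9, 23.0, 20.5, 22.1]
print(variances_roughly_equal(x1, x2))  # True if the assumption looks OK
```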
Summary
This module, t-tests part two of two, focuses on the unpaired t-test, which compares two sample means when the observations in each sample are independent of each other.
In this module:
- We learned the differences between paired and unpaired samples.
- We derived the formula for the unpaired t-test, which required us to use the Variance Sum Law and the pooled variance to estimate variability, and we conducted an unpaired t-test step by step.
- We also calculated the confidence interval and the effect size for the difference between unpaired sample means.
- Finally, we explained the underlying assumptions required for conducting any t-test.