Student Notebook

Beyond the t-test and F-test

For many psychology researchers and students, finding an appropriate statistical tool for analyzing data can be challenging, and dealing with issues such as outliers and nonnormal distributions can be frustrating. The methods taught in statistics classes and textbooks (such as Student’s t-test, the ANOVA F-test, Pearson’s correlation, and least squares regression) often do not seem directly applicable to actual experimental designs and datasets. Fortunately, many alternative statistical techniques have been developed in recent years to address these issues. These techniques overcome the limitations of traditional tools and have been shown to work well in a wide range of situations where traditional tools fall short. Unfortunately, for reasons discussed in Wilcox (2002), these modern techniques are rarely mentioned in the conventional curriculum. The goal of this article is to offer psychology students a glimpse of the usefulness of some modern analytical tools.

Issues with Traditional Methods

Many of the popular statistical methods in psychology, such as the t-test, ANOVA F-test, and ordinary least squares, were developed in the nineteenth and early twentieth centuries. Ordinary least squares was first introduced by Adrien-Marie Legendre in 1805, the t-test by William S. Gosset in 1908, and the F-test by Sir Ronald A. Fisher in the 1920s. These methods are undoubtedly among the greatest tools developed for data analysis; however, they rest on restrictive theoretical assumptions. For example, the t-test and ANOVA F-test assume that the population distribution is normal. When this assumption is violated, which is common in practice, these methods no longer control the probability of a Type I error and can have low statistical power. Although many would argue that with a large enough sample size, violating the normality assumption has no detrimental effect, simulation studies have shown this not to be true (Wilcox, 2003).
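The effect of nonnormality on the t-test’s Type I error rate can be illustrated with a small simulation (a sketch in Python; the lognormal population, sample size, and seed are illustrative choices of mine, not from the article):

```python
import numpy as np
from scipy import stats

# Simulate the actual Type I error rate of the one-sample t-test when
# sampling from a skewed (lognormal) population. The null hypothesis is
# true, so a correctly behaving 5 percent-level test should reject about
# 5 percent of the time; skewness inflates the actual rate.
rng = np.random.default_rng(0)
n, reps, alpha = 20, 10_000, 0.05
true_mean = np.exp(0.5)  # population mean of a lognormal(0, 1) distribution

rejections = 0
for _ in range(reps):
    x = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    rejections += stats.ttest_1samp(x, popmean=true_mean).pvalue < alpha

print(f"actual Type I error rate: {rejections / reps:.3f}")
```

With this setup the simulated rejection rate comes out well above the nominal .05, in line with the simulation results Wilcox (2003) summarizes.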

To illustrate the limitations of the t-test, consider the following hypothetical two-sample dataset. The samples are generated from two different distributions with distinct means (μ1 = 0.27; μ2 = 0.73). A kernel density plot of the two samples is shown in Figure 1.

Group 1:

-2.40, -1.87, -0.60, -0.54, -0.12, -0.02, 0.12, 0.34, 0.40, 0.53, 0.55, 0.62, 0.92, 1.21, 1.49, 1.55, 1.57, 1.57, 1.82, 1.87, 1.90, 1.91, 1.93, 2.34, 2.37

Group 2:

-1.32, -1.25, -0.91, -0.62, -0.55, -0.41, -0.40, -0.31, -0.28, -0.21, -0.18, -0.16, -0.03, -0.02, 0.04, 0.22, 0.38, 0.51, 0.53, 0.61, 1.09, 1.47, 1.59, 2.39, 2.47

Just by inspecting the plot, without performing any significance test, one would probably conclude that the bulk of Group 1 differs from that of Group 2. Testing the null hypothesis that the means of the two groups are equal with a two-sample t-test (assuming unequal variances) yields t = 1.87, p = 0.068, so the test fails to reject the null hypothesis at the .05 level. A Type II error is committed, because the data were in fact generated from two distributions with different means.
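The Welch test reported above can be reproduced with standard software (a sketch using scipy; the variable names are mine):

```python
import numpy as np
from scipy import stats

group1 = np.array([-2.40, -1.87, -0.60, -0.54, -0.12, -0.02, 0.12, 0.34,
                   0.40, 0.53, 0.55, 0.62, 0.92, 1.21, 1.49, 1.55, 1.57,
                   1.57, 1.82, 1.87, 1.90, 1.91, 1.93, 2.34, 2.37])
group2 = np.array([-1.32, -1.25, -0.91, -0.62, -0.55, -0.41, -0.40, -0.31,
                   -0.28, -0.21, -0.18, -0.16, -0.03, -0.02, 0.04, 0.22,
                   0.38, 0.51, 0.53, 0.61, 1.09, 1.47, 1.59, 2.39, 2.47])

# Welch's two-sample t-test (unequal variances assumed, as in the article)
result = stats.ttest_ind(group1, group2, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
# the article reports t = 1.87, p = 0.068
```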

One concern with using the t-test here is that comparing the means may not best capture how the bulk of the two groups differ. Given the skewness, the sample mean may not be the best measure of central tendency. Another concern is that the two distributions differ in skewness. When distributions have different amounts of skewness, the t-test tends to have undesirable power properties; specifically, the probability of rejecting H0 can decrease even as the difference between the two population means increases.

Alternative Methods

One simple alternative for handling the above data is Yuen’s (1974) method with 20 percent trimmed means. Yuen’s method is a modified t-test based on trimmed means. In contrast to the mean, the 20 percent trimmed mean, which removes the largest 20 percent and the smallest 20 percent of the observations, downplays the effect of extreme values and better captures the central tendency. Using Yuen’s method to test H0: μt1 = μt2 (i.e., that the population 20 percent trimmed means are equal) yields t = 3.15, p = 0.005. Therefore, H0 is rejected.
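Yuen’s test is available in recent versions of scipy through the `trim` argument of `ttest_ind` (a sketch; assumes scipy 1.7 or later, where this argument was added):

```python
import numpy as np
from scipy import stats

group1 = np.array([-2.40, -1.87, -0.60, -0.54, -0.12, -0.02, 0.12, 0.34,
                   0.40, 0.53, 0.55, 0.62, 0.92, 1.21, 1.49, 1.55, 1.57,
                   1.57, 1.82, 1.87, 1.90, 1.91, 1.93, 2.34, 2.37])
group2 = np.array([-1.32, -1.25, -0.91, -0.62, -0.55, -0.41, -0.40, -0.31,
                   -0.28, -0.21, -0.18, -0.16, -0.03, -0.02, 0.04, 0.22,
                   0.38, 0.51, 0.53, 0.61, 1.09, 1.47, 1.59, 2.39, 2.47])

# trim=0.2 removes the smallest 20 percent and largest 20 percent of each
# sample and compares the trimmed means, i.e., Yuen's (1974) procedure
result = stats.ttest_ind(group1, group2, equal_var=False, trim=0.2)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
# the article reports t = 3.15, p = 0.005 for this test
```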

A common argument against the use of trimmed means is that data are removed from the original set; not only is information lost, some even consider the practice unethical. Note, however, that the median, a frequently applied measure of location, is in effect a 50 percent trimmed mean; there is even more trimming in the median than in a 20 percent trimmed mean. In terms of guarding against outliers, both the 20 percent trimmed mean and the median are preferable to the mean. In some situations, the 20 percent trimmed mean may be better than the median because of its superior mathematical properties, such as a smaller standard error under normality.

There is a crucial question researchers should ask when choosing a measure of location: What is it meant to capture? If the goal is to indicate where the center of the bulk of the values lies, then one should use a measure that is sensitive to the bulk of the data and insensitive to extreme values. As the example above demonstrates, when the distribution is skewed, the mean may not accurately represent the center of the bulk of the data, whereas the trimmed mean gives a better sense of where that center lies.
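On the Group 1 data above, the three measures of location can be compared directly (a sketch using scipy’s `trim_mean`):

```python
import numpy as np
from scipy import stats

group1 = np.array([-2.40, -1.87, -0.60, -0.54, -0.12, -0.02, 0.12, 0.34,
                   0.40, 0.53, 0.55, 0.62, 0.92, 1.21, 1.49, 1.55, 1.57,
                   1.57, 1.82, 1.87, 1.90, 1.91, 1.93, 2.34, 2.37])

# trim_mean(x, 0.2) drops the smallest and largest 20 percent of the
# observations before averaging the remaining middle 60 percent
print(f"mean:             {np.mean(group1):.3f}")
print(f"20% trimmed mean: {stats.trim_mean(group1, 0.2):.3f}")
print(f"median:           {np.median(group1):.3f}")
```

Here the two extreme negative observations pull the ordinary mean noticeably below the trimmed mean and the median, which both sit in the visible bulk of the sample.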

Other than Yuen’s method, there are many modern methods for analyzing two-sample data. These include the two-sample test based on medians, the shift function, and percentile bootstrap methods based on M-estimators and other robust measures of location. Not only are these methods resistant to outliers, they are also less restricted by distributional assumptions. They often maintain good control over the Type I error probability and have desirable power properties even with relatively small sample sizes. Robust alternatives for analyzing multivariate data and more complicated designs are also available. Details can be found in Wilcox (2003, 2005).
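As one concrete instance, a percentile bootstrap confidence interval for the difference between the two group medians takes only a few lines (a sketch; the seed and the number of bootstrap samples are arbitrary choices of mine):

```python
import numpy as np

group1 = np.array([-2.40, -1.87, -0.60, -0.54, -0.12, -0.02, 0.12, 0.34,
                   0.40, 0.53, 0.55, 0.62, 0.92, 1.21, 1.49, 1.55, 1.57,
                   1.57, 1.82, 1.87, 1.90, 1.91, 1.93, 2.34, 2.37])
group2 = np.array([-1.32, -1.25, -0.91, -0.62, -0.55, -0.41, -0.40, -0.31,
                   -0.28, -0.21, -0.18, -0.16, -0.03, -0.02, 0.04, 0.22,
                   0.38, 0.51, 0.53, 0.61, 1.09, 1.47, 1.59, 2.39, 2.47])

# Percentile bootstrap: resample each group with replacement, record the
# difference of medians, and take the middle 95 percent of those
# differences as the confidence interval
rng = np.random.default_rng(1)
B = 5000
diffs = np.empty(B)
for b in range(B):
    s1 = rng.choice(group1, size=group1.size, replace=True)
    s2 = rng.choice(group2, size=group2.size, replace=True)
    diffs[b] = np.median(s1) - np.median(s2)

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% bootstrap CI for the median difference: ({lo:.2f}, {hi:.2f})")
```

The same resampling scheme works for trimmed means, M-estimators, or any other robust measure of location; only the statistic computed inside the loop changes.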

Conclusion

It is unfortunate that the conventional statistics curriculum offers little beyond the t-test and F-test. Through this article, I hope to encourage fellow students to explore and take advantage of the abundant modern statistical methods available.


References and Further Reading:


Wilcox, R.R. (2002). Can the weak link in psychological research be fixed? Observer, 15, 11 & 38.

Wilcox, R.R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.

Wilcox, R.R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). San Diego, CA: Academic Press.

Yuen, K.K. (1974). The two-sample trimmed t for unequal population variances. Biometrika, 61, 165-170.
Observer Vol.21, No.11 December, 2008
