Student Notebook

Beyond the t-test and F-test

For many psychology researchers and students, finding an appropriate statistical tool for analyzing data can be challenging, and dealing with issues such as outliers and nonnormal distributions can be frustrating. The methods taught in statistics classes and textbooks (such as Student’s t-test, the ANOVA F-test, Pearson’s correlation, and least squares regression) often do not seem directly applicable to actual experimental designs and datasets. Fortunately, many alternative statistical techniques have been developed in recent years to address these issues. These techniques overcome the limitations of traditional tools and have been shown to work well in a wide range of situations where traditional tools fall short. Unfortunately, for reasons discussed in Wilcox (2002), these modern techniques are rarely mentioned in the conventional curriculum. The goal of this article is to offer psychology students a glimpse of the usefulness of some modern analytical tools.

Issues with Traditional Methods

Many of the popular statistical methods in psychology, such as the t-test, ANOVA F-test, and ordinary least squares, were developed in the nineteenth and early twentieth centuries. Ordinary least squares was first introduced by Adrien-Marie Legendre in 1805, the t-test by William S. Gosset in 1908, and the F-test by Sir Ronald A. Fisher in the 1920s. These methods are undoubtedly among the greatest tools developed for data analysis; however, they rest on restrictive theoretical assumptions. For example, the t-test and ANOVA F-test assume that the population distribution is normal. When this assumption is violated, which is common in practice, these methods no longer control the probability of a Type I error and can have low statistical power. Although many would argue that with a large enough sample size, violating the normality assumption has no detrimental effect, simulation studies have shown this not to be true (Wilcox, 2003).
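The effect of nonnormality on the t-test’s Type I error rate can be illustrated with a small simulation (a sketch in Python; the lognormal population, sample size, and seed are illustrative choices of mine, not from the article):

```python
import numpy as np
from scipy import stats

# Simulate the actual Type I error rate of the one-sample t-test when
# sampling from a skewed (lognormal) population. The null hypothesis is
# true, so a correctly behaving 5 percent-level test should reject about
# 5 percent of the time; skewness inflates the actual rate.
rng = np.random.default_rng(0)
n, reps, alpha = 20, 10_000, 0.05
true_mean = np.exp(0.5)  # population mean of a lognormal(0, 1) distribution

rejections = 0
for _ in range(reps):
    x = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    rejections += stats.ttest_1samp(x, popmean=true_mean).pvalue < alpha

print(f"actual Type I error rate: {rejections / reps:.3f}")
```

With this setup the simulated rejection rate comes out well above the nominal .05, in line with the simulation results Wilcox (2003) summarizes.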

To illustrate the limitations of the t-test, consider the following hypothetical two-sample dataset. The samples are generated from two different distributions with distinct means (μ1 = 0.27; μ2 = 0.73). A kernel density plot of the two samples is shown in Figure 1.

Group 1:

-2.40, -1.87, -0.60, -0.54, -0.12, -0.02, 0.12, 0.34, 0.40, 0.53, 0.55, 0.62, 0.92, 1.21, 1.49, 1.55, 1.57, 1.57, 1.82, 1.87, 1.90, 1.91, 1.93, 2.34, 2.37

Group 2:

-1.32, -1.25, -0.91, -0.62, -0.55, -0.41, -0.40, -0.31, -0.28, -0.21, -0.18, -0.16, -0.03, -0.02, 0.04, 0.22, 0.38, 0.51, 0.53, 0.61, 1.09, 1.47, 1.59, 2.39, 2.47

Just by inspecting the plot, without performing any significance test, one would probably conclude that the bulk of Group 1 differs from that of Group 2. Testing the null hypothesis that the means of the two groups are equal with a two-sample t-test (assuming unequal variances) yields t = 1.87, p = 0.068, so the test fails to reject the null hypothesis at the .05 level. A Type II error is committed, because the data were in fact generated from two distributions with different means.
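The Welch test reported above can be reproduced with standard software (a sketch using scipy; the variable names are mine):

```python
import numpy as np
from scipy import stats

group1 = np.array([-2.40, -1.87, -0.60, -0.54, -0.12, -0.02, 0.12, 0.34,
                   0.40, 0.53, 0.55, 0.62, 0.92, 1.21, 1.49, 1.55, 1.57,
                   1.57, 1.82, 1.87, 1.90, 1.91, 1.93, 2.34, 2.37])
group2 = np.array([-1.32, -1.25, -0.91, -0.62, -0.55, -0.41, -0.40, -0.31,
                   -0.28, -0.21, -0.18, -0.16, -0.03, -0.02, 0.04, 0.22,
                   0.38, 0.51, 0.53, 0.61, 1.09, 1.47, 1.59, 2.39, 2.47])

# Welch's two-sample t-test (unequal variances assumed, as in the article)
result = stats.ttest_ind(group1, group2, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
# the article reports t = 1.87, p = 0.068
```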

One concern with using the t-test here is that comparing the means may not best capture how the bulk of the two groups differ. Given the skewness, the sample mean may not be the best measure of central tendency. Another concern is that the two distributions differ in skewness. When distributions have different amounts of skewness, the t-test tends to have undesirable power properties; specifically, the probability of rejecting H0 can decrease even as the difference between the two population means increases.

Alternative Methods

One simple alternative for handling the above data is Yuen’s (1974) method with 20 percent trimmed means. Yuen’s method is a modified t-test based on trimmed means. In contrast to the mean, the 20 percent trimmed mean, which removes the largest 20 percent and the smallest 20 percent of the observations, downplays the effect of extreme values and better captures the central tendency. Using Yuen’s method to test H0: μt1 = μt2 (i.e., that the population 20 percent trimmed means are equal) yields t = 3.15, p = 0.005. Therefore, H0 is rejected.
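Yuen’s test is available in recent versions of scipy through the `trim` argument of `ttest_ind` (a sketch; assumes scipy 1.7 or later, where this argument was added):

```python
import numpy as np
from scipy import stats

group1 = np.array([-2.40, -1.87, -0.60, -0.54, -0.12, -0.02, 0.12, 0.34,
                   0.40, 0.53, 0.55, 0.62, 0.92, 1.21, 1.49, 1.55, 1.57,
                   1.57, 1.82, 1.87, 1.90, 1.91, 1.93, 2.34, 2.37])
group2 = np.array([-1.32, -1.25, -0.91, -0.62, -0.55, -0.41, -0.40, -0.31,
                   -0.28, -0.21, -0.18, -0.16, -0.03, -0.02, 0.04, 0.22,
                   0.38, 0.51, 0.53, 0.61, 1.09, 1.47, 1.59, 2.39, 2.47])

# trim=0.2 removes the smallest 20 percent and largest 20 percent of each
# sample and compares the trimmed means, i.e., Yuen's (1974) procedure
result = stats.ttest_ind(group1, group2, equal_var=False, trim=0.2)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
# the article reports t = 3.15, p = 0.005 for this test
```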

A common argument against the use of trimmed means is that data are removed from the original set; not only is information lost, some even consider the practice unethical. Note, however, that the median, a frequently applied measure of location, is in effect a 50 percent trimmed mean; there is even more trimming in the median than in a 20 percent trimmed mean. In terms of guarding against outliers, both the 20 percent trimmed mean and the median are preferable to the mean. In some situations, the 20 percent trimmed mean may be better than the median because of its superior mathematical properties, such as a smaller standard error under normality.

There is a crucial question researchers should ask when choosing a measure of location: What is it meant to capture? If the goal is to indicate where the center of the bulk of the values lies, then one should use a measure that is sensitive to the bulk of the data and insensitive to extreme values. As the example above demonstrates, when the distribution is skewed, the mean may not accurately represent the center of the bulk of the data, whereas the trimmed mean gives a better sense of where that center lies.
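On the Group 1 data above, the three measures of location can be compared directly (a sketch using scipy’s `trim_mean`):

```python
import numpy as np
from scipy import stats

group1 = np.array([-2.40, -1.87, -0.60, -0.54, -0.12, -0.02, 0.12, 0.34,
                   0.40, 0.53, 0.55, 0.62, 0.92, 1.21, 1.49, 1.55, 1.57,
                   1.57, 1.82, 1.87, 1.90, 1.91, 1.93, 2.34, 2.37])

# trim_mean(x, 0.2) drops the smallest and largest 20 percent of the
# observations before averaging the remaining middle 60 percent
print(f"mean:             {np.mean(group1):.3f}")
print(f"20% trimmed mean: {stats.trim_mean(group1, 0.2):.3f}")
print(f"median:           {np.median(group1):.3f}")
```

Here the two extreme negative observations pull the ordinary mean noticeably below the trimmed mean and the median, which both sit in the visible bulk of the sample.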

Other than Yuen’s method, there are many modern methods for analyzing two-sample data. These include the two-sample test based on medians, the shift function, and percentile bootstrap methods based on M-estimators and other robust measures of location. Not only are these methods resistant to outliers, they are also less restricted by distributional assumptions. They often maintain good control over the Type I error probability and have desirable power properties even with relatively small sample sizes. Robust alternatives for analyzing multivariate data and more complicated designs are also available. Details can be found in Wilcox (2003, 2005).
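As one concrete instance, a percentile bootstrap confidence interval for the difference between the two group medians takes only a few lines (a sketch; the seed and the number of bootstrap samples are arbitrary choices of mine):

```python
import numpy as np

group1 = np.array([-2.40, -1.87, -0.60, -0.54, -0.12, -0.02, 0.12, 0.34,
                   0.40, 0.53, 0.55, 0.62, 0.92, 1.21, 1.49, 1.55, 1.57,
                   1.57, 1.82, 1.87, 1.90, 1.91, 1.93, 2.34, 2.37])
group2 = np.array([-1.32, -1.25, -0.91, -0.62, -0.55, -0.41, -0.40, -0.31,
                   -0.28, -0.21, -0.18, -0.16, -0.03, -0.02, 0.04, 0.22,
                   0.38, 0.51, 0.53, 0.61, 1.09, 1.47, 1.59, 2.39, 2.47])

# Percentile bootstrap: resample each group with replacement, record the
# difference of medians, and take the middle 95 percent of those
# differences as the confidence interval
rng = np.random.default_rng(1)
B = 5000
diffs = np.empty(B)
for b in range(B):
    s1 = rng.choice(group1, size=group1.size, replace=True)
    s2 = rng.choice(group2, size=group2.size, replace=True)
    diffs[b] = np.median(s1) - np.median(s2)

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% bootstrap CI for the median difference: ({lo:.2f}, {hi:.2f})")
```

The same resampling scheme works for trimmed means, M-estimators, or any other robust measure of location; only the statistic computed inside the loop changes.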

Conclusion

It is unfortunate that the conventional statistics curriculum offers little beyond the t-test and F-test. Through this article, I hope to encourage fellow students to explore and take advantage of the abundant modern statistical methods available.


References and Further Reading:


Wilcox, R.R. (2002). Can the weak link in psychological research be fixed? Observer, 15, 11 & 38.

Wilcox, R.R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.

Wilcox, R.R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). San Diego, CA: Academic Press.

Yuen, K.K. (1974). The two-sample trimmed t for unequal population variances. Biometrika, 61, 165-170.
Observer Vol.21, No.11 December, 2008
