Same or Different? Comparing the Coverage Rate of Five Different Approaches for Testing the Difference of Two Groups Means

Testing for statistically significant differences between two group means is one of the most common requirements in psychological research, for example, after an experiment has been conducted. While the classical t-test is probably the most popular approach, its deficiencies under violated assumptio...

Bibliographic Details
Main Author: Bittmann, Felix
Format: Article
Language: English
Published: Université d'Ottawa 2025-02-01
Series: Tutorials in Quantitative Methods for Psychology
Subjects: mean comparison; t-test welch test; robust standard error; bootstrapping; simulation
Online Access: https://www.tqmp.org/RegularArticles/vol21-1/p001/p001.pdf
_version_ 1823858035004538880
author Bittmann, Felix
author_facet Bittmann, Felix
author_sort Bittmann, Felix
collection DOAJ
description Testing for statistically significant differences between two group means is one of the most common requirements in psychological research, for example, after an experiment has been conducted. While the classical t-test is probably the most popular approach, its deficiencies under violated assumptions have been acknowledged, and various alternative tests have been developed. In this research paper, five widely available methods are compared with respect to the coverage of the 95% confidence intervals they generate. We use the coverage of confidence intervals because it corresponds to the nominal type I error rate (alpha), yet is more adequate, since confidence intervals are preferred over p-values, which often encourage binary conclusions. The approaches tested are the classical t-test, Welch’s t-test, OLS regression with robust standard errors, and two flavors of bootstrapping (normal and bias-corrected). Three different outcome distributions are generated (normal, uniform, skewed), and 75,000 simulations with a wide range of sample sizes (15 to 200 per group) and standard deviations are conducted for each. The results indicate that Welch’s t-test and the regression approach perform best. The bootstrap approaches tend toward consistent undercoverage. The regular t-test produces larger deviations when its assumptions, especially the equality of variances, are violated. When distributions are skewed, all approaches result in undercoverage.
format Article
id doaj-art-32322f2f3bf9482f88f08a30bb16a3f0
institution Kabale University
issn 1913-4126
language English
publishDate 2025-02-01
publisher Université d'Ottawa
record_format Article
series Tutorials in Quantitative Methods for Psychology
spelling doaj-art-32322f2f3bf9482f88f08a30bb16a3f0 | 2025-02-11T15:58:41Z | eng | Université d'Ottawa | Tutorials in Quantitative Methods for Psychology | 1913-4126 | 2025-02-01 | vol. 21, no. 1, pp. 1-12 | 10.20982/tqmp.21.1.p001 | Same or Different? Comparing the Coverage Rate of Five Different Approaches for Testing the Difference of Two Groups Means | Bittmann, Felix | https://www.tqmp.org/RegularArticles/vol21-1/p001/p001.pdf | mean comparison | t-test welch test | robust standard error | bootstrapping | simulation
title Same or Different? Comparing the Coverage Rate of Five Different Approaches for Testing the Difference of Two Groups Means
topic mean comparison
t-test welch test
robust standard error
bootstrapping
simulation
url https://www.tqmp.org/RegularArticles/vol21-1/p001/p001.pdf