Treatment Effect Estimation Using Self-Estimated Counterfactuals Under Varying Conditions: A Meta-Analytic Exploration
Abstract
Background: Randomized controlled trials (RCTs) are frequently not an option in evaluation practice, so evaluators turn to non-experimental methods such as the "counterfactual as self-estimated by program participants" (CSEPP) for estimating intervention effects. However, no systematic attempt has yet been made to test under what conditions CSEPP provides valid estimates.
Purpose: As a first step in this direction, this research compared the bias of CSEPP estimates across groups of participants with different levels of education, across different outcome variables, and across different question orders within the questionnaire.
Setting: NA
Intervention: The treatment used in this research was a short educational video informing viewers about key concepts and aspects of organ donation.
Research Design: Since investigating bias in CSEPP is difficult at the participant level, a series of 40 studies was conducted and bias was analyzed at the study level. In each study, the effect of the same treatment was estimated by CSEPP and compared with the effect estimated by a simultaneously conducted RCT. It was then analyzed whether differences between the CSEPP and RCT estimates across studies were explained by variation in the conditions under which the studies were conducted. Despite the small sample sizes of the individual trials, the meta-analysis was sufficiently powered to detect even small differences between CSEPP and RCT.
Data Collection and Analysis: The data were collected via online surveys on a crowdsourcing portal. For the analysis, we applied meta-analytic methods, including random-effects meta-analysis and meta-regression.
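The study-level comparison described above — pooling per-study differences between CSEPP and RCT effect estimates under a random-effects model — can be sketched in Python. This is a minimal illustration using the standard DerSimonian-Laird estimator with hypothetical numbers (the function name, the example differences, and their variances are not taken from the article):

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects meta-analysis via the DerSimonian-Laird estimator.

    effects:   per-study effect estimates (here: hypothetical CSEPP-minus-RCT differences)
    variances: per-study sampling variances of those estimates
    Returns (pooled_effect, standard_error, tau2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                           # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw  # fixed-effect mean
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                         # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]               # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2

# Hypothetical study-level differences (CSEPP - RCT) and their variances
diffs = [0.05, -0.02, 0.10, 0.00, -0.04]
vars_ = [0.04, 0.05, 0.06, 0.04, 0.05]
pooled, se, tau2 = dersimonian_laird(diffs, vars_)
print(round(pooled, 3), round(se, 3), round(tau2, 3))
```

A pooled difference near zero with a confidence interval covering zero would be consistent with CSEPP being unbiased relative to the RCT benchmark; in practice the article relies on established implementations (e.g., Stata's metan and metareg) rather than a hand-rolled routine like this.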
Findings: Results show that CSEPP provided accurate effect estimates regardless of the conditions under which the method was applied.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright and Permissions
Authors retain full copyright for articles published in JMDE. JMDE publishes under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Users are allowed to copy, distribute, and transmit the work in any medium or format for noncommercial purposes, provided that the original authors and source are credited accurately and appropriately. Only the original authors may distribute the article for commercial or compensatory purposes. To view a copy of this license, visit creativecommons.org.