Treatment Effect Estimation Using Self-Estimated Counterfactuals Under Varying Conditions: A Meta-Analytic Exploration


Christoph Emanuel Mueller
Hansjoerg Gaus

Abstract

Background: Randomized controlled trials (RCTs) are frequently not an option in evaluation practice, which is why evaluators turn to non-experimental methods, such as the “counterfactual as self-estimated by program participants” (CSEPP), for estimating intervention effects. Unfortunately, no systematic attempt has been made to test under which conditions CSEPP provides valid estimates.


Purpose: As a first step in this direction, this research compared the performance of CSEPP in terms of bias when applied to groups of participants with different levels of education, when used to assess effects on different outcome variables, and when employed with different question orders within the questionnaire.


Setting: Not applicable.


Intervention: The treatment used in this research was a short educational video that informs its audience about important concepts and aspects of organ donation.


Research Design: Since investigating bias in CSEPP is difficult at the participant level, a series of 40 studies was conducted and bias was analyzed at the study level. In each study, the effect of the same treatment was estimated via CSEPP and compared with the effect estimated by a simultaneously conducted RCT. It was then analyzed whether differences between the CSEPP and RCT estimates across studies were driven by variation in the conditions under which the studies were conducted. Despite the small sample sizes of the individual trials, the meta-analysis was sufficiently powered to detect even small differences between CSEPP and RCT estimates.
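To make the comparison logic concrete, here is a minimal Python sketch of how a study-level bias estimate can be obtained as the difference between the effect implied by participants' self-estimated counterfactuals and the benchmark effect from the RCT. All data, group sizes, and the use of Cohen's d here are illustrative assumptions, not details taken from the article:

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) with a pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * np.var(group_a, ddof=1)
                  + (n_b - 1) * np.var(group_b, ddof=1)) / (n_a + n_b - 2)
    return (np.mean(group_a) - np.mean(group_b)) / np.sqrt(pooled_var)

# Hypothetical data for a single study (all numbers invented for illustration).
rng = np.random.default_rng(42)
post_treated = rng.normal(0.5, 1.0, 30)  # treatment group's post-test scores
post_control = rng.normal(0.0, 1.0, 30)  # control group's post-test scores (RCT benchmark)
# Participants' self-estimated scores "had they not seen the video" (CSEPP):
self_estimated = post_treated - rng.normal(0.5, 0.3, 30)

d_rct = cohens_d(post_treated, post_control)      # benchmark effect from the RCT
d_csepp = cohens_d(post_treated, self_estimated)  # effect implied by the self-estimates
bias = d_csepp - d_rct                            # study-level bias of CSEPP vs. the RCT
print(f"RCT d = {d_rct:.2f}, CSEPP d = {d_csepp:.2f}, bias = {bias:.2f}")
```

Repeating this computation across all 40 studies yields a distribution of bias estimates that can then be pooled meta-analytically.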


Data Collection and Analysis: The data were collected via online surveys on a crowdsourcing portal. For data analysis, we applied meta-analytic methods such as random-effects meta-analysis and meta-regression.
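As a rough sketch of the kind of pooling involved, the following Python implements a DerSimonian-Laird random-effects estimator; the study-level bias estimates and variances are invented, and the authors' actual software and estimator settings may differ:

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Random-effects pooling of study-level estimates (DerSimonian-Laird)."""
    effects, variances = np.asarray(effects), np.asarray(variances)
    w_fixed = 1.0 / variances
    mean_fixed = np.sum(w_fixed * effects) / np.sum(w_fixed)
    q = np.sum(w_fixed * (effects - mean_fixed) ** 2)  # Cochran's Q
    df = len(effects) - 1
    c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    w_random = 1.0 / (variances + tau2)
    mean_random = np.sum(w_random * effects) / np.sum(w_random)
    se = np.sqrt(1.0 / np.sum(w_random))
    return mean_random, se, tau2

# Hypothetical study-level bias estimates (CSEPP minus RCT) and their variances.
bias_estimates = [0.05, -0.10, 0.02, 0.08, -0.03]
bias_variances = [0.04, 0.05, 0.03, 0.06, 0.04]
mean_bias, se, tau2 = dersimonian_laird(bias_estimates, bias_variances)
print(f"pooled bias = {mean_bias:.3f} (SE {se:.3f}), tau^2 = {tau2:.3f}")
```

A pooled bias estimate near zero, with a confidence interval excluding substantively meaningful differences, would indicate that CSEPP tracks the RCT benchmark on average.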


Findings: Results show that CSEPP provided accurate effect estimates regardless of the conditions under which the method was applied.
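One way to probe such a "no moderation" finding is a meta-regression of the study-level bias estimates on the study conditions. The sketch below uses inverse-variance weighted least squares as a simple stand-in for a full meta-regression; all data, codings, and moderator names are hypothetical, not the authors' actual variables:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical study-level data: bias estimate, its variance, and dummy-coded
# study conditions (education level, outcome type, question order) as moderators.
rng = np.random.default_rng(1)
n_studies = 40
bias = rng.normal(0.0, 0.15, n_studies)  # CSEPP-minus-RCT differences
var = np.full(n_studies, 0.04)           # within-study variances
moderators = np.column_stack([
    rng.integers(0, 2, n_studies),       # 1 = higher-education sample
    rng.integers(0, 2, n_studies),       # 1 = attitudinal (vs. knowledge) outcome
    rng.integers(0, 2, n_studies),       # 1 = counterfactual question asked first
])

# WLS with inverse-variance weights approximates a fixed-effect meta-regression;
# non-significant moderator coefficients are consistent with condition-independent bias.
X = sm.add_constant(moderators)
model = sm.WLS(bias, X, weights=1.0 / var).fit()
print(model.summary())
```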


Article Details

How to Cite
Mueller, C. E., & Gaus, H. (2018). Treatment Effect Estimation Using Self-Estimated Counterfactuals Under Varying Conditions: A Meta-Analytic Exploration. Journal of MultiDisciplinary Evaluation, 14(30), 16–36. https://doi.org/10.56645/jmde.v14i30.484
Section
Research on Evaluation Articles
