Retrospective Pretest and Counterfactual Self-Report: Different or Same?
Abstract
Purpose: To examine the discriminant validity of treatment participants' self-reports of the state they would be in had they not received treatment (the counterfactual); specifically, whether self-reports of the counterfactual can be distinguished from self-reports of the preintervention state (the retrospective pretest).
Setting: An education department of a large university in North America.
Intervention: Two methods of self-reporting research self-efficacy: counterfactual items and retrospective pretest items.
Research design: A randomized comparison group design with two conditions, each defined by the version of the survey it received. The survey in the counterfactual condition asked participants to rate their research self-efficacy as it would be without the influence of their program of study. The survey in the retrospective pretest condition asked them to rate their research self-efficacy as it was before participating in their program of study. Both conditions included the same items about research self-efficacy at the current time (posttest).
Data collection & analysis: Participants were graduate students, recruited via email, who completed an online survey about research self-efficacy. Students were randomly assigned to one of the two conditions described above. Responses were analyzed with a 2 × 2 mixed factorial ANOVA, with self-report method (counterfactual vs. retrospective pretest) as the between-subjects factor and time (pre- vs. postintervention) as the within-subjects factor.
Findings: Counterfactual and retrospective pretest scores, and the treatment effects computed from these two sets of scores, were virtually identical. This casts doubt on participants' ability, after receiving treatment, to differentiate between a state of no treatment and their state at treatment commencement.
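To make the analysis described above concrete, the following is a minimal sketch of a 2 × 2 mixed factorial ANOVA in Python using pandas and pingouin. The column names, sample size, and simulated scores are hypothetical illustrations, not the study's data or instrument.

```python
# Hypothetical sketch of the 2 x 2 mixed factorial ANOVA described above:
# self-report method (between-subjects) by time (within-subjects).
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(42)
n_per_group = 30  # hypothetical sample size per survey condition

rows = []
for i in range(2 * n_per_group):
    method = "counterfactual" if i < n_per_group else "retro_pretest"
    pre = rng.normal(3.0, 0.6)           # simulated pre-state self-report
    post = pre + rng.normal(1.0, 0.5)    # simulated posttest self-report
    rows.append({"id": i, "method": method, "time": "pre",
                 "self_efficacy": pre})
    rows.append({"id": i, "method": method, "time": "post",
                 "self_efficacy": post})
df = pd.DataFrame(rows)

# Between-subjects factor: method; within-subjects factor: time.
# A nonsignificant method effect and method-by-time interaction would be
# consistent with the paper's finding that the two score sets are
# virtually identical.
aov = pg.mixed_anova(data=df, dv="self_efficacy", within="time",
                     subject="id", between="method")
print(aov.round(3))
```

Under this design, the treatment effect in each condition is the mean posttest score minus the mean pre-state score (counterfactual or retrospective pretest); equivalence of the two effects would show up as a null method-by-time interaction.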
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright and Permissions
Authors retain full copyright for articles published in JMDE. JMDE publishes under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Users are allowed to copy, distribute, and transmit the work in any medium or format for noncommercial purposes, provided that the original authors and source are credited accurately and appropriately. Only the original authors may distribute the article for commercial or compensatory purposes. To view a copy of this license, visit creativecommons.org