Using Impact Evaluation Tools to Unpack the Black Box and Learn What Works
Abstract
Researchers and policy makers are increasingly dissatisfied with the “average treatment effect.” Beyond learning about the overall causal effects of policy interventions, they want to know what specifically about an intervention is responsible for any observed effects. This article discusses Peck's (2003) approach to creating symmetrically-predicted subgroups for analyzing endogenous features of experimentally evaluated interventions, and then identifies several possible extensions that might help evaluators better understand complex interventions. It aims to enrich evaluation methodologists’ toolbox and to improve our ability to analyze “what works” in addressing important questions for policy and program practice.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright and Permissions
Authors retain full copyright for articles published in JMDE. JMDE publishes under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Users are allowed to copy, distribute, and transmit the work in any medium or format for noncommercial purposes, provided that the original authors and source are credited accurately and appropriately. Only the original authors may distribute the article for commercial or compensatory purposes. To view a copy of this license, visit creativecommons.org.
References
Abadie, Alberto, Matthew M. Chingos, & Martin R. West. (2014). Endogenous Stratification in Randomized Experiments. Cambridge, MA: Harvard University Working Paper. Available at http://www.ksg.harvard.edu/fs/aabadie/stratification.pdf https://doi.org/10.3386/w19742
Angrist, Joshua D., Guido W. Imbens, & Donald B. Rubin. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association, 91(434), 444-455. https://doi.org/10.2307/2291629; https://doi.org/10.1080/01621459.1996.10476902
Bein, Edward. (2013). "Proxy Variable and Other Estimators of Principal Effects" to be presented at the Annual Fall Research Conference of the Association for Public Policy Analysis and Management, Washington, DC, November 7.
Bell, Stephen H. (2013). "Extending Analysis of Symmetrically Predicted Endogenous Subgroups to Multiple Mediators" to be presented at the Annual Fall Research Conference of the Association for Public Policy Analysis and Management, Washington, DC, November 7.
Bell, Stephen H., & Laura R. Peck. (2013). Using Symmetric Predication of Endogenous Subgroups for Causal Inferences about Program Effects under Robust Assumptions: Part Two of a Method Note in Three Parts. American Journal of Evaluation, 34(3), 413-426. https://doi.org/10.1177/1098214013489338; https://doi.org/10.1177/1098214013490820
Bloom, Howard S. (1984). Accounting for No-shows in Experimental Evaluation Designs. Evaluation Review, 8(2), 225-246. https://doi.org/10.1177/0193841X8400800205
Fernald, Lia C. H., Rita Hamad, Dean Karlan, Emily J. Ozer, & Jonathan Zinman. (2008). Small Individual Loans and Mental Health: A Randomized Controlled Trial Among South African Adults. BMC Public Health, 8, Article 409. https://doi.org/10.1186/1471-2458-8-409
Frangakis, Constantin E., & Donald B. Rubin. (2002). Principal Stratification in Causal Inference. Biometrics, 58(1), 21-29. https://doi.org/10.1111/j.0006-341X.2002.00021.x
Gibson, Christina M. (2003). Privileging the Participant: The Importance of Subgroup Analysis in Social Welfare Evaluations. American Journal of Evaluation, 24(4), 443-469. https://doi.org/10.1177/109821400302400403; https://doi.org/10.1016/j.ameval.2003.09.002
Harknett, Kristen. (2006). Estimating Effects for Program Participants Using Propensity Score Does Receiving an Earnings Supplement Affect Union Formation? Evaluation Review, 30(6), 741-778. https://doi.org/10.1177/0193841X06293411
Harvill, Eleanor L., Shawn Moulton & Laura R. Peck. (forthcoming). Health Profession Opportunity Grants Impact Study Technical Supplement to the Evaluation Design Report: Impact Analysis Plan. OPRE Report # XXX, Washington, DC: Office of Planning, Research and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services.
Harvill, Eleanor L., Laura R. Peck, & Stephen H. Bell. (2013). On Overfitting in Analysis of Symmetrically Predicted Endogenous Subgroups from Randomized Experimental Samples: Part Three of a Method Note in Three Parts. American Journal of Evaluation, 34(4). https://doi.org/10.1177/1098214013503201
Imbens, Guido W. (2000). The Role of the Propensity Score in Estimating Dose-response Functions. Biometrika, 87(3), 706-710. https://doi.org/10.1093/biomet/87.3.706
Kemple, James J., & Jason C. Snipes. (2000). Career Academies: Impacts on Students' Engagement and Performance in High School. New York, NY: Manpower Demonstration Research Corporation.
Macias, Cathaleene, Danson R. Jones, William A. Hargreaves, Qi Wang, Charles F. Rodican, Paul J. Barreira, & Paul B. Gold. (2008). When Programs Benefit Some People More than Others: Tests of Differential Service Effectiveness. Administration and Policy in Mental Health and Mental Health Services Research, 35(4), 283-294. https://doi.org/10.1007/s10488-008-0174-y
Manski, Charles F. (1995). Identification Problems in the Social Sciences. Cambridge, MA: Harvard University Press.
Manski, Charles F. (1996). Learning about Treatment Effects from Experiments with Random Assignment of Treatments. Journal of Human Resources, 31, 709-733. https://doi.org/10.2307/146144
Manski, Charles F. (1997). The Mixing Problem in Program Evaluation. Review of Economic Studies, 64(4), 537-553. https://doi.org/10.2307/2971730
Morris, Pamela A., & Richard Hendra. (2009). Losing the Safety Net: How a Time-Limited Welfare Policy Affects Families at Risk of Reaching Time Limits. Developmental Psychology, 45(2), 383-400. https://doi.org/10.1037/a0014960
Moulton, Shawn, with Laura R. Peck & Stephen H. Bell. (2014). Social Policy Impact Pathfinder (SPI-Path) Analytic Suite: SPI-Path|Individual User Guide. Bethesda, MD: Abt Associates Inc.
Patton, Michael Quinn. (2010). Developmental Evaluation: Applying Complexity Concepts to Enhance Innovation and Use. New York, NY: Guilford Press.
Peck, Laura R. (2003). Subgroup Analysis in Social Experiments: Measuring Program Impacts Based on Post Treatment Choice. American Journal of Evaluation, 24(2), 157-187. https://doi.org/10.1016/S1098-2140(03)00031-6
Peck, Laura R. (2005). Using Cluster Analysis in Program Evaluation. Evaluation Review, 29(2), 178-196. https://doi.org/10.1177/0193841X04266335
Peck, Laura R. (2007). What are the Effects of Welfare Sanction Policies? Or, Using Propensity Scores as a Subgroup Indicator to Learn More from Social Experiments. American Journal of Evaluation, 28(3), 256-274. https://doi.org/10.1177/1098214007304129
Peck, Laura R. (2013). On Analysis of Symmetrically-Predicted Endogenous Subgroups: Part One of a Method Note in Three Parts. American Journal of Evaluation, 34(2), 225-236. https://doi.org/10.1177/1098214013481666
Peck, Laura R., & Stephen H. Bell. (2014). The Role of Program Quality in Determining Head Start's Impact on Child Development. OPRE Report #2014-10, Washington, DC: Office of Planning, Research and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services.
Peck, Laura R., & Shawn Moulton. (2013). On The Use of Instrumental Variables and Symmetric-Prediction for Estimating Impacts of Mediators. Paper presented at the Welfare Research and Evaluation Conference, Washington, DC, May 30.
Raudenbush, Stephen W., & Sally Sadoff. (2008). Statistical Inference When Classroom Quality is Measured With Error. Journal of Research on Educational Effectiveness, 1(2), 138-154. https://doi.org/10.1080/19345740801982104
Schochet, Peter Z., & John Burghardt. (2007). Using Propensity Scoring to Estimate Program-Related Subgroup Impacts in Experimental Program Evaluations. Evaluation Review, 31(2), 95-120. https://doi.org/10.1177/0193841X06288736
Unlu, Fatih, Ryoko Yamaguchi, Larry Bernstein, & Julie Edmunds. (2011). Estimating Impacts on Program-Related Subgroups in North Carolina's Early College High School Study. Cambridge, MA: Abt Associates Inc. Unpublished manuscript.
Unlu, Fatih, Laurie Bozzi, Carolyn Layzer, Arthur Smith, Cristopher Price, & R. Hurtig. (2013). Linking Implementation Fidelity to Outcomes in an RCT. Cambridge, MA: Abt Associates Inc. Unpublished manuscript.
Yoshikawa, H., E. A. Rosman, & Joann Hsueh. (2001). Variation in Teenage Mothers' Experiences of Child Care and Other Components of Welfare Reform: Selection Processes and Developmental Consequences. Child Development, 72, 299-317. https://doi.org/10.1111/1467-8624.00280
Zanutto, Elaine, Bo Lu, & Robert Hornik. (2005). Using Propensity Score Subclassification for Multiple Treatment Doses to Evaluate a National Antidrug Media Campaign. Journal of Educational and Behavioral Statistics, 30(1), 59-73. https://doi.org/10.3102/10769986030001059
Zhai, Fuhua, C. Cybele Raver, & Stephanie M. Jones. (2012). Academic Performance of Subsequent Schools and Impacts of Early Interventions: Evidence from a Randomized Controlled Trial in Head Start Settings. Children and Youth Services Review, 34(5), 946-954. https://doi.org/10.1016/j.childyouth.2012.01.026
Zhai, Fuhua, C. Cybele Raver, & Christine Li-Grining. (2011). Classroom-based Interventions and Teachers' Perceived Job Stressors and Confidence: Evidence from a Randomized Trial in Head Start Settings. Early Childhood Research Quarterly, 26(4), 442-452. https://doi.org/10.1016/j.ecresq.2011.03.003
Zhai, Fuhua, C. Cybele Raver, Stephanie M. Jones, Christine P. Li-Grining, Emily Pressler, & Qin Gao. (2010). Dosage Effects on School Readiness: Evidence from a Randomized Classroom-Based Intervention. Social Service Review, 84(4), 615-655. https://doi.org/10.1086/657988