On the Feasibility of Extending Social Experiments to Wider Applications

Stephen H. Bell
Laura R. Peck
https://orcid.org/0000-0002-8516-9950

Abstract

Background: When deciding how to allocate limited funds for social programs, policymakers and program managers increasingly ask for evidence of effectiveness from studies whose methodology is solid enough to provide credible scientific evidence. The basic claim for the “social experiment” is that the “coin flip” of randomization creates two statistically equivalent groups that do not diverge except through an intervention’s effects, making the resulting impact estimates unbiased. Despite the transparency and conceptual strength of the experimental strategy for revealing the causal connection between an intervention and the outcomes of its participants, the wisdom or feasibility of conducting social experiments is often questioned on a variety of grounds.
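The unbiasedness logic can be illustrated with a minimal simulation sketch (hypothetical effect size and outcome model, not drawn from the article): because a coin flip alone determines group membership, the treatment-control difference in mean outcomes centers on the true effect across repeated experiments.

import random

random.seed(1)

TRUE_EFFECT = 2.0  # hypothetical true program impact (illustrative only)

def one_experiment(n=1000):
    # Each person gets a "coin flip": heads -> treatment, tails -> control.
    treated, controls = [], []
    for _ in range(n):
        baseline = random.gauss(0.0, 5.0)  # outcome absent the program
        if random.random() < 0.5:
            treated.append(baseline + TRUE_EFFECT)
        else:
            controls.append(baseline)
    # The experimental impact estimate: difference in group means.
    return sum(treated) / len(treated) - sum(controls) / len(controls)

# Across many replications the estimates center on the true effect,
# illustrating the unbiasedness that randomization delivers.
estimates = [one_experiment() for _ in range(500)]
print(sum(estimates) / len(estimates))  # approximately 2.0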


Purpose: This article defines 15 common concerns about the viability and policy reliability of social experiments in order to assess how much these issues need to constrain the method’s use in providing policy evidence.


Setting: NA


Intervention: NA


Research Design: The research uses the authors’ experience designing and conducting dozens of social experiments to examine the basis for, and soundness of, each concern. It provides examples from the scholarly literature and from evaluations in practice of both the problems posed and the responses to each issue.


Data Collection and Analysis: NA


Findings: We conclude that none of the 15 concerns precludes substantially extending the use of randomized experiments as a means of evaluating the impacts of government and foundation social policies and programs.



Article Details

How to Cite
Bell, S. H., & Peck, L. R. (2016). On the Feasibility of Extending Social Experiments to Wider Applications. Journal of MultiDisciplinary Evaluation, 12(27), 93–111. https://doi.org/10.56645/jmde.v12i27.452
Section: Research on Evaluation Articles
