Propensity Scores: A Practical Introduction Using R
Abstract
Background: This paper provides an introduction to propensity scores for evaluation practitioners.
Purpose: The purpose of this paper is to provide the reader with a conceptual and practical introduction to propensity scores, to matching on propensity scores, and to implementing propensity score matching in the R statistical software.
Setting: Not applicable
Intervention: Not applicable
Research Design: Not applicable
Data Collection and Analysis: Not applicable
Findings: In this demonstration paper, we describe the context in which propensity scores are used, including the conditions under which their use is recommended and the basic assumptions required to implement the technique correctly. Next, we describe some of the more common techniques for conducting propensity score matching. We conclude with the recommended steps for implementing propensity score matching with several R packages, including the syntax and a brief interpretation of the output for each step.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright and Permissions
Authors retain full copyright for articles published in JMDE. JMDE publishes under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Users are allowed to copy, distribute, and transmit the work in any medium or format for noncommercial purposes, provided that the original authors and source are credited accurately and appropriately. Only the original authors may distribute the article for commercial or compensatory purposes. To view a copy of this license, visit creativecommons.org.