Using Test Standard-Setting Methods in Educational Program Evaluation: Addressing the Issue of How Good is Good Enough
Abstract
School districts in the United States and elsewhere commonly use standard setting to assign value to student test and assessment scores. That is, they set standards to show “how good is good enough.” This paper summarizes the empirical findings on the most widely studied test standard-setting method and describes what those findings suggest about the use of test standard setting in educational program evaluations.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright and Permissions
Authors retain full copyright for articles published in JMDE. JMDE publishes under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Users are allowed to copy, distribute, and transmit the work in any medium or format for noncommercial purposes, provided that the original authors and source are credited accurately and appropriately. Only the original authors may distribute the article for commercial or compensatory purposes. To view a copy of this license, visit creativecommons.org.