Demands on Users for Interpretation of Achievement Test Scores: Implications for the Evaluation Profession
Abstract
Background: Professional standards for the validity of achievement tests have long reflected a consensus that validity is the degree to which evidence and theory support the interpretations of test scores entailed by the intended uses of tests. Yet convincing lines of evidence indicate that the standards are not adequately followed in practice, that standards alone are not sufficient guides to action, and that reviewers of tests do not call attention to important kinds of validity evidence that might support the demanding process of making sense of, and reasoning from, test scores.
Purpose: The intent of this article is to make more transparent the demands that achievement test interpretation places on users in instructional contexts, and to open a dialogue on the implications for the evaluation profession for improving practice along lines already set out by evaluation theorists.
Setting: Not applicable.
Intervention: Not applicable.
Research Design: Not applicable.
Data Collection and Analysis: Review of current practice.
Findings: The article makes transparent the lack of attention to validation of achievement tests to support inferences relevant to intended uses in instruction and project evaluation. Elements of a model for the process of reasoning from test scores are articulated. The cognitive demands on the test score user are illustrated in achievement test contexts in writing, science, and mathematics. Implications are drawn for deliberation on issues and for the development of casebooks to guide practice.
Copyright and Permissions
Authors retain full copyright for articles published in JMDE. JMDE publishes under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Users are allowed to copy, distribute, and transmit the work in any medium or format for noncommercial purposes, provided that the original authors and source are credited accurately and appropriately. Only the original authors may distribute the article for commercial or compensatory purposes. To view a copy of this license, visit creativecommons.org.