Demands on Users for Interpretation of Achievement Test Scores: Implications for the Evaluation Profession

Gabriel Mario Della-Piana
https://orcid.org/0000-0003-3043-6959
Michael Gardner

Abstract

Background:  Professional standards for validity of achievement tests have long reflected a consensus that validity is the degree to which evidence and theory support interpretations of test scores entailed by the intended uses of tests.  Yet there are convincing lines of evidence that the standards are not adequately followed in practice, that standards alone are not sufficient guides to action, and that reviewers of tests do not call attention to important kinds of validity evidence that might support the demanding process of making sense of test scores or reasoning from test scores.


Purpose: The intent of this article is to make more transparent the demands that achievement test interpretation places on users in instructional contexts, and to open a dialogue on the implications for the evaluation profession, with a view to improving practice along lines already set out by evaluation theorists.


Setting:  Not applicable.


Intervention: Not applicable.


Research Design: Not applicable.


Data Collection and Analysis: Review of current practice.


Findings:  The article makes transparent the lack of attention to validation of achievement tests to support inferences relevant to intended uses in instruction and project evaluation. Elements of a model for the process of reasoning from test scores are articulated. The cognitive demands on the test score user are illustrated in achievement test contexts in writing, science, and mathematics. Implications are drawn for deliberation on issues and for the development of casebooks to guide practice.

Article Details

How to Cite
Della-Piana, G. M., & Gardner, M. (2011). Demands on Users for Interpretation of Achievement Test Scores: Implications for the Evaluation Profession. Journal of MultiDisciplinary Evaluation, 7(16), 20–31. https://doi.org/10.56645/jmde.v7i16.318
Section
Research on Evaluation Articles

References

AERA, APA, NCME (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

American Evaluation Association (AEA). (2007). Guiding principles for evaluators. Retrieved 14 March 2011 from http://www.eval.org/publications/aea06.GPBrochure.pdf

Berliner, D. C. (2009). Our impoverished view of educational reform. Teachers College Record, 108(6), 949-996. https://doi.org/10.1177/016146810610800606

Brickell, H. M. (1976, 2011). Needed: Instruments as good as our eyes. Journal of MultiDisciplinary Evaluation, 7(15), 171-179. https://doi.org/10.56645/jmde.v7i15.302

Bronfenbrenner, U. (1979). The ecology of human development: Experiments by nature and design. Cambridge, MA: Harvard University Press. https://doi.org/10.4159/9780674028845

Buros Mental Measurements Institute (n.d.). Reviewers guide. Retrieved 31 May 2011 from http://www.unl.edu/buros/bimm/html/suggestions.html

Cizek, G. J., Rosenberg, S. L., & Koons, H. H. (2008). Sources of validity evidence for educational and psychological tests. Educational and Psychological Measurement, 68(3), 397-412. https://doi.org/10.1177/0013164407310130

Davidson, E. J. (2005). Evaluation methodology basics. Thousand Oaks, CA: Sage.

Della-Piana, G. M. (2008). Enduring issues in educational assessment. Phi Delta Kappan, 89(8), 590-592. https://doi.org/10.1177/003172170808900811

Dewey, J. D., Montrosse, B. E., Schroter, D. C., Sullins, C. D., & Mattox II, J. R. (2008). Evaluator competencies: What's taught and what's sought. American Journal of Evaluation, 29, 268-287. https://doi.org/10.1177/1098214008321152

Fu, A. C., Raizen, S. A., & Shavelson, R. J. (2009). The nation's report card: A vision of large-scale science assessment. Science, 326, 1637-1638. https://doi.org/10.1126/science.1177780

Funnell, S. C. & Rogers, P. J. (2011). Purposeful program theory: Effective use of theories of change and logic models. San Francisco, CA: Jossey- Bass.

Hastings, J. T. (1966). Curriculum evaluation: The why of the outcomes. Journal of Educational Measurement, 3(1), 27-32. https://doi.org/10.1111/j.1745-3984.1966.tb00861.x

Herszenhorn, D. M. (2006, May 5). As test-taking grows, test-makers grow rarer. New York Times. Retrieved 16 March 2011 from http://www.nytimes.com/2006/05/05/education/05testers.html?_r=1&scp=1&sq=As+test-taking+grows+&st=nyt/

House, E. R. (1980). Evaluating with validity. Beverly Hills, CA: Sage.

House, E. R. (1995). Putting things together coherently: Logic and justice. In D. Fournier (Ed.), Reasoning in evaluation: Inferential links and leaps. New Directions for Evaluation, 68. San Francisco: Jossey-Bass. https://doi.org/10.1002/ev.1018

Joint Committee on Standards for Educational Evaluation (2001). The Student Evaluation Standards: How to Improve Evaluations of Students. Thousand Oaks, CA: Corwin Press.

Kirkhart, K.E. (2008). Commentary: Consumers, culture, and validity. In M. Morris (Ed.). Evaluation ethics for best practice: Cases and commentaries (pp. 31-53). New York: Guilford.

Linn, R. L. (2006). Following the standards: Is it time for another revision? Educational Measurement: Issues and Practice, 25(3), 54-56. https://doi.org/10.1111/j.1745-3992.2006.00070.x

Linn, R. L. (1998). Partitioning responsibility for the evaluation of the consequences of assessment programs. Educational Measurement: Issues and Practice, 17(2), 28-30. https://doi.org/10.1111/j.1745-3992.1998.tb00831.x

Lissitz, R. W. (Ed.). (2009). The concept of validity: Revisions, new directions, and applications. Charlotte, NC: Information Age Publishing. https://doi.org/10.1108/978-1-61735-269-0

Lohman, D. F., & Nichols, P. (2006). Meeting the NRC panel's recommendations. Educational Measurement: Issues and Practice, 25(4), 58-64. https://doi.org/10.1111/j.1745-3992.2006.00079.x

Madaus, G., Russell, M., & Higgins, J. (2009). The paradoxes of high stakes testing. Charlotte, NC: Information Age Publishing. https://doi.org/10.1108/978-1-60752-983-5

Nichols, P. D., & Williams, N. (2009). Consequences of test score use as validity evidence: Roles and responsibilities. Educational Measurement: Issues and Practice, 28(1), 3-9. https://doi.org/10.1111/j.1745-3992.2009.01132.x

Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage.

Pellegrino, J. W., Chudowsky, N., and Glaser, R. (Eds.) (2001). Knowing what students know: The science and design of student assessment. Washington, DC: National Academy Press.

Schwandt, T. A. (2008a). Educating for intelligent belief in evaluation. American Journal of Evaluation, 29(2), 139-150. https://doi.org/10.1177/1098214008316889

Schwandt, T. A. (2008b). The relevance of practical knowledge traditions to evaluation practice. In N. L. Smith & P. R. Brandon (Eds.), Fundamental issues in evaluation (pp. 29-40). New York: Guilford.

Schwandt, T. A. (1998). The interpretive review of educational matters: Is there any other kind? Review of Educational Research, 68(4), 409-412. https://doi.org/10.3102/00346543068004409

Scriven, M. (2009). Meta-evaluation revisited. Journal of MultiDisciplinary Evaluation, 6(11), iii-viii. https://doi.org/10.56645/jmde.v6i11.220

Scriven, M. (2007). Key evaluation checklist. Retrieved 17 March 2011 from http://www.wmich.edu/evalctr/checklists/metaevaluation/

Scriven, M. (1991). Evaluation thesaurus (4th ed.). Newbury Park, CA: Sage.

Stufflebeam, D. (2011). Meta-evaluation checklists. Retrieved 17 March 2011 from http://www.wmich.edu/evalctr/checklists/checklistmenu.html

U.S. Department of Education (n.d.). Race to the top assessment funding. Retrieved 11 March 2011 from http://www2.ed.gov/programs/racetothetop-assessment/index.html

Wise, L. L. (2006). Encouraging and supporting compliance with standards for educational tests. Educational Measurement: Issues and Practice, 25(3), 27-34. https://doi.org/10.1111/j.1745-3992.2006.00069.x

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.