Quantitative Methods for Estimating the Reliability of Qualitative Data
Abstract
Background: Measurement is an indispensable aspect of conducting both quantitative and qualitative research and evaluation. With respect to qualitative research, measurement typically occurs during the coding process.
Purpose: This paper presents quantitative methods for determining the reliability of conclusions drawn from qualitative data sources. Although some qualitative researchers object to such applications, data collection and coding procedures provide a link between the qualitative and quantitative fields.
Setting: Not applicable.
Intervention: Not applicable.
Research Design: Case study.
Data Collection and Analysis: Narrative data were collected from a random sample of 528 undergraduate students and 28 professors.
Findings: The calculation of the kappa statistic, the weighted kappa statistic, the ANOVA binary intraclass correlation, and the Kuder-Richardson 20 (KR-20) is illustrated through a fictitious example. Formulae are presented so that researchers can calculate these estimators without sophisticated statistical software.
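The four estimators named above can be sketched in plain Python with no third-party packages, in keeping with the aim of computing them without specialized software. This is a minimal sketch under stated assumptions, not the article's own formulae: ratings are coded as integers 0 to k-1; weighted kappa defaults to linear agreement weights, 1 - |i - j|/(k - 1), because the abstract does not say which weights the article uses; the intraclass correlation is the usual one-way ANOVA estimator for 0/1 codes, with the n0 adjustment for unequal group sizes; and KR-20 uses divide-by-N variances. The function names (`contingency`, `weighted_kappa`, `kappa`, `anova_binary_icc`, `kr20`) and the toy codes are hypothetical, not taken from the article.

```python
"""Sketches of four reliability estimators for coded qualitative data.

Assumptions (not from the article): categories are integers 0..k-1,
weighted kappa defaults to linear agreement weights, and KR-20 uses
population (divide-by-N) variances.
"""


def contingency(r1, r2, k):
    """k x k table of counts: rows = coder 1's category, columns = coder 2's."""
    table = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        table[a][b] += 1
    return table


def weighted_kappa(r1, r2, k, weights=None):
    """Cohen's weighted kappa with agreement weights (1 on the diagonal)."""
    if weights is None:
        # Assumed default: linear agreement weights 1 - |i - j| / (k - 1).
        weights = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    n = len(r1)
    t = contingency(r1, r2, k)
    row = [sum(t[i]) / n for i in range(k)]                      # coder 1 marginals
    col = [sum(t[i][j] for i in range(k)) / n for j in range(k)]  # coder 2 marginals
    p_o = sum(weights[i][j] * t[i][j] / n for i in range(k) for j in range(k))
    p_e = sum(weights[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_o - p_e) / (1 - p_e)


def kappa(r1, r2, k):
    """Unweighted Cohen's kappa: identity weights, so only exact agreement counts."""
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    return weighted_kappa(r1, r2, k, identity)


def anova_binary_icc(groups):
    """One-way ANOVA intraclass correlation for 0/1 codes grouped by unit.

    groups: one list per unit (e.g., per respondent) holding each coder's 0/1 code.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    totals = [sum(g) for g in groups]
    n_total, y_total = sum(sizes), sum(totals)
    # For 0/1 data the sums of squares reduce to functions of the group totals.
    ssb = sum(y * y / n for y, n in zip(totals, sizes)) - y_total ** 2 / n_total
    ssw = sum(y - y * y / n for y, n in zip(totals, sizes))
    msb, msw = ssb / (k - 1), ssw / (n_total - k)
    # n0 equals the common group size when all groups are the same size.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


def kr20(scores):
    """Kuder-Richardson 20 for a units x items matrix of 0/1 scores."""
    n, items = len(scores), len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(items)]
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    var_total = sum((x - mean) ** 2 for x in totals) / n
    return items / (items - 1) * (1 - sum(pj * (1 - pj) for pj in p) / var_total)


if __name__ == "__main__":
    # Made-up toy codes for eight text segments (not the article's data).
    coder_a = [0, 0, 1, 1, 2, 2, 0, 1]
    coder_b = [0, 1, 1, 1, 2, 0, 0, 1]
    print(round(kappa(coder_a, coder_b, 3), 4))           # 0.6098 (= 25/41)
    print(round(weighted_kappa(coder_a, coder_b, 3), 4))  # 0.52 with linear weights
    groups = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
    print(round(anova_binary_icc(groups), 4))             # 0.4375 (= 7/16)
    items = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(round(kr20(items), 4))                          # 0.75
```

Passing an identity weight matrix to `weighted_kappa` recovers Cohen's unweighted kappa, which is how `kappa` is built here; quadratic weights, 1 - (i - j)^2/(k - 1)^2, can be supplied through the same `weights` argument.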
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright and Permissions
Authors retain full copyright for articles published in JMDE. JMDE publishes under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Users are allowed to copy, distribute, and transmit the work in any medium or format for noncommercial purposes, provided that the original authors and source are credited accurately and appropriately. Only the original authors may distribute the article for commercial or compensatory purposes. To view a copy of this license, visit creativecommons.org.
References
Armstrong, D., Gosling, A., Weinman, J., & Marteau, T. (1997). The place of inter-rater reliability in qualitative research: An empirical study. Sociology, 31(3), 597-606. https://doi.org/10.1177/0038038597031003015
Bartoszynski, R., & Niewiadomska-Bugaj, M. (1996). Probability and statistical inference. New York, NY: John Wiley.
Benaquisto, L. (2008). Axial coding. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 1, pp. 51-52). Thousand Oaks, CA: Sage.
Benaquisto, L. (2008). Coding frame. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (pp. 88-89). Thousand Oaks, CA: Sage.
Benaquisto, L. (2008). Open coding. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 2, pp. 581-582). Thousand Oaks, CA: Sage.
Benaquisto, L. (2008). Selective coding. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods. Thousand Oaks, CA: Sage.
Bonett, D. G. (2002). Sample size requirements for testing and estimating coefficient alpha. Journal of Educational and Behavioral Statistics, 27, 335-340. https://doi.org/10.3102/10769986027004335
Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall, and Spearman correlations. Psychometrika, 65, 23-28. https://doi.org/10.1007/BF02294183
Brodsky, A. E. (2008). Researcher as instrument. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 2, p. 766). Thousand Oaks, CA: Sage.
Burla, L., Knierim, B., Barth, J., Liewald, K., Duetz, M., & Abel, T. (2008). From text to codings: Intercoder reliability assessment in qualitative content analysis. Nursing Research, 57, 113-117. https://doi.org/10.1097/01.NNR.0000313482.33917.7d
Cascio, W. F. (1991). Applied psychology in personnel management (4th ed.). Englewood Cliffs, NJ: Prentice-Hall International.
Cheek, J. (2008). Funding. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 1, pp. 360-363). Thousand Oaks, CA: Sage.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46. https://doi.org/10.1177/001316446002000104
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220. https://doi.org/10.1037/h0026256
Coryn, C. L. S. (2007). The holy trinity of methodological rigor: A skeptical view. Journal of MultiDisciplinary Evaluation, 4(7), 26-31. https://doi.org/10.56645/jmde.v4i7.7
Creswell, J. W. (2007). Qualitative inquiry & research design: Choosing among five approaches (2nd ed.). Thousand Oaks, CA: Sage.
Crocker, L., & Algina, J. (1986). Introduction to classical & modern test theory. Fort Worth, TX: Holt, Rinehart, & Winston.
Davis, C. S. (2008). Hypothesis. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 1, pp. 408-409). Thousand Oaks, CA: Sage.
Dillon, W. R., & Mulani, N. (1984). A probabilistic latent class model for assessing inter-judge reliability. Multivariate Behavioral Research, 19, 438-458. https://doi.org/10.1207/s15327906mbr1904_5
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York, NY: Chapman & Hall/CRC. https://doi.org/10.1007/978-1-4899-4541-9
Elston, R. C., Hill, W. G., & Smith, C. (1977). Query: Estimating "Heritability" of a dichotomous trait. Biometrics, 33, 231-236. https://doi.org/10.2307/2529318
Everitt, B. S. (1968). Moments of the statistics kappa and weighted kappa. The British Journal of Mathematical and Statistical Psychology, 21, 97-103. https://doi.org/10.1111/j.2044-8317.1968.tb00400.x
Feldt, L. S., & Ankenmann, R. D. (1998). Appropriate sample size for comparing alpha reliabilities. Applied Psychological Measurement, 22, 170-178. https://doi.org/10.1177/01466216980222006
Firmin, M. W. (2008). Replication. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 2, pp. 754-755). Thousand Oaks, CA: Sage.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378-382. https://doi.org/10.1037/h0031619
Fleiss, J. L., Cohen, J., & Everitt, B. S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72, 323-327. https://doi.org/10.1037/h0028106
Fleiss, J. L., & Cuzick, J. (1979). The reliability of dichotomous judgments: Unequal numbers of judges per subject. Applied Psychological Measurement, 3, 537-542. https://doi.org/10.1177/014662167900300410
George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and reference. 11.0 update (4th ed.). Boston, MA: Allyn & Bacon.
Given, L. M., & Saumure, K. (2008). Trustworthiness. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 2, pp. 895-896). Thousand Oaks, CA: Sage. https://doi.org/10.4135/9781412963909
Golafshani, N. (2003). Understanding reliability and validity in qualitative research. The Qualitative Report, 8(4), 597-607.
Greene, J. C. (2007). Mixed methods in social inquiry. Thousand Oaks, CA: Sage.
Gulliksen, H. (1950). Theory of mental tests. New York, NY: Wiley. https://doi.org/10.1037/13240-000
Hettmansperger, T. P., & McKean, J. W. (1998). Robust nonparametric statistical methods (Kendall's Library of Statistics 5). London: Arnold.
Hogg, R. V., & Craig, A. T. (1995). Introduction to mathematical statistics (5th ed.). Upper Saddle River, NJ: Prentice Hall.
Hogg, R. V., McKean, J. W., & Craig, A. T. (2004). Introduction to mathematical statistics (6th ed.). Upper Saddle River, NJ: Prentice Hall.
Hopkins, K. D. (1998). Educational and psychological measurement and evaluation (8th ed.). Boston, MA: Allyn and Bacon.
Jensen, D. (2008). Confirmability. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 1, p. 112). Thousand Oaks, CA: Sage.
Jensen, D. (2008). Credibility. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 1, pp. 138-139). Thousand Oaks, CA: Sage.
Jensen, D. (2008). Dependability. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 1, pp. 208-209). Thousand Oaks, CA: Sage.
Karlin, S., Cameron, P. E., & Williams, P. (1981). Sibling and parent-offspring correlation with variable family age. Proceedings of the National Academy of Sciences, U.S.A., 78, 2664-2668. https://doi.org/10.1073/pnas.78.5.2664
Kim, K., & Timm, N. (2007). Univariate and multivariate general linear models: Theory and applications with SAS (2nd ed.). New York, NY: Chapman & Hall/CRC. https://doi.org/10.1201/b15891
Kleinman, J. C. (1973). Proportions with extraneous variance: Single and independent samples. Journal of the American Statistical Association, 68, 46-54. https://doi.org/10.1080/01621459.1973.10481332
Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Thousand Oaks, CA: Sage.
Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151-160. https://doi.org/10.1007/BF02288391
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174. https://doi.org/10.2307/2529310
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Newbury Park, CA: Sage. https://doi.org/10.1016/0147-1767(85)90062-8
Lipsitz, S. R., Laird, N. M., & Brennan, T. A. (1994). Simple moment estimates of the κ-coefficient and its variance. Applied Statistics, 43, 309-323. https://doi.org/10.2307/2986022
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Maclure, M., & Willett, W. C. (1987). Misinterpretation and misuse of the kappa statistic. American Journal of Epidemiology, 126, 161-169. https://doi.org/10.1093/aje/126.2.161
Magee, B. (1985). Popper. London: Routledge Falmer.
Mak, T. K. (1988). Analyzing intraclass correlation for dichotomous variables. Applied Statistics, 37, 344-352. https://doi.org/10.2307/2347309
Marshall, C., & Rossman, G. B. (2006). Designing qualitative research (4th ed.). Thousand Oaks, CA: Sage.
Maxwell, A. E. (1977). Coefficients of agreement between observers and their interpretation. British Journal of Psychiatry, 130, 79-83. https://doi.org/10.1192/bjp.130.1.79
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.
Miller, P. (2008). Reliability. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 2, pp. 753-754). Thousand Oaks, CA: Sage.
Mitchell, S. K. (1979). Interobserver agreement, reliability, and generalizability of data collected in observational studies. Psychological Bulletin, 86, 376-390. https://doi.org/10.1037/0033-2909.86.2.376
Morse, J. M., Barrett, M., Mayan, M., Olson, K., & Spiers, J. (2002). Verification strategies for establishing reliability and validity in qualitative research. International Journal of Qualitative Methods, 1(2), 13-22. https://doi.org/10.1177/160940690200100202
Nelder, J. A., & Pregibon, D. (1987). An extended quasi-likelihood function. Biometrika, 74, 221-232. https://doi.org/10.1093/biomet/74.2.221
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.
Paley, J. (2008). Positivism. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 2, pp. 646-650). Thousand Oaks, CA: Sage.
Ridout, M. S., Demétrio, C. G. B., & Firth, D. (1999). Estimating intraclass correlations for binary data. Biometrics, 55, 137-148. https://doi.org/10.1111/j.0006-341X.1999.00137.x
Ross, S. (1997). A first course in probability (5th ed.). Upper Saddle River, NJ: Prentice Hall.
Rozeboom, W. W. (1966). Foundations of the theory of prediction. Homewood, IL: Dorsey.
Saumure, K., & Given, L. M. (2008). Rigor in qualitative research. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 2, pp. 795-796). Thousand Oaks, CA: Sage.
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350-353. https://doi.org/10.1037/1040-3590.8.4.350
Seale, C. (1999). Quality in qualitative research. Qualitative Inquiry, 5(4), 465-478. https://doi.org/10.1177/107780049900500402
Smith, D. M. (1983). Algorithm AS189: Maximum likelihood estimation of the parameters of the beta binomial distribution. Applied Statistics, 32, 196-204. https://doi.org/10.2307/2347299
Soeken, K. L., & Prescott, P. A. (1986). Issues in the use of kappa to estimate reliability. Medical Care, 24, 733-741. https://doi.org/10.1097/00005650-198608000-00008
Stapleton, J. H. (1995). Linear statistical models. New York, NY: John Wiley & Sons, Inc. https://doi.org/10.1002/9780470316924
Stenbacka, C. (2001). Qualitative research requires quality concepts of its own. Management Decision, 39(7), 551-555. https://doi.org/10.1108/EUM0000000005801
Tamura, R. N., & Young, S. S. (1987). A stabilized moment estimator for the beta-binomial distribution. Biometrics, 43, 813-824. https://doi.org/10.2307/2531535
van den Hoonaard, W. C. (2008). Inter- and intracoder reliability. In L. M. Given (Ed.), The Sage encyclopedia of qualitative research methods (Vol. 1, pp. 445-446). Thousand Oaks, CA: Sage.
Yamamoto, E., & Yanagimoto, T. (1992). Moment estimators for the binomial distribution. Journal of Applied Statistics, 19, 273-283. https://doi.org/10.1080/02664769200000023