Published in: BMC Medical Research Methodology 1/2014

Open Access 01-12-2014 | Technical advance

Observer agreement paradoxes in 2x2 tables: comparison of agreement measures

Authors: Viswanathan Shankar, Shrikant I Bangdiwala


Abstract

Background

Various measures of observer agreement have been proposed for 2x2 tables. We examine and compare the behavior of these alternative measures.

Methods

The alternative measures of observer agreement and the corresponding agreement chart were calculated under various scenarios of marginal distributions (symmetrical or not, balanced or not) and of degree of diagonal agreement, and their behaviors were compared. In particular, two paradoxes previously identified for kappa were examined: (1) low kappa values despite high observed agreement under highly symmetrically imbalanced marginals, and (2) higher kappa values for asymmetrical imbalanced marginal distributions.

Results

Kappa and alpha behave similarly and are affected by the marginal distributions more than the B-statistic, AC1-index and delta measures are. Delta and kappa provide similar values when the marginal totals are asymmetrically imbalanced or symmetrical but not excessively imbalanced. The AC1-index and B-statistic provide closer results when the marginal distributions are symmetrically imbalanced and the observed agreement is greater than 50%. Also, the B-statistic and the AC1-index provide values closer to the observed agreement when the subjects are classified mostly in one of the diagonal cells. Finally, the B-statistic is seen to be consistent and more stable than kappa under both types of paradoxes studied.
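
The first paradox can be illustrated with a small numerical sketch. The code below is not taken from the article: it assumes the commonly cited definitions of Cohen's kappa, Bangdiwala's B-statistic and Gwet's AC1-index for a 2x2 table, and both the function name `agreement_measures` and the example tables are hypothetical. Alpha and delta are omitted.

```python
def agreement_measures(table):
    """Agreement measures for a 2x2 table [[a, b], [c, d]] of counts.

    Rows are observer 1's ratings, columns are observer 2's ratings.
    Formulas are the commonly cited definitions (an assumption of this
    sketch, not a transcription of the article's methods).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row = (a + b, c + d)   # marginal totals for observer 1
    col = (a + c, b + d)   # marginal totals for observer 2

    # Observed agreement: proportion of subjects on the diagonal.
    po = (a + d) / n

    # Cohen's kappa: chance agreement from products of the marginals.
    # Undefined (division by zero) if all subjects fall in one category.
    pe_kappa = (row[0] * col[0] + row[1] * col[1]) / n**2
    kappa = (po - pe_kappa) / (1 - pe_kappa)

    # Bangdiwala's B: squared diagonal counts over products of marginals.
    b_stat = (a**2 + d**2) / (row[0] * col[0] + row[1] * col[1])

    # Gwet's AC1 (two categories): chance agreement is 2 * pi * (1 - pi),
    # where pi is the mean marginal proportion of the first category.
    pi = (row[0] + col[0]) / (2 * n)
    pe_ac1 = 2 * pi * (1 - pi)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    return {"po": po, "kappa": kappa, "B": b_stat, "AC1": ac1}


# Two hypothetical tables with the same observed agreement (90%):
balanced = [[45, 5], [5, 45]]     # balanced, symmetrical marginals
imbalanced = [[85, 5], [5, 5]]    # symmetrically imbalanced marginals

for name, tab in (("balanced", balanced), ("imbalanced", imbalanced)):
    m = agreement_measures(tab)
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this sketch the observed agreement is 0.90 for both tables, but kappa falls from about 0.80 for the balanced table to about 0.44 for the imbalanced one, while B and AC1 stay near 0.88.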

Conclusions

The B-statistic behaved better than the other measures under all scenarios studied, as well as with varying prevalences, sensitivities and specificities. We recommend using the B-statistic along with its corresponding agreement chart as an alternative to kappa when assessing agreement in 2x2 tables.
Metadata
Title
Observer agreement paradoxes in 2x2 tables: comparison of agreement measures
Authors
Viswanathan Shankar
Shrikant I Bangdiwala
Publication date
01-12-2014
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2014
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-14-100
