Are two readers more reliable than one? A study of upper neck ligament scoring on magnetic resonance images

Espeland, Ansgar; Vetti, Nils; Kråkenes, Jostein

dc.contributor.author	Espeland, Ansgar	en_US
dc.contributor.author	Vetti, Nils	en_US
dc.contributor.author	Kråkenes, Jostein	en_US
dc.date.accessioned	2014-09-22T14:00:45Z
dc.date.available	2014-09-22T14:00:45Z
dc.date.issued	2013-01-17	eng
dc.identifier.issn	1471-2342
dc.identifier.uri	https://hdl.handle.net/1956/8521
dc.description.abstract	Background: Magnetic resonance imaging (MRI) studies typically employ either a single expert or multiple readers in collaboration to evaluate (read) the image results. However, no study has examined whether evaluations from multiple readers provide more reliable results than a single reader. We examined whether consistency in image interpretation by a single expert might be equal to the consistency of combined readings, defined as independent interpretations by two readers, where cases of disagreement were reconciled by consensus. Methods: One expert neuroradiologist and one trained radiology resident independently evaluated 102 MRIs of the upper neck. The signal intensities of the alar and transverse ligaments were scored 0, 1, 2, or 3. Disagreements were resolved by consensus. They repeated the grading process after 3–8 months (second evaluation). We used kappa statistics and intraclass correlation coefficients (ICCs) to assess agreement between the initial and second evaluations for each radiologist and for combined determinations. Disagreements on score prevalence were evaluated with McNemar’s test. Results: Higher consistency between the initial and second evaluations was obtained with the combined readings than with individual readings for signal intensity scores of ligaments on both the right and left sides of the spine. The weighted kappa ranges were 0.65–0.71 vs. 0.48–0.62 for combined vs. individual scoring, respectively. The combined scores also showed better agreement between evaluations than individual scores for the presence of grade 2–3 signal intensities on any side in a given subject (unweighted kappa 0.69–0.74 vs. 0.52–0.63, respectively). Disagreement between the initial and second evaluations on the prevalence of grades 2–3 was less marked for combined scores than for individual scores (P ≥ 0.039 vs. P ≤ 0.004, respectively). ICCs indicated a more reliable sum score per patient for combined scores (0.74) and both readers’ average scores (0.78) than for individual scores (0.55–0.69). Conclusions: This study was the first to provide empirical support for the principle that an additional reader can improve the reproducibility of MRI interpretations compared to one expert alone. Furthermore, even a moderately experienced second reader improved the reliability compared to a single expert reader. The implications of this for clinical work require further study.	en_US
dc.language.iso	eng	eng
dc.publisher	BioMed Central	eng
dc.rights	Attribution CC BY	eng
dc.rights.uri	http://creativecommons.org/licenses/by/2.0	eng
dc.title	Are two readers more reliable than one? A study of upper neck ligament scoring on magnetic resonance images	en_US
dc.type	Peer reviewed
dc.type	Journal article
dc.date.updated	2013-08-23T08:56:22Z
dc.description.version	publishedVersion	en_US
dc.rights.holder	Copyright 2013 Espeland et al.; licensee BioMed Central Ltd.
dc.rights.holder	Ansgar Espeland et al.; licensee BioMed Central Ltd.
dc.source.articlenumber	4
dc.identifier.doi	https://doi.org/10.1186/1471-2342-13-4
dc.identifier.cristin	1044605
dc.source.journal	BMC Medical Imaging
dc.source.40	13