Authors: Arpit Talwar
Poster presented at RSNA 2022 (W5B-SPCH-10)
Purpose
Chest x-ray (CXR) interpretation is one of the most subjective and complex radiology tasks. Accuracy depends on a radiologist's level of experience and training, and human error related to workload, fatigue, and interruptions can affect patient care. Validated comprehensive deep-learning models may assist in improving diagnostic accuracy and departmental workflow. We therefore aimed to evaluate chest x-rays reported as normal by radiologists in the hospital setting against a comprehensive deep-learning model.
Methods and Materials
A retrospective analysis of CXRs reported as normal in adults (≥ 18 years) was performed on consecutive patients at St. Vincent's Hospital Melbourne from January to May 2016. A validated deep-learning model covering a total of 60 significant findings was applied to the included studies. Significant findings were adjudicated by two chest radiologists, each with over 10 years' experience; disagreements were re-reviewed to reach a consensus. The level of agreement between the radiologists and the model was also assessed.
Results
Of the 490 studies included in this preliminary analysis, 444 (90.6%) showed no significant finding predictions by the model (specificity 93.3%). The model identified 64 significant findings across 46 studies (9.4%). Half of these findings were rejected by the adjudicators (model PPV 0.5), leaving 32 findings (6.3%) across 22 studies deemed to have been missed by the reporting radiologist. The most common missed finding was distended bowel (18.8%), with a single case each of solitary pulmonary nodule (3.1%) and superior mediastinal mass (3.1%). Within our cohort, 4 studies (0.8%) had findings of particular concern. Substantial inter-rater agreement was achieved between the model and the radiologists (weighted κ = 0.64; 95% CI 0.52–0.75).
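The headline metrics above follow directly from the reported counts. A minimal sketch of that arithmetic (using only the totals stated in this abstract; the specificity and κ values depend on adjudication details not reproducible from these counts alone, so they are not recomputed here):

```python
# Counts reported in the Results section.
total_studies = 490
no_finding_studies = 444        # studies with no significant-finding predictions
model_findings = 64             # individual findings predicted by the model
rejected_findings = 32          # findings rejected by the adjudicators

# Findings confirmed as true misses by the adjudicators.
confirmed_findings = model_findings - rejected_findings  # 32

# Finding-level positive predictive value: PPV = TP / (TP + FP).
ppv = confirmed_findings / model_findings

# Share of all studies with no predicted finding.
no_finding_rate = no_finding_studies / total_studies

print(f"PPV = {ppv:.2f}")                     # 0.50, as reported
print(f"No-finding rate = {no_finding_rate:.1%}")  # 90.6%, as reported
```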
Conclusions
Our preliminary results demonstrate that an AI model is a useful tool for auditing CXRs reported as normal in a public hospital setting, where radiology trainees report the majority of these studies. The model showed that a substantial proportion (6.3%) of CXRs contained significant findings that were missed in the initial report.
Clinical Relevance/Application
Integration of a highly specific AI model in a real-world reporting environment has the potential to improve departmental workflow and reduce human error, ultimately improving patient care.