Radiology: Artificial Intelligence. Published 22 March 2023.
Authors: Tang JSN, Lai JKC, Bui J, Wang W, Simkin P, Gai D, Chan J, Pascoe DM, Heinze SB, Gaillard F, Lui E
Abstract
Presentation of artificial intelligence outputs through different user interfaces affected radiologist performance in the detection of lung nodules and masses on chest radiographs; user preference did not correspond with user performance.
Purpose
To explore the impact of different user interfaces (UIs) for artificial intelligence (AI) outputs on radiologist performance and user preference in detecting lung nodules and masses on chest radiographs.
Materials and Methods
A retrospective paired-reader study with a 4-week washout period was used to evaluate three different AI UIs compared with no AI output. Ten radiologists (eight radiology attending physicians and two trainees) evaluated 140 chest radiographs (81 with histologically confirmed nodules and 59 confirmed as normal with CT), with either no AI or one of three UI outputs: (a) text-only, (b) combined text and AI confidence score, or (c) combined text, AI confidence score, and image overlay. Areas under the receiver operating characteristic curve were calculated to compare radiologists' diagnostic performance using each UI against their performance without AI. Radiologists reported their UI preference.
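As an illustration of the performance metric named above, the following is a minimal sketch of an empirical area under the receiver operating characteristic curve for a single reader, computed as the probability that a randomly chosen abnormal radiograph receives a higher rating than a randomly chosen normal one. The function name, the ordinal rating scale, and the toy ratings are hypothetical and are not taken from the study; the abstract does not state which statistical procedure was used to compare readers across conditions, so this is not a reproduction of the authors' analysis.

```python
def empirical_auc(scores, labels):
    """Empirical AUC: the fraction of (abnormal, normal) pairs in which the
    abnormal case is rated higher, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy ratings (1 = definitely normal ... 5 = definitely nodule).
# These values are invented for illustration only.
labels = [1, 1, 1, 0, 0, 0]          # 1 = nodule present, 0 = normal
ratings_no_ai = [4, 3, 2, 3, 1, 1]   # same reader, no AI output
ratings_with_ui = [5, 4, 3, 2, 2, 1] # same reader, one UI condition

auc_no_ai = empirical_auc(ratings_no_ai, labels)
auc_with_ui = empirical_auc(ratings_with_ui, labels)
print(f"no AI: {auc_no_ai:.3f}, with UI: {auc_with_ui:.3f}")
```

In a paired-reader design, the same reader rates the same cases under each condition, so the two values can be compared directly for that reader; a formal comparison across multiple readers and cases would need a method that accounts for both sources of variability.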
Results
The area under the receiver operating characteristic curve improved when radiologists used the text-only output compared with no AI (0.87 vs 0.82; P < .001). There was no significant difference in performance for the combined text and AI confidence score output compared with no AI (0.77 vs 0.82; P = .46) or for the combined text, AI confidence score, and image overlay output compared with no AI (0.80 vs 0.82; P = .66). Eight of the 10 radiologists (80%) preferred the combined text, AI confidence score, and image overlay output over the other two interfaces.
Conclusion
Text-only UI output significantly improved radiologist performance compared with no AI in the detection of lung nodules and masses on chest radiographs, but user preference did not correspond with user performance.