RJafroc - Artificial Intelligence Systems and Observer Performance
Analyzing the performance of artificial intelligence (AI)
systems/algorithms characterized by a 'search-and-report'
strategy. Historically, observer performance has dealt with
measuring radiologists' performances in search tasks, e.g.,
searching for lesions in medical images and reporting them, but
the implicit location information has been ignored. The
implemented methods apply to analyzing the absolute and
relative performances of AI systems, comparing AI performance
to that of a group of human readers, or optimizing the reporting
threshold of an AI system. In addition to performing historical
receiver operating characteristic (ROC)
analysis (localization information ignored), the software also
performs free-response receiver operating characteristic (FROC)
analysis, where lesion localization information is used. A book
using the software has been published: Chakraborty DP: Observer
Performance Methods for Diagnostic Imaging - Foundations,
Modeling, and Applications with R-Based Examples,
Taylor-Francis LLC; 2017:
<https://www.routledge.com/Observer-Performance-Methods-for-Diagnostic-Imaging-Foundations-Modeling/Chakraborty/p/book/9781482214840>.
Online updates to this book, which use the software, are at
<https://dpc10ster.github.io/RJafrocQuickStart/>,
<https://dpc10ster.github.io/RJafrocRocBook/> and at
<https://dpc10ster.github.io/RJafrocFrocBook/>. Supported data
collection paradigms are the ROC, FROC and the location ROC
(LROC). ROC data consists of a single rating per image, where a
rating is the perceived confidence level that the image is that
of a diseased patient. An ROC curve is a plot of true positive
fraction vs. false positive fraction. FROC data consists of a
variable number (zero or more) of mark-rating pairs per image,
where a mark is the location of a reported suspicious region
and the rating is the confidence level that it is a real
lesion. LROC data consists of a rating and a location of the
most suspicious region, for every image. Four models of
observer performance, and curve-fitting software, are
implemented: the binormal model (BM), the contaminated binormal
model (CBM), the correlated contaminated binormal model
(CORCBM), and the radiological search model (RSM). Unlike the
binormal model, CBM, CORCBM and RSM predict 'proper' ROC curves
that do not inappropriately cross the chance diagonal.
Additionally, RSM parameters are related to search performance
(not measured in conventional ROC analysis) and classification
performance. Search performance refers to finding lesions,
i.e., true positives, while simultaneously not finding false
positive locations. Classification performance measures the
ability to distinguish between true and false positive
locations. Knowing these separate performances allows
principled optimization of reader or AI system performance.
This package supersedes Windows JAFROC (jackknife alternative
FROC) software V4.2.1,
<https://github.com/dpc10ster/WindowsJafroc>. Package functions
are organized as follows. Data file related function names are
preceded by 'Df', curve fitting functions by 'Fit', included
data sets by 'dataset', plotting functions by 'Plot',
significance testing functions by 'St', sample size related
functions by 'Ss', data simulation functions by 'Simulate' and
utility functions by 'Util'. Implemented are figures of merit
(FOMs) for quantifying performance and functions for
visualizing empirical or fitted operating characteristics:
e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC
(wAFROC) curves. For fully crossed study designs significance
testing of reader-averaged FOM differences between modalities
is implemented via either the Dorfman-Berbaum-Metz or the
Obuchowski-Rockette method. Also implemented is single
modality analysis, which allows comparison of performance of a
group of radiologists to a specified value, or comparison of AI
to a group of radiologists interpreting the same cases.
Crossed-modality analysis is implemented wherein there are two
crossed modality factors and the aim is to determine
performance in each modality factor averaged over all levels of
the second factor. Sample size estimation tools are provided
for ROC and FROC studies; these use estimates of the relevant
variances from a pilot study to predict required numbers of
readers and cases in a pivotal study to achieve the desired
power. Utility and data file manipulation functions allow data
to be read in any of the currently used input formats,
including Excel, and the results of the analysis can be viewed
in text or Excel output files. The methods are illustrated with
several included datasets from the author's collaborations.
This update includes improvements to the code, some as a result
of user-reported bugs and new feature requests, and others
discovered during ongoing testing and code simplification.
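
To make the ROC paradigm described above concrete, the following is a minimal, generic sketch in Python of an empirical ROC curve (true positive fraction vs. false positive fraction) and its area, computed from one rating per image. It is not the package's API: the function names and the 5-point ratings are hypothetical and chosen only for illustration. The sketch also checks the standard result that the trapezoidal area under the empirical ROC curve equals the Wilcoxon statistic.

```python
def empirical_roc(nondiseased, diseased):
    """Return empirical (FPF, TPF) operating points from (0, 0) to (1, 1).

    An image is counted as positive at threshold t if its rating is >= t.
    """
    thresholds = sorted(set(nondiseased) | set(diseased), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpf = sum(r >= t for r in nondiseased) / len(nondiseased)
        tpf = sum(r >= t for r in diseased) / len(diseased)
        points.append((fpf, tpf))
    return points


def trapezoidal_auc(points):
    """Area under the piecewise-linear curve through the operating points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def wilcoxon_auc(nondiseased, diseased):
    """Probability that a diseased rating exceeds a non-diseased one (ties count 1/2)."""
    total = 0.0
    for d in diseased:
        for n in nondiseased:
            if d > n:
                total += 1.0
            elif d == n:
                total += 0.5
    return total / (len(diseased) * len(nondiseased))


# Hypothetical ratings on a 5-point scale (higher = more confident of disease).
nondiseased_ratings = [1, 2, 1, 3, 2]
diseased_ratings = [3, 5, 4, 2, 5]

roc_points = empirical_roc(nondiseased_ratings, diseased_ratings)
auc_trapezoid = trapezoidal_auc(roc_points)
auc_wilcoxon = wilcoxon_auc(nondiseased_ratings, diseased_ratings)
print(roc_points)
print(auc_trapezoid, auc_wilcoxon)
```

The two area estimates agree for these ratings. This sketch covers only the ROC paradigm, where location information is ignored; FROC and LROC data require the mark-rating and location structures described above.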