
Post on 25-Jan-2021


  • Automated Detection of Simulated Motion Blur

    in Digital Mammograms

    Nada Kamona, B.S. and Murray Loew, Ph.D.

    Department of Biomedical Engineering, George Washington University, Washington, D.C.

    Rationale

    Motion blur is a known phenomenon in full-field digital mammography that arises during image acquisition. It

    has been reported to reduce lesion detection performance and mask small microcalcifications, resulting in

    failure to detect smaller abnormalities until they reach more advanced stages. It is estimated that 20% of

    screening mammograms show elements of blur. Motion blur has been found to be due mainly to paddle motion

    (up to 1.5 mm vertically) during the clamping phase of the mammography exam. We propose using machine

    learning algorithms to automatically detect motion blur, which could support the clinical decision-making

    process during the mammography exam by allowing for an immediate retake, thereby preventing unnecessary

    expense, time, and patient anxiety.

    Methods

    To mimic blur seen in mammograms, we simulated it mathematically. The blur point-spread function mask is

    generated by displacing an individual pixel by a random vector (within the range of the blur effect) and the

    pixel contribution to the overall image is then sampled on a regular pixel grid using subpixel linear

    interpolation. This randomly-generated motion trajectory is constrained by several factors; we examined the

    effects of variations in tissue elasticity, imaging exposure time, and size of blur effect (motion boundary in

    millimeters). The blur mask is convolved with a mammogram to create blur. Three motion blur magnitudes

    (0.5, 1.0, and 1.5 mm) were simulated on 68 mammograms (INbreast Database, normal cases, CC and MLO

    views). Blur was quantified using 17 blur operators for each mammogram and at each blur level (272 images

    total). Machine learning classifiers, including Linear Support Vector Machine (SVM) and Subspace

    Discriminant Ensemble (SDE), were trained to separate unblurred mammograms and the three blur levels

    in a four-way classification.
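    The simulation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the random-walk trajectory, the function names (`motion_psf`, `convolve`, `laplacian_variance`), and all parameter values are hypothetical, displacements are in pixels rather than millimeters, and the variance of the Laplacian stands in for a blur operator because the abstract does not list its 17 operators.

```python
import random


def motion_psf(size=15, max_disp=5.0, steps=200, step_sigma=0.3, seed=0):
    """Accumulate a bounded random-walk trajectory into a normalized PSF mask."""
    rng = random.Random(seed)
    psf = [[0.0] * size for _ in range(size)]
    centre = (size - 1) / 2.0
    x = y = 0.0
    for _ in range(steps):
        # Random displacement, clamped to the motion boundary (hypothetical model).
        x = max(-max_disp, min(max_disp, x + rng.gauss(0.0, step_sigma)))
        y = max(-max_disp, min(max_disp, y + rng.gauss(0.0, step_sigma)))
        px, py = centre + x, centre + y
        ix, iy = int(px), int(py)
        fx, fy = px - ix, py - iy
        # Subpixel linear interpolation: share the sample among 4 grid pixels.
        for dy, wy in ((0, 1.0 - fy), (1, fy)):
            for dx, wx in ((0, 1.0 - fx), (1, fx)):
                r, c = iy + dy, ix + dx
                if 0 <= r < size and 0 <= c < size:
                    psf[r][c] += wx * wy
    total = sum(sum(row) for row in psf)
    return [[v / total for v in row] for row in psf]


def convolve(image, psf):
    """2D convolution with edge replication; output has the input's size."""
    h, w, k = len(image), len(image[0]), len(psf)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                ii = min(max(i + r - a, 0), h - 1)
                for b in range(k):
                    jj = min(max(j + r - b, 0), w - 1)
                    acc += psf[a][b] * image[ii][jj]
            out[i][j] = acc
    return out


def laplacian_variance(image):
    """Variance of the 4-neighbour Laplacian over interior pixels (a sharpness measure)."""
    h, w = len(image), len(image[0])
    vals = [image[i - 1][j] + image[i + 1][j] + image[i][j - 1] + image[i][j + 1]
            - 4.0 * image[i][j]
            for i in range(1, h - 1) for j in range(1, w - 1)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


# Toy stand-in for a mammogram patch: random texture, then the same patch blurred.
_rng = random.Random(1)
sharp = [[_rng.random() for _ in range(16)] for _ in range(16)]
psf = motion_psf()
blurred = convolve(sharp, psf)
print(laplacian_variance(sharp), laplacian_variance(blurred))
```

    In a pipeline like the one described, a vector of such operator values per image would be the feature input to the classifiers.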

    Results

    The average accuracy for classifying unblurred and blurred mammograms at three levels of magnitude was

    75.40% and 74.60% for Linear SVM and SDE respectively. The true positive rate was highest for classifying

    mammograms with no simulated blur, reaching 99% for both classifiers with a false negative rate of 1%. For

    Linear SVM, the true-positive rates for blur levels 0.5 mm, 1.0 mm, and 1.5 mm are 75%, 57%, and 71%

    respectively, while the false-negative rates are 25%, 43%, and 29% respectively. For SDE, the true-positive

    rates are 72%, 51%, and 76% and the false-negative rates are 28%, 49%, and 24% for blur levels 0.5 mm, 1.0

    mm, and 1.5 mm respectively. When trained only to distinguish mammograms with no blur from those with

    the lowest simulated blur level (0.5 mm), the Linear SVM and SDE reached accuracies of 98.5% and 97.8%

    respectively.
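    As a consistency check on the four-way results: with equal class sizes (68 images per blur level, per the Methods), overall accuracy equals the mean of the per-class true-positive rates. The sketch below uses the rounded rates quoted above, so agreement with the reported 75.40% and 74.60% is only expected to within rounding. The variable and function names are illustrative.

```python
# Per-class true-positive rates (%) for no blur, 0.5 mm, 1.0 mm, 1.5 mm,
# as quoted in the Results (rounded values).
tpr = {
    "Linear SVM": [99, 75, 57, 71],
    "SDE": [99, 72, 51, 76],
}


def mean_tpr(rates):
    """Mean per-class TPR; equals overall accuracy when classes are the same size."""
    return sum(rates) / len(rates)


for name, rates in tpr.items():
    # The false-negative rate of each class is simply 100 - TPR.
    fnr = [100 - r for r in rates]
    print(name, mean_tpr(rates), fnr)
```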

    Conclusion

    Our preliminary results show the potential to detect simulated blur automatically using machine learning

    classifiers and blur operators. Although limited work has been done to quantify the effects of motion blur on

    radiologists’ performance, there is evidence that motion blur can affect diagnostic performance even when a

    human observer does not detect it visually. We are now using larger mammographic

    datasets to train convolutional neural networks and validate the developed blur model.

  • Assessment of BREAST as a learning tool for

    breast cancer detection for trainees using digital

    mammography

    A Ganesan, MSc (1); PC Brennan, PhD (1,2); K Tapia, MSc (2); C Mello-Thoms, PhD (1,3)

    (1) Medical Image Optimization and Perception research Group (MIOPeG), Faculty of Health Sciences, University

    of Sydney, NSW, Australia. (2) BreastScreen Reader Assessment Strategy (BREAST), University of Sydney, NSW, Australia.

    (3) University of Iowa, Department of Radiology, IA, USA.

    Rationale

    Mammography is the primary screening tool for early detection of breast cancer. However, about 30% of cancers

    are missed. Previous research suggested that the level of readers’ experience is one of the most important factors

    affecting their accuracy in detecting lesions. The BreastScreen Reader Assessment Strategy (BREAST) is an online

    testing platform that enables assessment of the performance of clinicians, including radiologists, trainees and

    breast physicians, in detecting breast cancer using digital mammograms. A recent study showed that engaging with

    test sets significantly improved radiologists’ breast cancer detection performance. In this study, we aim to assess

    the impact of BREAST as a training tool for improving trainees’ breast cancer detection performance.

    Methods

    This study was conducted using BREAST, an online screen-reading test that allows readers to read test sets, report

    the cancerous cases, mark their location and rate them on a scoring scale of 1-5 (1 “normal”, 2 “benign”,

    3 “equivocal”, 4 “suspicious”, and 5 “malignant”). Five test sets (Hobart, Sydney, Darwin, Melbourne and

    Gold Coast) and twenty-three trainees who completed at least three of the test sets in chronological order of release

    were included in this study. To demonstrate the level of improvement, the test sets were grouped into three groups

    (G1, G2 and G3) based on the order of release. For readers who completed the three test sets in a group, those test

    sets were arranged in chronological order of completion and labelled TS1, TS2 and TS3. Performance measures including

    sensitivity, specificity, location sensitivity, area under the receiver operating characteristic curve (ROC AUC) and

    jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit for every test set

    were compared between each pair of test sets within each group.

    Results

    The results showed significant improvement in specificity between some of the test sets for one group of trainees.

    No other significant improvement in trainees’ performance was shown.

    Conclusion

    It is interesting to note that whilst BREAST has been a very effective tool for improving radiologists’ performance,

    the improvements are not so evident with registrars. The most likely explanation is that the cases used for BREAST

    are highly challenging ones which may be too difficult for educational purposes for more junior doctors. The need

    to tailor test sets specific to the level of training and experience is emphasized.

  • Human and model observer study for task

    detection in digital breast tomosynthesis

    Seungyeon Choi1), Sunghoon Choi2), Donghoon Lee1), Young-Wook Choi3), and Hee-Joung Kim1),2)*

    1) Department of Radiation Convergence Engineering, Yonsei University, Wonju, Korea

    2) Department of Radiological Science, Yonsei University, Wonju, Korea

    3) Pioneering Medical-Physics Research Center, Korea Electrotechnology Research Institute

    (KERI), Ansan 15588, Republic of Korea

    Rationale

    Task-based assessment of image quality using theoretical observer models has recently attracted

    attention in medical imaging. Observer models that match human observer performance under

    various imaging conditions are considered key to virtual clinical studies. This work focuses on

    experimental studies using a prototype digital breast tomosynthesis (DBT) system to compare the

    task-based detectability index with human observer performance across tomosynthesis imaging

    protocols with different angular ranges.

    Methods

    We used the prototype DBT system developed by the Korea Electrotechnology Research Institute, with

    angular range setups from ±10.5° to ±24.5° and the same 15 projection images in each setup.

    Human observer performance was measured in four-alternative forced-choice (4AFC) tests for detection

    tasks including spheroidal masses and microcalcification clusters. The task-based detectability

    index (d’) of the non-prewhitening matched filter observer was calculated by analyzing the

    task function, local spatial resolution and local noise of spheroidal masses. The proportion of correctly

    detected signals (P_corr) in the 4AFC tests was then compared with d’.
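    One common way to put d’ and the 4AFC proportion correct on the same footing is the standard equal-variance Gaussian signal-detection relation P_c = ∫ φ(x − d’) Φ(x)^(M−1) dx with M = 4. The sketch below evaluates it numerically. It is offered as an assumption about how such a comparison could be made, not as the authors' procedure, and the function names are illustrative.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def pc_mafc(d_prime, m=4, n=4000):
    """Proportion correct in an M-alternative forced-choice task for a given d'.

    Trapezoidal integration of pdf(x - d') * cdf(x)**(m - 1) over a range
    wide enough to cover both the signal and the noise distributions.
    """
    lo, hi = -10.0, d_prime + 10.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * normal_pdf(x - d_prime) * normal_cdf(x) ** (m - 1)
    return total * h


# d' = 0 gives chance performance (1/4 for 4AFC); P_c rises monotonically with d'.
for d in (0.0, 1.0, 2.0, 3.0):
    print(d, round(pc_mafc(d), 3))
```

    Inverting this relation numerically would turn a measured P_corr into an equivalent human d’ for comparison with the model observer.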

    Results

    In the human observer study, the average P_corr across seven observers was 0.87 (range 0.71 to

    0.92). P_corr decreased as the angular range increased from ±10.5° to ±24.5°, for the different

    task sizes. Moreover, the theoretical model observer showed a trend similar to the human

    observers’ P_corr results.

    Conclusions

    In this study, we evaluated task-based human and model observer performance by comparing the

    detectability index and P_corr among several tomosynthesis angular range setups. The model

    observer showed a trend similar to the human observer results in our prototype DBT system.

    Quantifying the correlation between theoretical and measured performance is needed in future

    studies to better describe task-based model observer performance.

  • Sneak Peak: Are Radiologist Search Patterns

    Altered by a 2D Preview Before a Breast

    Tomosynthesis Image?

    Nicholas M. D’Ardenne, MBBS1; Robert M. Nishikawa, PhD1; Margarita L. Zuley, MD 1,2,

    Chia-Chien Wu, PhD3; Jeremy M. Wolfe, PhD3.

    1. Department of Radiology, University of Pittsburgh, Pittsburgh, PA.

    2. University of Pittsburgh Medical Center, Magee Womens Hospital, Pittsburgh, PA.

    3. Visual Attention Lab, Harvard University, Cambridge, MA.

    Rationale

    Digital Breast Tomosynthesis (DBT) is beginning to be used more frequently alongside Full Field

    Digital Mammography (FFDM) in routine breast screening. One drawback of this newer

    technology is the longer reading times. We aim to investigate if search patterns, duration of reading

    and accuracy of diagnosis differ if radiologists are given a 2D preview before viewing 3D

    tomosynthesis cases.

    Methods

    Readers were instructed to search for lesions as they would under normal clinical conditions and

    were informed that this would be an enriched study (10 positive cases out of 20). Eye movements

    were recorded with an SMI RED250mobile eye tracker sampling at 250 Hz. Calibration aimed for a

    tracking error below 0.5 deg. The images were read on an EIZO RadiForce GS520 5MP (2048 x 2560

    native resolution) monitor. There were three viewing conditions: 1) FFDM images alone, 2) DBT alone,

    3) DBT with an FFDM preview. A single view was presented for each case. Cases were read over 3

    sessions with a washout period of at least one week. Accuracy of diagnosis, time spent on the study

    and search patterns were recorded for each case.

    Results

    Preliminary results from 3 (out of 12) readers (Table 1) have been reviewed. Two were experienced

    readers (with 20 and 30 years of experience); the third had 3 years of experience. These preliminary

    results indicate a decrease in the time spent viewing DBT when a 2D preview is provided, from a

    mean of 63.7 seconds (range 9.6-217.3) without the preview to 47.0 seconds (8.1-134.4) with the

    preview. The mean sensitivity and specificity of the observers’ findings are essentially unchanged

    despite this decrease in time taken. In the eye tracking data, a somewhat smaller percentage of

    breast area is covered when a reader has a preview (32%) than when they do not have a preview

    (37%), assuming a 5 degree window around each fixation.

    Conclusions

    Our preliminary results suggest there is a decrease in the time taken to view DBT cases when a 2D

    preview is supplied. As there is a relative decrease of 14% in the breast area reviewed by readers with

    a 2D preview, the preview may allow readers to focus search on a smaller fraction of the image without

    sacrificing accuracy. We will present results from all 12 readers at the meeting.

    Table 1

    View time = mean view time with range (sec); Sens. = sensitivity; Spec. = specificity; Area = area of breast viewed (%).

             Without 2D Preview                      With 2D Preview                         Change Between Viewing Conditions (Δ)
    Subj.    View time         Sens.  Spec.  Area    View time         Sens.  Spec.  Area    View time  Sens.  Spec.  Area
    1        24.9 (9.6-44.9)   0.50   0.70   28      19.5 (9.7-47.1)   0.50   0.70   25      -5.4       0      0      -11
    2        58.2 (17.8-127)   0.80   0.90   34      58.9 (30.2-133)   0.80   0.80   35      0.7        0      -0.1   3
    3        104 (45-217)      0.90   0.50   50      56.4 (8.1-134)    0.80   0.70   36      -47.3      -0.1   0.2    -28
    Mean     63.7 (9.6-217)    0.73   0.70   37      47.0 (8.1-134)    0.70   0.73   32      -16.7      -0.03  0.03   -14
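    The 14% figure quoted in the Conclusions is a relative change, not a difference in percentage points. A short worked check using the mean values reported in the table (the view-time percentage is derived here and is not stated in the abstract):

```latex
\[
\frac{32 - 37}{37} \approx -0.135 \quad (\text{about } 14\% \text{ relative, } 5 \text{ percentage points absolute}),
\qquad
\frac{47.0 - 63.7}{63.7} \approx -0.262 .
\]
```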

  • Identifying Sources for Improving Breast

    Image Quality within the setting of the

    MQSA EQUIP

    Lonie R Salkowski, MD MS PhD (1,2); Jess Harried, RT (3)

    (1) University of Wisconsin School of Medicine & Public Health, Department of Radiology

    (2) University of Wisconsin School of Medicine & Public Health, Department of Medical Physics

    (3) University of Wisconsin Health Sciences

    Rationale

    In January 2017, the Enhancing Quality Using the Inspection Program (EQUIP) was added to the

    FDA/MQSA breast imaging program to ensure image quality review and implementation of

    corrective processes. Breast image quality is the responsibility of both the technologists and

    radiologists. Improper image quality can result in potentially missed breast cancers. Prior

    research has suggested that positioning is a major reason for technical recalls. Breast imaging

    fellowship trained radiologists spend a full year learning all elements of breast imaging including

    assessment of image quality. General radiologists receive training about image quality in their

    three months of required breast imaging during residency. Based on training practices it is

    reasonable to expect differences in the type and number of technical recalls from fellowship

    trained and general radiologists who practice breast imaging.

    Methods

    This HIPAA-compliant study was exempt from IRB review. In consecutive screening

    mammograms (January 2015 through December 2018), prospectively recorded technical recalls

    were collected from a hybrid breast imaging service. The technical recalls were compared for

    imaging modality (FFDM or DBT), images requested, and indication(s) for technical recall

    (motion, positioning, technical/artifact). Chi-squared tests evaluated statistical significance

    between proportions.

    Results

    During the study interval, 58,448 screening mammograms were performed with 141 technical

    recalls requested by the radiologists (0.24%). Of the 1013 clinical days, 32.3% had coverage

    by a breast fellowship trained radiologist. The general radiologists made 33 technical recalls,

    and fellowship trained radiologists made 108 recalls. Comparing the images requested for

    technical recall, general radiologists (28.3%) requested significantly more Left CC views than

    fellowship trained radiologists (11.8%) (p=0.0059). The differences in requests for Right CC,

    Right MLO and Left MLO views were not significant, although there was a trend for fellowship

    trained radiologists to recall more Right MLO and Left MLO views.

    The general radiologists gave 38 reasons for recalling 33 cases, compared to 150 reasons for 108

    recalls for the fellowship trained radiologists. The distribution of reasons across the three groups

    (motion, positioning, technical/artifact) differed significantly between fellowship trained

    and general radiologists. General radiologists (36.8%) requested significantly more technical

    recalls for motion compared to fellowship trained radiologists (14.0%) (p=0.0013). Fellowship

    trained radiologists (68.0%) requested significantly more recalls for errors in positioning

    compared to general radiologists (39.5%) (p=0.0012). There was no significant difference

    (p=0.4279) between fellowship trained and general radiologists for artifact based technical recalls

    (18% and 23.7% respectively).
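    The proportion comparisons above can be reproduced approximately with a 2x2 chi-squared test. The sketch below makes two assumptions that the abstract does not state: the counts are back-calculated from the reported percentages and totals (for example, 36.8% of 38 reasons is 14 and 14.0% of 150 reasons is 21), and the test is the uncorrected Pearson chi-squared with one degree of freedom. The function name is illustrative.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]].

    For 1 degree of freedom the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# (reason cited, reason not cited) for general radiologists out of 38 reasons,
# then for fellowship trained radiologists out of 150 reasons.
# Counts are back-calculated from the reported percentages (assumption).
reasons = {
    "motion": (14, 24, 21, 129),
    "positioning": (15, 23, 102, 48),
    "technical/artifact": (9, 29, 27, 123),
}

for label, counts in reasons.items():
    chi2, p = chi2_2x2(*counts)
    print(f"{label}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

    Under these assumptions the p-values come out at roughly 0.0012 for motion and for positioning and roughly 0.43 for artifact, close to the reported values; small differences are expected from rounding in the published percentages.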

    Conclusions

    The EQUIP program requires that there be a mechanism for image quality improvement and

    feedback. Fellowship trained breast imagers have more concentrated and longer training in

    image quality than general radiologists. Additional training for both general and fellowship

    trained radiologists in identifying image quality, with attention to positioning errors, will

    enhance a breast imaging program and provide improved patient care.

  • Relationship between Obuchowski-

    Rockette and Gallas U-statistic methods

    for analyzing multi-reader diagnostic

    imaging data

    Stephen L. Hillis, PhD

    Departments of Radiology & Biostatistics, University of Iowa

    Rationale

    The Obuchowski-Rockette (OR) and Gallas U-statistic (U-stat) methods have been the two most

    frequently used methods for analyzing multireader multicase (MRMC) diagnostic imaging data

    that allow conclusions to generalize to both the reader and case populations. The OR method is

    the more general method because it can be used with any reader-performance measure, whereas

    the U-stat method is limited to a U-statistic outcome, such as the empirical (or trapezoidal) AUC

    statistic. On the other hand, advantages of the U-stat method are that it provides exact

    expressions for the outcome variance, provides unbiased variance estimates, and makes it easy

    to size future studies having a different abnormal-to-normal case ratio than was used in a pilot

    study. However, previously it has not been clear if there is a direct link between the two

    methods. In this talk I discuss a particular version of the OR method that produces the same test

    statistic as the U-stat method.
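    For readers less familiar with the U-statistic outcome mentioned above, the empirical AUC can be written as a two-sample U-statistic: the average, over all normal/abnormal case pairs, of a kernel that scores 1 when the abnormal case is rated higher, 0.5 for a tie, and 0 otherwise. The sketch below (function names and ratings are illustrative, not from the talk) also checks that this equals the trapezoidal area under the empirical ROC curve.

```python
def empirical_auc(normal, abnormal):
    """Two-sample U-statistic (Mann-Whitney) estimate of the AUC."""
    total = 0.0
    for x in normal:
        for y in abnormal:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total / (len(normal) * len(abnormal))


def trapezoidal_auc(normal, abnormal):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    thresholds = sorted(set(normal) | set(abnormal), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpf = sum(1 for x in normal if x >= t) / len(normal)
        tpf = sum(1 for y in abnormal if y >= t) / len(abnormal)
        points.append((fpf, tpf))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical confidence ratings for one reader in one modality.
normal_ratings = [1, 2, 2, 3]
abnormal_ratings = [2, 3, 4, 5]
print(empirical_auc(normal_ratings, abnormal_ratings))
print(trapezoidal_auc(normal_ratings, abnormal_ratings))
```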

    Methods

    I discuss a new way to estimate the error covariances when using the OR model which utilizes

    the U-statistic approach.

    Results

    I show analytically that this version of the OR method produces the same test statistic as the

    U-stat method.

    Conclusions

    Showing that a U-stat analysis can be performed using the OR method is useful in several ways:

    (1) Previously the U-stat method was limited to comparison of two modalities. Now

    the U-stat method can be used for testing for equivalence for several modalities, because the OR

    method allows for this. (2) The equivalence of the statistics establishes that there is now an

    unbiased variance version of the OR method available for U-statistic outcomes. (3) For

    U-statistic outcomes, it is now easy to use the OR method to compute sample size for studies

    having a different abnormal-to-normal case ratio than was used in a pilot study. (4) Negative

    variance estimates from the U-stat method can be avoided by using the well-tested OR approach for

    computing degrees of freedom and constraining the variance to be positive. (5) If researchers

    want to analyze a U-statistic outcome, they no longer have to be concerned with the question of

    which method is better.

  • The strength of the gist of the abnormal in the

    unilateral and bilateral mammograms

    Ziba Gandomkar* (a), Ernest U. Ekpo (a), Sarah J. Lewis (a), Karla K. Evans (b), Kriscia Tapia (a), Tong Li (a),

    Seyedamir Tavakoli Taba (a), Jeremy M. Wolfe (c), Patrick C. Brennan (a)

    (a) Medical Imaging Sciences, Faculty of Health Sciences, University of Sydney, Sydney, NSW, Australia;

    BreastScreen Reader Assessment Strategy (BREAST), University of Sydney, Sydney, NSW, Australia.

    (b) Department of Psychology, University of York, Heslington, York, UK.

    (c) Visual Attention Lab, Harvard Medical School, Cambridge, MA, USA.

    Rationale

    Experts can perceive the gist of the abnormal in the negative prior unilateral mammograms of women who were

    subsequently diagnosed with breast cancer. Here, we compared the strength of the gist from unilateral and

    bilateral mammograms.

    Methods

    Seventeen radiologists viewed 60 cases in two different experiments (Gist_Unilateral and Gist_Bilateral). In

    Gist_Unilateral, 60 unilateral craniocaudal mammograms were presented in a randomly generated sequence, each for

    half a second, to the radiologists, who were asked to provide an abnormality probability for each case on a scale

    from 0 (confident normal) to 100 (confident abnormal). In Gist_Bilateral, we presented bilateral mammograms of the

    same cases using a similar experimental protocol. Readers were randomly assigned to two groups: the first did the

    unilateral experiment first, while the second did the bilateral experiment first. Four categories of mammograms

    (15 cases per category) were included: 1) Cancer cases, which contained biopsy-proven malignancies; 2) Normal

    cases, which remained normal for at least the next two years; 3) Prior_Vis cases, which contained retrospectively

    visible non-actionable cancer signs; 4) Prior_Invis cases, which did not contain visible cancer signs. Mammograms

    in the last two categories were from women who subsequently developed biopsy-proven malignancies. For each

    radiologist and each category, the Pearson correlation between the unilateral and bilateral gist responses was

    calculated. In each experiment, three pair-wise classifications (Cancer/Normal, Prior_Vis/Normal, and

    Prior_Invis/Normal) were analysed. A paired, two-sided Wilcoxon signed-rank test was used to investigate whether

    the values of the area under the receiver operating characteristic curve (AUC) were above the chance level

    (AUC=0.5). The same test was also used to show whether the AUC values from the two experiments differed

    significantly for each pair-wise classification. For each radiologist and each case, we also calculated the average

    of the two gist responses recorded in the two experiments, Gist_AVE, i.e.

    ½(Gist_Unilateral + Gist_Bilateral).

    Results

    The mean correlation coefficients across the 17 readers for Cancer, Normal, Prior_Vis, Prior_Invis, and all cases

    were 0.17 (CI=0.03-0.31), 0.26 (CI=0.09-0.43), 0.30 (CI=0.12-0.49), 0.35 (CI=0.21-0.49), and 0.35 (CI=0.25-0.44),

    respectively. The order of median AUCs in the Cancer/Normal and Prior_Vis/Normal classifications, from highest to

    lowest, was Gist_AVE > Gist_Unilateral > Gist_Bilateral. All differences except the difference between Gist_AVE and

    Gist_Unilateral in the Prior_Vis/Normal classification were significant. In the Prior_Invis/Normal classification,

    the order was Gist_AVE > Gist_Bilateral > Gist_Unilateral. None of the differences in the AUC values for the

    Prior_Invis/Normal classification were significant. On average, the AUCs of the Cancer/Normal, Prior_Vis/Normal, and

    Prior_Invis/Normal classifications based on Gist_Unilateral dropped by 8%±6%, 10%±8%, and 1%±8% respectively in the

    bilateral experiment, while these AUCs increased by 5%±3%, 2%±4%, and 4%±6% after averaging the two signals. On

    average, the AUCs of the Cancer/Normal, Prior_Vis/Normal, and Prior_Invis/Normal classifications based on Gist_AVE

    were 82%±4%, 74%±3%, and 67%±5%.

    Conclusions

    There is a weak association between the gist signals from unilateral and bilateral mammograms. The signal was

    stronger in the unilateral experiment. When the two signals were averaged, the AUCs increased. The improvement

    could be a result of cancelling out random noise by averaging two values. Further investigation of intra-reader

    variability, and of the AUC when a reader's unilateral gist responses are averaged over multiple experiments, is

    required.
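    The noise-cancellation explanation can be made concrete with a simple model, offered here as an illustrative assumption rather than the authors' analysis. Suppose each gist response is a case-specific signal s plus a noise term, with both noise terms having variance \sigma^2 and correlation \rho. The averaged response then carries the same signal with reduced noise variance whenever \rho < 1:

```latex
\[
G_{\mathrm{U}} = s + \varepsilon_{\mathrm{U}}, \qquad
G_{\mathrm{B}} = s + \varepsilon_{\mathrm{B}}, \qquad
\operatorname{Var}\!\left(\frac{\varepsilon_{\mathrm{U}} + \varepsilon_{\mathrm{B}}}{2}\right)
= \frac{\sigma^{2} + \sigma^{2} + 2\rho\sigma^{2}}{4}
= \frac{(1+\rho)\,\sigma^{2}}{2} \;\le\; \sigma^{2}.
\]
```

    With uncorrelated noise (\rho = 0) the noise variance is halved, which would raise the AUC of the averaged response if the signal component is unchanged.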

  • Perceptual Training –

    Learning versus Attentional Shift

    Soham Banerjee, MD [1]; Megan Mills, MD [1]; Trafton Drew, PhD [2];

    William F. Auffermann, MD/PhD [1*]

    [1] Department of Radiology and Imaging Sciences, University of Utah Health, Salt Lake City,

    UT, USA; [2] Department of Psychology, University of Utah, Salt Lake City, UT, USA;

    [*] Corresponding Author

    Rationale: Perceptual training (PT) has been shown to improve healthcare trainees’ ability to identify

    abnormalities on chest radiography (CXR). Specifically, recent studies have examined the

    effects of search pattern training, and showed improved performance with training. However, it

    was not clear if the improved performance was due to learning, or due to an attentional shift

    resulting from cueing related to the training. The objective of this study is to determine

    whether improved subject performance on CXR evaluation after PT is due to learning or

    attentional shift.

    Methods: A perceptual training experiment with 41 physician assistant trainees was performed. All

    subjects voluntarily participated and provided informed consent. Subjects evaluated CXRs for

    appropriate central venous catheter (CVC) positioning and other imaging related tasks before and

    after educational interventions. For the intervention, the control group received an attentional

    control task, and the experimental group received perceptual training in the form of search

    pattern training for CVC characterization.

    Many of the subjects' tasks were similar to prior studies and included: 1) Marking the tip of the

    catheter, 2) Indicating their confidence in catheter tip localization, 3) Indicating whether the

    catheter was adequately positioned or malpositioned.

    In addition, subjects were asked to rate whether the cardiac silhouette was normal or abnormally

    enlarged using a 5-point scale. Information on how to perform cardiac evaluation was given

    only at the beginning of the study with the study’s introductory materials.

    Subject ability to characterize the adequacy of catheter positioning (Line-Safe) and the heart size

    (Heart-Size) were quantified using receiver operating characteristic (ROC) analysis. Subject

    ability to localize the catheter tip (Line-Loc) was quantified using localization ROC (LROC)

    analysis. The figure of merit for performance was the area under the curve (AUC).

    Results: The difference in AUC for subject performance before and after the educational intervention and

    the corresponding p-values are given in the table below.

                    Line-Loc            Line-Safe           Heart-Size
                    ΔAUC    P-Value     ΔAUC    P-Value     ΔAUC    P-Value
    Control         -0.11   0.88        0.06    0.01        0.04    0.01
    Experimental    0.30

  • RadSimP - A Custom Software Solution

    for Perceptual Training Compared with

    Current Perceptual Software

    Soham Banerjee, MD [1]; Megan Mills, MD [1]; Trafton Drew, PhD [2];

    William F. Auffermann, MD/PhD [1*]

    [1] Department of Radiology and Imaging Sciences, University of Utah Health, Salt Lake City,

    UT, USA; [2] Department of Psychology, University of Utah, Salt Lake City, UT, USA;

    [*] Corresponding Author

    Rationale: Recent studies have shown the utility of perceptual training (PT) for teaching healthcare

    trainees good perceptual habits when evaluating medical images. Prior studies were performed

    using software designed for perceptual observer studies. As most software packages for image

    perception are geared towards research, they were not optimized for perceptual training and

    assessment. To date, there have been no software packages specifically designed for perceptual

    training. The goal of this study is to determine whether perceptual training using our custom software

    solution, RadSimP, results in improved performance relative to training using current

    perceptual research software.

    Methods: PT for central venous catheter (CVC) positioning was performed using a counterbalanced

    design. Subjects were shown several sets of chest radiographs (CXRs) with CVCs that were

    either adequately or malpositioned. Subjects were asked to: mark the tip of the catheters, rate

    their confidence in catheter tip localization, and state whether or not the catheters were

    adequately positioned.

    The same study was conducted twice using two different PT software packages. Study-A used

    ViewDEX (https://sas.vgregion.se/en/for-dig-som-ar/vardgivare/viewdex/), a software package

    for perceptual research. Study-B used RadSimP, our custom perceptual training and radiology

    workstation simulator software package, written in Python.

    All subjects voluntarily participated and provided informed consent. For Study-A, 14 physician

    assistant students participated. For Study-B, 41 physician assistant students participated.

    Training and assessment were done at individual computer workstations in an educational

    computer classroom.

    During Study-A, the trainees had to manually switch between folders on the desktop to access

    the appropriate educational materials and had to manually enter information to get the correct

    set of cases for assessment. For Study-B, the RadSimP program seamlessly integrated subject

    consent, training, practice, and assessment in a simulated radiology workstation environment.

    In addition, RadSimP was loaded onto the classroom’s network storage drive, such that all

    subjects could run it simultaneously, and the results were automatically collected in a central

    location.

    A survey was given to subjects after the completion of the study to assess the subjects’

    impressions of perceptual training and the RadSimP software package. The survey asked if

    subjects felt the search pattern training and simulator environment were helpful for learning

    about radiology. Responses were collected using a 5-point Likert response format (where 5

    indicates strongly agree).

    Results: Using both training paradigms, the subjects in the experimental group showed a statistically

    significant improvement in their ability to characterize a catheter as acceptable versus

    malpositioned. The differences in areas under the localization receiver operating characteristic

    curves were 0.07 using the conventional software and 0.1 using RadSimP, p-values of 0.02 and

  • Meaningful Feedback in Breast Imaging

    Simulation Assessment

    Lonie R Salkowski, MD MS PhD1,2, Mai A Elezaby MD1, Elizabeth A Krupinski, PhD3

    University of Wisconsin School of Medicine, Department of Radiology1

    University of Wisconsin School of Medicine, Department of Medical Physics2

    Emory University, Department of Radiology & Imaging Sciences3

    Rationale There is a lack of objective assessment on the quality of interpretive skills in radiology residency

    training. This is especially pertinent in breast imaging where there is no independent

    interpretation of exams during residency and the majority of residents will not pursue a breast

    imaging fellowship. Simulation is a validated technique which facilitates independent

    interpretation of thoughtfully developed clinical cases with sequential exposure during residency

    training. The format of meaningful feedback should provide both formative and summative

    information that is beneficial and not punitive for residents.

    Methods We developed a breast imaging simulation to provide serial feedback for residents over the four

    years of their radiology residency training. Users will be provided feedback in several formats.

    First, a modified medical audit (recall rate, cancer detection rate, sensitivity, specificity, PPV1, PPV2, and PPV3) introduces the residents to the typical MQSA-mandated annual feedback

    for radiologists who interpret mammograms. This audit will have higher educational impact when

    the data are presented in the context of the resident’s own work and prepare them to reach

    national performance benchmarks. Second, the users will be provided feedback on their

    assessment of lesion types (masses, calcifications, asymmetries, architectural distortion). This

    will objectively highlight areas that may need additional review and emphasis in the educational

    program for users and clinical educators. Third, since breast density is a recently introduced

    national concern and has implications for clinical practice (from the notification of patients about

    their tissue density to the offering of additional clinical tests for patients with high tissue

    density), it will be important to assess the user’s understanding of breast density. Lastly, to

    keep the users motivated and provide an element of competitive playfulness in a residency

    program, a gamification component has been added to the assessment. This will lead to more

    engagement by residents with the intent that they will be better prepared for independent

    interpretation at the completion of the residency. This gamification component will provide

    overall scores for assessments that will be displayed on a leader board (user self-assigned names)

    format for comparison to peers.
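    The audit measures listed above can be sketched from a simple tally of screening outcomes. This is a minimal illustration under stated assumptions, not the simulation's actual scoring code: the class and function names are hypothetical, the counts are invented, and only PPV1 (cancers among abnormal screening interpretations) is shown because PPV2 and PPV3 require biopsy recommendation and biopsy result counts that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class AuditCounts:
    true_pos: int   # recalled cases with cancer
    false_pos: int  # recalled cases without cancer
    true_neg: int   # non-recalled cases without cancer
    false_neg: int  # non-recalled cases with cancer


def audit_summary(c):
    """Return basic screening audit measures for one reader.
    Assumes every denominator is non-zero."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    recalled = c.true_pos + c.false_pos
    return {
        "recall_rate": recalled / total,
        "cancer_detection_rate_per_1000": 1000 * c.true_pos / total,
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv1": c.true_pos / recalled,
    }


# Hypothetical tally for one resident over 1,000 simulated screening exams.
summary = audit_summary(AuditCounts(true_pos=8, false_pos=92, true_neg=898, false_neg=2))
```

    Computing the same dictionary for each assessment period would let a resident's values be tracked against benchmark targets over time.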

    Results Within the context of the assessments developed in the simulation (modified medical audit,

    lesion type assessment, breast tissue density, and gamification) the results will be correlated with

    resident level (first, second, third rotations) targets, and how residents’ in-training medical audits

    compare to the national benchmarks trajectory. The residents will be serially debriefed with

    formative and summative results, and as data is collected peer-level comparisons will be

    provided.

    Conclusions Thoughtful development of objective assessment measures in a simulation design with feedback

    is important, in order to provide users with a meaningful simulation experience.

  • Optimality of tool selection in radiologists and naïve subjects

    Lisa M. Heisterberg, BS & Andrew B. Leber, PhD

    The Ohio State University, Department of Psychology & Medical Scientist Training Program

    Rationale While multiple avenues of research have investigated errors in radiology, one possible cause of errors that has received limited study is the way in which radiologists interact with Picture Archiving and Communication System (PACS) software. PACS software contains critical elements that allow radiologists to view and manipulate medical images, but its many tools and features can put individuals at risk for making sub-optimal choices. The difficulty of recruiting radiologist subjects and the high complexity of PACS software can make researching this topic difficult. Hence, we have developed a simple laboratory-based visual search task that approximates windowing, a PACS feature that allows for contrast enhancement of images. We sought to determine whether non-expert performance in our task could inform us about radiologist performance, and whether subjects would approach our task optimally.

    Methods 26 radiologists and 26 subjects naïve to radiological image interpretation completed our study. Each trial tasked subjects with deciding if a letter T was present or absent in displays containing distractor Ls. One of three classes of displays was shown on each trial. For the 80% of trials that were T-present, the T was not always immediately visible. Each display was initially shown with a default contrast adjustment setting applied that revealed the target on 5% of trials. Subjects could select from 3 additional adjustment settings: an optimal setting that revealed the target on 75% of trials, and 2 other settings that each revealed the target on 10% of trials. Subjects were pre-informed which adjustment setting was optimal for each display class. Selecting the optimal setting first, then selecting additional settings if the target was not found, would allow for the most accurate and efficient search.
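    The efficiency argument can be made concrete with a short expected-value calculation. The sketch below assumes, as the percentages summing to 100% suggest but the abstract does not state, that on a target-present trial exactly one setting reveals the target and that the subject keeps selecting settings until it appears. The setting names and function names are hypothetical.

```python
from itertools import permutations

# Probability that each setting is the one revealing the target
# on a target-present trial (values from the Methods text).
REVEAL_PROB = {"default": 0.05, "optimal": 0.75, "other_a": 0.10, "other_b": 0.10}


def expected_selections(order):
    """Expected number of additional settings a subject must select before
    the target becomes visible. The default setting is always shown first
    and therefore costs zero selections."""
    return sum(rank * REVEAL_PROB[name] for rank, name in enumerate(order, start=1))


def best_order():
    """Selection order that minimizes the expected number of selections."""
    return min(permutations(["optimal", "other_a", "other_b"]), key=expected_selections)


optimal_first = expected_selections(("optimal", "other_a", "other_b"))
optimal_last = expected_selections(("other_a", "other_b", "optimal"))
```

    Under these assumptions, choosing the optimal setting first requires about 1.25 selections per target-present trial on average, versus about 2.55 when it is chosen last.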

    Results Accuracy for reporting the absence or presence of a target was not significantly different for radiologists (85.3%) and naïve subjects (83.5%). Naïve subjects spent significantly less time per trial (13.0 s) than radiologists (16.5 s). The percentage of the time the optimal adjustment setting was selected first was not significantly different between radiologists (79.6%) and naïve subjects (86.7%). For both radiologists and naïve subjects, those who more often selected the optimal setting first had significantly faster average trial completion times, with no differences in accuracy. Lastly, on target-present trials where the target was not visible with the optimal setting, subjects were significantly more likely to decide a target was absent, indicating that if a target was not visible using the optimal setting, subjects often neglected selecting other settings or continuing their search.

    Conclusions These results demonstrate that in our simplified search task, radiologists are not completely accurate, are more efficient when making optimal choices, and can display sub-optimal behaviors, all of which are similar to naïve subjects. Such results reveal that the performance of non-radiologist subjects can inform us about radiologist performance in our task. Future studies will correlate radiologist performance in this simplified task with their performance using professional PACS. Overall, we hope to understand how radiologists interact with PACS software, why they may act sub-optimally, and how sub-optimal behaviors can be reduced.

  • Reducing Errors in Pathology Image-

    based Decisions through Maximum

    Confidence Slating

    Jennifer S. Trueblood1, PhD, William R. Holmes2, PhD, Adam C. Seegmiller3, MD, PhD,

    Charles Stratton3, MD, Quentin Eichbaum3, MD, PhD

    1Department of Psychology, Vanderbilt University, 2Department of Physics and Astronomy,

    Vanderbilt University, 3Department of Pathology, Microbiology and Immunology, Vanderbilt

    University Medical Center

    Rationale Second opinions can significantly improve diagnostic accuracy. However, multiple readings by

    different individuals are not always feasible due to shortages in pathology and laboratory medicine

    workforce, particularly in low resource settings. We examine whether it is possible to reduce errors

    by having the same person perform multiple readings. Research in decision-making has shown a

    “wisdom of the crowd within” effect, improving accuracy by aggregating responses from a single

    individual. We apply a similar strategy to decisions about the pathology images.

    Methods In two experiments, participants (novices in Exp 1 and experts in Exp 2) viewed images of white

    blood cells and decided whether each image contained a blast cell (pathological white blood cell) or not. On each

    trial, participants were asked to make a binary choice followed by a confidence rating, indicating

    their confidence in their decision. Participants viewed each image twice.

    Results Results showed confidence was greater for correct as compared to incorrect responses (Exp 1: F(1,

    36) = 106.83, p < .001 and Exp 2: F(1, 21) = 33.77, p < .001). We then applied a maximum

    confidence slating algorithm (MAX, Koriat, 2012) to each individual’s decisions. For each image,

    MAX selects the trial with the higher confidence. We compared this approach with average

    performance (AP) as well as a minimum confidence slating (MIN) algorithm that selects the lower

    confidence response for each image. We found a main effect of algorithm (Exp 1: F(2, 72) = 29.41,

    p < .001 and Exp 2: F(2, 42) = 12.21, p < .001), with post hoc tests showing that MIN generated

    lower accuracy than AP, and AP generated lower accuracy than MAX.
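    The three aggregation rules compared above can be sketched as follows. This is a minimal illustration with invented data, not the authors' analysis code: the function names are hypothetical, and ties in confidence are resolved in favor of the first viewing, which is an assumption the abstract does not address.

```python
def slate(viewings, pick_max=True):
    """For each image, keep the choice from the viewing with the higher
    (MAX) or lower (MIN) confidence. `viewings` maps an image id to a list
    of (choice, confidence) pairs, one per viewing."""
    chooser = max if pick_max else min
    # max/min return the first item on ties, i.e. the first viewing.
    return {img: chooser(pair, key=lambda r: r[1])[0] for img, pair in viewings.items()}


def accuracy(decisions, truth):
    """Proportion of images whose slated decision matches the ground truth."""
    return sum(decisions[img] == truth[img] for img in truth) / len(truth)


def average_performance(viewings, truth):
    """AP: accuracy averaged over every individual response."""
    responses = [(img, choice) for img, pair in viewings.items() for choice, _ in pair]
    return sum(choice == truth[img] for img, choice in responses) / len(responses)


# Hypothetical data: 1 = blast cell present, 0 = absent; confidence on a 1-5 scale.
truth = {"a": 1, "b": 0, "c": 1}
viewings = {"a": [(1, 5), (0, 2)], "b": [(1, 3), (0, 4)], "c": [(0, 4), (1, 2)]}

acc_max = accuracy(slate(viewings, pick_max=True), truth)
acc_min = accuracy(slate(viewings, pick_max=False), truth)
acc_ap = average_performance(viewings, truth)
```

    In this toy example the ordering MIN < AP < MAX mirrors the pattern reported in the Results, although the example data were constructed by hand and carry no evidential weight.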

    Conclusions In sum, our results show that confidence is associated with accuracy in pathology decisions and

    suggest that confidence can be used as a way to aggregate multiple readings within the same individual.

  • A novel learning-based paradigm to

    investigate the visual-cognitive bases of

    lung nodule detection

    Frank Tong1,2, Ph.D., Malerie G. McDowell1, B.A., William R. Winter3, M.D., M.S.,

    and Edwin F. Donnelly3, M.D., Ph.D.

    1 Psychology Department, Vanderbilt University

    2 Vanderbilt Vision Research Center, Vanderbilt University

    3 Department of Radiology, Vanderbilt University Medical School

    Rationale: Even expert radiologists will sometimes fail to detect the presence of a pulmonary

    nodule in a chest X-ray image, with estimated rates of missed detection of 20-30%.

    The challenging nature of this diagnostic task lies not only in the visual contrast or

    the size of the nodule, but also in the heterogeneity of nodule appearance and the

    variability of the local anatomical background. The goal of our study was to

    develop a learning-based paradigm, using image processing software to generate a

    large, heterogeneous set of visually realistic simulated nodules, to gain insight into

    the visual and cognitive bases of lung nodule detection.

    Methods: The current version of our software allows for the creation of simulated nodules

    with heterogeneous appearance, allowing for rigorous control over the size, shape,

    brightness, contrast, and placement of nodules in 2D chest radiographs.

    Results: At the MIP Lab at RSNA, we tested radiologist participants (n=10) with both real

    and computer-simulated nodules at a challenging nodule localization task.

    Performance accuracy was significantly better for real nodules than for the subtle

    simulated nodules we created (70.5% vs. 59.0% accuracy, p < 0.005). Of greater

    interest, radiologists performed no greater than chance level at discriminating

    whether nodules were real or simulated (mean accuracy 52.9%). Next, we

    evaluated the impact of training naive undergraduate participants at a localization

    task involving simulated nodules. Participants underwent 3-4 training sessions and

    viewed a total of 600 simulated cases. We observed significant improvements

    following training for both simulated nodules (30.3% accuracy pre-test, 78.2%

    accuracy post-test, p < 0.00001) and real nodules (37.5% pre-test, 62.5% accuracy

    post-test, p < 0.0005). In our next experiment, we investigated whether extended

    training with either light or dark polarity nodules would lead to polarity-specific

    training benefits in initially naive undergraduates. This indeed proved to be the

    case, implying that this training regimen led to the learning of a polarity-specific

    perceptual template of nodule appearance. Finally, we conducted an exploratory

    pilot study with 6 radiology residents to see whether they might show performance

    improvements following training with our nodule localization task. The results of

    this initial pilot revealed a highly significant improvement in performance with

    simulated nodules on the final test day, and a non-significant trend of improvement

    for real nodule cases.

    Conclusions: Taken together, our results demonstrate that marked improvements in nodule

    detection can be achieved by implementing a training regimen with numerous

    realistic examples, and moreover, that trained undergraduates can serve as useful

    model observers for investigating the visual-cognitive bases of nodule detection.

    With continued refinement of our simulation methods and training set of images,

    we anticipate that it should be possible to further boost generalization of these

    training benefits to real nodule test cases. Future developments of this nodule

    localization training paradigm could prove useful as a software tool for enhancing

    the diagnostic training of radiology residents.

  • The Importance of Peripheral Visual

    Processing and Eye Movements in Search

    with 3D Images

    Miguel P. Eckstein, Miguel A. Lago, Craig K. Abbey

    Department of Psychological & Brain Sciences, UC Santa Barbara, Santa Barbara, CA. 93106, USA

    RATIONALE When radiologists use a 3D imaging modality to diagnose a disease they often read the data as a

    stack of 2D slices and scroll through the slices. The foveated nature of the human visual system

    and the typical reading times prevent radiologists from exhaustively exploring all regions of the

    image set with their high-resolution fovea. Thus, radiologists must rely on vision away from the

    fovea (the visual periphery) to process many regions of the images. Here, we investigate how

    target detectability varies with retinal eccentricity and explore eye movement patterns and

    detection accuracy during 3D search.

    METHODS We measured target detectability of various targets (small and large targets) briefly presented at a

    known location (50% probability) in filtered noise and digital breast tomosynthesis phantoms.

    Eye position monitoring allowed us to ensure that observers maintained gaze on a fixation point.

    In a separate study, observers searched for the large and small targets (50 % probability of target

    presence) in 3D volumetric images with the two backgrounds. Observers were given unlimited

    time to scroll and search. We measured search accuracy (true positive rate, false positive rate),

    eye movements and scrolls.

    RESULTS The results show strong dissociations in detectability in the visual periphery across large and

    small targets for both synthetic textures and DBT phantoms. Detectability for the small target

    degraded abruptly in the visual periphery while that of the larger target reduced more

    moderately. For the 3D search, participants were unable to explore significant portions of the

    data with fixational eye movements suggesting that they relied on peripheral processing for their

    decisions. We found that 3D search led to a significant reduction in target detectability of the

    small targets. We found large variability in human performance detecting the small targets in 3D

    search. Individual detectabilities for the small signals in 3D search were related to the observers’

    eye movements: observers’ search accuracies were inversely correlated with the average closest

    distance of the observers’ fovea to the signal. Detectability did not correlate with search times.

    CONCLUSION For 3D imaging modalities, the properties of the human visual periphery and eye movements are

    critical in determining the detectability of searched targets and might play an important role

    in determining individual variability in search accuracy.

  • Foveated Model Observers applied to DBT

    image phantoms

    Miguel A. Lago1, Bruno B. Barufaldi2, Predrag R. Bakic2, Craig K. Abbey1, Susan P. Weinstein2, Brian

    Englander2, Andrew D. Maidment2, Miguel P. Eckstein1

    1Department of Psychological & Brain Sciences, UC Santa Barbara, Santa Barbara, CA, USA

    2Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA

    RATIONALE Digital Breast Tomosynthesis (DBT) is becoming the standard for breast imaging. New 3D imaging

    modalities bring large volumes of data that cannot be exhaustively explored with eye movement fixations.

    Thus, radiologists read the images with visual processing away from the fovea (visual periphery).

    However, accuracy in the visual periphery is degraded relative to foveal processing. Current model

    observers (Channelized Hotelling and Non-Prewhitening Matched Filter with an Eye Filter) only model

    visual processing at the fovea and might not be sufficient to account for human performance with 3D

    imaging modalities. Here, we propose a new Foveated Channelized Hotelling Observer (FCHO) that

    incorporates vision across the entire visual field with reduced spatial detail away from the fovea. We

    compare the FCHO model to traditional non-foveated model observers in their ability to predict human

    performance when searching for simulated microcalcifications and masses in breast phantoms.

    METHODS We designed an experiment consisting of a free search of a simulated signal (microcalcification or mass)

    within the UPENN DBT phantom. We ran 12 radiologists in 28 trials of a 3D DBT (64 slices) search and

    28 trials of a single slice 2D DBT search with 50% signal presence. Additionally, we trained our FCHO

    to search for the simulated signals within the breast phantoms. Our model analyzes the visual field with

    different templates at different distances from the fixation point. It also includes an eye movement model

    that selects the next fixation point and a scrolling model that goes through the slices of the volumetric

    image. We compared human and model accuracy for a single (central) slice of DBT and the complete 3D

    DBT.

    RESULTS Human observer performance shows a significantly lower detectability for the microcalcification in 3D

    search (d' = 1.36 ± 0.33) compared with the 2D search (d' = 3.78 ± 0.54), while masses do not show a significant

    difference (2D: d' = 1.89 ± 0.38; 3D: d' = 1.65 ± 0.40). The FCHO model performance agrees

    with these results and captures the interaction in detectability between the signal type

    (microcalcification vs. mass) and the search task (2D vs. 3D), which we also observed in previous results

    for our FCHO model with images containing correlated Gaussian noise.
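    For readers unfamiliar with the detectability index d', the sketch below shows the standard equal-variance Gaussian yes/no formula, d' = z(hit rate) - z(false alarm rate). The abstract does not state how d' was computed in this study, so this is only an assumed, textbook form; the function name and input rates are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Detectability index under the equal-variance Gaussian model.
    Rates must lie strictly between 0 and 1; inv_cdf raises an error
    at exactly 0 or 1, so extreme rates need a correction beforehand."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates (not data from the study).
example = d_prime(0.8413447, 0.1586553)  # about 2.0
chance = d_prime(0.5, 0.5)               # 0.0 when hits equal false alarms
```

    Larger values indicate that signal-present and signal-absent trials are easier to tell apart, which is the sense in which the 2D microcalcification search above is more detectable than the 3D search.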

    CONCLUSION We presented the first application of the FCHO to more realistic images containing structures, different

    tissues, and backgrounds that are more complex. The FCHO model correctly predicted the dissociation

    in results across signal types while traditional model observers did not. The results motivate the use of

    foveation in model observers in order to assess image quality in 3D DBTs.

  • The Role of Comparison in Categorization

    Learning of Chest X-ray: An Eye

    Movement Study

    Yanju Ren, Ph.D.1; Yuanjie Zheng, Ph.D.2

    1School of Psychology, 2School of Information Science and Engineering, Shandong Normal

    University, Jinan, P. R. China

    Rationale: The chest X-ray is one of the most commonly accessible radiological examinations for

    screening and diagnosis of many lung diseases. How a novice observer learns to classify a

    chest X-ray as normal or abnormal (and, further, as benign or malignant) is an important

    research topic in the domain of medical education. Comparison learning is one of the key

    processes by which people learn and has been broadly found to be very effective in the context

    of, for example, category learning. The present study therefore explores the role of comparison

    in chest X-ray categorization using eye-tracking technology.

    Methods: To this end, two eye movement experiments were conducted. Experiment 1 consisted of three

    phases. In the pretest phase, forty-eight undergraduate students participated in the chest X-ray

    categorization (normal or abnormal) task to obtain the baseline performance. In the second

    phase, half of the participants (24 of 48) were assigned to the comparative learning condition

    and the other half were assigned to the non-comparative learning condition. In the

    test phase, the participants performed the categorization task. Experiment 2 contained

    three similar phases, but only the comparison task was employed and the presentation time of the chest

    X-ray was manipulated.

    Results: In Experiment 1, the participants categorized the chest X-rays at chance level in the pretest

    phase; after comparison learning, the comparative learning group showed remarkable

    improvement in the chest X-ray categorization task in the test phase, reflected in shorter

    reaction times, fewer saccades, shorter fixation durations, and larger saccade amplitudes,

    among other measures. Participants in the long presentation time group achieved better categorization

    performance than those in the short presentation time group.

    Conclusions: The two eye tracking experiments demonstrate that learning style and presentation time of

    chest X-ray have important roles in medical image categorization.

  • Investigating Observer Gaze Patterns on Facial

    Disfigurements from Head and Neck Cancer

    Krista M. Nicklaus, MSE1,2, Enrique Callado3, Joowon Cho, MSE3, Jun Liu, PhD2, Mary Catherine Bordes,

    BS2, Gregory P. Reece, MD2, Summer E. Hanson, MD, PhD2, Jeffery M. Engelmann, PhD4,

    Mia K. Markey, PhD1,5

    1Biomedical Engineering, The University of Texas at Austin, 2Plastic Surgery, The University of Texas MD

    Anderson Cancer Center, 3Electrical Engineering, The University of Texas at Austin, 4Psychiatry and

    Behavioral Medicine, Medical College of Wisconsin, 5Imaging Physics, The University of Texas MD Anderson

    Cancer Center

    Rationale

    Facial disfigurement resulting from head and neck cancer can have devastating effects on psychosocial

    functioning. Body image changes experienced by head and neck cancer patients contribute to high levels of

    depression and anxiety, social isolation, impaired quality of life, and sexual difficulties. Many head and neck

    cancer patients feel discounted or stigmatized, are preoccupied by appearance changes, or avoid social

    situations due to changes in appearance and functioning. Feeling stigmatized can arise from concerns about how

    others will react to one’s appearance as well as how others actually react. Individuals with facial disfigurement

    are highly aware of how others behave toward them (e.g., staring, gaze aversion, unwelcome comments) and

    facial expressions perceived to convey negative emotional reactions (i.e., disgust). Our long-term goal is to

    support head and neck cancer patients in developing realistic expectations about how other people in non-social

    group settings (e.g., customers in a grocery store) will respond to their appearance by presenting selected

    information from a normative database of the responses of lay observers to facial disfigurement resulting from

    head and neck cancer. While several methodologies could be used to study observers’ cognitive, behavioral, and

    emotional responses to facial disfigurement, eye tracking has the potential to help us understand behavioral

    responses such as staring and gaze aversion. This preliminary study examines lay observers’ gaze patterns when

    looking at clinical photographs of head and neck cancer patients.

    Methods

    Eye movements were recorded and tracked with a Tobii TX300 Eye tracker (Tobii Technology Inc., Falls

    Church, VA), with a sampling rate of 300 Hz. 20 lay observers viewed 144 face images for 6 seconds each (4

    images were used for practice). The images are from 35 head and neck cancer patients with varying degrees of

    disfigurement over multiple time points from 1 to 12 months post face reconstruction. Two clinical experts

    determined whether the facial disfigurement was on the left or right side of the face, or undetermined. The

    midline of the face was defined by the line from the central hairline, through the pronasale, to the lowest point

    of the chin. Gaze fixations and saccades were mapped using the EyeMMV toolbox in MATLAB (MathWorks,

    Natick, MA, USA). Fixation location, dwell time, and saccades were investigated in relation to the location of

    disfigurement.

    Results

    The locations of fixations, duration of fixations, and saccades were mapped to each image. There was

    substantial variation in the gaze patterns across observers and stimuli.

    Conclusions

    Eye tracking data has the potential to identify features of disfigured faces that attract attention of lay observers.

    However, substantial inter-observer variability suggests that future work is needed to investigate factors that

    influence lay people’s cognitive, behavioral, and emotional responses to facial disfigurement. Factors to

    consider include quantitative measures of facial disfigurement; the lay person’s body image and affective state;

    and the layperson’s demographic variables.

  • Influence of radiology expertise on the perception of nonmedical images

    Brendan Kelly, Louise A. Rainford, Mark F. McEntee, Eoin C. Kavanagh

    Abstract This study investigates whether participants with differing diagnostic accuracy and visual search behavior during radiologic tasks also differ in nonradiologic tasks. Four clinician groups with different radiologic experience were used: a reference expert group of five consultant radiologists, four radiology registrars, five senior house officers, and six interns. The four clinician groups are known to differ significantly in their performance in identifying pneumothoraces on chest x-rays. Each of the 20 participants was shown 6 nonradiologic images (3 maps and 3 sets of geometric shapes) and was asked to perform search tasks. Eye movements were recorded with a Tobii TX300 (Tobii Technology, Stockholm, Sweden) eye tracker. Four eye-tracking metrics were analyzed. Variables were compared to identify any differences among the groups. All data were compared by using nonparametric methods of analysis. The average number of targets identified in the maps did not change among groups [mean = 5.8 of 6 targets (range 5.6 to 6, p = 0.861)]. None of the four eye-tracking metrics investigated varied with experience in either search task (p > 0.5). Despite clear differences in radiologic experience, these clinician groups showed no difference in nonradiologic search pattern behavior or skill across complex images. This is another viewpoint adding to the evidence that radiologic image interpretation is a learned skill and is task specific.

  • Variations in Lung Nodule Detection and Functional Visual Field of Radiologists

    Geoffrey D. Rubin, MD, MBA, Brian Harrawood, Kingshuk RoyChoudhury, PhD,

    Justus E. Roos, MD, Martin Tall, Sandy Napel, PhD

    Departments of Radiology, Duke University and Stanford University

    Rationale The foveal gaze of radiologists is exposed on average to only 27% of the lung volume, yet 76% of imbedded lung nodules are included in that volume. This suggests that radiologists’ functional visual field (FVF) for lung nodule detection in CT scans extends well beyond the limits of central gaze (within a 5° gaze angle). To better characterize radiologists’ FVF while dynamically scrolling through CT scans, we measured the distance between radiologists’ gaze point and a lung nodule immediately prior to its formal detection at the “moment of recognition”.

    Methods Time-varying gaze traces acquired from 13 radiologists using unconstrained stacked transverse section paging and eye tracking during the interpretation of 40 chest CT scans enriched with 157 simulated 5-mm solid lung nodules were subdivided into periods of nodule visibility (exposures). “Gaze distances” were measured between gaze points and lung nodules to quantify their relationship with nodule exposure duration and detection. The moment of recognition (MoR) was defined as the time point immediately preceding the saccade that converged upon and resulted in the immediate detection of a nodule. MoR distances were measured and characterized as central (foveal) versus peripheral vision based upon a 5° gaze angle threshold.
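    The central-versus-peripheral classification of a gaze distance can be sketched as a simple threshold test. This is an illustrative example only: the function names are hypothetical, the distance is taken in the image plane (ignoring slice position), and the 90-pixel radius is the value quoted in this abstract's figure caption for that particular display setup rather than a general constant.

```python
import math

# Radius of central gaze in pixels, as quoted in the figure caption
# (5 degree gaze angle at the study's viewing geometry).
CENTRAL_RADIUS_PX = 90.0


def gaze_distance(gaze_xy, nodule_xy):
    """In-plane Euclidean distance, in pixels, between gaze point and nodule."""
    return math.dist(gaze_xy, nodule_xy)


def classify_mor(gaze_xy, nodule_xy, radius_px=CENTRAL_RADIUS_PX):
    """Label a moment-of-recognition distance as central or peripheral vision."""
    return "central" if gaze_distance(gaze_xy, nodule_xy) <= radius_px else "peripheral"


# Hypothetical coordinates (not data from the study).
near = classify_mor((100.0, 100.0), (130.0, 140.0))  # 50 pixels away
far = classify_mor((100.0, 100.0), (453.0, 100.0))   # 353 pixels away
```

    With these inputs the first gaze point falls inside the central radius and the second falls in the periphery.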

    Results There were 9,751 nodule exposures, defined as discrete periods of nodule visibility exclusive of those following a detection, that consumed 6% of the total search time. 3,371 of these exposures resulted in the detection of 997 TP nodules (49% detection rate). The duration of exposure to undetected (false negative) nodules was 3.5 times longer than TP nodules (p

  • Figure: (A) Free longitudinal (z) search path from a single reader examining a chest CT scan with three 5-mm lung nodules centered on the orange lines and visible over the faded orange bands. Red regions indicate periods when nodules were displayed on visualized cross-sections but were not detected. The green region indicates the period when one of the three nodules was detected by the reader. The three nodules were visible for 3.1, 3.3, and 3.4% of the search time and were exposed to central gaze for 0.2, 0.0, and 0.1%, respectively, across the 357 second search duration. (B) The 3.5 second region contained within the green zone in (A) is magnified and displayed with the corresponding gaze point samples. Vertical gray zones indicate regions where the target lung nodule is not visible at the extremes of the slab through which the subject scrolls. Selected time points (1-6) are illustrated with corresponding CT section, gaze point (red circle with 50-pixel diameter), target (orange circle), and acceptance of the detection (green circle). The subject is positioned such that central gaze (5° gaze angle) is within 90 pixels of the gaze point. At the beginning of the trace, the nodule is not visible, but the subject scrolls down and the nodule is revealed when the gaze is 353 pixels away (1). The gaze then deviates closer to the x, y position of the nodule (2), but moves back to the posterior lung (3). Following a saccade, the gaze shifts anteriorly to within 164 pixels of the nodule (4). Another saccade ensues bringing the gaze within 50 pixels of the nodule, just as the viewer scrolls beyond the nodule, reverses scroll direction and lands on the nodule (5). After 1 second scrutinizing the nodule, it is accepted (6). 
Based upon the location of the final saccade converging on the nodule, the moment of recognition is classified to occur at the dotted black line and the preceding time period is considered to be search while the subsequent time period is considered to be decision making.

  • Visual search behavior reveals differences in diagnostic accuracy based on experience

    Joe Thomas, BSc1, Bradley Fawver, PhD1, Megan Mills, MD2, William Auffermann, MD, PhD2, Trafton Drew, PhD3, and A. Mark Williams, PhD1

    1University of Utah; Department of Health, Kinesiology & Recreation

    2University of Utah; Department of Radiology and Imaging Sciences

    3University of Utah; Department of Psychology

    Rationale

    A substantial number of medical errors in radiology are attributed to failures of perception or

    failures of decision making. Although it is believed that experience in diagnostic imaging

    naturally leads to the development of expertise, data from other medical fields suggests this may

    not be the case. The purpose of this study was to explore how diagnostic accuracy differs across

    radiology professionals as a function of experience, as well as ascertain the extent to which

    changes in visual search behaviors underlie improved diagnostic outcomes.

    Methods

    Twenty radiologists (5 Attending, 5 Fellows, 10 Residents) dictated their findings on 10

    musculoskeletal cases (negative and abnormal cases included) obtained from a medical database.

    Mobile eye-tracking glasses sampled gaze behavior at 120 Hz, while Likert-scale measures of

    mental effort and confidence were obtained after each case. Key areas of interest (i.e., where the

    abnormality was located) were identified on each abnormal case, and two radiologists coded

    accuracy. Simple linear regressions were utilized to explore relationships between experience

    (i.e., resident, fellow, attending physician), diagnostic outcomes (e.g., trial time, accuracy), and

    attentional processes (e.g., fixation, saccadic behavior).

    Results

    Participants demonstrated an 89% accuracy rate on negative cases and a 67% accuracy rate on

    abnormal cases, so analyses proceeded exclusively on abnormal cases. Attending physicians

    exhibited only marginally improved diagnostic accuracy on abnormal cases (67%) compared to

    individuals in the resident program (61%). Level of experience was associated with reduced trial

    time (p < .001) and increased confidence in the diagnosis (p = .004). More experienced

    individuals demonstrated fewer fixations (p = .001) of shorter duration (p = .003) on the dictation

    screen, fewer fixations on the medical images (p < .001), and fewer fixations on key areas of

    interest (p = .002). Experience was also associated with increased saccadic amplitude (p = .007)

    and decreased peak saccadic velocity (p < .001). After controlling for experience, the total

    number (p = .001), duration (p = .015), and percentage of fixations (p = .004) on key areas of

    interest were associated with improved diagnostic accuracy.

    Conclusion

    As expected, experienced radiologists spent less time diagnosing each case and were more

    confident in their diagnosis. Experience was also associated with more purposeful visual search

    behavior on the images and more efficient use of medical imaging technology. However, while

    time spent viewing information-rich areas of the medical images (i.e., the abnormality) was

    positively associated with diagnostic accuracy, it was negatively associated with experience.

    Findings suggest a physician’s confidence in their diagnosis might be misplaced when cases are

    dictated too quickly or when individuals spend insufficient time extracting relevant information

    from key areas of the visual display.

  • Impact of expertise on reading mammograms: An eye-tracking study

    Lucie Lévêque1,2 (MSc), Hilde Bosmans3 (PhD), Lesley Cockmartin3 (PhD), Hantao Liu2 (PhD)

    1School of Computer Science and Informatics, Cardiff University, United Kingdom

    2Department of Computer Science and Software Engineering, Xi’an Jiaotong-Liverpool University, China

    3Department of Radiology, University Hospitals KU Leuven, Belgium

    Rationale

    Breast cancer screening uses low-dose x-rays to detect cancers early, and thus to allow a more efficient treatment. It is critical to understand how medical professionals perceive and interpret mammograms with a view to reducing errors in screening mammography. Various eye-tracking studies have been undertaken in this area, presenting different experimental designs (e.g., films vs. digital mammograms, public databases vs. selected cases). A prominent topic in the literature is the comparison between experienced and less experienced readers.

    Methods

    An eye-tracking experiment was conducted with several expert radiologists, trainee radiologists, and physicists, who were asked to read 196 medio-lateral oblique (MLO) mammogram views from 98 patients. The cases were free of lesions, but the readers were not informed about this fact. After reading both left and right images of a case, the participants had to answer the following question: “refer or not refer?” by focusing their gaze on one of these options on the screen. The eye movements of the participants were recorded using a non-invasive SMI Red-m eye-tracking system.

    Results

    Gaze information was extracted from the raw eye-tracking data obtained during the experiment, including the number of fixations per stimulus, their coordinates and duration. An analysis of variance (ANOVA) was used to study the similarity between the three expert radiologists in terms of mean fixation duration. Results show no statistically significant difference between the three expert radiologists (i.e., p

  • Fig. 1: Illustration of the mean fixation duration of expert radiologists R1, R2 and R3 (in red), trainee radiologists T1, T2 and T3 (in green), and physicists P1 and P2 (in blue), averaged over all fixations recorded for all test stimuli. Error bars indicate a 95% confidence interval.

    Saliency maps, i.e., topographic representations indicating conspicuousness of scene locations, were created using the fixations obtained from the eye-tracking experiment. Each fixation location gave rise to a greyscale patch simulating the foveal vision of the human system. In a saliency map, salient regions represent where the observers focused their gaze with a higher frequency. It can be noticed on the maps that expert and trainee radiologists’ gaze patterns are concentrated, whereas physicists’ gaze patterns are more distributed over the mammogram.
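
    The construction of a fixation-based saliency map can be sketched as below. This is only an illustration under stated assumptions: the abstract does not give the patch shape or size, so a Gaussian patch with a hypothetical spread `sigma` is assumed, all fixations are weighted equally, and the function name is invented for the example.

```python
import math


def saliency_map(fixations, width, height, sigma=2.0):
    """Sum one Gaussian patch per fixation and scale the peak to 1.

    fixations: iterable of (x, y) pixel coordinates.
    sigma: patch spread in pixels (hypothetical; in practice it would be
           chosen from the viewing geometry so the patch spans the fovea).
    """
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[value / peak for value in row] for row in grid]
    return grid


# Toy example: two neighbouring fixations and one isolated fixation.
toy = saliency_map([(5, 5), (6, 5), (15, 12)], width=20, height=16)
```

    In this toy map the region around the two neighbouring fixations is brighter than the isolated one, which is the sense in which concentrated gaze patterns produce compact salient regions.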

    Conclusions

    An eye-tracking experiment was designed and conducted to study the impact of medical specialties and level of experience on perceptual behaviour while interpreting mammograms. Results showed that physicists have, in general, a higher dwell time than experts, whereas trainees have a lower dwell time. Furthermore, the physicists’ gaze patterns were more dispersed than those of the radiologists, whereas the trainees showed patterns similar to those of the radiologists.

  • The strength of the gist of the abnormal in the unilateral and bilateral mammograms

    Ziba Gandomkar* (a), Ernest U. Ekpo (a), Sarah J. Lewis (a), Karla K. Evans (b), Kriscia Tapia (a), Tong Li (a), Seyedamir Tavakoli Taba (a), Jeremy M. Wolfe (c), Patrick C. Brennan (a)

    (a) Medical Imaging Sciences, Faculty of Health Sciences, University of Sydney, Sydney, NSW, Australia; BreastScreen Reader Assessment Strategy (BREAST), University of Sydney, Sydney, NSW, Australia.

    (b) Department of Psychology, University of York, Heslington, York, UK.

    (c) Visual Attention Lab, Harvard Medical School, Cambridge, MA, USA.

    Rationale Experts can perceive the gist of the abnormal in the negative prior unilateral mammograms of women who were subsequently

    diagnosed with breast cancer. Here, we compared the strength of the gist from unilateral and bilateral mammograms.

    Methods Seventeen radiologists viewed 60 cases in two different experiments (GistUnilateral and GistBilateral). In GistUnilateral, 60

    unilateral craniocaudal mammograms were presented in a randomly generated sequence for a half-second to the

    radiologists, who were asked to provide an abnormality probability for each case on a scale from 0 (confident normal)

    to 100 (confident abnormal). In GistBilateral, we presented bilateral mammograms of the same cases using a similar

    experimental protocol. Readers were randomly assigned to two groups: the first did the unilateral experiment first, while

    the second group did the bilateral experiment first. Four categories of mammograms (15 cases per category) were

    included: 1) Cancer cases, which contained biopsy-proven malignancies; 2) Normal cases, which remained normal for at

    least the next two years; 3) Prior_Vis cases, which contained retrospectively visible non-actionable cancer signs; 4)

    Prior_Invis cases, which did not contain visible cancer signs. Mammograms from the last two groups were from women

    who subsequently developed biopsy-proven malignancies. For each radiologist and each category, the Pearson

    correlation between the unilateral and bilateral gist responses was calculated. In each experiment, three pair-wise

    classifications, i.e. Cancer/Normal, Prior_Vis/Normal, Prior_Invis/Normal were analysed. A paired, two-sided

    Wilcoxon Signed Rank test was used to investigate whether the values of area under receiver operating characteristic

    curves (AUC) were at an above-chance (AUC=0.5) level. The same test was also used to show whether the AUC values

    from two experiments differed significantly for each pair-wise classification. For each radiologist and each case, we also

    calculated the average of the two gist responses recorded in the two experiments and produced GistAVE, i.e.

    ½(GistUnilateral+GistBilateral).
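
    The two per-reader quantities described above, the AUC of a pair-wise classification and the averaged gist response GistAVE, can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the AUC is computed with the standard rank-based (Mann-Whitney) estimator, the ratings are hypothetical, and the Wilcoxon signed-rank comparison across readers is not shown.

```python
def auc(abnormal_scores, normal_scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen abnormal case receives a
    higher rating than a randomly chosen normal case (ties count one half).
    """
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal_scores) * len(normal_scores))


def gist_average(unilateral, bilateral):
    """Per-case GistAVE = 1/2 * (GistUnilateral + GistBilateral)."""
    return [(u + b) / 2.0 for u, b in zip(unilateral, bilateral)]


# Hypothetical 0-100 ratings from one reader (not data from the study).
cancer_uni, cancer_bi = [80, 55, 40], [60, 70, 30]
normal_uni, normal_bi = [20, 50, 45], [35, 25, 65]

auc_uni = auc(cancer_uni, normal_uni)
auc_ave = auc(gist_average(cancer_uni, cancer_bi),
              gist_average(normal_uni, normal_bi))
```

    An AUC of 0.5 corresponds to chance, which is the reference level used in the test described above.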

    Results The average correlation coefficients across the 17 readers for Cancer, Normal, Prior_Vis, Prior_Invis, and all cases were

    0.17 (CI=0.03-0.31), 0.26 (CI=0.09-0.43), 0.30 (CI=0.12-0.49), 0.35 (CI=0.21-0.49), and 0.35 (CI=0.25-0.44),

    respectively. The order of median AUCs in Cancer/Normal and Prior_Vis/Normal classifications from the highest to the

    lowest was GistAVE>GistUnilateral>GistBilateral. All differences except the difference for GistAVE and GistUnilateral in

    Prior_Vis/Normal classification were significant. In Prior_Invis/Normal classification, the order was

    GistAVE>GistBilateral>GistUnilateral. None of the differences in the AUC values for Prior_Invis/Normal classification were

    significant. On average, the AUCs of Cancer/Normal, Prior_Vis/Normal, Prior_Invis/Normal classifications based on

    GistUnilateral respectively dropped by 8%±6%, 10%±8%, and 1%±8% in the bilateral experiment while these AUCs

    increased by 5%±3%, 2%±4%, and 4%±6% after averaging two signals. On average, the AUCs of Cancer/Normal,

    Prior_Vis/Normal, Prior_Invis/Normal classifications based on GistAVE were 82%±4%, 74%±3%, and 67%±5%.

    Conclusions There is a weak association between the gist signals from unilateral and bilateral mammograms. The signal was stronger in

    the unilateral experiment. When the two signals were averaged, the AUCs increased. The improvement could result

    from cancelling out random noise by averaging two values. Further investigation of intra-reader variability, and of

    the AUC when a reader's unilateral gist responses are averaged over multiple experiments, is required.

  • Characterizing Image Features That Allow for Rapid Breast Cancer Detection Even Before Appearance of Visibly Actionable Lesions

    Karla K. Evans1 & Jeremy M. Wolfe2,3

    1Department of Psychology, University of York 2Department of Surgery, Brigham & Women's Hospital

    3Department of Ophthalmology, Harvard Medical School

    Rationale Expert radiologists can detect a “global gist signal” in mammograms allowing them to distinguish normal from abnormal cases at above chance levels even in mammograms acquired before the development of visible, actionable lesions (“priors”). In previous studies with filtered images, we found that the gist signal was strong in the high spatial frequencies, not in the low frequencies. In the present study, we seek to more precisely isolate the spatial frequency information that radiologists are using to make successful gist decisions.

    Methods Radiologists were presented with 120 bilateral mammograms. Half were completely normal and remained normal for at least four subsequent years. The other half were abnormal. These were subdivided equally into three different types of mammograms: subtle cancers, obvious cancers, and mammograms acquired 3 years prior to the mammograms that showed visibly actionable cancer. Radiologists were asked to rate the abnormality of the images on a 0-100 scale after an exposure of 500 msec. We collected ratings on this set from 21 radiologists at different experience levels. They viewed the full set in three different conditions across 3 blocks. The conditions were: original mammograms without manipulation, mammograms maintaining only spatial frequencies above 0.5 cycles per degree of visual angle (cpd), and mammograms maintaining only spatial frequencies above 1 cpd. The order of the blocks was counterbalanced across participants.

    Results Using the normal cases for the estimate of false positives in all cases, we can calculate d’ for each of the three types of abnormal case and for each filter condition. The results are shown in the table for all 21 observers and, in parentheses, for the 16 observers who read more than 1000 cases per year:

               Original      Freq > 0.5 cpd   Freq > 1.0 cpd
    Subtle     .79 (.94)     .61 (.84)        .32 (.29)
    Obvious    .85 (1.12)    .88 (1.10)       .62 (.62)
    Priors     .05 (.18)     .67 (.88)        .05 (.04)
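
    The sensitivity index d’ reported in the table can be illustrated with the standard equal-variance signal detection formula, d’ = z(hit rate) − z(false-alarm rate). The sketch below is an assumption-laden example: the abstract does not state how the 0-100 ratings were converted into hit and false-alarm rates, and the rates used here are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian sensitivity index: d' = z(H) - z(FA).

    Rates must lie strictly between 0 and 1; rates of exactly 0 or 1 need a
    correction before the inverse normal CDF can be applied.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates for illustration only (not taken from the study).
example = d_prime(hit_rate=0.70, false_alarm_rate=0.40)
```

    Because the same normal cases supply the false-alarm rate for every abnormal category, differences in d’ across rows of the table reflect differences in hit rate alone.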

    The most interesting finding is that performance improves for Priors when frequencies below 0.5 cpd are filtered out (F(1,15)=23.01, p

  • The “Gist” in Prostate Volumetric Imaging

    Melissa Treviño, Ph.D.1 and Todd S Horowitz, Ph.D1

    Marcin Czarniecki, M.D.2

    Ismail B Turkbey, M.D.3 and Peter L Choyke, M. D.3

    1Basic Biobehavioral and Psychological Science Branch, National Cancer Institute

    2Medstar Georgetown University Hospital 3Molecular Imaging Branch, National Cancer Institute

    Rationale

    Numerous cognitive psychology studies have demonstrated that we can determine the global

    context of complex real-world scenes (“scene gist”) in a brief glimpse. Similarly, radiologists

    can identify the “gist” of a radiograph (i.e., abnormal vs. normal) better than chance in breast,

    lung, and prostate images presented for half a second. However, this rapid perceptual gist

    processing has only been demonstrated in static two-dimensional images. Standard practice in

    radiology is moving to three-dimensional (3D) “volumetric” modalities. In volumetric imaging,

    such as multiparametric MRI (mpMRI), used in prostate screening, a single case consists of a

    series of image slices through the body that are assembled into a virtual stack. Radiologists can

    acquire a 3D representation of organ structures by scrolling through stacks. Can radiologists

    extract perceptual gist from this more complex imaging modality?

    Methods

    We tested 14 radiologists with prostate mpMRI experience on 56 cases, each comprising a stack

    of 26 T2-weighted prostate mpMRI slices. Lesions (Gleason scores 6-9) were present in 50% of

    cases. In practice, lesions are more prevalent and easier to detect in the peripheral zone (PZ) of

    the prostate than the transition zone (TZ). For lesion present trials, we used a PZ:TZ ratio of 5:2.

    A trial consisted of a single movie of the stack. After each case, participants localized the

    cancerous lesion on a prostate sector map, then indicated whether a cancerous lesion was

    present, and gave a confidence rating. Presentation duration was varied between groups.

    Radiologists were divided into three groups who viewed cases presented at either 48 ms/slice

    (20.8 Hz, n = 5), 96 ms/slice (10.4 Hz, n = 5), or 144 ms/slice (6.9 Hz, n = 4).
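
    The relationship between per-slice duration, frame rate, and total viewing time per case can be checked with a short calculation. This sketch assumes slices are shown back to back with no gaps, which the abstract does not state; the function names are hypothetical.

```python
SLICES_PER_STACK = 26  # stack size given in the Methods above


def frame_rate_hz(ms_per_slice):
    """Convert a per-slice presentation time (ms) to a frame rate in Hz."""
    return 1000.0 / ms_per_slice


def movie_duration_s(ms_per_slice, n_slices=SLICES_PER_STACK):
    """Total time to play one stack, assuming no gaps between slices."""
    return ms_per_slice * n_slices / 1000.0


rates = {ms: round(frame_rate_hz(ms), 1) for ms in (48, 96, 144)}
durations = {ms: movie_duration_s(ms) for ms in (48, 96, 144)}
```

    Under that assumption the three groups saw each case for roughly 1.2, 2.5, and 3.7 seconds, respectively.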

    Results

    Radiologists could detect lesions in both zones above chance, with PZ producing higher d’

    scores (d’ [95% CI]: PZ = 0.73 [0.46 – 1.00]; TZ = 0.50 [0.08 – 0.91]). Detection performance did

    not vary significantly with slice duration, F(2,11) = 0.74, p = 0.50 (d’ [95% CI]: 48 ms = 0.64

    [-0.22 – 1.50]; 96 ms = 0.78 [0.31 – 1.24]; 144 ms = 0.38 [0.06 – 0.71]). Localization accuracy

    (chance ~= 0.08) was 0.40, 0.47, and 0.48, respectively. While the interaction between slice

    duration and zone did not reach significance, F(2,11) = 3.53, p = .07, detection peaked at 48 ms

    for PZ and 96 ms for TZ (see Figure 1).

    Conclusions

    Our data indicate that radiologists do develop gist perception for 3D modalities. As expected,

    detecting peripheral lesions was easier than transition lesions. Surprisingly, slower presentation

    rates did not improve performance. There may be an optimal framerate for processing 3D

    anatomical information, depending on anatomical site and/or lesion conspicuity, but further

    research is needed.

    Figure 1. d’ for lesions located in the PZ and TZ as a function of slice duration (48, 96, and 144 ms). Error bars display 95% confidence intervals. [Plot not reproduced: y-axis d’ from -1 to 2; x-axis slice duration; series PZ and TZ.]

  • Perceptual gist in multiparametric imaging

    Todd S Horowitz, Ph.D.1 and Melissa Treviño, Ph.D.1

    Marcin Czarniecki, M.D.2

    Ismail B Turkbey, M.D.3 and Peter L Choyke, M.D.3

    1Basic Biobehavioral and Psychological Science Branch, National Cancer Institute

    2Medstar Georgetown University Hospital

    3Molecular Imaging Branch, National Cancer Institute

    Rationale

    Humans can extract the “gist” of a visual scene in a fraction of a second, categorizing it as, say, indoor

    or outdoor, open or closed. This information can facilitate recognition of objects and guide future eye

    movements. Recent studies have demonstrated an analogous ability for radiologists to classify briefly

    presented images (e.g., mammograms) as “normal” or “abnormal”. Previously, we have extended that

    finding to prostate multiparametric magnetic resonance imaging (mpMRI). MpMRI combines

    anatomical information from T2-weighted (T2W) sequences and functional sequences such as

    conventional diffusion-weighted imaging (DWI) and the apparent diffusion coefficient (ADC).

    Standard workstation formats present these imaging modalities side-by-side. Our goal was to study the

    nature of mpMRI gist in different modalities. Which modality generates the strongest gist? Are

    anatomical or functional sequences more useful? Do these modalities provide independent gist

    information? Furthermore, we tested the hypothesis that experts performed better because they were

    more likely to fixate lesions during the brief exposure.

    Methods

    Experiment 1: Three groups of five radiologists with prostate mpMRI experience were shown 100

    images from a single modality (T2W, DWI, or ADC). The same cases were used across groups. Lesions

    (Gleason scores 6-9) were present in 50% of the images. Images were taken from the base, mid, or

    apex regions of the prostate. Stimuli were presented for 500 ms, followed by a prostate sector map.

    Participants first localized the lesion on the sector map (whether or not they saw a lesion), indicated

    whether or not a lesion was present, then provided a confidence rating. In Experiment 2, seven novice

    observers with no radiological training and two radiologists with prostate mpMRI experience

    performed the same task on a set of 100 T2W images while a Tobii eye tracker recorded their eye

    movements.

    Results

    Experiment 1: All three groups detected lesions better than chance [d' mean (sd): T2W 0.83 (0.51); DWI

    0.80 (0.29); ADC 1.16 (0.31)]. Partial correlations between modalities, holding Gleason score

    constant, were moderate and significant [r: T2W x DWI .23; T2W x ADC .35; DWI x ADC .37].

    Experiment 2: One novice was excluded due to poor-quality eye tracking data. Radiologists again

    demonstrated above chance lesion detection [d’ mean (sd) 1.30 (0.47)], while novices did not [d’ mean

    (sd) 0.01 (0.40)]. As expected, radiologists were more likely to fixate the lesion on target- (i.e., lesion-)

    present trials [radiologists: 60%; novices: 26%]. However, this advantage was not driving their superior

    lesion detection. For both groups, lesion detection was no more likely when fixating the lesion than

    when failing to fixate (chi-sq, p: radiologists 0.92, .33; novices 0.06, .80).
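
    The fixation-versus-detection comparison above can be illustrated with a chi-square test on a 2x2 table. The abstract does not state which variant of the test was used, so this sketch assumes a Pearson chi-square without continuity correction and 1 degree of freedom; the counts are hypothetical and the function names are invented for the example.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: rows = lesion fixated / not fixated,
# columns = lesion detected / missed (not data from the study).
stat = chi_square_2x2(10, 20, 30, 40)
p = p_value_df1(stat)
```

    Applied to the rounded statistics reported above (0.92 and 0.06), `p_value_df1` gives roughly 0.34 and 0.81, close to the reported .33 and .80, which is consistent with, but does not confirm, a 1-degree-of-freedom test.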

    Conclusions

    These results indicate that the ADC modality generates the strongest gist signal, but both anatomical

    and functional sequences can contribute to mpMRI gist. The moderate correlation across modalities

    suggests redundancy. Importantly, performance was unaffected by whether or not observers fixated the

    lesion. Radiologists make more informed eye movements even in brief glimpses, but eye movements

    do not drive gist perception. Future studies should explore how gist perception drives eye movements

    across imaging modalities.

  • The Sum or Parts: Exploring Radiologist Reliance on Peripheral Vision and Holistic Processing through Gaze-Contingent Viewing

    Grace L. Nicora B.A.¹, Victoria Wilson¹, Dustin Stokes PhD², Jeanine Stefanucci PhD¹, &

    Trafton Drew PhD¹

    Department of Psychology¹ and Department of Philosophy² at the University of Utah

    Rationale

    The holistic processing theory of expertise posits that experts can quickly take in global

    information from their stimuli and then use that information to guide their search for a target.

    This perceptual processing advantage is thought to underlie the superior performance associated

    with expertise. In radiology, this theory posits that when presented with a chest x-ray, expert

    radiologists process the Gestalt (or whole) of the image within as little as 100 milliseconds. This

    initial Gestalt impression is thought to quickly guide attention to regions that deviate from

    normal, thereby enabling fast and accurate performance. Importantly, the holistic processing

    theory predicts that experts rely on their peripheral vision to quickly extract the Gestalt

    impression of a case, and that this ability does not generalize to tasks outside of their expertise.

    Methods

    We tested this theory with radiologists through the use of a gaze-contingent viewing (GCV)

    window. A GCV window allows us to restrict the amount of peripheral information available to

    the viewer. In this design, radiologists were only able to see a circular region (5° of visual angle)

    where they were actively fixating. All other parts of the image were occluded. In order to see

    more of the image, radiologists had to move their eyes to reveal different parts of the image.

    Radiologists searched two types of images: one was a chest x-ray and the other a non-radiographic

    image. To test the reliance on peripheral vision, radiologists were exposed to a

    normal viewing condition and the GCV condition. Holistic processing theory predicts these

    experts will be impaired in the presence of the GCV window to a greater extent for the chest x-ray

    image compared to the control image.
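
    The gaze-contingent window described above can be sketched as a mask that occludes every pixel outside a circle centred on the current gaze point. This is an illustrative sketch only: the window radius in pixels is a hypothetical parameter (the pixel size of a 5° window depends on screen geometry not given here), and a real system would update the mask on every eye-tracker sample.

```python
def gaze_contingent_view(image, gaze_x, gaze_y, radius_px, fill=0):
    """Return a copy of `image` with everything outside the gaze window occluded.

    image: 2D list of pixel values (rows of equal length).
    radius_px: window radius in pixels (hypothetical value chosen by the caller).
    fill: value used for occluded pixels.
    """
    r2 = radius_px ** 2
    return [
        [
            pixel if (x - gaze_x) ** 2 + (y - gaze_y) ** 2 <= r2 else fill
            for x, pixel in enumerate(row)
        ]
        for y, row in enumerate(image)
    ]


# Toy 7x7 image of ones; only pixels near the gaze point stay visible.
toy_image = [[1] * 7 for _ in range(7)]
visible = gaze_contingent_view(toy_image, gaze_x=3, gaze_y=3, radius_px=2)
```

    Calling the function again with a new gaze position reveals a different part of the image, which is what forces observers to move their eyes to see more.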

    Results

    As expected, GCV led to increased viewing time in both tasks. Critically, this cost was larger for

    the radiology task than the control task. We also observed an interaction in saccadic amplitude.

    GCV led to shorter saccades for both conditions, but the effect was much larger when viewing

    chest radiographs.

    Conclusions

    Our results support the holistic processing theory of expertise in radiology. As predicted by the

    holistic processing theory, experts were more impaired in their domain of exper