
    AJR:196, June 2011 W781

    to automate the entire process of extraction of CT dose information
    from CT dose report images using Microsoft Visual Basic for
    Applications (VBA). The tool was able to drive MODI to conduct OCR,
    to parse raw OCR results for text extraction, to make error
    corrections, and to perform a quality check on the retrieved data.
    Multiple CT dose report images could be batch processed, and all
    results could be saved into an Excel spreadsheet (Microsoft) for
    convenient data analysis. The development was summarized and further
    illustrated with a case study, which included a large number of
    patient examinations from a Siemens Healthcare CT scanner.

    Materials and Methods

    Figure 1 shows a flowchart of text recognition of CT dose report
    images and radiation dose information extraction. It has the
    following steps:

    1. Conduct OCR. Based on patient dose report images routed to a hard
    drive, each image was processed by the OCR engine. Results were
    assigned to a string, i.e., a sequence of characters.

    2. Parse. Sample cases were manually reviewed to identify the OCR
    output formats and error patterns in character recognition.
    Locations of a series of keywords were searched in the OCR output
    string. They were used to identify the output format, scanning
    series, and starting positions of the items of interest, such as
    total mAs and total DLP. The starting positions and the specified
    character lengths were used to extract corresponding items. The
    examination date was retrieved using a date format pattern.

    3. Perform error correction. The OCR outputs

    Automated Extraction of Radiation Dose Information From CT Dose Report Images

    Xinhua Li1

    Da Zhang

    Bob Liu

    Li X, Zhang D, Liu B

    1All authors: Department of Radiology, Division of Diagnostic
    Imaging Physics, Massachusetts General Hospital, Harvard Medical
    School, 55 Fruit St, Boston, MA 02114. Address correspondence to
    B. Liu ([email protected]).

    Medical Physics and Informatics Technical Innovation

    WEB

    This is a Web exclusive article.

    AJR 2011; 196:W781–W783

    0361–803X/11/1966–W781

    American Roentgen Ray Society

    Exposure to radiation in CT examinations has become a topic of high
    interest in the past few years [1–3]. To assess the associated
    radiation risks and to evaluate CT examination protocols for
    adopting CT dose reduction strategies, analyses of CT dose data are
    often required for large numbers of patients in many clinical and
    research projects. Quantities of interest include the CT dose index
    (CTDI) for scanning protocol evaluation and dose-length product
    (DLP) for effective dose estimation. Historically, these data have
    not been stored in machine-editable formats but in CT dose report
    images. These studies are labor intensive and error prone because of
    the need for manual review of many images. In this article, we
    report the development of an automated tool for extracting CT
    examination information from the CT dose report images.

    Optical character recognition (OCR) is a technique that is used to
    translate scanned images of text into computer-editable and
    searchable text. Many of the currently available OCR packages do not
    perform well with text in the small fonts usually found in CT dose
    report images, such as those from Siemens Healthcare and GE
    Healthcare. After a trial-and-error process among different OCR
    packages, the OCR engine in the Microsoft Office Document Imaging
    (MODI) library [4], which showed relatively superior performance
    with the small-font text in CT dose report images, was chosen to
    perform text recognition. In this work, we developed a tool

    Keywords: CT dose image, optical character recognition, text extraction

    DOI:10.2214/AJR.10.5718

    Received September 3, 2010; accepted after revision November 9, 2010.

    OBJECTIVE. The purpose of this article is to describe the
    development of an automated tool for retrieving texts from CT dose
    report images.

    CONCLUSION. Optical character recognition was adopted to perform
    text recognition of CT dose report images. The developed tool is
    able to automate the process of analyzing multiple CT examinations,
    including text recognition, parsing, error correction, and exporting
    data to spreadsheets. The results were precise for total dose-length
    product (DLP) and were about 95% accurate for CT dose index and DLP
    of scanned series.


    were not 100% accurate but contained errors. Therefore, corrections
    were necessary for higher accuracy. They were performed on the basis
    of error patterns identified in step 2.

    4. Assign a quality flag (iFlag). The results of the previous step
    were checked for consistency. When total DLP was identified in a CT
    examination, it should be equal to the sum of DLP values of all
    series. Otherwise, iFlag with a value of 1 was assigned to the
    examination. Moreover, the mAs value of a series should not be
    larger than the total mAs identified for the entire examination.
    Otherwise, iFlag with a value of 2 was assigned. These flagged cases
    were a small percentage of the total and could be corrected either
    by manually checking the original images or by additional
    programming effort.

    5. Export. Results of text extraction were exported to Excel.
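The two consistency checks of step 4 can be sketched as follows. This is a minimal Python illustration, not the authors' VBA macro. The `Series` record, the function name `quality_flag`, the rounding tolerance `tol`, and the choice to report only the first failing check are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Series:
    """One scanning series; field names are hypothetical."""
    description: str
    mas: Optional[float] = None  # None when the series reports no mAs (e.g., topogram)
    dlp: Optional[float] = None  # None when the series reports no DLP (e.g., topogram)


def quality_flag(series: List[Series],
                 total_mas: Optional[float],
                 total_dlp: Optional[float],
                 tol: float = 0.5) -> Optional[int]:
    """Return 1 or 2 when a consistency check fails, otherwise None.

    `tol` is an assumed allowance for rounding in the report.
    """
    # Check 1: total DLP, when present, should equal the sum of series DLPs.
    if total_dlp is not None:
        dlp_sum = sum(s.dlp for s in series if s.dlp is not None)
        if abs(dlp_sum - total_dlp) > tol:
            return 1
    # Check 2: no series mAs should exceed the total mAs of the examination.
    if total_mas is not None:
        if any(s.mas is not None and s.mas > total_mas for s in series):
            return 2
    return None
```

An examination that passes both checks gets no flag, which matches the empty iFlag entry described for accurate cases.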

    A Visual Basic macro was written to implement all these steps.
    Pseudocode shown in Appendix 1 illustrates the key implementations
    of the macro.
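The keyword search and fixed-length extraction of the parsing step can be illustrated in Python as a rough analogue of the `InStr` and `Mid` calls. The helper name `extract_after_keyword` is hypothetical. Unlike the pseudocode, this sketch skips past the keyword before taking characters, which is a design choice of the example and not something the article states.

```python
from typing import Optional


def extract_after_keyword(ocr_text: str, keyword: str, length: int,
                          start: int = 0) -> Optional[str]:
    """Return up to `length` characters that follow the first `keyword`.

    The search begins at index `start`; None is returned when the
    keyword is absent, e.g. when total DLP is missing from a report.
    """
    pos = ocr_text.find(keyword, start)
    if pos == -1:
        return None
    begin = pos + len(keyword)
    # Surrounding blanks are dropped so that " 4526 " becomes "4526".
    return ocr_text[begin:begin + length].strip()
```

For a made-up OCR string such as `"Total mAs 4526 Total DLP 846 mGy*cm"`, calling `extract_after_keyword(text, "Total mAs", 6)` would isolate the total mAs field.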

    The development of this automated tool was further detailed in a
    case study with a Siemens Healthcare CT scanner. Figure 2A shows a
    CT dose report image from a Somatom Definition scanner (Siemens
    Healthcare). It contains the examination date, total mAs, and total
    DLP. Each series had the following items: description, series
    number, kV, time per rotation, and collimated slice. Scanning series
    other than topogram also contained mAs, volume CTDI, and DLP.
    Reference mAs might also be present in some series. Descriptions of
    the series included "Topogram," "CaScSeq," "TestBolus," and
    "CorCTA." The OCR output of the CT dose report image is shown in
    Figure 2B, in which errors are underlined. These results were
    moderately accurate for the CT dose report image but contained
    errors with patterns that could be summarized as follows: wrong
    characters, such as "l" being recognized as "I" and "kV" being
    recognized as "IN"; and split texts, such as "100" being recognized
    as "1 00" in the line of TestBolus and "12D" being recognized as
    "1 2D" in the line of CorCTA. From other cases, we found more
    descriptions ("Calcium Score," "DS CorCTA," "DELAY," and so on) and
    additional error patterns, including wrong characters, such as "17"
    being recognized as "i7," "7D" as "ID," and "1D" as "ID." Series
    data were reformatted for about 2.5% of all images. We also noticed
    that total DLP was not present in about 60% of the Siemens
    Healthcare CT dose report images.
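A pattern-based fix for the wrong-character errors listed above might look like the following Python sketch. The lookup tables are illustrative and are not the authors' correction rules. Only "IN" for "kV" and "i7" for "17" come from the article; the other letter-to-digit pairs are assumptions. Tokens such as "ID" are left unchanged because, as the article notes, both "7D" and "1D" can be misread that way, so the token alone does not determine the fix.

```python
import re

# Letters that OCR may substitute for digits (assumed, illustrative table).
DIGIT_LOOKALIKES = str.maketrans({"i": "1", "I": "1", "l": "1", "O": "0", "o": "0"})

# Whole-token replacements; "IN" read in place of "kV" is reported in the article.
TOKEN_FIXES = {"IN": "kV"}

# A token made only of digits, lookalike letters, and dots, with at least one digit.
NUMERIC_LIKE = re.compile(r"[0-9iIlOo.]*[0-9][0-9iIlOo.]*")


def correct_token(token: str) -> str:
    """Apply whole-token fixes first, then digit-lookalike substitution."""
    if token in TOKEN_FIXES:
        return TOKEN_FIXES[token]
    if NUMERIC_LIKE.fullmatch(token):
        return token.translate(DIGIT_LOOKALIKES)
    return token


def correct_line(line: str) -> str:
    """Correct every whitespace-separated token of one OCR output line."""
    return " ".join(correct_token(t) for t in line.split())
```

Restricting the substitution to tokens that already contain a digit keeps series descriptions such as CorCTA untouched.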

    The OCR output format was identified by the location of keywords in
    the string. When series data were not reformatted, the record of a
    series was identified as text between two descriptions or after the
    last description in the OCR output. Series number, kV, and mAs were
    determined sequentially; collimated slice, time per rotation, DLP,
    and volume CTDI were located reversely. In this approach, reference
    mAs was not misidentified as another parameter if it was present in
    a series. For the cases in which OCR output formats were different
    from original images, the parameters of a series were identified
    according to the reformatted patterns of the OCR results. Wrong
    characters were corrected according to error patterns that were
    recognized on the basis of the analysis of sample cases in the
    parsing step. Split texts in a series were identified and then
    corrected on the basis of the ranges of parameters, e.g., 80–140 for
    kV, and the lengths of numeric expressions. These corrections
    greatly improved the accuracy of dose information extraction.
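The range-based repair of split texts can be sketched in Python for the kV field. The 80–140 range is the one quoted above. The function name `merge_split_kv` and the assumption that the caller already knows the token position of kV (after description and series number) are hypothetical.

```python
from typing import List

KV_MIN, KV_MAX = 80, 140  # kV range quoted in the article


def merge_split_kv(tokens: List[str], kv_index: int) -> List[str]:
    """Rejoin a kV value that OCR split in two, e.g. ['1', '00'] -> ['100'].

    The tokens are returned unchanged when the value at `kv_index` is
    already plausible or when joining it with the next token does not
    produce a value inside the kV range.
    """
    def in_range(text: str) -> bool:
        return text.isdigit() and KV_MIN <= int(text) <= KV_MAX

    if kv_index + 1 >= len(tokens) or in_range(tokens[kv_index]):
        return list(tokens)
    joined = tokens[kv_index] + tokens[kv_index + 1]
    if in_range(joined):
        return tokens[:kv_index] + [joined] + tokens[kv_index + 2:]
    return list(tokens)
```

The same idea extends to other parameters by swapping in their expected ranges and numeric lengths, although the article does not list those values.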

    Results

    Figure 3 shows a screen shot of the CT dose information extracted
    from the CT dose report image shown in Figure 2. All important
    parameters of the CT examination were exported to an Excel
    spreadsheet. The entry of iFlag was not filled because the retrieved
    DLP and mAs of this case were accurate. The developed tool processed
    multiple CT dose report images in sequence and saved all results
    into a single spreadsheet. About 1000 CT dose report images from the
    Siemens Healthcare CT scanner were analyzed. The results of total
    DLP had an accuracy of 100%. The values of CTDI and DLP of the
    scanning series were about 95% accurate because of the OCR
    split-text errors. They could be further improved by additional
    effort in case review and programming.

    Discussion

    This article has described a method for developing automated
    extraction of radiation dose information from CT dose record images.
    Among the OCR packages currently available for conducting text
    recognition from images, the OCR engine in the MODI library is
    capable of text recognition from the CT dose report images. The
    library is included in the Microsoft Office suite. The raw OCR
    outputs are moderately accurate but contain errors, such as split
    texts and wrong characters. Sample cases can be manually reviewed to
    identify the OCR output formats and error patterns in character
    recognition. On the basis of this review, robust algorithms can then
    be developed to extract texts and conduct error correction. VBA was
    chosen because it is fully integrated in the Microsoft

    Fig. 1. Flowchart shows radiation dose information extraction from
    CT dose report images. Steps: conduct OCR, parse, make error
    correction, assign quality flag, export. OCR = optical character
    recognition.

    Fig. 2. CT dose report image from Somatom Definition scanner
    (Siemens Healthcare). A and B, Image (A) and optical character
    recognition (OCR)-generated (B) texts for dose report image. Text
    recognition errors are underlined in B. CTDIvol = volume CT dose
    index, DLP = dose-length product, TI = time per rotation, cSL =
    collimated slice.

    Fig. 3. Screen shot shows CT dose information extracted from CT dose
    report image in Figure 2A.


    Office family and is convenient for the purpose of automation. In a
    case study, the developed Visual Basic macro was able to drive the
    MODI to perform OCR, process the OCR outputs for text extraction,
    and perform error correction. Multiple CT dose report images were
    batch processed, and all results were saved into an Excel
    spreadsheet for convenient data analysis. The tool can be easily
    adapted for other CT examinations and for different CT scanners by
    following the approach described in this article. CT dose data
    analysis on a large patient population can therefore be greatly
    eased.

    References

    1. Brenner DJ, Hall EJ. Computed tomography: an increasing source of
    radiation exposure. N Engl J Med 2007; 357:2277–2284

    2. National Council on Radiation Protection and Measurements.
    Ionizing radiation exposure of the population of the United States:
    2006. Bethesda, MD: National Council on Radiation Protection and
    Measurements, 2009

    3. Smith-Bindman R, Lipson J, Marcus R, et al. Radiation dose
    associated with common computed tomography examinations and the
    associated lifetime attributable risk of cancer. Arch Intern Med
    2009; 169:2078–2086

    4. Microsoft Website. Microsoft Office Document Imaging Visual Basic
    Reference. msdn.microsoft.com/en-us/library/aa279424(v=office.11).aspx.
    Accessed August 30, 2010

    APPENDIX 1: Pseudocode of the Microsoft Visual Basic Macro

    ' Loop through all image files
    For Each Image In Folder
        ' Create an object of a specified type
        Set MIDoc = CreateObject("MODI.Document")
        ' Create a new document
        MIDoc.Create Image
        ' Perform optical character recognition on the entire image
        MIDoc.Images(0).OCR
        ' Save the OCR results into a string
        OcrText = MIDoc.Images(0).Layout.Text
        MIDoc.Close
        ' Search the first occurrence of an item by some characters
        ItemLocation = InStr(StartingPositionForSearch, OcrText, ItemKeyword)
        ' Retrieve a specified number of characters in a string
        ItemContent = Mid(OcrText, ItemLocation, ItemLength)
        ' Conduct direct correction should a specified error occur
        ExamDate = Replace(ExamDate, "0ct", "Oct")
        ' Detect additional errors and make corrections if necessary
        ' If statements and/or loop structures for checking errors and/or performing corrections
        ' Write a result to a cell in an Excel spreadsheet
        Worksheets.Range("$B$" & CStr(Line)).Value = ItemContent
        ...
    Next
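The date-pattern retrieval mentioned in the parsing step, together with the direct "0ct" correction shown in the pseudocode, can be sketched in Python. The article does not spell out the date layout; the day-month-year form with a three-letter month (for example "22-Oct-2009") is an assumption, as are the names `DATE_PATTERN` and `find_exam_date`.

```python
import re
from typing import Optional

# Assumed layout: day, three-character month, four-digit year, joined by hyphens.
# The month group also accepts digits so that a misread such as "0ct" still matches.
DATE_PATTERN = re.compile(r"\b(\d{1,2})-([A-Za-z0-9]{3})-(\d{4})\b")


def find_exam_date(ocr_text: str) -> Optional[str]:
    """Return the first date-like substring, with the "0ct" misread repaired."""
    match = DATE_PATTERN.search(ocr_text)
    if match is None:
        return None
    day, month, year = match.groups()
    # Direct correction of a known misread, mirroring Replace(ExamDate, "0ct", "Oct").
    month = month.replace("0ct", "Oct")
    return f"{day}-{month}-{year}"
```

Matching by pattern rather than by fixed position means the date can be found even when neighbouring text shifts in the OCR output.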