AJR:196, June 2011 W781

to automate the entire process for extraction of CT dose information from CT dose report images using Microsoft Visual Basic for Applications (VBA). The tool was able to drive MODI to conduct OCR, to parse raw OCR results for text extraction, to make error corrections, and to perform a quality check on the retrieved data. Multiple CT dose report images could be batch processed, and all results could be saved into an Excel spreadsheet (Microsoft) for convenient data analysis. The development was summarized and further illustrated with a case study, which included a large number of patient examinations from a Siemens Healthcare CT scanner.

Materials and Methods

Figure 1 shows a flowchart of text recognition of CT dose report images and radiation dose information extraction. It has the following steps:

1. Conduct OCR. Based on patient dose report images routed to a hard drive, each image was processed by the OCR engine. Results were assigned to a string, i.e., a sequence of characters.

2. Parse. Sample cases were manually reviewed to identify the OCR output formats and error patterns in character recognition. Locations of a series of keywords were searched in the OCR output string. They were used to identify the output format, scanning series, and starting positions of the items of interest, such as total mAs and total DLP. The starting positions and the specified character lengths were used to extract corresponding items. The examination date was retrieved using a date format pattern.

3. Perform error correction. The OCR outputs

Automated Extraction of Radiation Dose Information From CT Dose Report Images

Xinhua Li1

Da Zhang

Bob Liu

Li X, Zhang D, Liu B

1All authors: Department of Radiology, Division of Diagnostic Imaging Physics, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, MA 02114. Address correspondence to B. Liu ([email protected]).

Medical Physics and Informatics Technical Innovation

WEB

This is a Web exclusive article.

AJR 2011; 196:W781–W783

0361-803X/11/1966-W781

American Roentgen Ray Society

Exposure to radiation in CT examinations has become a topic of high interest in the past few years [1–3]. To assess the associated radiation risks and to evaluate CT examination protocols for adopting CT dose reduction strategies, analyses of CT dose data are often required for large numbers of patients in many clinical and research projects. Quantities of interest include the CT dose index (CTDI) for scanning protocol evaluation and dose-length product (DLP) for effective dose estimation. Historically, these data have not been stored in machine-editable formats but in CT dose report images. These studies are labor intensive and error prone because of the need for manual review of many images. In this article, we report the development of an automated tool for extracting CT examination information from the CT dose report images.

Optical character recognition (OCR) is a technique that is used to translate scanned images of text into computer-editable and searchable text. Many of the currently available OCR packages do not perform well with text in the small fonts usually found in CT dose report images, such as those from Siemens Healthcare and GE Healthcare. After a trial-and-error process among different OCR packages, the OCR engine in the Microsoft Office Document Imaging (MODI) library [4], which showed relatively superior performance with the small-font text in CT dose report images, was chosen to perform text recognition. In this work, we developed a tool

Keywords: CT dose image, optical character recognition, text extraction

DOI:10.2214/AJR.10.5718

Received September 3, 2010; accepted after revision November 9, 2010.

OBJECTIVE. The purpose of this article is to describe the development of an automated tool for retrieving texts from CT dose report images.

CONCLUSION. Optical character recognition was adopted to perform text recognition of CT dose report images. The developed tool is able to automate the process of analyzing multiple CT examinations, including text recognition, parsing, error correction, and exporting data to spreadsheets. The results were precise for total dose-length product (DLP) and were about 95% accurate for CT dose index and DLP of scanned series.

were not 100% accurate but contained errors. Therefore, corrections were necessary for higher accuracy. They were performed on the basis of error patterns identified in step 2.

4. Assign a quality flag (iFlag). The results of the previous step were checked for consistency. When total DLP was identified in a CT examination, it should be equal to a sum of DLP values of all series. Otherwise, iFlag with a value of 1 was assigned to the examination. Moreover, the mAs value of a series should not be larger than the total mAs identified for the entire examination. Otherwise, iFlag with a value of 2 was assigned. These flagged cases were a small percentage of the total and could be corrected either by manually checking the original images or by additional programming effort.

5. Export. Results of text extraction were exported to Excel.
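
The consistency checks in step 4 can be sketched in a few lines. This is a Python illustration, not the authors' VBA macro. The function name `assign_iflag`, the tolerance `tol` for rounding differences, and the choice to report the DLP mismatch first when both checks fail are assumptions that do not come from the article.

```python
from typing import Optional, Sequence


def assign_iflag(series_dlp: Sequence[float],
                 series_mas: Sequence[float],
                 total_dlp: Optional[float],
                 total_mas: Optional[float],
                 tol: float = 0.5) -> Optional[int]:
    """Return 1, 2, or None (no flag) for one examination.

    tol is an assumed allowance for rounding in the dose report.
    """
    # Check 1: total DLP, when present, should equal the sum over all series.
    if total_dlp is not None and abs(sum(series_dlp) - total_dlp) > tol:
        return 1
    # Check 2: no single series should report more mAs than the whole examination.
    if total_mas is not None and any(m > total_mas for m in series_mas):
        return 2
    # Consistent examination: the iFlag entry stays empty.
    return None
```

Passing `None` for `total_dlp` skips the first check, which matches the statement that the DLP comparison applies only when total DLP was identified.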

A Visual Basic macro was written to implement all these steps. Pseudocode shown in Appendix 1 illustrates the key implementations of the macro.

The development of this automated tool was further detailed in a case study with a Siemens Healthcare CT scanner. Figure 2A shows a CT dose report image from a Somatom Definition scanner (Siemens Healthcare). It contains the examination date, total mAs, and total DLP. Each series had the following items: description, series number, kV, time per rotation, and collimated slice. Scanning series other than topogram also contained mAs, volume CTDI, and DLP. Reference mAs might also be present in some series. Descriptions of the series included "topogram," "CaScSeq," "TestBolus," and "CorCTA." The OCR output of the CT dose report image is shown in Figure 2B, in which errors are underlined. These results were moderately accurate for the CT dose report image but contained errors with patterns that could be summarized as follows: wrong characters, such as "l" being recognized as "I" and "kV" being recognized as "IN"; and split texts, such as "100" being recognized as "1 00" in the line of "TestBolus" and "12D" being recognized as "1 2D" in the line of "CorCTA." From other cases, we found more descriptions ("Calcium Score," "DS CorCTA," "DELAY," and so on) and additional error patterns, including wrong characters, such as "17" being recognized as "i7," "7D" as "ID," and "1D" as "ID." Series data were reformatted for about 2.5% of all images. We also noticed that total DLP was not present in about 60% of the Siemens Healthcare CT dose report images.
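
One way to handle the wrong-character pattern in numeric fields is a look-alike substitution that is accepted only when the result is a plain number. The sketch below is a Python illustration under that assumption. The table `LOOKALIKES` and the function `fix_numeric_token` are hypothetical and are not taken from the authors' macro. The sketch deliberately leaves ambiguous tokens such as "ID" unchanged, because the text reports that both "7D" and "1D" were recognized as "ID."

```python
import re

# Hypothetical table of letters that OCR may confuse with digits.
LOOKALIKES = str.maketrans({"I": "1", "l": "1", "i": "1", "O": "0", "o": "0"})

# A plain integer or decimal number, nothing else.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def fix_numeric_token(token: str) -> str:
    """Replace digit look-alikes, but only if the token then reads as a number."""
    candidate = token.translate(LOOKALIKES)
    # Keep the original token when the substitution does not yield a number,
    # so descriptions such as "CorCTA" are never altered.
    return candidate if _NUMBER.fullmatch(candidate) else token
```

Tokens that stay unchanged after this pass would still need the context-based rules described in the text.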

The OCR output format was identified by the location of keywords in the string. When series data were not reformatted, the record of a series was identified as text between two descriptions or after the last description in the OCR output. Series number, kV, and mAs were determined sequentially; collimated slice, time per rotation, DLP, and volume CTDI were located in reverse order. In this approach, reference mAs was not misidentified as another parameter if it was present in a series. For the cases in which OCR output formats were different from original images, the parameters of a series were identified according to the reformatted patterns of the OCR results. Wrong characters were corrected according to error patterns that were recognized on the basis of the analysis of sample cases in the parsing step. Split texts in a series were identified and then corrected on the basis of the ranges of parameters, e.g., 80–140 for kV, and the lengths of numeric expressions. These corrections greatly improved the accuracy of dose information extraction.
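
The range-based repair of split texts can be illustrated with the kV example. The following Python sketch assumes that a series record has already been split into whitespace-separated tokens and that the position of the kV value is known. Both are assumptions of this illustration, as is the function name `merge_split_kv`. Only the 80–140 range comes from the text.

```python
from typing import List

KV_RANGE = (80, 140)  # plausible kV range quoted in the text


def _is_valid_kv(text: str) -> bool:
    """True when text is an ASCII integer inside the kV range."""
    return text.isascii() and text.isdigit() and KV_RANGE[0] <= int(text) <= KV_RANGE[1]


def merge_split_kv(tokens: List[str], idx: int) -> List[str]:
    """Rejoin a kV value that OCR split across tokens idx and idx + 1."""
    if _is_valid_kv(tokens[idx]):
        return list(tokens)  # nothing to repair
    if idx + 1 < len(tokens):
        joined = tokens[idx] + tokens[idx + 1]
        if _is_valid_kv(joined):
            # Replace the two fragments with the rejoined value.
            return tokens[:idx] + [joined] + tokens[idx + 2:]
    return list(tokens)  # leave unresolved cases for manual review
```

For example, `merge_split_kv(["TestBolus", "3", "1", "00", "20"], 2)` rejoins "1" and "00" into "100". A record that cannot be repaired is returned unchanged.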

Results

Figure 3 shows a screen shot of the CT dose information extracted from the CT dose report image shown in Figure 2. All important parameters of the CT examination were exported to an Excel spreadsheet. The entry of iFlag was not filled because the retrieved DLP and mAs of this case were accurate. The developed tool processed multiple CT dose report images in sequence and saved all results into a single spreadsheet. About 1000 CT dose report images from the Siemens Healthcare CT scanner were analyzed. The results of total DLP had an accuracy of 100%. The values of CTDI and DLP of the scanning series were about 95% accurate because of the OCR split-text errors. They could be further improved by additional effort in case review and programming.

Discussion

This article has described a method for developing automated extraction of radiation dose information from CT dose record images. Among the OCR packages currently available for conducting text recognition from images, the OCR engine in the MODI library is capable of text recognition from the CT dose report images. The library is included in the Microsoft Office suite. The raw OCR outputs are moderately accurate but contain errors, such as split texts and wrong characters. Sample cases can be manually reviewed to identify the OCR output formats and error patterns in character recognition. On the basis of this review, robust algorithms can then be developed to extract texts and conduct error correction. VBA was chosen because it is fully integrated in the Microsoft

Fig. 1. Flowchart shows radiation dose information extraction from CT dose report images. Steps in order: Conduct OCR, Parse, Make error correction, Assign quality flag, Export. OCR = optical character recognition.

Fig. 2. CT dose report image from Somatom Definition scanner (Siemens Healthcare). A and B, Image (A) and optical character recognition (OCR)-generated (B) texts for dose report image. Text recognition errors are underlined in B. CTDIvol = volume CT dose index, DLP = dose-length product, TI = time per rotation, cSL = collimated slice.

Fig. 3. Screen shot shows CT dose information extracted from CT dose report image in Figure 2A.

Office family and is convenient for the purpose of automation. In a case study, the developed Visual Basic macro was able to drive the MODI to perform OCR, process the OCR outputs for text extraction, and perform error correction. Multiple CT dose report images were batch processed, and all results were saved into an Excel spreadsheet for convenient data analysis. The tool can be easily adapted for other CT examinations and for different CT scanners by following the approach described in this article. CT dose data analysis on a large patient population can therefore be greatly eased.
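
For readers working outside VBA, the batch export step can be approximated by writing one row per image to a CSV file that Excel can open. This is a minimal Python sketch, not the authors' implementation. The column names in `FIELDS` and the function `export_results` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Hypothetical column layout; the article does not list the spreadsheet columns.
FIELDS = ["image", "exam_date", "total_mas", "total_dlp", "iflag"]


def export_results(rows: Iterable[Dict[str, object]], out: TextIO) -> None:
    """Write one CSV row per processed image, leaving missing items blank."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Missing items (e.g., an absent total DLP) become empty cells.
        writer.writerow({name: row.get(name, "") for name in FIELDS})


# Usage with an in-memory buffer; a real run would open a file with newline="".
buffer = io.StringIO()
export_results([{"image": "a.png", "total_dlp": 350}], buffer)
```

Leaving absent values blank mirrors the observation that total DLP was missing from many reports and that the iFlag entry is empty for consistent examinations.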

References

1. Brenner DJ, Hall EJ. Computed tomography: an increasing source of radiation exposure. N Engl J Med 2007; 357:2277–2284

2. National Council on Radiation Protection and Measurements. Ionizing radiation exposure of the population of the United States: 2006. Bethesda, MD: National Council on Radiation Protection and Measurements, 2009

3. Smith-Bindman R, Lipson J, Marcus R, et al. Radiation dose associated with common computed tomography examinations and the associated lifetime attributable risk of cancer. Arch Intern Med 2009; 169:2078–2086

4. Microsoft Website. Microsoft Office Document Imaging Visual Basic Reference. msdn.microsoft.com/en-us/library/aa279424(v=office.11).aspx. Accessed August 30, 2010

APPENDIX 1: Pseudocode of the Microsoft Visual Basic Macro

' Loop through all image files
For Each Image In a Folder
    ' Create an object of a specified type
    Set MIDoc = CreateObject("MODI.Document")
    ' Create a new document
    MIDoc.Create Image
    ' Perform optical character recognition on the entire image
    MIDoc.Images(0).OCR
    ' Save the OCR results into a string
    OcrText = MIDoc.Images(0).Layout.Text
    MIDoc.Close
    ' Search the first occurrence of an item by some characters
    ItemLocation = InStr(StartingPositionForSearch, OcrText, ItemKeyword)
    ' Retrieve a specified number of characters in a string
    ItemContent = Mid(OcrText, ItemLocation, ItemLength)
    ' Conduct direct correction should a specified error occur
    ExamDate = Replace(ExamDate, "0ct", "Oct")
    ' Detect additional errors and make corrections if necessary
    ' (If statements and/or loop structures for checking errors and/or performing corrections)
    ' Write a result to a cell in an Excel spreadsheet
    WorkSheets.Range("$B$" & CStr(Line)).Value = ItemContent
    ...
Next
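
The string-handling steps of the pseudocode (InStr, Mid, Replace) have close Python equivalents. The sketch below is an illustration only and does not call MODI. The date layout in `DATE_PATTERN` (day, three-letter month, four-digit year, separated by hyphens) is an assumption, as are the function names. Unlike `Mid(OcrText, ItemLocation, ItemLength)`, which starts at the keyword location itself, `extract_item` starts reading after the keyword.

```python
import re
from typing import Optional

# Assumed date layout such as 22-Oct-2009; "0" is allowed in the month
# so that the misread "0ct" still matches before it is corrected.
DATE_PATTERN = re.compile(r"\d{1,2}-[A-Za-z0]{3}-\d{4}")


def extract_item(ocr_text: str, keyword: str, length: int, start: int = 0) -> Optional[str]:
    """Find keyword at or after start and return the next `length` characters."""
    pos = ocr_text.find(keyword, start)  # counterpart of InStr
    if pos < 0:
        return None
    begin = pos + len(keyword)
    return ocr_text[begin:begin + length].strip()  # counterpart of Mid


def extract_exam_date(ocr_text: str) -> Optional[str]:
    """Locate the examination date by pattern and apply the "0ct" correction."""
    match = DATE_PATTERN.search(ocr_text)
    if match is None:
        return None
    return match.group(0).replace("0ct", "Oct")  # counterpart of Replace
```

Both functions return `None` when nothing is found, so the caller can flag the image for review instead of writing an empty value silently.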
