  • 7/30/2019 Example of Title Page(12-13)



    Text extraction from images


    Anchal Agarwal (0906331011)

    Reetika Shukla (0906331076)

    Shiv Kumar (0906331)

    Vimal Kumar (0906331)

    Under the Guidance of


    Submitted to the Department of Electronics & Communication

    in partial fulfillment of the requirements

    for the degree of

    Bachelor of Technology


    Electronics & Communication Engineering

    Gautam Buddh Technical University

    December, 2012





    ACKNOWLEDGEMENT
    ABSTRACT
    LIST OF TABLES
    LIST OF FIGURES
    LIST OF SYMBOLS
    LIST OF ABBREVIATIONS
    STATEMENT OF PROBLEM etc.)
        1.1.
        1.2.
    CHAPTER 2 (OTHER MAIN HEADING)
        3.1.
        3.2.
            3.2.1.
            3.2.2.
        3.3.
    CHAPTER 4 (OTHER MAIN HEADING)
        4.1.
        4.2.
    CHAPTER 5 (CONCLUSIONS)
    APPENDIX A
    APPENDIX B
    REFERENCES





    It gives us a great sense of pleasure to present the report of the B. Tech. project undertaken
    during our B. Tech. final year. We owe a special debt of gratitude to Mr. Diwakar Agarwal,
    Department of Electronics & Communication Engineering, GLA University, Mathura, for his
    constant support and guidance throughout the course of our work. His sincerity, thoroughness
    and perseverance have been a constant source of inspiration for us. It is only through his
    cognizant efforts that our endeavors have seen the light of day.

    We also take the opportunity to acknowledge the contribution of Professor T.N. Sharma,
    Head, Department of Electronics & Communication Engineering, GLA University, Mathura,
    for his full support and assistance during the development of the project.

    We would also like to acknowledge the contribution of all faculty members of the department
    for their kind assistance and cooperation during the development of our project. Last but not
    least, we acknowledge our friends for their contribution to the completion of the project.


    Name: Anchal Agarwal
    Roll No.: 0906331011
    Date:

    Name: Reetika Shukla
    Roll No.: 0906331076
    Date:

    Name: Shiv Kumar
    Roll No.: 0906331
    Date:

    Name: Vimal Kumar
    Roll No.: 0906331
    Date:





    Text extraction in images has been developing rapidly since the 1990s and is an important research
    field in content-based information indexing and retrieval, and in automatic annotation and structuring
    of images. Extracting this information involves detection, localization, tracking, extraction,
    enhancement, and recognition of the text in a given image. However, variations of text due to
    differences in size, style, orientation, and alignment, as well as low image contrast and complex
    backgrounds, make automatic text extraction extremely challenging. A large number of techniques
    have been proposed to address this problem, and the purpose of this paper is to classify and review
    these techniques, discuss the applications and performance evaluation, and identify promising
    directions for future research.

    The amount of pictorial data has been growing enormously with the expansion of the WWW. Given
    such a large number of images, users need an efficient and effective mechanism to retrieve the
    images they require. To solve the image retrieval problem, many techniques have been devised to
    address the requirements of different applications. Problems with the traditional methods of image
    indexing have led to a rise of interest in techniques for retrieving images on the basis of
    automatically derived features such as color, texture and shape, a technology generally referred to
    as Content-Based Image Retrieval (CBIR). After a decade of intensive research, CBIR technology is
    now beginning to move out of the laboratory into the marketplace. However, the technology still
    lacks maturity and is not yet being used on a significant scale.

    List of tables:

    Table 1. Properties of text in images

    List of figures:

    Fig 1. An image: an array or a matrix of pixels arranged in columns and rows.
    Fig 2. Each pixel has a value from 0 (black) to 255 (white). The possible range of the pixel values depends on the colour depth of the image, here 8 bit = 256 tones or greyscales.
    Fig 3. A true-colour image assembled from three greyscale images coloured red, green and blue. Such an image may contain up to 16 million different colours.
    Fig 4. Difference between a coloured image and the corresponding greyscale image
    Fig 5. RGB cube
    Fig 6. CMYK circle
    Fig 7. Text images
    Fig 8. Document images
    Fig 9. Text images
    Fig 10. Flowchart of preprocessing
    Fig 11. Architecture of TIE system
    Fig 12. Stepwise result of text detection
    Fig 13. Result of text extraction


    Extracting text from images is an important problem in many applications such as document
    processing and image indexing. Text embedded in an image or a video frame usually captures
    important media context such as a player's name, title, date, and story introduction. Text
    extraction can therefore provide useful annotations for an image and improve the accuracy of a
    content-based indexing system when searching for desired media content. Moreover, when the audio
    track of a video is analyzed, the recognized text lines can provide extra refinements for
    correcting speech recognition errors. Since the 1990s, with the rapid growth of available
    multimedia documents and the increasing demand for information indexing and retrieval, much
    effort has been devoted to text extraction in images. A large number of approaches, such as
    region-based, edge-based, morphology-based and texture-based methods, have been proposed and
    have already achieved impressive performance. Documents in which text is embedded in complex
    colored backgrounds are increasingly common today, for example in magazines, advertisements and
    web pages. Robust detection of text in these documents is a challenging problem. Text extraction
    has a vast number of applications:

    Text searches in images - Currently, image searches deliver inaccurate results because they do
    not search the image content. Text extraction would enable better searching by extracting the
    content of an image.

    Content-based indexing - For the purpose of archiving and indexing documents, the content of the
    document is required in digital format. Knowledge about the text content of documents can help
    in building an intelligent system that archives and indexes printed documents.

    Reading foreign-language text - One of the common problems faced by a person in a foreign land
    is communication: understanding road signs, signboards, etc. The proposed method aims to
    alleviate such problems by reading the text information from image scenes captured by a camera.

    Archiving documents - Archives of paper documents in offices, or other printed material like
    magazines and newspapers, can be converted electronically for more efficient storage and instant
    delivery to home or office computers.

    Content-based image indexing refers to the process of attaching labels to images based on their
    content. Image content can be divided into two main categories: perceptual content and semantic
    content. Perceptual content includes attributes such as color, intensity, shape, texture, and their
    temporal changes, whereas semantic content means objects, events, and their relations. A number
    of studies on the use of relatively low-level perceptual content for image and video indexing have
    already been reported. Studies on semantic image content in the form of text, faces, vehicles, and
    human actions have also attracted some recent interest. Among these, text within an image is of
    particular interest because (i) it is very useful for describing the contents of an image; (ii) it can
    be extracted more easily than other semantic content; and (iii) it enables applications such as
    keyword-based image search, automatic video logging, and text-based image indexing.


    This paper presents a comprehensive survey of text information extraction (TIE) from images. Page
    layout analysis is similar to text localization in images. However, most page layout analysis
    methods assume the characters to be black with a high contrast on a homogeneous background. In
    practice, text in images can have any color and be superimposed on a complex background. Although
    a few TIE surveys have already been published, they lack details on individual approaches and are
    not clearly organized. We organize the TIE algorithms into several categories according to their
    main idea and discuss their pros and cons.

    The paper also reviews the various sub-stages of TIE and introduces approaches for text detection,
    localization, tracking, extraction, and enhancement. We also point out the ability of the individual
    techniques to deal with color, scene text, compressed images, etc. The important issue of
    performance evaluation is discussed in Section 3, along with sample public test data sets and a
    review of evaluation methods. Section 4 gives an overview of the application domains for TIE in
    image processing and computer vision. The final conclusions are presented in Section 5.

    Chapter 1

    Introduction to Image processing:

    In imaging science, image processing is any form of signal processing for which the input is an image,

    such as a photograph or video frame; the output of image processing may be either an image or a

    set of characteristics or parameters related to the image. Most image-processing techniques involve

    treating the image as a two-dimensional signal and applying standard signal-processing techniques

    to it.

    Image processing usually refers to digital image processing, but optical and analog image processing
    are also possible. This chapter covers general techniques that apply to all of them. The acquisition
    of images (producing the input image in the first place) is referred to as imaging.

    Image Processing


    Modern digital technology has made it possible to manipulate multi-dimensional signals with
    systems that range from simple digital circuits to advanced parallel computers. The goal of this
    manipulation can be divided into three categories:

    * Image Processing: image in -> image out
    * Image Analysis: image in -> measurements out
    * Image Understanding: image in -> high-level description out

    Image processing refers to the processing of a 2D picture by a computer.

    Basic definitions:

    An image defined in the real world is considered to be a function of two real variables, for
    example, a(x,y) with a as the amplitude (e.g. brightness) of the image at the real coordinate
    position (x,y).

    An image may be considered to contain sub-images, sometimes referred to as regions-of-interest,
    ROIs, or simply regions. This concept reflects the fact that images frequently contain collections of
    objects, each of which can be the basis for a region. In a sophisticated image processing system it
    should be possible to apply specific image processing operations to selected regions. Thus one part
    of an image (region) might be processed to suppress motion blur while another part might be
    processed to improve color rendition.

    Sequence of image processing:

    The main requirement for image processing is that the images be available in digitized form, that
    is, as arrays of finite-length binary words. For digitization, the given image is sampled on a
    discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized
    image is processed by a computer. To display a digital image, it is first converted into an analog
    signal, which is scanned onto a display.
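The sampling and quantization steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function `brightness` is a made-up stand-in for a real scene a(x,y), and the names `digitize`, `rows`, `cols` and `bits` are assumptions introduced here, not part of the report.

```python
import math


def brightness(x: float, y: float) -> float:
    """Hypothetical continuous image a(x, y) with amplitude in [0, 1]."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * x) * math.cos(2 * math.pi * y)


def digitize(a, rows: int, cols: int, bits: int = 8):
    """Sample a(x, y) on a rows x cols grid over the unit square and
    quantize each sample to an integer in [0, 2**bits - 1]."""
    levels = 2 ** bits
    image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Sampling: evaluate the function at the centre of each grid cell.
            x = (c + 0.5) / cols
            y = (r + 0.5) / rows
            amplitude = min(max(a(x, y), 0.0), 1.0)
            # Quantization: map [0, 1] onto the available integer levels.
            row.append(min(int(amplitude * levels), levels - 1))
        image.append(row)
    return image


pixels = digitize(brightness, rows=4, cols=4, bits=8)
for row in pixels:
    print(row)
```

Using fewer bits (for example `bits=1`) gives a coarser image with fewer distinct levels, which is the trade-off the quantization step controls.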

    Closely related to image processing are computer graphics and computer vision. In computer
    graphics, images are made manually from physical models of objects, environments, and lighting,
    as in most animated movies, instead of being acquired from natural scenes via imaging devices
    such as cameras. Computer vision, on the other hand, is often considered high-level image
    processing in which a machine, computer, or program attempts to decipher the physical contents
    of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).

    In modern science and technology, images have also gained a much broader scope due to the
    ever-growing importance of scientific visualization (of often large-scale, complex scientific or
    experimental data). Examples include microarray data in genetic research and real-time
    multi-asset portfolio trading in finance.




    1.1 Image Basics

    An image is an array, or a matrix, of square pixels (picture elements) arranged in columns and rows.

    Fig 1. An image: an array or a matrix of pixels arranged in columns and rows.

    In an (8-bit) greyscale image each picture element has an assigned intensity that ranges from 0 to
    255. A greyscale image is what people normally call a black and white image, but the name
    emphasizes that such an image will also include many shades of grey.

    Fig 2. Each pixel has a value from 0 (black) to 255 (white). The possible range of the pixel values depends on the colour depth of the image, here 8 bit = 256 tones or greyscales.

    A normal greyscale image has 8 bit colour depth = 256 greyscales. A true-colour image
    has 24 bit colour depth = 3 x 8 bits = 256 x 256 x 256 colours = ~16 million colours.

    Fig 3. A true-colour image assembled from three greyscale images coloured red, green and blue. Such an image may contain up to 16 million different colours.
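The colour-depth arithmetic above can be checked directly. The helper name `tones` is an assumption introduced for this sketch.

```python
def tones(bits_per_channel: int, channels: int = 1) -> int:
    """Number of distinct values representable with the given colour depth."""
    return 2 ** (bits_per_channel * channels)


grey_levels = tones(8)               # 8-bit greyscale
true_colours = tones(8, channels=3)  # 24-bit true colour: 3 channels x 8 bits

print(grey_levels)   # 256
print(true_colours)  # 16777216, i.e. roughly 16 million
```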





    The picture elements that make up an image, similar to grains in a photograph or dots in a


    Each pixel can represent a number of different shades or colors, depending upon how much

    storage space is allocated for it.


    A) Binary Image
    A binary image is a two-dimensional array of binary pixels. If the value is 0, the pixel is black. If the value is 1, the pixel is white.

    B) Greyscale Image
    A greyscale image is a two-dimensional array of values indicating the brightness at each point. The brightness values are generally stored as a value between 0 (black) and 255 (white). Values in between are different shades of grey.

    C) Color Image
    A color image can be viewed in two equivalent ways. The first is as a two-dimensional array of pixels, just like a greyscale image, but instead of a brightness value, each pixel has a specific color given by an (R,G,B) triple. The alternative view is that the image is composed of three separate 2D arrays of pixels (one for red, one for green, and one for blue), where each element in the three arrays contains the amount of only that layer's color present in the image at that point. Each of these 2D arrays is called a layer.

    Fig 4. Difference between a coloured image and the corresponding greyscale image

    D) Indexed Image
    This is a practical way of representing color images. (In this course we will mostly work with
    greyscale images, but once you have learned how to work with a greyscale image you will also
    know the principle of how to work with color images.) An indexed image stores an image as two
    matrices. The first matrix has the same size as the image and holds one number for each pixel.
    The second matrix is called the color map, and its size may be different from the image. Each
    number in the first matrix is an instruction telling which entry of the color map matrix to use.
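As a small illustration of the indexed and layered views described above, the sketch below expands an indexed image into (R,G,B) triples and then splits it into three layers. The data and the names `colour_map`, `index_matrix`, `indexed_to_rgb` and `split_layers` are made up for this example; note also that it uses 0-based indices, whereas some environments such as MATLAB count from 1.

```python
# Colour map: one (R, G, B) triple per entry (hypothetical example data).
colour_map = [
    (0, 0, 0),        # entry 0: black
    (255, 0, 0),      # entry 1: red
    (255, 255, 255),  # entry 2: white
]

# Index matrix: same size as the image, one colour-map index per pixel.
index_matrix = [
    [0, 1],
    [2, 1],
]


def indexed_to_rgb(indices, cmap):
    """Look up every index in the colour map to get an image of RGB triples."""
    return [[cmap[i] for i in row] for row in indices]


def split_layers(rgb_image):
    """Return the red, green and blue layers as three separate 2D arrays."""
    return tuple(
        [[pixel[channel] for pixel in row] for row in rgb_image]
        for channel in range(3)
    )


rgb = indexed_to_rgb(index_matrix, colour_map)
red, green, blue = split_layers(rgb)
print(rgb)
print(red)
```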

    1.3 Colours




    For science communication, the two main colour spaces are RGB and CMYK.

    A) RGB
    Red, green, and blue are the three basic colors. By combining these three colors
    of light, a wide range of colors can be produced. R, G, and B are specified as relative
    amounts, which describe how much of each color to combine (e.g. [1, 0, 0] is
    pure red, [1, 1, 0] means to combine red and green in equal quantities, etc.).
    These combinations can be represented as a cube.

    Fig 5. RGB cube

    B) CMYK
    Cyan, Magenta, Yellow, and blacK. With these four colors of ink any color can be produced. Since these colors are the exact inverse of the additive color model, the two systems can be interchanged with

    Black is not needed in theory; CMY should cover the entire range of possible colors. However, in practice, it is much better to use a fourth color, black. Some reasons are as follows:

    1) It is cheaper to apply 1 ink (black) than 3 inks (CMY).
    2) The paper gets wet if too much ink is applied, which often happens when C,
    M, and Y are applied. This is inefficient because it adds drying time to the printing process.
    3) Text is often black. Since text requires very fine detail, it should be easy to
    produce this detail in black. If it was produced with CMY, the C, M, and Y print
    heads would have to be very accurately aligned, which is much more difficult than simply using a fourth ink.

    Fig 6. CMYK circle
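The inverse relationship between RGB and CMY(K) can be illustrated with the common naive conversion below. This is an assumption-laden sketch rather than the report's method: it uses the simple device-independent formula (C = 1 - R and so on, with black pulled out as the shared component), whereas real printing workflows rely on device colour profiles. The function names are hypothetical.

```python
def rgb_to_cmyk(r: float, g: float, b: float):
    """Naive conversion of RGB amounts in [0, 1] to CMYK amounts in [0, 1]."""
    c, m, y = 1 - r, 1 - g, 1 - b      # CMY is the inverse of RGB
    k = min(c, m, y)                   # shared darkness goes to the black ink
    if k == 1.0:                       # pure black: avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    return ((c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k)


def cmyk_to_rgb(c: float, m: float, y: float, k: float):
    """Inverse of the naive conversion above."""
    return ((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k))


print(rgb_to_cmyk(1, 0, 0))        # pure red uses only magenta and yellow ink
print(rgb_to_cmyk(0.0, 0.0, 0.0))  # black uses only the black ink
```

The second call shows the practical point made in the list above: black text needs a single ink instead of three aligned ones.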

    1.3.1 Number of colors

    Images start with differing numbers of colors in them. The simplest images may contain
    only two colors, such as black and white, and will need only 1 bit to represent each pixel.
    Many early PC video cards would support only 16 fixed colors. Later cards would display
    256 simultaneously, any of which could be chosen from a pool of 2^24, or 16 million, colors.
    New cards devote 24 bits to each pixel, and are therefore capable of displaying 2^24, or 16
    million, colors without restriction. A few display even more. Since the eye has trouble
    distinguishing between similar colors, 24 bit or 16 million colors is often called TrueColor.
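To make the link between bits per pixel and storage concrete, the sketch below estimates the uncompressed size of an image at several colour depths. The function name and the 640 x 480 example size are assumptions chosen for illustration, and the estimate ignores file headers, palettes and row padding that real formats add.

```python
def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel data size in bytes, rounded up to a whole byte."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


for depth in (1, 4, 8, 24):
    colours = 2 ** depth
    print(depth, "bits/pixel:", colours, "colours,",
          raw_size_bytes(640, 480, depth), "bytes")
```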




    1.4 Image file formats

    Image file formats are standardized means of organizing and storing digital images. Image files
    are composed of digital data in one of these formats that can be rasterized for use on a
    computer display or printer. An image file format may store data in uncompressed,
    compressed, or vector formats. Once rasterized, an image becomes a grid of pixels, each of
    which has a number of bits to designate its color equal to the color depth of the device
    displaying it.

    1.4.1 Major graphic file formats

    Including proprietary types, there are hundreds of image file types. The PNG, JPEG, and GIF
    formats are most often used to display images on the Internet. These graphic formats are
    listed and briefly described below, separated into the two main families of graphics: raster
    and vector.

    In addition to straight image formats, metafile formats are portable formats which can
    include both raster and vector information. Examples are application-independent formats
    such as WMF and EMF. The metafile format is an intermediate format. Most Windows
    applications open metafiles and then save them in their own native format. Page description
    language refers to formats used to describe the layout of a printed page containing text,
    objects and images. Examples are PostScript, PDF and PCL.

    1.4.2 Digital Image File Types Explained

    JPG, GIF, TIFF, PNG, BMP. What are they, and how do you choose? These and many other
    file types are used to encode digital images. The choices are simpler than you might think.

    Part of the reason for the plethora of file types is the need for compression. Image files can be
    quite large, and larger files mean more disk usage and slower downloads. Compression
    is a term used to describe ways of cutting the size of the file. Compression schemes can be
    lossy or lossless.

    Another reason for the many file types is that images differ in the number of colors they
    contain. If an image has few colors, a file type can be designed to exploit this as a way of
    reducing file size.



    1.4.3 Image formats supported by MATLAB

    The following image formats are supported by Matlab:







    1.4.4 Lossy vs. Lossless compression

    You will often hear the terms "lossy" and "lossless" compression. A lossless compression

    algorithm discards no information. It looks for more efficient ways to represent an image,

    while making no compromises in accuracy. In contrast, lossy algorithms accept some

    degradation in the image in order to achieve smaller file size.

    A lossless algorithm might, for example, look for a recurring pattern in the file, and replace

    each occurrence with a short abbreviation, thereby cutting the file size. In contrast, a lossy

    algorithm might store color information at a lower resolution than the image itself, since the

    eye is not very sensitive to changes in color over a small distance.
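The difference can be illustrated with two toy functions. Run-length encoding is used here as one simple example of a lossless scheme that replaces repeated values with a short abbreviation, and a crude subsampling step stands in for the lossy idea of storing colour at a lower resolution. Both are illustrative sketches with made-up names, not the algorithms used by any particular file format.

```python
def rle_encode(pixels):
    """Lossless: collapse runs of equal values into (value, count) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


def subsample(values, factor=2):
    """Lossy: keep every `factor`-th value, then repeat it to fill the gaps."""
    restored = []
    for value in values[::factor]:
        restored.extend([value] * factor)
    return restored[:len(values)]


row = [255] * 6 + [0] * 3 + [255] * 5
encoded = rle_encode(row)
print(encoded)                     # [(255, 6), (0, 3), (255, 5)]
print(rle_decode(encoded) == row)  # True: nothing was lost

chroma = [10, 12, 40, 44, 90]
print(subsample(chroma))           # [10, 10, 40, 40, 90]: detail was discarded
```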


    1.4.5 Raster Image File Types and Formats

    .bmp        Bitmap Image File
    .gif        Graphical Interchange Format File
    .jpg        JPEG Image File
    .png        Portable Network Graphic
    .psd        Adobe Photoshop Document
    .pspimage   PaintShop Pro Image
    .thm        Thumbnail Image File
    .tif        Tagged Image File
    .yuv        YUV Encoded Image File
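File extensions such as those listed above are only a naming convention. Several common raster formats also begin with a fixed byte signature, so a program can guess the format from the first few bytes of a file. The sketch below checks a handful of well-known signatures; the function name `sniff_format` and the choice of formats covered are assumptions made for this example, and the table is far from complete.

```python
# Leading byte signatures of a few common raster formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
]


def sniff_format(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return "unknown"


print(sniff_format(b"GIF89a" + b"\x00" * 10))  # GIF
print(sniff_format(b"plain text"))             # unknown
```

In practice the header would come from something like `open(path, "rb").read(16)`, where `path` is whatever file is being inspected.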






    A) JPEG

    JPEG (Joint Photographic Experts Group) is a compression method; JPEG-compressed
    images are usually stored in the JFIF (JPEG File Interchange Format) file format. JPEG
    compression is (in most cases) lossy compression. The JPEG/JFIF filename extension is JPG
    or JPEG. Nearly every digital camera can save images in the JPEG/JFIF format, which
    supports 8-bit grayscale images and 24-bit color images (8 bits each for red, green, and blue).
    JPEG applies lossy compression to images, which can result in a significant reduction of the
    file size. The amount of compression can be specified, and it affects the visual quality of
    the result. When not too great, the compression does not noticeably detract from the
    image's quality, but JPEG files suffer generational degradation when repeatedly edited and
    saved. (JPEG also provides lossless image storage, but the lossless version is not widely
    supported.)

    B) JPEG 2000

    JPEG 2000 is a compression standard enabling both lossless and lossy storage. The
    compression methods used are different from the ones in standard JFIF/JPEG; they improve
    quality and compression ratios, but also require more computational power to process. JPEG
    2000 also adds features that are missing in JPEG. It is not nearly as common as JPEG, but it
    is used currently in professional movie editing and distribution (some digital cinemas, for
    example, use JPEG 2000 for individual movie frames).


    C) Exif

    The Exif (Exchangeable image file format) format is a file standard similar to the JFIF format
    with TIFF extensions; it is incorporated in the JPEG-writing software used in most cameras.
    Its purpose is to record and to standardize the exchange of images with image metadata
    between digital cameras and editing and viewing software. The metadata are recorded for
    individual images and include such things as camera settings, time and date, shutter speed,
    exposure, image size, compression, name of camera, and color information. When images are
    viewed or edited by image editing software, all of this image information can be displayed.

    The actual Exif metadata as such may be carried within different host formats, e.g. TIFF,
    JFIF (JPEG) or PNG. IFF-META is another example.


    D) TIFF

    The TIFF (Tagged Image File Format) format is a flexible format that normally saves 8 bits
    or 16 bits per color (red, green, blue) for 24-bit and 48-bit totals, respectively, usually using
    either the TIFF or TIF filename extension. TIFF's flexibility can be both an advantage and a
    disadvantage, since a reader that reads every type of TIFF file does not exist. TIFFs can be
    lossy or lossless; some offer relatively good lossless compression for bi-level (black & white)
    images. Some digital cameras can save in TIFF format, using the LZW compression
    algorithm for lossless storage. The TIFF image format is not widely supported by web browsers.
    TIFF remains widely accepted as a photograph file standard in the printing business. TIFF
    can handle device-specific color spaces, such as the CMYK defined by a particular set of
    printing press inks. OCR (Optical Character Recognition) software packages commonly
    generate some (often monochromatic) form of TIFF image for scanned text pages.

    E) RAW

    RAW refers to a family of raw image formats that are options available on some digital
    cameras. These formats usually use lossless or nearly lossless compression, and produce
    file sizes much smaller than the TIFF formats of full-size processed images from the same
    cameras. Although there is a standard raw image format (ISO 12234-2, TIFF/EP), the raw
    formats used by most cameras are not standardized or documented, and differ among camera
    manufacturers.

    6) GIF

    GIF (Graphics Interchange Format) is limited to an 8-bit palette, or 256 colors. This makes
    the GIF format suitable for storing graphics with relatively few colors such as simple
    diagrams, shapes, logos and cartoon style images. The GIF format supports animation and is
    still widely used to provide image animation effects. It also uses a lossless compression that
    is more effective when large areas have a single color, and ineffective for detailed images or



    7) BMP

    The BMP file format (Windows bitmap) handles graphics files within the Microsoft
    Windows OS. Typically, BMP files are uncompressed, hence they are large; the advantage is
    their simplicity and wide acceptance in Windows programs.


    8) PNG

    The PNG (Portable Network Graphics) file format was created as the free, open-source
    successor to GIF. The PNG file format supports 8 bit paletted images (with optional
    transparency for all palette colors) and 24 bit truecolor (16 million colors) or 48 bit truecolor
    with and without alpha channel, while GIF supports only 256 colors and a single transparent
    color. Compared to JPEG, PNG excels when the image has large, uniformly colored areas.
    Thus the lossless PNG format is best suited for pictures still being edited, and the lossy
    formats, like JPEG, are best for the final distribution of photographic images, because in this
    case JPG files are usually smaller than PNG files.

    Some programs do not handle PNG gamma correctly, which can cause the images to be saved
    or displayed darker than they should be.

    9) PPM, PGM, PBM, PNM and PFM

    Netpbm format is a family including the portable pixmap file format (PPM), the portable
    graymap file format (PGM) and the portable bitmap file format (PBM). These are either
    pure ASCII files or raw binary files with an ASCII header that provide very basic
    functionality and serve as a lowest common denominator for converting pixmap, graymap, or
    bitmap files between different platforms. Several applications refer to them collectively as
    PNM format (Portable Any Map). PFM was invented later in order to carry floating-point
    based pixel information (as used in HDR).

    10) PAM

    A late addition to the PNM family is the PAM format (Portable Arbitrary Format).
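Because the plain Netpbm variants are pure ASCII, a small greyscale image can be written out with nothing more than string formatting. The sketch below builds a plain PGM file (magic number `P2`, then width and height, the maximum grey value, and the pixel values). The function name `to_plain_pgm` and the sample pixel data are assumptions made for this example.

```python
def to_plain_pgm(pixels, maxval: int = 255) -> str:
    """Serialize a 2D list of grey values as a plain (ASCII) PGM file."""
    height = len(pixels)
    width = len(pixels[0])
    lines = ["P2", f"{width} {height}", str(maxval)]
    # One text line per image row, values separated by spaces.
    lines += [" ".join(str(value) for value in row) for row in pixels]
    return "\n".join(lines) + "\n"


sample = [
    [0, 128],
    [255, 64],
]
pgm_text = to_plain_pgm(sample)
print(pgm_text)
```

Writing `pgm_text` to a file with a `.pgm` extension should give an image that Netpbm-aware viewers can open.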


    11) WebP

    WebP is a new image format that uses lossy compression. It was designed by Google to
    reduce image file size to speed up web page loading: its principal purpose is to supersede
    JPEG as the primary format for photographs on the web.

    WebP is based on VP8's intra-frame coding and uses a container based on RIFF.

    12)HDR Raster formats

    Most typical raster formats cannot storeHDRdata (32 bit floating point values per pixel

    component), which is why some relatively old or complex formats are still predominant here,

    and worth mentioning separately. Newer alternatives are showing up, though.

    13)RGBE (Radiance HDR)

    The classical representation format for HDR images, originating from Radiance and alsosupported by e.g. Adobe Photoshop.


    As TIFF can represent almost any kind of image data, it also can be used to hold HDR data.

    However, many TIFF readers do not support it.


    IFF-RGFX, the native format of SView5, provides a straightforward IFF-style representation

    of any kind of image data ranging from 1-128 bit (LDR and HDR), including common meta

    data like ICC profiles, XMP, IPTC or EXIF.



    CGM (Computer Graphics Metafile) is a file format for 2D vector graphics, raster graphics,

    and text, and is defined by ISO/IEC 8632. All graphical elements can be specified in a

    textual source file that can be compiled into a binary file or one of two text representations.

    CGM provides a means of graphics data interchange for computer representation of 2D

    graphical information independent from any particular application, system, platform, or

    device. It has been adopted to some extent in the areas of technical illustration and

    professional design, but has largely been superseded by formats such as SVG and DXF.

    17)Gerber Format (RS-274X)

    The RS-274X Extended Gerber Format [3] was developed by Gerber Systems Corp., now Ucamco.

    This is a 2D bi-level image description format. It is the de facto standard format used by

    printed circuit board (PCB) software. It is also widely used in other industries requiring

    high-precision 2D bi-level images.


    SVG (Scalable Vector Graphics) is an open standard created and developed by the World

    Wide Web Consortium to address the need (and attempts of several corporations) for a

    versatile, scriptable and all-purpose vector format for the web and otherwise. The SVG

    format does not have a compression scheme of its own, but due to the textual nature of XML,

    an SVG graphic can be compressed using a program such as gzip. Because of its scripting

    potential, SVG is a key component in web applications: interactive web pages that look and

    act like applications.

    1.5.1 When should we use each?


    This is usually the best quality output from a digital camera. Digital cameras often offer

    around three JPG quality settings plus TIFF. Since JPG always means at least some loss of

    quality, TIFF means better quality. However, the file size is huge compared to even the best

    JPG setting, and the advantages may not be noticeable.

    A more important use of TIFF is as the working storage format as you edit and manipulate digital images. You do not want to go through several load, edit, save cycles with JPG

    storage, as the degradation accumulates with each new save. One or two JPG saves at high

    quality may not be noticeable, but the tenth certainly will be. TIFF is lossless, so there is no

    degradation associated with saving a TIFF file.

    Do NOT use TIFF for web images. They produce big files, and more importantly, most web

    browsers will not display TIFFs.


    This is the format of choice for nearly all photographs on the web. You can achieve excellent

    quality even at rather high compression settings. I also use JPG as the ultimate format for all

    my digital photographs. If I edit a photo, I will use my software's proprietary format until

    finished, and then save the result as a JPG.

    Digital cameras save in a JPG format by default. Switching to TIFF or RAW improves

    quality in principle, but the difference is difficult to see. Shooting in TIFF has two

    disadvantages compared to JPG: fewer photos per memory card, and a longer wait between

    photographs as the image transfers to the card. I rarely shoot in TIFF mode.

    Never use JPG for line art. On images such as these with areas of uniform color with sharp

    edges, JPG does a poor job. These are tasks for which GIF and PNG are well suited. See JPG

    vs. GIF for web images.


    If your image has fewer than 256 colors and contains large areas of uniform color, GIF is

    your choice. The files will be small yet perfect. Here is an example of an image well-suited

    for GIF:

    Do NOT use GIF for photographic images, since it can contain only 256 colors per image.


    PNG is of principal value in two applications:

    1. If you have an image with large areas of exactly uniform color, but contains more than 256

    colors, PNG is your choice. Its strategy is similar to that of GIF, but it supports 16 million

    colors, not just 256.

    2. If you want to display a photograph exactly, without loss, on the web, PNG is your choice.

    Later generation web browsers support PNG, and PNG is the only lossless format that web browsers support.

    PNG is superior to GIF. It produces smaller files and allows more colors. PNG also supports

    partial transparency. Partial transparency can be used for many useful purposes, such as

    fades and antialiasing of text. Unfortunately, Microsoft's Internet Explorer does not properly

    support PNG transparency, so for now web authors must avoid using transparency in PNG


    1.6 Other formats

    When using graphics software such as Photoshop or Paint Shop Pro, working files should be in the proprietary format of the software. Save final results in TIFF, PNG, or JPG.

    Use RAW only for in-camera storage, and copy or convert to TIFF, PNG, or JPG as soon as

    you transfer to your PC. You do not want your image archives to be in a proprietary format.

    Although several graphics programs can now read the RAW format for many digital cameras,

    it is unwise to rely on any proprietary format for long term storage. Will you be able to read a

    RAW file in five years? In twenty? JPG is the format most likely to be readable in 50

    years. Thus, it is appropriate to use RAW to store images in the camera and perhaps for

    temporary lossless storage on your PC, but be sure to create a TIFF, or better still a PNG or

    JPG, for archival storage.

    Chapter 2

    2.1 TIE

    A variety of approaches to text information extraction (TIE) from images have been proposed for

    specific applications including page segmentation, address block location, license plate location, and

    content-based image/video indexing. In spite of extensive studies, it is still not easy to design a

    general-purpose TIE system. This is because there are so many possible sources of variation when

    extracting text from a shaded or textured background, from low-contrast or complex images, or

    from images having variations in font size, style, color, orientation, and alignment. These variations

    make the problem of automatic TIE extremely difficult.

    Fig7. Text images

    Figures 1-4 show some examples of text in images. Page layout analysis usually deals with document

    images (Fig. 1). Readers may refer to papers on document segmentation/analysis [17, 18] for more examples of document images.

    Fig8. Document images

    Although images acquired by scanning book covers, CD covers, or other multi-colored documents

    have similar characteristics to the document images (Fig. 2), they cannot be directly dealt with using

    a conventional document image analysis technique. Accordingly, this survey distinguishes this

    category of images as multi-color document images from other document images. Text in video

    images can be further classified into caption text, which is artificially overlaid on the image, or scene

    text, which exists naturally in the image. Some researchers like to use the term graphics text for

    scene text, and superimposed text or artificial text for caption text.

    Fig9. Caption text

    It is well known that scene text is more difficult to detect and very little work has been done in this

    area. In contrast to caption text, scene text can have any orientation and may be distorted by the

    perspective projection. Text in images can exhibit many variations with respect to the following properties:

    1. Geometry:

    Size: Although the text size can vary a lot, assumptions can be made depending on the application


    Alignment: The characters in the caption text appear in clusters and usually lie horizontally,

    although sometimes they can appear as non-planar texts as a result of special effects. This does not

    apply to scene text, which can have various perspective distortions. Scene text can be aligned in any

    direction and can have geometric distortions.

    Inter-character distance: characters in a text line have a uniform distance between them.

    2. Color: The characters in a text line tend to have the same or similar colors. This property makes it

    possible to use a connected component-based approach for text detection. Most of the research

    reported to date has concentrated on finding text strings of a single color (monochrome).

    However, video images and other complex color documents can contain text strings with more than

    two colors (polychrome) for effective visualization, i.e., different colors within one word.

    3. Motion: The same characters usually exist in consecutive frames in a video with or without

    movement. This property is used in text tracking and enhancement. Caption text usually moves in a

    uniform way: horizontally or vertically. Scene text can have arbitrary motion due to camera or object


    4. Edge: Most caption and scene text are designed to be easily read, thereby resulting in strong

    edges at the boundaries of text and background.

    5. Compression: Many digital images are recorded, transferred, and processed in a compressed

    format. Thus, a faster TIE system can be achieved if one can extract text without decompression.

    Table 1. Properties of text in images

    2.2 Pre Processing

    A scaled image was the input, which was then converted into a grayscale image. This conversion

    formed the first stage of the pre-processing part. It was carried out by weighting the RGB

    color contents (R: 11%, G: 56%, B: 33%) of each pixel of the image and converting them to

    grayscale. The color image was converted to grayscale to make the text appearing in the

    images easier to recognize: after gray scaling, the image was converted to

    a black and white image containing black text with a higher contrast on a white background.
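
    A minimal sketch of this first stage, assuming the image is held as rows of (R, G, B) tuples in the range 0-255. The channel weights are the ones quoted above; the function names and the threshold of 128 are hypothetical choices for illustration, not values taken from the project.

```python
# Channel weights exactly as quoted in the text (R 11%, G 56%, B 33%).
# Note: many libraries use the ITU-R BT.601 weights (0.299, 0.587, 0.114) instead.
WEIGHTS = (0.11, 0.56, 0.33)


def to_grayscale(rgb_image, weights=WEIGHTS):
    """Convert rows of (R, G, B) tuples (0-255) into rows of gray levels (0-255)."""
    wr, wg, wb = weights
    return [[int(round(wr * r + wg * g + wb * b)) for (r, g, b) in row]
            for row in rgb_image]


def to_black_and_white(gray_image, threshold=128):
    """Binarize: 0 (black) for dark pixels, 255 (white) otherwise.

    The default threshold is an assumption; the text does not state one.
    """
    return [[0 if g < threshold else 255 for g in row] for row in gray_image]


image = [[(0, 0, 0), (255, 255, 255)],
         [(200, 30, 30), (20, 20, 20)]]
gray = to_grayscale(image)
bw = to_black_and_white(gray)
```

    A fixed global threshold is the simplest choice; images with uneven lighting would probably need an adaptive threshold instead.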

    The second stage of pre-processing is line removal.

    The third stage of pre-processing is the removal of the discontinuities that were created in the second

    stage of pre-processing.

    In the final stage of pre-processing, the remaining disturbances such as noise are

    eliminated. This was carried out again by scanning each pixel from top left to bottom right and

    taking into consideration each pixel and all its neighbouring pixels. If a pixel under

    consideration was black, and all the neighbouring pixels were white, then that

    pixel was set to white, because all-white neighbouring pixels indicated that the pixel under

    consideration was some unwanted dot.
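
    The noise-removal scan can be sketched as follows, under the reading that a black pixel surrounded only by white pixels is an unwanted dot and is repainted white. The 8-pixel neighbourhood and the function name are assumptions made for illustration.

```python
BLACK, WHITE = 0, 255


def remove_isolated_dots(bw_image):
    """Return a copy in which every black pixel whose neighbours are all white is set to white."""
    height, width = len(bw_image), len(bw_image[0])
    cleaned = [row[:] for row in bw_image]
    for y in range(height):              # top to bottom
        for x in range(width):           # left to right
            if bw_image[y][x] != BLACK:
                continue
            # Collect the (up to 8) neighbours that lie inside the image.
            neighbours = [bw_image[ny][nx]
                          for ny in range(max(0, y - 1), min(height, y + 2))
                          for nx in range(max(0, x - 1), min(width, x + 2))
                          if (ny, nx) != (y, x)]
            if all(value == WHITE for value in neighbours):
                cleaned[y][x] = WHITE    # isolated dot: treat as noise
    return cleaned
```

    The decisions are taken on the original image and written to a copy, so removing one dot never changes the verdict for a neighbouring pixel.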

    Fig10. Flowchart of preprocessing

    2.3 What is Text Information Extraction (TIE)?

    The problem of Text Information Extraction needs to be defined more precisely before proceeding

    further. A TIE system receives an input in the form of a still image or a sequence of images. The

    images can be in gray scale or color, compressed or un-compressed, and the text in the images may

    or may not move. The TIE problem can be divided into the following sub-problems: (i) detection, (ii)

    localization, (iii) tracking, (iv) extraction and enhancement, and (v) recognition (OCR).



    Fig11. Architecture of TIE system

    A)TEXT DETECTION: In the text detection stage, since there was no prior information on whether or

    not the input image contains any text, the existence or non-existence of text in the image must be

    determined. The text detection stage seeks to detect the presence of text in a given image.

    Fig12. Stepwise result of text detection

    However, in the case of video, the number of frames containing text is much smaller than the

    number of frames without text. When a frame containing text was selected from the shots

    obtained by video framing, very low threshold values were needed for scene change detection,

    because the portion occupied by a text region relative to the whole image was usually small.

    This approach is therefore very sensitive to scene change detection. It can still be a

    simple and efficient solution for video indexing applications

    that only need key words from video clips, rather than the entire text.

    B)TEXT LOCALIZATION: The localization stage included localizing the text in the image after

    detection. In other words, the text present in the frame was tracked by identifying boxes or regions

    of similar pixel intensity values and returning them to the next stage for further processing. This

    stage used Region Based Methods for text localization. Region based methods use the properties of

    the color or gray scale in a text region or their differences with the corresponding properties of the

    background. This means that most of the text lines are included in the initial text boxes while at the

    same time some text boxes may include more than one text line as well as noise or non-text regions.

    This noise usually comes from non-text objects that connect to the text lines during the dilation

    process. The low precision comes from detected bounding boxes which do not contain text but

    objects with high vertical edge density. To increase the precision and reject the false alarms, we use a

    method based on horizontal and vertical projections. Firstly, the horizontal edge projection of every

    box is computed. A horizontal projection is defined as the sum of the candidate pixels over each row.
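
    The horizontal projection can be illustrated with a short sketch. It assumes the candidate pixels of one box are given as a 0/1 mask; the function names and the min_count parameter are hypothetical, and splitting a box wherever the projection drops below min_count is only one plausible way to use the profile.

```python
def horizontal_projection(candidate_mask):
    """Sum the candidate (edge) pixels in each row of a 0/1 mask."""
    return [sum(row) for row in candidate_mask]


def split_rows(profile, min_count=1):
    """Return (start, end) row ranges, end exclusive, whose projection is >= min_count."""
    ranges, start = [], None
    for y, count in enumerate(profile):
        if count >= min_count and start is None:
            start = y                      # a run of text-like rows begins
        elif count < min_count and start is not None:
            ranges.append((start, y))      # the run ends just before row y
            start = None
    if start is not None:
        ranges.append((start, len(profile)))
    return ranges


mask = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
profile = horizontal_projection(mask)
lines = split_rows(profile, min_count=2)
```

    The vertical projection is the same computation applied to the columns of the mask.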

    C)TEXT TRACKING: The text tracking stage can serve to verify the text localization results. In addition,

    if text tracking could be performed in a shorter time than text detection and localization, this would

    speed up the overall system. In cases where text is occluded in different frames, text tracking can

    help recover the original image. Text tracking is performed to reduce the processing time for text

    localization and to maintain the integrity of position across adjacent frames. Although the precise

    location of text in an image can be indicated by bounding boxes, the text still needs to be segmented

    from the background to facilitate its recognition. This means that the extracted text image has to be

    converted to a binary image and enhanced before it is fed into an OCR engine.

    D)TEXT EXTRACTION: Text extraction segments these regions and generates binary images for

    recognition. There often exist many disturbances from background in a text region. They share

    similar intensity with the text and consequently the binary image of the text region is unfit for

    recognition directly. After the text was localized, the text segmentation step deals with the

    separation of the text pixels from the background pixels. The output of this step is a binary image

    where black text characters appear on a white background. This stage included extraction of actual

    text regions by dividing pixels with similar properties into contours or segments and discarding the

    redundant portions of frame.

    Fig13. Result of text extraction

    E)TEXT ENHANCEMENT: Text enhancement of the extracted text components is required because

    the text region usually has low resolution and is prone to noise. Thereafter, the extracted text

    images can be transformed into plain text using OCR technology.

    F)TEXT RECOGNITION: The result of recognition was the ratio between the number of correctly

    extracted characters and the total number of characters, which evaluates what percentage of the characters were

    extracted correctly from the background. For each extraction result of characters, if it did not miss

    the main strokes, it was taken as a correct character. The extraction results were then sent to an OCR

    engine directly. A commercial OCR engine was utilized for recognition. Another method was

    proposed for text extraction from a colored image with complex background in which the main idea

    was to first identify potential text line segments from horizontal scan lines. Text line segments were

    then expanded or merged with text line segments from adjacent scan lines to form text blocks. False

    text blocks were filtered based on their geometrical properties. The boundaries of the text blocks

    were then adjusted so that text pixels lying outside the initial text region were included. Text pixels

    within text blocks were then detected by using bi-color clustering and connected components
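
    The recognition result described above is a simple ratio, which can be sketched as follows. The function name is hypothetical, and each character is assumed to have already been judged correct (main strokes present) or incorrect.

```python
def extraction_rate(correct_flags):
    """Ratio of correctly extracted characters to the total number of characters.

    correct_flags holds one boolean per character: True if its main strokes
    survived extraction, False otherwise.
    """
    if not correct_flags:
        raise ValueError("no characters to evaluate")
    return sum(1 for ok in correct_flags if ok) / len(correct_flags)


rate = extraction_rate([True, True, False, True])
```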



    Text extraction in images includes five stages, among which text detection and text

    localization are closely related and more challenging stages which attract the attention of

    most researchers. The goal of the two stages is to generate accurate bounding boxes of all text

    objects in images and video frames and provide a unique identity to each text. In this section, the recent techniques focused on text detection and localization are reviewed and then the results are



    Region-based methods use the properties of the color or gray-scale in a text region or their

    differences with the corresponding properties of the background. This method uses a bottom-up

    approach by grouping small components into successively larger components until all regions are

    identified in the image. A geometrical analysis is needed to merge the text components using the

    spatial arrangement of the components so as to filter out non-text components and mark the

    boundaries of the text regions. Leon [37] presented a method for caption text detection. It is included in

    a generic indexing system dealing with other semantic concepts which are to be automatically

    detected. To have a coherent detection system, the various object detection algorithms use a

    common image description. The author proposed a hierarchical region-based image model as the

    image description and introduced the algorithm for text detection.

    This algorithm is divided into three phases:

    1. Text candidate spotting: an attempt to separate text from background is done.

    2. Text characteristics verification: where text candidate regions are grouped to discard those

    regions wrongly selected.

    3. Consistency analysis for output: where regions representing text are modified to obtain a more

    useful character representation as input for an OCR.

    This technique takes advantage of texture and

    geometric features to detect the caption text. Texture features are estimated using wavelet

    analysis and mainly applied for text candidate spotting. In turn, text characteristics verification is

    basically carried out relying on geometric features, which are estimated exploiting the region-based

    image model. Analysis of the region hierarchy provides the final caption text objects. The final

    step of consistency analysis for output is performed by a binarization algorithm that robustly

    estimates the thresholds on the caption text area of support.
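
    The bottom-up grouping used by region-based methods can be illustrated with a generic connected-component sketch. This is not the hierarchical region model of [37]; it only assumes a 0/1 mask of candidate text pixels, labels 4-connected components, and merges neighbouring boxes on the same rows. The function names and the max_gap parameter are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right), inclusive, of 4-connected 1-regions."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge_horizontal(boxes, max_gap=1):
    """Greedily merge boxes that share rows and are at most max_gap pixels apart."""
    merged = []
    for top, left, bottom, right in sorted(boxes, key=lambda b: b[1]):
        for i, (mt, ml, mb, mr) in enumerate(merged):
            rows_overlap = top <= mb and mt <= bottom
            if rows_overlap and left - mr - 1 <= max_gap:
                merged[i] = (min(mt, top), ml, max(mb, bottom), max(mr, right))
                break
        else:
            merged.append((top, left, bottom, right))
    return merged


mask = [[1, 1, 0, 1, 0, 0, 0, 1],
        [1, 0, 0, 1, 0, 0, 0, 1]]
boxes = connected_components(mask)
text_boxes = merge_horizontal(boxes, max_gap=1)
```

    A real system would follow the merge with geometric checks (aspect ratio, size, alignment) to discard non-text components.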

    Edges are a reliable feature of text regardless of color/intensity, layout, orientations, etc. Edge

    strength, density and the orientation variance are three distinguishing characteristics of text

    embedded in images, which can be used as main features for detecting text. The edge-based

    text extraction algorithm is a general-purpose method, which can quickly and effectively localize

    and extract the text from both document and indoor/outdoor images. Among the several textual

    properties in an image, edge-based methods focus on the high contrast between the text and the

    background. The edges of the text boundary are identified and merged, and then several heuristics

    are used to filter out the non-text regions. Usually, an edge filter (e.g., a Canny operator) is used for

    the edge detection, and a smoothing operation or a morphological operator is used for the merging
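
    Edge strength and edge density can be sketched in a few lines. The text names the Canny operator as a typical edge filter; the sketch below substitutes the simpler 3x3 Sobel operator so that it stays self-contained, and the threshold value is an arbitrary assumption.

```python
def sobel_magnitude(gray):
    """Approximate edge strength |Gx| + |Gy| using 3x3 Sobel kernels.

    Border pixels are left at 0 because the kernel does not fit there.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def edge_density(magnitude, threshold):
    """Fraction of pixels whose edge strength exceeds the threshold."""
    total = sum(len(row) for row in magnitude)
    strong = sum(1 for row in magnitude for value in row if value > threshold)
    return strong / total


# A vertical step edge (dark left, bright right) versus a flat region.
step = [[0, 0, 255, 255, 255] for _ in range(5)]
flat = [[10] * 5 for _ in range(5)]
step_density = edge_density(sobel_magnitude(step), threshold=100)
flat_density = edge_density(sobel_magnitude(flat), threshold=100)
```

    Regions whose edge density is high are kept as text candidates; flat background regions score zero and are filtered out.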



    Mathematical morphology is a topological and geometrical based approach for image analysis.

    It provides powerful tools for extracting geometrical structures and representing shapes in

    many applications. Morphological feature extraction techniques have been efficiently applied

    to character recognition and document analysis. It is used to extract important text contrast features

    from the processed images. The feature is invariant against various geometrical image changes like

    translation, rotation, and scaling. Even after the lighting condition or text color is changed, the

    feature still can be maintained. This method works robustly under different image alterations. A

    morphology-based text line extraction algorithm has been proposed for extracting text regions from cluttered images.

    First of all, the method defines a novel set of morphological operations for extracting important

    contrast regions as possible text line candidates. In order to detect skewed text lines, a moment-based

    method is then used for estimating their orientation. According to the orientation, an x-projection

    technique can be applied to extract various text geometries from the text-analogue

    segments for text verification. However, due to noise, a text line region is often fragmented into

    different pieces of segments. Therefore, after the projection, a novel recovery algorithm is then

    proposed for recovering a complete text line from its pieces of segments. After that, a verification scheme is then proposed for verifying all extracted potential text lines according to their text geometries. In

    order to analyze the performance of this approach, an image database including 100 images was used

    for testing. These images have various appearance changes such as contrast

    changes, complex backgrounds, lightings, different fonts, and sizes. Figure 6 shows the results of text

    line detection in different images with different alterations.
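
    The specific morphological operations of the method above are not given in the text, so the sketch below only illustrates the standard building blocks (dilation, erosion and closing) on a 0/1 mask. The horizontal structuring element and the function names are assumptions; a horizontal element is chosen because it joins neighbouring characters into a text-line candidate.

```python
def dilate(mask, half_width=1):
    """Binary dilation with a horizontal 1 x (2*half_width + 1) structuring element."""
    width = len(mask[0])
    return [[1 if any(row[max(0, x - half_width):min(width, x + half_width + 1)]) else 0
             for x in range(width)]
            for row in mask]


def erode(mask, half_width=1):
    """Binary erosion with the same element; pixels outside the image count as 0."""
    width = len(mask[0])
    result = []
    for row in mask:
        new_row = []
        for x in range(width):
            lo, hi = x - half_width, x + half_width
            inside = lo >= 0 and hi < width
            new_row.append(1 if inside and all(row[lo:hi + 1]) else 0)
        result.append(new_row)
    return result


def close_gaps(mask, half_width=1):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask, half_width), half_width)


# Two strokes one pixel apart are bridged; the distant stroke stays separate.
strokes = [[0, 1, 0, 1, 0, 0, 0, 1, 0]]
closed = close_gaps(strokes, half_width=1)
```

    Increasing half_width bridges wider gaps, at the cost of also attaching nearby non-text objects to the text line.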


    Texture-based methods use the observation that text in images has distinct textural properties that

    distinguish it from the background. The techniques based on Gabor filters, Wavelet, FFT,

    spatial variance, etc. can be used to detect the textural properties of a text region in an image.

    Chu Duc [44] presented a novel texture descriptor based on line-segment features for text detection

    in images and video sequences, which is applied to build a robust car license plate localization system.

    Unlike most of the existing approaches which use low level features (color, edge) for text / non-text

    discrimination, the aim is to exploit more accurate perceptual information. A scale and rotation

    invariant texture descriptor which describes the directionality, regularity, similarity, alignment and

    connectivity of a group of segments is proposed. An improved algorithm for feature extraction based

    on the local connective Hough transform has also been investigated.
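
    Of the texture measures listed above, spatial variance is the simplest to show. The sketch below is a generic illustration, not the line-segment descriptor of [44]: it computes the variance of the gray levels in a small window, which is high over text-like texture and zero over a flat background. The function name and the window radius are assumptions.

```python
def local_variance(gray, y, x, radius=1):
    """Variance of gray levels in the (2*radius + 1)^2 window centred on (y, x).

    The window is clipped at the image border.
    """
    height, width = len(gray), len(gray[0])
    window = [gray[ny][nx]
              for ny in range(max(0, y - radius), min(height, y + radius + 1))
              for nx in range(max(0, x - radius), min(width, x + radius + 1))]
    mean = sum(window) / len(window)
    return sum((value - mean) ** 2 for value in window) / len(window)


flat = [[7, 7, 7], [7, 7, 7], [7, 7, 7]]
textured = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]
flat_score = local_variance(flat, 1, 1)
textured_score = local_variance(textured, 1, 1)
```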


    There are numerous applications of a text information extraction system, including document

    analysis, vehicle license plate extraction, technical paper analysis, and object-oriented data

    compression. In the following, we briefly describe some of these applications.

    Wearable or portable computers: with the rapid development of computer hardware technology,

    wearable computers are now a reality. A TIE system involving a hand-held device and camera was

    presented as an application of a wearable vision system. Watanabe's [74] translation camera can

    detect text in a scene image and translate Japanese text into English after performing character

    recognition. Haritaoglu also demonstrated his TIE system on a hand-held device.

    Content-based video coding or document coding: The MPEG-4 standard supports object-based

    encoding. When text regions are segmented from other regions in an image, this can provide higher compression rates and better image quality. Feng et al. [76] and Cheng et al. [77] apply adaptive

    dithering after segmenting a document into several different classes. As a result, they can achieve a

    higher quality rendering of documents containing text, pictures, and graphics.

    License/container plate recognition: There has already been a lot of work done on vehicle license

    plate and container plate recognition. Although container and vehicle license plates share many

    characteristics with scene text, many assumptions have been made regarding the image acquisition

    process (camera and vehicle position and direction,

    illumination, character types, and color) and geometric attributes of the text. Cui and Huang [9] model the extraction of characters in license plates using a Markov random field. Meanwhile, Park et

    al. [44] use a learning-based approach for license plate extraction, which is similar to a texture-based

    text detection method [47, 49]. Kim et al. [88] use gradient information to extract license plates. Lee

    and Kankanhalli [34] apply a connected component-based method for cargo container verification.

    Text-based image indexing: This involves automatic text-based video structuring methods using

    caption data [11, 78].

    Texts in WWW images: The extraction of text from WWW images can provide relevant information

    on the Internet. Zhou and Lopresti use a CC-based method after color quantization.

    Video content analysis: Extracted text regions or the output of character recognition can be useful

    in genre recognition. The size, position, frequency, text alignment, and OCR-ed results can all be

    used for this.

    Industrial automation: Part identification can be accomplished by using the text information on

    each part.


    Text extraction in images, as an important research branch of content-based information

    retrieval and text-based image indexing, continues to be a topic of much interest to researchers. A

    large number of newly proposed approaches in the literature have contributed to an impressive

    progress of text extraction techniques. Although many researchers have already investigated text

    localization, text detection and tracking for images is required for utilization in real applications (e.g.,

    mobile handheld devices with a camera and real-time indexing systems). A text-image-analysis is

    needed to enable a text information extraction system to be used for any type of image, including

    both scanned document images and real scene images through a video camera. Despite the many

    difficulties in using TIE systems in real world applications, the importance and usefulness of this

    field continues to attract much attention.

    TECHNOLOGIES Vol No. 10, Issue No. 2, 309 313

    2.Text Information Extraction in Images and Video: A Survey Keechul Jung, Kwang In Kim, Anil K.


    3.In: Stilla U, Rottensteiner F, Paparoditis N (Eds) CMRT09. IAPRS, Vol. XXXVIII, Part 3/W4 --- Paris,

    France, 3-4 September, 2009

    4.Character recognition overview

    5.Journal of Theoretical and Applied Information Technology 31st January 2012. Vol. 35 No.2

    techniques and challenges of automatic text extraction in complex images : a survey
