
Capturing Business Value from Inbound Documents

with OCR and Dynamic Document Capture

White Paper

www.esker.com

Esker DeliveryWare

Table of Contents

Introduction

Document Capture Basics

Document Structure

Optical Character Recognition

Dynamic Document Capture

How Dynamic Document Capture Works

The Esker DeliveryWare Solution

About Esker


Doing business is about exchange: of goods and services, and of the information that supports it all. This exchange is facilitated by documents entering the enterprise and going out to customers, suppliers and other stakeholders.

To optimize this exchange and achieve process improvement goals, businesses today need to ask these questions:

§ How do we cost-effectively insert incoming documents into our business processes so that they get to the right places as efficiently as possible?

§ How can we remove human intervention and manual paper handling from document processes?

§ How can we improve the efficiency of our back-office operations and reduce human errors?

§ What capabilities do we need?

As companies seek to reduce the use of paper, document capture is only part of the answer. The issue is much broader, raising the larger question of how to manage business processes that rely on information received in paper form as well as by fax or email. In answering that question, what matters most is what you want to do with the information.

No matter what channel a document comes through and what type of document it is, the challenge is to remove the inefficiency and risk of errors that come with manual handling. Once the information from incoming business documents has been captured, it needs to be analyzed, understood and managed. This means putting the information through a workflow for validation and eventually routing it to a business application and/or a storage repository. The more you can automate this process from end to end, the more value you can extract from inbound business documents.

Optical Character Recognition (OCR) technology has provided a functional means of making business documents machine-readable. OCR is most practical for structured documents, which have formats and layouts that do not vary. But with OCR, human intervention may still be required to assure the quality of the captured data.

Today, Dynamic Document Capture technology offers additional intelligence for automated processing of semi-structured documents like sales orders and invoices, which contain consistent information in varying layouts and formats. Dynamic Document Capture technology improves recognition and accuracy, helping organizations eliminate traditional issues with managing different types of inbound business documents.

The ultimate purpose of this white paper is to help CEOs and CIOs understand the business value of the operational efficiency gains, cost savings and error reduction that result from document process automation. In reading this paper, it is crucial to keep in mind what you want to do with the information in the documents that come into your company and launch business processes every day.

Introduction

What is it?

Documents enter an organization in various formats and via different channels. Document capture is the ability to automatically import these different types of documents, in their different formats, into a document management system.

This inbound document automation has its beginnings in inbound fax routing functionality. With the emergence of OCR technology, organizations gained the ability to use the content of inbound faxes by converting images to a “readable” format. Rules-based automation technology enabled routing and archiving of inbound documents arriving by fax. But recognizing and using information contained in inbound faxes to process documents such as sales orders was cost-prohibitive, as it was necessary to define a business rule for each type of sales order.

Document sources

Business cycles that have a direct impact on a company’s performance involve processes that begin and end with a document. For example, the order-to-cash cycle starts with receiving a sales order and finishes with sending an invoice. Sales order documents reach a company via fax, email, mail (paper documents) or even by phone.

In the case of phone order processing, customer service staff enter orders directly into the customer order system, often after filling out a form. But if the document arrives by fax, email or mail, the customer service representative must manually enter the sales order information into the appropriate system. The process of managing such documents is inherently inefficient and labor-intensive, and carries considerable risk of errors and/or lost documents.

The link between documents and business process efficiency is here to stay, because the volume of documents that companies receive every day is constantly increasing.

The types of documents a business typically receives fall into three general categories:

§ Structured documents

§ Semi-structured documents

§ Unstructured documents

To put things into perspective, today 20% of incoming documents are structured, while 80% are semi-structured or unstructured.

Document Capture Basics

[Figure: Read, Convert, Analyze. Documents such as purchase orders, invoices, reminders, account statements and payments arrive by fax, email or scanner. The OCR engine reads the image, and the captured text and barcodes are made available to Esker DeliveryWare for content analysis.]

Structured, semi-structured and unstructured documents

Business documents are sometimes categorized as fixed content documents and dynamic content documents. Fixed content documents are in their final form: contracts, invoices, statements, reports, technical documentation and even email. Dynamic content documents are created as objects that will undergo changes and modifications over time. Eventually, many dynamic objects become fixed, such as a contract that has been negotiated, modified and eventually finalized. This paper focuses on fixed content documents.

Structured documents

Structured documents always have the same layout and an unchanging number of fields at fixed positions. For these types of documents, the goal is usually to extract data from a form and not necessarily to keep the form. This data is then migrated into a database for ERP, order entry, records, etc. How the data is captured depends on the kind of information involved. Such documents (health insurance claim forms, for example) are typically handled using OCR, ICR, OMR or barcode recognition. In this case, the best approach is to create a specific business rule for processing each document type.

Semi-structured documents

Semi-structured documents have an unlimited number of variations, which makes the structured document approach inefficient. With semi-structured documents, data must be extracted from the document, no matter what its layout is, for entry into an ERP application or other business system. Information can be automatically read and extracted from semi-structured documents such as sales orders, purchase orders and invoices by defining one generic rule for each type of semi-structured document. This is possible because these documents always contain the same types of data, introduced by keywords.

Unstructured documents

With unstructured documents, the document itself is what’s most important. You do not need to capture specific information from the page; you only need an image of the document so you can route and/or index it. Routing and indexing of such documents can be automated.

Document Structure

[Figure: The three document categories]

Structured: constant format and layout (benefits forms, order forms, questionnaires, tax forms, claim forms, application forms)

Semi-structured: common information, but with varying layout and data structures (sales orders, delivery notes, invoices)

Unstructured: unknown layout and unstructured content (letters, emails, contracts, etc.)

OCR functionality

Optical Character Recognition (OCR) technology gives scanning and imaging systems the ability to turn images of machine-printed characters into machine-readable characters. Simply put, it translates the images into a form that a computer can manipulate.

OCR automatically recognizes characters in image files such as faxes (TIFF format) and PDFs attached to email. For example, suppose you receive sales orders by fax and want to archive them. You can read the data on the inbound faxes and archive them according to their date, their source or their amount.

Because OCR is not always 100% accurate, adding some type of validation step to the data workflow is necessary for quality assurance. A flexible solution enables users to check the recognition results from a web interface and then allow the automatic process to continue.

OCR technology offers the capability to read information in document image files from scanners as well as fax and email. The OCR engine should support several languages, color as well as grayscale and black & white image input, multiple image file formats and image enhancement technologies like deskewing, auto-orientation and intelligent page-layout decomposition. These features help to optimize recognition accuracy.

A primary use of OCR technology is to convert data into a reusable format so that it can be integrated with other systems and processes. Another application of reading data via OCR is to recognize the content of inbound faxes and thereby allow routing of these faxes according to both standard telecommunications data (such as the caller fax number) and fax content such as a name on the cover page. Inbound faxes can then be archived or routed to different destinations (email, printing, web publishing, etc.). OCR is also well suited to extracting text from PDF documents (typically received as email attachments).
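As a rough illustration of routing on both telecommunications data and recognized content, the sketch below applies a small rule table to an inbound fax. The `InboundFax` type, the rule table and the destination strings are all hypothetical and are not part of any Esker product; a real system would load its rules from configuration.

```python
from dataclasses import dataclass


@dataclass
class InboundFax:
    caller_number: str  # fax number reported by the telecom layer
    cover_text: str     # text recognized by OCR on the cover page


# Hypothetical routing table: (rule kind, pattern, destination).
# Rules are checked in order and the first match wins.
ROUTES = [
    ("caller", "+1608", "archive/us-orders"),
    ("content", "accounts payable", "email:ap-team"),
    ("content", "jane smith", "email:jane.smith"),
]


def route_fax(fax: InboundFax, default: str = "queue:manual-review") -> str:
    """Return the destination for a fax, falling back to manual review."""
    text = fax.cover_text.lower()
    for kind, pattern, destination in ROUTES:
        if kind == "caller" and fax.caller_number.startswith(pattern):
            return destination
        if kind == "content" and pattern in text:
            return destination
    return default
```

A fax that matches no rule falls through to a manual queue, which mirrors the point made earlier that OCR results may still need human review.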

Beyond imaging and the limitations of OCR

Scanning documents can help to address data and document issues such as security, efficiency in data search and retrieval, document distribution and auditing. And in efforts to reduce paper handling costs, imaging has become a basic method of archiving documents and data in image files. To reach the next level of process efficiency, it is necessary to recognize the content of such files. Making the data contained in documents available to other applications adds value by replacing manual keystroke entry into ERP systems. Moreover, automation of structured and semi-structured faxed or scanned documents relies heavily on successful image recognition. Skews and shifts, stretching and shrinking on faxed or scanned documents can lead to less than accurate recognition of defined areas.

Additional flexibility is offered by recognition that is based not on defined areas but on the content of documents: keywords (located anywhere in the document) and fields are used to extract data and make intelligent assessments of the document type and the data required for processing. Compared with OCR alone, this type of technology offers vastly improved extraction accuracy, with the ability to continuously “learn” how to handle different types of document formats automatically. The more a dynamic system like this is used, the more efficient it and your organization become.

Recognition rates with OCR alone can be as low as 60%, depending on the quality of the electronic document and any additional technology needed to facilitate capture. OCR only reads an image and translates it into text. Beyond this, OCR does not “understand” content or how words relate to each other. In contrast, Dynamic Document Capture technology provides the ability to look for positions relative to key information. This additional business intelligence and logic ensures that the accuracy of the capture does not rely on OCR alone. This type of system is also able to run checks such as calculations and database lookups.
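To make "positions relative to key information" more concrete, here is a minimal sketch that assumes the OCR step returns each word together with page coordinates. The `Word` type, the coordinate convention and the tolerance value are illustrative assumptions, not a description of how any particular product works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # vertical position of the word on the page


def value_right_of(words: list[Word], keyword: str,
                   y_tolerance: float = 5.0) -> Optional[str]:
    """Return the nearest word to the right of `keyword` on the same line."""
    anchors = [w for w in words if w.text.lower().rstrip(":") == keyword.lower()]
    if not anchors:
        return None
    anchor = anchors[0]
    # Keep only words further right and at roughly the same height.
    candidates = [
        w for w in words
        if w.x > anchor.x and abs(w.y - anchor.y) <= y_tolerance
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.x).text
```

Because the lookup is anchored on the keyword rather than on a fixed area of the page, the same function finds the value wherever the keyword happens to sit in a given layout.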

Optical Character Recognition

[Figure: A purchase order image (.jpg, .bmp, .tif, etc.) is read by the OCR engine, and the extracted data is loaded into a database.]

Automation helps organizations raise operational efficiency by reducing document processing time and costs, whether the document is coming in from outside the company, flowing through the company or going out to a customer or supplier. Dynamic Document Capture capabilities facilitate automatic processing of electronic or scanned semi-structured documents such as invoices, sales orders and shipping papers. They recognize and capture the information that resides in these business documents and make it readily available in enterprise applications, independent of format.

With Dynamic Document Capture you can speed up business processes and:

§ Free your business from paper-based workflow

§ Increase data entry accuracy

§ Simplify document search and retrieval

§ Enhance security of business data and documents

Using business rules technology, Dynamic Document Capture allows intuitive rule customization via a server-based graphical interface. Dynamic Document Capture incorporates OCR and ICR technologies to read information contained in image files, whether they have been scanned or have arrived by fax or email.

Additionally, Dynamic Document Capture uses logic, keywords, relative positions and business rules to allow relevant document data to be extracted. In a sales order scenario, the system automatically extracts the customer name and ship-to address, the document date, number and total, as well as line items such as quantity, description and amount. All information necessary to complete the sales order processing is captured directly from the document, regardless of its location within the document.

The ideal system comes with a set of predefined rules for processing specific types of documents, such as sales orders. One rule extracts the information from inbound faxes or scanned documents. A validation form is used to validate the extracted data. The output is an XML file that contains the extracted data and is submitted to another rule for processing.
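The hand-off of validated data as XML could look something like the sketch below. The element names (`SalesOrder`, `lines`, `line` and the field names) are invented for illustration; the paper does not describe the actual XML schema, so treat this only as one plausible shape.

```python
import xml.etree.ElementTree as ET


def order_to_xml(order: dict) -> str:
    """Serialize validated sales order data to an XML string."""
    root = ET.Element("SalesOrder")
    # Header fields (hypothetical names).
    for field in ("customer_name", "order_number", "order_date", "total"):
        ET.SubElement(root, field).text = str(order[field])
    # One <line> element per line item.
    lines = ET.SubElement(root, "lines")
    for item in order["lines"]:
        line = ET.SubElement(lines, "line")
        for key, value in item.items():
            ET.SubElement(line, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

A downstream rule can then parse the same string with `ET.fromstring` and read individual fields with `findtext`.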

Dynamic Document Capture enables enterprises to avoid the cost-prohibitive task of defining a rule for each document variation. Instead of specifying the area where data is located, you specify the data you want to capture (data required for accounting purposes such as sales order number, invoice date, payment date, supplier references, totals, etc.). Most of the time, data is introduced by keywords. For example, the total amount is introduced by the keyword “total”. A generic rule captures data that is introduced by those keywords.
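A generic keyword-driven rule of this kind can be approximated with a few regular expressions over the recognized text. The keyword lists and field names below are hypothetical examples, and the sketch simply takes the first token that follows a keyword; a production rule set would be larger and more careful.

```python
import re

# Hypothetical keyword lists; more specific keywords are listed first.
FIELD_KEYWORDS = {
    "order_number": ["order number", "po number", "po num."],
    "order_date": ["order date", "po date", "date"],
    "total": ["total amount", "total"],
}


def extract_fields(text: str) -> dict:
    """Return the first value found after each field's keywords."""
    found = {}
    for field, keywords in FIELD_KEYWORDS.items():
        for keyword in keywords:
            # Keyword, optional ':' or '#', then the next non-space token.
            pattern = re.escape(keyword) + r"\s*[:#]?\s*(\S+)"
            match = re.search(pattern, text, flags=re.IGNORECASE)
            if match:
                found[field] = match.group(1)
                break
    return found
```

The same rule handles two differently laid-out orders because it never refers to a position on the page, only to the keywords that introduce each value.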

Because it is impossible to take into account the wide variety of different designs, Dynamic Document Capture lets you teach the generic rule to resolve possible conflicts (for example, documents on which a keyword appears twice) or to improve system performance. Through a web interface, administrators and end users can improve recognition conditions or optimize document identification during document validation. With its intelligent business rules logic, Dynamic Document Capture provides “free-form” extraction that makes sense for semi-structured documents.
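One simple way to picture "teaching" is as a per-sender override layered on top of the generic rule. In the sketch below, a document in which the keyword “total” appears twice is resolved by recording which occurrence to use for that sender. The `taught_rules` store and the function names are hypothetical, and real teaching mechanisms are likely richer than a single index.

```python
import re

# Hypothetical store of taught rules: sender name -> index of the
# "total" occurrence to use for that sender's documents.
taught_rules: dict[str, int] = {}


def teach(sender: str, occurrence: int) -> None:
    """Record which occurrence of the keyword is correct for a sender."""
    taught_rules[sender] = occurrence


def extract_total(text: str, sender: str):
    """Generic rule: first amount after 'total', unless taught otherwise."""
    amounts = re.findall(r"total\s*:?\s*([\d.,]+)", text, flags=re.IGNORECASE)
    if not amounts:
        return None
    index = taught_rules.get(sender, 0)
    if not 0 <= index < len(amounts):
        index = 0  # fall back to the generic behavior
    return amounts[index]
```

Documents from senders that were never taught continue to use the generic rule, so one correction does not affect other layouts.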


Dynamic Document Capture

[Figure: An inbound document passes through three stages: capture and read, validate, convert and load. Extracted fields include PO date, PO number, price, quantity, total and reference.]

Capture

Data contained in incoming business documents such as sales orders and invoices can be captured from faxes, scanned images, EDI, email attachments, web or other formats and media.

Read

Relevant document header, footer and line item information can be extracted regardless of document layout or number of pages. The validity of the extracted data is automatically cross-checked according to predefined scenarios, and against company databases when necessary. This additional business intelligence and logic ensures that the accuracy of the capture does not rely on OCR technology alone when handling documents such as scanned or faxed images.
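The cross-checks mentioned above can be sketched as a function that recomputes the total from the line items and looks the customer up in a reference list. The `KNOWN_CUSTOMERS` set stands in for a company database, and the field names are assumptions made for this example.

```python
from decimal import Decimal

# Hypothetical stand-in for a company customer database.
KNOWN_CUSTOMERS = {"ACME Corp", "Globex"}


def cross_check(order: dict) -> list[str]:
    """Return a list of warnings; an empty list means all checks passed."""
    warnings = []
    # Calculation check: line items must add up to the captured total.
    line_sum = sum(
        (Decimal(line["quantity"]) * Decimal(line["unit_price"])
         for line in order["lines"]),
        Decimal("0"),
    )
    if line_sum != Decimal(order["total"]):
        warnings.append(
            f"total {order['total']} does not match line items ({line_sum})"
        )
    # Lookup check: the customer should already exist in the database.
    if order["customer_name"] not in KNOWN_CUSTOMERS:
        warnings.append(f"unknown customer: {order['customer_name']}")
    return warnings
```

Using `Decimal` rather than floating point keeps the comparison exact for currency amounts. The warnings returned here are the kind of message a validation screen could show next to the affected fields.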

Verify/approve

After the data is automatically captured, users are presented with a simple interface to double-check that the data has been recognized correctly. Users can compare the extracted data against the displayed image of the original document. Additionally, the interface displays warning and error messages next to the captured fields when necessary to draw the user’s attention to identified or potential issues, for example when a new customer has been detected or a possible duplicate document has been identified. When this stage is complete, the user can choose to save the document for later processing, approve it, reject it or forward the validation to another user.

Load data and document

Once the information is correct, it can be used to “fill in” other applications, together with the original document image, and to create new business documents if necessary.

How Dynamic Document Capture Works

[Figure: Processing of semi-structured documents (e.g., purchase orders, invoices) through capture, content analysis, workflow, content formatting and delivery.]

Data validation: when the captured data is accurate based on the predefined generic rule, the document is made available to an operator for validation. The operator validates the document or sends it to a manager for validation.

Teaching: if the captured data is not accurate for a specific document, the operator or manager can teach a new rule online through Document Manager. A new specific rule is created for better recognition of upcoming documents, and the document is reprocessed.

Document process automation

Esker is the first vendor to offer a single platform for automation of document-intensive business processes without technological restrictions. As a comprehensive tool designed to help organizations reduce the use of paper, Esker DeliveryWare makes document processing faster and easier to manage, regardless of the information source or method of delivery. Esker DeliveryWare eliminates manual touch points and brings visibility to document processes while causing no disruption to current business operations.

Esker DeliveryWare brings Dynamic Document Capture together with patented Esker automation technology to help companies address the challenges of manual data entry, manual document routing and filing, lack of coordination and lack of transparency associated with conventional methods of processing customer orders, vendor invoices, claims, expense reports and other incoming documents.

Key Benefits of Dynamic Document Capture

§ Remove slow and costly paper handling from business processes

§ Eliminate error-prone manual data entry

§ Improve process efficiency based on Key Performance Indicators

§ Save time and money spent on manual document archiving and retrieval

§ Support and enhance regulatory compliance efforts cost-effectively


The Esker DeliveryWare Solution

[Figure: Sales order processing with Esker DeliveryWare. A sales order (customer name, sales order number, ship-to, part numbers) arrives by fax, scan, EDI, email or web and passes through capture, content analysis (data extraction), validation workflow and exception handling (automatic database lookups), content formatting (XML conversion of the validated data) and delivery (XML file). The sales order is created in the ERP, users and operators are notified of order creation, the sales order is automatically archived, its image is available from the ERP, and users can search and retrieve archived documents.]

Esker is a recognized leader in helping organizations eliminate paper and improve business processes with on-premise and on-demand document automation solutions. Integrating seamlessly with enterprise systems and other applications, Esker solutions deliver end-to-end automation of any inbound or outbound document process.

Esker helps businesses achieve strategic objectives by eliminating manual document handling to gain efficiencies within sales order management, accounts payable, billing, invoicing, cash collection and other key business processes. With patented document delivery automation software and hosted document delivery services, Esker offers a total solution to automate every phase and every type of business information exchange. Esker addresses many common problems that organizations experience with their business correspondence: manual handling errors, increasing IT complexity, slow processes, high postage costs, competitive pressures and more.

Esker was founded in 1985 and operates globally with more than 80,000 customers and millions of licensed users worldwide. Esker has global headquarters in Lyon, France and U.S. headquarters in Madison, Wisconsin.

For more information, visit www.esker.com.

About Esker

© 2008 Esker S.A. All rights reserved. Esker and the Esker logo are registered trademarks of Esker S.A. in the U.S. and other countries. All other trademarks are the property of their respective owners.

WORLDWIDE ESKER LOCATIONS

Asia § www.esker.com.sg

Australia § www.esker.com.au

France § www.esker.fr

Germany § www.esker.de

Italy § www.esker.it

Spain § www.esker.es

United Kingdom § www.esker.co.uk

MADISON, WISCONSIN § U.S. HEADQUARTERS

Esker, Inc. 1212 Deming Way

Suite 350 Madison, WI 53717

Tel: 608.828.6000 Fax: 608.828.6001

Email: [email protected]

www.esker.com