meetup lfoppiano

SuperVISOR: Adaptive system for paper data form recognition and extraction

Upload: luca-foppiano

Post on 11-May-2015

TRANSCRIPT

Page 1: Meetup Lfoppiano

SuperVISOR
Adaptive system for paper data form recognition and extraction

Page 2: Meetup Lfoppiano

Summary

Introduction & problem analysis
Implementation and flow chart
Results
Demo

Page 3: Meetup Lfoppiano

Problem analysis

Data requirements:
Huge amount of data
Simple data

Flexible data import
Environments where people can't use computers or other technology

Page 4: Meetup Lfoppiano

State of the art

Commercial software: few products, too specific and customized; integration and extension are difficult and expensive.

Open-source software: none available.

Page 5: Meetup Lfoppiano

supervisor-engine

Page 6: Meetup Lfoppiano

Input example (1)

Page 7: Meetup Lfoppiano

Input example (2)

Page 8: Meetup Lfoppiano

Input Example (3)

Page 9: Meetup Lfoppiano

supervisor-builder
Simple and easy XML creation
Developed as a GIMP extension
Written in Python
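
The slides do not show the XML that supervisor-builder produces. Purely as an illustration, the Python sketch below writes a form description with one entry per field and its bounding box. The element names (form, field), the attributes, and the build_template helper are all assumptions made for this example, not the project's actual schema.

```python
import xml.etree.ElementTree as ET


def build_template(form_name, fields):
    """Return an XML string describing a form and its fields.

    Hypothetical schema: one <field> element per region of the page,
    with its bounding box stored as x, y, width, height attributes.
    """
    root = ET.Element("form", name=form_name)
    for field in fields:
        ET.SubElement(
            root,
            "field",
            name=field["name"],
            type=field["type"],  # e.g. "checkbox" or "text" (assumed values)
            x=str(field["x"]),
            y=str(field["y"]),
            width=str(field["width"]),
            height=str(field["height"]),
        )
    return ET.tostring(root, encoding="unicode")


example_fields = [
    {"name": "age", "type": "text", "x": 10, "y": 20, "width": 80, "height": 30},
    {"name": "smoker", "type": "checkbox", "x": 10, "y": 60, "width": 20, "height": 20},
]
template_xml = build_template("survey", example_fields)
```

A description like this could be drawn once per form layout in an image editor and then reused for every scanned copy of that form.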

Page 10: Meetup Lfoppiano

OCR /1

OCR engine: Tesseract, but it is simple to swap for another.
Tesseract is one of the best open-source OCR engines: simple to use, but (a bit) buggy and hard to train.
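
One way to keep the OCR engine easy to change is to hide it behind a small interface. The Python sketch below is an assumption about how that could look, not SuperVISOR's actual code: OcrEngine, TesseractEngine, FixedTextEngine and read_field are names invented for this example. The Tesseract backend calls the tesseract command-line tool with "stdout" as the output base, so it needs Tesseract installed to run.

```python
import subprocess
from typing import Protocol


class OcrEngine(Protocol):
    """Anything that can turn an image file into text."""

    def recognize(self, image_path: str) -> str: ...


class TesseractEngine:
    """Backend that shells out to the tesseract command-line tool."""

    def recognize(self, image_path: str) -> str:
        # "stdout" as the output base makes tesseract print the text
        # instead of writing it to a file.
        result = subprocess.run(
            ["tesseract", image_path, "stdout"],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


class FixedTextEngine:
    """Stand-in backend that always returns the same text (for tests)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def recognize(self, image_path: str) -> str:
        return self.text


def read_field(engine: OcrEngine, image_path: str) -> str:
    """Run whichever engine was supplied and trim surrounding whitespace."""
    return engine.recognize(image_path).strip()
```

With this shape, switching engines means passing a different object to read_field; the calling code does not change.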

Page 11: Meetup Lfoppiano

OCR /2

Bounding box
Statistical evaluation
True = black pixels > 20-30% of the box
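
The rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the image is a plain list of rows of grayscale values (0 = black, 255 = white), a pixel counts as black when its value is below 128, the box is (left, top, right, bottom) with exclusive right and bottom edges and lies inside the image, and the 25% default sits in the middle of the 20-30% range from the slide. The names black_ratio and is_marked are invented for this example.

```python
def black_ratio(image, box, black_below=128):
    """Return the fraction of pixels inside box that count as black."""
    left, top, right, bottom = box
    total = (right - left) * (bottom - top)
    if total <= 0:
        raise ValueError("bounding box must have a positive area")
    black = sum(
        1
        for row in image[top:bottom]
        for pixel in row[left:right]
        if pixel < black_below
    )
    return black / total


def is_marked(image, box, threshold=0.25):
    """True when the share of black pixels in the box exceeds the threshold."""
    return black_ratio(image, box) > threshold


# A 4x4 white page with two black pixels in the middle row.
page = [[255] * 4 for _ in range(4)]
page[1][1] = 0
page[1][2] = 0

tight_box = (1, 1, 3, 3)  # 4 pixels, 2 of them black
whole_page = (0, 0, 4, 4)  # 16 pixels, 2 of them black
```

Here is_marked(page, tight_box) is True (half the box is black) and is_marked(page, whole_page) is False (one eighth is black), which shows why the bounding box has to fit the field closely.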

Page 12: Meetup Lfoppiano

Results

High accuracy
High flexibility
Slow computation: 20 seconds for a 300 dpi image (~10 MB) on a single core.

Page 13: Meetup Lfoppiano

Demo :)

Page 14: Meetup Lfoppiano

Thanks

“There's no place like 127.0.0.1”

Page 15: Meetup Lfoppiano