meetup lfoppiano
SuperVISOR
Adaptive system for paper data form recognition and extraction
Summary
Introduction & problem analysis
Implementation and flow chart
Results
Demo
Problem analysis
Data requirements: huge amount of data, simple data
Flexible data import
Environments where people can't use computers or technological advantages
State of the art
Commercial software: few products, mostly too specific and customized; integration and extension are difficult and expensive.
Open-source software: none available
supervisorengine
Input example (1)
Input example (2)
Input example (3)
supervisorbuilder
XML creation made simple and easy
Developed as a GIMP extension
Written in Python
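The slides do not show the XML that supervisorbuilder produces, so the following is only a rough sketch of what a form description might look like. The element and attribute names (`form`, `field`, `left`, `top`, `right`, `bottom`) and the `build_template` helper are assumptions made for illustration, not the tool's actual schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (field name, field kind, (left, top, right, bottom)) -- hypothetical layout
Field = Tuple[str, str, Tuple[int, int, int, int]]


def build_template(form_name: str, fields: List[Field]) -> str:
    """Serialize a form description to an XML string.

    The schema used here is invented for this sketch.
    """
    root = ET.Element("form", name=form_name)
    for name, kind, (left, top, right, bottom) in fields:
        ET.SubElement(
            root,
            "field",
            name=name,
            type=kind,
            left=str(left),
            top=str(top),
            right=str(right),
            bottom=str(bottom),
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_template("survey", [
        ("age", "text", (10, 20, 110, 50)),
        ("smoker", "checkbox", (10, 60, 30, 80)),
    ]))
```

Each region drawn on the scanned form would become one `field` entry that the engine could later use to crop the image.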
OCR /1
OCR engine: Tesseract, but it is simple to change.
Tesseract is one of the best open-source OCR engines; it is simple to use but (a bit) buggy and hard to train.
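One way to keep the OCR engine easy to swap is to treat it as a plain function from an image path to text. The sketch below assumes that design; `OcrEngine`, `tesseract_cli` and `read_fields` are hypothetical names, not taken from SuperVISOR. It also assumes the `tesseract` command-line tool is installed and accepts `stdout` as its output base.

```python
import subprocess
from typing import Callable, Dict

# An OCR engine is any callable that maps an image path to recognized text.
OcrEngine = Callable[[str], str]


def tesseract_cli(image_path: str) -> str:
    """Run the tesseract command-line tool and return the text it prints."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def read_fields(
    field_images: Dict[str, str],
    engine: OcrEngine = tesseract_cli,
) -> Dict[str, str]:
    """Apply the chosen engine to every cropped field image."""
    return {name: engine(path) for name, path in field_images.items()}
```

Changing the engine then means passing a different callable, for example `read_fields(images, engine=my_other_engine)`.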
OCR /2
Bounding box
Statistical evaluation
True = black pixels > 20-30% of the box
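The bounding-box rule above can be sketched as a black-pixel ratio test. This is a minimal illustration, not the engine's code. It assumes a greyscale image stored as rows of 0-255 values, a grey level below 128 counting as black, and a threshold of 0.25 picked from the middle of the 20-30% range on the slide. The function names are hypothetical.

```python
from typing import List, Tuple

# left, top, right, bottom in pixels; right and bottom are exclusive
Box = Tuple[int, int, int, int]


def black_ratio(image: List[List[int]], box: Box, black_below: int = 128) -> float:
    """Fraction of pixels inside the box that are darker than black_below."""
    left, top, right, bottom = box
    total = (right - left) * (bottom - top)
    if total <= 0:
        raise ValueError("empty bounding box")
    black = sum(
        1
        for row in image[top:bottom]
        for value in row[left:right]
        if value < black_below
    )
    return black / total


def is_marked(image: List[List[int]], box: Box, threshold: float = 0.25) -> bool:
    """A field counts as marked (True) when enough of its box is black."""
    return black_ratio(image, box) > threshold


if __name__ == "__main__":
    page = [[255] * 4 for _ in range(4)]  # 4x4 white page
    page[1][1] = page[1][2] = 0           # two black pixels
    print(is_marked(page, (1, 1, 3, 3)))  # small box around the mark
    print(is_marked(page, (0, 0, 4, 4)))  # whole page
```

A ratio threshold tolerates the printed outline of a checkbox and small scanning noise, which is presumably why the slide quotes a range rather than requiring any black pixel at all.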
Results
High accuracy
High flexibility
Slow computation: 20 seconds for a 300 dpi (~10 MB) image on a single core.
Demo
Thanks
“There's no place like 127.0.0.1”