Goobi overview
The Goobi workflow system, introduction and demos
Steffen Hankiewicz, intranda GmbH
London, 10.11.2014
1. What are we doing?
We are content providers
2. We are content providers …
Imagine:
‣ 10,000 pages
Precondition:
‣ digitized images available online
Search for: Illustration
‣ … in the full text
‣ … on page CCXVI
‣ … as structure element
‣ … as word in the title of a chapter
‣ … as synonym for 'drawing'
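The search levels listed on this slide can be illustrated with a small sketch. It assumes that each page carries its OCR full text, a printed page label, an optional structure element, and an optional chapter title. The `Page` class, the `SYNONYMS` table, and the `search` function are hypothetical names chosen for illustration, not part of Goobi.

```python
from dataclasses import dataclass

# Hypothetical synonym table; a real system would likely use a thesaurus.
SYNONYMS = {"illustration": {"drawing"}, "drawing": {"illustration"}}


@dataclass
class Page:
    label: str                # printed page label, e.g. "CCXVI"
    full_text: str            # OCR full text of the page
    structure_type: str = ""  # structure element starting on this page
    chapter_title: str = ""   # title of a chapter starting on this page


def search(pages, term, use_synonyms=True):
    """Return (page label, match kind) pairs for every level that matches."""
    terms = {term.lower()}
    if use_synonyms:
        terms |= SYNONYMS.get(term.lower(), set())
    hits = []
    for page in pages:
        if any(t in page.full_text.lower() for t in terms):
            hits.append((page.label, "full text"))
        if page.structure_type.lower() in terms:
            hits.append((page.label, "structure element"))
        if any(t in page.chapter_title.lower() for t in terms):
            hits.append((page.label, "chapter title"))
    return hits


pages = [
    Page("CCXV", "a plain page of text"),
    Page("CCXVI", "see the drawing opposite", structure_type="Illustration"),
    Page("1", "", chapter_title="On illustration and ornament"),
]
for label, kind in search(pages, "Illustration"):
    print(label, kind)
```

Without OCR, structure data, and chapter titles, none of these matches would be possible, which is why the digitized images alone are only the precondition.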
3. We need some workflow
1. Source material
2. Create digital version
3. Transformation & Enrichment
4. Publish digital version(s)
for each item
‣ Image conversion
‣ Image validation
‣ OCR
‣ ALTO generation
‣ Descriptive metadata
‣ Technical metadata
‣ Pagination
‣ Catalogue enrichment
‣ Ingest into archive
‣ NER
‣ Authority data
‣ Persistent Identifiers
‣ Logical structures
‣ Invoicing
4. Goobi - a quick overview
... try to solve common problems
‣ Web application
‣ Workflow tool
‣ Manage users
‣ Organize projects
‣ Deadlines
‣ Data storage
‣ Metadata formats
5. Goobi - how it works ...
... a simple approach
‣ Workflows cut into small pieces
‣ Simple sequential order of tasks
‣ As much validation as early as possible
‣ Restrict access to what each task requires
‣ Hide everything else from the user
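The approach on this slide can be sketched as a small data structure: tasks run in a fixed sequence, a validation can block a task from closing, and a user sees only the open task assigned to their group. The `Task` and `Workflow` classes and all of their fields are assumptions made for illustration, not Goobi's actual data model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    title: str
    user_group: str  # only this group may work on the task
    validate: Optional[Callable[[dict], bool]] = None  # check run on close
    done: bool = False


class Workflow:
    def __init__(self, tasks):
        self.tasks = list(tasks)

    def open_task(self):
        """The first unfinished task; everything after it stays locked."""
        return next((t for t in self.tasks if not t.done), None)

    def visible_to(self, user_group):
        """Users see only the open task, and only if it is for their group."""
        task = self.open_task()
        return [task] if task and task.user_group == user_group else []

    def close_open_task(self, item):
        """Validate, then close; a failed validation blocks the workflow."""
        task = self.open_task()
        if task is None:
            raise RuntimeError("workflow already finished")
        if task.validate and not task.validate(item):
            raise ValueError(f"validation failed in task '{task.title}'")
        task.done = True


workflow = Workflow([
    Task("Scanning", "scan operators", validate=lambda item: item["images"] > 0),
    Task("Metadata", "cataloguers"),
])
item = {"images": 0}
try:
    workflow.close_open_task(item)  # no images yet, so scanning cannot close
except ValueError as err:
    print(err)
item["images"] = 12
workflow.close_open_task(item)
print([t.title for t in workflow.visible_to("cataloguers")])
```

Because the validation sits on the scanning task itself, the problem surfaces before any later task is opened.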
6. Goobi - the user's perspective
... avoid difficulties
‣ Simple UI
‣ Work with To-Do-List
‣ Hidden complexities:
  ‣ Storage
  ‣ Projects
  ‣ Infrastructure
‣ Clean desk
[Diagram: Steffen, the Goobi web interface, and the working directory of Steffen]
Server side programs
Plugins without user interface
Plugins with user interface
7. Goobi - Management overview
... manage your workflows
‣ Manage all typical configuration in the UI
‣ Workflows
‣ Projects
‣ Users
‣ User groups
‣ Imports
‣ Exports
... control your progress
‣ Controlling and statistics
‣ Manipulate workflows afterwards (e.g. with GoobiScript)
‣ Collaborate with external partners or agencies
8. Goobi - technical background
... what else can be done?
‣ Workflows can ...
‣ be simple or complex
‣ be short or long
‣ contain tasks
‣ have a progress status
‣ be used as templates
[Diagram: example workflow steps: Import from catalogue, Scanning, Quality control, Image conversion, OCR, Structure & metadata, ID generation, Presentation, Archiving]
‣ Workflow steps can ...
‣ be executed manually by a user
‣ be executed automatically by the server
‣ interrupt the workflow for a given time
‣ contain a validation
‣ allow or forbid access or changes
‣ be triggered by a web API
‣ call scripts or external programs
‣ have their own UI as plugins
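Two of the step properties above, automatic execution by the server and calling external programs, can be sketched as follows. The `Step` class, its status values, and `run_automatic_steps` are hypothetical and only illustrate the idea. The external "programs" here are one-line Python commands, so the sketch runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    title: str
    automatic: bool = False  # run by the server instead of by a user
    command: List[str] = field(default_factory=list)  # external program to call
    status: str = "locked"   # locked -> open -> done, or error


def run_automatic_steps(steps):
    """Run steps in order; stop at the first manual step or the first failure.

    Returns the step the workflow is waiting on, or None when all are done.
    Automatic steps are assumed to have a non-empty command.
    """
    for step in steps:
        if step.status == "done":
            continue
        step.status = "open"
        if not step.automatic:
            return step  # waits until a user closes it
        result = subprocess.run(step.command, capture_output=True, text=True)
        if result.returncode != 0:
            step.status = "error"
            return step
        step.status = "done"
    return None


steps = [
    Step("Image conversion", automatic=True,
         command=[sys.executable, "-c", "print('converted')"]),
    Step("Quality control"),  # manual step
    Step("Export", automatic=True,
         command=[sys.executable, "-c", "print('exported')"]),
]
waiting = run_automatic_steps(steps)
print(waiting.title, [s.status for s in steps])
```

Once a user has closed the manual step, calling `run_automatic_steps` again continues with the remaining automatic steps.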
9. Goobi - Extend its functionality
[Diagram: Goobi and its extension points]
‣ Web API command plugins: Close step, Create process, Run script, Delete files, ...
‣ Import plugins: PICA, MARC, OAI, Word, ...
‣ Validation plugins: JP2, MD5, Schema, Color depth, ...
‣ Step plugins: QA, JP2, Ingest, Export, ...
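To illustrate what a validation plugin of the kind named on this slide might look like, here is a minimal sketch with an MD5 checksum check and a JP2 signature check behind a common interface. The `ValidationPlugin` interface, the registry, and the function names are all assumptions for illustration and do not reflect Goobi's real plugin API. The JP2 check only looks at the 12-byte signature box at the start of the file and is far weaker than a full validator such as Jpylyzer.

```python
import hashlib
from abc import ABC, abstractmethod


class ValidationPlugin(ABC):
    """Hypothetical interface: checks one file's bytes and reports pass/fail."""

    name: str

    @abstractmethod
    def validate(self, data: bytes) -> bool:
        ...


class Md5Validation(ValidationPlugin):
    name = "md5"

    def __init__(self, expected_hex):
        self.expected_hex = expected_hex

    def validate(self, data):
        return hashlib.md5(data).hexdigest() == self.expected_hex


class Jp2SignatureValidation(ValidationPlugin):
    name = "jp2"
    # The 12-byte JPEG 2000 signature box that starts every JP2 file.
    SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")

    def validate(self, data):
        return data.startswith(self.SIGNATURE)


REGISTRY = {}  # plugin name -> plugin instance


def register(plugin):
    REGISTRY[plugin.name] = plugin


def run_validations(names, data):
    """Run the named plugins on the data and collect their results."""
    return {name: REGISTRY[name].validate(data) for name in names}


sample = Jp2SignatureValidation.SIGNATURE + b"image data would follow here"
register(Jp2SignatureValidation())
register(Md5Validation(hashlib.md5(sample).hexdigest()))
print(run_validations(["jp2", "md5"], sample))
```

A workflow step would then only need to list plugin names, which keeps new checks independent of the core application.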
10. Goobi - Scripts & applications
‣ OCR
‣ JPEG
‣ JPEG 2000
‣ Jpylyzer
‣ Archiving
‣ Download jobs
‣ Exporters
‣ Named Entity Recognition
11. Goobi - production proven
‣ Lots of institutions
‣ Different kinds of material
‣ Community driven
‣ Open Source
‣ Active development
‣ In-house or hosted
‣ Scalability
A lot of happy content providers
Questions?
‣ http://www.intranda.com
‣ +49 551 29176100
intranda GmbH - Steffen Hankiewicz