Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State and University Library
Ralf Stockmann (UGOE)

Uploaded by: ralf-stockmann

Posted on 11 May 2015


DESCRIPTION

The amount of online data that carries geospatial and temporal metadata has grown rapidly in recent years. Social networks such as Twitter, Flickr, and YouTube provide masses of such data, which is hard to browse. Our Europeana 4D interface (e4D) enables comparative visualisation of multiple queries and supports data annotated with time spans. We implemented the design as a prototype application within the European project EuropeanaConnect. It uses a client-server architecture in which the client carries most of the system's functionality.

TRANSCRIPT

Page 1: Controlled Vocabularies and Text Mining - Use Cases at the Goettingen

Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State and University Library
Ralf Stockmann (UGOE)

Page 2

Diagram layers: DATA RESOURCES, COMPUTING, RESEARCH ENVIRONMENTS

Textmining

Enhanced Context-Search

Multilingual Access

DBPedia, ...

Visualisation

Metadata

OCR/Fulltext

Named Entity Recognition

Catalog Data

Crowdsourcing

Annotation Tools

Relationship Graphs

Linked Open Data

Ontologies

Scholars

Libraries

Repositories

Page 3


Use case #1: eAqua

Page 4

Project: eAqua

• Partners:
– Institute of Computer Science, Computational Linguistics, Leipzig (Büchler, Eckart, Heyer, Baumgardt)
– SUB Göttingen (Stockmann, Kothe, Mahnke)

• Comparing semantic graphs between
– headings of journal articles and
– the full text of the same articles
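The slides do not say how eAqua builds its semantic graphs. As a minimal sketch, assume a semantic graph is a sentence-level word co-occurrence graph; the neighbourhood of a search term in the headings graph can then be compared with its neighbourhood in the full-text graph, here with the Jaccard coefficient. The helper names and toy sentences are hypothetical, not eAqua code.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Build an undirected word co-occurrence graph (hypothetical helper).

    Two words are linked if they occur in the same sentence.
    Returns a dict mapping each word to the set of its neighbours.
    """
    graph = defaultdict(set)
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighbour_overlap(graph_a, graph_b, term):
    """Jaccard similarity of a term's neighbourhoods in two graphs."""
    na = graph_a.get(term, set())
    nb = graph_b.get(term, set())
    if not na and not nb:
        return 0.0
    return len(na & nb) / len(na | nb)


if __name__ == "__main__":
    # Toy stand-ins for article headings and the articles' full text.
    headings = ["socialism and the state", "socialism in practice"]
    fulltext = ["the state under socialism", "socialism and labour in practice"]
    g_head = cooccurrence_graph(headings)
    g_full = cooccurrence_graph(fulltext)
    print(neighbour_overlap(g_head, g_full, "socialism"))
```

A low overlap would suggest that headings and full text place a term in different contexts; a real comparison would also need stop-word filtering and significance measures, which are left out here.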

Page 5

Search term "socialism" on title elements

Page 6

Search term "Mephisto" on full text

Page 7


Use case #2: Europeana 4D visualisation

- Prof. Dr. Gerik Scheuermann
- Stefan Jänicke
- Christian Mahnke
- Ralf Stockmann

Page 8

Concept

MAP

Page 9

Concept

MAP TIMELINE

Page 10

Concept

MAP TIMELINE

Page 11

• Multiple data layers
• Interaction
• Animation
• Aggregation of data
• Connections
• Drilldown
• Historical/custom maps
• Result table
• Splitting datasets
• ...

Refinement
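Two of the refinements, aggregation of data and drilldown, can be illustrated with a simple grid. The following is a minimal sketch, assuming aggregation means counting placemarks per fixed-size latitude/longitude cell and drilldown means re-aggregating one cell on a finer grid; e4D's actual aggregation method is not described in the slides, and all names here are hypothetical.

```python
import math
from collections import Counter


def cell_of(lat, lon, cell_deg):
    """Index of the grid cell (cell_deg x cell_deg degrees) containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def aggregate(points, cell_deg):
    """Count (lat, lon) points per grid cell, so one marker per cell can be drawn."""
    return Counter(cell_of(lat, lon, cell_deg) for lat, lon in points)


def drilldown(points, cell, cell_deg, factor=4):
    """Re-aggregate only the points of one coarse cell on a finer grid."""
    inner = [(lat, lon) for lat, lon in points if cell_of(lat, lon, cell_deg) == cell]
    return aggregate(inner, cell_deg / factor)


if __name__ == "__main__":
    # Three made-up sample points (roughly Goettingen, Leipzig and Paris).
    pts = [(51.53, 9.93), (51.34, 12.37), (48.85, 2.35)]
    print(aggregate(pts, 20))          # all three fall into one 20-degree cell
    print(drilldown(pts, (2, 0), 20))  # the same cell split into 5-degree cells
```

A fixed grid keeps the sketch short; clustering by screen distance at the current zoom level would be an alternative design for a map interface.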

Page 12

Page 13

Technological Framework

• OpenLayers
• Simile Timeline/Timeplot
• GeoNames (Geoparser...)
• Explorer Canvas (Google)
• GeoServer (OpenStreetMap, Google Maps)
• Google Web Toolkit (GWT)
• KML (XML)
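The slides only name GeoNames as a geoparser. As an illustration of geocoding a place name, this sketch builds a query for the public GeoNames `searchJSON` web service and reads the coordinates of the first hit from its JSON response. It is not taken from e4D; `username` must be a registered GeoNames account, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def geonames_search_url(place, username, max_rows=1):
    """Build a GeoNames full-text search URL for a place name."""
    query = urlencode({"q": place, "maxRows": max_rows, "username": username})
    return f"{GEONAMES_SEARCH}?{query}"


def first_coordinates(response_text):
    """Return (lat, lng) of the first hit as floats, or None if there is none.

    GeoNames returns lat/lng as strings, so they are converted here.
    """
    data = json.loads(response_text)
    hits = data.get("geonames", [])
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lng"])


# To query the live service (network access and a GeoNames account needed):
#   from urllib.request import urlopen
#   text = urlopen(geonames_search_url("Goettingen", "your_username")).read().decode("utf-8")
#   print(first_coordinates(text))
```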

Page 14

KML Data Model

• WHAT? mandatory: name; optional: description, url
• WHERE? mandatory: coordinates; optional: address
• WHEN? mandatory: timestamp; optional: range
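The data model can be written down as a small record type. A minimal sketch, assuming the optional "range" is expressed as an end date that turns the timestamp into a time span; the class and field names are hypothetical and not part of the e4D specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Place:
    """One e4D-style record (hypothetical field names)."""
    name: str                          # WHAT?  mandatory
    lat: float                         # WHERE? mandatory coordinates
    lon: float
    timestamp: date                    # WHEN?  mandatory
    description: Optional[str] = None  # WHAT?  optional
    url: Optional[str] = None          # WHAT?  optional
    address: Optional[str] = None      # WHERE? optional
    end: Optional[date] = None         # WHEN?  optional; assumed to mark the end of a range

    def validate(self):
        """Raise ValueError if a mandatory part is missing or inconsistent."""
        if not self.name:
            raise ValueError("name is mandatory")
        if not (-90 <= self.lat <= 90 and -180 <= self.lon <= 180):
            raise ValueError("coordinates out of range")
        if self.end is not None and self.end < self.timestamp:
            raise ValueError("range end precedes its start")
```

Keeping the mandatory fields without defaults means a record cannot even be constructed without a name, coordinates and a timestamp.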

Page 15

Exchange Format: KML (XML)
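A record following this data model can be serialised with standard KML 2.2 elements: `name`, `description`, `TimeStamp`/`when` or `TimeSpan`/`begin`/`end`, and `Point`/`coordinates`. Whether e4D reads exactly these elements is an assumption; the specification at http://tinyurl.com/e4d-kml is authoritative. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # write KML as the default namespace


def _el(parent, tag, text=None):
    """Append a KML-namespaced child element, optionally with text."""
    el = ET.SubElement(parent, f"{{{KML_NS}}}{tag}")
    if text is not None:
        el.text = text
    return el


def placemark_kml(name, lat, lon, when, end=None, description=None):
    """Return a KML document containing one Placemark as a string.

    `when` and `end` are ISO 8601 strings; with `end` a TimeSpan is
    written, otherwise a TimeStamp.
    """
    kml = ET.Element(f"{{{KML_NS}}}kml")
    pm = _el(_el(kml, "Document"), "Placemark")
    _el(pm, "name", name)
    if description:
        _el(pm, "description", description)
    if end is None:
        _el(_el(pm, "TimeStamp"), "when", when)
    else:
        span = _el(pm, "TimeSpan")
        _el(span, "begin", when)
        _el(span, "end", end)
    # KML orders coordinates as longitude,latitude
    _el(_el(pm, "Point"), "coordinates", f"{lon},{lat}")
    return ET.tostring(kml, encoding="unicode")
```

Note that KML expects longitude before latitude, the reverse of the usual spoken order.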

Page 16

Questionnaire

Who was the first football player born on the continent to play for an English team?

Page 17

Questionnaire

In how many years were more museums established in the western states of the USA than in the eastern states?

Page 18

Questionnaire

In how many years were more museums founded in the western states of the USA than in the eastern states?

Page 19

Questionnaire

In how many years were more museums founded in the western states of the USA than in the eastern states?
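Outside the visual interface, the museum question reduces to counting foundings per year and side. A minimal sketch, assuming each founding is given as a (year, longitude) pair and that a single longitude threshold separates "western" from "eastern"; splitting by longitude instead of by state, and the default threshold, are simplifying assumptions.

```python
from collections import Counter


def years_west_exceeds_east(foundings, split_lon=-100.0):
    """Count years with more foundings west of split_lon than east of it.

    foundings: iterable of (year, lon) pairs in decimal degrees.
    split_lon: hypothetical east/west boundary (default near the 100th meridian).
    """
    west = Counter()
    east = Counter()
    for year, lon in foundings:
        if lon < split_lon:
            west[year] += 1
        else:
            east[year] += 1
    # A missing year counts as zero in a Counter.
    return sum(1 for year in set(west) | set(east) if west[year] > east[year])
```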

Page 20

Questionnaire

The Iraqi police claim "only 265 civilian casualties since March 2007 in Baghdad". Are they right?
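Checking such a claim means filtering events by date and location and summing their casualty counts. A minimal sketch, assuming each event record carries a date, coordinates and a civilian casualty count; the key names are hypothetical, and any bounding box passed in is only a rough stand-in for a city's boundaries.

```python
from datetime import date


def casualties_since(events, start, bbox):
    """Sum civilian casualties of events on or after `start` inside `bbox`.

    events: iterable of dicts with hypothetical keys
            'date' (datetime.date), 'lat', 'lon', 'civilian_kia'.
    bbox:   (min_lat, min_lon, max_lat, max_lon) in decimal degrees.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    total = 0
    for e in events:
        if (e["date"] >= start
                and min_lat <= e["lat"] <= max_lat
                and min_lon <= e["lon"] <= max_lon):
            total += e["civilian_kia"]
    return total
```

The result would then be compared with the claimed figure of 265; the answer depends entirely on the completeness of the underlying event data.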

Page 21

Datasets

• Library catalog
• Flickr
• IMDB
• DBpedia
• WikiLeaks

Flickr: "tsunami"

Page 22

Use your own data in 5 easy steps!

1. Take a look at the .kml specification: http://tinyurl.com/e4d-kml
2. Build your own KML dataset.
3. Upload it to a web server.
4. Put the URL into the prototype at http://tinyurl.com/e4d-demo
5. Share your set via the magnetic link!
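Before uploading in step 3, a quick local check can catch placemarks that lack one of the mandatory parts of the data model (name, coordinates, timestamp). This is a hypothetical helper, not part of e4D; it assumes the file uses the KML 2.2 namespace and accepts a `TimeSpan` in place of a `TimeStamp`.

```python
import xml.etree.ElementTree as ET

NS = {"kml": "http://www.opengis.net/kml/2.2"}


def check_kml(text):
    """Return labels of Placemarks missing a name, coordinates or a time."""
    root = ET.fromstring(text)
    problems = []
    for i, pm in enumerate(root.iterfind(".//kml:Placemark", NS)):
        name = pm.findtext("kml:name", default="", namespaces=NS).strip()
        coords = pm.find(".//kml:coordinates", NS)
        when = pm.find("kml:TimeStamp/kml:when", NS)
        span = pm.find("kml:TimeSpan", NS)
        if not name or coords is None or (when is None and span is None):
            problems.append(name or f"<placemark {i}>")
    return problems
```

Running it on the dataset before step 3 and getting an empty list means every placemark carries the three mandatory parts.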

Page 23

Resources

• e4D info website:

• Europeana thoughtLab: http://www.europeana.eu/portal/thoughtlab.html