EUROPEAN LEGISLATIVE RESPONSES TO INTERNATIONAL TERRORISM A Database of Laws in German Plenary Protocols

Upload: cetin-sert

Post on 29-Jan-2018

Page 1: Ipw slides

EUROPEAN LEGISLATIVE RESPONSES TO INTERNATIONAL TERRORISM

A Database of Laws in German Plenary Protocols

Page 2: Ipw slides

Outline

1. Introduction

2. Xtract: software for extraction

3. Expected results

4. Discussion

Page 3: Ipw slides

1. Introduction

Page 4: Ipw slides

Linking Laws and Plenary Protocols

Extract agenda items and participants’ information from plenary protocols of terms 12 – 16

Use GESTA as an index of laws

Link laws to plenary speeches and vice versa
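A minimal sketch of what linking "and vice versa" could look like. The slide does not say how laws and speeches are joined; the sketch assumes they share printed-matter numbers. All identifiers and sample records are hypothetical, and Python is used for illustration only.

```python
from collections import defaultdict

# Hypothetical sample records: a law and a speech count as linked when
# they cite the same printed-matter number (an assumption, not the slides').
laws = {"law-A": {"16/100", "16/250"}, "law-B": {"16/300"}}
speeches = {"speech-1": {"16/100"}, "speech-2": {"16/300", "16/250"}}


def build_links(laws, speeches):
    """Return two lookup tables: law -> speeches and speech -> laws."""
    by_reference = defaultdict(set)
    for law_id, refs in laws.items():
        for ref in refs:
            by_reference[ref].add(law_id)

    law_to_speeches = defaultdict(set)
    speech_to_laws = defaultdict(set)
    for speech_id, refs in speeches.items():
        for ref in refs:
            for law_id in by_reference.get(ref, ()):
                law_to_speeches[law_id].add(speech_id)
                speech_to_laws[speech_id].add(law_id)
    return dict(law_to_speeches), dict(speech_to_laws)


law_to_speeches, speech_to_laws = build_links(laws, speeches)
```

Keeping both tables means either side can be used as the entry point without rescanning the other.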

1 introduction

Page 5: Ipw slides

We have ...

Plenary protocol PDFs from electoral terms 12 – 16

1990-12-10 – present

120,655 pages in 1,162 documents

GESTA database of laws, terms 8 – 16

and the ambition to deliver excellent results :)

Page 7: Ipw slides

We want to ...

Extract from 1990 up to the present time

For each plenary session

Session number, date, ...

For each item on the agenda

Descriptions

list of participants

printed matter references

speech texts

tables

Link the results with our database of laws
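The extraction targets listed above can be pictured as a small record structure. This is only a sketch: the class and field names are hypothetical, and the sample values are placeholders rather than data from a real protocol.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class AgendaItem:
    description: str
    participants: List[str] = field(default_factory=list)
    printed_matter: List[str] = field(default_factory=list)  # references
    speeches: List[str] = field(default_factory=list)        # speech texts
    tables: List[str] = field(default_factory=list)


@dataclass
class PlenarySession:
    term: int        # electoral term
    number: int      # session number within the term
    held_on: date
    agenda: List[AgendaItem] = field(default_factory=list)


# Placeholder values only; not taken from a real protocol.
session = PlenarySession(term=16, number=1, held_on=date(2006, 1, 1))
session.agenda.append(
    AgendaItem(description="Example item", participants=["Speaker A"])
)
```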

Page 8: Ipw slides

Challenges

Older electoral terms are not digitized

Each electoral term requires different pattern matching strategies

GESTA tables generated for the project

No consistent, direct links to plenary protocols

Course of legislation lacks detail

Quality difference between older and newer terms

OCR errors

GESTA Database – no improvements possible for older terms
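One common way to soften the effect of OCR errors is approximate string matching. The sketch below is not taken from Xtract; it assumes a list of known speaker names is available and uses Python's standard `difflib` to map a misread name to its closest known form. The name list and the cutoff are hypothetical.

```python
import difflib

# Hypothetical list of known speaker names, e.g. from a members register.
KNOWN_SPEAKERS = ["Franz Müntefering", "Angela Merkel", "Wolfgang Schäuble"]


def normalize_speaker(ocr_name, known=KNOWN_SPEAKERS, cutoff=0.8):
    """Map a possibly misread name to the closest known name, or None."""
    matches = difflib.get_close_matches(ocr_name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A typical OCR confusion such as "ri" read as "n" still resolves, for example `normalize_speaker("Franz Müntefenng")`, while unrelated strings fall below the cutoff and return `None`.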

Page 9: Ipw slides

2. Xtract

Page 10: Ipw slides

Xtract – software for data mining

a set of modern tools to annotate plenary protocols with relevant pieces of information

preserves document layout

uses multiple strategies to mark important text blocks

location, shape and internal structure of blocks

pattern matching

Euclidean distances

statistics

comes with its own document viewer
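To illustrate the "location, shape" strategy from the list above, here is a sketch that labels a block from its bounding box alone. It assumes a two-column page layout; the `Block` type, the labels, and the thresholds are hypothetical and not Xtract's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Block:
    left: float
    top: float
    width: float
    height: float


def classify_by_location(block, page_width, page_height):
    """Label a block from its position and shape alone (illustrative rules)."""
    # Anything in the top 5% of the page is treated as a header.
    if block.top < 0.05 * page_height:
        return "header"
    # Blocks wider than 60% of the page span both columns.
    if block.width > 0.6 * page_width:
        return "full-width"
    center = block.left + block.width / 2
    return "left-column" if center < page_width / 2 else "right-column"
```

Rules like these are cheap to evaluate and can be combined with pattern matching on the block text when position alone is ambiguous.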

2 software

Page 11: Ipw slides

Xtract – implementation details

PDF access

pdftohtml (custom builds)

Acrobat Professional 9 Extended (older terms)

Data manipulation

C# 4.0: LINQ to XML

Visualization

C# 4.0: WPF (Windows Presentation Foundation)

Statistics

CORSIS: my personal open-source project for corpus analysis

Page 12: Ipw slides

Xtract – why XML?

Simple and highly "liquid" file format

based on simple international standards

excellent APIs in many programming languages

converts easily into other formats

used in Microsoft Office, OpenOffice.org
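As a small illustration of "converts easily into other formats", the sketch below turns the speaker fragment used in the crash-course slides into JSON with Python's standard library. The function name and the choice of JSON are assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET

XML = """<event><speaker id="12"><name>Franz Müntefering</name>
<is>Bundesminister für Arbeit und Soziales</is></speaker></event>"""


def speaker_to_dict(speaker):
    """Flatten one <speaker> element into a plain dictionary."""
    return {
        "id": speaker.get("id"),
        "name": speaker.findtext("name"),
        "is": speaker.findtext("is"),
    }


root = ET.fromstring(XML)
records = [speaker_to_dict(s) for s in root.iter("speaker")]
# ensure_ascii=False keeps umlauts readable in the output.
as_json = json.dumps(records, ensure_ascii=False)
```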

Page 13: Ipw slides

Xtract – XML crash course

<event>
  <speaker id="12">
    <name>Franz Müntefering</name>
    <is>Bundesminister für Arbeit und Soziales</is>
  </speaker>
</event>

elements

attributes

hierarchical relations
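The first two terms can be tried out directly. The sketch uses Python's `xml.etree.ElementTree` rather than the LINQ to XML API named in the implementation slide, purely for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<event><speaker id="12"><name>Franz Müntefering</name>'
    "<is>Bundesminister für Arbeit und Soziales</is></speaker></event>"
)

# Elements: every tag in the fragment, in document order.
element_names = [el.tag for el in root.iter()]

# Attributes: key/value pairs attached to a single element.
speaker = root.find("speaker")
speaker_attributes = dict(speaker.attrib)
```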

Page 14: Ipw slides

Xtract – XML crash course

elements: event, speaker, name, is

Page 15: Ipw slides

Xtract – XML crash course

attributes: id

Page 16: Ipw slides

Xtract – XML crash course

children: event → speaker

parents: event ← speaker

Page 17: Ipw slides

Xtract – XML crash course

descendants: event → speaker, name, is

Page 18: Ipw slides

Xtract – XML crash course

siblings: name ↔ is
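The hierarchical relations (children, parents, descendants, siblings) can be queried as follows. Again this is a Python sketch, not the LINQ to XML code used in Xtract. ElementTree keeps no parent pointers, so the parent lookup is built by hand.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<event><speaker id="12"><name>Franz Müntefering</name>'
    "<is>Bundesminister für Arbeit und Soziales</is></speaker></event>"
)
speaker = root.find("speaker")
name = speaker.find("name")

# Children: direct sub-elements only.
children_of_event = [child.tag for child in root]

# Descendants: everything below the element, at any depth.
descendants_of_event = [el.tag for el in root.iter() if el is not root]

# Parents: build a child -> parent map, since ElementTree stores none.
parent_of = {child: parent for parent in root.iter() for child in parent}
parent_of_speaker = parent_of[speaker].tag

# Siblings: the other children of the same parent.
siblings_of_name = [el.tag for el in parent_of[name] if el is not name]
```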

Page 19: Ipw slides

Xtract – how does it function?

extracts texts from PDF files along with layout information
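A sketch of reading text fragments together with their layout. It assumes input shaped like the XML that stock `pdftohtml -xml` emits (`page` and `text` elements with `top`, `left`, `width`, `height` attributes). The custom builds mentioned in the implementation slide may produce something different, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of `pdftohtml -xml` output.
SAMPLE = """<pdf2xml>
<page number="1" height="1263" width="892">
<text top="100" left="60" width="180" height="14" font="0">Example line</text>
<text top="118" left="60" width="95" height="14" font="0"><b>Bold</b> text</text>
</page>
</pdf2xml>"""


def read_fragments(xml_text):
    """Yield one dict per text fragment with its page and bounding box."""
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        for text in page.iter("text"):
            yield {
                "page": int(page.get("number")),
                "left": int(text.get("left")),
                "top": int(text.get("top")),
                "width": int(text.get("width")),
                "height": int(text.get("height")),
                # itertext() also collects text inside inline tags like <b>.
                "text": "".join(text.itertext()),
            }


fragments = list(read_fragments(SAMPLE))
```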

Page 20: Ipw slides

Xtract – how does it function?

merges texts into proximity blocks
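Proximity merging can be sketched with the Euclidean distance between bounding boxes, one of the strategies named on the overview slide. The grouping rule (single linkage with a fixed `max_gap`) and the box format are assumptions, not Xtract's documented behaviour.

```python
import math


def box_distance(a, b):
    """Euclidean gap between two boxes (left, top, right, bottom); 0 if they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)


def merge_into_blocks(boxes, max_gap):
    """Group boxes whose gap is at most max_gap and return each group's bounding box."""
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if box_distance(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return sorted(
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    )


lines = [(60, 100, 240, 114), (60, 118, 200, 132), (60, 300, 240, 314)]
blocks = merge_into_blocks(lines, max_gap=10)
```

The pairwise comparison is quadratic in the number of fragments, which is acceptable per page but would need a spatial index for much larger inputs.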

Page 21: Ipw slides

Xtract – how does it function?

marks ambient constructs
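The slides do not define "ambient constructs". Assuming the term covers page furniture that recurs around the body text, such as running headers or page numbers, one statistical way to find candidates is to count how often text sits at the same position across pages. The function names, fragment format, and threshold below are hypothetical.

```python
from collections import Counter


def find_recurring_positions(fragments, page_count, min_share=0.8):
    """Return (left, top) positions that carry text on at least min_share of pages."""
    seen = Counter()
    # The set removes repeated fragments at one position on the same page.
    for page, left, top in {(f["page"], f["left"], f["top"]) for f in fragments}:
        seen[(left, top)] += 1
    return {pos for pos, n in seen.items() if n / page_count >= min_share}


def mark_ambient(fragments, page_count, min_share=0.8):
    """Copy each fragment and add an 'ambient' flag for recurring positions."""
    recurring = find_recurring_positions(fragments, page_count, min_share)
    return [dict(f, ambient=(f["left"], f["top"]) in recurring) for f in fragments]
```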

Page 22: Ipw slides

Xtract – how does it function?

marks agenda items
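Marking agenda items is a natural fit for pattern matching. The sketch assumes headings of the form "Tagesordnungspunkt 3" or "Zusatztagesordnungspunkt 2". The actual patterns Xtract uses are not shown on the slides and, per the challenges slide, would differ by electoral term.

```python
import re

# Assumed heading wording; real protocols may vary by term and after OCR.
AGENDA_HEADING = re.compile(r"^(Zusatz)?[Tt]agesordnungspunkt\s+(\d+)\b")


def mark_agenda_items(blocks):
    """Return (index, label) pairs for blocks that look like agenda headings."""
    marked = []
    for index, text in enumerate(blocks):
        match = AGENDA_HEADING.match(text.strip())
        if match:
            # Hypothetical short labels: ZP for additional items, TOP otherwise.
            prefix = "ZP" if match.group(1) else "TOP"
            marked.append((index, f"{prefix} {match.group(2)}"))
    return marked
```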

Page 23: Ipw slides

Xtract – how does it function?

annotates blocks with sections they belong to
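Once headings are marked, annotating the remaining blocks can be as simple as carrying the most recent heading forward in reading order. This is a sketch under that assumption; the function name and the data shapes are hypothetical.

```python
def annotate_sections(blocks, headings):
    """Attach to every block the label of the latest heading at or before it.

    blocks: list of block texts in reading order.
    headings: dict mapping a block index to a section label.
    """
    annotated = []
    current = None  # blocks before the first heading have no section
    for index, text in enumerate(blocks):
        current = headings.get(index, current)
        annotated.append({"text": text, "section": current})
    return annotated
```

This only works if the blocks are already in reading order, which on a multi-column page depends on the earlier layout steps.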

Page 24: Ipw slides

3. Expected results

Page 25: Ipw slides

DIGESTA

Based on "GESTA Gesamtausgaben": terms 14 – 16

Always up-to-date

Detailed course of legislation information

Direct links to plenary protocols

Can be complemented with keywords from MZES

http://corsis.sf.net/ipw/digesta/

3 results

Done!!

Page 26: Ipw slides

PLEDA – Plenary Protocols Database

Based on plenary protocols

Links agenda items multidirectionally with participants

Interesting for a range of linguistic and political research purposes

Page 27: Ipw slides

PLEDA – Project Status

Electoral term: 12 13 14 15 16

OCR Run X X - - -

Correction - - -

XML Conversion * * X X X

Division C./S. X X X

Block Merging * * X X X

Ambient Constructs X X X

Page Sections X X X

Interjections * * X X X

Contents * * X

Speeches * * X

Contents-speech links * * X

Page 28: Ipw slides

GLIT – German Legislative Resp ...

Laws

• .law files

• from GESTA

Protocols

• .pro files

• from BTP

GLIT

• German part of ELIT

Page 29: Ipw slides

4. Discussion

Page 30: Ipw slides

Open questions

Project hosting

Where can we host the results?

Initial GLIT interface

Web service?

Rich client-side app?

Any questions from your side?

4 discussion