Donostia, 17 de Octubre de 2014 LiDom Builder: Automatising the Construction of Multilingual Domain Modules Ángel Conde Manjón GaLan Research Group – LSI Department University of the Basque Country (UPV/EHU) Supervisors: Dr. Mikel Larrañaga Olagaray & Dr. Ana Arruarte Lasa UPV/EHU 25 February 2016


Page 1: Presentacion tesisAngel-revisada

LiDom Builder: Automatising the Construction of Multilingual Domain Modules

Ángel Conde Manjón
GaLan Research Group – LSI Department

University of the Basque Country (UPV/EHU)

Supervisors: Dr. Mikel Larrañaga Olagaray & Dr. Ana Arruarte Lasa

UPV/EHU

25 February 2016

Page 2: Presentacion tesisAngel-revisada

• Technology Supported Learning Systems (TSLS)
  • Learning Management Systems
  • Massive Open Online Courses
  • Intelligent Tutoring Systems: SQL-Tutor
  • …

• Bilingual and multilingual contexts are a reality (UNESCO, 2003)

• Acquiring the Domain Module is a costly and labour-intensive task

[Navigation bar shown on every slide: Introduction | LiDom Builder | Acquisition of Multilingual Terminology | Identification of Pedagogical Relationships | Gathering Learning Objects | Conclusions and Future Work]

Context


Page 3: Presentacion tesisAngel-revisada


Main Goal

Automatising the construction of MULTILINGUAL DOMAIN MODULES

Page 4: Presentacion tesisAngel-revisada

DOM-Sortze (Larrañaga, 2012): a framework for building DOMAIN MODULES from electronic textbooks


Previous Work: DOM-Sortze

Page 5: Presentacion tesisAngel-revisada

[Figure: DOM-Sortze pipeline. Electronic Textbook → (1) Preprocess → Document Body Internal Representation and Document Outline Internal Representation → (2) LDO Gathering → Learning Domain Ontology → (3) LOs Gathering → Domain Module]

Previous Work: DOM-Sortze

Page 6: Presentacion tesisAngel-revisada

[Figure: example Domain Module. Topics: Planetary System, Solar System, Planet, Earth, Satellite, Moon, linked by partOf, isA and prerequisite relationships. The learning object LO1 ("The Moon is Earth's only natural satellite") is attached to its topic through hasDR.]

DOM-Sortze: Domain Module Representation Formalism

Learning Domain Ontology (LDO)
• Topics and pedagogical relationships

Learning Objects (LO)
• Definitions
• Examples
• Problem Statements
• …

Page 7: Presentacion tesisAngel-revisada

Limitations of DOM-Sortze:

1. Developed for a single language: Basque.

2. Its formalism is not able to represent Multilingual Domain Modules.


DOM-Sortze: Limitations

Page 8: Presentacion tesisAngel-revisada

1. Can the formalism used in DOM-Sortze be enhanced to represent Multilingual Domain Modules?

– Extend the formalism to deal with Multilingual Domain Modules.

2. Which enhancements are required to deal with various languages?

– Develop a method for extracting Multilingual Terminology.

– Improve the Relationship Acquisition.

– Provide a method for acquiring Multilingual Learning Objects.

Automatising the construction of MULTILINGUAL DOMAIN MODULES


Goals

Page 9: Presentacion tesisAngel-revisada

I. Introduction: Motivations and Goals
II. LiDom Builder: Building Multilingual Domain Modules
III. Acquisition of Multilingual Terminology
IV. Identification of Pedagogical Relationships
V. Gathering Multilingual Learning Objects
VI. Conclusions and Future Work

Outline

Page 10: Presentacion tesisAngel-revisada


Outline

Page 11: Presentacion tesisAngel-revisada


[Figure: LiDom Builder takes a Textbook and produces a Domain Module through three components: Multilingual Terminology Extraction, Pedagogical Relationship Extraction and Multilingual Learning Object Generation]

Overview

LiDom Builder: a framework for automatising the acquisition of Multilingual Domain Modules

Page 12: Presentacion tesisAngel-revisada

[Figure: multilingual Domain Module example. The topic graph (Planetary System, Solar System, Planet, Earth, Satellite, Moon) is linked by partOf, isA, prerequisite and pedagogicallyClose relationships. Each topic carries language-tagged labels, e.g. "ilargi"@eu, "luna"@es, "moon"@en. Learning objects LO1 and LO2 are attached through hasDR, tagged by language (eu, en, es) and linked to their equivalents (Equiv. "en", Equiv. "es").]

Multilingual Domain Module Formalism
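
The formalism above can be pictured as a small data structure. The following is only a minimal sketch, not the representation used by LiDom Builder: the class names, field names and the `moon isA satellite` edge are assumptions made for illustration; only the labels ("ilargi", "luna", "moon") and the text of LO1 come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A language-independent topic with one label per language code."""
    topic_id: str
    labels: dict[str, str] = field(default_factory=dict)


@dataclass
class LearningObject:
    """A didactic resource written in one language and attached to a topic."""
    lo_id: str
    language: str
    topic_id: str
    text: str


@dataclass
class DomainModule:
    """Hypothetical container: topics, typed relationships and learning objects."""
    topics: dict[str, Topic] = field(default_factory=dict)
    relationships: list[tuple[str, str, str]] = field(default_factory=list)
    learning_objects: list[LearningObject] = field(default_factory=list)

    def label(self, topic_id, language):
        """Return the label of a topic in the given language, or None."""
        return self.topics[topic_id].labels.get(language)

    def resources(self, topic_id, language):
        """Return the learning objects of a topic written in the given language."""
        return [lo for lo in self.learning_objects
                if lo.topic_id == topic_id and lo.language == language]


module = DomainModule()
module.topics["moon"] = Topic("moon", {"eu": "ilargi", "es": "luna", "en": "moon"})
module.topics["satellite"] = Topic("satellite", {"en": "satellite"})
# Illustrative edge only; the exact edges of the figure are not recoverable here.
module.relationships.append(("moon", "isA", "satellite"))
module.learning_objects.append(
    LearningObject("LO1", "en", "moon", "The Moon is Earth's only natural satellite"))

print(module.label("moon", "eu"))
print([lo.lo_id for lo in module.resources("moon", "en")])
```

Keeping the topic identifier separate from its labels is what lets a single ontology serve several languages: relationships are stated once, and only labels and learning objects vary per language.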

Page 13: Presentacion tesisAngel-revisada

Proposed Enhancements

[Figure: enhanced pipeline. Electronic Textbook → (0) Language Identification → (1) Preprocess → Document Internal Representation and Document Outline Internal Representation → (2) LDO Gathering → Learning Domain Ontology → (3) LOs Gathering → Domain Module]

• NLP parsers: Illinois Chunker, Illinois POS tagger, FreeLing, IXA-Pipes
• LDO Gathering: Topic Extraction, Relationship Extraction, Set of Heuristics, Grammar
• LOs Gathering: Multilingual LOs, Grammar, Discourse Markers
• New tools: LiTeWi, LiReWi, LiLoWi

Page 14: Presentacion tesisAngel-revisada

[Figure: the pipeline of page 13 (Electronic Textbook → Preprocess → LDO Gathering → LOs Gathering → Domain Module), with Knowledge Resources (…) added]

Proposed Enhancements

Page 15: Presentacion tesisAngel-revisada

• Two phases
  • Tuning up
    • Set the thresholds and default confidence values.
  • Evaluation
    • Gold Standard (Recall, Precision, F1-Score).
    • Expert validation.

• Use of three textbooks

1. Programming: Introduction to Object Oriented Programming (Wong, S., 2010).

2. Astronomy: Introduction to Astronomy (Morison, 2008).

3. Biology: Introduction to Molecular Biology (Raineri, 2010).
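
The gold-standard measures named above (Precision, Recall, F1-Score) can be computed as in the sketch below. It assumes exact matching between extracted items and gold items; the function name and the sample terms are hypothetical and are not taken from the thesis.

```python
def precision_recall_f1(extracted, gold):
    """Compare a set of extracted items against a gold standard (exact match)."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four extracted terms, three gold terms, two in common.
p, r, f1 = precision_recall_f1(
    {"planet", "moon", "windows 98", "orbit"},
    {"planet", "moon", "star"},
)
print(f"P={p:.2%} R={r:.2%} F1={f1:.2%}")
```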


General Evaluation Methodology

Page 16: Presentacion tesisAngel-revisada


Outline

Page 17: Presentacion tesisAngel-revisada

In DOM-Sortze, terminology was extracted with ErauzTerm (Alegria et al., 2004).

A new tool called LiTeWi has been developed.


Acquisition of Multilingual Terminology

Page 18: Presentacion tesisAngel-revisada


LiTeWi

[Figure: LiTeWi pipeline. Inputs: Electronic Textbook, Generic Corpus. Candidate Extraction (TF-IDF, KP-Miner, CValue, Shallow Parsing Grammar) → Combination → Candidate Selection (Mapping, Disambiguation, Filtering) → Mapping to other languages]

Page 19: Presentacion tesisAngel-revisada


Shallow Parsing Algorithm

• Uses a grammar derived from (Larrañaga, 2012).

• Constraint Grammar applied to POS tags.

[Figure: the Textbook is processed by a Shallow Parser and a set of Extraction Rules]

Sentences that may contain topics:
• This is called an Array List
• A Stack is used to model systems that exhibit LIFO…

Chunks: an Array List, A Stack, …

Grammar: Topic + [*] + part of + [det] + Topic, …

Topics: Array List, Stack, …

Page 20: Presentacion tesisAngel-revisada

LiTeWi

[Figure: LiTeWi pipeline, as on page 18]

Page 21: Presentacion tesisAngel-revisada


Mapping

• Terms mapped to their corresponding Wikipedia articles.

• Search procedure to match Wikipedia article titles and their labels.

Page 22: Presentacion tesisAngel-revisada

LiTeWi

[Figure: LiTeWi pipeline, as on page 18]

Page 23: Presentacion tesisAngel-revisada


Disambiguation

• Method based on global disambiguation (Milne et al., 2008).

• Domain knowledge step added to improve the results.

• The important domain terms are used as the disambiguation context.

• Gold Term List: important domain terms with only one sense, i.e. the monosemic terms with the highest CValue scores.

Page 24: Presentacion tesisAngel-revisada


Disambiguation

Wikiminer Compare Service

Term List (to disambiguate): Java, Inheritance, Property

Gold Term List: Class, Programming Language, Array List

Relatedness of each candidate sense of "Java" to the gold terms:

Sense            Class   Prog. Lang.   Array List   Average
Prog. Language   0.90    0.85          0.64         0.89
Island           0.7     0.77          0.53         0.70
City             0.56    0.75          0.6          0.63

Disambiguated Term: Java (programming language)
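
The selection step can be sketched as follows: each candidate sense is scored by its relatedness to the Gold Term List, and the best-scoring sense wins. This is a simplified sketch that assumes an unweighted mean; the slide's Average column may come from a different (e.g. weighted) aggregation, so the averages computed here are not expected to reproduce it. The relatedness values are the ones shown on the slide; the function name is hypothetical.

```python
from statistics import mean


def disambiguate(sense_scores):
    """Pick the sense with the highest mean relatedness to the gold terms.

    sense_scores maps each candidate sense to {gold_term: relatedness}.
    Assumption: plain unweighted mean as the aggregation.
    """
    averages = {sense: mean(scores.values())
                for sense, scores in sense_scores.items()}
    best = max(averages, key=averages.get)
    return best, averages


java_senses = {
    "Prog. Language": {"Class": 0.90, "Programming Language": 0.85, "Array List": 0.64},
    "Island": {"Class": 0.70, "Programming Language": 0.77, "Array List": 0.53},
    "City": {"Class": 0.56, "Programming Language": 0.75, "Array List": 0.60},
}

best, averages = disambiguate(java_senses)
print(best)
```

Because the gold terms are monosemic, they give a stable context: the ambiguous term is pulled towards the sense that is closest to the domain as a whole rather than to any single neighbour.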

Page 25: Presentacion tesisAngel-revisada

LiTeWi

[Figure: LiTeWi pipeline, as on page 18]

Page 26: Presentacion tesisAngel-revisada


Filtering Unwanted Terms

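Wikiminer Compare Service

Gold Term List: Solar System, Black Hole, Solar Mass

Term List (to filter): Universal Studios, Planet, Windows 98

[Figure: each term is compared with the gold terms to obtain a Relatedness Score; a gold term counts as related when the score passes the Threshold (>= 0.6), and the Number of Related Gold Terms must satisfy N (> 1) for the term to be kept as a Domain Related Term]

Relatedness scores shown on the slide:
• Solar System (0.34), Black Hole (0.53), Solar Mass (0.47)
• Solar System (0.23), Black Hole (0.68), Solar Mass (0.50)

Domain Related Term: Planet

Filtered out: Universal Studios, Windows 98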
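
The filter can be sketched as a count over the gold terms. This is a minimal sketch under stated assumptions: the rule is read as "keep a term when more than one gold term has relatedness >= 0.6"; the scores for Planet are invented for illustration; and assigning the two score rows shown on the slide to Universal Studios and Windows 98 is a guess, since the slide text does not say which row belongs to which term.

```python
def filter_terms(relatedness, threshold=0.6, min_related=2):
    """Keep the terms related to at least `min_related` gold terms.

    relatedness maps each candidate term to {gold_term: score}. A gold term
    counts as related when its score is >= threshold.
    """
    kept = []
    for term, scores in relatedness.items():
        related = sum(1 for score in scores.values() if score >= threshold)
        if related >= min_related:
            kept.append(term)
    return kept


relatedness = {
    # Hypothetical scores: the slide does not show the values for Planet.
    "Planet": {"Solar System": 0.82, "Black Hole": 0.65, "Solar Mass": 0.61},
    # Score rows taken from the slide; their assignment to terms is assumed.
    "Universal Studios": {"Solar System": 0.34, "Black Hole": 0.53, "Solar Mass": 0.47},
    "Windows 98": {"Solar System": 0.23, "Black Hole": 0.68, "Solar Mass": 0.50},
}

print(filter_terms(relatedness))
```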

Page 27: Presentacion tesisAngel-revisada

LiTeWi

[Figure: LiTeWi pipeline, as on page 18]

Mapping to other languages, example:

Topic   EN     ES     EU
Moon    Moon   Luna   Ilargia

Page 28: Presentacion tesisAngel-revisada


Evaluation

Tuning up

• Introduction to Object Oriented Programming textbook.

Evaluation

• Gold Standard and Expert Validation.

• Gold Standard based on the terms appearing in the index of each textbook.

• Evaluated on Introduction to Astronomy and Introduction to Molecular Biology.

Page 29: Presentacion tesisAngel-revisada


Results

• Wikifier (Cheng, 2013)

               Gold Standard                                Expert Validation
               Precision (%)   Recall (%)   F1 Score (%)    Correctness (%)
Astronomy      3.55            62.96        6.72            18.55
Mol. Biology   2.24            10.21        3.67            49.27

• LiTeWi

               Gold Standard                                Expert Validation
               Precision (%)   Recall (%)   F1 Score (%)    Correctness (%)
Astronomy      17.96           72.55        28.79           78.77
Mol. Biology   27.09           50.53        87.70           71.65

Page 30: Presentacion tesisAngel-revisada

Outline

Page 31: Presentacion tesisAngel-revisada

Introduction

In DOM-Sortze, relationships were acquired for Basque using Shallow Parsing.

An adaptation and extension of the Heuristic-based analysis of the outline has been developed.

A new tool called LiReWi has been developed.


Page 32: Presentacion tesisAngel-revisada

Heuristic-based analysis of the outline

Document Outlines
• Reflect the organization made by the author.
• The structure of the outline underlies pedagogical relationships.
• Low-cost process (summarised).

DOM-Sortze
• Each outline item is considered a domain topic.
• By default, gathers a partOf relation between an item and its subitems.
• Heuristics to detect isA relations.

LiDom Builder
• Adaptation to English of the heuristics from (Larrañaga et al., 2004).
• Improvement of isA identification using WikiTaxonomy (Ponzetto et al., 2007).

Page 33: Presentacion tesisAngel-revisada


Wikipedia Enhanced Process

Example outline fragment:

…
4.- Structure of polymers / Macromolecules
    4.1.- Polymer chemistry
    4.2.- Molecular weight
    4.3.- Form, structure and molecular configuration
    4.3.- Supramolecular arrangement
    4.4.- Crystalline and amorphous polymers
    4.5.- Families of polymeric materials
        4.5.1.- Thermosettings
        4.5.2.- Thermoplastics
        4.5.3.- Elastomers
5.- Phase diagrams / Definitions
    5.1.- Solid solutions
    5.2.- Phases rule of Gibbs
    5.3.- Types of phase diagram

1. Identify groups of sibling nodes.

2. Select the groups of leaf nodes in which the partOf relationship has been identified.

3. Link and disambiguate each node to a Wikipedia article using Wikiminer (Milne et al., 2012).
   • Thermosetting polymer (Article id = 321827)
   • Thermoplastic (Article id = 182444)
   • Elastomer (Article id = 842224)

4. Process every group using the taxonomy of (Ponzetto et al., 2007).
   • Categories shown: Materials science, Elastomers, Polymer physics; Polymer physics, Polymer chemistry

5. Infer the isA relationship in those groups that share a common ancestor.
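
Steps 4 and 5 can be sketched as a common-ancestor test over a category taxonomy. The sketch below is illustrative only: the toy taxonomy, the category names and the choice of emitting isA between each sibling and its parent outline item are assumptions, not the WikiTaxonomy data or the exact rule used in the thesis.

```python
def ancestors(category, parents):
    """Collect every ancestor of a category in a child -> parents mapping."""
    seen, stack = set(), [category]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_isa(parent_topic, siblings, article_categories, parents):
    """Emit isA for a sibling group whose categories share a common ancestor."""
    shared = None
    for node in siblings:
        reachable = set()
        for category in article_categories[node]:
            reachable |= {category} | ancestors(category, parents)
        shared = reachable if shared is None else shared & reachable
    if shared:
        return [(node, "isA", parent_topic) for node in siblings]
    return []


# Hypothetical toy taxonomy for the 4.5 group of the example outline.
article_categories = {
    "Thermosettings": ["Thermosetting plastics"],
    "Thermoplastics": ["Thermoplastics"],
    "Elastomers": ["Elastomers"],
}
parents = {
    "Thermosetting plastics": ["Polymers"],
    "Thermoplastics": ["Polymers"],
    "Elastomers": ["Polymers"],
}

relations = infer_isa(
    "Families of polymeric materials",
    ["Thermosettings", "Thermoplastics", "Elastomers"],
    article_categories,
    parents,
)
print(relations)
```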

Page 34: Presentacion tesisAngel-revisada


Evaluation

Gold Standard

• 57 document outlines in English from different domains.

• Human instructors defined the optimal output (LDOs).

• Each LDO restricted to the topics of the outline.

Page 35: Presentacion tesisAngel-revisada


Results

• Heuristic Analysis

                partOf   isA     Total
Precision (%)   84.12    78.95   83.85
Recall (%)      98.66    21.20   83.85

• Heuristic Analysis + Wikipedia Enhanced Process

                partOf   isA     Total
Precision (%)   89.19    77.30   87.70
Recall (%)      96.49    50.53   87.70

Page 36: Presentacion tesisAngel-revisada


Identification of Pedagogical Relationships: LiReWi

[Figure: LiReWi pipeline. Inputs: Electronic Textbook, Topics, Knowledge Bases. Steps: Mapping → Candidate Relationship Extraction → Combination & Filtering]

Page 37: Presentacion tesisAngel-revisada


Mapping

[Figure: mapping a topic to WordNet. Topic: Syntax, Wikipedia id = 3206060, WordNet id = ?. Two mapping tables (Fernando's Mappings and BabelNet Mappings) are looked up for the Wikipedia id and return the WordNet ids 8436203 and 6176322. A Comparer checks both results: if they are equal (=), that id is the final id; if they differ (!=), a PageRank Disambiguation step chooses among the candidate WordNet ids using a Disambiguation Context (Java, Programming, …). Mapped WordNet id returned: 6176322.]

Page 38: Presentacion tesisAngel-revisada

Identification of Pedagogical Relationships: LiReWi

[Figure: LiReWi pipeline, as on page 36]

Page 39: Presentacion tesisAngel-revisada


Candidate Relationship Extraction

[Figure: six extractors (WordNet Extractor, WiBi Extractor, WikiRelations Extractor, Shallow Parsing Grammar Extractor, Sequential Extractor, WikiTaxonomy Extractor) take the NLP data and produce Candidate Relationships of types isA, partOf, prerequisite and pedagogicallyClose]

Page 40: Presentacion tesisAngel-revisada


Candidate Relationship Extraction

Path Based Extractors:

[Figure: Mars isA Rocky planet isA Planet. A relationship found over a path of length 1 gets confidence = 1; over a path of length 2, confidence = 0.9]
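
A path-based extractor of this kind can be sketched as a breadth-first search over a hypernym graph, with a confidence that decreases as the path grows. The sketch assumes a linear decay of 0.1 per extra step, chosen only because it matches the two values shown (length 1 gives 1, length 2 gives 0.9); the actual confidence function, the function names and the `max_length` cut-off are assumptions.

```python
from collections import deque


def path_confidence(length, decay=0.1):
    """Assumed confidence: 1.0 for a direct link, minus `decay` per extra step."""
    return max(0.0, 1.0 - decay * (length - 1))


def isa_candidates(topic, hypernyms, topics, max_length=3):
    """Find isA candidates for `topic` by walking up a hypernym graph.

    hypernyms maps a concept to its direct hypernyms; only concepts that are
    also domain topics are reported, with the confidence of the shortest path.
    """
    found = {}
    visited = {topic}
    queue = deque([(topic, 0)])
    while queue:
        node, distance = queue.popleft()
        if distance == max_length:
            continue
        for parent in hypernyms.get(node, ()):
            if parent in visited:
                continue
            visited.add(parent)
            if parent in topics:
                found[parent] = path_confidence(distance + 1)
            queue.append((parent, distance + 1))
    return found


hypernyms = {"Mars": ["Rocky planet"], "Rocky planet": ["Planet"]}
candidates = isa_candidates("Mars", hypernyms, {"Rocky planet", "Planet"})
print(candidates)
```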

Page 41: Presentacion tesisAngel-revisada


Candidate Relationship Extraction

• WikiRelations: Set of tuples that state the relationships between Wikipedia categories.

WikiRelations Tuples:
• T Tauri, Star, isA
• …
• Radiation, Radio waves, partOf
• Light, Electromagnetic radiation, partOf
• …
• T Tauri star, Star, isA
• 007 license to kill, video games, isA

Topic: Light (Cat1: Light, Cat2: …)

Topic: Electromagnetic radiation (Cat1: Electromagnetic radiation)

Topic: …

Result: Light partOf Electromagnetic radiation (Confidence=0.7)

Page 42: Presentacion tesisAngel-revisada

Candidate Relationship Extraction

• Extractor based on the rules defined in (Larrañaga, 2012).

[Figure: Textbook and Topics → Find Mentions → Sentences with mentions → Constraint Grammar applied to POS tags (Grammar) → Relationships]

Topics: Solar System, Earth, Planet, Mars

Sentences with mentions: Earth is part of the Solar System. …

Grammar: Topic + [*] + part of + [det] + Topic, …

Relationships: Earth partOf Solar System, …
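
The rule `Topic + [*] + part of + [det] + Topic` can be approximated with a regular expression. This is a simplified sketch: the extractor described above applies a Constraint Grammar to POS tags, whereas this version matches raw text, treats `[det]` as an optional "the", "a" or "an", and uses a hypothetical function name.

```python
import re


def part_of_relations(sentence, topics):
    """Apply the rule Topic + [*] + 'part of' + [det] + Topic to one sentence."""
    # Longest topics first so that multiword topics win over shorter ones.
    alternatives = "|".join(
        re.escape(topic) for topic in sorted(topics, key=len, reverse=True))
    pattern = re.compile(
        rf"\b(?P<part>{alternatives})\b.*?\bpart of\b\s+"
        rf"(?:(?:the|a|an)\s+)?(?P<whole>{alternatives})\b",
        re.IGNORECASE,
    )
    canonical = {topic.lower(): topic for topic in topics}
    return [
        (canonical[m.group("part").lower()], "partOf",
         canonical[m.group("whole").lower()])
        for m in pattern.finditer(sentence)
    ]


topics = ["Solar System", "Earth", "Planet", "Mars"]
print(part_of_relations("Earth is part of the Solar System.", topics))
```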

Page 43: Presentacion tesisAngel-revisada


Candidate Relationship Extraction

[Figure: Textbook and Topics → Find Mentions → Sentences with mentions → possible candidates → look up links in / links out on Wikipedia → Reasoner (Relatedness > threshold) → Relations]

Topics: Wavelength, Emission spectrum, Planet, Solar System

Sentences with mentions:
• ...leading to different radiated wavelengths, make up an emission spectrum.
• ... the emission spectrum of a particular star, the wavelength of …

Possible candidates: Wavelength, Emission Spectrum (2 times)

Wikipedia links:
• Emission spectrum (link out) Wavelength
• Wavelength (link out) Emission spectrum

Relations: Emission spectrum pedagogicallyClose Wavelength, …
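
The decision shown above can be sketched as three checks: the two topics are mentioned together in the textbook, their Wikipedia articles link to each other, and their relatedness exceeds a threshold. This is only one possible reading of the figure; how the Reasoner actually combines the evidence, the threshold value (0.5) and the relatedness score (0.7) are assumptions.

```python
def pedagogically_close(topic_a, topic_b, sentences, links_out,
                        relatedness, threshold=0.5):
    """Decide whether two topics are candidates for pedagogicallyClose."""
    a, b = topic_a.lower(), topic_b.lower()
    # Count the sentences that mention both topics (simple substring match).
    co_mentions = sum(1 for s in sentences if a in s.lower() and b in s.lower())
    # Check for a Wikipedia link between the two articles, in either direction.
    linked = (topic_b in links_out.get(topic_a, set())
              or topic_a in links_out.get(topic_b, set()))
    return co_mentions > 0 and linked and relatedness > threshold


sentences = [
    "...leading to different radiated wavelengths, make up an emission spectrum.",
    "... the emission spectrum of a particular star, the wavelength of ...",
]
links_out = {
    "Emission spectrum": {"Wavelength"},
    "Wavelength": {"Emission spectrum"},
}

# 0.7 is a hypothetical relatedness score; the slide does not give the value.
close = pedagogically_close("Emission spectrum", "Wavelength",
                            sentences, links_out, relatedness=0.7)
print(close)
```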

Page 44: Presentacion tesisAngel-revisada


Candidate Relationship Extraction

[Figure: four topics (Topic1, Topic2, Topic3, Topic4) with the number of mentions (links) between them. Mention lists shown: Topic3, 4 mentions; Topic4, 1 mention; Topic2, 3 mentions; Topic1, 4 mentions.]

Topic1 is pedagogicallyClose to Topic2

Topic3 is a prerequisite of Topic4

Page 45: Presentacion tesisAngel-revisada

Identification of Pedagogical Relationships: LiReWi

[Figure: LiReWi pipeline, as on page 36, with the Learning Domain Ontology as output]

Page 46: Presentacion tesisAngel-revisada


Combination & Filtering Relationships

Relationships (candidates from the extractors):
- Earth isA Planet (WordNet Ex) (Conf=1)
- Earth isA Planet (WikiRelations Ex) (Conf=0.8)
- Planet isA Earth (WikiTax Ex) (Conf=0.7)
- Earth partOf Solar System (WordNet Ex) (Conf=1)
- Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)

Stages: Confidence Combiner (relationships combined), Conflict Resolver (conflict resolution), Filter (filter below threshold)

Intermediate lists shown on the slide:
- Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1); Planet isA Earth (WikiTax Ex) (Conf=0.7); Earth partOf Solar System (WordNet Ex) (Conf=1)
- Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1); Earth partOf Solar System (WordNet Ex) (Conf=1); Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)

Discarded:
- Planet isA Earth (WikiTax Ex) (Conf=0.7)
- Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)

Final Relationships:
- Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
- Earth partOf Solar System (WordNet Ex) (Conf=1)
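
The three stages can be sketched on the example above. This is a minimal sketch under assumptions: duplicates are merged by keeping the highest confidence, a conflict is taken to be the same relation stated in both directions (the lower-confidence direction is dropped), and the filter threshold is set to 0.6, a value chosen only so that the 0.5 candidate is discarded. The actual combination rule and thresholds are not given on the slide.

```python
def combine_and_filter(candidates, threshold=0.6):
    """Combine duplicate candidates, resolve inverse conflicts, filter by confidence.

    candidates: iterable of (source, relation, target, extractor, confidence).
    Returns {(source, relation, target): (extractors, confidence)}.
    """
    # 1. Confidence Combiner: merge duplicates (assumed rule: keep the maximum).
    combined = {}
    for source, relation, target, extractor, confidence in candidates:
        key = (source, relation, target)
        extractors, best = combined.get(key, ([], 0.0))
        combined[key] = (extractors + [extractor], max(best, confidence))

    # 2. Conflict Resolver: drop a relationship if its inverse is more confident.
    resolved = {}
    for (source, relation, target), value in combined.items():
        inverse = combined.get((target, relation, source))
        if inverse is not None and inverse[1] > value[1]:
            continue
        resolved[(source, relation, target)] = value

    # 3. Filter: discard what falls below the (assumed) threshold.
    return {key: value for key, value in resolved.items() if value[1] >= threshold}


candidates = [
    ("Earth", "isA", "Planet", "WordNet Ex", 1.0),
    ("Earth", "isA", "Planet", "WikiRelations Ex", 0.8),
    ("Planet", "isA", "Earth", "WikiTax Ex", 0.7),
    ("Earth", "partOf", "Solar System", "WordNet Ex", 1.0),
    ("Earth", "isA", "Terrestrial Planet", "WikiTax Ex", 0.5),
]

final = combine_and_filter(candidates)
for (source, relation, target), (extractors, confidence) in final.items():
    print(source, relation, target, extractors, confidence)
```

Under these assumptions the sketch ends with the same two relationships as the slide: Earth isA Planet and Earth partOf Solar System.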

Page 47: Presentacion tesisAngel-revisada


Evaluation

Tuning up

• Introduction to Object Oriented Programming textbook.

Evaluation

• Gold Standard and Expert Validation.

• Introduction to Astronomy textbook.

• Gold Standard: four experts defined the set of relationships.

• Uses a subset of the main domain topics, selected according to the score given by LiTeWi.

Page 48: Presentacion tesisAngel-revisada


Results

             Precision (%)   Recall (%)   F1-Score (%)   Expert Validation (%)
LiReWi       36.21           50.57        42.42          43.98
DOM-Sortze   63.27           20.74        31.24          N.A.

Page 49: Presentacion tesisAngel-revisada

Outline

Page 50: Presentacion tesisAngel-revisada


Introduction

In DOM-Sortze, LOs were acquired for Basque using Shallow Parsing.

The approach has been validated for English.

LiLoWi has been developed to move towards the elicitation of Multilingual LOs.

Page 51: Presentacion tesisAngel-revisada


Adapting Learning Object elicitation to English

Pattern
• Basque: adibidez, @topic
• English: for instance, @topic

Example
• Basque: Uretan, adibidez hidrogeno eta oxigeno atomoak daude.
• English: For instance, there are hydrogen and oxygen atoms in water.

[Figure: Textbook and Topics (Wavelength, Emission spectrum, Earth, Solar System) → Find Mentions → Sentences with mentions ("Earth is a planet.", …) → Grammar → Learning Objects ("The Moon is Earth's only natural satellite")]

Page 52: Presentacion tesisAngel-revisada


Evaluation

Gold Standard and Expert Validation:

• Evaluated on Introduction to Object Oriented Programming.

• Gold Standard built by some experts.

Two Aspects

• Grammar.
• Learning Objects.

Page 53: Presentacion tesisAngel-revisada


Evaluation

• Grammar

                Definitions   Examples   Prob. Stat.   Princ. Stat.   Total
Found           164           1          12            49             226
Correct         138           1          7             35             181
Precision (%)   84.15         100        58.33         71.43          80.09

• Learning Objects

             Recall (%)   Expert Validation (%)
DOM-Sortze   70.31        91.88
LiDom        75.93        86.79

Page 54: Presentacion tesisAngel-revisada


LiLoWi

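[Figure: LiLoWi. For the Topics (Solar System, Emission spectrum, Earth, …), Multilingual LOs are gathered from WordNet/Wikipedia and passed to a Metadata Generator. The resulting LOs are tagged by language (LO1 en, LO2 en, LO2 es) and linked as Equivalents.]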

Page 55: Presentacion tesisAngel-revisada


• Evaluated on the Principles of Object-Oriented Programming.

• Used the same LDO described in the previous experiment.

• Expert Validation.

Two Aspects

How LiLoWi enhanced the LO coverage for the LDO topics.

How many multilingual LOs are extracted.

Evaluation


Page 56: Presentacion tesisAngel-revisada


Results

• Grammar + Wikipedia/WordNet

Column headers: Definitions, References; English, Spanish, Basque, French

Number of topics (topic coverage %), in slide order: 46 (56.10), 36 (43.90), 9 (10.97), 36 (43.90), 12 (14.63)

• Grammar-based approach

                      Total   Definitions
Number of topics      21      19
Topics coverage (%)   25.61   19.51

Page 57: Presentacion tesisAngel-revisada

Outline

Page 58: Presentacion tesisAngel-revisada

1. Provision of a suitable formalism to represent Multilingual Domain Modules.

2. Developed a method for the elicitation of multilingual terminology.
– To our knowledge, the first term extractor based on searching patterns for educational content.

3. Relationship Acquisition has been improved.
– Extension of the outline processor to English + enhancement with Wikipedia.
– Development of LiReWi, a module for the elicitation of pedagogical relationships for Educational Ontologies.
– Developed a state-of-the-art mapper from Wikipedia to WordNet.

4. Developed a method for multilingual LO generation.
– Extension of DOM-Sortze for English.
– Development of LiLoWi, a module for the elicitation of multilingual LOs using different knowledge bases.

Goal Achievement

Page 59: Presentacion tesisAngel-revisada


• Automatising the inclusion of new languages.

• Multilingual Learning Object generation from similarity and machine translation techniques.

• Concept Map-Based Learning Object Generation.

• Improvements on each module of LiDom Builder.

Future Work


Page 60: Presentacion tesisAngel-revisada


Software Released

Software

• LiTeWi, released with Spanish/English support: https://github.com/Neuw84/LiTe

• Wikipedia/WordNet mapper: https://github.com/Neuw84/Wikipedia2WordNet

• Spanish stemmer: https://github.com/Neuw84/SpanishInflectorStemmer

• Training Data for Wikiminer: https://github.com/Neuw84/Wikipedia353Spanish

• LiReWi: coming soon….

Web Demo

• LiDom Builder: http://galan.ehu.es/lidom/


Page 61: Presentacion tesisAngel-revisada


Publications

A Combined Approach for Eliciting Relationships for Educational Ontologies Using Several Knowledge Bases. Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga. Journal of Knowledge-Based Systems. Submitted.

LiteWi: A Combined Term Extraction Method for Eliciting Educational Ontologies from Textbooks. Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga, Dan Roth. Journal of the Association for Information Science and Technology, 67(2), pp. 380–399, 2016.

Testing Language Independence in the Semiautomatic Construction of Educational Ontologies. Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga. 12th International Conference on Intelligent Tutoring Systems ITS 2014, Springer, Vol. 8474, pp. 545-550, 2014.

Automatic Generation of the Domain Module from Electronic Textbooks. Method and Validation. Mikel Larrañaga, Ángel Conde, Iñaki Calvo, Jon A. Elorriaga, Ana Arruarte. IEEE Transactions on Knowledge and Data Engineering, 26(1), pp. 69-82, 2014.

Automating the Authoring of Learning Material in Computer Engineering Education. Ángel Conde, Mikel Larrañaga, Iñaki Calvo, Jon A. Elorriaga, Ana Arruarte. 42nd Frontiers in Education Conference, pp. 1376-1381, 2012.

Page 62: Presentacion tesisAngel-revisada

LiDom Builder: Automatising the Construction of Multilingual Domain Modules

Ángel Conde Manjón
GaLan Research Group – LSI Department, University of the Basque Country (UPV/EHU)

Supervisors: Mikel Larrañaga Olagaray & Ana Arruarte Lasa

UPV/EHU