Practical Project of the 2006 Joint International Master’s Degree
Agenda
- Introduction
- Technologies in use
- Architecture
- Demonstration
- Remaining issues
- Work packages for Semester II
- Questions & comments
Introduction
- Practical project during the course of studies
- Timeframe: two terms
- Topic: prototype of a semantic search engine using UIMA

Objectives of the first semester:
- Study the UIMA framework and the OpenNLP library
- Search for players, teams, matches and dates
- Semantic search for goal events
- Implement an executable prototype
Technologies in Use
- UIMA framework
- OpenNLP
- Java / JavaServer Pages
- Tomcat server
- Python (web crawler)
Architecture: Overview
[Diagram; only its labels survive. Input: unstructured information, converter (parser), plain text. UIMA framework with CAS: NLP annotator 1 (OpenNLP: sentence detection, word detection, paragraph detection), Date & Time annotator, Player annotator, Match annotator, Goal-Event annotator. Output: persistent search index, user interface.]
Architecture: Web Crawler
- Web crawler used to preselect texts
- Implemented in Python
- Crawls approx. 2,500 pages in 20 minutes
- Presently based on keywords
- Transfer of results to Jimgle is still manual
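The keyword-based preselection could look roughly like the following Python sketch. It only shows the filtering step, not the crawling itself; the keyword list, the threshold `min_score` and the example URLs are assumptions, since the slides do not state which keywords or rules the crawler used.

```python
import re

# Hypothetical keyword list; the slides do not say which keywords were used.
KEYWORDS = {"goal", "match", "world cup", "striker"}

def keyword_score(text, keywords=KEYWORDS):
    """Count how many distinct keywords occur in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(1 for kw in keywords
               if re.search(r"\b" + re.escape(kw) + r"\b", lowered))

def preselect(pages, min_score=2):
    """Keep the URLs of pages whose text mentions at least min_score keywords."""
    return [url for url, text in pages.items()
            if keyword_score(text) >= min_score]

# Illustrative input: URL -> already downloaded page text.
pages = {
    "http://example.org/report": "Klose scored the first goal of the match.",
    "http://example.org/weather": "Rain is expected in Munich tomorrow.",
}
selected = preselect(pages)
```

Only the first page mentions two of the assumed keywords, so it alone would be handed on for annotation.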
Architecture: NLP Annotator
- Usage of the OpenNLP tools & API
- Rule-based approach
- Tagging of paragraphs, sentences and words
- Part-of-speech tagging
- Implementation in UIMA as a separate annotator
- Results are used by the subsequent annotators
- Internal usage only, not displayed in the search index
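To illustrate what tagging paragraphs, sentences and words with character offsets means, here is a minimal Python sketch. It is a regex stand-in, not OpenNLP and not the project's UIMA annotator: the `Annotation` tuple and the splitting rules are assumptions, the sentence rule breaks on abbreviations, and part-of-speech tagging is left out.

```python
import re
from typing import NamedTuple

class Annotation(NamedTuple):
    kind: str   # "paragraph", "sentence" or "word"
    begin: int  # offset of the first character
    end: int    # offset one past the last character

def annotate(text):
    """Return paragraph, sentence and word annotations with character offsets."""
    annotations = []
    # Paragraph: a block of consecutive non-empty lines.
    for para in re.finditer(r"[^\n]+(?:\n[^\n]+)*", text):
        annotations.append(Annotation("paragraph", para.start(), para.end()))
        # Sentence: text up to and including the next '.', '!' or '?'.
        for sent in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", para.group()):
            s_begin = para.start() + sent.start()
            s_end = para.start() + sent.end()
            annotations.append(Annotation("sentence", s_begin, s_end))
            # Word: a run of letters, digits or underscores.
            for word in re.finditer(r"\w+", sent.group()):
                annotations.append(
                    Annotation("word", s_begin + word.start(), s_begin + word.end()))
    return annotations

text = "Germany won. Klose scored twice.\n\nItaly plays tomorrow."
annotations = annotate(text)
```

Because every annotation carries offsets into the original text, later annotators can look up the covered text with `text[a.begin:a.end]` instead of re-tokenizing.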
Architecture: Player Annotator
- Identification of players of the 2006 World Cup
- Rule-based implementation
- Usage of the OpenNLP word annotations
- Matching against the player database (XML file)
- Consideration of last names and nicknames
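A rough Python sketch of matching words against an XML player database by last name and nickname. The XML schema (`players`, `player`, `team`, `lastName`, `nickname`) is hypothetical; the slides only say that the database is an XML file. A plain `\w+` regex stands in for the OpenNLP word annotations.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical schema and sample entries, for illustration only.
PLAYER_XML = """
<players>
  <player team="Germany" lastName="Klose"/>
  <player team="Germany" lastName="Schweinsteiger" nickname="Schweini"/>
  <player team="Brazil" lastName="Ronaldo"/>
</players>
"""

def load_player_names(xml_text):
    """Map every last name and nickname (lower-cased) to the player's last name."""
    names = {}
    for player in ET.fromstring(xml_text).iter("player"):
        last_name = player.get("lastName")
        names[last_name.lower()] = last_name
        nickname = player.get("nickname")
        if nickname:
            names[nickname.lower()] = last_name
    return names

def annotate_players(text, names):
    """Return (begin, end, last_name) for each word that matches a known name."""
    return [(m.start(), m.end(), names[m.group().lower()])
            for m in re.finditer(r"\w+", text)
            if m.group().lower() in names]

names = load_player_names(PLAYER_XML)
hits = annotate_players("Schweini passed to Klose.", names)
```

Mapping nicknames to the canonical last name means a search for a player finds texts that mention either form.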
Architecture: Date & Time Annotator
- Identification of time and date information
- Usage of the OpenNLP word annotations
- Presently a custom, rule-based implementation
- Detects standard-conforming time & date information
- Detection of relative or colloquial time information not implemented yet
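A rule-based detector for standard-form dates and times could be sketched in Python as follows. Which formats the project actually accepted is not stated; the three patterns here (ISO `YYYY-MM-DD`, numeric `DD.MM.YYYY`, 24-hour `HH:MM`) are assumptions. Relative expressions such as "yesterday" are deliberately not handled, mirroring the limitation named above.

```python
import re
from datetime import date

# Assumed formats; the slides do not list the supported ones.
DATE_ISO = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
DATE_DMY = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")
TIME_24H = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")

def find_dates(text):
    """Return (offset, date) pairs for every valid date, in text order."""
    found = []
    for m in DATE_ISO.finditer(text):
        year, month, day = map(int, m.groups())
        found.append((m.start(), year, month, day))
    for m in DATE_DMY.finditer(text):
        day, month, year = map(int, m.groups())
        found.append((m.start(), year, month, day))
    dates = []
    for begin, year, month, day in sorted(found):
        try:
            dates.append((begin, date(year, month, day)))
        except ValueError:
            pass  # looks like a date but is not one, e.g. 31.02.2006
    return dates

def find_times(text):
    """Return (offset, hour, minute) for every 24-hour time."""
    return [(m.start(), int(m.group(1)), int(m.group(2)))
            for m in TIME_24H.finditer(text)]

text = "Kick-off on 09.06.2006 at 18:00 in Munich; final on 2006-07-09."
dates = find_dates(text)
times = find_times(text)
```

Validating each candidate with `datetime.date` rejects strings that merely look like dates, which a pure pattern match would accept.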
Architecture: Match Annotator
- Identification of matches
- Based on 3 components
  - Detection of locality
  - Detection of participating teams
  - Detection of the match result
- Usage of upstream annotators
  - OpenNLP word annotations
  - Player annotations
  - Date & time annotations
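The three components could be combined along these lines. This Python sketch works on a single sentence; the team and venue lists and the rule "exactly two teams make a match" are assumptions, not taken from the project.

```python
import re

# Hypothetical lookup lists, for illustration only.
TEAMS = ("Germany", "Costa Rica", "Italy", "France")
VENUES = ("Munich", "Berlin", "Dortmund")
RESULT = re.compile(r"\b(\d{1,2}):(\d{1,2})\b")

def detect_match(sentence):
    """Return teams, venue and result if the sentence names exactly two teams."""
    # Component 1: participating teams, ordered by position in the sentence.
    teams = sorted((sentence.find(t), t) for t in TEAMS if t in sentence)
    if len(teams) != 2:
        return None
    # Component 2: locality (first known venue mentioned, if any).
    venue = next((v for v in VENUES if v in sentence), None)
    # Component 3: result in the form "4:2".
    score = RESULT.search(sentence)
    return {
        "teams": tuple(name for _, name in teams),
        "venue": venue,
        "result": (int(score.group(1)), int(score.group(2))) if score else None,
    }

match = detect_match("Germany beat Costa Rica 4:2 in Munich.")
```

The result pattern would also match a kick-off time such as 18:00, which is one reason to consult the upstream date and time annotations before accepting a score.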
Architecture: Goal-Event Annotator
- Descriptions of goals are too complex for rule-based detection
- Therefore: machine learning
  - Usage of the OpenNLP library
  - Based on statistical information about sentences
  - Comprehensive training necessary
- Implementation as an OpenNLP component
- Integration into UIMA via wrapper classes
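To give an idea of statistical sentence classification, here is a small Python sketch. It is a stand-in, not the project's method: it uses a naive Bayes classifier with add-one smoothing instead of an OpenNLP model, and the six training sentences are invented toy data. A usable detector would need far more training material, as the slide notes.

```python
import math
import re
from collections import Counter

def tokens(sentence):
    """Lower-case the sentence and split it into alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())

class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def train(self, labelled_sentences):
        for sentence, label in labelled_sentences:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokens(sentence):
                counts[word] += 1
                self.vocab.add(word)

    def classify(self, sentence):
        """Return the label with the highest log-probability."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            for word in tokens(sentence):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy training data; labels "goal" and "other" are assumptions.
TRAINING = [
    ("Klose scored in the 17th minute", "goal"),
    ("Frings fired the ball into the net", "goal"),
    ("Lahm opened the scoring with a curling shot", "goal"),
    ("The referee showed a yellow card", "other"),
    ("The match was played in Munich", "other"),
    ("Fans celebrated in the streets", "other"),
]
classifier = NaiveBayes()
classifier.train(TRAINING)
label = classifier.classify("Podolski scored with a shot into the net")
```

The classifier never sees the name "Podolski" in training; it decides from the surrounding words, which is the property that makes a statistical approach attractive when goal descriptions vary too much for fixed rules.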
Architecture: Persistent Indexing
Functionality:
- Import of all files in a specific directory
- Annotation of all available texts
- Compilation of XML files with the CAS data of every source text
- Subsequent creation of a search index
- Provision of index files for the web server
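The indexing steps above can be sketched in Python as one batch function. Everything concrete here is an assumption: the `.txt` input filter, the XML element names, the JSON index file `index.json`, and the stand-in `word_annotator`. The project itself serialized UIMA CAS data, whose format is not reproduced here.

```python
import json
import re
from pathlib import Path
import xml.etree.ElementTree as ET

def word_annotator(text):
    """Stand-in annotator: one ("word", begin, end) tuple per word."""
    return [("word", m.start(), m.end()) for m in re.finditer(r"\w+", text)]

def build_index(source_dir, output_dir, annotator=word_annotator):
    """Annotate every text file, write one XML file per text, build an index."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    index = {}  # covered text (lower-cased) -> list of source file names
    for path in sorted(source_dir.glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        root = ET.Element("document", source=path.name)
        for kind, begin, end in annotator(text):
            covered = text[begin:end]
            node = ET.SubElement(root, "annotation", type=kind,
                                 begin=str(begin), end=str(end))
            node.text = covered
            postings = index.setdefault(covered.lower(), [])
            if path.name not in postings:
                postings.append(path.name)
        # One XML file with the annotation data of this source text.
        ET.ElementTree(root).write(str(output_dir / (path.stem + ".xml")),
                                   encoding="utf-8")
    # The index file is what the web server would read at query time.
    (output_dir / "index.json").write_text(json.dumps(index, indent=2),
                                           encoding="utf-8")
    return index
```

Calling `build_index("texts", "out")` on a directory of plain-text files would leave one `.xml` file per source text plus `index.json` in `out`; both directory names are placeholders.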
Architecture: Graphical User Interface
- Linux server with Tomcat installation
- Simple operation via web-based GUI
- Search queries are handled by JavaServer Pages
- Processing of requests by JavaBeans
Demonstration: Search Engine
Open Issues: How to proceed?
- Search for attributes, e.g. Player AND Germany (presently only via OmniFind)
- Automate processing of search engine results
- Further training of the components
- Usability improvements at the front end and back end
New Scenarios for the Second Semester
- Automated analysis of emails
  - Search for phone numbers
  - Search for an employee's customer contacts
  - Find employees with specific skills
  - Find links & relations between employees
- Competitive analysis
  - Compare own products with those of competitors
  - Find out about customer opinions in internet portals
- Further ideas?
Ideas for the Second Semester
- Natural-language search queries
- Design templates for customizable annotators
- Machine learning for the web crawler
- Mark annotations in the search results
- Automated processing of search results
- Implement more annotators via OpenNLP
- Provide annotators as web services
- Further ideas?
JIMGLE: JIM Master-Project
Questions?
Suggestions?
Thanks for your attention…