Jon Atle Gulla: Language Technology and Innovation
Language Technology in Industrial Applications
Or: How we have commercialized linguistic technologies

Jon Atle Gulla
Norwegian University of Science and Technology, Trondheim, Norway

1. Linguistics in search
2. Semantics for interoperability
3. Ontologies in process mining
4. Linguistics in news reporting
Who am I?
Professor, Information Systems group, IDI/NTNU
Education:
- Siv.ing./Dr.ing. (MSc/PhD in engineering; information systems, NTH)
- Cand.philol. (master's-level degree in linguistics, AVH)
- MSc (management, London Business School)
Work experience:
- Fast Search & Transfer, Munich (linguistics in search)
- Norsk Hydro, Brussels (enterprise systems)
- GMD, Darmstadt (information retrieval)
Fields of research:
- Search technologies
- Semantic Web
- Social Web
- Sentiment analysis and recommendations
Jon Atle Gulla, ICEIS 2008
1. The FAST Alltheweb.com site
2000: Alltheweb.com was one of the largest search engines on the Internet.
FAST acquired Elexir Sprachtechnologie in Munich to add linguistics to its search engine.
[Figure: a query and the retrieved documents]
Linguistic Techniques in FAST
Linguistics in search:
- Categorizing techniques (language identification, spam detection, topic categorization): documents are sorted into categories, so a search can be restricted to selected categories instead of all documents. Effect: reduced search space.
- Transformational techniques (lemmatization, phrasing, anti-phrasing): documents and queries are transformed, enabling content-based rather than purely keyword-based search. Effect: increased semantics.
- Presentational techniques (clustering): the list of retrieved documents is presented and accessed by content rather than by title alone. Effect: improved transparency.
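The transformational techniques can be illustrated on the query side. The following is a minimal, hypothetical Python sketch, not FAST's implementation: the anti-phrase list, the lemma dictionary `LEMMAS`, the phrase list `PHRASES` and the function `transform_query` are invented toy examples. Production systems need far larger linguistic resources.

```python
import re

# Hypothetical toy resources, for illustration only.
ANTI_PHRASES = ["i am looking for", "where can i find", "information about"]
LEMMAS = {"engines": "engine", "companies": "company", "searching": "search", "wells": "well"}
PHRASES = {("search", "engine"), ("new", "york")}


def transform_query(query: str) -> str:
    """Apply anti-phrasing, lemmatization and phrasing to a raw user query."""
    text = query.lower()
    # Anti-phrasing: drop stock expressions that carry no search content.
    for anti in ANTI_PHRASES:
        text = text.replace(anti, " ")
    tokens = re.findall(r"[a-z0-9]+", text)
    # Lemmatization: map inflected forms to a base form (dictionary lookup here).
    lemmas = [LEMMAS.get(token, token) for token in tokens]
    # Phrasing: glue known two-word expressions into quoted phrases.
    out, i = [], 0
    while i < len(lemmas):
        pair = tuple(lemmas[i:i + 2])
        if pair in PHRASES:
            out.append(f'"{pair[0]} {pair[1]}"')
            i += 2
        else:
            out.append(lemmas[i])
            i += 1
    return " ".join(out)


print(transform_query("Where can I find companies in New York?"))
```

Here the stock phrase is dropped, *companies* is reduced to *company*, and *New York* becomes the quoted phrase `"new york"`. Plain substring replacement is a simplification; a real anti-phraser would respect word boundaries.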
The FAST Experience
- Linguistics a small part of a large system
- Linguistics as behind-the-scenes technology
- Linguistics not a major breakthrough
Linguistics is not easy:
- Data-intensive
- Only statistical approaches feasible at the time
What happened to FAST?
2003: Internet part sold to Overture (Yahoo)
2008: Enterprise part sold to Microsoft
2. Semantics for Interoperability
Semantic Web:
- Adding semantics to data/services so that humans and computers can communicate better
- Ontology: explicit representation of a shared conceptualization (domain terminology model)
- Semantic markup languages for ontology building (OWL, RDF)
2003: Petromax IIP project for the construction of an ontology for the oil & gas sector (based on ISO 15926)
2011: EU LinkedDesign project on the use of ontologies in manufacturing processes
Silly Semantic Conflicts Prevent Data Harmonization
Mean time between failures (MTBF):
1 “A period of time which is the mean period of time interval between failures”
2 “The time duration between two consecutive failures of a repaired item” (International Electrotechnical Vocabulary online database)
3 “The expectation of the time between failures” (International Electrotechnical Vocabulary online database)
4 “The expectation of the operating time between failures” (MIL-HDBK-29612-4)
5 “Total time duration of operating time between two consecutive failures of a repaired item” (International Electrotechnical Vocabulary online database)
6 “Predicts the average number of hours that an item, assembly, or piece part will operate before it fails” (Jones, J. V. Integrated Logistics Support Handbook, McGraw Hill Inc, 1987)
7 “For a particular interval, the total functional life of a population of an item divided by the total number of failures within the population during the measurement interval. The definition holds for time, rounds, miles, events, or other measure of life units”. (MIL-PRF-49506, 1996, Performance Specification Logistics Management Information)
8 “The average length of time a system or component works without failure” (MIL-HDBK-29612-4)
Even simple terms are misunderstood
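To see why such conflicts block data harmonization, compare two of the readings above on the same data. The sketch assumes a hypothetical repairable pump with three intervals between consecutive failures; the figures and the function names `mtbf_calendar` and `mtbf_operating` are invented for illustration. Reading 2 counts elapsed time between failures, while readings 4 and 5 count only operating time.

```python
from statistics import mean

# Hypothetical data: for each interval between consecutive failures,
# the elapsed calendar hours and the hours the pump was actually running.
INTERVALS = [
    {"calendar_h": 720, "operating_h": 600},
    {"calendar_h": 480, "operating_h": 300},
    {"calendar_h": 960, "operating_h": 900},
]


def mtbf_calendar(intervals):
    """Reading 2: mean time duration between two consecutive failures."""
    return mean(i["calendar_h"] for i in intervals)


def mtbf_operating(intervals):
    """Readings 4/5: mean operating time between two consecutive failures."""
    return mean(i["operating_h"] for i in intervals)


print(mtbf_calendar(INTERVALS), mtbf_operating(INTERVALS))
```

With this data the two functions give 720 h and 600 h, so two plants that both report "MTBF" under different definitions produce figures that cannot be compared directly.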
<owl:Class rdf:about="#CHRISTMAS_TREE">
  …
  <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well.</dc:description>
  <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> CHRISTMAS TREE</dc:title>
  …
  <rdfs:subClassOf rdf:resource="#ARTEFACT"/>
</owl:Class>
OWL petroleum ontology
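A class such as CHRISTMAS_TREE can be read programmatically with an ordinary XML parser. The Python sketch below wraps the slide's class in a minimal, hypothetical RDF/XML document (namespace declarations and an ARTEFACT class added, the slide's elided parts left out) and extracts each class's title, description and direct superclasses with `xml.etree.ElementTree`. The namespace URIs are the standard W3C and Dublin Core ones; `read_classes` is an illustrative name. A real system would use a dedicated RDF library, since RDF/XML allows many equivalent serializations that this simple parser does not handle.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, RDFS, OWL and Dublin Core elements.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF_ABOUT = f"{{{NS['rdf']}}}about"  # Clark notation: {namespace}localname
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"

# Hypothetical minimal document built around the class shown on the slide.
OWL_EXCERPT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <owl:Class rdf:about="#ARTEFACT"/>
  <owl:Class rdf:about="#CHRISTMAS_TREE">
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well.</dc:description>
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> CHRISTMAS TREE</dc:title>
    <rdfs:subClassOf rdf:resource="#ARTEFACT"/>
  </owl:Class>
</rdf:RDF>"""


def read_classes(xml_text):
    """Map each owl:Class id to its title, description and direct superclasses."""
    root = ET.fromstring(xml_text)
    classes = {}
    for cls in root.findall("owl:Class", NS):
        classes[cls.get(RDF_ABOUT)] = {
            "title": cls.findtext("dc:title", default="", namespaces=NS).strip(),
            "description": cls.findtext("dc:description", default="", namespaces=NS).strip(),
            "parents": [p.get(RDF_RESOURCE) for p in cls.findall("rdfs:subClassOf", NS)],
        }
    return classes


classes = read_classes(OWL_EXCERPT)
print(classes["#CHRISTMAS_TREE"]["title"], "->", classes["#CHRISTMAS_TREE"]["parents"])
```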
Semantic Web: Lessons Learned
- Data integration and harmonization improved in the sector
But:
- Demanding and complex technologies
- Semantic Web technologies still immature and expensive
- So far few commercial solutions using semantic technologies
(Some work on ontology-driven search applications)
3. Ontologies in Process Mining
Process mining:
- Techniques and tools for discovering process flow, control, data, organizational and social structures from enterprise systems’ event logs
- Dynamic reporting for exposing real business flows and explaining interesting transaction patterns
Semantic process mining: using ontologies to improve the interpretation of event logs and the construction of business flows
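A common building block in process mining is the directly-follows relation: within each case, which activity is immediately followed by which. The Python sketch below shows how an ontology can help interpret an event log before that relation is computed, by mapping differently named activities onto the same concept. The event log, the mini-ontology and the function names are hypothetical, and this is not a description of the method used in the projects presented here.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity label as recorded by the source system),
# assumed to be in time order within each case.
EVENT_LOG = [
    ("order-1", "Create PO"),
    ("order-1", "Approve PO"),
    ("order-1", "Goods receipt"),
    ("order-2", "create purchase order"),
    ("order-2", "approve purchase order"),
    ("order-2", "GR posted"),
]

# Hypothetical mini-ontology: each concept lists the labels that denote it.
ONTOLOGY = {
    "CreatePurchaseOrder": {"create po", "create purchase order"},
    "ApprovePurchaseOrder": {"approve po", "approve purchase order"},
    "ReceiveGoods": {"goods receipt", "gr posted"},
}


def to_concept(label):
    """Map a raw activity label to its ontology concept, or keep it if unknown."""
    key = label.strip().lower()
    for concept, synonyms in ONTOLOGY.items():
        if key in synonyms:
            return concept
    return label


def directly_follows(log, normalize=lambda activity: activity):
    """Count how often activity a is directly followed by activity b in a case."""
    traces = defaultdict(list)
    for case_id, activity in log:
        traces[case_id].append(normalize(activity))
    counts = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


raw = directly_follows(EVENT_LOG)
semantic = directly_follows(EVENT_LOG, normalize=to_concept)
print(len(raw), "raw edges;", dict(semantic))
```

Without the ontology, the six events yield four unrelated edges, each seen once; after mapping labels to concepts they collapse into two edges, each seen twice, which exposes the shared business flow.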
Semantic Process Mining
[Figure: detected process flow, linked to an ontology that gives a formal definition of the process terminology]
Commercialization of Technology
2004: Businesscape founded
Ongoing work on the Enterprise Visualization Suite:
- Combines two challenging technologies (data mining and Semantic Web)
- Substantial improvement over traditional process mining (and traditional reporting tools)
However:
- Difficult to explain the complexity and capability of the solution to customers
- Few customers competent enough to distinguish process mining from traditional reporting
4. Linguistics in News Reporting
Semantic approaches to news reporting:
- Extract content from news articles
- Validate the content of articles
- Opinion mining from news articles and social sites
- Model user preferences for news recommendation
- Combine/aggregate knowledge from heterogeneous sources
Commercial potential uncertain
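Opinion mining, in its simplest lexicon-based form, scores a text by looking up its words in a polarity lexicon. The Python sketch below is a hypothetical minimal version with a toy lexicon and a one-word negation rule; `LEXICON`, `NEGATORS` and `opinion_score` are invented names, not part of any system described here. It also hints at why linguistic resources matter: the result can be no better than the lexicon behind it.

```python
import re

# Hypothetical toy polarity lexicon; real lexicons are far larger and curated.
LEXICON = {"good": 1, "strong": 1, "growth": 1, "bad": -1, "weak": -1, "loss": -1}
NEGATORS = {"not", "no", "never"}


def opinion_score(text):
    """Sum word polarities, flipping a word's sign if a negator directly precedes it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


print(opinion_score("The results were not good"))
```

Here the negator *not* flips the polarity of *good*, so the sentence scores as negative.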
Conclusions
- Linguistics is often a supporting technology
- Good linguistic resources are tedious and expensive to develop
- Not always easy to justify the inclusion of linguistics
Linguistics in our projects:
- Enables new services and products
- Enhances existing services and products