Search Engines and Knowledge Graphs
It’s Complicated!
Panos Alexopoulos
Head of Ontology
Who we are and what we do
We develop Technology to bridge the language and meaning gap between People and Jobs ...
I like programming, but I’m interested in taking on more project management responsibility
Is there a job in our organisation that better fits my degree?
I’d like to work on our mobile strategy. I’ve helped a friend develop a mobile app.
I’d like to do more with my organisational talent.
We are looking to hire: An experienced tech team lead
The ideal candidate has:
- min. 5yr of experience
- Certified scrummaster
- Exp. w/iOS, Android
Completed academic studies Computer Science or related
30% travel for customer presentations
… through a family of sophisticated software products ...
… that large organizations in the HR and Recruitment sector use...
Knowledge Graphs
What are Knowledge Graphs
• Knowledge graphs are (large) networks of entities, their semantic types, properties, and relationships between entities.
Textkernel Knowledge Graph
• Concept Types:
  • Professions
  • Skills
  • Qualifications (Degrees, Certificates)
  • Organizations (Companies, Educational Institutes)
  • Industries
• Entity relations:
  • Synonym
  • Broader/Narrower
  • Related (Skill2Skill, Profession2Skill, Qualification2Skill etc.)
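A minimal sketch of how such a graph could be held in memory. The concept and relation types come from the slide; the `KnowledgeGraph` class, its method names and the sample entities are assumptions made for illustration, not Textkernel's actual data model.

```python
from dataclasses import dataclass, field

# Concept and relation types as listed on the slide.
CONCEPT_TYPES = {"Profession", "Skill", "Qualification", "Organization", "Industry"}
RELATION_TYPES = {"synonym", "broader", "narrower", "related"}


@dataclass
class KnowledgeGraph:
    """Hypothetical in-memory graph: typed entities plus typed triples."""
    types: dict = field(default_factory=dict)    # entity label -> concept type
    triples: set = field(default_factory=set)    # (subject, relation, object)

    def add_entity(self, label, concept_type):
        if concept_type not in CONCEPT_TYPES:
            raise ValueError(f"unknown concept type: {concept_type}")
        self.types[label] = concept_type

    def add_relation(self, subject, relation, obj):
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        if subject not in self.types or obj not in self.types:
            raise KeyError("both entities must be added before relating them")
        self.triples.add((subject, relation, obj))

    def neighbours(self, subject, relation):
        """Objects linked to `subject` through `relation`, sorted by label."""
        return sorted(o for s, r, o in self.triples if s == subject and r == relation)


# Illustrative data only.
kg = KnowledgeGraph()
kg.add_entity("Data Scientist", "Profession")
kg.add_entity("Python", "Skill")
kg.add_entity("machine learning", "Skill")
kg.add_relation("Data Scientist", "related", "Python")
kg.add_relation("Data Scientist", "related", "machine learning")
print(kg.neighbours("Data Scientist", "related"))
```

Rejecting unknown concept and relation types at insert time is one simple way to keep the schema explicit.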
How are Knowledge Graphs used in Search
Some usages
• Metadata extraction for content indexing
  • Entities (e.g., skills, professions, companies etc. mentioned in a CV or vacancy)
  • Relations (e.g., events mentioned in a news article)
• Metadata extraction for query parsing and interpretation
  • Entity and relation extraction from the user query
• Query Expansion
  • “If I am looking for a search engine specialist, I would also be fine with an Elasticsearch engineer”
• Semantic relevance calculation
  • “I am looking for a C++ book, how relevant would a Java book be?”
Metadata extraction from content and queries
• The lexical forms of the entities and relations are used as gazetteers for extraction.
• The relations between the entities are used as contextual evidence for disambiguation.
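The two bullets above can be sketched roughly as follows: a gazetteer maps surface forms to candidate entities, and each ambiguous mention is resolved to the candidate that is related to the most other candidates found in the same text. The gazetteer entries, the `RELATED` edges and the `extract` function are invented for illustration; a production extractor would be considerably more involved.

```python
import re

# Hypothetical gazetteer: lower-cased surface form -> candidate entities.
GAZETTEER = {
    "java": ["Java (programming language)", "Java (island)"],
    "spring": ["Spring (framework)", "Spring (season)"],
    "developer": ["Software Developer"],
}

# Hypothetical "related" edges used as contextual evidence.
RELATED = {
    "Java (programming language)": {"Spring (framework)", "Software Developer"},
    "Spring (framework)": {"Java (programming language)", "Software Developer"},
    "Software Developer": {"Java (programming language)", "Spring (framework)"},
}


def extract(text):
    """Return {mention: chosen entity} for every gazetteer hit in `text`."""
    tokens = re.findall(r"[a-z+#]+", text.lower())
    mentions = [t for t in tokens if t in GAZETTEER]
    resolved = {}
    for mention in mentions:
        # Candidates of all *other* mentions form the context.
        context = {c for m in mentions if m != mention for c in GAZETTEER[m]}

        def support(candidate):
            # Number of context candidates this candidate is related to.
            return len(RELATED.get(candidate, set()) & context)

        resolved[mention] = max(GAZETTEER[mention], key=support)
    return resolved


print(extract("Java developer with Spring experience"))
```

Ties fall back to the first candidate listed in the gazetteer, which is an arbitrary choice in this sketch.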
Query Expansion and Semantic Relevance
• The graph’s relations can be used to generate additional entities to include in the query so as to increase recall.
• The strengths of these relations and/or the distances between the entities can be used to calculate semantic relevance.
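As a rough sketch of both ideas: expansion adds neighbours whose relation strength passes a threshold, and relevance decays with the number of hops between two entities. The edge weights, the 0.5 threshold and the `1 / (1 + hops)` formula are assumptions chosen for illustration; the slides do not prescribe a particular scoring function.

```python
from collections import deque

# Hypothetical weighted edges: (entity, entity) -> relation strength in (0, 1].
EDGES = {
    ("search engine specialist", "Elasticsearch engineer"): 0.8,
    ("search engine specialist", "Solr engineer"): 0.7,
    ("Elasticsearch engineer", "Java developer"): 0.4,
}


def build_adjacency(edges):
    """Turn the edge list into an undirected adjacency map."""
    adjacency = {}
    for (a, b), strength in edges.items():
        adjacency.setdefault(a, {})[b] = strength
        adjacency.setdefault(b, {})[a] = strength
    return adjacency


ADJ = build_adjacency(EDGES)


def expand(query_entity, min_strength=0.5):
    """Query entity plus its direct neighbours with strength >= min_strength."""
    neighbours = ADJ.get(query_entity, {})
    return [query_entity] + sorted(e for e, s in neighbours.items() if s >= min_strength)


def distance(source, target):
    """Number of hops between two entities (breadth-first), or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbour in ADJ.get(node, {}):
            if neighbour == target:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


def relevance(source, target):
    """Distance-based relevance: 1 for identical entities, 0 if unconnected."""
    hops = distance(source, target)
    return 0.0 if hops is None else 1.0 / (1 + hops)


print(expand("search engine specialist"))
print(relevance("search engine specialist", "Java developer"))
```

Raising `min_strength` trades recall for precision, which is the tension the pitfalls later in the talk revolve around.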
Knowledge Graph in Search Pitfalls
3 pitfalls
1. Not well-defined or well-documented semantics of the knowledge graph.
2. Not using the right type or amount of knowledge for the search scenario at hand.
3. Not mining the knowledge from the right sources.
Bad Semantics - The abuse of synonymy
• People (and therefore graphs) often treat as synonyms terms that are in reality hyponyms or otherwise related.
Bad Semantics - The abuse of synonymy
Synonyms of “Economist” according to ESCO
● economics science researcher
● macro analyst
● economics analyst
● economics research scientist
● labour economist
● social economist
● interest analyst
● econometrician
● economics researcher
● econophysicist
● economics scientist
● economics scholar
● economics research analyst
Synonyms of “Software Engineer” according to DBPedia
● Senior Software Engineer
● Software engineer
● Consulting software engineer
● Software engineering naming controversy
● Computer science engineer
● Debates within software engineering
● Consulting software engineers
● Software Engineer
● Computer Science Engineer
Bad Semantics - The abuse of synonymy
• Why is this a problem:
  • Synonymy means (almost) interchangeability of meaning.
  • If you call a relation synonymy when it is not, then terms with different meanings will be treated as fully equivalent.
  • E.g. when looking for an “economics scholar” you will always get “interest analysts” (and vice versa).
Bad Semantics - The abuse of synonymy
• What to do:
  • If you own the Knowledge Graph, be quite strict in what you call a "synonym".
  • If you are using an external Knowledge Graph, be extra careful with its assumptions about synonymy.
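One way to be strict, sketched under assumptions: keep true synonyms and narrower terms in separate relations, and let expansion follow narrower terms only downwards. How the ESCO labels below are split between "synonym" and "narrower" is a guess made for this example, not an official ESCO classification, and the `LABELS` structure and `expand` function are hypothetical.

```python
# Hypothetical re-typing of a few labels that ESCO lists as synonyms of "economist".
LABELS = {
    "economist": {
        "synonym": {"economics scientist"},
        "narrower": {"labour economist", "econometrician"},
    }
}


def expand(term):
    """Expand a term to its synonyms and narrower terms, never to siblings."""
    expansions = {term}
    for concept, relations in LABELS.items():
        names = {concept} | relations["synonym"]
        if term in names:
            # The term names the concept itself: synonyms and narrower terms apply.
            expansions |= names | relations["narrower"]
    return sorted(expansions)


print(expand("economist"))
print(expand("labour economist"))
```

With a flat synonym list, a search for "labour economist" would also return econometricians; with typed relations it returns only what was asked for.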
Bad Semantics - The inadequacy of relatedness
• Often Knowledge Graphs contain a "related" relation to represent semantically related terms whose exact relation we don’t know.
• Especially with the advent of Word2Vec, semantic relatedness is (misleadingly) easy to calculate.
• The problem starts when this “related” relation has no further info about its provenance or context.
Bad Semantics - The inadequacy of relatedness
Related Skills for Data Scientist according to ESCO
• data mining
• data models
• information categorisation
• information extraction
• online analytical processing
• query languages
• resource description framework query language
• statistics
• visual presentation techniques
Related Skills for Data Scientist according to Textkernel Knowledge Graph
• Apache Spark
• R
• Big Data
• machine learning
• Python
• Stakeholders
• marketing
Bad Semantics - The inadequacy of relatedness
• What is the problem:
  • Semantic relatedness is a vague, highly subjective and context-dependent relation.
  • If this relation is not adequately contextualized and documented I can’t really know whether it fits my search scenario.
    • E.g. What relatedness criteria and guidelines were given to ESCO experts?
    • E.g. What data and relatedness measures were used by Textkernel?
Bad Semantics - The inadequacy of relatedness
• What to do:
  • If you own the Knowledge Graph, contextualize and document your “related” relations:
    • Guidelines and criteria given to humans (experts or crowd).
    • Data and methods used for mining.
    • Intended usage.
  • If you are using an external Knowledge Graph, be extra careful with its assumptions about relatedness.
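A small sketch of what documenting a "related" relation could look like in practice: every edge carries its provenance and intended usage, so a consumer can filter for what fits their scenario. The `RelatedEdge` fields mirror the three sub-bullets above; the field names, the usage labels and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelatedEdge:
    """A "related" relation annotated with provenance (hypothetical schema)."""
    source: str
    target: str
    method: str        # how the edge was produced, e.g. expert curation or a mining method
    evidence: str      # guidelines given to humans, or data used for mining
    intended_use: str  # scenario the edge was validated for


def usable_for(edges, use):
    """Keep only the edges whose documented intended use matches `use`."""
    return [edge for edge in edges if edge.intended_use == use]


# Illustrative edges; the provenance values are invented.
edges = [
    RelatedEdge("Data Scientist", "Python", "co-occurrence mining",
                "vacancy texts", "query expansion"),
    RelatedEdge("Data Scientist", "marketing", "embedding similarity",
                "CV texts", "exploration only"),
]

for edge in usable_for(edges, "query expansion"):
    print(edge.source, "->", edge.target)
```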
Knowledge Incompatibility
• Domain semantics of a Knowledge Graph are not necessarily equivalent to the application’s semantics.
• Not all relations are good for query expansion and/or semantic relevance.
• Not all entities and relations are good as disambiguation evidence.
Knowledge Incompatibility - Wrong Knowledge
• Experiment made at Textkernel:
  • Used Word2Vec related skills for query expansion when searching for CVs and Vacancies.
  • Precision of expansion pairs was 18%!
  • Developed an expansion-specific relation extractor from vacancy texts.
  • Precision of expansion pairs increased to 60%.
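For reference, precision of expansion pairs is simply the share of proposed (query term, expansion term) pairs that a human judge accepts as valid expansions. The sketch below shows the calculation on invented judgements; the pairs and verdicts are not Textkernel's evaluation data.

```python
def precision(judgements):
    """Fraction of judged expansion pairs that were marked as acceptable."""
    if not judgements:
        raise ValueError("no judged pairs")
    return sum(judgements.values()) / len(judgements)


# Hypothetical judgements: (query term, proposed expansion) -> acceptable?
judged_pairs = {
    ("Java", "J2EE"): True,
    ("Java", "Python"): False,    # related skill, but not what the user asked for
    ("Java", "Eclipse"): False,   # co-occurs often, yet a poor expansion
    ("scrum", "agile"): True,
    ("scrum", "kanban"): True,
}

print(f"{precision(judged_pairs):.0%}")
```

The rejected pairs illustrate the slide's point: terms can be genuinely related and still be the wrong knowledge for expansion.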
Knowledge Incompatibility - Too Much Knowledge
• Experiment made at iSOCO:
  • Used DBPedia to extract and disambiguate mentions of players and teams from short textual descriptions of football highlights.
  • Precision was 60% and recall 55%.
  • Pruned DBPedia to keep only entities and relations that were more likely to occur in the text and help towards disambiguation.
  • Precision increased to 82% and recall to 80%.
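A rough sketch of the pruning idea: restricting the graph to entity types that can plausibly occur in the target texts shrinks the candidate set for each mention before any disambiguation happens. The entities, type names and the `prune` and `candidates` helpers are made up for illustration and do not reproduce the iSOCO setup or real DBPedia data.

```python
# Hypothetical slice of a general-purpose graph.
ENTITIES = {
    "Lionel Messi": {"type": "FootballPlayer", "labels": {"messi", "lionel messi"}},
    "Messi (film)": {"type": "Film", "labels": {"messi"}},
    "FC Barcelona": {"type": "FootballClub", "labels": {"barcelona", "barça"}},
    "Barcelona (city)": {"type": "City", "labels": {"barcelona"}},
}


def prune(entities, allowed_types):
    """Keep only entities whose type is plausible for the target texts."""
    return {name: info for name, info in entities.items() if info["type"] in allowed_types}


def candidates(entities, mention):
    """All entities that carry `mention` as one of their labels."""
    return sorted(name for name, info in entities.items() if mention.lower() in info["labels"])


football_graph = prune(ENTITIES, {"FootballPlayer", "FootballClub"})

print(candidates(ENTITIES, "Barcelona"))
print(candidates(football_graph, "Barcelona"))
```

In the pruned graph each sample mention has a single candidate, so there is nothing left to disambiguate for these examples.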
Suboptimal Knowledge Mining
• Usually follows from badly defined semantics:
  • No correct or clear guidelines to knowledge miners.
  • Inappropriate source data selection.
    • E.g., good search expansions are most likely to be found in user logs.
    • E.g., hyponyms are most likely to be found in definitions.
  • Inaccurate training data for ML algorithms.
Suboptimal knowledge mining
• What to do: Be semantics-driven, not data- or method-driven!
Wrapping Up
3 Action Points
➔ Well defined schema
➔ Documentation of assumptions
➔ Careful knowledge reuse
➔ Adapt/transform the knowledge to your search scenario
➔ Beware of knowledge that may actually harm you
➔ Start with the target semantics, and use them to select your data and methods, not the other way around!
Thank you!
Panos Alexopoulos
Head of Ontology
E-mail: [email protected]
Web: http://www.panosalexopoulos.com
LinkedIn: www.linkedin.com/in/panosalexopoulos
Twitter: @PAlexop