Post on 25-Feb-2016

Page 1: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Workshop Applications

Tom Reamy
Chief Knowledge Architect

KAPS Group
Knowledge Architecture Professional Services

http://www.kapsgroup.com

Page 2: Semantic Infrastructure Workshop  Applications

Agenda

Search and Semantic Infrastructure
– Elements / Rich Dynamic Results
– Different Environments
– Design Issues

Platform for Information Applications
– Multiple Applications
– Case Study – Categorization & Sentiment
– Case Study – Taxonomy Development
– Case Study – Expertise & Sentiment

Conclusions

Page 3: Semantic Infrastructure Workshop  Applications

A Semantic Infrastructure Approach to Search: Elements

Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy – Subject matter / aboutness
– Ontology – Relationships / Facts
• Subject – Verb – Object
Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
People – tagging, evaluating tags, fine tune rules and taxonomy
People – Users, social tagging, suggestions
Rich Search Results – context and conversation

Page 4: Semantic Infrastructure Workshop  Applications

A Semantic Infrastructure Approach to Search: Rich Results

Elements
– Faceted Navigation
– Categorization – metadata and/or dynamic
– Tag Clouds – clustering
– User Tags, personalization
– Related topics – discovery

Supports all manner of search behaviors and needs
– Find known items – zero in with facets
– Discovery – Tag clouds, user tags, related topics
– Deep dive – categorization

Page 8: Semantic Infrastructure Workshop  Applications

A Semantic Infrastructure Approach to Search: Three Environments

E-Commerce
– Catalogs, small uniform collections of entities
– Conflict of information and Selling
– Uniform behavior – buy this

Enterprise
– More content, more types of content
– Enterprise Tools – Search, ECM
– Publishing Process – tagging, metadata standards

Internet
– Wildly different amount and type of content, no taggers
– General Purpose – Flickr, Yahoo
– Vertical Portal – selected content, no taggers

Page 9: Semantic Infrastructure Workshop  Applications

A Semantic Infrastructure Approach to Search: Enterprise Environment – Taxonomy, 7 facets

Taxonomy of Subjects / Disciplines:
– Science > Marine Science > Marine microbiology > Marine toxins

Facets:
– Organization > Division > Group
– Clients > Federal > EPA
– Instruments > Environmental Testing > Ocean Analysis > Vehicle
– Facilities > Division > Location > Building X
– Methods > Social > Population Study
– Materials > Compounds > Chemicals
– Content Type – Knowledge Asset > Proposals
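One way to picture how a subject taxonomy and facets like these could drive filtering is the minimal Python sketch below. It is an illustration only, not the system described in the deck: the sample documents, the "Reports" value, and the function names are assumptions. Each facet value is stored as a path from broad to narrow, and a selection matches when it is a prefix of that path.

```python
from typing import Dict, List

Doc = Dict[str, List[str]]

# Hypothetical documents; each facet value is a path from broad to narrow.
# "Reports" is an invented value used only to make the example non-trivial.
DOCS: List[Doc] = [
    {
        "Subject": ["Science", "Marine Science", "Marine microbiology", "Marine toxins"],
        "Clients": ["Federal", "EPA"],
        "Content Type": ["Knowledge Asset", "Proposals"],
    },
    {
        "Subject": ["Science", "Marine Science"],
        "Clients": ["Federal", "EPA"],
        "Content Type": ["Knowledge Asset", "Reports"],
    },
]


def matches(doc: Doc, selections: Doc) -> bool:
    """True if every selected facet path is a prefix of the document's path."""
    for facet, wanted in selections.items():
        path = doc.get(facet, [])
        if path[: len(wanted)] != wanted:
            return False
    return True


def filter_docs(docs: List[Doc], selections: Doc) -> List[Doc]:
    """Keep only the documents that satisfy all facet selections."""
    return [d for d in docs if matches(d, selections)]


def facet_counts(docs: List[Doc], facet: str, depth: int) -> Dict[str, int]:
    """Count documents per facet value at one depth, e.g. for counts beside facet links."""
    counts: Dict[str, int] = {}
    for d in docs:
        path = d.get(facet, [])
        if len(path) > depth:
            counts[path[depth]] = counts.get(path[depth], 0) + 1
    return counts
```

Selecting a higher node (for example Science > Marine Science) keeps everything tagged beneath it, while each additional facet selection narrows the result set.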

Page 10: Semantic Infrastructure Workshop  Applications

A Semantic Infrastructure Approach to Search: Internet Design

Subject Matter taxonomy – Business Topics
– Finance > Currency > Exchange Rates

Facets
– Location > Western World > United States
– People – Alphabetical and/or Topical – Organization
– Organization > Corporation > Car Manufacturing > Ford
– Date – Absolute or range (1-1-01 to 1-1-08, last 30 days)
– Publisher – Alphabetical and/or Topical – Organization
– Content Type – list – newspapers, financial reports, etc.

Page 12: Semantic Infrastructure Workshop  Applications

Rich Search Results: Design Issues – General

What is the right combination of elements?
– Faceted navigation, metadata, browse, search, categorized search results, file plan

What is the right balance of elements?
– Dominant dimension or equal facets
– Browse topics and filter by facet

When to combine search, topics, and facets?
– Search first and then filter by topics / facet
– Browse/facet front end with a search box

Page 13: Semantic Infrastructure Workshop  Applications

Rich Search Results: Design Issues – General

Homogeneity of Audience and Content
Model of the Domain – broad
– How many facets do you need?
– More facets and let users decide
– Allow for customization – can’t define a single set
User Analysis – tasks, labeling, communities
• Issue – labels that people use to describe their business and labels that they use to find information
Match the structure to domain and task
– Users can understand different structures

Page 14: Semantic Infrastructure Workshop  Applications

Rich Search Results: Automatic Facets – Special Issues

Scale requires more automated solutions
– More sophisticated rules
Rules to find and populate existing metadata
– Variety of types of existing metadata – Publisher, title, date
– Multiple implementation Standards – Last Name, First / First Name, Last
Issue of disambiguation:
– Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford
– Same word, different entity – Ford and Ford
Number of entities and thresholds per results set / document
– Usability, audience needs
Relevance Ranking – number of entities, rank of facets
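The name-format and disambiguation issues above can be sketched as a normalization step followed by a lookup in an authority list. This is a minimal illustration, not the rules used in the project: the `AUTHORITY` table, the "Ford Motor Company" entry, and the decision to map "Mr. Ford" to Henry Ford are assumptions that would depend on the collection.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical authority list: canonical entity -> known (normalized) surface forms.
AUTHORITY: Dict[str, Set[str]] = {
    "Henry Ford (person)": {"henry ford", "henry x ford", "mr ford"},
    "Ford Motor Company (organization)": {"ford motor company", "ford motor"},
}


def normalize(name: str) -> str:
    """Flip 'Last, First' to 'First Last', drop punctuation, lowercase."""
    name = name.strip()
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    name = re.sub(r"[^\w\s]", "", name)          # remove periods and similar marks
    return re.sub(r"\s+", " ", name).lower()     # collapse runs of whitespace


def resolve(name: str) -> Optional[str]:
    """Return the canonical entity for a name, or None if unknown or ambiguous."""
    key = normalize(name)
    for entity, forms in AUTHORITY.items():
        if key in forms:
            return entity
    return None  # e.g. a bare "Ford" could be the person or the company
```

A bare "Ford" is deliberately left unresolved here; deciding between the person and the company needs surrounding context, which is the harder part of the problem the slide points at.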

Page 15: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure for Search Based Apps: Multiple Applications

Platform for Information Applications
– Content Aggregation
– Duplicate Documents – save millions!
– Text Mining – BI, CI – sentiment analysis
– Combine with Data Mining – disease symptoms, new
– Social – Hybrid folksonomy / taxonomy / auto-metadata
– Social – expertise, categorize tweets and blogs, reputation
– Ontology – travel assistant – SIRI

Use your Imagination!

Page 16: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure for Search Apps: Multiple Applications

SIRI – Travel Assistant

Page 17: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure for Search Apps: Case Study – Categorization & Sentiment

Call Motivation
– Categorization – Motivation Taxonomy – Purpose of previous calls to understand current call
– Issues of scale, small size of documents, jargon, spelling

Customer Sentiment
– Telecom Forums
– Feature level – not just products
– Issue of context – sarcasm, jargon

Knowledge Base
– Categorization, Product extraction, expertise-sentiment analysis
– Social Media as source for solutions

Pages 18-21: Semantic Infrastructure Workshop Applications

Case Study – Categorization & Sentiment

Page 22: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure for Search Apps: Case Study – Taxonomy Development

Problem – 200,000 new uncategorized documents
Old taxonomy – need one that reflects change in corpus
Text mining, entity extraction, categorization
Content – 250,000 large documents, search logs, etc.
Bottom Up – terms in documents – frequency, date,
Clustering – suggested categories
Clustering – chunking for editors
Entity Extraction – people, organizations, Programming languages
Time savings – only feasible way to scan documents
Quality – important terms, co-occurring terms
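The bottom-up step in this case study (term frequency and co-occurring terms as raw material for candidate categories) might look roughly like the sketch below. It is an assumption-laden illustration using only the Python standard library: the stopword list, the tokenizer, and the three sample "documents" are invented, and a real project would run text analytics software over the full corpus.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS: Set[str] = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "on"}


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def document_frequencies(docs: Iterable[str]) -> Counter:
    """Number of documents in which each term appears."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def cooccurrences(docs: Iterable[str]) -> Counter:
    """Count unordered term pairs that appear in the same document."""
    pairs: Counter = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))   # sorted so each pair has one canonical order
        pairs.update(combinations(terms, 2))
    return pairs


# Invented sample texts standing in for the corpus.
DOCS = [
    "Marine toxins in ocean analysis",
    "Ocean analysis of marine microbiology",
    "Population study methods",
]
```

Frequent terms and strongly co-occurring pairs are only suggestions; as the slide notes, they give editors chunks to review rather than a finished taxonomy.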

Pages 23-25: Semantic Infrastructure Workshop Applications

Case Study – Taxonomy Development

Page 26: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Applications: Expertise Analysis

Sentiment Analysis to Expertise Analysis (KnowHow)
– Know How, skills, “tacit” knowledge
No single correct categorization
– Women, Fire, and Dangerous Things
– Types of Animals
• Those that belong to the Emperor
• Embalmed Ones
• Suckling Pigs
• Fabulous Ones
• Those that are included in this classification
• Those that tremble as if they were mad
• Other

Page 27: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Applications: Expertise Analysis – Basic Level Categories

Mid-level in a taxonomy / hierarchy
Short and easy words
Maximum distinctness and expressiveness
First level named and understood by children
Level at which most of our knowledge is organized
Levels: Superordinate – Basic – Subordinate
– Mammal – Dog – Golden Retriever
– Furniture – chair – kitchen chair

Page 28: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Applications: Expertise Analysis

Experts prefer lower, subordinate levels
– In their domain, (almost) never used superordinate
Novices prefer higher, superordinate levels
General Populace prefers basic level
Not just individuals but whole societies / communities differ in their preferred levels
Issue – artificial languages – ex. Science discipline
Issue – difference of child and adult learning – adults start with high level
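The level preferences above can be illustrated with a small lookup that picks a label by audience. This is a sketch only: the two concepts reuse the superordinate, basic, and subordinate examples given in the deck, while the table layout, the audience names, and `preferred_label` are assumptions.

```python
from typing import Dict, Tuple

# (superordinate, basic, subordinate) labels, taken from the deck's examples.
LEVELS: Dict[str, Tuple[str, str, str]] = {
    "golden retriever": ("Mammal", "Dog", "Golden Retriever"),
    "kitchen chair": ("Furniture", "Chair", "Kitchen chair"),
}

# Assumed mapping from audience to preferred level:
# novices lean superordinate, the general populace basic, experts subordinate.
AUDIENCE_LEVEL: Dict[str, int] = {"novice": 0, "general": 1, "expert": 2}


def preferred_label(concept: str, audience: str) -> str:
    """Return the label at the level this audience is assumed to prefer."""
    return LEVELS[concept][AUDIENCE_LEVEL[audience]]
```

In practice the preferred level also varies by community and context, so a fixed table like this is at best a starting point.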

Page 29: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Applications: Expertise Analysis

What is basic level is context(s) dependent
– Document/author expert in news health care, not research
Hybrid – simple high level taxonomy (superordinate), short words – basic, longer words – expert
Plus
Develop expertise rules – similar to categorization rules
– Use basic level for subject
– Superordinate for general, subordinate for expert
Also contextual rules
– “Tests” is general, high level
– “Predictive value of tests” is lower, more expert
– If terms appear in same sentence – expert
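A rough sketch of what such expertise rules could look like in code follows. It is one possible reading of the slide, not the actual rule set: the term lists borrow a few words from the deck's healthcare examples, and the scoring scheme, the threshold, and the reading of the same-sentence rule (a general term next to an expert term counts toward expert) are assumptions.

```python
import re
from typing import List

# Small illustrative term lists; real rules would be developed against a corpus.
EXPERT_TERMS = {"predictive value", "pharmacokinetic", "metabolite", "etiology"}
GENERAL_TERMS = {"tests", "hospital", "medicine", "smoking"}


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def count_hits(terms, sentence: str) -> int:
    """Number of listed terms that occur as whole words or phrases in the sentence."""
    return sum(bool(re.search(r"\b" + re.escape(t) + r"\b", sentence)) for t in terms)


def expertise_score(text: str) -> int:
    """Positive means more expert vocabulary, negative means more general."""
    score = 0
    for sentence in split_sentences(text.lower()):
        expert_hits = count_hits(EXPERT_TERMS, sentence)
        general_hits = count_hits(GENERAL_TERMS, sentence)
        if expert_hits and general_hits:
            # Contextual rule (assumed reading): a general term in the same
            # sentence as an expert term is treated as expert usage.
            score += expert_hits + general_hits
        else:
            score += expert_hits - general_hits
    return score


def label(text: str, threshold: int = 1) -> str:
    """Map the score to a coarse label; the threshold is an arbitrary choice."""
    score = expertise_score(text)
    if score >= threshold:
        return "expert"
    if score <= -threshold:
        return "general"
    return "undetermined"
```

For example, `label("The predictive value of tests was low.")` comes out as expert under these rules, while `label("The hospital ran tests.")` comes out as general.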

Page 30: Semantic Infrastructure Workshop  Applications

Education Terms

Expert                                       General
Research (context dependent)                 Kid
Statistical                                  Pay
Program performance                          Classroom
Protocol                                     Fail
Adolescent Attitudes                         Attendance
Key academic outcomes                        School year
Job training program                         Closing
American Educational Research Association    Counselor
Graduate management education                Discipline

Page 31: Semantic Infrastructure Workshop  Applications

Healthcare Terms

Expert                     General
Mouse                      Cancer
Dose                       Scientific
Toxicity                   Physical
Diagnostic                 Consumer
Mammography                Cigarette
Sampling                   Smoking
Inhibitor                  Weight gain
Edema                      Correct
Neoplasms                  Empirical
Isotretinoin               Drinking
Ethylene                   Testing
Significantly              Lesson
Population-base            Knowledge
Pharmacokinetic            Medicine
Metabolite                 Sociology
Polymorphism               Theory
Subsyndromic               Experience
Radionuclide               Services
Etiology                   Hospital
Oxidase                    Social
Captopril                  Domestic
Pharmacological agents
Dermatotoxicity
Mammary cancer model
Biosynthesis

Page 32: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Applications: Expertise Analysis – application areas

Taxonomy / Ontology development / design – use basic level
User contribution
– Card sorting – non-experts use superficial similarities
– Survey for attributes instead of card sorting, general structure
Develop expert and general versions/sections/synonyms
Info presentation – combine superordinate and basic
– Similar to scientific – Genus – Species is official name
Text Mining
– Expertise characterization of writer

Page 33: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Applications: Expertise Analysis – application areas

Business & Customer intelligence
– General – characterize people’s expertise to add to evaluation of their comments
– Combine with sentiment analysis – finer evaluation – what are experts saying, what are novices saying
– Deeper research into communities, customers
Enterprise Content Management
– At publish time, software automatically gives an expertise level – present to author for validation
– Combine with categorization – offer tags that are suitable level of expertise

Page 34: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Applications: Expertise Analysis – application areas

Social Media – Community of Practice
– Characterize the level of expertise in the community
– Evaluate other communities’ expertise level
– Personalize information presentation by expertise
Expertise location
– Generate automatic expertise characterization based on authored documents
Expertise of people in a social network
– Terrorists and bomb-making

Page 35: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Applications: Expertise Analysis – application areas – CoP

Basic Level
– Blog
– Software (Design)
– Web (Design)
– Linux
– Javascript
– Web2.0
– Google
– Css
– Flash

Superordinate
– Music
– Photography
– News
– Education
– Business
– Technology
– Politics
– Science
– Culture

Page 36: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Applications: Expertise Analysis – application areas – Tags

CSS
– Web Design
– Design
– Css3
– Tutorial
– Webdev
– Javascript
– Web
– Development
– Html
– Jquery
– html5

Education
– Technology
– Resources
– Teaching
– Learning
– Science
– Web20
– Games
– Interactive
– Research
– Tools
– reference

Page 37: Semantic Infrastructure Workshop  Applications

Semantic Infrastructure Approach to Search: Conclusions

Semantic Infrastructure solution (people, policy, technology, semantics) and feedback is best approach
Foundation – Hybrid ECM model with text analytics, Search
Integrated Search design is essential – rich results
– Subject, facets, tag clouds, etc.
Semantic Infrastructure as a platform for multiple applications
– Build on infrastructure for economy and quality
Text Analytics (Entity extraction and auto-categorization) are essential
Future – new kinds of applications:
– Text Mining and Data mining, research tools, sentiment
– Beyond Sentiment – expertise applications
– NeuroAnalytics – cognitive science meets search and more
• Watson is just the start

Page 38: Semantic Infrastructure Workshop  Applications

Questions?

Tom Reamy
[email protected]
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

Page 39: Semantic Infrastructure Workshop  Applications

Resources

Books
– Women, Fire, and Dangerous Things
• George Lakoff
– Knowledge, Concepts, and Categories
• Koen Lamberts and David Shanks
Web Sites
– Text Analytics News – http://social.textanalyticsnews.com/index.php
– Text Analytics Wiki – http://textanalytics.wikidot.com/

Page 40: Semantic Infrastructure Workshop  Applications

Resources

Blogs
– SAS – http://blogs.sas.com/text-mining/
Web Sites
– Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
– LinkedIn – Text Analytics Summit Group – http://www.LinkedIn.com
– Whitepaper – CM and Text Analytics – http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf
– Whitepaper – Enterprise Content Categorization strategy and development – http://www.kapsgroup.com

Page 41: Semantic Infrastructure Workshop  Applications

Resources

Articles
– Malt, B. C. 1995. Category coherence in cross-cultural perspective. Cognitive Psychology 29, 85-148
– Rifkin, A. 1985. Evidence for a basic level in event taxonomies. Memory & Cognition 13, 538-56
– Shaver, P., J. Schwartz, D. Kirson, C. O’Connor 1987. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, 1061-1086
– Tanaka, J. W. & M. E. Taylor 1991. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23, 457-82