What is Semantic Publishing? And Why Should I Care?

Jabin White, Director of Strategic Content, Wolters Kluwer Health – P&E. PSP Presents – Semantic Publishing: An Introduction, May 13, 2010.



Page 2

Agenda
• Introductions
• Some definitions
▫ Vocabularies, Taxonomies, and Ontologies, Oh My!
• What is metadata, and why should publishers care?
• What is semantic tagging, and why should publishers care?
• Impact of all this on publishers’…
▫ Workflows/processes
▫ Business cases
• The Semantic Web
• Final Thoughts, Recommendations

Page 3

Introductions: My Company
• Director of Strategic Content for Wolters Kluwer Health – Professional & Education
• Wolters Kluwer Health includes:
▫ Lippincott Williams & Wilkins titles
▫ Ovid
▫ UpToDate
▫ Provation Order Sets
▫ Drug Facts & Comparisons
▫ Medi-Span
▫ Clin-eguide

Page 4

Introductions: Me
• Started as an Editorial Assistant
• Dove into SGML in the mid-90s working on a drug reference
• Six years at Elsevier in Electronic Production
• Don’t typecast me!
• Joined WK Health in May 2009
▫ Responsible for making sure content flows through the company more efficiently (DTDs, Content Management, Authoring Tools, Semantic Enrichment, Product Information Management, etc.)

Page 5

The Web - Stop the Insanity!
• A few humble web stats:
▫ There are 2 billion (billion!) Google searches daily
▫ There are 1 trillion (1,000,000,000,000) unique URLs in Google’s index
▫ There are 2,695,205 articles in English on Wikipedia
▫ It would take 412.3 years to view all the content on YouTube (3/08), but don’t try, because 13 hours of video are uploaded every minute

** Source: Adam Singer’s “Social Media, Web 2.0 and Internet Stats” site: http://thefuturebuzz.com/2009/01/12/social-media-web-20-internet-numbers-stats/

Page 6

So What?
• Clay Shirky’s concept of “Filter Failure”
• When the capacity of people to “keep up with” information is exceeded, curation becomes the value differentiator

Page 7

Page 8

Definitions
• Controlled vocabulary: a bunch of words, no relationships
▫ But there is an advantage if all users use the same terms to describe things
• Taxonomy: a controlled vocabulary with hierarchy
• Thesaurus: often used interchangeably with controlled vocabulary; also sometimes referred to as an ontology
• Ontology: all of the above; think neural network with a bunch of relationships
• Metadata: data about data (we’ll get to that)
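To make the first two definitions concrete, here is a minimal Python sketch contrasting a controlled vocabulary (an agreed set of terms, nothing more) with a taxonomy (the same terms plus hierarchy). The terms borrow from the Information Classification examples; "vehicle" is a hypothetical addition so the hierarchy has more than one level, and the function names are illustrative only.

```python
# Controlled vocabulary: an agreed list of terms with no relationships between them.
CONTROLLED_VOCABULARY = {"sports", "baseball", "vehicle", "car", "Pinto"}

# Taxonomy: the same terms plus hierarchy, stored as narrower term -> broader term.
# "vehicle" is a hypothetical top-level term added for illustration.
TAXONOMY = {
    "baseball": "sports",
    "Pinto": "car",
    "car": "vehicle",
}


def is_valid_tag(term: str) -> bool:
    """A tag is acceptable only if it comes from the agreed vocabulary."""
    return term in CONTROLLED_VOCABULARY


def broader_terms(term: str) -> list:
    """Walk up the hierarchy from a term to its top-level ancestor."""
    chain = []
    while term in TAXONOMY:
        term = TAXONOMY[term]
        chain.append(term)
    return chain


print(is_valid_tag("baseball"), is_valid_tag("base ball"))  # True False
print(broader_terms("Pinto"))  # ['car', 'vehicle']
```

The vocabulary alone can only say whether a tag is allowed; the taxonomy adds the ability to roll a narrow tag up to broader ones, which is what lets a search for "car" also surface content tagged "Pinto".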

Page 9

Some Level-Setting
• Unfortunately, these definitions have been diluted to the point of uselessness by their misuse
▫ Think “Content Management” around the year 2000
• MetaThesaurus: a collection of all of these things
▫ EXAMPLE: UMLS

Page 10

Information Classification
• Pretty Wonky, Pretty Fast
• Hyperonym: broader term, more general
▫ Car is a hyperonym of Pinto
• Hyponym: narrower term
▫ Baseball is a hyponym of sports
• Meronym: part term
▫ Kansas is a meronym of United States
• Holonym: whole term
▫ European Union is a holonym of France
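Each of these four relationships has an inverse (hyponym/hyperonym, meronym/holonym), so a vocabulary only needs to store one direction. A minimal Python sketch, assuming facts are kept as (term, relation, other term) triples; the relation names such as `hyponym_of` are hypothetical labels, and the example pairs come from the slide.

```python
# Each fact is stored once, in the narrower/part direction: (term, relation, other term).
FACTS = [
    ("Pinto", "hyponym_of", "car"),
    ("baseball", "hyponym_of", "sports"),
    ("Kansas", "meronym_of", "United States"),
    ("France", "meronym_of", "European Union"),
]

# Each relation has an inverse: narrower <-> broader, part <-> whole.
INVERSE = {"hyponym_of": "hyperonym_of", "meronym_of": "holonym_of"}


def with_inverses(facts):
    """Return the stored facts plus the inverse of each one."""
    derived = set(facts)
    for term, relation, other in facts:
        derived.add((other, INVERSE[relation], term))
    return derived


RELATIONS = with_inverses(FACTS)
print(("car", "hyperonym_of", "Pinto") in RELATIONS)            # True
print(("European Union", "holonym_of", "France") in RELATIONS)  # True
```

Storing one direction and deriving the other keeps the two views from drifting apart when terms are edited.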

Page 11

Taxonomies in STM

Page 12

Some Heavy Hitters
• UMLS
• MeSH
• SNOMED-CT
• ICD-9 and ICD-10
• RxNorm
• LOINC, ICPC-93, and VA/KP Subset of SNOMED

Page 13

UMLS – Unified Medical Language System
• More than 5 million terms or named entities
• Divided into concepts, and each term has a unique identifier
• Not a vocabulary, but a mapping BETWEEN vocabularies

Page 14

UMLS
• Vocabularies included in the UMLS:
▫ MeSH Headings in 8 languages
▫ ICPC-93 in 14 languages
▫ WHO Adverse Drug Reaction Terminology in 5 languages
▫ SNOMED-2, SNOMED-3, and UK Clinical Terms (formerly Read Codes)
▫ ICD-10 in English and German
▫ ICD-10-AM (Australian Modification)
▫ ICD-9-CM (US Clinical Modification)

Page 15

The Semantic Network (UMLS)
• Semantic types are big things like Disease, Syndrome, or Clinical Drug
• Semantic relationships are useful links between semantic types (e.g., Clinical Drug treats Disease or Symptom)
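The value of a semantic network is that type-level rules constrain instance-level links. A minimal Python sketch of that idea, assuming a single allowed relationship taken from the slide's example; the concept `drug_x` and the table names are hypothetical, and this is not how the UMLS itself is stored or queried.

```python
# Type-level rule taken from the slide's example; a real network has many more.
ALLOWED_RELATIONSHIPS = {("Clinical Drug", "treats", "Disease or Symptom")}

# Hypothetical concepts, each assigned one semantic type.
SEMANTIC_TYPE = {
    "drug_x": "Clinical Drug",
    "atrial fibrillation": "Disease or Symptom",
}


def is_valid_assertion(subject: str, relationship: str, obj: str) -> bool:
    """Accept an instance-level link only if the type-level network allows it."""
    key = (SEMANTIC_TYPE.get(subject), relationship, SEMANTIC_TYPE.get(obj))
    return key in ALLOWED_RELATIONSHIPS


print(is_valid_assertion("drug_x", "treats", "atrial fibrillation"))  # True
print(is_valid_assertion("atrial fibrillation", "treats", "drug_x"))  # False
```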

Page 16

One Concept, Many Names
TERM                              SOURCE VOCABULARY
Atrial fibrillation               ICD-9-CM
AF                                NCI Thesaurus
Afib                              MedDRA
Atrial fibrillation (disorder)    SNOMED Clinical Terms
Atrium; fibrillation              ICPC2-ICD10 Thesaurus
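The table above is exactly what a metathesaurus mapping buys you: five surface forms from five vocabularies resolve to one concept. A minimal Python sketch, assuming a simple lookup keyed on (term, source vocabulary); `CONCEPT-0001` is a hypothetical identifier standing in for a real UMLS concept ID.

```python
# Hypothetical concept identifier standing in for a real UMLS concept ID.
AFIB = "CONCEPT-0001"

# (term as written, source vocabulary) -> the one concept it names
NAME_TO_CONCEPT = {
    ("Atrial fibrillation", "ICD-9-CM"): AFIB,
    ("AF", "NCI Thesaurus"): AFIB,
    ("Afib", "MedDRA"): AFIB,
    ("Atrial fibrillation (disorder)", "SNOMED Clinical Terms"): AFIB,
    ("Atrium; fibrillation", "ICPC2-ICD10 Thesaurus"): AFIB,
}


def names_for(concept_id):
    """Every (term, source) pair that maps to the given concept."""
    return sorted(pair for pair, cid in NAME_TO_CONCEPT.items() if cid == concept_id)


def same_concept(a, b):
    """True when two (term, source) pairs resolve to the same known concept."""
    concept = NAME_TO_CONCEPT.get(a)
    return concept is not None and concept == NAME_TO_CONCEPT.get(b)


print(same_concept(("Afib", "MedDRA"), ("AF", "NCI Thesaurus")))  # True
print(len(names_for(AFIB)))  # 5
```

Keying on the source vocabulary as well as the string matters: an abbreviation like "AF" can mean different things in different vocabularies.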

Page 17

MeSH – Medical Subject Headings
• An 11-level hierarchy developed and maintained by the National Library of Medicine, part of the US Department of Health and Human Services
• The indexing method for MEDLINE/PubMed
▫ Contains more than 16 million references to journal articles in the life sciences, with a concentration in biomedicine
▫ 5,200 journals worldwide in 37 languages
▫ Since 2005, 2,000-4,000 references are added daily, Tuesday-Saturday, all indexed to MeSH
▫ Loading is suspended for two weeks every November/December while MeSH is updated
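A hierarchy this deep is easiest to navigate when each heading's position is encoded in its identifier. MeSH does this with dotted tree numbers, where dropping the last segment gives the broader heading. A minimal Python sketch under that assumption; `X01.234.567` is a hypothetical tree number, not a real MeSH entry.

```python
from typing import List, Optional


def parent(tree_number: str) -> Optional[str]:
    """Drop the last dot-separated segment to get the broader heading."""
    if "." not in tree_number:
        return None  # top of this branch
    return tree_number.rsplit(".", 1)[0]


def depth(tree_number: str) -> int:
    """Levels in this branch, one per dot-separated segment."""
    return tree_number.count(".") + 1


def ancestors(tree_number: str) -> List[str]:
    """Broader headings, nearest first."""
    result = []
    current = parent(tree_number)
    while current is not None:
        result.append(current)
        current = parent(current)
    return result


MAX_LEVELS = 11  # the depth the slide cites for MeSH
example = "X01.234.567"  # hypothetical tree number
print(parent(example))     # X01.234
print(ancestors(example))  # ['X01.234', 'X01']
print(depth(example) <= MAX_LEVELS)  # True
```

Because the hierarchy lives in the identifier, "everything under this heading" becomes a simple prefix match, which is one reason indexed citations can be exploded to broader or narrower topics cheaply.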

Page 18

The MeSH Staff

Page 19

SNOMED-CT
• Systematized Nomenclature of Medicine (Clinical Terms)
• 344,000 concepts, arguably the most complete clinical taxonomy in the world
• Developed and maintained by the College of American Pathologists
• Licensed by NLM; freely available to license as part of the UMLS
• Named a US standard for electronic health information exchange by a Health IT standards panel
• Adopted for use by the US government through the Consolidated Health Informatics (CHI) initiative

Page 20

ICD-9 and ICD-10
• International Classification of Diseases
• Version 9 moving to Version 10 (the US is slower than the rest of the world on this)
• Codes that define diseases:
▫ Example: 411.0 = Postmyocardial infarction syndrome (aka Dressler’s syndrome)
• Used to drive insurance reimbursements, billing, and other classifications of diseases
• Used by the US government to calculate morbidity and mortality figures

Page 21

RxNorm
• Standardized names for drugs, collections of drugs, and delivery devices
• Like MeSH, developed and maintained by the National Library of Medicine
• Also includes a standard way of expressing generic and trade names, ingredients, strengths, and dose forms
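The point of a standard drug name is that it is assembled from the same parts every time, so a generic and a branded product can be recognized as the same clinical drug. A minimal Python sketch, assuming an RxNorm-style pattern of ingredient, strength, and dose form with any trade name appended in brackets; the class and method names are hypothetical, and real RxNorm names and identifiers should come from NLM's files, not from string assembly like this.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DrugProduct:
    ingredient: str
    strength: str                 # e.g. "325 MG"
    dose_form: str                # e.g. "Oral Tablet"
    brand: Optional[str] = None   # trade name, if any

    def normalized_name(self) -> str:
        """Build one standard name from the parts (assumed naming pattern)."""
        name = f"{self.ingredient} {self.strength} {self.dose_form}"
        return f"{name} [{self.brand}]" if self.brand else name

    def generic(self) -> "DrugProduct":
        """The same product with the trade name removed."""
        return replace(self, brand=None)


plain = DrugProduct("Acetaminophen", "325 MG", "Oral Tablet")
branded = DrugProduct("Acetaminophen", "325 MG", "Oral Tablet", brand="Tylenol")
print(branded.normalized_name())    # Acetaminophen 325 MG Oral Tablet [Tylenol]
print(branded.generic() == plain)   # True
```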

Page 22

LOINC Mapping Files
• Logical Observation Identifiers Names and Codes
• A set of universal names and ID codes for identifying laboratory and clinical test results
• Used to communicate better with HIT (Health Information Technology) systems
• Not much of an impact on publishers, but we should know about it

Page 23

1/3

Page 24

What is Metadata, and Why Should Publishers Care?

Page 25

What is Metadata?
• Reading most definitions of metadata and related standards is like trying to resolve disputes with my kids
• Metadata is “data about data”
▫ But what does that mean?
• Its use may be increasing, but metadata is NOT new

Page 26

Why Should Publishers Care?
• In the move from print publishing to digital, metadata is a powerful tool to help publishers get content to the right place, in the right format, known to the right systems and people, at the right time
• Print books were easy
▫ Everyone knew what they were
▫ You could really only use them one way
▫ They had a beginning, an end, a physical presence, and a set price (mostly)

Page 27

Why Should Publishers Care?
• Today, computers often communicate with one another as much as they do with users (people)
• Metadata becomes critical in:
▫ B2B relationships
▫ Enhancing B2C relationships
▫ B2-_________ relationships
• The quality of the metadata gives publishers a more powerful voice in what happens to their content

Page 28

Why Should Publishers Care?
• For example:
▫ A digital asset (an image)
▫ What file format is it?
▫ How big is the image?
▫ Who took the picture?
▫ Who owns the picture?
▫ Can you use it on your web site? If you do, what credit do you have to give to the owner?
▫ When was it created?
▫ Is it part of a collection?
▫ Is it related to another piece of content?
▫ Does it stand alone or is it part of a group of images?
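Every question in that list becomes a field once the image carries metadata, and downstream systems can then answer the questions without asking a person. A minimal Python sketch, assuming a simple record type; `ImageMetadata`, its field names, the "Courtesy of" credit format, and the example values are all hypothetical, not any particular metadata standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ImageMetadata:
    file_format: str               # What file format is it?
    width_px: int                  # How big is the image?
    height_px: int
    creator: str                   # Who took the picture?
    rights_holder: str             # Who owns the picture?
    web_use_allowed: bool          # Can you use it on your web site?
    credit_line: Optional[str]     # What credit must be given to the owner?
    created: date                  # When was it created?
    collection: Optional[str] = None                      # Part of a collection or group?
    related_ids: List[str] = field(default_factory=list)  # Related content

    def web_credit(self) -> str:
        """Return the credit to print on a web page, or refuse if web use is not cleared."""
        if not self.web_use_allowed:
            raise PermissionError("web use has not been cleared by the rights holder")
        return self.credit_line or f"Courtesy of {self.rights_holder}"


photo = ImageMetadata(
    file_format="JPEG", width_px=1200, height_px=800,
    creator="Staff Photographer", rights_holder="Example Press",
    web_use_allowed=True, credit_line=None, created=date(2010, 5, 13),
    collection="Cardiology Image Bank",
)
print(photo.web_credit())  # Courtesy of Example Press
```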

Page 29

Publishers Should Care
• If a publisher’s goal is to disseminate content to the widest possible audience, metadata is critical

Page 30

Publisher Relationships
• Again, with books you had one use model
• Metadata allows publishers to have diverse relationships with content consumers and other information providers:
▫ Customers (duh)
▫ Aggregators
▫ The Open Web (not Google, but other search engines)
  But don’t try to “game” the search engines with adult keywords; that’s just wrong
  There have been lawsuits over use of meta keywords, including Playboy suing two adult web sites
▫ Technology partners/developers
▫ Systems wherein content is a “value add”
▫ Multiple output formats

Page 31

Types of Metadata
• HTML Metadata

▫ <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

▫ <meta name="verify-v1" content="kBoFGUuwppiWVWGx4Ypzkw1Cs1GgMYEMMbfNr7FY65w=" />

▫ <meta name="description" content="International publisher of professional health information for physicians, nurses, specialized clinicians & students. Medical & nursing charts, journals, and pda software.">

▫ <meta name="keywords" content="springhouse, medical book, nursing journal, medical pda software, lippincott medical reference, lww, lippincott, lww com, medical publisher">

▫ <link rel="stylesheet" href="/css/style.css" type="text/css">

For people

For search engines
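HTML metadata is only useful if software can read it back out. A minimal sketch using Python's standard-library `html.parser` to collect `<meta>` tags into a dictionary; the sample markup is trimmed from the tags on the slide, and the class name `MetaTagCollector` is hypothetical.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta> tags into a dict keyed by their name or http-equiv attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also reach this method, because
        # HTMLParser's default handle_startendtag calls handle_starttag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("http-equiv")
        if key and "content" in attributes:
            self.meta[key.lower()] = attributes["content"]


page_head = """
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="description" content="International publisher of professional health information.">
<meta name="keywords" content="medical book, nursing journal, lww, medical publisher">
<link rel="stylesheet" href="/css/style.css" type="text/css">
"""

collector = MetaTagCollector()
collector.feed(page_head)
keywords = [k.strip() for k in collector.meta["keywords"].split(",")]
print(collector.meta["content-type"])  # text/html; charset=iso-8859-1
print(keywords)  # ['medical book', 'nursing journal', 'lww', 'medical publisher']
```

The `<link>` tag is ignored because it is not `<meta>`; it tells the browser how to render the page rather than describing what the page is.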

Page 32

Types of Metadata
• Classifying Metadata (OLD)
▫ ISBN (I told you this wasn’t new)
▫ Dewey Decimal System
▫ Books in Print/CIP/Library of Congress data
▫ MARC records
▫ DOI (Digital Object Identifier)
• Descriptive Metadata (NEW; sorry, my examples are from STM)
▫ ICD-9 and ICD-10 Codes
▫ MeSH
▫ SNOMED-CT
▫ NANDA, NIC, NOC for Nursing
▫ NDC, HCPCS for drugs
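Classifying metadata such as the ISBN is old, but it is also machine-checkable, which is part of why it has lasted. A minimal Python sketch of the ISBN-13 check digit (digits weighted alternately 1 and 3; the check digit brings the total to a multiple of 10). The function names are hypothetical and the example ISBN is a commonly used valid sample.

```python
def isbn13_check_digit(first12: str) -> int:
    """Weight the first 12 digits 1, 3, 1, 3, ... and return the digit that completes a multiple of 10."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Strip hyphens and spaces, then compare the last digit with the computed check digit."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40615-8"))  # False
```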

Page 33

• DOI (Digital Object Identifier)

Page 34

Semantic Metadata
• Using controlled vocabularies, extra power can be added to content via semantic tagging to drive:
▫ More precise searching
▫ Contextually based connections
▫ Less of the “two terms meaning the same thing” syndrome (hypertension vs. high blood pressure; heart attack vs. myocardial infarction)
▫ Filling in of content gaps
• Semantic tagging *is* metadata, but it deserves its own section (coming up)

Page 35

What is Semantic Tagging?

Page 36

Semantic Basics
• Semantic tagging describes what content *is*, not how it should *look* on the page or screen
• Contrast with structural tagging, which is made of elements such as <para>, <list>, and <title>
• Both are XML, but semantics is like XML on steroids!
• Doing semantic tagging without a controlled vocabulary is madness for scholarly publishing
▫ Think “folksonomies”

Page 37

Manual Tagging
• DESCRIPTION: A subject matter expert (SME) reads the chapter/article and indexes or tags it based on its content, resulting in enriched content
• POSITIVES – If precision is needed, and clinical understanding of concepts (i.e., judgment) is required, probably still the best option
• NEGATIVES – Cost-prohibitive on large volumes of information; not scalable; inconsistent if the controlled vocabulary is not followed or different taggers are used

Page 38

Manual Tagging – Other Factors
• Offshore resources have improved in recent years as “knowledge work” has gone global, resulting in cost reductions
▫ Some processes previously considered “too expensive” to do manually could be revisited
• Heavy dependence on the *type* of content, which means use cases should drive workflow decisions

Page 39

Automated Approaches
• DESCRIPTION: Software crawls content, adds tags/unique identifiers, or finds concepts & patterns to drive more intelligent search or entity extraction
• POSITIVES – Very effective in finding “trends” or concepts over a large repository of data; a growing industry because of information overload (aka Data Mining, Text Analysis)
• NEGATIVES – Sometimes leads to false positives; the machines processing the data lack precision and judgment
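One simple automated technique is dictionary lookup: scan the text for any surface form in a vocabulary and attach its concept ID. The Python sketch below is that technique only, not the method of any product named in this deck; the lexicon and concept IDs such as `C-HTN` are hypothetical. It also reproduces the false-positive problem: "cold" in "cold compress" is tagged as the disease.

```python
import re

# Hypothetical lexicon: lowercase surface form -> concept ID.
LEXICON = {
    "high blood pressure": "C-HTN",
    "hypertension": "C-HTN",
    "blood pressure": "C-BP",
    "cold": "C-COMMON-COLD",
}


def tag_concepts(text):
    """Return (matched text, concept ID) pairs, preferring the longest match at each position."""
    # Longest forms first, so "high blood pressure" wins over "blood pressure".
    forms = sorted(LEXICON, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(f) for f in forms) + r")\b", re.IGNORECASE)
    return [(m.group(0), LEXICON[m.group(0).lower()]) for m in pattern.finditer(text)]


print(tag_concepts("Patients with high blood pressure should not apply a cold compress."))
# [('high blood pressure', 'C-HTN'), ('cold', 'C-COMMON-COLD')]  <- the second tag is a false positive
```

The first match is correct and scales to millions of documents at no marginal cost; the second is the kind of error that the combination approach hands to an SME.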

Page 40

Automated Approaches – Other Factors
• If used effectively, quick wins on large repositories
• Can be used to accomplish projects that would never be attempted (or approved) manually

Page 41

Combination Approaches
• DESCRIPTION: Automated process followed by SME checking (a deeper level than straight QA) and the addition of specific conceptual information
• POSITIVES – Best of both worlds for projects that deserve it; can drive precision but can also cover large repositories
• NEGATIVES – Costs; every time software or people act on your content, there are costs, and you don’t get a discount from either because you are doing both

Page 42

FUD Around Semantic Search
• Semantic Search engines
▫ TEMIS, Collexis, NetBase, Vivisimo, OpenCalais
▫ Finding semantic concepts based on entities and search algorithms
▫ Finding a needle in a haystack
• Semantic Tagging
▫ People (SMEs) identify concepts and tag accordingly
▫ Drives precision in search and other things
▫ Finding the right needle in a stack of 10 needles

Page 43

A Note About “Folksonomies”
• Having users “tag” or classify data is increasing in popularity
• Not much use in clinical areas of the health sciences
• If you are sick, do you want to know what 100 people think, or what the one expert thinks?

Page 44

2/3

Page 45

Impact on Publishers

Page 46

Impact on Publishers
• Impact depends on how deep you want to go
▫ i.e., what am I going to get in return for investing in metadata, and is it worth it?
▫ More and more, this is not an “if” proposition; it’s “how much”
• Publishers who buy in have two basic choices of approach:

Page 47

Option 1: Metadata in the Workflow
• Requires deeper commitment, but has bigger potential upside
▫ Positive impact on product creation and development
• Requires thinking about tools, workflows, and enterprise-level systems to allow for creation and MAINTENANCE of metadata
• The combination of good metadata in the workflow and creativity in the product development team can pay big dividends
• Allows participation of authors (or subject matter experts in lieu of them) at the beginning of the workflow

Page 48

Option 2: Outside the Workflow
• Requires less commitment, but offers potentially fewer rewards
• Can be done with zero impact on current systems
• Has the benefit of content being in “final form” (whatever that means anymore) when intelligence is added in metadata
• Can keep SMEs as a separate offshoot of the workflow, which is easily outsourced
• Can attack this problem with brute-force semantic search engines, but that is a different thing

Page 49

Impact on Publishers
• Active vs. Passive Metadata
▫ Active metadata
  Publisher intentionally associates markup with certain pieces of content
  Often using a controlled vocabulary
  Includes semantic indexing
  Can also be machine-based, using scripts, etc.
▫ Passive metadata
  Metadata created based on use of content
  Image X was used as part of an image bank on pediatric
  Inheritance of properties from parent objects
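The two passive mechanisms on this slide, inheriting from a parent object and accumulating usage history, can be sketched with Python's standard `collections.ChainMap`, which looks up keys in the child mapping first and falls back to the parent. The book/figure objects, field names, and the `record_use` helper are hypothetical.

```python
from collections import ChainMap

# Hypothetical parent object (a book) and child object (a figure inside it).
book = {"publisher": "Example Press", "rights_holder": "Example Press", "subject": "pediatrics"}
figure = {"caption": "Growth chart", "file_format": "PNG"}  # active metadata, set deliberately

# Lookups check the figure first, then fall back to the book it belongs to.
figure_view = ChainMap(figure, book)


def record_use(metadata: dict, context: str) -> None:
    """Passive metadata: note each place the object is used."""
    metadata.setdefault("used_in", []).append(context)


record_use(figure, "image bank")
print(figure_view["caption"])   # Growth chart (set on the figure)
print(figure_view["subject"])   # pediatrics (inherited from the book)
print(figure_view["used_in"])   # ['image bank']
```

A value set on the child overrides the inherited one without touching the parent, so a figure licensed from an outside agency can carry its own `rights_holder` while still inheriting the book's subject.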

Page 50

Implications for Search
• Machines don’t know the difference between hypertension and high blood pressure
▫ More accurately, machines don’t know they are the SAME
• How this is handled is a matter of User Experience (did you mean? … give them the result … etc.), but the content must be tagged first
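Once content is tagged with concept IDs rather than words, the synonym problem moves out of the documents and into one lookup table. A minimal Python sketch, assuming documents were tagged ahead of time; the synonym table, concept IDs, and document names are hypothetical, and the fallback to plain text search is noted but not implemented.

```python
# Hypothetical synonym table: every surface form points at one concept ID.
SYNONYMS = {
    "hypertension": "C-HTN",
    "high blood pressure": "C-HTN",
    "heart attack": "C-MI",
    "myocardial infarction": "C-MI",
}

# Documents that were tagged with concept IDs ahead of time.
TAGGED_DOCS = {
    "doc1": {"C-HTN"},
    "doc2": {"C-MI"},
    "doc3": {"C-HTN", "C-MI"},
}


def search(query):
    """Map the query to a concept, then return every document tagged with it."""
    concept = SYNONYMS.get(query.strip().lower())
    if concept is None:
        return []  # unknown term: a real system would fall back to text search (not shown)
    return sorted(doc for doc, concepts in TAGGED_DOCS.items() if concept in concepts)


print(search("High blood pressure"))  # ['doc1', 'doc3']
print(search("hypertension"))         # ['doc1', 'doc3']
```

Whether the interface then says "did you mean…?" or simply returns both sets of results is the UX decision the slide mentions; either way it depends on the tags being there first.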

Page 51

Linking Content Within the Workflow
• Use models have changed in the health sciences
• Customers don’t expect (or don’t have time) to exit a system to check clinical information
▫ It needs to be at the Point of Care
• We need to have content linked into customer workflows, and taxonomies drive this

Page 52

The Semantic Web

Page 53

Semantic Web
• The current web (mostly HTML) is “undefined” information, and its growth is making this even worse
• The Semantic Web concept would ensure that content providers classify their information, so the web would become more of a smart database of information

Page 54

Jabin’s Shopping List

HTML:
<H1>Jabin’s Shopping List</H1>
<ul>
  <li>Bread</li>
  <li>Milk</li>
  <li>Bananas</li>
  <li>Beans</li>
</ul>

XML:
<list type="grocery" date="5-13-2010">
  <title>Jabin’s Shopping List</title>
  <grain>Bread</grain>
  <dairy>Milk</dairy>
  <fruit>Bananas</fruit>
  <veggie>Beans</veggie>
</list>

The semantic web both requires and acts on this kind of tagging
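The difference between the two versions shows up as soon as a program reads them. A minimal sketch using Python's standard `xml.etree.ElementTree` on both lists (the HTML list happens to be well-formed, so the same parser works): from the HTML a program can recover the items, but only the semantic XML can answer "which items are fruit?".

```python
import xml.etree.ElementTree as ET

HTML_LIST = "<ul><li>Bread</li><li>Milk</li><li>Bananas</li><li>Beans</li></ul>"

XML_LIST = """<list type="grocery" date="5-13-2010">
  <title>Jabin's Shopping List</title>
  <grain>Bread</grain>
  <dairy>Milk</dairy>
  <fruit>Bananas</fruit>
  <veggie>Beans</veggie>
</list>"""

# From the HTML, a program recovers the items but not what kind of thing each one is.
html_items = [li.text for li in ET.fromstring(HTML_LIST).iter("li")]

# From the semantic XML, the element names themselves answer "which items are fruit?"
xml_root = ET.fromstring(XML_LIST)
fruit = [el.text for el in xml_root.iter("fruit")]

print(html_items)             # ['Bread', 'Milk', 'Bananas', 'Beans']
print(fruit)                  # ['Bananas']
print(xml_root.get("type"))   # grocery
```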

Page 55

A new idea? … Not so much
• May 2001 issue, “Scientific American”
• “The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities,” by Tim Berners-Lee, James Hendler and Ora Lassila
• The entertainment system was belting out the Beatles' "We Can Work It Out" when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other local devices that had a volume control. His sister, Lucy, was on the line from the doctor's office: "Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. I'm going to have my agent set up the appointments." Pete immediately agreed to share the chauffeuring.

Page 56

Semantic Web vs. semantic Web
• The grand vision of the Semantic Web is a great goal, but it will take time
• Meanwhile, each industry has its own vocabulary(ies), which can drive its own semantic webs
• Resource Description Framework (RDF) can and will “bind” these webs together, but each industry vertical can make progress in the interim
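RDF expresses everything as subject-predicate-object triples, and a single cross-vocabulary triple is enough to "bind" two industry webs. The plain-Python sketch below illustrates only the triple idea; it is not an RDF library. The `ex:`, `pubA:`, and `pubB:` names are hypothetical, and the link borrows the name of OWL's `sameAs` property while following it just one hop, with no real reasoning.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: Set[Triple] = {
    ("ex:article42", "ex:about", "pubA:HighBloodPressure"),  # tagged with vocabulary A
    ("ex:article77", "ex:about", "pubB:Hypertension"),       # tagged with vocabulary B
    # The one triple that binds the two vocabularies together.
    ("pubA:HighBloodPressure", "owl:sameAs", "pubB:Hypertension"),
}


def match(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return {t for t in TRIPLES
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)}


def equivalents(concept: str) -> Set[str]:
    """The concept plus anything linked to it by sameAs, in either direction (one hop)."""
    linked = {o for _, _, o in match(s=concept, p="owl:sameAs")}
    linked |= {s for s, _, _ in match(p="owl:sameAs", o=concept)}
    return {concept} | linked


def articles_about(concept: str):
    return sorted(s for c in equivalents(concept) for s, _, _ in match(p="ex:about", o=c))


print(articles_about("pubA:HighBloodPressure"))  # ['ex:article42', 'ex:article77']
```

Remove the `sameAs` triple and each query finds only its own vocabulary's article, which is the "each vertical can make progress in the interim" state the slide describes.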

Page 57

Implications
• If every industry has its own language, how is that language *expressed*?
• Answer: Taxonomies
• How are those taxonomies applied?
• Answer: Semantic Tagging

Page 58

Final Thoughts

Page 59

Importance of Use Cases
• Use cases should drive strategy and justifications for all of this!
• One taxonomy size/coverage does not fit all
• One method of tagging/indexing does not fit all
▫ There is a fundamental difference, tension, and ultimately a tradeoff between large concept coverage over a massive amount of data and precise conceptual expressiveness
• The approach should be tailored to the content set and the goals for that content set

Page 60

THANK YOU

Questions?

Jabin White
Director of Strategic Content
Wolters Kluwer
@jabinwhite
Blog: Technically Speaking at http://www.bookbusinessmag.com/channel/technically-speaking