Open Calais for SF and LA Meetups

19
Calais Thomson Reuters Calais Initiative

Upload: krista-thomas

Post on 30-Nov-2014

DESCRIPTION

Here is the deck we shared with the SF and LA Semantic Web Meetups this past week (March, '09). It covers Calais 4.0 and its connection to the Linked Data cloud. Please join us at OpenCalais.com

TRANSCRIPT

Page 1: Open Calais For SF And LA Meetups

Calais

Thomson Reuters Calais Initiative

Page 2: Open Calais For SF And LA Meetups

Overview

• Going to discuss five basic topics

– What is Calais?
– Why we’re doing it & what our goals are
– How it works / What’s under the hood?
– A few examples
– Where it’s headed

Page 3: Open Calais For SF And LA Meetups

Calais…

• Calais extracts smart metadata from unstructured text and links that metadata to the Linked Data cloud.

Page 4: Open Calais For SF And LA Meetups

Calais progress to date

• Launched in late January, 2008

• 9,500 developers have joined OpenCalais.com

• 1-3 million content ‘transactions’ per day

• Delivered four major update releases

• Free (as in free) for commercial or non-commercial use

Page 5: Open Calais For SF And LA Meetups

1. Unstructured text
2. Calais extracts entities, facts and events
3. Metadata returned to the user with keys
4. Keys provide access to the Calais Linked Data cloud
5. Which provides information and other Linked Data pointers
6. To a range of open and partner Linked Data assets, including Thomson Reuters

Page 6: Open Calais For SF And LA Meetups

Quick Demo

You can find the Calais Viewer demonstration tool here: http://viewer.opencalais.com (Note that the Calais Viewer is not the Calais service. It is merely a demonstration of how the service works.)

– Copy and paste the text of a business news article from AP, Dow Jones or Reuters.com into the viewer, and press submit. The article is sent to the Calais engine which tags the content and returns it, marked-up.

– The tags appear on the left hand rail, and you can click on the plus (+) sign to see the tags expand.

– Since we are now on Calais 4.0, you can also use the viewer to see the Linked Data assets related to the tags Calais returns.

• Click on a company name on the left hand rail to find a Calais summary page featuring a basic description for that company, as well as a number of links.

• Follow those links to see the other data entries on that company that are available for public use in the Linked Data Cloud.

– For example, here is the Calais summary page for IBM: http://d.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633.html

– And here is the summary page for IBM in DBpedia (Wikipedia translated into computer language): http://dbpedia.org/page/IBM
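The relationship between the two IBM pages above can be sketched in a few lines of Python. This is only an illustration: it splits the Calais URL quoted on the slide into its parts and records a link from that entity to a DBpedia counterpart. The `http://dbpedia.org/resource/` form and the link table itself are assumptions made for the example (the deck only says the Calais page carries links), and `split_calais_url`, `LINKS` and `dbpedia_page` are hypothetical names.

```python
from urllib.parse import urlparse
import posixpath

# URL quoted on the slide for IBM's Calais summary page.
CALAIS_IBM = ("http://d.opencalais.com/er/company/ralg-tr1r/"
              "9e3f6c34-aa6b-3a3b-b221-a07aa7933633.html")


def split_calais_url(url):
    """Split a Calais summary-page URL into entity type, key and format."""
    path = urlparse(url).path                 # /er/company/ralg-tr1r/<key>.html
    stem, ext = posixpath.splitext(path)      # separate the ".html" suffix
    parts = stem.strip("/").split("/")        # ['er', 'company', 'ralg-tr1r', '<key>']
    return {
        "entity_type": parts[1],
        "key": parts[-1],
        "format": ext.lstrip("."),
    }


def dbpedia_page(resource_uri):
    """Map an assumed DBpedia resource URI to its human-readable page URL."""
    return resource_uri.replace("/resource/", "/page/", 1)


# Hypothetical link table: Calais entity URL -> other Linked Data pointers.
LINKS = {CALAIS_IBM: ["http://dbpedia.org/resource/IBM"]}

if __name__ == "__main__":
    print(split_calais_url(CALAIS_IBM))
    for target in LINKS[CALAIS_IBM]:
        print(dbpedia_page(target))
```

The point of the sketch is that the key returned with the metadata is what makes the hop possible: once an entity has a stable URL, other datasets can be reached by following links from it.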

Page 7: Open Calais For SF And LA Meetups

Why & What

1. Derive semantic metadata from textual assets
2. Use that semantic metadata to create entry points into the linked data ecosystem
3. Provide a simple mechanism for the sharing of semantic metadata about textual content assets
4. And just why are you doing this…

Page 8: Open Calais For SF And LA Meetups

1: Semantics from Text: The Text Problem

• People consume text

• Most of it isn’t semantically enabled

• Most of it won’t be semantically enabled

• This isn’t about standards – microformats vs. RDFa vs. whatever.

• Why: Latency, cost and short shelf-life

Page 9: Open Calais For SF And LA Meetups

1: Semantics from Text: The Text Problem

• Target areas where:

– The economics don’t support metadata creation
– The value of metadata is potentially high
– The value of aggregated metadata is potentially extremely high

[Chart: content types plotted by Latency and Shelf Life, each axis running from Seconds to Years. Content types shown: Tweets, New Gen News, Legacy News, Scient. Pubs, Great Novels]

Page 10: Open Calais For SF And LA Meetups

2: Getting from Text to the Linked Data Ecosystem

Page 11: Open Calais For SF And LA Meetups

The Linked Data Cloud

Page 12: Open Calais For SF And LA Meetups

3: Semantic Metadata Transport Layer

• I’m a content producer. We’ve loaded the car with rich semantic metadata

– I’m sharing it within my four walls

– How do I transport it to my consumers?

– RSS / Atom, XML, proprietary data feeds, content APIs
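As one possible illustration of the transport question, the sketch below attaches entity metadata to an Atom entry using the standard `category` element (`term`, `scheme`, `label`). Choosing `category` as the carrier is an assumption for the example, not something the deck prescribes; `build_entry` is a hypothetical helper, the `scheme` value is a placeholder, and the output is a fragment rather than a complete, valid Atom entry (it omits `id` and `updated`).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_entry(title, entities):
    """Return an Atom <entry> fragment with one <category> per entity."""
    ET.register_namespace("", ATOM_NS)        # serialize Atom as the default namespace
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    for ent in entities:
        ET.SubElement(
            entry,
            f"{{{ATOM_NS}}}category",
            {
                "term": ent["uri"],           # entity URI travels with the content
                "label": ent["name"],         # human-readable name
                "scheme": ent["scheme"],      # placeholder type namespace
            },
        )
    return ET.tostring(entry, encoding="unicode")


SAMPLE_ENTITIES = [
    {
        "name": "IBM",
        "uri": ("http://d.opencalais.com/er/company/ralg-tr1r/"
                "9e3f6c34-aa6b-3a3b-b221-a07aa7933633"),
        "scheme": "http://example.org/entity-type/company",  # hypothetical
    }
]

if __name__ == "__main__":
    print(build_entry("Sample article", SAMPLE_ENTITIES))
```

A consumer that understands the feed format can read the entity URIs without re-running extraction on the text, which is the sense in which the feed acts as a transport layer.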

Page 13: Open Calais For SF And LA Meetups

4: Why We’re Doing It

• Two simple answers:

– Hyper-evolution of capabilities – better, faster, stronger

– The walled garden content world

Page 14: Open Calais For SF And LA Meetups

How it Works – Under the Hood of Calais

Page 15: Open Calais For SF And LA Meetups

How it Works – Under the Hood of Calais

Calais Web Service

ClearForest NLP Engine

Rule Base

Lexicons

RDF

Disambig. Engine

Reference Data Assets

Metadata Management

Document Level Metadata

Entity Level Linked Data

and …

Output Formatting

Stat Tools
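The ordering of the boxes on this slide can be illustrated with a toy pipeline: a lexicon lookup finds mentions, a disambiguation step maps them to a single reference entity, and an output step formats the result. This is a deliberately naive stand-in, assuming plain substring matching in place of the ClearForest NLP engine; every name and table here (`LEXICON`, `ALIASES`, `REFERENCE`, the `example.org` predicate) is hypothetical and not part of the Calais service.

```python
# Toy lexicon: surface form -> entity type.
LEXICON = {
    "IBM": "Company",
    "International Business Machines": "Company",
}

# Toy alias table used for disambiguation: lowercase form -> canonical name.
ALIASES = {"international business machines": "ibm"}

# Toy reference data: (type, canonical name) -> entity URI.
REFERENCE = {
    ("Company", "ibm"): ("http://d.opencalais.com/er/company/ralg-tr1r/"
                         "9e3f6c34-aa6b-3a3b-b221-a07aa7933633"),
}


def extract(text):
    """Lexicon stage: return (surface form, type) for each known form found."""
    return [(form, etype) for form, etype in LEXICON.items() if form in text]


def disambiguate(mentions):
    """Disambiguation stage: collapse surface forms onto reference URIs."""
    resolved = {}
    for form, etype in mentions:
        canonical = ALIASES.get(form.lower(), form.lower())
        uri = REFERENCE.get((etype, canonical))
        if uri is not None:
            resolved[uri] = etype
    return resolved


def format_output(doc_id, resolved):
    """Output stage: one N-Triples-style line per resolved entity."""
    predicate = "http://example.org/mentions"   # placeholder predicate
    return [f"<{doc_id}> <{predicate}> <{uri}> ." for uri in sorted(resolved)]


if __name__ == "__main__":
    text = "International Business Machines, better known as IBM, reported results."
    triples = format_output("http://example.org/doc/1", disambiguate(extract(text)))
    print("\n".join(triples))
```

Both surface forms in the sample sentence resolve to the same URI, so the output contains a single statement; that collapsing step is what the disambiguation box contributes.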

Page 16: Open Calais For SF And LA Meetups

Where From Here?

• We’ve seen examples of first generation uses.

• Where does this go in the future?

• Beyond the document

– Social Resume analysis
– Museum Content Coalitions
– Knowledge Management Applications
– Investigative Journalism*

Page 17: Open Calais For SF And LA Meetups

Investigative Journalism

FOIA Contract Documents

Calais Web Service

Company:PersonFamilyRelation

News

Calais Web Service

Company:Contract

Company:Affiliation

Big Fuzzy Graph
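The idea of a "big fuzzy graph" can be sketched as follows: relations extracted from two separate document sets are merged into one graph, and a breadth-first search then surfaces a connection that neither set shows on its own. The organisations and relations below are invented for illustration, the relation labels are borrowed loosely from the slide, and `build_graph` and `find_path` are hypothetical helpers rather than anything Calais provides.

```python
from collections import defaultdict, deque

# Invented sample relations, as (subject, relation, object) triples.
CONTRACT_RELATIONS = [("Agency X", "Company:Contract", "Acme Corp")]
NEWS_RELATIONS = [("Acme Corp", "Company:Affiliation", "Beta LLC")]


def build_graph(relations):
    """Merge triples into an undirected adjacency map of (relation, neighbour)."""
    graph = defaultdict(set)
    for subj, rel, obj in relations:
        graph[subj].add((rel, obj))
        graph[obj].add((rel, subj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest node path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _, neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    graph = build_graph(CONTRACT_RELATIONS + NEWS_RELATIONS)
    print(find_path(graph, "Agency X", "Beta LLC"))
```

In practice the graph would be "fuzzy" because extracted relations carry uncertainty; this sketch ignores that and treats every extracted relation as certain.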

Page 18: Open Calais For SF And LA Meetups

What’s in the Pipeline?

• 2009 (this is a fuzzy list)

– Person disambiguation @ domain level?
– Other disambiguation
– Continued expansion of URIs (entities & events)
– Calais as hub
– Exposure of the IDE?
– User managed lexicons
– Languages
– Opt-in SPARQL Endpoint?

Page 19: Open Calais For SF And LA Meetups

• www.opencalais.com

– Gallery: code and application examples
– Forums
– Documentation

• Twitter @opencalais, Facebook Group