news the new way: semantics in the driver's seat

Post on 14-Dec-2014


DESCRIPTION

Slides from a presentation by Philip Dudchuk (RIA Novosti) and Daniel Hladky (Ontos/W3C) at SemTechBiz 2012 in San Francisco

TRANSCRIPT

Philip Dudchuk & Daniel Hladky

Semantics in the Driver’s Seat

News the New Way

SemTechBiz, San Francisco, June 5, 2012


Philip Dudchuk

Head of Semantic Platform, RIA Novosti

Daniel Hladky

Deputy Director, W3C Russia

Member of the Board, Ontos AG

Founded at the beginning of World War II, RIA Novosti was initially a news agency reporting on the situation at the front

1941


The first news websites looked like simple feeds


Boom of platforms in the late 2000s


Metadata rules the world of news

• News metadata gets the right content to the right departments of the customer (big media)

• Metadata locates reported events (local newspapers)

• Metadata enables vertical products focused on selected areas (banking, automotive, government)


Distinct metadata sets


2011: Need for a common Semantic Publishing Platform

• Build and manage a common news ontology and vocabularies for all products and news websites

• Generate metadata for both news items and articles on websites

• Aggregate content and metadata for further use in end-user applications (websites and mobile apps)


Evolution of the Publishing Process


News Ontology


Managing the Triple Store

Triple Store updates

• Editorial meetings

• Statistics about ‘heuristic’ entities

• Adding an entity directly from the CMS

Linguistic Information in the Triple Store

• Morphology

• Disambiguation rules & attributes


Impacts


Impact 1: Broadcasting News with Semantic Metadata

Filtering news content by triple queries at the customer’s end (via API):

• content about any oil & gas company

• content about any employee of any public body in a given region of Russia

• content about any event going to happen in my city

Common metadata for newswire and web content makes it possible to blend free and paid content into new products (news archive)
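A query such as "content about any oil & gas company" can be read as a join of two triple patterns: find the organizations in a sector, then find the news items that mention them. The sketch below illustrates that idea in plain Python. The identifiers, predicate names (`mentions`, `industry`) and data are hypothetical and are not taken from the RIA Novosti ontology or API.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical triples for illustration only; not RIA Novosti data.
TRIPLES: list[Triple] = [
    ("news:101", "mentions", "org:ExampleOil"),
    ("news:102", "mentions", "org:ExampleBank"),
    ("news:103", "mentions", "org:ExampleGas"),
    ("org:ExampleOil", "industry", "sector:OilAndGas"),
    ("org:ExampleGas", "industry", "sector:OilAndGas"),
    ("org:ExampleBank", "industry", "sector:Banking"),
]


def match(triples: list[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def news_about_sector(triples: list[Triple], sector: str) -> list[str]:
    """Join two patterns: (?org industry sector) and (?news mentions ?org)."""
    orgs = {subj for subj, _, _ in match(triples, p="industry", o=sector)}
    news = {subj for subj, _, obj in match(triples, p="mentions") if obj in orgs}
    return sorted(news)


print(news_about_sector(TRIPLES, "sector:OilAndGas"))
```

In a real deployment the same join would more likely be expressed as a query against the triple store; the in-memory list here only keeps the example self-contained.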


Impact 2: Adaptive Content of Websites

My ria.ru

• Locating the user and filtering the content by region

• Gathering user interests and filtering content by entities and topics
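The region and interest filtering described for My ria.ru could be sketched as follows. This is a minimal illustration under stated assumptions: the `Article` and `UserProfile` classes, their fields, and the ranking rule (count of shared tags) are hypothetical and do not describe the actual ria.ru implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    region: str
    tags: set[str] = field(default_factory=set)  # entities and topics from metadata


@dataclass
class UserProfile:
    region: str
    interests: set[str] = field(default_factory=set)


def personalize(articles: list[Article], user: UserProfile) -> list[Article]:
    """Keep articles from the user's region, ranked by overlap with interests."""
    local = [a for a in articles if a.region == user.region]
    # sorted() is stable, so articles with equal overlap keep their input order.
    return sorted(local, key=lambda a: len(a.tags & user.interests), reverse=True)


articles = [
    Article("Local election results", "Moscow", {"politics"}),
    Article("Cup final preview", "Moscow", {"football", "sport"}),
    Article("Harvest report", "Kazan", {"agriculture"}),
]
user = UserProfile(region="Moscow", interests={"football"})
print([a.title for a in personalize(articles, user)])
```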


Impact 3: Non-traditional Aggregations and Analytics

Putting together news metadata with external content

• summer forest fires

• juvenile delinquency in towns and regions

• election fraud cases


Figure: Combination of crowd-sourced geo data about forest fires and local reports by RIA Novosti


A case study: country image analysis


Country image analysis

• Searching news content related to Russia across more than 3,000 foreign sources

• Processing search results, tagging and aggregating content with its metadata

• Producing statistics about reactions to subjects connected to Russia (events, people, organizations)
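The final aggregation step can be pictured as counting tagged publications per source for a given subject. The sketch below assumes each publication has already been tagged with a source, a subject and a tone; the record layout, the tone labels and the data are hypothetical, and this is not the authors' actual pipeline or metric.

```python
from collections import Counter

# Hypothetical tagged publications: (source, subject, tone).
PUBLICATIONS = [
    ("source-a", "event:X", "negative"),
    ("source-a", "event:X", "neutral"),
    ("source-b", "event:X", "negative"),
    ("source-b", "event:X", "negative"),
    ("source-b", "person:Y", "positive"),
]


def negative_counts_by_source(pubs, subject):
    """Rank sources by their number of negative publications on one subject."""
    counts = Counter(
        source for source, subj, tone in pubs
        if subj == subject and tone == "negative"
    )
    return counts.most_common()  # list of (source, count), highest first


print(negative_counts_by_source(PUBLICATIONS, "event:X"))
```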


Top sources with the largest number of negative publications on the involvement of Russian politicians and businessmen in Yulia Tymoshenko’s case

Negativity Index

Tymoshenko’s case in Ukraine, threat to boycott Euro 2012

‘Pussy Riot’ punks arrested


US media on Russia’s reaction to the events in Syria

Syrian media on the same topic

The New York Times

The Washington Post

The Financial Times


Further Challenges

• Processing the content from social media to create adaptive social applications

• Semantic metadata for pictures and video (image & voice recognition)

• Making RIA content & metadata API public

• Creating a LOD cloud bubble out of RIA ontology and vocabularies


Thank you!

@philip_dudchuk @daniel_hladky
