TRANSCRIPT
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 1
BBC Case Study Agenda
• A brief introduction to MarkLogic
• The BBC Dynamic Semantic Publishing model
• DSP Technology Components
• DSP and the Olympics
• The relevancy of triples, ontologies and open linked data for everyone else
• Embracing the potential
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 2
BRIEF INTRODUCTION TO MARKLOGIC
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 3
Data Drives the Need for a New Generation Database
Hierarchical Era: “For your application data!”
• Application- and hardware-specific
Relational Era: “For all your structured data!”
• Normalized, tabular model
• Application-independent query
• User control
Any Structure Era: “For all your data!”
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Heterogeneous data
• Faster time-to-results
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 4
Harnessing Data & Reimagining Applications
• Reduce Risk
• Better Manage Compliance
• Create New Value from Data
• Optimize Operations
• Lower TCO / Better IT Economics
• Better Decision-making
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 5
The Only Enterprise NoSQL Database
• Document Store – Search & Query
• Triple Store – Semantic Discovery
• ACID Transactions
• High Availability / Disaster Recovery
• Replication
• Government-grade Security
• Scalability & Elasticity
• On-premise or Cloud Deployment
• Hadoop for Storage & Compute
• Powerful Indexing Paradigm
• Drives Alerting, Geospatial, Analytics and more…
[Diagram labels: Search, Database, Application Services]
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 6
THE BBC DSP MODEL
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 7
BBC’s Aspiration
"Our aspiration was that just as the Coronation did for TV in 1953, the Olympics would do for digital in 2012"
Phil Fearnley – General Manager. Future Media – News & Knowledge
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 8
Some Numbers Behind the Success
• A daily record of 9.5M global browsers
• A daily record of 7.1M UK browsers
• 55M global browsers across the games
• 37M UK browsers across the games
• 106M requests for BBC Olympic video content
• bbc.co.uk traffic for Olympics greater over 24 hours than ALL of World Cup 2010
• 2.8 Petabytes of data in the busiest DAY
• 700 Gbits per second during Bradley Wiggins’ TT Gold
• 9.2M UK browsers from mobile devices
• 34% of all daily browsers from phones
• 2.3M UK browsers from tablets
• 12M requests for mobile video
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 9
Introducing Dynamic Semantic Publishing (DSP)
• Uses linked data technology to automate…
  • Aggregation
  • Publishing
  • Re-purposing
  …of interrelated content objects
• Driven by an ontological domain-modeled information architecture
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 10
DSP – Key Points
• DSP enables the automated publication of metadata and content-state driven web pages
• Each web page automatically aggregates and renders links to relevant stories and assets
• Minimal journalist involvement: a small number of journalists can author and surface the content with as light a touch as possible
• Underpinned by Ontologies, a Triple Store and a Content Store
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 11
DSP TECHNOLOGY COMPONENTS
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 12
The Editorial Tool (The Enabler)
• The tagging ontology is kept deliberately simple to protect the journalist from the complexities of the underlying domain model
• A simple set of asset/domain joining predicates, such as "about" and "mentions", drive the annotation tool UI and workflow
• The journalist applies suggested annotations as well as searching for triple store-indexed concepts
• All ontology concepts are linked to linked open data (LOD) identifiers (DBPedia, GeoNames etc.)
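The tagging idea above can be sketched in a few lines of Python. This is an illustrative sketch only, not the BBC's implementation: the asset and concept identifiers and the functions `build_index` and `page_for` are hypothetical, and only the "about" and "mentions" predicates come from the slide.

```python
from collections import defaultdict

# Hypothetical asset and concept identifiers, for illustration only.
annotations = [
    ("story-1", "about", "concept:frank-lampard"),
    ("story-1", "mentions", "concept:england-squad"),
    ("story-2", "about", "concept:england-squad"),
    ("video-7", "mentions", "concept:frank-lampard"),
]

def build_index(triples):
    """Group (predicate, asset) pairs under the concept they are tagged with."""
    index = defaultdict(list)
    for asset, predicate, concept in triples:
        index[concept].append((predicate, asset))
    return index

def page_for(concept, index):
    """List assets for a concept page, 'about' tags ahead of 'mentions' tags."""
    rank = {"about": 0, "mentions": 1}
    tagged = sorted(index.get(concept, []), key=lambda pa: (rank[pa[0]], pa[1]))
    return [asset for _, asset in tagged]

index = build_index(annotations)
lampard_page = page_for("concept:frank-lampard", index)
```

Ranking "about" ahead of "mentions" is an assumption made here to show how a small predicate set could still drive page layout.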
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 13
The Ontology Model (Framework)
The Sport ontology and meta model that power these automated, annotation-powered aggregations have now been published and can be re-used under a Creative Commons attribution licence
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 14
The Triple Store (Populating the Framework)
• Maps the assets to the Ontology model
• Relies on RDF (Resource Description Framework):
  • Making statements about concepts/resources in the form of subject-predicate-object expressions (triples)
  • RDF semantics improve navigation, content re-use and journalist-determined levels of automation ("edited by exception")
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 15
Semantics: Organizing and Connecting Data for Meaning
Data is stored in Triples, expressed as:
Subject : Predicate : Object
John Smith : livesIn : London
London : isIn : England
Rules tell us something about the triples:
If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
[Diagram: "John Smith" livesIn "London"; "London" isIn "England"; inferred: "John Smith" livesIn "England"]
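The rule on this slide can be applied mechanically. The following Python sketch uses plain tuples rather than any particular triple store, and the function name `infer_lives_in` is hypothetical. It repeats the rule until no new facts appear.

```python
def infer_lives_in(triples):
    """Apply (A livesIn X) AND (X isIn Y) => (A livesIn Y) until nothing new is found."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        lives = [(s, o) for s, p, o in facts if p == "livesIn"]
        is_in = [(s, o) for s, p, o in facts if p == "isIn"]
        for person, place in lives:
            for inner, outer in is_in:
                if inner == place:
                    new_fact = (person, "livesIn", outer)
                    if new_fact not in facts:
                        facts.add(new_fact)
                        changed = True
    return facts

facts = infer_lives_in([
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
])
```

After the call, `facts` holds the two stated triples plus the inferred one, John Smith : livesIn : England. A production triple store would typically evaluate such rules inside the query engine rather than in application code.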
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 16
The Content Store (The Delivery Engine)
• Ingests all the assets intended for consumption
• Stores those assets (in multiple schema) in a single place
• Dynamically delivers all assets to all the web pages, systems and devices that need them… precisely when they need them
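As a rough illustration of "multiple schema in a single place", the sketch below keeps differently shaped assets in one store keyed by URI. The class `ContentStore`, its methods, the URIs and the field names are all hypothetical, and the sketch says nothing about how MarkLogic itself stores documents.

```python
class ContentStore:
    """Keeps assets of any shape in one place, keyed by URI."""

    def __init__(self):
        self._assets = {}

    def ingest(self, uri, asset):
        """Store an asset as-is; no shared schema is enforced."""
        self._assets[uri] = asset

    def deliver(self, uris):
        """Return the stored assets for the requested URIs, skipping unknown ones."""
        return [self._assets[u] for u in uris if u in self._assets]

store = ContentStore()
# Three assets with three different shapes (hypothetical fields).
store.ingest("asset:story-1", {"type": "story", "headline": "...", "body": "..."})
store.ingest("asset:stats-9", {"type": "stats", "rows": [["team", "points"]]})
store.ingest("asset:video-7", {"type": "video", "duration_s": 90})

page_assets = store.deliver(["asset:story-1", "asset:missing", "asset:video-7"])
```

In the DSP model the list of URIs passed to `deliver` would come from the triple store, which decides what is relevant to a page.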
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 17
Making things work…
[Diagram: Ontology (O), Content Store (CS), Triple Store (TS)]
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 18
Making things presentable…
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 19
DSP AND THE OLYMPICS
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 20
The World Cup 2010 – The First Step
• Featured 700-plus team, group and player pages
• That number of indices impossible with a static publishing model
• Every page orchestrated by automated annotation-powered aggregations
• The DSP architectural approach enabled the BBC to support much greater breadth and scale
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 21
The World Cup Domain Model
• The domain model included concepts and relationships such as:
  • time and location
  • events and competitions
  • groups
  • stages and rounds
  • matches
  • teams, squads and players
  • players within squads
  • teams playing in groups
  • groups within stages
  • …etc
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 22
BBC Content…
• A journalist selects and applies the single concept "Frank Lampard"
• The Triple Store both applies and infers links to other ‘concepts’ such as…
  • "England Squad"
  • "Group C" and
  • "FIFA World Cup 2010"
  … as generated triples within the triple store
• The semantics of the ontologies, the factual data, and the content metadata are all taken into account during each query evaluation.
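The Frank Lampard example can be sketched as a walk over ontology links. This is a minimal sketch under stated assumptions: the link table, the "relevantTo" predicate and the functions `related_concepts` and `tag_story` are hypothetical, and only the concept names and the "about" predicate come from the slides.

```python
# Hypothetical ontology links, following the chain named on the slide.
ontology = {
    "Frank Lampard": ["England Squad"],   # player within a squad
    "England Squad": ["Group C"],         # team playing in a group
    "Group C": ["FIFA World Cup 2010"],   # group within the competition
}

def related_concepts(concept, links):
    """Follow links outward from a concept and collect everything reachable."""
    seen = []
    stack = [concept]
    while stack:
        current = stack.pop()
        for target in links.get(current, []):
            if target not in seen:
                seen.append(target)
                stack.append(target)
    return seen

def tag_story(story_id, concept, links):
    """Return the journalist's explicit tag plus inferred links as triples."""
    triples = [(story_id, "about", concept)]
    triples += [(story_id, "relevantTo", c) for c in related_concepts(concept, links)]
    return triples

story_triples = tag_story("story-1", "Frank Lampard", ontology)
```

The journalist supplies one tag; the remaining three triples are generated from the ontology, which is the "light touch" the slides describe.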
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 23
Facts into Actions:
[Diagram: facts linking Lampard, Chelsea, Southampton v Chelsea and the Premier League via "plays for" and "plays in", mapped to actions: include a story about Lampard; include in the list of recent matches; include in the League Table]
http://www.bbc.com/sport/
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 24
External Content Feeds…
• Automated XML sports stats feeds from various sources are delivered and processed by the BBC
• Feeds are now also transformed into an RDF representation
• The transformation process maps feed-supplier IDs onto corresponding ontology concepts
• Sports stats for Matches, Teams and Players are aggregated inline and served dynamically from the content store, orchestrated by the triple store
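The feed transformation described above might look roughly like the following Python sketch. The XML layout, the supplier IDs, the concept identifiers, the predicate names and the function `feed_to_triples` are all assumptions made for illustration; real supplier feeds and the BBC's RDF vocabulary will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed snippet; not a real supplier format or a real result.
FEED = """
<match id="m-1001">
  <team supplier-id="T42" goals="2"/>
  <team supplier-id="T17" goals="1"/>
</match>
"""

# Hypothetical mapping from feed-supplier IDs to ontology concepts.
SUPPLIER_TO_CONCEPT = {
    "T42": "concept:team-a",
    "T17": "concept:team-b",
}

def feed_to_triples(xml_text, id_map):
    """Turn one <match> element into (subject, predicate, object) triples."""
    match = ET.fromstring(xml_text.strip())
    match_uri = "match:" + match.get("id")
    triples = []
    for team in match.findall("team"):
        concept = id_map.get(team.get("supplier-id"))
        if concept is None:
            continue  # unmapped supplier IDs are skipped in this sketch
        result_uri = match_uri + "/" + concept
        triples.append((match_uri, "hasCompetitor", concept))
        triples.append((result_uri, "goals", int(team.get("goals"))))
    return triples

triples = feed_to_triples(FEED, SUPPLIER_TO_CONCEPT)
```

Skipping unmapped IDs is a simplification; a real pipeline would more likely log or queue them so the mapping can be extended.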
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 25
The Olympics – DSP at Scale
“The Content Store which currently powers all of the statistics and navigation on the sports site has been scaled to handle ingesting many thousands of content objects per second”
“…whilst concurrently supporting many millions of dynamic page renditions and impressions a day”
“…This high performance content store will allow the BBC Sports site to ingest and render sport statistics including live football scores, live football tables, live Olympics event statistics and results in near real-time whilst rendering this content dynamically using the DSP approach.”
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 26
The BBC’s Conclusion?
"The demand and astonishing feedback we've seen from audiences accessing our Olympics content online, whenever they want, on the devices they choose, has exceeded our expectations and helped fulfill this aspiration… a truly digital games"
Phil Fearnley – General Manager. Future Media – News & Knowledge
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 27
THE RELEVANCY OF TRIPLES, ONTOLOGIES AND OPEN LINKED DATA FOR EVERYONE ELSE…
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 28
Key Themes in Play
• Complex Ecosystem
• Huge event, huge variety of known actors, venues, events, dependencies, outcomes
• Looking for an accurate, complete representation of that Ecosystem
• Rich, complete view, broad and deep discovery possibilities, insight!
• Moving from ‘static’ to ‘dynamic’
• Low economies of scale to high economies of scale
• ‘Walled Garden’ to ‘Expansive’
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 29
Context from the World at Large
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Linked Open Data
• Facts that are freely available
• In a form that’s easily consumed
Examples:
• DBPedia
• GeoNames
Machine readable knowledge!
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 30
The Digital Supply Chain Core
[Diagram: Ingest → Aggregate → Manipulate → Deliver, repeated for three supply chains]
Incompatibility:
• Formats
• Schema
• Definitions
• Vocabulary
• Meaning
[Diagram labels: NoSQL, RDF]
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 31
NoSQL & RDF for a New World of Data
• NoSQL
  • Relax schema constraints to enable data efficiency and a ‘single view’
  • Single Store or Metadata Layer
  • Banish silos
• RDF
  • Maximize discovery and insight
  • Associate and Link ALL relevant things together
  • With rules that govern the degree of flex for relevancy
  • Breadth & completeness within a restricted environment
  • Complete freedom and serendipity wherever appropriate
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 32
Horizontal Relevancy & Application
• Patient 360
• Single view of Citizen
• Know Your Customer
• Complete view of Assets
• Fraud Detection
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 33
Horizontal Relevancy & Application
• What assets and information are relevant for this athlete?
• Which drugs are compatible for treatment of this condition?
• Which products will work together / might you wish to purchase together?
• Which financial products are compatible with this investor's risk profile?
• What advice is compatible with this patient’s lifestyle and health history?
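Questions of this shape reduce to pattern matching over triples. The Python sketch below shows the idea with a wildcard matcher; the product names, the "compatibleWith" predicate and the function `match` are hypothetical, and a real deployment would more likely use a query language such as SPARQL.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Hypothetical product data, for illustration only.
product_triples = [
    ("camera-x", "compatibleWith", "lens-a"),
    ("camera-x", "compatibleWith", "tripod-b"),
    ("camera-y", "compatibleWith", "lens-a"),
]

# "Which products will work together with camera-x?"
works_with_camera_x = [o for _, _, o in match(product_triples, "camera-x", "compatibleWith")]

# "Which cameras can use lens-a?"
uses_lens_a = [s for s, _, _ in match(product_triples, predicate="compatibleWith", obj="lens-a")]
```

The same matcher answers both directions of the question, which is the practical appeal of storing relationships as triples rather than in fixed tables.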