Semantic Web Standards and the Variety “V” of Big Data


Upload: bobdc

Post on 19-Nov-2014

DESCRIPTION

TopQuadrant presentation by Bob DuCharme given in the dual NoSQL and SemanticTechnology & Business track in San Jose on August 20, 2014

TRANSCRIPT

Page 1: Semantic Web Standards and  the Variety “V” of Big Data

© Copyright 2014 TopQuadrant Inc. Slide 1

Semantic Web standards and

the Variety “V” of Big Data

Bob DuCharme

August 20, 2014

Page 2: Semantic Web Standards and  the Variety “V” of Big Data

Three Vs of Big Data

Volume

Velocity

Variety

Page 3: Semantic Web Standards and  the Variety “V” of Big Data

Gartner, September 2013

Page 4: Semantic Web Standards and  the Variety “V” of Big Data

Which dimensions did people struggle with the most?

Volume 35%

Velocity 16%

Variety 49%

Page 5: Semantic Web Standards and  the Variety “V” of Big Data

Why is variety hard?

Furniture Inventory

Protein Database?

Customer Database

Surname | GivenName | LastPurchase | ZipCode | Email

Conference Attendees?

last_name | first_name | is_speaker | postal_code | email

Page 6: Semantic Web Standards and  the Variety “V” of Big Data

Schemas

Good thing:

Ensure data quality

Make query writing* easier

Add efficiency

*And essentially, all application development

Annoying thing:

Can’t add property values someone didn’t see coming

Changing schema (and data with it) slow and expensive

Often tied too closely to specific implementation

Inflexibility × 3.

Page 7: Semantic Web Standards and  the Variety “V” of Big Data

Schemaless NoSQL databases

Can’t add property values someone didn’t see coming?

Changing schema (and data with it) slow and expensive?

Often tied too closely to specific implementation?

Page 8: Semantic Web Standards and  the Variety “V” of Big Data

Schemaless: how do applications know what properties are available?

By any means necessary

Documentation

Query for properties that got used

App possibly written by same person or team

Responsibility shifted from database (designer) to application (designer)
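
The "query for properties that got used" option can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented, the field names are borrowed from the customer and attendee examples in this deck, and `properties_in_use` is a hypothetical helper, not part of any database API.

```python
# Hypothetical schemaless records. Field names come from the deck's
# examples; the values are invented for illustration.
records = [
    {"Surname": "Smith", "GivenName": "James", "ZipCode": "22903"},
    {"Surname": "Jones", "Email": "jones@example.com"},
    {"last_name": "Lee", "is_speaker": True},
]


def properties_in_use(docs):
    """Count how many records actually use each property name."""
    counts = {}
    for doc in docs:
        for key in doc:
            counts[key] = counts.get(key, 0) + 1
    return counts


usage = properties_in_use(records)
for name in sorted(usage):
    print(name, usage[name])
```

The application, not the database, has to run a scan like this before it knows what it can ask for, which is the shift in responsibility the slide describes.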

Page 9: Semantic Web Standards and  the Variety “V” of Big Data

Schema: all or nothing?

Customer Database

Surname | GivenName | LastPurchase | ZipCode | Email

Conference Attendees?

last_name | first_name | is_speaker | postal_code | email

ETL (Extract-Transform-Load)?

Page 10: Semantic Web Standards and  the Variety “V” of Big Data

RDF Schema (RDFS)

W3C Standard since 2004

Often overshadowed by superset standard OWL

Describes RDF, written using RDF syntaxes

Semantic Web

Linked Data

Page 11: Semantic Web Standards and  the Variety “V” of Big Data

RDF

www.w3.org/RDF (second sentence!):

“RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.”

Page 12: Semantic Web Standards and  the Variety “V” of Big Data

Sample schema

@prefix cust: <http://companyX.com/ns/customer#> .
@prefix ca: <http://companyY.com/ns/confAttendees#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Customer Database
cust:Surname a rdf:Property . # or: cust:Surname rdf:type rdf:Property .
cust:GivenName a rdf:Property .
cust:ZipCode a rdf:Property .
cust:Email a rdf:Property .

# Conference Attendees
ca:last_name a rdf:Property .
ca:first_name a rdf:Property .
ca:postal_code a rdf:Property .
ca:email a rdf:Property .

# LastPurchase and is_speaker: don't care (for now)!

Page 13: Semantic Web Standards and  the Variety “V” of Big Data

Relating properties

# assuming prefix declarations from previous slide
@prefix schema: <http://schema.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

cust:Surname rdfs:subPropertyOf schema:familyName .
ca:last_name rdfs:subPropertyOf schema:familyName .

cust:GivenName rdfs:subPropertyOf schema:givenName .
ca:first_name rdfs:subPropertyOf schema:givenName .

cust:Email rdfs:subPropertyOf schema:email .
ca:email rdfs:subPropertyOf schema:email .

cust:ZipCode rdfs:subPropertyOf schema:postalCode .
ca:postal_code rdfs:subPropertyOf schema:postalCode .
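
What an RDFS-aware store does with these subproperty statements can be approximated in plain Python. The sketch below is not a real RDF library: triples are ordinary tuples of strings, the instance data (`ex:cust1`, `ex:ca1` and their values) is invented, and `with_super_properties` is a hypothetical helper that applies only the subPropertyOf rule.

```python
# The rdfs:subPropertyOf statements from the slide, as a child -> parent map.
SUB_PROPERTY_OF = {
    "cust:Surname": "schema:familyName",
    "ca:last_name": "schema:familyName",
    "cust:GivenName": "schema:givenName",
    "ca:first_name": "schema:givenName",
    "cust:Email": "schema:email",
    "ca:email": "schema:email",
    "cust:ZipCode": "schema:postalCode",
    "ca:postal_code": "schema:postalCode",
}

# Hypothetical instance data from the two sources (values are invented).
triples = {
    ("ex:cust1", "cust:Surname", "Smith"),
    ("ex:cust1", "cust:ZipCode", "20500"),
    ("ex:ca1", "ca:last_name", "Jones"),
    ("ex:ca1", "ca:postal_code", "95113"),
}


def with_super_properties(data, sub_property_of):
    """Return data plus every triple implied by rdfs:subPropertyOf."""
    inferred = set(data)
    for s, p, o in data:
        seen = set()  # guards against accidental cycles in the map
        parent = sub_property_of.get(p)
        while parent is not None and parent not in seen:
            seen.add(parent)
            inferred.add((s, parent, o))
            parent = sub_property_of.get(parent)
    return inferred


merged = with_super_properties(triples, SUB_PROPERTY_OF)
for triple in sorted(merged):
    print(triple)
```

After this step both sources can be read through the shared schema.org property names, while the original source-specific triples remain untouched.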

Page 14: Semantic Web Standards and  the Variety “V” of Big Data

Using the combined data

# SPARQL query: where should we open
# a government relations office?

PREFIX schema: <http://schema.org/>

SELECT ?postalCode
WHERE {
  ?person schema:email ?email .
  FILTER(strends(?email, ".gov"))
  ?person schema:postalCode ?postalCode .
}
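
For readers who do not know SPARQL, the same join can be written over plain Python tuples. This is a rough analogue, not a SPARQL engine: the people, addresses, and postal codes are invented, and the data is assumed to be expressed already with the shared `schema:` property names.

```python
# Hypothetical combined data, already using the shared schema.org names.
triples = [
    ("ex:p1", "schema:email", "a.smith@agency.example.gov"),
    ("ex:p1", "schema:postalCode", "20500"),
    ("ex:p2", "schema:email", "b.jones@example.com"),
    ("ex:p2", "schema:postalCode", "95113"),
    ("ex:p3", "schema:email", "c.lee@bureau.example.gov"),  # no postal code
]


def gov_postal_codes(data):
    """Postal codes of everyone whose email address ends in .gov."""
    gov_people = {
        s for s, p, o in data
        if p == "schema:email" and o.endswith(".gov")
    }
    return sorted(
        o for s, p, o in data
        if p == "schema:postalCode" and s in gov_people
    )


codes = gov_postal_codes(triples)
print(codes)
```

As in the SPARQL version, a person with a .gov address but no postal code contributes no result, because both patterns must match for the same `?person`.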

Page 15: Semantic Web Standards and  the Variety “V” of Big Data

Middleware to treat RDBMS as RDF

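[Diagram: the application sends a SPARQL query to the mapping middleware (e.g. D2R, Ultrawrap). The middleware sends a SQL query to the Customers database, receives relational results, and returns SPARQL query results to the application.]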
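
To make the relational-to-RDF view concrete, here is a small sketch using Python's built-in `sqlite3` module. It is a deliberate simplification: the slide shows middleware translating SPARQL into SQL on the fly, whereas this sketch just copies each row into triples. The table layout, the `id` column, the sample row, and the `rows_as_triples` helper are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical Customers table held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, Surname TEXT, GivenName TEXT, "
    "ZipCode TEXT, Email TEXT)"
)
conn.execute(
    "INSERT INTO customers VALUES "
    "(401, 'Smith', 'James', '20500', 'jsmith@example.gov')"
)


def rows_as_triples(connection, table, prefix, subject_prefix):
    """Expose each row as (subject, property, value) triples.

    The row's id becomes the subject and each other column name
    becomes a property in the given prefix.
    """
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{subject_prefix}{record['id']}"
        for column, value in record.items():
            if column != "id" and value is not None:
                triples.append((subject, f"{prefix}{column}", value))
    return triples


customer_triples = rows_as_triples(conn, "customers", "cust:", "ex:cust")
for triple in customer_triples:
    print(triple)
```

Real mapping middleware avoids this copying step and leaves the data in the relational database, which is why the diagram shows SQL queries and relational results flowing underneath the SPARQL interface.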

Page 16: Semantic Web Standards and  the Variety “V” of Big Data

Middleware to treat RDBMS as RDF

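[Diagram: the application sends a SPARQL query to the mapping middleware (e.g. D2R, Ultrawrap), which sends SQL queries to the Customers and Conference Attendees databases, receives relational results from each, and returns SPARQL query results to the application. A triplestore holding the schema metadata is also shown.]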

Page 17: Semantic Web Standards and  the Variety “V” of Big Data

Further enhancement

ex:Person a rdfs:Class .

schema:familyName rdfs:domain ex:Person .
schema:givenName rdfs:domain ex:Person .
schema:email rdfs:domain ex:Person .
schema:postalCode rdfs:domain ex:Person .

schema:postalCode rdfs:label "postal code" .
schema:postalCode rdfs:comment "Zip code in the USA, postcode in the UK." .
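
The practical effect of `rdfs:domain` is that using a property tells a reasoner what class the subject belongs to. A minimal Python sketch of that single rule follows; the instance triples are invented and `infer_types` is a hypothetical helper, not a library function.

```python
# The rdfs:domain statements from the slide, as a property -> class map.
DOMAINS = {
    "schema:familyName": "ex:Person",
    "schema:givenName": "ex:Person",
    "schema:email": "ex:Person",
    "schema:postalCode": "ex:Person",
}

# Hypothetical instance data (values are invented).
triples = {
    ("ex:cust401", "schema:familyName", "Smith"),
    ("ex:cust401", "schema:postalCode", "20500"),
}


def infer_types(data, domains):
    """Apply the rdfs:domain rule: a subject that uses a property
    is an instance of that property's domain class."""
    return {(s, "rdf:type", domains[p]) for s, p, o in data if p in domains}


types = infer_types(triples, DOMAINS)
print(types)
```

Nothing in the data said that `ex:cust401` is a person; that fact follows from the schema alone.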

Page 18: Semantic Web Standards and  the Variety “V” of Big Data

Adding more with OWL

Equipment

equipment code   room
X1703            main kitchen
Z0439            cold storage

Room addresses

room           building
main kitchen   98 Main St.
cold storage   14 Broad St.

eq:room rdfs:subPropertyOf ex:locatedIn .
rmaddr:building rdfs:subPropertyOf ex:locatedIn .

ex:locatedIn a owl:TransitiveProperty .

rmaddr:98MainSt a ex:Building .
eq:X1703 eq:room eq:mainKitchen .
eq:mainKitchen rmaddr:building rmaddr:98MainSt .

Page 19: Semantic Web Standards and  the Variety “V” of Big Data

Query for which building

# SPARQL query: what building is
# equipment piece X1703 in?

SELECT ?building
WHERE {
  ?building a ex:Building .
  eq:X1703 ex:locatedIn ?building .
}

[Diagram: two "located in" arrows, chaining X1703 to the main kitchen and the main kitchen to 98 Main St.]
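
The query only returns a building if two inference steps happen first: the source-specific properties are lifted to `ex:locatedIn`, and the transitive closure of `ex:locatedIn` is computed. A plain-Python sketch of both steps, using the triples from the deck; `located_in_closure` is a hypothetical helper and the naive closure loop is only suitable for tiny data sets.

```python
# Asserted triples and schema statements taken from the slides.
asserted = {
    ("eq:X1703", "eq:room", "eq:mainKitchen"),
    ("eq:mainKitchen", "rmaddr:building", "rmaddr:98MainSt"),
}
SUB_PROPERTY_OF = {
    "eq:room": "ex:locatedIn",
    "rmaddr:building": "ex:locatedIn",
}
BUILDINGS = {"rmaddr:98MainSt"}  # resources typed as ex:Building


def located_in_closure(triples, sub_property_of):
    """Return all (thing, place) pairs implied for ex:locatedIn."""
    # Step 1: rdfs:subPropertyOf lifts eq:room and rmaddr:building.
    closure = {
        (s, o) for s, p, o in triples
        if p == "ex:locatedIn" or sub_property_of.get(p) == "ex:locatedIn"
    }
    # Step 2: owl:TransitiveProperty, computed by repeated joining.
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


closure = located_in_closure(asserted, SUB_PROPERTY_OF)
buildings = sorted(
    place for thing, place in closure
    if thing == "eq:X1703" and place in BUILDINGS
)
print(buildings)
```

Neither source table states that X1703 is in 98 Main St.; the answer exists only once the two tables are connected through the shared property.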

Page 20: Semantic Web Standards and  the Variety “V” of Big Data

A little more OWL

schema:email a owl:InverseFunctionalProperty .

ex:cust401 cust:GivenName "James" .
ex:cust401 cust:Surname "Smith" .
ex:cust401 cust:Email "[email protected]" .

ex:ca04395 ca:first_name "Jim" .
ex:ca04395 ca:last_name "Smith" .
ex:ca04395 ca:email "[email protected]" .

# What a reasoner can then infer, if both email values match:
ex:cust401 owl:sameAs ex:ca04395 .
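
The inverse-functional rule says that two resources sharing a value for such a property must be the same resource. A minimal Python sketch of that rule follows. The email values are invented placeholders, `ex:ca07777` is a made-up extra resource, the data is assumed to use `schema:email` already (that is, after subproperty inference), and `same_as_pairs` is a hypothetical helper.

```python
# Hypothetical data after subproperty inference (email values invented).
triples = [
    ("ex:cust401", "schema:email", "jsmith@example.com"),
    ("ex:ca04395", "schema:email", "jsmith@example.com"),
    ("ex:ca07777", "schema:email", "other@example.com"),
]
INVERSE_FUNCTIONAL = {"schema:email"}


def same_as_pairs(data, inverse_functional):
    """Link subjects that share a value for an inverse functional property."""
    by_value = {}
    for s, p, o in data:
        if p in inverse_functional:
            by_value.setdefault((p, o), set()).add(s)
    pairs = set()
    for subjects in by_value.values():
        ordered = sorted(subjects)
        for index, first in enumerate(ordered):
            for second in ordered[index + 1:]:
                pairs.add((first, "owl:sameAs", second))
    return pairs


links = same_as_pairs(triples, INVERSE_FUNCTIONAL)
print(links)
```

Because `owl:sameAs` is symmetric, the sketch emits each pair once, in sorted order; the slide's `ex:cust401 owl:sameAs ex:ca04395` states the same link in the other direction.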

Page 21: Semantic Web Standards and  the Variety “V” of Big Data

What OWL adds to RDFS

RDFS gives you properties to describe your properties, classes, and instances (i.e. your resources)

OWL gives you:

• More properties to describe your resources

• Classes that you can use to describe resources

• The ability to define your own classes that you can use to describe resources

Page 22: Semantic Web Standards and  the Variety “V” of Big Data

Middleware to treat RDBMS as RDF

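[Diagram: the application sends a SPARQL query to the mapping middleware (e.g. D2R, Ultrawrap), which sends SQL queries to the Customers and Conference Attendees databases, receives relational results from each, and returns SPARQL query results to the application. A triplestore holding the schema metadata is also shown.]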

Page 23: Semantic Web Standards and  the Variety “V” of Big Data

Descriptive vs. Prescriptive schemas

Not rules to follow

– e.g. “Employee must have a first and last name!”

– Other ways to implement constraints

Machine-readable guides to what you’ve got to work with

– Data types

– Relationships to other resources and classes of resources

Metadata!

Page 24: Semantic Web Standards and  the Variety “V” of Big Data

Whose schemas?

Your own schemas can describe what you need from the data you’re using

Standardized schemas (e.g. schema.org, GoodRelations) can tie together your data with data from other sources

Tie together your custom schemas with (subsets that you’re interested in of) standardized schemas

Tie together (subsets that you’re interested in of) different data sets from different sources

Page 25: Semantic Web Standards and  the Variety “V” of Big Data

Top-down or bottom-up schema development?

Whichever you like

I like bottom-up

– (Hey Cyc project: good luck with that!)

Lots of data to deal with?

– Model just enough to drive a simple, proof-of-concept application

– Build the model (schema) a little at a time, then add more to your application

– Connect that model to models of (subsets of) other data sets

Page 26: Semantic Web Standards and  the Variety “V” of Big Data

Who is doing this now?

Pharma

Oil and gas

Publishing

Page 27: Semantic Web Standards and  the Variety “V” of Big Data

TopQuadrant Products and Solutions

Solutions:

– Asset Management

– Search / Content Enrichment

– Master Data Management

– Information Discovery for Life Sciences

– Information Exchange

– Compose your own

TopBraid Platform Solution Engine

IDE

TopQuadrant offers configurable, out-of-the box solutions enabling organizations to evolve their information infrastructure into a semantic ecosystem

Page 28: Semantic Web Standards and  the Variety “V” of Big Data

Dynamic Interactive Exploration - Search, Query, Filter, Browse, Navigate, Visualize, Share

Logical Data Warehouse - Flexible, Adaptive Information Structuring

TopBraid Insight™ (TBI)

Connect the dots for new insights. Ease Big Data Variety

Page 29: Semantic Web Standards and  the Variety “V” of Big Data

Page 30: Semantic Web Standards and  the Variety “V” of Big Data

• Tames Big Data to empower businesses

• Offers on-demand integrated access to diverse data, making it possible to discover information just in time

• Delivers new levels of creativity and infrastructure flexibility

TopBraid Insight: Connects the Dots

Page 31: Semantic Web Standards and  the Variety “V” of Big Data

Photo credits

• Volume: (CC BY-NC 2.0) Fabrizio Monti https://www.flickr.com/photos/delphaber/3514894189

• Velocity: (CC BY 2.0) Gabriel https://www.flickr.com/photos/cod_gabriel/1332225362

• Variety: (CC BY-NC-SA 2.0) IRRI Photos https://www.flickr.com/photos/ricephotos/4753359957

Page 32: Semantic Web Standards and  the Variety “V” of Big Data

“A wonderful harmony is created when we join together the seemingly unconnected.” - Heraclitus

Bob DuCharme [email protected]

Thank you!