Semantic Web standards and the Variety “V” of Big Data


DESCRIPTION

TopQuadrant presentation by Bob DuCharme given in the dual NoSQL and SemanticTechnology & Business track in San Jose on August 20, 2014

TRANSCRIPT

© Copyright 2014 TopQuadrant Inc. Slide 1

Semantic Web standards and the Variety “V” of Big Data

Bob DuCharme

August 20, 2014

© Copyright 2014 TopQuadrant Inc. Slide 2

Three Vs of Big Data

Volume

Velocity

Variety

© Copyright 2014 TopQuadrant Inc. Slide 3

Gartner, September 2013

© Copyright 2014 TopQuadrant Inc. Slide 4

Which dimensions did people struggle with the most?

Volume 35%

Velocity 16%

Variety 49%

© Copyright 2014 TopQuadrant Inc. Slide 5

Why is variety hard?

Furniture Inventory

Protein Database?

Customer Database

Conference Attendees?

Customer Database columns: Surname | GivenName | LastPurchase | ZipCode | Email

Conference Attendees columns: last_name | first_name | is_speaker | postal_code | email

© Copyright 2014 TopQuadrant Inc. Slide 6

Schemas

Good thing:

Ensure data quality

Make query writing* easier

Add efficiency

*And essentially, all application development

Annoying thing:

Can’t add property values someone didn’t see coming

Changing schema (and data with it) slow and expensive

Often tied too closely to specific implementation

Inflexibility × 3.

© Copyright 2014 TopQuadrant Inc. Slide 7

Schemaless NoSQL databases

Can’t add property values someone didn’t see coming?

Changing schema (and data with it) slow and expensive?

Often tied too closely to specific implementation?

© Copyright 2014 TopQuadrant Inc. Slide 8

Schemaless: how do applications know what properties are available?

By any means necessary

Documentation

Query for properties that got used

App possibly written by same person or team

Responsibility shifted from database (designer) to application (designer)
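One of the options above, querying for the properties that actually got used, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `records` list is hypothetical sample data (the field names echo the two data sets on the earlier slide, and the postal code value is invented), and `properties_in_use` is a made-up helper, not part of any NoSQL product's API.

```python
from collections import Counter

# Hypothetical schemaless records; nothing guarantees they share field names.
records = [
    {"Surname": "Smith", "GivenName": "James", "Email": "jsmith@somecompany.com"},
    {"last_name": "Smith", "first_name": "Jim", "is_speaker": True},
    {"Surname": "Jones", "ZipCode": "22901"},
]


def properties_in_use(docs):
    """Count how many records use each property name."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.keys())
    return counts


usage = properties_in_use(records)
for name, count in sorted(usage.items()):
    print(name, count)
```

The application, not the database, ends up doing this discovery work, which is the shift of responsibility the slide describes.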

© Copyright 2014 TopQuadrant Inc. Slide 9

Schema: all or nothing?

Customer Database

Conference Attendees?

Customer Database columns: Surname | GivenName | LastPurchase | ZipCode | Email

Conference Attendees columns: last_name | first_name | is_speaker | postal_code | email

ETL (Extract-Transform-Load)?

© Copyright 2014 TopQuadrant Inc. Slide 10

RDF Schema (RDFS)

W3C Standard since 2004

Often overshadowed by superset standard OWL

Describes RDF, written using RDF syntaxes

Semantic Web

Linked Data

© Copyright 2014 TopQuadrant Inc. Slide 11

RDF

www.w3.org/RDF (second sentence!):

“RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.”

© Copyright 2014 TopQuadrant Inc. Slide 12

Sample schema

@prefix cust: <http://companyX.com/ns/customer#> .
@prefix ca:   <http://companyY.com/ns/confAttendees#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Customer Database
cust:Surname   a rdf:Property .   # or: cust:Surname rdf:type rdf:Property .
cust:GivenName a rdf:Property .
cust:ZipCode   a rdf:Property .
cust:Email     a rdf:Property .

# Conference Attendees
ca:last_name   a rdf:Property .
ca:first_name  a rdf:Property .
ca:postal_code a rdf:Property .
ca:email       a rdf:Property .

# LastPurchase and is_speaker: don't care (for now)!

© Copyright 2014 TopQuadrant Inc. Slide 13

Relating properties

# assuming prefix declarations from previous slide
@prefix schema: <http://schema.org/> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .

cust:Surname   rdfs:subPropertyOf schema:familyName .
ca:last_name   rdfs:subPropertyOf schema:familyName .

cust:GivenName rdfs:subPropertyOf schema:givenName .
ca:first_name  rdfs:subPropertyOf schema:givenName .

cust:Email     rdfs:subPropertyOf schema:email .
ca:email       rdfs:subPropertyOf schema:email .

cust:ZipCode   rdfs:subPropertyOf schema:postalCode .
ca:postal_code rdfs:subPropertyOf schema:postalCode .
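What the rdfs:subPropertyOf statements buy you can be sketched in plain Python: whenever a resource has a value for a subproperty, an RDFS-aware system may also treat it as a value for the superproperty. The sketch below is an assumption-laden illustration, not a real RDF engine. Triples are simple tuples, `infer_subproperties` is a hypothetical helper, and the two instance triples reuse the customer and attendee identifiers that appear later in the deck.

```python
SUBPROP = "rdfs:subPropertyOf"

# A subset of the schema statements from the slide, as (subject, predicate, object).
schema_triples = {
    ("cust:Surname", SUBPROP, "schema:familyName"),
    ("ca:last_name", SUBPROP, "schema:familyName"),
    ("cust:Email", SUBPROP, "schema:email"),
    ("ca:email", SUBPROP, "schema:email"),
}

# Instance data from the two sources, each using its own property names.
data_triples = {
    ("ex:cust401", "cust:Surname", "Smith"),
    ("ex:ca04395", "ca:last_name", "Smith"),
}


def infer_subproperties(schema, data):
    """Return data plus every triple entailed by rdfs:subPropertyOf."""
    parents = {}
    for s, p, o in schema:
        if p == SUBPROP:
            parents.setdefault(s, set()).add(o)
    inferred = set(data)
    frontier = list(data)
    while frontier:  # keep going so chains of subproperties are followed too
        s, p, o = frontier.pop()
        for parent in parents.get(p, ()):
            triple = (s, parent, o)
            if triple not in inferred:
                inferred.add(triple)
                frontier.append(triple)
    return inferred


merged = infer_subproperties(schema_triples, data_triples)
family_names = sorted(s for s, p, o in merged if p == "schema:familyName")
print(family_names)
```

Both resources now answer to schema:familyName even though neither source was changed.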

© Copyright 2014 TopQuadrant Inc. Slide 14

Using the combined data

# SPARQL query: where should we open
# a government relations office?

PREFIX schema: <http://schema.org/>

SELECT ?postalCode
WHERE {
  ?person schema:email ?email .
  FILTER(strends(?email, ".gov"))
  ?person schema:postalCode ?postalCode .
}
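The logic of that query can be mimicked in Python to show what it matches. This is a rough sketch, not how a SPARQL engine works internally: the `triples` list is hypothetical merged data already expressed with the shared schema.org property names (as it would be after subproperty inference), the .gov addresses and postal codes are invented, and `gov_postal_codes` is a made-up helper.

```python
# Hypothetical merged data using the shared property names.
triples = [
    ("ex:cust401", "schema:email", "jsmith@somecompany.com"),
    ("ex:cust401", "schema:postalCode", "22901"),
    ("ex:ca07", "schema:email", "pat@agency.gov"),
    ("ex:ca07", "schema:postalCode", "20500"),
    ("ex:cust9", "schema:email", "lee@bureau.gov"),
    # ex:cust9 has no postal code, so it cannot match the whole pattern.
]


def gov_postal_codes(data):
    """Return postal codes of people who have an email address ending in .gov."""
    emails, postal_codes = {}, {}
    for s, p, o in data:
        if p == "schema:email":
            emails.setdefault(s, []).append(o)
        elif p == "schema:postalCode":
            postal_codes.setdefault(s, []).append(o)
    results = []
    for person, addresses in emails.items():
        for email in addresses:
            if email.endswith(".gov"):  # same test as strends(?email, ".gov")
                results.extend(postal_codes.get(person, []))
    return results


codes = gov_postal_codes(triples)
print(codes)
```

Like the SPARQL version, the function never needs to know which source database a person came from.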

© Copyright 2014 TopQuadrant Inc. Slide 15

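Middleware to treat RDBMS as RDF

[Diagram: the Application sends a SPARQL query to Mapping Middleware (e.g. D2R, Ultrawrap), which sends a SQL query to the Customers database, receives relational results, and returns SPARQL query results to the Application.]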
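The core idea behind such middleware, that each cell of a relational row can be read as a subject-property-value triple, can be sketched with Python's built-in sqlite3 module. This is a simplification under stated assumptions: tools such as D2R translate queries on the fly rather than copying data out, whereas this sketch just materializes triples. The `customers` table layout, the `id` key column, and the `rows_as_triples` helper are all hypothetical.

```python
import sqlite3

# A hypothetical stand-in for the Customers database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, Surname TEXT, GivenName TEXT, Email TEXT)"
)
conn.execute(
    "INSERT INTO customers VALUES (401, 'Smith', 'James', 'jsmith@somecompany.com')"
)


def rows_as_triples(connection, table, prefix):
    """Yield one (subject, predicate, object) tuple per non-null, non-key cell.

    The table name is interpolated directly, so it must come from trusted code.
    """
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"ex:{prefix}{record['id']}"
        for column, value in record.items():
            if column != "id" and value is not None:
                yield (subject, f"{prefix}:{column}", value)


triples = list(rows_as_triples(conn, "customers", "cust"))
for triple in triples:
    print(triple)
```

Column names become property names, which is why the cust: properties on the schema slides mirror the database columns.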

© Copyright 2014 TopQuadrant Inc. Slide 16

Middleware to treat RDBMS as RDF

[Diagram: the Application sends a SPARQL query to Mapping Middleware (e.g. D2R, Ultrawrap), which sends SQL queries to both the Customers and Conference Attendees databases, receives relational results from each, and returns SPARQL query results to the Application. A triplestore holds the schema metadata.]

© Copyright 2014 TopQuadrant Inc. Slide 17

Further enhancement

ex:Person a rdfs:Class .

schema:familyName rdfs:domain ex:Person .
schema:givenName  rdfs:domain ex:Person .
schema:email      rdfs:domain ex:Person .
schema:postalCode rdfs:domain ex:Person .

schema:postalCode rdfs:label "postal code" .
schema:postalCode rdfs:comment "Zip code in the USA, postcode in the UK." .
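The effect of rdfs:domain can be sketched in Python: if a property has a domain, anything that has a value for that property can be inferred to be an instance of the domain class. As before, this is a toy illustration with tuples rather than a real RDF library. The `infer_types` helper is hypothetical, and the instance triples assume the data has already been mapped to the schema.org property names.

```python
DOMAIN = "rdfs:domain"
TYPE = "rdf:type"

# Two of the domain statements from the slide.
schema_triples = {
    ("schema:familyName", DOMAIN, "ex:Person"),
    ("schema:email", DOMAIN, "ex:Person"),
}

# Instance data; note that nobody has declared these resources to be people.
data_triples = {
    ("ex:cust401", "schema:familyName", "Smith"),
    ("ex:ca04395", "schema:email", "jsmith@somecompany.com"),
}


def infer_types(schema, data):
    """Return the rdf:type triples entailed by rdfs:domain statements."""
    domains = {}
    for s, p, o in schema:
        if p == DOMAIN:
            domains.setdefault(s, set()).add(o)
    return {(s, TYPE, cls) for s, p, o in data for cls in domains.get(p, ())}


type_triples = infer_types(schema_triples, data_triples)
people = sorted(s for s, p, o in type_triples if o == "ex:Person")
print(people)
```

The domain statements describe the data rather than police it: they add class membership instead of rejecting resources that lack it.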

© Copyright 2014 TopQuadrant Inc. Slide 18

Adding more with OWL

Equipment

equipment code    room
X1703             main kitchen
Z0439             cold storage

Room addresses

room              building
main kitchen      98 Main St.
cold storage      14 Broad St.

eq:room         rdfs:subPropertyOf ex:locatedIn .
rmaddr:building rdfs:subPropertyOf ex:locatedIn .

ex:locatedIn a owl:TransitiveProperty .

rmaddr:98MainSt a ex:Building .
eq:X1703 eq:room eq:mainKitchen .
eq:mainKitchen rmaddr:building rmaddr:98MainSt .

© Copyright 2014 TopQuadrant Inc. Slide 19

Query for which building

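# SPARQL query: what building is
# equipment piece X1703 in?

SELECT ?building
WHERE {
  ?building a ex:Building .
  eq:X1703 ex:locatedIn ?building .
}

[Diagram: two "located in" arrows, presumably linking X1703 to the main kitchen and the main kitchen to its building.]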
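Why this query finds a building even though no triple directly connects X1703 to one can be sketched in Python. The sketch hard-codes the two subproperty declarations and applies transitivity until nothing new appears. It is a toy illustration, not an OWL reasoner, and `located_in_closure` is a hypothetical helper.

```python
# The instance triples from the previous slide, as plain tuples.
triples = [
    ("eq:X1703", "eq:room", "eq:mainKitchen"),
    ("eq:mainKitchen", "rmaddr:building", "rmaddr:98MainSt"),
    ("rmaddr:98MainSt", "rdf:type", "ex:Building"),
]


def located_in_closure(data):
    """Return every (thing, place) pair entailed for ex:locatedIn."""
    # Both properties were declared rdfs:subPropertyOf ex:locatedIn.
    located_in_props = {"eq:room", "rmaddr:building", "ex:locatedIn"}
    located = {(s, o) for s, p, o in data if p in located_in_props}
    changed = True
    while changed:  # transitivity: a in b and b in c implies a in c
        changed = False
        for a, b in list(located):
            for c, d in list(located):
                if b == c and (a, d) not in located:
                    located.add((a, d))
                    changed = True
    return located


buildings = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Building"}
answer = sorted(
    place for thing, place in located_in_closure(triples)
    if thing == "eq:X1703" and place in buildings
)
print(answer)
```

The two source tables never had to be joined by hand; the shared superproperty and its transitivity do the joining.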

© Copyright 2014 TopQuadrant Inc. Slide 20

A little more OWL

schema:email a owl:InverseFunctionalProperty .

ex:cust401 cust:GivenName "James" .
ex:cust401 cust:Surname "Smith" .
ex:cust401 cust:Email "jsmith@somecompany.com" .

ex:ca04395 ca:first_name "Jim" .
ex:ca04395 ca:last_name "Smith" .
ex:ca04395 ca:email "jsmith@somecompany.com" .

# A reasoner can then infer:
ex:cust401 owl:sameAs ex:ca04395 .
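The inference behind that owl:sameAs statement can be sketched in Python: an inverse functional property identifies its subject, so two resources sharing a value for it must be the same thing. The sketch assumes the email triples have already been lifted to schema:email through the earlier subproperty declarations. The third resource and its address are invented for contrast, and `same_as_pairs` is a hypothetical helper, not a reasoner API.

```python
from itertools import combinations

triples = [
    ("ex:cust401", "schema:email", "jsmith@somecompany.com"),
    ("ex:ca04395", "schema:email", "jsmith@somecompany.com"),
    ("ex:ca00001", "schema:email", "other@example.org"),  # hypothetical extra attendee
]


def same_as_pairs(data, inverse_functional):
    """Pair up subjects that share a value for an inverse functional property."""
    by_value = {}
    for s, p, o in data:
        if p in inverse_functional:
            by_value.setdefault((p, o), set()).add(s)
    pairs = set()
    for subjects in by_value.values():
        # Sorting gives each pair a stable order, so (a, b) is not repeated as (b, a).
        pairs.update(combinations(sorted(subjects), 2))
    return pairs


pairs = same_as_pairs(triples, {"schema:email"})
print(pairs)
```

"James" and "Jim" end up linked by their shared address even though their given names differ.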

© Copyright 2014 TopQuadrant Inc. Slide 21

What OWL adds to RDFS

RDFS gives you properties to describe your properties, classes, and instances (i.e. your resources)

OWL gives you:

• More properties to describe your resources

• Classes that you can use to describe resources

• The ability to define your own classes that you can use to describe resources

© Copyright 2014 TopQuadrant Inc. Slide 22

Middleware to treat RDBMS as RDF

[Diagram: the Application sends a SPARQL query to Mapping Middleware (e.g. D2R, Ultrawrap), which sends SQL queries to both the Customers and Conference Attendees databases, receives relational results from each, and returns SPARQL query results to the Application. A triplestore holds the schema metadata.]

© Copyright 2014 TopQuadrant Inc. Slide 23

Descriptive vs. Prescriptive schemas

Not rules to follow

– e.g. “Employee must have a first and last name!”

– Other ways to implement constraints

Machine-readable guides to what you’ve got to work with

– Data types

– Relationships to other resources and classes of resources

Metadata!

© Copyright 2014 TopQuadrant Inc. Slide 24

Whose schemas?

Your own schemas can describe what you need from the data you’re using

Standardized schemas (e.g. schema.org, GoodRelations) can tie together your data with data from other sources

Tie together your custom schemas with the subsets of standardized schemas that you’re interested in

Tie together the subsets you’re interested in of different data sets from different sources

© Copyright 2014 TopQuadrant Inc. Slide 25

Top-down or bottom-up schema development?

Whichever you like

I like bottom-up

– (Hey Cyc project: good luck with that!)

Lots of data to deal with?

– Model just enough to drive a simple, proof-of-concept application

– Build the model (schema) a little at a time, then add more to your application

– Connect that model to models of (subsets of) other data sets

© Copyright 2014 TopQuadrant Inc. Slide 26

Who is doing this now?

Pharma

Oil and gas

Publishing

© Copyright 2014 TopQuadrant Inc. Slide 27

TopQuadrant Products and Solutions

[Diagram: the TopBraid Platform (Solution Engine, IDE) with the following Solutions]

• Asset Management

• Search / Content Enrichment

• Master Data Management

• Information Discovery for Life Sciences

• Information Exchange

• Compose your own

TopQuadrant offers configurable, out-of-the box solutions enabling organizations to evolve their information infrastructure into a semantic ecosystem

© Copyright 2014 TopQuadrant Inc. Slide 28

Dynamic Interactive Exploration - Search, Query, Filter, Browse, Navigate, Visualize, Share

Logical Data Warehouse - Flexible, Adaptive Information Structuring

TopBraid Insight™ (TBI)

Connect the dots for new insights. Ease Big Data Variety

© Copyright 2013 TopQuadrant Inc. Slide 29

© Copyright 2014 TopQuadrant Inc. Slide 30

• Tames Big Data to empower businesses

• Offers on-demand integrated access to diverse data, making it possible to discover information just in time

• Delivers new levels of creativity and infrastructure flexibility

TopBraid Insight: Connects the Dots

© Copyright 2014 TopQuadrant Inc. Slide 31

Photo credits

• Volume: (CC BY-NC 2.0) Fabrizio Monti https://www.flickr.com/photos/delphaber/3514894189

• Velocity: (CC BY 2.0) Gabriel https://www.flickr.com/photos/cod_gabriel/1332225362

• Variety: (CC BY-NC-SA 2.0) IRRI Photos https://www.flickr.com/photos/ricephotos/4753359957

© Copyright 2014 TopQuadrant Inc. Slide 32

“A wonderful harmony is created when we join together the seemingly unconnected.” - Heraclitus

Bob DuCharme bducharme@topquadrant.com

Thank you!
