
© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Heterogeneous Data Analyzed at Lightning Speed

Jochen Jörg, Principal Consultant, MarkLogic GmbH

Agile Data Management with an Enterprise NoSQL Database

SLIDE: 2

Agenda

•  MarkLogic Enterprise NoSQL: fundamentals

•  Sample application: aggregation and analysis of heterogeneous data

•  Sample application: architecture and data management

•  MarkLogic Enterprise NoSQL: multi-model approach

SLIDE: 3

MarkLogic Enterprise NoSQL Fundamentals

SLIDE: 4

Data Landscape

[Figure: "Information Continuum" ranging from structured through semi-structured to unstructured data. Tools shown: RDBMS, search engine. Data types shown: relational data, free text, hierarchical relationships, tweets, emails, documents, social media, XML, metadata, content, geodata, JSON, graphs.]

SLIDE: 5

MarkLogic – Enterprise NoSQL Database

Database, Search, Platform

•  Multi-model database: documents and relationships (Semantics)

•  Full-text search, queries, analytics

•  Platform (flexible APIs)

SLIDE: 6

MarkLogic – Document-Oriented Database

•  JSON, XML, text, binary (PDF, images, videos)

•  Document == information container

•  Granular access

   –  Read and write, transactional and consistent

   –  Search, query, analysis

   –  Different output formats ("on the fly")

•  Schema-agnostic → flexibility in the data layer

SLIDE: 7

Document as Information Container

<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <triple>
    <subject>IRIID</subject>
    <predicate>isa</predicate>
    <object>license-plate</object>
  </triple>
  <triple>
    <subject>IRIID</subject>
    <predicate>value</predicate>
    <object>ABC 123</object>
  </triple>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
</SAR>

Metadata, data, relationships, and content

SLIDE: 8

Document as Information Container

[Figure: the SAR document from slide 7 (title, date, type, threat, location with lat 37.497075 and long -122.363319, two triples, description), shown with two callouts: "Unstructured full-text" and "Geospatial Values".]
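The "document as information container" idea can be illustrated with a minimal Python sketch. It assumes the SAR document is well-formed XML with the element names and nesting shown on the slides, uses only the standard library, and does not call any MarkLogic API. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The SAR document from the slides, written out as well-formed XML (assumed nesting).
SAR_XML = """
<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <triple>
    <subject>IRIID</subject>
    <predicate>isa</predicate>
    <object>license-plate</object>
  </triple>
  <triple>
    <subject>IRIID</subject>
    <predicate>value</predicate>
    <object>ABC 123</object>
  </triple>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
</SAR>
"""

root = ET.fromstring(SAR_XML)

# Metadata: simple scalar fields taken from direct children of <SAR>.
metadata = {"date": root.findtext("date"), "type": root.findtext("type")}

# Data: geospatial values, converted to numbers.
location = (
    float(root.findtext("location/lat")),
    float(root.findtext("location/long")),
)

# Relationships: embedded (subject, predicate, object) triples.
triples = [
    (t.findtext("subject"), t.findtext("predicate"), t.findtext("object"))
    for t in root.findall("triple")
]

# Content: unstructured full text.
full_text = root.findtext("description")
```

One document thus carries four kinds of information that would otherwise be spread across separate systems: metadata, geospatial values, relationships, and free text.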

SLIDE: 9

Data Integration with MarkLogic

1  Start with complex, rapidly changing data

2  Load data "as-is" - model and transform data over time

3  Make changes when data changes - enable agile development
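The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `store`, the URIs, the sample records, and the `harmonize` function are all hypothetical and stand in for a real document database.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for a document database: URI -> document.
store = {}

def load_as_is(uri, raw, fmt):
    """Step 2: keep the raw payload untouched, whatever its format."""
    store[uri] = {"format": fmt, "raw": raw}

# Step 1: heterogeneous sources describing the same kind of entity.
load_as_is("/players/1.json", '{"name": "Player A", "team": "Team X"}', "json")
load_as_is(
    "/players/2.xml",
    "<player><fullName>Player B</fullName><club>Team Y</club></player>",
    "xml",
)

def harmonize(doc):
    """Step 3: derive a common view later, without rewriting the raw data."""
    if doc["format"] == "json":
        data = json.loads(doc["raw"])
        return {"name": data["name"], "team": data["team"]}
    root = ET.fromstring(doc["raw"])
    return {"name": root.findtext("fullName"), "team": root.findtext("club")}

for doc in store.values():
    doc["canonical"] = harmonize(doc)

names = sorted(doc["canonical"]["name"] for doc in store.values())
```

Because the raw payload stays in place, `harmonize` can be changed and re-run whenever the sources or the requirements change.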

SLIDE: 10

Sample Application: Aggregation and Analysis of Heterogeneous Data

SLIDE: 11

Showcase: 2014 Football World Cup

•  Building a data-centric application on top of a flexible data layer

•  Consolidation and analysis of heterogeneous sports data

   –  Players

   –  Teams

   –  Matches

SLIDE: 12

Live Demo

SLIDE: 13

Sample Application: Architecture and Data Management

SLIDE: 14

Frameworks & Tools

•  MarkLogic NoSQL Platform: Search & Store

•  Spring MVC, Spring Boot

•  Thymeleaf (templating)

•  Gradle: build tool, managing MarkLogic setup/deployment

•  Apache Camel:

   –  Integration framework

   –  "Swiss Army knife" for data aggregation

SLIDE: 15

Application Development

[Figure: iterative development cycle with the labels: load data sources "as-is" (XML, JSON, binary); search, transform, combine data; definition of scalar indexes; data access; web application; user interface; iterate.]

== Agile process

SLIDE: 16

Application Architecture – Main Components

[Figure: three tiers connected via HTTP.]

Client tier: HTML5, Bootstrap, JavaScript, jQuery

Middle tier (Java): View; MatchesController, PlayersController, …; Abstract Base Repository with MatchRepository, PlayerRepository, TeamRepository, …; <Façade> MarkLogic Connections; MarkLogic Java Client API

Data tier (MarkLogic): Search Options
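The layering of repositories over a connection façade can be sketched as follows. The showcase itself is written in Java; this Python version only illustrates the pattern. The `Connection` class is a hypothetical in-memory stand-in and does not reproduce the MarkLogic Java Client API, and all method names and URI conventions are assumptions.

```python
from abc import ABC, abstractmethod

class Connection:
    """Hypothetical façade over the database client; here an in-memory dict."""

    def __init__(self):
        self._docs = {}

    def write(self, uri, doc):
        self._docs[uri] = doc

    def read(self, uri):
        return self._docs.get(uri)

    def search(self, prefix):
        # Return all documents whose URI starts with the given prefix.
        return [doc for uri, doc in sorted(self._docs.items()) if uri.startswith(prefix)]

class AbstractBaseRepository(ABC):
    """Shared persistence logic; subclasses only name their directory."""

    def __init__(self, connection):
        self.connection = connection

    @property
    @abstractmethod
    def directory(self):
        ...

    def save(self, key, doc):
        self.connection.write(f"{self.directory}{key}.json", doc)

    def find(self, key):
        return self.connection.read(f"{self.directory}{key}.json")

    def find_all(self):
        return self.connection.search(self.directory)

class MatchRepository(AbstractBaseRepository):
    directory = "/matches/"

class PlayerRepository(AbstractBaseRepository):
    directory = "/players/"

conn = Connection()
matches = MatchRepository(conn)
players = PlayerRepository(conn)
matches.save("final", {"home": "Team X", "away": "Team Y"})
players.save("p1", {"name": "Player A"})
```

Controllers would depend only on the repositories, so the database client can be swapped or mocked behind the façade without touching the web layer.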

SLIDE: 17

MarkLogic Enterprise NoSQL Multi-Model Approach

SLIDE: 18

MarkLogic – Search

•  Wildcard, stemming, advanced language support

•  Facets, snippets, highlighting

•  Proximity boosting

•  Relevance ranking, sorting

•  Alerting, geospatial

•  Triple search, analytics

SLIDE: 19

MarkLogic Semantics – Triples

A triple expresses a relationship or a property as a fact (Subject – Predicate – Object):

David Bowie (Subject) is-a (Predicate) Singer (Object)

David Bowie (Subject) is-born-in (Predicate) London (Object)

London (Subject) is-capital-of (Predicate) England (Object)
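As a minimal sketch, the three facts can be held as Python tuples and queried by pattern. The `match` helper is hypothetical and only illustrates the idea; in a triple store such patterns would typically be written in SPARQL.

```python
# The three facts from the slide as (subject, predicate, object) tuples.
triples = [
    ("David Bowie", "is-a", "Singer"),
    ("David Bowie", "is-born-in", "London"),
    ("London", "is-capital-of", "England"),
]

def match(pattern, facts):
    """Return all facts matching a pattern; None acts as a wildcard."""
    return [
        fact
        for fact in facts
        if all(p is None or p == value for p, value in zip(pattern, fact))
    ]

# Everything known about David Bowie.
about_bowie = match(("David Bowie", None, None), triples)

# All "is-capital-of" relationships.
capitals = match((None, "is-capital-of", None), triples)
```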

SLIDE: 20

MarkLogic as a Semantic Triple Store

•  Database for billions of triples

•  Triples and documents combined seamlessly in one database

•  Combination of:

   –  Document metadata

   –  Linked Data (private, public, e.g. DBpedia)

→ new solution approaches

SLIDE: 21

Semantics – Use Cases

•  Context-aware search and search results

•  Improved discovery of information

•  Linking of data

•  Automatic derivation of new facts ("inferencing")

•  Context-aware aggregation and delivery of data (Dynamic Semantic Publishing)

...
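Inferencing can be illustrated with a small forward-chaining sketch in Python. The rule and the predicate `is-born-in-country` are hypothetical and chosen only for this example; the code does not show how MarkLogic implements inference.

```python
# Known facts as (subject, predicate, object) tuples.
facts = {
    ("David Bowie", "is-born-in", "London"),
    ("London", "is-capital-of", "England"),
}

def infer(facts):
    """Apply one hypothetical rule until no new facts appear (forward chaining).

    Rule: X is-born-in Y and Y is-capital-of Z  =>  X is-born-in-country Z
    """
    derived = set(facts)
    while True:
        new = {
            (x, "is-born-in-country", z)
            for (x, p1, y) in derived if p1 == "is-born-in"
            for (y2, p2, z) in derived if p2 == "is-capital-of" and y2 == y
        }
        if new <= derived:
            # Nothing new was derived in this pass: fixed point reached.
            return derived
        derived |= new

all_facts = infer(facts)
```

The derived fact was never stored explicitly; it follows from the two stored facts and the rule.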

SLIDE: 22

MarkLogic – Multi-Model Approach

[Figure: document model + graph/RDF model > relational model, on an enterprise foundation.]

Document model: flexible data storage; agile, fast application development

Graph/RDF model: "unlimited" relations; efficient, flexible linking; context

Relational model: flexible queries; strict schema

Enterprise: search, query, analysis; horizontal scalability

SLIDE: 23

Programming Interfaces – APIs

[Figure: three layers: client tier (content/data consumers), middle tier, database layer. Interfaces shown: JavaScript, XQuery/XSLT, REST API + extensions, SQL via ODBC, Java API, Node.js API.]
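As a hedged illustration of the REST interface, the sketch below only builds request URLs and sends nothing. It assumes the `/v1/documents` and `/v1/search` endpoints of the MarkLogic REST Client API; the host, port, document URI, and helper names are hypothetical, and the parameters supported should be checked against the REST API reference for the version in use.

```python
from urllib.parse import urlencode

# Hypothetical host and port of a REST API instance.
BASE_URL = "http://localhost:8000"

def document_url(uri):
    """URL addressing a single document by its database URI."""
    return f"{BASE_URL}/v1/documents?{urlencode({'uri': uri})}"

def search_url(query, fmt="json"):
    """URL for a string search, asking for the response in the given format."""
    return f"{BASE_URL}/v1/search?{urlencode({'q': query, 'format': fmt})}"

doc = document_url("/matches/final.json")
search = search_url("world cup")
```

Any HTTP client in the middle or client tier could issue these requests, which is what makes the REST layer usable from languages without a dedicated client library.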

SLIDE: 24

Bitemporal Data Management: Versioning and Historization of Time-Dependent Data

•  Data analysis along two time axes

   –  Valid time

   –  System time at the moment the data was changed ("recording time")

→  Which (time-dependent) value was valid on a given date for a given period?

→  Reconstructing decisions, auditing

SLIDE: 25

Bitemporal Data Management: Historization of Time-Dependent Data

•  Management of time-dependent data

•  When is a given contract, value, insurance policy, or share price valid?

•  When was a change to this time-dependent data made?
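The two time axes can be sketched in Python as follows. The data model, the sample share-price history, and the `value_at` helper are assumptions made for illustration; they do not reproduce MarkLogic's temporal API.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Version:
    value: float
    valid_from: date    # valid time: when the value applies in the real world
    valid_to: date
    system_from: date   # system time: when the database recorded this version
    system_to: date

FOREVER = date.max

# Hypothetical history of one share price, corrected on 2015-03-10:
# the price was first recorded as 100.0 with open-ended validity, then
# corrected to 120.0 for the period starting 2015-02-01.
history = [
    Version(100.0, date(2015, 1, 1), FOREVER, date(2015, 1, 1), date(2015, 3, 10)),
    Version(100.0, date(2015, 1, 1), date(2015, 2, 1), date(2015, 3, 10), FOREVER),
    Version(120.0, date(2015, 2, 1), FOREVER, date(2015, 3, 10), FOREVER),
]

def value_at(history, valid_on, known_on):
    """Value valid on `valid_on`, as the database knew it on `known_on`."""
    for v in history:
        if (v.valid_from <= valid_on < v.valid_to
                and v.system_from <= known_on < v.system_to):
            return v.value
    return None

# Same valid date, two different points in system time.
before_correction = value_at(history, date(2015, 2, 15), date(2015, 3, 1))
after_correction = value_at(history, date(2015, 2, 15), date(2015, 3, 20))
```

Old versions are never overwritten, only closed on the system-time axis, so an auditor can reconstruct what was known at the time a decision was made.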

SLIDE: 26

Scalability and Elasticity: Horizontal, Without Downtime

•  MarkLogic is designed for horizontal scalability in a cluster

•  Enables consistently high performance as:

   –  Data volume grows

   –  Load/traffic increases

[Figure: cluster of E-nodes and D-nodes. Add nodes to scale out. Automated failover.]

SLIDE: 27

Geospatial Support

Full-text Search

Flexible Indexes

Native JSON Store

Native XML Store

Real-time Alerting

Native RDF Triple Store

Bitemporal

Tiered Storage

Fully Transactional

Server-side JavaScript

Hadoop and HDFS

Cloud Ready (AWS)

SQL Support

Scalable and Elastic

MarkLogic Content Pump

REST API

Samplestack

Ad-hoc Queries

Schema Agnostic

XA Transactions

24/7 Engineering Support

LDAP and Kerberos Security

Security Certifications

Configuration Management

Monitoring and Management

Performance at scale

Customizable Failover

Customizable Backup

Atomic Forests

Point-in-time Recovery

ACID Transactions

Index Across Data Types

Flexible Replication

Semantic Inference

Multi-OS Support

POWERFUL AGILE TRUSTED

MarkLogic / Enterprise NoSQL Database Platform

SLIDE: 28

Our Customers

•  Aggregate and consolidate

•  Effective access to content

•  Flexibility and adaptability in data storage

•  New business models

•  Data analysis

•  Simplified infrastructure and operations

SLIDE: 29

MarkLogic – The Company

CEO & Founder

Gary Bloom, CEO, joined May 2012; previously Veritas, Oracle, eMeter

Christopher Lindblad, founded MarkLogic in 2001; previously InfoSeek

Management

Elisa Smith, VP and General Counsel; previously eMeter, Pillsbury Winthrop

Jon Bakke, SVP Services; previously Oracle

David Ponzini, SVP Alliances & Business Development; previously Inktomi, PeopleSoft

Peter Norman, CFO; previously Chordiant, KPMG

Robert Roepke, VP Finance & HR; previously Yahoo!, Chordiant

Michaline Todd, CMO; previously Veritas, Serena

Joe Pasqua, SVP Product Strategy; previously Symantec, Adobe

David Gorbet, VP Engineering; previously Microsoft

Investors

Pat Grady, Board Member

Tom Banahan, Board Member

Brent Jones

Offices

Headquarters: San Carlos, CA (Silicon Valley)

Regional offices: New York, Washington DC, London, Frankfurt, Munich (Berlin), Utrecht, Tokyo, Paris, Stockholm, Sydney, Singapore

400 employees

420 enterprise customers

Local presence in Germany

SLIDE: 30

Thank you!

Jochen Jörg

[email protected]

@jochenjoerg

de.linkedin.com/in/jochenjoerg

MarkLogic http://www.marklogic.com

http://developer.marklogic.com