InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Post on 27-Jan-2015

Category:

Technology

DESCRIPTION

This presentation is from Warren Davidson, Director of Business Development, and Darren Wood, InfiniteGraph chief architect. The October 21, 2010 webinar, hosted by DBTA with InfiniteGraph and Riptano, covered new data technologies and how the NoSQL ("Not Only SQL") approach helps address the more complex application, scalability and performance requirements of handling vast amounts of data, and of performing advanced analytics on those data volumes with greater ease and speed.

TRANSCRIPT

October 21, 2010

Warren Davidson wdavidson@infinitegraph.com
Darren Wood dwood@infinitegraph.com
InfiniteGraph www.infinitegraph.com

Agenda

• The NoSQL Landscape
• InfiniteGraph
• Solving what problems and how?

Copyright © InfiniteGraph

Some NoSQL Notes

• NoSQL = Not Only SQL

• NoSQL is requirements driven

• NoSQL = open source?

• NoSQL = cloud computing?

The NoSQL Landscape

[Figure: the NoSQL landscape, with axes labeled Complexity and Performance. Cassandra and InfiniteGraph are called out, and the Graph Databases group is annotated with Social Network Analysis and Intelligence Community.]

• Key Value Stores
  – Voldemort – LinkedIn
  – Dynamo – Amazon
• BigTable Clones
  – Cassandra – Facebook
  – HBase – Apache/Hadoop
  – Hypertable
• Document databases
  – CouchDB – Apache
  – MongoDB
• Graph Databases
  – Neo4j
  – HypergraphDB
  – AllegroGraph
  – Sones

Graph Databases

• A graph database is used to trace relationships among entities, most commonly people, to any depth. Its characteristics are:
  – Very simple, fixed schema
  – Very complex data relationships
  – Used to support complex associations among like entities.

[Figure: Meta-Model Instance Example (simplified). A graph of people (John Jones, Jane Jones-Smith, Nancy Jones, Paul Jones, Doris Smith, Jim Smith, Jeff Smith) with callouts labeling a Node, an Edge, and the Attribute(s) of Jeff Smith.]

InfiniteGraph

A business unit of Objectivity

• In the business of distributed data management for over 10 years

• Solving graph data problems for over 8 years

• Focusing on the emerging requirements of graph data for cloud and on-premise distributed systems

Graphs are everywhere

Enterprise and government 2.0, bio-engineering, gene sequencing, drug development…..

LinkedIn, Facebook….

Social network analytics, social CRM….

Network analysis, complex BoM, predictive and real-time ISR, fraud detection and response….

Graph Databases – What’s So Different?

Darren Wood
Chief Architect, InfiniteGraph

Graph Databases

• Key technical attributes
• How InfiniteGraph addresses these
• Query and navigation
• Challenges/Requirements of Distribution
• Practical applications

Graph Databases

• Optimized around data relationships
  – Relationships as first class citizens
  – Super fast navigation between entities
  – Rich/flexible annotation of connections
• Small focused API (typically not SQL)
  – Natively work with concepts of Vertex/Edge
  – SQL has no concept of “navigation”
  – Most attempts based in SQL are convoluted
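
To make "relationships as first class citizens" concrete, here is a minimal Python sketch of a Vertex/Edge model in which each edge is its own annotated object and navigation is a direct hop rather than a join. The class and function names (Vertex, Edge, connect, neighbours) are hypothetical and chosen for illustration; this is not the InfiniteGraph API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    name: str
    # Every edge touching this vertex, kept on the vertex for direct navigation.
    edges: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    kind: str                                  # e.g. "Met", "Called"
    source: Vertex
    target: Vertex
    attrs: dict = field(default_factory=dict)  # free-form annotation of the connection


def connect(source, target, kind, **attrs):
    """Create an edge and register it on both endpoints."""
    edge = Edge(kind, source, target, attrs)
    source.edges.append(edge)
    target.edges.append(edge)
    return edge


def neighbours(vertex):
    """Navigate one hop: follow each edge to the vertex at its other end."""
    return [e.target if e.source is vertex else e.source for e in vertex.edges]


alice, bob, carlos = Vertex("Alice"), Vertex("Bob"), Vertex("Carlos")
connect(alice, bob, "Met", place="Denver", date="5-27-10")
connect(bob, carlos, "Called", time="13:20", duration=25)

print([v.name for v in neighbours(bob)])  # ['Alice', 'Carlos']
```

The point of the sketch is that a hop costs one list lookup on the vertex, however many relationship types exist.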

Physical Storage Comparison

Rows/Columns/Tables

Meetings
P1      P2       Place    Time
Alice   Bob      Denver   5-27-10

Calls
From    To       Time     Duration
Bob     Carlos   13:20    25
Bob     Charlie  17:10    15

Payments
From    To       Date     Amount
Carlos  Charlie  5-12-10  100000

Relationship/Graph Optimized

Alice –[Met 5-27-10]– Bob
Bob –[Called 13:20]– Carlos
Bob –[Called 17:10]– Charlie
Carlos –[Paid 100000]– Charlie
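
The difference between the two layouts can be sketched in Python. In this illustrative example, the rows are taken from the slide, while the function names and the in-memory lists standing in for tables are assumptions. Answering "who is Bob connected to?" in the relational layout means scanning every relationship table, whereas the graph layout keeps each person's connections next to that person.

```python
from collections import defaultdict

# Relational layout: one table per relationship type (rows from the slide).
TABLES = {
    "Met": [("Alice", "Bob", "Denver", "5-27-10")],
    "Called": [("Bob", "Carlos", "13:20", 25), ("Bob", "Charlie", "17:10", 15)],
    "Paid": [("Carlos", "Charlie", "5-12-10", 100000)],
}


def connections_relational(person):
    """Scan every table and match the person against both endpoint columns."""
    found = []
    for label, rows in TABLES.items():
        for a, b, *_ in rows:
            if person == a:
                found.append((label, b))
            elif person == b:
                found.append((label, a))
    return found


# Graph-optimized layout: connections are stored with each person.
ADJACENCY = defaultdict(list)
for label, rows in TABLES.items():
    for a, b, *_ in rows:
        ADJACENCY[a].append((label, b))
        ADJACENCY[b].append((label, a))


def connections_graph(person):
    """A single lookup, independent of how many relationship types exist."""
    return ADJACENCY.get(person, [])


print(connections_graph("Bob"))
```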

Query and Navigation

• Queries – but not as you know them
• More like a rules-based search and discovery
• Asynchronous Results

[Figure: Alice –Meets– Bob –Calls– Carlos –Pays– Charlie, with a second Calls edge from Bob to Charlie.]

“Find all paths between Alice and Charlie”

“Find all paths between Alice and Charlie – within 2 degrees”

“Find all paths between Alice and Charlie – events in May 2010”
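
The three example queries can be sketched as one path search with two optional rules: a hop limit and an edge filter. This is a minimal Python illustration, not InfiniteGraph's query API. The function name find_paths and its parameters are hypothetical, and the two call dates are assumptions, since the slides give only times for the calls.

```python
from datetime import date

# (person, person, relationship, event date); edges are treated as undirected.
EDGES = [
    ("Alice", "Bob", "Meets", date(2010, 5, 27)),
    ("Bob", "Carlos", "Calls", date(2010, 5, 20)),   # date assumed
    ("Bob", "Charlie", "Calls", date(2010, 6, 2)),   # date assumed
    ("Carlos", "Charlie", "Pays", date(2010, 5, 12)),
]


def find_paths(start, goal, max_hops=None, keep=lambda edge: True):
    """Return every simple path from start to goal as a list of names."""
    adjacency = {}
    for edge in EDGES:
        if keep(edge):
            a, b = edge[0], edge[1]
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)

    paths = []

    def walk(path):
        node = path[-1]
        if node == goal:
            paths.append(list(path))
            return
        if max_hops is not None and len(path) - 1 >= max_hops:
            return  # hop budget used up before reaching the goal
        for nxt in adjacency.get(node, []):
            if nxt not in path:  # never revisit a vertex on the same path
                path.append(nxt)
                walk(path)
                path.pop()

    walk([start])
    return paths


def in_may_2010(edge):
    return edge[3].year == 2010 and edge[3].month == 5


print(find_paths("Alice", "Charlie"))
print(find_paths("Alice", "Charlie", max_hops=2))
print(find_paths("Alice", "Charlie", keep=in_may_2010))
```

A production system would more likely stream each path to the caller as it is found, in line with the "Asynchronous Results" bullet, instead of collecting them all in a list.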

Management of Large Data Graphs

• Graphs grow quickly
  – Billions of phone calls / day in US
  – Emails, social media events, IP Traffic
  – Financial transactions
• Some analytics require navigation of large sections of the graph
• Each step (often) depends on the last
• Must distribute data and go parallel
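
One way to picture "each step depends on the last" alongside "go parallel" is a level-synchronous breadth-first search: reads within one level are independent and can run concurrently, but the next level cannot start until the current one finishes. The Python sketch below is illustrative only. The sample graph, the fetch_neighbours stand-in for a remote read, and the thread pool are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

GRAPH = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Carlos", "Charlie"],
    "Carlos": ["Bob", "Charlie"],
    "Charlie": ["Bob", "Carlos"],
}


def fetch_neighbours(vertex):
    """Stand-in for a (possibly remote) read of one vertex's edges."""
    return GRAPH.get(vertex, [])


def hops_from(start, workers=4):
    """Level-synchronous breadth-first search returning hop counts."""
    distance = {start: 0}
    frontier = [start]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Reads within one level are independent, so they run in parallel.
            results = list(pool.map(fetch_neighbours, frontier))
            # The next level depends on these results, so it has to wait.
            next_frontier = []
            for vertex, found in zip(frontier, results):
                for n in found:
                    if n not in distance:
                        distance[n] = distance[vertex] + 1
                        next_frontier.append(n)
            frontier = next_frontier
    return distance


print(hops_from("Alice"))
```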

Graph Partitioning

• Graph partitioning is not as simple
• Graph operations are rarely partition bound
• Graphs are ‘alive’
• Repartitioning is expensive
• Partitions must co-operate
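
A small Python sketch shows why graph operations are rarely partition bound: once vertices are spread across partitions, some edges have their endpoints in different partitions, and any navigation over those edges needs the partitions to co-operate. The hash-based placement below (zlib.crc32 modulo the partition count) is an assumption for illustration, not how InfiniteGraph places data.

```python
import zlib

EDGES = [
    ("Alice", "Bob"),
    ("Bob", "Carlos"),
    ("Bob", "Charlie"),
    ("Carlos", "Charlie"),
]


def partition_of(vertex, partitions):
    """Deterministic placement of a vertex (assumed scheme: CRC32 mod N)."""
    return zlib.crc32(vertex.encode("utf-8")) % partitions


def cut_edges(edges, partitions):
    """Edges whose endpoints land in different partitions."""
    return [
        (a, b)
        for a, b in edges
        if partition_of(a, partitions) != partition_of(b, partitions)
    ]


for n in (1, 2, 4):
    crossing = cut_edges(EDGES, n)
    print(f"{n} partition(s): {len(crossing)} of {len(EDGES)} edges cross a boundary")
```

With a single partition no edge crosses a boundary. With more partitions the count depends on where each vertex happens to land, which is why placement matters and why moving vertices later is costly.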

Graph Partitioning – Reality!

[Figure: Application(s) sit on a Distributed API that spans Partition 1, Partition 2, Partition 3, … Partition n, each with its own Processor.]

Distributed Graph Must Haves

• High performance distributed persistence
• Ability to deal with remote data reads (fast)
• Intelligent local cache of subgraphs
• Distributed navigation processing
• Distributed, multi-source concurrent ingest
• Write modes supporting both strict and eventual consistency
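
As an illustration of the "local cache of subgraphs" item, the Python sketch below keeps recently used adjacency lists in a small least-recently-used cache so that repeated navigation through the same vertices avoids a remote read. The class name SubgraphCache, the read_remote stand-in, and the LRU eviction policy are assumptions; the slides do not say how InfiniteGraph's cache works.

```python
from collections import OrderedDict

REMOTE_GRAPH = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Carlos", "Charlie"],
    "Carlos": ["Bob", "Charlie"],
}
remote_reads = []  # records every vertex that had to be fetched remotely


def read_remote(vertex):
    """Stand-in for a read from another partition."""
    remote_reads.append(vertex)
    return REMOTE_GRAPH.get(vertex, [])


class SubgraphCache:
    """Least-recently-used cache of adjacency lists."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch
        self._capacity = capacity
        self._entries = OrderedDict()

    def neighbours(self, vertex):
        if vertex in self._entries:
            self._entries.move_to_end(vertex)  # mark as most recently used
            return self._entries[vertex]
        found = self._fetch(vertex)
        self._entries[vertex] = found
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return found


cache = SubgraphCache(read_remote, capacity=2)
for name in ["Bob", "Carlos", "Bob", "Alice", "Carlos"]:
    cache.neighbours(name)

print(remote_reads)  # ['Bob', 'Carlos', 'Alice', 'Carlos']
```

The second lookup of Bob is served locally. Carlos is read twice because it was evicted when Alice was loaded into the two-entry cache.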

Practical Applications

Graph Analysis (Algorithms)

• Social Networks
  – Most connected participants
  – Influencers
  – Important Syndicates or Sub-networks
• Central figures in crime organisations
• Business Intelligence
  – Discovering Knowledge Assets
  – Complex analytics
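
"Most connected participants" is the simplest of these analyses: count the edges touching each vertex (its degree) and rank the results. The Python sketch below uses the four-person example from the earlier slides. Degree is only one possible measure of importance, and the function name is hypothetical.

```python
from collections import Counter

EDGES = [
    ("Alice", "Bob"),
    ("Bob", "Carlos"),
    ("Bob", "Charlie"),
    ("Carlos", "Charlie"),
]


def degree_counts(edges):
    """Number of edges touching each vertex."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


counts = degree_counts(EDGES)
print(counts.most_common(1))  # [('Bob', 3)]
```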

Graph Analysis (Patterns)

• Crime (again)
  – Recognize common patterns of activity
  – Complex chains of interaction
• Security
  – Recognize attack/threat patterns
  – Auditing / log analytics
• Targeting Advertising
  – To specific browsing patterns
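
A "chain of interaction" pattern can be sketched as a sequence of edge labels that must appear along consecutive directed edges. The Python example below is illustrative: the function name match_chain is hypothetical, and the edge directions are assumed from the From/To columns of the earlier tables (the meeting is arbitrarily directed from Alice to Bob).

```python
# Directed, labeled edges: (from, to, label).
EDGES = [
    ("Alice", "Bob", "Met"),
    ("Bob", "Carlos", "Called"),
    ("Bob", "Charlie", "Called"),
    ("Carlos", "Charlie", "Paid"),
]


def match_chain(edges, pattern):
    """Return vertex chains whose consecutive edges carry the labels in pattern."""
    by_label = {}
    for a, b, label in edges:
        by_label.setdefault(label, []).append((a, b))

    # Start with every edge that matches the first label.
    chains = [[a, b] for a, b in by_label.get(pattern[0], [])]
    # Extend each partial chain by one matching edge per remaining label.
    for label in pattern[1:]:
        chains = [
            chain + [b]
            for chain in chains
            for a, b in by_label.get(label, [])
            if a == chain[-1] and b not in chain
        ]
    return chains


print(match_chain(EDGES, ["Met", "Called", "Paid"]))
# [['Alice', 'Bob', 'Carlos', 'Charlie']]
```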

Many, Many More!

• Spatial data
• Defence / Situational Awareness
• Sciences
• Health Care
• Genealogy
• Logistics
• Tracking

Thank you!

darren.wood@infinitegraph.com
wdavidson@infinitegraph.com

Twitter - @infinitegraph