InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

October 21, 2010 Warren Davidson [email protected] Darren Wood [email protected] InfiniteGraph www.infinitegraph.com

Upload: infinitegraph

Post on 27-Jan-2015

DESCRIPTION

Here is the presentation from Warren Davidson, Director of Business Development, and Darren Wood, InfiniteGraph chief architect. The October 21, 2010 webinar, hosted by DBTA with InfiniteGraph and Riptano, covered new data technologies and how the NoSQL ("Not Only SQL") approach helps address complex application, scalability, and performance requirements when handling vast amounts of data and performing advanced analytics on those data volumes with greater ease and speed.

TRANSCRIPT

Page 1: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

October 21, 2010

Warren Davidson [email protected], Darren Wood [email protected], www.infinitegraph.com

Page 2: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Agenda

• The NoSQL Landscape
• InfiniteGraph
• Solving what problems and how?

Copyright © InfiniteGraph

Page 3: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Some NoSQL Notes

• NoSQL = Not Only SQL

• NoSQL is requirements driven

• NoSQL = open source?

• NoSQL = cloud computing?

Page 4: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Company Confidential

The NoSQL Landscape

Cassandra

InfiniteGraph

Page 5: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

NoSQL Landscape

• Key Value Stores: Voldemort (LinkedIn), Dynamo (Amazon)
• BigTable Clones: Cassandra (Facebook), HBase (Apache/Hadoop), Hypertable
• Document Databases: CouchDB (Apache), MongoDB
• Graph Databases: Neo4j, HypergraphDB, AllegroGraph, Sones
– Social Network Analysis, Intelligence Community

[Diagram axes: Complexity, Performance]

Page 6: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Graph Databases

• A graph database is used to trace relationships among entities, most commonly people, to any depth. Its characteristics are:
– Very simple, fixed schema
– Very complex data relationships
– Used to support complex associations among like entities

[Figure: Meta-Model Instance Example (simplified). A graph of people (John Jones, Jane Jones-Smith, Nancy Jones, Paul Jones, Doris Smith, Jim Smith, Jeff Smith) with callouts for Node, Edge, and Attribute(s).]

Page 7: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

InfiniteGraph

A business unit of Objectivity

• In the business of distributed data management for over 10 years

• Solving graph data problems for over 8 years

• Focusing on the emerging requirements of graph data for cloud and on-premise distributed systems

Page 8: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Graphs are everywhere

Enterprise and government 2.0, bio-engineering, gene sequencing, drug development…

LinkedIn, Facebook… Social network analytics, social CRM…

Network analysis, complex BoM, predictive and real-time ISR, fraud detection and response…

Page 9: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Graph Databases – What’s So Different?

Darren Wood
Chief Architect, InfiniteGraph

Page 10: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Graph Databases

• Key technical attributes
• How InfiniteGraph addresses these
• Query and navigation
• Challenges/Requirements of Distribution
• Practical applications

Page 11: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Graph Databases

• Optimized around data relationships
– Relationships as first class citizens
– Super fast navigation between entities
– Rich/flexible annotation of connections

• Small focused API (typically not SQL)
– Natively work with concepts of Vertex/Edge
– SQL has no concept of “navigation”
– Most attempts based in SQL are convoluted

Page 12: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Physical Storage Comparison

Rows/Columns/Tables

Meetings
P1 | P2 | Place | Time
Alice | Bob | Denver | 5-27-10

Calls
From | To | Time | Duration
Bob | Carlos | 13:20 | 25
Bob | Charlie | 17:10 | 15

Payments
From | To | Date | Amount
Carlos | Charlie | 5-12-10 | 100000

Relationship/Graph Optimized

Alice -[Met 5-27-10]- Bob
Bob -[Called 13:20]-> Carlos
Bob -[Called 17:10]-> Charlie
Carlos -[Paid 100000]-> Charlie
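The contrast on this slide can be sketched in a few lines of Python. This is a toy illustration only, not InfiniteGraph's storage format or API: the function names (`contacts_from_tables`, `build_graph`, `contacts_from_graph`) are made up for the example, and edge direction is ignored for brevity. The row-oriented lookup has to scan every table, while the graph-oriented lookup reads one vertex's edge list.

```python
from collections import defaultdict

# Row-oriented view: one list of tuples per table, using the slide's data.
MEETINGS = [("Alice", "Bob", "Denver", "5-27-10")]          # P1, P2, Place, Time
CALLS = [("Bob", "Carlos", "13:20", 25),                    # From, To, Time, Duration
         ("Bob", "Charlie", "17:10", 15)]
PAYMENTS = [("Carlos", "Charlie", "5-12-10", 100000)]       # From, To, Date, Amount


def contacts_from_tables(person):
    """Scan every table for rows that mention the person."""
    found = []
    for label, table in (("Met", MEETINGS), ("Called", CALLS), ("Paid", PAYMENTS)):
        for a, b, *_ in table:
            if a == person:
                found.append((label, b))
            elif b == person:
                found.append((label, a))
    return found


def build_graph():
    """Graph-oriented view: each vertex keeps its own list of annotated edges."""
    graph = defaultdict(list)

    def connect(a, b, label, **props):
        graph[a].append((label, b, props))
        graph[b].append((label, a, props))

    for a, b, place, when in MEETINGS:
        connect(a, b, "Met", place=place, time=when)
    for a, b, when, duration in CALLS:
        connect(a, b, "Called", time=when, duration=duration)
    for a, b, when, amount in PAYMENTS:
        connect(a, b, "Paid", date=when, amount=amount)
    return graph


def contacts_from_graph(graph, person):
    """Navigation is a single lookup of the vertex's own edge list."""
    return [(label, other) for label, other, _ in graph.get(person, [])]
```

Both lookups return the same contacts for Bob; the difference is that the table version touches every row of every table, while the graph version touches only Bob's edges.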

Page 13: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Query and Navigation

• Queries – but not as you know them
• More like a rules-based search and discovery
• Asynchronous results

[Diagram: Alice -Meets- Bob -Calls- Carlos -Pays- Charlie, and Bob -Calls- Charlie]

“Find all paths between Alice and Charlie”

“Find all paths between Alice and Charlie – within 2 degrees”

“Find all paths between Alice and Charlie – events in May 2010”
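The three example queries can be sketched as one depth-first path search with an optional hop limit and an edge rule. This is a minimal sketch in plain Python, not InfiniteGraph's query API. The names `build_adjacency`, `find_paths`, `max_hops`, `edge_ok`, and `in_may_2010` are hypothetical. The slides give only times of day for the two calls, so the call dates below are invented for illustration. Edges are treated as undirected, and a generator stands in loosely for asynchronous results by yielding each path as soon as it is found.

```python
from collections import defaultdict
from datetime import date

# (from, to, kind, date). Meeting and payment dates come from the slide;
# the two call dates are ASSUMED, since the slide lists only times of day.
EDGES = [
    ("Alice", "Bob", "Met", date(2010, 5, 27)),
    ("Bob", "Carlos", "Called", date(2010, 5, 28)),    # assumed date
    ("Bob", "Charlie", "Called", date(2010, 6, 2)),    # assumed date
    ("Carlos", "Charlie", "Paid", date(2010, 5, 12)),
]


def build_adjacency(edges):
    """Index edges by vertex, in both directions."""
    adj = defaultdict(list)
    for src, dst, kind, when in edges:
        adj[src].append((dst, kind, when))
        adj[dst].append((src, kind, when))
    return adj


def find_paths(adj, start, goal, max_hops=None, edge_ok=lambda kind, when: True):
    """Yield every simple path from start to goal that satisfies the rules."""
    def walk(vertex, path):
        if vertex == goal:
            yield list(path)
            return
        if max_hops is not None and len(path) - 1 >= max_hops:
            return  # hop budget used up before reaching the goal
        for nxt, kind, when in adj.get(vertex, []):
            if nxt not in path and edge_ok(kind, when):
                path.append(nxt)
                yield from walk(nxt, path)
                path.pop()

    yield from walk(start, [start])


def in_may_2010(kind, when):
    """Edge rule for the third query: keep only events dated May 2010."""
    return (when.year, when.month) == (2010, 5)
```

With this data, the unrestricted search yields two paths (via Bob directly, and via Bob then Carlos), `max_hops=2` keeps only the direct one through Bob, and `edge_ok=in_may_2010` keeps only the one through Carlos, because the assumed date of the Bob to Charlie call falls in June.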

Page 14: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Management of Large Data Graphs

• Graphs grow quickly
– Billions of phone calls / day in US
– Emails, social media events, IP traffic
– Financial transactions
• Some analytics require navigation of large sections of the graph
• Each step (often) depends on the last
• Must distribute data and go parallel

Page 15: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Graph Partitioning

• Graph partitioning is not as simple
• Graph operations are rarely partition bound
• Graphs are ‘alive’
• Repartitioning is expensive
• Partitions must co-operate

Page 16: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Graph Partitioning – Reality!

[Diagram: Application(s); Distributed API; Partition 1, Partition 2, Partition 3, Partition ...n, each paired with a Processor]
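Why graph operations are rarely partition bound can be shown with a small sketch: assign vertices to partitions, then count the edges whose endpoints land on different partitions. This is an illustrative sketch only and does not describe how InfiniteGraph places data. The function names `hash_partition` and `split_edges` are hypothetical, and `zlib.crc32` is used simply because it is a stable hash available in the standard library.

```python
import zlib

# Undirected edges from the deck's Alice/Bob/Carlos/Charlie example.
EDGES = [
    ("Alice", "Bob"),
    ("Bob", "Carlos"),
    ("Bob", "Charlie"),
    ("Carlos", "Charlie"),
]


def hash_partition(vertex, partitions):
    """Pick a partition with a stable hash (Python's built-in hash() is salted per run)."""
    return zlib.crc32(vertex.encode("utf-8")) % partitions


def split_edges(edges, assignment):
    """Separate edges that stay inside one partition from edges that cross partitions."""
    local, cut = [], []
    for a, b in edges:
        (local if assignment[a] == assignment[b] else cut).append((a, b))
    return local, cut


# A hand-picked two-way split (an assumption for the example).
manual = {"Alice": 0, "Bob": 0, "Carlos": 1, "Charlie": 1}
local_edges, cut_edges = split_edges(EDGES, manual)
```

Even with this hand-picked split, half of the edges cross the partition boundary, so any path from Alice to Charlie needs data from both partitions. That is the sense in which the partitions must co-operate.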

Page 17: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Distributed Graph Must Haves

• High performance distributed persistence
• Ability to deal with remote data reads (fast)
• Intelligent local cache of subgraphs
• Distributed navigation processing
• Distributed, multi-source concurrent ingest
• Write modes supporting both strict and eventual consistency
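Two of the must-haves, fast remote reads and a local cache of subgraphs, can be illustrated with a toy sketch. Everything here is an assumption for the example: `REMOTE_ADJACENCY` and `fetch_remote` stand in for data held on another partition, and the standard library's `functools.lru_cache` stands in for whatever caching a real product uses, which the slides do not describe.

```python
from functools import lru_cache

# Hypothetical stand-in for adjacency data stored on a remote partition.
REMOTE_ADJACENCY = {
    "Alice": ("Bob",),
    "Bob": ("Alice", "Carlos", "Charlie"),
    "Carlos": ("Bob", "Charlie"),
    "Charlie": ("Bob", "Carlos"),
}


def fetch_remote(vertex):
    """Pretend network read; a real system would pay latency here."""
    return REMOTE_ADJACENCY.get(vertex, ())


@lru_cache(maxsize=1024)
def neighbors(vertex):
    """Local cache in front of the remote read; repeat lookups stay local."""
    return fetch_remote(vertex)


def reachable(start):
    """Breadth-first navigation; each step depends on the previous frontier."""
    seen, frontier = {start}, [start]
    while frontier:
        next_frontier = []
        for vertex in frontier:
            for other in neighbors(vertex):
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return seen
```

Calling `reachable("Alice")` twice performs the remote reads only on the first traversal; `neighbors.cache_info()` reports the hit and miss counts.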

Page 18: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Practical Applications

Page 19: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Graph Analysis (Algorithms)

• Social Networks
– Most connected participants
– Influencers
– Important Syndicates or Sub-networks
• Central figures in crime organisations
• Business Intelligence
– Discovering Knowledge Assets
– Complex analytics
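"Most connected participants" is commonly measured with degree centrality: a vertex's edge count divided by the number of other vertices. The sketch below is a minimal plain-Python version run on the deck's four-person example. The function names `degree_centrality` and `most_connected` are chosen for illustration and are not taken from InfiniteGraph.

```python
from collections import Counter

# Undirected edges from the deck's Alice/Bob/Carlos/Charlie example.
EDGES = [
    ("Alice", "Bob"),
    ("Bob", "Carlos"),
    ("Bob", "Charlie"),
    ("Carlos", "Charlie"),
]


def degree_centrality(edges):
    """Return degree / (n - 1) for every vertex that appears in an edge."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    if n <= 1:
        return {vertex: 0.0 for vertex in degree}
    return {vertex: count / (n - 1) for vertex, count in degree.items()}


def most_connected(edges, top=1):
    """Rank vertices by centrality, breaking ties alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda v: (-scores[v], v))[:top]
```

On this data Bob touches three of the four edges, so he scores 1.0 and ranks first. Measures of influence or of important sub-networks need more than degree counts, which this sketch does not attempt.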

Page 20: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Graph Analysis (Patterns)

• Crime (again)
– Recognize common patterns of activity
– Complex chains of interaction
• Security
– Recognize attack/threat patterns
– Auditing / log analytics
• Targeting Advertising
– To specific browsing patterns

Page 21: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Many, Many More!

• Spatial data
• Defence / Situational Awareness
• Sciences
• Health Care
• Genealogy
• Logistics
• Tracking

Page 22: InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

Thank you!

[email protected]@infinitegraph.com

Twitter - @infinitegraph