Tracking data lineage with Neo4j and Linkurious


Uploaded by: linkurious

Posted on 07-Jan-2017

TRANSCRIPT

Page 1: Tracking data lineage with neo4j and Linkurious

Tracking data lineage with Neo4j and Linkurious.

SAS founded in 2013 in Paris | http://linkurio.us | @linkurious

Page 2: Tracking data lineage with neo4j and Linkurious

French startup specialized in graph visualization.

Sébastien Heymann, CEO
Created Gephi
PhD in CS and complex systems from UPMC

David Rapin, CTO
Web-scale archiving
Université de Technologie de Compiègne

Jean Villedieu, CMO
>5 years in consulting
Sciences Po + Ecole de Guerre Economique

Page 3: Tracking data lineage with neo4j and Linkurious

What is a graph?

PERSON (name: Séb, age: 29)

PERSON (name: Jean, age: 31)

LOCATION (name: Paris)

Relationships: each PERSON "Lives in" the LOCATION; the two PERSON nodes are linked by "Knows".

Page 4: Tracking data lineage with neo4j and Linkurious

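A graph is a set of nodes and relationships.

This is a node: PERSON (Séb), PERSON (Jean), LOCATION (Paris).

This is a relationship: a link between two nodes, such as "Lives in" or "Knows".

This is a property: a key-value pair attached to a node, such as name: Séb or age: 29.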
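The node, relationship and property vocabulary can be sketched in a few lines of plain Python. This is only an illustration of the data model, not how Neo4j stores data; the class names, the upper-case relationship types and the direction of the "Knows" link are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                       # e.g. "PERSON" or "LOCATION"
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    start: Node
    type: str                                        # e.g. "LIVES_IN"
    end: Node


# The three nodes from the slide, each with its properties.
seb = Node("PERSON", {"name": "Séb", "age": 29})
jean = Node("PERSON", {"name": "Jean", "age": 31})
paris = Node("LOCATION", {"name": "Paris"})

relationships = [
    Relationship(seb, "LIVES_IN", paris),
    Relationship(jean, "LIVES_IN", paris),
    Relationship(seb, "KNOWS", jean),    # direction assumed for illustration
]


def residents(city_name):
    """Names of the nodes that have a LIVES_IN relationship to the given city."""
    return [
        r.start.properties["name"]
        for r in relationships
        if r.type == "LIVES_IN" and r.end.properties.get("name") == city_name
    ]
```

Calling `residents("Paris")` walks the relationships rather than joining tables, which is the idea the slide illustrates.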

Page 5: Tracking data lineage with neo4j and Linkurious

What is data lineage?

“Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes”

- Wikipedia

Page 6: Tracking data lineage with neo4j and Linkurious

A real-world data pipeline.

Page 7: Tracking data lineage with neo4j and Linkurious

Top 5 data lineage questions.

1. Where is this data coming from?

2. Who has access to that information?

3. Do we have sensitive data that’s being propagated unsafely?

4. Is my database still being used in an important company process or can I remove it?

5. What systems and reports would be impacted by a change in that particular process?

Page 8: Tracking data lineage with neo4j and Linkurious

Traditional databases are poorly suited to data lineage.

Hard to query

Querying connected data through SQL is hard and error-prone.

Slow

Performance degrades for questions that require following multiple connections.

Too rigid

It is hard to accommodate an evolving data model in a relational database.

Page 9: Tracking data lineage with neo4j and Linkurious

The cost of bad data lineage.

● A general lack of confidence in data;

● Potential legal exposure;

● Finding answers and making decisions becomes complex and time-consuming;

...which results in wasted time, money and opportunities.

Page 10: Tracking data lineage with neo4j and Linkurious

Graph DBs are perfect for data lineage.

● Easy to model the flow of data in a graph;

● Query relationships with ease and in real-time;

● Adapt your schema to accommodate new data and relationships;

● Popularity of graph databases has increased 500% in the last 2 years and our partner Neo4j is the leader.

Page 11: Tracking data lineage with neo4j and Linkurious

Linkurious brings the ability to find answers.

● Tech and business users can search the data lineage intuitively and find answers;

● Visualization makes complex connections easier to understand and communicate;

● Decisions become faster and better informed.

Page 12: Tracking data lineage with neo4j and Linkurious

Unique ability to store and analyse your data lineage.

Neo4j

Your data lineage is a large graph. Store and query it quickly with Neo4j.

Linkurious

Search and find answers easily through a visual interface.

Page 13: Tracking data lineage with neo4j and Linkurious

Example: a graph model for data lineage.

[Diagram: a graph whose nodes are labelled System, Metadata, Process and Report.]
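A minimal sketch of such a model in Python, assuming the four node labels from the slide (System, Metadata, Process, Report). The node names and the relationship types (`STORES`, `INPUT_OF`, `PRODUCES`, `FEEDS`) are hypothetical; the slides do not say how the edges are named.

```python
# Hypothetical lineage data; none of these node names come from the slides.
LABELS = {
    "crm_server": "System",
    "customer_table": "Metadata",
    "monthly_aggregation": "Process",
    "sales_summary": "Metadata",
    "sales_report": "Report",
}

# (source, relationship type, target); the relationship types are assumed.
EDGES = [
    ("crm_server", "STORES", "customer_table"),
    ("customer_table", "INPUT_OF", "monthly_aggregation"),
    ("monthly_aggregation", "PRODUCES", "sales_summary"),
    ("sales_summary", "FEEDS", "sales_report"),
]


def validate(labels, edges):
    """Return the edges whose source or target is not a declared node."""
    return [e for e in edges if e[0] not in labels or e[2] not in labels]


def describe(labels, edges):
    """Render each edge as '(Label:source)-[TYPE]->(Label:target)'."""
    return [
        f"({labels[s]}:{s})-[{t}]->({labels[d]}:{d})"
        for s, t, d in edges
    ]
```

Because labels and edges are just data, adding a new kind of node or relationship means adding entries rather than altering a fixed schema.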

Page 14: Tracking data lineage with neo4j and Linkurious

Question #1: what’s the data lineage of this report?

Our business people need to know what data was used to generate this month’s sales report. I need to understand which metadata, which systems and which processes were involved.

IT Analyst

Page 15: Tracking data lineage with neo4j and Linkurious

Question #1: visualize the data lineage of a report.

It only takes a few minutes to search for a report and analyse its lineage. No need to be an expert!
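Behind the visual search, the lineage of a report amounts to collecting everything upstream of it. The sketch below does this with a breadth-first traversal over a small in-memory edge list; it is an illustration in plain Python with hypothetical node names, not the query Linkurious or Neo4j actually runs.

```python
from collections import deque

# Hypothetical edges pointing in the direction data flows (source -> target).
EDGES = [
    ("crm_server", "customer_table"),
    ("customer_table", "monthly_aggregation"),
    ("erp_server", "orders_table"),
    ("orders_table", "monthly_aggregation"),
    ("monthly_aggregation", "sales_report"),
    ("hr_server", "payroll_report"),   # unrelated to the sales report
]


def upstream(target, edges):
    """Return every node from which data can flow into `target`."""
    # Index edges by destination so we can walk against the flow of data.
    parents = {}
    for source, dest in edges:
        parents.setdefault(dest, []).append(source)

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Here `upstream("sales_report", EDGES)` returns the systems, metadata and processes that feed the report and leaves out the unrelated payroll branch.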

Page 16: Tracking data lineage with neo4j and Linkurious

Question #2: what is this database used for?

We’re relocating our datacenter and need to move a server on which a database is stored. Can we decommission it? I need to understand which processes and reports rely on this server.

IT Analyst

Page 17: Tracking data lineage with neo4j and Linkurious

Question #2: visualize an impact analysis.

We can visualize and inspect the complex set of relationships involved in the impact analysis.
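An impact analysis is the mirror image of a lineage lookup: start from the server and follow the data downstream. The following Python sketch assumes a small hypothetical graph and filters the result to the Process and Report nodes the analyst asked about; the names and labels are invented for the example.

```python
from collections import deque

# Hypothetical node labels, reusing the four labels from the example model.
LABELS = {
    "legacy_server": "System",
    "billing_db": "Metadata",
    "invoice_export": "Process",
    "invoices_table": "Metadata",
    "finance_report": "Report",
    "audit_report": "Report",
    "hr_server": "System",
    "payroll_report": "Report",
}

# Hypothetical edges pointing in the direction data flows (source -> target).
EDGES = [
    ("legacy_server", "billing_db"),
    ("billing_db", "invoice_export"),
    ("invoice_export", "invoices_table"),
    ("invoices_table", "finance_report"),
    ("invoices_table", "audit_report"),
    ("hr_server", "payroll_report"),
]


def impacted(start, edges, labels, wanted=("Process", "Report")):
    """Return the downstream nodes of `start` whose label is in `wanted`, sorted."""
    children = {}
    for source, dest in edges:
        children.setdefault(source, []).append(dest)

    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(n for n in seen if labels.get(n) in wanted)
```

If `impacted("legacy_server", EDGES, LABELS)` comes back empty, nothing in the modelled lineage depends on the server; otherwise the list names the processes and reports to check before decommissioning.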

Page 18: Tracking data lineage with neo4j and Linkurious

Conclusion.

Contact us to discuss your projects at [email protected]