GraphConnect 2014 SF: The Business Graph

SAN FRANCISCO | 10.22.2014 THE BUSINESS GRAPH The Business Graph (Why we chose Neo4j to rebuild CrunchBase)


Post on 14-Jun-2015

DESCRIPTION

GraphConnect 2014 SF: The Business Graph presented by Kurt Freytag, Head of Product and Engineering, CrunchBase

TRANSCRIPT

Page 1: GraphConnect 2014 SF: The Business Graph

SAN FRANCISCO | 10.22.2014 THE BUSINESS GRAPH

The Business Graph (Why we chose Neo4j to rebuild CrunchBase)

Page 2: GraphConnect 2014 SF: The Business Graph

Kurt Freytag Head of Product, CrunchBase

[email protected] 415.891.7761 @kfreytag

5’10”, 155lbs. Coding since 1977

Who Am I?

Page 3: GraphConnect 2014 SF: The Business Graph

• Concise History of CrunchBase

• Our Vision

• Why Neo4j?

• Building w/ Neo4j & The Web

• Q&A

What am I Talking About?

Page 4: GraphConnect 2014 SF: The Business Graph

• Started in 2007 by Michael Arrington

• Zero dedicated staff from 2007-2013

• Organically became source of truth for Startup Ecosystem

• Millions of Monthly Users

• Ran on two crappy AWS servers

History of CrunchBase - In One Slide

MySQL 5.0, Rails 2.0

Page 5: GraphConnect 2014 SF: The Business Graph

• The Complete Graph of the Connected Business World

• Entities: people, products, companies

• Activities: fundings, acquisitions, job changes

• Connections: how everything relates

• Time: the lifecycle of every element

• World’s Most Powerful Startup Community

• Open to all

The Vision of CrunchBase

Page 6: GraphConnect 2014 SF: The Business Graph

• A natural way of modeling data

Why Neo4j?

[Graph diagram nodes: Neotechnologies (Company); Emil Eifrem (Founder); Lars Nordwall (COO); Philip Rathle (VP of Products); Neo4j Enterprise Edition (Product); Seed Round (Funding); Sunstone Capital (Investor); Connor Venture Partners (Investor); GraphConnect 2014 (Event); Kurt Freytag (Speaker)]

Page 7: GraphConnect 2014 SF: The Business Graph

• A natural way of modeling data

• Adapts easily to changing requirements

Why Neo4j?

[Graph diagram nodes: Neotechnologies (Company); Seed Round (Funding); Investment (two nodes); Sunstone Capital (Investor); Connor Venture Partners (Investor); John Smith (Lead Investor, shown twice)]
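The Investment nodes on this slide suggest one way a graph absorbs a new requirement: the investment becomes a node of its own, so a lead investor can attach to it without reshaping anything that already exists. The following is a minimal in-memory sketch of that idea in plain Ruby. `Node`, `Edge`, `Graph`, and the edge type names (`:has_investment`, `:made_by`, `:led_by`) are assumptions made for illustration; they are not from the talk and not Neo4j's API.

```ruby
# Minimal in-memory property graph. All class and edge names are illustrative.
Node = Struct.new(:label, :name)
Edge = Struct.new(:from, :type, :to)

class Graph
  attr_reader :edges

  def initialize
    @edges = []
  end

  # Add a directed, typed edge and return the graph so calls can be chained.
  def connect(from, type, to)
    @edges << Edge.new(from, type, to)
    self
  end

  # Follow outgoing edges of one type from a node (identity comparison).
  def out(node, type)
    @edges.select { |e| e.from.equal?(node) && e.type == type }.map(&:to)
  end
end

graph    = Graph.new
round    = Node.new("Funding", "Seed Round")
sunstone = Node.new("Investor", "Sunstone Capital")
connor   = Node.new("Investor", "Connor Venture Partners")
smith    = Node.new("Person", "John Smith")

# Each investment is its own node, so a new fact (here, a lead investor)
# attaches to it without altering any existing node or edge.
[sunstone, connor].each do |investor|
  investment = Node.new("Investment", "#{investor.name} in #{round.name}")
  graph.connect(round, :has_investment, investment)
       .connect(investment, :made_by, investor)
       .connect(investment, :led_by, smith)
end

# Traverse round -> investments -> investors.
investors = graph.out(round, :has_investment)
                 .flat_map { |inv| graph.out(inv, :made_by) }
                 .map(&:name)
puts investors.inspect
```

In a relational schema the same change would usually mean a new join table plus a migration; here it is three more edges per investment.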

Page 8: GraphConnect 2014 SF: The Business Graph

• A natural way to model data

• Adapts easily to changing requirements

• Built-In Business Intelligence

    • Very specific or very general questions

    • We don’t know the questions in advance

Why Neo4j?

select
  if(tg.described_count > 1, 'complex', 'basic') dupe_class,
  o.normalized_name,
  concat('=hyperlink("http://www.crunchbase.com', o.permalink, '", "', o.name, '")') name_url,
  ifnull(o.domain, '') domain,
  ifnull(o.homepage_url, '') homepage_url,
  if(o.status = 'unknown', '', o.status) status,
  o.permalink,
  ifnull(o.investment_rounds, '') investment_rounds,
  ifnull(o.funding_rounds, '') funding_rounds,
  ifnull(o.relationships, '') relationships,
  ifnull(o.milestones, '') milestones,
  if(o.logo_url is null, '', 'Yes') has_logo,
  length(ifnull(o.overview, '')) overview_length,
  ifnull(o.created_by, '') created_by,
  date_format(o.created_at, '%Y-%m-%d %H:%i:%s') created_at,
  UNIX_TIMESTAMP(o.created_at) ts,
  ( ifnull(o.investment_rounds, 0)*20
    + ifnull(o.funding_rounds, 0)*20
    + ifnull(o.relationships, 0)*10
    + ifnull(o.milestones, 0)
    + length(ifnull(o.overview, ''))
    + if(o.logo_url is null, 0, 50) ) entity_rank,
  o.entity_type,
  o.entity_id
from cb_objects o
join t_duplicate_objects td on td.object_id = o.id
join t_duplicate_groups tg on tg.id = td.duplicate_group_id
where td.max_created_at > FROM_UNIXTIME(i_start_unixtime)

EXPLAIN PLAN

Page 9: GraphConnect 2014 SF: The Business Graph

• A natural way of modeling data

• Adapts easily to changing requirements

• Built-In Business Intelligence

    • Very specific or very general questions

    • We don’t know the questions in advance

• Directly maps to our OO thinking

Why Neo4j?

Neotechnologies (Company)

    class Organization < BaseEntity
      relationship :has_funding_round,
      relationship :has_customer,
      relationship :sponsors_event,
      ...
    end

has_funding_round

    class HasFundingRound < BaseRelationship
      relationship :has_funding_round,
      relationship :has_customer,
      relationship :sponsors_event,
      ...
    end

Seed Round (Funding)

    class FundingRound < BaseActivity
      attribute :announced_on,
      attribute :closed_on,
      attribute :funding_type,
      attribute :series,
      attribute :money_raised,
      attribute :post_money_valuation,
      ...
    end
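The slide's class snippets are abbreviated, so the following is only a guess at how `attribute` and `relationship` declarations of this kind could work as class-level macros in plain Ruby. `BaseNode` is a hypothetical stand-in for the slide's base classes, and nothing here talks to a database.

```ruby
# Hypothetical base class: `attribute` and `relationship` are class-level
# macros that record each declaration and define accessors for it.
class BaseNode
  def self.attributes
    @attributes ||= []
  end

  def self.relationships
    @relationships ||= []
  end

  # Declares a scalar property with a reader and a writer.
  def self.attribute(name)
    attributes << name
    attr_accessor name
  end

  # Declares a relationship; the reader returns a list of connected objects.
  def self.relationship(name)
    relationships << name
    define_method(name) do
      @links ||= Hash.new { |hash, key| hash[key] = [] }
      @links[name]
    end
  end

  def initialize(values = {})
    values.each { |key, value| public_send("#{key}=", value) }
  end
end

class FundingRound < BaseNode
  attribute :funding_type
  attribute :money_raised
end

class Organization < BaseNode
  attribute :name
  relationship :has_funding_round
end

org  = Organization.new(name: "Neotechnologies")
seed = FundingRound.new(funding_type: "seed")
org.has_funding_round << seed

puts Organization.relationships.inspect
puts org.has_funding_round.map(&:funding_type).inspect
```

Because each class keeps its own list of declarations, a persistence layer could in principle read `attributes` and `relationships` to decide which node properties and edges to write.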

Page 10: GraphConnect 2014 SF: The Business Graph

• A natural way of modeling data

• Adapts easily to changing requirements

• Built-In Business Intelligence

    • Very specific or very general questions

    • We don’t know the questions in advance

• Directly maps to our OO thinking

• We move faster

    • Just launched CrunchBase Events @ TC Disrupt London

    • Design, development, QA, and release took 2 weeks

Why Neo4j?

Page 11: GraphConnect 2014 SF: The Business Graph

Okay, if Neo’s so awesome, why doesn’t everybody use it?

Page 12: GraphConnect 2014 SF: The Business Graph

• CGI

    • design a data model

    • roll-your-own database connection

    • manually write all your queries

• ORM (Hibernate, Doctrine)

    • design a data model

    • build the objects

    • map ‘em through configuration

Databases & the Web - A Brief History

Page 13: GraphConnect 2014 SF: The Business Graph

• Today’s languages use datastores as dumb repos

• Generate schemas from code

• Isolate developer from writing queries

• Focus on business logic, not data

• Couple of Problems

    • The DBA role existed for a reason

    • Data modeling is the foundation of a scalable architecture

    • Generated queries can easily be 1,000x less efficient

    • Quick development can lead to slow applications

Database as a Commodity
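One common way generated queries become far less efficient is the "N+1" pattern, where code that looks like a simple loop issues one query per record. The sketch below only counts round trips against a stand-in store; `CountingStore`, its methods, and the synthetic data are all invented for illustration, and the counts say nothing about real timings.

```ruby
# A stand-in datastore that counts how many queries it receives.
# No real database is involved; the data below is synthetic.
class CountingStore
  attr_reader :query_count

  def initialize(rounds_by_org)
    @rounds_by_org = rounds_by_org
    @query_count = 0
  end

  # One query per organization: the shape naive generated code tends to take.
  def rounds_for(org_id)
    @query_count += 1
    @rounds_by_org.fetch(org_id, [])
  end

  # One query for the whole batch: the shape a hand-written query would take.
  def rounds_for_all(org_ids)
    @query_count += 1
    org_ids.map { |id| [id, @rounds_by_org.fetch(id, [])] }.to_h
  end
end

data    = (1..1000).map { |id| [id, ["round-#{id}"]] }.to_h
org_ids = data.keys

naive = CountingStore.new(data)
org_ids.each { |id| naive.rounds_for(id) }

batched = CountingStore.new(data)
batched.rounds_for_all(org_ids)

puts "naive: #{naive.query_count} queries, batched: #{batched.query_count} query"
```

Both versions return the same data; only the number of round trips differs, which is the kind of cost that stays hidden when the developer never sees the queries.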

Page 14: GraphConnect 2014 SF: The Business Graph

• Neo4j is tough to adopt

    • Languages don’t support it out-of-the-box

    • The tools / drivers that exist are immature

    • Neo4j is not plug-n-play

• However…

    • Neo4j is ideal for Object-Oriented development

    • Graphs are a natural fit for many use cases

    • We need to make Neo4j as easy to choose as MySQL

Means that…

Page 15: GraphConnect 2014 SF: The Business Graph

• ActiveRecord for Neo4j

• Implements a lot of ActiveModel

    • Validations

    • Serialization

    • Callbacks

• Handles all Marshalling / UnMarshalling

• “Feels” like ActiveRecord

• Makes Neo4j plug-n-play for Rails

• We Will Open Source It

“Deja”
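The slides do not show Deja's API, so the following is not it. It is a plain-Ruby sketch of what the three ActiveModel-style features named above (validations, callbacks, serialization) can look like in a small mixin. `MiniModel`, `validates_presence_of`, `before_save`, `to_properties`, and the `Person` class are all hypothetical names chosen for the example.

```ruby
# Hypothetical mixin illustrating validations, callbacks, and serialization.
# This is not Deja's API, which the slides do not show.
module MiniModel
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def required_fields
      @required_fields ||= []
    end

    def save_hooks
      @save_hooks ||= []
    end

    # Validation: the named fields must be non-blank.
    def validates_presence_of(*fields)
      required_fields.concat(fields)
    end

    # Callback: run the named method before each save.
    def before_save(method_name)
      save_hooks << method_name
    end
  end

  def valid?
    self.class.required_fields.none? { |f| public_send(f).to_s.strip.empty? }
  end

  # Serialization: turn instance variables into a property hash,
  # roughly the shape a graph node's properties would take.
  def to_properties
    instance_variables.map do |ivar|
      [ivar.to_s.delete("@").to_sym, instance_variable_get(ivar)]
    end.to_h
  end

  # Returns the properties that would be written, or nil if invalid.
  def save
    return nil unless valid?
    self.class.save_hooks.each { |hook| send(hook) }
    to_properties
  end
end

class Person
  include MiniModel
  attr_accessor :name, :permalink

  validates_presence_of :name
  before_save :build_permalink

  def initialize(name)
    @name = name
  end

  private

  def build_permalink
    @permalink = name.downcase.gsub(/[^a-z0-9]+/, "-")
  end
end

puts Person.new("Kurt Freytag").save.inspect
puts Person.new("").save.inspect
```

A real library would write the property hash to the database instead of returning it; returning it keeps the sketch self-contained.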

Page 16: GraphConnect 2014 SF: The Business Graph

Thanks. Enjoy.