Commercial Graph at Intuit
Gokuldas Pillai
Engineer, Data Services, Intuit
@gokool
Improving the lives of 60M people
…creates a unique and compelling set of data
1 in 3 Tax Returns
Pay 1 in 12 Americans
$2.6T in Transactions
25 Million Questions Answered
From 1 to 50 Apps
7 Million Mobile Customers
45M Customers Using Connected Services
Is it time to hire?
Small Business Hiring Trends
My revenue increased
5%...is that good?
Revenue Comparisons
Am I spending
more than my friends?
Spending Profiles
Auto $750
Rent $1,200
Groceries $400
Intuit Payment Graph
• Discover the latent network from multiple product data-stores
– Uniquely identify entities and their connections
– Connections scored by volume of trade
• Empower Business Unit (BU) teams to leverage the Intuit Payment Graph to build applications.
– Graph to be available for real time access
The Graph Server provides rich profiles
Consumer Profile Facets
• Identity: Name, Address, Phone, Email, Mint Id, etc.
• Social: Facebook, Yelp, Twitter, etc.
• Demographics: Age, Gender, etc.
Business Profile Facets
• Identity: Name, Address, Phone, Email, QBO Id, etc.
• Social: Facebook, Yelp, Twitter, etc.
• Firmographics: Category, Revenue, Employees, etc.
And the buyer-seller relationships
• Consumer to Business, May 2011: 3 purchases, $650.25
• Consumer to Business, May 2011: 1 purchase, $25.95
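The monthly buyer-seller summaries above could be derived from raw transactions roughly as follows. This is a minimal sketch, assuming a hypothetical list of (buyer, seller, date, amount) tuples; the identifiers and sample amounts are illustrative and are not Intuit's schema or data.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

def summarize_purchases(transactions):
    """Roll raw purchases up into one (count, total) per buyer, seller, and month."""
    summary = defaultdict(lambda: [0, Decimal("0")])
    for buyer, seller, when, amount in transactions:
        key = (buyer, seller, when.strftime("%Y-%m"))
        summary[key][0] += 1
        summary[key][1] += Decimal(amount)
    return {key: (count, total) for key, (count, total) in summary.items()}

# Hypothetical sample data, chosen so the totals mirror the slide's figures.
transactions = [
    ("consumer-1", "business-a", date(2011, 5, 3), "200.00"),
    ("consumer-1", "business-a", date(2011, 5, 14), "150.25"),
    ("consumer-1", "business-a", date(2011, 5, 28), "300.00"),
    ("consumer-1", "business-b", date(2011, 5, 9), "25.95"),
]
edges = summarize_purchases(transactions)
```

Each resulting entry corresponds to one relationship in the graph, with the purchase count and dollar total as its properties.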
Design
Fuzzy matching & de-duplicating entities
Company ABC
name: The Windsor Press, Inc.
address: PO Box 465 6 North Third Street
city: Hamburg
state: PA
zip: 19526
phone: (610) 562-2267
Company PQR
name: The Windsor Press
address: P.O. Box 465 6 North 3rd St.
city: Hamburg
state: PA
zip: 19526-0465
phone: (610) 562-2267
Both of the above vendor records map to external reference data:
Dun & Bradstreet
ID: 002114902
Name: The Windsor-Press Inc
Street: 6 N 3rd St
City: Hamburg
State: PA
Zip: 19526-1502
Phone: (610)-562-2267
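One way such records might be matched is to normalize each field and then compare. The sketch below is only an illustration using Python's standard library; the abbreviation table, the 0.8 threshold, and the rule "same phone and 5-digit ZIP plus similar name" are assumptions, not the matching logic actually used at Intuit.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"north": "n", "third": "3rd", "street": "st", "incorporated": "inc"}

def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse common abbreviations."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def digits(text):
    """Keep only the digits, so differently formatted phone numbers compare equal."""
    return re.sub(r"\D", "", text)

def same_business(a, b, threshold=0.8):
    """Assumed rule: identical phone digits and 5-digit ZIP, plus a similar name."""
    if digits(a["phone"]) != digits(b["phone"]):
        return False
    if digits(a["zip"])[:5] != digits(b["zip"])[:5]:
        return False
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold

reference = {"name": "The Windsor-Press Inc", "zip": "19526-1502", "phone": "(610)-562-2267"}
vendor_1 = {"name": "The Windsor Press, Inc.", "zip": "19526", "phone": "(610) 562-2267"}
vendor_2 = {"name": "The Windsor Press", "zip": "19526-0465", "phone": "(610) 562-2267"}
```

With these inputs both vendor records satisfy `same_business` against the reference record, and `normalize` maps "6 North Third Street" and "6 N 3rd St" to the same string.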
Commercial Graph Architecture
[Architecture diagram; recoverable labels:]
• Transactions: invoices, bills, payments, vendors, customers
• Business names, address, phone, industry code
• Categorization
• Matching/De-duping
• De-duped Nodes
• Offline analytics
• Real-time Applications: Request, Response
Data Model
• Company (Name: Acme Inc, Zip: 95134, …)
• Company (Name: Veva LLC, Zip: 94040, …)
• Product (Name: Quickbooks, …)
• Product (Name: Payroll, …)
• Relationship: CUSTOMER (Txn Count: 125, No. of years: 1)
• Relationship: LICENSED (No. of years: 8)
• Company (Name: Beta LLC, Location: 94043, …)
• Relationship: CUSTOMER (Txn Count: 467, No. of years: 3)
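The model above is a property graph: nodes and relationships both carry key-value properties. A minimal in-memory sketch follows. The `Node` and `Relationship` classes and the `neighbors` helper are hypothetical, and the slide does not say which nodes each relationship joins, so the endpoints below are chosen only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)

acme = Node("Company", {"Name": "Acme Inc", "Zip": "95134"})
veva = Node("Company", {"Name": "Veva LLC", "Zip": "94040"})
quickbooks = Node("Product", {"Name": "Quickbooks"})

# Assumption: endpoints are illustrative; the slide gives only the properties.
relationships = [
    Relationship(veva, acme, "CUSTOMER", {"Txn Count": 125, "No. of years": 1}),
    Relationship(acme, quickbooks, "LICENSED", {"No. of years": 8}),
]

def neighbors(node, rel_type):
    """Return nodes reached from `node` over outgoing relationships of one type."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]
```

Storing transaction counts and tenure on the relationship, rather than on either company, is what lets a connection be scored independently of the entities it joins.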
Data-model Demo
Scale
• Size of the graph
– 29 Mn Unique Nodes
– 315 Mn Properties
– 48 Mn Relationships
Referrals & recommendations
Connecting consumers with small businesses
Small business micro-communities
Big Data
for the Little Guy
Use case - Vendor Recommendation
START n=node(23539)
MATCH n-[:PAYS]-v-[:PAYS]-vov
WHERE has(vov.IC4_DESC) AND vov.IC4_DESC =~ 'Legal.*' AND not (ID(vov) = ID(v))
RETURN ID(vov), vov.ENTITY_TYPE, vov.CITY?, vov.IC4_DESC?
ORDER BY vov.loyalty;
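To make the traversal concrete, the same "vendors of my vendors" idea can be sketched over a toy in-memory graph. Everything here is hypothetical (node names, property values, the `recommend` helper); it mirrors the shape of the query rather than how Neo4j executes it.

```python
import re

# Hypothetical toy graph: undirected PAYS edges and per-node properties.
pays = {("n", "v1"), ("n", "v2"), ("v1", "law1"), ("v2", "law2"), ("v2", "cafe")}
props = {
    "law1": {"IC4_DESC": "Legal Services", "loyalty": 2},
    "law2": {"IC4_DESC": "Legal Counsel", "loyalty": 1},
    "cafe": {"IC4_DESC": "Restaurants", "loyalty": 5},
}

def neighbors(node):
    """Nodes joined to `node` by a PAYS edge, ignoring direction."""
    return {b if a == node else a for a, b in pays if node in (a, b)}

def recommend(start, pattern=r"Legal.*"):
    """Vendors of the start node's vendors whose category matches `pattern`."""
    found = set()
    for v in neighbors(start):
        for vov in neighbors(v):
            desc = props.get(vov, {}).get("IC4_DESC")
            # Skip v itself (as the query does) and, additionally, the start node.
            if vov != v and vov != start and desc and re.fullmatch(pattern, desc):
                found.add(vov)
    # Ascending order by loyalty, matching a plain ORDER BY.
    return sorted(found, key=lambda node: props[node]["loyalty"])
```

On this toy graph, `recommend("n")` returns the two legal vendors and leaves out the restaurant.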
Why Neo4j
• Java – matched in-house skills
• Flexible/Supports quick exploration
• Easy admin functionality – set-up, adding data
• Built in access points over HTTP (REST/JSON)
• SQL-like Query language (Cypher is awesome!)
• Active mailing list
• Good documentation
• Vendor support
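As an illustration of the HTTP access point, a Cypher query can be wrapped in a JSON request. This sketch only builds the request and does not send it. It assumes the legacy `/db/data/cypher` endpoint of Neo4j 1.x on the default local port; the `cypher_request` helper and the query are hypothetical, so check the documentation for the server version in use.

```python
import json
from urllib.request import Request

def cypher_request(query, params=None, base_url="http://localhost:7474"):
    """Build (but do not send) a POST request carrying a Cypher query as JSON."""
    body = json.dumps({"query": query, "params": params or {}}).encode("utf-8")
    return Request(
        base_url + "/db/data/cypher",  # assumed legacy endpoint path
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )

request = cypher_request(
    "START biz=node({id}) MATCH biz-[:TRANSACTS]-x RETURN x",
    {"id": 100},
)
# urllib.request.urlopen(request) would send it to a running server.
```

Passing the node id as a parameter rather than concatenating it into the query string keeps the query text constant across calls.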
Neo4j for real-time graph applications
Cypher Query Language
START biz = node(100) MATCH biz-[:TRANSACTS]-x RETURN x
Great for… Opportunity Areas…
Real time
Cypher
Built-in Algos
Lucene search
Horizontal scaling
Access control
Indexing
Experiment. Measure. Pivot.
Persevere.
Privacy matters…a lot.
Build the right team.
Team
• 2 Engineers (100%)
• 2 Data Scientists (50%)
• 1 Product Manager
• We are hiring Data Engineers!
– http://careers.intuit.com/professional
Thank you.