LEGO: Data Driven Growth Hacking Powered by Big Data
![Page 1: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/1.jpg)
Salesforce Confidential
LEGO: Data Driven Growth Hacking Powered by Big Data
June 2016
Kamal Duggireddy, Prashant Gokhale
![Page 2: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/2.jpg)
About Us
Kamal Duggireddy
Kamal Duggireddy currently leads the Data Engineering and Product Data Science team at Salesforce.com. Prior to this, he served as Director, Big Data Architecture at American Express. Combining deep technical skills with business knowledge and strong execution experience, Kamal developed reference architectures and new enterprise-level capabilities with the Hadoop stack.
Prashant Gokhale
Prashant is currently working on solving big data problems at Salesforce.com using Hadoop and its ecosystem components. Prior to this, he held several critical engineering positions at Yahoo, Cloudera, and Lookout.
![Page 3: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/3.jpg)
The Use Case | Overview
Executives, Analysts, Product Managers
![Page 4: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/4.jpg)
The Use Case | Flow
Ad-Hoc Requests
Predictive Data Apps
Data Engineering & Curation
Smart Data Dashboards (Salesforce Wave)
Advanced Analysis
Instrumentation
150+ Log Lines
Hadoop Data Processing
Traditional Data Warehouses Dimensions
![Page 5: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/5.jpg)
The Journey | How it all started
![Page 6: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/6.jpg)
Milestones | Along the way
Reusability, Declarative, Data Lake, Data Dictionary
Self Service, Automation
Security, Visualization, Governance
![Page 7: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/7.jpg)
The Framework | Finally!
[Architecture diagram; component labels follow]
Log Sources
Cloud Metrics
Kafka
Splunk
Files
Warehouse
Objects
Hadoop
Log Processing
Metadata
Flow Engine
Data Profiler
Data Lake
Datasets (Various grain)
Cubes (Custom grain)
Web App
Self Service
Data Science
![Page 8: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/8.jpg)
Goals
Scalable: Process hundreds of billions of log lines.
Flexible: Handle thousands of log schemas. Support variable grain and transformations using custom code.
Data Quality: Automated data profiling, monitoring, and alerting.
Self Service: Enable ad-hoc analysis.
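As an illustration of the "variable grain" goal, the sketch below counts events at whatever combination of dimensions the caller asks for. It is a minimal single-process example, not the LEGO implementation; the function name `rollup` and the field names `org`, `user`, and `day` are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Sequence


def rollup(events: Iterable[dict], grain: Sequence[str]) -> Counter:
    """Count events per unique combination of the requested dimensions."""
    counts: Counter = Counter()
    for event in events:
        # The grain is just the tuple of dimension values used as the group key.
        key = tuple(event.get(dim) for dim in grain)
        counts[key] += 1
    return counts


# Hypothetical usage events; the field names are illustrative only.
events = [
    {"org": "a", "user": "u1", "day": "2016-06-01"},
    {"org": "a", "user": "u2", "day": "2016-06-01"},
    {"org": "b", "user": "u3", "day": "2016-06-01"},
]

by_org_day = rollup(events, ["org", "day"])  # finer grain
by_day = rollup(events, ["day"])             # coarser grain
```

The same input yields datasets at different grains simply by changing the `grain` argument, which is the property the goal describes.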
![Page 9: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/9.jpg)
Key Building Blocks
Log Processing Engine
• Declaratively define features and flows.
• Normalize data across multiple log lines.
• Custom code injection for data transformation.
Data Profiler
• Profile data at scale to detect anomalies.
Web App
• Interface to manage features and flows.
Job Automation Engine
• End-to-end automation from features/flows to curated data sets in Wave.
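One way the profiler's output could feed anomaly detection is to compare the current profile of a field against a baseline profile and flag metrics that drift too far. The sketch below assumes profiles are plain dictionaries of numeric metrics; the function name `detect_anomalies`, the metric names, and the 50% tolerance are hypothetical choices for this example, not details from the talk.

```python
from typing import Dict, List


def detect_anomalies(baseline: Dict[str, float],
                     current: Dict[str, float],
                     tolerance: float = 0.5) -> List[str]:
    """Return alert messages for metrics that moved more than `tolerance` (relative)."""
    alerts = []
    for metric, expected in baseline.items():
        observed = current.get(metric)
        if observed is None:
            alerts.append(f"{metric}: missing from current profile")
            continue
        if expected == 0:
            # Relative change is undefined at zero, so any non-zero value counts.
            changed = observed != 0
        else:
            changed = abs(observed - expected) / abs(expected) > tolerance
        if changed:
            alerts.append(f"{metric}: expected about {expected}, observed {observed}")
    return alerts


# Hypothetical profiles for one field on two consecutive runs.
baseline = {"total": 1000, "nulls": 0, "distinct": 50}
current = {"total": 1100, "nulls": 200, "distinct": 50}
alerts = detect_anomalies(baseline, current)
```

Here only the jump in null count is reported; the 10% growth in total rows stays within the tolerance.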
![Page 10: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/10.jpg)
Log Processing Engine
logType=='X' and event=='Create Event' and page=='Home Landing',"Feat 1","eval_code(event.toUpperCase())",page,…..
logType=='ABC' and event=='Create Event' and page=='Home',"Feat 2","eval_code(event.substring(5))",event,…..
Usage Log Files + Feature definitions
Data Normalization
Data Cleansing
Data Transformation
Hive tables
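The feature definitions above pair a match condition with a feature name, a snippet of custom code, and an output field. A minimal sketch of that idea follows. It is not the actual engine: the `FeatureDef` class, the `match_feature` and `process` functions, and the output row layout are assumptions, and Python lambdas stand in for the slide's `eval_code(...)` expressions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Optional


@dataclass
class FeatureDef:
    """One declarative rule: match conditions, feature name, transform, output field."""
    conditions: Dict[str, str]         # field -> required value; all must match
    name: str                          # feature label, e.g. "Feat 1"
    transform: Callable[[dict], str]   # stands in for eval_code(...) on the slide
    field: str                         # log field copied into the output row


# Hypothetical Python rendering of the two rows shown on the slide.
FEATURES: List[FeatureDef] = [
    FeatureDef({"logType": "X", "event": "Create Event", "page": "Home Landing"},
               "Feat 1", lambda r: r["event"].upper(), "page"),
    FeatureDef({"logType": "ABC", "event": "Create Event", "page": "Home"},
               "Feat 2", lambda r: r["event"][5:], "event"),
]


def match_feature(record: dict, features: Iterable[FeatureDef]) -> Optional[dict]:
    """Return a normalized row for the first matching definition, else None."""
    for feat in features:
        if all(record.get(k) == v for k, v in feat.conditions.items()):
            return {
                "feature": feat.name,
                "value": feat.transform(record),
                "field": record.get(feat.field),
            }
    return None


def process(records: Iterable[dict], features: Iterable[FeatureDef]) -> Iterator[dict]:
    """Yield normalized rows, dropping records that match no feature."""
    features = list(features)
    for record in records:
        row = match_feature(record, features)
        if row is not None:
            yield row


# Hypothetical log records; the third matches no definition and is dropped.
rows = list(process(
    [
        {"logType": "X", "event": "Create Event", "page": "Home Landing"},
        {"logType": "ABC", "event": "Create Event", "page": "Home"},
        {"logType": "Z", "event": "Other", "page": "Home"},
    ],
    FEATURES,
))
```

Keeping the rules as data rather than code is what makes them manageable from a web interface: adding a feature means adding a row, not redeploying a job.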
![Page 11: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/11.jpg)
Data Profiler
Datasets across platform
HCatalog
MapReduce
Datasets
Monitoring & alerting

Dataset Profile (An Example)

| Dataset | Field | Type | Total | Min | Max | Avg | # Nulls | # Distinct | Median | 99th %tile | Top N |
|---|---|---|---|---|---|---|---|---|---|---|---|
| lego_feat | browser | STR | 2.3B | 7 | 63 | 25 | 1M | 50 | 34 | 38 | [.....] |
| lego_feat | url | STR | 2.3B | 20 | 223 | 50 | 0 | 5M | 70 | 90 | [.....] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
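The profile columns in the example can be computed per field as in the sketch below. This is a single-process illustration, whereas the slide indicates the real profiler runs over HCatalog datasets with MapReduce. The function name `profile_field` is hypothetical, and treating Min, Max, Avg, Median, and the 99th percentile of a STR field as string lengths is an assumption based on the example values.

```python
import math
from collections import Counter
from statistics import mean, median
from typing import Iterable, Optional


def profile_field(values: Iterable[Optional[str]], top_n: int = 3) -> dict:
    """Profile one string field; numeric stats describe string lengths (assumed)."""
    values = list(values)
    present = [v for v in values if v is not None]
    lengths = sorted(len(v) for v in present)
    profile = {
        "total": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "top_n": Counter(present).most_common(top_n),
    }
    if lengths:
        # Nearest-rank percentile: smallest length with at least 99% of values at or below it.
        p99_index = math.ceil(0.99 * len(lengths)) - 1
        profile.update(
            min=lengths[0],
            max=lengths[-1],
            avg=mean(lengths),
            median=median(lengths),
            p99=lengths[p99_index],
        )
    return profile


# Hypothetical values for a "browser" field, including one null.
browser_profile = profile_field(
    ["Chrome", "Firefox", None, "Chrome", "Safari"], top_n=1
)
```

At the volumes described in this deck, exact medians, percentiles, and distinct counts would likely be replaced by approximate algorithms, but the output shape stays the same.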
![Page 12: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/12.jpg)
Everything put together
[Diagram: same framework architecture as on the slide "The Framework | Finally!"]
![Page 13: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/13.jpg)
Data Volumetrics
| Metric | Total |
|---|---|
| Avg. volume of app logs processed (compressed) | 100s of TB/month |
| Avg. number of jobs | 6,000+/month |
| Avg. log size volume growth rate | A lot! |
| Number of log record types | 1,000s |
| Number of fields | 10s of 1,000s |

200+ B Events / Day
500+ Features
![Page 14: LEGO: Data Driven Growth Hacking Powered by Big Data](https://reader036.vdocument.in/reader036/viewer/2022062310/58710ffe1a28abac6d8b5885/html5/thumbnails/14.jpg)
Thank you
We are hiring! www.salesforce.com/company/careers