Data Pipeline:
Big Data meets Salesforce
Carolina Ruiz Medina
Principal Developer on Product Innovation, FinancialForce.com, MVP
@CarolEnLaNube
@CodeCoffeeCloud
Agustina García Peralta
Principal Developer on Platform Strategy, FinancialForce.com
@agarciaodeian
About
GREAT ALONE. BETTER TOGETHER.
Native to Salesforce1™ Platform
since 2009
Investors include Salesforce Ventures
650+ employees, San Francisco based
Agenda
• Data Pipeline - Overview
• Pipeline Use Cases
• How Pipeline works – Demos
• Big Data
• Take away
• Q&A
Asynchronous Apex
• @future
• Queueable
• Batch Apex
• Flex Queue (since Summer ’15)
Common scenario – Large amount of data
• Any other option?
• Data Pipeline: a new feature that integrates Apache Pig into Salesforce
Common scenario – Large amount of data
• What does it do?
• Process massive amounts of data in parallel.
• Key elements
• MapReduce: a programming model for writing programs that process large amounts of data in parallel
• Hadoop cluster: for storing and analyzing large amounts of data
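To make the MapReduce element concrete, here is a minimal, single-machine Python sketch of the map, shuffle and reduce phases using the classic word-count example. It is only an illustration of the programming model: a real Hadoop cluster distributes map and reduce tasks across nodes, whereas here a thread pool stands in for that parallelism. All function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for key, value in chunk:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine every value collected for one key."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, so the map tasks can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped_chunks = list(pool.map(map_phase, lines))
    grouped = shuffle(mapped_chunks)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Big Data meets Salesforce", "big data in parallel"]))
```

The shuffle step is the part the framework normally hides: it is what guarantees that every value for one key reaches the same reducer.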
Apache Pig Background
Enables developers to create programs for
analyzing LARGE AMOUNTS of data
in PARALLEL
• How does it work?
• It uses Pig Latin
• Data-flow language
• Sits somewhere between SQL and Java
• We can create our own UDFs (user-defined functions)
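As an illustration of a UDF, the sketch below is a Python function written the way Apache Pig expects Python UDFs: an `outputSchema` decorator declares the return type. In Apache Pig such a file is registered with a statement like `REGISTER 'udfs.py' USING jython AS myudfs;`. Whether the Salesforce Data Pipeline pilot accepts custom UDFs in this form is not stated in these slides, so treat this as a general Apache Pig example; `parse_amount` and the file name `udfs.py` are hypothetical.

```python
# Hypothetical Apache Pig UDF: turns a currency string such as "$150.00" into a number.
# In Pig's Jython mode the `outputSchema` decorator is provided by Pig; the fallback
# below only lets the same file run as plain Python for local testing.
try:
    outputSchema  # noqa: F821 - supplied by Pig's Jython engine
except NameError:
    def outputSchema(schema):
        """Fallback decorator: record the declared Pig schema on the function."""
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("amount:double")
def parse_amount(text):
    """Convert '$1,250.00' to 1250.0; return None for missing values."""
    if text is None:
        return None
    return float(text.strip().lstrip("$").replace(",", ""))
```

Keeping the function free of Pig-specific imports means the parsing logic can be unit-tested before it is registered in a script.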
Apache Pig Background
• Why is it relevant?
• Technology associated with Hadoop, but it can also be used by other frameworks, such as Salesforce
• Is there anything unique to Apache Pig running in Salesforce?
• It runs in a multitenant environment
Apache Pig Background
• Currently in a pilot program; GA planned for Summer ’16 (Safe Harbor)
• How does Data Pipeline work?
• It runs Pig scripts written in the Pig Latin language
What is Data Pipeline?
Data Pipeline Pig Script
Apex?
• Execution features
• Runs asynchronously
• Runs in parallel
• From where?
• Developer Console
• During deploy
• Tooling API (API 33.0 onwards)
What is Data Pipeline?
• Anything else?
• It is an ETL (Extract, Transform, Load) tool
• Pig scripts can be included in a package
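The ETL idea can be sketched in plain Python as three small functions: extract rows from a source, transform them, and load the result into a target. This is only an analogy for the load, transform and store steps of a Pig script; the CSV layout (type, date, amount, invoice) follows the invoice example used in the demo, and the in-memory list standing in for a Salesforce object is an assumption.

```python
import csv
import io


def extract(csv_text):
    """Extract: read raw rows (type, date, amount, invoice) from CSV text."""
    return list(csv.reader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: keep LBX rows and convert the amount to a number."""
    return [
        {"invoice": invoice, "date": date, "amount": float(amount.lstrip("$"))}
        for kind, date, amount, invoice in rows
        if kind == "LBX"
    ]


def load(records, target):
    """Load: append records to a target store (a list stands in for an object)."""
    target.extend(records)
    return len(records)


if __name__ == "__main__":
    source = "LBX,7/7/2015,$150.00,I-00000\nOther,7/7/2015,$250.00,I-00001\n"
    table = []
    print(load(transform(extract(source)), table), table)
```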
What is Data Pipeline?
Data Pipeline – Advantages vs other processes
1. Performance
2. Ability to execute scripts in parallel
3. Does not hit governor limits
4. Decouples Online Transaction Processing (OLTP) from Online Analytical Processing (OLAP)
5. Allows you to think in terms of data flow
How can Data Pipeline help us?
We have a large volume of financial transactions…
…and we need to process them now!
…so our users can use them: report, print, or run another quick process to finish the revaluation.
Prepare data for currency revaluation
SObject to SObject
How can Data Pipeline help us?
We have a large volume of financial transactions…
…and we need to process them now!
…so our manager can follow the progress and export data quickly.
Extracting information from a large amount of data
SObject to File
To build the solution, let's look at Pig scripts first
What is a Pig script?
Operators
JOIN
GROUP
DISTINCT
ORDER
…
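Pig's relational operators work on whole relations (bags of tuples) rather than on single records. As a rough Python analogy only, DISTINCT and ORDER ... BY correspond to de-duplicating and sorting a list of tuples; the function names and sample rows below are hypothetical, and real Pig executes these steps as distributed jobs.

```python
from datetime import datetime


def distinct(rows):
    """Like Pig DISTINCT: drop duplicate tuples (Pig itself does not guarantee order)."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def order_by_date(rows, descending=False):
    """Like ORDER ... BY date: sort (type, date, amount, invoice) tuples by a d/m/Y date."""
    return sorted(
        rows,
        key=lambda row: datetime.strptime(row[1], "%d/%m/%Y"),
        reverse=descending,
    )


if __name__ == "__main__":
    rows = [
        ("LBX", "12/7/2015", 350.0, "I-00003"),
        ("LBX", "7/7/2015", 150.0, "I-00000"),
        ("LBX", "7/7/2015", 150.0, "I-00000"),  # duplicate tuple
    ]
    print(order_by_date(distinct(rows)))
```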
Solution: SObject to SObject
Solution: SObject to File
File created
Demo
Use Case – SObject to File
Input records (type, date, amount, invoice):
LBX 7/7/2015 $150.00 I-00000
Other 7/7/2015 $250.00 I-00001
LBX 7/7/2015 $150.00 I-00002
LBX 12/7/2015 $350.00 I-00003
Other 15/7/2015 $550.00 I-00004
Result, total amount by date and type:
7/7/2015 LBX $300.00
7/7/2015 Other $250.00
12/7/2015 LBX $350.00
15/7/2015 Other $550.00
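The totals can be checked with a short Python sketch that groups the five invoice records by date and type, sums the amounts, and emits one line per group without a header row. It is a local re-computation for illustration, not the Pig script used in the demo; `Decimal` avoids floating-point rounding on money, and dates are read as day/month/year (15/7/2015 only makes sense that way).

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# The five invoice records from the demo: (type, date as d/m/Y, amount, invoice).
INVOICES = [
    ("LBX", "7/7/2015", "$150.00", "I-00000"),
    ("Other", "7/7/2015", "$250.00", "I-00001"),
    ("LBX", "7/7/2015", "$150.00", "I-00002"),
    ("LBX", "12/7/2015", "$350.00", "I-00003"),
    ("Other", "15/7/2015", "$550.00", "I-00004"),
]


def summarize(rows):
    """GROUP BY (date, type) and SUM(amount), ordered by date and then type."""
    totals = defaultdict(Decimal)
    for kind, date, amount, _invoice in rows:
        totals[(date, kind)] += Decimal(amount.lstrip("$"))
    return sorted(
        totals.items(),
        key=lambda item: (datetime.strptime(item[0][0], "%d/%m/%Y"), item[0][1]),
    )


def to_lines(summary):
    """Format one line per group, with no header row."""
    return [f"{date} {kind} ${total:.2f}" for (date, kind), total in summary]


if __name__ == "__main__":
    print("\n".join(to_lines(summarize(INVOICES))))
```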
Use Case – SObject to File
• No header row in the generated file!!
Demo
Data Pipeline – 2 more options
Join 2 objects
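Joining two objects is the job of Pig's JOIN operator. The sketch below shows equivalent inner-join logic in Python over two lists of records, building an index on the right-hand key first (a hash join). Output fields are prefixed with the relation name and `::`, mirroring how Pig disambiguates field names after a join. The invoice/account records and their field names are hypothetical.

```python
def inner_join(left_name, left, left_key, right_name, right, right_key):
    """Hash join: index the right side by key, then probe it with each left record."""
    index = {}
    for record in right:
        index.setdefault(record[right_key], []).append(record)

    joined = []
    for l_rec in left:
        for r_rec in index.get(l_rec[left_key], []):
            row = {f"{left_name}::{k}": v for k, v in l_rec.items()}
            row.update({f"{right_name}::{k}": v for k, v in r_rec.items()})
            joined.append(row)
    return joined


if __name__ == "__main__":
    invoices = [
        {"Id": "I-00000", "AccountId": "A1", "Amount": 150.0},
        {"Id": "I-00001", "AccountId": "A2", "Amount": 250.0},
        {"Id": "I-00002", "AccountId": "A9", "Amount": 150.0},  # no matching account
    ]
    accounts = [{"Id": "A1", "Name": "Acme"}, {"Id": "A2", "Name": "Globex"}]
    for row in inner_join("invoices", invoices, "AccountId", "accounts", accounts, "Id"):
        print(row)
```

Records without a match on the other side are dropped, which is the inner-join behaviour; an outer join would keep them with empty fields.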
Read and process a JSON file
• Thousands of invoices
• Keep them somewhere for audit purposes
• We don't need all the information, just some field values
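The core step of this JSON scenario is a projection: read every invoice but keep only the fields needed for audit. A minimal Python sketch of that step, assuming the file holds a JSON array of invoice objects; the field names are hypothetical.

```python
import json


def project_invoices(json_text, fields):
    """Parse a JSON array of invoices and keep only the requested fields."""
    invoices = json.loads(json_text)
    # Missing fields come back as None instead of raising KeyError.
    return [{field: invoice.get(field) for field in fields} for invoice in invoices]


if __name__ == "__main__":
    raw = json.dumps([
        {"Name": "I-00000", "Date": "2015-07-07", "Amount": 150.0, "Description": "Fee"},
        {"Name": "I-00001", "Date": "2015-07-07", "Amount": 250.0, "Description": "Fee"},
    ])
    print(project_invoices(raw, ["Name", "Amount"]))
```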
But that is not all!!
Big Data
#BigData #BigObjects
Big Data – Big Objects
• Under a pilot program; GA planned for Summer ’16 (Safe Harbor)
Custom Object vs. Big Object
• Creation: Custom Object – manually or via Metadata; Big Object – Metadata only
• API name: Custom Object – myObject__c; Big Object – myObject__b
• Enable Reports, Track Activities, Track Field History, etc.: Custom Object – options available; Big Object – options not available
• Field types: Custom Object – all; Big Object – Text, Date/Time and Lookup only
• Able to edit/delete fields? Custom Object – yes; Big Object – no
• Triggers, field sets, etc.: Custom Object – options available; Big Object – options not available
• How to populate records: Custom Object – all options; Big Object – Bulk API, SOAP API or Data Pipeline
• Can I amend a record? Custom Object – yes; Big Object – no, only clone is available
• Can I see the data by creating a tab? Custom Object – yes; Big Object – no, only via SOQL
• For free? Custom Object – yes; Big Object – no, talk with Salesforce about it
• Storage: Custom Object – counts against the storage limit; Big Object – does NOT count against the storage limit
Big Data – Big Objects & Pipeline
Data Pipeline - Limitations
• Script size/complexity: up to 20 operators, 20 loads and 10 stores per script
• Run up to 30 scripts a day
• Bulk API: STORE uses it, so Bulk API limits apply
• Some operators/functions, such as COUNT, are not supported
• Scripts cannot bypass Salesforce Platform rules: triggers, validation rules, required fields, etc.
• Once you run the process, there is no way back
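A pre-flight check against the size limits above can be sketched in Python. This is a rough heuristic, not a Salesforce tool: it strips `--` line comments, splits the script on semicolons, counts `LOAD` and `STORE` statements, and treats every other statement (for example `FILTER`, `GROUP` or `JOIN` assigned to an alias) as one operator. Block comments and semicolons inside string literals are not handled, and whether loads and stores also count toward the 20-operator limit is an assumption to verify. The limit values are the ones quoted on the slide.

```python
import re

# Limits quoted for the Data Pipeline pilot.
MAX_OPERATORS = 20
MAX_LOADS = 20
MAX_STORES = 10
MAX_RUNS_PER_DAY = 30

# Matches "alias = KEYWORD ..." at the start of a statement.
ASSIGNMENT = re.compile(r"^[A-Za-z_]\w*\s*=\s*([A-Za-z]+)")


def statement_keywords(script):
    """Return the leading keyword of each statement, e.g. LOAD, FILTER, STORE."""
    without_comments = re.sub(r"--[^\n]*", "", script)
    keywords = []
    for statement in without_comments.split(";"):
        statement = statement.strip()
        if not statement:
            continue
        match = ASSIGNMENT.match(statement)
        keyword = match.group(1) if match else statement.split()[0]
        keywords.append(keyword.upper())
    return keywords


def check_limits(script, runs_today):
    """Return a list of limit violations; an empty list means the script looks OK."""
    keywords = statement_keywords(script)
    loads = keywords.count("LOAD")
    stores = keywords.count("STORE")
    operators = len(keywords) - loads - stores  # assumption: counted separately
    problems = []
    if loads > MAX_LOADS:
        problems.append(f"{loads} loads (max {MAX_LOADS})")
    if stores > MAX_STORES:
        problems.append(f"{stores} stores (max {MAX_STORES})")
    if operators > MAX_OPERATORS:
        problems.append(f"{operators} operators (max {MAX_OPERATORS})")
    if runs_today >= MAX_RUNS_PER_DAY:
        problems.append(f"daily quota of {MAX_RUNS_PER_DAY} runs already used")
    return problems
```

Because a run cannot be rolled back, checking the script before submitting it is cheaper than discovering a limit halfway through.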
Data Pipeline – Take away
1. New feature, currently in pilot
2. Run scripts via:
Developer Console
Deploy
Tooling API (since API 33.0)
3. Run scripts asynchronously and in parallel
4. Better performance than Batch Apex
5. Easy to use!!
Q&A
• https://pig.apache.org/
• http://goo.gl/h5N7Sa
• https://goo.gl/KXQSKC
Links and more
Carolina Ruíz Medina
@CarolEnLaNube
@CodeCoffeeCloud
www.codeandvoge.com
http://www.meetup.com/es/South-Spain-Salesforce-Developer-Group/
Agustina García Peralta
@agarciaodeian
www.agarciaodeian.com
http://www.meetup.com/es/Spain-Salesforce-Developer-User-Group/
Thank you