Data Pipelines: Big Data Meets Salesforce


Upload: salesforce-developers

Post on 20-Feb-2017


TRANSCRIPT

Page 1: Data Pipelines: Big Data Meets Salesforce

Data Pipeline:

Big Data meets Salesforce

Carolina Ruiz Medina

Principal Developer on Product Innovation


@carolenlanube

Agustina García Peralta

Principal Developer on Platform Strategy


@agarciaodeian

Page 2: Data Pipelines: Big Data Meets Salesforce

Carolina Ruiz Medina

Principal Developer on Product Innovation

FinancialForce.com, MVP

@CarolEnLaNube

@CodeCoffeeCloud

Page 3: Data Pipelines: Big Data Meets Salesforce

Agustina García Peralta

Principal Developer, Platform Strategy

FinancialForce.com

@agarciaodeian

Page 4: Data Pipelines: Big Data Meets Salesforce

About

GREAT ALONE. BETTER TOGETHER.

Native to Salesforce1™ Platform

since 2009

Investors include Salesforce Ventures

650+ employees, San Francisco based


Page 5: Data Pipelines: Big Data Meets Salesforce

Agenda

• Data Pipeline - Overview

• Pipeline Use Cases

• How Pipeline works – Demos

• Big Data

• Take away

• Q&A

Page 6: Data Pipelines: Big Data Meets Salesforce

Asynchronous Apex

• @future

• Queueable

• Batch Apex

• Flex Queue (since Summer ’15)

Common scenario – Large amount of data

Page 7: Data Pipelines: Big Data Meets Salesforce

• Any other option?

• Data Pipeline: a new feature that integrates Apache Pig into Salesforce

Common scenario – Large amount of data

Page 8: Data Pipelines: Big Data Meets Salesforce

• What does it do?

• Processes massive amounts of data in parallel.

• Key elements

• MapReduce: software for writing programs that process large amounts of data in parallel

• Hadoop cluster: a cluster for storing and analyzing large amounts of data

Apache Pig Background

Enables developers to create executions for analyzing LARGE AMOUNTS of data in PARALLEL
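
The MapReduce idea mentioned on this slide can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce phases, not how Hadoop or Data Pipeline is implemented; the sample records and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample records: (transaction type, amount).
records = [("LBX", 150.0), ("Other", 250.0), ("LBX", 150.0)]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    kind, amount = record
    return (kind, amount)


def shuffle(pairs):
    """Group the emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values that share a key into a single result."""
    return (key, sum(values))


# Each map call is independent of the others, which is what lets a cluster
# spread them over many machines; here they simply run one after another.
mapped = [map_phase(r) for r in records]
totals = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(totals)
```

Pig scripts are translated into jobs of this shape, so the script author does not write the phases by hand.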

Page 9: Data Pipelines: Big Data Meets Salesforce

• How does it work?

• It uses Pig Latin

• Data-flow language

• Between SQL and Java

• We can create our own UDFs (user-defined functions)

Apache Pig Background
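
To give a feel for what "data-flow language" means, here is a rough Python analogy: every step takes a named relation and produces a new one, and a user-defined function is applied to a field along the way. It is a sketch only; the field names, the threshold and the `to_upper` function are hypothetical, and real Pig Latin syntax is different.

```python
def to_upper(value):
    """Hypothetical user-defined function (UDF) applied to a single field."""
    return value.upper()


# Hypothetical input relation.
rows = [
    {"type": "lbx", "amount": 150.0},
    {"type": "other", "amount": 250.0},
    {"type": "lbx", "amount": 50.0},
]

# Each step names an intermediate relation, in the spirit of a Pig Latin script:
# load the data, filter it, then generate new rows from the survivors.
loaded = rows
filtered = [r for r in loaded if r["amount"] >= 100.0]
projected = [{"type": to_upper(r["type"]), "amount": r["amount"]} for r in filtered]

print(projected)
```

Unlike a single SQL statement, the flow is spelled out step by step; unlike Java, no loops or classes are needed to describe it.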

Page 10: Data Pipelines: Big Data Meets Salesforce

• Why is it relevant?

• A technology associated with Hadoop, but it can also be used by other platforms, such as Salesforce

• Is there anything unique to Apache Pig running in Salesforce?

• It runs in a multitenant environment

Apache Pig Background

Page 11: Data Pipelines: Big Data Meets Salesforce

• Under Pilot program; GA planned for Summer ’16 (Safe Harbor)

• How does Data Pipeline work?

• It runs Pig scripts written in the Pig Latin language

What is Data Pipeline?

Data Pipeline Pig Script

Apex?

Page 12: Data Pipelines: Big Data Meets Salesforce

• Execution feature

• Run asynchronously

• In Parallel

• From where?

• Developer Console

• During deploy

• Tooling API 33.0 onwards

What is Data Pipeline?

Page 13: Data Pipelines: Big Data Meets Salesforce

• Anything else?

• It is an ETL tool (Extract, Transform, Load)

• Pig scripts can be included in a package

What is Data Pipeline?

Page 14: Data Pipelines: Big Data Meets Salesforce

What is Data Pipeline?

Page 15: Data Pipelines: Big Data Meets Salesforce

Data Pipeline – Advantages vs other processes

1. Performance

2. Ability to execute scripts in parallel

3. No hitting governor limits

4. Decouples On-line Transaction Processing from On-line Analytical Processing

5. Allows you to think in terms of data flow

Page 16: Data Pipelines: Big Data Meets Salesforce

How can Pipeline help us?

We have a large volume of Financial Transactions

… and we need to process them now!

… so that our users can use them: to report, to print, or for another quick process to finish

Revaluate

Prepare data for Currency Revaluation

SObject to SObject

Page 17: Data Pipelines: Big Data Meets Salesforce

How can Pipeline help us?

We have a large volume of Financial Transactions

… and we need to process them now!

… so that our manager can look at the progress, export data quickly...

Extracting information from a large amount of data

SObject to File

Page 18: Data Pipelines: Big Data Meets Salesforce

To build the solution, let's look at Pig Script first

What is Pig Script?

Operators

JOIN

GROUP

DISTINCT

ORDER

Page 19: Data Pipelines: Big Data Meets Salesforce

Solution: SObject to SObject

Page 20: Data Pipelines: Big Data Meets Salesforce

Solution: SObject to File

File created

Page 21: Data Pipelines: Big Data Meets Salesforce

Demo

Page 22: Data Pipelines: Big Data Meets Salesforce

Use Case – SObject to File

SObject records:

LBX     7/7/2015    $150.00   I-00000
Other   7/7/2015    $250.00   I-00001
LBX     7/7/2015    $150.00   I-00002
LBX     12/7/2015   $350.00   I-00003
Other   15/7/2015   $550.00   I-00004

File:

7/7/2015    LBX     $300.00
7/7/2015    Other   $250.00
12/7/2015   LBX     $350.00
15/7/2015   Other   $550.00
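
The transformation in this use case is a group-and-sum: the records are grouped by date and type, and the amounts in each group are added up. The sketch below reproduces it in Python on the five records from the slide; it is not the Pig script used in the demo, and the comma-separated output format is an assumption. For 12/7/2015 the only input record is LBX $350.00, so that is the total the grouping produces for that date.

```python
from collections import defaultdict
from decimal import Decimal

# The five records from the slide: (type, date, amount, invoice number).
records = [
    ("LBX", "7/7/2015", Decimal("150.00"), "I-00000"),
    ("Other", "7/7/2015", Decimal("250.00"), "I-00001"),
    ("LBX", "7/7/2015", Decimal("150.00"), "I-00002"),
    ("LBX", "12/7/2015", Decimal("350.00"), "I-00003"),
    ("Other", "15/7/2015", Decimal("550.00"), "I-00004"),
]

# Group by (date, type) and sum the amounts of each group.
totals = defaultdict(Decimal)
for kind, date, amount, _invoice in records:
    totals[(date, kind)] += amount

# One output line per group; the comma separator is an assumption.
lines = [f"{date},{kind},{amount}" for (date, kind), amount in totals.items()]
print("\n".join(lines))
```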

Page 23: Data Pipelines: Big Data Meets Salesforce

Use Case – SObject to File

Page 24: Data Pipelines: Big Data Meets Salesforce

Use Case – SObject to File

No header!!
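
Because the generated file has no header row, whoever reads it has to supply the column names. A minimal sketch of doing that with Python's standard `csv` module follows; the column names and the assumption that the file is comma-separated are not taken from the slides.

```python
import csv
import io

# Hypothetical column names; the file itself carries none.
FIELDNAMES = ["date", "type", "total"]

# Stand-in for the contents of the generated file (assumed comma-separated).
file_content = "7/7/2015,LBX,300.00\n7/7/2015,Other,250.00\n"

# Passing fieldnames explicitly makes DictReader treat the first line as data
# instead of consuming it as a header.
reader = csv.DictReader(io.StringIO(file_content), fieldnames=FIELDNAMES)
rows = list(reader)

for row in rows:
    print(row["date"], row["type"], row["total"])
```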

Page 25: Data Pipelines: Big Data Meets Salesforce

Demo

Page 26: Data Pipelines: Big Data Meets Salesforce

Use Case – SObject to File

Page 27: Data Pipelines: Big Data Meets Salesforce

Use Case – SObject to File

Page 28: Data Pipelines: Big Data Meets Salesforce

Data Pipeline – 2 more options

Join 2 objects

Page 29: Data Pipelines: Big Data Meets Salesforce

Data Pipeline – 2 more options

Read and Process a JSON file
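
As an illustration of what "read and process a JSON file" can amount to, the Python sketch below parses a small JSON document and keeps only a couple of fields from each record. The document structure, the field names and the choice of fields to keep are all hypothetical; the slides do not show the file used in the session.

```python
import json

# Hypothetical JSON content standing in for the file.
raw = """
[
  {"id": "I-00000", "type": "LBX", "amount": 150.0, "notes": "n/a"},
  {"id": "I-00001", "type": "Other", "amount": 250.0, "notes": "n/a"}
]
"""

# Fields to keep from every record (an assumption for this example).
KEEP = ("id", "amount")

invoices = json.loads(raw)
slim = [{field: inv[field] for field in KEEP} for inv in invoices]

print(slim)
```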

Page 30: Data Pipelines: Big Data Meets Salesforce

• Thousands of invoices

• Keep them somewhere for audit processes

• No need for all the information, just some field values

But that is not all!!

Page 31: Data Pipelines: Big Data Meets Salesforce

Big Data

#Big Data #Big Objects

Page 32: Data Pipelines: Big Data Meets Salesforce

Big Data – Big Objects

            Custom Object        Big Object
Creation    Manual & Metadata    Metadata

• Under Pilot program; GA planned for Summer ’16 (Safe Harbor)

Page 33: Data Pipelines: Big Data Meets Salesforce

Big Data – Big Objects

Page 34: Data Pipelines: Big Data Meets Salesforce

Big Data – Big Objects

Page 35: Data Pipelines: Big Data Meets Salesforce

Big Data – Big Objects

                                                             Custom Object        Big Object
Creation                                                     Manual & Metadata    Metadata
API name                                                     myObject__c          myObject__b
Enable Reports, Track Activities, Track Field History, etc.  Options available    Options not available
Field Types                                                  All                  Text; Date/Time; Lookup

Page 36: Data Pipelines: Big Data Meets Salesforce

Big Data – Big Objects

                               Custom Object        Big Object
Able to edit / delete fields?  Yes                  No
Triggers, Field Sets, etc.     Options available    Options not available

Page 37: Data Pipelines: Big Data Meets Salesforce

Big Data – Big Objects

                                   Custom Object                        Big Object
How to populate records            All options                          Bulk API; SOAP API; Data Pipeline
Can I amend a record?              Yes                                  No, only clone is available
Can I see data by creating a Tab?  Yes                                  No, only via SOQL
For free?                          Yes                                  No, talk with Salesforce about it
Storage?                           It counts against the storage limit  It DOES NOT count against the storage limit

Page 38: Data Pipelines: Big Data Meets Salesforce

Big Data – Big Objects & Pipeline

Page 39: Data Pipelines: Big Data Meets Salesforce

• Size and complexity: up to 20 operators, 20 loads and 10 stores per script

• Run up to 30 scripts a day

• Bulk API: Store calls it, so its limits are in place

• Does not support some operators, such as COUNT

• Can’t break the rules of the Salesforce Platform: triggers, validations, required fields, etc.

• Once you run the process there is no way back

Data Pipeline - Limitations
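
The numeric limits on this slide lend themselves to a quick pre-flight check before a script is submitted. The sketch below is a hypothetical helper, not part of Data Pipeline: the `ScriptStats` class, the `violations` function and the way the counts are obtained are all assumptions; only the four numbers come from the slide.

```python
from dataclasses import dataclass

# Limits as stated on the slide.
MAX_OPERATORS = 20
MAX_LOADS = 20
MAX_STORES = 10
MAX_SCRIPTS_PER_DAY = 30


@dataclass
class ScriptStats:
    """Hypothetical summary of one Pig script, counted by hand or by a tool."""
    operators: int
    loads: int
    stores: int


def violations(stats, scripts_run_today):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if stats.operators > MAX_OPERATORS:
        problems.append(f"too many operators: {stats.operators} > {MAX_OPERATORS}")
    if stats.loads > MAX_LOADS:
        problems.append(f"too many loads: {stats.loads} > {MAX_LOADS}")
    if stats.stores > MAX_STORES:
        problems.append(f"too many stores: {stats.stores} > {MAX_STORES}")
    if scripts_run_today >= MAX_SCRIPTS_PER_DAY:
        problems.append(f"daily quota of {MAX_SCRIPTS_PER_DAY} scripts already used")
    return problems


print(violations(ScriptStats(operators=5, loads=2, stores=1), scripts_run_today=0))
print(violations(ScriptStats(operators=21, loads=2, stores=11), scripts_run_today=30))
```

Since a run cannot be undone once started, a check of this kind is cheap insurance against wasting one of the daily runs.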

Page 40: Data Pipelines: Big Data Meets Salesforce

Data Pipeline – Take away

1. New Feature is in Pilot

2. Run Scripts via:

Developer Console

Deploy

Tooling API (since API 33.0)

3. Run Scripts Asynchronously and in Parallel

4. Better performance than Batch Apex

5. Easy to use!!

Page 41: Data Pipelines: Big Data Meets Salesforce

Q&A

Page 42: Data Pipelines: Big Data Meets Salesforce

• https://pig.apache.org/

• http://goo.gl/h5N7Sa

• https://goo.gl/KXQSKC

Links and more

Carolina Ruiz Medina


@CarolEnLaNube

@CodeCoffeeCloud

www.codeandvoge.com

http://www.meetup.com/es/South-Spain-Salesforce-Developer-Group/

Agustina García Peralta


@agarciaodeian

www.agarciaodeian.com

http://www.meetup.com/es/Spain-Salesforce-Developer-User-Group/

Page 43: Data Pipelines: Big Data Meets Salesforce

Thank you