Data Pipelines: Big Data Meets Salesforce

Post on 20-Feb-2017

Data Pipeline: Big Data meets Salesforce

Carolina Ruiz Medina

Principal Developer on Product Innovation

cruiz@financialforce.com

@carolenlanube

Agustina García Peralta

Principal Developer on Platform Strategy

agarcia@financialforce.com

@agarciaodeian

Carolina Ruiz Medina, Principal Developer on Product Innovation

FinancialForce.com, MVP

@CarolEnLaNube

@CodeCoffeeCloud

Agustina García Peralta, Principal Developer, Platform Strategy

FinancialForce.com

@agarciaodeian

About

GREAT ALONE. BETTER TOGETHER.

Native to Salesforce1™ Platform since 2009

Investors include Salesforce Ventures

650+ employees, San Francisco based

Agenda

• Data Pipeline - Overview

• Pipeline Use Cases

• How Pipeline works – Demos

• Big Data

• Take away

• Q&A

Asynchronous Apex

• @future

• Queueable

• Batch Apex

• Flex Queue (since Summer ’15)

Common scenario – Large amount of data

• Any other option?

• Data Pipeline: New feature to integrate Apache Pig into Salesforce

Common scenario – Large amount of data

• What does it do?

• Process massive amounts of data in parallel.

• Key elements

• MapReduce: software for writing programs that process large amounts of data in parallel

• Hadoop cluster: a cluster for storing and analyzing large amounts of data

Apache Pig Background

Enables developers to create programs for analyzing LARGE AMOUNTS of data in PARALLEL

• How does it work?

• It uses Pig Latin

• Data-flow language

• Between SQL and Java

• We can create our own UDFs (user-defined functions)

Apache Pig Background

• Why is it relevant?

• Technology associated with Hadoop, but it can also be used by other platforms, such as Salesforce

• Is there anything unique to Apache Pig running in Salesforce?

• It runs in a multitenant environment

Apache Pig Background

• Under pilot program; GA by Summer ’16 (Safe Harbor)

• How does Data Pipeline work?

• It runs Pig scripts written in the Pig Latin language

What is Data Pipeline?

Data Pipeline Pig Script

Apex?

• Execution feature

• Run asynchronously

• In Parallel

• From where?

• Developer Console

• During deploy

• Tooling API 33.0 onwards

What is Data Pipeline?

• Anything else?

• It is an ETL (Extract – Transform – Load)

• Pig Scripts can be included into a package

Data Pipeline – Advantages vs other processes

1. Performance

2. Ability to execute scripts in parallel

3. No hitting governor limits

4. Decouples On-line Transaction Processing and On-line Analytical Processing

5. Allows you to think in terms of data flow

How can Pipeline help us?

We have a large volume of Financial Transactions…

…and we need to process them now!

…so that our users can use them: to report, to print, or for another quick process to finish (revaluation)

Use case: prepare data for Currency Revaluation (SObject to SObject)

How can Pipeline help us?

We have a large volume of Financial Transactions…

…and we need to process them now!

…so that our manager can look at the progress, export data quickly...

Use case: extracting information from a large amount of data (SObject to File)

To build the solution, let's look at Pig scripts first

What is a Pig script?

Operators

JOIN

GROUP

DISTINCT

ORDER

Solution – SObject to SObject

Solution – SObject to File

File created

Demo

Use Case – SObject to File

Input records:

LBX 7/7/2015 $150.00 I-00000
Other 7/7/2015 $250.00 I-00001
LBX 7/7/2015 $150.00 I-00002
LBX 12/7/2015 $350.00 I-00003
Other 15/7/2015 $550.00 I-00004

Output file:

7/7/2015 LBX $300.00
7/7/2015 Other $250.00
12/7/2015 Other $250.00
15/7/2015 Other $550.00

No header!!

Demo
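
The data flow in this use case reads like a GROUP followed by a sum. Below is a minimal Python sketch of that flow, not a Pig script and not the code shown in the demo. It assumes the four columns of the sample rows are type, date, amount, and invoice number (the slides do not label them) and that rows are grouped by date and type. Under those assumptions the sketch yields LBX $350.00 for 12/7/2015, since that is what the sample rows sum to.

```python
from collections import defaultdict
from decimal import Decimal

# Sample rows from the use case. Column meaning is an assumption:
# (type, date, amount, invoice number).
ROWS = [
    ("LBX", "7/7/2015", "150.00", "I-00000"),
    ("Other", "7/7/2015", "250.00", "I-00001"),
    ("LBX", "7/7/2015", "150.00", "I-00002"),
    ("LBX", "12/7/2015", "350.00", "I-00003"),
    ("Other", "15/7/2015", "550.00", "I-00004"),
]


def summarize(rows):
    """Group rows by (date, type) and sum the amounts."""
    totals = defaultdict(Decimal)  # Decimal() starts each group at 0
    for kind, date, amount, _invoice in rows:
        totals[(date, kind)] += Decimal(amount)
    return dict(totals)


def to_lines(totals):
    """Render one line per group, without a header line."""
    return [f"{date} {kind} ${total:.2f}" for (date, kind), total in totals.items()]


if __name__ == "__main__":
    print("\n".join(to_lines(summarize(ROWS))))
```

Decimal is used instead of float so that currency amounts add up exactly.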

Data Pipeline – 2 more options

Join 2 objects

Data Pipeline – 2 more options

Read and Process a JSON file

• Thousands of invoices

• Keep them somewhere for audit processes

• No need for all the information, just some field values
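
Keeping only some field values from a JSON file is a projection. The Python sketch below illustrates the idea only; it is not how Data Pipeline reads JSON. The payload and the field names (`number`, `type`, `date`, `amount`, `notes`) are hypothetical.

```python
import json

# Hypothetical invoice payload; field names are illustrative only.
RAW = """
[
  {"number": "I-00000", "type": "LBX", "date": "7/7/2015", "amount": 150.0, "notes": "n/a"},
  {"number": "I-00001", "type": "Other", "date": "7/7/2015", "amount": 250.0, "notes": "n/a"}
]
"""

# Fields assumed to matter for the audit copy.
KEEP = ("number", "date", "amount")


def project(raw_json, fields=KEEP):
    """Parse a JSON array of invoices and keep only the selected fields."""
    return [{f: inv.get(f) for f in fields} for inv in json.loads(raw_json)]


if __name__ == "__main__":
    for row in project(RAW):
        print(row)
```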

But that is not all!!

Big Data

#Big Data #Big Objects

Big Data – Big Objects

• Under pilot program; GA by Summer ’16 (Safe Harbor)

Feature | Custom Object | Big Object
Creation | Manual & Metadata | Metadata
API name | myObject__c | myObject__b
Enable Reports, Track Activities, Track Field History, etc. | Options available | Options not available
Field types | All | Text; Date/Time; Lookup
Able to edit / delete fields? | Yes | No
Triggers, Field Sets, etc. | Options available | Options not available
How to populate records | All options | Bulk API; SOAP API; Data Pipeline
Can I amend a record? | Yes | No (only clone is available)
Can I see data by creating a Tab? | Yes | No (only via SOQL)
For free? | Yes | No (talk with Salesforce about it)
Storage? | It counts against the storage limit | It DOES NOT count against the storage limit

Big Data – Big Objects & Pipeline

Data Pipeline - Limitations

• Size complexity: 20 operators, 20 loads and 10 stores per script

• Run up to 30 scripts a day

• Bulk API: Store calls it, so its limits are in place

• Does not support some operators, like Count

• Can’t break the rules of the Salesforce Platform: triggers, validations, required fields, etc.

• Once you run the process there is no way back
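
Since a run cannot be undone, it may help to check a script against the per-script limits before submitting it. The Python sketch below is a hypothetical pre-flight check, not a Salesforce feature: it counts the `LOAD` and `STORE` keywords with a naive regular expression, so keywords inside comments or string literals would be miscounted. It does not check the operator limit.

```python
import re

MAX_LOADS = 20   # per script, from the limits listed above
MAX_STORES = 10  # per script, from the limits listed above


def check_script(script):
    """Return a list of limit violations found by a naive keyword count."""
    loads = len(re.findall(r"\bLOAD\b", script, flags=re.IGNORECASE))
    stores = len(re.findall(r"\bSTORE\b", script, flags=re.IGNORECASE))
    problems = []
    if loads > MAX_LOADS:
        problems.append(f"{loads} loads exceed the limit of {MAX_LOADS}")
    if stores > MAX_STORES:
        problems.append(f"{stores} stores exceed the limit of {MAX_STORES}")
    return problems


# Hypothetical script text, used only to exercise the check.
SAMPLE = """
a = LOAD 'input_a';
b = LOAD 'input_b';
STORE a INTO 'output_a';
"""

if __name__ == "__main__":
    print(check_script(SAMPLE) or "within limits")
```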

Data Pipeline – Take away

1. New Feature is in Pilot

2. Run Scripts via:

Developer Console

Deploy

Tooling API (since API 33.0)

3. Run Scripts Asynchronously and in Parallel

4. Better performance: Batch Apex vs. Pipeline

5. Easy to use!!

Q&A

Links and more

• https://pig.apache.org/

• http://goo.gl/h5N7Sa

• https://goo.gl/KXQSKC

Carolina Ruiz Medina

cruiz@financialforce.com

@CarolEnLaNube

@CodeCoffeeCloud

www.codeandvoge.com

http://www.meetup.com/es/South-Spain-Salesforce-Developer-Group/

Agustina García Peralta

agarcia@financialforce.com

@agarciaodeian

www.agarciaodeian.com

http://www.meetup.com/es/Spain-Salesforce-Developer-User-Group/

Thank you
