Data Pipelines - Big Data Meets Salesforce

Post on 18-Jan-2017

Category:

Engineering

TRANSCRIPT

Salesforce

Data Pipeline: Big Data meets Salesforce
Carolina Ruiz Medina, Principal Developer on Product Innovation, @carolenlanube
Agustina García Peralta, Principal Developer on Platform, @agarciaodeian

Carolina Ruiz Medina
Principal Developer on Product Innovation
FinancialForce.com
@CodeCoffeeCloud

Agustina García Peralta
Principal Developer, Platform

About

GREAT ALONE. BETTER TOGETHER.
Native to Salesforce App Cloud since 2009
Investors include Salesforce Ventures
Customers in 27 countries
650+ employees, San Francisco based
Dreamforce.FinancialForce.com

First, a few quick words about FinancialForce.com.

FinancialForce.com builds ERP apps that are native to the Salesforce App Cloud, including Accounting, Professional Services Automation, Human Resources and Inventory applications. Our apps can be subscribed to separately or as part of a whole ERP family.

Our company investors include Salesforce Ventures, which made their original investment in us in 2009.

We have customers all around the world in 27 countries and over 650 employees including those at our headquarters on 595 Market St. here in San Francisco.

We have quite a few sessions and parties planned here this week; you can learn more about those at Dreamforce.FinancialForce.com. Feel free to join us.

Agenda
Data Pipeline - Overview
Pipeline Use Cases
How Pipeline works
Demos
Big Data
Take away
Q&A

Common scenario: large amount of data

Asynchronous [email protected] Apex
Flex Queue (since Summer 15)

Any other option? Data Pipeline: New feature to integrate Apache Pig into Salesforce

Apache Pig Background

What does it do? It processes massive amounts of data in parallel.
Key elements
MapReduce: software for writing programs that process large amounts of data in parallel
Hadoop cluster: a cluster for storing and analyzing large amounts of data

Enables developers to create programs for analyzing LARGE AMOUNTS of data in PARALLEL

How does it work?
It uses Pig Latin, a data-flow language
It sits between SQL and Java
We can create our own UDFs (user-defined functions)
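The slides do not show a Pig Latin script at this point. As a rough analogy only, the data-flow style (each step takes a set of records and produces a new one) and the idea of a UDF can be sketched in plain Python. The record fields, the threshold and the `to_upper` function are hypothetical, chosen for illustration.

```python
# A plain-Python analogy of a data-flow script; this is NOT Pig Latin.
# Field names, values and the UDF below are hypothetical.

def to_upper(text):
    """A user-defined function (UDF) applied to a single field."""
    return text.upper()

def load():
    """Step 1: load records (plays the role of a load step)."""
    return [
        {"name": "acme", "amount": 150.0},
        {"name": "globex", "amount": 40.0},
        {"name": "initech", "amount": 250.0},
    ]

def keep_large(records, minimum):
    """Step 2: keep only rows whose amount is at least `minimum`."""
    return [r for r in records if r["amount"] >= minimum]

def project(records):
    """Step 3: build a new row per record, calling the UDF on one field."""
    return [(to_upper(r["name"]), r["amount"]) for r in records]

# Each step feeds the next one, which is the "data flow" idea.
result = project(keep_large(load(), 100.0))
print(result)
```

Each function only describes one transformation, so the script reads as a sequence of steps rather than as a single query or as imperative loops.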

Why is it relevant? It is a technology associated with Hadoop, but it can be used by other frameworks, such as Salesforce.

Is there anything unique to Apache Pig running in Salesforce? It runs in a multitenant environment.

What is Data Pipeline?

Under Pilot program; GA by Summer 16 (Safe Harbor)
How does Data Pipeline work? It runs Pig scripts written in the Pig Latin language.

Data Pipeline
Pig Script
Apex?

Execution features
Run asynchronously
In parallel

From where?
Developer Console
During deploy
Tooling API, 33.0 onwards

Anything else?
It is an ETL (Extract, Transform, Load) tool
Pig scripts can be included in a package

Data Pipeline Advantages vs other processes

1. Performance
2. Ability to execute scripts in parallel
3. No hitting governor limits
4. De-couple On-line Transaction Processing and On-line Analytical Processing
5. Allows you to think in terms of data flow

How can Pipeline help us?

Prepare data for Currency Revaluation (SObject to SObject)
We have a large volume of Financial Transactions... and we need to process them now!
Our users need to be able to use them: to report, to print, or for another quick process to finish the revaluation.

Extracting information from a large amount of data (SObject to File)
We have a large volume of Financial Transactions... and we need to process them now!
Our manager needs to look at the progress, to export data quickly...

What is Pig Script?

To build the solution, let's look at Pig Script first.

Operators
JOIN
GROUP
DISTINCT
ORDER
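To give a feel for what these four operators do, here is a small Python sketch that imitates each one on in-memory lists. It is an analogy under stated assumptions, not Pig Latin: the invoice ids, types and amounts are made up, and Pig's own operators have richer semantics than these one-liners.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (invoice id, type) and (invoice id, amount).
invoices = [("I-1", "LBX"), ("I-2", "Other"), ("I-3", "LBX")]
amounts = [("I-1", 150.0), ("I-2", 250.0), ("I-3", 150.0)]

# JOIN: match rows from both lists on the invoice id.
amount_by_id = dict(amounts)
joined = [(inv, kind, amount_by_id[inv])
          for inv, kind in invoices if inv in amount_by_id]

# GROUP: collect the invoice ids that share the same type.
# groupby needs its input sorted by the grouping key.
by_kind = sorted(joined, key=itemgetter(1))
grouped = {kind: [row[0] for row in rows]
           for kind, rows in groupby(by_kind, key=itemgetter(1))}

# DISTINCT: remove duplicate values.
distinct_kinds = sorted({kind for _, kind in invoices})

# ORDER: sort rows, here by amount, largest first.
ordered = sorted(joined, key=itemgetter(2), reverse=True)

print(grouped)
print(distinct_kinds)
print(ordered[0])
```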

Solution: SObject to SObject

Solution: SObject to File

File created

Demo

Use Case

LBX    7/7/2015   $150.00  I-00000
Other  7/7/2015   $250.00  I-00001
LBX    7/7/2015   $150.00  I-00002
LBX    12/7/2015  $350.00  I-00003
Other  15/7/2015  $550.00  I-00004

7/7/2015   LBX    $300.00
7/7/2015   Other  $250.00
12/7/2015  LBX    $350.00
15/7/2015  Other  $550.00
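The transformation in this use case appears to be a group-and-sum: the two LBX invoices of $150.00 dated 7/7/2015 become a single $300.00 row. A minimal Python sketch of that step follows, assuming the four columns are type, date, amount and invoice number (the slide shows no header, so these names are a guess) and that rows are grouped by date and type.

```python
from collections import defaultdict
from decimal import Decimal

# Rows copied from the slide. Column meanings are an assumption:
# (type, date, amount, invoice number).
invoices = [
    ("LBX", "7/7/2015", Decimal("150.00"), "I-00000"),
    ("Other", "7/7/2015", Decimal("250.00"), "I-00001"),
    ("LBX", "7/7/2015", Decimal("150.00"), "I-00002"),
    ("LBX", "12/7/2015", Decimal("350.00"), "I-00003"),
    ("Other", "15/7/2015", Decimal("550.00"), "I-00004"),
]

def total_by_date_and_type(rows):
    """Sum the amounts of all rows that share the same (date, type) pair."""
    totals = defaultdict(Decimal)  # a missing key starts at Decimal("0")
    for kind, date, amount, _number in rows:
        totals[(date, kind)] += amount
    return dict(totals)

totals = total_by_date_and_type(invoices)
for (date, kind), amount in totals.items():
    print(date, kind, f"${amount}")
```

`Decimal` is used instead of `float` so that currency amounts keep their two decimal places exactly.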

SObject to File

No header!!

Demo

Data Pipeline: 2 more options

Join 2 objects

Read and process a JSON file
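As an illustration of what "read and process a JSON file" can mean, the Python sketch below parses a JSON document and keeps only a few fields from each record. It is not a Data Pipeline script: the payload, the field names and the `select_fields` helper are hypothetical, and the JSON is held in a string so the example is self-contained.

```python
import json

# Hypothetical JSON content; in practice this would be read from a file.
raw = """
[
  {"number": "I-00000", "type": "LBX", "date": "2015-07-07",
   "amount": 150.0, "notes": "not needed"},
  {"number": "I-00001", "type": "Other", "date": "2015-07-07",
   "amount": 250.0, "notes": "not needed"}
]
"""

# The fields we want to keep (an assumption for this sketch).
KEEP = ("number", "date", "amount")

def select_fields(records, fields):
    """Return a copy of each record containing only the given fields."""
    return [{name: record[name] for name in fields} for record in records]

slim = select_fields(json.loads(raw), KEEP)
print(slim)
```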

Thousands of invoices
Keep them somewhere for audit processes
No need for all the information, just some field values
But that is not all!!

Big Data

#Big Data #Big Objects

Big Data: Big Objects

          Custom Object      Big Object
Creation  Manual & Metadata  Metadata

Under Pilot program; GA by Summer 16 (Safe Harbor)

                                                             Custom Object      Big Object
API name                                                     myObject__c        myObject__b
Enable Reports, Track Activities, Track Field History, etc.  Options available  Options not available
Field types                                                  All                Text; Date/Time; Lookup

Numbers!!!

                               Custom Object      Big Object
Able to edit / delete fields?  Yes                No
Triggers; Field Sets; etc.     Options available  Options not available

                                   Custom Object                        Big Object
How to populate records            All options                          Bulk API; SOAP API; Data Pipeline
Can I amend a record?              Yes                                  No. Only clone is available
Can I see data by creating a Tab?  Yes                                  No. Only via SOQL
For free?                          Yes                                  No. Talk with Salesforce about it
Storage?                           It counts against the storage limit  It DOES NOT count against the storage limit

Yes!!

Big Data Big Objects & Pipeline

Data Pipeline - Limitations

Size complexity: 20 operators, 20 loads and 10 stores per script
Run up to 30 scripts a day
Bulk API: Store calls it, so its limits are in place
Does not support some operators, like Count
Can't break the rules of the Salesforce Platform: triggers, validations, required fields, etc.
Once you run the process there is no way back

Data Pipeline: Take away

1. New feature is in Pilot
2. Run scripts via: Developer Console, Deploy, Tooling API (since API 33.0)
3. Run scripts asynchronously and in parallel
4. Better performance
5. Easy to use!!

Q&A

ISV Scale: Big Data for ISV
4pm, Park Central Hotel, Franciscan Ballroom

Links and more

https://pig.apache.org/
http://goo.gl/h5N7Sa
https://goo.gl/KXQSKC

Carolina Ruiz Medina
www.meetup.com/es/South-Spain-Salesforce-Developer-Group/

Agustina García Peralta
@agarciaodeian
www.agarciaodeian.com
http://www.meetup.com/es/Spain-Salesforce-Developer-User-Group/

Thank you
