Data Pipeline: Big Data Meets Salesforce
Carolina Ruiz Medina, Principal Developer on Product Innovation, FinancialForce.com, MVP (@CarolEnLaNube, @CodeCoffeeCloud)
Agustina García Peralta, Principal Developer, Platform Strategy, FinancialForce.com (@agarciaodeian)
About FinancialForce.com: GREAT ALONE. BETTER TOGETHER. Native to the Salesforce1 Platform since 2009. Investors include Salesforce Ventures. 650+ employees, based in San Francisco.
Agenda: Data Pipeline overview, Pipeline use cases, how Pipeline works, demos, Big Data, takeaways, Q&A
Asynchronous Apex: @future, Queueable, Batch Apex, Flex Queue (since Summer '15). Common scenario: large amounts of data.
Any other option? Data Pipeline: a new feature that integrates Apache Pig into Salesforce.
What does it do? It processes massive amounts of data in parallel. Key elements: MapReduce, a software framework for writing programs that process large amounts of data in parallel; and a Hadoop cluster, for storing and analyzing large amounts of data.
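A minimal, single-process Python sketch of the MapReduce model follows: a map step emits key/value pairs, a shuffle step groups the values by key, and a reduce step collapses each group into one result. It only illustrates the programming model and is not Hadoop's API; a real cluster runs map and reduce tasks on many nodes in parallel, while this sketch runs them sequentially. The function names are hypothetical, and the sample amounts are borrowed from the deck's use case.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(records: Iterable[tuple[str, float]]) -> Iterator[tuple[str, float]]:
    # Emit one (key, value) pair per input record; here the key is the
    # transaction type and the value is its amount.
    for txn_type, amount in records:
        yield txn_type, amount


def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    # Group all values that share a key, as the framework does between phases.
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[float]]) -> dict[str, float]:
    # Collapse each key's list of values into a single result (a sum here).
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(records: Iterable[tuple[str, float]]) -> dict[str, float]:
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    sample = [("LBX", 150.0), ("Other", 250.0), ("LBX", 150.0)]
    print(map_reduce(sample))  # {'LBX': 300.0, 'Other': 250.0}
```

Because each reduce call only sees the values for one key, the groups can be reduced independently, which is what lets a cluster spread the work across machines.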
Apache Pig Background
It enables developers to create jobs that analyze LARGE AMOUNTS of data in PARALLEL.
How does it work? It uses Pig Latin, a data-flow language that sits between SQL and Java. We can also create our own UDFs (user-defined functions).
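A Pig Latin script reads as a chain of named relations, each derived from the previous one, and a UDF is a function plugged into one of those steps. The Python sketch below only mimics that data-flow style; it is not Pig's UDF mechanism, and the `Txn` record layout, the `normalize_type` UDF and the filter threshold are all hypothetical.

```python
from typing import NamedTuple


class Txn(NamedTuple):
    txn_type: str
    amount: float
    invoice: str


def normalize_type(value: str) -> str:
    # Hypothetical user-defined function (UDF): trims and upper-cases a code.
    return value.strip().upper()


def pipeline(raw_rows):
    # Each step builds a new "relation" from the previous one, in the spirit
    # of a Pig Latin script: LOAD -> FOREACH ... GENERATE -> FILTER.
    loaded = [Txn(t, float(a), i) for t, a, i in raw_rows]        # like LOAD
    projected = [t._replace(txn_type=normalize_type(t.txn_type))  # like FOREACH with a UDF
                 for t in loaded]
    filtered = [t for t in projected if t.amount > 200]            # like FILTER (arbitrary threshold)
    return filtered
```

For example, `pipeline([(" lbx", "150.00", "I-00000"), ("other ", "250.00", "I-00001")])` keeps only the second row, with its type normalized to `OTHER`.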
Why is it relevant? It is a technology associated with Hadoop, but it can also be used by other frameworks, such as Salesforce.
Is there anything unique about Apache Pig running in Salesforce? It runs in a multitenant environment.
Under a pilot program; GA expected by Summer '16 (Safe Harbor).
How does Data Pipeline work? It runs Pig scripts written in the Pig Latin language.
What is Data Pipeline?
Data Pipeline: Pig Script. Apex?
Execution features: runs asynchronously and in parallel. From where? The Developer Console, during deployment, or the Tooling API (API version 33.0 onwards).
Anything else? It is an ETL (Extract, Transform, Load) tool, and Pig scripts can be included in a package.
Data Pipeline advantages vs. other processes
1. Performance
2. Ability to execute scripts in parallel
3. No hitting governor limits
4. Decouples Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP)
5. Allows you to think in terms of data flow
How can Pipeline help us?
We have a large volume of financial transactions... and we need to process them now! Prepare the data for currency revaluation so that our users can use it: to report, to print, or for another quick process to finish the revaluation. (SObject to SObject)
A complex process that runs at the end of the month and consumes lots of resources.
In general terms, revaluation of a currency is a calculated adjustment to a country's official exchange rate relative to a chosen baseline. The baseline can be anything from wage rates to the price of gold to a foreign currency.
There are two situations in which you might want to perform a currency revaluation. At period end, you might want to revalue your income statement to eliminate the effect of exchange rate fluctuations. At year end, you might want to revalue the company's balance sheet so that it values the company's assets and liabilities at the exchange rate applicable on the balance sheet date.
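One common way to compute a revaluation adjustment is to re-translate a foreign-currency balance at the new (for example, period-end) rate and compare it with its value at the original rate: adjustment = balance × (new rate − original rate). The Python sketch below assumes that definition and uses `Decimal` to avoid binary floating-point rounding on money. The function name and rates are hypothetical, and which accounts and rates are used in practice depends on the accounting policy.

```python
from decimal import Decimal, ROUND_HALF_UP


def revaluation_adjustment(foreign_balance: Decimal,
                           original_rate: Decimal,
                           new_rate: Decimal) -> Decimal:
    # Home-currency value at the new rate minus the value at the original
    # rate; a positive result is an unrealized gain, a negative one a loss.
    delta = foreign_balance * (new_rate - original_rate)
    # Round to cents for presentation.
    return delta.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 1,000 units booked at 1.10 and revalued at 1.20 give a 100.00 gain.
    print(revaluation_adjustment(Decimal("1000"), Decimal("1.10"), Decimal("1.20")))
```

Running this per transaction is cheap; the month-end cost comes from doing it over a large volume of records, which is the motivation for moving it into a parallel pipeline.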
How can Pipeline help us?
We have a large volume of financial transactions... and we need to process them now! Extract information from a large amount of data so that our manager can review progress and export data quickly. (SObject to File)
Get all the information from our weekly extract of large volumes of transactions.
To build the solution, let's look at Pig scripts first. What is a Pig script?
Operators: JOIN, GROUP, DISTINCT, ORDER
Pig is a high-level scripting language that is used with Apache Hadoop. Pig enables data workers to write complex data transformations without knowing Java. Pig's simple SQL-like scripting language is called Pig Latin, and it appeals to developers already familiar with scripting languages and SQL.
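To show what three of these operators do to a relation, here is a Python sketch that approximates the semantics of DISTINCT, ORDER ... BY and GROUP ... BY on in-memory tuples. It is not Pig itself: Pig's GROUP produces a bag per key, modeled here as a list, and Pig does not guarantee DISTINCT output order, whereas this sketch sorts for determinism. The function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def distinct(relation):
    # DISTINCT: drop duplicate tuples; sorted here only so the result is
    # deterministic.
    return sorted(set(relation))


def order_by(relation, index, descending=False):
    # ORDER ... BY: sort on one field (Python's sort is stable, so ties keep
    # their input order).
    return sorted(relation, key=itemgetter(index), reverse=descending)


def group_by(relation, index):
    # GROUP ... BY: one (key, bag) pair per distinct key; the bag holds every
    # tuple with that key. itertools.groupby needs sorted input.
    ordered = sorted(relation, key=itemgetter(index))
    return [(key, list(bag)) for key, bag in groupby(ordered, key=itemgetter(index))]
```

With `rows = [("LBX", 150.0), ("Other", 250.0), ("LBX", 150.0)]`, `group_by(rows, 0)` returns one entry for `LBX` holding both LBX tuples and one entry for `Other`.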
Solution: SObject to SObject
Solution: SObject to File
File created
File size
Demo
Use Case
LBX | 7/7/2015 | $150.00 | I-00000
Other | 7/7/2015 | $250.00 | I-00001
LBX | 7/7/2015 | $150.00 | I-00002
LBX | 12/7/2015 | $350.00 | I-00003
Other | 15/7/2015 | $550.00 | I-00004
7/7/2015 | LBX | $300.00
7/7/2015 | Other | $250.00
12/7/2015 | LBX | $350.00
15/7/2015 | Other | $550.00
SObject to File
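The result table appears to group the sample transactions by date and type and sum their amounts. The Python sketch below performs that aggregation over the five sample rows so the grouping logic can be checked outside Salesforce. It is not the Pig script used in the demo, which the transcript does not include; the column meaning (type, date, amount, invoice number) is inferred from the values, and the `summarize` name is hypothetical.

```python
from decimal import Decimal

# The five sample rows: (type, date, amount, invoice).
ROWS = [
    ("LBX", "7/7/2015", "$150.00", "I-00000"),
    ("Other", "7/7/2015", "$250.00", "I-00001"),
    ("LBX", "7/7/2015", "$150.00", "I-00002"),
    ("LBX", "12/7/2015", "$350.00", "I-00003"),
    ("Other", "15/7/2015", "$550.00", "I-00004"),
]


def summarize(rows):
    # Sum the amounts per (date, type) key, keeping first-seen key order
    # (plain dicts preserve insertion order in Python 3.7+).
    totals = {}
    for txn_type, date, amount, _invoice in rows:
        key = (date, txn_type)
        totals[key] = totals.get(key, Decimal("0")) + Decimal(amount.lstrip("$"))
    # Format each total back into the "$0.00" style used in the tables.
    return [(date, txn_type, f"${total:.2f}")
            for (date, txn_type), total in totals.items()]


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```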
Use Case: SObject to File. No header!!
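Assuming "No header" means the stored file has no header row, any consumer of that file has to know the column order in advance. The sketch below writes and reads such a file with Python's standard `csv` module; the comma-separated layout and the column names in `FIELDS` are hypothetical, since the transcript does not name the columns.

```python
import csv
import io

FIELDS = ["date", "type", "total"]  # hypothetical column names


def write_headerless(rows, stream):
    # Write data rows only: no header line is emitted.
    writer = csv.writer(stream)
    writer.writerows(rows)


def read_headerless(stream):
    # Supply the field names explicitly because the file does not carry them.
    return list(csv.DictReader(stream, fieldnames=FIELDS))


if __name__ == "__main__":
    buf = io.StringIO()
    write_headerless([("7/7/2015", "LBX", "300.00")], buf)
    buf.seek(0)
    print(read_headerless(buf))
```

Keeping `FIELDS` in one place means the writer's column order and the reader's field names cannot silently drift apart.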
Demo
Use Case: SObject to File
Data Pipeline: two more options
Join two objects
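Joining two objects works like a relational inner join on a shared key, which is what Pig's JOIN operator does. The Python sketch below implements an inner equi-join as a hash join: it indexes one relation by key and probes it with the other. The sample invoice and account tuples and their field positions are hypothetical.

```python
from collections import defaultdict


def inner_join(left, right, left_key, right_key):
    # Build a hash index on the right relation, then probe it with each left
    # tuple; left tuples without a match are dropped (inner join).
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    # Concatenate the fields of each matching pair, roughly as Pig's JOIN
    # output carries the fields of both inputs.
    return [l + r for l in left for r in index.get(l[left_key], [])]


if __name__ == "__main__":
    invoices = [("I-00000", "A1", 150.0), ("I-00001", "A2", 250.0), ("I-00002", "A9", 10.0)]
    accounts = [("A1", "Acme"), ("A2", "Globex")]
    for row in inner_join(invoices, accounts, 1, 0):
        print(row)
```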
Read and process a JSON file
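Processing a JSON file usually means parsing semi-structured records and keeping only the fields needed downstream. The sketch below does that with Python's standard `json` module, assuming the file holds a top-level array of objects; the field names used in the example are hypothetical, and how Data Pipeline itself loads JSON is not shown in the transcript.

```python
import json


def extract_fields(json_text, fields):
    # Parse a JSON array of objects and project each one onto `fields`,
    # skipping any field a given record does not have.
    records = json.loads(json_text)
    return [{f: rec[f] for f in fields if f in rec} for rec in records]


if __name__ == "__main__":
    data = '[{"name": "I-00000", "amount": 150.0, "status": "Paid"}, {"name": "I-00001", "amount": 250.0}]'
    print(extract_fields(data, ["name", "amount"]))
```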
Thousands of invoices. Keep them somewhere for audit processes. We don't need all the information, just some field values. But that is not all!!
Big Data
#Big Data #Big Objects
Under a pilot program; GA expected by Summer '16 (Safe Harbor).
Big Data: Big Objects
Feature | Custom Object | Big Object
Creation | Manual & Metadata | Metadata
API name | myObject__c | myObject__b
Enable Reports, Track Activities, Track Field History, etc. | Options available | Options not available
Field types | All | Text; Date/Time; Lookup
Able to edit / delete fields? | Yes | No
Triggers, Field Sets, etc. | Options available | Options not available
How to populate records | All options | Bulk API; SOAP API; Data Pipeline
Can I amend a record? | Yes | No, only clone is available
Can I see data by creating a tab? | Yes | No, only via SOQL
For free? | Yes | No, talk with Salesforce about it
Storage | Counts against the storage limit | Does NOT count against the storage limit
Big Data: Big Objects & Pipeline
Data Pipeline: Limitations
Size complexity: 20 operators, 20 loads and 10 stores per script
Run up to 30 scripts a day
Bulk API: STORE calls it, so Bulk API limits are in place
Some operators are not supported, such as COUNT
You can't break the rules of the Salesforce Platform: triggers, validations, required fields, etc.
Once you run the process, there is no way back
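Because a script over these limits will not run, it can help to check it before submitting it. The Python sketch below counts LOAD and STORE statements in a Pig Latin script against the per-script limits listed above (20 loads, 10 stores). It is a rough, hypothetical pre-check, not a Salesforce tool: Pig Latin keywords are case-insensitive, so it matches them with a case-insensitive regex, but it also counts keywords that appear inside comments or string literals, and it does not count the other operators.

```python
import re

MAX_LOADS = 20   # per-script limit listed on the slide
MAX_STORES = 10  # per-script limit listed on the slide


def check_limits(script):
    # Count whole-word LOAD / STORE keywords, ignoring case.
    loads = len(re.findall(r"\bLOAD\b", script, flags=re.IGNORECASE))
    stores = len(re.findall(r"\bSTORE\b", script, flags=re.IGNORECASE))
    problems = []
    if loads > MAX_LOADS:
        problems.append(f"{loads} LOAD statements (limit {MAX_LOADS})")
    if stores > MAX_STORES:
        problems.append(f"{stores} STORE statements (limit {MAX_STORES})")
    # An empty list means no limit was exceeded by this heuristic.
    return problems
```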
Data Pipeline Takeaways
1. New feature, in pilot
2. Run scripts via the Developer Console, deployment, or the Tooling API (since API 33.0)
3. Run scripts asynchronously and in parallel
4. Better performance
5. Easy to use!!
Q&A
ISV Scale: Big Data for ISVs. Session date: 9/17/2015. Session time: 4:00 p.m. to 4:40 p.m. PST. Location: Franciscan Ballroom, Park Central Hotel.
https://pig.apache.org/
http://goo.gl/h5N7Sa
https://goo.gl/KXQSKC
Links and more
Carolina Ruiz Medina: http://www.meetup.com/es/South-Spain-Salesforce-Developer-Group/
Agustina García Peralta: @agarciaodeian, www.agarciaodeian.com, http://www.meetup.com/es/Spain-Salesforce-Developer-User-Group/
Thank you