Streaming with Oracle Data Integration

Post on 19-Mar-2017

Streaming Transformations Using Oracle Data Integration

Michael Rainey | BIWA Summit 2017

Introduction

• Michael Rainey - Technical Advisor
• Spreading the good word about Gluent products with the world
• Oracle Data Integration expertise
• Oracle ACE Director
• mRainey.co

we liberate enterprise data

What is “Streaming”

• The processing and analysis of structured or “unstructured” data in real-time
• Why Streaming?
  • When speed (velocity) of data is key
  • Streaming data is processed in “time windows”, in memory, across a cluster of servers
• Examples:
  • Calculating a retail buying opportunity
  • Real-time cost calculations
  • IoT data analysis
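The “time windows” idea above can be illustrated with a minimal sketch in plain Python. It assumes fixed-size (tumbling) windows and a hypothetical list of retail purchases; the function name, the sample data, and the 10-second window are illustrative assumptions, and a real streaming engine would do this in memory across a cluster rather than in a single loop.

```python
from collections import defaultdict

def tumbling_window_totals(events, window_seconds):
    """Group (timestamp, amount) events into fixed windows and sum each one."""
    totals = defaultdict(float)
    for timestamp, amount in events:
        # Every event falls into exactly one window, keyed by the window start.
        window_start = (timestamp // window_seconds) * window_seconds
        totals[window_start] += amount
    return dict(sorted(totals.items()))

# Hypothetical retail purchases: (seconds since start, sale amount)
purchases = [(1, 20.0), (4, 5.0), (11, 7.5), (12, 2.5), (25, 40.0)]
window_totals = tumbling_window_totals(purchases, window_seconds=10)
print(window_totals)  # {0: 25.0, 10: 10.0, 20: 40.0}
```

Each key is the start of a window, so a downstream rule (for example, a buying-opportunity check) only needs to look at one small aggregate per window instead of every raw event.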


Streaming data - Apache Kafka

“Publish-subscribe messaging rethought as a distributed commit log”

Image source: kafka.apache.org/
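To make the “commit log” description concrete, here is a toy, single-process sketch in plain Python. It is not the Kafka API: the class and method names are hypothetical, and it leaves out partitions, replication, and persistence. It only shows the core idea that records are appended with sequential offsets and each subscriber keeps its own read position.

```python
class CommitLog:
    """Append-only log; each record gets a sequential offset."""

    def __init__(self):
        self._records = []

    def publish(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just appended

    def read_from(self, offset):
        return self._records[offset:]


class Subscriber:
    """Tracks its own offset, so subscribers read independently of each other."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        records = self._log.read_from(self.offset)
        self.offset += len(records)
        return records


log = CommitLog()
fast, slow = Subscriber(log), Subscriber(log)

log.publish("order-1")
log.publish("order-2")
first_poll = fast.poll()   # ['order-1', 'order-2']
log.publish("order-3")
second_poll = fast.poll()  # ['order-3']
late_poll = slow.poll()    # ['order-1', 'order-2', 'order-3']
```

Because nothing is removed when a record is read, a slow or newly added subscriber can still replay the log from the beginning.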

Enterprise Data Bus


Stream processing - Apache Spark

• Scalable, fault-tolerant, high-throughput stream processing
• Spark Streaming receives live input data streams from various sources
• Continuous stream of data is known as a discretized stream or DStream
• Data is divided into mini-batches and processed by the Spark engine
• Operations such as join, filter, map, count, windowed computations, etc. are used to transform data in-flight
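The mini-batch model described above can be sketched without a cluster. The following plain-Python example is an illustration only, not Spark code: the function names and sample log lines are assumptions. It applies filter, map, and count to each mini-batch, then combines the most recent batches into a sliding window.

```python
from collections import Counter

def process_batch(batch):
    """Transform one mini-batch: drop blank lines, split into lowercase words, count."""
    words = [word.lower() for line in batch if line.strip() for word in line.split()]
    return Counter(words)

def windowed_counts(batch_counts, window_length):
    """At each step, combine the counts of the last `window_length` mini-batches."""
    results = []
    for i in range(len(batch_counts)):
        window = batch_counts[max(0, i - window_length + 1): i + 1]
        total = Counter()
        for counts in window:
            total.update(counts)
        results.append(total)
    return results

# Hypothetical stream that has already been cut into three mini-batches
mini_batches = [["error disk", "ok"], ["", "error net"], ["ok ok"]]
per_batch = [process_batch(batch) for batch in mini_batches]
sliding = windowed_counts(per_batch, window_length=2)
```

In Spark Streaming the same steps would be expressed as transformations on a DStream, with the engine handling batching and distribution.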

Why Oracle Data Integration?

• Enterprise has invested heavily in ODI and/or GoldenGate
• Getting started with development languages (Python/pySpark, Java, etc.)
• Centralized metadata management
  • Integrate with other data sources using a single interface
• Realized cost savings
  • According to Gartner, 200% increase in maintenance costs when custom coding (https://www.gartner.com/doc/3432617/does-customcoded-data-integration-stack)


Streaming with Oracle Data Integration

Real-time data replication

Streaming integration: OGG -> Kafka

Streaming integration: Kafka -> Spark Streaming


Relational database transactions to Kafka

Why GoldenGate with Kafka?

• GoldenGate
  • …is non-invasive
  • …has checkpoints for recovery
  • …moves data quickly
  • …is easy to set up

Why Oracle Data Integrator with Spark Streaming?

• Heterogeneous sources and targets
• Built to integrate all data
• Flexibility
• Reusable code templates (Knowledge Modules)
• Reusable Mappings
• ODI can adapt to your data warehouse - and not the other way around
• Flow based mappings

Getting started with streaming using Oracle Data Integration

Oracle GoldenGate for Big Data - Kafka Handler Setup

• Standard GoldenGate Extract / Pump processes to capture RDBMS data
• Replicat for Java parameter file & process group created and set up
• Kafka Producer properties and Kafka Handler configuration set up

GoldenGate for Kafka setup

• Kafka handler properties
  • Set properties for how GoldenGate interacts with Kafka
  • Format, transaction vs operation mode, etc.
• Kafka producer configuration

http://mrainey.co/ogg-kafka-oow
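The two groups of settings named above might look roughly like the following. This is a hedged sketch in Python that only renders `.properties` text. The producer keys are standard Kafka producer settings. The `gg.handler...` keys and values are assumptions recalled from the GoldenGate for Big Data documentation and should be verified against your release. The handler name `kafkahandler`, the file name, and the broker address are placeholders.

```python
def render_properties(settings):
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"

# Standard Kafka producer settings; the broker address is a placeholder.
producer_config = {
    "bootstrap.servers": "localhost:9092",
    "acks": "1",
    "key.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
    "value.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
}

# ASSUMPTION: handler property names and values are recalled from the
# GoldenGate for Big Data docs, not taken from the slides; check before use.
handler_config = {
    "gg.handlerlist": "kafkahandler",
    "gg.handler.kafkahandler.type": "kafka",
    "gg.handler.kafkahandler.KafkaProducerConfigFile": "custom_kafka_producer.properties",
    "gg.handler.kafkahandler.format": "delimitedtext",  # message format
    "gg.handler.kafkahandler.mode": "op",  # operation mode; "tx" for transaction mode
}

producer_text = render_properties(producer_config)
handler_text = render_properties(handler_config)
print(handler_text)
```

A delimited text format is chosen here only because it lines up with a CSV-style consumer; other formats are a matter of configuration.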

Kafka and Oracle Data Integrator setup

Kafka and Oracle Data Integrator

• Create Model using Kafka Logical Schema
• Create Datastore
  • Similar to standard “File” datastore, define file format and set up columns
  • Only CSV is supported
  • Future formats may include JSON, Avro, etc.
• Add Datastore to mapping
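Defining a CSV datastore amounts to declaring column names and types for each delimited message. As an illustration only, the plain-Python sketch below parses one CSV message against a hypothetical column definition; the column list, function name, and sample message are assumptions and do not represent ODI's generated code.

```python
import csv
import io

# Hypothetical datastore definition: a column name and a converter per column.
ORDER_COLUMNS = [("order_id", int), ("customer", str), ("amount", float)]

def parse_message(message, columns=ORDER_COLUMNS, delimiter=","):
    """Turn one delimited message into a dict keyed by column name."""
    fields = next(csv.reader(io.StringIO(message), delimiter=delimiter))
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return {name: convert(value) for (name, convert), value in zip(columns, fields)}

row = parse_message('1001,"Smith, Jane",49.95')
print(row)  # {'order_id': 1001, 'customer': 'Smith, Jane', 'amount': 49.95}
```

Using the `csv` module rather than `str.split` keeps quoted fields that contain the delimiter intact.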

Spark Streaming and Oracle Data Integrator

• Create Spark Data Server, Physical / Logical Schema
  • Set Hadoop Data Server
  • Add properties, such as checkpointing, asynchronous execution mode, etc.
  • Additional properties can be added: http://spark.apache.org/docs/latest/configuration.html
• Spark Server is set up as Staging location
  • Source Datastore from Kafka, Oracle DB, etc.
  • Target Datastore is Cassandra, Oracle DB, etc.
• Code generated by KM is pySpark
  • pySpark code can be added to filters, joins, other components for transformations
  • Additional languages (Scala, Java) may be coming soon
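To give a feel for the kind of expression that goes into a filter or join component, here is a sketch in plain Python rather than pySpark. The helper functions, sample batch, and customer lookup are hypothetical and are not what the Knowledge Module generates. The join result uses the (key, (left, right)) shape that Spark's pair joins return.

```python
def stream_filter(records, predicate):
    """Keep only the records for which the predicate is true."""
    return [record for record in records if predicate(record)]

def stream_join(records, lookup):
    """Inner-join (key, value) pairs with a lookup dict on the key."""
    return [(key, (value, lookup[key])) for key, value in records if key in lookup]

# Hypothetical mini-batch of (customer_id, amount) pairs and a reference table
batch = [(1, 250.0), (2, 15.0), (3, 980.0), (4, 300.0)]
customers = {1: "Acme", 3: "Globex"}

# The lambda is the part a developer would write as the filter condition.
large_orders = stream_filter(batch, lambda record: record[1] >= 100.0)
enriched = stream_join(large_orders, customers)
print(enriched)  # [(1, (250.0, 'Acme')), (3, (980.0, 'Globex'))]
```

Customer 4 passes the filter but is dropped by the inner join because it has no match in the reference table.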


Enable the Streaming flag in the Physical design of a mapping.

To generate Spark code, set the Execute On Hint option to use the Spark data server as the staging location for your mapping.

Target IKM should not be set. Spark generated code will handle integration and load into target.

Tracking the process

When executing, the process will run continuously in the ODI Operator.

If the connection between the ODI Agent and Spark Agent is lost, it will reestablish itself after recovery.

Recap

• Streaming is the “velocity” in data. AKA “Fast Data”
• Oracle Data Integrator and Oracle GoldenGate provide a framework for development and management of data streaming processes
  • Big Data add-ons continue to support new technologies
• Build a streaming architecture using GoldenGate and ODI:
  • Metadata management
  • Integration of RDBMS data with “schema on read” data
  • Build upon the skills in-house


we liberate enterprise data

thank you!
