best practices for stream processing with gridgain and ... · best practices for stream processing...

59
Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services Rob Meyer Outbound Product Management

Upload: others

Post on 26-Apr-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

BestPracticesforStreamProcessingwithGridGainand

ApacheIgniteandKafka

AlexeyKukushkinProfessionalServices

RobMeyerOutboundProductManagement

Page 2: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

Agenda

• WhyweneedKafka/Confluent-Ignite/GridGainintegration• Ignite/GridGainKafka/ConfluentConnectors• Deployment,monitoringandmanagement• IntegrationExamples• Performanceandscalabilitytuning• Q&A

Page 3: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

WhyweneedKafka/Confluent-Ignite/GridGainintegration

Page 4: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

ApacheKafkaandConfluent

Adistributedstreamingplatform:• Publish/subscribe• Scalable• Fault-tolerant• Real-time• Persistent• WrittenmostlyinScala

Page 5: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainCompanyConfidential

GridGain In-MemoryComputingPlatform

GridGain In-Memory Computing Platform

New Applications, AnalyticsExisting Applications Streaming, Machine Learning

RDBMS NoSQL Hadoop

In-MemoryData Grid

In-Memory Database

Streaming Analytics

• BuiltonApacheIgnite• Comprehensiveplatformthatsupportsallprojects

• Noripandreplace• In-memoryspeed,petabytescale• EnablesHTAP,streaminganalyticsandcontinuouslearning

• WhatGridGain adds• Production-readyreleases• Enterprise-gradesecurity,deploymentandmanagement

• Globalsupportandservices• Provenformissioncriticalapps

ContinuousLearning

Page 6: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

ComputeTransactions Continuous Learning(Machine, Deep Learning)SQLStreaming Services

Spark (DataFrame, RDD, HDFS)ODBC/JDBC

Memory-Centric Storage

StreamingAnalytics,MachineandDeepLearning

RDBMS NoSQL Hadoop

Native Persistence 3rd party Persistence

NoSQLSQL

Messaging Java, .NET, … R1, Python1

1.RandPythondeveloperscurrentlyinvokeJavaclasses.DirectRandPythonsupportplanned.

KafkaCamelSparkStormJMSMQTT…

KafkaCamelSparkStormJMSMQTT…

Decision AutomationStream Ingestion Machine, Deep Learning AnalyticsStream Processing

Page 7: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

Ignite/GridGain&KafkaIntegration

• Kafkaiscommonlyusedasamessagingbackboneinaheterogeneoussystem• AddIgnite/GridGaintoaKafka-basedsystem

Page 8: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

https://www.imcsummit.org/2018/eu/session/embracing-service-consumption-shift-banking

Page 9: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

IgniteandGridGainKafkaConnectors

Page 10: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

DevelopingKafkaConsumers&Producers

YoucandevelopKafkaintegrationforanysystemusingKafkaProducerandConsumerAPIsbutyouneedtosolveproblemslike:• HowtouseeachAPIofeveryproducerandconsumer• HowKafkawillunderstandyourdata• Howdatawillbeconvertedbetweenproducersandconsumers• Howtoscaletheproducer-to-consumerflow• Howtorecoverfromafailure• …andmanymore

Page 11: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGain-KafkaConnector:Out-of-the-boxIntegration

• Addressesalltheintegrationchallengesusingbestpractices• Doesnotneedanycodingeveninthemostcomplexintegrations• DevelopedbyGridGain/IgniteCommunitywithhelpfromConfluenttoensurebothIgniteandKafkabestpractices• BasedonKafkaConnectandIgniteAPIs• KafkaConnectAPIencouragesdesignforscalability,failoveranddataschema• GridGainSourceConnectorusesIgniteContinuousQueries• GridGainSinkConnectorusesIgniteDataStreamer

Page 12: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

KafkaSourceandSinkConnectors

Page 13: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

KafkaConnectServerTypes

Ingeneral,thereare4separateclustersinKafkaConnectinfrastructure:• Kafkacluster• clusternodescalledBrokers

• KafkaConnectcluster• clusternodescalledWorkers

• SourceandSinkGridGain/Igniteclusters• ServerNodes

Page 14: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainConnectorFeatures

Twoconnectorsindependentfromeachother:• GridGainSourceConnector• streamsdatafromGridGainintoKafka• usesIgnitecontinuousqueries

• GridGainSinkConnector• streamsdatafromKafkaintoGridGain• usesIgnitedatastreamers

Page 15: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainSourceConnector:Scalability

Scalesbyassigningmultiplesourcepartitions toKafkaConnecttasks.ForGridGainSourceConnector:• Partition=Cache• Record=CacheEntry

KafkaSourceConnectorModel:

Page 16: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainSourceConnector:RebalancingandFailoverRebalancing:re-assignmentofKafkaConnectorsandTasksto Workerswhen

• AWorkerjoinsorleavesthecluster• Acacheisaddedorremoved

Failover:resumingoperationafterafailure• howtoresumeafterfailureorrebalancingwithoutlosingcacheupdatesoccurredwhenKafkaWorkernodewasdown?

SourceOffset:positioninthesourcestream.KafkaConnect:• providespersistentanddistributedsourceoffsetstorage• automaticallysaveslastcommittedoffset• allowsresumingfromthelastoffsetwithoutlosingdata.

Problem:cacheshavenooffsets!

Page 17: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainSourceConnector:FailoverPolicies

• None:nosourceoffsetsaved,startlisteningtocurrentdataafterrestart• Cons:updatesoccurredduringdowntimearelost(“atleastonce”datadeliveryguaranteeviolated)• Pros:fastest

• FullSnapshot:nosourceoffsetsaved,alwayspullalldatafromthecacheuponstartup• Cons:

• Slow,notapplicableforbigcaches• Duplicatedata(”exactlyonce”datadeliveryguaranteeisviolated)

• Pros:nodataislost

Page 18: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainSourceConnector:FailoverPolicies• Backlog:resumefromthelastcommittedsourceoffset• KafkaBacklogcacheinIgnite

• Key:incrementaloffset• Value:cachenameandserializedcacheentries

• KafkaBacklogserviceinIgnite• RunscontinuousqueriespullingdatafromsourcecachesintoBacklog

• SourceConnectorgetsdatafromBacklogfrombacklogstartingfromthelastcommittedoffset

• Cons• Intrusive:GridGainclusterimpact• Complexconfiguration:needtoestimateamountofmemoryforBacklog

Page 19: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainSourceConnector:DynamicReconfigurationConnectormonitorslistofavailablecachesandre-configuresitselfifacacheisaddedorremoved.UsecacheWhitelist and cacheBlacklist propertiestodefinefromwhichcachestopulldata.

Page 20: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainSourceConnector:InitialDataLoad

UseshallLoadInitialData configurationpropertytospecifyifyouwanttheConnectortoloadthedatathatisalreadyinthecachebythetimetheConnectorstarts.

Page 21: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainSinkConnector

• SinkConnectorsareinherentlyscalablesinceconsumingdatafromaKafkatopicisscalable• SinkConnectorsinherentlysupportfailoverthankstotheKafkaConnectorframeworkauto-committingoffsetsofthepusheddata.

Page 22: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainConnectorDataSchema

BothSourceandSinkGridGainConnectorssupportdataschema.• AllowsGridGainConnectorsunderstanddatawithattachedschemafromotherKafkaproducersandconsumers• SourceConnectorattachesKafkaschemabuiltfromIgniteBinaryobjects• SinkConnectorconvertsKafkarecordstoIgniteBinaryobjectsusingKafkaschema

Limitations:• IgniteAnnotationsarenotsupported• IgniteCHARconvertedtoKafkaSHORT(sameforarrays)• IgniteUUIDandCLASSconvertedtoKafkaSTRING(sameforarrays)

Page 23: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

IgniteConnectorFeatures

• IgniteSourceConnector• pushesdatafromIgniteintoKafka• usesIgniteEvents• mustenableEVT_CACHE_OBJECT_PUT,whichnegativelyimpactsclusterperformance

• IgniteSinkConnector• pullsdatafromKafkaintoIgnite• useIgnitedatastreamer

Page 24: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

ApacheIgnitevs.GridGainConnectors

Feature ApacheIgniteConnector GridGainConnector

Scalability LimitedSourceconnectorisnotparallelSinkconnectorisparallel

SourceconnectorcreatesataskpercacheSinkconnectorisparallel

Failover NOSourcedataislostduringconnectorrestartorrebalancing

YESSourceconnectorcanbeconfiguredtoresumefromthelastcommittedoffset

Preservingsourcedataschema

NO YES

Handlingmultiplecaches

NO YESConnectorcanbeconfiguredtohandleanynumberofcaches

DynamicReconfiguration

NO YESSourceconnectordetectsaddedorremovedcachesandre-configuresitself

Page 25: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

ApacheIgnitevs.GridGainConnectors

Feature ApacheIgniteConnector GridGainConnector

InitialDataLoad NO YES

Handlingdataremovals

YES YES

SerializationandDeserializationofdata

YES YES

Filtering LimitedOnlysourceconnectorsupportsafilter

YESBothsourceandsinkconnectorssupportfilters

Transformations KafkaSMTs KafkaSMTs

Page 26: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

ApacheIgnitevs.GridGainConnectors

Feature ApacheIgniteConnector GridGainConnector

DevOps Somefree-texterrorlogging HealthModeldefined

Support ApacheIgniteCommunity SupportedbyGridGain,certifiedbyConfluent

Packaging UberJAR ConnectorPackage

Deployment PluginPATHonallKafkaConnectworkers

PluginPATHonallKafkaConnectworkers.CLASSPATHonallGridGainnodes.

KafkaAPIVersion 0.10 2.0

SourceAPI Igniteevents Ignitecontinuousqueries

SinkAPI Ignitedatastreamer Ignitedatastreamer

Page 27: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

Deployment,monitoringandmanagement

Page 28: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

GridGainConnectorDeployment

1. PrepareConnectorPackage2. RegisterConnectorwithKafka3. RegisterConnectorwithGridGain

Page 29: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

PrepareGridGainConnectorPackage

1. GridGain-KafkaConnectorispartofGridGainEnterpriseandUltimate8.5.3(tobereleasedintheendofOctober2018)

2. Theconnectorisin$GRIDGAIN_HOME/integration/gridgain-kafka-connect

• (GRIDGAIN_HOMEenvironmentvariablepointstotherootGridGaininstallationdirectory)

3. Pullmissingconnectordependenciesintothepackage:cd $GRIDGAIN_HOME/integration/gridgain-kafka-connect./copy-dependencies.sh

Page 30: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

RegisterGridGainConnectorwithKafka

ForeveryKafkaConnectWorker:1. CopyGridGainConnectorpackagedirectorytowhereyouwant

KafkaConnectorstobelocatedforexample,into/opt/kafka/connect directory

2. EditKafkaConnectWorkerconfiguration(kafka-connect-standalone.properties orkafka-connect-distributed.properties)toregistertheconnectoronthepluginpath:plugin.path=/opt/kafka/connect/gridgain-kafka-connect

Page 31: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

RegisterGridGainConnectorwithGridGain

ThisassumesGridGainversionis8.5.3OneveryGridGainservernodecopythebelowJARsinto$GRIDGAIN_HOME/libs/user directory.GettheKafkaJARsfromtheKafkaConnectworkers:• gridgain-kafka-connect-8.5.3.jar• connect-api-2.0.0.jar• kafka-clients-2.0.0.jar

Page 32: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

IgniteConnectorDeployment

1. PrepareConnectorPackage2. RegisterConnectorwithKafka

Page 33: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

PrepareIgniteConnectorPackage

ThisassumesIgniteversionis2.6.Createadirecotory containingthebelowJARs(findJARsinthe$IGNITE_HOME/libs sub-directories):

• ignite-kafka-connect-0.10.0.1.jar• ignite-core-2.6.0.jar• ignite-spring-2.6.0.jar• cache-api-1.0.0.jar• spring-aop-4.3.16.RELEASE.jar• spring-beans-4.3.16.RELEASE.jar• spring-context-4.3.16.RELEASE.jar• spring-core-4.3.16.RELEASE.jar• spring-expression-4.3.16.RELEASE.jar• commons-logging-1.1.1.jar

Page 34: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

RegisterGridGainConnectorwithKafka

ForeveryKafkaConnectWorker:1. CopyIgniteConnectorpackagedirectorytowhereyouwantKafka

Connectorstobelocatedforexample,into/opt/kafka/connect directory

2. EditKafkaConnectWorkerconfiguration(kafka-connect-standalone.properties orkafka-connect-distributed.properties)toregistertheconnectoronthepluginpath:plugin.path=/opt/kafka/connect/ignite-kafka-connect

Page 35: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

Monitoring:GridGainConnector

WelldefinedHealthModel:• NumericEventIDuniquelyidentifiesspecificproblem• Eventseverity• Problemdescriptionandrecoveryactionisavailableathttps://docs.gridgain.com/docs/certified-kafka-connector-monitoring

ConfigureyourmonitoringsystemtodetecteventIDinthelogsandmayberunautomatedrecoveryasdefinedintheHealthModel

• Samplestructuredlogentry(#usedasadelimiter):09-10-2018 19:57:35 # ERROR # 15000 # Spring XML configuration path is invalid: /invalid/path/ignite.xml

Page 36: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

Monitoring:IgniteConnector

NoHealthModelisdefined.1. Runnegativetests2. CheckKafkaandIgnitelogsoutput3. Configureyourmonitoringsystemtodetectcorrespondingtext

patternsinthelogs

Page 37: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

IntegrationExamplesPropagatingRDBMSupdatesintoGridGain

Page 38: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

PropagatingRDBMSupdatesintoGridGain

Ignite/GridGainhasa3rd PartyPersistencefeature(CacheStore)thatallows:• PropagatingcachechangestoexternalstoragelikeRDBMS• AutomaticallycopyingdatafromexternalstoragetoIgniteuponaccessingdatamissedinIgnite

WhatifyouwanttopropagateexternalstoragechangetoIgniteatthemomentofthechange?- 3rd PartyPersistencecannotdothat!

Page 39: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

PropagatingRDBMSupdatesintoGridGain

UseKafkatoachievethatwithoutwritingsinglelineofcode!

Page 40: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

Assumptions

• Forsimplicitywewillruneverythingonthesamehost• IndistributedmodeGridGainnodes,KafkaConnectworkersandKafkabrokersarerunningondifferenthosts

• GridGain8.5.3clusterwithGRIDGAIN_HOMEvariablesetonthenodes• Kafka2.0clusterwithKAFKA_HOMEvariablesetonallbrokers

Page 41: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

1.RunDBServer

WewilluseH2Databaseinthisdemo.Wewilluse/tmp/gridgain-h2-connect asaworkdirectory.• DownloadH2andsetH2_HOMEenvironmentvariable.• RunH2Server:java -cp $H2_HOME/bin/h2*.jar org.h2.tools.Server -webPort 18082 -tcpPort 19092

TCP server running at tcp://172.25.4.74:19092 (only local connections)

PG server running at pg://172.25.4.74:5435 (only local connections)

Web Console server running at http://172.25.4.74:18082 (only local connections)

• IntheopenedH2WebConsolespecifyJDBCURL:jdbc:h2:/tmp/gridgain-h2-connect/marketdata

• PressConnect

Page 42: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

2.CreateDBTablesandAddSomeData

InH2WebConsoleExecute:• CREATE TABLE IF NOT EXISTS QUOTES (id int, date_time timestamp, price double, PRIMARY KEY (id));

• CREATE TABLE IF NOT EXISTS TRADES (id int, symbol varchar, PRIMARY KEY (id));

• INSERT INTO TRADES (id, symbol) VALUES (1, 'IBM');

• INSERT INTO QUOTES (id, date_time, price) VALUES (1, CURRENT_TIMESTAMP(), 1.0);

Page 43: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

3.StartGridGainCluster(Single-node)

$GRIDGAIN_HOME/bin/ignite.sh /tmp/gridgain-h2-connect/ignite-server.xml[15:41:15] Ignite node started OK (id=b9963f9a)

[15:41:15] Topology snapshot [ver=1, servers=1, clients=0, CPUs=8, offheap=3.2GB, heap=1.0GB]

[15:41:15] ^-- Node [id=B9963F9A-8F1E-4177-9743-F129414EB133, clusterState=ACTIVE]

<beanid="ignite.cfg"class="org.apache.ignite.configuration.IgniteConfiguration"><propertyname="discoverySpi"><beanclass="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi"><propertyname="ipFinder"><beanclass="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder"><propertyname="addresses"><list><value>127.0.0.1:47500</value>

</list></property>

</bean></property>

</bean></property>

</bean>

Page 44: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

4.DeploySourceandSinkConnectors

• DownloadConfluentJDBCConnectorpackagefromhttps://www.confluent.io/connector/kafka-connect-jdbc/• UnzipConfluentJDBCConnectorpackageinto/tmp/gridgain-h2-connect/confluentinc-kafka-connect-jdbc

• CopyGridGainConnectorpackagefrom$GRIDGAIN_HOME/integration/gridgain-kafka-connect into/tmp/gridgain-h2-connect/gridgain-kafka-connect• Copykafka-connect-standalone.properties Kafkaworkerconfigurationfilefrom$KAFKA_HOME/config into/tmp/gridgain-h2-connect andsetthepluginpathproperty:

plugin.path=/tmp/gridgain-h2-connect/confluentinc-kafka-connect-jdbc-5.1.0-SNAPSHOT,/tmp/gridgain-h2-connect/gridgain-gridgain-kafka-connect-8.7.0-SNAPSHOT

Page 45: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

5.StartKafkaCluster(Single-broker)• ConfigureZookeeperwith/tmp/gridgain-h2-connect/zookeeper.properties:dataDir=/tmp/gridgain-h2-connect/zookeeperclientPort=2181

• StartZookeeper:$KAFKA_HOME/bin/zookeeper-server-start.sh /tmp/gridgain-h2-connect/zookeeper.properties

• ConfigureKafkabroker:copydefault$KAFKA_HOME/config/server.properties to/tmp/gridgain-h2-connect/kafka-server.properties customizeit:

broker.id=0listeners=PLAINTEXT://:9092log.dirs=/tmp/gridgain-h2-connect/kafka-logszookeeper.connect=localhost:2181

• StartKafkabroker:$KAFKA_HOME/bin/kafka-server-start.sh /tmp/gridgain-h2-connect/kafka-server.properties[2018-10-10 16:11:21,573] INFO Kafka version : 2.0.0 (org.apache.kafka.common.utils.AppInfoParser)

[2018-10-10 16:11:21,573] INFO Kafka commitId : 3402a8361b734732 (org.apache.kafka.common.utils.AppInfoParser)

[2018-10-10 16:11:21,574] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)

Page 46: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

6.ConfigureSourceJDBCConnector

/tmp/gridgain-h2-connect/kafka-connect-h2-source.properties:

name=h2-marketdata-sourceconnector.class=io.confluent.connect.jdbc.JdbcSourceConnectortasks.max=10

connection.url=jdbc:h2:tcp://localhost:19092//tmp/gridgain-h2-connect/marketdatatable.whitelist=quotes,trades

mode=timestamp+incrementingtimestamp.column.name=date_timeincrementing.column.name=id

topic.prefix=h2-

Page 47: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

7.ConfigureSinkGridGainConnector

/tmp/gridgain-h2-connect/kafka-connect-gridgain-sink.properties:

name=gridgain-marketdata-sinktopics=h2-QUOTES,h2-TRADEStasks.max=10connector.class=org.gridgain.kafka.sink.IgniteSinkConnector

igniteCfg=/tmp/gridgain-h2-connect/ignite-client-sink.xmltopicPrefix=h2-

Page 48: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

8.StartKafka-ConnectCluster(Single-worker)

$KAFKA_HOME/bin/connect-standalone.sh \

/tmp/gridgain-h2-connect/kafka-connect-standalone.properties \

/tmp/gridgain-h2-connect/kafka-connect-h2-source.properties \

/tmp/gridgain-h2-connect/kafka-connect-gridgain-sink.properties

[2018-10-10 16:52:21,618] INFO Created connector h2-marketdata-source (org.apache.kafka.connect.cli.ConnectStandalone:104)

[2018-10-10 16:52:22,254] INFO Created connector gridgain-marketdata-sink (org.apache.kafka.connect.cli.ConnectStandalone:104)

Page 49: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

9.SeeCachesCreatedinGridGain

OpenGridGainWebConsoleMonitoringDashboardathttps://console.gridgain.com/monitoring/dashboard andseeGridGainSinkConnectorcreatedQUOTESandTRADEScaches:

Page 50: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

10.SeeInitialH2DatainGridGain

OpenGridGainWebConsoleQueriespageandrunScanqueriesforQUOTESandTRADES:

Page 51: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

11.UpdateH2Tables

InH2WebConsoleExecute:• INSERT INTO TRADES (id, symbol) VALUES (2, ‘INTL');

• INSERT INTO QUOTES (id, date_time, price) VALUES (2, CURRENT_TIMESTAMP(), 2.0);

Page 52: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

12.SeeRealtimeH2DatainGridGain

OpenGridGainWebConsoleQueriespageandrunScanqueriesforQUOTESandTRADES:

Page 53: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

Performanceandscalabilitytuning

Page 54: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

DisableProcessingofUpdates

Forperformancereasons,SinkConnectordoesnotsupportexistingcacheentryupdatebydefault.Set shallProcessUpdates configurationsettingto true tomaketheSinkConnectorupdateexistingentries.

Page 55: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

DisableDynamicSchema

Sourceconnectorcacheskeyandvaluesschemas.• Theschemasarecreatedasthefirstcacheentryispulledandre-usedforallsubsequententries.

Thisworksonlyiftheschemasneverchange.• Set isSchemaDynamic to true tosupportschemachanges.

Page 56: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

ConsiderDisablingSchema

SourceConnectordoesnotgenerateschemasif isSchemaless configurationsettingis true.Disablingschemassignificantlyimprovesperformance.

Page 57: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

CarefullyChooseFailoverPolicy

• Canallowlosingdata?UseNone.• Cachesaresmall(e.g.referencedatacaches)?UseFullSnapshot.• OtherwiseuseBacklog.

Page 58: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

PlanKafkaConnectBacklogCapacity

OnlyBacklog failoverpolicysupportsboth“atleastonce”and“exactlyonce”deliveryguarantee.GridGainSourceConnectorcreatesBackloginthe“kafka-connect”memoryregion,whichrequirescapacityplanningtoavoidlosingdatabyeviction(unlesspersistenceisenabled).Considertheworstcasescenario:• MaximumKafkaConnectorworkerdowntimeallowedinyoursystem• Peaktraffic

Multiplepeaktrafficbymaxdowntimetoestimate“kafka-connect”dataregionsize.

Page 59: Best Practices for Stream Processing with GridGain and ... · Best Practices for Stream Processing with GridGain and Apache Ignite and Kafka Alexey Kukushkin Professional Services

Q&AThankyou!