Transcript
Page 1: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

IntegratingData-ParallelAnalyticsintoStream-ProcessingUsinganIn-MemoryDataGrid

DR.WILLIAML.BAIN

SCALEOUT SOFTWARE,INC.

Page 2: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

• Dr.WilliamBain,Founder&CEOofScaleOutSoftware:• Email:[email protected]

• Ph.D.inElectricalEngineering(RiceUniversity,1978)

• Careerfocusedonparallelcomputing– BellLabs,Intel,Microsoft

• 3priorstart-ups,lastacquiredbyMicrosoftandproductnowshipsasNetworkLoadBalancinginWindowsServer

• ScaleOutSoftwaredevelopsandmarketsIn-MemoryDataGrids,softwarefor:• Scalingapplicationperformancewithin-memorydatastorage

• Analyzinglivedatainrealtimewithin-memorycomputing

• Thirteen+yearsinthemarket;450+customers,12,000+servers

AbouttheSpeaker

2In-MemoryComputingSummitEurope2018

Page 3: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

Agenda

HowIn-MemoryComputingCreatestheNextGenerationinStream-Processing• Goalsandchallengesforstream-processing• Addingcontext:statefulstream-processing• Overviewofin-memorydatagrids(IMDGs)• Digitaltwinmodelforstream-processing• WhyuseanIMDG:integratedeventprocessinganddata-parallelanalysis• Exampleusecases• Detailedcodesample:runnerswithsmartwatches• Performancebenefits

©ScaleOutSoftware,Inc.CompanyConfidential

3In-MemoryComputingSummit2017

Page 4: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

GoalsforStream-Processing

• Goals:• Processincomingdatastreamsfrommany(1000s)ofsources.• Analyzeeventsforpatternsofinterest.• Providetimely(real-time)feedbackandalerts.• Providedata-parallelanalyticsforaggregatestatisticsandfeedback.

• Manyapplications:• InternetofThings(IoT)• Medicalmonitoring• Logistics• Financialtradingsystems• Ecommercerecommendations

4

EventSources

In-MemoryComputingSummitEurope2018

Page 5: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

Example:EcommerceRecommendations

1000sofonlineshoppers:• Eachshoppergeneratesaclickstreamofproductssearched.• Stream-processingsystemmust:

• Correlateclicksforeachshopper.• Maintainahistoryofclicksduringashoppingsession.

• Analyzeclickstocreatenewrecommendationswithin100msec.

• Analysismust:• Takeintoaccounttheshopper’spreferencesanddemographics.

• Useaggregatefeedbackoncollaborativeshoppingbehavior.

5In-MemoryComputingSummitEurope2018

Page 6: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

ProvidingRecommendationsinRealTime

• Requiresscalablestream-processingtoanalyzeeachclickandrespondin<100ms:• Acceptinputwitheacheventonshopper’spreferences.• Provideaggregatefeedbackonbest-sellingproducts.

6In-MemoryComputingSummitEurope2018

Page 7: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

ProvidingAggregateMetrics

• Mustaggregatestatisticsforallshoppers:• Trackreal-timeshoppingbehavior.• Chartkeypurchasingtrends.• Enablemerchandizertocreatepromotionsdynamically.

• Aggregatestatisticscanbesharedwithshoppers:• Allowsshopperstoobtaincollaborativefeedback.

• Examplesincludemostviewedandbestsellingproducts.

7In-MemoryComputingSummitEurope2018

Page 8: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

ChallengesforStream-ProcessingArchitectures• Basicstream-processingarchitecture:

• Challenges:• Howefficientlycorrelateeventsfromeachdatasource?• Howcombineeventswithrelevantstateinformationtocreatethenecessarycontextforanalysis?• Howembedapplication-specificanalysisalgorithmsinthepipeline?• Howgeneratefeedback/alertswithlowlatency?• Howperformdata-parallelanalyticstodetermineaggregatetrends?

8In-MemoryComputingSummitEurope2018

Page 9: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

AddingContexttoStream-Processing

• Statefulstream-processingplatformsadd“unmanaged”datastoragetothepipeline:• Pipelinestagesperformtransformationsinasequenceofstagesfromdatasourcestosinks.• Datastorage(distributedcache,database)isaccessedfromthepipelinebyapplicationcodeinanunspecifiedmanner.

• Examples:Apama (CEP),ApacheFlink,Storm

• Problems:• Thereisnosoftwarearchitectureformanagingstateinformation.

• Thisaddscomplexitytotheapplication.

• Createsanetworkbottleneck.• Doesnotaddressneedfordata-parallelanalytics.

9In-MemoryComputingSummitEurope2018

Page 10: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

LambdaArchitecture:BatchParallelAnalytics

• Lambdaarchitectureseparatesstream-processing(“speedlayer”)fromdata-parallelanalytics(“batchlayer”).• Createsqueryable state,but:

• Doesnotenhancecontextforstatefulstreamprocessing.

• Doesnotperformdata-parallelanalyticsonlineforimmediatefeedback.

• Doesnotleadtoa“HybridTransactionalandAnalyticsProcessing”(HTAP)architecture.

10

https://commons.wikimedia.org/w/index.php?curid=34963987

Howcombinestream-processingwithstatetosimplifydesign,maximizeperformance,andenablefastdata-parallelanalytics?

In-MemoryComputingSummitEurope2018

Page 11: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

In-MemoryDataGrid(IMDG)

IMDGprovidesapowerfulplatformforstatefulstream-processing.WhatisanIMDG?• IMDGstoreslive,object-orienteddata:

• Usesakey/valuestoragemodelforlargeobjectcollections.

• Mapsobjectstoaclusterofcommodityserverswithlocationtransparency.

• Haspredictablyfast(<1msec.)dataaccessandupdates.• Designedfortransparent scalingandhighavailability

• IMDGintegratesin-memorycomputingwithdatastorage:• Usesobject-orientedexecutionmodel.• Leveragesthecluster’scomputingpower.• Computeswherethedatalivestoavoidnetworkbottlenecks.

Logicalview

Physicalview

11

IMDGStorageModel

In-MemoryComputingSummitEurope2018

Page 12: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

HowanIMDGCanIntegrateComputation• Eachgridhostrunsaworkerprocesswhichexecutesapplication-definedmethodsonstoredobjects.• Thesetofworkerprocessesiscalledaninvocationgrid(IG).

• IGusuallyrunslanguage-specificruntimes(JVM,.NET).

• IMDGcanshipcodetotheIGworkers.

• KeyadvantagesforIGs:• Followsobject-orientedmodel.• Avoidsnetworkbottlenecksbymovingcomputingtothedata.

• LeveragesIMDG’scores&servers.

InvocationGrid

12

In-MemoryDataGrid

In-MemoryComputingSummitEurope2018

Page 13: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

IMDGRunsEventHandlersforStream-Processing

Eventhandlersrunindependentlyforeachincomingevent:• IMDGdirectseventtoaspecificobjectusingReactiveX forlowlatency.• IMDGexecutesmultipleeventhandlersinparallelforhighthroughput.

13

Object

In-MemoryComputingSummitEurope2018

Page 14: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

IMDGExecutesData-ParallelComputationsMethodexecutionimplementsabatchjobonanobjectcollection:• Clientrunsasinglemethodonallobjectsinacollection.• Executionrunsinparallelacrossthegrid.• Resultsaremergedandreturnedtotheclient.

14

Object

In-MemoryComputingSummitEurope2018

Page 15: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

ABasicData-ParallelExecutionModel

Afundamentalmodelfromparallelsupercomputing:• Runonemethod(“eval”)inparallelacrossmanydataobjects.• Optionallymerge theresults.• Binarycombiningisaspecialcase,but…

• ItrunsinlogN timetoenablescalablespeedup

15In-MemoryComputingSummitEurope2018

Page 16: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

MapReduceBuildsonThisModel

• Implements“group-by”computations.• Example:“DetermineaverageRPMforallwindmillsbyregion(NE,NW,SE,SW).”• Runsintwodata-parallelphases(map,reduce):• Map phaserepartitionsandoptionallycombinessourcedata.

• Reduce phaseanalyzeseachdatapartitioninparallel.

• Returnsresultsforeachpartition.

partitions

16In-MemoryComputingSummitEurope2018

Page 17: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

DistributedForEach:AnotherData-ParallelModel

17In-MemoryComputingSummitEurope2018

Page 18: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

ReducedGCTimewithDistributedForEach

PMI DistributedForEach

18In-MemoryComputingSummitEurope2018

Page 19: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

Stream-ProcessingwiththeDigitalTwinModel

• CreatedbyMichaelGrieves;popularizedbyGartner• RepresentseachdatasourcewithanIMDGobjectthatholds:• Aneventcollection• Stateinformationaboutthedatasource• Logicforanalyzingevents,updatingstate,andgeneratingalerts

• Benefits:• Offersastructuredapproachtostatefulstream-processing.• Automaticallycorrelatesincomingeventsbydatasource.• Integratesallrelevantcontext(events&state).• Enableseasydeploymentofapplication-specificlogic(e.g.,ML,rulesengine,etc.)foranalysisandalerting.

• Providesdomainforaggregateanalysisandfeedback.

19In-MemoryComputingSummitEurope2018

Data-parallelanalysis

Page 20: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

SomeApplicationsforDigitalTwins

Adigitaltwincorrelatesincomingeventswithcontextusingdomain-specificalgorithmstogeneratealerts:

20

Application Context Events Logic Alerts

IoT devices Devicestatus&history Devicetelemetry Analyzetopredictmaintenance.

Maintenancerequests

Medicalmonitoring

Patienthistory&medications

Heart-rate,blood-pressure,etc.

Evaluatemeasurementsovertimewindowswithrulesengine.

Alertstopatient&physician

CableTV Viewerpreferences&history,set-topboxstatus

Channelchangeevents,telemetry

Cleanse&mapchanneleventsforreco.engine;predictboxfailure.

Viewerrecom-mendations,repairalerts

Ecommerce Shopperpreferences&buyinghistory

Clickstreameventsfromwebsite

UseMLtomakeproductrecommendations.

Productlistforwebsite

Frauddetection

Customerstatus&history

Transactions Analyzepatternstoidentifyprobablefraud.

Alertstocustomer&bank

In-MemoryComputingSummitEurope2018

Page 21: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

WhyUseanIMDGtoHostDigitalTwins?

IMDGprovidesanexcellentDTplaftorm:• Scalable,object-orienteddatastorage:

• Offersanaturalmodelforhostingdigitaltwins.• Cleanlyseparatesdomainlogicfromdata-parallelorchestration.

• Integrated,In-memorycomputing:• Automaticallycorrelatesincomingeventsforanalysis.

• Enablesbothstreamanddata-parallelprocessing.

• Highperformance:• Avoidsdatamotionandassociatednetworkbottlenecks.

• Fastandscalestohandlelargeworkloads.• Integratedhighavailability:

• Usesdatareplicationdesignedforlivesystems.• Canensurethatcomputationishighav.

21In-MemoryComputingSummitEurope2018

Page 22: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

ScalingEventIngestionwithKafka

22

• IMDGpartitionsdigitaltwinobjectsacrossservers.• Kafkaofferspartitionstoscaleouthandlingofeventmessages.• Partitionsaredistributedacrossbrokers.• Brokersprocessmessagesinparallel.

• IMDGcanmapKafkapartitionstogridpartitions:• IMDGspecifiesevent-mappingalgorithmtoKafka.

• IMDGlistenstoappropriateKafkapartitions.

• Thisminimizeseventhandlinglatency.• Avoidsstore-and-forwardwithinIMDG.

In-MemoryComputingSummitSiliconValley2017

Page 23: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

IntegratingEventandData-ParallelProcessingTheIMDG:• Postsincomingeventstoitsrespectivedigitaltwinobject.• Runsthetwin’seventhandlermethodwithlowlatency.• Eventhandlermanagestheeventcollectionandcanusetimewindowsforanalysis.

• Eventhandlerusesandupdatesin-memorystate.• Eventhandlercanuse/updateoff-linestate.• Eventhandleroptionallygeneratesalertsandfeedbacktoitsdigitaltwin.

• Runsdata-parallelmethodstoanalyzealldigitaltwinsinreal-time.• Resultscanbeusedforbothalertingandfeedback.

23

OfflineState Data-ParallelAnalysis

In-MemoryComputingSummitEurope2018

Page 24: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

Example:EcommerceShoppingSite

Trackswebshoppersandprovidesreal-timerecommendations:• EachDTobjectholdsclickstreamofbrowsedproducts,preferences,anddemographics.• Eventhandleranalyzesthisdataandupdatesrecommendations.• Periodicdata-parallel,batchanalyticsacrossallshoppersdetermineaggregatetrends:• Examplesincludebestsellingproducts,averagebasketsize,etc.

• Usedforanalysisandreal-timefeedback

24In-MemoryComputingSummitEurope2018

Page 25: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

Example:TrackingaFleetofVehicles

• Goal:Tracktelemetryfromafleetofcarsortrucks.• Eventsindicatespeed,position,andotherparameters.

• Digitaltwinobjectstoresinformationaboutvehicle,driver,anddestination.

• Eventhandleralertsonexceptionalconditions(speeding,lostvehicle).

• Periodicdata-parallelanalyticsdeterminesaggregatefleetperformance:• Computesoverallfuelefficiency,driverperformance,vehicleavailability,etc.

• Canprovidefeedbacktodriverstooptimizeoperations.

25In-MemoryComputingSummitEurope2018

Page 26: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

UsingDigitalTwinsinaHierarchy

Trackscomplexsystemsashierarchyofdigitaltwinobjects:• Leafnodesreceivetelemetryfromphysicalendpoints.• Higherlevelnodesrepresentsubsystems:• Receivetelemetryfromlower-levelnodes.

• Supplytelemetrytohigher-levelnodesasalerts.

• Allowsuccessiverefinementofreal-timetelemetryintohigher-levelabstractions.

26

Example:HierarchyofDigitalTwinsforaWindmill

In-MemoryComputingSummitEurope2018

Page 27: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

DetailedExample:Heart-RateWatchMonitoring

Goal:Trackheart-rateforalargepopulationofrunners.• Heart-rateeventsflowfromsmartwatchestotheirrespectivedigitaltwinobjectsforanalysis.• Theanalysisuseswearer’shistory,activity,andaggregatestatisticstodeterminefeedbackandalerts.

27In-MemoryComputingSummitEurope2018

Page 28: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

DigitalTwinObject(Java)• Holdseventcollectionanduser’scontext(age,medicalhistory,currentstatus,etc.):

28

public class User implements Serializable {private int _id;private double _height;private double _bodyWeight;private Gender _gender;private int _age;private int _averageHr;private WorkoutProgress _status;private int _sessionAverageMax;private List<Medication> _medications;private List<Long> _heartIncidents;private List<HeartRate> _runningHeartRateTelemetry;private long _alertTime;private boolean _alerted;...}

In-MemoryComputingSummitEurope2018

Eventcollection

User’scontext

Page 29: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

Events&Alerts

• EventholdsperiodictelemetrysentfromwatchtoIMDG:

• Alertholdsdatatobesentbacktowearerand/ortomedicalpersonnel:

29

public class HeartRateEvent {private int _userId;private int _heartRate;private long _timestamp;private WorkoutType _workoutType;private WorkoutProgress _workoutProgress;private Event _event;...}

public class HeartRateAlert {private int _userId; private String _alertType;private String _params;...}

In-MemoryComputingSummitEurope2018

Page 30: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

SettingUpaReactiveX PipelineontheIMDG

• DefineaReactiveX observerthatrunsoneveryserverintheIMDG:

• CreateaninvocationgridthatInitializestheReactiveX observeratstartup:

30

public class HeartRateObserver implements Observer<Event>, Serializable {@Override public void onNext(Event event) {

HeartRateEvent hre = HeartRateEvent.fromBytes(event.getPayload());hre.setEvent(event);User.processRunningEvent(hre);} ...}

In-MemoryComputingSummitEurope2018

Pipeline pipeline = new Pipeline(“userCache”, “userGrid”);GridAction action = pipeline.createRemoteObserverAction("userObserver",

new HeartRateObserver());InvocationGrid grid = new InvocationGridBuilder(“userGrid”)

.addJar("./bin/appcode.jar")

.addStartupAction(action)

.load();

Callapplication

Initializeobserver

Page 31: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

EventHandlerandEventPosting

• PostinganeventtotheReactiveX observer:• Thekeydetermineswhichserverreceivestheeventforposting.

• HandlinganeventpostedtotheReactiveX observeronDTtwin’sserver:

31

private static void processRunningEvent(HeartRateEvent hre) {CachedObjectId id = hre.getId();User u = (User)cache.retrieve(id, false);...executeRunningWorkoutAnalytics(hre, u);...cache.update(id, u);}

In-MemoryComputingSummitEurope2018

RetrieveDTobject

Analyzeevent

UpdateDTobject

pipeline.postEvent(makeKey(UserId),"heartRateEvent", HeartRateEvent.toBytes(new HeartRateEvent(last, System.nanoTime(),WorkoutType.Running, WorkoutProgress)));

Page 32: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

EventAnalysis• Handlesaneventforanactiveuserdoingarunningworkout:

32

private static void executeRunningWorkoutAnalytics(HeartRateEvent hre, User u) {long start = twoWeeksAgo();long sessionTimeout = threeHours();SessionWindowCollection<HeartRate> swc = new

SessionWindowCollection<>(u.getRunningHeartRateTelemetry(),heartRate -> heartRate.getTimestamp(), start, sessionTimeout);

swc.add(new HeartRate(hre.getHeartRate(), hre.getTimestamp()));

int total = 0; int windowCount = 0;for(TimeWindow<HeartRate> window : swc) {

int avg = 0;for(HeartRate hr : window) {avg += hr.getHeartRate();}total += (avg/window.size());windowCount++;}

u.setAverageHr(total/windowCount);u.analyzeAndCheckForAlert(hre);}

In-MemoryComputingSummitEurope2018

Analyzeeventhistory

Analyzeuser’scontext

Createtimewindows

Addevent

Page 33: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

AnalysisTechniquesEnabledbyDigitalTwin

Enabledetailedheart-ratemonitoringforahighintensityexerciseprogram:• Exampleofdatatobetracked:

• Exercisespecifics:typeofexercise,exercise-specificparameters(distance,strides,altitudechange,etc.)

• Participantbackground/history:age,height,weighthistory,heart-relatedmedicalconditionsandmedications,injuries,previousmedicalevents

• Exercisetracking:sessionhistory,average#sessionsperweek,averageandpeakheartrates,frequencyofexercisetypes

• Aggregate statistics:average/max/minexercisetrackingstatisticsforallparticipants

• Exampleoflogictobeperformed:• Notifyparticipantifsessionhistoryacrosstimewindowsindicatesneedtochangemix.• Notifyparticipantifheartratetrendsdeviatesignificantlyfromaggregatestatistics.• Alertparticipant/medicalpersonnelifheartrateanalysisacrosstimewindowsindicatesanimminentthreattohealth.

• Report aggregatestatisticstoanalystsand/orusers.

33In-MemoryComputingSummitEurope2018

Page 34: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

DataParallelAnalysisAcrossallDigitalTwins

• UsesIMDG’sin-memorycomputeenginetocreateaggregatestatisticsinrealtime.

• Resultscanbereportedtoanalystsandupdatedeveryfewseconds.

• Resultscanbeusedasfeedbacktoeventanalysisindigitaltwinobjectsand/orreportedtousers.

34In-MemoryComputingSummitEurope2018

Page 35: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

ComputingAggregateData• Performsadata-parallelcomputationusingtheIMDG’sEval andMergemethods:

35

public class AggregateStatsInvokable implements Invokable<User, Integer,AggregateStats> {@Overridepublic AggregateStats eval(User u, Integer numUsers) {

AggregateStats userStats = new AggregateStats(numUsers);userStats.merge(u);return userStats ;

}

@Overridepublic AggregateStats merge(AggregateStats mergedStats,

AggregateStats u) {mergedStats.merge(u);return mergedStats;

}}

In-MemoryComputingSummitEurope2018

Evalmethod

Binarymergemethod

Page 36: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

ComputingAggregateData(2)• Computesrunningaverageofheart-ratebycategories:

36

public void merge(AggregateStats user) {numEvents += user.getNumEvents();totalHeartRate18to34 += user.getTotalHeartRate18to34();totalHeartRate35to50 += user.getTotalHeartRate35to50();totalHeartRateOver50 += user.getTotalHeartRateOver50();count18to34 += user.getCount18to34();count35to50 += user.getCount35to50();countOver50 += user.getCountOver50();

totalHeartRateBmiUnderWeight += user.getTotalHeartRateBmiUnderWeight();totalHeartRateBmiNormalWeight += user.getTotalHeartRateBmiNormalWeight();totalHeartRateBmiOverweight += user.getTotalHeartRateBmiOverweight();countUnderweight += user.getCountUnderweight();countNormalWeight += user.getCountNormalWeight();countOverWeight += user.getCountOverWeight();

}

In-MemoryComputingSummitEurope2018

CreatesGroups

Page 37: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

RunningtheData-ParallelComputation

• Usesasinglemethodtorunadata-parallelcomputationandreturnresults.• PublishesmergedresultstoanIMDGobjectforaccessbyuserobjectsand/oranalysts.

37

public void run() { NamedCache usersCache = CacheFactory.getCache(“userCache”);NamedCache statsCache = CacheFactory.getCache(“statsCache”);AggregateStats stats;

InvokeResult<AggregateStats> result =usersCache.invoke(AggregateStatsInvokable.class, null, _numUsers,

TimeSpan.fromMilliseconds(10000));

stats = result.getResult();statsCache.put(“globalStats”, stats);

}

In-MemoryComputingSummitEurope2018

Invokedata-parallelop

StoreresultinIMDG

Page 38: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

Data-ParallelExecutionSteps

• Eval phase:eachserverquerieslocalobjectsandrunseval andmergemethods:• Accessinglocalobjectsavoidsdatamotion.• Completeswithoneresultobjectperserver.

• Mergephase:allserversperformbinary,distributedmergetocreatefinalresult:• Mergerunsinparalleltominimizecompletiontime.

• Returnsfinalresultobjecttoclient.

38In-MemoryComputingSummitEurope2018

Page 39: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

Predictable,ScalablePerformance

• DigitaltwinmodelenablestheIMDGtoscalebothevent-handlingandintegrateddata-parallelanalysis.• Correlatingeventstodigitaltwinobjectscreatesanautomaticbasisforperformancescaling:

• Foreventanalysis• Fordata-parallelanalysis

• Itenablesaccesstoeachevent’scontextwithoutrequiringanetworkaccess.• Italsoco-locatesandencapsulatesapplication-specificcodeusingo-otechniques.

39In-MemoryComputingSummitEurope2018

Page 40: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

AvoidsNetworkBottlenecks

• DigitaltwinmodelavoidsnetworkbottlenecksassociatedwithusinganIMDGasanetworkedcacheinastream-processingpipeline.• Externaldatastoragerequiresnetworkaccesstoobtainanevent’scontext.• Networkbottleneckpreventsscalablethroughput.

40In-MemoryComputingSummitEurope2018

Page 41: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

Wrap-Up

DigitalTwins:TheNextGenerationinStatefulStream-Processing• Challenge:Currenttechniquesforstatefulstream-processing:

• Lackacoherentsoftwarearchitectureformanagingcontext.• Cansufferfromperformanceissuesduetonetworkbottlenecks.

• Thedigitaltwinmodel:• Offersaflexible,powerful,scalablearchitectureforstatefulstream-processing:

• Associateseventswithcontextabouttheirphysicalsourcesfordeeperintrospection.• Enablesflexible,object-orientedencapsulationofanalysisalgorithms.

• Providesabasisforaggregateanalysisandfeedback.

• Scalable,data-parallelcomputingwithanIMDG:• Automaticallycorrelatesincomingeventsandprocessestheminparallel.• Implementsintegrated(real-time),aggregateanalysisforimmediatefeedback.

41In-MemoryComputingSummitEurope2018

Page 42: Integrating Data-Parallel Analytics into Stream -Processing ......• Stateful stream-processing platforms add “unmanaged” data storage to the pipeline: • Pipeline stages perform

www.scaleoutsoftware.com

In-MemoryComput ing for Operat ional Inte l l igence


Top Related