visual analytics sandbox: a big data platform for ... · –virtual big data environment for stream...

11
A National Science Foundation Industry/ University Cooperative Research Center Visual Analytics Sandbox: A big data platform for processing network traffic Raju Gottumukkala, Ph.D. Director of Research, Informatics Research Institute Site Director, NSF Center for Visual and Decision Informatics Assistant Professor, College of Engineering University of Louisiana at Lafayette 2017 Internet2 Global Summit (04/26/2017) Funded by NSF award No.1429526

Upload: others

Post on 25-May-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

VisualAnalyticsSandbox:Abigdataplatformforprocessingnetworktraffic

RajuGottumukkala,Ph.D.DirectorofResearch,InformaticsResearchInstitute

SiteDirector,NSFCenterforVisualandDecisionInformaticsAssistantProfessor,CollegeofEngineering

UniversityofLouisianaatLafayette

2017Internet2GlobalSummit(04/26/2017)

FundedbyNSFawardNo.1429526

Page 2: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

Motivation

• Cyberenvironmentisincreasinglygettingcomplex

• Existingtoolsdonotsupportinteractiveanalysisofdynamicgraphs

Ip FlowGraph[©CVDI]Webgraph[©GW3BI]

• Graphsarecomplexdatastores

Page 3: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

ApplicationofTVGincyber-security

• Extraction of Traffic Dispersion Graph(TDG) from IP-Flows

• Graph structural properties indicateabnormal traffic patterns (malware)

ATDGforsoribada P2P

Page 4: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

WhatisVisualAnalyticsSandbox?

• Auniqueanalyticsenvironmentforprocessinghigh-volume,high-velocitydatastreams

– IoT,IPflowgraphs,clickstreams,socialmedia,etc.

• Experimentalinfrastructuretodevelopnext-generationdecisionsupporttools

VisualAnalytics

Virtualization

ResourceManager

NetworkInfrastructure Processing Memory

SSD-Storage

DiskStorage

StreamProcessing

BatchProcessing

DataManagement GraphStorage

Pre-ProcessingLibraries

MachineLearning

GraphProcessing

StatisticalPackages

VisualizationAlgorithms

LayoutAlgorithms

GraphReduction

HCITools(touch/3D)

PredictiveAnalytics

InteractiveAnalytics

AnomalyDetection

Page 5: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

Hardware

• 112TFlopsofcomputing– 22IntelXeonE5processors(308

cores)– 12NvDIAP100GPU’s

• Storage– 1.8TBRAM– 25.6TBNvMESSD

• 25XfasterthanHDD,5XfasterthanSSD

– 20TBHDD• Networking

– 2X10GEthernetCardsperNode

InformationManagement

Nodes

StreamProcessingNodes

AnalyticsNode

VisualizationNode

Page 6: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

BenchmarkingGraphOperations

TypeofTemporalCharacteristic GraphOperations

Temporalnetworktopology Degree,connectivity,density

Reachabilityanalysis Paths,walks,trails

Detectingoutliers Nodeoredgeclustering

Nodeneighborhoods Persistentpatterns&motifs

Execution times for retrieving cumulative edge weights of a node (Neo4J vs MySQL)

Figure 9. Execution times for retrieving shortest path betweentwo nodes (Neo4j with timestamp on node and caching vs nocaching)

Execution times for retrieving shortest path between two selected nodes (Neo4j schemes)

Execution times for retrieving weighted shortest path between two nodes (Neo4j with timestamp on node and

caching vs no caching)

Page 7: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

Touchinteractions

(a) Node history graphs with timestamp browsing

(b) Graph filtering based on edge weight with multitouch controller (lower-bound highlighted yellow).

(c) Control widget (around finger) and zoom window with detached source.

(d) Local neighborhood of a selected node arranged and rendered in detached window(upper-right).

(e) Progress indicator circle provides feedback to users while analytics jobs are processed.

(f) Shortest paths between nodes are emphasized in red, and nodes along the paths are highlighted on the graph.

a) b)

c) d)

e) f)

Page 8: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

Connectivity

• ConnectedtoLONINetworkandScienceDMZ

• ConnectedtoInternet2.0throughGENIswitch

• 40Gbps connectionsforfasterdatatransfer

GENISwitch

10Gbps

VASandbox

40Gbps

Internet2.0

1Gbps

Internet1.0

Page 9: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

Expectations&Timeline• Howcanyouusethissystem?– Virtualbigdataenvironmentforstreamprocessing,graphanalytics

– Leveragestreamprocessing,graphmining&visualizationtoolsthatarepartofthesandbox

• Planfordeployment– Softwaredevelopedinapilotenvironmentfor(a)intelligentleveesurveillance,(b)eventdetectioninsocialmediaand(c)influenzaforecasting

– Lookingforcollaborationforoneusecaseincybersecurityusecasefornetworkanalysis(Aug2017)

– SystemwillbemadeavailableforusersonInternet2(Spring2018)

Page 10: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

ThankYou

Contact:[email protected]:www.ucs.louisiana.edu/~nrg0821

Page 11: Visual Analytics Sandbox: A big data platform for ... · –Virtual big data environment for stream processing, graph analytics –Leverage stream processing, graph mining & visualization

ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter

RealTimeEventDetection

AnalyticsEngine

[©CVDI]