visual analytics sandbox: a big data platform for ... · –virtual big data environment for stream...
TRANSCRIPT
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
VisualAnalyticsSandbox:Abigdataplatformforprocessingnetworktraffic
RajuGottumukkala,Ph.D.DirectorofResearch,InformaticsResearchInstitute
SiteDirector,NSFCenterforVisualandDecisionInformaticsAssistantProfessor,CollegeofEngineering
UniversityofLouisianaatLafayette
2017Internet2GlobalSummit(04/26/2017)
FundedbyNSFawardNo.1429526
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
Motivation
• Cyberenvironmentisincreasinglygettingcomplex
• Existingtoolsdonotsupportinteractiveanalysisofdynamicgraphs
Ip FlowGraph[©CVDI]Webgraph[©GW3BI]
• Graphsarecomplexdatastores
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
ApplicationofTVGincyber-security
• Extraction of Traffic Dispersion Graph(TDG) from IP-Flows
• Graph structural properties indicateabnormal traffic patterns (malware)
ATDGforsoribada P2P
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
WhatisVisualAnalyticsSandbox?
• Auniqueanalyticsenvironmentforprocessinghigh-volume,high-velocitydatastreams
– IoT,IPflowgraphs,clickstreams,socialmedia,etc.
• Experimentalinfrastructuretodevelopnext-generationdecisionsupporttools
VisualAnalytics
Virtualization
ResourceManager
NetworkInfrastructure Processing Memory
SSD-Storage
DiskStorage
StreamProcessing
BatchProcessing
DataManagement GraphStorage
Pre-ProcessingLibraries
MachineLearning
GraphProcessing
StatisticalPackages
VisualizationAlgorithms
LayoutAlgorithms
GraphReduction
HCITools(touch/3D)
PredictiveAnalytics
InteractiveAnalytics
AnomalyDetection
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
Hardware
• 112TFlopsofcomputing– 22IntelXeonE5processors(308
cores)– 12NvDIAP100GPU’s
• Storage– 1.8TBRAM– 25.6TBNvMESSD
• 25XfasterthanHDD,5XfasterthanSSD
– 20TBHDD• Networking
– 2X10GEthernetCardsperNode
InformationManagement
Nodes
StreamProcessingNodes
AnalyticsNode
VisualizationNode
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
BenchmarkingGraphOperations
TypeofTemporalCharacteristic GraphOperations
Temporalnetworktopology Degree,connectivity,density
Reachabilityanalysis Paths,walks,trails
Detectingoutliers Nodeoredgeclustering
Nodeneighborhoods Persistentpatterns&motifs
Execution times for retrieving cumulative edge weights of a node (Neo4J vs MySQL)
Figure 9. Execution times for retrieving shortest path betweentwo nodes (Neo4j with timestamp on node and caching vs nocaching)
Execution times for retrieving shortest path between two selected nodes (Neo4j schemes)
Execution times for retrieving weighted shortest path between two nodes (Neo4j with timestamp on node and
caching vs no caching)
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
Touchinteractions
(a) Node history graphs with timestamp browsing
(b) Graph filtering based on edge weight with multitouch controller (lower-bound highlighted yellow).
(c) Control widget (around finger) and zoom window with detached source.
(d) Local neighborhood of a selected node arranged and rendered in detached window(upper-right).
(e) Progress indicator circle provides feedback to users while analytics jobs are processed.
(f) Shortest paths between nodes are emphasized in red, and nodes along the paths are highlighted on the graph.
a) b)
c) d)
e) f)
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
Connectivity
• ConnectedtoLONINetworkandScienceDMZ
• ConnectedtoInternet2.0throughGENIswitch
• 40Gbps connectionsforfasterdatatransfer
GENISwitch
10Gbps
VASandbox
40Gbps
Internet2.0
1Gbps
Internet1.0
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
Expectations&Timeline• Howcanyouusethissystem?– Virtualbigdataenvironmentforstreamprocessing,graphanalytics
– Leveragestreamprocessing,graphmining&visualizationtoolsthatarepartofthesandbox
• Planfordeployment– Softwaredevelopedinapilotenvironmentfor(a)intelligentleveesurveillance,(b)eventdetectioninsocialmediaand(c)influenzaforecasting
– Lookingforcollaborationforoneusecaseincybersecurityusecasefornetworkanalysis(Aug2017)
– SystemwillbemadeavailableforusersonInternet2(Spring2018)
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
ThankYou
Contact:[email protected]:www.ucs.louisiana.edu/~nrg0821
ANationalScienceFoundationIndustry/UniversityCooperativeResearchCenter
RealTimeEventDetection
AnalyticsEngine
[©CVDI]