Data Mining NetFlow: So What’s Next?
Mark E Kane
FloCon 2005
20 September 05
Objectives
Data Mining, very briefly
Frequency Patterns
Discoveries
Realizations
Changes Made
Data Mining
Data Mining – automated extraction of previously unknown data that is interesting and potentially useful.
Cost of Participating in Data Mining
Result                                             Example SysAdmin Hours  Example Investigator Hours  Example Analyst Hours  Result of Data Mining  Reality
Red Herring                                        10                      10                          10                     YES                    NO
Time Lost to Investigate and Clean Up After Crime  ∞                       ∞                           ∞                      NO                     YES
-                                                  0                       0                           0                      NO                     NO
Crime Prevented / Prosecuted                       10                      10                          10                     YES                    YES
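The four outcomes on this slide can be read as a lookup keyed on what data mining reported and what actually happened. The following is a minimal sketch under that reading, not the presenter's tooling: the names `OUTCOMES`, `outcome`, and `total_hours` are hypothetical, the hour figures are the slide's example values, and `math.inf` stands in for the "∞" entries.

```python
import math

# Key: (data mining flagged activity, a crime really occurred)
# Value: (result label from the slide, example hours per role)
# math.inf is an assumed stand-in for the slide's "∞" entries.
OUTCOMES = {
    (True, False): ("Red Herring", 10),
    (False, True): ("Time Lost to Investigate and Clean Up After Crime", math.inf),
    (False, False): ("-", 0),
    (True, True): ("Crime Prevented / Prosecuted", 10),
}


def outcome(flagged: bool, crime: bool):
    """Return the (label, hours) pair for one case."""
    return OUTCOMES[(flagged, crime)]


def total_hours(cases):
    """Sum the example hours over an iterable of (flagged, crime) pairs."""
    return sum(outcome(flagged, crime)[1] for flagged, crime in cases)
```

Under these example figures, a single missed crime makes the total unbounded, which is the point of the "∞" row: a red herring costs a fixed number of hours, a miss does not.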
Complexity of Mining NetFlow
Sheer Volume
Complex Protocol Analysis
Ambiguous Interpretations
Very Smart Adversaries
Common Investigator Issues
Undermanned and overworked
Varied knowledge base
Does not own networks
No direct reporting structure
Data Mining Techniques
Primary Techniques
  Rule and Tree Induction
  Characterization
  Classification
  Regression
  Association
  Clustering
Other Techniques
  Dependency Modeling
  Change Detection
  Trend Analysis
  Deviation Detection
  Link Analysis
  Pattern Analysis
  Spatiotemporal Data Mining
  Mining Path Traversal Patterns
  Mining Sequential/Frequent Patterns
Uncertain Reasoning Techniques
  Fuzzy Logic
  Neural Networks
  Bayesian Networks
  Genetic Algorithms
  Rough Set Theory
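As an illustration of one listed technique, deviation detection, the sketch below flags hourly flow counts that sit far from the mean. It is only one simple way to do this (a z-score test) and is not taken from the presentation; the function name `deviations`, the threshold of 2.0, and the sample counts are all assumptions made for the example.

```python
from statistics import mean, pstdev


def deviations(counts, threshold=2.0):
    """Return indices of values more than `threshold` standard
    deviations from the mean (a basic z-score test)."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical flows-per-hour for one host; the last hour spikes.
hourly_flows = [10, 12, 11, 9, 10, 11, 10, 60]
flagged = deviations(hourly_flows)
```

A test this simple is easily skewed by the outlier itself, so it should be read as a starting point rather than a detector fit for the adversaries described earlier.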
Frequency Patterns
Mining Frequent Patterns in Data Streams at Multiple Time Granularities (Giannella, Han, Pei, Yan, and Yu)
Support Decision Making
Past Less Significant than Present
Record Reduction
Time-Tilted Windows
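The record reduction comes from letting older windows cover more time. A rough sketch of the arithmetic, assuming two slots per window level and slot sizes that double per level (1, 1, 2, 2, 4, 4, 8, 8, ...), and assuming a new level is opened only when the level below it overflows. The function name `levels_needed` is hypothetical.

```python
def levels_needed(n_days: int) -> int:
    """Window levels in use after n_days of daily counts.

    Under the assumptions above, level k first appears on day
    2**(k + 1) - 1, so the level count is floor(log2(n_days + 1)).
    """
    if n_days < 1:
        return 0
    return (n_days + 1).bit_length() - 1


def max_slots(n_days: int) -> int:
    """Upper bound on stored values: two slots per level."""
    return 2 * levels_needed(n_days)
```

Under these assumptions a year of daily counts fits in at most 16 stored values instead of 365, at the cost of coarser detail for older days.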
Interpreting Time-Tilted Windows
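Window        0         1         2         3
Transition    N    Y    N    Y    N    Y    N    Y
Size          1    1    2    2    4    4    8    8
DAY
Monday        9    -    -    -    -    -    -    -
Tuesday       15   9    -    -    -    -    -    -
Wednesday     6    -    12   -    -    -    -    -
Thursday      6    6    12   -    -    -    -    -
Friday        12   -    6    12   -    -    -    -
Saturday      16   12   6    12   -    -    -    -
Sunday        6    -    14   -    9    -    -    -
Monday        12   6    14   -    9    -    -    -
Tuesday       15   -    9    14   -    -    -    -
Day 1: 9 events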
Day 2: 15 events (two buckets)
Day 3: 6 events (two buckets)
Day 4: 6 events (two buckets)
Day 5: 16 events (three buckets)
Day 6: 12 events (four buckets)
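The day-by-day values in the time-tilted window table can be reproduced with a small update routine. This is a sketch inferred from the numbers on the slide, not the presenter's code: it assumes each window level has a main slot (transition "N") and a buffer slot (transition "Y"), and that when both are full the two values are averaged and pushed to the next level. Averaging is an inference from the table (15 and 9 become 12); the cited paper merges frequency counts rather than averaging them. The names `Level`, `insert`, and `snapshot` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Level:
    main: Optional[float] = None    # transition "N" slot
    buffer: Optional[float] = None  # transition "Y" slot


def insert(levels: List[Level], value: float) -> None:
    """Add the newest daily count, cascading older values upward."""
    carry = value
    for level in levels:
        if level.main is None:
            level.main = carry
            return
        if level.buffer is None:
            # Shift the old value into the buffer, store the new one.
            level.buffer, level.main = level.main, carry
            return
        # Both slots full: average them (assumed merge rule) and
        # carry the result to the next, coarser level.
        merged = (level.main + level.buffer) / 2
        level.main, level.buffer = carry, None
        carry = merged
    levels.append(Level(main=carry))  # open a new, coarser level


def snapshot(levels: List[Level]):
    """Return (main, buffer) per level, finest level first."""
    return [(lv.main, lv.buffer) for lv in levels]


# Daily event counts, Monday through the second Tuesday, from the table.
levels: List[Level] = []
for count in [9, 15, 6, 6, 12, 16, 6, 12, 15]:
    insert(levels, count)
```

After the ninth day, `snapshot(levels)` holds 15 in window 0, 9 and 14 in window 1, and 9 in window 2, matching the last table row. Nine days of history are kept in four values.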
Presenting Frequency Patterns
Data Mining Discoveries
Failed email servers
Previously unknown trusted relationships
Encryption without authentication
Possible but unproven intrusions
Data Mining Results
Frustrated Investigators
Frustrated Analysts
One Very Frustrated Developer
Changes to Employ Data Mining
Establish common basis of understanding
Establish criteria for reporting
  Geo-Resolution
  Timeliness
  Volume
Establish reporting procedures
Questions
Mark Kane
mkane @ ddktechgroup.com