Source: comsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te…
From NLP to MLP
Peter Teufl / Udo Payer / Günther Lackner
Graz University of Technology, Austria
What to expect...
• Knowledge mining in various areas:
• Event correlation, RDF data analysis, WIFI privacy, malware analysis
• ... and e-Participation
• Common model based on ML and AI
Outline: Intro · Idea · MLP vs. NLP · Lexical Parser · POS(C) Tagging · Lemmatization · Activation Patterns · Analysis
e-Participation
• Natural Language Processing (NLP)
• Analysis of text
• Clustering
• Relations
• Semantic aware search
Idea...
• In previous work, analysis of polymorphic shellcodes
• Neural networks for the detection of shellcode decryption loops
• Code vs. natural language?
NLP
• Natural Language Processing
• Various techniques are available
• Sentence analysis
• POS tagging
• Word sense disambiguation
MLP
• Machine Language Processing
• In this talk: focus on assembler, but applicable to any other language
• Assuming that machine language has properties similar to natural language
• Semantics, grammar, (word/instruction) disambiguation
MLP
• Malware: Signatures? Understanding?
• Fuzzy analysis
• Group code with similar behavior
• Semantic search for similar code
• Semantic relations within code
Goal
• Can we map NLP to MLP?
• NLP research highly developed, mature frameworks, techniques
• Problems in language analysis easier to spot, since we know our language very well
Layers
• NLP/MLP Processing
• NLP: Lexical Parser, POS Tagging, POS Filter, Lemmatization
• MLP: Emulator/Disassembler, POC Tagging, POC Filter, Lemmatization
• Semantic Processing
• Semantic Network Generation, Activation Pattern Generation
• External Knowledge
• Analysis
• Unsupervised Analysis, Supervised Analysis, Semantic Search, Semantic Relations
Lexical parser
• NLP:
• Extract sentences, analyze terms, find relations (e.g. Stanford parser)
• This beautiful_A city_N is_V called_V St. Petersburg_SN.
Lexical parser
• MLP:
• instruction sequences (mov, sub, xor...)
• relations between typical instructions (modifying the same register, variable)
• e.g. interrupt preparation, loops
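The relation step for MLP can be illustrated with a minimal Python sketch that links instructions touching the same register. The register list, the parsing rules and the sample decoder loop are assumptions made for illustration; the talk itself relies on an emulator/disassembler for this stage.

```python
import re
from collections import defaultdict

# Hypothetical register set; a real disassembler would supply operand information.
REGISTERS = ("eax", "ebx", "ecx", "edx", "esi", "edi")

def parse(line):
    """Split an assembler line into (mnemonic, operand text)."""
    parts = line.strip().split(None, 1)
    mnemonic = parts[0].lower()
    operands = parts[1].lower() if len(parts) > 1 else ""
    return mnemonic, operands

def register_relations(lines):
    """Map each register to the indices of the instructions that mention it."""
    uses = defaultdict(list)
    for index, line in enumerate(lines):
        _, operands = parse(line)
        for reg in REGISTERS:
            if re.search(rf"\b{reg}\b", operands):
                uses[reg].append(index)
    return dict(uses)

# Made-up decoder loop, in the spirit of the shellcode examples in the talk.
loop_body = [
    "mov ecx, 16",
    "xor byte [esi], 0x5a",
    "inc esi",
    "dec ecx",
    "jnz decode",
]
relations = register_relations(loop_body)
print(relations)
```

Instructions that share a register (here the counter in `ecx` and the pointer in `esi`) end up in the same list, which is one simple way to express "modifying the same register".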
POS tagging, filter
• NLP:
• This beautiful_A city_N is_V called_V St. Petersburg_SN.
• Tagging terms
• Filtering (stop words etc.): beautiful, city, St. Petersburg
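The filter step can be sketched in a few lines of Python. The tagged tokens are taken from the example sentence; the set of tags to keep (`A`, `N`, `SN`) is an assumption inferred from the filtered result shown on the slide, and a real pipeline would get its tags from a tagger such as the Stanford parser.

```python
# Tokens of the example sentence with their tags; "This" carries no tag on the slide.
tagged = [
    ("This", None),
    ("beautiful", "A"),
    ("city", "N"),
    ("is", "V"),
    ("called", "V"),
    ("St. Petersburg", "SN"),
]

# Assumed set of tags that survive filtering.
KEEP = {"A", "N", "SN"}

def pos_filter(tokens, keep=KEEP):
    """Keep only the words whose tag is in `keep`."""
    return [word for word, tag in tokens if tag in keep]

terms = pos_filter(tagged)
print(terms)
```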
POC tagging, filter
• MLP:
• jmp, call, jz... (branch type)
• add, sub... (arithmetic)
• xor... (logic)
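A minimal Python sketch of POC tagging and filtering, assuming a small hand-written tag table. Only the three classes and the listed mnemonics come from the slide; the `other` tag and the function names are hypothetical.

```python
# Hypothetical tag table built from the classes named on the slide.
POC_TAGS = {
    "jmp": "branch", "call": "branch", "jz": "branch",
    "add": "arithmetic", "sub": "arithmetic",
    "xor": "logic",
}

def poc_tag(instructions):
    """Return (mnemonic, tag) pairs; unknown mnemonics get the tag 'other'."""
    tagged = []
    for instruction in instructions:
        mnemonic = instruction.split()[0].lower()
        tagged.append((mnemonic, POC_TAGS.get(mnemonic, "other")))
    return tagged

def poc_filter(tagged, keep=("branch", "arithmetic", "logic")):
    """Drop instructions whose tag is not in `keep`, analogous to stop-word removal."""
    return [(mnemonic, tag) for mnemonic, tag in tagged if tag in keep]

code = ["xor eax, eax", "nop", "add eax, 4", "jz done"]
tagged = poc_tag(code)
filtered = poc_filter(tagged)
print(filtered)
```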
Lemmatization
• NLP:
• bought => buy
• cars => car
• MLP:
• drop parameters (mov ax,4 => mov)
• group instructions (arithmetic, logic)
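Both MLP lemmatization steps can be sketched in Python as follows. The grouping table and the function name are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical grouping table; the slide only names the arithmetic and logic groups.
GROUPS = {"add": "arithmetic", "sub": "arithmetic", "xor": "logic"}

def lemmatize(instruction, group=False):
    """Reduce an instruction to its mnemonic, optionally to its instruction group."""
    mnemonic = instruction.strip().split(None, 1)[0].lower()
    if group:
        return GROUPS.get(mnemonic, mnemonic)
    return mnemonic

# Dropping parameters: different operands collapse to the same lemma.
lemmas = [lemmatize(i) for i in ["mov ax,4", "mov bx,7", "xor ax,bx"]]

# Grouping: add and sub collapse to one group, xor to another.
grouped = [lemmatize(i, group=True) for i in ["add ax,1", "sub ax,2", "xor ax,bx"]]
print(lemmas, grouped)
```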
Activation Patterns
• Based on semantic networks, spreading activation, machine learning
• Allows us to analyze arbitrary combinations of features (symbolic, real values)
• Patterns are the basis for a wide range of analysis methods
[Figure: activation pattern network for network traffic. Symbolic values for layer 4 protocols (UDP, TCP) and layer 7 protocols (TELNET, DNS, SSH) are integrated directly as features. Error rates and byte histograms are each mapped through an RGNG map to clusters: error rate cluster 1 (e.g. low error rate), error rate cluster 2 (e.g. high error rate), byte occurrence cluster 1 (e.g. typical for plain text traffic), byte occurrence cluster 2 (e.g. typical for encrypted traffic).]
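The idea of spreading activation over a semantic network can be illustrated with a deliberately simplified Python sketch. It links features by co-occurrence and spreads activation for a single step; the weighting scheme, the `decay` parameter and the feature names `bytes_plain` and `bytes_encrypted` are assumptions, and the RGNG clustering of real-valued features is left out entirely.

```python
from collections import defaultdict
from itertools import combinations

def build_network(instances):
    """Link features that co-occur in an instance; weight = normalised co-occurrence count."""
    weights = defaultdict(float)
    for features in instances:
        for a, b in combinations(sorted(set(features)), 2):
            weights[(a, b)] += 1.0
            weights[(b, a)] += 1.0
    top = max(weights.values(), default=1.0)
    return {edge: weight / top for edge, weight in weights.items()}

def activation_pattern(features, network, nodes, decay=0.5):
    """Activate the given features with 1.0, then spread once to their neighbours."""
    activation = {node: 0.0 for node in nodes}
    for feature in features:
        activation[feature] = 1.0
    spread = dict(activation)
    for (src, dst), weight in network.items():
        if activation[src] > 0.0:
            spread[dst] = max(spread[dst], decay * weight * activation[src])
    return [spread[node] for node in nodes]

# Made-up traffic instances using symbolic features similar to those in the figure.
instances = [
    ["TCP", "SSH", "bytes_encrypted"],
    ["TCP", "SSH", "bytes_encrypted"],
    ["TCP", "TELNET", "bytes_plain"],
    ["UDP", "DNS", "bytes_plain"],
]
nodes = sorted({feature for instance in instances for feature in instance})
network = build_network(instances)
pattern = activation_pattern(["SSH"], network, nodes)
print(dict(zip(nodes, pattern)))
```

Activating only `SSH` also raises the activation of `TCP` and `bytes_encrypted`, so the resulting vector encodes related features as well as the ones that were present. Vectors of this kind can then be fed to clustering, classification or search.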
Analysis
• Unsupervised learning (clustering)
• Supervised learning (classifiers)
• Semantic search
• Semantic relations
• Anomaly detection
• Feature relevance
Applications
e-Participation
Text analysis
Twitter Mining
User Tracking in WIFI Networks
Event Correlation
RDF Analysis
Malware Analysis
MLP Example
• Metasploit shellcodes
• Decoder loops of various decryption engines
• Clustering, semantic search and relations
Clustering
• NLP:
• Group similar documents, extract key concepts from clusters, gain quick overview
• MLP:
• Group similar code (instruction sequences, functions, decryption loops)
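Grouping similar code can be sketched with a much simpler stand-in technique than the activation-pattern clustering used in the talk: bag-of-mnemonics vectors, cosine similarity and greedy threshold clustering. The snippets and the threshold are made up for illustration.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two mnemonic count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(snippets, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative vector, member indices)
    for index, mnemonics in enumerate(snippets):
        vector = Counter(mnemonics)
        for representative, members in clusters:
            if cosine(vector, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((vector, [index]))
    return [members for _, members in clusters]

# Lemmatized instruction sequences (mnemonics only).
snippets = [
    ["xor", "inc", "dec", "jnz"],
    ["xor", "inc", "loop"],
    ["push", "call", "pop", "ret"],
    ["xor", "add", "dec", "jnz"],
]
groups = cluster(snippets)
print(groups)
```

The three xor-based loops land in one group and the call sequence in another. The first member of each cluster acts as its representative, so the result depends on input order.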
Semantic Search
• NLP:
• St. Petersburg is beautiful. The city was founded in 1703.
• MLP:
• Search for similar concepts (decryption loops)
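The MLP search case can be approximated by ranking stored code against a query by similarity. This Python sketch uses Jaccard similarity over mnemonic sets as a stand-in for the activation-pattern comparison; the corpus entries and their names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of mnemonics."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def search(query, corpus, top_k=2):
    """Return the names of the top_k corpus entries most similar to the query."""
    ranked = sorted(corpus, key=lambda name: jaccard(query, corpus[name]), reverse=True)
    return ranked[:top_k]

# Hypothetical corpus of lemmatized code fragments.
corpus = {
    "decoder_a": ["xor", "inc", "dec", "jnz"],
    "decoder_b": ["xor", "add", "loop"],
    "syscall_stub": ["push", "mov", "int"],
}
hits = search(["xor", "inc", "loop"], corpus)
print(hits)
```

The query matches neither decoder exactly, yet both rank above the unrelated stub, which is the fuzzy behaviour that signature matching lacks.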
Semantic Relations
• NLP
• MLP
Thoughts...
• Same basic principles for NLP and MLP
• Use the existing knowledge, bring it to another domain
• Apply it to arbitrary languages
• Understand malware/programs?
Thank you for your attention!