Data Mining Techniques for Malware Detection
-BY Aditya Deshmukh(TE-CSE1)
-BY ULLAS KAKANADAN(TE-CSE1)
-BY ANKIT GELDA(TE-CSE1)
-BY SUDARSHAN RANDIVE(TE-CSE1)
CONTENTS
• DATA MINING???
• TECHNIQUES???
• WHAT IS MALWARE???
• TECHNIQUES OVER MALWARE
• VARIOUS APPLICATIONS
• CONCLUSION
• QUESTION?
WHY MINE DATA???
Lots of data is being collected and warehoused
Potentially valuable resource
Stored data grows very fast
Information is crucial
DATA MINING
Extracting IMPLICIT, PREVIOUSLY UNKNOWN, POTENTIALLY USEFUL information from data
Needed: programs that detect patterns and regularities in the data
Knowledge Discovery in Data
Knowledge discovery process
Data, Information, and Knowledge
• Data
    operational or transactional data
    nonoperational data
    metadata: data about the data itself
• Information
    patterns, associations, or relationships among all this data
• Knowledge
How does data mining work?
• Classes: Stored data is used to locate data in predetermined groups.
• Clusters: Data items are grouped according to logical relationships or consumer preferences.
• Associations: Data can be mined to identify associations, such as items that tend to occur together.
• Sequential patterns: Data is mined to anticipate behavior patterns and trends.
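As a rough illustration of the "Associations" idea, the sketch below computes two measures commonly used in association mining (support and confidence) over a small, made-up list of transactions. The function names and the sample data are assumptions for illustration only; they do not come from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of consequent given antecedent."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Hypothetical sample data: each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

A rule such as "bread implies milk" is usually kept only if both values exceed thresholds chosen by the analyst.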
What is malware???
Short for "malicious software"
As old as software itself
Any programmer might create malware
Most common types:
• Virus
• Trojans
• Worms
• Zombies
• Spyware
Virus
The most well-known type of malware
Main goal is not to cause damage, but to clone itself onto another host
If a virus causes damage, it is more likely to be detected
Usually has a very small footprint
Can remain undetected for a very long time
Worms
Very similar to viruses in many ways
Key difference: worms are network-aware
Overcome the computer-to-computer hurdle by seeking new hosts on the network
Capable of going global in a matter of seconds
Very hard to control and stop
Trojans
Conceal themselves inside other software
The Greeks were able to enter the fortified city of Troy by hiding their soldiers in a big wooden horse given to the Trojans as a gift
The disguises a trojan can take are limited only by the programmer's imagination
Cyber-crooks often use viruses, trojans and worms
Trojans also drop spyware
Zombies
Work in a similar way to spyware
Infection mechanisms remain the same
A zombie just sits there waiting for commands from the hacker
Hackers can infect tens of thousands of computers, turning them into zombie machines
Zombie machines can be used to launch a distributed denial of service (DDoS) attack
Algorithms in data mining
C4.5 and beyond
The k-means algorithm
Support vector machines
The Apriori algorithm
The EM algorithm
Malware detection techniques
• anomaly-based detection technique
• signature-based detection technique
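The contrast between the two techniques can be sketched in a few lines. In this minimal example, signature-based detection is reduced to a hash lookup against a set of known-bad digests, and anomaly-based detection to a z-score test against a baseline of normal observations. The signature set, the payload bytes, the baseline values, and the threshold of 3.0 are all hypothetical; real detectors use far richer signatures and behavior models.

```python
import hashlib
import statistics

# Hypothetical signature database: SHA-256 digests of known-malicious payloads.
KNOWN_BAD = {hashlib.sha256(b"malicious-payload-example").hexdigest()}


def signature_match(data: bytes) -> bool:
    """Signature-based: flag data whose digest is already in the database."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD


def is_anomalous(value: float, baseline: list, threshold: float = 3.0) -> bool:
    """Anomaly-based: flag a value that deviates strongly from the baseline."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


# Hypothetical baseline, e.g. outbound connections per minute for one host.
baseline = [10, 12, 11, 9, 10, 11]

print(signature_match(b"malicious-payload-example"))  # known sample
print(signature_match(b"never-seen-before"))          # unknown sample
print(is_anomalous(50, baseline))
print(is_anomalous(11, baseline))
```

The sketch also shows the usual trade-off: the signature check cannot flag a sample it has never seen, while the anomaly check can flag new behavior but may also flag unusual yet harmless activity.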
K-means algorithm
• Takes a number of components of the population equal to the final required number of clusters; these serve as the initial clusters
• Examines each remaining component in the population
• Assigns it to one of the clusters depending on the minimum distance to the cluster centroid
• The centroid's position is recalculated every time a component is added
[Figure: flowchart of the k-means algorithm]
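The k-means steps listed on this slide can be sketched as a single pass over the data. This is a minimal illustration of the incremental variant described here (seed the clusters with the first k components, then update a centroid each time a component joins it), not a full iterative k-means implementation. The function name and the sample points are assumptions; in a malware setting each point would be a feature vector extracted from a sample.

```python
import math


def sequential_kmeans(points, k):
    """One pass of incremental k-means over a list of numeric tuples."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # The first k components seed the k clusters, one component each.
    centroids = [list(p) for p in points[:k]]
    counts = [1] * k
    labels = list(range(k))

    for p in points[k:]:
        # Assign the component to the cluster with the nearest centroid.
        j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
        labels.append(j)
        # Recalculate that centroid immediately as a running mean.
        counts[j] += 1
        centroids[j] = [c + (x - c) / counts[j] for c, x in zip(centroids[j], p)]

    return centroids, labels


# Hypothetical two-dimensional feature vectors.
points = [(0, 0), (10, 10), (1, 1), (9, 9), (0, 1)]
centroids, labels = sequential_kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the seeds are simply the first k components, the result depends on input order; common practice is to repeat the assignment step until the clusters stop changing.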
ADVANTAGES OF DATA MINING
Marketing/Retailing
Banking/Crediting
Law enforcement
Researchers
DISADVANTAGES OF DATA MINING
Privacy Issues
Security issues
Misuse of information/inaccurate information