Building & Leveraging White Database for Antivirus Testing
Presented at the International Antivirus Testing Workshop 2007 by Mario Vuksan, Director, Knowledgebase Services, Bit9
![Page 1: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/1.jpg)
Building and Leveraging a Whitelist Database for Anti-Virus Testing
Mario Vuksan, Director, Knowledgebase Services
![Page 2: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/2.jpg)
Agenda
• Growing Signature/Definition Problem
• Building a Global Whitelist
• Leveraging a Global Whitelist
• Q&A
![Page 3: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/3.jpg)
Growing Signature Problem
• Cumulative unique variants have grown ten-fold over the last 5 years (Yankee Group)
• “Denial-of-Service” Attacks: Malware changing signature every 10 minutes
• Solutions
  – Heuristic & Behavioral Detections
• New Problem: High “False Positive” Count
![Page 4: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/4.jpg)
Whitelist: a Google-sized Project
Sizing the Software Universe
• Number of Files Released Daily by:
  – Microsoft – 500K / IBM – 100K / Sourceforge – 500K / Mozilla.Org – 250K
• More Components, Daily Builds, Auto Updaters
• 2.7B Files Indexed, heading for 10B
• 30TB of Installers, heading for 100TB
• Daily acquiring 50M File Records, ¼ of YouTube
• Tracking 20,000 Software Companies
  – E.g. DMOZ tracks 200,000+ Entities
[Chart: Files Indexed and Storage over time]
• June 2005: 30M files, 1 TB
• March 2006: 300M files, 8 TB
• May 2007: 3B files, 30 TB
• Dec 2007 (target): 10B files, 100 TB
![Page 5: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/5.jpg)
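Mechanics of a Whitelist
[Diagram: processing stages and supporting layers]
• Stages: Collect, Extract, Analyze, Publish (Interfaces), Consumers
• Supporting layers: Software Infrastructure, Hardware Infrastructure
• Data flows: Outbound Metadata, Inbound User Metadata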
![Page 6: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/6.jpg)
Building a Whitelist
• Trusted Partners
  – Benefits
    • Trusted Source of Binary Material
    • In-depth Information on the Binary Data Indexed
  – Realities
    • Expensive Partner Programs
    • Complicated Applications
    • Lack of Interest
    • Lack of Comprehensive Repositories
![Page 7: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/7.jpg)
Certifying Software
– Certificate Mechanism
  • As a Component for Validation
  • Costly Process, Cumbersome for QA Departments
  • Great When Seen on Shareware Sites: Less than 10% Penetration
– First-Seen Date
  • Microsoft & Shared Installer Components
  • Long Time & No Detection: Likely Good
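The first-seen-date idea on this slide (a file known for a long time with no detections is likely good) can be sketched as a simple rule. This is a minimal illustration, not Bit9's actual logic: the `FileRecord` type, the `likely_good` function, and the one-year threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileRecord:
    """Hypothetical record for one indexed file."""
    sha256: str
    first_seen: date   # date the file was first observed
    detections: int    # number of anti-malware engines that flag the file


def likely_good(record: FileRecord, today: date,
                min_age: timedelta = timedelta(days=365)) -> bool:
    """Return True when the file has been known for at least `min_age`
    and no engine detects it. The one-year default is an assumed threshold."""
    age = today - record.first_seen
    return record.detections == 0 and age >= min_age
```

A rule like this is only one signal; the slide pairs it with certificates and other validation components rather than treating it as proof.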
![Page 8: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/8.jpg)
Challenges of Software Acquisition
• Buying/Getting Physical Media
  – Retail Prices vs. eBay
  – How to process 35K DVDs?
• FTP Sites
• Web Sites
  – Simple: Links and Forms
  – Complicated: JavaScript
  – Super Complicated: Frames and AJAX
• Shareware Sites
• Warez
  – Legal Ramifications
  – Users vs. Collectors
![Page 9: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/9.jpg)
Harvesting the Internet
• Order of Difficulty
  – FTPs – Wget, Curl
  – Simple HTTPs – Open Source Spiders
  – Try Grabbing Download.com
  – Try Grabbing Downloads.microsoft.com
  – Try Grabbing Canon or any Driver Site
• Datacenter Requirements
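For the "simple HTTP" tier, where downloads are plain links, a spider mostly needs to pull installer URLs out of a page. The sketch below uses only the Python standard library. The extension list and the names `DownloadLinkParser` and `extract_download_links` are assumptions for illustration. It does not handle the JavaScript, frames, or AJAX cases the slides rank as harder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of extensions worth harvesting; a real crawler would use a longer list.
INSTALLER_EXTENSIONS = (".exe", ".msi", ".zip", ".cab")


class DownloadLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links that look like installers."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore the query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(INSTALLER_EXTENSIONS):
            self.links.append(urljoin(self.base_url, href))


def extract_download_links(html: str, base_url: str) -> list:
    """Return installer-like links found in `html`, resolved against `base_url`."""
    parser = DownloadLinkParser(base_url)
    parser.feed(html)
    return parser.links
```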
![Page 10: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/10.jpg)
Assuring Software is Trustworthy
• Anti-Malware Scanning
  – Name and Type Normalization
• Behavior Scanning
• Code Inspection
• External Meta Data Collection and Matching
![Page 11: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/11.jpg)
Software Analysis Results
• Basic Embedded Data
• PE Header Analysis
  – Processor, Language, Binary Type
• Packers and Protectors
  – 500+ Variants
  – ASPack and Adobe
  – PECompact and Google
• Install Formats
  – Proprietary (like Skype)
  – Binary Diffs (Patch Factory, MS PSF)
• Runtime Analysis and Sandboxing
![Page 12: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/12.jpg)
Software Classifications
• Classifying Source
  – Trust-based vs. Type-based
• Classifying Files
  – Functional (Font, Driver, Screensaver) vs. Descriptive
• Classifying Products
  – Basic
    • Open Source
    • Commercial: Driver vs. Application
    • IM / P2P / Games
  – Better
    • Malware Classifications
  – Interesting
    • Steganography/Watermarking/Hacking/Hiding
![Page 13: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/13.jpg)
Industry & Government Certifications
• Government Certifications
  – NIAP, FIPS, DCTS
• Vulnerability Reports
  – CVE, CERT, SANS, MSB, etc.
• For Good Software:
  – Certification Programs
    • Built for Vista, Windows Certified, Java Approved
  – eTrust Download
• For Malware:
  – StopBadware, CME
![Page 14: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/14.jpg)
Leveraging the Whitelist
[Chart: Distribution of language]
• English (U.S.): 85%
• Japanese: 2%
• Chinese (Traditional), Chinese (Simplified), Korean, German, Italian, French, Spanish, Portuguese (Brazil): 1% each
• Under 1% each: Dutch, Polish, Turkish, Russian, Swedish, Czech, Danish, Norwegian Bokmal, Finnish, Hungarian, Greek, Portuguese (Portugal), Hebrew, Arabic, English (Canadian), Slovak, Slovenian, Basque, Catalan, Croatian, Bulgarian, Ukrainian
![Page 15: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/15.jpg)
PE Header Subsystem
[Chart: Distribution of subsystem]
• The Windows graphical user interface (GUI) subsystem: 65%
• The Windows character subsystem: 29%
• Device drivers and native Windows processes: 5%
• Windows CE: 1%
• Under 1% each: The Posix character subsystem, Unknown subsystem, An Extensible Firmware Interface (EFI) application, An EFI driver with boot services, An EFI driver with run-time services
![Page 16: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/16.jpg)
Other PE Header Data
[Chart: Percentage of .NET Applications (based on COR20 header)]
• .NET application: 6%
• Others: 94%
[Chart: Percentage of binaries recognized as DLLs (based on file characteristics bitmask)]
• DLL: 76%
• Others: 24%
[Chart: Percentage of binaries with bound import table]
• Bound Import Table: 29%
• Unbound: 71%
[Chart: Distribution of machine code]
• Intel 386 or later processors and compatible processors: 87%
• Intel Itanium processor family: 8%
• AMD64: 4%
• Alpha_AXP: 1%
• Under 1% each: MIPS little endian, Power PC little endian, ARM little endian, Thumb, Hitachi SH3, MIPS with FPU, Hitachi SH4, MIPS16
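The fields behind these charts (machine type, subsystem, the DLL bit in the file characteristics, the COR20 header, and the bound import table) can all be read from the PE headers. The sketch below shows one way to extract them with Python's `struct` module. The offsets follow the published PE/COFF layout. The function name `parse_pe_header`, the shortened lookup tables, and the returned dictionary shape are assumptions for illustration, not the tooling used to produce the slides.

```python
import struct

# Partial lookup tables; values not listed are reported as "Unknown".
MACHINES = {0x014C: "Intel 386 or later", 0x0200: "Intel Itanium", 0x8664: "AMD64"}
SUBSYSTEMS = {1: "Native", 2: "Windows GUI", 3: "Windows character",
              7: "Posix character", 9: "Windows CE", 10: "EFI application",
              11: "EFI boot service driver", 12: "EFI runtime driver"}
IMAGE_FILE_DLL = 0x2000     # bit in the COFF Characteristics field
DIR_BOUND_IMPORT = 11       # data directory index of the bound import table
DIR_COM_DESCRIPTOR = 14     # data directory index of the CLR (COR20) header


def parse_pe_header(data: bytes) -> dict:
    """Extract a few classification fields from the start of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)   # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    coff = pe_offset + 4
    machine, _sections, _stamp, _symptr, _nsyms, _optsize, characteristics = \
        struct.unpack_from("<HHIIIHH", data, coff)

    opt = coff + 20                                        # optional header start
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic not in (0x10B, 0x20B):                        # PE32 / PE32+
        raise ValueError("unsupported optional header magic")
    (subsystem,) = struct.unpack_from("<H", data, opt + 68)

    # Data directories start at offset 96 (PE32) or 112 (PE32+); the count
    # (NumberOfRvaAndSizes) is the 4-byte field immediately before them.
    dirs = opt + (96 if magic == 0x10B else 112)
    (num_dirs,) = struct.unpack_from("<I", data, dirs - 4)

    def directory_present(index: int) -> bool:
        if index >= num_dirs:
            return False
        rva, size = struct.unpack_from("<II", data, dirs + 8 * index)
        return rva != 0 and size != 0

    return {
        "machine": MACHINES.get(machine, "Unknown"),
        "subsystem": SUBSYSTEMS.get(subsystem, "Unknown"),
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "is_dotnet": directory_present(DIR_COM_DESCRIPTOR),
        "has_bound_imports": directory_present(DIR_BOUND_IMPORT),
    }
```

Running a function like this over every indexed binary and counting the results per field is one way distributions like those above could be tabulated.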
![Page 17: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/17.jpg)
What about False Positives?
• Typical Suspects:
  – Internet Explorer
  – Drivers (Network, File Access)
  – OS Components
  – Universal Installer and Uninstaller Components
• Optimized Applications:
  – Using Obscure Third-Party Software
  – ASPack, PECompact, Themida
![Page 18: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/18.jpg)
Archive Format Distribution
• Most popular archive/packer formats
[Chart: share of each archive/packer format]
• ARC/GZIP: 44%
• ARC/MSCAB: 27%
• ARC/ZIP: 7%
• ARC/BZIP2: 7%
• ARC/TAR: 6%
• SFX/MSCAB: 2%
• ARC/LZ, SFX/UPX, ARC/MSI, SFX/MSDelta, ARC/PSF: 1% each
• Under 1% each (archives and self-extractors): ARC/RAR, ARC/ISCAB, SFX/ZIP, SFX/Nullsoft, SFX/RAR, SFX/IS, SFX/WISE, ARC/ISO, ARC/7ZIP, SFX/WISE/Embedded, SFX/BZIP2, SFX/NOS, ARC/UDF, ARC/WIM, SFX/7ZIP
• Under 1% each (packers and protectors): UPX 0.72, UPX 0.8x - 2.xx, ASPack 1.06b - 1.061b, ASPack 1.07b, ASPack 1.08.00 - 1.08.01, ASPack 1.08.02, ASPack 1.08.03, ASPack 1.08.04, ASPack 2.000, ASPack 2.001, ASPack 2.1, ASPack 2.11, ASPack 2.11 - 2.11d, ASPack 2.12, PECompact 1.30 - 1.32, PECompact 1.68 - 1.76, PECompact 2.xx, WinUPack 0.28 - 0.3x, WinUPack 0.37 - 0.39, Private exeProtector 2.0, CExe 1.0a, PE Pack 1.0, PC Guard 5.00
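A distribution like this depends on recognizing each container format. For plain archives, the leading "magic" bytes are often enough. The sketch below maps a few well-known signatures to the slide's `ARC/...` labels and tallies percentages. The function names and the choice of labels are assumptions for the example. Self-extractors (`SFX/...`) and packers such as UPX or ASPack need PE-level inspection and are not covered.

```python
from collections import Counter

# (offset, magic bytes, label). Labels reuse the slide's ARC/... naming.
SIGNATURES = [
    (0, b"\x1f\x8b", "ARC/GZIP"),
    (0, b"MSCF", "ARC/MSCAB"),
    (0, b"PK\x03\x04", "ARC/ZIP"),
    (0, b"BZh", "ARC/BZIP2"),
    (257, b"ustar", "ARC/TAR"),
    # OLE compound file signature: used by MSI, but also by other OLE documents.
    (0, b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ARC/MSI"),
    (0, b"Rar!\x1a\x07", "ARC/RAR"),
    (0, b"7z\xbc\xaf\x27\x1c", "ARC/7ZIP"),
]


def identify_archive(data: bytes):
    """Return the label of the first matching signature, or None."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None


def format_distribution(blobs) -> dict:
    """Return the percentage of blobs per detected format ('unknown' if none)."""
    counts = Counter(identify_archive(blob) or "unknown" for blob in blobs)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}
```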
![Page 19: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/19.jpg)
Or Are They False Positives? (FTP Injection Attacks)
• HP
![Page 20: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/20.jpg)
Or Are They False Positives? (FTP Injection Attacks)
• Nero AG
![Page 21: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/21.jpg)
Vertical Detection
• Malware Sample Vertical File Detection Chart
• Good File Vertical Analysis
• Anti-Malware Reports per Web Site
  – Bit9 ISV Safe Software Program
![Page 22: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/22.jpg)
Use Case: Anti-Malware
• Benefits
  – R&D Tool
    • Packers, Metadata, Sources
  – QA Tool
    • False Positives
  – Performance Accelerator
    • Robin Bloor’s AVID
    • Next Generation Anti-Malware
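The QA use case can be illustrated with a short sketch: run a scanner over files, then treat any detection on a whitelisted file as a false positive candidate. The slides do not say how files are keyed. SHA-256 hashes, the function names, and the input shapes below are assumptions made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hash file contents; assumed here to be the whitelist key."""
    return hashlib.sha256(data).hexdigest()


def false_positives(detections: dict, whitelist: set) -> dict:
    """Return the detections that hit whitelisted files.

    `detections` maps file hash -> detection name for every file the
    scanner flagged; `whitelist` is the set of known-good hashes.
    """
    return {h: name for h, name in detections.items() if h in whitelist}


def false_positive_rate(detections: dict, whitelist: set) -> float:
    """Fraction of whitelisted files that the scanner flagged."""
    if not whitelist:
        return 0.0
    return len(false_positives(detections, whitelist)) / len(whitelist)
```

As the FTP injection slides suggest, a hit on a whitelisted file still deserves a second look before it is written off as a false positive.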
![Page 23: Building & Leveraging White Database for Antivirus Testing](https://reader034.vdocument.in/reader034/viewer/2022051816/546c343cb4af9f662c8b4ff3/html5/thumbnails/23.jpg)
About Bit9
• What We Do:
  – Application and Device Control Solutions and Software Metadata Reporting
• What We Offer:
  – Bit9 Parity Protects against Malicious Software and Data Leakage
  – The Bit9 Knowledgebase is the Largest Collection of Actionable Intelligence about the World’s Software
• Background
  – Founded in 2002 by founders of Okena (Cisco)
  – $2 Million NIST ATP Grant in 2003
  – Headquartered in Cambridge, Mass.
  – Venture Funded