© 2006 Symantec Corporation. All rights reserved.
Anonymizing Filesystem Metadata for Analysis
Chris Xin
Symantec
Challenges of Filesystem Analysis
Real-time live-system monitoring is difficult:
– performance degradation
– security & privacy concerns
– stability risk
Traces:
– I/O dependencies and system states are difficult to reconstruct
– security & privacy concerns
Benchmarks:
– “There are lies, damn lies and then there are benchmarks.”
Filesystem images:
– snapshots, backups
– security & privacy concerns
Agenda
Challenges of filesystem analysis
Keeping filesystem images
– metasave
Metadata anonymization
– secure metasave
Measurement
– space efficiency
– time efficiency
– resource consumption
Summary
Filesystem Images
Storing an image of the whole system would be expensive:
– large storage space
– long time to create
Keeping only the metadata is a better idea:
– A good resource for understanding the characteristics of a file system
– Cumulative images can be collected to track how a file system changes over time
   – file size, age, and type information
   – filesystem aging analysis
– Addresses some privacy concerns by excluding user data
Some file systems already provide such a utility:
– Ext2: e2image
– Linux NTFS: ntfsclone --metadata
– VxFS: metasave
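As a rough user-space illustration of "keeping metadata, not user data", the sketch below walks a directory tree and records only each entry's type, size, age, and extension from `os.lstat`, without ever opening a file. It is not how e2image, ntfsclone, or metasave work (those copy on-disk metadata structures and preserve filesystem geometry); `collect_metadata` and `MetaRecord` are hypothetical names chosen for this example.

```python
from __future__ import annotations

import os
import stat
import time
from dataclasses import dataclass


@dataclass
class MetaRecord:
    """Metadata for one directory entry; no file contents are stored."""
    path: str         # relative path (still sensitive until anonymized)
    kind: str         # "file", "dir", "symlink", or "other"
    size: int         # st_size in bytes
    age_days: float   # days since last modification
    extension: str    # last extension such as ".c", or "" if none


def collect_metadata(root: str, now: float | None = None) -> list[MetaRecord]:
    """Walk `root` and record per-entry metadata without opening any file."""
    now = time.time() if now is None else now
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks, don't follow them
            if stat.S_ISREG(st.st_mode):
                kind = "file"
            elif stat.S_ISDIR(st.st_mode):
                kind = "dir"
            elif stat.S_ISLNK(st.st_mode):
                kind = "symlink"
            else:
                kind = "other"
            records.append(MetaRecord(
                path=os.path.relpath(full, root),
                kind=kind,
                size=st.st_size,
                age_days=(now - st.st_mtime) / 86400.0,
                extension=os.path.splitext(name)[1],
            ))
    return records
```

Collected periodically, such records support the size, age, and type statistics listed above; the paths themselves still need the anonymization discussed next.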
Metasave Utility
The utility saves or restores the metadata of VxFS.
– Available in version 1 and later versions.
– Metadata is kept so that the original geometry of the file system is preserved and all inode information is intact.
– No user data is retained.
– Metadata can be saved from a snapshot, a backup, or a live system as an image file.
– The image file can be deflated, and the metadata can be restored to a file or a device.
What do we do with images?
– troubleshooting
– debugging
– file system analysis
Efficient Anonymization
But … your clients may say no:
– Sensitive information is still present in file and directory names.
– They are concerned about performance degradation.
Solution: anonymize clients’ information in the metadata
– names of files and directories
– client information in file system intent logs
Requirements
– Original information must be difficult to recover.
– Keep the geometry of the file system: retain the length of file/directory names.
– Time efficient
– Space efficient
– Minimal performance degradation
Secure Metasave
Enhanced metasave with encryption options
– Evolved from metasave, a VxFS utility for saving/restoring the metadata of a file system
– Online image saving
– Uses a cryptographic message digest algorithm to obfuscate client information
   – The algorithm can be chosen according to the client’s requirements.
   – Default: SHA-1
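A minimal sketch of "the algorithm can be chosen, default SHA-1", using Python's `hashlib` (usually backed by OpenSSL) in place of calling the OpenSSL library directly. `DEFAULT_ALGORITHM` and `make_hasher` are hypothetical names, not part of secure metasave.

```python
import hashlib
from typing import Optional

DEFAULT_ALGORITHM = "sha1"  # SHA-1 is the default named on the slide


def make_hasher(algorithm: Optional[str] = None):
    """Return a function bytes -> hash object for the client's chosen algorithm.

    Falls back to SHA-1 when the client expresses no preference, and rejects
    names that this Python/OpenSSL build does not provide.
    """
    name = (algorithm or DEFAULT_ALGORITHM).lower()
    if name not in hashlib.algorithms_available:
        raise ValueError(f"unsupported digest algorithm: {name!r}")
    return lambda data: hashlib.new(name, data)
```

For example, `make_hasher()(b"foo1.c").digest_size` is 20, while `make_hasher("sha256")` yields 32-byte digests. If the segment length used for name obfuscation is tied to the digest length, it would be read from `digest_size` rather than hard-coded to 20.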
Message Digest
Secure one-way hash function: e = H(M)
– M: original message
– H: hash function
– e: digested message
Key properties
– Given M, it is easy to compute e = H(M).
– Given e, it is hard to find M such that e = H(M).
– Given M, it is hard to find M' (different from M) such that H(M) = H(M') (second-preimage resistance, so collisions are rare).
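To make e = H(M) concrete with SHA-1: the digest is always 20 bytes regardless of the length of M, and the same M always yields the same e, which is what makes the obfuscation repeatable. The check below uses Python's `hashlib`; it demonstrates only these two easy properties, since preimage and collision resistance cannot be shown by running code.

```python
import hashlib


def digest(message: bytes) -> bytes:
    """e = H(M) with H = SHA-1."""
    return hashlib.sha1(message).digest()


for m in (b"a", b"bc", b"bcd", b"x" * 1000):
    e = digest(m)
    # Fixed output length: 20 bytes (160 bits) for any input length.
    assert len(e) == 20
    # Deterministic: recomputing H(M) gives the same e.
    assert e == digest(m)

# Inputs differing in a single character give completely different digests.
print(digest(b"foo1.c").hex())
print(digest(b"foo2.c").hex())
```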
Implementation
Uses the OpenSSL library to obfuscate a file/directory name
– Works on individual pathname components: /a/bc/bcd → /x/rd/wyz
– Retains name length
   – The digest has a fixed length: 20 characters for SHA-1.
   – If len(name) > len(digest), process the name in segments.
   – If len(name) < len(digest), or the final segment is shorter than the digest, digest the string and remove characters to preserve its original length.
– A digest can contain characters that are illegal in file/directory names; map them to legal characters.
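The steps above can be sketched as follows. This is an illustrative reconstruction, not the secure metasave code: it uses Python's `hashlib` instead of OpenSSL, treats names as bytes so that on-disk name lengths are preserved, digests each 20-byte segment separately, and maps every digest byte onto letters and digits (an assumed choice of "legal characters"; the slides do not list one). `obfuscate_name`, `obfuscate_path`, and `LEGAL` are hypothetical.

```python
import hashlib

DIGEST_LEN = 20  # SHA-1 digest size in bytes
# Assumed set of legal output characters; '/' and NUL can never appear.
LEGAL = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def _map_legal(digest: bytes) -> bytes:
    """Map each digest byte (0-255) to a legal filename character."""
    return bytes(LEGAL[b % len(LEGAL)] for b in digest)


def obfuscate_name(name: bytes) -> bytes:
    """Obfuscate one pathname component, preserving its length in bytes."""
    out = bytearray()
    # Process the name in DIGEST_LEN-byte segments.
    for start in range(0, len(name), DIGEST_LEN):
        segment = name[start:start + DIGEST_LEN]
        mapped = _map_legal(hashlib.sha1(segment).digest())
        # A short (final) segment keeps only as many characters as it had.
        out += mapped[:len(segment)]
    return bytes(out)


def obfuscate_path(path: bytes) -> bytes:
    """Obfuscate each component of a '/'-separated path independently."""
    return b"/".join(obfuscate_name(c) for c in path.split(b"/"))
```

`obfuscate_path(b"/a/bc/bcd")` keeps the leading `/` and yields components of length 1, 2, and 3, like the `/x/rd/wyz` example. Because each segment is digested on its own, a name that repeats a 20-byte segment also repeats its obfuscated segment; chaining the previous digest into the next would hide that, at the cost of departing further from the slide's description.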
File/Directory Name Manipulation
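Parse a name string → message digest → chop it to its original length
– Random number generator with a changeable seed
– Character mapping
[Figure: a 68-character original name string (positions 0–67) is digested in 20-character segments starting at positions 0, 20, 40, and 60, giving 80 digest characters (0–79); the digests are chopped to the original length to form the 68-character obfuscated filename (0–67).]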
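The slide lists a random number generator with a changeable seed next to the character mapping but does not say how they interact. One plausible reading, assumed here and not stated in the source, is that a seed-derived salt is mixed into every digest: a given name then obfuscates the same way throughout one image, but differently under a different seed, which also defeats precomputed dictionaries of common names. `make_salt` and `obfuscate_segment` are hypothetical helpers.

```python
import hashlib
import random

LEGAL = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
DIGEST_LEN = 20  # SHA-1


def make_salt(seed: int, nbytes: int = 16) -> bytes:
    """Derive a per-image salt from a changeable seed (assumed design).

    random.Random is not cryptographically secure; the seed must stay
    private, and a production tool would rather use the secrets module.
    """
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(nbytes))


def obfuscate_segment(segment: bytes, salt: bytes) -> bytes:
    """Digest salt + segment, map to legal characters, chop to original length."""
    if len(segment) > DIGEST_LEN:
        raise ValueError("split names longer than the digest into segments first")
    digest = hashlib.sha1(salt + segment).digest()
    mapped = bytes(LEGAL[b % len(LEGAL)] for b in digest)  # character mapping
    return mapped[:len(segment)]                          # chop to length
```

Under this reading, reusing a seed keeps cumulative images comparable (the same name maps to the same obfuscated name over time), while changing it makes images from different runs unlinkable.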
Obfuscation Options
Full-name obfuscation
Retain file extension if any
Obfuscate extensions as well and make them consistent
| Obfuscation option / original name | foo1.c | foo2.c |
|---|---|---|
| full-name | abcde | uwxyz |
| retain file extension | jkis.c | swdx.c |
| consistent extension | jkis.x | swdx.x |
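A sketch of the three options, assuming the length-preserving SHA-1 obfuscation described earlier. The option strings are borrowed from the legend of the overhead charts later in the talk and are not real command-line flags; `obfuscate` and `_obf` are hypothetical.

```python
import hashlib
import os

LEGAL = b"abcdefghijklmnopqrstuvwxyz0123456789"


def _obf(part: bytes) -> bytes:
    """Length-preserving SHA-1 obfuscation, 20-byte segments at a time."""
    out = bytearray()
    for i in range(0, len(part), 20):
        seg = part[i:i + 20]
        digest = hashlib.sha1(seg).digest()
        out += bytes(LEGAL[b % len(LEGAL)] for b in digest)[:len(seg)]
    return bytes(out)


def obfuscate(name: bytes, option: str) -> bytes:
    """Apply one of the three options to a single file name."""
    stem, ext = os.path.splitext(name)  # b"foo1.c" -> (b"foo1", b".c")
    if option == "full-obfuscation":
        return _obf(name)  # the dot and extension are hidden too
    if option == "filename-only":
        return _obf(stem) + ext  # keep the real extension
    if option == "consistent-extension":
        # The extension is digested on its own, so every ".c" in the file
        # system maps to the same obfuscated extension.
        return _obf(stem) + (b"." + _obf(ext[1:]) if ext else b"")
    raise ValueError(f"unknown option: {option!r}")
```

Retaining extensions, or mapping them consistently, keeps file-type statistics usable for analysis; the trade-off is that file types are revealed (or at least grouped) in the image.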
Further Handling
Multiple extensions and prefixes for the name-only obfuscation option
– Look at the last extension only: foo.c.bak → abced.bak
– Retain extensions of 4 characters or fewer; obfuscate anything longer.
Do not obfuscate the names of special administrative files or directories
– lost+found
Rebuild directory indexes and block checksums after name obfuscation
Symlinks
– Point to the same place within the file system
– “..” is kept intact
Intent logs
– An option allows leaving intent logs out of the image file.
– If the intent log is retained, the file and directory names in it are obfuscated.
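A sketch of the name-only rules and the symlink handling above. Two details are assumptions because the slides leave them open: the 4-character limit is measured without the dot, and a longer extension is obfuscated while its dot is kept. Symlink targets stay correct only if directory entries are renamed with the same function. `obfuscate_filename_only`, `obfuscate_symlink_target`, and the constants are hypothetical.

```python
import hashlib
import os

LEGAL = b"abcdefghijklmnopqrstuvwxyz0123456789"
MAX_KEPT_EXT = 4                      # assumed: counted without the leading dot
SPECIAL_NAMES = {b"lost+found"}       # administrative names left untouched
KEPT_COMPONENTS = {b"", b".", b".."}  # separators, "." and ".." stay intact


def _obf(part: bytes) -> bytes:
    """Length-preserving SHA-1 obfuscation, 20-byte segments at a time."""
    out = bytearray()
    for i in range(0, len(part), 20):
        seg = part[i:i + 20]
        digest = hashlib.sha1(seg).digest()
        out += bytes(LEGAL[b % len(LEGAL)] for b in digest)[:len(seg)]
    return bytes(out)


def obfuscate_filename_only(name: bytes) -> bytes:
    """Name-only option: look at the last extension only."""
    if name in SPECIAL_NAMES or name in KEPT_COMPONENTS:
        return name
    stem, ext = os.path.splitext(name)  # b"foo.c.bak" -> (b"foo.c", b".bak")
    if not ext:
        return _obf(name)
    if len(ext) - 1 <= MAX_KEPT_EXT:
        return _obf(stem) + ext  # short extension kept as-is
    return _obf(stem) + b"." + _obf(ext[1:])  # long extension obfuscated


def obfuscate_symlink_target(target: bytes) -> bytes:
    """Rewrite a symlink target component by component.

    Applying the same function used for directory entries keeps the link
    pointing to the same place inside the file system.
    """
    return b"/".join(obfuscate_filename_only(c) for c in target.split(b"/"))
```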
Collision Probability
What’s a collision?
– Two files/directories with different names, say A and B, end up with the same name after obfuscation.
Do we have to worry about it?
– Not really.
– Collisions only matter within individual directories.
– The chance of a collision is tiny: with SHA-1, about 1 in $10^{24}$ for a filesystem with a trillion file/directory names, and 1 in $10^{18}$ for a quadrillion names. Character mapping and name-length chopping increase the chance of collisions slightly.
– An optional name-conflict check can be run after obfuscation on file systems with large directories.
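The figures above follow from the birthday bound. The sketch below evaluates it under the usual assumption that digests behave like uniform random values, and adds a hypothetical per-directory conflict check (`find_conflicts`), since only names within the same directory can clash. For $n = 10^{12}$ and $n = 10^{15}$ over the full 160-bit space it gives odds of roughly 1 in $3 \times 10^{24}$ and 1 in $3 \times 10^{18}$, in line with the slide; counting the whole filesystem rather than one directory makes these conservative.

```python
import math
from collections import Counter, defaultdict


def collision_probability(n: float, space: float) -> float:
    """Birthday bound: P(at least one collision) among n uniform values
    drawn from `space` possibilities, p = 1 - exp(-n(n-1) / (2 * space))."""
    return -math.expm1(-n * (n - 1) / (2.0 * space))


def find_conflicts(entries):
    """Optional post-obfuscation check (hypothetical helper).

    `entries` yields (directory, obfuscated_name) pairs; returns a dict
    mapping each directory to the sorted names that occur more than once.
    """
    per_dir = defaultdict(Counter)
    for directory, name in entries:
        per_dir[directory][name] += 1
    conflicts = {}
    for directory, counts in per_dir.items():
        dups = sorted(name for name, c in counts.items() if c > 1)
        if dups:
            conflicts[directory] = dups
    return conflicts


for n in (1e12, 1e15):
    p = collision_probability(n, 2.0 ** 160)  # full 160-bit SHA-1 digests
    print(f"{n:.0e} names: about 1 in {1 / p:.1e}")
```

Setting `space` to the number of possible chopped names of a given length lets the same function estimate the effect of chopping that the slide mentions.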
Measurement
Three categories
– Space consumption
– Time consumption, including encryption overhead
– Resource consumption
Six filesystems measured
– four customer filesystems
– two filesystems on our production server (fs #2 and #6)
Experiment environment
– Live production system: Sun Fire E6900, 16 SPARC CPUs, 32 GB memory, shared disks
– Test machine: Sun Fire V240, 2 SPARC CPUs, 2 GB memory, single-user disks
Space Efficiency
The metadata image usually takes about 1–5% of the filesystem size.
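[Figure: storage efficiency. Metadata image size as a percentage of filesystem size, for filesystems 1–6.]

| Filesystem | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| % of total capacity | 0.08 | 0.06 | 0.04 | 0.05 | 6.88 | 0.60 |
| % of used capacity | 0.12 | 0.08 | 0.73 | 0.56 | 11.73 | 0.63 |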
Time Efficiency
How long does it take to get an anonymized file system image? (using the “filename-only” option)
– On the live production system: about 30 minutes to get an encrypted metadata image of fs #6, and 5–8 seconds for fs #2.
– On the test machine:

[Figure: time efficiency. Time to save an image on the test machine, for filesystems 1–6.]

| Filesystem | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Time (sec) | 1.9 | 1.7 | 0.267 | 6.4 | 108.33 | 273 |
A closer look
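The factors in play
– # of inodes
– total filesystem size
– filesystem capacity

| fs | # files | total (GB) | used (GB) | msv size / total fs cap. | msv size / used fs cap. | time, production (sec) | time, test (sec) |
|---|---|---|---|---|---|---|---|
| 1 | 39 | 27.8 | 18.3 | 0.08% | 0.12% | -- | 1.9 |
| 2 | 742 | 49.5 | 39.4 | 0.06% | 0.08% | 4 | 1.7 |
| 3 | 3,721 | 9.0 | 0.6 | 0.04% | 0.73% | -- | 0.267 |
| 4 | 59,584 | 150.0 | 12.4 | 0.05% | 0.56% | -- | 6.4 |
| 5 | 956,180 | 3.9 | 2.3 | 6.88% | 11.73% | -- | 108.33 |
| 6 | 2,259,443 | 195.4 | 186.9 | 0.60% | 0.63% | 1836 | 273.0 |

(msv = metasave image; -- = filesystem not on the production system)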
Encryption Overhead
Space efficiency is the same.
Time efficiency
– Little overhead introduced on the live production system (I/O-bound, shared disks)
– Noticeable computational overhead on the test machine
Encryption Overhead on the Test Machine
[Figure: normalized time (axis 0–1.6) for file systems 1–6 with the options no-encryption, full-obfuscation, filename-only, and consistent-extension.]
Encryption Overhead on the Production System
[Figure: normalized time (axis 0–1.6) for file systems 1–6 with the options no-encryption, full-obfuscation, filename-only, and consistent-extension.]
Resource Consumption
Little performance degradation during image saving
– 20 MB of memory and 1% of CPU were used while dumping the image on a live production system.
Summary
A method of anonymizing filesystem metadata
– Obfuscates clients’ information to relieve privacy concerns
– Costs 1–5% of the original file system size in storage
– Fairly quick, with little performance degradation
We encourage saving file metadata images with anonymization.
– They provide a good resource for file system analysis.
– They benefit both development and research.
The anonymization scheme can also be used in other file system utilities, such as trace collection.
Acknowledgements
Thanks to Oleg Kiselev, John Colgrove, Craig Harmer, Chuck Silvers and George Mathew for discussions.
Thanks to Marianne Lent and Paul Massiglia for suggestions.
Thanks to Ken Zachmann for helping with experiments.
Questions