Advances in File Carving
Rob Zirnstein, President
Forensic Innovations, Inc.
7/14/2011 www.ForensicInnovations.com
Our Data is GONE!
• All of your servers have Crashed!
• Your customers’ Data is Lost!
• You backed up last week, but important business transactions have taken place since.
• 70% of companies with devastating data loss go out of business.
• All it took was one employee writing a simple SQL database script after you fired them.
We Didn’t Find The Evidence!
• What do you do when you’ve searched through all of the evidence and come up empty?
• When you know a suspect is hiding something, where do you look first?
• TrueCrypt Volumes & Unallocated Space
• Even good people shred data when faced with an investigation.
• The tools are easy to find.
– www.TrueCrypt.org
How They Hide the Evidence
• Deleting a file
– Sends the file to the Windows Recycle Bin
• Empty or bypass the Recycle Bin
– Undelete tools depend on the deleted directory entry
  • That can be deleted or overwritten too
  • Then there’s no undeleting possible
• Store files in a TrueCrypt Volume
– Undetectable as a file (except for my tools)
– Looks like random data in unallocated space (except for my tool)
How To Get The Files Back
• File Carving
– Definition: “General term for extracting data (files) out of undifferentiated blocks (raw data), like "carving" a sculpture out of soap stone.” http://www.forensicswiki.org/wiki/File_Carving
– The sectors containing the files are orphaned
– Some of them may get overwritten
– They are like many jigsaw puzzles thrown into a trash bag, if they were fragmented.
– If some sectors were stored consecutively, then it’s like puzzle pieces that weren’t pulled apart before getting trashed.
File Carving Assumptions
• No Files are Fragmented!?!
– All Files are stored in consecutive sectors
• Sector Size = 512 bytes
– May be detected through disk structure
• Cluster Size = 512 to 16,384 bytes
– May be detected through disk structure
• File Slack may be ignored
• RAM slack is ignored
– Or incorrectly bundled in with File Slack
– Isn’t it always zeroed out?
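The sector and cluster sizes above determine how much slack trails a file. The arithmetic can be sketched as below. The function name and the split it reports are assumptions for illustration: "RAM slack" is taken to mean the bytes from the end of the file to the end of its last sector, and the remaining whole sectors of the last cluster are counted separately. The slides do not define either term.

```python
def slack_sizes(file_size, sector_size=512, cluster_size=4096):
    """Return (ram_slack, cluster_slack) in bytes for a file of file_size bytes.

    ram_slack:     unused bytes between the end of the file and the end of
                   the last sector it touches (assumed definition).
    cluster_slack: the remaining whole sectors in the last allocated cluster.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if cluster_size % sector_size != 0:
        raise ValueError("cluster_size must be a multiple of sector_size")

    # Bytes needed to pad the file up to the next sector boundary.
    ram_slack = (-file_size) % sector_size

    # Total allocation, rounded up to a whole number of clusters.
    allocated = -(-file_size // cluster_size) * cluster_size

    cluster_slack = allocated - file_size - ram_slack
    return ram_slack, cluster_slack


if __name__ == "__main__":
    # A 1,000-byte file on 512-byte sectors and 4,096-byte clusters.
    print(slack_sizes(1000))
```

A carver that ignores slack treats all of these trailing bytes as if they were not there, which is why the bullets above flag it as an assumption rather than a fact.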
File Carving Techniques
• Block Based Carving
• Statistical Carving
• Header/Footer Carving
• Header/Maximum File Size Carving
• Header/Embedded Length Carving
• File Structure Based Carving
• Semantic Carving
• Carving with Validation
• Fragment Recovery Carving
• Repackaging Carving
• SmartCarving
• Hash Carving
• Fuzzy Hash Carving
http://www.forensicswiki.org/wiki/File_Carving
Block Based Carving
• Analyze the sectors on a block-by-block basis to determine whether they belong together in the same file.
• Assumes that each sector can only be part of a single file.
Statistical Carving
• Use statistics or content characteristics to identify each sector.
• Entropy measurement
• Filter out blocks that clearly aren’t part of a desired file type.
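The entropy measurement mentioned above can be sketched as follows. This is a minimal illustration, not the method of any particular tool: the function names, the 512-byte block size, and the 7.0 bits-per-byte threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def high_entropy_offsets(data: bytes, block_size: int = 512,
                         min_entropy: float = 7.0) -> list:
    """Return offsets of blocks whose entropy is at least min_entropy.

    Low-entropy blocks (zero fill, plain text) are filtered out, leaving
    candidates for compressed or encrypted content.
    """
    offsets = []
    for offset in range(0, len(data), block_size):
        if shannon_entropy(data[offset:offset + block_size]) >= min_entropy:
            offsets.append(offset)
    return offsets


if __name__ == "__main__":
    zero_block = bytes(512)               # entropy 0.0
    flat_block = bytes(range(256)) * 2    # every byte value equally likely
    print(high_entropy_offsets(zero_block + flat_block))
```

Entropy alone cannot tell compressed data from encrypted data, so in practice it serves as a filter ahead of other techniques rather than as a classifier on its own.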
Header/Footer Carving
• Search for file header signature(s).
• Search for the matching file footer signatures.
• Capture the sectors in between.
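The three steps above can be sketched for JPEG, whose files start with the bytes FF D8 FF and end with FF D9. The function name is hypothetical, and the sketch works on byte offsets in an in-memory buffer rather than on whole sectors. It also inherits the assumption that files are not fragmented.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus first marker byte
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_header_footer(data: bytes, header: bytes = JPEG_HEADER,
                        footer: bytes = JPEG_FOOTER) -> list:
    """Return (offset, carved_bytes) for each header/footer pair found."""
    carved = []
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            break
        # Look for the first footer after the header.
        end = data.find(footer, start + len(header))
        if end == -1:
            break  # no footer anywhere after this header
        end += len(footer)
        carved.append((start, data[start:end]))
        pos = end
    return carved


if __name__ == "__main__":
    fake_jpeg = JPEG_HEADER + b"\xe0" + b"A" * 100 + JPEG_FOOTER
    image = bytes(512) + fake_jpeg + bytes(300)
    for offset, blob in carve_header_footer(image):
        print(offset, len(blob))
```

Stopping at the first footer is a simplification: a footer signature can also appear inside a file (for example in an embedded thumbnail), which is one source of the truncated results this technique is known for.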
Header/Maximum File Size Carving
• Search for file header signature(s).
• Consult a list of maximum file lengths for each header type.
• Capture the sectors from the header up to the maximum length.
• Applications for many file types do not notice the additional unrelated data that may get appended to the recovered file.
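A minimal sketch of this approach follows. The function name and the size limits in `MAX_SIZES` are hypothetical values picked for illustration, not taken from any tool's configuration.

```python
# Hypothetical table: header signature -> maximum number of bytes to carve.
MAX_SIZES = {
    b"GIF89a": 5 * 1024 * 1024,
    b"%PDF-": 50 * 1024 * 1024,
}


def carve_header_max_size(data: bytes, max_sizes: dict = MAX_SIZES) -> list:
    """Return (offset, carved_bytes) sorted by offset.

    Each carve starts at a header and runs for the maximum length listed
    for that header (or to the end of the data, whichever comes first).
    """
    carved = []
    for header, max_size in max_sizes.items():
        pos = data.find(header)
        while pos != -1:
            carved.append((pos, data[pos:pos + max_size]))
            pos = data.find(header, pos + 1)
    return sorted(carved, key=lambda item: item[0])


if __name__ == "__main__":
    image = bytes(8) + b"GIF89a" + b"B" * 20 + bytes(4)
    # With a 16-byte limit the carve stops inside the run of B's.
    print(carve_header_max_size(image, {b"GIF89a": 16}))
```

Because the carve always runs to the size limit, every recovered file shorter than the limit carries trailing bytes that belong to something else, which is the appended unrelated data the last bullet refers to.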
Header/Embedded Length Carving
• Search for file header signature(s).
• Read the file length from one of the fields in the header.
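BMP is a convenient format for illustrating this: the file begins with the two bytes "BM", followed by the total file size as a 4-byte little-endian integer. The sketch below reads that field and carves exactly that many bytes. The function name and the sanity checks are assumptions for the example.

```python
import struct

BMP_MAGIC = b"BM"
BMP_FILE_HEADER_LEN = 14  # size of the fixed BMP file header


def carve_bmp(data: bytes) -> list:
    """Return (offset, carved_bytes) for each plausible BMP found in data."""
    carved = []
    pos = data.find(BMP_MAGIC)
    while pos != -1:
        if pos + 6 <= len(data):
            # Bytes 2..5 of the header hold the file size, little-endian.
            (size,) = struct.unpack_from("<I", data, pos + 2)
            # Reject sizes smaller than the header or larger than what is left.
            if BMP_FILE_HEADER_LEN <= size <= len(data) - pos:
                carved.append((pos, data[pos:pos + size]))
        pos = data.find(BMP_MAGIC, pos + 1)
    return carved


if __name__ == "__main__":
    bmp = b"BM" + struct.pack("<I", 30) + bytes(24)  # 30-byte stand-in file
    image = b"\xff" * 10 + bmp + b"\xff" * 10
    for offset, blob in carve_bmp(image):
        print(offset, len(blob))
```

A two-byte signature such as "BM" occurs by chance quite often in raw data, so a length check like this one only reduces false positives. It does not eliminate them.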
File Structure Based Carving
• Once a sector’s file type is identified
– Match to other sectors that contain similar data structures.
– Use knowledge of the file type’s data structures to search for structure parts expected to exist in later sectors.
Semantic Carving
• Identify the language used in a sector.
• Identify the language used in each of the following sectors
• Collect the sectors that are written in the same language
Carving with Validation
• Use a file interpreter or viewer to load each recovered file.
– If the interpreter encounters invalid data, assume that is the point where the carving method failed.
• Use on completed files.
• Use on each added sector.
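As a stand-in for a full file interpreter, the sketch below validates a carved PNG by walking its chunks and checking each chunk's CRC-32, which the PNG format stores after every chunk. The function name is hypothetical. It checks only the container structure and does not decode the image data, so it is weaker than loading the file in a real viewer.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def validated_png_length(data: bytes):
    """Return the length of the valid PNG at the start of data, or None.

    Each chunk is: 4-byte big-endian length, 4-byte type, payload, and a
    4-byte CRC-32 computed over the type and payload.
    """
    if not data.startswith(PNG_SIGNATURE):
        return None
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        chunk_end = pos + 12 + length
        if chunk_end > len(data):
            return None  # chunk runs past the available data
        chunk_type = data[pos + 4:pos + 8]
        type_and_payload = data[pos + 4:pos + 8 + length]
        (stored_crc,) = struct.unpack_from(">I", data, pos + 8 + length)
        if zlib.crc32(type_and_payload) & 0xFFFFFFFF != stored_crc:
            return None  # invalid data: the carve went wrong at this chunk
        pos = chunk_end
        if chunk_type == b"IEND":
            return pos  # reached the final chunk with every CRC intact
    return None
```

Called on a completed carve, a `None` result flags a false positive. Called after each added sector, the first failure marks roughly where the carving method left the file, which corresponds to the two usage modes listed above.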
Fragment Recovery Carving
• Find two or more fragments that belong to the same file.
• Filter out the sectors between the fragments that don’t belong.
Repackaging Carving
• Used on partially recovered files.
• Rebuild the parts of the file that could not be recovered.
• The result should be a file that can be opened with its native application or a standard viewer.
SmartCarving
• Use knowledge of the file system’s typical fragmentation effects.
• Preprocess the source sectors.
– Decompress, decrypt or translate the data
• Collate the identified blocks.
– Sort by file type
• Reassemble the blocks in sequences that match their file type.
Hash Carving
• Calculate a hash value for each sector
– MD5, SHA-1
• Compare the hash value to a list of known sector hash values
– This list can be of known Good and/or known Bad files.
– Filter out known Good files. (ex: Installed applications)
– Recover known Bad files. (ex: known illicit material)
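The sector-hashing steps above can be sketched with Python's standard `hashlib`. The function names are hypothetical, SHA-1 is picked from the two algorithms the slide lists, and the known-hash list is built in memory here rather than loaded from a reference hash set.

```python
import hashlib


def sector_hashes(data: bytes, sector_size: int = 512) -> list:
    """Return the SHA-1 hex digest of each sector-sized slice of data."""
    return [
        hashlib.sha1(data[offset:offset + sector_size]).hexdigest()
        for offset in range(0, len(data), sector_size)
    ]


def match_known_sectors(image: bytes, known_hashes: set,
                        sector_size: int = 512) -> list:
    """Return the offsets of sectors in image whose hash is in known_hashes."""
    matches = []
    for index, digest in enumerate(sector_hashes(image, sector_size)):
        if digest in known_hashes:
            matches.append(index * sector_size)
    return matches


if __name__ == "__main__":
    known_file = b"A" * 512 + b"B" * 512
    known = set(sector_hashes(known_file))
    image = bytes(512) + b"B" * 512 + b"C" * 512 + b"A" * 512
    print(match_known_sectors(image, known))
```

The same lookup serves both purposes on the slide: matches against a known Good list are dropped from further carving, and matches against a known Bad list are kept as evidence. Sectors that are common to many files, such as all-zero sectors, would match everywhere and need to be excluded from the known list.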
Fuzzy Hash Carving
• Calculate a fuzzy hash value for each sector.
• Compare the fuzzy hash values of sectors to determine which sectors are similar in content.
• Combine similar sectors into recovered files.
• Match raw data sectors together for object types that have no identifiable signatures or that extend beyond a single sector.
• Recover file types not previously encountered.
Tools Today (1)
• Adroit Photo Recovery/Forensics
– combination of SmartCarving, header carving, structure based validation and validation of the entire file to determine if each new sector belongs; Repackaging Carving is also available; http://www.forensicswiki.org/wiki/File_Carving:SmartCarving
– Supports JPEG, RAW camera images, PNG, BMP and GIF files
• DataLifter
– header-footer carving; Supports 25 file types
• EnCase
– header-footer carving; Supports ~250 file types
• Foremost
– file structure based carving for avi, bmp, doc, gif, html, jpg, mov, pdf, png, rar, wav and zip files.
– header-footer carving for art, asf, chm, cookie, cpp, dat, dbx, fws, idx, java, lnk, mail, mbx, mp3, mpg, ost, pgd, pgp, ppt, pst, ra, rdp, rpm, tif, txt, wma, wmv, wpc and xls files.
• Forensic Toolkit (FTK)
– internal techniques unknown; Supports abl, aol, asd, bmp, doc, dot, emf, gif, html, jpg, mpp, one, pdf, png, ppt, pub, puz, vsd, vss, vst, xla, xls and xlt files.
http://www.forensicswiki.org/w/images/b/b9/Kloet_2007.pdf
Tools Today (2)
• HstEx / NetAnalysis
– internal techniques unknown; Supports browser history formats
• NFI Defraser
– Fragment recovery carving & carving with validation; Supports MPEG, 3GPP, QuickTime & AVI files
• PhotoRec
– combination of file structure based carving and header-footer carving of 80 file formats
• PyFlag
– appears to use a simple text search method, ignoring sector boundaries; Supports server log file formats
• Recover My Files
– internal techniques unknown; Supports 200 file types
• Revit
– SmartCarving; Supported file types list not available
http://www.forensicswiki.org/w/images/b/b9/Kloet_2007.pdf
Tools Today (3)
• Scalpel
– combination of header-footer and header-maximum file size carving; Supports art, avi, dat, dbx, doc, fws, gif, htm, idx, java, jpg, mail, max, mbx, mov, mpg, ost, pdf, pgd, pgp, pins, png, pst, ra, rpm, tif, txt, wav, wpc and zip files.
• X-Ways
– header-footer carving; unknown support list
http://www.forensicswiki.org/wiki/Tools:Data_Recovery#Carving
Tool Problems
• Few tools handle file fragmentation
• The tools that handle fragmentation support very few file types
• Most tools cannot detect false positives
• Most tools hard code file type support
• Only 1 tool claims to rebuild partial files
– It only supports 5 file types (image files)
• Performance is a problem
– most tools utilize inefficient databases and scripting languages
Future Tools
• Carver 2.0
– Open Source, in the early specification stages
• File Harvester
– Combination of multiple methods:
  • Block Based Carving
  • Statistical Carving
  • Header/Footer Carving
  • Header/Embedded Length Carving
  • File Structure Based Carving
  • Fragment Recovery Carving
  • Repackaging Carving (Phase 3)
  • SmartCarving
  • Fuzzy Hash Carving
  • (secret sauce)