Hashing of File Blocks: When Exact Matches Are Not Useful

Douglas White

Post on 23-Apr-2022



Disclaimer

Trade names and company products are mentioned in the text or identified. In no case does such identification imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products are necessarily the best available for the purpose.

Statement of Disclosure

This research was funded by the National Institute of Standards and Technology Office of Law Enforcement Standards, the Department of Justice National Institute of Justice, the Federal Bureau of Investigation and the National Archives and Records Administration.

Identification of Issues

• A meaningless change of file contents drastically changes a hash value

• Amount of data input to investigation is immense

• Common hash algorithms cannot identify suspect files similar to known files

• Commonly used hash algorithms do not yield useful data on partial or deleted files
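The first issue can be illustrated with a minimal sketch using Python's standard hashlib module. The sample bytes and the choice of SHA-1 are assumptions made for illustration; the slides do not name a specific algorithm or input.

```python
import hashlib

original = b"The quick brown fox jumps over the lazy dog"
# Hypothetical "meaningless" change: one trailing period is appended.
modified = original + b"."

h1 = hashlib.sha1(original).hexdigest()
h2 = hashlib.sha1(modified).hexdigest()

# Count the hex digits that differ, position by position.
differing = sum(a != b for a, b in zip(h1, h2))

print(h1)
print(h2)
print(f"{differing} of {len(h1)} hex digits differ")
```

Because the two digests share no usable structure, a whole-file hash comparison can only answer "identical or not"; it gives no measure of how similar the two inputs are.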

National Software Reference Library & Reference Data Set

The NSRL is conceptually three objects:

• A physical collection of software

• A database of meta-information

• A subset of the database, the Reference Data Set

The NSRL is designed to collect software from various sources and compute hashes of known applications. For the purpose of block hashes, we assume applications are benign.

Perturbing File Hashes

Use of cryptographic hashes to automatically identify files is absolute, too precise.

When dealing with morphing digital objects, such sorting leaves many files to be dealt with by manual review.

The NSRL hashset is commonly used to automatically remove benign known items from human processing, which is fail-safe.
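A minimal sketch of this kind of known-file filtering follows. The function name `needs_review`, the in-memory `files` dictionary, and the use of SHA-1 are all assumptions for illustration; `known_hashes` merely stands in for a hash set such as the RDS, whose real distribution format is not modeled here. The filter is fail-safe in the sense that anything without an exact match stays in the review queue.

```python
import hashlib


def needs_review(files: dict[str, bytes], known_hashes: set[str]) -> list[str]:
    """Return names of files whose whole-file SHA-1 is NOT in the known set."""
    return [
        name
        for name, content in files.items()
        if hashlib.sha1(content).hexdigest() not in known_hashes
    ]


# Hypothetical data: one known benign file, one slightly altered copy,
# and one unrelated file.
benign = b"benign application bytes"
known = {hashlib.sha1(benign).hexdigest()}
files = {
    "app.exe": benign,
    "app_patched.exe": benign + b"!",
    "notes.txt": b"user data",
}

print(needs_review(files, known))  # ['app_patched.exe', 'notes.txt']
```

Note that the slightly altered copy is sent to manual review even though it differs from the known file by a single byte, which is the "too precise" behavior described above.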

Reducing Data Inflow

NSRL file content hash values allow investigators to automatically remove benign known items from view.

Known benign data can be identified before it arrives to investigators.

Is it technically possible to meaningfully reduce the amount of incoming data?

Block Hashes of Files

NSRL is investigating the usefulness of introducing the rigor of cryptographic digital file identification at a granular level which supports statistical identification of objects.

Block hashing applies the cryptographic algorithms to smaller-than-file-size portions of the suspect data.
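As a rough sketch of the idea, the generator below hashes a byte stream one fixed-size block at a time. The function name `block_hashes`, the use of SHA-1, and the handling of a short final block (hashed as-is, without padding) are assumptions; the 4096-byte default matches the block size NSRL reports investigating.

```python
import hashlib
import io
from typing import BinaryIO, Iterator

BLOCK_SIZE = 4096  # bytes


def block_hashes(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[str]:
    """Yield one SHA-1 hex digest per fixed-size block read from the stream."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # Assumption: a short final block is hashed as-is, with no padding.
        yield hashlib.sha1(block).hexdigest()


# Hypothetical input: two full blocks and a 100-byte tail.
data = b"A" * 4096 + b"B" * 4096 + b"C" * 100
for index, digest in enumerate(block_hashes(io.BytesIO(data))):
    print(index, digest)
```

The same function can be applied to a file opened with `open(path, "rb")`. A change confined to one block then alters only that block's digest, leaving the others available for matching.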

File Selection

NSRL investigated 4096-byte block hashes of Windows 2000 and Windows XP operating system files in our collection.

NSRL also collected installed file block hashes from physical and virtual machines.

Block Selection

NSRL investigated 4096-byte block hash values.

4096 bytes was the smallest window considered, based on tools, storage and statistical applicability.

.2% of collection < 16KB
3% of collection < 32KB
27% of collection < 128KB

Block Hashing Benefits

• File-based data reduction leaves an average of 30% of disk space for human investigation

• Incorporating block hashes reduces human review to 15% of disk space

• Assist in recognizing wiped media

• Assist in profiling media use

Physical and Virtual Machines

• P-XP vs. P-XP = 83% (8,679 files)

• P-XP vs. V-XP = 85%

• V-XP vs. V-XP = 91%

• P-W2K vs. P-W2K = 85% (7,688 files)

• P-W2K vs. V-W2K = 89%

• V-W2K vs. V-W2K = 94%
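The slides do not state how these percentages were computed. One plausible reading, shown here purely as an assumption, is the fraction of block hashes from one installation that also occur in another. The function name `shared_fraction` and the placeholder hash strings are hypothetical.

```python
def shared_fraction(a: list[str], b: list[str]) -> float:
    """Fraction of block hashes in `a` that also occur somewhere in `b`."""
    if not a:
        return 0.0
    b_set = set(b)  # set lookup keeps the comparison linear in len(a)
    return sum(h in b_set for h in a) / len(a)


# Placeholder digests standing in for real block hashes of two installs.
install_1 = ["h1", "h2", "h3", "h4"]
install_2 = ["h1", "h2", "h3", "x9"]

print(f"{shared_fraction(install_1, install_2):.0%}")  # 75%
```

Under this definition the measure is not symmetric when the two installations contain different numbers of blocks, so the direction of comparison would need to be stated alongside any reported figure.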

Known - Unknown - Zero

2nd 512 MB in W2K NTFS VM
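The three categories in this slide's title suggest a per-block classification of a disk region. The sketch below is one way that could look, not a description of the tool NSRL used: the function name `classify_blocks`, the in-memory `image` bytes, SHA-1, and the rule that an all-zero block is counted as "zero" before any hash lookup are all assumptions.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 4096  # bytes


def classify_blocks(image: bytes, known: set[str],
                    block_size: int = BLOCK_SIZE) -> Counter:
    """Count blocks of `image` that are all-zero, known, or unknown."""
    counts: Counter = Counter()
    zero_block = bytes(block_size)
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        if block == zero_block[:len(block)]:
            counts["zero"] += 1
        elif hashlib.sha1(block).hexdigest() in known:
            counts["known"] += 1
        else:
            counts["unknown"] += 1
    return counts


# Hypothetical 4-block image: one known block, two zero blocks, one unknown.
known_block = b"K" * BLOCK_SIZE
image = known_block + bytes(BLOCK_SIZE) * 2 + b"U" * BLOCK_SIZE
counts = classify_blocks(image, {hashlib.sha1(known_block).hexdigest()})

total = sum(counts.values())
print("known:", counts["known"], "zero:", counts["zero"],
      "unknown:", counts["unknown"])
print(f"fraction left for review: {counts['unknown'] / total:.0%}")
```

Only the "unknown" blocks would remain for human review in this model, and a region that is almost entirely "zero" is one simple indicator of unused or wiped space.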

Next Steps

Investigate a wider variety of applications
Automation & virtualization of installation
Comparison with “fuzzy” hashes
Storage in Bloom filter
Prototype disk block imager
“Smart unpacking” of remaining data
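To make the Bloom filter item concrete, here is a small, generic Bloom filter sketch holding block digests. It is not NSRL's design: the class name, the bit-array size, the number of probes, and the use of salted SHA-256 to derive bit positions are all assumptions chosen for brevity.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Compact set membership with possible false positives, no false negatives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive each bit position by salting SHA-256 with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Store the SHA-1 digests of two hypothetical known blocks.
known_blocks = [b"A" * 4096, b"B" * 4096]
bloom = BloomFilter()
for block in known_blocks:
    bloom.add(hashlib.sha1(block).digest())

print(hashlib.sha1(b"A" * 4096).digest() in bloom)  # True
```

The trade-off is that a lookup may occasionally report an unknown block as known (a false positive), at a rate governed by the bit-array size, the number of probes, and the number of stored hashes, while a stored hash is never reported as missing.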

Contacts

Douglas White
[email protected]

Barbara Guttman
Software Diagnostics & Conformance Testing [email protected]

Sue Ballou, Office of Law Enforcement Standards
Rep. for State/Local Law [email protected]