
Reasoning for Complex Data (RECOD) Lab. Institute of Computing,

University of Campinas (Unicamp)

Av. Albert Einstein, 1251 - Cidade Universitária CEP 13083-970 • Campinas/SP - Brasil

Digital Forensics MO447 / MC919

* Painting by Rajib Roy, Case Investigation - 2012

Prof. Dr. Anderson Rocha

Microsoft Research Faculty Fellow Affiliate Member, Brazilian Academy of Sciences

Reasoning for Complex Data (Recod) Lab. [email protected] http://www.ic.unicamp.br/~rocha


File Carving & Smart File Carving


A. Rocha, 2014 – Digital Forensics (MO447/MC919) 3

Based on “The evolution of file carving – the benefits and problems of forensics recovery.” Anandabrata Pal and Nasir Memon. IEEE Signal Processing Magazine, 26(2):59–71, March 2009.


Organization

‣ Intro and Terminology

‣ Traditional File Recovery

‣ File Carving

‣ Smart Carving

‣ Conclusions

‣ References


Introduction and Terminology


What is File Carving?

‣ Denotes the extraction and recovery of files based on their structure and content, without relying on file-system metadata


Why File Carving?

Massive amounts of data are subject to:

‣ File system corruption

‣ Device formatting

‣ Unknown proprietary formats

‣ Files removed or deleted, intentionally or unintentionally


Storage (1)

‣ Hard disks and SSDs are divided into clusters

‣ A cluster is formed by one or more sectors and is the atomic unit of data storage

‣ Cluster sizes range from 512 bytes to 32 KB


Storage (2)

‣ The file systems

• manage the files

• allocate blocks (clusters)

‣ Traditional allocation may or may not be sequential

• Non-sequential allocation => fragmentation


Example: Storing a file

File

File data from the point of view of an application

↓

A1 A2 A3 A4 A5 A6 A7 A8 A9

File data on disk, split into blocks
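The following minimal sketch shows how a file's bytes map onto fixed-size clusters, as in the A1…A9 picture. It assumes a hypothetical 4 KB cluster size. The last cluster is zero-padded here only to keep the example simple. On a real disk, the unused tail of that cluster (the slack space) may still hold old data.

```python
from __future__ import annotations


def split_into_clusters(data: bytes, cluster_size: int = 4096) -> list[bytes]:
    """Split a file's bytes into fixed-size clusters (A1, A2, ... on the slide)."""
    clusters = []
    for offset in range(0, len(data), cluster_size):
        chunk = data[offset:offset + cluster_size]
        # Pad the last cluster with zeros so every cluster has the same size.
        clusters.append(chunk.ljust(cluster_size, b"\x00"))
    return clusters


def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes allocated to the file's last cluster but not used by the file."""
    n_clusters = -(-file_size // cluster_size)  # ceiling division
    return n_clusters * cluster_size - file_size


if __name__ == "__main__":
    clusters = split_into_clusters(b"x" * 10_000)
    print(len(clusters), slack_bytes(10_000))  # 3 2288
```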


Fragmentation

‣ The fragmentation level depends on:

• File system

• File size

• Cluster size

‣ Once again: non-sequential allocation => fragmentation

‣ Fragments may appear in any order


In this example, each cell represents a block (cluster). There are three files, A, B and C, each with three clusters. Clusters 1, 2 and 3 are, respectively, the beginning, middle and end of each file.

Fragmentation example

A1 A2 B1 B2 B3 C2 C1 C3 A3


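Terminology

A1 A2 B1 B2 B3 C2 C1 C3 A3

‣ Taking A as an example:

• A1 and A2 form the base fragment (the starting fragment of a file, whose first cluster is the header)

• A2 is the fragmentation point (the last cluster of the file before fragmentation occurs)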
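A small sketch can derive the fragments, the base fragment and the fragmentation point from the example layout. It relies on a toy assumption: each label encodes its file (the letter) and its logical position (the digit). A real carver does not know this; recovering it is the whole problem. A fragment here is a run of clusters that are both physically contiguous and in the correct logical order.

```python
def fragments(layout, file_id):
    """Group a file's clusters into fragments: runs that are physically
    contiguous on disk and in the correct logical order."""
    # (logical index, physical position) of every cluster of this file
    parts = sorted((int(label[1:]), pos) for pos, label in enumerate(layout)
                   if label[0] == file_id)
    frags, prev_pos = [], None
    for idx, pos in parts:
        if frags and pos == prev_pos + 1:
            frags[-1].append(f"{file_id}{idx}")   # continues the current fragment
        else:
            frags.append([f"{file_id}{idx}"])     # starts a new fragment
        prev_pos = pos
    return frags


def base_fragment_and_point(layout, file_id):
    """Base fragment (starts with the header) and fragmentation point
    (its last cluster), or None if the file is not fragmented."""
    frags = fragments(layout, file_id)
    base = frags[0]
    point = base[-1] if len(frags) > 1 else None
    return base, point


layout = ["A1", "A2", "B1", "B2", "B3", "C2", "C1", "C3", "A3"]
for f in "ABC":
    print(f, fragments(layout, f), base_fragment_and_point(layout, f))
```

File C ends up with three one-cluster fragments because its clusters are stored out of order. File B has no fragmentation point.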


Traditional data recovery


Traditional recovery

‣ Relies on the structures present in the file system, for instance, file allocation tables

• When a file is deleted, file systems normally only mark its entry as removed; the data stay on disk until they are overwritten

‣ Allows fast recovery of files while their entries are still present in these structures

‣ Avoids searching the unallocated areas of the disk


Traditional recovery


* FAT32 has a 4 GB maximum file size

* NTFS overcomes this limitation

* NTFS uses B-trees to store information about files (not the actual content of the files)

Inserting a file - FAT32


Traditional recovery


Deleting a file - FAT32
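The toy model below shows why traditional recovery works for contiguous files but not for fragmented ones. It is an in-memory simplification, not the on-disk FAT32 layout. On FAT, deletion marks the first byte of the directory entry's name with 0xE5 and frees the file's cluster chain. Part of the entry, such as the size and often the start cluster, survives. An undelete that assumes contiguous clusters therefore recovers file 1 but silently corrupts file 2. The FREE and END markers and the cluster contents are hypothetical.

```python
CLUSTER_SIZE = 4   # tiny clusters keep the example readable
FREE, END = 0, -1  # toy FAT markers, not the real FAT32 values


def delete(entry, fat):
    """Mimic FAT deletion: flag the directory entry and free its cluster chain."""
    entry["name"] = b"\xe5" + entry["name"][1:]   # 0xE5 marks a deleted entry
    cluster = entry["start"]
    while cluster != END:
        nxt = fat[cluster]
        fat[cluster] = FREE
        cluster = nxt


def undelete(entry, disk):
    """Traditional recovery: read the clusters contiguously from the start cluster."""
    n_clusters = -(-entry["size"] // CLUSTER_SIZE)
    data = b"".join(disk[entry["start"] + i] for i in range(n_clusters))
    return data[:entry["size"]]


# Cluster number -> contents. File 1 is contiguous (2, 3, 4); file 2 is
# fragmented (5, 7) around cluster 6, which belongs to another file.
disk = {2: b"HELL", 3: b"O, W", 4: b"ORLD", 5: b"FRAG", 6: b"xxxx", 7: b"MENT"}
fat = {2: 3, 3: 4, 4: END, 5: 7, 7: END, 6: END}
file1 = {"name": b"HELLO   TXT", "start": 2, "size": 12}
file2 = {"name": b"FRAGS   TXT", "start": 5, "size": 8}

delete(file1, fat)
delete(file2, fat)
print(undelete(file1, disk))  # b'HELLO, WORLD'
print(undelete(file2, disk))  # b'FRAGxxxx' (wrong: cluster 6 is not part of the file)
```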


Some discoveries


‣ An analysis of ~350 hard disks (FAT, NTFS, UFS) found that:

‣ Fragmentation is low overall, but it does occur

‣ It is high for user files (MS Office, e-mail, JPEG). Fraction of fragmented files per type:

‣ JPEG = 16%

‣ MS Word = 17%

‣ AVI = 22%

‣ MS Outlook PST = 58%


Some discoveries


‣ The Amiga Smart File System moves an entire file whenever it is edited

‣ The Unix File System (UFS) anticipates file growth by leaving some free clusters available to each file

‣ XFS and ZFS delay writing until the OS issues a flush


Some discoveries


‣ Due to wear-leveling techniques, SSDs tend to increase fragmentation regardless of the file system used

‣ If the SSD controller is compromised, only file carving approaches can be used, not traditional recovery techniques

‣ In some cases, the file system itself forces fragmentation (UFS does so for large files or when a file has an odd number of clusters).


File Carving


File Carving: General Rules

‣ Does not rely directly on the information present in file-system structures

‣ Normally identifies known files by means of hashes (e.g., MD5) and keywords
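As a rough illustration of hash- and keyword-based identification, the sketch below uses Python's `hashlib` and `bytes.find`. `KNOWN_MD5` is a hypothetical stand-in for a reference database of known-file hashes. The keyword search simply reports every offset of a byte string in a raw image.

```python
import hashlib

# Hypothetical set of known-file hashes (in practice, a reference hash database).
KNOWN_MD5 = {hashlib.md5(b"known file contents").hexdigest(): "known.txt"}


def identify_by_hash(data: bytes):
    """Return the name of a known file whose MD5 matches `data`, else None."""
    return KNOWN_MD5.get(hashlib.md5(data).hexdigest())


def keyword_offsets(image: bytes, keyword: bytes) -> list:
    """All byte offsets where `keyword` occurs in a raw disk image."""
    offsets, pos = [], image.find(keyword)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(keyword, pos + 1)
    return offsets


if __name__ == "__main__":
    print(identify_by_hash(b"known file contents"))
    print(keyword_offsets(b"xxsecretyysecret", b"secret"))
```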


File Carving based on Structure


File Carving: Recovery based on Structure

‣ Searches for files based on “magic numbers” (i.e., byte sequences at known positions)

• Header and footer (e.g., JPEG), or

• Header and file size (e.g., BMP)

‣ More advanced techniques also use the file content

‣ A carved file consists of all clusters between a header and a footer
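Naive header/footer carving can be sketched as below for JPEG. It assumes the common JPEG start bytes `FF D8 FF` and the end-of-image marker `FF D9`, and it takes everything between a header and the next footer. Because of this rule, an embedded thumbnail's `FF D9` or a fragmented file yields a wrong result, which is exactly the weakness discussed next.

```python
from __future__ import annotations

JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(image: bytes) -> list[bytes]:
    """Naive carving: bytes from each JPEG header to the next JPEG footer."""
    carved = []
    start = image.find(JPEG_HEADER)
    while start != -1:
        end = image.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without footer: nothing more to carve
        carved.append(image[start:end + len(JPEG_FOOTER)])
        start = image.find(JPEG_HEADER, end + len(JPEG_FOOTER))
    return carved


if __name__ == "__main__":
    raw = b"junk" + b"\xff\xd8\xff\xe0DATA\xff\xd9" + b"more" + b"\xff\xd8\xffXY\xff\xd9"
    print(carve_jpegs(raw))
```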


File Carving: Recovery based on Structure

A1 A2 B1 B2 B3 C2 C1 C3 A3

File A: A1+A2+B1+B2+B3+C2+C1+C3+A3

A1 A2 B1 B2 B3 C2 C1 C3 A3

File B: B1+B2+B3

A1 A2 B1 B2 B3 C2 C1 C3 A3

File C: C1+C3
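The sketch below reproduces the three results of this example. It uses the same toy assumption as the slide: labels tell which file a header or footer belongs to. A real carver only knows the file type, so all three results would be at least as bad.

```python
layout = ["A1", "A2", "B1", "B2", "B3", "C2", "C1", "C3", "A3"]


def naive_carve(layout):
    """Take every cluster from a header (X1) up to the first footer (X3) of
    the same file that follows it. Toy assumption: labels reveal the file."""
    files = {}
    for start, label in enumerate(layout):
        if label.endswith("1"):                   # header cluster
            footer = label[0] + "3"
            end = layout.index(footer, start)     # first matching footer after it
            files[label[0]] = layout[start:end + 1]
    return files


for name, clusters in naive_carve(layout).items():
    print(f"File {name}: " + "+".join(clusters))
```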


File Carving based on Graph Theory


File Carving based on Graph Theory

‣ Approaches the recovery problem through the structure of files

‣ Blocks (clusters) are the vertices

‣ Edge weights represent the similarity between blocks

‣ How to define the similarity?


Graphs: Hamiltonian Path

‣ Technique presented by Shanmugasundaram et al.

‣ Computes the permutation of a set of n blocks belonging to a file A that reproduces the original order of A

• The weight between two blocks represents the probability that they are adjacent

• The correct permutation is likely the one that maximizes the sum of weights

‣ The set of all weights forms the adjacency matrix of a complete graph on n vertices

• The correct sequence is a Hamiltonian path in this graph (the one with maximum total weight)
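A brute-force sketch makes the formulation concrete. `weights[i][j]` is assumed to be the likelihood that block j directly follows block i. The search tries every ordering, which is a Hamiltonian path in the complete graph, and keeps the one with the largest total weight. This costs O(n!) and is only usable for a handful of blocks. The problem is hard in general, so practical methods rely on heuristics.

```python
from itertools import permutations


def best_order(weights):
    """Exhaustively find the block ordering (Hamiltonian path) that maximizes
    the sum of adjacency weights. weights[i][j]: likelihood that j follows i."""
    n = len(weights)
    best, best_score = None, float("-inf")
    for order in permutations(range(n)):
        score = sum(weights[a][b] for a, b in zip(order, order[1:]))
        if score > best_score:
            best, best_score = order, score
    return list(best), best_score


if __name__ == "__main__":
    # Hypothetical weights: the true order is 2 -> 0 -> 3 -> 1.
    w = [[0.1] * 4 for _ in range(4)]
    w[2][0], w[0][3], w[3][1] = 0.9, 0.8, 0.7
    print(best_order(w))
```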


Graphs: Hamiltonian Path

‣ The question is: how to determine the weight between two blocks (clusters)?

• Prediction by partial matching (PPM) for texts (Shanmugasundaram et al.)

• Border comparison for images (Pal et al.)
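One way to approximate a compression-based weight for text is sketched below. It swaps PPM for zlib's DEFLATE as a stand-in, so it is not the PPM model of the cited work. The cost of placing block b after block a is the number of extra compressed bytes needed to encode b given a. A low cost suggests that b plausibly continues a, and a weight can be taken as the negated cost.

```python
import zlib


def compression_cost(a: bytes, b: bytes) -> int:
    """Extra compressed bytes needed to encode b right after a (DEFLATE used
    as a stand-in for PPM). Lower cost: b is a more plausible successor of a."""
    return len(zlib.compress(a + b, 9)) - len(zlib.compress(a, 9))


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog. " * 20
    print(compression_cost(a, b"the quick brown fox jumps over the lazy dog. " * 4))
    print(compression_cost(a, bytes(range(256))))
```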


Prediction by partial matching (PPM)



Border of blocks for images
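For images, the sketch below uses a simple sum of absolute differences across the boundary. It assumes each block already holds whole rows of decoded grayscale pixels; real JPEG fragments must be decoded first. It is a simple variant for illustration, not necessarily the exact measure used by Pal et al. The cost compares the last row of the upper block with the first row of the candidate lower block, and a small value suggests a smooth continuation.

```python
def border_cost(upper, lower):
    """Sum of absolute differences between the last pixel row of `upper` and
    the first pixel row of `lower` (grayscale, row-major blocks)."""
    last_row, first_row = upper[-1], lower[0]
    return sum(abs(p - q) for p, q in zip(last_row, first_row))


if __name__ == "__main__":
    top = [[10, 10, 10], [20, 20, 20]]
    continues = [[21, 21, 21], [30, 30, 30]]
    unrelated = [[200, 0, 200], [0, 0, 0]]
    print(border_cost(top, continues), border_cost(top, unrelated))
```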



Graphs: Hamiltonian Path


‣ Problems?

‣ It reassembles one file at a time, ignoring that, in real systems, many files can be fragmented at the same time

‣ Statistics of multiple files could be helpful


Graphs: Hamiltonian Path


Graph: k-Vertex Disjoint Path

‣ Refinement of the Hamiltonian Path method by Pal et al.

• In real cases, many files are fragmented simultaneously

• This technique uses the statistics of such files

‣ Each vertex represents a block


Graph: k-Vertex Disjoint Path

‣ We start with k files identified by their headers

• The solution consists of k vertex-disjoint paths, one per file, since (usually) each block belongs to exactly one file

‣ It is an NP-hard problem

‣ Many algorithms were proposed for this case, but the most notable are the Unique Path (UP) algorithms


Unique Path (UP) Algorithms

‣ Realistic: each cluster usually belongs to exactly one file

‣ The problem: errors cascade

• A cluster assigned to the wrong file leads to the wrong reconstruction of two files: the one that wrongly received it and the one that should have received it


File Carving: PUP

PUP: Parallel Unique Path

1. Start with a set S of k headers (s1, s2, ..., sk), one for each of the k files.

2. Find the set T of k clusters, where ti is the best match for si. Select the ti with the highest match among all.

i. Add ti to the path of the ith file

ii. Replace the current cluster of the ith file in S (si = ti)

iii. Find a new set T of best matches

iv. Select the element with the best match

v. Repeat from (i) until all files are complete
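The PUP steps can be sketched in Python under toy assumptions. `score(a, b)` is a hypothetical match function (higher means b is more likely to follow a). `footers` is a hypothetical oracle that says when a file is complete. Real carvers derive both from the file content. Each iteration recomputes T for all unfinished files and extends only the file with the strongest match. Each cluster is used at most once.

```python
def pup(headers, clusters, score, footers):
    """Parallel Unique Path sketch. Returns {header: reconstructed path}."""
    paths = {h: [h] for h in headers}
    available = set(clusters) - set(headers)
    active = list(headers)
    while active and available:
        # T: the best next cluster for every unfinished file
        best = {h: max(sorted(available), key=lambda c: score(paths[h][-1], c))
                for h in active}
        # Extend only the file whose best match is strongest overall
        winner = max(active, key=lambda h: score(paths[h][-1], best[h]))
        chosen = best[winner]
        paths[winner].append(chosen)
        available.remove(chosen)  # unique path: a cluster joins one file only
        if chosen in footers:
            active.remove(winner)
    return paths


if __name__ == "__main__":
    layout = ["A1", "A2", "B1", "B2", "B3", "C2", "C1", "C3", "A3"]

    def toy_score(a, b):  # hypothetical oracle: same file, next index
        return 1.0 if a[0] == b[0] and int(b[1:]) == int(a[1:]) + 1 else 0.0

    print(pup(["A1", "B1", "C1"], layout, toy_score, {"A3", "B3", "C3"}))
```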


Example: PUP


File Carving: SPF

SPF: Shortest Path First

1. Shortest path first (SPF) is an algorithm that assumes that the best recoveries have the lowest average path costs.

2. This algorithm reconstructs each image one at a time.

3. However, after an image is reconstructed, the clusters assigned to it are not removed; only its average path cost is calculated.

4. All the clusters in the reconstruction of the image remain available for reconstructing the remaining images.


File Carving: SPF

SPF: Shortest Path First

1. This process is repeated until all the image average path costs are calculated.

2. Then the image with the lowest path cost is assumed to be the best recovery and the clusters assigned to its reconstruction are removed.

3. Each remaining image that used any of the removed clusters must redo its reassembly with the remaining clusters, and its new average path cost is calculated.

4. Once this process is completed for the remaining images, the one with the lowest average path cost is again removed, and this process continues until all images are recovered.


File Carving: SPF

SPF: Shortest Path First

1. For each image to be reconstructed:

i. From the available set of clusters, reconstruct its path and calculate the average path cost

2. Among all paths, find the one with the lowest average cost and accept that image's reconstruction

3. Remove the blocks used in step (2) from the other paths

4. Repeat from step (1) until all images are reconstructed
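A compact sketch of these SPF steps follows, under toy assumptions. `cost(a, b)` is a hypothetical edge cost (lower is better). `footers` is a hypothetical oracle that ends a reconstruction. Each image is built greedily. For simplicity, the sketch rebuilds every pending image in each round instead of only those that used removed clusters. This gives the same result but does more work.

```python
def greedy_path(header, available, cost, footers):
    """Reconstruct one image greedily: append the cheapest available cluster
    until a footer is reached. Returns (path, average edge cost)."""
    path, total, pool = [header], 0.0, set(available)
    while path[-1] not in footers and pool:
        nxt = min(sorted(pool), key=lambda c: cost(path[-1], c))
        total += cost(path[-1], nxt)
        path.append(nxt)
        pool.remove(nxt)
    return path, total / max(len(path) - 1, 1)


def spf(headers, clusters, cost, footers):
    """Shortest Path First sketch: accept the lowest-average-cost
    reconstruction, remove its clusters, and rebuild the remaining images."""
    available = set(clusters) - set(headers)
    pending, recovered = list(headers), {}
    while pending:
        # Each pending image is built from ALL clusters still available;
        # clusters are not removed at this stage.
        candidates = {h: greedy_path(h, available, cost, footers) for h in pending}
        best = min(pending, key=lambda h: candidates[h][1])
        recovered[best] = candidates[best][0]
        available -= set(recovered[best])
        pending.remove(best)
    return recovered
```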


File Carving: PUP vs. SPF

‣ SPF reconstructs up to 88% of the files, against 83% for PUP

‣ However, its performance and scalability are lower than PUP's

‣ The edge weights are pre-computed to speed up the search, but this step has complexity O(n^2 log n)

‣ Modern disks contain millions of clusters, so pre-computing such weights is INFEASIBLE
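A back-of-the-envelope calculation shows the scale of the problem. It assumes a hypothetical 500 GB disk with 4 KB clusters, which are illustrative values not taken from the slides:

```latex
\begin{aligned}
n &= \frac{500 \times 10^{9}\ \text{bytes}}{4096\ \text{bytes/cluster}} \approx 1.22 \times 10^{8}\ \text{clusters} \\
n^{2} &\approx 1.49 \times 10^{16}\ \text{candidate cluster pairs} \\
n^{2}\log_2 n &\approx 1.49 \times 10^{16} \times 26.9 \approx 4 \times 10^{17}\ \text{operations}
\end{aligned}
```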


Bifragment Gap Carving


Bifragment Gap Carving

‣ Uses fast object validation for files with headers and footers

‣ Files must be decodable (JPEG, MPEG, ZIP, etc.)

‣ A validator checks whether a byte sequence is valid for a specific file type

‣ For instance, PNG stores a CRC (an error-detecting code) at the end of each chunk

‣ Plain text and BMP files cannot be recovered this way, since they cannot be validated
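The sketch below is a partial PNG validator of the kind such a carver could call. It checks the PNG signature and the CRC-32 of every chunk up to `IEND`. It does not decompress or check the image data, so passing it is necessary but not sufficient. The PNG CRC covers the chunk type and data and uses the same CRC-32 as `zlib.crc32`. `make_chunk` is a helper introduced only to build test data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunks_valid(data: bytes) -> bool:
    """Check the PNG signature and the CRC-32 of every chunk up to IEND.
    (Partial validator: image data inside IDAT is not decompressed.)"""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):                 # length + type + CRC = 12 bytes
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length
        if end > len(data):
            return False                         # truncated chunk
        body = data[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", data[end - 4:end])
        # The CRC covers the chunk type and data, not the length field.
        if zlib.crc32(chunk_type + body) & 0xFFFFFFFF != stored_crc:
            return False
        if chunk_type == b"IEND":
            return True
        pos = end
    return False                                 # no IEND chunk found


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Build a chunk with a correct CRC (used here only to create test data)."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)
```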


Bifragment Gap Carving

‣ Bifragment Gap Carving (BGC) recovers files split into two fragments by exhaustively searching the gap between them, validating each candidate reconstruction


Bifragment Gap Carving

‣ Let bh be the header cluster, bf the fragmentation point (last cluster of the first fragment), bs the first cluster of the second fragment (the one containing the footer), and bz the footer cluster

‣ For each gap size g, starting at 1, all combinations of bf and bs that are exactly g clusters apart (s - f = g) are tried; each candidate (bh…bf followed by bs…bz) is passed to the validator

‣ Disadvantages:

‣ This technique does not scale to large gaps

‣ It only works for files with two fragments

‣ It only works for files that can be validated

‣ Passing validation does not guarantee that the file is correct or coherent
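The BGC search loop can be sketched as below. Clusters are passed as a list of byte strings, with known header and footer indices. `validate` is any hypothetical validator, such as a PNG or JPEG checker. With s - f = 1 there is no gap, so that case collapses to a single contiguous candidate. The nested loops make clear why the method does not scale to large gaps.

```python
def bgc(clusters, h, z, validate):
    """Bifragment gap carving sketch. clusters: list of cluster bytes;
    h: index of the header cluster; z: index of the footer cluster.
    Returns (f, s) for the first candidate the validator accepts, else None."""
    for g in range(1, z - h + 1):            # gap size, s - f = g
        for f in range(h, z - g + 1):
            s = f + g
            candidate = b"".join(clusters[h:f + 1] + clusters[s:z + 1])
            if validate(candidate):
                return f, s
            if g == 1:
                break  # s - f = 1 means no gap: every f yields the same bytes
    return None


if __name__ == "__main__":
    # Hypothetical disk: the file is ABCD|EFGH with two foreign clusters between.
    clusters = [b"AB", b"CD", b"xx", b"yy", b"EF", b"GH"]
    print(bgc(clusters, 0, 5, lambda d: d == b"ABCDEFGH"))
```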


Bifragment Gap Carving


Smart File Carving


Smart Carving

Proposed by Pal et al.

‣ Aims to solve the scalability problems of the previous techniques

‣ Takes into account how files typically fragment on real disks

‣ Steps:

• Pre-processing

• Collation (comparison/classification)

• Reassembly/Reconstruction


Smart Carving



Smart Carving: Pre-processing

‣ Decompresses or decrypts data that are compressed or encrypted

‣ Optionally, removes the allocated clusters (using, for instance, information from the file allocation table), reducing the amount of data to be carved


Smart Carving: Collating

‣ Classifies the clusters by file type, using:

• Keywords (e.g., <HTML>, <IMG>)

• ASCII char frequency

• Entropy

• “File prints” (e.g., histogram of bytes in files)
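A few of these features can be computed per cluster as shown below. The feature set and the default keywords (taken from the slide's examples) are illustrative, not a specific published classifier. Entropy near 8 bits per byte suggests compressed or encrypted data such as JPEG or ZIP. A high ASCII ratio or HTML keywords suggest text.

```python
from __future__ import annotations

import math
from collections import Counter


def byte_histogram(cluster: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values (a simple "file print")."""
    counts = Counter(cluster)
    return [counts[b] / len(cluster) for b in range(256)]


def entropy(cluster: bytes) -> float:
    """Shannon entropy in bits per byte: near 0 for constant data,
    near 8 for compressed or encrypted data."""
    return -sum(p * math.log2(p) for p in byte_histogram(cluster) if p > 0)


def ascii_ratio(cluster: bytes) -> float:
    """Fraction of printable ASCII bytes (plus tab, LF and CR)."""
    printable = sum(1 for b in cluster if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(cluster)


def has_keyword(cluster: bytes, keywords=(b"<html>", b"<img")) -> bool:
    """Case-insensitive keyword check, e.g. for HTML clusters."""
    lowered = cluster.lower()
    return any(k in lowered for k in keywords)
```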


Smart Carving: “File print”

‣ McDaniel and Heydari proposed 3 algorithms:

• Byte frequency distribution (BFD): average of the byte histograms of many example files of each type, together with byte-frequency correlation

• Byte frequency cross-correlation (BFC): correlation between the frequencies of pairs of bytes

• Analysis of headers and footers

‣ Accuracy: 30% (BFD) and 45% (BFC), rising to 95% when headers and footers are considered

‣ Does not work for classifying individual blocks, which usually lack headers and footers

‣ Back to the drawing board!
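The BFD fingerprint idea can be illustrated as follows: average the byte histograms of example files of each type, then assign a new sample to the closest fingerprint. Using L1 distance for "closest" is an assumption made here for simplicity. McDaniel and Heydari's actual scoring, which includes byte-frequency correlation, is more elaborate.

```python
from __future__ import annotations


def bfd(data: bytes) -> list[float]:
    """Byte frequency distribution: normalized 256-bin histogram."""
    hist = [0.0] * 256
    for b in data:
        hist[b] += 1
    return [h / len(data) for h in hist]


def fingerprint(samples: list[bytes]) -> list[float]:
    """Average BFD over example files of one type."""
    dists = [bfd(s) for s in samples]
    return [sum(col) / len(dists) for col in zip(*dists)]


def classify(data: bytes, prints: dict[str, list[float]]) -> str:
    """Return the type whose fingerprint is closest to data's BFD (L1 distance)."""
    x = bfd(data)
    return min(prints, key=lambda t: sum(abs(a - b) for a, b in zip(x, prints[t])))
```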


Smart Carving: “File print”

Proposed by Wang and Stolfo

‣ Uses a set of BFD models together with the standard deviation of each byte frequency

‣ Higher accuracy: between 75% and 100%

‣ Accuracy decreases with the number of bytes
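Combining a BFD model with per-byte standard deviations can be sketched with the simplified Mahalanobis distance from Wang and Stolfo's payload model (ref. 9), as we understand it: the sum over byte values of |x_i − mean_i| / (std_i + α). The smoothing factor α avoids division by zero; its value here is a hypothetical choice.

```python
import statistics


def bfd(data: bytes) -> list:
    """Byte frequency distribution: normalized 256-bin histogram."""
    hist = [0.0] * 256
    for b in data:
        hist[b] += 1
    return [h / len(data) for h in hist]


def train(samples):
    """Per-byte-value mean and standard deviation of the BFD over training samples."""
    columns = list(zip(*(bfd(s) for s in samples)))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def simplified_mahalanobis(data: bytes, model, alpha: float = 0.001) -> float:
    """sum_i |x_i - mean_i| / (std_i + alpha); smaller means closer to the model.
    alpha (hypothetical value) avoids division by zero for constant bytes."""
    means, stds = model
    return sum(abs(x - m) / (s + alpha) for x, m, s in zip(bfd(data), means, stds))
```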


Smart Carving: “File print”

Karresand et al. proposed the Oscar method

‣ Uses a centroid model based on the mean and standard deviation of each byte value's frequency

• 97% accuracy

‣ Improved by adding a measure of byte ordering: the absolute difference between adjacent bytes

• 99% accuracy for JPEG files
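The byte-ordering measure can be sketched as a histogram of absolute differences between adjacent bytes, compared against a per-type centroid. Averaging example histograms into a centroid and using L1 distance are simplifying assumptions, not necessarily the exact Oscar formulation.

```python
from __future__ import annotations


def rate_of_change(data: bytes) -> list[float]:
    """Normalized histogram of |b[i+1] - b[i]| over adjacent byte pairs."""
    hist = [0.0] * 256
    for a, b in zip(data, data[1:]):
        hist[abs(b - a)] += 1
    total = max(len(data) - 1, 1)
    return [h / total for h in hist]


def centroid(samples: list[bytes]) -> list[float]:
    """Mean rate-of-change histogram over example clusters of one type."""
    hists = [rate_of_change(s) for s in samples]
    return [sum(col) / len(hists) for col in zip(*hists)]


def distance(data: bytes, model: list[float]) -> float:
    """L1 distance between a cluster's histogram and a type centroid."""
    return sum(abs(x - m) for x, m in zip(rate_of_change(data), model))
```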


Smart Carving: Reconstruction

‣ Aims to find the fragmentation point of each file

‣ Previous studies have shown that fragmented files usually have fewer than 3 fragments

‣ The reconstruction consists of finding the base fragment and its last cluster (the fragmentation point)


Smart Carving: SHT-PUP

‣ Modification of PUP by Pal et al.

‣ SHT: Sequential Hypothesis Testing

‣ Each file has its own hypothesis to be tested

‣ Clusters are combined until the hypothesis is confirmed or refuted

‣ Only implemented for JPEG files
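Sequential hypothesis testing can be illustrated with Wald's sequential probability ratio test (SPRT). Evidence is accumulated one observation at a time until the log-likelihood ratio crosses one of two thresholds. In Pal et al.'s method the observations come from JPEG decoding. Here, as a loudly hypothetical toy, each observation is 1 ("cluster decodes cleanly") or 0 ("decoding error"), with made-up probabilities under H1 (the cluster belongs to the file) and H0 (it does not).

```python
import math


def sprt(observations, llr, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test.
    llr(x) = log(P(x | H1) / P(x | H0)) for one observation.
    Returns ("H1" | "H0" | "undecided", number of observations used)."""
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    total = 0.0
    for n, x in enumerate(observations, start=1):
        total += llr(x)
        if total >= upper:
            return "H1", n
        if total <= lower:
            return "H0", n
    return "undecided", len(observations)


if __name__ == "__main__":
    # Hypothetical model: P(decodes | belongs) = 0.9, P(decodes | foreign) = 0.2
    def toy_llr(x):
        return math.log(0.9 / 0.2) if x == 1 else math.log(0.1 / 0.8)

    print(sprt([1, 1], toy_llr), sprt([0, 0], toy_llr))
```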


1. Start with a set S of k headers (s1, s2, ..., sk), one for each of the k files.

2. Find the set T of k clusters, where ti is the best match for si. Select the ti with the highest match among all.

i. Add ti to the path of the ith file

ii. Replace the current cluster of the ith file in S (si = ti)

iii. Sequentially analyze the clusters immediately after ti until a fragmentation point tf is detected or the file is complete. This is where the hypothesis testing takes place.

iv. Replace the current cluster in S with tf (si = tf)

v. Find the new set T of best matches

vi. Select the element ti with the best match among all

vii. Repeat from step (i) until all files are complete
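Step (iii), the part that distinguishes SHT-PUP from PUP, can be sketched as a loop over the physically following clusters. `belongs(prev, nxt)` is a hypothetical predicate standing in for the sequential hypothesis test. `footers` is a hypothetical completion oracle. The function extends the path in place and returns tf, the fragmentation point.

```python
def extend_until_fragmentation(path, disk_order, available, belongs, footers):
    """Step iii of SHT-PUP (sketch): after the chosen cluster path[-1], keep
    appending the physically next cluster while `belongs(prev, nxt)` holds.
    Returns the fragmentation point tf (the last cluster accepted)."""
    pos = disk_order.index(path[-1])
    while path[-1] not in footers and pos + 1 < len(disk_order):
        nxt = disk_order[pos + 1]
        if nxt not in available or not belongs(path[-1], nxt):
            break          # hypothesis refuted: path[-1] is the fragmentation point
        path.append(nxt)
        available.discard(nxt)
        pos += 1
    return path[-1]
```

On the layout A1 A2 B1 B2 B3 C2 C1 C3 A3, starting from A1 this stops at A2, the fragmentation point of file A.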


Example: SHT-PUP

A1 B1 C1

↓ ↓ ↓

A2 B2 C3

(a)

A1 B1 C1

↓ ↓ ↓

A2 B2 C3

B3

(b)

A1 B1 C1

↓ ↓ ↓

A2 B2 C3

B3

(c)


Video Demo


Video Demo


Conclusions


Conclusions

We have shown the benefits and problems of current file recovery techniques.

Much research remains to be done in this area of data recovery.

While Pal et al.’s techniques are useful for recovering text and images, new weighting techniques must be developed for video, audio, executables and other file formats before recovery can be extended to those formats.


References


1. A. Pal, H. T. Sencar, and N. Memon, “Detecting file fragmentation point using sequential hypothesis testing,” Digital Investigation, vol. 5, no. 1, pp. S2–S13, Sep. 2008.

2. A. Pal and N. Memon, “The evolution of file carving – the benefits and problems of forensics recovery,” IEEE Signal Processing Magazine, vol. 26, no. 2, pp. 59–71, Mar. 2009.

3. A. Pal and N. Memon, “Automated reassembly of file fragmented images using greedy algorithms,” IEEE Trans. Image Processing, pp. 385–393, Feb. 2006.

4. K. Shanmugasundaram and N. Memon, “Automatic reassembly of document fragments via data compression,” presented at the 2nd Digital Forensics Research Workshop, Syracuse, NY, Jul. 2002.

5. A. Pal, K. Shanmugasundaram, and N. Memon, “Reassembling image fragments,” in Proc. ICASSP, Hong Kong, Apr. 2003, vol. 4, pp. IV-732–735.

6. A. Pal and N. Memon, “Automated reassembly of file fragmented images using greedy algorithms,” IEEE Trans. Image Processing, vol. 15, no. 2, pp. 385–393, Feb. 2006.

7. A. Pal, T. Sencar, and N. Memon, “Detecting file fragmentation point using sequential hypothesis testing,” Digit. Investig., to be published.

8. M. McDaniel and M. Heydari, “Content based file type detection algorithms,” in Proc. 36th Annu. Hawaii Int. Conf. System Sciences (HICSS’03), Track 9, IEEE Computer Society, Washington, D.C., 2003, p. 332.1.

9. K. Wang and S. Stolfo, “Anomalous payload-based network intrusion detection,” in Recent Advances in Intrusion Detection (Lecture Notes in Computer Science), vol. 3224. New York: Springer-Verlag, 2004, pp. 203–222.

10. M. Karresand and N. Shahmehri, “Oscar file type identification of binary data in disk clusters and RAM pages,” in Proc. IFIP Security and Privacy in Dynamic Environments, vol. 201, 2006, pp. 413–424.

11. M. Karresand and N. Shahmehri, “File type identification of data fragments by their binary structure,” in Proc. IEEE Information Assurance Workshop, Jun. 2006, pp. 140–147.

Note: the papers by A. Pal et al. are available at http://digital-assembly.com/technology/


Thank you!