Collecting, Searching and Sorting evidence
Introduction
Recovering data is the first step in analyzing an investigation's data.
Recent studies: investigations involve a large volume of data.
Law enforcement: crime investigation (Police: Cyber Security and Technology Crime Bureau)
Commercial sector: data breaches and theft of corporate data (e.g., PwC HK: Forensic Technology Solutions)
Introduction
Each suspect in a criminal case: 5 hard disks, 140 CDs or DVDs, 4 memory cards and USB sticks
Business cases: 31 hard disks, 14 terabytes of data for one case
FBI's regional computer forensics lab, 2013: 5,973 TB of data from 7,273 exams
Audit report (2015): 1,566 outstanding cases; 57% had waited between 91 days and over 2 years
Introduction
Recent studies: anti-forensics tools delete files, overwrite clusters multiple times and create large volumes of data of certain types.
Discussion:
Collecting evidence: file system, file deletion
Techniques for recovering files
Existing tools and challenges
Storage media
Media where data is stored for long-term preservation, e.g., hard drives, USB flash drives, memory cards
Cf. primary memory (RAM, cache memories): short-term storage
Physical size of storage media: e.g., the C: partition is 200 GB, but the physical size of the disk is 250 GB. Is there another partition?
Example
A hard drive contains partitions; a partition contains a file system.
File system: the structure used to control how data is stored.
Common file systems:
Ext4: common on Linux
NFS: common for network storage
FAT32: common on surveillance video systems and thumb drives
NTFS: Windows systems
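The 200 GB vs 250 GB question above can be checked by reading the partition table. A minimal sketch, assuming a classic MBR-partitioned disk (GPT disks and extended partitions are not handled): the four primary partition entries start at byte offset 446, each 16 bytes long, with the starting LBA and the sector count stored as little-endian 32-bit integers at offsets 8 and 12 of the entry. The function names are hypothetical.

```python
import struct

def parse_mbr(mbr: bytes):
    """Return (type, start_lba, sector_count) for each used primary partition entry."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA) at offset 510")
    partitions = []
    for i in range(4):                                    # four 16-byte entries from offset 446
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]                                  # partition type byte (0 = unused)
        start, count = struct.unpack("<II", entry[8:16])  # little-endian start LBA and length
        if ptype != 0:
            partitions.append((ptype, start, count))
    return partitions

def unpartitioned_sectors(mbr: bytes, disk_sectors: int) -> int:
    """Sectors of the physical disk not covered by any listed primary partition."""
    return disk_sectors - sum(count for _, _, count in parse_mbr(mbr))
```

Multiplying the result by the sector size (commonly 512 bytes) gives the amount of disk space that no listed partition accounts for, which is where a hidden or deleted partition could sit.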
File storage
Files are stored in a file system.
Files: sequences of binary data (bits and bytes)
Data is stored in clusters or blocks.
The blocks belonging to a file may be:
Stored contiguously on disk
Split and stored all over the disk
Example: NTFS
An NTFS file system begins with a metadata structure called the partition boot sector.
The partition boot sector records the location of the master file table (MFT): a dictionary of all files and folders on the NTFS partition.
For each file or folder, the MFT record contains information about the name and the actual file data.
The MFT record describes which clusters on the hard drive house the file.
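Because the boot sector stores the cluster at which the MFT begins, an examiner can locate the MFT with a few fixed-offset reads. A minimal sketch, assuming the standard NTFS boot sector layout (OEM ID "NTFS    " at offset 0x03, bytes per sector at 0x0B, sectors per cluster at 0x0D, MFT cluster number at 0x30); `mft_offset` is a hypothetical helper, and the encoded cluster sizes used by some very large volumes (sectors-per-cluster values above 0x80) are not handled.

```python
import struct

def mft_offset(boot_sector: bytes) -> int:
    """Byte offset of the MFT from the start of the NTFS partition."""
    if boot_sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    bytes_per_sector = struct.unpack_from("<H", boot_sector, 0x0B)[0]
    sectors_per_cluster = boot_sector[0x0D]
    mft_cluster = struct.unpack_from("<Q", boot_sector, 0x30)[0]  # logical cluster number
    return mft_cluster * sectors_per_cluster * bytes_per_sector
```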
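Example: NTFS
Create a file: get an MFT record
Small file: stored inside the MFT record itself
Large file: allocate clusters and store the file in those clusters
Delete a file: only the MFT record is changed (marked as no longer in use); the file contents are not erased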
Example: FAT storage
Files: f1.doc, f2.txt, f3.jpg
Root table entries:
Filename   Starting block
f1.doc     102
f2.txt     106
f3.jpg     110
FAT:
Block   Next block
101     Free
102     103
103     104
104     105
105     108
106     107
107     EOF
108     109
109     EOF
110     111
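Reading a file back means starting at the block in its root table entry and following the "next block" links until EOF. A minimal sketch using the table above as a Python dict; `follow_chain` is a hypothetical helper. The slide's table stops at block 110, so f3.jpg's chain cannot be completed here and the function reports the break.

```python
# FAT from the example: block -> next block ("Free" or "EOF" are markers).
FAT = {101: "Free", 102: 103, 103: 104, 104: 105, 105: 108,
       106: 107, 107: "EOF", 108: 109, 109: "EOF", 110: 111}

def follow_chain(fat: dict, start: int) -> list:
    """Follow a FAT chain from the starting block until EOF."""
    chain, block, seen = [], start, set()
    while block != "EOF":
        if block == "Free" or block not in fat or block in seen:
            raise ValueError(f"broken chain at block {block}")
        seen.add(block)            # guard against loops in a corrupted FAT
        chain.append(block)
        block = fat[block]
    return chain
```

For f1.doc this yields blocks 102, 103, 104, 105, 108, 109: the jump from 105 to 108 shows the file is fragmented.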
Deleting a file
The file's entry in the file system is updated to indicate its deleted status.
The clusters previously allocated for storing the file become unallocated and can be reused to store a new file.
But: the data are left on the disk until a new file overwrites them.
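Example: file deletion (delete f1.doc)
Root table entries (the first character of the name is overwritten; FAT marks a deleted entry with the byte 0xE5, shown here as "?"):
Filename   Starting block
?1.doc     102
f2.txt     106
f3.jpg     110
FAT (f1.doc's blocks are reset to Free):
Block   Next block
101     Free
102     Free
103     Free
104     Free
105     Free
106     107
107     EOF
108     Free
109     Free
110     111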
The contents of f1.doc (blocks 102, 103, 104, 105, 108 and 109) have not been deleted.
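After deletion the FAT chain is gone, so a recovery tool only knows the starting block (102) and, from the directory entry, the file size. Assuming that size corresponds to 6 blocks, the sketch below compares two hypothetical undelete heuristics (they are not described on the slides): assuming the file was contiguous, and taking the next free blocks while skipping blocks still allocated to other files.

```python
# FAT after deleting f1.doc (the table above); blocks from 111 on are not shown on the slide.
FAT_AFTER = {101: "Free", 102: "Free", 103: "Free", 104: "Free", 105: "Free",
             106: 107, 107: "EOF", 108: "Free", 109: "Free", 110: 111}

def naive_undelete(start: int, n_blocks: int) -> list:
    """Assume the deleted file was stored contiguously."""
    return list(range(start, start + n_blocks))

def undelete_skip_allocated(fat: dict, start: int, n_blocks: int) -> list:
    """Take the next n free blocks from `start`, skipping allocated ones."""
    blocks, b = [], start
    while len(blocks) < n_blocks and b in fat:
        if fat[b] == "Free":
            blocks.append(b)
        b += 1
    return blocks
```

The naive guess returns 102 to 107 and wrongly includes f2.txt's blocks 106 and 107; skipping allocated blocks happens to recover the original chain here, but it would fail if another deleted file's blocks lay in between.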
Major types of file structures
Contiguous: stored in blocks in a logical order of sequence
Fragmented: one or more chunks are not stored in sequential order (happens when files are added, deleted or modified)
Linear fragmentation: the fragments appear on disk in their logical order; non-linear: they do not
Partial files: incomplete files, where some portion of the file is unavailable (overwritten by other data)
Example: (figure showing the block layout of files A, B, C and D)
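Given the blocks of a file in logical order (as a FAT chain provides), the categories above can be told apart mechanically. A minimal sketch; `classify` is a hypothetical helper that counts fragments as runs of consecutive block numbers.

```python
def classify(blocks):
    """Return (category, number_of_fragments) for a file's blocks in logical order."""
    fragments = 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)
    if fragments == 1:
        return "contiguous", 1
    # Linear: every fragment lies further along the disk than the previous one.
    in_order = all(b > a for a, b in zip(blocks, blocks[1:]))
    return ("linear" if in_order else "non-linear"), fragments
```

Applied to f1.doc from the FAT example, [102, 103, 104, 105, 108, 109] is linear with 2 fragments.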
Major types of file structures
Embedded files: the contents of one file are added to or stored inside another file, e.g., a JPEG inside a Word document
File systems are becoming large:
Large hard disks are inexpensive and common
Huge numbers of files and fragments
Individual files are usually only lightly fragmented
Causes of fragmentation:
Low disk space
Appending more data to an existing file
Major types of file structures
Studies: 6% of all recovered files were fragmented.
File systems try to allocate disk space in a way that minimizes fragmentation, to reduce seek time and improve performance.
File types of forensic interest (AVI, JPG, ...) are more fragmented than file types of little interest (BMP, TXT, ...): JPEG 16%, AVI 17%, PST (Outlook email) 58%, Word doc 17%
Mobile devices, Android applications (Facebook, Twitter, WhatsApp, WeChat, Chrome, Gmail, ...):
Seldom fragmented: executable files (.apk, .dex)
Seriously fragmented: database files (.db, .db-journal, .wal)
Evidence collection
Search for evidence in the complete file system, including recovering deleted files.
File carving: recovery of file fragments from a digital storage device without the assistance of the file system, by scanning the raw bytes of the disk and reassembling them.
Evidence collection
File carving:
Possible even if the file system metadata has been completely destroyed
Possible even if the files are deleted (deleting removes the knowledge of where the file is, but not the file content)
Possible to recover files that were renamed to hide what the file actually is
Possible to recover data embedded in another file (e.g., a JPEG inside a doc)
Techniques for file carving
Tools have been developed to automate carving for various file types: Foremost, Scalpel, DataLifter, PhotoRec
Specialized forensic tools: EnCase, FTK, X-Ways
Carving can be used to extract files from physical memory dumps, from mobile devices and from raw network traffic
Tools for file carving
Investigators need to understand how the tools carve files: tools are not a substitute for knowledge.
Understand the limitations of the tools.
Techniques for file carving
Header-footer: recover files based on known headers (and footers); used in EnCase, Foremost, Scalpel
File structure: header-footer plus the internal layout of a file; used in Foremost, PhotoRec
Content-based (semantics)
Header-footer carving
The most basic carving technique. Steps:
Scan for the header of a file type
Once found, scan for that file type's footer
File = the bytes between header and footer, copied byte by byte
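The steps above can be sketched on an in-memory byte string (real tools stream a disk image). `carve_header_footer` is a hypothetical name; it resumes searching after each carved file and inherits the technique's weaknesses, e.g., the FF D9 footer of a thumbnail embedded in a JPEG would end the carve early.

```python
def carve_header_footer(data: bytes, header: bytes, footer: bytes):
    """Return (offset, carved_bytes) for each header ... footer span in raw data."""
    results = []
    pos = data.find(header)
    while pos != -1:
        end = data.find(footer, pos + len(header))   # first footer after the header
        if end == -1:
            break                                    # header with no footer: give up
        end += len(footer)
        results.append((pos, data[pos:end]))
        pos = data.find(header, end)                 # continue after this carved file
    return results
```

Usage: `carve_header_footer(raw, b"\xff\xd8\xff", b"\xff\xd9")` carves JPEG candidates using the signatures listed for JPEG.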
Examples of file signatures
Header          Footer   File type
FFD8FF          FFD9     JPG, JPEG
424D                     BMP
FFFB                     MP3 without ID3 tag
494433                   MP3 with ID3 tag
D0CF11E0                 DOC
52494646                 WAV
25504446                 PDF
474946383761    003B     GIF
File signatures: www.garykessler.net/library/file_sigs.html
Header-footer carving
Problems:
Header/footer markers are short, so carving may produce many results (false positives)
Cannot handle fragmented or partial files
Cannot carve files without fixed headers (e.g., text/HTML)
Fragmented example (figure)
Variations: estimate the file size through various means
Header-maximum file size carving: carve a fixed number of bytes after locating the header
Header-embedded file size carving: find the file size from information available in the header
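Header-maximum file size carving
Carve a fixed number of bytes from the beginning of a possible file.
Steps: scan for the header of a file type; extract a fixed number of bytes.
The size is determined by trial and error.
Can be useful for files that have footers: a JPEG can store a thumbnail within the image, and the thumbnail is another JPEG with its own FF D9 footer, so header-footer carving would stop at the thumbnail's footer.
A JPEG still displays correctly if additional data is appended after its end, so extra carved bytes do not prevent the image from being viewed.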
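A minimal sketch of header-maximum carving; `carve_max_size` is a hypothetical helper and `max_size` is the trial-and-error guess. It reports every header hit, so a thumbnail's header inside a larger JPEG shows up as a separate (overlapping) candidate.

```python
def carve_max_size(data: bytes, header: bytes, max_size: int):
    """Return (offset, bytes) taking max_size bytes from every header hit."""
    results = []
    pos = data.find(header)
    while pos != -1:
        results.append((pos, data[pos:pos + max_size]))  # may include trailing junk
        pos = data.find(header, pos + 1)
    return results
```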
Header-maximum file size carving
Same problems as header-footer carving
Results are usually much larger than the original file; a manual process is needed to discard the additional data
If the guess for the maximum size is too small, the file is carved incorrectly (truncated)
Header-embedded file size carving
Many file formats embed information about the file size in their first few bytes.
Find the size of the file by analyzing this embedded information.
Steps: scan for the header of a file type; determine the file size by reading the relevant bytes; extract that many bytes.
File structure based carving
Carve using knowledge about the internal file structure: metadata such as header, footer, identifier strings, size information, etc.
Can be used to detect cases of fragmentation if the file structure data is detailed and extensive.
Example: file structure of a JPEG file
Header: Start of Image (FF D8)
EXIF info
Start of image data
A series of sections
End of image data
Footer: End of Image (FF D9)
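Following the JPEG structure instead of just its header and footer avoids the thumbnail problem: each section before the image data starts with an FF xx marker and a 2-byte big-endian length, so whole sections (including an EXIF section that contains a thumbnail) can be skipped. A minimal sketch, assuming a baseline JPEG with a single scan; `jpeg_length` is a hypothetical helper, not the algorithm of any particular tool.

```python
import struct

def jpeg_length(data: bytes, start: int):
    """Length of the JPEG starting at `start`, found by walking its sections, or None."""
    if data[start:start + 2] != b"\xff\xd8":             # Start of Image
        return None
    pos = start + 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None                                   # not a marker: structure broken
        marker = data[pos + 1]
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])  # length includes itself
        if seg_len < 2:
            return None
        pos += 2 + seg_len                                # skip the whole section
        if marker == 0xDA:                                # Start of Scan: image data follows
            # Inside the compressed data a literal FF byte is stored as FF 00,
            # so the next FF D9 is the End of Image marker.
            end = data.find(b"\xff\xd9", pos)
            return None if end == -1 else end + 2 - start
    return None
```

On a JPEG whose EXIF section holds a thumbnail, this returns the full length, whereas searching for the first FF D9 after the header would stop at the thumbnail.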
File structure of a PNG file
Header bytes
Size of the next section
IHDR: identifier of the next section
12 bytes: unstructured data
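In the PNG format the file starts with an 8-byte signature and then consists of chunks, each with a 4-byte big-endian data length, a 4-byte type (such as IHDR), the data and a 4-byte CRC over type and data; the file ends with the IEND chunk. A minimal structure-based sketch built on that layout (taken from the PNG format, not from the slide); `png_length` and `build_chunk` are hypothetical helpers. A CRC mismatch is treated as a sign that the file breaks off, e.g., at a fragmentation point.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_length(data: bytes, start: int):
    """Length of the PNG starting at `start` (through IEND), or None if invalid."""
    if data[start:start + 8] != PNG_SIGNATURE:
        return None
    pos = start + 8
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length                        # length + type + data + CRC
        if end > len(data):
            return None
        (crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(data[pos + 4:end - 4]) != crc:   # CRC covers type + data
            return None
        pos = end
        if chunk_type == b"IEND":
            return pos - start
    return None

def build_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one well-formed chunk (handy for test data)."""
    return (struct.pack(">I", len(payload)) + chunk_type + payload
            + struct.pack(">I", zlib.crc32(chunk_type + payload)))
```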
Challenges in file carving
The original file may be fragmented:
A carving process that assumes all portions of the file were stored contiguously on the disk will fail, salvaging fragments of multiple files and incorrectly combining them into a single container.
Content-based carving: read individual blocks and analyze their contents to find out whether they belong to a particular file.
Content-based carving
Main idea:
Fragmentation can occur only at block boundaries.
Block: the smallest data unit that can be written to a storage medium (sector or cluster size); one block belongs to one single file.
Information entropy: a measure of randomness. Large changes in entropy indicate that the sector belongs to a different file.
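Entropy example 1: tossing a coin
Possible outcomes: head/tail, Prob(head) = Prob(tail) = 1/2
Entropy = $-\sum_{n=1}^{N} p_n \log_2 p_n = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1$ bit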
Entropy example 2: a bin contains balls of four different colors (red, yellow, blue and green): 9 red, 1 yellow, 1 blue and 1 green
Entropy = $-\sum_{n=1}^{N} p_n \log_2 p_n = -\left(\tfrac{9}{12}\log_2\tfrac{9}{12} + 3\cdot\tfrac{1}{12}\log_2\tfrac{1}{12}\right) \approx 1.21$ bits
Entropy example 3: a bin contains balls of four different colors (red, yellow, blue and green): 3 red, 3 yellow, 3 blue and 3 green
Entropy = $-\sum_{n=1}^{N} p_n \log_2 p_n = -4\cdot\tfrac{1}{4}\log_2\tfrac{1}{4} = 2$ bits
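The three examples can be checked numerically. A minimal sketch of the entropy formula $-\sum_{n=1}^{N} p_n \log_2 p_n$ computed from counts of each outcome; `entropy` is a hypothetical helper name.

```python
import math

def entropy(counts):
    """Shannon entropy in bits, with p_n = count_n / total."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([1, 1]))        # coin: 1.0
print(entropy([9, 1, 1, 1]))  # example 2: about 1.21
print(entropy([3, 3, 3, 3]))  # example 3: 2.0
```

The more evenly the outcomes are spread, the higher the entropy; four equally likely colors reach the maximum of $\log_2 4 = 2$ bits.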
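Sliding entropy
A sliding window moves over the data; for each position, measure the average information content (entropy) of the bytes in the window.
Entropy formula: $H = -\sum_{n=1}^{N} p_n \log_2 p_n$
N: total number of different values; $p_n$: probability of the n-th value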
Sliding entropy
Bytes are 8 bits, so values range from 0 to 255 (N = 256) and entropy ranges from 0 to 8 bits per byte.
4 to 6: text and HTML blocks
7 to 8: zip and JPEG blocks
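A minimal sliding-entropy sketch over raw bytes. The 512-byte window (one sector) and a step equal to the window are assumptions; a smaller step gives a smoother profile. Function names are hypothetical.

```python
import math
from collections import Counter

def byte_entropy(window: bytes) -> float:
    """Entropy of the byte values in a window, in bits per byte (0 to 8)."""
    n = len(window)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(window).values())

def sliding_entropy(data: bytes, window: int = 512, step: int = 512):
    """Entropy at each window position; step == window gives one value per block."""
    return [byte_entropy(data[i:i + window])
            for i in range(0, len(data) - window + 1, step)]
```

A block of identical bytes scores 0, while a block in which all 256 byte values are equally frequent scores the maximum of 8, the range in which compressed zip and JPEG data sits.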
Studies: txt and jpg (figure)
Studies: MP3 file, zip version, encrypted version (figure)
Example (figure)
Sliding entropy
Calculate the entropy of each block of data.
If the blocks contain compressed data (e.g., PNG image data), their entropies should be similar.
If there is a sudden drop in entropy, that block does not belong to the PNG image data.
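Given per-block entropies, the "sudden drop" test can be written as a threshold on the change between neighbouring blocks. A minimal sketch; `find_entropy_breaks` is hypothetical and the threshold of 1.5 bits per byte is an assumption that would need tuning on real data.

```python
def find_entropy_breaks(entropies, threshold=1.5):
    """Indices of blocks whose entropy differs sharply from the previous block."""
    return [i for i in range(1, len(entropies))
            if abs(entropies[i] - entropies[i - 1]) > threshold]
```

For a profile such as [7.9, 7.8, 4.5, 7.9], it flags block 2 (entropy drops: likely foreign data) and block 3 (entropy returns: the compressed data may continue), bracketing the section to remove.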
Example: sliding entropy (figure: Block 11619, Block 11820)
Example: sliding entropy: remove the section where the entropy drops (figure)
Data in between zip files (figure)
Current research approach 1
Stage 1: Header/footer
Stage 2: Complete JPEG file for segment 2
Stage 3: Decoding to RGB
Stage 4: Fragmentation point: high CED value
Boundary: RGB values of pixels on both sides of the boundary
Nearby: RGB values of pixels on one side of the boundary
Current research approach 1
Stage 6: Aim: construct the file from header to footer by joining segments together
Current research approach 2
Graph approach: assume all file clusters are in random order
Step 1: identify headers/footers
Current research approach 2
Step 2: for each header, find the best match among the remaining clusters (using similarity)
The similarity calculation depends on the content of the cluster:
Image file: check block similarity
Text file: check word likelihood
Current research approach 2: probability/likelihood (figure)
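One simple way to "find the best match" is greedy: start from a header cluster and keep appending the unused cluster that scores highest against the last one until a footer is reached. A minimal sketch; `greedy_reassemble` is hypothetical, and `byte_continuity` is a toy stand-in for the content-dependent similarity described above (block similarity for images, word likelihood for text). Greedy choice is an assumption; it can go wrong where a globally better path exists.

```python
def byte_continuity(prev: bytes, nxt: bytes) -> float:
    """Toy similarity (hypothetical): bytes on both sides of the joint should be close."""
    return -abs(prev[-1] - nxt[0])

def greedy_reassemble(clusters, header_idx, is_footer, similarity=byte_continuity):
    """Return cluster indices from the header to a footer, choosing greedily."""
    path, used = [header_idx], {header_idx}
    while not is_footer(clusters[path[-1]]):
        candidates = [i for i in range(len(clusters)) if i not in used]
        if not candidates:
            break                                  # ran out of clusters before a footer
        best = max(candidates, key=lambda i: similarity(clusters[path[-1]], clusters[i]))
        path.append(best)
        used.add(best)
    return path
```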
Comparison of different methods
There are many different tools/methods for file carving.
Performance comparison: carving quality; memory and space used
Terminology: Positive: a file that is correctly carved from the dataset
Quality
Terminology:
False positive: a carving result that is not a positive
False negative: a file that is present in the dataset but was not carved
                  In dataset: Yes    In dataset: No
Recovered: Yes    Positive           False positive
Recovered: No     False negative     -
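Quality
Recall: the proportion of the files in the dataset that are recovered
Precision: the proportion of the recovered files that are correct
F measure: controls the user's preference between recall and precision
Recall $R = \frac{tp}{tp + fn}$
Precision $P = \frac{tp}{tp + fp}$
F measure $= \frac{1}{\alpha\frac{1}{P} + (1-\alpha)\frac{1}{R}}$, where tp, fp and fn count positives, false positives and false negatives, and $\alpha \in [0, 1]$ sets the preference; $\alpha = 0.5$ gives $\frac{2PR}{P+R}$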
Example
Consider a total of 10,000 files, of which 100 are fragmented. A tool reports 200 fragmented files, but only 60 of them are correct. Determine the recall, precision and accuracy.
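The example can be worked through in code. Accuracy is not defined on the slides; the sketch assumes the usual definition (tp + tn) / total, and the F measure uses $\alpha = 0.5$. `carving_metrics` is a hypothetical helper.

```python
def carving_metrics(total, actual, reported, correct, alpha=0.5):
    tp = correct                  # reported as fragmented and really fragmented
    fp = reported - correct       # reported but not fragmented
    fn = actual - correct         # fragmented but missed
    tn = total - tp - fp - fn     # correctly left out
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total
    f = 1 / (alpha / precision + (1 - alpha) / recall)
    return recall, precision, accuracy, f

print(carving_metrics(10_000, 100, 200, 60))
```

With tp = 60, fp = 140, fn = 40 and tn = 9,760, this gives recall 0.6, precision 0.3, accuracy 0.982 and F about 0.4. The high accuracy is driven by the many true negatives, which is why recall and precision are more informative here.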
Performance analysis
Public datasets:
FAT carving test dataset (15 files): dftt.sourceforge.net/test11
DFRWS 2006 challenge image (32 files): dfrws.org/2006/challenge
Basic data carving test: http://dftt.sourceforge.net/test11/index.html
Simple datasets give good results; complex datasets give poor results.
Fragmentation of files has a major impact.
Tools comparison
Look at:
Percentage of files recovered
Correctness and reliability of the tool output
Processing speed of the tool
Requirement: process roughly 100 GB of data per day, i.e., about 1.16 MB per second
A tool that handles less than 0.58 MB per second (half that rate) is impractical
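The throughput figures follow from simple arithmetic. A minimal check, assuming decimal units (1 GB = 1000^3 bytes, 1 MB = 1000^2 bytes), which reproduces the numbers quoted above.

```python
GB, MB = 1000**3, 1000**2
SECONDS_PER_DAY = 24 * 60 * 60

required = 100 * GB / SECONDS_PER_DAY / MB    # MB per second for 100 GB per day
print(round(required, 2))                     # 1.16
print(round(required / 2, 2))                 # 0.58, half the required rate
```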
Tools comparison: example 1 datasets
Basic data carving test: http://dftt.sourceforge.net/test11/index.html
Contains:
Valid doc, jpeg, wav, pdf, zip, gif, doc and xls files
An invalid jpg file (its header has been modified)
Deleted ppt and wmv files
Contains only contiguous files
E.g., PhotoRec can find all files except the invalid jpg file (because its header information is not correct)
Tools comparison: example 1 (figure)
Tools comparison: example 2
Another test set: http://old.dfrws.org/2006/challenge/layout.shtml
Contains JPEG, zip, HTML, txt and Word files
Fragmented files
Tools comparison: example 2
Example test cases:
One JPEG, non-fragmented
One JPEG, non-fragmented, larger than a typical default maximum file size
One JPEG, non-fragmented, but the sector before it has 0xffd8 in the first two bytes
One JPEG, fragmented with text in between
One JPEG, fragmented with a Word document in between
One JPEG, fragmented with random data in between
One JPEG, fragmented with a JPEG in between
Two JPEGs that are intertwined
One JPEG, non-fragmented, that is REALLY big
One JPEG, fragmented with a single sector in between that starts with 0xffd9
E.g., PhotoRec: performance drops because the dataset is more complicated (contiguous + fragmented files)
General findings
MPEG, ZIP: difficult to carve because of common header values
Scalpel: header-based carving; PhotoRec: structure-based carving
Contiguous files: good performance; fragmented files: not easy
New approaches for carving
Use of file carving to solve data hiding
Concealing a file: change its name to mislead digital investigators, e.g., renaming an illegal photograph from xxx.jpg to xxx.exe
Need to check the file header (file signature): the file xxx.exe with a JPEG header (FF D8 FF) will be correctly recognized as a graphics file
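A minimal sketch of the header check, using headers from the signature table earlier in this section; `check_extension` is a hypothetical helper. Grouping DOC, XLS and PPT under D0 CF 11 E0 reflects the fact that Office 97-2003 files share that compound-file header. Files whose header is not in the table are left undecided rather than flagged.

```python
import os

SIGNATURES = [  # (header bytes, detected type, extensions consistent with it)
    (b"\xff\xd8\xff", "JPEG", {"jpg", "jpeg"}),
    (b"GIF87a", "GIF", {"gif"}),
    (b"%PDF", "PDF", {"pdf"}),
    (b"\xd0\xcf\x11\xe0", "Office compound file", {"doc", "xls", "ppt"}),
    (b"BM", "BMP", {"bmp"}),
    (b"ID3", "MP3", {"mp3"}),
]

def check_extension(filename: str, data: bytes):
    """Return (detected_type, mismatch) by comparing the header with the extension."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    for magic, ftype, exts in SIGNATURES:
        if data.startswith(magic):
            return ftype, ext not in exts
    return None, False
```

For example, `check_extension("xxx.exe", jpeg_bytes)` reports a JPEG whose extension does not match.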
Steganography
Hides information inside image files
Two types: insertion and substitution
Insertion: the hidden data is not displayed when viewing the original file; the data structure needs to be analyzed carefully
Hidden message (figure)
Steganography
Substitution: replaces bits with other bits of data, usually the last two LSBs (least significant bits)
Original pixel   Altered pixel
1010 1010        1010 1001
1001 1101        1001 1110
1111 0000        1111 0011
0011 1111        0011 1100
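Reading the two LSBs of the altered pixels in the table gives 01 10 11 00, i.e., the byte 0110 1100. A minimal sketch of 2-bit LSB substitution that reproduces the table; `embed_2lsb` and `extract_2lsb` are hypothetical helpers working on a flat list of 8-bit pixel values.

```python
def embed_2lsb(pixels, message: bytes):
    """Hide message bits in the two least significant bits of each pixel value."""
    bits = []
    for byte in message:
        bits += [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]  # high bit pair first
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for the message")
    out = list(pixels)
    for i, pair in enumerate(bits):
        out[i] = (out[i] & 0b11111100) | pair     # clear the two LSBs, then set them
    return out

def extract_2lsb(pixels, n_bytes: int) -> bytes:
    """Read back n_bytes hidden by embed_2lsb (4 pixels per byte)."""
    msg = bytearray()
    for k in range(n_bytes):
        value = 0
        for p in pixels[4 * k: 4 * k + 4]:
            value = (value << 2) | (p & 0b11)
        msg.append(value)
    return bytes(msg)
```

Each pixel changes by at most 3 out of 255, which is why such changes are hard to see.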
Steganography
Detection: look for variations in the graphic image; when steganography is applied correctly, hidden data cannot be detected in most cases
Check whether the file size, image quality or file extension has changed
Clues to look for: duplicate files with different hash values; steganography programs installed on the suspect's drive
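The "duplicate files with different hash values" clue can be screened automatically. A minimal sketch, assuming that "duplicate" means the same file name in different folders; `suspicious_duplicates` is hypothetical and takes (path, content) pairs so it runs without a disk image.

```python
import hashlib
from collections import defaultdict

def suspicious_duplicates(files):
    """Return file names whose copies have different SHA-256 hashes."""
    hashes = defaultdict(set)
    for path, content in files:
        name = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
        hashes[name].add(hashlib.sha256(content).hexdigest())
    return sorted(name for name, digests in hashes.items() if len(digests) > 1)
```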
Data and file carving
DeepSound (http://jpinsoft.net/DeepSound) is a steganography tool and audio converter that hides secret data in audio files. The application also lets you extract secret files directly from audio files or audio CD tracks.
General guideline
Device status: differences in data collection
Power off: find data stored on non-volatile storage (e.g., hard drive)
Principle: eliminate any chance of modifying the actual evidence
Write blockers are used between the digital evidence and the computer
Create a disk image: a bit-by-bit copy of the original data (verified by hash comparison)
Power on: able to collect volatile data
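A minimal sketch of the imaging step: copy the source in chunks, then hash the source stream and the finished image independently and compare. Assumptions: `image_and_verify` is a hypothetical helper, SHA-256 is one common choice of hash, and write blocking is done by hardware and is not modelled here. On Linux a block device path can be opened the same way given read permission.

```python
import hashlib

def image_and_verify(source_path, image_path, chunk_size=1 << 20):
    """Copy source into an image file and return (source_hash, image_hash)."""
    src_hash = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            src_hash.update(chunk)
            dst.write(chunk)
    img_hash = hashlib.sha256()
    with open(image_path, "rb") as img:             # re-read the image independently
        for chunk in iter(lambda: img.read(chunk_size), b""):
            img_hash.update(chunk)
    return src_hash.hexdigest(), img_hash.hexdigest()
```

Matching hashes show that the image is a bit-for-bit copy; recording the hash also lets later analysis prove the image has not changed.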
Device status: differences in data collection
Power on: can examine whether any of the active hard drives are encrypted
Full disk encryption: all data on the hard drive is encrypted when the device is off, so a powered-on device offers a good chance to collect unencrypted data
Conducting a live investigation
Document what is visually present on screen: active programs, active windows, date/time settings, log files
Collect volatile data
Memory is a good source for finding passwords and data from encrypted communication in plain text
Use a memory capture tool (e.g., FTK Imager)
Conducting a live investigation
Find out the computer install date, OS version, list of users, registered owner, ...
Find out time zone information and clock settings
Find network drive mappings or remote storage media
Data on the hard drive, e.g., pagefile, hiberfile
Pagefile: used when the computer swaps parts of the working memory out to disk (may contain, e.g., browser artifacts)
Hiberfile: saves the current machine state when the computer goes into hibernation
Summary
Collecting evidence: file storage, clusters, fragmentation
File deletion: data remains in clusters
Techniques for recovering files: header-footer, file structure and content-based approaches
Existing tools and challenges
Summary
Because of the large volume of data:
Investigators analyze data and understand inter-relationships
Gold standard: analyze all files to ensure nothing is overlooked
Now: "intelligence-based": a subset of files is analyzed, depending on the intelligence provided to the investigator
The aim is not to find every piece of evidence, but sufficient evidence to determine innocence or guilt
Summary
File carving vs keyword searching:
File carving looks for data that fits into known file structures and interprets that data in light of these structures
Keyword searching looks for content that matches one or more keywords or keyword patterns
Find structures matching known structures vs find data matching known data
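To contrast with the carving sketches, keyword searching ignores file structure and simply matches patterns in the raw bytes. A minimal sketch using Python regular expressions on bytes; `keyword_hits` and the example patterns are hypothetical, and text stored in other encodings (e.g., UTF-16 on Windows) would need patterns in that encoding.

```python
import re

def keyword_hits(data: bytes, patterns):
    """Return (offset, matched_bytes) for every keyword-pattern hit in raw data."""
    hits = []
    for pattern in patterns:
        for m in re.finditer(pattern, data, flags=re.IGNORECASE):
            hits.append((m.start(), m.group()))
    return sorted(hits)
```

For example, `keyword_hits(raw, [rb"password", rb"\d+ usd"])` returns the offset of each hit, which an examiner can then map back to a file or to unallocated space.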