Nate French
Final Project
Internship
Prof. Van Slyke
Analyzing Digital Forensics Tools in a Virtualized Environment
Analyzing Digital Forensics Tools in a Virtualized Environment
The purpose of this paper is to research the viability of using digital forensic software in a virtual
environment. The first topic to be addressed is whether specific tools will perform
competently within a virtual machine. This form of virtualization, platform virtualization, has
been available since the 1990s. Virtualization has progressed immensely since it was first
released; however, it is still a work in progress (Sperling, 2010). The second area of focus will be benchmarking the speed of a virtualized environment compared to a non-virtualized environment. As the price per unit of storage continues to fall, the speed at which digital forensics frameworks can perform will continue to be a significant factor. A third area of
focus will be to assess the accuracy of this software when run from a virtual machine compared
to a non-virtualized environment.
The acceptance of digital forensics depends solely on the accuracy of the tools and
environment used. With the increasing acceptance of virtualization and cloud computing for both
cost and performance gains, the advent of distributed forensic frameworks is inevitable
(Roussev, 2004). Therefore, determining the accuracy of these environments is crucial for
continued acceptance of digital forensics. Through testing, this research will provide statistical
data to be used for determining the best approach to performing digital forensics. This testing
will be by no means conclusive or definitive on this topic as there is a plethora of variables that
could be tweaked to determine an optimal environment for performing virtualized digital
forensics. However, this research will provide a base for continued research in this area.
This research paper will first explain the key terms and ideas that will be used throughout. It will then establish the importance of performing the research; after that, the framework for performing the research, the hardware, and the software used will be discussed. Finally, it will be possible to discuss and evaluate the results gathered from the experimentation.
A. Introduce Topic and Define Key Terms
1. Virtualization The central concept underlying this research is virtualization: what it is and what it does. Virtualization refers to the method of creating a virtual
object that acts and behaves like the real object. With platform virtualization, it is possible to
create a second instance of an operating system running on the same hardware as the initial
instance. For example, it is possible to run an instance of a Linux operating system on a
computer that natively runs Microsoft Windows. In this instance, the Linux operating system is
referred to as the guest, while the Windows instance is called the host. The guest instance is a
logically segregated system that is sharing the resources of the host system. The two instances
can share the processing time of the central processing unit (CPU), or, if more than one CPU is present, individual CPUs can be allocated to each instance. A logical partition of the disk space available on
the hard drive will be segregated for use by the guest system. The guest will also receive a
dedicated portion of the random access memory (RAM), which is used for quick access to
important data that are used frequently.
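Because the experiment described later uses VirtualBox, the resource allocation sketched above can be made concrete with VirtualBox's command-line front end. The following Python sketch is illustrative only: the guest name, memory size, CPU count, and disk size are hypothetical values, not the settings used in this experiment, and VBoxManage must be installed and on the PATH.

    import subprocess

    def vbox(*args):
        # Thin wrapper around the VBoxManage command-line tool.
        subprocess.run(["VBoxManage", *args], check=True)

    # Register a Windows 7 guest, then dedicate RAM, CPUs, and a virtual disk to it.
    vbox("createvm", "--name", "win7-guest", "--ostype", "Windows7_64", "--register")
    vbox("modifyvm", "win7-guest", "--memory", "8192", "--cpus", "4")    # 8 GB RAM, 4 CPUs
    vbox("createhd", "--filename", "win7-guest.vdi", "--size", "51200")  # 50 GB disk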
2. Accuracy The second aspect of this research project is testing the accuracy and
speed of the forensic tools. The accuracy portion of this research refers to file hashing and the
number of files discovered. File hashing is the process of creating a unique identifying string for
a file, similar to the idea of a fingerprint. Even a minuscule change in a file results in a wildly
different file hash. A file hash is used in the court of law to prove that one file is the exact
replica of another file, or that a forensic image of a hard drive is a replica of the original. The
second portion of this area is determining whether forensic tools run in a virtual environment will discover the same number of files on a computer as when the examination is performed in a
non-virtualized environment. This is important to ensure that the examiner will discover all files
of interest to the case. If the virtualized environment cannot discover all files, then the validity
of performing virtualized investigations is negated. This relates to the rate of error: does the virtualized environment experience the same rate of error as a non-virtualized environment (hopefully 0%)?
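To make the hashing concept concrete, the following Python sketch hashes a file in fixed-size chunks, the way forensic tools hash evidence files too large to fit in memory. The helper name is hypothetical; FTK performs its own hashing internally.

    import hashlib

    def hash_file(path, algorithm="md5", chunk_size=1 << 20):
        # Read the file in 1 MB chunks so arbitrarily large files can be hashed.
        digest = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Even a one-byte difference produces a completely different digest:
    print(hashlib.md5(b"evidence").hexdigest())
    print(hashlib.md5(b"evidencE").hexdigest())  # unrelated to the line above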
3. Speed The speed of the different environments is not a critical factor from a legal
perspective, as the law is only concerned with the accuracy of the results. However, speed is
important when determining what type of environment to use. If the speed of the case creation
process is severely hampered by the use of a virtualized environment, then that fact may negate
any of the benefits of using the virtualized environment. When evaluating the speed of this
process, we are concerned with the processing and indexing features of the Forensic Tool Kit
(FTK). In order to create a case, FTK must process and index an image file of a drive.
Processing is the act of determining where a file starts and stops, the file type, and the contents of
the file. That file is then indexed into a case database file. This process is essentially
enumerating the data in the evidence so that they are searchable by FTK. This form of indexing
allows the investigator to quickly search the image for evidence with different methods. The
first method that will be tested is keyword searching. In this method the investigator can search
the entire drive for a specific keyword that may be of evidentiary value. The second search method,
which can be a very powerful method for returning specific evidence, is regex searching. In this
method, the investigator can search the image for a specific pattern of data. An example of this
would be searching the image for credit card numbers. To perform this search manually would
be very time intensive, especially if the suspect attempted to hide the data in any manner. Since credit card numbers typically follow a specific pattern, a regex search would allow the investigator to find all credit cards of a certain pattern with one search. During the testing phase of this research, the speed of keyword searching and regex searching will be documented for each environment. The last forensic ability to be tested is the ability of FTK to perform file
carving. File carving is the act of carving deleted files out of unallocated space. File carving is
an important part of digital forensics as the most critical evidence is typically deleted by a
suspect. Proving that a virtualized environment can carve files as competently as a non-
virtualized environment is paramount for supporting a virtualized forensic framework.
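To illustrate the two search styles just described, the following Python sketch uses a generic regular expression for 16-digit card numbers. This is not FTK's regex dialect, and a production pattern would also validate the Luhn check digit to cut false positives.

    import re

    # Illustrative pattern: four groups of four digits, optionally separated
    # by spaces or dashes (e.g., 4111-1111-1111-1111).
    CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

    text = "order notes: card 4111 1111 1111 1111, backup 5500-0000-0000-0004"
    print(CARD_PATTERN.findall(text))  # both numbers found in one pass

File carving can be sketched in the same spirit: scan raw bytes from unallocated space for known header and footer signatures. The JPEG markers below are standard, but a real carver such as FTK's must also handle fragmented files and many more formats.

    # Toy carver: extract candidate JPEGs lying between the start-of-image
    # (FF D8 FF) and end-of-image (FF D9) markers in a raw byte blob.
    JPEG = re.compile(rb"\xff\xd8\xff.*?\xff\xd9", re.DOTALL)

    def carve_jpegs(raw_bytes):
        return [match.group() for match in JPEG.finditer(raw_bytes)]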
One of the main contributing factors to speed, as reported in “Quantifying Hardware Selection” (n.d.), is the
input/output (I/O) channels on the computer. These channels consist of cables and busses
between the various hardware components of the computer. The optimal configurations explained in that document will be used in the setup of the experiment and discussed more thoroughly later in this paper.
As discussed previously, virtualization and distributed frameworks are becoming more
commonplace in professional environments. Both have their pros and cons compared to their
opposites. One of the main pros of virtualized environments is the use of snapshots: the ability to create a restore point of a virtual machine at any given time. Generally this is done
once the optimal configuration of the virtual machine is complete. Over time, computers
generally get bloated with unnecessary data and slow down. With the use of a snapshot, the
machine can almost instantaneously be reverted back to its pristine condition. This is important
for maintaining an efficient setup. The main con of a virtual environment is that the virtual
machine will always have fewer resources than the machine hosting it. This means a virtual
machine generally takes longer to perform tasks than the host machine would, as it has fewer
resources available to perform the operation. With cloud computing this is generally a non-issue
because the customer can purchase the same amount of resources as their host computers
(Roussev, 2004). However, cloud computing means those resources are located in a remote
setting that is not under the direct control of the user. The beneficial aspect of a distributed
framework is that one job can be parsed out to multiple machines, generally improving the speed
at which the operation is completed (Roussev & Richard, 2004). However, a distributed framework requires
competent network between devices, a protocol to share data, and a method of organizing the
work distribution. The negative attribute of a distributed framework is that the user is
introducing more variables in the system. With more variables comes an increased chance that
something can go wrong.
B. Establish the importance of testing forensic tools. Testing forensics tools is
important for numerous reasons. Most importantly, testing forensic tools before using them in a
production environment is a necessary and critical step in gaining legal acceptance. In order for
a tool to produce legally accepted evidence pertinent to a criminal case, there has to exist
research that shows conclusively what the tool is and is not capable of. The importance of digital
forensics software will continue to grow as the importance of digital technology grows in
society. As technology, the Internet, and E-commerce continue to grow, so will the prevalence
of cybercrime. A second reason for the increase of cybercrime is the rise of “bring your own
device” (BYOD) in professional settings. In general, corporate networks are more secure than
home-based networks and devices. With the rise of BYOD, employees are frequently taking
devices in and out of corporate networks. This increases the vectors of attack, as these devices leave protected networks for environments where the chance of infection is much greater. These infections can then migrate onto corporate networks, where the rewards of cybercrime will generally be greater. Digital forensic tools are critical in performing cybercrime investigations. E-commerce is
estimated to grow at a rate of 12%-15% every year. In the United States alone, E-commerce
sales for the year of 2012 reached $225 billion. By the year 2017, E-commerce is expected to hit
$435 billion in yearly revenue (Trends and Data, 2012). In contrast, it is much more difficult to
come up with accurate statistics on the global cost of cybercrime (McAfee, 2013). Current research places an upper limit on cybercrime at around 0.5%-1% of national income. Extrapolation of these data leads to a lower range of $25 billion to an upper range of $140 billion for the United
States. Regardless of the actual numbers, investigators need competent tools to investigate
cybercrime. This issue becomes even more important when viewed from a national security
perspective.
Cyber-warfare and cyber-espionage are the new domains of inter-country competition.
For example, in 2012, the U.S. Navy was experiencing on average 110,000 cyber-attacks per
hour (Worth, 2012). Cyber-attacks, when viewed from a national security perspective, can
compromise intelligence, the security of troops around the globe, national secrets, and classified
technology. When investigating such incidents, it is critical to know exactly what happened,
what may have been taken, and what sensitive data may have been compromised. It is no longer
the act of recovering fraudulent funds, but protecting the lives and security of our country.
Forensic tools are the backbone of the military’s process of “cyber-attack recovery, reaction, and
response functions” (Giordano, 2002).
The second main reason for performing this experiment is the growth of magnetic storage
devices. This research will attempt to benchmark the speed of forensic software in a virtual
environment. This is important because the growth of magnetic storage is exponential
and is expected to continue growing at this rate for a long time. The price per megabyte is
expected to shrink at a rate of 48% each year (Webb, 2003). As the size of storage media
continues to grow, speed will continue to become a critical factor in performing digital forensics.
Consider the fact that when commercial magnetic storage was introduced in 1956, the cost per
megabyte was $10,000. The 2013 cost per megabyte for magnetic storage is now $0.00006 (citation needed). New methods for data storage are continually being researched, with many producing spectacular results. One outstanding example is research performed by IBM that
has found the current atomic limit to magnetic data storage. The experimental storage device is
“at least 100 times denser than today’s hard disk drives” (Loth et al., 2012).
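As a quick arithmetic check using only the two price points quoted above, the implied long-run average annual decline can be computed as follows; it comes out lower than the 48% forward-looking rate cited from Webb, which describes the recent trend rather than the whole 1956-2013 period.

    # Implied average annual decline in price per megabyte, using the 1956
    # ($10,000/MB) and 2013 ($0.00006/MB) figures quoted in the text above.
    price_1956 = 10_000.0
    price_2013 = 0.00006
    years = 2013 - 1956

    decline = 1 - (price_2013 / price_1956) ** (1 / years)
    print(f"{decline:.1%}")  # roughly 28% per year, sustained for 57 years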
In contrast to the growth of magnetic storage, the growth of processors, the devices that process all of these data, is expected to stall within the decade. This is known as the death of Moore's Law, which has described the growth of processing speed from the 1960s and is projected to hold until 2020. On average, the processing speed of digital devices has doubled every 18 months. However, this process of increasing the density of transistors will reach its physical limit in 2020, when transistors reach the 7 nm or 5 nm mark. At this point, the explosive growth of processing speed will end while magnetic storage continues to grow. DARPA tracks projects to replace complementary metal-oxide-semiconductor (CMOS) technology; however, only three of these replacements are potential candidates, and even then they are not very promising (Merritt, 2013). Investigators will then be faced with the knowledge that evidence drives will
likely continue to grow exponentially while processing power will grow linearly. Due to this
knowledge, the speed that evidence can be processed will be a critical factor in performing
digital forensics. It is critical that investigators use the most optimal environments when time is
a factor. The diminishing growth of processing speed will in turn create the necessity for cloud
computing and distributed frameworks. Cloud computing is the concept of using multiple computers, geographically separated but able to communicate through networks, working in tandem. Typical commercial cloud computing options, such as EC2 from Amazon, allow the
customer to use a set amount of processing power and space. The cloud computer instances are
generally virtual machines located on a much larger commercial server, which is one reason this
research will investigate the use of forensic tools in a virtual environment. A distributed
framework is a method of parsing the work of one job between multiple computers in an
organized fashion. FTK already has a Distributed Forensic Framework (DFF) that allows three
remote computers to assist a fourth workstation. This DFF will likely be expanded upon in the
future to allow the support of more machines. One further avenue of investigation for this line of
research is combining this DFF with the use of virtual machines. In order to determine these
optimal environments, further testing must be performed.
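The work-parsing idea behind a distributed framework can be sketched on a single machine with Python's multiprocessing module. This is a toy stand-in for a true networked framework such as FTK's DFF; the evidence file name and chunk sizes are hypothetical.

    import hashlib
    from multiprocessing import Pool

    def hash_range(job):
        # Worker: hash one byte range of the evidence file.
        path, offset, length = job
        with open(path, "rb") as f:
            f.seek(offset)
            return offset, hashlib.sha256(f.read(length)).hexdigest()

    if __name__ == "__main__":
        # Split a hypothetical 4 MB image into four 1 MB jobs for four workers.
        jobs = [("evidence.dd", i * 2**20, 2**20) for i in range(4)]
        with Pool(processes=4) as pool:
            for offset, digest in pool.map(hash_range, jobs):
                print(offset, digest)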
Testing provides numerous benefits to enhance the state of digital forensics.
Through testing, researchers can verify that a process works in a non-production environment. It
is not feasible to test software in a production environment, where the outcomes of criminal investigations depend on the results found. Testing needs to be performed before production environments are utilized. This testing supports the requirement that evidence be accepted in a court of law. Research is needed to prove whether or not a tool behaves in a
particular fashion and whether the data produced are forensically sound. Testing will also help
investigators make intelligent decisions when deciding what tools to use for a particular case.
Some tools may perform better than others when investigating certain aspects of a cybercrime.
In some cases, such as incident response investigations for national security purposes, time may
be the important factor to consider. Investigators will need to analyze computer logs to
determine what did or did not happen. Other cases may be more focused on retrieving a large number of financial records and documents, while still others may be concerned only with what pictures reside on a computer. By testing different software in different scenarios, it will be
possible to determine the best tools for each depending on what type of evidence the investigator
is interested in. Finally, proper testing should create a reproducible experiment that can
repeatedly be verified by other researchers. This factor helps to gain court acceptance by
providing scientifically sound and verifiable results.
II. Background Context
A. History of digital forensics. Digital forensics is a sub-category of the forensic
sciences that deals with the examination of evidence on digital media. Digital media include, but are not limited to, computers, removable storage, network traffic, and mobile phones. Digital
forensics became a national concern in the 1980s as digital devices began to make their way into
the corporate world. In 1984, the FBI created the CART, the Computer Analysis and Response
Team, which was the first federal group organized to deal with cybercrime and digital forensics
(FBI, n.d.). From this humble beginning, the digital forensics industry has grown into a billion-dollar-a-year industry with annual growth rates of 11% (“Digital Forensic Services in the U.S.
Market Research”, N.D.).
Digital forensic cases can encompass many different types of crimes such as
hacking or possession of contraband. However, each case will have a similar workflow. Cases begin when a crime has been detected and evidence may reside on a digital device. After
detection, the evidence is seized or acquired, following chain-of-custody protocols. The chain of
custody protocol requires that the transfer of evidence is documented whenever possession of the
evidence changes. By following this protocol, investigators can prove that unauthorized personnel did not have access to the evidence or the chance to modify it. Once the evidence media are brought to the forensics lab, the lab must take an inventory of all evidence in its
possession. This would include the number of devices, the make and model of each device, and
which case they belong to. Once the inventory is complete, the lab can begin to make images of
each device. This is accomplished by attaching the digital device to a write blocker that prevents
any changes from being made to the evidence. The imaging hardware or software will then
make a byte-by-byte copy of the device to produce an exact replica. This is important as
investigators cannot perform the investigation on the original evidence as the process would then
introduce changes to the evidence and call its validity into question. Once an image of the
evidence has been created, the original evidence would then be returned to secured storage. The
investigators can then use their tools of choice to perform the investigation. This would include
searching the device for any relevant evidence, documenting the steps and procedures used to
acquire the evidence, and organizing the evidence into a package. This is required so that, if
necessary, a third party can verify the results of the investigation.
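The byte-by-byte imaging and verification step can be illustrated with a short Python sketch. In practice this is done with FTK Imager or dedicated imaging hardware behind a write blocker; the device path and image name below are hypothetical.

    import hashlib

    def image_and_verify(source, image, chunk_size=1 << 20):
        # Copy the source byte for byte while hashing it, then re-read the
        # finished image and confirm the two hashes match, proving the copy
        # is an exact replica of the original.
        src_hash = hashlib.sha256()
        with open(source, "rb") as src, open(image, "wb") as dst:
            for chunk in iter(lambda: src.read(chunk_size), b""):
                dst.write(chunk)
                src_hash.update(chunk)
        img_hash = hashlib.sha256()
        with open(image, "rb") as img:
            for chunk in iter(lambda: img.read(chunk_size), b""):
                img_hash.update(chunk)
        return src_hash.hexdigest() == img_hash.hexdigest()

    # Hypothetical usage (Linux raw device node; a write blocker is assumed):
    # assert image_and_verify("/dev/sdb", "evidence.dd")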
The workflow and examination procedures for digital investigations were
designed around pertinent case law and requirements for acceptance of evidence in a court of
law. In order to establish guilt, the prosecution must provide evidence that proves criminality
beyond a reasonable doubt (Commonwealth v. Webster, 1850). In order for evidence to be
accepted by the court, the evidence must pass admissibility tests. To be admissible, evidence
must be relevant to the case, must be a material possession (not hearsay), and must not be
precluded by an exclusionary rule. The prosecution must be able to prove the authenticity of the
evidence, meaning the evidence is what it is represented to be. This relates to the chain of
custody mentioned earlier, as the location and possession of the evidence can be accounted for and testified to throughout its life cycle. This ensures that the evidence has not been tampered
with. Evidence must also pass the Frye test that was established in 1923 after Frye v. United
States. The Frye test requires that technical evidence must be acquired through a scientifically
proven method that has gained acceptance in that particular field of science.
III. The Experiment
1. Providing an Environment Identical to Current Investigations. The purpose of this
experiment and benchmarking is to provide reference information to benefit the Northeast Cyber
Forensics Center (NCFC) in determining the best method for performing digital forensics. The
results of these experiments will also benefit the forensics community at large by providing
concrete evidence and statistics in relation to performing digital forensics in a virtualized
environment. The main goal, however, will be benefiting the NCFC with its setup. Due to this,
the experiment will be formatted to provide as similar an environment as possible to what
currently is being used. This will determine the use of hardware, software, and procedures
during the experiment. The NCFC currently uses specialized hardware in its operations called the Forensic Recovery of Evidence Device (FRED). FREDs are high-powered workstations with abnormally high system specifications, specifically designed for performing digital forensic work. The
FRED used in this experiment has the following specifications. The central processing unit is
composed of eight Intel i7 cores. To increase performance through hyper-threading, cores are
virtualized, meaning that for every one physical core present on the system, the operating system
addresses two logical cores (“Intel Hyper-Threading Technology”, N.D.). Hyper-threading
increases performance by allowing the operating system to schedule two threads or processes to
a single core. Therefore, the FRED logically has 16 cores, 8 of which are virtual. The specific
FRED uses 16 GB of Random Access Memory (RAM), which provides a location for fast data
storage and transfer compared to the much slower physical hard drives. FREDs also contain six
hard drive bays, which provide the potential for a large amount of storage locally. Four of the
hard drive bays are hot-swappable and are connected to the system through FireWire.
These bays can have hard drives installed or removed at will even while the system is running.
FireWire is capable of transferring data at a rate of 50 MB per second. The remaining two bays
are connected to the machine through a Serial Advanced Technology Attachment (SATA). These
bays are capable of transferring data at a rate of 375 MB per second. It is important to note that
the SATA bays are not hot-swappable, meaning the device must be installed before the system is
turned on and cannot be removed during operation. The FREDs also house a native write-
blocking bay that can be used to image a hard drive. The current operating system used at the
NCFC is Windows 7 Professional 64 bit, which will be used in this experiment as a host
(Environment one). A second Windows 7 operating system will be deployed as a virtual guest on top of the Windows 7 host (Environment two). A third operating system that will be used is the Ubuntu flavor of Linux (version 12.04 LTS). Linux will be used as a host operating system upon which a third Windows 7 virtual guest environment will be located (Environment three). In order to
maintain the integrity of the experiment, no operating system updates or software updates will be
applied to the environments during the testing phase. In order to test the virtualization aspect of
this experiment, the software VirtualBox will be used in creating the environment. VirtualBox is
a free program and is capable of running on Windows or Linux. The forensic tool that will be
tested is Forensic Tool Kit (FTK) 4.0.2 provided by AccessData. FTK is a commonly used
forensics program with wide acceptance within the community. This tool is also one of the
main programs utilized at the NCFC for forensic investigation. In order to provide the best
environment possible, this experiment will utilize the I/O channel optimization (“Quantifying
Hardware Selection,” N.D.). This setup uses four drive bays to store different components of
the investigation. Drive one (SATA) is a 500 GB Seagate 7200 RPM drive, which contains the
operating systems, virtual machines, and the forensic tools. Drive two (SATA) is a 1000 GB
Seagate 7200 RPM drive, which contains the output database of results discovered during the
investigation. As discussed in Quantifying Hardware Selection, the most effective method of
increasing performance is housing the output in a separate drive with the highest throughput and
RPM. Drive three (FireWire) is a 2000 GB Seagate 7200 RPM drive, which contains the evidence images to be examined or processed. Drive four is a 1000 GB Seagate 7200 RPM drive, which contains the evidence computer. Drive four is imaged with FTK Imager, the output of which goes
into drive three. The experiment will comprise three different environments and three
different evidence packages. In order to provide typical evidence scenarios that will relate to
current operations at the NCFC, the three evidence packages will consist of a 50, 250, and a 500
GB evidence image. To maintain the integrity of the experiment, one 1000 GB Seagate 7200
RPM hard drive will be used to store these packages. All testing will be completed on each
package before testing the next package. When proceeding to the next testing package,
additional evidence will be added to the current package until the size requirements are met.
This was decided for multiple reasons. This method will reduce the amount of time required to set up the evidence package. Had we decided to use completely different evidence packages, we would have had to ensure that any data from the previous package were securely erased from the medium.
Otherwise the forensic tools could detect data from a previous package, which would
contaminate the evidence environment. Additionally, processing the same evidence multiple times provides repeatability and tests certain features such as hashing and evidence identification. The evidence medium will also have a Windows 7 operating system installed so that
imaging, processing, indexing, and search times will reflect a real current investigation. The
clean install of Windows 7 requires roughly 20 GB of space. Therefore the respective evidence
packages will contain roughly 30, 230, and 480 GB of evidence. The actual data comprising the
evidence consists of typical data types that digital forensic investigations focus on. This includes
images, documents, and videos. The images used are contained within the Dresden Image
Database for Forensics Benchmarking (Gloe, 2010). The initial photos totaled 4949 high
resolution images, which require roughly 15 GB of space. In order to quickly grow the amount
of evidence required for the 250 and 500 GB packages, these photos were run through image software, IrfanView, which provides batch processing. By processing these images through various RGB filters, it is possible to quickly accumulate evidence files with unique hashes.
Multiple video files were acquired through YouTube. To provide variation, some files are short
(5-10 minutes), while others are roughly 80-90 minutes. Numerous document files were made of
varying types using Microsoft Office and notepad. These documents also contain textual
evidence to complete string search and regex search experiments. A second source of documents
was obtained through the Enron e-mail dataset, which has been sanitized and released as public
domain material (Enron Email Dataset, 2009). This research will examine whether or not FTK
will find these evidence items in both the virtual and non-virtualized environment. During the
250 GB and 500 GB runs, some data will be deleted to determine if the virtualized environments
will competently recover deleted files.
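The batch-filtering step used to multiply the image evidence can be sketched as follows. The experiment itself used IrfanView; this Python version, built on the Pillow library, is an assumed equivalent for illustration, and the directory names are hypothetical.

    from pathlib import Path
    from PIL import Image  # Pillow, standing in for IrfanView's batch mode

    def permute_channels(src_dir, dst_dir, order=(2, 1, 0)):
        # Re-save every JPEG with its RGB channels permuted; each output
        # differs from its source at the byte level, so its hash is unique.
        out = Path(dst_dir)
        out.mkdir(parents=True, exist_ok=True)
        for path in Path(src_dir).glob("*.jpg"):
            bands = Image.open(path).convert("RGB").split()
            Image.merge("RGB", tuple(bands[i] for i in order)).save(out / path.name)

    permute_channels("dresden_originals", "dresden_bgr_variant")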
2. Designing the experiment. The National Institute of Standards and
Technology (NIST) created a methodology for testing forensic tools called the CFTT Forensic
Tool Testing Methodology. This experiment will be modeled after the guidelines provided in
this document. Step one of the CFTT is to acquire the tool to be tested. The NCFC has licensed
full versions of FTK 4, which is the focus of this research. Step two is to review relevant tool
documentation. AccessData provides numerous support documents on their website explaining
how to use FTK. Step three is to choose relevant test cases based on the features of the tool.
FTK provides support for imaging, file carving, key word searching, and regex searching. FTK
is capable of recognizing and recovering most file types. This research will test the following
data types, which are frequently sought after during cases: images, documents, e-mail, and video.
Step four is to create a testing strategy to evaluate FTK and its abilities. The testing workflow
consists of first creating an evidence drive that will contain the evidence files to be tested. This
will consist of a base Windows 7 operating system and the various document types. It is
important to ensure that the evidence packages are prepared and validated before entering them
into the evidence drive. This is due to the fact that it is difficult to permanently remove data
from the drive once it has been entered and typically requires the full drive to be zeroed out.
Once the evidence drive is ready for examination, FTK Imager will be used to create the forensic
image that is readable by FTK. When drive four is being imaged, the evidence image is piped to drive three, the image drive. Once imaging is complete, FTK processing can begin. When creating a
new case, the case folder needs to be piped to drive two, the database drive. This is where the
FTK database software has been installed and should be sufficiently large to contain the
databases for all cases in the experiment. This workflow will be completed twice for all three
environments. The processing will be completed twice for each environment to determine how
much variation exists in each environment and to satisfy minimum requirements for thorough
testing of forensic tools (Pan & Batten, n.d.). Once
processing is complete, FTK creates a result document, which lists the processing time, database
optimization times, and number of files found. Now it is possible to examine the evidence
through the FTK framework. At this stage, it is possible to test keyword and regex searching and to gather pertinent experiment data such as the number of individual file types found and file hashes. These data will be pulled into a spreadsheet so that they can be compared to subsequent cases. Once
the processing and data gathering has been completed for the 50 GB image, the subsequent 250
GB and 500 GB scenarios can be completed following this procedure. Step five of the CFTT methodology
is to analyze results and create reports. At this point, it is possible to analyze all of the data
gathered, create a formalized report, and determine which environment performed the best.
IV. Results
1. Present Results: Speed and Accuracy. The results of the research surpassed
our expectations for the experiment. At the onset of the experiment, we predicted that the
virtualized environments would be as accurate as the non-virtualized environment, but the virtual
environment would be less efficient in terms of speed. The reasoning behind this is that the
virtual environment would have fewer resources allocated. With fewer resources, it would stand to reason that the virtual environment would not be as fast. However, the results were completely
counterintuitive and the initial research showed the virtual environment was able to process the
cases faster. However, the Linux environment, which did perform faster than the non-virtualized
environment, did not produce the same levels of accuracy. The Linux environment continually
failed to discover the same number of files as the two Windows environments. The Linux
environment was tested with the 50 GB and 250 GB evidence packages, after which, we decided
to discontinue testing. As mentioned previously, the Linux environment failed to discover an
acceptable number of files and was deemed not accurate enough for further testing. The overall
results in speed are presented in the following graphs.
Figure 1 represents the total overall time each scenario took to complete processing in FTK. The
blue lines represent the basic Windows host environment with no virtualization. The red lines
represent a Windows host running a Windows guest virtual machine. In every sample the
Windows virtual machine outperformed the Windows host in total job time.
Figure 1. Total Job Time Per Each Environment and Scenario
[Bar chart omitted: total job time (h:mm:ss) for the 50 GB, 250 GB, and 500 GB scenarios; series: Windows host, Windows VM.]
The next graph, figure 2, shows the total processing time for each case. As mentioned
previously, processing is the act of enumerating the digital data and deciding where a file begins,
where a file ends, and what type of file it is. Similar to the total job time, the processing time
was faster on the virtual Windows machine than on the basic Windows host.
Figure 2. Total Processing Time Per Each Environment and Scenario
[Bar chart omitted: total processing time for each scenario; series: Windows host processing time, Windows VM processing time.]
The next graph, Figure 3, displays the total indexing time for each scenario. Indexing is the
process of taking all of the files that were processed in the previous step and creating a master file list of what each file is and where it is in relation to other files. This
portion of the processing allows for the quick search time present in FTK.
Figure 3. Total Indexing Time Per Each Environment and Scenario
[Bar chart omitted: total indexing time for each scenario; series: Windows host indexing time, Windows VM indexing time.]
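FTK's index format is proprietary, but conceptually it resembles an inverted index: a map from each term to the locations that contain it, so a keyword lookup never has to rescan the image. A minimal sketch of the idea:

    from collections import defaultdict

    def build_index(documents):
        # Map every word to the set of document IDs that contain it.
        index = defaultdict(set)
        for doc_id, text in documents.items():
            for word in text.lower().split():
                index[word].add(doc_id)
        return index

    docs = {1: "wire transfer to offshore account", 2: "routine payroll transfer"}
    index = build_index(docs)
    print(index["transfer"])  # {1, 2}, found without rescanning the documents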
The last graph, Figure 4, shows the post-processing time for each scenario. After processing has completed, FTK optimizes the indexing database. This is done by cleaning the database, which includes compacting the information and removing duplicate entries. The database FTK uses is built on the Structured Query Language (SQL), and it is common for SQL databases to run routine optimization to ensure they remain efficient.
Figure 4. Total Post Processing Time Per Each Environment and Scenario
[Bar chart omitted: total post-processing time for each scenario; series: Windows host post-processing time, Windows VM post-processing time.]
The accuracy of the virtualized environment compared to the non-virtualized environment was
exactly the same. Not only did the virtualized Windows environment find the same number of
files, this environment was also capable of correctly hashing all of the files of interest. Due to the large size of the files used to check the accuracy, it is not feasible to include a visual representation of them in this document. However, the file used to check accuracy can be found online.
V. Conclusion
Contrary to our initial assumptions, running FTK in a virtual machine was faster than running the
software in a non-virtualized environment. As stated at the beginning of this document, this
research is only the beginning of optimizing virtual environments for use in digital forensics.
At this time, we are unable to determine why the virtual environment was faster than the non-
virtualized environment. It is a possibility that having a virtual environment on the FRED
somehow slowed down the non-virtual environment and that FTK would run faster on a FRED
without any virtualization software present. From this preliminary trial, it does appear that the
virtual environment was the fastest and just as accurate as the non-virtual environment. This
certainly lays the groundwork for future research into this topic, as there are many variables that could
be tweaked. For further research, it would be prudent to determine what caused the discrepancy
in the number of files found in the Linux environment. Considering that we only tested one
Linux OS and that there are dozens of other Linux flavors, further research into the use of Linux would be worthwhile. Another area of research would be to continue optimizing the Windows
environment. This would include tweaking the memory allocation, turning off unnecessary
services, and testing scenarios with a wider variety of evidence files.
References
CFTT Methodology Overview. (2012, January 12). NIST Computer Forensic Tool Testing
Program. Retrieved October 22, 2013, from
http://www.cftt.nist.gov/Methodology_Overview.htm
Curtis, G. E. (2012). The law of cybercrimes and their investigations. Boca Raton: CRC Press.
Digital Forensic Services in the US Market Research. (n.d.). Market Research Reports. Retrieved
October 22, 2013, from http://www.ibisworld.com/industry/digital-forensic-services.html
Cohen, W. (2009, August 21). Enron Email Dataset. Retrieved October 22, 2013,
from https://www.cs.cmu.edu/~enron/
FBI, Brief History of the FBI. (n.d.). FBI.gov. Retrieved October 22, 2013, from
http://www.fbi.gov/about-us/history/brief-history
Giordano, J. (2002). Cyber Forensics: A Military Operations Perspective. International Journal
of Digital Evidence, 1(2). Retrieved October 22, 2013, from
http://www.utica.edu/academic/institutes/ecii/publications/articles/A04843F3-99E5-
632B-FF420389C0633B1B.pdf
Gloe, T., & Böhme, R. (2010). The ‘Dresden Image Database’ for benchmarking digital image
forensics. In Proceedings of the 25th Symposium on Applied Computing (ACM
SAC 2010) (Vol. 2, pp. 1585–1591).
Intel Hyper-Threading Technology. (n.d.). Retrieved November 14, 2013, from
http://www.intel.com/content/www/us/en/architecture-and-technology/hyper-
threading/hyper-threading-technology.html
Loth, S., Baumann, S., Lutz, C., & Heinrich, A. (2012). Bistability in Atomic-Scale
Antiferromagnets. Science, 335(6065). Retrieved October 22, 2013, from
http://www.sciencemag.org/content/335/6065/196.abstract
Merritt, R. (2013, August 27). Moore's Law Dead by 2022, Expert Says | EE Times. EE Times |
Electronic Engineering Times | Connecting the Global Electronics Community. Retrieved
October 22, 2013, from http://www.eetimes.com/document.asp?doc_id=1319330
Quantifying Hardware Selection in an FTK 4.0 Environment. (n.d.). DigitalIntelligence.com.
Retrieved October 22, 2013, from
www.digitalintelligence.com/files/FTK4_Recommendation.pdf
Pan, L., & Batten, L. (n.d.). Robust Correctness Testing for Digital Forensics Tools.
Roussev, V., & Richard, G. (2004). Breaking the Performance Wall: The Case for Distributed Digital Forensics. dfrws.org. Retrieved October 22, 2013, from dfrws.org/2004/day2/Golden-Perfromance.pdf
Sperling, E. (2010, October 4). The Limits Of Virtualization. Forbes. Retrieved October 22,
2013, from http://www.forbes.com/2010/10/01/enterprise-computing-challenges-
technology-cio-network-virtualization.html
The Economic Impact of Cybercrime and Cyber Espionage. (2013, July). McAfee. Retrieved October 22, 2013, from www.mcafee.com/us/resources/reports/rp-economic-impact-cybercrime.pdf
Trends & Data - Internet Retailer. (n.d.). Industry Strategies for Online Merchants – Internet
Retailer. Retrieved October 22, 2013, from
http://www.internetretailer.com/trends/sales/
Webb, K. (2003). A Rule Based Forecast of Hard Disk Drive Costs. Faculty Publications.
Paper 9. Retrieved October 22, 2013 from
http://scholarworks.sjsu.edu/cgi/viewcontent.cgi?article=1008&context=mis_pub
Worth, D. (2012, December 5). HP Discover: US Navy facing 110,000 cyber attacks every hour.
v3.co.uk. Retrieved October 22, 2013, from
www.v3.co.uk/v3-uk/news/2229651/hp-discover-us-navy-facing-110-000-cyber-attacks-
every-hour