8/17/2019 SLR 150613
A Systematic Literature Review for Topic Detection in Cyber-crime
Investigation.
Abstract
The most popular social networking sites for chatting are Facebook, Twitter,
Yahoo Messenger, and Skype. Users normally use chat conversations to
communicate with other people, to exchange ideas, and to hold discussions.
Recently, many cybercriminals have committed crimes through chat
conversations, such as money fraud, sexual harassment, cyber bullying, and
murder. This systematic literature review (SLR) is therefore carried out to
investigate the existing research in the chat forensics area, with a focus on
topic detection studies. An SLR is an evidence-based approach to conducting a
systematic review. The method follows SLR guidelines, which include defining
the research questions, the search process, the inclusion and exclusion criteria
of the study, quality assessment, data collection, data analysis, and deviations
from protocol. A total of 56 publications were found, and only 9 were selected
for the review. However, only 3 of these were carried out specifically for the
chat forensics area, while the other 6 address topic classification in chat
messages. Thus, the number of studies on topic detection in chat forensics is
considered limited.
Keywords: Systematic literature review, topic detection, chat forensics, cyber-crime investigation, chat message.
1 Introduction.
Online social networking is an innovative communication technology that is used
extensively by people all around the world. Popular social networking
applications include Facebook, Friendster, and Yahoo (Ho et al., 2009). Online
chatting is one of the services offered by online social networks, and it allows
a user to communicate with other users. Online chat is also known as Instant
Messaging (IM) and can be defined as a form of computer-mediated communication
that occurs in real time and requires the simultaneous participation of users
(Orebaugh and Allnutt, 2010). This means that users receive feedback or
responses directly, without having to wait, as long as the other person is still
connected to the chat service. A conversation may be one-to-one or one-to-many.
Chat messaging began with UNIX command-line applications and continued with
traditional client-based messaging programs before the growth of the web-based
chat programs that are popular today (Kiley et al., 2008). Most people connect
to online social networks to build social relationships with other people, for
example family, friends, and even new acquaintances. Users normally use chat
conversations to communicate with other people, to exchange ideas, and to hold
discussions.
Users can exchange text messages, images, and documents through chat. However,
from a criminal's point of view, online social networking is also a means of
committing crime. Criminals can easily hide behind virtual identities, which
means they may supply fake information about themselves. Searching for evidence
that identifies the criminals and their activities therefore becomes a difficult
process.
At present, 13,085,000 Malaysians have subscribed to Facebook, which places
Malaysia 18th in the world ranking of Facebook users by country (Facebook,
2012). Facebook added chat functionality to its features on April 23, 2008.
Facebook chat currently supports instant messaging clients such as Yahoo
Messenger, Skype, AOL Instant Messenger, and Live Messenger. This makes it an
attractive target for perpetrators. Some crimes, such as murder, drug
trafficking, and kidnapping, do not require an electronic or computing
component. Nevertheless, technology-based systems, including chat messaging
clients, can play a role in facilitating these and other common criminal
activities. Instant messaging clients provide an ideal setting for gathering
intelligence, and such information may enable criminals to execute their crimes,
for instance by determining that someone is a "suitable" victim. Online chatting
might also utili[...]ected when collecting as much information as possible from
suspect and victim machines (Kiley et al., 2008; Simon and Slay, 2010).
Therefore, the objective of this article is to carry out a systematic literature
review (SLR) of existing studies on chat messages for forensic investigation.
The review is based on the systematic reviews by Kitchenham (2009) and Bala et
al. (2013). It was conducted to list the research areas in chat forensics
investigation and the techniques used in each area. The following sections
present the method used in this SLR, the results of the SLR, a discussion of the
research questions, and the conclusion.
2 Method.
This article follows the SLR guidelines of Kitchenham (2009) and Bala et al.
(2013). This section describes the steps taken to prepare this systematic
literature review. The steps and guidelines mentioned by Kitchenham (2009)
include defining the research questions, the search process, the inclusion and
exclusion criteria of the study, quality assessment, data collection, data
analysis, and deviations from protocol.
2.1 Research question.
The research question is an important part of a systematic review because it
guides the entire review process. The research questions in this article follow
the question structure suggested by Kitchenham (2004), which covers population,
intervention, comparison, and outcome. This structure is also known as the PICO
paradigm and was implemented in the article by Bala et al. (2013). The
categories are defined as follows:
Population (P): The population is the application area, for example people,
project type, or application type. This article focuses on chat forensics.
Intervention (I): The intervention is the technology, software method, tool, or
procedure used in the selected area. In this context, the intervention is either
a digital forensics tool or a stylometric technique.
Comparison (C): The comparison contrasts the intervention with the procedure or
methodology used in the articles. This article compares the limitations of the
methods used in each experiment.
Outcome (O): The outcome defines the effect of the technology on each
experiment. In this article, the outcome is the best method or tool used in the
chat forensics area.
The research questions (RQ) addressed in this article are as follows:
RQ1: How many research articles related to chat forensics have been produced
since 2005?
RQ2: How many research articles related to topic detection for chat forensics
have been produced since 2005?
RQ3: What techniques or methods are used in the related studies?
RQ4: What are the limitations of the studies?
As for RQ1, the question is derived from the first element of the question structure, which falls
under the population category. The purpose of this question is to analy
Table 1 - Closely related keywords
Keywords | Closely related keywords
Chat | AOL Instant Messenger, MSN Messenger, Yahoo Messenger, IRC channel, Instant messenger, IM, Windows Live Messenger, Pidgin Messenger, Online messages, Trillian, computer mediated communication, social networks, online messages, Google talk, Skype, textual communication, unstructured text.
Digital forensics analysis | Authorship analysis, stylometric, classification technique, contact identification, topic identification, threat detection.
Stylometric | Writing Style, write print.
Technique | Model, framework.
Most appropriate model | High accuracy, compatible, applicable.
Authorship analysis | Author identification, gender prediction, gender identification, author attribution.
Digital forensics | Cybercrime investigation, cyber forensics.
The closely related review articles were then selected manually from the search
results of the digital database sources.
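One way to turn keyword groups such as those in Table 1 into a database search string is to OR the terms inside a group and AND the groups together. The sketch below is illustrative only: the article does not state the exact search strings that were used, the `build_query` helper is a hypothetical name, and only a small subset of the Table 1 terms is included.

```python
# Hypothetical subset of Table 1: each keyword maps to closely related terms.
related = {
    "chat": ["instant messenger", "IM", "IRC channel"],
    "digital forensics": ["cybercrime investigation", "cyber forensics"],
}


def build_query(groups):
    """OR the terms within each group, then AND the groups together."""
    clauses = []
    for keyword, synonyms in groups.items():
        terms = [keyword] + synonyms
        clauses.append("(" + " OR ".join(f'"{term}"' for term in terms) + ")")
    return " AND ".join(clauses)


query = build_query(related)
print(query)
```

Each digital library has its own query syntax, so a string built this way would normally need adjusting per database before use.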
2.3 Inclusion and exclusion criteria.
The inclusion and exclusion criteria are defined to specify how the review
articles are selected. An article was selected if its title and abstract were
related to chat forensics and to topic detection in chat messages, since these
are the focus of this article. This is the inclusion criterion. The exclusion
criterion is that any article not related to digital forensics is excluded
during the selection process. However, an article can still be selected for
review if its work is applicable to the digital forensics area, even though
forensics is not its main area.
3 Result.
Three forms of results are shown in this section: the summary of the search
process, followed by the results of the quality assessment and the quality
factors.
3.1 Search results.
After the search process was completed, 56 published articles were found in the
digital library databases. The articles were then divided by the topic area they
discuss: authorship analysis, topic detection, and message attribution.
Authorship analysis had 21 articles, topic detection had 12 articles, and
message attribution had 16 articles. The remaining chat forensics articles,
which cover different topic areas, were combined into an "other" category
containing seven published articles. Table 2 summarises the search results,
showing the number of publications found in each digital library database and in
each area of study.
Table 2 - Summary of Search Result
Database name | No. of publications found | Authorship analysis | Topic detection | Message attribution | Other
IEEE Xplore | 11 | 4 | 4 | 4 | 1
Springer Link | 15 | 8 | 2 | 4 | 1
Science Direct | 16 | 5 | 2 | 8 | 1
ACM digital library | 2 | 2 | 0 | 0 | 0
Emerald | 1 | 0 | 2 | 0 | 0
Wiley | 1 | 1 | 0 | 0 | 0
Google Scholar | 5 | 1 | 2 | 0 | 4
Total | 56 | 21 | 12 | 16 | 7
Although 12 articles were found for topic detection, only nine were selected
for the review. Three articles were excluded because one unselected article
6 | and Kose | 0 | determination of chat conversations' topic in Turkish text based chat mediums | messenger log files and mIRC.
S7 | Miah et al. | 2011 | Detection of child exploiting chats from a mixed chat dataset as a text classification task | Chat-logs from Perverted Justice Foundation Incorporated (PJFI), and collection of anonymous chats from websites like http://www.fugly.com and http://chatdump.com.
S8 | Chen et al. | 2012 | A Topic Detection Method Based on Semantic Dependency Distance and PLSA | A real world interactive text set collected from a QQ group named 'Linux group' (QQ chat).
S9 | M. A. Basher and C. M. Fung | 2013 | Analy
- "No" indicates that the answer to the question is negative or that the authors
did not address the question in the article. The score assigned for this answer
is 0.
- "Partially" indicates that the article addresses the question only implicitly
or obscurely. The score assigned for this answer is 0.5.
Table 4 shows the results of the quality assessment for the reviewed articles.
All studies scored 4 and above, with three studies scoring 4, one study scoring
4.5, and one study scoring 5. The results also show the percentage of
compliance, which was 80% and above.
Table 4 - Quality assessment result
ID | QA1 | QA2 | QA3 | QA4 | QA5 | QA6 | Total | Percentage of Compliance (%)
S1 | 1 | 1 | 0.5 | 1 | 0.5 | 1 | 5 | 83.33
S2 | 1 | 1 | 0.5 | 0.5 | 0.5 | 0 | 3.5 | 58.33
S3 | 1 | 1 | 1 | 0.5 | 0.5 | 1 | 5 | 83.33
S4 | 1 | 0.5 | 1 | 1 | 1 | 1 | 5.5 | 91.67
S5 | 1 | 1 | 0.5 | 0.5 | 0.5 | 1 | 4.5 | 75.00
S6 | 1 | 1 | 1 | 1 | 0.5 | 1 | 5.5 | 91.67
S7 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 100.00
S8 | 1 | 1 | 1 | 0.5 | 0.5 | 1 | 5 | 83.33
S9 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 100.00
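The percentage of compliance reported for each study is consistent with dividing the study's total score by the maximum of 6, one point per quality assessment question. A minimal sketch, assuming that formula and using three of the nine score rows; the `compliance` helper is a hypothetical name, not something defined in the article:

```python
# Per-question scores (QA1..QA6) for three of the reviewed studies.
scores = {
    "S1": [1, 1, 0.5, 1, 0.5, 1],
    "S2": [1, 1, 0.5, 0.5, 0.5, 0],
    "S4": [1, 0.5, 1, 1, 1, 1],
}


def compliance(answers):
    """Return (total score, percentage of the maximum possible score)."""
    total = sum(answers)
    # Assumption: every question is worth at most 1 point.
    return total, round(100 * total / len(answers), 2)


for study, answers in scores.items():
    print(study, compliance(answers))
```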
3.3 Quality factors.
This section follows the article by Kitchenham et al. (2009). The relationship
between the quality scores of the published articles and their years of
publication is investigated, as shown in Table 5.
Table 5 - Average quality score for published articles by year of publication.
Years | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013
Number of Studies | 1 | 0 | 2 | 1 | 2 | 1 | 1 | 1
Mean quality score | 5 | 0 | 4.25 | 5.5 | 5 | 6 | 5 | 6
Standard deviation of quality score | 0 | 0 | 1.06 | 0 | 0.71 | 0 | 0 | 0
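The non-zero standard deviations of 1.06 and 0.71 match a sample standard deviation (dividing by n - 1) over two scores. The sketch below reproduces them under an assumption: the article does not list which studies fall in which year, so the score pairs are inferred as the only totals from the quality assessment that give the reported means for 2008 and 2010.

```python
from statistics import mean, stdev

# Assumed grouping of quality totals by publication year (inferred, not stated).
scores_by_year = {
    2008: [3.5, 5.0],
    2010: [4.5, 5.5],
    2011: [6.0],
}


def summarise(scores):
    """Return (mean, sample standard deviation); 0 is used for a single score."""
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return round(mean(scores), 2), round(spread, 2)


for year, scores in scores_by_year.items():
    print(year, summarise(scores))
```

A year with a single study has no spread to measure, which is why the helper returns 0 in that case rather than calling `stdev` on one value.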
The standard deviation of the quality score was 0 for most years because at
most one article on topic detection in chat messages was reviewed for each of
those years.
4 Discussion.
4.1 How many research articles related to chat forensics have been produced
since 2005?
Table 2 and Table 6 show that 56 published articles related to chat forensics
were found. Publications appeared every year across the respective topic areas.
The topic areas include authorship analysis, topic detection, message
attribution, threat detection, monitoring systems, data forgery detection, and
social network security.
Authorship attribution can be defined as the process of examining the
characteristics of a document to find or validate its author. Studies of
authorship attribution can be divided into three categories (Zheng, 2006;
Orebaugh and Allnutt, 2010; Nirkhi et al., 2012):
• Authorship identification: identifying the real author of a text message by
examining other samples of text by a particular author.
• Authorship characteri
Topic detection, also known as topic classification, is the process of tracing
the main topic discussed in a conversation. The difference between the two areas
is that authorship attribution is concerned with detecting and attributing the
author, while topic detection focuses on the content of the conversation in a
chat message.
Message attribution is the process of examining chat message logs to find
artifacts that can be used as evidence, for example time, user name, data
exchange, and IP address (Dickson, 2006 (1), 2006 (2)).
Table 6 - Number of studies according to year of publication.
Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | Total
Authorship analysis | 0 | 4 | 2 | 3 | 2 | 2 | 7 | 1 | 0 | 21
Topic detection | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | 12
Message attribution | 0 | 4 | 3 | 1 | 1 | 3 | 2 | 2 | 0 | 16
Other | 1 | 1 | 0 | 0 | 2 | 1 | 2 | 0 | 0 | 7
Number of publications | 2 | 11 | 6 | 6 | 6 | 8 | 12 | 4 | 1 | 56
Table 6 shows that authorship analysis had the highest number of publications
with 21 studies, followed by message attribution with 16 studies and topic
detection with 12 studies, whereas the other topic areas had only seven
published articles. The results show that most studies were carried out for
criminal identification purposes.
4.2 How many research articles related to topic detection for chat forensics
have been produced since 2005?
Section 3.1 mentioned that 12 publications were found for the topic detection
area, but only nine were used in this systematic literature review. Table 7
shows the number of publications that specifically address chat forensics and
the number that focus generally on topic detection in chat.
Table 7 - Average quality score for published articles based on chat forensics purpose.
Published for chat forensics.
Published generally for te
contextual features. Three basic approaches were tried in the study: n-grams,
foul language, and TF-IDF features.
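TF-IDF, one of the feature types named above, weights a term by how often it occurs in a message and how rare it is across all messages. The sketch below uses the textbook form tf * log(N / df) with whitespace tokenisation; this is an assumption for illustration, since the reviewed study's exact weighting and preprocessing are not given here, and the sample messages are invented.

```python
import math
from collections import Counter


def tf_idf(messages):
    """Return one {term: weight} dict per message using tf * log(N / df)."""
    tokenized = [message.lower().split() for message in messages]
    n_docs = len(tokenized)
    # Document frequency: number of messages that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented sample messages, for illustration only.
messages = ["meet me after school", "send me the report"]
weights = tf_idf(messages)
print(weights[0])
```

A term that appears in every message, such as "me" here, receives a weight of zero, so the remaining terms are the ones that separate one conversation topic from another.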
Unlike the other studies, Chen et al. (2012) implement statistical techniques
for information retrieval, integrating semantic dependency distance (SDD) and
probabilistic latent semantic analysis (PLSA) for topic detection in Chinese
chat.
4.4 What are the limitations of the study?
After the selected articles were thoroughly examined, a few aspects were
identified as limitations of topic detection in chat messages for forensic
investigation.
Language of chat data: Current studies focused on English chat data (Dong et
al., 2006; Hui et al., 2008; Miah et al., 2012; and M. A. Basher and C. M. Fung,
2013) while other languages were Turkish
approach had been demonstrated with the best performance for text
classification purposes (Hui et al., 2008). A few limitations of the existing
studies are listed in Section 4.4.
Appendix - Unselected studies
No. | Author | Year | Title | Reason for rejection
1 | Xiong et al. | 2005 | Web-chat monitor system-research and implementation | Monitoring system.
2
13 | Chaski | 2007 | The keyboard dilemma and authorship attribution | Authorship analysis
14 | Kose et al. | 2007 | Mining chat conversations for sex identification | Authorship analysis
15 | Dickson | 2007 | An examination into trillian basic 3.x contact identification | Message attribution
16 | Dongen | 2007 | Forensic artefacts left by pidgin messenger 2.0 | Message attribution
17 | Dongen | 2007 | Forensic artefacts left by windows live messenger 8.0 | Message attribution
18 | Okolica et al. | 2007 | Using author topic to detect insider threats from email traffic | Topic detection on email
19 | Kucukyilmaz et al. | 2008 | Chat mining: predicting user and message attributes in computer-mediated communication | Authorship analysis
20 | Iqbal et al. | 2008 | A novel approach of mining write-prints for authorship attribution in e-mail forensics | Authorship analysis
21 | Kose et al. | 2008 | A comparison of textual data mining methods for sex identification in chat conversations | Authorship analysis
22 | Kiley et al. | 2008 | Forensics analysis of volatile instant messaging | Message attribution
23 | Marjuni et al. | 2009 | Lexical criminal identification for chatting corpus | Authorship analysis
24 | Cheng et al. | 2009 | Gender identification from e-mails | Authorship analysis
25 | Ho et al. | 2009 | Identifying google talk packets | Message attribution
26 | Cheng et al. | 2009 | Forensics tools for social network security solutions | Social network security
27 | Silva et al. | 2009 | Virtual forensics: social network security solutions | Social network security
28 | Orebaugh and Allnutt | 2010 | Data mining instant messaging communications to perform author identification for cybercrime investigations | Authorship analysis
29 | Iqbal et al. | 2010 | Mining writeprints from anonymous e-mails for forensic investigation | Authorship analysis
30 | Yang et al. | 2010 | Forensic analysis of popular chinese internet application | Message attribution
31 | Husain & Sridhar | 2010 | Iforensics: forensic analysis of instant messaging on smart phones | Message attribution
32 | Simon and Slay | 2010 | Recovery of skype application activity data from physical memory | Message attribution
33 | Kontostathis et al. | 2010 | Text mining and cybercrime | Crime classification
34 | Iqbal et al. | 2011 | A unified data mining solution for authorship analysis in anonymous textual communications | Authorship analysis
35 | Cheng et al. | 2011 | Author gender identification from text | Authorship analysis
36 | Ali et al. | 2011 | Evaluation of authorship attribution software on a chat bot corpus | Authorship analysis
37 | Hariharan and Rani.K.R | 2011 | Gender prediction in chat based medium's using text mining | Authorship analysis
38 | Peersman et al. | 2011 | Predicting age and gender in online social networks | Authorship analysis
39 | Ding et al. | 2011 | User identification for instant messages | Authorship analysis
40 | Pateriya et al. | 2011 | Author identification of email forensic in service oriented architecture | Authorship analysis
41 | Mutawa et al. | 2011 | Forensic artifacts of facebook's instant messaging service | Message attribution
42 | Simon and Slay | 2011 | Recovery of pidgin chat communication artefacts from physical memory (a pilot test to determine feasibility) | Message attribution
43 | Nirkhi et al. | 2012 | Analysis of online messages for identity tracing in cybercrime investigation | Authorship analysis
44 | Mutawa et al. | 2012 | Forensic analysis of social networking applications on mobile devices | Message attribution
45 | Levendoski et al. | 2012 | Yahoo Messenger forensics on windows vista and windows 7 | Message attribution
46 | Al-Zaidy | 2012 | Forensic analysis of social networking applications on mobile devices | Discovering criminal network
47 | Teng and Lin | 2012 | Skype chat data forgery detection | Data forgery detection