Research Article
Method for Detecting Core Malware Sites Related to Biomedical Information Systems
Dohoon Kim, Donghee Choi, and Jonghyun Jin
Agency for Defense Development, Daejeon 305-600, Republic of Korea
Correspondence should be addressed to Dohoon Kim; karmy01addrekr
Received 5 December 2014; Accepted 17 February 2015
Academic Editor: Joongheon Kim
Copyright © 2015 Dohoon Kim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Most advanced persistent threat attacks target web users through malicious code within landing (exploit) or distribution sites. There is an urgent need to block the affected websites. Attacks on biomedical information systems are no exception to this issue. In this paper, we present a method for locating malicious websites that attempt to attack biomedical information systems. Our approach uses malicious code crawling to rearrange websites in the order of their risk index by analyzing the centrality between malware sites and proactively eliminates the root of these sites by finding the core-hub node, thereby reducing unnecessary security policies. In particular, we dynamically estimate the risk index of the affected websites by analyzing various centrality measures and converting them into a single quantified vector. The proactive elimination of core malicious websites results in an average improvement in zero-day attack detection of more than 20%.
1. Introduction
Various types of cyber-attacks have recently been attempted on biomedical information systems [1, 2]. This is mainly because the personal records included in biomedical systems represent valuable financial information.
Unfortunately, current network security solutions are more vulnerable to advanced intelligent cyber-attacks [3] than to traditional cyber-attacks (e.g., distributed denial of service and spam). Because advanced persistent threat (APT) attacks [4, 5] are concentrated on the weak point of the target and the context, it is very hard to establish which APT attack detection method and defense system are most appropriate for biomedical information systems.
APT attacks are generally administered through malicious code exploit/landing/distribution sites, and infected user (or administrator) PCs [6] easily give up contacts to biomedical information systems. Therefore, it is necessary to preisolate the contact points by which malicious code is disseminated, that is, the exploit/landing/distribution sites, to defend against these targeted attacks and protect biomedical information systems.
To defend against APT attacks on biomedical information systems, it is vital to analyze the way in which the network between medical websites and related websites is formed. This is because APT attacks make use of various sociotechnological methods [7] and create as many links as possible with medical service users (patients), medical staff, and related people via various contacts. Above all, administrators should detect malicious code targeted at biomedical information systems at an early stage and block the core-hub node in order to cope with APT attacks.
Therefore, this paper proposes a methodology that blocks and eliminates malicious code at an early stage by detecting the core-hub node at the root of the network between the biomedical information system-targeted malicious code exploit/landing/distribution site and the related websites. This paper also employs network analysis to estimate and manage the risk index of the detected malware sites by determining the potential risk factor of each exploit/landing/distribution point. In particular, we present a method for reprocessing malicious code so that it can be used as a reference in terms of malicious code detection and management.
Furthermore, this paper supports the efficient classification/application and management of massive blacklists in terms of biomedical information system-targeted malware sites. In this paper, we measure the risk index of websites with links to biomedical information systems and produce a malicious URL risk index (MRI) from this reference index.
Hindawi Publishing Corporation, Computational and Mathematical Methods in Medicine, Volume 2015, Article ID 756842, 8 pages, http://dx.doi.org/10.1155/2015/756842
Figure 1: Definition of landing (or exploit)/distribution sites, including malicious code. (Diagram: internet users reach a landing site/distribution site, pass through hopping sites in the hopping section and exploit sites in the exploit section, and arrive at distribution sites; the legend distinguishes redirection, web connection, and malicious code dissemination.)
2. Background
To detect the core-hub node, it is first necessary to understand the entire framework of malicious code distribution and infection through malicious websites. It is also important to understand the typical methods of detecting such websites and to appreciate certain risk estimation methods for the detection of malicious sites.
2.1. Malware Site Framework. To estimate the risk index of a malware site, we need to understand the dissemination route. Figure 1 illustrates the definition and operation principles of the malware site detection framework, which is the basis for risk index estimation.
As shown in Figure 1, the victim (i.e., internet user) first visits the landing site connected with the distribution site, is then redirected to a hopping site or exploit site, and finally downloads the malicious code. The internet user is eventually infected by the malicious code and may be damaged by various secondary cyber-attacks (e.g., personal information leaks, system destruction, and other host-derived attacks).
2.2. Web Crawling-Based Malicious Site Detection. Most studies on malware sites have focused on detection. These studies primarily apply a web crawling method that rapidly collects the URL information of websites through a web crawler-based search engine [8, 9]. However, unlike the web crawling applied by search engines, the web crawling technology used for malicious code collection selects and collects the execution files or compressed files that contain the malicious code.
The web crawler considers URLs with a file extension of .exe, or HTTP headers with an "application/octet-stream" content type, to be execution files and downloads them. The crawler then inspects the headers of the downloaded files to confirm whether they are execution files. Compressed files and MS installation files are inspected and downloaded in the same way as execution files.
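The two-stage check described above can be sketched as follows. This is only an illustration, not the crawler used in this paper: the function names are hypothetical, and the use of the "MZ" magic bytes as the file-header test is an assumption, since the text only states that the headers of downloaded files are inspected.

```python
from urllib.parse import urlparse

# Assumption: "MZ" is the leading signature of Windows executables; the paper
# does not say which header bytes its crawler actually inspects.
PE_MAGIC = b"MZ"


def looks_like_executable(url: str, content_type: str = "") -> bool:
    """Pre-download check: .exe extension or application/octet-stream type."""
    path = urlparse(url).path.lower()
    mime = content_type.split(";")[0].strip().lower()
    return path.endswith(".exe") or mime == "application/octet-stream"


def confirm_executable(payload: bytes) -> bool:
    """Post-download check on the first bytes of the downloaded file."""
    return payload.startswith(PE_MAGIC)
```

A crawler built this way would call `looks_like_executable` on each discovered URL and response header, download only the candidates, and keep those for which `confirm_executable` holds.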
A number of web crawling-based automatic malicious code collection techniques have been proposed, most of which search websites via web crawling, confirm whether the websites include malicious code, and then download/analyze the relevant content.
3. Analysis of the Risk Index of Biomedical Information System-Related Malware Sites
We first propose a method for estimating the risk index of biomedical information system-targeted malware sites and estimate the ultimate risk index by analyzing the potential threat through a correlation analysis between the distribution site and the other connected sites.
The following sections describe our approach for predicting the risk index of the exploit/landing sites that redistribute the malicious code. The risk of individual exploit/landing sites is calculated through this prediction.
3.1. Vector-Based Risk Index Estimation Method. We employ a risk vector calculation to estimate the risk index [10]. Just as a planar vector is indicated by arranging two real numbers, a three-dimensional vector is indicated by arranging three real numbers in the rectangular coordinate system.
Spatial rectangular coordinates are indicated by arranging three real numbers along axes that are orthogonal to each other through the origin $O$.
Figure 2: Entire analysis diagram for malicious code landing (or exploit)/distribution site risk estimation. (Three-dimensional plot with origin $O$ and axes $x$ (eigenvector centrality), $y$ (betweenness centrality), and $z$ (degree centrality); labeled vectors V1, V2, MRI, and MRI′.)
We fix the three coordinate axes $x$, $y$, and $z$, set the positive direction of the $x$, $y$, and $z$ axes, and then define the length scale.
As shown in Figure 2, three vectors (connectivity, eigenvector, and betweenness) are used to estimate the risk index of malicious code landing (or exploit)/distribution sites, and the length is indicated by the vector sum [10]. The purpose is to express the different vector values as lengths and thereby quantify the risk index.
We thus determine which sites have the highest risk index and find the significance-based concentration degree of the corresponding sites by analyzing the central structure of the exploit/landing/distribution sites within malicious code that is connected to medical information systems. To interpret various meanings more objectively, this paper represents a risk factor and estimates the ultimate risk index by analyzing the connectivity [11–13] (degree, eigenvector, and betweenness) of the distribution site and exploit/landing site and vectorizing the calculated value. We now define each element of the risk index for the detected malicious code exploit/landing/distribution sites.
(i) Degree Centrality Analysis of Nodes. This is defined as the number of links incident upon a node. The degree can be interpreted in terms of the immediate risk of a node catching whatever is flowing through the network (such as malware sites). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely, the in-degree and out-degree centrality.
(ii) Eigenvector Centrality Analysis of Nodes. This measures the influence of a node within a network. Relative scores are assigned to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.
(iii) Betweenness Centrality Analysis of Nodes. This is the number of shortest paths from all vertices to all others that pass through that node. A node with high betweenness centrality has a large influence on the transfer of items through the network, under the assumption that the transfer of items follows the shortest path.
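To make the in-degree/out-degree distinction for directed networks concrete, the following minimal sketch counts both for a small redirection chain. The node names and the link list are hypothetical and are not taken from the crawl data of this paper.

```python
from collections import defaultdict


def degree_counts(edges):
    """Return {node: (in_degree, out_degree)} for directed (source, target) links."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1   # link leaves src
        in_deg[dst] += 1    # link arrives at dst
        nodes.update((src, dst))
    return {n: (in_deg[n], out_deg[n]) for n in nodes}


# Hypothetical chain: two landing pages redirect to one exploit page,
# which in turn pulls code from one distribution site.
links = [("landing_a", "exploit"), ("landing_b", "exploit"), ("exploit", "dist")]
counts = degree_counts(links)
```

In this toy chain the exploit page has in-degree 2 and out-degree 1, while the distribution site only receives links.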
3.2. Method to Estimate Malicious URL Risk Index (MRI). To estimate the risk index of the URL of a malicious code exploit/landing/distribution site, we follow the process in Figure 3.
(1) Step 1: Node Characteristic Classification. Landing (or exploit)/distribution site information is classified by the logs produced through the self-developed malicious code detection crawler, and the detection history is sorted by time from the unit logs of the malicious code exploit/landing/distribution site. The basic risk is also estimated with the following log information.
(A) Node Characteristic. Whether the infected site is an exploit/landing site or a distribution site is confirmed. If there is no link to the detected malicious code (i.e., the information on the first infected site), the site is defined as a distribution site. If the URL of another site is exploited/distributed, the site is defined as an exploit/landing site.
(B) Malicious Code Exploit/Landing/Distribution Site Information. This is the URL of the detected malicious code exploit/landing/distribution site. The exploit/landing site can be the distribution site. If the distribution site is eliminated by a self-developed or other detection system, the exploit/landing site is rendered as the distribution site and operated continuously as a malicious code distribution site.
(C) IP Address, Country Code, & Site Survivability. Basic information is collected through the IP address and the related server location, and the current operating status is investigated. In particular, the survivability of the exploit/landing/distribution site is very important in estimating the risk index. Even if the site has been treated or isolated and is no longer operated, the possibility of reinfection exists if the weak point is exposed continuously. Therefore, this should be reflected in the risk index estimation.
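The log information listed in Step 1 can be pictured as a small record type. The sketch below is one possible reading, not the log format of the crawler used in this paper: the field names are hypothetical, and the classification rule (no linked URL means distribution site, otherwise exploit/landing site) is an interpretation of item (A).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectionLog:
    url: str                   # URL of the detected site (item B)
    linked_url: Optional[str]  # URL of another site it points to, if any (item A)
    ip_address: str            # item C
    country_code: str          # item C
    alive: bool                # current operating status (item C)
    detected_at: str           # ISO 8601 timestamp, so string order is time order


def node_type(log: DetectionLog) -> str:
    """Assumed rule: a site with no onward link is a distribution site."""
    return "distribution" if log.linked_url is None else "exploit/landing"


def sort_history(logs: List[DetectionLog]) -> List[DetectionLog]:
    """Sort the detection history by time."""
    return sorted(logs, key=lambda log: log.detected_at)


# Hypothetical entries using reserved example domains and addresses.
logs = [
    DetectionLog("http://landing.example/", "http://dist.example/a.exe",
                 "192.0.2.10", "KR", True, "2014-03-02T10:00:00"),
    DetectionLog("http://dist.example/a.exe", None,
                 "192.0.2.20", "KR", True, "2014-03-01T09:30:00"),
]
history = sort_history(logs)
```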
(2) Step 2: Centrality Analysis of Node. The following three indices are applied to the centrality analysis of each node:
(i) Degree Centrality Analysis
(ii) Eigenvector Centrality Analysis
(iii) Betweenness Centrality Analysis
Figure 3: Entire analysis diagram for risk estimation of malicious code exploit/landing/distribution site. (Flow diagram with the following elements: crawling DB; Step 1, node characteristic classification; Step 2, degree, eigenvector, and betweenness centrality analysis of node; Step 3, 1st order risk analysis; Step 4, 2nd order risk analysis with distribution site risk analysis, exploit site risk analysis, and weight value calculation; Step 5, MRI estimation.)
(A) Degree Centrality Index (DCI)
(i) A node that has more directly connected neighboring nodes has higher degree centrality. The scale of direct effects is measured.
(ii) Degree centrality is calculated from the composition ratio of each node:
$$\mathrm{DCI} = \frac{\sum (\text{weight of incident link})}{(\text{number of nodes}) - 1}, \qquad \text{time complexity: } O(n) \tag{1}$$
(B) Eigenvector Centrality Index (ECI)
(i) Assume that the number of links included in node $N_j$ is $l_j$. If one of these links is connected to node $N_i$, the probability that $N_j$ passes $N_i$ is $1/l_j$. Therefore, the ultimate ECI is as follows:
$$\mathrm{ECI} = I(N_i) = \sum \frac{I(N_j)}{l_j} \tag{2}$$
(C) Betweenness Centrality Index (BCI)
(i) The BCI measures the degree to which a node is located on the shortest routes between other nodes.
(ii) The betweenness centrality of a node is higher if the node connects more different node groups. The BCI indicates the degree to which a node functions as a bridge in the entire network.
(iii) It is possible to find the intermediate URL that links information between fields.
(iv) Suppose that $g_{jk}$ is the number of shortest routes between nodes $j$ and $k$ in the network and $g_{jk}(n_i)$ is the number of those shortest routes that include node $i$. The proportion of shortest routes between $j$ and $k$ that include node $i$ is $g_{jk}(n_i)/g_{jk}$, and
$$\mathrm{BCI} = C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}} \tag{3}$$
If the main target node is constructed as a child node of depth 1, the degree will be increased. However, the BCI will be decreased by (3).
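The three indices of Step 2 can be sketched on a toy graph as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: the graph is taken to be undirected with unit link weights and is stored as a hypothetical adjacency dictionary, the sum in (2) is read as running over the neighbors of each node, the averaging step inside `eci` is an addition that keeps the same fixed point while avoiding oscillation, and `bci` enumerates node pairs by brute force, which is practical only for small graphs.

```python
from collections import deque
from itertools import combinations


def dci(adj):
    """Eq. (1) with unit link weights: degree / (number of nodes - 1)."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}


def eci(adj, iterations=200):
    """Iterate Eq. (2): I(N_i) = sum over neighbours N_j of I(N_j) / l_j."""
    score = {v: 1.0 / len(adj) for v in adj}
    for _ in range(iterations):
        update = {v: sum(score[u] / len(adj[u]) for u in adj[v]) for v in adj}
        # Assumption: averaging with the previous scores keeps the same fixed
        # point but prevents oscillation on bipartite graphs.
        score = {v: 0.5 * score[v] + 0.5 * update[v] for v in adj}
    return score


def _bfs(adj, source):
    """Distances and shortest-path counts from source to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma


def bci(adj):
    """Eq. (3): sum over pairs j < k of g_jk(n_i) / g_jk."""
    info = {v: _bfs(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for j, k in combinations(sorted(adj), 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # no route between j and k
        dist_k, sigma_k = info[k]
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            if dist_j[i] + dist_k[i] == dist_j[k]:  # i lies on a shortest route
                score[i] += sigma_j[i] * sigma_k[i] / sigma_j[k]
    return score


# Hypothetical network: hub "d" linked to three sites, plus one link e1-e2.
graph = {"d": ["e1", "e2", "e3"], "e1": ["d", "e2"], "e2": ["d", "e1"], "e3": ["d"]}
```

On this toy graph the hub `d` obtains the highest value for all three indices. On an undirected connected graph, the fixed point of (2) is proportional to the node degree, which is why `eci` ranks nodes in the same order as `dci` here.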
(3) Step 3: 1st Order Risk Analysis. The 1st order risk is estimated as the Euclidean length of the vector formed by the three centrality values calculated in Step 2:
$$r_1 = \sqrt{\mathrm{DCI}^2 + \mathrm{ECI}^2 + \mathrm{BCI}^2} \tag{4}$$
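As a minimal sketch of (4), the 1st order risk is the length of the centrality vector. The function name and the input values below are illustrative only.

```python
import math


def first_order_risk(dci: float, eci: float, bci: float) -> float:
    """Eq. (4): Euclidean length of the (DCI, ECI, BCI) vector."""
    return math.sqrt(dci ** 2 + eci ** 2 + bci ** 2)


# Illustrative values, not measurements from this paper.
r1 = first_order_risk(0.3, 0.4, 0.0)
```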
(4) Step 4: 2nd Order Risk Analysis
(A) Distribution Site Risk Analysis. The risk index is estimated by considering the weights (overlapped infection history and survival ratio) based on the 1st order risk analysis. The distribution site risk is calculated from the vector of the values calculated in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual survival ratio ($S$).
* The survival ratio ($S$) reflects whether treatment has been given after infection (based on one year's information):
$$\text{Treatment Probability } (S_1) = \frac{\text{Survival Cases}}{\text{Survival Cases} + \text{Treated Cases}}$$
$$\text{Failure Probability } (S_2) = \frac{\text{Treated Cases}}{\text{Survival Cases} + \text{Treated Cases}}$$
$$r_2 = \begin{cases} r_1 \times I \times S_1 & \text{if the node has been treated} \\ r_1 \times I \times S_2 & \text{if the node has not been treated} \end{cases} \tag{5}$$
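The distribution site weighting in (5) can be sketched as follows. The function and parameter names are hypothetical, and the case counts in the example are illustrative; the mapping of $S_1$ and $S_2$ to survival and treated cases follows the formulas exactly as printed.

```python
import math


def survival_weights(survival_cases: int, treated_cases: int):
    """Return (S1, S2) as defined in Eq. (5)."""
    total = survival_cases + treated_cases
    if total == 0:
        raise ValueError("no cases recorded for this node")
    return survival_cases / total, treated_cases / total


def distribution_site_risk(r1, infection_history, survival_cases,
                           treated_cases, treated):
    """Eq. (5): r2 = r1 * I * S1 if the node was treated, else r1 * I * S2."""
    s1, s2 = survival_weights(survival_cases, treated_cases)
    return r1 * infection_history * (s1 if treated else s2)


# Illustrative inputs: r1 = 0.5, I = 2, three survival cases, one treated case.
r2_treated = distribution_site_risk(0.5, 2, 3, 1, treated=True)
r2_untreated = distribution_site_risk(0.5, 2, 3, 1, treated=False)
```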
(B) Exploit/Landing Site Risk Analysis. The risk index is estimated by considering the weights (overlapped infection history and exposure frequency) from the 1st order risk analysis. The exploit site risk is calculated from the vector of values calculated in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual exposure frequency in a search website ($E$):
$$r_3 = r_1 \times \left( \frac{2 \times I \times E}{I + E} \right) \quad \text{or} \quad r_3 = r_1 \times I \tag{6}$$
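A minimal sketch of (6) follows. The first form weights $r_1$ by the harmonic mean of $I$ and $E$. The text does not state when the second form applies; the sketch assumes it is the fallback when no exposure frequency is available, and the function name is hypothetical.

```python
import math
from typing import Optional


def exploit_site_risk(r1: float, infection_history: float,
                      exposure: Optional[float] = None) -> float:
    """Eq. (6): r1 * 2IE / (I + E), or r1 * I when E is unavailable (assumed)."""
    i, e = infection_history, exposure
    if e is None or i + e == 0:
        return r1 * i
    return r1 * (2 * i * e) / (i + e)


# Illustrative inputs: r1 = 0.5, I = 2, E = 6.
r3 = exploit_site_risk(0.5, 2, 6)
r3_fallback = exploit_site_risk(0.5, 2)
```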
(5) Step 5: Malicious URL Risk Index (MRI). The MRI is estimated from the 1st order risk analysis result and the risk index. The following formula can be deduced from the 1st order risk analysis result calculated in Step 3 and the risk index of each distribution/exploit site calculated in Step 4, considering the characteristics of the corresponding node:
$$r_{\text{final}} = \sqrt{r_1^2 + r_2^2 + r_3^2} \tag{7}$$
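As a sketch of (7), the final index combines the three risk values in the same way that (4) combines the centralities. Passing 0.0 for a component that does not apply to a node (for example, $r_3$ for a pure distribution site) is an assumption of this sketch; the text does not specify that case.

```python
import math


def malicious_url_risk_index(r1: float, r2: float = 0.0, r3: float = 0.0) -> float:
    """Eq. (7): r_final = sqrt(r1^2 + r2^2 + r3^2)."""
    return math.sqrt(r1 * r1 + r2 * r2 + r3 * r3)


# Illustrative values only.
mri = malicious_url_risk_index(1.0, 2.0, 2.0)
mri_distribution_only = malicious_url_risk_index(0.3, 0.4)
```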
4. Experimental Results
We conducted experiments to examine the performance of our zero-day detection method based on MCC.
For these experiments, we processed the detection log acquired by crawling biomedical information system-related malware sites with the developed MCC in the log form stated in Step 1.
The estimated risk values are intuitive in our proposed model. That is, our final interpretation is based on the crawling result. Additionally, the crawling method uses a blacklist or known patterns. Thus, our proposed model exhibits a low false positive rate.
The MCC detection method proceeds as follows. The attacker (hacker) inserts malicious code into a specific webpage by operating a malicious code distribution server on the internet or by hacking a vulnerable web server. The clients (or users) of the web server involuntarily use the exploit/landing/distribution site containing the malicious code and download the malicious code. Eventually, the attacker collects the client accounts and various other information from the infected server and proceeds to act maliciously.
The proposed system searches/crawls 25 million sites on a continuous basis, detects/blocks the inserted malicious code, and establishes/operates a malicious code blacklist.
4.1. Analysis Results. As a post hoc study based on the results of the MCC operation for a specific period, our results support decision making for proactive responses and follow-up measures, enabling biomedical information system security experts or administrators to maximize their operational efficiency.
Figure 4 shows the MRI estimated through the 1st and 2nd order risk analysis after the detection of malicious URLs.
Table 1 lists the detected malicious code exploit/landing/distribution URLs (including both exploit/landing sites and distribution sites). The risk index is a relative value. If limited to the range 0-1, the minimum risk would be fixed at 0, but it is hard to set a clear standard for the maximum risk.
In this paper, we use a relative risk index that fixes the minimum risk to 0 and indicates the high-risk core malware sites through prioritization.
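Prioritization by a relative index amounts to ranking the sites by MRI. The sketch below uses the five largest MRI values from Table 1, read with their decimal points restored; the site labels are placeholders for the masked URLs, and the function name is hypothetical.

```python
# (node type, placeholder label, MRI); labels stand in for the masked URLs.
sites = [
    ("exploit", "site_04", 0.3047),
    ("distribution", "site_01", 0.3965),
    ("distribution", "site_03", 0.3058),
    ("distribution", "site_02", 0.3505),
    ("exploit", "site_05", 0.3026),
]


def prioritize(entries, top_k=3):
    """Return the top_k entries with the highest MRI, highest first."""
    return sorted(entries, key=lambda entry: entry[2], reverse=True)[:top_k]


top = prioritize(sites)
```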
4.2. Sensitivity Analysis. The detection rate of actual zero-day attacks can be measured using a sensitivity analysis based on the results given in Table 1. Among the malware sites related to zero-day attacks occurring to biomedical information systems, we analyze distribution sites and exploit/landing sites. Table 2 shows the detection rate measurements based on actual data produced in a specific time window.
The results in Table 2 focus on the top five high-risk sites. The multipath malware site group denotes the number of exploit/landing sites actually connected with a distribution site. The percentage represents the average detection rate in a specific time window, and this detection performance is better than in the preanalysis stage.
The average early detection rate of distribution sites and exploit/landing sites is also higher in this section than in the preanalysis stage. That is, the proactive elimination of core malicious websites results in an average improvement in zero-day attack detection of more than 20%.
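The averages behind this claim can be recomputed from the five rows of Table 2. The sketch assumes that the table entries are percentages with one decimal place; this reading is consistent with the average row printed in the table. The dictionary keys are shorthand for the three column headings.

```python
from statistics import mean

# Detection rates (%) for the top five risk priorities, read from Table 2.
detection_rates = {
    "multipath group": [23.3, 22.6, 14.7, 18.4, 21.2],
    "distribution, single path": [21.5, 31.6, 22.8, 19.7, 24.2],
    "landing/exploit, single path": [28.2, 32.4, 18.1, 32.3, 17.6],
}

averages = {name: round(mean(values), 2) for name, values in detection_rates.items()}
```

Each of the three averages exceeds 20%, which matches the stated improvement.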
4.3. Visualization of Analysis Results. The risk index of each URL calculated in this paper can be analyzed by verifying whether the risk index agrees with the weak point of the corresponding server.
This section analyzes the actual weak point based on the calculated risk index and verifies whether this index agrees with the actual prioritization using an error analysis technique. Figure 4 visualizes the 1st highest-risk distribution site according to the MRI.
Figure 4: Visualization of malware site risk. (Network plot; labeled nodes: 1st highest-risk landing (or exploit) site and 1st highest-risk distribution site.)
Table 1: MRI estimation result of exploit/landing/distribution sites.
Node type          URL                                        MRI     Reliability
Distribution site  http://222.***.***.***/ch.html             0.3965  91
Distribution site  http://www.*********.com/New/index.html    0.3505  92
Distribution site  http://a1.*********.com/1/index.html       0.3058  90
Exploit site       http://www.*********.or.kr                 0.3047  95
Exploit site       http://www.*********.co.kr                 0.3026  94
Exploit site       http://www.*********.kr                    0.3017  94
Distribution site  http://a2.*********.com/2/index.html       0.3009  93
Exploit site       http://www.*********.re.kr                 0.3003  92
Exploit site       http://www.*********.or.kr                 0.2993  92
Exploit site       http://www.*********.co.kr                 0.2991  90
Exploit site       http://www.*********.co.kr                 0.2983  91
Exploit site       http://www.*********.co.kr                 0.2982  90
Exploit site       http://www.*********.org                   0.2970  94
Exploit site       http://www.*********.or.kr                 0.2969  96
Exploit site       http://www.*********.com                   0.2968  95
Exploit site       http://*********.co.kr                     0.2967  95
Exploit site       http://www.*********.co.kr                 0.2966  96
Exploit site       http://www.*********.kr                    0.2966  94
Exploit site       http://www.*********.kr                    0.2962  93
Exploit site       http://www.*********.com                   0.2961  93
Exploit site       http://www.*********.co.kr                 0.2960  94
("*********": the URL information of the malware site.)
Table 2: Average detection rate of zero-day attacks for a given day (rates in %; values in parentheses give the number of exploit/landing sites in the multipath group).
Priority of risk              Malware site group with multipath  Distribution site with single path  Landing (or exploit) site with single path
1                             23.3 (15)                          21.5                                28.2
2                             22.6 (9)                           31.6                                32.4
3                             14.7 (8)                           22.8                                18.1
4                             18.4 (10)                          19.7                                32.3
5                             21.2 (12)                          24.2                                17.6
Average early detection rate  20.04                              23.96                               25.72
The detection and elimination of high-risk malicious code exploit/landing/distribution sites related to biomedical information systems can be achieved by visualizing the 1st highest-risk exploit/landing/distribution site, as shown in Figure 4. Thus, our proposed model focuses on estimating the risk presented by target malware sites in the specific field of biomedical information.
We verify the performance of the proposed model based on static analysis. However, for military, government, or similar organizations, we must dynamically filter out core malware sites based on high-performance hardware platforms. For this reason, our method is a good example of a suitable defensive measure for APT attacks.
5. Related Work
Methods for detecting and analyzing websites including malicious code can generally be divided into static and dynamic analysis.
5.1. Static Analysis. Static analysis mainly uses machine learning and pattern matching to detect and classify malicious URLs.
Ma et al. [14, 15] presented a classification model that detects spam and phishing URLs. This model uses a statistical method to classify URLs by considering the lexical and host-based properties of malicious URLs. Although this method detects both spam and phishing URLs, it cannot distinguish between the two.
Another approach is to analyze the JavaScript code in web pages to find the typical features of malicious code. This is done either statically [16] or dynamically by loading the affected pages in an emulated browser [17]. Systems such as Prophiler [18] consider both JavaScript and other features found in HTML and the URLs of malicious pages. Whittaker et al. [19] proposed a phishing website classifier to automatically update Google's phishing blacklist. They used several features obtained from domain information and page contents.
JSAND [20] used a machine learning approach to classify malicious JavaScript.
5.2. Dynamic Analysis. Dynamic analysis analyzes the server–client connection to detect and classify malicious URLs.
In other words, dynamic analysis relies on visiting websites with an instrumented browser (often referred to as a honeyclient) and monitoring the activities of the machine to find the typical signatures of successful exploitations (e.g., the creation of a new process) [21]. PhoneyC [22] uses a signature-based low-interaction honeypot to detect malicious websites.
Systems such as [23, 24] execute web content dynamically and capture drive-by downloads based on either signatures or anomaly detection, while Blade [25] leverages user behavior models for drive-by download detection. All of these systems exhibit good detection results. However, it is usually costly to follow the full redirection path and monitor each script execution in real time. Moreover, their accuracy is highly dependent on the malicious response of the webpage to vulnerable components.
Provos et al. [26] analyzed the maliciousness of a large collection of web pages using a machine learning algorithm as a prefilter for VM-based analysis. They adopted content-based features, including the presence of obfuscated JavaScript and exploit site-pointing iframes.
The main differences between the model proposed in this paper and previous approaches are as follows:
(i) The model proposed in this paper applies a static method to analyze the connectivity between nodes and detects the core-hub node dynamically based on the risk index.
(ii) The proposed model detects and blocks the core-hub node using link data from the high-risk malicious websites, as observed for a specific period of time.
(iii) The proposed model prevents the dissemination of malicious websites in the early stages by blocking the link between the core malicious code distribution site and the exploit/landing site.
6. Conclusion
In this paper, the 1st order risk of malware infection was analyzed using log information estimated by an MCC that considers the DCI, BCI, and ECI of the main nodes based on the priority of risk. This provides a quantitative value of the potential risk inherent in the corresponding site (node).
In addition, the risk index of exploit sites and distribution sites was calculated by considering their weights. The overlapped infection history and survival ratio were used to estimate the risk of distribution sites, whereas the overlapped infection history and exposure frequency were considered when estimating the risk of exploit sites. Finally, the MRI was estimated using the 1st order risk analysis and the risk index of the distribution sites and exploit sites.
In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced from malicious code detection and vulnerability scanning tools.
As this feature model will be used to predict the risk of a specific website, it should contribute to establishing an active malicious code distribution blocking system that realizes proactive responses beyond the limit of reactive responses that rely only on traditional malicious code detection tools.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] J. A. Hansen and N. M. Hansen, "A taxonomy of vulnerabilities in implantable medical devices," in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS '10), pp. 13–20, October 2010.
[2] C.-S. Park, "Security mechanism based on hospital authentication server for secure application of implantable medical devices," BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.
[3] E. Hutchins, M. Cloppert, and R. Amin, "Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains," in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW '11), pp. 113–125, Academic Conferences, March 2011.
[4] N. Moran, "Understanding Advanced Persistent Threats: A Case Study," 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.
[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, "Secure model against APT in m-connected SCADA network," International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.
[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, "All your iframes points to us," in Proceedings of the USENIX Security, 2008.
[7] S. Lee and J. Kim, "WARNINGBIRD: detecting suspicious URLs in twitter stream," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '12), 2012.
[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, "Collecting internet malware based on client-side honeypot," in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS '08), pp. 1493–1498, Hunan, China, November 2008.
[9] Y.-C. Cho and J.-Y. Pan, "Multiple-feature extracting modules based leak mining system design," The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.
[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, "A method for risk measurement of botnet's malicious activities," Information Journal, vol. 17, no. 1, pp. 165–180, 2014.
[11] C. Ni, C. Sugimoto, and J. Jiang, "Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically," in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.
[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, "Clustering networks' heterogeneous data in defining a comprehensive closeness centrality index," Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.
[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, "Betweenness centrality in Some classes of graphs," International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.
[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Beyond blacklists: learning to detect malicious web sites from suspicious URLs," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 1245–1253, July 2009.
[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Identifying suspicious URLs: an application of large-scale online learning," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 681–688, 2009.
[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, "Zozzle: low-overhead mostly static javascript malware detection," in Proceedings of the USENIX Security Symposium, 2011.
[17] M. Cova, C. Kruegel, and G. Vigna, "Detection and analysis of drive-by-download attacks and malicious JavaScript code," in Proceedings of the 19th International World Wide Web Conference (WWW '10), pp. 281–290, April 2010.
[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, "Prophiler: a fast filter for the large-scale detection of malicious web pages," in Proceedings of the 20th International Conference on World Wide Web (WWW '11), pp. 197–206, 2011.
[19] C. Whittaker, B. Ryner, and M. Nazif, "Large-scale automatic classification of phishing pages," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '10), 2010.
[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, "JSand: complete client-side sandboxing of third-party JavaScript without browser modifications," in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC '12), pp. 1–10, ACM, December 2012.
[21] C. Seifert, I. Welch, and P. Komisarczuk, "Honeyc: the low-interaction client honeypot," in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRCS '07), Hamilton, New Zealand, 2007.
[22] N. Jose, "PhoneyC: a virtual client honeypot," in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.
[23] Y.-M. Wang, D. Beck, X. Jiang et al., "Automated web patrol with strider honeymonkeys," in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.
[24] The Honeynet Project, Capture-HPC client honeypot, 2008, http://projects.honeynet.org/capture-hpc.
[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, "Blade: an attack-agnostic approach for preventing drive-by malware infections," in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS '10), pp. 440–450, ACM, October 2010.
[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, "All your iFRAMEs point to us," in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.
2 Computational and Mathematical Methods in Medicine
Figure 1: Definition of landing (or exploit)/distribution sites including malicious code. (Diagram labels: internet users; landing site and distribution site; hopping sites in the hopping section; exploit sites in the exploit section; distribution sites. Legend: web connection, redirection, malicious code dissemination.)
with links to biomedical information systems and produce a malicious URL risk index (MRI) from this reference index.
2. Background
To detect the core-hub node, it is first necessary to understand the entire framework of malicious code distribution and infection through malicious websites. It is also important to understand the typical methods of detecting such websites and to appreciate certain risk estimation methods for the detection of malicious sites.
2.1. Malware Site Framework. To estimate the risk index of a malware site, we need to understand the dissemination route. Figure 1 illustrates the definition and operation principles of the malware site detection framework, which is the basis for risk index estimation.
As shown in Figure 1, the victim (i.e., the internet user) first visits the landing site connected with the distribution site, is then redirected to a hopping site or exploit site, and finally downloads the malicious code. The internet user is eventually infected by the malicious code and may be damaged by various secondary cyber-attacks (e.g., personal information leaks, system destruction, and other host-derived attacks).
2.2. Web Crawling-Based Malicious Site Detection. Most studies on malware sites have focused on detection. These studies primarily apply a web crawling method that rapidly collects the URL information of websites through a web crawler-based search engine [8, 9]. However, unlike the web crawling applied by search engines, the web crawling technology used for malicious code collection selects and collects the executable files or compressed files that contain the malicious code.
The web crawler considers URLs with a file extension of .exe, or HTTP headers with the “application/octet-stream” content type, to be executable files and downloads them. The crawler then inspects the headers of the downloaded files to confirm whether they are executable files. As with executable files, compressed files and MS installation files are inspected and downloaded in the same way.
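The selection rule described above can be sketched as follows. This is a minimal illustration rather than the crawler used in the paper: the function names are hypothetical, and the choice of file signatures (the "MZ" header of Windows executables, the ZIP local file header, and the OLE compound file header used by .msi packages) is an assumption about what "inspecting the headers" means in practice.

```python
from urllib.parse import urlparse

# File signatures ("magic bytes") assumed for the three file kinds in the text.
SIGNATURES = {
    "executable": b"MZ",                                   # DOS/PE header
    "compressed": b"PK\x03\x04",                           # ZIP local file header
    "ms_installer": b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",   # OLE compound file (.msi)
}


def is_download_candidate(url, content_type=None):
    """Return True if the URL path or Content-Type suggests an executable file."""
    if urlparse(url).path.lower().endswith(".exe"):
        return True
    if content_type is None:
        return False
    # Drop parameters such as "; charset=..." before comparing the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/octet-stream"


def classify_by_header(data):
    """Return the file kind whose signature starts `data`, or None if none match."""
    for kind, magic in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None
```

A crawler following this sketch would call `is_download_candidate` before fetching and `classify_by_header` on the first bytes of the downloaded body.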
A number of web crawling-based automatic malicious code collection techniques have been proposed, most of which search websites via web crawling, confirm whether the websites include malicious code, and then download/analyze the relevant content.
3. Analysis of the Risk Index of Biomedical Information System-Related Malware Sites
We first propose a method for estimating the risk index of biomedical information system-targeted malware sites and estimate the ultimate risk index by analyzing the potential threat through a correlation analysis between the distribution site and the other connected sites.
The following sections describe our approach for predicting the risk index of the exploit/landing sites that redistribute the malicious code. The risk of individual exploit/landing sites is calculated through this prediction.
3.1. Vector-Based Risk Index Estimation Method. We employ a risk vector calculation to estimate the risk index [10]. Just as a planar vector is indicated by arranging two real numbers, a three-dimensional vector is indicated by arranging three real numbers in the rectangular coordinate system.
Spatial rectangular coordinates are defined by three mutually orthogonal axes that pass through the origin $O$.
Figure 2: Entire analysis diagram for malicious code landing (or exploit)/distribution site risk estimation. (Axes: $x$, eigenvector centrality; $y$, betweenness centrality; $z$, degree centrality; origin $O$. Plotted labels: V1, V2, MRI, MRI′.)
We fix the three coordinate axes $x$, $y$, and $z$, set the positive direction of each axis, and then define the length scale.
As shown in Figure 2, three vectors (degree, eigenvector, and betweenness) are used to estimate the risk index of malicious code landing (or exploit)/distribution sites, and the length is indicated by the vector sum [10]. The purpose is to express the different vector values as lengths and thereby quantify the risk index.
We thus determine which sites have the highest risk index and find the significance-based concentration degree of the corresponding sites by analyzing the central structure of the exploit/landing/distribution sites within malicious code that is connected to medical information systems. To interpret various meanings more objectively, this paper represents a risk factor and estimates the ultimate risk index by analyzing the connectivity [11–13] (degree, eigenvector, and betweenness) of the distribution site and exploit/landing site and vectorizing the calculated value. We now define each element of the risk index for the detected malicious code exploit/landing/distribution sites.
(i) Degree Centrality Analysis of Nodes. This is defined as the number of links incident upon a node. The degree can be interpreted in terms of the immediate risk of a node catching whatever is flowing through the network (such as malware sites). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely, the in-degree and out-degree centrality.
(ii) Eigenvector Centrality Analysis of Nodes. This measures the influence of a node within a network. Relative scores are assigned to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.
(iii) Betweenness Centrality Analysis of Nodes. This is the number of shortest paths from all vertices to all others that pass through that node. A node with high betweenness centrality has a large influence on the transfer of items through the network, under the assumption that the transfer of items follows the shortest path.
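For a directed network, the in-degree and out-degree measures in item (i) can be illustrated with a small redirect graph. The sketch below is an assumption-laden example: the node names are made up, and dividing by the number of nodes minus one is one common normalisation, chosen here because it matches the denominator that appears later in the degree centrality index.

```python
def degree_centrality(edges):
    """Return {node: (in_degree, out_degree)}, each divided by (n - 1)."""
    nodes = {node for edge in edges for node in edge}
    in_deg = {node: 0 for node in nodes}
    out_deg = {node: 0 for node in nodes}
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    scale = max(len(nodes) - 1, 1)  # avoid dividing by zero for a single node
    return {node: (in_deg[node] / scale, out_deg[node] / scale) for node in nodes}


# Hypothetical redirect graph: three landing pages funnel into one hopping
# site, which redirects to a single distribution site.
edges = [
    ("landing-1", "hop"),
    ("landing-2", "hop"),
    ("landing-3", "hop"),
    ("hop", "dist"),
]
centrality = degree_centrality(edges)
```

In this toy graph the hopping site has by far the largest in-degree, which is the kind of concentration that the risk index is meant to expose.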
3.2. Method to Estimate Malicious URL Risk Index (MRI). To estimate the risk index of the URL of a malicious code exploit/landing/distribution site, we follow the process in Figure 3.
(1) Step 1: Node Characteristic Classification. Landing (or exploit)/distribution site information is classified using the logs produced by the self-developed malicious code detection crawler, and the detection history is sorted by time from the unit logs of the malicious code exploit/landing/distribution site. The basic risk is also estimated with the following log information.
(A) Node Characteristic. Whether the infected site is an exploit/landing site or a distribution site is confirmed. If there is no link to the detected malicious code (i.e., the information on the first infected site), the site is defined as a distribution site. If the URL of another site is exploited/distributed, the site is defined as an exploit/landing site.
(B) Malicious Code Exploit/Landing/Distribution Site Information. This is the URL of the detected malicious code exploit/landing/distribution site. The exploit/landing site can be the distribution site. If the distribution site is eliminated by a self-developed or other detection system, the exploit/landing site is rendered as the distribution site and operated continuously as a malicious code distribution site.
(C) IP Address, Country Code, and Site Survivability. Basic information is collected through the IP address and the related server location, and the current operating status is investigated. In particular, the survivability of the exploit/landing/distribution site is very important in estimating the risk index. Even if the site has been treated or isolated and is no longer operated, the possibility of reinfection exists if the weak point remains exposed. Therefore, this should be reflected in the risk index estimation.
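The log information collected in Step 1 can be pictured as one record per detection. The sketch below is only an illustration: the field names are hypothetical, and the classification rule is one reading of item (A), namely that a record with no link to another site is treated as a distribution site and every other record as an exploit/landing site.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DetectionLog:
    url: str                    # URL of the detected site, item (B)
    detected_at: datetime       # detection time used to order the history
    linked_url: Optional[str]   # URL of another site this one points to, if any
    ip_address: str             # item (C)
    country_code: str           # item (C)
    alive: bool                 # current operating status (site survivability)


def node_type(log):
    """Item (A), as read here: no link to another site means distribution site."""
    return "distribution" if log.linked_url is None else "exploit/landing"


def detection_history(logs):
    """Sort unit logs by detection time, oldest first."""
    return sorted(logs, key=lambda log: log.detected_at)
```

Keeping `alive` in the record makes it straightforward to carry site survivability into the later weighting steps.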
(2) Step 2: Centrality Analysis of Node. The following three indices are applied to the centrality analysis of each node:
(i) Degree Centrality Analysis
(ii) Eigenvector Centrality Analysis
(iii) Betweenness Centrality Analysis
Figure 3: Entire analysis diagram for risk estimation of malicious code exploit/landing/distribution site. (Flowchart labels: crawling DB; Step 1, node characteristic classification; Step 2, degree, eigenvector, and betweenness centrality analysis of node; Step 3, 1st order risk analysis; Step 4, 2nd order risk analysis, comprising distribution site risk analysis, exploit site risk analysis, and weight value calculation; Step 5, MRI estimation.)
(A) Degree Centrality Index (DCI)
(i) A node that has more directly connected neighboring nodes has higher degree centrality. The scale of direct effects is measured.
(ii) Degree centrality is calculated from the composition ratio of each node:
$$\mathrm{DCI} = \frac{\sum (\text{weight of incident link})}{\text{number of nodes} - 1}, \qquad \text{time complexity: } O(n). \tag{1}$$
(B) Eigenvector Centrality Index (ECI)
(i) Assume that the number of links included in node $N_j$ is $l_j$. If one of these links is connected to node $N_i$, the probability that $N_j$ passes $N_i$ is $1/l_j$. Therefore, the ultimate ECI is as follows:
$$\mathrm{ECI} = I(N_i) = \sum \frac{I(N_j)}{l_j}. \tag{2}$$
(C) Betweenness Centrality Index (BCI)
(i) To measure the BCI, measure the degree to which a node is located on the shortest route between nodes.
(ii) The betweenness centrality of a node is higher if the node connects more different node groups. The BCI indicates the degree to which a node functions as a bridge in the entire network.
(iii) It is possible to find the intermediate URL that links information between fields.
(iv) Suppose that $g_{jk}$ is the number of shortest routes between nodes $j$ and $k$ in the network and $g_{jk}(n_i)$ is the number of those shortest routes that include node $i$. The probability that a shortest route between $j$ and $k$ includes node $i$ is $g_{jk}(n_i)/g_{jk}$:
$$\mathrm{BCI} = C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}. \tag{3}$$
If the main target node is constructed as a child node of depth 1, the degree will be increased. However, the BCI will be decreased by (3).
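The three indices of Step 2 can be computed on a toy graph as follows. This is a sketch under stated assumptions, not the authors' implementation: links are treated as undirected with unit weight, (2) is solved by simple repeated substitution (it has the same form as PageRank without damping, and plain iteration is not guaranteed to converge on every graph), and (3) is evaluated with Brandes' accumulation scheme, a standard algorithm that the paper does not name. The graph and function names are hypothetical.

```python
from collections import deque


def dci(adj):
    """Eq. (1) with unit link weights: degree / (number of nodes - 1)."""
    scale = max(len(adj) - 1, 1)
    return {v: len(adj[v]) / scale for v in adj}


def eci(adj, iterations=100):
    """Eq. (2) by repeated substitution: I(N_i) = sum of I(N_j) / l_j."""
    score = {v: 1.0 / len(adj) for v in adj}
    for _ in range(iterations):
        updated = {v: 0.0 for v in adj}
        for j in adj:
            for i in adj[j]:
                updated[i] += score[j] / len(adj[j])
        score = updated
    return score


def bci(adj):
    """Eq. (3) for an unweighted, undirected graph (Brandes' algorithm)."""
    total = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest routes from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                total[w] += delta[w]
    # Each unordered pair (j, k) was visited from both ends, so halve the sum.
    return {v: value / 2 for v, value in total.items()}


# Hypothetical network: a triangle a-b-c with a pendant node d attached to c.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
```

Node `c` is the only bridge to `d`, so it is the only node with a nonzero BCI in this example, while its DCI and ECI are also the largest.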
(3) Step 3: 1st Order Risk Analysis. The 1st order risk is estimated as the Euclidean distance of the node analysis result from Step 2, that is, by applying the vector distance formula to the values calculated in Step 2:
$$r_1 = \sqrt{\mathrm{DCI}^2 + \mathrm{ECI}^2 + \mathrm{BCI}^2}. \tag{4}$$
(4) Step 4: 2nd Order Risk Analysis
(A) Distribution Site Risk Analysis. The risk index is estimated by considering the weights (overlapped infection history and survival ratio) based on the 1st order risk analysis. The distribution site risk is calculated from the vector of the values calculated in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual survival ratio ($S$).
* The survival ratio ($S$) depends on whether treatment has been given after infection (based on one year’s information):
$$\text{Treatment Probability } (S_1) = \frac{\text{Survival Cases}}{\text{Survival Cases} + \text{Treated Cases}},$$
$$\text{Failure Probability } (S_2) = \frac{\text{Treated Cases}}{\text{Survival Cases} + \text{Treated Cases}},$$
$$r_2 = r_1 \times I \times S_1 \quad \text{(if the node has been treated)},$$
$$r_2 = r_1 \times I \times S_2 \quad \text{(if the node has not been treated)}. \tag{5}$$
(B) Exploit/Landing Site Risk Analysis. The risk index is estimated by considering the weights (overlapped infection history and exposure frequency) from the 1st order risk analysis. The exploit site risk is calculated from the vector of values calculated in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual exposure frequency in a search website ($E$):
$$r_3 = r_1 \times \left(\frac{2 \times I \times E}{I + E}\right) \quad \text{or} \quad r_3 = r_1 \times I. \tag{6}$$
(5) Step 5: Malicious URL Risk Index (MRI). The MRI is estimated from the 1st order risk analysis result and the risk index. The following formula can be deduced from the 1st order risk analysis result calculated in Step 3 and the risk index of each distribution/exploit site calculated in Step 4, considering the characteristics of the corresponding node:
$$r_{\text{final}} = \sqrt{r_1^2 + r_2^2 + r_3^2}. \tag{7}$$
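Steps 3 to 5 reduce to a few lines of arithmetic, sketched below with hypothetical function names. Two points are assumptions rather than statements from the paper: (6) offers two forms without saying when each applies, so the sketch falls back to $r_1 \times I$ when no exposure frequency is available, and the guard against empty case counts is added only to keep the example safe to run. The weight $2IE/(I+E)$ in (6) is the harmonic mean of $I$ and $E$.

```python
import math


def first_order_risk(dci, eci, bci):
    """Eq. (4): Euclidean norm of the three centrality indices."""
    return math.sqrt(dci ** 2 + eci ** 2 + bci ** 2)


def survival_weights(survival_cases, treated_cases):
    """Eq. (5): the weights S1 and S2 from one year of case counts."""
    total = survival_cases + treated_cases
    if total == 0:
        raise ValueError("no recorded cases")
    return survival_cases / total, treated_cases / total


def distribution_site_risk(r1, infections, s1, s2, treated):
    """Eq. (5): r2 = r1 * I * S1 if the node was treated, else r1 * I * S2."""
    return r1 * infections * (s1 if treated else s2)


def exploit_site_risk(r1, infections, exposure=None):
    """Eq. (6): harmonic-mean weight of I and E, or plain I as the fallback."""
    if exposure is None or infections + exposure == 0:
        return r1 * infections
    return r1 * (2 * infections * exposure) / (infections + exposure)


def mri(r1, r2, r3):
    """Eq. (7): final malicious URL risk index."""
    return math.sqrt(r1 ** 2 + r2 ** 2 + r3 ** 2)
```

Because the harmonic mean is dominated by the smaller of its two arguments, a site with a long infection history but almost no search exposure receives a small weight in (6).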
4. Experimental Results
We conducted experiments to examine the performance of our zero-day detection method based on MCC.
For these experiments, we processed the detection log acquired by crawling biomedical information system-related malware sites with the developed MCC in the log form stated in Step 1.
The estimated risk values are intuitive in our proposed model. That is, our final interpretation is based on the crawling result. Additionally, the crawling method uses a blacklist or known patterns. Thus, our proposed model exhibits a low false positive rate.
The MCC detection method proceeds as follows. The attacker (hacker) inserts malicious code into a specific webpage by operating a malicious code distribution server on the internet or by hacking a vulnerable web server. The clients (or users) of the web server involuntarily use the exploit/landing/distribution site containing the malicious code and download the malicious code. Eventually, the attacker collects the client accounts and various other information from the infected server and proceeds to act maliciously.
The proposed system searches/crawls 25 million sites on a continuous basis, detects/blocks the inserted malicious code, and establishes/operates a malicious code blacklist.
4.1. Analysis Results. As a post hoc study based on the results of the MCC operation for a specific period, our results support decision making for proactive responses and follow-up measures, enabling biomedical information system security experts or administrators to maximize their operational efficiency.
Figure 4 shows the MRI estimated through the 1st and 2nd order risk analysis after the detection of malicious URLs.
Table 1 lists the detected malicious code exploit/landing/distribution URLs (including both exploit/landing sites and distribution sites). The risk index is a relative value. If limited to the range 0-1, the minimum risk would be fixed at 0, but it is hard to set a clear standard for the maximum risk.
In this paper, we use a relative risk index that fixes the minimum risk to 0 and indicates the high-risk core malware sites through prioritization.
4.2. Sensitivity Analysis. The detection rate of actual zero-day attacks can be measured using a sensitivity analysis based on the results given in Table 1. Among the malware sites related to zero-day attacks on biomedical information systems, we analyze distribution sites and exploit/landing sites. Table 2 shows the detection rate measurements based on actual data produced in a specific time window.
The results in Table 2 focus on the top five high-risk sites. For the multipath malware site group, the number in parentheses denotes the number of exploit/landing sites actually connected with a distribution site. The percentage represents the average detection rate in a specific time window, and this detection performance is better than in the preanalysis stage.
The average early detection rate of distribution sites and exploit/landing sites is also higher in this section than in the preanalysis stage. That is, the proactive elimination of core malicious websites results in an average improvement in zero-day attack detection of more than 20%.
4.3. Visualization of Analysis Results. The risk index of each URL calculated in this paper can be analyzed by verifying whether the risk index agrees with the weak point of the corresponding server.
This section analyzes the actual weak point based on the calculated risk index and verifies whether this index agrees with the actual prioritization using an error analysis technique. Figure 4 visualizes the 1st highest-risk distribution site according to the MRI.
Figure 4: Visualization of malware site risk. (Labels: 1st highest-risk landing (or exploit) site; 1st highest-risk distribution site.)
Table 1: MRI estimation result of exploit/landing/distribution sites.
| Node type | URL | MRI | Reliability (%) |
| --- | --- | --- | --- |
| Distribution site | http://222.∗∗∗.∗∗∗.∗∗∗/ch.html | 0.3965 | 91 |
| Distribution site | http://www.∗∗∗∗∗∗∗∗∗.com/New/index.html | 0.3505 | 92 |
| Distribution site | http://a1.∗∗∗∗∗∗∗∗∗.com/1/index.html | 0.3058 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.3047 | 95 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.3026 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.3017 | 94 |
| Distribution site | http://a2.∗∗∗∗∗∗∗∗∗.com/2/index.html | 0.3009 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.re.kr | 0.3003 | 92 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.2993 | 92 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2991 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2983 | 91 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2982 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.org | 0.2970 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.2969 | 96 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.com | 0.2968 | 95 |
| Exploit site | http://∗∗∗∗∗∗∗∗∗.co.kr | 0.2967 | 95 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2966 | 96 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.2966 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.2962 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.com | 0.2961 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2960 | 94 |
(“∗∗∗∗∗∗∗∗∗”: the URL information of the malware site.)
Table 2: Average detection rate of zero-day attacks for a given day.
| Priority of risk | Malware site group with multipath | Distribution site with single path | Landing (or exploit) site with single path |
| --- | --- | --- | --- |
| 1 | 23.3% (15) | 21.5% | 28.2% |
| 2 | 22.6% (9) | 31.6% | 32.4% |
| 3 | 14.7% (8) | 22.8% | 18.1% |
| 4 | 18.4% (10) | 19.7% | 32.3% |
| 5 | 21.2% (12) | 24.2% | 17.6% |
| Average early detection rate | 20.04% | 23.96% | 25.72% |
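The last row of Table 2 can be recomputed from the five priority rows. The short check below copies the detection rates from the table, reading each entry as a percentage with one decimal place; the dictionary keys are only descriptive labels.

```python
# Detection rates (%) for the top five risk priorities, copied from Table 2.
rates = {
    "multipath group": [23.3, 22.6, 14.7, 18.4, 21.2],
    "distribution site, single path": [21.5, 31.6, 22.8, 19.7, 24.2],
    "landing (or exploit) site, single path": [28.2, 32.4, 18.1, 32.3, 17.6],
}

# Arithmetic mean per column, rounded to two decimals as in the table.
averages = {name: round(sum(values) / len(values), 2) for name, values in rates.items()}
```

The three means come out to 20.04, 23.96, and 25.72, which agree with the reported average early detection rates and with the statement that the improvement exceeds 20%.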
The detection and elimination of high-risk malicious code exploit/landing/distribution sites related to biomedical information systems can be achieved by visualizing the 1st highest-risk exploit/landing/distribution site, as shown in Figure 4. Thus, our proposed model focuses on estimating the risk presented by target malware sites in the specific field of biomedical information.
We verify the performance of the proposed model based on static analysis. However, for military, government, or similar organizations, we must dynamically filter out core malware sites based on high-performance hardware platforms. For this reason, our method is a good example of a suitable defensive measure for APT attacks.
5. Related Work
Methods for detecting and analyzing websites including malicious code can generally be divided into static and dynamic analysis.
5.1. Static Analysis. Static analysis mainly uses machine learning and pattern matching to detect and classify malicious URLs.
Ma et al. [14, 15] presented a classification model that detects spam and phishing URLs. This model uses a statistical method to classify URLs by considering the lexical and host-based properties of malicious URLs. Although this method detects both spam and phishing URLs, it cannot distinguish between the two.
Another approach is to analyze the JavaScript code in web pages to find the typical features of malicious code. This is done either statically [16] or dynamically by loading the affected pages in an emulated browser [17]. Systems such as Prophiler [18] consider both JavaScript and other features found in HTML and the URLs of malicious pages. Whittaker et al. [19] proposed a phishing website classifier to automatically update Google’s phishing blacklist. They used several features obtained from domain information and page contents.
JSAND [20] used a machine learning approach to classify malicious JavaScript.
5.2. Dynamic Analysis. Dynamic analysis analyzes the server–client connection to detect and classify malicious URLs.
In other words, dynamic analysis relies on visiting websites with an instrumented browser (often referred to as a honeyclient) and monitoring the activities of the machine to find the typical signatures of successful exploitations (e.g., the creation of a new process) [21]. PhoneyC [22] uses a signature-based low-interaction honeypot to detect malicious websites.
Systems such as [23, 24] execute web content dynamically and capture drive-by downloads based on either signatures or anomaly detection, while Blade [25] leverages user behavior models for drive-by download detection. All of these systems exhibit good detection results. However, it is usually costly to follow the full redirection path and monitor each script execution in real time. Moreover, their accuracy is highly dependent on the malicious response of the webpage to vulnerable components.
Provos et al. [26] analyzed the maliciousness of a large collection of web pages using a machine learning algorithm as a prefilter for VM-based analysis. They adopted content-based features, including the presence of obfuscated JavaScript and exploit site-pointing iframes.
The main differences between the model proposed in this paper and previous approaches are as follows:
(i) The model proposed in this paper applies a static method to analyze the connectivity between nodes and detects the core-hub node dynamically based on the risk index.
(ii) The proposed model detects and blocks the core-hub node using link data from the high-risk malicious websites, as observed for a specific period of time.
(iii) The proposed model prevents the dissemination of malicious websites in the early stages by blocking the link between the core malicious code distribution site and the exploit/landing site.
6. Conclusion
In this paper, the 1st order risk of malware infection was analyzed using log information estimated by an MCC that considers the DCI, BCI, and ECI of the main nodes based on the priority of risk. This provides a quantitative value of the potential risk inherent in the corresponding site (node).
In addition, the risk index of exploit sites and distribution sites was calculated by considering their weights. The overlapped infection history and survival ratio were used to estimate the risk of distribution sites, whereas the overlapped infection history and exposure frequency were considered when estimating the risk of exploit sites. Finally, the MRI was estimated using the 1st order risk analysis and the risk index of the distribution sites and exploit sites.
In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced from malicious code detection and vulnerability scanning tools.
As this feature model will be used to predict the risk of a specific website, it should contribute to establishing an active malicious code distribution blocking system that realizes proactive responses beyond the limit of reactive responses that rely only on traditional malicious code detection tools.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] J. A. Hansen and N. M. Hansen, “A taxonomy of vulnerabilities in implantable medical devices,” in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS ’10), pp. 13–20, October 2010.
[2] C.-S. Park, “Security mechanism based on hospital authentication server for secure application of implantable medical devices,” BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.
[3] E. Hutchins, M. Cloppert, and R. Amin, “Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains,” in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW ’11), pp. 113–125, Academic Conferences, March 2011.
[4] N. Moran, “Understanding Advanced Persistent Threats: A Case Study,” 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.
[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, “Secure model against APT in m-connected SCADA network,” International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.
[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, “All your iframes points to us,” in Proceedings of the USENIX Security, 2008.
[7] S. Lee and J. Kim, “WARNINGBIRD: detecting suspicious URLs in twitter stream,” in Proceedings of the Symposium on Network and Distributed System Security (NDSS ’12), 2012.
[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, “Collecting internet malware based on client-side honeypot,” in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS ’08), pp. 1493–1498, Hunan, China, November 2008.
[9] Y.-C. Cho and J.-Y. Pan, “Multiple-feature extracting modules based leak mining system design,” The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.
[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, “A method for risk measurement of botnet’s malicious activities,” Information Journal, vol. 17, no. 1, pp. 165–180, 2014.
[11] C. Ni, C. Sugimoto, and J. Jiang, “Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically,” in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.
[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, “Clustering networks’ heterogeneous data in defining a comprehensive closeness centrality index,” Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.
[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, “Betweenness centrality in some classes of graphs,” International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.
[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Beyond blacklists: learning to detect malicious web sites from suspicious URLs,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’09), pp. 1245–1253, July 2009.
[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Identifying suspicious URLs: an application of large-scale online learning,” in Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09), pp. 681–688, 2009.
[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, “Zozzle: low-overhead mostly static javascript malware detection,” in Proceedings of the USENIX Security Symposium, 2011.
[17] M. Cova, C. Kruegel, and G. Vigna, “Detection and analysis of drive-by-download attacks and malicious JavaScript code,” in Proceedings of the 19th International World Wide Web Conference (WWW ’10), pp. 281–290, April 2010.
[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, “Prophiler: a fast filter for the large-scale detection of malicious web pages,” in Proceedings of the 20th International Conference on World Wide Web (WWW ’11), pp. 197–206, 2011.
[19] C. Whittaker, B. Ryner, and M. Nazif, “Large-scale automatic classification of phishing pages,” in Proceedings of the Symposium on Network and Distributed System Security (NDSS ’10), 2010.
[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, “JSand: complete client-side sandboxing of third-party JavaScript without browser modifications,” in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC ’12), pp. 1–10, ACM, December 2012.
[21] C. Seifert, I. Welch, and P. Komisarczuk, “HoneyC: the low-interaction client honeypot,” in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRCS ’07), Hamilton, New Zealand, 2007.
[22] N. Jose, “PhoneyC: a virtual client honeypot,” in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.
[23] Y.-M. Wang, D. Beck, X. Jiang et al., “Automated web patrol with strider honeymonkeys,” in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.
[24] The Honeynet Project, Capture-HPC client honeypot, 2008, http://projects.honeynet.org/capture-hpc.
[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, “Blade: an attack-agnostic approach for preventing drive-by malware infections,” in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS ’10), pp. 440–450, ACM, October 2010.
[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, “All your iFRAMEs point to us,” in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.
Computational and Mathematical Methods in Medicine 3
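Figure 2: Entire analysis diagram for malicious code landing (or exploit)/distribution site risk estimation. The diagram is a three-dimensional coordinate system with origin O and axes x (eigenvector centrality), y (betweenness centrality), and z (degree centrality); the labeled elements are V1, V2, MRI, and MRI′.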
We fix the three coordinate axes $x$, $y$, and $z$, set the positive direction of each axis, and then define the length scale.
As shown in Figure 2, three vectors (connectivity, eigenvector, and betweenness) are used to estimate the risk index of malicious code landing (or exploit)/distribution sites, and the length is given by the vector sum [10]. The purpose is to express the different vector values as lengths and thereby quantify the risk index.
We thus determine which sites have the highest risk index and find the significance-based concentration degree of the corresponding sites by analyzing the central structure of the exploit/landing/distribution sites within malicious code that is connected to medical information systems. To interpret the various meanings more objectively, this paper represents a risk factor and estimates the ultimate risk index by analyzing the connectivity [11–13] (degree), eigenvector, and betweenness of the distribution site and exploit/landing site and vectorizing the calculated values. We now define each element of the risk index for the detected malicious code exploit/landing/distribution sites.
(i) Degree Centrality Analysis of Nodes. This is defined as the number of links incident upon a node. The degree can be interpreted in terms of the immediate risk of a node catching whatever is flowing through the network (such as malware). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely, the in-degree and out-degree centrality.
(ii) Eigenvector Centrality Analysis of Nodes. This measures the influence of a node within a network. Relative scores are assigned to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.
(iii) Betweenness Centrality Analysis of Nodes. This is the number of shortest paths from all vertices to all others that pass through that node. A node with high betweenness centrality has a large influence on the transfer of items through the network, under the assumption that the transfer of items follows the shortest path.
3.2. Method to Estimate Malicious URL Risk Index (MRI). To estimate the risk index of the URL of a malicious code exploit/landing/distribution site, we follow the process in Figure 3.
(1) Step 1: Node Characteristic Classification. Landing (or exploit)/distribution site information is classified using the logs produced by the self-developed malicious code detection crawler, and the detection history is sorted by time from the unit logs of the malicious code exploit/landing/distribution site. The basic risk is also estimated with the following log information.
(A) Node Characteristic. Whether the infected site is an exploit/landing site or a distribution site is confirmed. If there is no link to the detected malicious code (i.e., the information on the first infected site), the site is defined as a distribution site. If the URL of another site is exploited/distributed, the site is defined as an exploit/landing site.
(B) Malicious Code Exploit/Landing/Distribution Site Information. This is the URL of the detected malicious code exploit/landing/distribution site. The exploit/landing site can be the distribution site. If the distribution site is eliminated by a self-developed or other detection system, the exploit/landing site is rendered as the distribution site and operated continuously as a malicious code distribution site.
(C) IP Address, Country Code, and Site Survivability. Basic information is collected through the IP address and the related server location, and the current operating status is investigated. In particular, the survivability of the exploit/landing/distribution site is very important in estimating the risk index. Even if a site has been treated or isolated and is no longer operated, reinfection remains possible as long as the weak point stays exposed. Therefore, this should be reflected in the risk index estimation.
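The classification in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' crawler: the record layout, the field names, and the example URLs and addresses are all hypothetical, and the only rule taken from the text is that a site with no onward link is treated as a distribution site while a site pointing to another URL is treated as an exploit/landing site.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectionLog:
    """One crawler log record (hypothetical layout, not the paper's schema)."""
    detected_at: str            # ISO-8601 timestamp, so string order equals time order
    url: str                    # URL of the infected site
    linked_url: Optional[str]   # URL of another site the malicious code points to, if any
    ip_address: str
    country_code: str
    alive: bool                 # current operating status (survivability)


def classify_node(log: DetectionLog) -> str:
    """No onward link: distribution site; link to another site: exploit/landing site."""
    return "distribution" if log.linked_url is None else "exploit/landing"


def sort_history(logs: List[DetectionLog]) -> List[DetectionLog]:
    """Sort the detection history by time, oldest record first."""
    return sorted(logs, key=lambda record: record.detected_at)


# Hypothetical records using reserved example domains and documentation IP ranges.
logs = [
    DetectionLog("2014-03-02T10:00:00", "http://landing.example",
                 "http://dist.example", "192.0.2.10", "KR", True),
    DetectionLog("2014-03-01T09:00:00", "http://dist.example",
                 None, "192.0.2.20", "KR", True),
]
for record in sort_history(logs):
    print(record.detected_at, record.url, classify_node(record))
```

The `alive` flag is carried along only because the text stresses that survivability should feed into the later risk estimation; how it is measured is not specified in the source.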
(2) Step 2: Centrality Analysis of Node. The following three indices are applied to the centrality analysis of each node:
(i) Degree Centrality Analysis
(ii) Eigenvector Centrality Analysis
(iii) Betweenness Centrality Analysis
Figure 3: Entire analysis diagram for risk estimation of malicious code exploit/landing/distribution site. Diagram labels: Crawling DB; Step 1 (node characteristic classification); Step 2 (degree, eigenvector, and betweenness centrality analysis of node); Step 3 (1st order risk analysis); Step 4 (2nd order risk analysis: distribution site risk analysis, exploit site risk analysis, weight value calculation); Step 5 (MRI estimation).
(A) Degree Centrality Index (DCI)
(i) A node that has more directly connected neighboring nodes has higher degree centrality. The scale of direct effects is measured.
(ii) Degree centrality is calculated from the composition ratio of each node:
$$\mathrm{DCI} = \frac{\sum (\text{weight of incident link})}{(\text{number of nodes}) - 1}, \qquad \text{time complexity: } O(n). \tag{1}$$
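Equation (1) can be illustrated with a short sketch. It assumes that a link is "incident" to both of its endpoints regardless of direction and that each link carries a numeric weight; the edge-list layout and the node names are hypothetical.

```python
def degree_centrality_index(nodes, edges):
    """Return {node: DCI}, where DCI = (sum of incident link weights) / (n - 1).

    nodes: list of node identifiers
    edges: iterable of (u, v, weight) tuples; direction is ignored here (assumption)
    """
    n = len(nodes)
    if n < 2:
        return {v: 0.0 for v in nodes}
    total = {v: 0.0 for v in nodes}
    for u, v, weight in edges:
        total[u] += weight
        total[v] += weight
    return {v: total[v] / (n - 1) for v in nodes}


# Hypothetical example: one distribution site "d" linked from three landing sites.
nodes = ["d", "l1", "l2", "l3"]
edges = [("l1", "d", 1.0), ("l2", "d", 1.0), ("l3", "d", 1.0)]
dci = degree_centrality_index(nodes, edges)
print(dci)  # the hub "d" scores 3 / 3 = 1.0; each landing site scores 1 / 3
```

With unit weights the index reduces to the fraction of other nodes a site is directly connected to, which is why the hub in this star-shaped example reaches the maximum value.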
(B) Eigenvector Centrality Index (ECI)
(i) Assume that the number of links included in node $N_j$ is $l_j$. If one of these links is connected to node $N_i$, the probability that $N_j$ passes $N_i$ is $1/l_j$. Therefore, the ultimate ECI is as follows, where the sum runs over the nodes $N_j$ linked to $N_i$:
$$\mathrm{ECI} = I(N_i) = \sum \frac{I(N_j)}{l_j}. \tag{2}$$
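Because (2) defines each score in terms of the scores of neighboring nodes, it has to be solved as a fixed point. One common way to do so is repeated substitution starting from uniform scores; the sketch below takes that route as an assumption, since the text does not say how the authors solve (2). It treats links as undirected, and the graph is hypothetical. Plain iteration of this kind can oscillate on bipartite graphs (for example, a pure star), so the example includes a triangle.

```python
def eigenvector_centrality_index(neighbors, iterations=200):
    """Iterate I(N_i) = sum over neighbors N_j of I(N_j) / l_j from uniform scores.

    neighbors: {node: list of adjacent nodes}, undirected (assumption)
    """
    nodes = list(neighbors)
    score = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        updated = {v: 0.0 for v in nodes}
        for j in nodes:
            l_j = len(neighbors[j])      # number of links of node N_j
            for i in neighbors[j]:
                updated[i] += score[j] / l_j
        score = updated
    return score


# Hypothetical graph: a triangle a-b-c with one extra site d attached to c.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
eci = eigenvector_centrality_index(graph)
print(eci)
```

On a connected, non-bipartite undirected graph this iteration settles on scores proportional to each node's number of links, so node c (three links out of eight link endpoints) ends up with the largest score in the example.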
(C) Betweenness Centrality Index (BCI)
(i) To measure the BCI, measure the degree to which a node is located on the shortest route between nodes.
(ii) The betweenness centrality of a node is higher if the node connects more different node groups. The BCI indicates the degree to which a node functions as a bridge in the entire network.
(iii) It is possible to find the intermediate URL that links information between fields.
(iv) Suppose that $g_{jk}$ is the number of shortest routes between nodes $j$ and $k$ in the network and $g_{jk}(n_i)$ is the number of those shortest routes that include node $i$. The probability that a shortest route between $j$ and $k$ includes node $i$ is $g_{jk}(n_i)/g_{jk}$:
$$\mathrm{BCI} = C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}. \tag{3}$$
If the main target node is constructed as a child node of depth 1, the degree will be increased. However, the BCI will be decreased by (3).
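A direct, unoptimized reading of (3) is sketched below, assuming an undirected and unweighted link graph. Breadth-first search gives, for every source node, the distance to each other node and the number of shortest routes to it; node $i$ lies on a shortest route between $j$ and $k$ exactly when the distances through $i$ add up to the distance between $j$ and $k$. The result is not normalized, and whether the authors normalize the BCI is not stated in the text. Graph and names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(neighbors, source):
    """Return (dist, sigma): hop distance and number of shortest routes from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest route
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness_centrality_index(neighbors):
    """Sum g_jk(n_i) / g_jk over all unordered pairs j < k, as in equation (3)."""
    nodes = list(neighbors)
    info = {s: bfs_counts(neighbors, s) for s in nodes}
    bci = {v: 0.0 for v in nodes}
    for j, k in combinations(nodes, 2):
        dist_j, sigma_j = info[j]
        dist_k, sigma_k = info[k]
        if k not in dist_j:              # j and k are not connected
            continue
        for i in nodes:
            if i == j or i == k or i not in dist_j:
                continue
            if dist_j[i] + dist_k[i] == dist_j[k]:
                bci[i] += sigma_j[i] * sigma_k[i] / sigma_j[k]
    return bci


# Hypothetical graph: triangle a-b-c with site d reachable only through c.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
bci = betweenness_centrality_index(graph)
print(bci)  # c bridges the pairs (a, d) and (b, d)
```

This version costs roughly cubic time in the number of nodes; for large crawls, Brandes' algorithm computes the same quantity far more efficiently.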
(3) Step 3: 1st Order Risk Analysis. The 1st order risk is estimated as the Euclidean length of the vector formed by the three centrality values calculated in Step 2:
$$r_1 = \sqrt{\mathrm{DCI}^2 + \mathrm{ECI}^2 + \mathrm{BCI}^2}. \tag{4}$$
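As a purely illustrative calculation of (4), with hypothetical centrality values that are not taken from the paper's data, a node with DCI = 0.2, ECI = 0.1, and BCI = 0.2 would receive:

```latex
r_1 = \sqrt{0.2^2 + 0.1^2 + 0.2^2}
    = \sqrt{0.04 + 0.01 + 0.04}
    = \sqrt{0.09}
    = 0.3
```

Because the three indices enter as squared terms, a node that is large on a single axis can still obtain a high 1st order risk.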
(4) Step 4: 2nd Order Risk Analysis
(A) Distribution Site Risk Analysis. The risk index is estimated by considering the weights (overlapped infection history and survival ratio) based on the 1st order risk analysis. The distribution site risk is calculated from the vector of the values calculated in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual survival ratio ($S$).
* The survival ratio ($S$) depends on whether treatment has been given after infection (based on one year's information), as follows:
$$\text{Treatment Probability } (S_1) = \frac{\text{Survival Cases}}{\text{Survival Cases} + \text{Treated Cases}},$$
$$\text{Failure Probability } (S_2) = \frac{\text{Treated Cases}}{\text{Survival Cases} + \text{Treated Cases}},$$
$$\begin{aligned} r_2 &= r_1 \times I \times S_1 \quad (\text{if the node has been treated}), \\ r_2 &= r_1 \times I \times S_2 \quad (\text{if the node has not been treated}). \end{aligned} \tag{5}$$
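The distribution site weighting in (5) can be sketched as below. The function and parameter names are hypothetical, the overlapped infection history $I$ is assumed to be a non-negative number such as a count of repeated infections, and the example figures are invented for illustration.

```python
def survival_weights(survival_cases, treated_cases):
    """Return (S1, S2) as defined in the text; the two weights sum to 1."""
    total = survival_cases + treated_cases
    if total == 0:
        raise ValueError("no survival or treated cases observed")
    s1 = survival_cases / total   # "Treatment Probability" in the paper's naming
    s2 = treated_cases / total    # "Failure Probability" in the paper's naming
    return s1, s2


def distribution_site_risk(r1, infection_history, survival_cases, treated_cases, treated):
    """r2 = r1 * I * S1 if the node has been treated, otherwise r1 * I * S2."""
    s1, s2 = survival_weights(survival_cases, treated_cases)
    return r1 * infection_history * (s1 if treated else s2)


# Hypothetical node: r1 = 0.3, infected twice, 3 survival cases and 1 treated case.
r2_treated = distribution_site_risk(0.3, 2, 3, 1, treated=True)
r2_untreated = distribution_site_risk(0.3, 2, 3, 1, treated=False)
print(r2_treated, r2_untreated)  # roughly 0.45 and 0.15
```

Raising an error when no cases have been observed is a design choice of this sketch; the source does not say how such nodes are handled.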
(B) Exploit/Landing Site Risk Analysis. The risk index is estimated by considering the weights (overlapped infection history and exposure frequency) from the 1st order risk analysis. The exploit site risk is calculated from the vector of values calculated in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual exposure frequency in a search website ($E$):
$$r_3 = r_1 \times \left( \frac{2 \times I \times E}{I + E} \right) \quad \text{or} \quad r_3 = r_1 \times I. \tag{6}$$
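The weight in (6) is the harmonic mean of $I$ and $E$, so a site needs both a repeated infection history and a high exposure frequency to receive a large weight. The sketch below implements both forms of (6). The text does not say when the second form applies; using it when no exposure frequency is available (or when $I + E = 0$) is an assumption of this sketch, and all names and figures are hypothetical.

```python
def exploit_site_risk(r1, infection_history, exposure_frequency=None):
    """r3 = r1 * 2IE / (I + E), or r1 * I when E is unavailable (assumed fallback)."""
    i = infection_history
    e = exposure_frequency
    if e is None or i + e == 0:
        return r1 * i
    harmonic_mean = 2 * i * e / (i + e)
    return r1 * harmonic_mean


# Hypothetical node: r1 = 0.3, infected twice, exposed six times in search results.
r3_full = exploit_site_risk(0.3, 2, 6)   # weight 2*2*6 / 8 = 3, so r3 is about 0.9
r3_fallback = exploit_site_risk(0.3, 2)  # no exposure data, so r3 = 0.3 * 2
print(r3_full, r3_fallback)
```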
(5) Step 5: Malicious URL Risk Index (MRI). The MRI is estimated from the 1st order risk analysis result and the risk index. The following formula can be deduced from the 1st order risk analysis result calculated in Step 3 and the risk index of each distribution/exploit site calculated in Step 4, considering the characteristics of the corresponding node:
$$r_{\text{final}} = \sqrt{r_1^2 + r_2^2 + r_3^2}. \tag{7}$$
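Equation (7) and the subsequent prioritization can be sketched as follows. The text only says that the characteristics of the node are considered; setting the term that does not apply to a node type to zero (no $r_3$ for a distribution site, no $r_2$ for an exploit/landing site) is an assumption of this sketch. Site names and values are hypothetical.

```python
import math


def malicious_url_risk_index(r1, r2=0.0, r3=0.0):
    """r_final = sqrt(r1^2 + r2^2 + r3^2); a term that does not apply is left at 0."""
    return math.sqrt(r1 ** 2 + r2 ** 2 + r3 ** 2)


def rank_by_risk(sites):
    """Return (name, MRI) pairs sorted from highest to lowest risk."""
    scored = [(name, malicious_url_risk_index(*risks)) for name, risks in sites.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical sites given as (r1, r2, r3) triples.
sites = {
    "distribution-site": (0.3, 0.4, 0.0),   # sqrt(0.09 + 0.16) is about 0.5
    "landing-site": (0.3, 0.0, 0.2),        # sqrt(0.09 + 0.04) is about 0.36
}
ranking = rank_by_risk(sites)
print(ranking)
```

Sorting by the resulting index mirrors how the sites are ordered by risk in the experimental results, with the top entries being the candidates for a core-hub node.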
4. Experimental Results
We conducted experiments to examine the performance of our zero-day detection method based on MCC.
For these experiments, we processed the detection log acquired by crawling biomedical information system-related malware sites with the developed MCC, in the log form stated in Step 1.
The estimated risk values are intuitive in our proposed model. That is, our final interpretation is based on the crawling result. Additionally, the crawling method uses a blacklist or known patterns. Thus, our proposed model exhibits a low false positive rate.
The MCC detection method proceeds as follows. The attacker (hacker) inserts malicious code into a specific webpage by operating a malicious code distribution server on the internet or by hacking a vulnerable web server. The clients (or users) of the web server involuntarily use the exploit/landing/distribution site containing the malicious code and download the malicious code. Eventually, the attacker collects the client accounts and various other information from the infected server and proceeds to act maliciously.
The proposed system searches/crawls 25 million sites on a continuous basis, detects/blocks the inserted malicious code, and establishes/operates a malicious code blacklist.
4.1. Analysis Results. As a post hoc study based on the results of the MCC operation for a specific period, our results support decision making for proactive responses and follow-up measures, enabling biomedical information system security experts or administrators to maximize their operational efficiency.
Figure 4 shows the MRI estimated through the 1st and 2nd order risk analysis after the detection of malicious URLs.
Table 1 lists the detected malicious code exploit/landing/distribution URLs (including both exploit/landing sites and distribution sites). The risk index is a relative value. If limited to the range 0–1, the minimum risk would be fixed at 0, but it is hard to set a clear standard for the maximum risk.
In this paper, we use a relative risk index that fixes the minimum risk to 0 and indicates the high-risk core malware sites through prioritization.
4.2. Sensitivity Analysis. The detection rate of actual zero-day attacks can be measured using a sensitivity analysis based on the results given in Table 1. Among the malware sites related to zero-day attacks on biomedical information systems, we analyze distribution sites and exploit/landing sites. Table 2 shows the detection rate measurements based on actual data produced in a specific time window.
The results in Table 2 focus on the top five high-risk sites. For the multipath malware site group, the number in parentheses denotes the number of exploit/landing sites actually connected with a distribution site. The percentage represents the average detection rate in a specific time window, and this detection performance is better than in the preanalysis stage.
The average early detection rate of distribution sites and exploit/landing sites is also higher in this section than in the preanalysis stage. That is, the proactive elimination of core malicious websites results in an average improvement in zero-day attack detection of more than 20%.
4.3. Visualization of Analysis Results. The risk index of each URL calculated in this paper can be analyzed by verifying whether the risk index agrees with the weak point of the corresponding server.
This section analyzes the actual weak point based on the calculated risk index and verifies whether this index agrees with the actual prioritization using an error analysis technique. Figure 4 visualizes the 1st highest-risk distribution site according to the MRI.

Figure 4: Visualization of malware site risk. Labeled nodes: 1st highest-risk landing (or exploit) site; 1st highest-risk distribution site.

Table 1: MRI estimation result of exploit/landing/distribution sites.

| Node type | URL | MRI | Reliability (%) |
|---|---|---|---|
| Distribution site | http://222.***.***.***/ch.html | 0.3965 | 91 |
| Distribution site | http://www.*********.com/New/index.html | 0.3505 | 92 |
| Distribution site | http://a1.*********.com/1/index.html | 0.3058 | 90 |
| Exploit site | http://www.*********.or.kr | 0.3047 | 95 |
| Exploit site | http://www.*********.co.kr | 0.3026 | 94 |
| Exploit site | http://www.*********.kr | 0.3017 | 94 |
| Distribution site | http://a2.*********.com/2/index.html | 0.3009 | 93 |
| Exploit site | http://www.*********.re.kr | 0.3003 | 92 |
| Exploit site | http://www.*********.or.kr | 0.2993 | 92 |
| Exploit site | http://www.*********.co.kr | 0.2991 | 90 |
| Exploit site | http://www.*********.co.kr | 0.2983 | 91 |
| Exploit site | http://www.*********.co.kr | 0.2982 | 90 |
| Exploit site | http://www.*********.org | 0.2970 | 94 |
| Exploit site | http://www.*********.or.kr | 0.2969 | 96 |
| Exploit site | http://www.*********.com | 0.2968 | 95 |
| Exploit site | http://*********.co.kr | 0.2967 | 95 |
| Exploit site | http://www.*********.co.kr | 0.2966 | 96 |
| Exploit site | http://www.*********.kr | 0.2966 | 94 |
| Exploit site | http://www.*********.kr | 0.2962 | 93 |
| Exploit site | http://www.*********.com | 0.2961 | 93 |
| Exploit site | http://www.*********.co.kr | 0.2960 | 94 |

("*********" denotes the URL information of the malware site.)

Table 2: Average detection rate of zero-day attacks for a given day.

| Priority of risk | Malware site group with multipath | Distribution site with single path | Landing (or exploit) site with single path |
|---|---|---|---|
| 1 | 23.3% (15) | 21.5% | 28.2% |
| 2 | 22.6% (9) | 31.6% | 32.4% |
| 3 | 14.7% (8) | 22.8% | 18.1% |
| 4 | 18.4% (10) | 19.7% | 32.3% |
| 5 | 21.2% (12) | 24.2% | 17.6% |
| Average early detection rate | 20.04% | 23.96% | 25.72% |
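The averages reported in the last row of Table 2 can be reproduced from the five per-priority rates. The check below reads the entries as percentages with one decimal place (for example, 23.3%), which is an interpretation of the table that the reported averages are consistent with; the column labels are shortened, hypothetical names.

```python
from statistics import mean

# Detection rates (%) for risk priorities 1 to 5, read from Table 2.
table2 = {
    "multipath group": [23.3, 22.6, 14.7, 18.4, 21.2],
    "distribution site, single path": [21.5, 31.6, 22.8, 19.7, 24.2],
    "landing/exploit site, single path": [28.2, 32.4, 18.1, 32.3, 17.6],
}

averages = {name: round(mean(rates), 2) for name, rates in table2.items()}
for name, value in averages.items():
    print(f"{name}: {value:.2f}%")
```

All three column averages exceed 20%, which is the basis for the stated improvement of more than 20% in zero-day attack detection.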
The detection and elimination of high-risk malicious code exploit/landing/distribution sites related to biomedical information systems can be achieved by visualizing the 1st highest-risk exploit/landing/distribution site, as shown in Figure 4. Thus, our proposed model focuses on estimating the risk presented by target malware sites in the specific field of biomedical information.
We verify the performance of the proposed model based on static analysis. However, for military, government, or similar organizations, we must dynamically filter out core malware sites based on high-performance hardware platforms. For this reason, our method is a good example of a suitable defensive measure for APT attacks.
5. Related Work
Methods for detecting and analyzing websites including malicious code can generally be divided into static and dynamic analysis.
5.1. Static Analysis. Static analysis mainly uses machine learning and pattern matching to detect and classify malicious URLs.
Ma et al. [14, 15] presented a classification model that detects spam and phishing URLs. This model uses a statistical method to classify URLs by considering the lexical and host-based properties of malicious URLs. Although this method detects both spam and phishing URLs, it cannot distinguish between the two.
Another approach is to analyze the JavaScript code in web pages to find the typical features of malicious code. This is done either statically [16] or dynamically by loading the affected pages in an emulated browser [17]. Systems such as Prophiler [18] consider both JavaScript and other features found in HTML and the URLs of malicious pages. Whittaker et al. [19] proposed a phishing website classifier to automatically update Google's phishing blacklist. They used several features obtained from domain information and page contents.
JSAND [20] used a machine learning approach to classify malicious JavaScript.
5.2. Dynamic Analysis. Dynamic analysis analyzes the server–client connection to detect and classify malicious URLs.
In other words, dynamic analysis relies on visiting websites with an instrumented browser (often referred to as a honeyclient) and monitoring the activities of the machine to find the typical signatures of successful exploitations (e.g., the creation of a new process) [21]. PhoneyC [22] uses a signature-based low-interaction honeypot to detect malicious websites.
Systems such as [23, 24] execute web content dynamically and capture drive-by downloads based on either signatures or anomaly detection, while Blade [25] leverages user behavior models for drive-by download detection. All of these systems exhibit good detection results. However, it is usually costly to follow the full redirection path and monitor each script execution in real time. Moreover, their accuracy is highly dependent on the malicious response of the webpage to vulnerable components.
Provos et al. [26] analyzed the maliciousness of a large collection of web pages using a machine learning algorithm as a prefilter for VM-based analysis. They adopted content-based features, including the presence of obfuscated JavaScript and exploit site-pointing iframes.
The main differences between the model proposed in this paper and previous approaches are as follows:
(i) The model proposed in this paper applies a static method to analyze the connectivity between nodes and detects the core-hub node dynamically based on the risk index.
(ii) The proposed model detects and blocks the core-hub node using link data from the high-risk malicious websites as observed for a specific period of time.
(iii) The proposed model prevents the dissemination of malicious websites in the early stages by blocking the link between the core malicious code distribution site and the exploit/landing site.
6. Conclusion
In this paper, the 1st order risk of malware infection was analyzed using log information estimated by an MCC that considers the DCI, BCI, and ECI of the main nodes based on the priority of risk. This provides a quantitative value of the potential risk inherent in the corresponding site (node).
In addition, the risk index of exploit sites and distribution sites was calculated by considering their weights. The overlapped infection history and survival ratio were used to estimate the risk of distribution sites, whereas the overlapped infection history and exposure frequency were considered when estimating the risk of exploit sites. Finally, the MRI was estimated using the 1st order risk analysis and the risk index of the distribution sites and exploit sites.
In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced from malicious code detection and vulnerability scanning tools.
As this feature model will be used to predict the risk of a specific website, it should contribute to establishing an active malicious code distribution blocking system that realizes proactive responses beyond the limit of reactive responses that rely only on traditional malicious code detection tools.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] J. A. Hansen and N. M. Hansen, "A taxonomy of vulnerabilities in implantable medical devices," in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS '10), pp. 13–20, October 2010.
[2] C.-S. Park, "Security mechanism based on hospital authentication server for secure application of implantable medical devices," BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.
[3] E. Hutchins, M. Cloppert, and R. Amin, "Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains," in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW '11), pp. 113–125, Academic Conferences, March 2011.
[4] N. Moran, "Understanding Advanced Persistent Threats - A Case Study," 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.
[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, "Secure model against APT in m-connected SCADA network," International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.
[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, "All your iframes points to us," in Proceedings of the USENIX Security, 2008.
[7] S. Lee and J. Kim, "WARNINGBIRD: detecting suspicious URLs in twitter stream," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '12), 2012.
[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, "Collecting internet malware based on client-side honeypot," in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS '08), pp. 1493–1498, Hunan, China, November 2008.
[9] Y.-C. Cho and J.-Y. Pan, "Multiple-feature extracting modules based leak mining system design," The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.
[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, "A method for risk measurement of botnet's malicious activities," Information Journal, vol. 17, no. 1, pp. 165–180, 2014.
[11] C. Ni, C. Sugimoto, and J. Jiang, "Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically," in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.
[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, "Clustering networks' heterogeneous data in defining a comprehensive closeness centrality index," Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.
[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, "Betweenness centrality in Some classes of graphs," International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.
[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Beyond blacklists: learning to detect malicious web sites from suspicious URLs," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 1245–1253, July 2009.
[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Identifying suspicious URLs: an application of large-scale online learning," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 681–688, 2009.
[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, "Zozzle: low-overhead mostly static javascript malware detection," in Proceedings of the USENIX Security Symposium, 2011.
[17] M. Cova, C. Kruegel, and G. Vigna, "Detection and analysis of drive-by-download attacks and malicious JavaScript code," in Proceedings of the 19th International World Wide Web Conference (WWW '10), pp. 281–290, April 2010.
[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, "Prophiler: a fast filter for the large-scale detection of malicious web pages," in Proceedings of the 20th International Conference on World Wide Web (WWW '11), pp. 197–206, 2011.
[19] C. Whittaker, B. Ryner, and M. Nazif, "Large-scale automatic classification of phishing pages," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '10), 2010.
[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, "JSand: complete client-side sandboxing of third-party JavaScript without browser modifications," in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC '12), pp. 1–10, ACM, December 2012.
[21] C. Seifert, I. Welch, and P. Komisarczuk, "Honeyc: the low-interaction client honeypot," in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRCS '07), Hamilton, New Zealand, 2007.
[22] N. Jose, "PhoneyC: a virtual client honeypot," in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.
[23] Y.-M. Wang, D. Beck, X. Jiang, et al., "Automated web patrol with strider honeymonkeys," in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.
[24] The Honeynet Project, Capture-HPC client honeypot, 2008, http://projects.honeynet.org/capture-hpc.
[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, "Blade: an attack-agnostic approach for preventing drive-by malware infections," in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS '10), pp. 440–450, ACM, October 2010.
[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, "All your iFRAMEs point to us," in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.
Submit your manuscripts athttpwwwhindawicom
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MEDIATORSINFLAMMATION
of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Behavioural Neurology
EndocrinologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Disease Markers
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
OncologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Oxidative Medicine and Cellular Longevity
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
PPAR Research
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
ObesityJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational and Mathematical Methods in Medicine
OphthalmologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Diabetes ResearchJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Research and TreatmentAIDS
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Gastroenterology Research and Practice
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Parkinsonrsquos Disease
Evidence-Based Complementary and Alternative Medicine
Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom
4 Computational and Mathematical Methods in Medicine
Step 1Node
characteristicclassification
Eigenvectorcentrality analysis
of node
Betweenesscentrality analysis
of node
Degree centrality analysis of node
Step 2
1st order risk analysis
Step 3
2nd order risk analysis
Step 4
Distribution site risk analysis
Exploit site risk analysis
Weight value
calculation
Crawling DB
MRI estimationStep 5
Figure 3 Entire analysis diagram for risk estimation of malicious code exploitlandingdistribution site
A Degree Centrality Index (DCI)
(i) A node that has more directly connectedneighboring nodes has higher degree cen-trality The scale of direct effects is mea-sured
(ii) Degree centrality is calculated from thecomposition ratio of each node
DCI =sum (weight of incedent link)
of nodes minus 1
Time complexity 119874 (119899) (1)
B Eigenvector Centrality Index (ECI)
(i) Assume that the number of the linksincluded in node 119873119895 is 119897119895 If one of theselinks is connected to node119873119894 the probabil-ity that 119873119895 passes 119873119894 is 1119897119895 Therefore theultimate ECI is as follows
ECI = 119868 (119873119894) =sum 119868 (119873119895)
119897119895
(2)
C Betweenness Centrality Index (BCI)
(i) To measure the BCI measure the degreeto which a node is located on the shortestroute between nodes
(ii) The betweenness centrality of a node ishigher if the node connects more differentnode groups The BCI indicates the degreeto which a node functions as a bridge in theentire network
(iii) It is possible to find the intermediate URLthat links information between fields
(iv) Suppose that 119892119895119896 is the shortest possibleroute between nodes 119895 and 119896 in the networkand 119892119895119896(119899119894) is the shortest possible routebetween nodes 119895 and 119896 that includes node119894 The probability of the shortest route thatincludes node 119894 is 119892119895119896(119899119894)119892119895119896
BCI = 119862119861 (119899119894) =sum119895lt119896 119892119895119896 (119899119894)
119892119895119896
(3)
If the main target node is constructed asa child node of depth 1 the degree willbe increased However the BCI will bedecreased by (3)
(3) Step 3 1st Order Risk Analysis The 1st order risk isestimated by calculating the Euclidean distance of thenode analysis result from Step 2 The 1st order risk isthus estimated by the vector distance formula for thevalues calculated in Step 2
1199031 =radicDCI2 + ECI2 + BCI2 (4)
Computational and Mathematical Methods in Medicine 5
(4) Step 4: 2nd Order Risk Analysis
A. Distribution Site Risk Analysis. The risk index is estimated by applying weights (overlapped infection history and survival ratio) to the 1st order risk analysis. The distribution site risk is calculated from the value obtained in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual survival ratio ($S$).
* The survival ratio ($S$) depends on whether treatment has been given after infection (based on one year's information):
\[ \text{Treatment Probability } (S_1) = \frac{\text{Survival Cases}}{\text{Survival Cases} + \text{Treated Cases}}, \]
\[ \text{Failure Probability } (S_2) = \frac{\text{Treated Cases}}{\text{Survival Cases} + \text{Treated Cases}}, \]
\[ r_2 = \begin{cases} r_1 \times I \times S_1 & \text{(if the node has been treated)} \\ r_1 \times I \times S_2 & \text{(if the node has not been treated).} \end{cases} \tag{5} \]
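The weighting in (5) can be sketched as follows. The function and parameter names are hypothetical, the case counts are made-up sample values, and the handling of an empty history (raising an error) is an assumption of this sketch rather than something the method specifies.

```python
def survival_ratios(survival_cases, treated_cases):
    """Return (S1, S2) as defined above (5)."""
    total = survival_cases + treated_cases
    if total == 0:
        # Assumption: with no recorded cases the ratios are undefined.
        raise ValueError("no survival or treated cases recorded")
    return survival_cases / total, treated_cases / total


def distribution_site_risk(r1, infection_history, survival_cases, treated_cases, treated):
    """r2 = r1 * I * S1 if the node has been treated, else r1 * I * S2."""
    s1, s2 = survival_ratios(survival_cases, treated_cases)
    return r1 * infection_history * (s1 if treated else s2)


# Made-up example: r1 = 0.5, I = 2, three survival cases and one treated case.
r2_treated = distribution_site_risk(0.5, 2, 3, 1, treated=True)
r2_untreated = distribution_site_risk(0.5, 2, 3, 1, treated=False)
```

With these sample counts $S_1 = 0.75$ and $S_2 = 0.25$, so the two branches of (5) give 0.75 and 0.25, respectively.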
B. Exploit/Landing Site Risk Analysis. The risk index is estimated by applying weights (overlapped infection history and exposure frequency) to the 1st order risk analysis. The exploit site risk is calculated from the value obtained in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual exposure frequency in a search website ($E$):
\[ r_3 = r_1 \times \frac{2 \times I \times E}{I + E} \quad \text{or} \quad r_3 = r_1 \times I. \tag{6} \]
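A minimal sketch of (6) is given below. The text does not state when the second form $r_3 = r_1 \times I$ applies; the sketch assumes it is used when no exposure frequency is available, and it returns zero when $I + E = 0$. Both choices, like the function name `exploit_site_risk`, are assumptions made for illustration only.

```python
def exploit_site_risk(r1, infection_history, exposure=None):
    """r3 from (6): harmonic-mean weighting of I and E, or r1 * I as a fallback."""
    if exposure is None:
        # Assumption: fall back to r1 * I when the exposure frequency is unknown.
        return r1 * infection_history
    denominator = infection_history + exposure
    if denominator == 0:
        return 0.0  # assumption: no history and no exposure means no added risk
    return r1 * (2 * infection_history * exposure) / denominator


# Made-up values: r1 = 0.5, I = 1, E = 3.
r3_weighted = exploit_site_risk(0.5, 1, 3)   # 0.5 * (2 * 1 * 3) / (1 + 3)
r3_fallback = exploit_site_risk(0.5, 2)      # 0.5 * 2
```

The factor $2IE/(I+E)$ is the harmonic mean of $I$ and $E$, so the weight stays small unless both the infection history and the exposure frequency are large.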
(5) Step 5: Malicious URL Risk Index (MRI). The MRI is estimated from the 1st order risk analysis result calculated in Step 3 and the risk index of each distribution/exploit site calculated in Step 4, considering the characteristics of the corresponding node:
\[ r_{\mathrm{final}} = \sqrt{r_1^2 + r_2^2 + r_3^2}. \tag{7} \]
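Equation (7) and the subsequent prioritization can be sketched as follows. The sketch assumes that the term which does not apply to a node type is set to zero ($r_3 = 0$ for a distribution site, $r_2 = 0$ for an exploit/landing site); this reading of "considering the characteristics of the corresponding node", the function names, and the example hosts are assumptions.

```python
import math


def malicious_url_risk_index(r1, r2=0.0, r3=0.0):
    """r_final = sqrt(r1^2 + r2^2 + r3^2), following (7)."""
    return math.sqrt(r1 ** 2 + r2 ** 2 + r3 ** 2)


def rank_by_mri(risks):
    """Sort sites by descending MRI; `risks` maps a URL to (r1, r2, r3)."""
    scored = [(url, malicious_url_risk_index(*parts)) for url, parts in risks.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up sites and risk components.
sites = {
    "distribution.example": (0.3, 0.2, 0.0),  # distribution site: r3 assumed 0
    "landing.example": (0.2, 0.0, 0.1),       # exploit/landing site: r2 assumed 0
}
ranking = rank_by_mri(sites)
```

The first entry of `ranking` is the candidate for the highest-risk site, which is the node the method would examine first.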
4 Experimental Results
We conducted experiments to examine the performance of our zero-day detection method based on MCC.
For these experiments, we processed the detection log acquired by crawling biomedical information system-related malware sites with the developed MCC, in the log form stated in Step 1.
The estimated risk values are intuitive in our proposed model. That is, our final interpretation is based on the crawling result. Additionally, the crawling method uses a blacklist or known patterns. Thus, our proposed model exhibits a low false positive rate.
The MCC detection method proceeds as follows. The attacker (hacker) inserts malicious code into a specific webpage by operating a malicious code distribution server on the internet or by hacking a vulnerable web server. The clients (or users) of the web server involuntarily use the exploit/landing/distribution site containing the malicious code and download the malicious code. Eventually, the attacker collects the client accounts and various other information from the infected server and proceeds to act maliciously.
The proposed system searches/crawls 25 million sites on a continuous basis, detects/blocks the inserted malicious code, and establishes/operates a malicious code blacklist.
4.1. Analysis Results. As a post hoc study based on the results of the MCC operation for a specific period, our results support decision making for proactive responses and follow-up measures, enabling biomedical information system security experts or administrators to maximize their operational efficiency.
Figure 4 shows the MRI estimated through the 1st and 2nd order risk analysis after the detection of malicious URLs.
Table 1 lists the detected malicious code exploit/landing/distribution URLs (including both exploit/landing sites and distribution sites). The risk index is a relative value. If limited to the range 0-1, the minimum risk would be fixed at 0, but it is hard to set a clear standard for the maximum risk.
In this paper, we use a relative risk index that fixes the minimum risk to 0 and indicates the high-risk core malware sites through prioritization.
4.2. Sensitivity Analysis. The detection rate of actual zero-day attacks can be measured using a sensitivity analysis based on the results given in Table 1. Among the malware sites related to zero-day attacks on biomedical information systems, we analyze distribution sites and exploit/landing sites. Table 2 shows the detection rate measurements based on actual data produced in a specific time window.
The results in Table 2 focus on the top five high-risk sites. The multipath malware site group denotes the number of exploit/landing sites actually connected with a distribution site. The percentage represents the average detection rate in a specific time window, and this detection performance is better than in the preanalysis stage.
The average early detection rate of distribution sites and exploit/landing sites is also higher in this section than in the preanalysis stage. That is, the proactive elimination of core malicious websites results in an average improvement in zero-day attack detection of more than 20%.
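The averages reported in Table 2 can be checked with a short calculation. The sketch below reads the five per-priority entries of each column as percentages with one decimal place, which is an assumption about the typeset values; the variable and function names are hypothetical.

```python
# Detection rates (%) for risk priorities 1 to 5, read from Table 2.
multipath_group = [23.3, 22.6, 14.7, 18.4, 21.2]
distribution_single = [21.5, 31.6, 22.8, 19.7, 24.2]
landing_single = [28.2, 32.4, 18.1, 32.3, 17.6]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


averages = {
    "multipath group": round(mean(multipath_group), 2),
    "distribution site": round(mean(distribution_single), 2),
    "landing site": round(mean(landing_single), 2),
}
```

Under this reading the column means are 20.04, 23.96, and 25.72, which agree with the "Average early detection rate" row and with the stated improvement of more than 20%.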
4.3. Visualization of Analysis Results. The risk index of each URL calculated in this paper can be analyzed by verifying whether the risk index agrees with the weak point of the corresponding server.
This section analyzes the actual weak point based on the calculated risk index and verifies whether this index agrees with the actual prioritization using an error analysis technique. Figure 4 visualizes the 1st highest-risk distribution site according to the MRI.

Figure 4: Visualization of malware site risk (marked nodes: 1st highest-risk landing (or exploit) site; 1st highest-risk distribution site).

Table 1: MRI estimation result of exploit/landing/distribution sites.
Node type          URL                                       MRI     Reliability (%)
Distribution site  http://222.***.***.***/ch.html            0.3965  91
Distribution site  http://www.*********.com/New/index.html   0.3505  92
Distribution site  http://a1.*********.com/1/index.html      0.3058  90
Exploit site       http://www.*********.or.kr                0.3047  95
Exploit site       http://www.*********.co.kr                0.3026  94
Exploit site       http://www.*********.kr                   0.3017  94
Distribution site  http://a2.*********.com/2/index.html      0.3009  93
Exploit site       http://www.*********.re.kr                0.3003  92
Exploit site       http://www.*********.or.kr                0.2993  92
Exploit site       http://www.*********.co.kr                0.2991  90
Exploit site       http://www.*********.co.kr                0.2983  91
Exploit site       http://www.*********.co.kr                0.2982  90
Exploit site       http://www.*********.org                  0.2970  94
Exploit site       http://www.*********.or.kr                0.2969  96
Exploit site       http://www.*********.com                  0.2968  95
Exploit site       http://*********.co.kr                    0.2967  95
Exploit site       http://www.*********.co.kr                0.2966  96
Exploit site       http://www.*********.kr                   0.2966  94
Exploit site       http://www.*********.kr                   0.2962  93
Exploit site       http://www.*********.com                  0.2961  93
Exploit site       http://www.*********.co.kr                0.2960  94
("*********": masked URL information of the malware site.)

Table 2: Average detection rate of zero-day attacks for a given day.
Priority of risk              Malware site group   Distribution site   Landing (or exploit) site
                              with multipath       with single path    with single path
1                             23.3% (15)           21.5%               28.2%
2                             22.6% (9)            31.6%               32.4%
3                             14.7% (8)            22.8%               18.1%
4                             18.4% (10)           19.7%               32.3%
5                             21.2% (12)           24.2%               17.6%
Average early detection rate  20.04%               23.96%              25.72%
The detection and elimination of high-risk malicious code exploit/landing/distribution sites related to biomedical information systems can be achieved by visualizing the 1st highest-risk exploit/landing/distribution site, as shown in Figure 4. Thus, our proposed model focuses on estimating the risk presented by target malware sites in the specific field of biomedical information.
We verify the performance of the proposed model based on static analysis. However, for military, government, or similar organizations, we must dynamically filter out core malware sites based on high-performance hardware platforms. For this reason, our method is a good example of a suitable defensive measure for APT attacks.
5 Related Work
Methods for detecting and analyzing websites that include malicious code can generally be divided into static and dynamic analysis.
5.1. Static Analysis. Static analysis mainly uses machine learning and pattern matching to detect and classify malicious URLs.
Ma et al. [14, 15] presented a classification model that detects spam and phishing URLs. This model uses a statistical method to classify URLs by considering the lexical and host-based properties of malicious URLs. Although this method detects both spam and phishing URLs, it cannot distinguish between the two.
Another approach is to analyze the JavaScript code in web pages to find the typical features of malicious code. This is done either statically [16] or dynamically, by loading the affected pages in an emulated browser [17]. Systems such as Prophiler [18] consider both JavaScript and other features found in HTML and the URLs of malicious pages. Whittaker et al. [19] proposed a phishing website classifier to automatically update Google's phishing blacklist. They used several features obtained from domain information and page contents.
JSAND [20] used a machine learning approach to classify malicious JavaScript.
5.2. Dynamic Analysis. Dynamic analysis analyzes the server–client connection to detect and classify malicious URLs.
In other words, dynamic analysis relies on visiting websites with an instrumented browser (often referred to as a honeyclient) and monitoring the activities of the machine to find the typical signatures of successful exploitations (e.g., the creation of a new process) [21]. PhoneyC [22] uses a signature-based low-interaction honeypot to detect malicious websites.
Systems such as [23, 24] execute web content dynamically and capture drive-by downloads based on either signatures or anomaly detection, while Blade [25] leverages user behavior models for drive-by download detection. All of these systems exhibit good detection results. However, it is usually costly to follow the full redirection path and monitor each script execution in real time. Moreover, their accuracy is highly dependent on the malicious response of the webpage to vulnerable components.
Provos et al. [26] analyzed the maliciousness of a large collection of web pages using a machine learning algorithm as a prefilter for VM-based analysis. They adopted content-based features, including the presence of obfuscated JavaScript and exploit site-pointing iframes.
The main differences between the model proposed in this paper and previous approaches are as follows:
(i) The model proposed in this paper applies a static method to analyze the connectivity between nodes and detects the core-hub node dynamically based on the risk index.
(ii) The proposed model detects and blocks the core-hub node using link data from the high-risk malicious websites, as observed for a specific period of time.
(iii) The proposed model prevents the dissemination of malicious websites in the early stages by blocking the link between the core malicious code distribution site and the exploit/landing site.
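The blocking idea in items (ii) and (iii) can be illustrated with a small sketch: pick the distribution site with the highest MRI as the core-hub node and list the links to exploit/landing sites that would be cut. The selection rule (highest MRI among distribution sites), the function name `core_hub_links`, and the example hosts and values are assumptions for illustration and do not reproduce the actual system.

```python
def core_hub_links(mri, links):
    """Return the core-hub distribution site and the links to block.

    `mri` maps a site to its MRI; `links` is a list of
    (distribution_site, landing_site) pairs observed in the crawl window.
    """
    distribution_sites = {source for source, _ in links}
    if not distribution_sites:
        return None, []
    # Assumption: the core hub is the distribution site with the highest MRI.
    hub = max(distribution_sites, key=lambda site: mri.get(site, 0.0))
    blocked = [(source, target) for source, target in links if source == hub]
    return hub, blocked


# Made-up crawl result with two distribution sites and three landing sites.
example_mri = {"dist-a.example": 0.40, "dist-b.example": 0.31}
example_links = [
    ("dist-a.example", "landing-1.example"),
    ("dist-a.example", "landing-2.example"),
    ("dist-b.example", "landing-3.example"),
]
hub, blocked = core_hub_links(example_mri, example_links)
```

Here blocking the single hub `dist-a.example` removes two landing-site links at once, which reflects the argument that eliminating the core node avoids maintaining a separate policy per landing site.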
6 Conclusion
In this paper, the 1st order risk of malware infection was analyzed using log information estimated by an MCC that considers the DCI, BCI, and ECI of the main nodes based on the priority of risk. This provides a quantitative value of the potential risk inherent in the corresponding site (node).
In addition, the risk index of exploit sites and distribution sites was calculated by considering their weights. The overlapped infection history and survival ratio were used to estimate the risk of distribution sites, whereas the overlapped infection history and exposure frequency were considered when estimating the risk of exploit sites. Finally, the MRI was estimated using the 1st order risk analysis and the risk index of the distribution sites and exploit sites.
In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced from malicious code detection and vulnerability scanning tools.
As this feature model will be used to predict the risk of a specific website, it should contribute to establishing an active malicious code distribution blocking system that realizes proactive responses beyond the limit of reactive responses that rely only on traditional malicious code detection tools.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] J. A. Hansen and N. M. Hansen, "A taxonomy of vulnerabilities in implantable medical devices," in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS '10), pp. 13–20, October 2010.
[2] C.-S. Park, "Security mechanism based on hospital authentication server for secure application of implantable medical devices," BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.
[3] E. Hutchins, M. Cloppert, and R. Amin, "Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains," in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW '11), pp. 113–125, Academic Conferences, March 2011.
[4] N. Moran, "Understanding Advanced Persistent Threats: A Case Study," 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.
[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, "Secure model against APT in m-connected SCADA network," International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.
[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, "All your iframes point to us," in Proceedings of the USENIX Security, 2008.
[7] S. Lee and J. Kim, "WARNINGBIRD: detecting suspicious URLs in twitter stream," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '12), 2012.
[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, "Collecting internet malware based on client-side honeypot," in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS '08), pp. 1493–1498, Hunan, China, November 2008.
[9] Y.-C. Cho and J.-Y. Pan, "Multiple-feature extracting modules based leak mining system design," The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.
[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, "A method for risk measurement of botnet's malicious activities," Information Journal, vol. 17, no. 1, pp. 165–180, 2014.
[11] C. Ni, C. Sugimoto, and J. Jiang, "Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically," in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.
[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, "Clustering networks' heterogeneous data in defining a comprehensive closeness centrality index," Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.
[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, "Betweenness centrality in some classes of graphs," International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.
[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Beyond blacklists: learning to detect malicious web sites from suspicious URLs," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 1245–1253, July 2009.
[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Identifying suspicious URLs: an application of large-scale online learning," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 681–688, 2009.
[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, "Zozzle: low-overhead mostly static javascript malware detection," in Proceedings of the USENIX Security Symposium, 2011.
[17] M. Cova, C. Kruegel, and G. Vigna, "Detection and analysis of drive-by-download attacks and malicious JavaScript code," in Proceedings of the 19th International World Wide Web Conference (WWW '10), pp. 281–290, April 2010.
[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, "Prophiler: a fast filter for the large-scale detection of malicious web pages," in Proceedings of the 20th International Conference on World Wide Web (WWW '11), pp. 197–206, 2011.
[19] C. Whittaker, B. Ryner, and M. Nazif, "Large-scale automatic classification of phishing pages," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '10), 2010.
[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, "JSand: complete client-side sandboxing of third-party JavaScript without browser modifications," in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC '12), pp. 1–10, ACM, December 2012.
[21] C. Seifert, I. Welch, and P. Komisarczuk, "Honeyc: the low-interaction client honeypot," in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRCS '07), Hamilton, New Zealand, 2007.
[22] N. Jose, "PhoneyC: a virtual client honeypot," in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.
[23] Y.-M. Wang, D. Beck, X. Jiang, et al., "Automated web patrol with strider honeymonkeys," in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.
[24] The Honeynet Project, Capture-HPC client honeypot, 2008, http://projects.honeynet.org/capture-hpc.
[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, "Blade: an attack-agnostic approach for preventing drive-by malware infections," in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS '10), pp. 440–450, ACM, October 2010.
[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, "All your iFRAMEs point to us," in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.
[7] S Lee and J Kim ldquoWARNINGBIRD detecting suspiciousURLsin twitter streamrdquo in Proceedings of the Symposium on Networkand Distributed System Security (NDSS rsquo12) 2012
[8] X Sun Y Wang J Ren Y Zhu and S Liu ldquoCollectinginternet malware based on client-side honeypotrdquo in Proceedingsof the 9th International Conference for YoungComputer Scientists(ICYCS rsquo08) pp 1493ndash1498 Hunan China November 2008
[9] Y-C Cho and J-Y Pan ldquoMultiple-feature extracting modulesbased leak mining system designrdquoThe Scientific World Journalvol 2013 Article ID 704865 11 pages 2013
[10] D H Kim Y-G Kim H P In and H C Jeong ldquoA method forrisk measurement of botnetrsquos malicious activitiesrdquo InformationJournal vol 17 no 1 pp 165ndash180 2014
[11] C Ni C Sugimoto and J Jiang ldquoDegree closeness andbetweenness application of group centrality measurements toexplore macro-disciplinary evolution diachronicallyrdquo in Pro-ceedings of the ISSI pp 1ndash13 Durban South Africa 2011
[12] F Barzinpour B Hoda Ali-Ahmadi S Alizadeh and S G JalaliNaini ldquoClustering networksrsquo heterogeneous data in defining acomprehensive closeness centrality indexrdquoMathematical Prob-lems in Engineering vol 2014 Article ID 202350 10 pages 2014
[13] S K Raghavan Unnithan B Kannan and M JathavedanldquoBetweenness centrality in Some classes of graphsrdquo Interna-tional Journal of Combinatorics vol 2014 Article ID 241723 12pages 2014
[14] J Ma L K Saul S Savage and G M Voelker ldquoBeyondblacklists learning to detectmaliciousweb sites from suspiciousURLsrdquo in Proceedings of the 15th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining (KDDrsquo09) pp 1245ndash1253 July 2009
[15] J Ma L K Saul S Savage and G M Voelker ldquoIdentifyingsuspicious URLs an application of large-scale online learningrdquoin Proceedings of the 26th Annual International Conference onMachine Learning (ICML rsquo09) pp 681ndash688 2009
[16] C Curtsinger B Livshits B Zorn and C Seifert ldquoZozzlelow-overhead mostly static javascript malware detectionrdquo inProceedings of the USENIX Security Symposium 2011
[17] M Cova C Kruegel and G Vigna ldquoDetection and analysis ofdrive-by-download attacks and malicious JavaScript coderdquo inProceedings of the 19th International World Wide Web Confer-ence (WWW rsquo10) pp 281ndash290 April 2010
[18] D Canali M Cova G Vigna and C Kruegel ldquoProphiler a fastfilter for the large-scale detection of malicious web pagesrdquo inProceedings of the 20th International Conference on World WideWeb (WWW rsquo11) pp 197ndash206 2011
[19] C Whittaker B Ryner and M Nazif ldquoLarge-scale automaticclassification of phishing pagesrdquo in Proceedings of the Sympo-sium on Network and Distributed System Security (NDSS rsquo10)2010
[20] P Agten S van Acker Y Brondsema P H Phung L Desmetand F Piessens ldquoJSand complete client-side sandboxing ofthird-party JavaScript without browser modificationsrdquo in Pro-ceedings of the 28th Annual Computer Security ApplicationsConference (ACSAC rsquo12) pp 1ndash10 ACM December 2012
[21] C Seifert I Welch and P Komisarczuk ldquoHoneyc the low-interaction client honeypotrdquo in Proceedings of the New ZealandComputer Science Research Student Conference (NZCSRCS rsquo07)Hamilton New Zealand 2007
[22] N Jose ldquoPhoneyC a virtual client honeypotrdquo in Proceedingsof the 2nd USENIX Conference on Large-Scale Exploits andEmergentThreats Botnets SpywareWorms andMore USENIXAssociation Berkeley Calif USA April 2009
[23] Y-MWang D Beck X Jiang et al ldquoAutomatedweb patrol withstrider honeymonkeysrdquo in Proceedings of the 2006 Network andDistributed System Security Symposium February 2006
[24] The Honeynet Project Capture-HPC client honeypot 2008httpprojectshoneynetorgcapture-hpc
[25] L Lu V Yegneswaran P Porras and W Lee ldquoBlade an attack-agnostic approach for preventing drive-by malware infectionsrdquoin Proceedings of the 17th ACM Conference on Computer andCommunications Security (CCS rsquo10) pp 440ndash450 ACM Octo-ber 2010
[26] N Provos P Mavrommatis M A Rajab and F Monrose ldquoAllyour iFRAMEs point to usrdquo in Proceedings of the 17th USENIXSecurity Symposium pp 1ndash15 2008
6 Computational and Mathematical Methods in Medicine
Figure 4: Visualization of malware site risk (legend: 1st highest-risk landing (or exploit) site; 1st highest-risk distribution site).
Table 1: MRI estimation result of exploit/landing/distribution sites.

| Node type | URL | MRI | Reliability |
| Distribution site | http://222.***.***.***/ch.html | 0.3965 | 91 |
| Distribution site | http://www.*********.com/New/index.html | 0.3505 | 92 |
| Distribution site | http://a1.*********.com/1/index.html | 0.3058 | 90 |
| Exploit site | http://www.*********.or.kr | 0.3047 | 95 |
| Exploit site | http://www.*********.co.kr | 0.3026 | 94 |
| Exploit site | http://www.*********.kr | 0.3017 | 94 |
| Distribution site | http://a2.*********.com/2/index.html | 0.3009 | 93 |
| Exploit site | http://www.*********.re.kr | 0.3003 | 92 |
| Exploit site | http://www.*********.or.kr | 0.2993 | 92 |
| Exploit site | http://www.*********.co.kr | 0.2991 | 90 |
| Exploit site | http://www.*********.co.kr | 0.2983 | 91 |
| Exploit site | http://www.*********.co.kr | 0.2982 | 90 |
| Exploit site | http://www.*********.org | 0.2970 | 94 |
| Exploit site | http://www.*********.or.kr | 0.2969 | 96 |
| Exploit site | http://www.*********.com | 0.2968 | 95 |
| Exploit site | http://*********.co.kr | 0.2967 | 95 |
| Exploit site | http://www.*********.co.kr | 0.2966 | 96 |
| Exploit site | http://www.*********.kr | 0.2966 | 94 |
| Exploit site | http://www.*********.kr | 0.2962 | 93 |
| Exploit site | http://www.*********.com | 0.2961 | 93 |
| Exploit site | http://www.*********.co.kr | 0.2960 | 94 |

(“*********”: the URL information of the malware site.)
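The way Table 1 is used to single out the 1st highest-risk site of each node type can be sketched as follows. This is a minimal illustration, not the authors' tooling: the row labels (`dist-1`, `exploit-1`, and so on) are hypothetical placeholders for the masked URLs, only the first few rows of the table are included, and the MRI values are read with four decimal places.

```python
# Subset of Table 1 rows: (node type, placeholder label, MRI, reliability).
# The labels are hypothetical; the real URLs are masked in the paper.
rows = [
    ("Distribution site", "dist-1", 0.3965, 91),
    ("Distribution site", "dist-2", 0.3505, 92),
    ("Distribution site", "dist-3", 0.3058, 90),
    ("Exploit site", "exploit-1", 0.3047, 95),
    ("Exploit site", "exploit-2", 0.3026, 94),
    ("Distribution site", "dist-4", 0.3009, 93),
]


def highest_risk_by_type(rows):
    """Return, for each node type, the (label, MRI) pair with the largest MRI."""
    best = {}
    for node_type, label, mri, _reliability in rows:
        if node_type not in best or mri > best[node_type][1]:
            best[node_type] = (label, mri)
    return best


top = highest_risk_by_type(rows)
for node_type, (label, mri) in top.items():
    print(f"{node_type}: {label} (MRI {mri:.4f})")
```

Ranking per node type rather than over the whole table keeps the highest-risk distribution site and the highest-risk exploit site separate, which matches the two legend entries of Figure 4.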
Table 2: Average detection rate (%) of zero-day attacks for a given day.

| Priority of risk | Malware site group with multipath | Distribution site with single path | Landing (or exploit) site with single path |
| 1 | 23.3 (15) | 21.5 | 28.2 |
| 2 | 22.6 (9) | 31.6 | 32.4 |
| 3 | 14.7 (8) | 22.8 | 18.1 |
| 4 | 18.4 (10) | 19.7 | 32.3 |
| 5 | 21.2 (12) | 24.2 | 17.6 |
| Average early detection rate | 20.04 | 23.96 | 25.72 |
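As a quick arithmetic check on Table 2, the averages in the last row can be recomputed from the five priority rows. The sketch below assumes the entries are read with one decimal place (for example, 233 as 23.3) and the averages with two (2004 as 20.04); the parenthesized counts in the multipath column are left out because they do not enter the average. The dictionary keys are shorthand labels introduced here, not column names from the paper.

```python
# Detection rates per priority of risk (rows 1 to 5 of Table 2),
# read as one-decimal values; this reading is an assumption.
detection_rates = {
    "multipath group": [23.3, 22.6, 14.7, 18.4, 21.2],
    "distribution, single path": [21.5, 31.6, 22.8, 19.7, 24.2],
    "landing/exploit, single path": [28.2, 32.4, 18.1, 32.3, 17.6],
}

# Column means, rounded to two decimals as in the table's last row.
averages = {
    name: round(sum(values) / len(values), 2)
    for name, values in detection_rates.items()
}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```

Under this reading the recomputed means agree with the reported averages of 20.04, 23.96, and 25.72, which supports the placement of the decimal points.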
technique. Figure 4 visualizes the 1st highest-risk distribution site according to the MRI.
The detection and elimination of high-risk malicious code exploit/landing/distribution sites related to biomedical information systems can be achieved by visualizing the 1st highest-risk exploit/landing/distribution site, as shown in Figure 4. Thus, our proposed model focuses on estimating the risk presented by target malware sites in the specific field of biomedical information.
We verify the performance of the proposed model based on static analysis. However, for military, government, or similar organizations, we must dynamically filter out core malware sites based on high-performance hardware platforms. For this reason, our method is a good example of a suitable defensive measure for APT attacks.
5 Related Work
Methods for detecting and analyzing websites that include malicious code can generally be divided into static and dynamic analysis.
5.1 Static Analysis. Static analysis mainly uses machine learning and pattern matching to detect and classify malicious URLs.
Ma et al. [14, 15] presented a classification model that detects spam and phishing URLs. This model uses a statistical method to classify URLs by considering the lexical and host-based properties of malicious URLs. Although this method detects both spam and phishing URLs, it cannot distinguish between the two.
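To make the notion of lexical URL properties concrete, the following minimal sketch extracts a few such features with Python's standard `urllib.parse` module. The feature names and the example URL (an address from a documentation-only IP range) are assumptions chosen for illustration; they are not the feature set of [14, 15], and host-based properties, which require external lookups, are not covered.

```python
from urllib.parse import urlparse


def lexical_features(url):
    """Compute a few simple lexical features of a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    path_segments = [segment for segment in parsed.path.split("/") if segment]
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_path_segments": len(path_segments),
        # Crude check: the host consists only of digits and dots.
        "host_is_ip": host.replace(".", "").isdigit(),
        "num_digits": sum(ch.isdigit() for ch in url),
    }


# Hypothetical example URL; 192.0.2.0/24 is reserved for documentation.
features = lexical_features("http://192.0.2.10/a/index.html")
for name, value in features.items():
    print(f"{name}: {value}")
```

A classifier of the kind described above would take such a feature dictionary, typically combined with host-based properties, as its input vector.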
Another approach is to analyze the JavaScript code in web pages to find the typical features of malicious code. This is done either statically [16] or dynamically by loading the affected pages in an emulated browser [17]. Systems such as Prophiler [18] consider both JavaScript and other features found in HTML and the URLs of malicious pages. Whittaker et al. [19] proposed a phishing website classifier to automatically update Google’s phishing blacklist. They used several features obtained from domain information and page contents.
JSAND [20] used a machine learning approach to classify malicious JavaScript.
5.2 Dynamic Analysis. Dynamic analysis analyzes the server–client connection to detect and classify malicious URLs.
In other words, dynamic analysis relies on visiting websites with an instrumented browser (often referred to as a honeyclient) and monitoring the activities of the machine to find the typical signatures of successful exploitation (e.g., the creation of a new process) [21]. PhoneyC [22] uses a signature-based low-interaction honeypot to detect malicious websites.
Systems such as [23, 24] execute web content dynamically and capture drive-by downloads based on either signatures or anomaly detection, while Blade [25] leverages user behavior models for drive-by download detection. All of these systems exhibit good detection results. However, it is usually costly to follow the full redirection path and monitor each script execution in real time. Moreover, their accuracy is highly dependent on the malicious response of the webpage to vulnerable components.
Provos et al. [26] analyzed the maliciousness of a large collection of web pages using a machine learning algorithm as a prefilter for VM-based analysis. They adopted content-based features, including the presence of obfuscated JavaScript and exploit site-pointing iframes.
The main differences between the model proposed in this paper and previous approaches are as follows:
(i) The model proposed in this paper applies a static method to analyze the connectivity between nodes and detects the core-hub node dynamically based on the risk index.
(ii) The proposed model detects and blocks the core-hub node using link data from the high-risk malicious websites, as observed for a specific period of time.
(iii) The proposed model prevents the dissemination of malicious websites in the early stages by blocking the link between the core malicious code distribution site and the exploit/landing site.
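The idea behind points (i) to (iii) can be illustrated with a small sketch. It is a simplification under stated assumptions: the link data are invented, the node names are hypothetical, and the core-hub node is chosen by degree centrality alone, whereas the proposed model derives it from a risk index that combines several centrality measures.

```python
from collections import defaultdict, deque

# Hypothetical redirection links: landing/exploit site -> ... -> distribution site.
edges = [
    ("landing-a", "hop-1"),
    ("landing-b", "hop-1"),
    ("landing-c", "hop-1"),
    ("hop-1", "dist-x"),
    ("landing-d", "dist-y"),
]


def degree_centrality(edges):
    """Degree (in plus out) of each node, normalized by n - 1."""
    degree = defaultdict(int)
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}


def reachable(edges, start, blocked=frozenset()):
    """Nodes reachable from start by following links, skipping blocked nodes."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen


centrality = degree_centrality(edges)
core_hub = max(centrality, key=centrality.get)

# Landing sites that can no longer reach dist-x once the hub is blocked.
cut_off = [
    site
    for site in ("landing-a", "landing-b", "landing-c")
    if "dist-x" not in reachable(edges, site, blocked={core_hub})
]
print(core_hub, cut_off)
```

In this toy graph a single blocking rule on the hub severs three landing sites from their distribution site, which is the sense in which eliminating the core-hub node avoids maintaining one policy per landing site.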
6 Conclusion
In this paper, the 1st order risk of malware infection was analyzed using log information estimated by an MCC that considers the DCI, BCI, and ECI of the main nodes based on the priority of risk. This provides a quantitative value of the potential risk inherent in the corresponding site (node).
In addition, the risk index of exploit sites and distribution sites was calculated by considering their weights. The overlapped infection history and survival ratio were used to estimate the risk of distribution sites, whereas the overlapped infection history and exposure frequency were considered when estimating the risk of exploit sites. Finally, the MRI was estimated using the 1st order risk analysis and the risk index of the distribution sites and exploit sites.
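The two-stage structure summarized above (a 1st order risk from centrality indices, a weighted site risk index, and a final MRI combining both) can be sketched as follows. This is only an illustrative stand-in: the weighted mean, the equal weights, and all numeric inputs are assumptions made here, not the formulas or values used in this paper, and the inputs are assumed to be already normalized to [0, 1].

```python
def weighted_score(values, weights):
    """Weighted mean of indicator values; a stand-in for the paper's formulas."""
    total_weight = sum(weights[name] for name in values)
    return sum(values[name] * weights[name] for name in values) / total_weight


# Hypothetical, already-normalized indicators for one distribution site.
first_order = weighted_score(
    {"DCI": 0.6, "BCI": 0.3, "ECI": 0.3},
    {"DCI": 1.0, "BCI": 1.0, "ECI": 1.0},  # equal weights: an assumption
)
site_risk = weighted_score(
    {"overlapped_infection_history": 0.5, "survival_ratio": 0.25},
    {"overlapped_infection_history": 2.0, "survival_ratio": 2.0},
)

# Final index combining the 1st order risk and the site risk index.
mri = weighted_score(
    {"first_order": first_order, "site_risk": site_risk},
    {"first_order": 1.0, "site_risk": 1.0},
)
print(f"1st order risk: {first_order:.4f}, site risk: {site_risk:.4f}, MRI: {mri:.4f}")
```

For an exploit site, the same helper would take exposure frequency in place of survival ratio; only the indicator names change.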
In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced by malicious code detection and vulnerability scanning tools.
As this feature model will be used to predict the risk of a specific website, it should contribute to establishing an active malicious code distribution blocking system that realizes proactive responses beyond the limits of reactive responses that rely only on traditional malicious code detection tools.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] J. A. Hansen and N. M. Hansen, “A taxonomy of vulnerabilities in implantable medical devices,” in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS ’10), pp. 13–20, October 2010.
[2] C.-S. Park, “Security mechanism based on hospital authentication server for secure application of implantable medical devices,” BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.
[3] E. Hutchins, M. Cloppert, and R. Amin, “Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains,” in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW ’11), pp. 113–125, Academic Conferences, March 2011.
[4] N. Moran, “Understanding Advanced Persistent Threats: A Case Study,” 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.
[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, “Secure model against APT in m-connected SCADA network,” International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.
[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, “All your iframes point to us,” in Proceedings of the USENIX Security, 2008.
[7] S. Lee and J. Kim, “WARNINGBIRD: detecting suspicious URLs in twitter stream,” in Proceedings of the Symposium on Network and Distributed System Security (NDSS ’12), 2012.
[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, “Collecting internet malware based on client-side honeypot,” in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS ’08), pp. 1493–1498, Hunan, China, November 2008.
[9] Y.-C. Cho and J.-Y. Pan, “Multiple-feature extracting modules based leak mining system design,” The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.
[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, “A method for risk measurement of botnet’s malicious activities,” Information Journal, vol. 17, no. 1, pp. 165–180, 2014.
[11] C. Ni, C. Sugimoto, and J. Jiang, “Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically,” in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.
[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, “Clustering networks’ heterogeneous data in defining a comprehensive closeness centrality index,” Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.
[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, “Betweenness centrality in some classes of graphs,” International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.
[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Beyond blacklists: learning to detect malicious web sites from suspicious URLs,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’09), pp. 1245–1253, July 2009.
[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Identifying suspicious URLs: an application of large-scale online learning,” in Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09), pp. 681–688, 2009.
[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, “Zozzle: low-overhead mostly static javascript malware detection,” in Proceedings of the USENIX Security Symposium, 2011.
[17] M. Cova, C. Kruegel, and G. Vigna, “Detection and analysis of drive-by-download attacks and malicious JavaScript code,” in Proceedings of the 19th International World Wide Web Conference (WWW ’10), pp. 281–290, April 2010.
[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, “Prophiler: a fast filter for the large-scale detection of malicious web pages,” in Proceedings of the 20th International Conference on World Wide Web (WWW ’11), pp. 197–206, 2011.
[19] C. Whittaker, B. Ryner, and M. Nazif, “Large-scale automatic classification of phishing pages,” in Proceedings of the Symposium on Network and Distributed System Security (NDSS ’10), 2010.
[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, “JSand: complete client-side sandboxing of third-party JavaScript without browser modifications,” in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC ’12), pp. 1–10, ACM, December 2012.
[21] C. Seifert, I. Welch, and P. Komisarczuk, “Honeyc: the low-interaction client honeypot,” in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRCS ’07), Hamilton, New Zealand, 2007.
[22] N. Jose, “PhoneyC: a virtual client honeypot,” in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.
[23] Y.-M. Wang, D. Beck, X. Jiang, et al., “Automated web patrol with strider honeymonkeys,” in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.
[24] The Honeynet Project, Capture-HPC client honeypot, 2008, http://projects.honeynet.org/capture-hpc.
[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, “Blade: an attack-agnostic approach for preventing drive-by malware infections,” in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS ’10), pp. 440–450, ACM, October 2010.
[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, “All your iFRAMEs point to us,” in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.
infection history and exposure frequency were considered when estimating the risk of exploit sites. Finally, the MRI was estimated using the first-order risk analysis and the risk index of the distribution sites and exploit sites.
In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced from malicious code detection and vulnerability scanning tools.
As this feature model will be used to predict the risk of a specific website, it should contribute to establishing an active malicious code distribution blocking system that realizes proactive responses beyond the limits of reactive responses that rely only on traditional malicious code detection tools.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] J. A. Hansen and N. M. Hansen, “A taxonomy of vulnerabilities in implantable medical devices,” in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS ’10), pp. 13–20, October 2010.
[2] C.-S. Park, “Security mechanism based on hospital authentication server for secure application of implantable medical devices,” BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.
[3] E. Hutchins, M. Cloppert, and R. Amin, “Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains,” in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW ’11), pp. 113–125, Academic Conferences, March 2011.
[4] N. Moran, “Understanding Advanced Persistent Threats: A Case Study,” 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.
[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, “Secure model against APT in m-connected SCADA network,” International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.
[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, “All your iframes point to us,” in Proceedings of the USENIX Security, 2008.
[7] S. Lee and J. Kim, “WARNINGBIRD: detecting suspicious URLs in twitter stream,” in Proceedings of the Symposium on Network and Distributed System Security (NDSS ’12), 2012.
[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, “Collecting internet malware based on client-side honeypot,” in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS ’08), pp. 1493–1498, Hunan, China, November 2008.
[9] Y.-C. Cho and J.-Y. Pan, “Multiple-feature extracting modules based leak mining system design,” The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.
[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, “A method for risk measurement of botnet’s malicious activities,” Information Journal, vol. 17, no. 1, pp. 165–180, 2014.
[11] C. Ni, C. Sugimoto, and J. Jiang, “Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically,” in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.
[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, “Clustering networks’ heterogeneous data in defining a comprehensive closeness centrality index,” Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.
[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, “Betweenness centrality in some classes of graphs,” International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.
[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Beyond blacklists: learning to detect malicious web sites from suspicious URLs,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’09), pp. 1245–1253, July 2009.
[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Identifying suspicious URLs: an application of large-scale online learning,” in Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09), pp. 681–688, 2009.
[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, “Zozzle: low-overhead mostly static JavaScript malware detection,” in Proceedings of the USENIX Security Symposium, 2011.
[17] M. Cova, C. Kruegel, and G. Vigna, “Detection and analysis of drive-by-download attacks and malicious JavaScript code,” in Proceedings of the 19th International World Wide Web Conference (WWW ’10), pp. 281–290, April 2010.
[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, “Prophiler: a fast filter for the large-scale detection of malicious web pages,” in Proceedings of the 20th International Conference on World Wide Web (WWW ’11), pp. 197–206, 2011.
[19] C. Whittaker, B. Ryner, and M. Nazif, “Large-scale automatic classification of phishing pages,” in Proceedings of the Symposium on Network and Distributed System Security (NDSS ’10), 2010.
[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, “JSand: complete client-side sandboxing of third-party JavaScript without browser modifications,” in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC ’12), pp. 1–10, ACM, December 2012.
[21] C. Seifert, I. Welch, and P. Komisarczuk, “Honeyc: the low-interaction client honeypot,” in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRCS ’07), Hamilton, New Zealand, 2007.
[22] N. Jose, “PhoneyC: a virtual client honeypot,” in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.
[23] Y.-M. Wang, D. Beck, X. Jiang et al., “Automated web patrol with strider honeymonkeys,” in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.
[24] The Honeynet Project, Capture-HPC client honeypot, 2008, http://projects.honeynet.org/capture-hpc.
[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, “Blade: an attack-agnostic approach for preventing drive-by malware infections,” in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS ’10), pp. 440–450, ACM, October 2010.
[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, “All your iFRAMEs point to us,” in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.