A Power Law for Cybercrime
Deepak Chandarana & Richard Overill
Department of Computer Science
King’s College [email protected]
2
Overview
Introduction to cybercrime
Power law characterisation
Examples of power law relationships
Data collection
Results of analysis
Interpretation of results
Conclusion
3
Introduction
Cybercrime refers to Internet and computer related crime
  Credit card fraud, Financial fraud, Identity fraud, Cyber-extortion, Cyber-sabotage, Cyber-espionage, ...
  Viruses, Worms, Logic Bombs, Trojan Horses, RATs, Rootkits, Denial of Service attacks, Phishing attacks, ...
A growing and evolving form of crime
Cost estimated at £1.5 trillion per annum world-wide
Poses many challenges for organisations, governments and law enforcement
4
Who carries out Cybercrime?
Insiders (employees)
Hackers (cyber-mercenaries)
Criminals (serious & organised crime)
Terrorists (sub-state groups)
Corporations (commercial espionage)
Government agencies (counterintelligence)
5
Their Motives
There are many motives:
  Revenge, ideology, competition, money, influence
Two main classes:
  Intrinsic: motivated by internal factors
  Extrinsic: motivated by external factors
These motivations may also be combined
6
Power Law Characterisation
Probability of measuring a particular value of some quantity varies inversely as a power of that value:
  $p(x) = C x^{-\alpha}$
α is the exponent of the power law
C is the probability normalisation constant
If logarithms are taken:
  $\log p(x) = -\alpha \log x + \log C$
This has the straight-line form $y = mx + c$, with gradient $m = -\alpha$ and intercept $c = \log C$
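The straight-line behaviour can be checked numerically. The sketch below is illustrative only: it assumes a continuous power law that holds on [x_min, ∞) with α > 1, for which the normalisation constant is C = (α − 1)·x_min^(α−1). The function names and the parameter values are hypothetical and do not come from the slides.

```python
import math

def power_law_pdf(x, alpha, x_min):
    """Density p(x) = C * x**(-alpha) for x >= x_min (assumes alpha > 1)."""
    # Normalisation constant for a continuous power law on [x_min, infinity).
    c = (alpha - 1.0) * x_min ** (alpha - 1.0)
    return c * x ** (-alpha)

def loglog_gradient(x1, x2, alpha, x_min):
    """Gradient of log p(x) against log x between two points."""
    y1 = math.log(power_law_pdf(x1, alpha, x_min))
    y2 = math.log(power_law_pdf(x2, alpha, x_min))
    return (y2 - y1) / (math.log(x2) - math.log(x1))

# The gradient should equal -alpha whichever two points are chosen.
print(loglog_gradient(10.0, 1000.0, alpha=2.5, x_min=1.0))
```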
7
Histograms - 1
Number of values that fit into a data range is counted
Midpoint of the range is plotted on the x-axis
Frequency of each range is plotted on the y-axis
Problems with histograms:
  Trends can be hidden
  What bin size to use?
  Noise in the tail (due to low frequency values)
8
Histograms - 2
Logarithmic scales
Problem of noise in the tail (Newman, 2005)
  Partially overcome noise by using logarithmic binning
  Vary the sizes of the bins using a fixed multiplier
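Logarithmic binning with a fixed multiplier might be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name, the default multiplier of 2, the use of the geometric bin midpoint, and the normalisation by bin width are all assumptions made here.

```python
import bisect
import math

def log_binned_histogram(values, multiplier=2.0):
    """Return (geometric_midpoint, density) pairs for non-empty log-spaced bins."""
    data = [v for v in values if v > 0]
    if not data:
        return []
    lo, hi = min(data), max(data)
    # Bin edges lo, lo*m, lo*m**2, ... until the largest value is covered.
    edges = [float(lo)]
    while edges[-1] <= hi:
        edges.append(edges[-1] * multiplier)
    counts = [0] * (len(edges) - 1)
    for v in data:
        # Each bin is the half-open interval [left edge, right edge).
        counts[bisect.bisect_right(edges, v) - 1] += 1
    result = []
    for k, count in enumerate(counts):
        if count == 0:
            continue
        width = edges[k + 1] - edges[k]
        midpoint = math.sqrt(edges[k] * edges[k + 1])
        # Dividing by the bin width keeps bins of unequal size comparable.
        result.append((midpoint, count / (len(data) * width)))
    return result
```

Because the bins widen towards the tail, each tail bin collects more samples than a fixed-width bin would, which is what reduces the noise.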
9
Cumulative Distribution Function (CDF)
The CDF defines the probability P(X<=x) that X has a value less than or equal to x
The complementary CDF (CCDF) defines the probability P(X>x) that X has a value greater than x
X is plotted on the x-axis (the abscissa)
CDF/CCDF is plotted on the y-axis (the ordinate)
Advantage 1: the CDF is well-defined for values of X which have low probability (the tail)
Advantage 2: for a power law the CCDF on a log-log plot is a straight line with gradient -(α-1)
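An empirical CDF and CCDF can be computed directly from the sample, with no binning. The sketch below is a minimal illustration using the definitions P(X<=x) and P(X>x) given above; the function name is hypothetical.

```python
from collections import Counter

def empirical_cdf_ccdf(values):
    """Return (x, P(X <= x), P(X > x)) for each distinct x, in ascending order."""
    n = len(values)
    counts = Counter(values)
    rows = []
    cumulative = 0
    for x in sorted(counts):
        cumulative += counts[x]
        rows.append((x, cumulative / n, (n - cumulative) / n))
    return rows

for row in empirical_cdf_ccdf([1, 2, 2, 4]):
    print(row)
```

With this definition the CCDF is zero at the largest observation, so that point has to be dropped before taking logarithms for a log-log plot.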
10
Calculating the Exponent α
Line of best fit (linear regression) introduces serious inaccuracies (Goldstein et al., 2004)
Use a MLE formula for α (Newman, 2005):
  $\alpha = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}$
n is the number of points
x_i, i = 1…n are the measured values of x
x_min is the minimum value of x for which the power law behaviour holds (power laws diverge as x approaches zero)
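The MLE formula translates almost line for line into code. The sketch below is illustrative: the function names are hypothetical, values below x_min are simply discarded, and the synthetic data are drawn by inverse-transform sampling from a continuous power law purely to exercise the estimator (they are not the survey data).

```python
import math
import random

def mle_alpha(values, x_min):
    """alpha = 1 + n / sum(ln(x_i / x_min)) over the points with x_i >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum

def sample_power_law(n, alpha, x_min, rng):
    """Draw n values whose CCDF is (x / x_min) ** -(alpha - 1)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

rng = random.Random(0)
samples = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
print(mle_alpha(samples, x_min=1.0))  # should be close to 2.5
```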
11
Calculating x_min
The distribution deviates from the power law below x_min
Solar Flare example (Newman, 2005):
x_min obtained by inspection of the graph is not accurate
Use the Kolmogorov-Smirnov D-statistic to determine x_min from a goodness-of-fit test against the empirical CDF:
  $D = \max_i \left( P(X \ge x_i) - \frac{n-i}{n},\ \frac{n-i+1}{n} - P(X \ge x_i) \right)$
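One way to use the D-statistic is to try each observed value as a candidate x_min, fit α to the tail above it, and keep the candidate with the smallest D. The slides do not spell out the search procedure, so this scan is an assumption, as are the function names, the `min_tail` cut-off, and the continuous model CCDF P(X>=x) = (x/x_min)^(-(α-1)). The points are taken in ascending order, i = 1…n.

```python
import math
import random

def ks_distance(tail_sorted, x_min, alpha):
    """D = max_i(P(X>=x_i) - (n-i)/n, (n-i+1)/n - P(X>=x_i)) for ascending x_i."""
    n = len(tail_sorted)
    d = 0.0
    for i, x in enumerate(tail_sorted, start=1):
        model_ccdf = (x / x_min) ** (-(alpha - 1.0))
        d = max(d, model_ccdf - (n - i) / n, (n - i + 1) / n - model_ccdf)
    return d

def choose_x_min(values, min_tail=10):
    """Return (D, x_min, alpha) minimising D, or None if the sample is too small."""
    data = sorted(v for v in values if v > 0)
    best = None
    for x_min in sorted(set(data)):
        tail = [x for x in data if x >= x_min]
        if len(tail) < min_tail:
            break  # tails only get shorter as x_min grows
        log_sum = sum(math.log(x / x_min) for x in tail)
        if log_sum == 0.0:
            continue
        alpha = 1.0 + len(tail) / log_sum  # MLE for this candidate x_min
        d = ks_distance(tail, x_min, alpha)
        if best is None or d < best[0]:
            best = (d, x_min, alpha)
    return best

# Synthetic demonstration data only (alpha = 2.5, x_min = 1).
rng = random.Random(1)
demo = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(300)]
d_stat, x_min_hat, alpha_hat = choose_x_min(demo, min_tail=30)
print(d_stat, x_min_hat, alpha_hat)
```

The scan is quadratic in the number of points, which is acceptable for data sets of a few hundred values.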
12
Fatal Quarrels
Lewis Fry Richardson (1948) carried out work into the statistics of fatal quarrels from 1820-1945
Data was placed into ranges and plotted on logarithmic scales
13
Conventional War
Newman (2005) considered the cumulative distribution of the intensity of 119 wars from 1816-1980
He calculated the exponent to be 1.80
14
Terrorism
Clauset & Young (2005) considered terrorist attacks 1968-2004
They divided events into two categories:
  G7 countries follow a power law with exponent 1.71
  Non-G7 countries have an exponent of 2.5
15
Why different Exponents?
Terrorist attacks in industrialised nations are relatively rare but tend to be large when they do occur (higher levels of security)
Attacks in the less industrialised world tend to be smaller, but more frequent, events (lower levels of security)
16
Aims of this research
Investigate whether cybercrime conforms to a power law model
Compare with conventional war and terrorism models
17
Collecting & Selecting the Data
Many data sources were initially considered (UK DTI, UK NHTCU, ACCSS, etc.)
Computer Security Institute / Federal Bureau of Investigation (CSI/FBI) Annual Computer Crime and Security Survey was finally selected
Most complete historical data set (1997-2006)
x-value = total amount of money lost from an attack (direct + collateral losses) in $US
Crimes for which the historical data set is incomplete (e.g. web-site defacement) are omitted, but are used in re-sampling tests
18
Cumulative Distribution Function
Produces a curve, not a straight line, indicating that a single power law relationship does not exist
20
Dividing the Curve - 2
Graph is divided into left and right sides
To get an overall fit the weighted Pearson’s product moment correlation coefficient is optimised with respect to the position of the dividing point:
  $n_l$ = the number of points on the left side of the graph
  $n_r$ = the number of points on the right side of the graph
  $r_l^2$ = correlation coefficient for the left side
  $r_r^2$ = correlation coefficient for the right side
  $n = n_l + n_r$
  $r^2 = \frac{n_l \cdot r_l^2 + n_r \cdot r_r^2}{n}$ = weighted mean of the correlation coefficient of the graph
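The search for the dividing point can be sketched as a scan over every admissible split, scoring each one by the weighted mean of the squared correlation coefficients on the two sides. This is a minimal illustration under assumptions made here: the inputs are taken to be the log-log coordinates of the plotted points in ascending x order, `min_points` is a hypothetical guard against very short segments, and the function names are not from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_dividing_point(log_x, log_y, min_points=3):
    """Return (n_l, r2) maximising r2 = (n_l*r_l**2 + n_r*r_r**2) / n."""
    n = len(log_x)
    best = None
    for n_l in range(min_points, n - min_points + 1):
        n_r = n - n_l
        r_l = pearson_r(log_x[:n_l], log_y[:n_l])
        r_r = pearson_r(log_x[n_l:], log_y[n_l:])
        r2 = (n_l * r_l ** 2 + n_r * r_r ** 2) / n
        if best is None or r2 > best[1]:
            best = (n_l, r2)
    return best

# Two made-up straight segments with different gradients, for illustration.
xs = [float(i) for i in range(10)]
ys = [-x if x < 5 else -3.0 * x for x in xs]
print(best_dividing_point(xs, ys))
```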
22
Division of Crimes - 1
Calculate percentage of data points each type of crime represents
To identify the most prevalent crimes:
  Absolute test: A crime that represents less than 10% of the data is not considered
  Relative test: If a crime appears on both sides and if its percentage on one side is less than half its percentage on the other side then the smaller percentage is not considered
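The two tests can be read as a simple filter over per-side percentages. The sketch below rests on assumptions the slide leaves open: percentages are taken per side, the absolute test is applied before the relative test, and the 10% threshold is exposed as a hypothetical `floor` parameter. The crime labels and numbers in the example are invented for illustration.

```python
def prevalent_crimes(left_pct, right_pct, floor=10.0):
    """Apply the absolute then the relative test; return (left, right) crime lists."""
    # Absolute test: drop crimes below the floor on each side.
    left = {crime: pct for crime, pct in left_pct.items() if pct >= floor}
    right = {crime: pct for crime, pct in right_pct.items() if pct >= floor}
    # Relative test: for a crime on both sides, drop the side whose share
    # is less than half of the share on the other side.
    for crime in set(left) & set(right):
        if left[crime] < right[crime] / 2.0:
            del left[crime]
        elif right[crime] < left[crime] / 2.0:
            del right[crime]
    return sorted(left), sorted(right)

# Invented percentages, not taken from the CSI/FBI survey.
left_side = {"crime_a": 30.0, "crime_b": 12.0, "crime_c": 5.0}
right_side = {"crime_a": 12.0, "crime_b": 11.0, "crime_d": 40.0}
print(prevalent_crimes(left_side, right_side))
```

In this example crime_c fails the absolute test, crime_a is kept only on the left because its right-side share is less than half its left-side share, and crime_b survives on both sides.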
23
Division of Crimes - 2
Crimes on Right Side:
  Financial Fraud
  Insider Abuse of Net Access
  Theft of Proprietary Information
  Malware: viruses, worms, Trojans
Crimes on Left Side:
  Insider Abuse of Net Access
  Laptop Theft
  Sabotage of Data or Networks
  System Penetration
  Telecom Fraud
  Unauthorised Insider Access
Figure axis: Total Annual Losses ($)
24
Division of Crimes - 3
Left side = intrinsic (and combined) crimes
Right side = extrinsic crimes
  Primarily money motivated
  Follow a more targeted and organised approach
  Organised crime in cybercrime
  May also be an element of crimes on the left side
Why is organised crime in cybercrime?
  Anonymity of the Internet
  Trans-border in nature
  Weak international laws
  Big money to be made!
25
What does the Exponent tell us? - 1
Data set                  Left side exponent   Right side exponent
Cybercrime (90 subset)    1.78                 2.60
Cybercrime (122 set)      1.60                 2.55

Data set                  Exponent
Conventional war          1.80
G7 Terrorism              1.71
Non-G7 Terrorism          2.5
26
What does the Exponent tell us? - 2
Crimes on the right side are targeted against larger organisations with stronger defensive measures
The attacks succeed less frequently, but are large events when they do happen
The attacks on the left side are smaller in scale and carried out on organisations with weaker defences
The attacks succeed with greater frequency, but have a smaller financial impact
27
Adapting the Model - 1
Johnson et al. (2005) analysed the ongoing conflicts in Colombia and Iraq between 1988 and 2004
They considered how the power law exponent changed over time
28
Adapting the Model - 2
They put forward a model of insurgent warfare to explain the power law behaviour of conventional war and terrorism
We adapt this model to the domain of cybercrime
Attack unit - group of people that can organise themselves to act as a single unit
Attack strength - amount of money lost due to an event carried out by this attack unit
Strength of the attack unit depends on the skill of its members and the electronic weapons they possess
29
Adapting the Model - 3
The left side gives a similar picture to war:
  There can be a wide distribution of attack units
  Crimes such as System Penetration or Unauthorised Insider Access can be carried out by a single person or a group of attackers
  These attack units have different attack strengths
  As a result there is a wider variation in crimes that occur on the left side compared to the right side
30
Adapting the Model - 4
The right side is comparable to terrorism in non-G7 countries
  More organised nature of the crimes
  Consider an organised crime group bringing together a number of hackers to form an attack unit of a specific strength
  Carry out the attack then disperse the group to help avoid detection
  More transient attack units whose attack strengths change dynamically due to their continual fragmentation and coalescence
31
Implications
Organisations, governments and law enforcement agencies are fighting enemies with two different attack styles and motivations
The left side contains crimes of a more intrinsic nature
  Variations in the size and strength of attack units
  Attack units are more static in structure
The right side contains crimes of a more extrinsic nature
  Attack units are more dynamic in structure
32
Summary & Conclusions
Reviewed the power law relationships found in warfare
For cyber-crime (in USA) a single power law relationship does not exist
Evidence to indicate that a double power law relationship holds
  Left side characteristic of conventional warfare
  Right side characteristic of non-G7 terrorism
33
References
L F Richardson (1948) J Amer Stat Assoc 43 523-546
L F Richardson (1960) Statistics of Deadly Quarrels, Boxwood Press
L-E Cederman (2003) Amer Polit Sci Rev 97 135-150
M L Goldstein et al. (2004) Eur Phys J B41 255-258
M E J Newman (2005) Contemp Phys 46 323-351
A Clauset & M Young (2005) arxiv.org/abs/physics/0502014/
N Johnson et al. (2005) arxiv.org/abs/physics/0506213/
D Chandarana & R E Overill (2007) J Information Warfare (to be submitted)