A Power Law for Cybercrime Deepak Chandarana & Richard Overill Department of Computer Science King’s College London [email protected]


2

Overview

Introduction to cybercrime
Power law characterisation
Examples of power law relationships
Data collection
Results of analysis
Interpretation of results
Conclusion

3

Introduction

Cybercrime refers to Internet and computer related crime
  Credit card fraud, Financial fraud, Identity fraud, Cyber-extortion, Cyber-sabotage, Cyber-espionage, ...
  Viruses, Worms, Logic Bombs, Trojan Horses, RATs, Rootkits, Denial of Service attacks, Phishing attacks, ...

A growing and evolving form of crime
Cost estimated at £1.5 trillion pa world-wide
Poses many challenges for organisations, governments and law enforcement

4

Who carries out Cybercrime?

Insiders (employees)
Hackers (cyber-mercenaries)
Criminals (serious & organised crime)
Terrorists (sub-state groups)
Corporations (commercial espionage)
Government agencies (counterintelligence)

5

Their Motives

There are many motives:
  Revenge, ideology, competition, money, influence

Two main classes:
  Intrinsic: motivated by internal factors
  Extrinsic: motivated by external factors

These motivations may also be combined

6

Power Law Characterisation

Probability of measuring a particular value of some quantity varies inversely as a power of that value:

$$p(x) = C x^{-\alpha}$$

α is the exponent of the power law
C is the probability normalisation constant

If logarithms are taken:

$$\log p(x) = -\alpha \log x + \log C$$

This has the straight-line form $y = mx + c$, with gradient $-\alpha$.
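The straight-line form can be checked numerically. The sketch below is only an illustration: the function names and the values chosen for α and C are assumptions, not taken from the slides. It evaluates p(x) at two points and measures the gradient between them on log-log axes.

```python
import math

def power_law_pdf(x, alpha, c):
    """Evaluate p(x) = C * x**(-alpha)."""
    return c * x ** (-alpha)

def loglog_gradient(x1, x2, alpha, c):
    """Gradient of log p(x) against log x between two x values."""
    y1 = math.log(power_law_pdf(x1, alpha, c))
    y2 = math.log(power_law_pdf(x2, alpha, c))
    return (y2 - y1) / (math.log(x2) - math.log(x1))

# Illustrative parameters only (not from the slides).
gradient = loglog_gradient(2.0, 50.0, alpha=2.5, c=1.5)
print(gradient)
```

Whatever pair of x values is used, the gradient should come out as -α up to rounding, and the intercept as log C.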

7

Histograms - 1

Number of values that fit into a data range is counted
Midpoint of the range is plotted on the x-axis
Frequency of each range is plotted on the y-axis
Problems with histograms:
  Trends can be hidden
  What bin size to use?
  Noise in the tail (due to low frequency values)

8

Histograms - 2

Logarithmic scales
Problem of noise in the tail (Newman, 2005)

Partially overcome noise by using logarithmic binning: vary the sizes of the bins using a fixed multiplier
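Logarithmic binning can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the choice of the geometric midpoint for each bin, and the normalisation by bin width are all assumptions.

```python
from bisect import bisect_right

def log_binned_histogram(values, first_edge, multiplier):
    """Return (midpoint, count, density) per bin; each bin is `multiplier` times wider than the last."""
    if first_edge <= 0 or multiplier <= 1:
        raise ValueError("first_edge must be > 0 and multiplier > 1")
    kept = [v for v in values if v >= first_edge]  # values below the first edge are ignored
    if not kept:
        return []
    largest = max(kept)
    edges = [first_edge]
    while edges[-1] <= largest:  # extend until the last edge lies above every value
        edges.append(edges[-1] * multiplier)
    counts = [0] * (len(edges) - 1)
    for v in kept:
        counts[bisect_right(edges, v) - 1] += 1  # bin k covers [edges[k], edges[k+1])
    rows = []
    for lo, hi, count in zip(edges, edges[1:], counts):
        midpoint = (lo * hi) ** 0.5               # geometric midpoint (assumed choice)
        density = count / (len(kept) * (hi - lo))  # divide by width so wide bins are comparable
        rows.append((midpoint, count, density))
    return rows

rows = log_binned_histogram([1, 1, 2, 3, 4, 5, 6, 7], first_edge=1, multiplier=2)
for midpoint, count, density in rows:
    print(midpoint, count, density)
```

Dividing each count by its bin width matters here: without it, the wider bins in the tail collect more points simply because they are wider.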

9

Cumulative Distribution Function (CDF)

The CDF defines the probability P(X<=x) that X has a value less than or equal to x
The complementary CDF (CCDF) defines the probability P(X>x) that X has a value greater than x
X is plotted on the x-axis (the abscissa)
CDF/CCDF is plotted on the y-axis (the ordinate)
Advantage 1: the CDF is well-defined for values of X which have low probability (the tail)
Advantage 2: the CDF on a log-log plot is a straight line with gradient -(α-1)
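The gradient -(α-1) follows from integrating the density: for α > 1, the integral of C x'^(-α) from x to infinity is C/(α-1) · x^(-(α-1)). An empirical CCDF needs no binning at all. The sketch below uses hypothetical function names and the P(X>x) convention stated above.

```python
import math
from bisect import bisect_right

def empirical_ccdf(values):
    """Return (x, P(X > x)) for each distinct x, in increasing order of x."""
    data = sorted(values)
    n = len(data)
    return [(x, (n - bisect_right(data, x)) / n) for x in sorted(set(data))]

def loglog_points(ccdf):
    """Take logs of both coordinates, dropping points that cannot be shown on log axes."""
    return [(math.log(x), math.log(p)) for x, p in ccdf if x > 0 and p > 0]

ccdf = empirical_ccdf([1, 2, 2, 5])
print(ccdf)
print(loglog_points(ccdf))
```

With the strict inequality, the largest observation always has P(X>x) = 0, so it is dropped before taking logarithms.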

10

Calculating the Exponent α

Line of best fit (linear regression) introduces serious inaccuracies (Goldstein et al., 2004)
Use an MLE formula for α (Newman, 2005):

$$\alpha = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}$$

n is the number of points
x_i, i = 1…n, are the measured values of x
x_min is the minimum value of x for which the power law behaviour holds (power laws diverge as x approaches zero)
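The MLE formula translates directly into code. The sketch below is illustrative only: the function names are assumptions, and the data are synthetic values drawn by inverse-transform sampling, not the survey data used in the talk.

```python
import math
import random

def mle_alpha(values, xmin):
    """Estimate alpha as 1 + n / sum(ln(x_i / xmin)) over the points with x_i >= xmin."""
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one value strictly above xmin")
    return 1.0 + len(tail) / log_sum

def sample_power_law(n, alpha, xmin, rng):
    """Draw n synthetic values with P(X > x) = (x / xmin)**(-(alpha - 1))."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
synthetic = sample_power_law(20000, alpha=2.5, xmin=1.0, rng=rng)
alpha_hat = mle_alpha(synthetic, xmin=1.0)
print(round(alpha_hat, 2))
```

On synthetic data the estimate should land close to the exponent used to generate it, with a statistical error of roughly (α-1)/√n.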

11

Calculating xmin

The distribution deviates from the power law below xmin
Solar Flare example (Newman, 2005):

xmin obtained by inspection of the graph is not accurate
Use the Kolmogorov-Smirnov D-statistic to determine xmin from a goodness-of-fit test against the empirical CDF:

$$D = \max_i \left( P(X \ge x_i) - \frac{n-i}{n},\; \frac{n-i+1}{n} - P(X \ge x_i) \right)$$
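One common way to use the D-statistic is to try each observed value as a candidate xmin, fit α to the tail above it, and keep the candidate with the smallest D. The sketch below assumes that procedure; the slides do not spell it out. It also assumes the x_i are sorted in increasing order and that the model gives P(X ≥ x) = (x/xmin)^(-(α-1)). All function names and the `min_tail` cut-off are hypothetical.

```python
import math
from bisect import bisect_left

def fit_alpha(tail, xmin):
    """MLE exponent for the tail values (all >= xmin)."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(tail_sorted, xmin, alpha):
    """D = max_i( P(X>=x_i) - (n-i)/n, (n-i+1)/n - P(X>=x_i) ), with x_i ascending."""
    n = len(tail_sorted)
    d = 0.0
    for i, x in enumerate(tail_sorted, start=1):
        model = (x / xmin) ** (-(alpha - 1.0))  # fitted P(X >= x_i)
        d = max(d, model - (n - i) / n, (n - i + 1) / n - model)
    return d

def choose_xmin(values, min_tail=10):
    """Return (D, xmin, alpha) for the candidate xmin with the smallest D, or None."""
    data = sorted(values)
    best = None
    for xmin in sorted(set(data)):
        tail = data[bisect_left(data, xmin):]
        if len(tail) < min_tail:
            break  # too few points left to fit (assumed cut-off)
        if tail[-1] == xmin:
            continue  # all tail values equal xmin: alpha is undefined
        alpha = fit_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[0]:
            best = (d, xmin, alpha)
    return best
```

The search is quadratic in the number of points, which is acceptable for a data set of around a hundred values but would need a faster implementation for large samples.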

12

Fatal Quarrels

Lewis Fry Richardson (1948) carried out work into the statistics of fatal quarrels from 1820-1945
Data was placed into ranges and plotted on logarithmic scales

13

Conventional War

Newman (2005) considered the cumulative distribution of the intensity of 119 wars from 1816-1980

He calculated the exponent to be 1.80

14

Terrorism

Clauset & Young (2005) considered terrorist attacks 1968-2004
They divided events into two categories:
  G7 countries follow a power law with exponent 1.71
  Non-G7 countries have an exponent of 2.5

15

Why different Exponents?

Terrorist attacks in industrialised nations are relatively rare but tend to be large when they do occur (higher levels of security)

Attacks in the less industrialised world tend to be smaller, but more frequent, events (lower levels of security)

16

Aims of this research

Investigate whether cybercrime conforms to a power law model
Compare with conventional war and terrorism models

17

Collecting & Selecting the Data

Many data sources were initially considered (UK DTI, UK NHTCU, ACCSS, etc.)
Computer Security Institute / Federal Bureau of Investigation (CSI/FBI) Annual Computer Crime and Security Survey was finally selected
Most complete historical data set (1997-2006)
x-value = total amount of money lost from an attack (direct + collateral losses) in $US
Crimes for which the historical data set is incomplete (e.g. web-site defacement) are omitted, but are used in re-sampling tests

18

Cumulative Distribution Function

Produces a curve, not a straight line, indicating that a single power law relationship does not exist

19

Dividing the Curve - 1

20

Dividing the Curve - 2

Graph is divided into left and right sides
To get an overall fit the weighted Pearson’s product moment correlation coefficient is optimised wrt the position of the dividing point:

$n_l$ = the number of points on the left side of the graph
$n_r$ = the number of points on the right side of the graph
$r_l^2$ = correlation coefficient for the left side
$r_r^2$ = correlation coefficient for the right side
$n = n_l + n_r$

$$r^2 = \frac{n_l r_l^2 + n_r r_r^2}{n}$$

$r^2$ = weighted mean of the correlation coefficient of the graph
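The search for the dividing point can be sketched as a scan over every admissible split of the log-log points. This is an assumed implementation rather than the authors' procedure: it treats the two sides as disjoint, requires a minimum number of points on each side (`min_side`, a hypothetical parameter), and keeps the split with the largest weighted r².

```python
def pearson_r2(xs, ys):
    """Squared Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate side: no defined correlation
    return sxy * sxy / (sxx * syy)

def best_split(xs, ys, min_side=3):
    """Return (weighted r^2, k), where the left side is points [0, k) and the right side is [k, n)."""
    n = len(xs)
    best = None
    for k in range(min_side, n - min_side + 1):
        r2_left = pearson_r2(xs[:k], ys[:k])
        r2_right = pearson_r2(xs[k:], ys[k:])
        weighted = (k * r2_left + (n - k) * r2_right) / n  # (n_l r_l^2 + n_r r_r^2) / n
        if best is None or weighted > best[0]:
            best = (weighted, k)
    return best

# Illustrative points only: two straight segments with different gradients.
xs = [float(x) for x in range(10)]
ys = [-x for x in xs[:5]] + [-3.0 * x + 11.0 for x in xs[5:]]
print(best_split(xs, ys))
```

Weighting by the number of points stops a short, perfectly straight segment from dominating the choice of dividing point.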

21

Dividing the Curve - 3

22

Division of Crimes - 1

Calculate percentage of data points each type of crime represents
To identify the most prevalent crimes:
  Absolute test: A crime that represents less than 10% of the data is not considered
  Relative test: If a crime appears on both sides and if its percentage on one side is less than half its percentage on the other side then the smaller percentage is not considered
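The two tests can be written as a small filter. The sketch below rests on assumptions the slide leaves open: percentages are taken per side, the absolute test is applied before the relative test, and the crime names and figures are placeholders rather than survey results.

```python
def prevalent_crimes(left_pct, right_pct, threshold=10.0):
    """Apply the absolute test, then the relative test; return the surviving crimes per side."""
    # Absolute test: drop any crime below the threshold percentage on that side.
    left = {crime: pct for crime, pct in left_pct.items() if pct >= threshold}
    right = {crime: pct for crime, pct in right_pct.items() if pct >= threshold}
    # Relative test: for a crime on both sides, drop the side whose share is
    # less than half of its share on the other side.
    for crime in set(left) & set(right):
        if left[crime] < right[crime] / 2:
            del left[crime]
        elif right[crime] < left[crime] / 2:
            del right[crime]
    return sorted(left), sorted(right)

# Placeholder percentages for illustration only (not from the CSI/FBI survey).
left_side = {"Crime A": 30.0, "Crime B": 5.0, "Crime C": 12.0}
right_side = {"Crime A": 12.0, "Crime C": 20.0, "Crime D": 40.0}
print(prevalent_crimes(left_side, right_side))
```

A crime can legitimately survive on both sides when neither share is less than half of the other, as "Crime C" does in this placeholder data.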

23

Division of Crimes - 2

Crimes on Right Side
  Financial Fraud
  Insider Abuse of Net Access
  Theft of Proprietary Information
  Malware: viruses, worms, Trojans

Crimes on Left Side
  Insider Abuse of Net Access
  Laptop Theft
  Sabotage of Data or Networks
  System Penetration
  Telecom Fraud
  Unauthorised Insider Access

Total Annual Losses ($)

24

Division of Crimes - 3

Left side = intrinsic (and combined) crimes
Right side = extrinsic crimes
  Primarily money motivated
  Follow a more targeted and organised approach
  Organised crime in cybercrime
  May also be an element of crimes on the left side

Why is organised crime in cybercrime?
  Anonymity of the Internet
  Trans-border in nature
  Weak international laws
  Big money to be made!

25

What does the Exponent tell us? - 1

              Cybercrime    Cybercrime   Conventional   G7          Non-G7
              (90 subset)   (122 set)    war            Terrorism   Terrorism
Left Side     1.78          1.60
                                         1.80           1.71        2.5
Right Side    2.60          2.55

26

What does the Exponent tell us? - 2

Crimes on the right side are targeted against larger organisations with stronger defensive measures
The attacks succeed less frequently, but are large events when they do happen
The attacks on the left side are smaller in scale and carried out on organisations with weaker defences
The attacks succeed with greater frequency, but have a smaller financial impact

27

Adapting the Model - 1

Johnson et al. (2005) analysed the ongoing conflicts in Colombia and Iraq between 1988-2004

They considered how the power law exponent changed over time

28

Adapting the Model - 2

They put forward a model of insurgent warfare to explain the power law behaviour of conventional war and terrorism
We adapt this model to the domain of cybercrime
Attack unit - group of people that can organise themselves to act as a single unit
Attack strength - amount of money lost due to an event carried out by this attack unit
Strength of the attack unit depends on the skill of its members and the electronic weapons they possess

29

Adapting the Model - 3

The left side gives a similar picture to war:
  There can be a wide distribution of attack units
  Crimes such as System Penetration or Unauthorised Insider Access can be carried out by a single person or a group of attackers
  These attack units have different attack strengths
  As a result there is a wider variation in crimes that occur on the left side compared to the right side

30

Adapting the Model - 4

The right side is comparable to terrorism in non-G7 countries
  More organised nature of the crimes
  Consider an organised crime group bringing together a number of hackers to form an attack unit of a specific strength
  Carry out the attack then disperse the group to help avoid detection
  More transient attack units whose attack strengths change dynamically due to their continual fragmentation and coalescence

31

Implications

Organisations, governments and law enforcement agencies are fighting enemies with two different attack styles and motivations
The left side contains crimes of a more intrinsic nature
  Variations in the size and strength of attack units
  Attack units are more static in structure
The right side contains crimes of a more extrinsic nature
  Attack units are more dynamic in structure

32

Summary & Conclusions

Reviewed the power law relationships found in warfare
For cyber-crime (in USA) a single power law relationship does not exist
Evidence to indicate that a double power law relationship holds
Left side characteristic of conventional warfare
Right side characteristic of non-G7 terrorism

33

References

L F Richardson (1948) J Amer Stat Assoc 43 523-546
L F Richardson (1960) Statistics of Deadly Quarrels, Boxwood Press
L-E Cederman (2003) Amer Polit Sci Rev 97 135-150
M L Goldstein et al. (2004) Eur Phys J B41 255-258
M E J Newman (2005) Contemp Phys 46 323-351
A Clauset & M Young (2005) arxiv.org/abs/physics/0502014/
N Johnson et al. (2005) arxiv.org/abs/physics/0506213/
D Chandarana & R E Overill (2007) J Information Warfare (to be submitted)

34

Questions