Data Perturbation: An Inference Control Method for Database Security

Dissertation Defense, Bob Nielson, Oct 23, 2009

Page 1: Data Perturbation An Inference Control Method for Database Security

Data Perturbation: An Inference Control Method for Database Security

Dissertation Defense

Bob Nielson

Oct 23, 2009

Page 2: Data Perturbation An Inference Control Method for Database Security

I. Introduction

• Most security concerns can be handled with the SQL GRANT command.

• Others require a view.

• But what if we wish to disclose summary information about a table field while withholding the individual records?

Page 3: Data Perturbation An Inference Control Method for Database Security

I. Introduction - The Problem

• The problem is to allow statistical analysis of the data while still protecting individual records.

• Example: given a database of cancer patients, allow a researcher to learn the cancer rate, but not that patient X has cancer.

Page 4: Data Perturbation An Inference Control Method for Database Security

I. Introduction - The Problem

Name    Dept  Sex  Salary
Bob     CS    M     30,000
Fred    CS    M    100,000
Mary    CS    F     50,000
Tim     IT    M     50,000
Tom     IT    M     60,000
Martha  IT    F     70,000
Ken     IT    M     50,000
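To make the problem concrete, here is a minimal Python sketch built on the example table. The helper `average_salary` is hypothetical and not part of the original slides. It shows that a purely statistical query can still return one person's salary when only one row matches the filter.

```python
from statistics import mean

# The example table from the slide: (name, dept, sex, salary).
EMPLOYEES = [
    ("Bob", "CS", "M", 30_000),
    ("Fred", "CS", "M", 100_000),
    ("Mary", "CS", "F", 50_000),
    ("Tim", "IT", "M", 50_000),
    ("Tom", "IT", "M", 60_000),
    ("Martha", "IT", "F", 70_000),
    ("Ken", "IT", "M", 50_000),
]


def average_salary(dept=None, sex=None):
    """Statistical query: average salary over the rows matching the filters."""
    salaries = [
        salary
        for _, d, s, salary in EMPLOYEES
        if (dept is None or d == dept) and (sex is None or s == sex)
    ]
    return mean(salaries)


# A harmless aggregate over all seven rows.
overall = average_salary()

# Only one row is both CS and female, so this "average" is Mary's salary.
leaked = average_salary(dept="CS", sex="F")
```

The query never mentions a name, yet `leaked` equals the single matching record.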

Page 5: Data Perturbation An Inference Control Method for Database Security

II. Related Work

• Suppression

• Anonymization

• Partitioning

• Data Logging

• Conceptual

• Hybrid

• Perturbation

Page 6: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Suppression

• Each query must access at least n records.

• Only n queries are allowed per day.

• There are known methods to get around these protections.
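The slides do not name the methods. One well-known example is a difference attack, sketched below under assumptions: the threshold `MIN_QUERY_SET_SIZE` and the helper `restricted_sum` are hypothetical, and the rows come from the example salary table. Two queries that each pass the size check are subtracted to isolate one record.

```python
# Rows from the example table: (name, dept, sex, salary).
EMPLOYEES = [
    ("Bob", "CS", "M", 30_000),
    ("Fred", "CS", "M", 100_000),
    ("Mary", "CS", "F", 50_000),
    ("Tim", "IT", "M", 50_000),
    ("Tom", "IT", "M", 60_000),
    ("Martha", "IT", "F", 70_000),
    ("Ken", "IT", "M", 50_000),
]

MIN_QUERY_SET_SIZE = 2  # hypothetical value of n for this sketch


def restricted_sum(predicate):
    """Sum salaries, refusing queries that touch fewer than n records."""
    rows = [row for row in EMPLOYEES if predicate(row)]
    if len(rows) < MIN_QUERY_SET_SIZE:
        raise PermissionError("query set too small")
    return sum(row[3] for row in rows)


# Both queries are allowed: they touch 4 and 3 records respectively.
all_it = restricted_sum(lambda row: row[1] == "IT")
it_males = restricted_sum(lambda row: row[1] == "IT" and row[2] == "M")

# Their difference is the salary of the only female in IT.
inferred_salary = all_it - it_males
```

Asking for the same record directly would be refused, but the subtraction recovers it anyway.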

Page 7: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Anonymization

• Replace the identifying fields with special characters.

• This method can still be compromised.

Page 8: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Anonymization

Name  Dept  Sex  Salary
*     CS    M     30,000
*     CS    M    100,000
*     CS    F     50,000
*     IT    M     50,000
*     IT    M     60,000
*     IT    F     70,000
*     IT    M     50,000

Page 9: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Partitioning

• All queries must access more than one band of records.

Page 10: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Partitioning

Name    Dept  Sex  Salary
Bob     CS    M     30,000
Fred    CS    M    100,000
Mary    CS    F     50,000
Tim     IT    M     50,000
Tom     IT    M     60,000
Martha  IT    F     70,000
Ken     IT    M     50,000

Page 11: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Logging

• A log of every query run is kept.

• Before a query is allowed, all possible inferences are checked. If the query would reveal an individual record, it is not permitted.

• Soon no queries are allowed.

Page 12: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Conceptual

• Design the database so that no confidential information is stored.

Page 13: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Hybrid

• Try using a combination of several of these methods.

Page 14: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Perturbation

• Output Perturbation

• Data Perturbation

• Liew Perturbation

• Nielson Perturbation

• Note: perturbation means changing the data.

Page 15: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Output Perturbation

• Output perturbation works by changing the output of the query, not the physical data.
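As a rough illustration, the sketch below answers an average query from the stored data and then distorts only the answer. The function name `perturbed_average` and the uniform noise of up to 20% are assumptions made for this example; the slides do not say how the output noise is drawn.

```python
import random


def perturbed_average(values, noise_fraction, rng):
    """Return the true average plus random noise; the stored data is untouched.

    Assumption: noise is uniform within +/- noise_fraction of the true average.
    """
    true_average = sum(values) / len(values)
    noise = rng.uniform(-noise_fraction, noise_fraction) * true_average
    return true_average + noise


data = [101, 92, 103, 91, 100, 81, 122, 113, 103, 94]
answer = perturbed_average(data, noise_fraction=0.2, rng=random.Random())
```

Because the noise is drawn per query, repeating the same query gives different answers unless the noise is fixed per query in some way.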

Page 16: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Output Perturbation

#            Data   Output Perturbed
1            101
2             92
3            103
4             91
5            100
6             81
7            122
8            113
9            103
10            94
Mean (x)     100    85
Std dev (s)   11.6

Page 17: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Data Perturbation

• Data perturbation works by changing the physical data.

• Two common methods:

1. Add a random value to each value.

2. Multiply each value by a random value.
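The two methods can be sketched as follows. The function names and the choice of uniform random draws are assumptions for illustration; the slides only state that a random value is added or multiplied.

```python
import random


def perturb_additive(values, spread, rng):
    """Method 1: add a random offset in [-spread, +spread] to each value."""
    return [v + rng.uniform(-spread, spread) for v in values]


def perturb_multiplicative(values, fraction, rng):
    """Method 2: multiply each value by a random factor near 1.

    With fraction=0.2 the factor lies in [0.8, 1.2], i.e. a 20% perturbation.
    """
    return [v * rng.uniform(1 - fraction, 1 + fraction) for v in values]


data = [81, 91, 92, 94, 100, 101, 103, 103, 113, 122]
rng = random.Random()
stored_additive = perturb_additive(data, spread=20, rng=rng)
stored_multiplicative = perturb_multiplicative(data, fraction=0.2, rng=rng)
```

Unlike output perturbation, the perturbed list is what gets stored, so every later query sees the same distorted values.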

Page 18: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Data Perturbation

#            Data   20% Output Perturbed   Error
6             81     96                    15
4             91     95                     4
2             92     79                    13
10            94     89                     5
5            100    114                    14
1            101    105                     4
3            103    120                    17
9            103    106                     3
8            113    129                    16
7            122    120                     2
Mean (x)     100    105.3                   9.3
Std dev (s)   11.6   15.7                   6.1

Page 19: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Data Perturbation

[Figure: Uniform Random Distribution. Fitness (y-axis, 0 to 120) plotted against query size (x-axis, 0 to 1200).]

Page 20: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Liew Perturbation

• Liew perturbation steps:

1. Calculate the average, standard deviation, and count of the data

2. Generate a new data set with the same average, standard deviation and count

3. Sort both data sets in ascending order

4. Swap the perturbed values with each other.
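The four steps above can be sketched as follows. Two points are assumptions made for this example, since the slides do not specify them: the new data set is drawn from a normal distribution and rescaled to match the statistics exactly, and step 4 is read as replacing each original value with the generated value of the same rank. The function name `liew_perturb` is hypothetical.

```python
import random
from statistics import mean, stdev


def liew_perturb(values, rng):
    """Replace each value with a generated value of the same rank."""
    n = len(values)
    # Step 1: statistics of the original data.
    mu, sigma = mean(values), stdev(values)

    # Step 2: draw n values (assumed normal), then rescale them so the
    # generated set has exactly the same mean and standard deviation.
    draws = [rng.gauss(0, 1) for _ in range(n)]
    d_mu, d_sigma = mean(draws), stdev(draws)
    generated = [mu + sigma * (d - d_mu) / d_sigma for d in draws]

    # Step 3: sort both sets in ascending order.
    generated.sort()
    order = sorted(range(n), key=lambda i: values[i])

    # Step 4: the record holding the k-th smallest original value
    # receives the k-th smallest generated value.
    perturbed = [0.0] * n
    for rank, index in enumerate(order):
        perturbed[index] = generated[rank]
    return perturbed


data = [101, 92, 103, 91, 100, 81, 122, 113, 103, 94]
stored = liew_perturb(data, random.Random())
```

Matching by rank keeps large values large and small values small, which is why the per-record errors in this scheme tend to be modest.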

Page 21: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Liew Perturbation

#            Data   Liew Perturbed Data   Error
6             81     87                    6
4             91     93                    2
2             92     97                    5
10            94     99                    5
5            100     99                    1
1            101    101                    0
3            103    105                    2
9            103    115                   12
8            113    118                    5
7            122    121                    1
Mean (x)     100    103.5                  3.9
Std dev (s)   11.6   11.1                  3.5

Page 22: Data Perturbation An Inference Control Method for Database Security

II. Related Work - Liew Perturbation

[Figure: Liew Perturbation. Fitness (y-axis, 0 to 120) plotted against query size (x-axis, 0 to 1200).]

Page 23: Data Perturbation An Inference Control Method for Database Security

III. Hypothesis and Proof

• Prove:

• H1: Nielson perturbation is better than No Perturbation

• H2: Nielson perturbation is better than data perturbation (20%)

• H3: Nielson perturbation is better than Liew perturbation (20%)

Page 24: Data Perturbation An Inference Control Method for Database Security

III. Hypothesis and Proof

• Disprove:

• H1: Nielson perturbation is not better than No Perturbation

• H2: Nielson perturbation is not better than data perturbation (20%)

• H3: Nielson perturbation is not better than Liew perturbation (20%)

Page 25: Data Perturbation An Inference Control Method for Database Security

IV. Methodology

• What is Nielson Perturbation?

• Calculating the absolute error . . .

• Finding optimal values for Nielson perturbation . . .

• Experimental design . . .

• Conducting the experiment . . .

Page 26: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - Nielson Perturbation

• Nielson Perturbation is a form of data perturbation.

• Each value is multiplied by a random value between alpha and beta for the first gamma records in the data set.

• This value is randomly negated.
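The description above leaves several details open, so the following is only one possible reading and not the author's implementation. Assumptions: the random factor is uniform between alpha and beta, it is negated with probability 0.5, gamma is truncated to a whole number of records, and records beyond the first gamma are left unchanged. The function name `nielson_perturb` is hypothetical.

```python
import random


def nielson_perturb(values, alpha, beta, gamma, rng):
    """One reading of the slide: scale the first gamma records by a random,
    randomly negated factor drawn between alpha and beta."""
    low, high = min(alpha, beta), max(alpha, beta)
    cutoff = int(gamma)  # assumption: gamma counts whole records
    perturbed = []
    for index, value in enumerate(values):
        if index < cutoff:
            factor = rng.uniform(low, high)
            if rng.random() < 0.5:  # assumption: negate half of the time
                factor = -factor
            perturbed.append(value * factor)
        else:
            perturbed.append(value)  # assumption: later records are untouched
    return perturbed


data = [81, 91, 92, 94, 100, 101, 103, 103, 113, 122]
stored = nielson_perturb(data, alpha=2.09, beta=1.18, gamma=3, rng=random.Random())
```

Under this reading, small queries that hit the first gamma records are badly distorted, while large queries dilute the distortion across many untouched records.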

Page 27: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - Nielson Perturbation

Page 28: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - Nielson

#            Data   Nielson Perturbed Data   Error
6             81     72                       9
4             91     81                      10
2             92     81                      11
10            94     76                      18
5            100    116                      16
1            101     86                      15
3            103     83                      20
9            103     89                      14
8            113     98                      15
7            122    106                      16
Mean (x)     100     88.8                    14.4
Std dev (s)   11.6   13.8                     3.5

Page 29: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - Alpha/Beta/Gamma

• What are the best values?

• An evolutionary algorithm was deployed.

• The results after several days of computation were:

1. Alpha = 2.09

2. Beta = 1.18

3. Gamma = 66.87

Page 30: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - Evolutionary Results

Alpha  Beta  Gamma   Fitness
1.50   1.11  335.38  46.30
1.75   0.75  269.71  46.49
1.33   1.27  233.69  46.41
1.40   1.33  382.70  46.43
2.09   1.18   66.87  46.20
1.34   1.24  154.18  46.55
1.33   1.16   60.31  47.26
1.48   0.93  193.00  46.90
1.63   1.09  105.35  46.70
1.29   0.97  106.76  46.92

Page 31: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - Nielson Perturbation

[Figure: Nielson Perturbation. Fitness (y-axis, -250 to 150) plotted against query size (x-axis, 0 to 1200).]

Page 32: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - Nielson Perturbation

[Figure: Nielson - First Part. Fitness (y-axis, -250 to 150) plotted against query size (x-axis, 0 to 120).]

Page 33: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - The Method

• Calculate the average error of each method.

• Use the central limit theorem:

The distribution of sample averages approaches a normal distribution as the sample size grows.

Page 34: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - The Method

• Use a t-test to determine whether two sample means are statistically different from each other at the 95% confidence level (significance level 0.05).
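A minimal sketch of such a comparison is shown below. It uses Welch's unequal-variance form of the t statistic and, as a simplifying assumption suited to large samples, compares it with the normal critical value instead of an exact t distribution. The slides do not say which t-test variant was used, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


def significantly_different(sample_a, sample_b, significance=0.05):
    """Two-sided test; large-sample approximation using the normal quantile."""
    critical = NormalDist().inv_cdf(1 - significance / 2)
    return abs(welch_t(sample_a, sample_b)) > critical
```

With the default `significance=0.05` the critical value is about 1.96, which corresponds to the 95% level named on the slide.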

Page 35: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - Monte Carlo Simulation

• Randomly generate 100,000 databases and execute hundreds of queries.

• I will use arrays to test the accuracy, since speed is of major importance here.

• Whether the data sits in arrays or in a database does not affect the accuracy of the query outputs.
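A scaled-down sketch of this kind of simulation follows. Everything specific in it is an assumption for illustration: values are drawn uniformly between 50 and 150, each trial runs one random-subset average query, and the method under test is a 20% multiplicative perturbation. The names `average_query_error` and `perturb_20_percent` are hypothetical.

```python
import random


def perturb_20_percent(values, rng):
    """Stand-in perturbation method: scale each value by a factor in [0.8, 1.2]."""
    return [v * rng.uniform(0.8, 1.2) for v in values]


def average_query_error(perturb, db_size, query_size, trials, rng):
    """Average absolute error of an AVG query over many random databases."""
    total_error = 0.0
    for _ in range(trials):
        # A plain list stands in for the database table.
        original = [rng.uniform(50, 150) for _ in range(db_size)]
        perturbed = perturb(original, rng)
        # The query averages a random subset of query_size records.
        rows = rng.sample(range(db_size), query_size)
        true_answer = sum(original[i] for i in rows) / query_size
        seen_answer = sum(perturbed[i] for i in rows) / query_size
        total_error += abs(true_answer - seen_answer)
    return total_error / trials


rng = random.Random()
small_query_error = average_query_error(perturb_20_percent, 200, 2, 300, rng)
large_query_error = average_query_error(perturb_20_percent, 200, 200, 300, rng)
```

Running the same function with different query sizes gives the error-versus-query-size curve that the fitness plots summarize.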

Page 36: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - Calculating the Average Error

• The error should be bigger with smaller query sizes.

• The error should be smaller with larger query sizes.

Page 37: Data Perturbation An Inference Control Method for Database Security

IV. Methodology - The Fitness Function

e = |x - x'|

If q < n/2
    fitness = 100 - e
Else
    fitness = e

Smaller fitness scores are better.
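The rule translates directly into a small function. The slide does not define its symbols, so the reading used here is an assumption: x is the true query answer, x' the answer computed from perturbed data, q the query size, and n the number of records in the database.

```python
def fitness(true_answer, perturbed_answer, query_size, db_size):
    """Fitness score from the slide; smaller is better.

    Small queries (q < n/2) are rewarded for a large error, which protects
    individual records. Large queries are rewarded for a small error, which
    keeps aggregates accurate.
    """
    error = abs(true_answer - perturbed_answer)
    if query_size < db_size / 2:
        return 100 - error
    return error
```

For example, with a true answer of 100 and a perturbed answer of 85, a query over 2 of 10 records scores 85, while a query over 8 of 10 records scores 15.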

Page 38: Data Perturbation An Inference Control Method for Database Security

V. Results and Conclusions

Page 39: Data Perturbation An Inference Control Method for Database Security

V. Results and Conclusions

Page 40: Data Perturbation An Inference Control Method for Database Security

V. Results and Conclusions

Page 41: Data Perturbation An Inference Control Method for Database Security

V. Results and Conclusions - Significance

• There is a real need for partial disclosure of a field in a table.

• My method ensures a higher degree of security.

• My method still allows for release of averages and totals.

Page 42: Data Perturbation An Inference Control Method for Database Security

VI. Further Studies

• Transformation times

• On-the-fly perturbation