protecting genomic privacy in medical tests using...

1
Department of Computer Science and Engineering (CSE), BUET Protecting Genomic Privacy in Medical Tests using Distributed Storage Sharmin Afrose (0905028), Maitraye Das (0905052) SI stores SNPs in DDBs SI extracts SNP Chr 13 : Pos 46897343 C : T SI sequences DNA …..GTCCGGCACTA…. P provides sample to SI Sample: Saliva, hair etc Homozygous Heterozygous 0 1 A C G T 00 01 10 11 : DB1 col1 DB2 col1 DB1 col2 DB3 (P) DB2 col2 1 1 0 1 1 XOR 1 0 C T Genomic Data P provides clinical data to PU Clinical data : age, glucose level etc PU computes disease risk using Logistic Regression Model Total (n+1) databases used (n+1) th database encrypted with patient’s public key and stored in her personal device DDBs maintain protocols appropriately Our Approach Conclusion References System Architecture & Threat Model Introduction Privacy Threats Genomic Background We propose a system for securing genomic privacy in medical tests like disease risk computation, personalized medicine using distributed storage. Currently, we are working on implementation and complexity analysis of the system in detail. Reveals traits, ancestry, ethnicity, vulnerability of diseases etc. Exposes relatives’ genomic data 3 Genomic discrimination in health insurance, employment, education etc. Evaluation Fig 3: Flowchart of disease risk test (a) Sequencing and SNP extraction (b) Retrieval of SNP and computation (a) (b) Fig 4: Effect of privacy level Assumptions: Difference of a single nucleotide between Members of same species Paired chromosomes of an individual Each SNP carries two alleles - one from father, one from mother Both alleles can contain risks for two different diseases SNP Four nucleotides: A, C, G and T 3 billion (approx.) base pairs 99.9% of entire genome is same between two persons 112,743,739 enlisted SNPs by dbSNP Genome Fig 2: System architecture for disease risk computation 1. E. Ayday, J. L. Raisaro, J. P. Hubaux, and J. Rougemont, Protecting and evaluating genomic privacy in medical tests and personalized medicine, in the proceedings of WPES’13, p. 95-106. 2. E. Ayday, J. L. Raisaro, P. J. McLaren, J. Fellay, and J. P. Hubaux, Privacy- preserving computation of disease risk by using genomic, clinical, and environmental data, in the proceedings of HealthTech’13. 3. http://edition.cnn.com/2013/08/07/health/henrietta-lacks-genetic-destiny/ Individual's chances of diseases are largely associated with personal genetic variations. Hence, Genomic data is significantly used in disease susceptibility tests and personalized medicine. Fig 1: DNA fragments showing SNP rs6313 Thesis Goal Privacy-preserved and precise computation of multiple disease risks using genomic and clinical data Novelty: Existing cryptography-based methods 1,2 support only single disease risk test, whereas our approach offers substantial improvement regarding multiple disease risk queries in a privacy-preserving manner. 0.98 1.4 1.82 2.24 2.66 0 1 2 3 3 4 5 6 7 Storage (GB) Privacy Level Not a single SNP content is revealed without collusion of all the (n+1) DDBs and the encryption key of the patient Several separately authorized DDBs are used to enhance privacy, in case patient’s personal device is also hacked SNPs related to a disease is sent in one packet. Hence, communication frequency for a disease risk test is (2n+3) where there is (n+1) DDBs. Privacy Analysis Communication Overhead Storage Analysis Fig 4 shows that with the increase of DDBs, storage cost increases slightly.

Upload: others

Post on 16-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Protecting Genomic Privacy in Medical Tests using ...cse.buet.ac.bd/poster/ThesisPosterFeb2015_id-1-20/... · alleles - one from father, one from mother authorized DDBs are Both alleles

Department of Computer Science and Engineering (CSE), BUET

Protecting Genomic Privacy in Medical Tests

using Distributed Storage

Sharmin Afrose (0905028), Maitraye Das (0905052)

SI stores SNPs in DDBs

SI extracts SNP Chr 13 : Pos 46897343 – C : T

SI sequences DNA …..GTCCGGCACTA….

P provides sample to SI Sample: Saliva, hair etc

Homozygous Heterozygous

0 1

A C G T

00 01 10 11

:

DB1 col1

DB2 col1

DB1 col2

DB3 (P)

DB2 col2

1 1 0 1 1

XOR 1 0

C T Genomic Data

P provides clinical data to PU

Clinical data : age, glucose level etc

PU computes disease risk using Logistic Regression

Model

Total (n+1) databases used

(n+1)th database encrypted with patient’s public key and stored in her

personal device

DDBs maintain protocols appropriately

Our Approach

Conclusion References

System Architecture & Threat Model Introduction

Privacy Threats

Genomic Background

We propose a system for securing genomic privacy in medical tests like

disease risk computation, personalized medicine using distributed storage.

Currently, we are working on implementation and complexity analysis of

the system in detail.

Reveals traits, ancestry,

ethnicity, vulnerability of

diseases etc.

Exposes relatives’

genomic data3

Genomic discrimination

in health insurance,

employment, education

etc.

Evaluation

Fig 3: Flowchart of disease risk test (a) Sequencing and SNP extraction

(b) Retrieval of SNP and computation

(a) (b)

Fig 4: Effect of privacy level

Assumptions:

Difference of a single

nucleotide between

Members of same

species

Paired chromosomes of

an individual

Each SNP carries two

alleles - one from father,

one from mother

Both alleles can contain

risks for two different

diseases

SNP

Four nucleotides:

A, C, G and T

3 billion (approx.)

base pairs

99.9% of entire

genome is same

between two

persons

112,743,739

enlisted SNPs by

dbSNP

Genome

Fig 2: System architecture for disease risk computation

1. E. Ayday, J. L. Raisaro, J. P. Hubaux, and J. Rougemont, Protecting and

evaluating genomic privacy in medical tests and personalized medicine, in the

proceedings of WPES’13, p. 95-106.

2. E. Ayday, J. L. Raisaro, P. J. McLaren, J. Fellay, and J. P. Hubaux, Privacy-

preserving computation of disease risk by using genomic, clinical, and

environmental data, in the proceedings of HealthTech’13.

3. http://edition.cnn.com/2013/08/07/health/henrietta-lacks-genetic-destiny/

Individual's chances of diseases are largely

associated with personal genetic variations. Hence,

Genomic data is significantly used in disease

susceptibility tests and personalized medicine.

Fig 1: DNA fragments showing SNP rs6313

Thesis Goal Privacy-preserved and precise computation of multiple

disease risks using genomic and clinical data

Novelty:

Existing cryptography-based methods1,2 support only

single disease risk test, whereas our approach offers

substantial improvement regarding multiple disease

risk queries in a privacy-preserving manner.

0.98

1.4

1.82

2.24

2.66

0

1

2

3

3 4 5 6 7

Sto

rage

(G

B)

Privacy Level

Not a single SNP

content is revealed

without collusion of all

the (n+1) DDBs and

the encryption key of

the patient

Several separately

authorized DDBs are

used to enhance

privacy, in case

patient’s personal

device is also hacked

SNPs related to a disease is sent in one packet.

Hence, communication frequency for a disease

risk test is (2n+3) where there is (n+1) DDBs.

Privacy Analysis

Communication Overhead

Storage Analysis

Fig 4 shows

that with the

increase of

DDBs,

storage cost

increases

slightly.