Post on 21-Jan-2016

Page 1: Towards Secure Dataflow Processing in Open Distributed Systems

Computer Science

Towards Secure Dataflow Processing in Open Distributed Systems

Juan Du, Wei Wei, Xiaohui (Helen) Gu, Ting Yu

Page 2: Towards Secure Dataflow Processing in Open Distributed Systems

Outline

• Introduction
• Design and Algorithms
• Experimental Evaluation
• Related Work
• Conclusion

Page 3: Towards Secure Dataflow Processing in Open Distributed Systems

Dataflow Processing in Distributed System

[Figure: the user, the composer, and data processing components Si (S1, S2, S3, S4, S6, S7, S9, S12) offered by component providers and implementing functions f1 to f5. Legend: arrows are dataflows, Si is a data processing component, di is an ADU. The ADU stream is transformed hop by hop: …, di, … → …, f1(di), … → …, f2(f1(di)), … → …, f3(f2(f1(di))), …]
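The hop-by-hop transformation shown in the figure can be sketched in a few lines of Python. This is only an illustration: the arithmetic bodies of `f1`, `f2`, and `f3` and the helper name `run_dataflow` are hypothetical placeholders, not operators or APIs from the talk.

```python
from functools import reduce

# Hypothetical stand-ins for the functions f1, f2, f3 on the slide.
def f1(d):
    return d + 1

def f2(d):
    return d * 2

def f3(d):
    return d - 3

def run_dataflow(functions, adus):
    """Apply the functions in order to every ADU, yielding f3(f2(f1(d)))."""
    for d in adus:
        yield reduce(lambda value, f: f(value), functions, d)

results = list(run_dataflow([f1, f2, f3], [1, 2, 3]))
```

Each element of `results` is the fully processed form of one input ADU, in stream order.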

Page 4: Towards Secure Dataflow Processing in Open Distributed Systems

Run in Open Distributed Systems

Dataflow Processing Applications
– Network traffic monitoring
– Sensor data analysis
– Audio/video surveillance
– Scientific data processing

Advantages in Open Distributed Systems
– Highly scalable and available infrastructures
– No need to maintain hardware and software

Challenges in Open Distributed Systems
– Component providers come from different security domains
– Not all data processing components are trustworthy

Page 5: Towards Secure Dataflow Processing in Open Distributed Systems

ADU Attack

[Figure: the composed dataflow among the user, the composer, and components Si, with some components marked as malicious. Legend: arrows are dataflows, Si is a data processing component, di is an ADU. Stream labels in the figure: … d2, d1; … f1(d2), f1(d1); … f2(f1(d1)); … f2(f1(d1)), d0]

Page 6: Towards Secure Dataflow Processing in Open Distributed Systems

Dataflow Topology Attack

[Figure: the composed dataflow among the user, the composer, and components Si, with some components marked as malicious. Stream labels in the figure: …, f1(d2), …; …, f3(f2(f1(d2))), …; …, f3(f5(f2(f1(d2)))), … (the last label contains an extra function f5 that is not part of the composition f3(f2(f1(d2))))]

Page 7: Towards Secure Dataflow Processing in Open Distributed Systems

Function Integrity Attack

[Figure: the composed dataflow among the user, the composer, and components Si, with some components marked as malicious. Stream labels in the figure: …, f1(d2), … and …, f0(f1(d2)), … (f0 is not one of the composed functions f1 to f5)]

Page 8: Towards Secure Dataflow Processing in Open Distributed Systems

System Design

Attack Models
– ADU attack
– Dataflow topology attack
– Function integrity attack

Assumptions
– Third-party component providers could be malicious
– Composers and users are trusted
– PKI is deployed in advance

Goals
– Provide integrity and confidentiality for dataflow processing applications
– Focus on discussing integrity issues

Page 9: Towards Secure Dataflow Processing in Open Distributed Systems

Provenance-based ADU Protection

“Receipt” packet
– ADU dropping attack
– s2 may claim it did not receive d
– s1 may claim it sent d when it did not

[Figure: S1 sends ADU d to S2, and S2 issues a receipt [sqn, session_Id, hash(d)]sign_s2. C (the composer) also appears in the figure.]
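A receipt of the form [sqn, session_Id, hash(d)]sign_s2 could be built and checked roughly as follows. This is a sketch under stated assumptions: the slides assume PKI signatures, whereas this example uses an HMAC from the Python standard library as a stand-in for sign_s2, and the function names and the JSON encoding are hypothetical.

```python
import hashlib
import hmac
import json

def make_receipt(sqn, session_id, d, signing_key):
    """Receiver builds [sqn, session_Id, hash(d)] and tags it.

    The HMAC tag is only a stand-in for the signature sign_s2.
    """
    body = json.dumps([sqn, session_id, hashlib.sha256(d).hexdigest()]).encode()
    tag = hmac.new(signing_key, body, hashlib.sha256).hexdigest()
    return body, tag

def verify_receipt(body, tag, d, sqn, session_id, signing_key):
    """Check the tag, then check that the receipt names this exact ADU."""
    expected = hmac.new(signing_key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False
    return json.loads(body) == [sqn, session_id, hashlib.sha256(d).hexdigest()]
```

With a valid receipt in hand, the sender can show that the ADU with that sequence number was delivered, and a receiver that cannot produce the ADU cannot plausibly claim it never arrived.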

Page 10: Towards Secure Dataflow Processing in Open Distributed Systems

Provenance-based ADU Protection

Provenance evidence
– Cached or carry-on evidence
– Consistency verification between different components

[Figure: C sends d to S1. S1 applies f1 and outputs f1(d) together with the evidence [[h(d), h(f1(d))]sign_s1]key_c. S2 applies f2 and outputs f2(f1(d)) together with the evidence from S1 and its own evidence [[h(f1(d)), h(f2(f1(d)))]sign_s2]key_c. Each evidence record holds the hashes of a component's input and output.]
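The consistency check between components can be sketched as a walk over the evidence records: the input hash recorded by each component must equal the output hash recorded by the previous one. This is a simplified illustration, not the authors' implementation. The signatures (sign_s1, sign_s2) and the encryption under key_c are omitted, and the names `make_evidence` and `chain_is_consistent` are hypothetical.

```python
import hashlib

def h(data: bytes) -> str:
    """Hash used in the evidence records (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()

def make_evidence(component, data_in, data_out):
    """Evidence record [h(input), h(output)]; signing and encryption omitted."""
    return {"component": component, "in": h(data_in), "out": h(data_out)}

def chain_is_consistent(source, evidence, result):
    """Each hop's input hash must match the previous hop's output hash."""
    expected = h(source)
    for record in evidence:
        if record["in"] != expected:
            return False
        expected = record["out"]
    # The final output hash must match the result that was delivered.
    return expected == h(result)
```

A mismatch localizes the inconsistency to the boundary between two adjacent evidence records.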

Page 11: Towards Secure Dataflow Processing in Open Distributed Systems

Dataflow Topology Protection

Cascading topology encryption
– No component can change the dataflow topology
– Each component only knows its previous hop and next hop

[Figure: linear dataflow C → s1 (f1) → s2 (f2) → s3 (f3) → C. The hop identifiers are signed by the composer, [s1]sig_c [s2]sig_c [s3]sig_c [C]sig_c, and wrapped in layers under key_s1, key_s2, key_s3.]

Page 12: Towards Secure Dataflow Processing in Open Distributed Systems

Dataflow Topology Protection

Cascading topology encryption
– No component can change the dataflow topology
– Each component only knows its previous hop and next hop
– Onion routing [Goldschlag, et al., 1999]

[Figure: linear dataflow C → s1 (f1) → s2 (f2) → s3 (f3) → C. The composer signs the hop identifiers, [s1]sig_c [s2]sig_c [s3]sig_c [C]sig_c, and wraps them in layers under key_s1, key_s2, key_s3. The topology information shrinks as it travels: [s2]sig_c [s3]sig_c [C]sig_c after the first hop, then [s3]sig_c [C]sig_c.]
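The layering idea can be illustrated with a small onion-style sketch in which each component can open only its own layer and learns only its next hop. Everything here is an assumption made for illustration: the slides use per-component keys and composer signatures (sig_c), while this example uses a toy XOR keystream cipher built on SHA-256, which is not secure, omits the signatures, and uses hypothetical names (`keystream_xor`, `build_onion`, `peel`).

```python
import hashlib
import json

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 counter keystream (NOT secure)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)

def build_onion(route, keys):
    """Wrap next-hop information from the last component outward.

    route is [s1, s2, s3, C]; component route[i] learns only route[i + 1].
    """
    onion = b""
    for i in range(len(route) - 2, -1, -1):
        layer = json.dumps({"next": route[i + 1], "inner": onion.hex()}).encode()
        onion = keystream_xor(keys[route[i]], layer)
    return onion

def peel(onion, key):
    """A component removes its own layer and gets (next hop, remaining onion)."""
    layer = json.loads(keystream_xor(key, onion).decode())
    return layer["next"], bytes.fromhex(layer["inner"])
```

In this sketch the composer calls `build_onion` once, and each component calls `peel` with its own key before forwarding the remaining onion to the next hop.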

Page 13: Towards Secure Dataflow Processing in Open Distributed Systems

Function Integrity Attestation

Randomized data attestation
– Achieve scalable function integrity attack detection
  • Duplicate a random subset of ADUs
  • Send duplicates to selected functionally equivalent components
  • Check result consistency
– Continuously perform randomized data attestation

[Figure: the composer C sends ADUs d1, d2, d3 through components s1 to s8, which provide f1 and f2. d1 is processed once: d1 → f1(d1) → f2(f1(d1)). d2 and d3 are also sent as duplicates d2’ and d3’ through functionally equivalent components, and the results are compared: f2(f1(d2)) == f2(f1(d2’))? and f2(f1(d3)) == f2(f1(d3’))?]
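The attestation loop can be sketched as follows. This is a minimal illustration under stated assumptions: components are modeled as plain Python callables, a single replica is chosen per duplicated ADU, and the names `attest`, `honest`, and `tampering` are hypothetical.

```python
import random

def attest(adus, primary, replicas, p_u, rng):
    """Duplicate each ADU with probability p_u and compare the results."""
    suspicious = []
    for d in adus:
        result = primary(d)
        if rng.random() < p_u:
            # Send the duplicate to a functionally equivalent component.
            replica = rng.choice(replicas)
            if replica(d) != result:
                suspicious.append(d)
    return suspicious

# Hypothetical components: one honest, one that corrupts its output.
def honest(d):
    return d * 2

def tampering(d):
    return d * 2 + 1
```

An inconsistency shows that at least one of the compared components misbehaved on that ADU. It does not by itself say which one.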

Page 14: Towards Secure Dataflow Processing in Open Distributed Systems

Implementation and Experimental Setup

Implementation
– Implement a prototype of the secure dataflow processing system
– Follow the design of the IBM System S

Experiment setup
– Conduct experiments on PlanetLab
– Use about 200 hosts
– One host represents one component provider
– Composer deployed on a pre-defined PlanetLab host

Page 15: Towards Secure Dataflow Processing in Open Distributed Systems

Evaluation

• Overhead caused by basic protection schemes
• Randomized data attestation
– Overhead
  • in terms of dataflow processing delay
  • (time of dn getting out - time of d1 getting in) / n
– Detection probability
  • non-collusion
  • collusion
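The delay metric above is a single division, shown here as a small helper. The function name and the timestamps in the example are hypothetical and are not measurements from the talk.

```python
def avg_processing_delay(first_in_time, last_out_time, n):
    """(time of dn getting out - time of d1 getting in) / n, in seconds."""
    if n <= 0:
        raise ValueError("n must be a positive number of ADUs")
    return (last_out_time - first_in_time) / n

# Hypothetical timestamps: d1 enters at t = 10 s, d300 leaves at t = 40 s.
delay = avg_processing_delay(10.0, 40.0, 300)
```

The value is an average over the whole run, so it reflects throughput as well as per-hop latency.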

Page 16: Towards Secure Dataflow Processing in Open Distributed Systems

Overhead of Basic Protection Schemes

• The overhead is about 10–15% for both secure dataflow schemes

Page 17: Towards Secure Dataflow Processing in Open Distributed Systems

Overhead of Randomized Data Attestation

• # of redundant components k = 5
• data size = 1KB
• data rate = 10 ADUs/sec
• duration = 30s

• Avg dataflow processing delay increases with the number of redundant components used

• Due to sub-optimal dataflow topology

Page 18: Towards Secure Dataflow Processing in Open Distributed Systems

Detection Probability

• Detection probability increases with duplication probability pu and number of redundant components used

• Detection is harder in collusion scenarios than in non-collusion scenarios
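A simplified model can make the dependence on the duplication probability concrete. This is not the analysis or the measured result from the talk. It assumes no collusion, a malicious component that tampers with every ADU it processes, independent duplication decisions, and duplicates that always reach a benign component, and it ignores the number of redundant components. Under those assumptions, the chance that at least one of n tampered ADUs is duplicated is 1 - (1 - p_u)^n.

```python
def detection_probability(p_u, n):
    """P(at least one of n tampered ADUs is duplicated), assuming independence."""
    return 1.0 - (1.0 - p_u) ** n
```

In this model the probability grows with both p_u and n, which is why continuous attestation helps even when p_u is small.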

Page 19: Towards Secure Dataflow Processing in Open Distributed Systems

Related Work

Distributed dataflow processing
– Focuses on resource and performance management issues
– Assumes that data processing components are trustworthy

Trust management in distributed systems
– Distributed messaging systems [Haeberlen, et al., SOSP 2007]
– Pub-sub overlay [Srivatsa, et al., CCS 2005]
– None of them addresses secure and scalable dataflow processing in open distributed systems

Byzantine fault-tolerance
– In wide-area networks [Amir, et al., DSN 2006]
– No trusted party

Page 20: Towards Secure Dataflow Processing in Open Distributed Systems

Conclusion

Finished Work
– The first attempt to address the integrity of dataflow processing application delivery on open distributed systems
– Identify and classify major security attacks
– Propose a set of effective protection schemes

Future Work
– Non-linear dataflow topology
– Integrity attestation on stateful functions
– Further identify malicious components

Page 21: Towards Secure Dataflow Processing in Open Distributed Systems

• Thank you
• Questions?
