backdoors: a remote healing architecture for cluster...

46
1 Backdoors: A Remote Healing Architecture for Cluster-based Systems Florin Sultan Laboratory for Network Centric Computing http://discolab.rutgers.edu

Upload: doankhanh

Post on 03-Sep-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

1

Backdoors: A Remote HealingArchitecture for Cluster-based

Systems

Florin SultanLaboratory for Network Centric Computing

http://discolab.rutgers.edu

Feb 12, 2004 COANS Seminar Spring 2004 3

The Worst Scenario

Feb 12, 2004 COANS Seminar Spring 2004 4

The Only Way Out…

Feb 12, 2004 COANS Seminar Spring 2004 5

…Not Good for All

Bank.com

TCP/IP

http://www.Bank.comWeb Browsing

Feb 12, 2004 COANS Seminar Spring 2004 6

What Do We Need?n Monitor system health

n OS/application failuresn DoS attack, overloadn intrusion

n Take action to heal the systemn repair damaged state, clean-up corrupted staten extract and recover good staten contain fault/attackn repel intrusion

n Where should these operations be performed?

Feb 12, 2004 COANS Seminar Spring 2004 7

Self-Healing

n Consumes processor cycles (intrusive)n Relies on processor availability

n hang failures make healing impossible

n Relies on OS resourcesn sensitive to resource depletion/unavailability

n Relies on system integrityn state may be corruptedn system may be compromised by an attacker

Feb 12, 2004 COANS Seminar Spring 2004 8

Alternative: Remote Healing

n Perform healing from another systemn target system must allow remote accessn the monitor system must be trusted

n Can we make remote healing nonintrusive?n no extra load on the target systemn no reliance on target resources (processor, OS, etc.)

Feb 12, 2004 COANS Seminar Spring 2004 9

Target Failures

n OS/application hangs or cannot sustain servicen hw: processor, network, disk, etc.n OS: driver bug, deadlock, resource exhaustion, etc.n DoS attack, overload

n Memory still available, yet not accessible viaconventional paths (IP stack, console, etc.)

n Solutionn monitor and detect failuresn recover or repair software state of the affected system

Feb 12, 2004 COANS Seminar Spring 2004 10

The Backdoor

backdoor: a hidden software or hardware mechanism, usually created for testing and troubleshooting

--American National Standard for Telecommunications

Feb 12, 2004 COANS Seminar Spring 2004 11

The Backdoor (BD) Architecture

Processor

I/Odevices

Memory

Backdoor MonitorSystem

Target System

Feb 12, 2004 COANS Seminar Spring 2004 12

Outline

n Introductionn Remote Healing in Clusters of Computersn Backdoor Architecturen Case Study: Recovery in Internet Servicesn Prototypen Conclusions

Feb 12, 2004 COANS Seminar Spring 2004 13

Internet Services Today

n Commercial shift in using the Internetn e-commerce, banking, trading, auctioning, etc.n transactional, time-critical servicesn economic incentive to fault tolerance and service continuity

Bank.comTCP/IP

http://www.Bank.comWeb Browsing

Feb 12, 2004 COANS Seminar Spring 2004 14

Cluster-based Internet Services

Server

Monitor

Server

Monitor

Server

Monitor

Client Client Client Client

Feb 12, 2004 COANS Seminar Spring 2004 15

Cluster-based Internet Services

Server

Monitor

Server

Monitor

Server

Monitor

Client Client Client Client

Feb 12, 2004 COANS Seminar Spring 2004 16

Remote Healing in Clusters

n Goal: survivability of live service staten OS and application-specific

n Target: state critical to service continuityn Remote monitoring and diagnosis

n detect failure, bad state, attack, intrusion

n Remote interventionn recovery of useful state from failed nodesn in-place repair of bad state

Feb 12, 2004 COANS Seminar Spring 2004 17

Backdoor-based Remote Healing

PM I/O

BD

PM I/O

BD

PM I/O

BD

PM I/O

BD

Private secure networkM

T M

M T

TT M

Feb 12, 2004 COANS Seminar Spring 2004 18

Backdoor Architecture Principles

1. Bidirectional accessn both remote input and output operations must be

supported2. Remote memory access

n memory must be accessible remotelyn remote I/O?

3. Availabilityn failure must not impair BD

4. Nonintrusive operationn BD operations must not involve processors of the

target system

Feb 12, 2004 COANS Seminar Spring 2004 19

Backdoor Architecture Principles (cont)

5. Transparencyn BD operation must not be visible to target

6. Access controln monitor and target negotiate access permissions at

the beginningn target cannot “close” the BD afterwards

7. Tamper resistancen target cannot modify the result of a BD operation

Question: How can we implement Backdoor usingexisting technologies?

Feb 12, 2004 COANS Seminar Spring 2004 20

OS

NIC

n Remote DMA (RDMA) Read/Write operationsn Remote processor not involvedn RMC-based networking technologies: VIA, InfiniBand, etc.

Remote Memory Communication (RMC)

CPU

CPUMemory

RMC NIC

CPUMemory

RMC NIC

RDMA Write

RDMA InitiatorTarget

Feb 12, 2004 COANS Seminar Spring 2004 21

Backdoor with RMC

NIC CPU

CPUMemory

RMC NIC

CPU

Memory

RMC NIC

MonitorTargetMONITOR(RDMA-R)

REPAIR/RECOVER(RDMA-R/W)

Feb 12, 2004 COANS Seminar Spring 2004 22

RMC Compliance with BD Principles

Y-Access control

Y?Transparency

YNonintrusiveness

YAvailabilityYRemote memory accessYBidirectional access

Tamper resistance Y

Feb 12, 2004 COANS Seminar Spring 2004 23

Remote Healing Architecture

Nonintrusive remote accessBD

OSMonitoring Repair /

Recovery Monitoring Repair /Recovery

Passive Gateway

T M

ActionDetectionCritical stateApplication

Active Gateway

API

Feb 12, 2004 COANS Seminar Spring 2004 24

Monitoring over RMC-BD

CPUMemory

RMC NIC

CPU

RMC NIC

Target Monitor

ExternalizedState

Memory

RemoteviewDetection

Monitor: progress, anomalous events, integrity constraints, etc.

Feb 12, 2004 COANS Seminar Spring 2004 25

Repair over RMC-BD

CPUMemory

RMC NIC

CPU

RMC NIC

Target Monitor

Memory

CorrectStateRepaired

State

Feb 12, 2004 COANS Seminar Spring 2004 26

Recovery over RMC-BD

CPUMemory

RMC NIC

CPU

RMC NIC

Target Monitor

Memory

RecoverableState

RecoveredState

Feb 12, 2004 COANS Seminar Spring 2004 27

And Possibly More…

n Remote control of I/O devicesn access state in peripheral devices, e.g., OS swap space

n Dynamically inject code/data in a live systemn test, diagnosis, repair handlersn fast system reboot through OS memory overlayn fast restart of application components (micro-reboot)

n Monitor for intrusion/attack detection

Feb 12, 2004 COANS Seminar Spring 2004 28

Case Study: Recovery In Internet Services

n Remote healing is not just RMC!n RMC provides just a way of access

n Requires OS supportn Failure Detectionn Session Recovery

Feb 12, 2004 COANS Seminar Spring 2004 29

OS Support

n Monitoring: Progress Box (PB)n progress counter: {scalar value, update deadline}n PB = set of progress counters in OS memoryn API to allocate and update progress counters in PBn monitor reads PB, checks counters, detects stalls

n Recovery: State Box (SB)n encapsulates per-session server staten API to export/import application state to/from SBn backup node reads SB, reinstates session, resumes service

Feb 12, 2004 COANS Seminar Spring 2004 30

Failure Detection with PB

Target OS MonitorProgressBox

n Target system updates progress counters in PBn Examples: interrupts (global, per-device), context

switches, connections accepted, etc.n Monitor process

n scans remote PB, checks counters, detects stalls

BD

Feb 12, 2004 COANS Seminar Spring 2004 31

Recovery With SB

n Fine-grained, essential service staten Application-specific components (SB_APP)

n E.g., document name, offset in document, etc.n OS-specific components (SB_IO)

n E.g., send/receive TCP buffersn An SB can be distributed over multiple processes

(multi-tier servers)n Backup node extracts SB from a failed node and

reinstates it locally

Feb 12, 2004 COANS Seminar Spring 2004 32

SB Structure

C1

SB3

C2

C3SB2

SB1

TCP state pipe stateApp state App state

Front-end server process

Back-end server process SB_APP

SB_IO

Feb 12, 2004 COANS Seminar Spring 2004 33

Backdoors Prototype

n Implemented using Myrinet NICs with modifiedfirmwaren remote Read/Write DMAn remote OS locking (syscalls, interrupt handlers)

n Modified FreeBSD kerneln Progress Boxn State Box

n Modified server applications

Feb 12, 2004 COANS Seminar Spring 2004 34

A Realistic Sample Application:Multi-tier Auction Service (RUBiS)

Back-End

MySQL DB

Front-End (FE)

Apache web server

Middle Tier (MT)

Tomcat + JBoss

Feb 12, 2004 COANS Seminar Spring 2004 35

Recoverable RUBiS

SB = {reqID, req}

SB = {reqID, tid, result}

Feb 12, 2004 COANS Seminar Spring 2004 36

Experimental Evaluation

n 2.4 GHz, 1 GB RAM, 1Gbps Ethernet, MyrinetLanaiX 133 MHz PCI

n Fault injectionn synthetic freeze: halt CPU, disable device interrupts,

disable network interface, trap to kernel debuggern emulated crashes in buggy network drivers

n Experimentsn Microbenchmarksn Failover correctnessn Failover throughput and latency

Feb 12, 2004 COANS Seminar Spring 2004 37

Microbenchmarks

n Monitor CPU usage, sampling a 100-counter PBn 46% worst-case (infinite loop)n < 5% @ 10 ms, < 1% @ 100 msn High sampling rates possible

n Low overhead SB APIn export/import: < 30 usn extract + reinstate a 10 KB front-end SB: 358 us

Feb 12, 2004 COANS Seminar Spring 2004 38

Failure-free Overhead

0

1,000

2,000

3,000

4,000

5,000

6,000

7,000

8,000

20 100 300 500 700 900 1,100

Clients

Req

uests

/min

Base

Recoverable FE

Recoverable FE+MT

Feb 12, 2004 COANS Seminar Spring 2004 39

Failover Correctness

n Workload/run: 600 requests from 200 clientsn request = DB queries + DB table update

n Two correctness tests across crash & recoveryn End-to-end consistency (crash invisible to client)n Database integrity (exactly-once semantics preserved)

n All crash-test runs were validated

Feb 12, 2004 COANS Seminar Spring 2004 40

Failover Throughput (FE+MT crash)

Feb 12, 2004 COANS Seminar Spring 2004 41

Failover Latency (FE+MT crash)

Feb 12, 2004 COANS Seminar Spring 2004 42

Related Work

n DEC WRL Titan system [Mogul ’86]n Recovery Box [Baker ‘93]n Rio reliable file cache [Chen ‘96]n Online OS reconfiguration [Soules ‘03]n Virtual machines [Bressoud ‘95, Dunlap ‘02]n Automatic repair of data structures [Demski ‘03]

Feb 12, 2004 COANS Seminar Spring 2004 43

Conclusions

n Backdoor: system architecture for nonintrusiveremote healingn monitoring without using processor cyclesn repair, recovery even when remote processor is not

available

n BD prototype for transparent recovery of activeservice sessions in cluster-based Internetservices

Feb 12, 2004 COANS Seminar Spring 2004 44

Current and Future Work

n Remote repair of OS staten OS support and API for healing-conscious

applicationsn programmer performs application-specific monitoring,

repair and recovery

n Securing the BDn low-level access control through BD Guard entities

implemented in firmware

n Remote control of I/O devices

Feb 12, 2004 COANS Seminar Spring 2004 45

The People Behind Backdoors

n Aniruddha Bohran Stephen Smaldonen Yufei Pann Iulian Neamtiu (Maryland)n Pascal Gallard (IRISA/INRIA)n Liviu Iftode

Feb 12, 2004 COANS Seminar Spring 2004 46

Thank you!

http://discolab.rutgers.edu/bda