Johns Hopkins & Purdue 1Approved for Public Release, Distribution Unlimited
Scalability, Accountability and Instant Information Access for
Network Centric Warfare
Department of Computer Science, Johns Hopkins University
Yair Amir (PI), Claudiu Danilov, John Lane, Jonathan Shapiro, Ciprian Tutu
Cristina Nita Rotaru
Department of Computer Sciences, Purdue University
http://www.cnds.jhu.edu
Network Centric Warfare Environments
• Wide area network settings.
  – C3I systems usually span large geographical distances.
  – Communication between sites is conducted over unreliable channels.
• Timely decisions based on available information.
• Required update semantics are not general in many cases.
• Critical information is often not large.
• Source uniqueness.
Network Centric Warfare Environments
• Wide area network settings.
• Timely decisions based on available information.
  – Intermittent network connectivity.
    • Results in high latency for propagation and for consistent replication of updates.
  – Decisions may have to be made promptly.
    • Based on the best currently available information.
• Required update semantics are not general in many cases.
• Critical information is often not large.
• Source uniqueness.
Network Centric Warfare Environments
• Wide area network settings.
• Timely decisions based on available information.
• Required update semantics are not general in many cases.
  – Weaker update semantics may suffice.
  – Common operation picture:
    • Commutative update semantics.
    • Timestamp resolution (most recent update wins).
• Critical information is often not large.
• Source uniqueness.
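The timestamp resolution rule above can be sketched as a small merge function. This is a minimal illustration, not the project's implementation: the `Update` fields, the example keys, and the use of the source identifier to break timestamp ties are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    key: str          # hypothetical item on the common picture, e.g. a unit id
    value: str        # hypothetical payload, e.g. a reported position
    timestamp: float  # time assigned by the originating source
    source: str       # assumed source id, used here only to break timestamp ties


def apply_update(state: dict, update: Update) -> dict:
    """Return a state in which the most recent update for each key wins."""
    current = state.get(update.key)
    newer = current is None or (
        (update.timestamp, update.source) > (current.timestamp, current.source)
    )
    if not newer:
        return state
    new_state = dict(state)
    new_state[update.key] = update
    return new_state


def replay(updates) -> dict:
    """Apply updates in the order given, starting from an empty state."""
    state = {}
    for update in updates:
        state = apply_update(state, update)
    return state


a = Update("unit-7", "grid 12A", 100.0, "sensor-1")
b = Update("unit-7", "grid 14C", 105.0, "sensor-2")
c = Update("unit-9", "grid 03F", 101.0, "sensor-1")

# Two replicas receive the same updates in different orders.
replica_1 = replay([a, b, c])
replica_2 = replay([c, b, a])
```

Because the rule compares timestamps rather than arrival order, both replicas end in the same state without agreeing on an order first, which is the sense in which the update semantics are commutative.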
Network Centric Warfare Environments
• Wide area network settings.
• Timely decisions based on available information.
• Required update semantics are not general in many cases.
• Critical information is often not large.
  – Compared with current hardware capabilities.
    • Location of friendly forces and enemy forces.
    • A few plans.
  – Allows storing all updates throughout the duration of engagement (several months).
• Source uniqueness.
Network Centric Warfare Environments
• Wide area network settings.
• Timely decisions based on available information.
• Required update semantics are not general in many cases.
• Critical information is often not large.
• Source uniqueness.
  – Every input (update) is initiated by one unique source.
Network Centric Warfare Environments
• Wide area network settings.
• Timely decisions based on available information.
• Required update semantics are not general in many cases.
• Critical information is often not large.
• Source uniqueness.
Malicious Insider Threats
• The insider attack has traditionally been a primary threat to computer systems (http://csrc.nist.gov).
• The explosion of the Internet made things worse: insiders commit about 80% of all computer and Internet related crime (www.intergov.org; CSI/FBI 2003 Computer Crime and Security Survey).
• Insiders: participants with legitimate access, or those that bypassed the protection mechanisms, who exhibit arbitrary (malicious) behavior.
Dealing with Insider Threats
• Detection: use intrusion detection systems; however, they are not perfect (high false-positive rate).
• Prevention: use access control, firewalls, proactive security; but vulnerabilities still exist (OS bugs, buffer overflows, covert channels, etc.).
• Mitigation (tolerate/cope): use mechanisms that provide service to correct participants while under attack, even if several participants are compromised.
• The above methods do not exclude each other.
Outline
• Network centric warfare environments.
• Peer Byzantine replication limitations.
• Research approach.
  – Scaling wide area intrusion tolerance replication via hierarchy.
    • Local Byzantine replication within sites.
    • Fault tolerant replication on the wide area.
  – Client accountability.
    • Accountability graph.
    • Snapshots for fast regenerations.
  – Exploiting application semantics.
• Next steps.
• Technology transitioning.
• Summary.
A Distributed Systems Service
• Message-passing system.
• Clients issue requests to servers, then wait for answers.
• Replicated servers process the request, then provide answers to clients.
[Figure: a site containing clients and server replicas 1, 2, 3, ..., 3f+1.]
State Machine Replication
• Requests must be ordered in a consistent manner by all servers.
• Usually one server manages the ordering process based on information from the other participants, then informs everybody about what was decided.
• If the leader dies, a new leader must be selected to ensure progress.
• Benign faults: Paxos [Lam98,Lam01]: must contact f+1 out of 2f+1 servers and uses 2 rounds to allow consistent progress.
• Byzantine faults: BFT [CL99]: must contact 2f+1 out of 3f+1 servers and uses 3 rounds to allow consistent progress.
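The server counts in the two bullets above follow from simple quorum arithmetic, sketched below. The function names are hypothetical; the numbers come from the slide, and the overlap formula is the standard counting argument for two quorums drawn from the same set of servers.

```python
def paxos_sizes(f: int) -> dict:
    """Benign faults: f+1 out of 2f+1 servers, 2 rounds (figures from the slide)."""
    return {"servers": 2 * f + 1, "quorum": f + 1, "rounds": 2}


def bft_sizes(f: int) -> dict:
    """Byzantine faults: 2f+1 out of 3f+1 servers, 3 rounds (figures from the slide)."""
    return {"servers": 3 * f + 1, "quorum": 2 * f + 1, "rounds": 3}


def min_overlap(servers: int, quorum: int) -> int:
    """Smallest possible intersection of two quorums of the given size."""
    return max(0, 2 * quorum - servers)


for f in (1, 2, 3):
    p, b = paxos_sizes(f), bft_sizes(f)
    # Benign case: any two quorums share at least one server.
    # Byzantine case: any two quorums share at least f+1 servers,
    # so at least one shared server is correct.
    print(f, p, min_overlap(p["servers"], p["quorum"]),
          b, min_overlap(b["servers"], b["quorum"]))
```

For any f, two benign quorums overlap in at least one server, and two Byzantine quorums overlap in at least f+1 servers, which leaves at least one correct server in common even if f of them are malicious.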
A Replicated Server System
• Maintaining consistent servers [Sch90]:
  – To tolerate f benign faults, 2f+1 servers are needed.
  – To tolerate f malicious faults, 3f+1 servers are needed.
• Responding to read-only client requests [Sch90]:
  – If the servers can exhibit only benign faults: 1 answer is enough.
  – If the servers can be malicious: the client must wait for f+1 identical answers, f being the number of malicious servers.
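The f+1 rule for malicious servers can be sketched as a client-side vote over replies. This is an illustrative sketch only: the `(server_id, answer)` reply format and the function name are assumptions, and authentication of replies is left out.

```python
from collections import Counter


def accept_reply(replies, f: int):
    """Return the first answer reported by f+1 distinct servers, or None.

    `replies` is an iterable of (server_id, answer) pairs in arrival order.
    Only the first reply from each server is counted, so one malicious
    server cannot vote twice.
    """
    counts = Counter()
    seen = set()
    for server_id, answer in replies:
        if server_id in seen:
            continue
        seen.add(server_id)
        counts[answer] += 1
        if counts[answer] >= f + 1:
            return answer
    return None


# With f = 1, one lying server (server 2) cannot get its answer accepted alone.
result = accept_reply([(1, "A"), (2, "B"), (3, "A")], f=1)
```

With at most f malicious servers, any answer that appears f+1 times was sent by at least one correct server, which is why the client can trust it.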
Peer Byzantine Replication Limitations
• Limited scalability due to multiple all-peer exchange.
  – 3-round all-peer exchange.
    • Very costly on high latency wide area links.
    • Not very scalable.
• Strong connectivity is required.
• Construct consistent total order.
• Focus is solely on replica protection.
Peer Byzantine Replication Limitations
• Limited scalability due to multiple all-peer exchange.
• Strong connectivity is required.
  – 2f+1 (out of 3f+1) to allow progress and f+1 to get an answer.
    • Partitions are a real issue.
    • Clients depend on remote information.
  – Bad news: Provably optimal.
    • We need to pay something to get something else.
• Construct consistent total order.
• Focus is solely on replica protection.
Peer Byzantine Replication Limitations
• Limited scalability due to multiple all-peer exchange.
• Strong connectivity is required.
• Construct consistent total order.
  – Agreement is achieved on the order of updates before applying them.
    • Very useful - supports general update semantics.
    • Maybe sub-optimal for C3I applications that need only commutative semantics.
• Focus is solely on replica protection.
Peer Byzantine Replication Limitations
• Limited scalability due to multiple all-peer exchange.
• Strong connectivity is required.
• Construct consistent total order.
• Focus is solely on replica protection.
  – Compromised clients can inject wrong (though valid) input through authorized channels.
    • Wrong input will be consistently replicated to all servers.
Local Byzantine Replication Within a Site
• No trust between participants in a site.
  – A site acts as one unit that can only crash if the assumptions are met.
• How to make sure that one server can not manipulate the order?
  – Threshold cryptography seems a good direction.
• Use BFT-like [CL99, YMVAD03] protocols and threshold cryptography to guarantee that any valid message leaving the site is correct.
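The threshold idea mentioned above can be illustrated with threshold secret sharing (Shamir's scheme), in which any k of n shares recover a secret. This is a toy sketch of the threshold property only, not a threshold signature scheme and not the protocol the slides propose. The choice k = f+1 with n = 3f+1, the field prime, and all names are assumptions made for the example.

```python
import random

PRIME = 2**127 - 1  # Mersenne prime used as the field modulus for this toy example


def split(secret: int, k: int, n: int, rng: random.Random):
    """Split `secret` into n shares so that any k of them reconstruct it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        return y

    return [(x, poly(x)) for x in range(1, n + 1)]


def combine(shares) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


f = 1
shares = split(123456789, k=f + 1, n=3 * f + 1, rng=random.Random(2004))
recovered = combine(shares[:f + 1])  # any f+1 shares are enough
```

Under the assumed threshold, f compromised replicas hold too few shares to act for the site on their own, which is the property a threshold signature would rely on. The modular inverse via `pow(den, -1, PRIME)` requires Python 3.8 or later.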
Fault Tolerant Replication Engine
[Figure: state machine of the replication engine [AT02]. States: RegPrim, TransPrim, ExchangeStates, ExchangeMessages, Construct, NonPrim, Un, No. Transition labels: Reg Memb, Trans Memb, Update, Last CPC, Last State, Possible Prim, No Prim or Trans Memb, Recover. Updates are marked Red, Yellow, or Green.]
Fault Tolerant Experiments over Wide-Area Network
• A real experimental network (CAIRN).
• Was also modeled in the Emulab facility.
[Figure: CAIRN topology. Sites: Virginia, Delaware, Boston, San Jose, Los Angeles. Hosts: ISIPC, ISIPC4, TISWPC, ISEPC3, ISEPC, UDELPC, MITPC. Link labels: 38.8 ms, 1.86 Mbits/sec; 1.4 ms, 1.47 Mbits/sec; 4.9 ms, 9.81 Mbits/sec; 3.6 ms, 1.42 Mbits/sec; two links labeled 100 Mb/s, < 1 ms.]
Throughput Comparison (WAN)
[Figure: update transactions per second (0 to 400) versus number of clients (0 to 140), with 7 replicas on the wide area. Series: FT Replication Engine, Upper bound, 2PC.] [ADMST02]
Hierarchical Architecture
• Each site acts as a logical unit that can crash.
• Fault-tolerant protocols between sites.
[Figure: a site containing clients and server replicas 1, 2, 3, ..., 3f+1.]
Hierarchical Architecture Details
[Figure: two builds of the site architecture. In each, the clients and server replicas 1, 2, ..., 3f+1 of a local site share a local area network, and the site connects to the wide area network. Every replica runs Byzantine Replication and Fault Tolerant Replication over Secure Spread. Replica 1 is the wide area representative; replicas 2 through 3f+1 are wide area standbys. The second build adds Monitor components.]
Payment & Potential Gain
• Protects against f Byzantine faults in each site for the price of having 3f+1 replicas in every site.
• Box numbers / a total site compromise.
• Read queries are limited to the local site.
• On a network with diameter of 50 ms:
  – It takes at least 300 milliseconds to complete 3 wide area round trips used by peer Byzantine replication methods.
  – FT Replication engine was shown to achieve 5 times the performance of 2PC.
• Goal:
  – > factor of 3 compared with a peer system.
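The 300 millisecond figure above follows from simple arithmetic, sketched here. The sketch assumes that the network diameter is the one-way latency between the two most distant sites, and it ignores processing and queuing time; the function name is hypothetical.

```python
def wide_area_latency_ms(diameter_ms: float, wide_area_round_trips: int) -> float:
    """Lower bound on latency when every round trip crosses the network diameter."""
    round_trip_ms = 2 * diameter_ms  # out and back across the diameter
    return wide_area_round_trips * round_trip_ms


# 50 ms diameter, 3 wide area round trips, as on the slide.
peer_bound = wide_area_latency_ms(50, 3)

# The same bound for protocols that need fewer wide area round trips.
for trips in (1, 2, 3):
    print(trips, "round trip(s):", wide_area_latency_ms(50, trips), "ms")
```

The loop only shows how the bound scales with the number of wide area round trips; the slides do not state how many round trips the hierarchical design uses.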
Alternative Scalable Architecture
• Use physical trusted nodes assumed to be working under a weaker adversary: they can crash and recover, but cannot be compromised.
• Take advantage of the trusted nodes to run an optimized Byzantine replication algorithm, potentially reducing the number of rounds.
• Use protocols where communication over the WAN takes place only between trusted nodes, thus avoiding high latency.
• Similar approaches: [CLNV02, Ver03, SurS03]
What About Corrupted Clients?
• We can not detect corrupted clients without external information (can take advantage of detection mechanisms).
• Can we bring the system to a “clean” state if we have external information about compromised clients?
• Proposed solution: accountability graph.
A-DAG
Client Accountability Graph
[Figure: client updates arranged along a time axis.]
• A directed acyclic graph of updates.
• Each update links to previous updates modifying data it read (causal predecessors).
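The graph described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions: each update declares the data items it read and wrote, and it links only to the most recent update that modified each item it read. The class, field, and item names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateNode:
    update_id: str
    client: str
    reads: frozenset
    writes: frozenset
    predecessors: set = field(default_factory=set)  # ids of causal predecessors


class AccountabilityGraph:
    def __init__(self):
        self.nodes = {}        # update id -> UpdateNode, in insertion order
        self.last_writer = {}  # data item -> id of the last update that modified it

    def add_update(self, update_id, client, reads, writes):
        """Add an update and link it to the last writers of the items it read."""
        preds = {self.last_writer[item] for item in reads if item in self.last_writer}
        node = UpdateNode(update_id, client, frozenset(reads), frozenset(writes), preds)
        self.nodes[update_id] = node
        for item in writes:
            self.last_writer[item] = update_id
        return node


g = AccountabilityGraph()
g.add_update("u1", "alice", reads=[], writes=["x"])
g.add_update("u2", "bob", reads=["x"], writes=["y"])
g.add_update("u3", "carol", reads=["y"], writes=["z"])
g.add_update("u4", "dave", reads=["w"], writes=["w"])
```

Every edge points to an update that was added earlier, so the structure stays acyclic by construction.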
Client Accountability Graph
[Figure: accountability graph along a time axis; legend: clean update, corrupted update, suspicious update; one update is marked X.]
Limits adversary power:
• Adversary can inject updates only as a compromised client.
• Once a compromised network avoids delivering an update, it cannot deliver causally following updates.
Useful for risk assessment.
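The clean, corrupted, and suspicious labels in the figure legend can be sketched as a pass over the graph. The slides do not define the labels precisely, so the rule here is an assumption: an update is corrupted if its client is known to be compromised, suspicious if it causally follows a corrupted or suspicious update, and clean otherwise. The input format and names are hypothetical.

```python
def classify(predecessors: dict, clients: dict, compromised_clients: set) -> dict:
    """Label every update as 'clean', 'corrupted', or 'suspicious'.

    `predecessors` maps update id -> set of causal predecessor ids and is
    assumed to list updates in causal order (predecessors first).
    `clients` maps update id -> id of the client that issued it.
    """
    labels = {}
    for update_id, preds in predecessors.items():
        if clients[update_id] in compromised_clients:
            labels[update_id] = "corrupted"
        elif any(labels[p] != "clean" for p in preds):
            labels[update_id] = "suspicious"
        else:
            labels[update_id] = "clean"
    return labels


predecessors = {
    "u1": set(),
    "u2": {"u1"},
    "u3": {"u2"},
    "u4": set(),
    "u5": {"u3", "u4"},
}
clients = {"u1": "alice", "u2": "mallory", "u3": "bob", "u4": "carol", "u5": "alice"}

# External information (for example, from intrusion detection) names mallory.
labels = classify(predecessors, clients, compromised_clients={"mallory"})
```

The count of suspicious updates gives a rough measure of how far a compromised client's input may have spread, which is one way to read the risk assessment point.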
Enabling Fast Regeneration Using Snapshots
[Figure: accountability graph along a time axis with the most recent snapshot marked; legend: clean update, corrupted update, suspicious update; one update is marked X.]
Periodic snapshots limit state regeneration calculation.
For our application domain, it seems feasible to maintain continuous information over a long period of time.
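One way to picture snapshot-based regeneration is sketched below. The sketch assumes a totally ordered log of key/value updates, snapshots taken every fixed number of updates, and a recovery policy that replays only clean updates from the latest snapshot taken before the first non-clean update. None of these details come from the slides, and all names are hypothetical.

```python
def take_snapshots(log, every: int) -> dict:
    """Return {position: state after the first `position` updates}."""
    snapshots = {0: {}}  # the empty state before any update
    state = {}
    for position, (key, value) in enumerate(log, start=1):
        state[key] = value
        if position % every == 0:
            snapshots[position] = dict(state)
    return snapshots


def regenerate(log, labels, snapshots):
    """Rebuild a clean state; return it with the number of updates replayed."""
    first_bad = next((i for i, lab in enumerate(labels) if lab != "clean"), len(log))
    # Latest snapshot that contains no corrupted or suspicious update.
    start = max(pos for pos in snapshots if pos <= first_bad)
    state = dict(snapshots[start])
    replayed = 0
    for (key, value), label in zip(log[start:], labels[start:]):
        if label == "clean":
            state[key] = value
            replayed += 1
    return state, replayed


log = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
labels = ["clean", "clean", "clean", "corrupted", "suspicious", "clean"]

snapshots = take_snapshots(log, every=2)
state, replayed = regenerate(log, labels, snapshots)
```

Here the regeneration starts from the snapshot at position 2 rather than from the beginning of the log, so only the updates after that snapshot are examined.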
Overall Architecture
[Figure: two builds of the overall architecture. In each, the clients and server replicas 1, 2, ..., 3f+1 of a local site share a local area network, and the site connects to the wide area network. Every replica runs Byzantine Replication and Fault Tolerant Replication over Secure Spread, together with an A-DAG component. Replica 1 is the wide area representative; replicas 2 through 3f+1 are wide area standbys. The second build adds Monitor components.]
Risks and Challenges
• Interface the Byzantine-tolerant replication and Fault-tolerant replication components.
• Investigate the impact of threshold digital signatures on performance and complexity.
• Interface Byzantine-tolerant replication with the client accountability graph.
• Use of application semantics to optimize protocols.
• Design optimizations to make the cost of the architecture very small when no faults occur.
• Take into account confidentiality under corrupted servers model.
Scalability, Accountability and Instant Information Access for Network-Centric Warfare
http://www.cnds.jhu.edu/funding/srs/
New ideas
• First scalable wide-area intrusion-tolerant replication architecture.
• Providing accountability for authorized but malicious client updates.
• Exploiting update semantics to provide instant and consistent information access.
Impact
• Resulting systems with at least 3 times higher throughput, lower latency and high availability for updates over wide area networks.
• Clear path for technology transitions into Military C3I systems.
Schedule
• Milestones: June 04, Dec 04, June 05, Dec 05.
• C3I model, baseline and demo
• Component analysis & design
• Component implement.
• System integration & evaluation
• Final C3I demo and baseline eval
• Comp. eval.