Johns Hopkins & Purdue — 28 Jan 05
Scalability, Accountability and Instant Information Access for
Network Centric Warfare
Yair Amir, Claudiu Danilov, Jon Kirsch, John Lane, Jonathan Shapiro, Ciprian Tutu
Department of Computer Science, Johns Hopkins University
Chi-Bun Chan, Cristina Nita-Rotaru, David Zage
Department of Computer Science, Purdue University
http://www.cnds.jhu.edu
Network Centric Warfare Applications
• Operate in wide area network settings; communication is often conducted over unreliable channels.
• Require timely decisions based on available information, although that information may not be the most recent or consistent because of intermittent network connectivity.
• Weaker update semantics may be sufficient for several applications.
• Critical information is often not large.
• Every piece of information is usually generated by a unique source.
Fits many non-military applications as well
Dealing with Insider Threats
Project goals:
• Scaling survivable replication to wide area networks.
– Performance, performance, performance.
• Dealing with malicious clients.
– Compromised clients can inject authenticated but incorrect data.
– Such data may be hard to detect on the fly.
– Malicious or just an honest error? The approach is useful for both.
• Exploiting application update semantics for replication speedup in malicious environments.
– Will not be discussed today.
A Distributed Systems Service Model
• Message-passing system.
• Clients issue requests to servers, then wait for answers.
• Replicated servers process the request, then provide answers to clients.
[Figure: a site with N server replicas (1, 2, 3, …, N) serving clients.]
Outline
• Introduction.
• Client Accountability.
  – Concept.
  – Performance viability.
  – Applications.
• Scaling wide area intrusion tolerance replication.
  – BFT – current state of the art.
  – A new hierarchical approach.
  – Constructing a trusted entity in the local site.
    • Threshold cryptography based approach.
• Integration.
• Summary.
Compromised Clients
• Hard Problem: Compromised clients can inject authenticated but incorrect data into the system, misleading honest clients.
  – Authentication and access control are not sufficient.
  – An almost-ignored problem.
• A new goal: Generic tools for accountability enforcement.
  – Causality tracking of updates and dependencies to facilitate instant analysis and regeneration of a clean state once corrupt data is flagged.
  – While not reinventing the wheel: detecting corrupt updates via external intrusion detection, application-specific knowledge, or a human in the loop.
Real-time Accountability Graph
• Solution: Accountability Graph — accountability enforcement and causality tracking of updates and dependencies in a Directed Acyclic Graph with periodic snapshots. Also called an A-DAG.
[Figure: A-DAG of updates from clients C1–C8, ordered by time; clean and corrupt updates are marked.]
Real-time Accountability Graph in Action
• Marking: Upon detection of incorrect data, trace it to the corrupt update and, from that, mark all causally dependent updates as corrupted or suspected.
• Regeneration: Real-time state regeneration based on the last good snapshot and the non-corrupted, non-suspected updates.
• Also useful for online what-if scenarios and for offline damage assessment.
[Figure: A-DAG of updates from clients C1–C8 over time; clean, corrupt, and suspicious updates are marked.]
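The marking and regeneration steps above can be sketched as a breadth-first walk of the A-DAG. This is a hypothetical illustration, not the project's implementation; the update ids and the `dependents` adjacency map are assumptions.

```python
# Hedged sketch of A-DAG marking and regeneration: a breadth-first walk
# marks every update causally dependent on a flagged corrupt update.
# Update ids and the adjacency representation are illustrative assumptions.
from collections import deque

def mark_suspected(dependents, corrupt_update):
    """Return all updates causally dependent on corrupt_update.

    dependents maps an update id to the ids that declared a dependency
    on it, so edges point forward in time.
    """
    suspected = set()
    queue = deque([corrupt_update])
    while queue:
        u = queue.popleft()
        for v in dependents.get(u, ()):
            if v not in suspected:
                suspected.add(v)
                queue.append(v)
    return suspected

def regenerate(updates_in_order, corrupt_update, dependents):
    """Replay from the last good snapshot, skipping corrupt/suspected updates."""
    bad = mark_suspected(dependents, corrupt_update) | {corrupt_update}
    return [u for u in updates_in_order if u not in bad]

# u2 depends on u1, u3 on u2; u4 is causally independent of u1,
# so flagging u1 marks u2 and u3, and regeneration keeps only u4.
deps = {"u1": ["u2"], "u2": ["u3"]}
print(sorted(mark_suspected(deps, "u1")))                 # ['u2', 'u3']
print(regenerate(["u1", "u2", "u3", "u4"], "u1", deps))   # ['u4']
```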
Real-time Accountability Graph Optimization
• Premise: Non-compromised clients can be trusted with update dependency reporting.
• Result: Dramatically reduces the false-positive rate of suspicious updates by eliminating FIFO links of non-compromised clients.
• Performance: Ability to track and traverse millions of updates within a couple of seconds.
[Figure: A-DAG of updates from clients C1–C8 over time; clean, corrupt, and suspicious updates are marked.]
A-DAG Performance
• Quick enough for many applications.
[Chart: Traversal time (0–3 sec) as a function of the number of updates (0–4.5 million); Accountability Graph data structure.]
A-DAG Performance (cont.)
• Quick enough for many applications.
[Chart: Traversal time (0–3.5 sec) as a function of the number of clients (0–120,000), keeping 4,000,000 updates constant; Accountability Graph data structure.]
A-DAG Performance (cont.)
• Quick enough for many applications.
[Chart: Traversal time (0–9 sec) as a function of the number of dependencies per update (0–35), keeping 4,000,000 updates and 100,000 clients constant; Accountability Graph data structure.]
Application to Open Source Software Development
• Hard Problem: Vulnerability to life-cycle attacks.
• A New Goal:
  – Quick analysis of the impact of a discovered life-cycle vulnerability or vulnerabilities.
  – Insight into where to invest limited resources to monitor against future life-cycle attacks.
• Actually applicable to any software (not just open source).
Capability Dependencies in Red Hat Linux (1997-2004)
[Chart: Cumulative distribution function of dependent capabilities — capabilities (%) vs. number of dependent capabilities (0–50,000).]
15 distributions: Red Hat 4.1 to Fedora 2.
Capability Dependencies in Red Hat Linux - Zooming In
[Chart: Cumulative distribution function of dependent capabilities — capabilities (%) vs. number of dependent capabilities (0–100).]
15 distributions: Red Hat 4.1 to Fedora 2.
Outline
• Introduction.
• Client Accountability.
  – Concept.
  – Performance viability.
  – Applications.
• Scaling wide area intrusion tolerance replication.
  – BFT – current state of the art.
  – A new hierarchical approach.
  – Constructing a trusted entity in the local site.
    • Threshold cryptography based approach.
• Integration.
• Summary.
State Machine Replication
• Main Challenge: Ensuring coordination between servers.
  – Requires agreement on the request to be processed and a consistent order of requests.
• Benign faults: Paxos [Lam98, Lam01] must contact f+1 out of 2f+1 servers and uses 2 rounds to allow consistent progress.
• Byzantine faults: BFT [CL99] must contact 2f+1 out of 3f+1 servers and uses 3 rounds to allow consistent progress.
A Replicated Server System
• Maintaining consistent servers [Sch90]:
  – To tolerate f benign faults, 2f+1 servers are needed.
  – To tolerate f malicious faults, 3f+1 servers are needed.
• Responding to read-only clients' requests [Sch90]:
  – If the servers face only benign faults, 1 answer is enough.
  – If the servers can be malicious, the client must wait for f+1 identical answers, f being the number of malicious servers.
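The quorum arithmetic above can be stated directly as a toy helper (illustrative only, not project code):

```python
# Toy helpers restating the slide's quorum arithmetic for f faults.
def benign_replication(f):
    """Paxos-style: 2f+1 servers, contact f+1, one answer suffices for reads."""
    return {"servers": 2 * f + 1, "progress_quorum": f + 1, "read_answers": 1}

def byzantine_replication(f):
    """BFT-style: 3f+1 servers, contact 2f+1, reads need f+1 identical answers."""
    return {"servers": 3 * f + 1, "progress_quorum": 2 * f + 1, "read_answers": f + 1}

# With f = 2 malicious servers: 7 replicas, quorum of 5, 3 identical read answers.
print(byzantine_replication(2))
```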
Peer Byzantine Replication Limitations
• Limited scalability due to the 3-round all-peer exchange.
• Strong connectivity is required:
  – 2f+1 (out of 3f+1) to allow progress, and f+1 to get an answer.
• Partitions are a real issue.
• Clients depend on remote information.
  – Bad news: provably optimal.
    • We need to pay something to get something else.
• Constructs a consistent total order.
• Focus is solely on replica protection.
Peer Byzantine Replication: BFT [CL99]
State of the Art in Byzantine Replication
Symmetric Wide Area Network
• Synthetic network used for analysis and understanding.
• 5 sites, each connected to all other sites by equal-latency links.
• Each site has 4 replicas (except one site with 3 replicas, due to the current BFT setup).
• Total: 19 replicas in the system.
• Each wide area link has 10 Mbits/sec capacity.
• Wide area latencies varied between 10 ms and 400 ms.
Practical Wide-Area Network
• A real experimental network (CAIRN).
• Was modeled in the Emulab facility.
• Capacity of wide area links was modified to 10 Mbits/sec to better reflect current realities.
[Figure: CAIRN topology spanning Los Angeles, San Jose, Virginia, Delaware, and Boston, with nodes ISIPC, ISIPC4, TISWPC, ISEPC3, ISEPC, UDELPC, and MITPC. Wide area links: 38.8 ms at 1.86 Mbits/sec, 4.9 ms at 9.81 Mbits/sec, 3.6 ms at 1.42 Mbits/sec, and 1.4 ms at 1.47 Mbits/sec; local links run at 100 Mb/s with < 1 ms latency.]
BFT Wide Area Performance
• Almost out-of-the-box BFT, which is a very good prototype.
• 19 replicas.
• Does not write to disk.
[Chart: Update latency (0–3000 ms) as a function of network diameter (0–400 ms), symmetric topology; series: 1 client, 5 clients.]
BFT Wide Area Performance (cont.)
• Almost out-of-the-box BFT, which is a very good prototype.
• 19 replicas.
• Does not write to disk.
[Chart: Update latency (0–1400 ms) as a function of the CAIRN network multiple (0–5; the original CAIRN network has a diameter of 45 ms), for 1 to 5 clients.]
BFT Wide Area Performance (cont.)
• Note: a 50 ms symmetric network vs. the native CAIRN network.
[Chart: Throughput (0–9 updates per second) as a function of the number of clients (0–6), comparing CAIRN (45 ms diameter) with the symmetric topology (50 ms diameter).]
Outline
• Introduction.
• Client Accountability.
  – Concept.
  – Performance viability.
  – Applications.
• Scaling wide area intrusion tolerance replication.
  – BFT – current state of the art.
  – A new hierarchical approach.
  – Constructing a trusted entity in the local site.
    • Threshold cryptography based approach.
• Integration.
• Summary.
A New Approach: Hierarchical Architecture
• Each site acts as a trusted logical unit that can crash or partition.
• Between sites:– Fault-tolerant protocols between sites.– Alternatively – Byzantine protocols also between sites.
• There is no free lunch – we pay with more hardware…
[Figure: a site with 3f+1 server replicas (1, 2, 3, …, 3f+1) serving clients.]
Constructing a Trusted Entity in the Local Site
• No trust between participants in a site.
  – A site acts as one unit that can only crash, if the assumptions are met.
• Initial idea:
  – Use BFT-like [CL99, YMVAD03] protocols to mask local Byzantine replicas.
• How to make sure that local Byzantine replicas cannot misrepresent the site on the wide area network?
  – Threshold cryptography seems a good direction.
  – Also appealing in terms of management.
Hierarchical Architecture Details
[Figure: Each server replica (1 through 3f+1) in the local site runs Local Area Byzantine Replication, Wide Area Fault Tolerant Replication, and a Monitor component. Replica 1 is the wide area representative; the others are wide area standbys. The site connects clients on the local area network to the wide area network.]
BFT As a Potential Building Block
• Fault model:
  – Less than a third can have faults of any kind, including benign faults – not practical in our opinion.
  – Will need to write to disk to protect against partial amnesia in benign faults.
• Consequence:
  – Current numbers underestimate the baseline cost.
  – Real latency will be higher due to disk writes in each round.
• In addition:
  – Very good implementation to demonstrate the concept.
  – But not a building block for us going forward (some stability and robustness issues).
    • Can be solved with a new implementation.
BFT Local Area Performance
• BFT in a local area network, 19 replicas, no disk writes.
[Chart: Throughput (0–180 updates per second) as a function of the number of clients (0–12) on a 100 Mbits LAN.]
BFT Local Area Performance
• BFT in a local area network, 19 replicas, no disk writes.
[Chart: Update latency (0–70 ms) as a function of the number of clients (0–12) on a 100 Mbits LAN.]
Outline
• Introduction.
• Client Accountability.
  – Concept.
  – Performance viability.
  – Applications.
• Scaling wide area intrusion tolerance replication.
  – BFT – current state of the art.
  – A new hierarchical approach.
  – Constructing a trusted entity in the local site.
    • Threshold cryptography based approach.
• Integration.
• Summary.
Threshold Digital Signatures
Problem: N entities authenticate a message by generating one signature, such that any k entities can create a valid signature but k-1 cannot.
Solution: Threshold digital signatures (no party knows the secret key, only its individual share).
Issues with threshold digital signatures:
• Trusted dealer vs. decentralized generation.
• An insider can submit a 'bad' share; this requires verifiable secret sharing – but when to do it?
• Highly interactive key generation and share verification.
• Size of the signature increases linearly with the number of players.
RSA Threshold Digital Signatures
Our choice is the RSA threshold signature scheme proposed by Shoup in 1999 (Practical Threshold Signatures, Eurocrypt 2000).
• Provides verifiable secret sharing.
• Size of signature bounded by a constant multiple of n, where n is the RSA modulus.
• Security proof in the random oracle model.
• Signature share generation and verification are completely non-interactive.
• Accepts schemes where the number of required shares can be greater than f+1 (fits the agreement problem).
• Used by other projects: COCA (Cornell), SINTRA (IBM Zurich).
Threshold RSA: Share Generation
• A trusted dealer generates the public (e, n) and private (d) RSA keys, then splits the private key d into N shares, such that any k out of N are enough to reconstruct the secret.
• Randomly select a polynomial of degree k-1 (as in Shamir's secret sharing).
• The dealer computes the individual shares si.
• The dealer creates a verification proof (involves modular exponentiation).
• More expensive than regular RSA; also requires safe primes.
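The polynomial-splitting step can be illustrated with plain Shamir sharing over a small prime field. This is only a toy: Shoup's scheme evaluates the polynomial modulo a value derived from the safe-prime RSA modulus and attaches verification proofs, both omitted here. The prime field below is an illustrative assumption.

```python
# Toy Shamir-style share generation: any k of N shares determine the
# degree-(k-1) polynomial, whose constant term f(0) is the secret.
# The prime field is an assumption; real threshold RSA uses the RSA key.
import random

PRIME = 2**31 - 1  # small prime field modulus, for illustration only

def make_shares(secret, k, n):
    """Split secret into n shares such that any k reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, j, PRIME) for j, c in enumerate(coeffs)) % PRIME
    return [(i, f(i)) for i in range(1, n + 1)]

# f = 2 faults: N = 3f+1 = 7 shares, any k = f+1 = 3 of them reconstruct.
shares = make_shares(secret=1234, k=3, n=7)
print(len(shares))  # 7
```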
Threshold RSA: Generating a Threshold Signature
Each entity owns a share si:
• Computes its individual signature and a proof of correctness (based on its individual share and the verification proof).
• Sends the individual signature and the proof of correctness to the combiner.
The combiner:
• Collects all individual signatures.
• Verifies that they were generated using the shares from the initial secret that was split (using the proof of correctness).
• Generates the threshold signature.
Much more expensive than one regular RSA signature.
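The combiner's job can be illustrated with toy Lagrange interpolation at x = 0, which recovers the shared secret from any k shares. Shoup's combiner instead multiplies partial signatures (the shares live in the exponent) and checks each proof of correctness; this self-contained sketch over an assumed prime field omits both.

```python
# Toy combiner: Lagrange interpolation at x = 0 recovers the secret from
# any k shares of a degree-(k-1) polynomial. Proofs of correctness and
# the exponent arithmetic of real threshold RSA are intentionally omitted.
import random

PRIME = 2**31 - 1  # illustrative prime field modulus

def make_shares(secret, k, n):
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    return [(i, sum(c * pow(i, j, PRIME) for j, c in enumerate(coeffs)) % PRIME)
            for i in range(1, n + 1)]

def combine(shares):
    """Recover f(0) from k shares by Lagrange interpolation mod PRIME."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, PRIME-2, PRIME) is the modular inverse (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = make_shares(secret=42, k=3, n=7)
print(combine(shares[:3]), combine(shares[2:5]))  # 42 42 — any 3 shares work
```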
Threshold RSA: Verifying the Threshold Signature
Anybody can verify the signature based on the public key (remember that only the private key was split).
Computation cost is similar to a single regular RSA digital signature verification.
Consequence for us: remote sites only need one public key per site.
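The point that verification needs only the public key can be seen even in textbook RSA with toy numbers (no padding or hashing; the primes below are illustrative only):

```python
# Textbook RSA with toy primes, illustrating the slide's point: whoever
# holds the public key (e, n) can verify; the private key d — the value
# that threshold RSA secret-shares — is never needed for verification.
p, q = 61, 53
n = p * q                            # 3233
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (would be split into shares)

m = 65                               # message representative
s = pow(m, d, n)                     # signature (Shoup assembles this from shares)
assert pow(s, e, n) == m             # verification uses only (e, n)
print("verified with public key only")
```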
Threshold Cryptography Library
We implemented a library providing support for generating Threshold RSA signatures.
• Implementation uses OpenSSL.
• Can be used by any application requiring threshold digital signatures.
• Used to get performance results.
• We plan to release the library as open source when we are happy with it.
Testing Environment
Platform: i686 Intel Pentium 4 CPU, 2.6 GHz, 256 MB RAM, GNU/Linux.
Library relies on OpenSSL:
• Used OpenSSL 0.9.7d, 17 Mar 2004.
Baseline operations:
• RSA 1024-bit sign: 4.8 ms; verify: 0.2 ms.
• Modular exponentiation, 1024 bits: 3 ms.
• Generate a 1024-bit RSA key: 160 ms.
Threshold RSA: Key Generation
[Chart: Time (0–7 seconds) to generate a 1024-bit (N, k) Threshold RSA key, for f = 1 to 5 (N = 3f+1, k = f+1), compared with generating N regular RSA keys.]
Threshold RSA: Signing
[Chart: Time (0–250 ms) to generate a 1024-bit Threshold RSA signature, for f = 1 to 5 (N = 3f+1, k = f+1). Series: generate a Threshold RSA partial signature; combine k = f+1 partial signatures with proof verification; generate a full Threshold RSA signature; generate f+1 regular RSA signatures.]
When Do We Need Verifiable Secret Sharing?
Optimistic case: The combiner can check that the signature is correct by using the public key. Proof of correctness and share verification are not needed in this case, while maintaining all cryptographic guarantees.
Malicious case: The signature does not verify:
• Detect which share(s) are incorrect using verifiable secret sharing; requires the proof of correctness and share verification.
• Potentially create a correct threshold signature by using shares other than the incorrect ones.
The overall scheme may add a signature on the original share (instead of the proof).
Threshold RSA: Signing (cont.)
[Chart: Time (0–250 ms) to generate a 1024-bit Threshold RSA signature with and without proof handling, for f = 1 to 5 (N = 3f+1, k = f+1). Series: combine k = f+1 partial signatures with proof verification; generate a full signature with proof handling; generate a partial signature without proof; combine without proof verification; generate a full signature without proof handling; generate f+1 regular RSA signatures.]
Threshold RSA: Signing (Cont.)
[Chart: Time (0–35 ms) to generate a 1024-bit Threshold RSA signature without proof handling, for f = 1 to 5 (N = 3f+1, k = f+1). Series: generate a partial signature without proof; combine k = f+1 partial signatures without proof verification; generate a full signature without proof handling; generate f+1 regular RSA signatures.]
Threshold RSA: Verifying
[Chart: Time (0–1.4 ms) for 1024-bit Threshold RSA verification, for f = 1 to 5 (N = 3f+1, k = f+1). Series: verify an (N, k) Threshold RSA signature; verify f+1 regular RSA signatures.]
Threshold Cryptography as a Building Block
• Compared with f+1 regular RSA signatures:
  – Better than vector RSA if used inside a more sophisticated protocol.
• Issues to consider:
  – Rate of malicious behavior, ease of management, message size overhead, computation overhead.
• Current thinking:
  – May be used beyond the initial goal.
    • Some of its properties can help construct an overall better, single protocol, compared with the BFT and threshold crypto combination.
Outline
• Introduction.
• Client Accountability.
  – Concept.
  – Performance viability.
  – Applications.
• Scaling wide area intrusion tolerance replication.
  – BFT – current state of the art.
  – A new hierarchical approach.
  – Constructing a trusted entity in the local site.
    • Threshold cryptography based approach.
• Integration.
• Summary.
Overall Architecture
[Figure: Each server replica (1 through 3f+1) in the local site runs Local Area Byzantine Replication, Wide Area Fault Tolerant Replication, the A-DAG, and a Monitor component. Replica 1 is the wide area representative; the others are wide area standbys. The site connects clients on the local area network to the wide area network.]
Scalability, Accountability and Instant Information Access for Network-Centric Warfare
New ideas:
• First scalable wide-area intrusion-tolerant replication architecture.
• Providing accountability for authorized but malicious client updates.
• Exploiting update semantics to provide instant and consistent information access.
Impact:
• Resulting systems with at least 3 times higher throughput, lower latency, and high availability for updates over wide area networks.
• Clear path for technology transitions into military C3I systems such as the Army Future Combat System.
Schedule (June 04 – Dec 05): C3I model, baseline and demo; component analysis & design; component implementation; component evaluation; system integration & evaluation; final C3I demo and baseline evaluation.
http://www.cnds.jhu.edu/funding/srs/