hyperledger fabric: a distributed operating system for ... fabric: a distributed operating system...

15
Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains Elli Androulaki Artem Barger Vita Bortnikov Christian Cachin Konstantinos Christidis Angelo De Caro David Enyeart Christopher Ferris Gennady Laventman Yacov Manevich Srinivasan Muralidharan * Chet Murthy Binh Nguyen * Manish Sethi Gari Singh Keith Smith Alessandro Sorniotti Chrysoula Stathakopoulou Marko Vukoli´ c Sharon Weed Cocco Jason Yellick IBM Abstract Hyperledger Fabric is a modular and extensible open-source system for deploying and operating permissioned block- chains. Fabric is currently used in more than 400 prototypes and proofs-of-concept of distributed ledger technology, as well as several production systems, across different indus- tries and use cases. Starting from the premise that there are no “one-size- fits-all” solutions, Fabric is the first truly extensible block- chain system for running distributed applications. It supports modular consensus protocols, which allows the system to be tailored to particular use cases and trust models. Fab- ric is also the first blockchain system that runs distributed applications written in general-purpose programming lan- guages, without systemic dependency on a native cryptocur- rency. This stands in sharp contrast to existing blockchain platforms for running smart contracts that require code to be written in domain-specific languages or rely on a cryp- tocurrency. Furthermore, it uses a portable notion of mem- bership for realizing the permissioned model, which may be integrated with industry-standard identity management. To support such flexibility, Fabric takes a novel approach to the design of a permissioned blockchain and revamps the way blockchains cope with non-determinism, resource exhaus- tion, and performance attacks. This paper describes Fabric, its architecture, the ratio- nale behind various design decisions, its security model and guarantees, its most prominent implementation aspects, as * Work done at IBM, now with State Street Corp. Work done at IBM. [Copyright notice will appear here once ’preprint’ option is removed.] well as its distributed application programming model. We further evaluate Fabric by implementing and benchmark- ing a Bitcoin-inspired digital currency. We show that Fabric achieves end-to-end throughput of more than 3500 transac- tions per second in certain popular deployment configura- tions, with sub-second latency. 1. Introduction A blockchain can be defined as an immutable ledger for recording transactions, maintained within a distributed net- work of mutually untrusting peers. Every peer maintains a copy of the ledger. The peers execute a consensus pro- tocol to validate transactions, group them into blocks, and build a hash chain over the blocks. This process forms the ledger by ordering the transactions, as is necessary for con- sistency. Blockchains have emerged with Bitcoin (http: //bitcoin.org/) and are widely regarded as a promising technology to run trusted exchanges in the digital world. In a public or permissionless blockchain anyone can par- ticipate without a specific identity. Public blockchains typi- cally involve a native cryptocurrency and often use consen- sus based on “proof of work” (PoW) and economic incen- tives. Permissioned blockchains, on the other hand, run a blockchain among a set of known, identified participants. A permissioned blockchain provides a way to secure the inter- actions among a group of entities that have a common goal but which do not fully trust each other, such as businesses that exchange funds, goods, or information. By relying on the identities of the peers, a permissioned blockchain can use traditional Byzantine-fault tolerant (BFT) consensus. Blockchains may execute arbitrary, programmable trans- action logic in the form of smart contracts, as exemplified by Ethereum (http://ethereum.org/). The scripts in Bit- coin were a predecessor of the concept. A smart contract functions as a trusted distributed application and gains its security from the blockchain and the underlying consensus among the peers. This closely resembles the well-known ap- proach of building resilient applications with state-machine replication (SMR) [31]. However, blockchains depart from traditional SMR with Byzantine faults in important ways: (1) arXiv:1801.10228v1 [cs.DC] 30 Jan 2018

Upload: buingoc

Post on 20-Jun-2018

230 views

Category:

Documents


0 download

TRANSCRIPT

Hyperledger Fabric A Distributed Operating Systemfor Permissioned Blockchains

Elli Androulaki Artem Barger Vita Bortnikov Christian Cachin KonstantinosChristidis Angelo De Caro David Enyeart Christopher Ferris Gennady LaventmanYacov Manevich Srinivasan Muralidharan lowast Chet Murthy dagger Binh Nguyen lowast Manish

Sethi Gari Singh Keith Smith Alessandro Sorniotti Chrysoula StathakopoulouMarko Vukolic Sharon Weed Cocco Jason Yellick

I B M

AbstractHyperledger Fabric is a modular and extensible open-sourcesystem for deploying and operating permissioned block-chains Fabric is currently used in more than 400 prototypesand proofs-of-concept of distributed ledger technology aswell as several production systems across different indus-tries and use cases

Starting from the premise that there are no ldquoone-size-fits-allrdquo solutions Fabric is the first truly extensible block-chain system for running distributed applications It supportsmodular consensus protocols which allows the system tobe tailored to particular use cases and trust models Fab-ric is also the first blockchain system that runs distributedapplications written in general-purpose programming lan-guages without systemic dependency on a native cryptocur-rency This stands in sharp contrast to existing blockchainplatforms for running smart contracts that require code tobe written in domain-specific languages or rely on a cryp-tocurrency Furthermore it uses a portable notion of mem-bership for realizing the permissioned model which may beintegrated with industry-standard identity management Tosupport such flexibility Fabric takes a novel approach to thedesign of a permissioned blockchain and revamps the wayblockchains cope with non-determinism resource exhaus-tion and performance attacks

This paper describes Fabric its architecture the ratio-nale behind various design decisions its security model andguarantees its most prominent implementation aspects as

lowastWork done at IBM now with State Street CorpdaggerWork done at IBM

[Copyright notice will appear here once rsquopreprintrsquo option is removed]

well as its distributed application programming model Wefurther evaluate Fabric by implementing and benchmark-ing a Bitcoin-inspired digital currency We show that Fabricachieves end-to-end throughput of more than 3500 transac-tions per second in certain popular deployment configura-tions with sub-second latency

1 IntroductionA blockchain can be defined as an immutable ledger forrecording transactions maintained within a distributed net-work of mutually untrusting peers Every peer maintainsa copy of the ledger The peers execute a consensus pro-tocol to validate transactions group them into blocks andbuild a hash chain over the blocks This process forms theledger by ordering the transactions as is necessary for con-sistency Blockchains have emerged with Bitcoin (httpbitcoinorg) and are widely regarded as a promisingtechnology to run trusted exchanges in the digital world

In a public or permissionless blockchain anyone can par-ticipate without a specific identity Public blockchains typi-cally involve a native cryptocurrency and often use consen-sus based on ldquoproof of workrdquo (PoW) and economic incen-tives Permissioned blockchains on the other hand run ablockchain among a set of known identified participants Apermissioned blockchain provides a way to secure the inter-actions among a group of entities that have a common goalbut which do not fully trust each other such as businessesthat exchange funds goods or information By relying onthe identities of the peers a permissioned blockchain canuse traditional Byzantine-fault tolerant (BFT) consensus

Blockchains may execute arbitrary programmable trans-action logic in the form of smart contracts as exemplifiedby Ethereum (httpethereumorg) The scripts in Bit-coin were a predecessor of the concept A smart contractfunctions as a trusted distributed application and gains itssecurity from the blockchain and the underlying consensusamong the peers This closely resembles the well-known ap-proach of building resilient applications with state-machinereplication (SMR) [31] However blockchains depart fromtraditional SMR with Byzantine faults in important ways (1)

arX

iv1

801

1022

8v1

[cs

DC

] 3

0 Ja

n 20

18

not only one but many distributed applications run concur-rently (2) applications may be deployed dynamically and byanyone and (3) the application code is untrusted potentiallyeven malicious These differences necessitate new designs

Many existing smart-contract blockchains follow theblueprint of SMR [31] and implement so-called active repli-cation [13] a protocol for consensus or atomic broadcastfirst orders the transactions and propagates them to all peersand second each peer executes the transactions sequentiallyWe call this the order-execute architecture it requires allpeers to execute every transaction and all transactions to bedeterministic The order-execute architecture can be foundin virtually all existing blockchain systems ranging frompublic ones such as Ethereum (with PoW-based consensus)to permissioned ones (with BFT-type consensus) such asTendermint (httptendermintcom) Chain (httpchaincom) and Quorum (httpwwwjpmorgancomglobalQuorum) Although the order-execute designis not immediately apparent in all systems because the addi-tional transaction validation step may blur it the limitationsof order-execute are inherent in all every peer executes ev-ery transaction and transactions must be deterministic

Prior permissioned blockchains suffer from many limita-tions which often stem from their permissionless relativesor from using the order-execute architecture In particular

bull Consensus is hard-coded within the platform which con-tradicts the well-established understanding that there isno ldquoone-size-fits-allrdquo (BFT) consensus protocol [33]

bull The trust model of transaction validation is determinedby the consensus protocol and cannot be adapted to therequirements of the smart contract

bull Smart contracts must be written in a fixed non-standardor domain-specific language which hinders wide-spreadadoption and may lead to programming errors

bull The sequential execution of all transactions by all peerslimits performance and complex measures are neededto prevent denial-of-service attacks against the platformoriginating from untrusted contracts (such as accountingfor runtime with ldquogasrdquo in Ethereum)

bull Transactions must be deterministic which can be difficultto ensure programmatically

bull Every smart contract runs on all peers which is at oddswith confidentiality and prohibits the dissemination ofcontract code and state to a subset of peers

In this paper we describe Hyperledger Fabric or simplyFabric an open-source (httpgithubcomhyperledgerfabric) blockchain platform that overcomes these limita-tions Fabric is one of the projects of Hyperledger (httpwwwhyperledgerorg) under the auspices of the LinuxFoundation (httpwwwlinuxfoundationorg) Fab-ric is used in more than 400 prototypes proofs-of-conceptand in production distributed-ledger systems across differ-ent industries and use cases These use cases include but arenot limited to areas such as dispute resolution trade logis-

tics FX netting food safety contract management diamondprovenance rewards point management low liquidity se-curities trading and settlement identity management andsettlement through digital currency

Fabric introduces a new blockchain architecture aimingat resiliency flexibility scalability and confidentiality De-signed as a modular and extensible general-purpose per-missioned blockchain Fabric supports the execution of dis-tributed applications written in standard programming lan-guages This makes Fabric the first distributed operating sys-tem for permissioned blockchains

The architecture of Fabric follows a novel execute-order-validate paradigm for distributed execution of untrustedcode in an untrusted environment It separates the trans-action flow into three steps which may be run on differententities in the system (1) executing a transaction and check-ing its correctness thereby endorsing it (corresponding toldquotransaction validationrdquo in other blockchains) (2) order-ing through a consensus protocol irrespective of transactionsemantics and (3) transaction validation per application-specific trust assumptions which also prevents race condi-tions due to concurrency

This design departs radically from the order-executeparadigm in that Fabric typically executes transactions be-fore reaching final agreement on their order It combinesthe two well-known approaches to replication passive andactive as follows

First Fabric uses passive or primary-backup replica-tion [6 13] as often found in distributed databases but withmiddleware-based asymmetric update processing [24 25]and ported to untrusted environments with Byzantine faultsIn Fabric every transaction is executed (endorsed) only by asubset of the peers which allows for parallel execution andaddresses potential non-determinism drawing on ldquoexecute-verifyrdquo BFT replication [21] A flexible endorsement policyspecifies which peers or how many of them need to vouchfor the correct execution of a given smart contract

Second Fabric incorporates active replication in thesense that the transactionrsquos effects on the ledger state areonly written after reaching consensus on a total order amongthem in the deterministic validation step executed by eachpeer individually This allows Fabric to respect application-specific trust assumptions according to the transaction en-dorsement Moreover the ordering of state updates is del-egated to a modular component for consensus (ie atomicbroadcast) which is stateless and logically decoupled fromthe peers that execute transactions and maintain the ledgerSince consensus is modular its implementation can be tai-lored to the trust assumption of a particular deployment Al-though it is readily possible to use the blockchain peers alsofor implementing consensus the separation of the two rolesadds flexibility and allows one to rely on well-establishedtoolkits for CFT (crash fault-tolerant) or BFT ordering

2 201821

Overall this hybrid replication design which mixes pas-sive and active replication in the Byzantine model and theexecute-order-validate paradigm represent the main inno-vation in Fabric architecture They resolve the issues men-tioned before and make Fabric a scalable system for permis-sioned blockchains supporting flexible trust assumptions

To implement this architecture Fabric contains modularbuilding blocks for each of the following components

Ordering service An ordering service atomically broad-casts state updates to peers and establishes consensus onthe order of transactions It has been implemented withApache KafkaZooKeeper (httpkafkaapacheorg) and with BFT-SMaRt [3]

Identity and membership A membership service provideris responsible for associating peers with cryptographicidentities It maintains the permissioned nature of Fabric

Scalable dissemination An optional peer-to-peer gossipservice disseminates the blocks output by ordering ser-vice to all peers

Smart-contract execution Smart contracts in Fabric runwithin a container environment for isolation They canbe written in standard programming languages but do nothave direct access to the ledger state

Ledger maintenance Each peer locally maintains the ledgerin the form of the append-only blockchain and as a snap-shot of the most recent state in a key-value store (KVS)The KVS can be implemented by standard libraries suchas LevelDB or Apache CouchDB

The remainder of this paper describes the architectureof Fabric and our experience with it Section 2 summa-rizes the state of the art and explains the rationale behindvarious design decisions Section 3 introduces the architec-ture and the execute-order-validate approach of Fabric in de-tail illustrating the transaction execution flow In Section 4the key components of Fabric are defined in particular theordering service membership service peer-to-peer gossipledger database and smart-contract API Results and in-sights gained in a performance evaluation of Fabric with aBitcoin-inspired cryptocurrency deployed in a cluster en-vironment on commodity public cloud VMs are given inSection 5 They show that Fabric achieves in popular de-ployment configurations throughput of more than 3500 tpsachieving finality [36] with latency of a few hundred ms Fi-nally Section 6 discusses related work

2 Background21 Order-Execute Architecture for BlockchainsAll previous blockchain systems permissioned or not fol-low the order-execute architecture This means that theblockchain network orders transactions first using a con-

Update stateOrder Execute

Deterministic ()execution

Persist state on all peers

Consensus or atomic broadcast

Figure 1 Order-execute architecture in replicated services

sensus protocol and then executes them in the same orderon all peers sequentially1

For instance a PoW-based permissionless blockchainsuch as Ethereum combines consensus and execution oftransactions as follows (1) every peer (ie a node that par-ticipates in consensus) assembles a block containing validtransactions (to establish validity this peer already pre-executes those transactions) (2) the peer tries to solve a PoWpuzzle [28] (3) if the peer is lucky and solves the puzzle itdisseminates the block to the network via a gossip protocoland (4) every peer receiving the block validates the solutionto the puzzle and all transactions in the block Effectivelyevery peer thereby repeats the execution of the lucky peerfrom its first step Moreover all peers execute the transac-tions sequentially (within one block and across blocks) Theorder-execute architecture is illustrated by Fig 1

Existing permissioned blockchains such as TendermintChain or Quorum typically use BFT consensus [9] pro-vided by PBFT [11] or other protocols for atomic broad-cast Nevertheless they all follow the same order-executeapproach and implement classical active SMR [13 31]

22 Limitations of Order-ExecuteThe order-execute architecture is conceptually simple andtherefore also widely used However it has several draw-backs when used in a general-purpose permissioned block-chain We discuss the three most significant ones next

Sequential execution Executing the transactions sequen-tially on all peers limits the effective throughput that can beachieved by the blockchain In particular since the through-put is inversely proportional to the execution latency thismay become a performance bottleneck for all but the sim-plest smart contracts Moreover recall that in contrast to tra-ditional SMR the blockchain forms a universal computingengine and its payload applications might be deployed by anadversary A denial-of-service (DoS) attack which severelyreduces the performance of such a blockchain could simplyintroduce smart contracts that take a very long time to exe-cute For example a smart contract that executes an infiniteloop has a fatal effect but cannot be detected automaticallybecause the halting problem is unsolvable

To cope with this issue public programmable block-chains with a cryptocurrency account for the execution costEthereum for example introduces the concept of gas con-

1In many blockchains with a hard-coded primary application such asBitcoin this transaction execution is called ldquotransaction validationrdquo Herewe call this step transaction execution to harmonize the terminology

3 201821

sumed by a transaction execution which is converted at agas price to a cost in the cryptocurrency and billed to thesubmitter of the transaction Ethereum goes a long way tosupport this concept assigns a cost to every low-level com-putation step and introduces its own VM monitor to controlexecution Although this appears to be a viable solution forpublic blockchains it is not adequate for the permissionedmodel for a general-purpose system without a native cryp-tocurrency

The distributed-systems literature proposes many waysto improve performance compared to sequential executionfor instance through parallel execution of unrelated oper-ations [30] Unfortunately such techniques are still to beapplied successfully in the blockchain context of smart con-tracts For instance one challenge is the requirement fordeterministically inferring all dependencies across smartcontracts which is particularly challenging when combinedwith possible confidentiality constraints Furthermore thesetechniques are of no help against DoS attacks by contractcode from untrusted developers

Non-deterministic code Another important problem foran order-execute architecture are non-deterministic transac-tions Operations executed after consensus in active SMRmust be deterministic or the distributed ledger ldquoforksrdquo andviolates the basic premise of a blockchain that all peers holdthe same state This is usually addressed by programmingblockchains in domain-specific languages (eg EthereumSolidity) that are expressive enough for their applications butlimited to deterministic execution However such languagesare difficult to design for the implementer and require addi-tional learning by the programmer Writing smart contractsin a general-purpose language (eg Go Java CC++) in-stead appears more attractive and accelerates the adoption ofblockchain solutions

Unfortunately generic languages pose many problemsfor ensuring deterministic execution Even if the applicationdeveloper does not introduce obviously non-deterministicoperations hidden implementation details can have the samedevastating effect (eg a map iterator is not deterministic inGo) To make matters worse on a blockchain the burdento create deterministic applications lies on the potentiallyuntrusted programmer Only one non-deterministic contractcreated with malicious intent is enough to bring the wholeblockchain to a halt A modular solution to filter divergingoperations on a blockchain has also been investigated [8]but it appears costly in practice

Confidentiality of execution According to the blueprintof public blockchains many permissioned systems run allsmart contracts on all peers However many intended usecases for permissioned blockchains require confidentialityie that access to smart-contract logic transaction dataor ledger state can be restricted Although cryptographictechniques ranging from data encryption to advanced zero-knowledge proofs [2] and verifiable computation [26] can

help to achieve confidentiality this often comes with a con-siderable overhead and is not viable in practice

Fortunately it suffices to propagate the same state to allpeers instead of running the same code everywhere Thusthe execution of a smart contract can be restricted to a subsetof the peers trusted for this task that vouch for the resultsof the execution This design departs from active replicationtowards a variant of passive replication [6] adapted to thetrust model of blockchain

23 Further Limitations of Existing ArchitecturesFixed trust model Most permissioned blockchains rely onasynchronous BFT replication protocols to establish consen-sus [36] Such protocols typically rely on a security assump-tion that among n gt 3f peers up to f are tolerated to mis-behave and exhibit so-called Byzantine faults [4] The samepeers often execute the applications as well under the samesecurity assumption (even though one could actually restrictBFT execution to fewer peers [37]) However such a quan-titative trust assumption irrespective of peersrsquo roles in thesystem may not match the trust required for smart-contractexecution In a flexible system trust at the application levelshould not be fixed to trust at the protocol level A general-purpose blockchain should decouple these two assumptionsand permit flexible trust models for applications

Hard-coded consensus Fabric is the first blockchain sys-tem that introduced pluggable consensus Before Fabric vir-tually all blockchain systems permissioned or not camewith a hard-coded consensus protocol However decades ofresearch on consensus protocols have shown there is no suchldquoone-size-fits-allrdquo solution For instance BFT protocols dif-fer widely in their performance when deployed in potentiallyadversarial environments [33] A protocol with a ldquochainrdquocommunication pattern exhibits provably optimal through-put on a LAN cluster with symmetric and homogeneouslinks [18] but degrades badly on a wide-area heterogeneousnetwork Furthermore external conditions such as load net-work parameters and actual faults or attacks may vary overtime in a given deployment For these reasons BFT consen-sus should be inherently reconfigurable and ideally adapt dy-namically to a changing environment [1] Another importantaspect is to match the protocolrsquos trust assumption to a givenblockchain deployment scenario Indeed one may want toreplace BFT consensus with a protocol based on an alterna-tive trust model such as XFT [27] or a CFT protocol suchas PaxosRaft [29] and ZooKeeper [20] or even a permis-sionless protocol

24 Experience with Order-Execute BlockchainPrior to realizing the execute-order-validate architecture ofFabric the team gained experience with building a per-missioned blockchain platform in the order-execute modelwith PBFT [11] for consensus From feedback obtained inmany proof-of-concept applications the limitations of this

4 201821

approach became immediately clear For instance users of-ten observed diverging states at the peers and reported abug in the consensus protocol in all cases closer inspectionrevealed that the culprit was non-deterministic transactioncode Other complaints addressed limited performance egldquoonly five transactions per secondrdquo until users confessedthat their average transaction took 200ms to execute Wehave learned that the key properties of a blockchain sys-tem namely consistency security and performance mustnot depend on the knowledge and goodwill of its users inparticular since the blockchain should run in an untrustedenvironment

3 ArchitectureIn this section we introduce the three-phase execute-order-validate architecture and then explain the transaction flowThe components of Fabric are discussed in Section 4

31 Fabric OverviewFabric is a distributed operating system for permissio-ned blockchains that executes distributed applications writ-ten in general-purpose programming languages (eg GoJava Nodejs) It securely tracks its execution history inan append-only replicated ledger data structure and has nocryptocurrency built in

Fabric introduces the execute-order-validate blockchainarchitecture and does not follow the standard order-executedesign for reasons explained in Section 2 In a nutshell adistributed application for Fabric consists of two parts

bull A smart contract called chaincode which is programcode that implements the application logic and runs dur-ing the execution phase The chaincode is the central partof a distributed application in Fabric and may be writtenby an untrusted developer Special chaincodes exist formanaging the blockchain system and maintaining param-eters collectively called system chaincodes (Sec 46)

bull An endorsement policy that is evaluated in the validationphase Endorsement policies cannot be chosen or modi-fied by untrusted application developers they are part ofthe system An endorsement policy acts as a static libraryfor transaction validation in Fabric which can merely beparameterized by the chaincode Only designated admin-istrators may run system management functions and havethe right to modify the endorsement policyA typical endorsement policy lets the chaincode specifythe endorsers for a transaction in the form of a set of peersthat are necessary for endorsement it uses a monotonelogical expression on sets such as ldquothree out of fiverdquo orldquoA and B or B and Crdquo Custom endorsement policiesmay implement arbitrary logic (eg our Bitcoin-inspiredcryptocurrency in Sec 51)

A client sends transactions to the peers specified by theendorsement policy Each transaction is then executed byspecific peers and its output is recorded this step is also

Update state

Order rw-sets Atomic broadcast

(consensus) Stateless ordering

service

Persist state on all peers

Simulate trans and endorse

Create rw-set Collect endorse-

ments

OrderExecute Validate

Validate endorse-ments amp rw-sets

Eliminate invalid and conflicting trans

Figure 2 Execute-order-validate architecture of Fabric (rw-set means a readset and writeset as explained in Sec 32)

called endorsement After execution transactions enter theordering phase which uses a pluggable consensus protocolto produce a totally ordered sequence of endorsed transac-tions grouped in blocks These are broadcast to all peerswith the (optional) help of gossip Unlike standard activereplication [31] which totally orders transaction inputs Fab-ric orders transaction outputs combined with state depen-dencies as computed during the execution phase Each peerthen validates the state changes from endorsed transactionswith respect to the endorsement policy and the consistencyof the execution in the validation phase All peers validatethe transactions in the same order and validation is determin-istic In this sense Fabric introduces a novel hybrid replica-tion paradigm in the Byzantine model which combines pas-sive replication (the pre-consensus computation of state up-dates) and active replication (the post-consensus validationof execution results and state changes)

The execute-order-validate approach is illustrated by Fig 2

A Fabric blockchain consists of a set of nodes that form anetwork As Fabric is permissioned all nodes that participatein the network have an identity as provided by a modularmembership service provider (MSP) (Sec 41) Nodes in aFabric network take up one of three roles

Clients submit transaction proposals for execution helporchestrate the execution phase and finally broadcasttransactions for ordering

Peers execute transaction proposals and validate transac-tions Peers also maintain the blockchain ledger anappend-only data structure recording all transactions inthe form of a hash chain as well as the state a succinctrepresentation of the latest ledger state Not all peers ex-ecute all transaction proposals only a subset of themcalled endorsing peers (or simply endorsers) as speci-fied by the policy of the chaincode to which the transac-tion pertains However all peers maintain the completeledger

Orderering Service Nodes (OSN) (or simply orderers)are the nodes that collectively form the ordering ser-vice In short the ordering service establishes the totalorder of all transactions in Fabric where each transac-tion contains state updates and dependencies computedduring the execution phase along with cryptographic sig-

5 201821

client endorsingpeer 1

endorsingpeer 2

endorsingpeer 3

Peer(non-endorsing)

ord

ering service

orderers

Invocation Commit

1 1 1

2

3

4 4

55

5 5

1 Chaincode execution

2 Endorsementcollection

34Ordering

BroadcastDelivery5 Validation

Figure 3 Fabric high level transaction flow

natures of the endorsing peers that computed them Or-derers are entirely unaware of the application state anddo not participate in the execution nor in the validationof transactions This design choice renders consensus inFabric as modular as possible and simplifies replacementof consensus protocols in Fabric

Since it is possible to run one physical node with multipleroles Fabric can also be operated like a traditional peer-to-peer blockchain system in which every node maintainsthe state and invokes validates and orders transactions Thetransaction flow in Fabric across the different types of nodesis depicted in Fig 3

Compared to the description focusing on one singleblockchain so far a Fabric network actually supports mul-tiple blockchains connected to the same ordering serviceEach such blockchain is called a channel and may havedifferent peers as its members Channels can be used topartition the state of the blockchain network but consen-sus across channels is not coordinated and the total order oftransactions in each channel is separate from the others Cer-tain deployments that consider all orderers as trusted mayalso implement by-channel access control for peers In thefollowing we mention channels only briefly and concentrateon one single channel

The next three sections explain the transaction flow inFabric and illustrate the steps of the execution ordering andvalidation phases A Fabric network is shown in Fig 4

32 Execution PhaseIn the execution phase clients send the transaction proposal(or simply proposal) to one or more endorsers for execu-tion Recall that every chaincode implicitly specifies a set ofendorsers via the endorsement policy A proposal containsthe identity of the submitting client (according to the MSP)the transaction payload in the form of an operation to exe-cute parameters and the identifier of the chaincode to whichit belongs a nonce to be used only once by each client (suchas a counter or a random value) and a transaction identifierderived from the client identifier and the nonce The clientalso signs the proposal

Appl

MSP

P

SDK

P P P P P P

SDK

OSN

P

OSNOSNOSNOSN

Client

Orderingservice

Peer-to-peer gossip

Peers (P)

Client

Appl

Appl

Figure 4 A Fabric network with federated MSPs and run-ning multiple (differently shaded and colored) chaincodesselectively installed on peers according to policy

The endorsers simulate the proposal by executing the op-eration on the specified chaincode which has been installedon the blockchain The chaincode runs in a Docker containerisolated from the main endorser process

A proposal is simulated against the endorserrsquos localblockchain state without any synchronization with otherpeers at this point moreover endorsers do not persist theresults of the simulation to the ledger state The state ofthe blockchain is maintained by the peer transaction man-ager (PTM) in the form of a versioned key-value store inwhich successive updates to a key have monotonically in-creasing version numbers (Sec 44) The state created by achaincode is scoped exclusively to that chaincode and can-not be accessed directly by another chaincode Note that thechaincode is not supposed to maintain the local state in theprogram code only what it maintains in the blockchain statethat is accessed with GetState PutState and DelState oper-ations Given the appropriate permission a chaincode mayinvoke another chaincode to access its state within the samechannel

As a result of the simulation each endorser produces avalue writeset consisting of the state updates produced bysimulation (ie the modified keys along with their new val-ues) as well as a readset representing the version dependen-cies of the proposal simulation (ie all keys read during sim-ulation along with their version numbers) After the simula-tion the endorser cryptographically signs a message calledendorsement which contains readset and writeset (togetherwith metadata such as transaction ID endorser ID and en-dorser signature) and sends it back to the client in a proposalresponse The client collects endorsements until they satisfythe endorsement policy of the chaincode which the trans-action invokes (see Sec 34) In particular this requires allendorsers as determined by the policy to produce the sameexecution result (ie identical readset and writeset) Thenthe client proceeds to create the transaction and passes it tothe ordering service

6 201821

Discussion on design choices As the endorsers simulatethe proposal without synchronizing with other endorserstwo endorsers may execute it on different states of the ledgerand produce different outputs For the standard endorse-ment policy which requires multiple endorsers to producethe same result this implies that under high contention ofoperations accessing the same keys a client may not beable to satisfy the endorsement policy This is a new ele-ment compared to primary-backup replication in replicateddatabases with synchronization through middleware [24] aconsequence of the assumption that no single peer is trustedfor correct execution in a blockchain

We consciously adopted this design as it considerablysimplifies the architecture and is adequate for typical block-chain applications As demonstrated by the approach of Bit-coin distributed applications can be formulated such thatcontention by operations accessing the same state can be re-duced or eliminated completely in the normal case (eg inBitcoin two operations that modify the same ldquoobjectrdquo arenot allowed and represent a double-spending attack [28])

Executing a transaction before the ordering phase is crit-ical to tolerating non-deterministic chaincodes as discussedin Section 2 A chaincode in Fabric with non-deterministictransactions can only endanger the liveness of its own oper-ations because a client cannot gather a sufficient number ofendorsements for instance This is much more acceptable inpractice than the situation in an order-execute architecturewhere non-deterministic operations lead to inconsistenciesin the state of the peers

Finally tolerating non-deterministic execution also ad-dresses DoS attacks from untrusted chaincode as an endorsercan simply abort an execution according to a local policy ifit suspects a DoS attack This will not endanger the consis-tency of the system and again such unilateral abortion ofexecution is not possible in order-execute architectures

33 Ordering PhaseWhen a client has collected enough endorsements on a pro-posal it assembles a transaction and submits this to the or-dering service The transaction contains the transaction pay-load (ie the chaincode operation including parameters)transaction metadata and a set of endorsements The order-ing phase establishes a total order on all submitted transac-tions per channel In other words ordering atomically broad-casts [7] endorsements and thereby establishes consensus ontransactions despite faulty orderers Moreover the orderingservice batches multiple transactions into blocks and outputsa hash-chained sequence of blocks containing transactionsGrouping or batching transactions into blocks improves thethroughput of the broadcast protocol which is a well-knowntechnique in the context of fault-tolerant broadcasts

At a high level the interface of the ordering service onlysupports the following two operations These operations areinvoked by a peer and implicitly parameterized by a channelidentifier

bull broadcast(tx) A client calls this operation to broadcastan arbitrary transaction tx which usually contains thetransaction payload and a signature of the client fordissemination

bull B larr deliver(s) A client calls this to retrieve block Bwith non-negative sequence number s The block con-tains a list of transactions [tx1 txk] and a hash-chain value h representing the block with sequence num-ber sminus1 ie B = ([tx1 txk] h) As the client maycall this multiple times and always returns the same blockonce it is available we say the peer delivers block B withsequence number s when it receives B for the first timeupon invoking deliver(s)

The ordering service ensures that the delivered blocks onone channel are totally ordered More specifically orderingensures the following safety properties for each channel

Agreement For any two blocks B delivered with sequencenumber s and Bprime delivered with sprime at correct peers suchthat s = sprime it holds B = Bprime

Hashchain integrity If some correct peer delivers a block Bwith number s and another correct peer delivers blockBprime = ([tx1 txk] h

prime) with number s + 1 then itholds hprime = H(B) where H(middot) denotes the cryptographichash function

No skipping If a correct peer p delivers a block with num-ber s gt 0 then for each i = 0 s minus 1 peer p hasalready delivered a block with number i

No creation When a correct peer delivers block B withnumber s then for every tx isin B some client has alreadybroadcast tx

For liveness the ordering service supports at least thefollowing ldquoeventualrdquo property

Validity If a correct client invokes broadcast(tx) then ev-ery correct peer eventually delivers a block B that in-cludes tx with some sequence number

However every individual ordering implementation is al-lowed to come with its own liveness and fairness guaranteeswith respect to client requests

Since there may be a large number of peers in the block-chain network but only relatively few nodes are expectedto implement the ordering service Fabric can be config-ured to use a built-in gossip service for disseminating deliv-ered blocks from the ordering service to all peers (Sec 43)The implementation of gossip is scalable and agnostic to theparticular implementation of the ordering service hence itworks with both CFT and BFT ordering services ensuringthe modularity of Fabric

The ordering service may also perform access controlchecks to see if a client is allowed to broadcast messagesor receive blocks on a given channel This and other featuresof the ordering service are further explained in Section 42

7 201821

Discussion on design choices It is very important that theordering service does not maintain any state of the block-chain and neither validates nor executes transactions Thisarchitecture is a crucial defining feature of Fabric andmakes Fabric the first blockchain system to totally sepa-rate consensus from execution and validation This makesconsensus as modular as possible and enables an ecosystemof consensus protocols implementing the ordering service

34 Validation PhaseBlocks are delivered to peers either via a direct connectionto the ordering service or through gossip When a new blockarrives it enters the validation phase consisting of threesequential steps

1 The endorsement policy evaluation occurs in parallelfor all transactions within the block The evaluation isthe task of the so-called validation system chaincode(VSCC) a static library that is part of the blockchainrsquosconfiguration and is responsible for validating the en-dorsement with respect to the endorsement policy config-ured for the chaincode (see Sec 46) If the endorsementis not satisfied the transaction is marked as invalid andits effects are disregarded

2 A read-write conflict check is done for all transactions inthe block sequentially For each transaction it comparesthe versions of the keys in the readset field to those in thecurrent state of the ledger as stored locally by the peerand ensures they are still the same If the versions do notmatch the transaction is marked as invalid and its effectsare disregarded

3 The ledger update phase runs last in which the block isappended to the locally stored ledger and the blockchainstate is updated In particular when adding the blockto the ledger the results of the validity checks in thefirst two steps are persisted as well in the form of a bitmask denoting the transactions that are valid within theblock This facilitates the reconstruction of the state at alater time Furthermore all state updates are applied bywriting all key-value pairs in writeset to the local state

The default VSCC in Fabric allows monotone logical ex-pressions over the set of endorsers configured for a chain-code to be expressed The VSCC evaluation verifies that theset of peers as expressed through valid signatures on en-dorsements of the transaction satisfy the expression Differ-ent VSCC policies can be configured statically however

Discussion on design choices The ledger of Fabric con-tains all transactions including those that are deemed in-valid This follows from the overall design because orderingservice which is agnostic to chaincode state produces thechain of the blocks and because the validation is done by thepeers post-consensus This feature is needed in certain usecases that require tracking of invalid transactions during sub-sequent audits and stands in contrast to other blockchains

Committer

Peer

Endorser

Ledger

Gossip

KVS

Validation configuration sys chaincode

Block store peer trans manager (PTM)

LevelDB CouchDB

Peer-to-peer block gossip

Chaincode execution containers

Figure 5 Components of a Fabric peer

(eg Bitcoin and Ethereum) where the ledger contains onlyvalid transactions

4 Fabric ComponentsFabric is written in Go and uses the gRPC framework(httpgrpcio) for communication between clientspeers and orderers In the following we describe some im-portant components in more detail Figure 5 shows the com-ponents of a peer

41 Membership ServiceThe membership service provider (MSP) maintains the iden-tities of all nodes in the system (clients peers and order-ers) and is responsible for issuing node credentials that areused for authentication and authorization Since Fabric ispermissioned all interactions among nodes occur throughmessages that are authenticated typically with digital sig-natures The membership service comprises a componentat each node where it may authenticate transactions ver-ify the integrity of transactions sign endorsements validateendorsements and authenticate other blockchain operationsTools for key management and registration of nodes are alsopart of the MSP

The MSP is an abstraction for which different instantia-tions are possible The default MSP implementation in Fab-ric handles standard PKI methods for authentication basedon digital signatures and can accommodate commercial cer-tification authorities (CAs) A stand-alone CA is provided aswell with Fabric called Fabric-CA Furthermore alternativeMSP implementations are envisaged such as one relying onanonymous credentials for authorizing a client to invoke atransaction without linking this to an identity [10]

Fabric allows two modes for setting up a blockchain net-work In offline mode credentials are generated by a CAand distributed out-of-band to all nodes Peers and order-ers can only be registered in offline mode For enrollingclients Fabric-CA provides an online mode that issues cryp-tographic credentials to them The MSP configuration mustensure that all nodes especially all peers recognize the sameidentities and authentications as valid

8 201821

The MSP permits identity federation for example whenmultiple organizations operate a blockchain network Eachorganization issues identities to its own members and everypeer recognizes members of all organizations This can beachieved with multiple MSP instantiations for example bycreating a mapping between each organization and an MSP

42 Ordering ServiceThe ordering service manages multiple channels On everychannel it provides the following services

1 Atomic broadcast for establishing order on transactionsimplementing the broadcast and deliver calls (Sec 33)

2 Reconfiguration of a channel when its members mod-ify the channel by broadcasting a configuration updatetransaction (Sec 46)

3 Optionally access control in those configurations wherethe ordering service acts as a trusted entity restrictingbroadcasting of transactions and receiving of blocks tospecified clients and peers

The ordering service is bootstrapped with a genesis blockon the system channel This block carries a configurationtransaction that defines the properties of the ordering ser-vice

The current production implementation consists of or-dering-service nodes (OSNs) that implement the opera-tions described here and communicate through the sys-tem channel The actual atomic broadcast function is pro-vided by an instance of Apache Kafka (httpkafkaapacheorg) which offers scalable publish-subscribe mes-saging and strong consistency despite node crashes basedon ZooKeeper Kafka may run on physical nodes separatefrom the OSNs The OSNs act as proxies between the peersand Kafka

An OSN directly injects a newly received transaction tothe atomic broadcast (eg to the Kafka broker) On the otherhand the nodes batch transactions received from the atomicbroadcast and form blocks A block is cut as soon as one ofthree conditions is met (1) the block contains the specifiedmaximal number of transactions (2) the block has reached amaximal size (in bytes) or (3) an amount of time has elapsedsince the first transaction of a new block was received asexplained below

This batching process is deterministic and therefore pro-duces the same blocks at all nodes It is easy to see thatthe first two conditions are trivially deterministic given thestream of transactions received from the atomic broadcastTo ensure deterministic block production in the third casea node starts a timer when it reads the first transaction ina block from the atomic broadcast If the block is not yetcut when the timer expires the node broadcasts a specialtime-to-cut transaction on the channel which indicates thesequence number of the block which it intends to cut On theother hand every node immediately cuts a new block upon

receiving the first time-to-cut transaction for the given blocknumber Since this transaction is atomically delivered to allconnected nodes they all include the same list of transac-tions in the block (To deploy this scheme in the presence off Byzantine-faulty OSNs the block is cut only when receiv-ing f + 1 time-to-cut transactions)

The orderers persist a range of the most recently deliveredblocks directly to their filesystem so they can answer topeers retrieving blocks through deliver

The orderer using Kafka is one of three ordering serviceimplementations currently available A centralized orderercalled Solo runs on one node and is used for development Aproof-of-concept orderer based on BFT-SMaRt [3] has alsobeen made available [34] it ensures the atomic broadcastservice but not yet reconfiguration and access control Thisillustrates the modularity of consensus in Fabric

43 Peer GossipOne advantage of separating the execution ordering andvalidation phases is that they can be scaled independentlyHowever since most consensus algorithms (in the CFT andBFT models) are bandwidth-bound the throughput of the or-dering service is capped by the network capacity of its nodesConsensus cannot be scaled up by adding more nodes [1436] rather throughput will decrease However since order-ing and validation are decoupled we are interested in ef-ficiently broadcasting the execution results to all peers forvalidation after the ordering phase This is precisely thegoal of the gossip component which utilizes epidemic mul-ticast [15] for this purpose The blocks are signed by theordering service This means that a peer can upon receiv-ing all blocks independently assemble the blockchain andverify its integrity

Dissemination through gossip is robust and resistant tothe node failures in contrast to overlay networks Makingrandom selections allows gossip to reduce the overhead ofmaintaining connectivity among peers and moreover re-duces the attack surface Gossip works well in the permis-sioned environment where it can withstand sybil attacks andmessage forgery

The communication layer for gossip is based on gRPCand utilizes TLS with mutual authentication which enableseach side to bind the TLS credentials to the identity of theremote peer The gossip component maintains an up-to-datemembership view of the online peers in the system All peersindependently build a local view from periodically dissem-inated membership data Furthermore a peer can reconnectto the view after a crash or a network outage

The main purpose of gossip is to reliably distribute mes-sages ie blocks from the ordering phase among the peerstaking advantage of a push-pull protocol It uses two phasesfor information dissemination during push each peer se-lects a random set of active neighbors from the membershipview and forwards them the message during pull each peerperiodically probes a set of randomly selected peers and re-

9 201821

quests missing messages It has been shown [15 22] thatusing both methods in tandem is crucial to optimally utilizethe available bandwidth and to ensure that all peers receiveall messages with high probability

In order to reduce the load of sending blocks from the or-dering nodes to the network the protocol also elects a leaderpeer that pulls blocks from the ordering service on their be-half and initiates the gossip distribution This mechanism isresilient to leader failures

Another task of gossip is to transfer the state to newlyjoining peers and peers that were disconnected for a longtime They need to receive all blocks in the chain This fea-ture relies on the fact that the largest block-sequence num-ber stored by each peer is disseminated with the membershipdata

44 LedgerThe ledger component at each peer maintains the ledgerand the blockchain state on persistent storage and enablessimulation validation and ledger-update phases Broadlyit consists of a block store and a peer transaction manager(PTM)

Ledger block store The ledger block store persists trans-action blocks and is implemented as a set of append-onlyfiles Since the blocks are immutable and arrive in a defi-nite order an append-only structure gives maximum perfor-mance In addition the block store maintains a few indicesfor random access to a block or to a transaction in a block

Peer transaction manager (PTM) The PTM maintainsthe latest state in a versioned key-value store It stores onetuple of the form (key val ver) for each unique entry keystored by any chaincode containing its most recently storedvalue val and its latest version ver The version consistsof the block sequence number and the sequence number ofthe transaction (that stores the entry) within the block Thismakes the version unique and monotonically increasing

The PTM uses a local key-value store to realize the ver-sioned key-value store implemented by a LevelDB key-value database implemented in Go (httpsgithubcomsyndtrgoleveldb) or Apache CouchDB (httpcouchdbapacheorg)

During simulation the PTM provides a stable snapshotof the latest state to the transaction As mentioned in Sec-tion 32 the PTM records in readset a tuple (key ver) foreach entry accessed by GetState and in writeset a tuple(key val) for each entry updated with PutState by the trans-action In addition the PTM supports range queries forwhich it computes a cryptographic hash of the query results(a set of tuples (key ver)) and adds the query string itself andthe hash to readset

For transaction validation (Sec 34) the PTM validatesall transactions in a block sequentially This checks whethera transaction conflicts with any preceding transaction (withinthe block or earlier) For any key in readset if the version

recorded in readset differs from the version present in thelatest state (assuming that all preceding valid transactionsare committed) then the PTM marks the transaction as in-valid For range queries the PTM re-executes the query andcompares the hash with the one present in readset to ensurethat no phantom reads occur This read-write conflict seman-tics results in one-copy serializability [23]

The ledger component tolerates a crash of the peer duringthe ledger update as follows After receiving a new block thePTM has already performed validation and marked transac-tions as valid or invalid within the block using a bit mask asmentioned in Section 34 The ledger now writes the blockto the ledger block store flushes it to disk and subsequentlyupdates the block store indices Then the PTM applies thestate changes from writeset of all valid transactions to the lo-cal versioned store Finally it computes and persists a valuesavepoint which denotes the largest successfully appliedblock number The value savepoint is used to recover theindices and the latest state from the persisted blocks whenrecovering from a crash

45 Chaincode ExecutionChaincode is executed within an environment that is looselycoupled with the rest of the peer and that supports pluginsfor adding new languages for programming chaincodes Cur-rently three languages are supported for chaincode Go Javaand Node

Every user-level or application chaincode runs in a sepa-rate process within a Docker container environment whichisolates the chaincodes from each other and from the peerThis also simplifies the management of the lifecycle forchaincodes (ie starting stopping or aborting chaincode)The chaincode and the peer communicate using gRPC mes-sages Through this loose coupling the peer is agnostic ofthe actual language in which chaincode is implemented

In contrast to application chaincode system chaincoderuns directly in the peer process System chaincode canimplement specific functions needed by Fabric and may beused in situations where the isolation among user chaincodesis overly restrictive More details on system chaincodes aregiven in the next section

46 Configuration and System ChaincodesFabricrsquos basic behavior is customized through channel con-figuration and through special chaincodes known as systemchaincodes

Channel configuration Recall that a channel forms onelogical blockchain The configuration of a channel is main-tained in metadata persisted in special configuration blocksEach configuration block contains the full channel config-uration and does not contain any other transactions Eachblockchain begins with a configuration block known as thegenesis block which is used to bootstrap the channel Thechannel configuration includes

10 201821

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

not only one but many distributed applications run concur-rently (2) applications may be deployed dynamically and byanyone and (3) the application code is untrusted potentiallyeven malicious These differences necessitate new designs

Many existing smart-contract blockchains follow theblueprint of SMR [31] and implement so-called active repli-cation [13] a protocol for consensus or atomic broadcastfirst orders the transactions and propagates them to all peersand second each peer executes the transactions sequentiallyWe call this the order-execute architecture it requires allpeers to execute every transaction and all transactions to bedeterministic The order-execute architecture can be foundin virtually all existing blockchain systems ranging frompublic ones such as Ethereum (with PoW-based consensus)to permissioned ones (with BFT-type consensus) such asTendermint (httptendermintcom) Chain (httpchaincom) and Quorum (httpwwwjpmorgancomglobalQuorum) Although the order-execute designis not immediately apparent in all systems because the addi-tional transaction validation step may blur it the limitationsof order-execute are inherent in all every peer executes ev-ery transaction and transactions must be deterministic

Prior permissioned blockchains suffer from many limita-tions which often stem from their permissionless relativesor from using the order-execute architecture In particular

bull Consensus is hard-coded within the platform which con-tradicts the well-established understanding that there isno ldquoone-size-fits-allrdquo (BFT) consensus protocol [33]

bull The trust model of transaction validation is determinedby the consensus protocol and cannot be adapted to therequirements of the smart contract

bull Smart contracts must be written in a fixed non-standardor domain-specific language which hinders wide-spreadadoption and may lead to programming errors

bull The sequential execution of all transactions by all peerslimits performance and complex measures are neededto prevent denial-of-service attacks against the platformoriginating from untrusted contracts (such as accountingfor runtime with ldquogasrdquo in Ethereum)

bull Transactions must be deterministic which can be difficultto ensure programmatically

bull Every smart contract runs on all peers which is at oddswith confidentiality and prohibits the dissemination ofcontract code and state to a subset of peers

In this paper we describe Hyperledger Fabric or simplyFabric an open-source (httpgithubcomhyperledgerfabric) blockchain platform that overcomes these limita-tions Fabric is one of the projects of Hyperledger (httpwwwhyperledgerorg) under the auspices of the LinuxFoundation (httpwwwlinuxfoundationorg) Fab-ric is used in more than 400 prototypes proofs-of-conceptand in production distributed-ledger systems across differ-ent industries and use cases These use cases include but arenot limited to areas such as dispute resolution trade logis-

tics FX netting food safety contract management diamondprovenance rewards point management low liquidity se-curities trading and settlement identity management andsettlement through digital currency

Fabric introduces a new blockchain architecture aimingat resiliency flexibility scalability and confidentiality De-signed as a modular and extensible general-purpose per-missioned blockchain Fabric supports the execution of dis-tributed applications written in standard programming lan-guages This makes Fabric the first distributed operating sys-tem for permissioned blockchains

The architecture of Fabric follows a novel execute-order-validate paradigm for distributed execution of untrustedcode in an untrusted environment It separates the trans-action flow into three steps which may be run on differententities in the system (1) executing a transaction and check-ing its correctness thereby endorsing it (corresponding toldquotransaction validationrdquo in other blockchains) (2) order-ing through a consensus protocol irrespective of transactionsemantics and (3) transaction validation per application-specific trust assumptions which also prevents race condi-tions due to concurrency

This design departs radically from the order-executeparadigm in that Fabric typically executes transactions be-fore reaching final agreement on their order It combinesthe two well-known approaches to replication passive andactive as follows

First Fabric uses passive or primary-backup replica-tion [6 13] as often found in distributed databases but withmiddleware-based asymmetric update processing [24 25]and ported to untrusted environments with Byzantine faultsIn Fabric every transaction is executed (endorsed) only by asubset of the peers which allows for parallel execution andaddresses potential non-determinism drawing on ldquoexecute-verifyrdquo BFT replication [21] A flexible endorsement policyspecifies which peers or how many of them need to vouchfor the correct execution of a given smart contract

Second Fabric incorporates active replication in thesense that the transactionrsquos effects on the ledger state areonly written after reaching consensus on a total order amongthem in the deterministic validation step executed by eachpeer individually This allows Fabric to respect application-specific trust assumptions according to the transaction en-dorsement Moreover the ordering of state updates is del-egated to a modular component for consensus (ie atomicbroadcast) which is stateless and logically decoupled fromthe peers that execute transactions and maintain the ledgerSince consensus is modular its implementation can be tai-lored to the trust assumption of a particular deployment Al-though it is readily possible to use the blockchain peers alsofor implementing consensus the separation of the two rolesadds flexibility and allows one to rely on well-establishedtoolkits for CFT (crash fault-tolerant) or BFT ordering

2 201821

Overall this hybrid replication design which mixes pas-sive and active replication in the Byzantine model and theexecute-order-validate paradigm represent the main inno-vation in Fabric architecture They resolve the issues men-tioned before and make Fabric a scalable system for permis-sioned blockchains supporting flexible trust assumptions

To implement this architecture Fabric contains modularbuilding blocks for each of the following components

Ordering service An ordering service atomically broad-casts state updates to peers and establishes consensus onthe order of transactions It has been implemented withApache KafkaZooKeeper (httpkafkaapacheorg) and with BFT-SMaRt [3]

Identity and membership A membership service provideris responsible for associating peers with cryptographicidentities It maintains the permissioned nature of Fabric

Scalable dissemination An optional peer-to-peer gossipservice disseminates the blocks output by ordering ser-vice to all peers

Smart-contract execution Smart contracts in Fabric runwithin a container environment for isolation They canbe written in standard programming languages but do nothave direct access to the ledger state

Ledger maintenance Each peer locally maintains the ledgerin the form of the append-only blockchain and as a snap-shot of the most recent state in a key-value store (KVS)The KVS can be implemented by standard libraries suchas LevelDB or Apache CouchDB

The remainder of this paper describes the architectureof Fabric and our experience with it Section 2 summa-rizes the state of the art and explains the rationale behindvarious design decisions Section 3 introduces the architec-ture and the execute-order-validate approach of Fabric in de-tail illustrating the transaction execution flow In Section 4the key components of Fabric are defined in particular theordering service membership service peer-to-peer gossipledger database and smart-contract API Results and in-sights gained in a performance evaluation of Fabric with aBitcoin-inspired cryptocurrency deployed in a cluster en-vironment on commodity public cloud VMs are given inSection 5 They show that Fabric achieves in popular de-ployment configurations throughput of more than 3500 tpsachieving finality [36] with latency of a few hundred ms Fi-nally Section 6 discusses related work

2 Background21 Order-Execute Architecture for BlockchainsAll previous blockchain systems permissioned or not fol-low the order-execute architecture This means that theblockchain network orders transactions first using a con-

Update stateOrder Execute

Deterministic ()execution

Persist state on all peers

Consensus or atomic broadcast

Figure 1 Order-execute architecture in replicated services

sensus protocol and then executes them in the same orderon all peers sequentially1

For instance a PoW-based permissionless blockchainsuch as Ethereum combines consensus and execution oftransactions as follows (1) every peer (ie a node that par-ticipates in consensus) assembles a block containing validtransactions (to establish validity this peer already pre-executes those transactions) (2) the peer tries to solve a PoWpuzzle [28] (3) if the peer is lucky and solves the puzzle itdisseminates the block to the network via a gossip protocoland (4) every peer receiving the block validates the solutionto the puzzle and all transactions in the block Effectivelyevery peer thereby repeats the execution of the lucky peerfrom its first step Moreover all peers execute the transac-tions sequentially (within one block and across blocks) Theorder-execute architecture is illustrated by Fig 1

Existing permissioned blockchains such as TendermintChain or Quorum typically use BFT consensus [9] pro-vided by PBFT [11] or other protocols for atomic broad-cast Nevertheless they all follow the same order-executeapproach and implement classical active SMR [13 31]

22 Limitations of Order-ExecuteThe order-execute architecture is conceptually simple andtherefore also widely used However it has several draw-backs when used in a general-purpose permissioned block-chain We discuss the three most significant ones next

Sequential execution Executing the transactions sequen-tially on all peers limits the effective throughput that can beachieved by the blockchain In particular since the through-put is inversely proportional to the execution latency thismay become a performance bottleneck for all but the sim-plest smart contracts Moreover recall that in contrast to tra-ditional SMR the blockchain forms a universal computingengine and its payload applications might be deployed by anadversary A denial-of-service (DoS) attack which severelyreduces the performance of such a blockchain could simplyintroduce smart contracts that take a very long time to exe-cute For example a smart contract that executes an infiniteloop has a fatal effect but cannot be detected automaticallybecause the halting problem is unsolvable

To cope with this issue public programmable block-chains with a cryptocurrency account for the execution costEthereum for example introduces the concept of gas con-

1In many blockchains with a hard-coded primary application such asBitcoin this transaction execution is called ldquotransaction validationrdquo Herewe call this step transaction execution to harmonize the terminology

3 201821

sumed by a transaction execution which is converted at agas price to a cost in the cryptocurrency and billed to thesubmitter of the transaction Ethereum goes a long way tosupport this concept assigns a cost to every low-level com-putation step and introduces its own VM monitor to controlexecution Although this appears to be a viable solution forpublic blockchains it is not adequate for the permissionedmodel for a general-purpose system without a native cryp-tocurrency

The distributed-systems literature proposes many waysto improve performance compared to sequential executionfor instance through parallel execution of unrelated oper-ations [30] Unfortunately such techniques are still to beapplied successfully in the blockchain context of smart con-tracts For instance one challenge is the requirement fordeterministically inferring all dependencies across smartcontracts which is particularly challenging when combinedwith possible confidentiality constraints Furthermore thesetechniques are of no help against DoS attacks by contractcode from untrusted developers

Non-deterministic code Another important problem foran order-execute architecture are non-deterministic transac-tions Operations executed after consensus in active SMRmust be deterministic or the distributed ledger ldquoforksrdquo andviolates the basic premise of a blockchain that all peers holdthe same state This is usually addressed by programmingblockchains in domain-specific languages (eg EthereumSolidity) that are expressive enough for their applications butlimited to deterministic execution However such languagesare difficult to design for the implementer and require addi-tional learning by the programmer Writing smart contractsin a general-purpose language (eg Go Java CC++) in-stead appears more attractive and accelerates the adoption ofblockchain solutions

Unfortunately generic languages pose many problemsfor ensuring deterministic execution Even if the applicationdeveloper does not introduce obviously non-deterministicoperations hidden implementation details can have the samedevastating effect (eg a map iterator is not deterministic inGo) To make matters worse on a blockchain the burdento create deterministic applications lies on the potentiallyuntrusted programmer Only one non-deterministic contractcreated with malicious intent is enough to bring the wholeblockchain to a halt A modular solution to filter divergingoperations on a blockchain has also been investigated [8]but it appears costly in practice

Confidentiality of execution According to the blueprintof public blockchains many permissioned systems run allsmart contracts on all peers However many intended usecases for permissioned blockchains require confidentialityie that access to smart-contract logic transaction dataor ledger state can be restricted Although cryptographictechniques ranging from data encryption to advanced zero-knowledge proofs [2] and verifiable computation [26] can

help to achieve confidentiality this often comes with a con-siderable overhead and is not viable in practice

Fortunately it suffices to propagate the same state to allpeers instead of running the same code everywhere Thusthe execution of a smart contract can be restricted to a subsetof the peers trusted for this task that vouch for the resultsof the execution This design departs from active replicationtowards a variant of passive replication [6] adapted to thetrust model of blockchain

23 Further Limitations of Existing ArchitecturesFixed trust model Most permissioned blockchains rely onasynchronous BFT replication protocols to establish consen-sus [36] Such protocols typically rely on a security assump-tion that among n gt 3f peers up to f are tolerated to mis-behave and exhibit so-called Byzantine faults [4] The samepeers often execute the applications as well under the samesecurity assumption (even though one could actually restrictBFT execution to fewer peers [37]) However such a quan-titative trust assumption irrespective of peersrsquo roles in thesystem may not match the trust required for smart-contractexecution In a flexible system trust at the application levelshould not be fixed to trust at the protocol level A general-purpose blockchain should decouple these two assumptionsand permit flexible trust models for applications

Hard-coded consensus Fabric is the first blockchain sys-tem that introduced pluggable consensus Before Fabric vir-tually all blockchain systems permissioned or not camewith a hard-coded consensus protocol However decades ofresearch on consensus protocols have shown there is no suchldquoone-size-fits-allrdquo solution For instance BFT protocols dif-fer widely in their performance when deployed in potentiallyadversarial environments [33] A protocol with a ldquochainrdquocommunication pattern exhibits provably optimal through-put on a LAN cluster with symmetric and homogeneouslinks [18] but degrades badly on a wide-area heterogeneousnetwork Furthermore external conditions such as load net-work parameters and actual faults or attacks may vary overtime in a given deployment For these reasons BFT consen-sus should be inherently reconfigurable and ideally adapt dy-namically to a changing environment [1] Another importantaspect is to match the protocolrsquos trust assumption to a givenblockchain deployment scenario Indeed one may want toreplace BFT consensus with a protocol based on an alterna-tive trust model such as XFT [27] or a CFT protocol suchas PaxosRaft [29] and ZooKeeper [20] or even a permis-sionless protocol

24 Experience with Order-Execute BlockchainPrior to realizing the execute-order-validate architecture ofFabric the team gained experience with building a per-missioned blockchain platform in the order-execute modelwith PBFT [11] for consensus From feedback obtained inmany proof-of-concept applications the limitations of this

4 201821

approach became immediately clear For instance users of-ten observed diverging states at the peers and reported abug in the consensus protocol in all cases closer inspectionrevealed that the culprit was non-deterministic transactioncode Other complaints addressed limited performance egldquoonly five transactions per secondrdquo until users confessedthat their average transaction took 200ms to execute Wehave learned that the key properties of a blockchain sys-tem namely consistency security and performance mustnot depend on the knowledge and goodwill of its users inparticular since the blockchain should run in an untrustedenvironment

3 ArchitectureIn this section we introduce the three-phase execute-order-validate architecture and then explain the transaction flowThe components of Fabric are discussed in Section 4

31 Fabric OverviewFabric is a distributed operating system for permissio-ned blockchains that executes distributed applications writ-ten in general-purpose programming languages (eg GoJava Nodejs) It securely tracks its execution history inan append-only replicated ledger data structure and has nocryptocurrency built in

Fabric introduces the execute-order-validate blockchainarchitecture and does not follow the standard order-executedesign for reasons explained in Section 2 In a nutshell adistributed application for Fabric consists of two parts

bull A smart contract called chaincode which is programcode that implements the application logic and runs dur-ing the execution phase The chaincode is the central partof a distributed application in Fabric and may be writtenby an untrusted developer Special chaincodes exist formanaging the blockchain system and maintaining param-eters collectively called system chaincodes (Sec 46)

bull An endorsement policy that is evaluated in the validationphase Endorsement policies cannot be chosen or modi-fied by untrusted application developers they are part ofthe system An endorsement policy acts as a static libraryfor transaction validation in Fabric which can merely beparameterized by the chaincode Only designated admin-istrators may run system management functions and havethe right to modify the endorsement policyA typical endorsement policy lets the chaincode specifythe endorsers for a transaction in the form of a set of peersthat are necessary for endorsement it uses a monotonelogical expression on sets such as ldquothree out of fiverdquo orldquoA and B or B and Crdquo Custom endorsement policiesmay implement arbitrary logic (eg our Bitcoin-inspiredcryptocurrency in Sec 51)

A client sends transactions to the peers specified by theendorsement policy Each transaction is then executed byspecific peers and its output is recorded this step is also

Update state

Order rw-sets Atomic broadcast

(consensus) Stateless ordering

service

Persist state on all peers

Simulate trans and endorse

Create rw-set Collect endorse-

ments

OrderExecute Validate

Validate endorse-ments amp rw-sets

Eliminate invalid and conflicting trans

Figure 2 Execute-order-validate architecture of Fabric (rw-set means a readset and writeset as explained in Sec 32)

called endorsement After execution transactions enter theordering phase which uses a pluggable consensus protocolto produce a totally ordered sequence of endorsed transac-tions grouped in blocks These are broadcast to all peerswith the (optional) help of gossip Unlike standard activereplication [31] which totally orders transaction inputs Fab-ric orders transaction outputs combined with state depen-dencies as computed during the execution phase Each peerthen validates the state changes from endorsed transactionswith respect to the endorsement policy and the consistencyof the execution in the validation phase All peers validatethe transactions in the same order and validation is determin-istic In this sense Fabric introduces a novel hybrid replica-tion paradigm in the Byzantine model which combines pas-sive replication (the pre-consensus computation of state up-dates) and active replication (the post-consensus validationof execution results and state changes)

The execute-order-validate approach is illustrated by Fig 2

A Fabric blockchain consists of a set of nodes that form anetwork As Fabric is permissioned all nodes that participatein the network have an identity as provided by a modularmembership service provider (MSP) (Sec 41) Nodes in aFabric network take up one of three roles

Clients submit transaction proposals for execution helporchestrate the execution phase and finally broadcasttransactions for ordering

Peers execute transaction proposals and validate transac-tions Peers also maintain the blockchain ledger anappend-only data structure recording all transactions inthe form of a hash chain as well as the state a succinctrepresentation of the latest ledger state Not all peers ex-ecute all transaction proposals only a subset of themcalled endorsing peers (or simply endorsers) as speci-fied by the policy of the chaincode to which the transac-tion pertains However all peers maintain the completeledger

Orderering Service Nodes (OSN) (or simply orderers)are the nodes that collectively form the ordering ser-vice In short the ordering service establishes the totalorder of all transactions in Fabric where each transac-tion contains state updates and dependencies computedduring the execution phase along with cryptographic sig-

5 201821

client endorsingpeer 1

endorsingpeer 2

endorsingpeer 3

Peer(non-endorsing)

ord

ering service

orderers

Invocation Commit

1 1 1

2

3

4 4

55

5 5

1 Chaincode execution

2 Endorsementcollection

34Ordering

BroadcastDelivery5 Validation

Figure 3 Fabric high level transaction flow

natures of the endorsing peers that computed them Or-derers are entirely unaware of the application state anddo not participate in the execution nor in the validationof transactions This design choice renders consensus inFabric as modular as possible and simplifies replacementof consensus protocols in Fabric

Since it is possible to run one physical node with multipleroles Fabric can also be operated like a traditional peer-to-peer blockchain system in which every node maintainsthe state and invokes validates and orders transactions Thetransaction flow in Fabric across the different types of nodesis depicted in Fig 3

Compared to the description focusing on one singleblockchain so far a Fabric network actually supports mul-tiple blockchains connected to the same ordering serviceEach such blockchain is called a channel and may havedifferent peers as its members Channels can be used topartition the state of the blockchain network but consen-sus across channels is not coordinated and the total order oftransactions in each channel is separate from the others Cer-tain deployments that consider all orderers as trusted mayalso implement by-channel access control for peers In thefollowing we mention channels only briefly and concentrateon one single channel

The next three sections explain the transaction flow inFabric and illustrate the steps of the execution ordering andvalidation phases A Fabric network is shown in Fig 4

32 Execution PhaseIn the execution phase clients send the transaction proposal(or simply proposal) to one or more endorsers for execu-tion Recall that every chaincode implicitly specifies a set ofendorsers via the endorsement policy A proposal containsthe identity of the submitting client (according to the MSP)the transaction payload in the form of an operation to exe-cute parameters and the identifier of the chaincode to whichit belongs a nonce to be used only once by each client (suchas a counter or a random value) and a transaction identifierderived from the client identifier and the nonce The clientalso signs the proposal

Appl

MSP

P

SDK

P P P P P P

SDK

OSN

P

OSNOSNOSNOSN

Client

Orderingservice

Peer-to-peer gossip

Peers (P)

Client

Appl

Appl

Figure 4 A Fabric network with federated MSPs and run-ning multiple (differently shaded and colored) chaincodesselectively installed on peers according to policy

The endorsers simulate the proposal by executing the op-eration on the specified chaincode which has been installedon the blockchain The chaincode runs in a Docker containerisolated from the main endorser process

A proposal is simulated against the endorserrsquos localblockchain state without any synchronization with otherpeers at this point moreover endorsers do not persist theresults of the simulation to the ledger state The state ofthe blockchain is maintained by the peer transaction man-ager (PTM) in the form of a versioned key-value store inwhich successive updates to a key have monotonically in-creasing version numbers (Sec 44) The state created by achaincode is scoped exclusively to that chaincode and can-not be accessed directly by another chaincode Note that thechaincode is not supposed to maintain the local state in theprogram code only what it maintains in the blockchain statethat is accessed with GetState PutState and DelState oper-ations Given the appropriate permission a chaincode mayinvoke another chaincode to access its state within the samechannel

As a result of the simulation each endorser produces avalue writeset consisting of the state updates produced bysimulation (ie the modified keys along with their new val-ues) as well as a readset representing the version dependen-cies of the proposal simulation (ie all keys read during sim-ulation along with their version numbers) After the simula-tion the endorser cryptographically signs a message calledendorsement which contains readset and writeset (togetherwith metadata such as transaction ID endorser ID and en-dorser signature) and sends it back to the client in a proposalresponse The client collects endorsements until they satisfythe endorsement policy of the chaincode which the trans-action invokes (see Sec 34) In particular this requires allendorsers as determined by the policy to produce the sameexecution result (ie identical readset and writeset) Thenthe client proceeds to create the transaction and passes it tothe ordering service

6 201821

Discussion on design choices As the endorsers simulatethe proposal without synchronizing with other endorserstwo endorsers may execute it on different states of the ledgerand produce different outputs For the standard endorse-ment policy which requires multiple endorsers to producethe same result this implies that under high contention ofoperations accessing the same keys a client may not beable to satisfy the endorsement policy This is a new ele-ment compared to primary-backup replication in replicateddatabases with synchronization through middleware [24] aconsequence of the assumption that no single peer is trustedfor correct execution in a blockchain

We consciously adopted this design as it considerablysimplifies the architecture and is adequate for typical block-chain applications As demonstrated by the approach of Bit-coin distributed applications can be formulated such thatcontention by operations accessing the same state can be re-duced or eliminated completely in the normal case (eg inBitcoin two operations that modify the same ldquoobjectrdquo arenot allowed and represent a double-spending attack [28])

Executing a transaction before the ordering phase is crit-ical to tolerating non-deterministic chaincodes as discussedin Section 2 A chaincode in Fabric with non-deterministictransactions can only endanger the liveness of its own oper-ations because a client cannot gather a sufficient number ofendorsements for instance This is much more acceptable inpractice than the situation in an order-execute architecturewhere non-deterministic operations lead to inconsistenciesin the state of the peers

Finally tolerating non-deterministic execution also ad-dresses DoS attacks from untrusted chaincode as an endorsercan simply abort an execution according to a local policy ifit suspects a DoS attack This will not endanger the consis-tency of the system and again such unilateral abortion ofexecution is not possible in order-execute architectures

33 Ordering PhaseWhen a client has collected enough endorsements on a pro-posal it assembles a transaction and submits this to the or-dering service The transaction contains the transaction pay-load (ie the chaincode operation including parameters)transaction metadata and a set of endorsements The order-ing phase establishes a total order on all submitted transac-tions per channel In other words ordering atomically broad-casts [7] endorsements and thereby establishes consensus ontransactions despite faulty orderers Moreover the orderingservice batches multiple transactions into blocks and outputsa hash-chained sequence of blocks containing transactionsGrouping or batching transactions into blocks improves thethroughput of the broadcast protocol which is a well-knowntechnique in the context of fault-tolerant broadcasts

At a high level the interface of the ordering service onlysupports the following two operations These operations areinvoked by a peer and implicitly parameterized by a channelidentifier

bull broadcast(tx) A client calls this operation to broadcastan arbitrary transaction tx which usually contains thetransaction payload and a signature of the client fordissemination

bull B larr deliver(s) A client calls this to retrieve block Bwith non-negative sequence number s The block con-tains a list of transactions [tx1 txk] and a hash-chain value h representing the block with sequence num-ber sminus1 ie B = ([tx1 txk] h) As the client maycall this multiple times and always returns the same blockonce it is available we say the peer delivers block B withsequence number s when it receives B for the first timeupon invoking deliver(s)

The ordering service ensures that the delivered blocks onone channel are totally ordered More specifically orderingensures the following safety properties for each channel

Agreement For any two blocks B delivered with sequencenumber s and Bprime delivered with sprime at correct peers suchthat s = sprime it holds B = Bprime

Hashchain integrity If some correct peer delivers a block Bwith number s and another correct peer delivers blockBprime = ([tx1 txk] h

prime) with number s + 1 then itholds hprime = H(B) where H(middot) denotes the cryptographichash function

No skipping If a correct peer p delivers a block with num-ber s gt 0 then for each i = 0 s minus 1 peer p hasalready delivered a block with number i

No creation When a correct peer delivers block B withnumber s then for every tx isin B some client has alreadybroadcast tx

For liveness the ordering service supports at least thefollowing ldquoeventualrdquo property

Validity If a correct client invokes broadcast(tx) then ev-ery correct peer eventually delivers a block B that in-cludes tx with some sequence number

However every individual ordering implementation is al-lowed to come with its own liveness and fairness guaranteeswith respect to client requests

Since there may be a large number of peers in the block-chain network but only relatively few nodes are expectedto implement the ordering service Fabric can be config-ured to use a built-in gossip service for disseminating deliv-ered blocks from the ordering service to all peers (Sec 43)The implementation of gossip is scalable and agnostic to theparticular implementation of the ordering service hence itworks with both CFT and BFT ordering services ensuringthe modularity of Fabric

The ordering service may also perform access controlchecks to see if a client is allowed to broadcast messagesor receive blocks on a given channel This and other featuresof the ordering service are further explained in Section 42

7 201821

Discussion on design choices It is very important that theordering service does not maintain any state of the block-chain and neither validates nor executes transactions Thisarchitecture is a crucial defining feature of Fabric andmakes Fabric the first blockchain system to totally sepa-rate consensus from execution and validation This makesconsensus as modular as possible and enables an ecosystemof consensus protocols implementing the ordering service

34 Validation PhaseBlocks are delivered to peers either via a direct connectionto the ordering service or through gossip When a new blockarrives it enters the validation phase consisting of threesequential steps

1 The endorsement policy evaluation occurs in parallelfor all transactions within the block The evaluation isthe task of the so-called validation system chaincode(VSCC) a static library that is part of the blockchainrsquosconfiguration and is responsible for validating the en-dorsement with respect to the endorsement policy config-ured for the chaincode (see Sec 46) If the endorsementis not satisfied the transaction is marked as invalid andits effects are disregarded

2 A read-write conflict check is done for all transactions inthe block sequentially For each transaction it comparesthe versions of the keys in the readset field to those in thecurrent state of the ledger as stored locally by the peerand ensures they are still the same If the versions do notmatch the transaction is marked as invalid and its effectsare disregarded

3 The ledger update phase runs last in which the block isappended to the locally stored ledger and the blockchainstate is updated In particular when adding the blockto the ledger the results of the validity checks in thefirst two steps are persisted as well in the form of a bitmask denoting the transactions that are valid within theblock This facilitates the reconstruction of the state at alater time Furthermore all state updates are applied bywriting all key-value pairs in writeset to the local state

The default VSCC in Fabric allows monotone logical ex-pressions over the set of endorsers configured for a chain-code to be expressed The VSCC evaluation verifies that theset of peers as expressed through valid signatures on en-dorsements of the transaction satisfy the expression Differ-ent VSCC policies can be configured statically however

Discussion on design choices The ledger of Fabric con-tains all transactions including those that are deemed in-valid This follows from the overall design because orderingservice which is agnostic to chaincode state produces thechain of the blocks and because the validation is done by thepeers post-consensus This feature is needed in certain usecases that require tracking of invalid transactions during sub-sequent audits and stands in contrast to other blockchains

Committer

Peer

Endorser

Ledger

Gossip

KVS

Validation configuration sys chaincode

Block store peer trans manager (PTM)

LevelDB CouchDB

Peer-to-peer block gossip

Chaincode execution containers

Figure 5 Components of a Fabric peer

(eg Bitcoin and Ethereum) where the ledger contains onlyvalid transactions

4 Fabric ComponentsFabric is written in Go and uses the gRPC framework(httpgrpcio) for communication between clientspeers and orderers In the following we describe some im-portant components in more detail Figure 5 shows the com-ponents of a peer

41 Membership ServiceThe membership service provider (MSP) maintains the iden-tities of all nodes in the system (clients peers and order-ers) and is responsible for issuing node credentials that areused for authentication and authorization Since Fabric ispermissioned all interactions among nodes occur throughmessages that are authenticated typically with digital sig-natures The membership service comprises a componentat each node where it may authenticate transactions ver-ify the integrity of transactions sign endorsements validateendorsements and authenticate other blockchain operationsTools for key management and registration of nodes are alsopart of the MSP

The MSP is an abstraction for which different instantia-tions are possible The default MSP implementation in Fab-ric handles standard PKI methods for authentication basedon digital signatures and can accommodate commercial cer-tification authorities (CAs) A stand-alone CA is provided aswell with Fabric called Fabric-CA Furthermore alternativeMSP implementations are envisaged such as one relying onanonymous credentials for authorizing a client to invoke atransaction without linking this to an identity [10]

Fabric allows two modes for setting up a blockchain net-work In offline mode credentials are generated by a CAand distributed out-of-band to all nodes Peers and order-ers can only be registered in offline mode For enrollingclients Fabric-CA provides an online mode that issues cryp-tographic credentials to them The MSP configuration mustensure that all nodes especially all peers recognize the sameidentities and authentications as valid

8 201821

The MSP permits identity federation for example whenmultiple organizations operate a blockchain network Eachorganization issues identities to its own members and everypeer recognizes members of all organizations This can beachieved with multiple MSP instantiations for example bycreating a mapping between each organization and an MSP

42 Ordering ServiceThe ordering service manages multiple channels On everychannel it provides the following services

1 Atomic broadcast for establishing order on transactionsimplementing the broadcast and deliver calls (Sec 33)

2 Reconfiguration of a channel when its members mod-ify the channel by broadcasting a configuration updatetransaction (Sec 46)

3 Optionally access control in those configurations wherethe ordering service acts as a trusted entity restrictingbroadcasting of transactions and receiving of blocks tospecified clients and peers

The ordering service is bootstrapped with a genesis blockon the system channel This block carries a configurationtransaction that defines the properties of the ordering ser-vice

The current production implementation consists of or-dering-service nodes (OSNs) that implement the opera-tions described here and communicate through the sys-tem channel The actual atomic broadcast function is pro-vided by an instance of Apache Kafka (httpkafkaapacheorg) which offers scalable publish-subscribe mes-saging and strong consistency despite node crashes basedon ZooKeeper Kafka may run on physical nodes separatefrom the OSNs The OSNs act as proxies between the peersand Kafka

An OSN directly injects a newly received transaction tothe atomic broadcast (eg to the Kafka broker) On the otherhand the nodes batch transactions received from the atomicbroadcast and form blocks A block is cut as soon as one ofthree conditions is met (1) the block contains the specifiedmaximal number of transactions (2) the block has reached amaximal size (in bytes) or (3) an amount of time has elapsedsince the first transaction of a new block was received asexplained below

This batching process is deterministic and therefore pro-duces the same blocks at all nodes It is easy to see thatthe first two conditions are trivially deterministic given thestream of transactions received from the atomic broadcastTo ensure deterministic block production in the third casea node starts a timer when it reads the first transaction ina block from the atomic broadcast If the block is not yetcut when the timer expires the node broadcasts a specialtime-to-cut transaction on the channel which indicates thesequence number of the block which it intends to cut On theother hand every node immediately cuts a new block upon

receiving the first time-to-cut transaction for the given blocknumber Since this transaction is atomically delivered to allconnected nodes they all include the same list of transac-tions in the block (To deploy this scheme in the presence off Byzantine-faulty OSNs the block is cut only when receiv-ing f + 1 time-to-cut transactions)

The orderers persist a range of the most recently deliveredblocks directly to their filesystem so they can answer topeers retrieving blocks through deliver

The orderer using Kafka is one of three ordering serviceimplementations currently available A centralized orderercalled Solo runs on one node and is used for development Aproof-of-concept orderer based on BFT-SMaRt [3] has alsobeen made available [34] it ensures the atomic broadcastservice but not yet reconfiguration and access control Thisillustrates the modularity of consensus in Fabric

43 Peer GossipOne advantage of separating the execution ordering andvalidation phases is that they can be scaled independentlyHowever since most consensus algorithms (in the CFT andBFT models) are bandwidth-bound the throughput of the or-dering service is capped by the network capacity of its nodesConsensus cannot be scaled up by adding more nodes [1436] rather throughput will decrease However since order-ing and validation are decoupled we are interested in ef-ficiently broadcasting the execution results to all peers forvalidation after the ordering phase This is precisely thegoal of the gossip component which utilizes epidemic mul-ticast [15] for this purpose The blocks are signed by theordering service This means that a peer can upon receiv-ing all blocks independently assemble the blockchain andverify its integrity

Dissemination through gossip is robust and resistant tothe node failures in contrast to overlay networks Makingrandom selections allows gossip to reduce the overhead ofmaintaining connectivity among peers and moreover re-duces the attack surface Gossip works well in the permis-sioned environment where it can withstand sybil attacks andmessage forgery

The communication layer for gossip is based on gRPCand utilizes TLS with mutual authentication which enableseach side to bind the TLS credentials to the identity of theremote peer The gossip component maintains an up-to-datemembership view of the online peers in the system All peersindependently build a local view from periodically dissem-inated membership data Furthermore a peer can reconnectto the view after a crash or a network outage

The main purpose of gossip is to reliably distribute mes-sages ie blocks from the ordering phase among the peerstaking advantage of a push-pull protocol It uses two phasesfor information dissemination during push each peer se-lects a random set of active neighbors from the membershipview and forwards them the message during pull each peerperiodically probes a set of randomly selected peers and re-

9 201821

quests missing messages It has been shown [15 22] thatusing both methods in tandem is crucial to optimally utilizethe available bandwidth and to ensure that all peers receiveall messages with high probability

In order to reduce the load of sending blocks from the or-dering nodes to the network the protocol also elects a leaderpeer that pulls blocks from the ordering service on their be-half and initiates the gossip distribution This mechanism isresilient to leader failures

Another task of gossip is to transfer the state to newlyjoining peers and peers that were disconnected for a longtime They need to receive all blocks in the chain This fea-ture relies on the fact that the largest block-sequence num-ber stored by each peer is disseminated with the membershipdata

44 LedgerThe ledger component at each peer maintains the ledgerand the blockchain state on persistent storage and enablessimulation validation and ledger-update phases Broadlyit consists of a block store and a peer transaction manager(PTM)

Ledger block store The ledger block store persists trans-action blocks and is implemented as a set of append-onlyfiles Since the blocks are immutable and arrive in a defi-nite order an append-only structure gives maximum perfor-mance In addition the block store maintains a few indicesfor random access to a block or to a transaction in a block

Peer transaction manager (PTM) The PTM maintainsthe latest state in a versioned key-value store It stores onetuple of the form (key val ver) for each unique entry keystored by any chaincode containing its most recently storedvalue val and its latest version ver The version consistsof the block sequence number and the sequence number ofthe transaction (that stores the entry) within the block Thismakes the version unique and monotonically increasing

The PTM uses a local key-value store to realize the ver-sioned key-value store implemented by a LevelDB key-value database implemented in Go (httpsgithubcomsyndtrgoleveldb) or Apache CouchDB (httpcouchdbapacheorg)

During simulation the PTM provides a stable snapshotof the latest state to the transaction As mentioned in Sec-tion 32 the PTM records in readset a tuple (key ver) foreach entry accessed by GetState and in writeset a tuple(key val) for each entry updated with PutState by the trans-action In addition the PTM supports range queries forwhich it computes a cryptographic hash of the query results(a set of tuples (key ver)) and adds the query string itself andthe hash to readset

For transaction validation (Sec 34) the PTM validatesall transactions in a block sequentially This checks whethera transaction conflicts with any preceding transaction (withinthe block or earlier) For any key in readset if the version

recorded in readset differs from the version present in thelatest state (assuming that all preceding valid transactionsare committed) then the PTM marks the transaction as in-valid For range queries the PTM re-executes the query andcompares the hash with the one present in readset to ensurethat no phantom reads occur This read-write conflict seman-tics results in one-copy serializability [23]

The ledger component tolerates a crash of the peer duringthe ledger update as follows After receiving a new block thePTM has already performed validation and marked transac-tions as valid or invalid within the block using a bit mask asmentioned in Section 34 The ledger now writes the blockto the ledger block store flushes it to disk and subsequentlyupdates the block store indices Then the PTM applies thestate changes from writeset of all valid transactions to the lo-cal versioned store Finally it computes and persists a valuesavepoint which denotes the largest successfully appliedblock number The value savepoint is used to recover theindices and the latest state from the persisted blocks whenrecovering from a crash

45 Chaincode ExecutionChaincode is executed within an environment that is looselycoupled with the rest of the peer and that supports pluginsfor adding new languages for programming chaincodes Cur-rently three languages are supported for chaincode Go Javaand Node

Every user-level or application chaincode runs in a sepa-rate process within a Docker container environment whichisolates the chaincodes from each other and from the peerThis also simplifies the management of the lifecycle forchaincodes (ie starting stopping or aborting chaincode)The chaincode and the peer communicate using gRPC mes-sages Through this loose coupling the peer is agnostic ofthe actual language in which chaincode is implemented

In contrast to application chaincode system chaincoderuns directly in the peer process System chaincode canimplement specific functions needed by Fabric and may beused in situations where the isolation among user chaincodesis overly restrictive More details on system chaincodes aregiven in the next section

46 Configuration and System ChaincodesFabricrsquos basic behavior is customized through channel con-figuration and through special chaincodes known as systemchaincodes

Channel configuration Recall that a channel forms onelogical blockchain The configuration of a channel is main-tained in metadata persisted in special configuration blocksEach configuration block contains the full channel config-uration and does not contain any other transactions Eachblockchain begins with a configuration block known as thegenesis block which is used to bootstrap the channel Thechannel configuration includes

10 201821

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

Overall this hybrid replication design which mixes pas-sive and active replication in the Byzantine model and theexecute-order-validate paradigm represent the main inno-vation in Fabric architecture They resolve the issues men-tioned before and make Fabric a scalable system for permis-sioned blockchains supporting flexible trust assumptions

To implement this architecture Fabric contains modularbuilding blocks for each of the following components

Ordering service An ordering service atomically broad-casts state updates to peers and establishes consensus onthe order of transactions It has been implemented withApache KafkaZooKeeper (httpkafkaapacheorg) and with BFT-SMaRt [3]

Identity and membership A membership service provideris responsible for associating peers with cryptographicidentities It maintains the permissioned nature of Fabric

Scalable dissemination An optional peer-to-peer gossipservice disseminates the blocks output by ordering ser-vice to all peers

Smart-contract execution Smart contracts in Fabric runwithin a container environment for isolation They canbe written in standard programming languages but do nothave direct access to the ledger state

Ledger maintenance Each peer locally maintains the ledgerin the form of the append-only blockchain and as a snap-shot of the most recent state in a key-value store (KVS)The KVS can be implemented by standard libraries suchas LevelDB or Apache CouchDB

The remainder of this paper describes the architectureof Fabric and our experience with it Section 2 summa-rizes the state of the art and explains the rationale behindvarious design decisions Section 3 introduces the architec-ture and the execute-order-validate approach of Fabric in de-tail illustrating the transaction execution flow In Section 4the key components of Fabric are defined in particular theordering service membership service peer-to-peer gossipledger database and smart-contract API Results and in-sights gained in a performance evaluation of Fabric with aBitcoin-inspired cryptocurrency deployed in a cluster en-vironment on commodity public cloud VMs are given inSection 5 They show that Fabric achieves in popular de-ployment configurations throughput of more than 3500 tpsachieving finality [36] with latency of a few hundred ms Fi-nally Section 6 discusses related work

2 Background21 Order-Execute Architecture for BlockchainsAll previous blockchain systems permissioned or not fol-low the order-execute architecture This means that theblockchain network orders transactions first using a con-

Update stateOrder Execute

Deterministic ()execution

Persist state on all peers

Consensus or atomic broadcast

Figure 1 Order-execute architecture in replicated services

sensus protocol and then executes them in the same orderon all peers sequentially1

For instance a PoW-based permissionless blockchainsuch as Ethereum combines consensus and execution oftransactions as follows (1) every peer (ie a node that par-ticipates in consensus) assembles a block containing validtransactions (to establish validity this peer already pre-executes those transactions) (2) the peer tries to solve a PoWpuzzle [28] (3) if the peer is lucky and solves the puzzle itdisseminates the block to the network via a gossip protocoland (4) every peer receiving the block validates the solutionto the puzzle and all transactions in the block Effectivelyevery peer thereby repeats the execution of the lucky peerfrom its first step Moreover all peers execute the transac-tions sequentially (within one block and across blocks) Theorder-execute architecture is illustrated by Fig 1

Existing permissioned blockchains such as TendermintChain or Quorum typically use BFT consensus [9] pro-vided by PBFT [11] or other protocols for atomic broad-cast Nevertheless they all follow the same order-executeapproach and implement classical active SMR [13 31]

22 Limitations of Order-ExecuteThe order-execute architecture is conceptually simple andtherefore also widely used However it has several draw-backs when used in a general-purpose permissioned block-chain We discuss the three most significant ones next

Sequential execution Executing the transactions sequen-tially on all peers limits the effective throughput that can beachieved by the blockchain In particular since the through-put is inversely proportional to the execution latency thismay become a performance bottleneck for all but the sim-plest smart contracts Moreover recall that in contrast to tra-ditional SMR the blockchain forms a universal computingengine and its payload applications might be deployed by anadversary A denial-of-service (DoS) attack which severelyreduces the performance of such a blockchain could simplyintroduce smart contracts that take a very long time to exe-cute For example a smart contract that executes an infiniteloop has a fatal effect but cannot be detected automaticallybecause the halting problem is unsolvable

To cope with this issue public programmable block-chains with a cryptocurrency account for the execution costEthereum for example introduces the concept of gas con-

1In many blockchains with a hard-coded primary application such asBitcoin this transaction execution is called ldquotransaction validationrdquo Herewe call this step transaction execution to harmonize the terminology

3 201821

sumed by a transaction execution which is converted at agas price to a cost in the cryptocurrency and billed to thesubmitter of the transaction Ethereum goes a long way tosupport this concept assigns a cost to every low-level com-putation step and introduces its own VM monitor to controlexecution Although this appears to be a viable solution forpublic blockchains it is not adequate for the permissionedmodel for a general-purpose system without a native cryp-tocurrency

The distributed-systems literature proposes many waysto improve performance compared to sequential executionfor instance through parallel execution of unrelated oper-ations [30] Unfortunately such techniques are still to beapplied successfully in the blockchain context of smart con-tracts For instance one challenge is the requirement fordeterministically inferring all dependencies across smartcontracts which is particularly challenging when combinedwith possible confidentiality constraints Furthermore thesetechniques are of no help against DoS attacks by contractcode from untrusted developers

Non-deterministic code Another important problem foran order-execute architecture are non-deterministic transac-tions Operations executed after consensus in active SMRmust be deterministic or the distributed ledger ldquoforksrdquo andviolates the basic premise of a blockchain that all peers holdthe same state This is usually addressed by programmingblockchains in domain-specific languages (eg EthereumSolidity) that are expressive enough for their applications butlimited to deterministic execution However such languagesare difficult to design for the implementer and require addi-tional learning by the programmer Writing smart contractsin a general-purpose language (eg Go Java CC++) in-stead appears more attractive and accelerates the adoption ofblockchain solutions

Unfortunately generic languages pose many problemsfor ensuring deterministic execution Even if the applicationdeveloper does not introduce obviously non-deterministicoperations hidden implementation details can have the samedevastating effect (eg a map iterator is not deterministic inGo) To make matters worse on a blockchain the burdento create deterministic applications lies on the potentiallyuntrusted programmer Only one non-deterministic contractcreated with malicious intent is enough to bring the wholeblockchain to a halt A modular solution to filter divergingoperations on a blockchain has also been investigated [8]but it appears costly in practice

Confidentiality of execution According to the blueprintof public blockchains many permissioned systems run allsmart contracts on all peers However many intended usecases for permissioned blockchains require confidentialityie that access to smart-contract logic transaction dataor ledger state can be restricted Although cryptographictechniques ranging from data encryption to advanced zero-knowledge proofs [2] and verifiable computation [26] can

help to achieve confidentiality this often comes with a con-siderable overhead and is not viable in practice

Fortunately it suffices to propagate the same state to allpeers instead of running the same code everywhere Thusthe execution of a smart contract can be restricted to a subsetof the peers trusted for this task that vouch for the resultsof the execution This design departs from active replicationtowards a variant of passive replication [6] adapted to thetrust model of blockchain

23 Further Limitations of Existing ArchitecturesFixed trust model Most permissioned blockchains rely onasynchronous BFT replication protocols to establish consen-sus [36] Such protocols typically rely on a security assump-tion that among n gt 3f peers up to f are tolerated to mis-behave and exhibit so-called Byzantine faults [4] The samepeers often execute the applications as well under the samesecurity assumption (even though one could actually restrictBFT execution to fewer peers [37]) However such a quan-titative trust assumption irrespective of peersrsquo roles in thesystem may not match the trust required for smart-contractexecution In a flexible system trust at the application levelshould not be fixed to trust at the protocol level A general-purpose blockchain should decouple these two assumptionsand permit flexible trust models for applications

Hard-coded consensus Fabric is the first blockchain sys-tem that introduced pluggable consensus Before Fabric vir-tually all blockchain systems permissioned or not camewith a hard-coded consensus protocol However decades ofresearch on consensus protocols have shown there is no suchldquoone-size-fits-allrdquo solution For instance BFT protocols dif-fer widely in their performance when deployed in potentiallyadversarial environments [33] A protocol with a ldquochainrdquocommunication pattern exhibits provably optimal through-put on a LAN cluster with symmetric and homogeneouslinks [18] but degrades badly on a wide-area heterogeneousnetwork Furthermore external conditions such as load net-work parameters and actual faults or attacks may vary overtime in a given deployment For these reasons BFT consen-sus should be inherently reconfigurable and ideally adapt dy-namically to a changing environment [1] Another importantaspect is to match the protocolrsquos trust assumption to a givenblockchain deployment scenario Indeed one may want toreplace BFT consensus with a protocol based on an alterna-tive trust model such as XFT [27] or a CFT protocol suchas PaxosRaft [29] and ZooKeeper [20] or even a permis-sionless protocol

24 Experience with Order-Execute BlockchainPrior to realizing the execute-order-validate architecture ofFabric the team gained experience with building a per-missioned blockchain platform in the order-execute modelwith PBFT [11] for consensus From feedback obtained inmany proof-of-concept applications the limitations of this

4 201821

approach became immediately clear For instance users of-ten observed diverging states at the peers and reported abug in the consensus protocol in all cases closer inspectionrevealed that the culprit was non-deterministic transactioncode Other complaints addressed limited performance egldquoonly five transactions per secondrdquo until users confessedthat their average transaction took 200ms to execute Wehave learned that the key properties of a blockchain sys-tem namely consistency security and performance mustnot depend on the knowledge and goodwill of its users inparticular since the blockchain should run in an untrustedenvironment

3 ArchitectureIn this section we introduce the three-phase execute-order-validate architecture and then explain the transaction flowThe components of Fabric are discussed in Section 4

31 Fabric OverviewFabric is a distributed operating system for permissio-ned blockchains that executes distributed applications writ-ten in general-purpose programming languages (eg GoJava Nodejs) It securely tracks its execution history inan append-only replicated ledger data structure and has nocryptocurrency built in

Fabric introduces the execute-order-validate blockchainarchitecture and does not follow the standard order-executedesign for reasons explained in Section 2 In a nutshell adistributed application for Fabric consists of two parts

bull A smart contract called chaincode which is programcode that implements the application logic and runs dur-ing the execution phase The chaincode is the central partof a distributed application in Fabric and may be writtenby an untrusted developer Special chaincodes exist formanaging the blockchain system and maintaining param-eters collectively called system chaincodes (Sec 46)

bull An endorsement policy that is evaluated in the validationphase Endorsement policies cannot be chosen or modi-fied by untrusted application developers they are part ofthe system An endorsement policy acts as a static libraryfor transaction validation in Fabric which can merely beparameterized by the chaincode Only designated admin-istrators may run system management functions and havethe right to modify the endorsement policyA typical endorsement policy lets the chaincode specifythe endorsers for a transaction in the form of a set of peersthat are necessary for endorsement it uses a monotonelogical expression on sets such as ldquothree out of fiverdquo orldquoA and B or B and Crdquo Custom endorsement policiesmay implement arbitrary logic (eg our Bitcoin-inspiredcryptocurrency in Sec 51)

A client sends transactions to the peers specified by theendorsement policy Each transaction is then executed byspecific peers and its output is recorded this step is also

Update state

Order rw-sets Atomic broadcast

(consensus) Stateless ordering

service

Persist state on all peers

Simulate trans and endorse

Create rw-set Collect endorse-

ments

OrderExecute Validate

Validate endorse-ments amp rw-sets

Eliminate invalid and conflicting trans

Figure 2 Execute-order-validate architecture of Fabric (rw-set means a readset and writeset as explained in Sec 32)

called endorsement After execution transactions enter theordering phase which uses a pluggable consensus protocolto produce a totally ordered sequence of endorsed transac-tions grouped in blocks These are broadcast to all peerswith the (optional) help of gossip Unlike standard activereplication [31] which totally orders transaction inputs Fab-ric orders transaction outputs combined with state depen-dencies as computed during the execution phase Each peerthen validates the state changes from endorsed transactionswith respect to the endorsement policy and the consistencyof the execution in the validation phase All peers validatethe transactions in the same order and validation is determin-istic In this sense Fabric introduces a novel hybrid replica-tion paradigm in the Byzantine model which combines pas-sive replication (the pre-consensus computation of state up-dates) and active replication (the post-consensus validationof execution results and state changes)

The execute-order-validate approach is illustrated by Fig 2

A Fabric blockchain consists of a set of nodes that form anetwork As Fabric is permissioned all nodes that participatein the network have an identity as provided by a modularmembership service provider (MSP) (Sec 41) Nodes in aFabric network take up one of three roles

Clients submit transaction proposals for execution helporchestrate the execution phase and finally broadcasttransactions for ordering

Peers execute transaction proposals and validate transac-tions Peers also maintain the blockchain ledger anappend-only data structure recording all transactions inthe form of a hash chain as well as the state a succinctrepresentation of the latest ledger state Not all peers ex-ecute all transaction proposals only a subset of themcalled endorsing peers (or simply endorsers) as speci-fied by the policy of the chaincode to which the transac-tion pertains However all peers maintain the completeledger

Orderering Service Nodes (OSN) (or simply orderers)are the nodes that collectively form the ordering ser-vice In short the ordering service establishes the totalorder of all transactions in Fabric where each transac-tion contains state updates and dependencies computedduring the execution phase along with cryptographic sig-

5 201821

client endorsingpeer 1

endorsingpeer 2

endorsingpeer 3

Peer(non-endorsing)

ord

ering service

orderers

Invocation Commit

1 1 1

2

3

4 4

55

5 5

1 Chaincode execution

2 Endorsementcollection

34Ordering

BroadcastDelivery5 Validation

Figure 3 Fabric high level transaction flow

natures of the endorsing peers that computed them Or-derers are entirely unaware of the application state anddo not participate in the execution nor in the validationof transactions This design choice renders consensus inFabric as modular as possible and simplifies replacementof consensus protocols in Fabric

Since it is possible to run one physical node with multipleroles Fabric can also be operated like a traditional peer-to-peer blockchain system in which every node maintainsthe state and invokes validates and orders transactions Thetransaction flow in Fabric across the different types of nodesis depicted in Fig 3

Compared to the description focusing on one singleblockchain so far a Fabric network actually supports mul-tiple blockchains connected to the same ordering serviceEach such blockchain is called a channel and may havedifferent peers as its members Channels can be used topartition the state of the blockchain network but consen-sus across channels is not coordinated and the total order oftransactions in each channel is separate from the others Cer-tain deployments that consider all orderers as trusted mayalso implement by-channel access control for peers In thefollowing we mention channels only briefly and concentrateon one single channel

The next three sections explain the transaction flow inFabric and illustrate the steps of the execution ordering andvalidation phases A Fabric network is shown in Fig 4

32 Execution PhaseIn the execution phase clients send the transaction proposal(or simply proposal) to one or more endorsers for execu-tion Recall that every chaincode implicitly specifies a set ofendorsers via the endorsement policy A proposal containsthe identity of the submitting client (according to the MSP)the transaction payload in the form of an operation to exe-cute parameters and the identifier of the chaincode to whichit belongs a nonce to be used only once by each client (suchas a counter or a random value) and a transaction identifierderived from the client identifier and the nonce The clientalso signs the proposal

Appl

MSP

P

SDK

P P P P P P

SDK

OSN

P

OSNOSNOSNOSN

Client

Orderingservice

Peer-to-peer gossip

Peers (P)

Client

Appl

Appl

Figure 4 A Fabric network with federated MSPs and run-ning multiple (differently shaded and colored) chaincodesselectively installed on peers according to policy

The endorsers simulate the proposal by executing the op-eration on the specified chaincode which has been installedon the blockchain The chaincode runs in a Docker containerisolated from the main endorser process

A proposal is simulated against the endorserrsquos localblockchain state without any synchronization with otherpeers at this point moreover endorsers do not persist theresults of the simulation to the ledger state The state ofthe blockchain is maintained by the peer transaction man-ager (PTM) in the form of a versioned key-value store inwhich successive updates to a key have monotonically in-creasing version numbers (Sec 44) The state created by achaincode is scoped exclusively to that chaincode and can-not be accessed directly by another chaincode Note that thechaincode is not supposed to maintain the local state in theprogram code only what it maintains in the blockchain statethat is accessed with GetState PutState and DelState oper-ations Given the appropriate permission a chaincode mayinvoke another chaincode to access its state within the samechannel

As a result of the simulation each endorser produces avalue writeset consisting of the state updates produced bysimulation (ie the modified keys along with their new val-ues) as well as a readset representing the version dependen-cies of the proposal simulation (ie all keys read during sim-ulation along with their version numbers) After the simula-tion the endorser cryptographically signs a message calledendorsement which contains readset and writeset (togetherwith metadata such as transaction ID endorser ID and en-dorser signature) and sends it back to the client in a proposalresponse The client collects endorsements until they satisfythe endorsement policy of the chaincode which the trans-action invokes (see Sec 34) In particular this requires allendorsers as determined by the policy to produce the sameexecution result (ie identical readset and writeset) Thenthe client proceeds to create the transaction and passes it tothe ordering service

6 201821

Discussion on design choices As the endorsers simulatethe proposal without synchronizing with other endorserstwo endorsers may execute it on different states of the ledgerand produce different outputs For the standard endorse-ment policy which requires multiple endorsers to producethe same result this implies that under high contention ofoperations accessing the same keys a client may not beable to satisfy the endorsement policy This is a new ele-ment compared to primary-backup replication in replicateddatabases with synchronization through middleware [24] aconsequence of the assumption that no single peer is trustedfor correct execution in a blockchain

We consciously adopted this design as it considerablysimplifies the architecture and is adequate for typical block-chain applications As demonstrated by the approach of Bit-coin distributed applications can be formulated such thatcontention by operations accessing the same state can be re-duced or eliminated completely in the normal case (eg inBitcoin two operations that modify the same ldquoobjectrdquo arenot allowed and represent a double-spending attack [28])

Executing a transaction before the ordering phase is crit-ical to tolerating non-deterministic chaincodes as discussedin Section 2 A chaincode in Fabric with non-deterministictransactions can only endanger the liveness of its own oper-ations because a client cannot gather a sufficient number ofendorsements for instance This is much more acceptable inpractice than the situation in an order-execute architecturewhere non-deterministic operations lead to inconsistenciesin the state of the peers

Finally tolerating non-deterministic execution also ad-dresses DoS attacks from untrusted chaincode as an endorsercan simply abort an execution according to a local policy ifit suspects a DoS attack This will not endanger the consis-tency of the system and again such unilateral abortion ofexecution is not possible in order-execute architectures

33 Ordering PhaseWhen a client has collected enough endorsements on a pro-posal it assembles a transaction and submits this to the or-dering service The transaction contains the transaction pay-load (ie the chaincode operation including parameters)transaction metadata and a set of endorsements The order-ing phase establishes a total order on all submitted transac-tions per channel In other words ordering atomically broad-casts [7] endorsements and thereby establishes consensus ontransactions despite faulty orderers Moreover the orderingservice batches multiple transactions into blocks and outputsa hash-chained sequence of blocks containing transactionsGrouping or batching transactions into blocks improves thethroughput of the broadcast protocol which is a well-knowntechnique in the context of fault-tolerant broadcasts

At a high level the interface of the ordering service onlysupports the following two operations These operations areinvoked by a peer and implicitly parameterized by a channelidentifier

bull broadcast(tx) A client calls this operation to broadcastan arbitrary transaction tx which usually contains thetransaction payload and a signature of the client fordissemination

bull B larr deliver(s) A client calls this to retrieve block Bwith non-negative sequence number s The block con-tains a list of transactions [tx1 txk] and a hash-chain value h representing the block with sequence num-ber sminus1 ie B = ([tx1 txk] h) As the client maycall this multiple times and always returns the same blockonce it is available we say the peer delivers block B withsequence number s when it receives B for the first timeupon invoking deliver(s)

The ordering service ensures that the delivered blocks onone channel are totally ordered More specifically orderingensures the following safety properties for each channel

Agreement For any two blocks B delivered with sequencenumber s and Bprime delivered with sprime at correct peers suchthat s = sprime it holds B = Bprime

Hashchain integrity If some correct peer delivers a block Bwith number s and another correct peer delivers blockBprime = ([tx1 txk] h

prime) with number s + 1 then itholds hprime = H(B) where H(middot) denotes the cryptographichash function

No skipping If a correct peer p delivers a block with num-ber s gt 0 then for each i = 0 s minus 1 peer p hasalready delivered a block with number i

No creation When a correct peer delivers block B withnumber s then for every tx isin B some client has alreadybroadcast tx

For liveness the ordering service supports at least thefollowing ldquoeventualrdquo property

Validity If a correct client invokes broadcast(tx) then ev-ery correct peer eventually delivers a block B that in-cludes tx with some sequence number

However every individual ordering implementation is al-lowed to come with its own liveness and fairness guaranteeswith respect to client requests

Since there may be a large number of peers in the block-chain network but only relatively few nodes are expectedto implement the ordering service Fabric can be config-ured to use a built-in gossip service for disseminating deliv-ered blocks from the ordering service to all peers (Sec 43)The implementation of gossip is scalable and agnostic to theparticular implementation of the ordering service hence itworks with both CFT and BFT ordering services ensuringthe modularity of Fabric

The ordering service may also perform access controlchecks to see if a client is allowed to broadcast messagesor receive blocks on a given channel This and other featuresof the ordering service are further explained in Section 42

7 201821

Discussion on design choices It is very important that theordering service does not maintain any state of the block-chain and neither validates nor executes transactions Thisarchitecture is a crucial defining feature of Fabric andmakes Fabric the first blockchain system to totally sepa-rate consensus from execution and validation This makesconsensus as modular as possible and enables an ecosystemof consensus protocols implementing the ordering service

34 Validation PhaseBlocks are delivered to peers either via a direct connectionto the ordering service or through gossip When a new blockarrives it enters the validation phase consisting of threesequential steps

1 The endorsement policy evaluation occurs in parallelfor all transactions within the block The evaluation isthe task of the so-called validation system chaincode(VSCC) a static library that is part of the blockchainrsquosconfiguration and is responsible for validating the en-dorsement with respect to the endorsement policy config-ured for the chaincode (see Sec 46) If the endorsementis not satisfied the transaction is marked as invalid andits effects are disregarded

2 A read-write conflict check is done for all transactions inthe block sequentially For each transaction it comparesthe versions of the keys in the readset field to those in thecurrent state of the ledger as stored locally by the peerand ensures they are still the same If the versions do notmatch the transaction is marked as invalid and its effectsare disregarded

3 The ledger update phase runs last in which the block isappended to the locally stored ledger and the blockchainstate is updated In particular when adding the blockto the ledger the results of the validity checks in thefirst two steps are persisted as well in the form of a bitmask denoting the transactions that are valid within theblock This facilitates the reconstruction of the state at alater time Furthermore all state updates are applied bywriting all key-value pairs in writeset to the local state

The default VSCC in Fabric allows monotone logical ex-pressions over the set of endorsers configured for a chain-code to be expressed The VSCC evaluation verifies that theset of peers as expressed through valid signatures on en-dorsements of the transaction satisfy the expression Differ-ent VSCC policies can be configured statically however

Discussion on design choices The ledger of Fabric con-tains all transactions including those that are deemed in-valid This follows from the overall design because orderingservice which is agnostic to chaincode state produces thechain of the blocks and because the validation is done by thepeers post-consensus This feature is needed in certain usecases that require tracking of invalid transactions during sub-sequent audits and stands in contrast to other blockchains

Committer

Peer

Endorser

Ledger

Gossip

KVS

Validation configuration sys chaincode

Block store peer trans manager (PTM)

LevelDB CouchDB

Peer-to-peer block gossip

Chaincode execution containers

Figure 5 Components of a Fabric peer

(eg Bitcoin and Ethereum) where the ledger contains onlyvalid transactions

4 Fabric ComponentsFabric is written in Go and uses the gRPC framework(httpgrpcio) for communication between clientspeers and orderers In the following we describe some im-portant components in more detail Figure 5 shows the com-ponents of a peer

41 Membership ServiceThe membership service provider (MSP) maintains the iden-tities of all nodes in the system (clients peers and order-ers) and is responsible for issuing node credentials that areused for authentication and authorization Since Fabric ispermissioned all interactions among nodes occur throughmessages that are authenticated typically with digital sig-natures The membership service comprises a componentat each node where it may authenticate transactions ver-ify the integrity of transactions sign endorsements validateendorsements and authenticate other blockchain operationsTools for key management and registration of nodes are alsopart of the MSP

The MSP is an abstraction for which different instantia-tions are possible The default MSP implementation in Fab-ric handles standard PKI methods for authentication basedon digital signatures and can accommodate commercial cer-tification authorities (CAs) A stand-alone CA is provided aswell with Fabric called Fabric-CA Furthermore alternativeMSP implementations are envisaged such as one relying onanonymous credentials for authorizing a client to invoke atransaction without linking this to an identity [10]

Fabric allows two modes for setting up a blockchain net-work In offline mode credentials are generated by a CAand distributed out-of-band to all nodes Peers and order-ers can only be registered in offline mode For enrollingclients Fabric-CA provides an online mode that issues cryp-tographic credentials to them The MSP configuration mustensure that all nodes especially all peers recognize the sameidentities and authentications as valid

8 201821

The MSP permits identity federation for example whenmultiple organizations operate a blockchain network Eachorganization issues identities to its own members and everypeer recognizes members of all organizations This can beachieved with multiple MSP instantiations for example bycreating a mapping between each organization and an MSP

42 Ordering ServiceThe ordering service manages multiple channels On everychannel it provides the following services

1 Atomic broadcast for establishing order on transactionsimplementing the broadcast and deliver calls (Sec 33)

2 Reconfiguration of a channel when its members mod-ify the channel by broadcasting a configuration updatetransaction (Sec 46)

3 Optionally access control in those configurations wherethe ordering service acts as a trusted entity restrictingbroadcasting of transactions and receiving of blocks tospecified clients and peers

The ordering service is bootstrapped with a genesis blockon the system channel This block carries a configurationtransaction that defines the properties of the ordering ser-vice

The current production implementation consists of or-dering-service nodes (OSNs) that implement the opera-tions described here and communicate through the sys-tem channel The actual atomic broadcast function is pro-vided by an instance of Apache Kafka (httpkafkaapacheorg) which offers scalable publish-subscribe mes-saging and strong consistency despite node crashes basedon ZooKeeper Kafka may run on physical nodes separatefrom the OSNs The OSNs act as proxies between the peersand Kafka

An OSN directly injects a newly received transaction tothe atomic broadcast (eg to the Kafka broker) On the otherhand the nodes batch transactions received from the atomicbroadcast and form blocks A block is cut as soon as one ofthree conditions is met (1) the block contains the specifiedmaximal number of transactions (2) the block has reached amaximal size (in bytes) or (3) an amount of time has elapsedsince the first transaction of a new block was received asexplained below

This batching process is deterministic and therefore pro-duces the same blocks at all nodes It is easy to see thatthe first two conditions are trivially deterministic given thestream of transactions received from the atomic broadcastTo ensure deterministic block production in the third casea node starts a timer when it reads the first transaction ina block from the atomic broadcast If the block is not yetcut when the timer expires the node broadcasts a specialtime-to-cut transaction on the channel which indicates thesequence number of the block which it intends to cut On theother hand every node immediately cuts a new block upon

receiving the first time-to-cut transaction for the given blocknumber Since this transaction is atomically delivered to allconnected nodes they all include the same list of transac-tions in the block (To deploy this scheme in the presence off Byzantine-faulty OSNs the block is cut only when receiv-ing f + 1 time-to-cut transactions)

The orderers persist a range of the most recently deliveredblocks directly to their filesystem so they can answer topeers retrieving blocks through deliver

The orderer using Kafka is one of three ordering serviceimplementations currently available A centralized orderercalled Solo runs on one node and is used for development Aproof-of-concept orderer based on BFT-SMaRt [3] has alsobeen made available [34] it ensures the atomic broadcastservice but not yet reconfiguration and access control Thisillustrates the modularity of consensus in Fabric

43 Peer GossipOne advantage of separating the execution ordering andvalidation phases is that they can be scaled independentlyHowever since most consensus algorithms (in the CFT andBFT models) are bandwidth-bound the throughput of the or-dering service is capped by the network capacity of its nodesConsensus cannot be scaled up by adding more nodes [1436] rather throughput will decrease However since order-ing and validation are decoupled we are interested in ef-ficiently broadcasting the execution results to all peers forvalidation after the ordering phase This is precisely thegoal of the gossip component which utilizes epidemic mul-ticast [15] for this purpose The blocks are signed by theordering service This means that a peer can upon receiv-ing all blocks independently assemble the blockchain andverify its integrity

Dissemination through gossip is robust and resistant tothe node failures in contrast to overlay networks Makingrandom selections allows gossip to reduce the overhead ofmaintaining connectivity among peers and moreover re-duces the attack surface Gossip works well in the permis-sioned environment where it can withstand sybil attacks andmessage forgery

The communication layer for gossip is based on gRPCand utilizes TLS with mutual authentication which enableseach side to bind the TLS credentials to the identity of theremote peer The gossip component maintains an up-to-datemembership view of the online peers in the system All peersindependently build a local view from periodically dissem-inated membership data Furthermore a peer can reconnectto the view after a crash or a network outage

The main purpose of gossip is to reliably distribute mes-sages ie blocks from the ordering phase among the peerstaking advantage of a push-pull protocol It uses two phasesfor information dissemination during push each peer se-lects a random set of active neighbors from the membershipview and forwards them the message during pull each peerperiodically probes a set of randomly selected peers and re-

9 201821

quests missing messages It has been shown [15 22] thatusing both methods in tandem is crucial to optimally utilizethe available bandwidth and to ensure that all peers receiveall messages with high probability

In order to reduce the load of sending blocks from the or-dering nodes to the network the protocol also elects a leaderpeer that pulls blocks from the ordering service on their be-half and initiates the gossip distribution This mechanism isresilient to leader failures

Another task of gossip is to transfer the state to newlyjoining peers and peers that were disconnected for a longtime They need to receive all blocks in the chain This fea-ture relies on the fact that the largest block-sequence num-ber stored by each peer is disseminated with the membershipdata

44 LedgerThe ledger component at each peer maintains the ledgerand the blockchain state on persistent storage and enablessimulation validation and ledger-update phases Broadlyit consists of a block store and a peer transaction manager(PTM)

Ledger block store The ledger block store persists trans-action blocks and is implemented as a set of append-onlyfiles Since the blocks are immutable and arrive in a defi-nite order an append-only structure gives maximum perfor-mance In addition the block store maintains a few indicesfor random access to a block or to a transaction in a block

Peer transaction manager (PTM) The PTM maintainsthe latest state in a versioned key-value store It stores onetuple of the form (key val ver) for each unique entry keystored by any chaincode containing its most recently storedvalue val and its latest version ver The version consistsof the block sequence number and the sequence number ofthe transaction (that stores the entry) within the block Thismakes the version unique and monotonically increasing

The PTM uses a local key-value store to realize the ver-sioned key-value store implemented by a LevelDB key-value database implemented in Go (httpsgithubcomsyndtrgoleveldb) or Apache CouchDB (httpcouchdbapacheorg)

During simulation the PTM provides a stable snapshotof the latest state to the transaction As mentioned in Sec-tion 32 the PTM records in readset a tuple (key ver) foreach entry accessed by GetState and in writeset a tuple(key val) for each entry updated with PutState by the trans-action In addition the PTM supports range queries forwhich it computes a cryptographic hash of the query results(a set of tuples (key ver)) and adds the query string itself andthe hash to readset

For transaction validation (Sec 34) the PTM validatesall transactions in a block sequentially This checks whethera transaction conflicts with any preceding transaction (withinthe block or earlier) For any key in readset if the version

recorded in readset differs from the version present in thelatest state (assuming that all preceding valid transactionsare committed) then the PTM marks the transaction as in-valid For range queries the PTM re-executes the query andcompares the hash with the one present in readset to ensurethat no phantom reads occur This read-write conflict seman-tics results in one-copy serializability [23]

The ledger component tolerates a crash of the peer duringthe ledger update as follows After receiving a new block thePTM has already performed validation and marked transac-tions as valid or invalid within the block using a bit mask asmentioned in Section 34 The ledger now writes the blockto the ledger block store flushes it to disk and subsequentlyupdates the block store indices Then the PTM applies thestate changes from writeset of all valid transactions to the lo-cal versioned store Finally it computes and persists a valuesavepoint which denotes the largest successfully appliedblock number The value savepoint is used to recover theindices and the latest state from the persisted blocks whenrecovering from a crash

45 Chaincode ExecutionChaincode is executed within an environment that is looselycoupled with the rest of the peer and that supports pluginsfor adding new languages for programming chaincodes Cur-rently three languages are supported for chaincode Go Javaand Node

Every user-level or application chaincode runs in a sepa-rate process within a Docker container environment whichisolates the chaincodes from each other and from the peerThis also simplifies the management of the lifecycle forchaincodes (ie starting stopping or aborting chaincode)The chaincode and the peer communicate using gRPC mes-sages Through this loose coupling the peer is agnostic ofthe actual language in which chaincode is implemented

In contrast to application chaincode system chaincoderuns directly in the peer process System chaincode canimplement specific functions needed by Fabric and may beused in situations where the isolation among user chaincodesis overly restrictive More details on system chaincodes aregiven in the next section

46 Configuration and System ChaincodesFabricrsquos basic behavior is customized through channel con-figuration and through special chaincodes known as systemchaincodes

Channel configuration Recall that a channel forms onelogical blockchain The configuration of a channel is main-tained in metadata persisted in special configuration blocksEach configuration block contains the full channel config-uration and does not contain any other transactions Eachblockchain begins with a configuration block known as thegenesis block which is used to bootstrap the channel Thechannel configuration includes

10 201821

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

sumed by a transaction execution which is converted at agas price to a cost in the cryptocurrency and billed to thesubmitter of the transaction Ethereum goes a long way tosupport this concept assigns a cost to every low-level com-putation step and introduces its own VM monitor to controlexecution Although this appears to be a viable solution forpublic blockchains it is not adequate for the permissionedmodel for a general-purpose system without a native cryp-tocurrency

The distributed-systems literature proposes many waysto improve performance compared to sequential executionfor instance through parallel execution of unrelated oper-ations [30] Unfortunately such techniques are still to beapplied successfully in the blockchain context of smart con-tracts For instance one challenge is the requirement fordeterministically inferring all dependencies across smartcontracts which is particularly challenging when combinedwith possible confidentiality constraints Furthermore thesetechniques are of no help against DoS attacks by contractcode from untrusted developers

Non-deterministic code Another important problem foran order-execute architecture are non-deterministic transac-tions Operations executed after consensus in active SMRmust be deterministic or the distributed ledger ldquoforksrdquo andviolates the basic premise of a blockchain that all peers holdthe same state This is usually addressed by programmingblockchains in domain-specific languages (eg EthereumSolidity) that are expressive enough for their applications butlimited to deterministic execution However such languagesare difficult to design for the implementer and require addi-tional learning by the programmer Writing smart contractsin a general-purpose language (eg Go Java CC++) in-stead appears more attractive and accelerates the adoption ofblockchain solutions

Unfortunately generic languages pose many problemsfor ensuring deterministic execution Even if the applicationdeveloper does not introduce obviously non-deterministicoperations hidden implementation details can have the samedevastating effect (eg a map iterator is not deterministic inGo) To make matters worse on a blockchain the burdento create deterministic applications lies on the potentiallyuntrusted programmer Only one non-deterministic contractcreated with malicious intent is enough to bring the wholeblockchain to a halt A modular solution to filter divergingoperations on a blockchain has also been investigated [8]but it appears costly in practice

Confidentiality of execution According to the blueprintof public blockchains many permissioned systems run allsmart contracts on all peers However many intended usecases for permissioned blockchains require confidentialityie that access to smart-contract logic transaction dataor ledger state can be restricted Although cryptographictechniques ranging from data encryption to advanced zero-knowledge proofs [2] and verifiable computation [26] can

help to achieve confidentiality this often comes with a con-siderable overhead and is not viable in practice

Fortunately it suffices to propagate the same state to allpeers instead of running the same code everywhere Thusthe execution of a smart contract can be restricted to a subsetof the peers trusted for this task that vouch for the resultsof the execution This design departs from active replicationtowards a variant of passive replication [6] adapted to thetrust model of blockchain

23 Further Limitations of Existing ArchitecturesFixed trust model Most permissioned blockchains rely onasynchronous BFT replication protocols to establish consen-sus [36] Such protocols typically rely on a security assump-tion that among n gt 3f peers up to f are tolerated to mis-behave and exhibit so-called Byzantine faults [4] The samepeers often execute the applications as well under the samesecurity assumption (even though one could actually restrictBFT execution to fewer peers [37]) However such a quan-titative trust assumption irrespective of peersrsquo roles in thesystem may not match the trust required for smart-contractexecution In a flexible system trust at the application levelshould not be fixed to trust at the protocol level A general-purpose blockchain should decouple these two assumptionsand permit flexible trust models for applications

Hard-coded consensus Fabric is the first blockchain sys-tem that introduced pluggable consensus Before Fabric vir-tually all blockchain systems permissioned or not camewith a hard-coded consensus protocol However decades ofresearch on consensus protocols have shown there is no suchldquoone-size-fits-allrdquo solution For instance BFT protocols dif-fer widely in their performance when deployed in potentiallyadversarial environments [33] A protocol with a ldquochainrdquocommunication pattern exhibits provably optimal through-put on a LAN cluster with symmetric and homogeneouslinks [18] but degrades badly on a wide-area heterogeneousnetwork Furthermore external conditions such as load net-work parameters and actual faults or attacks may vary overtime in a given deployment For these reasons BFT consen-sus should be inherently reconfigurable and ideally adapt dy-namically to a changing environment [1] Another importantaspect is to match the protocolrsquos trust assumption to a givenblockchain deployment scenario Indeed one may want toreplace BFT consensus with a protocol based on an alterna-tive trust model such as XFT [27] or a CFT protocol suchas PaxosRaft [29] and ZooKeeper [20] or even a permis-sionless protocol

24 Experience with Order-Execute BlockchainPrior to realizing the execute-order-validate architecture ofFabric the team gained experience with building a per-missioned blockchain platform in the order-execute modelwith PBFT [11] for consensus From feedback obtained inmany proof-of-concept applications the limitations of this

4 201821

approach became immediately clear For instance users of-ten observed diverging states at the peers and reported abug in the consensus protocol in all cases closer inspectionrevealed that the culprit was non-deterministic transactioncode Other complaints addressed limited performance egldquoonly five transactions per secondrdquo until users confessedthat their average transaction took 200ms to execute Wehave learned that the key properties of a blockchain sys-tem namely consistency security and performance mustnot depend on the knowledge and goodwill of its users inparticular since the blockchain should run in an untrustedenvironment

3 ArchitectureIn this section we introduce the three-phase execute-order-validate architecture and then explain the transaction flowThe components of Fabric are discussed in Section 4

31 Fabric OverviewFabric is a distributed operating system for permissio-ned blockchains that executes distributed applications writ-ten in general-purpose programming languages (eg GoJava Nodejs) It securely tracks its execution history inan append-only replicated ledger data structure and has nocryptocurrency built in

Fabric introduces the execute-order-validate blockchainarchitecture and does not follow the standard order-executedesign for reasons explained in Section 2 In a nutshell adistributed application for Fabric consists of two parts

bull A smart contract called chaincode which is programcode that implements the application logic and runs dur-ing the execution phase The chaincode is the central partof a distributed application in Fabric and may be writtenby an untrusted developer Special chaincodes exist formanaging the blockchain system and maintaining param-eters collectively called system chaincodes (Sec 46)

bull An endorsement policy that is evaluated in the validationphase Endorsement policies cannot be chosen or modi-fied by untrusted application developers they are part ofthe system An endorsement policy acts as a static libraryfor transaction validation in Fabric which can merely beparameterized by the chaincode Only designated admin-istrators may run system management functions and havethe right to modify the endorsement policyA typical endorsement policy lets the chaincode specifythe endorsers for a transaction in the form of a set of peersthat are necessary for endorsement it uses a monotonelogical expression on sets such as ldquothree out of fiverdquo orldquoA and B or B and Crdquo Custom endorsement policiesmay implement arbitrary logic (eg our Bitcoin-inspiredcryptocurrency in Sec 51)

A client sends transactions to the peers specified by theendorsement policy Each transaction is then executed byspecific peers and its output is recorded this step is also

Update state

Order rw-sets Atomic broadcast

(consensus) Stateless ordering

service

Persist state on all peers

Simulate trans and endorse

Create rw-set Collect endorse-

ments

OrderExecute Validate

Validate endorse-ments amp rw-sets

Eliminate invalid and conflicting trans

Figure 2 Execute-order-validate architecture of Fabric (rw-set means a readset and writeset as explained in Sec 32)

called endorsement After execution transactions enter theordering phase which uses a pluggable consensus protocolto produce a totally ordered sequence of endorsed transac-tions grouped in blocks These are broadcast to all peerswith the (optional) help of gossip Unlike standard activereplication [31] which totally orders transaction inputs Fab-ric orders transaction outputs combined with state depen-dencies as computed during the execution phase Each peerthen validates the state changes from endorsed transactionswith respect to the endorsement policy and the consistencyof the execution in the validation phase All peers validatethe transactions in the same order and validation is determin-istic In this sense Fabric introduces a novel hybrid replica-tion paradigm in the Byzantine model which combines pas-sive replication (the pre-consensus computation of state up-dates) and active replication (the post-consensus validationof execution results and state changes)

The execute-order-validate approach is illustrated by Fig 2

A Fabric blockchain consists of a set of nodes that form anetwork As Fabric is permissioned all nodes that participatein the network have an identity as provided by a modularmembership service provider (MSP) (Sec 41) Nodes in aFabric network take up one of three roles

Clients submit transaction proposals for execution helporchestrate the execution phase and finally broadcasttransactions for ordering

Peers execute transaction proposals and validate transac-tions Peers also maintain the blockchain ledger anappend-only data structure recording all transactions inthe form of a hash chain as well as the state a succinctrepresentation of the latest ledger state Not all peers ex-ecute all transaction proposals only a subset of themcalled endorsing peers (or simply endorsers) as speci-fied by the policy of the chaincode to which the transac-tion pertains However all peers maintain the completeledger

Orderering Service Nodes (OSN) (or simply orderers)are the nodes that collectively form the ordering ser-vice In short the ordering service establishes the totalorder of all transactions in Fabric where each transac-tion contains state updates and dependencies computedduring the execution phase along with cryptographic sig-

5 201821

client endorsingpeer 1

endorsingpeer 2

endorsingpeer 3

Peer(non-endorsing)

ord

ering service

orderers

Invocation Commit

1 1 1

2

3

4 4

55

5 5

1 Chaincode execution

2 Endorsementcollection

34Ordering

BroadcastDelivery5 Validation

Figure 3 Fabric high level transaction flow

natures of the endorsing peers that computed them Or-derers are entirely unaware of the application state anddo not participate in the execution nor in the validationof transactions This design choice renders consensus inFabric as modular as possible and simplifies replacementof consensus protocols in Fabric

Since it is possible to run one physical node with multipleroles Fabric can also be operated like a traditional peer-to-peer blockchain system in which every node maintainsthe state and invokes validates and orders transactions Thetransaction flow in Fabric across the different types of nodesis depicted in Fig 3

Compared to the description focusing on one singleblockchain so far a Fabric network actually supports mul-tiple blockchains connected to the same ordering serviceEach such blockchain is called a channel and may havedifferent peers as its members Channels can be used topartition the state of the blockchain network but consen-sus across channels is not coordinated and the total order oftransactions in each channel is separate from the others Cer-tain deployments that consider all orderers as trusted mayalso implement by-channel access control for peers In thefollowing we mention channels only briefly and concentrateon one single channel

The next three sections explain the transaction flow inFabric and illustrate the steps of the execution ordering andvalidation phases A Fabric network is shown in Fig 4

32 Execution PhaseIn the execution phase clients send the transaction proposal(or simply proposal) to one or more endorsers for execu-tion Recall that every chaincode implicitly specifies a set ofendorsers via the endorsement policy A proposal containsthe identity of the submitting client (according to the MSP)the transaction payload in the form of an operation to exe-cute parameters and the identifier of the chaincode to whichit belongs a nonce to be used only once by each client (suchas a counter or a random value) and a transaction identifierderived from the client identifier and the nonce The clientalso signs the proposal

Appl

MSP

P

SDK

P P P P P P

SDK

OSN

P

OSNOSNOSNOSN

Client

Orderingservice

Peer-to-peer gossip

Peers (P)

Client

Appl

Appl

Figure 4 A Fabric network with federated MSPs and run-ning multiple (differently shaded and colored) chaincodesselectively installed on peers according to policy

The endorsers simulate the proposal by executing the op-eration on the specified chaincode which has been installedon the blockchain The chaincode runs in a Docker containerisolated from the main endorser process

A proposal is simulated against the endorserrsquos localblockchain state without any synchronization with otherpeers at this point moreover endorsers do not persist theresults of the simulation to the ledger state The state ofthe blockchain is maintained by the peer transaction man-ager (PTM) in the form of a versioned key-value store inwhich successive updates to a key have monotonically in-creasing version numbers (Sec 44) The state created by achaincode is scoped exclusively to that chaincode and can-not be accessed directly by another chaincode Note that thechaincode is not supposed to maintain the local state in theprogram code only what it maintains in the blockchain statethat is accessed with GetState PutState and DelState oper-ations Given the appropriate permission a chaincode mayinvoke another chaincode to access its state within the samechannel

As a result of the simulation each endorser produces avalue writeset consisting of the state updates produced bysimulation (ie the modified keys along with their new val-ues) as well as a readset representing the version dependen-cies of the proposal simulation (ie all keys read during sim-ulation along with their version numbers) After the simula-tion the endorser cryptographically signs a message calledendorsement which contains readset and writeset (togetherwith metadata such as transaction ID endorser ID and en-dorser signature) and sends it back to the client in a proposalresponse The client collects endorsements until they satisfythe endorsement policy of the chaincode which the trans-action invokes (see Sec 34) In particular this requires allendorsers as determined by the policy to produce the sameexecution result (ie identical readset and writeset) Thenthe client proceeds to create the transaction and passes it tothe ordering service

6 201821

Discussion on design choices As the endorsers simulatethe proposal without synchronizing with other endorserstwo endorsers may execute it on different states of the ledgerand produce different outputs For the standard endorse-ment policy which requires multiple endorsers to producethe same result this implies that under high contention ofoperations accessing the same keys a client may not beable to satisfy the endorsement policy This is a new ele-ment compared to primary-backup replication in replicateddatabases with synchronization through middleware [24] aconsequence of the assumption that no single peer is trustedfor correct execution in a blockchain

We consciously adopted this design as it considerablysimplifies the architecture and is adequate for typical block-chain applications As demonstrated by the approach of Bit-coin distributed applications can be formulated such thatcontention by operations accessing the same state can be re-duced or eliminated completely in the normal case (eg inBitcoin two operations that modify the same ldquoobjectrdquo arenot allowed and represent a double-spending attack [28])

Executing a transaction before the ordering phase is crit-ical to tolerating non-deterministic chaincodes as discussedin Section 2 A chaincode in Fabric with non-deterministictransactions can only endanger the liveness of its own oper-ations because a client cannot gather a sufficient number ofendorsements for instance This is much more acceptable inpractice than the situation in an order-execute architecturewhere non-deterministic operations lead to inconsistenciesin the state of the peers

Finally tolerating non-deterministic execution also ad-dresses DoS attacks from untrusted chaincode as an endorsercan simply abort an execution according to a local policy ifit suspects a DoS attack This will not endanger the consis-tency of the system and again such unilateral abortion ofexecution is not possible in order-execute architectures

33 Ordering PhaseWhen a client has collected enough endorsements on a pro-posal it assembles a transaction and submits this to the or-dering service The transaction contains the transaction pay-load (ie the chaincode operation including parameters)transaction metadata and a set of endorsements The order-ing phase establishes a total order on all submitted transac-tions per channel In other words ordering atomically broad-casts [7] endorsements and thereby establishes consensus ontransactions despite faulty orderers Moreover the orderingservice batches multiple transactions into blocks and outputsa hash-chained sequence of blocks containing transactionsGrouping or batching transactions into blocks improves thethroughput of the broadcast protocol which is a well-knowntechnique in the context of fault-tolerant broadcasts

At a high level the interface of the ordering service onlysupports the following two operations These operations areinvoked by a peer and implicitly parameterized by a channelidentifier

bull broadcast(tx) A client calls this operation to broadcastan arbitrary transaction tx which usually contains thetransaction payload and a signature of the client fordissemination

bull B larr deliver(s) A client calls this to retrieve block Bwith non-negative sequence number s The block con-tains a list of transactions [tx1 txk] and a hash-chain value h representing the block with sequence num-ber sminus1 ie B = ([tx1 txk] h) As the client maycall this multiple times and always returns the same blockonce it is available we say the peer delivers block B withsequence number s when it receives B for the first timeupon invoking deliver(s)

The ordering service ensures that the delivered blocks onone channel are totally ordered More specifically orderingensures the following safety properties for each channel

Agreement For any two blocks B delivered with sequencenumber s and Bprime delivered with sprime at correct peers suchthat s = sprime it holds B = Bprime

Hashchain integrity If some correct peer delivers a block Bwith number s and another correct peer delivers blockBprime = ([tx1 txk] h

prime) with number s + 1 then itholds hprime = H(B) where H(middot) denotes the cryptographichash function

No skipping If a correct peer p delivers a block with num-ber s gt 0 then for each i = 0 s minus 1 peer p hasalready delivered a block with number i

No creation When a correct peer delivers block B withnumber s then for every tx isin B some client has alreadybroadcast tx

For liveness the ordering service supports at least thefollowing ldquoeventualrdquo property

Validity If a correct client invokes broadcast(tx) then ev-ery correct peer eventually delivers a block B that in-cludes tx with some sequence number

However every individual ordering implementation is al-lowed to come with its own liveness and fairness guaranteeswith respect to client requests

Since there may be a large number of peers in the block-chain network but only relatively few nodes are expectedto implement the ordering service Fabric can be config-ured to use a built-in gossip service for disseminating deliv-ered blocks from the ordering service to all peers (Sec 43)The implementation of gossip is scalable and agnostic to theparticular implementation of the ordering service hence itworks with both CFT and BFT ordering services ensuringthe modularity of Fabric

The ordering service may also perform access controlchecks to see if a client is allowed to broadcast messagesor receive blocks on a given channel This and other featuresof the ordering service are further explained in Section 42

7 201821

Discussion on design choices It is very important that theordering service does not maintain any state of the block-chain and neither validates nor executes transactions Thisarchitecture is a crucial defining feature of Fabric andmakes Fabric the first blockchain system to totally sepa-rate consensus from execution and validation This makesconsensus as modular as possible and enables an ecosystemof consensus protocols implementing the ordering service

34 Validation PhaseBlocks are delivered to peers either via a direct connectionto the ordering service or through gossip When a new blockarrives it enters the validation phase consisting of threesequential steps

1 The endorsement policy evaluation occurs in parallelfor all transactions within the block The evaluation isthe task of the so-called validation system chaincode(VSCC) a static library that is part of the blockchainrsquosconfiguration and is responsible for validating the en-dorsement with respect to the endorsement policy config-ured for the chaincode (see Sec 46) If the endorsementis not satisfied the transaction is marked as invalid andits effects are disregarded

2 A read-write conflict check is done for all transactions inthe block sequentially For each transaction it comparesthe versions of the keys in the readset field to those in thecurrent state of the ledger as stored locally by the peerand ensures they are still the same If the versions do notmatch the transaction is marked as invalid and its effectsare disregarded

3 The ledger update phase runs last in which the block isappended to the locally stored ledger and the blockchainstate is updated In particular when adding the blockto the ledger the results of the validity checks in thefirst two steps are persisted as well in the form of a bitmask denoting the transactions that are valid within theblock This facilitates the reconstruction of the state at alater time Furthermore all state updates are applied bywriting all key-value pairs in writeset to the local state

The default VSCC in Fabric allows monotone logical ex-pressions over the set of endorsers configured for a chain-code to be expressed The VSCC evaluation verifies that theset of peers as expressed through valid signatures on en-dorsements of the transaction satisfy the expression Differ-ent VSCC policies can be configured statically however

Discussion on design choices The ledger of Fabric con-tains all transactions including those that are deemed in-valid This follows from the overall design because orderingservice which is agnostic to chaincode state produces thechain of the blocks and because the validation is done by thepeers post-consensus This feature is needed in certain usecases that require tracking of invalid transactions during sub-sequent audits and stands in contrast to other blockchains

Committer

Peer

Endorser

Ledger

Gossip

KVS

Validation configuration sys chaincode

Block store peer trans manager (PTM)

LevelDB CouchDB

Peer-to-peer block gossip

Chaincode execution containers

Figure 5 Components of a Fabric peer

(eg Bitcoin and Ethereum) where the ledger contains onlyvalid transactions

4 Fabric ComponentsFabric is written in Go and uses the gRPC framework(httpgrpcio) for communication between clientspeers and orderers In the following we describe some im-portant components in more detail Figure 5 shows the com-ponents of a peer

41 Membership ServiceThe membership service provider (MSP) maintains the iden-tities of all nodes in the system (clients peers and order-ers) and is responsible for issuing node credentials that areused for authentication and authorization Since Fabric ispermissioned all interactions among nodes occur throughmessages that are authenticated typically with digital sig-natures The membership service comprises a componentat each node where it may authenticate transactions ver-ify the integrity of transactions sign endorsements validateendorsements and authenticate other blockchain operationsTools for key management and registration of nodes are alsopart of the MSP

The MSP is an abstraction for which different instantia-tions are possible The default MSP implementation in Fab-ric handles standard PKI methods for authentication basedon digital signatures and can accommodate commercial cer-tification authorities (CAs) A stand-alone CA is provided aswell with Fabric called Fabric-CA Furthermore alternativeMSP implementations are envisaged such as one relying onanonymous credentials for authorizing a client to invoke atransaction without linking this to an identity [10]

Fabric allows two modes for setting up a blockchain net-work In offline mode credentials are generated by a CAand distributed out-of-band to all nodes Peers and order-ers can only be registered in offline mode For enrollingclients Fabric-CA provides an online mode that issues cryp-tographic credentials to them The MSP configuration mustensure that all nodes especially all peers recognize the sameidentities and authentications as valid

8 201821

The MSP permits identity federation for example whenmultiple organizations operate a blockchain network Eachorganization issues identities to its own members and everypeer recognizes members of all organizations This can beachieved with multiple MSP instantiations for example bycreating a mapping between each organization and an MSP

42 Ordering ServiceThe ordering service manages multiple channels On everychannel it provides the following services

1 Atomic broadcast for establishing order on transactionsimplementing the broadcast and deliver calls (Sec 33)

2 Reconfiguration of a channel when its members mod-ify the channel by broadcasting a configuration updatetransaction (Sec 46)

3 Optionally access control in those configurations wherethe ordering service acts as a trusted entity restrictingbroadcasting of transactions and receiving of blocks tospecified clients and peers

The ordering service is bootstrapped with a genesis blockon the system channel This block carries a configurationtransaction that defines the properties of the ordering ser-vice

The current production implementation consists of or-dering-service nodes (OSNs) that implement the opera-tions described here and communicate through the sys-tem channel The actual atomic broadcast function is pro-vided by an instance of Apache Kafka (httpkafkaapacheorg) which offers scalable publish-subscribe mes-saging and strong consistency despite node crashes basedon ZooKeeper Kafka may run on physical nodes separatefrom the OSNs The OSNs act as proxies between the peersand Kafka

An OSN directly injects a newly received transaction tothe atomic broadcast (eg to the Kafka broker) On the otherhand the nodes batch transactions received from the atomicbroadcast and form blocks A block is cut as soon as one ofthree conditions is met (1) the block contains the specifiedmaximal number of transactions (2) the block has reached amaximal size (in bytes) or (3) an amount of time has elapsedsince the first transaction of a new block was received asexplained below

This batching process is deterministic and therefore pro-duces the same blocks at all nodes It is easy to see thatthe first two conditions are trivially deterministic given thestream of transactions received from the atomic broadcastTo ensure deterministic block production in the third casea node starts a timer when it reads the first transaction ina block from the atomic broadcast If the block is not yetcut when the timer expires the node broadcasts a specialtime-to-cut transaction on the channel which indicates thesequence number of the block which it intends to cut On theother hand every node immediately cuts a new block upon

receiving the first time-to-cut transaction for the given blocknumber Since this transaction is atomically delivered to allconnected nodes they all include the same list of transac-tions in the block (To deploy this scheme in the presence off Byzantine-faulty OSNs the block is cut only when receiv-ing f + 1 time-to-cut transactions)

The orderers persist a range of the most recently deliveredblocks directly to their filesystem so they can answer topeers retrieving blocks through deliver

The orderer using Kafka is one of three ordering serviceimplementations currently available A centralized orderercalled Solo runs on one node and is used for development Aproof-of-concept orderer based on BFT-SMaRt [3] has alsobeen made available [34] it ensures the atomic broadcastservice but not yet reconfiguration and access control Thisillustrates the modularity of consensus in Fabric

43 Peer GossipOne advantage of separating the execution ordering andvalidation phases is that they can be scaled independentlyHowever since most consensus algorithms (in the CFT andBFT models) are bandwidth-bound the throughput of the or-dering service is capped by the network capacity of its nodesConsensus cannot be scaled up by adding more nodes [1436] rather throughput will decrease However since order-ing and validation are decoupled we are interested in ef-ficiently broadcasting the execution results to all peers forvalidation after the ordering phase This is precisely thegoal of the gossip component which utilizes epidemic mul-ticast [15] for this purpose The blocks are signed by theordering service This means that a peer can upon receiv-ing all blocks independently assemble the blockchain andverify its integrity

Dissemination through gossip is robust and resistant tothe node failures in contrast to overlay networks Makingrandom selections allows gossip to reduce the overhead ofmaintaining connectivity among peers and moreover re-duces the attack surface Gossip works well in the permis-sioned environment where it can withstand sybil attacks andmessage forgery

The communication layer for gossip is based on gRPCand utilizes TLS with mutual authentication which enableseach side to bind the TLS credentials to the identity of theremote peer The gossip component maintains an up-to-datemembership view of the online peers in the system All peersindependently build a local view from periodically dissem-inated membership data Furthermore a peer can reconnectto the view after a crash or a network outage

The main purpose of gossip is to reliably distribute mes-sages ie blocks from the ordering phase among the peerstaking advantage of a push-pull protocol It uses two phasesfor information dissemination during push each peer se-lects a random set of active neighbors from the membershipview and forwards them the message during pull each peerperiodically probes a set of randomly selected peers and re-

9 201821

quests missing messages It has been shown [15 22] thatusing both methods in tandem is crucial to optimally utilizethe available bandwidth and to ensure that all peers receiveall messages with high probability

In order to reduce the load of sending blocks from the or-dering nodes to the network the protocol also elects a leaderpeer that pulls blocks from the ordering service on their be-half and initiates the gossip distribution This mechanism isresilient to leader failures

Another task of gossip is to transfer the state to newlyjoining peers and peers that were disconnected for a longtime They need to receive all blocks in the chain This fea-ture relies on the fact that the largest block-sequence num-ber stored by each peer is disseminated with the membershipdata

44 LedgerThe ledger component at each peer maintains the ledgerand the blockchain state on persistent storage and enablessimulation validation and ledger-update phases Broadlyit consists of a block store and a peer transaction manager(PTM)

Ledger block store The ledger block store persists trans-action blocks and is implemented as a set of append-onlyfiles Since the blocks are immutable and arrive in a defi-nite order an append-only structure gives maximum perfor-mance In addition the block store maintains a few indicesfor random access to a block or to a transaction in a block

Peer transaction manager (PTM) The PTM maintainsthe latest state in a versioned key-value store It stores onetuple of the form (key val ver) for each unique entry keystored by any chaincode containing its most recently storedvalue val and its latest version ver The version consistsof the block sequence number and the sequence number ofthe transaction (that stores the entry) within the block Thismakes the version unique and monotonically increasing

The PTM uses a local key-value store to realize the ver-sioned key-value store implemented by a LevelDB key-value database implemented in Go (httpsgithubcomsyndtrgoleveldb) or Apache CouchDB (httpcouchdbapacheorg)

During simulation the PTM provides a stable snapshotof the latest state to the transaction As mentioned in Sec-tion 32 the PTM records in readset a tuple (key ver) foreach entry accessed by GetState and in writeset a tuple(key val) for each entry updated with PutState by the trans-action In addition the PTM supports range queries forwhich it computes a cryptographic hash of the query results(a set of tuples (key ver)) and adds the query string itself andthe hash to readset

For transaction validation (Sec 34) the PTM validatesall transactions in a block sequentially This checks whethera transaction conflicts with any preceding transaction (withinthe block or earlier) For any key in readset if the version

recorded in readset differs from the version present in thelatest state (assuming that all preceding valid transactionsare committed) then the PTM marks the transaction as in-valid For range queries the PTM re-executes the query andcompares the hash with the one present in readset to ensurethat no phantom reads occur This read-write conflict seman-tics results in one-copy serializability [23]

The ledger component tolerates a crash of the peer duringthe ledger update as follows After receiving a new block thePTM has already performed validation and marked transac-tions as valid or invalid within the block using a bit mask asmentioned in Section 34 The ledger now writes the blockto the ledger block store flushes it to disk and subsequentlyupdates the block store indices Then the PTM applies thestate changes from writeset of all valid transactions to the lo-cal versioned store Finally it computes and persists a valuesavepoint which denotes the largest successfully appliedblock number The value savepoint is used to recover theindices and the latest state from the persisted blocks whenrecovering from a crash

45 Chaincode ExecutionChaincode is executed within an environment that is looselycoupled with the rest of the peer and that supports pluginsfor adding new languages for programming chaincodes Cur-rently three languages are supported for chaincode Go Javaand Node

Every user-level or application chaincode runs in a sepa-rate process within a Docker container environment whichisolates the chaincodes from each other and from the peerThis also simplifies the management of the lifecycle forchaincodes (ie starting stopping or aborting chaincode)The chaincode and the peer communicate using gRPC mes-sages Through this loose coupling the peer is agnostic ofthe actual language in which chaincode is implemented

In contrast to application chaincode system chaincoderuns directly in the peer process System chaincode canimplement specific functions needed by Fabric and may beused in situations where the isolation among user chaincodesis overly restrictive More details on system chaincodes aregiven in the next section

46 Configuration and System ChaincodesFabricrsquos basic behavior is customized through channel con-figuration and through special chaincodes known as systemchaincodes

Channel configuration Recall that a channel forms onelogical blockchain The configuration of a channel is main-tained in metadata persisted in special configuration blocksEach configuration block contains the full channel config-uration and does not contain any other transactions Eachblockchain begins with a configuration block known as thegenesis block which is used to bootstrap the channel Thechannel configuration includes

10 201821

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

approach became immediately clear For instance users of-ten observed diverging states at the peers and reported abug in the consensus protocol in all cases closer inspectionrevealed that the culprit was non-deterministic transactioncode Other complaints addressed limited performance egldquoonly five transactions per secondrdquo until users confessedthat their average transaction took 200ms to execute Wehave learned that the key properties of a blockchain sys-tem namely consistency security and performance mustnot depend on the knowledge and goodwill of its users inparticular since the blockchain should run in an untrustedenvironment

3 ArchitectureIn this section we introduce the three-phase execute-order-validate architecture and then explain the transaction flowThe components of Fabric are discussed in Section 4

31 Fabric OverviewFabric is a distributed operating system for permissio-ned blockchains that executes distributed applications writ-ten in general-purpose programming languages (eg GoJava Nodejs) It securely tracks its execution history inan append-only replicated ledger data structure and has nocryptocurrency built in

Fabric introduces the execute-order-validate blockchainarchitecture and does not follow the standard order-executedesign for reasons explained in Section 2 In a nutshell adistributed application for Fabric consists of two parts

bull A smart contract called chaincode which is programcode that implements the application logic and runs dur-ing the execution phase The chaincode is the central partof a distributed application in Fabric and may be writtenby an untrusted developer Special chaincodes exist formanaging the blockchain system and maintaining param-eters collectively called system chaincodes (Sec 46)

bull An endorsement policy that is evaluated in the validationphase Endorsement policies cannot be chosen or modi-fied by untrusted application developers they are part ofthe system An endorsement policy acts as a static libraryfor transaction validation in Fabric which can merely beparameterized by the chaincode Only designated admin-istrators may run system management functions and havethe right to modify the endorsement policyA typical endorsement policy lets the chaincode specifythe endorsers for a transaction in the form of a set of peersthat are necessary for endorsement it uses a monotonelogical expression on sets such as ldquothree out of fiverdquo orldquoA and B or B and Crdquo Custom endorsement policiesmay implement arbitrary logic (eg our Bitcoin-inspiredcryptocurrency in Sec 51)

A client sends transactions to the peers specified by theendorsement policy Each transaction is then executed byspecific peers and its output is recorded this step is also

Update state

Order rw-sets Atomic broadcast

(consensus) Stateless ordering

service

Persist state on all peers

Simulate trans and endorse

Create rw-set Collect endorse-

ments

OrderExecute Validate

Validate endorse-ments amp rw-sets

Eliminate invalid and conflicting trans

Figure 2 Execute-order-validate architecture of Fabric (rw-set means a readset and writeset as explained in Sec 32)

called endorsement After execution transactions enter theordering phase which uses a pluggable consensus protocolto produce a totally ordered sequence of endorsed transac-tions grouped in blocks These are broadcast to all peerswith the (optional) help of gossip Unlike standard activereplication [31] which totally orders transaction inputs Fab-ric orders transaction outputs combined with state depen-dencies as computed during the execution phase Each peerthen validates the state changes from endorsed transactionswith respect to the endorsement policy and the consistencyof the execution in the validation phase All peers validatethe transactions in the same order and validation is determin-istic In this sense Fabric introduces a novel hybrid replica-tion paradigm in the Byzantine model which combines pas-sive replication (the pre-consensus computation of state up-dates) and active replication (the post-consensus validationof execution results and state changes)

The execute-order-validate approach is illustrated by Fig 2

A Fabric blockchain consists of a set of nodes that form anetwork As Fabric is permissioned all nodes that participatein the network have an identity as provided by a modularmembership service provider (MSP) (Sec 41) Nodes in aFabric network take up one of three roles

Clients submit transaction proposals for execution helporchestrate the execution phase and finally broadcasttransactions for ordering

Peers execute transaction proposals and validate transac-tions Peers also maintain the blockchain ledger anappend-only data structure recording all transactions inthe form of a hash chain as well as the state a succinctrepresentation of the latest ledger state Not all peers ex-ecute all transaction proposals only a subset of themcalled endorsing peers (or simply endorsers) as speci-fied by the policy of the chaincode to which the transac-tion pertains However all peers maintain the completeledger

Orderering Service Nodes (OSN) (or simply orderers)are the nodes that collectively form the ordering ser-vice In short the ordering service establishes the totalorder of all transactions in Fabric where each transac-tion contains state updates and dependencies computedduring the execution phase along with cryptographic sig-

5 201821

client endorsingpeer 1

endorsingpeer 2

endorsingpeer 3

Peer(non-endorsing)

ord

ering service

orderers

Invocation Commit

1 1 1

2

3

4 4

55

5 5

1 Chaincode execution

2 Endorsementcollection

34Ordering

BroadcastDelivery5 Validation

Figure 3 Fabric high level transaction flow

natures of the endorsing peers that computed them Or-derers are entirely unaware of the application state anddo not participate in the execution nor in the validationof transactions This design choice renders consensus inFabric as modular as possible and simplifies replacementof consensus protocols in Fabric

Since it is possible to run one physical node with multipleroles Fabric can also be operated like a traditional peer-to-peer blockchain system in which every node maintainsthe state and invokes validates and orders transactions Thetransaction flow in Fabric across the different types of nodesis depicted in Fig 3

Compared to the description focusing on one singleblockchain so far a Fabric network actually supports mul-tiple blockchains connected to the same ordering serviceEach such blockchain is called a channel and may havedifferent peers as its members Channels can be used topartition the state of the blockchain network but consen-sus across channels is not coordinated and the total order oftransactions in each channel is separate from the others Cer-tain deployments that consider all orderers as trusted mayalso implement by-channel access control for peers In thefollowing we mention channels only briefly and concentrateon one single channel

The next three sections explain the transaction flow inFabric and illustrate the steps of the execution ordering andvalidation phases A Fabric network is shown in Fig 4

32 Execution PhaseIn the execution phase clients send the transaction proposal(or simply proposal) to one or more endorsers for execu-tion Recall that every chaincode implicitly specifies a set ofendorsers via the endorsement policy A proposal containsthe identity of the submitting client (according to the MSP)the transaction payload in the form of an operation to exe-cute parameters and the identifier of the chaincode to whichit belongs a nonce to be used only once by each client (suchas a counter or a random value) and a transaction identifierderived from the client identifier and the nonce The clientalso signs the proposal

Appl

MSP

P

SDK

P P P P P P

SDK

OSN

P

OSNOSNOSNOSN

Client

Orderingservice

Peer-to-peer gossip

Peers (P)

Client

Appl

Appl

Figure 4 A Fabric network with federated MSPs and run-ning multiple (differently shaded and colored) chaincodesselectively installed on peers according to policy

The endorsers simulate the proposal by executing the op-eration on the specified chaincode which has been installedon the blockchain The chaincode runs in a Docker containerisolated from the main endorser process

A proposal is simulated against the endorserrsquos localblockchain state without any synchronization with otherpeers at this point moreover endorsers do not persist theresults of the simulation to the ledger state The state ofthe blockchain is maintained by the peer transaction man-ager (PTM) in the form of a versioned key-value store inwhich successive updates to a key have monotonically in-creasing version numbers (Sec 44) The state created by achaincode is scoped exclusively to that chaincode and can-not be accessed directly by another chaincode Note that thechaincode is not supposed to maintain the local state in theprogram code only what it maintains in the blockchain statethat is accessed with GetState PutState and DelState oper-ations Given the appropriate permission a chaincode mayinvoke another chaincode to access its state within the samechannel

As a result of the simulation each endorser produces avalue writeset consisting of the state updates produced bysimulation (ie the modified keys along with their new val-ues) as well as a readset representing the version dependen-cies of the proposal simulation (ie all keys read during sim-ulation along with their version numbers) After the simula-tion the endorser cryptographically signs a message calledendorsement which contains readset and writeset (togetherwith metadata such as transaction ID endorser ID and en-dorser signature) and sends it back to the client in a proposalresponse The client collects endorsements until they satisfythe endorsement policy of the chaincode which the trans-action invokes (see Sec 34) In particular this requires allendorsers as determined by the policy to produce the sameexecution result (ie identical readset and writeset) Thenthe client proceeds to create the transaction and passes it tothe ordering service

6 201821

Discussion on design choices As the endorsers simulatethe proposal without synchronizing with other endorserstwo endorsers may execute it on different states of the ledgerand produce different outputs For the standard endorse-ment policy which requires multiple endorsers to producethe same result this implies that under high contention ofoperations accessing the same keys a client may not beable to satisfy the endorsement policy This is a new ele-ment compared to primary-backup replication in replicateddatabases with synchronization through middleware [24] aconsequence of the assumption that no single peer is trustedfor correct execution in a blockchain

We consciously adopted this design as it considerablysimplifies the architecture and is adequate for typical block-chain applications As demonstrated by the approach of Bit-coin distributed applications can be formulated such thatcontention by operations accessing the same state can be re-duced or eliminated completely in the normal case (eg inBitcoin two operations that modify the same ldquoobjectrdquo arenot allowed and represent a double-spending attack [28])

Executing a transaction before the ordering phase is crit-ical to tolerating non-deterministic chaincodes as discussedin Section 2 A chaincode in Fabric with non-deterministictransactions can only endanger the liveness of its own oper-ations because a client cannot gather a sufficient number ofendorsements for instance This is much more acceptable inpractice than the situation in an order-execute architecturewhere non-deterministic operations lead to inconsistenciesin the state of the peers

Finally tolerating non-deterministic execution also ad-dresses DoS attacks from untrusted chaincode as an endorsercan simply abort an execution according to a local policy ifit suspects a DoS attack This will not endanger the consis-tency of the system and again such unilateral abortion ofexecution is not possible in order-execute architectures

33 Ordering PhaseWhen a client has collected enough endorsements on a pro-posal it assembles a transaction and submits this to the or-dering service The transaction contains the transaction pay-load (ie the chaincode operation including parameters)transaction metadata and a set of endorsements The order-ing phase establishes a total order on all submitted transac-tions per channel In other words ordering atomically broad-casts [7] endorsements and thereby establishes consensus ontransactions despite faulty orderers Moreover the orderingservice batches multiple transactions into blocks and outputsa hash-chained sequence of blocks containing transactionsGrouping or batching transactions into blocks improves thethroughput of the broadcast protocol which is a well-knowntechnique in the context of fault-tolerant broadcasts

At a high level the interface of the ordering service onlysupports the following two operations These operations areinvoked by a peer and implicitly parameterized by a channelidentifier

bull broadcast(tx) A client calls this operation to broadcastan arbitrary transaction tx which usually contains thetransaction payload and a signature of the client fordissemination

bull B larr deliver(s) A client calls this to retrieve block Bwith non-negative sequence number s The block con-tains a list of transactions [tx1 txk] and a hash-chain value h representing the block with sequence num-ber sminus1 ie B = ([tx1 txk] h) As the client maycall this multiple times and always returns the same blockonce it is available we say the peer delivers block B withsequence number s when it receives B for the first timeupon invoking deliver(s)

The ordering service ensures that the delivered blocks onone channel are totally ordered More specifically orderingensures the following safety properties for each channel

Agreement For any two blocks B delivered with sequencenumber s and Bprime delivered with sprime at correct peers suchthat s = sprime it holds B = Bprime

Hashchain integrity If some correct peer delivers a block Bwith number s and another correct peer delivers blockBprime = ([tx1 txk] h

prime) with number s + 1 then itholds hprime = H(B) where H(middot) denotes the cryptographichash function

No skipping If a correct peer p delivers a block with num-ber s gt 0 then for each i = 0 s minus 1 peer p hasalready delivered a block with number i

No creation When a correct peer delivers block B withnumber s then for every tx isin B some client has alreadybroadcast tx

For liveness the ordering service supports at least thefollowing ldquoeventualrdquo property

Validity If a correct client invokes broadcast(tx) then ev-ery correct peer eventually delivers a block B that in-cludes tx with some sequence number

However every individual ordering implementation is al-lowed to come with its own liveness and fairness guaranteeswith respect to client requests

Since there may be a large number of peers in the block-chain network but only relatively few nodes are expectedto implement the ordering service Fabric can be config-ured to use a built-in gossip service for disseminating deliv-ered blocks from the ordering service to all peers (Sec 43)The implementation of gossip is scalable and agnostic to theparticular implementation of the ordering service hence itworks with both CFT and BFT ordering services ensuringthe modularity of Fabric

The ordering service may also perform access controlchecks to see if a client is allowed to broadcast messagesor receive blocks on a given channel This and other featuresof the ordering service are further explained in Section 42

7 201821

Discussion on design choices It is very important that theordering service does not maintain any state of the block-chain and neither validates nor executes transactions Thisarchitecture is a crucial defining feature of Fabric andmakes Fabric the first blockchain system to totally sepa-rate consensus from execution and validation This makesconsensus as modular as possible and enables an ecosystemof consensus protocols implementing the ordering service

34 Validation PhaseBlocks are delivered to peers either via a direct connectionto the ordering service or through gossip When a new blockarrives it enters the validation phase consisting of threesequential steps

1 The endorsement policy evaluation occurs in parallelfor all transactions within the block The evaluation isthe task of the so-called validation system chaincode(VSCC) a static library that is part of the blockchainrsquosconfiguration and is responsible for validating the en-dorsement with respect to the endorsement policy config-ured for the chaincode (see Sec 46) If the endorsementis not satisfied the transaction is marked as invalid andits effects are disregarded

2 A read-write conflict check is done for all transactions inthe block sequentially For each transaction it comparesthe versions of the keys in the readset field to those in thecurrent state of the ledger as stored locally by the peerand ensures they are still the same If the versions do notmatch the transaction is marked as invalid and its effectsare disregarded

3 The ledger update phase runs last in which the block isappended to the locally stored ledger and the blockchainstate is updated In particular when adding the blockto the ledger the results of the validity checks in thefirst two steps are persisted as well in the form of a bitmask denoting the transactions that are valid within theblock This facilitates the reconstruction of the state at alater time Furthermore all state updates are applied bywriting all key-value pairs in writeset to the local state

The default VSCC in Fabric allows monotone logical ex-pressions over the set of endorsers configured for a chain-code to be expressed The VSCC evaluation verifies that theset of peers as expressed through valid signatures on en-dorsements of the transaction satisfy the expression Differ-ent VSCC policies can be configured statically however

Discussion on design choices The ledger of Fabric con-tains all transactions including those that are deemed in-valid This follows from the overall design because orderingservice which is agnostic to chaincode state produces thechain of the blocks and because the validation is done by thepeers post-consensus This feature is needed in certain usecases that require tracking of invalid transactions during sub-sequent audits and stands in contrast to other blockchains

Committer

Peer

Endorser

Ledger

Gossip

KVS

Validation configuration sys chaincode

Block store peer trans manager (PTM)

LevelDB CouchDB

Peer-to-peer block gossip

Chaincode execution containers

Figure 5 Components of a Fabric peer

(eg Bitcoin and Ethereum) where the ledger contains onlyvalid transactions

4 Fabric ComponentsFabric is written in Go and uses the gRPC framework(httpgrpcio) for communication between clientspeers and orderers In the following we describe some im-portant components in more detail Figure 5 shows the com-ponents of a peer

41 Membership ServiceThe membership service provider (MSP) maintains the iden-tities of all nodes in the system (clients peers and order-ers) and is responsible for issuing node credentials that areused for authentication and authorization Since Fabric ispermissioned all interactions among nodes occur throughmessages that are authenticated typically with digital sig-natures The membership service comprises a componentat each node where it may authenticate transactions ver-ify the integrity of transactions sign endorsements validateendorsements and authenticate other blockchain operationsTools for key management and registration of nodes are alsopart of the MSP

The MSP is an abstraction for which different instantia-tions are possible The default MSP implementation in Fab-ric handles standard PKI methods for authentication basedon digital signatures and can accommodate commercial cer-tification authorities (CAs) A stand-alone CA is provided aswell with Fabric called Fabric-CA Furthermore alternativeMSP implementations are envisaged such as one relying onanonymous credentials for authorizing a client to invoke atransaction without linking this to an identity [10]

Fabric allows two modes for setting up a blockchain net-work In offline mode credentials are generated by a CAand distributed out-of-band to all nodes Peers and order-ers can only be registered in offline mode For enrollingclients Fabric-CA provides an online mode that issues cryp-tographic credentials to them The MSP configuration mustensure that all nodes especially all peers recognize the sameidentities and authentications as valid

8 201821

The MSP permits identity federation for example whenmultiple organizations operate a blockchain network Eachorganization issues identities to its own members and everypeer recognizes members of all organizations This can beachieved with multiple MSP instantiations for example bycreating a mapping between each organization and an MSP

42 Ordering ServiceThe ordering service manages multiple channels On everychannel it provides the following services

1 Atomic broadcast for establishing order on transactionsimplementing the broadcast and deliver calls (Sec 33)

2 Reconfiguration of a channel when its members mod-ify the channel by broadcasting a configuration updatetransaction (Sec 46)

3 Optionally access control in those configurations wherethe ordering service acts as a trusted entity restrictingbroadcasting of transactions and receiving of blocks tospecified clients and peers

The ordering service is bootstrapped with a genesis blockon the system channel This block carries a configurationtransaction that defines the properties of the ordering ser-vice

The current production implementation consists of or-dering-service nodes (OSNs) that implement the opera-tions described here and communicate through the sys-tem channel The actual atomic broadcast function is pro-vided by an instance of Apache Kafka (httpkafkaapacheorg) which offers scalable publish-subscribe mes-saging and strong consistency despite node crashes basedon ZooKeeper Kafka may run on physical nodes separatefrom the OSNs The OSNs act as proxies between the peersand Kafka

An OSN directly injects a newly received transaction tothe atomic broadcast (eg to the Kafka broker) On the otherhand the nodes batch transactions received from the atomicbroadcast and form blocks A block is cut as soon as one ofthree conditions is met (1) the block contains the specifiedmaximal number of transactions (2) the block has reached amaximal size (in bytes) or (3) an amount of time has elapsedsince the first transaction of a new block was received asexplained below

This batching process is deterministic and therefore pro-duces the same blocks at all nodes It is easy to see thatthe first two conditions are trivially deterministic given thestream of transactions received from the atomic broadcastTo ensure deterministic block production in the third casea node starts a timer when it reads the first transaction ina block from the atomic broadcast If the block is not yetcut when the timer expires the node broadcasts a specialtime-to-cut transaction on the channel which indicates thesequence number of the block which it intends to cut On theother hand every node immediately cuts a new block upon

receiving the first time-to-cut transaction for the given blocknumber Since this transaction is atomically delivered to allconnected nodes they all include the same list of transac-tions in the block (To deploy this scheme in the presence off Byzantine-faulty OSNs the block is cut only when receiv-ing f + 1 time-to-cut transactions)

The orderers persist a range of the most recently deliveredblocks directly to their filesystem so they can answer topeers retrieving blocks through deliver

The orderer using Kafka is one of three ordering serviceimplementations currently available A centralized orderercalled Solo runs on one node and is used for development Aproof-of-concept orderer based on BFT-SMaRt [3] has alsobeen made available [34] it ensures the atomic broadcastservice but not yet reconfiguration and access control Thisillustrates the modularity of consensus in Fabric

43 Peer GossipOne advantage of separating the execution ordering andvalidation phases is that they can be scaled independentlyHowever since most consensus algorithms (in the CFT andBFT models) are bandwidth-bound the throughput of the or-dering service is capped by the network capacity of its nodesConsensus cannot be scaled up by adding more nodes [1436] rather throughput will decrease However since order-ing and validation are decoupled we are interested in ef-ficiently broadcasting the execution results to all peers forvalidation after the ordering phase This is precisely thegoal of the gossip component which utilizes epidemic mul-ticast [15] for this purpose The blocks are signed by theordering service This means that a peer can upon receiv-ing all blocks independently assemble the blockchain andverify its integrity

Dissemination through gossip is robust and resistant tothe node failures in contrast to overlay networks Makingrandom selections allows gossip to reduce the overhead ofmaintaining connectivity among peers and moreover re-duces the attack surface Gossip works well in the permis-sioned environment where it can withstand sybil attacks andmessage forgery

The communication layer for gossip is based on gRPCand utilizes TLS with mutual authentication which enableseach side to bind the TLS credentials to the identity of theremote peer The gossip component maintains an up-to-datemembership view of the online peers in the system All peersindependently build a local view from periodically dissem-inated membership data Furthermore a peer can reconnectto the view after a crash or a network outage

The main purpose of gossip is to reliably distribute mes-sages ie blocks from the ordering phase among the peerstaking advantage of a push-pull protocol It uses two phasesfor information dissemination during push each peer se-lects a random set of active neighbors from the membershipview and forwards them the message during pull each peerperiodically probes a set of randomly selected peers and re-

9 201821

quests missing messages It has been shown [15 22] thatusing both methods in tandem is crucial to optimally utilizethe available bandwidth and to ensure that all peers receiveall messages with high probability

In order to reduce the load of sending blocks from the or-dering nodes to the network the protocol also elects a leaderpeer that pulls blocks from the ordering service on their be-half and initiates the gossip distribution This mechanism isresilient to leader failures

Another task of gossip is to transfer the state to newlyjoining peers and peers that were disconnected for a longtime They need to receive all blocks in the chain This fea-ture relies on the fact that the largest block-sequence num-ber stored by each peer is disseminated with the membershipdata

44 LedgerThe ledger component at each peer maintains the ledgerand the blockchain state on persistent storage and enablessimulation validation and ledger-update phases Broadlyit consists of a block store and a peer transaction manager(PTM)

Ledger block store The ledger block store persists trans-action blocks and is implemented as a set of append-onlyfiles Since the blocks are immutable and arrive in a defi-nite order an append-only structure gives maximum perfor-mance In addition the block store maintains a few indicesfor random access to a block or to a transaction in a block

Peer transaction manager (PTM) The PTM maintainsthe latest state in a versioned key-value store It stores onetuple of the form (key val ver) for each unique entry keystored by any chaincode containing its most recently storedvalue val and its latest version ver The version consistsof the block sequence number and the sequence number ofthe transaction (that stores the entry) within the block Thismakes the version unique and monotonically increasing

The PTM uses a local key-value store to realize the ver-sioned key-value store implemented by a LevelDB key-value database implemented in Go (httpsgithubcomsyndtrgoleveldb) or Apache CouchDB (httpcouchdbapacheorg)

During simulation the PTM provides a stable snapshotof the latest state to the transaction As mentioned in Sec-tion 32 the PTM records in readset a tuple (key ver) foreach entry accessed by GetState and in writeset a tuple(key val) for each entry updated with PutState by the trans-action In addition the PTM supports range queries forwhich it computes a cryptographic hash of the query results(a set of tuples (key ver)) and adds the query string itself andthe hash to readset

For transaction validation (Sec 34) the PTM validatesall transactions in a block sequentially This checks whethera transaction conflicts with any preceding transaction (withinthe block or earlier) For any key in readset if the version

recorded in readset differs from the version present in thelatest state (assuming that all preceding valid transactionsare committed) then the PTM marks the transaction as in-valid For range queries the PTM re-executes the query andcompares the hash with the one present in readset to ensurethat no phantom reads occur This read-write conflict seman-tics results in one-copy serializability [23]

The ledger component tolerates a crash of the peer duringthe ledger update as follows After receiving a new block thePTM has already performed validation and marked transac-tions as valid or invalid within the block using a bit mask asmentioned in Section 34 The ledger now writes the blockto the ledger block store flushes it to disk and subsequentlyupdates the block store indices Then the PTM applies thestate changes from writeset of all valid transactions to the lo-cal versioned store Finally it computes and persists a valuesavepoint which denotes the largest successfully appliedblock number The value savepoint is used to recover theindices and the latest state from the persisted blocks whenrecovering from a crash

45 Chaincode ExecutionChaincode is executed within an environment that is looselycoupled with the rest of the peer and that supports pluginsfor adding new languages for programming chaincodes Cur-rently three languages are supported for chaincode Go Javaand Node

Every user-level or application chaincode runs in a sepa-rate process within a Docker container environment whichisolates the chaincodes from each other and from the peerThis also simplifies the management of the lifecycle forchaincodes (ie starting stopping or aborting chaincode)The chaincode and the peer communicate using gRPC mes-sages Through this loose coupling the peer is agnostic ofthe actual language in which chaincode is implemented

In contrast to application chaincode system chaincoderuns directly in the peer process System chaincode canimplement specific functions needed by Fabric and may beused in situations where the isolation among user chaincodesis overly restrictive More details on system chaincodes aregiven in the next section

46 Configuration and System ChaincodesFabricrsquos basic behavior is customized through channel con-figuration and through special chaincodes known as systemchaincodes

Channel configuration Recall that a channel forms onelogical blockchain The configuration of a channel is main-tained in metadata persisted in special configuration blocksEach configuration block contains the full channel config-uration and does not contain any other transactions Eachblockchain begins with a configuration block known as thegenesis block which is used to bootstrap the channel Thechannel configuration includes

10 201821

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

client endorsingpeer 1

endorsingpeer 2

endorsingpeer 3

Peer(non-endorsing)

ord

ering service

orderers

Invocation Commit

1 1 1

2

3

4 4

55

5 5

1 Chaincode execution

2 Endorsementcollection

34Ordering

BroadcastDelivery5 Validation

Figure 3 Fabric high level transaction flow

natures of the endorsing peers that computed them Or-derers are entirely unaware of the application state anddo not participate in the execution nor in the validationof transactions This design choice renders consensus inFabric as modular as possible and simplifies replacementof consensus protocols in Fabric

Since it is possible to run one physical node with multipleroles Fabric can also be operated like a traditional peer-to-peer blockchain system in which every node maintainsthe state and invokes validates and orders transactions Thetransaction flow in Fabric across the different types of nodesis depicted in Fig 3

Compared to the description focusing on one singleblockchain so far a Fabric network actually supports mul-tiple blockchains connected to the same ordering serviceEach such blockchain is called a channel and may havedifferent peers as its members Channels can be used topartition the state of the blockchain network but consen-sus across channels is not coordinated and the total order oftransactions in each channel is separate from the others Cer-tain deployments that consider all orderers as trusted mayalso implement by-channel access control for peers In thefollowing we mention channels only briefly and concentrateon one single channel

The next three sections explain the transaction flow inFabric and illustrate the steps of the execution ordering andvalidation phases A Fabric network is shown in Fig 4

32 Execution PhaseIn the execution phase clients send the transaction proposal(or simply proposal) to one or more endorsers for execu-tion Recall that every chaincode implicitly specifies a set ofendorsers via the endorsement policy A proposal containsthe identity of the submitting client (according to the MSP)the transaction payload in the form of an operation to exe-cute parameters and the identifier of the chaincode to whichit belongs a nonce to be used only once by each client (suchas a counter or a random value) and a transaction identifierderived from the client identifier and the nonce The clientalso signs the proposal

Appl

MSP

P

SDK

P P P P P P

SDK

OSN

P

OSNOSNOSNOSN

Client

Orderingservice

Peer-to-peer gossip

Peers (P)

Client

Appl

Appl

Figure 4 A Fabric network with federated MSPs and run-ning multiple (differently shaded and colored) chaincodesselectively installed on peers according to policy

The endorsers simulate the proposal by executing the op-eration on the specified chaincode which has been installedon the blockchain The chaincode runs in a Docker containerisolated from the main endorser process

A proposal is simulated against the endorserrsquos localblockchain state without any synchronization with otherpeers at this point moreover endorsers do not persist theresults of the simulation to the ledger state The state ofthe blockchain is maintained by the peer transaction man-ager (PTM) in the form of a versioned key-value store inwhich successive updates to a key have monotonically in-creasing version numbers (Sec 44) The state created by achaincode is scoped exclusively to that chaincode and can-not be accessed directly by another chaincode Note that thechaincode is not supposed to maintain the local state in theprogram code only what it maintains in the blockchain statethat is accessed with GetState PutState and DelState oper-ations Given the appropriate permission a chaincode mayinvoke another chaincode to access its state within the samechannel

As a result of the simulation each endorser produces avalue writeset consisting of the state updates produced bysimulation (ie the modified keys along with their new val-ues) as well as a readset representing the version dependen-cies of the proposal simulation (ie all keys read during sim-ulation along with their version numbers) After the simula-tion the endorser cryptographically signs a message calledendorsement which contains readset and writeset (togetherwith metadata such as transaction ID endorser ID and en-dorser signature) and sends it back to the client in a proposalresponse The client collects endorsements until they satisfythe endorsement policy of the chaincode which the trans-action invokes (see Sec 34) In particular this requires allendorsers as determined by the policy to produce the sameexecution result (ie identical readset and writeset) Thenthe client proceeds to create the transaction and passes it tothe ordering service

6 201821

Discussion on design choices As the endorsers simulatethe proposal without synchronizing with other endorserstwo endorsers may execute it on different states of the ledgerand produce different outputs For the standard endorse-ment policy which requires multiple endorsers to producethe same result this implies that under high contention ofoperations accessing the same keys a client may not beable to satisfy the endorsement policy This is a new ele-ment compared to primary-backup replication in replicateddatabases with synchronization through middleware [24] aconsequence of the assumption that no single peer is trustedfor correct execution in a blockchain

We consciously adopted this design as it considerablysimplifies the architecture and is adequate for typical block-chain applications As demonstrated by the approach of Bit-coin distributed applications can be formulated such thatcontention by operations accessing the same state can be re-duced or eliminated completely in the normal case (eg inBitcoin two operations that modify the same ldquoobjectrdquo arenot allowed and represent a double-spending attack [28])

Executing a transaction before the ordering phase is crit-ical to tolerating non-deterministic chaincodes as discussedin Section 2 A chaincode in Fabric with non-deterministictransactions can only endanger the liveness of its own oper-ations because a client cannot gather a sufficient number ofendorsements for instance This is much more acceptable inpractice than the situation in an order-execute architecturewhere non-deterministic operations lead to inconsistenciesin the state of the peers

Finally tolerating non-deterministic execution also ad-dresses DoS attacks from untrusted chaincode as an endorsercan simply abort an execution according to a local policy ifit suspects a DoS attack This will not endanger the consis-tency of the system and again such unilateral abortion ofexecution is not possible in order-execute architectures

33 Ordering PhaseWhen a client has collected enough endorsements on a pro-posal it assembles a transaction and submits this to the or-dering service The transaction contains the transaction pay-load (ie the chaincode operation including parameters)transaction metadata and a set of endorsements The order-ing phase establishes a total order on all submitted transac-tions per channel In other words ordering atomically broad-casts [7] endorsements and thereby establishes consensus ontransactions despite faulty orderers Moreover the orderingservice batches multiple transactions into blocks and outputsa hash-chained sequence of blocks containing transactionsGrouping or batching transactions into blocks improves thethroughput of the broadcast protocol which is a well-knowntechnique in the context of fault-tolerant broadcasts

At a high level the interface of the ordering service onlysupports the following two operations These operations areinvoked by a peer and implicitly parameterized by a channelidentifier

bull broadcast(tx) A client calls this operation to broadcastan arbitrary transaction tx which usually contains thetransaction payload and a signature of the client fordissemination

bull B larr deliver(s) A client calls this to retrieve block Bwith non-negative sequence number s The block con-tains a list of transactions [tx1 txk] and a hash-chain value h representing the block with sequence num-ber sminus1 ie B = ([tx1 txk] h) As the client maycall this multiple times and always returns the same blockonce it is available we say the peer delivers block B withsequence number s when it receives B for the first timeupon invoking deliver(s)

The ordering service ensures that the delivered blocks onone channel are totally ordered More specifically orderingensures the following safety properties for each channel

Agreement For any two blocks B delivered with sequencenumber s and Bprime delivered with sprime at correct peers suchthat s = sprime it holds B = Bprime

Hashchain integrity If some correct peer delivers a block Bwith number s and another correct peer delivers blockBprime = ([tx1 txk] h

prime) with number s + 1 then itholds hprime = H(B) where H(middot) denotes the cryptographichash function

No skipping If a correct peer p delivers a block with num-ber s gt 0 then for each i = 0 s minus 1 peer p hasalready delivered a block with number i

No creation When a correct peer delivers block B withnumber s then for every tx isin B some client has alreadybroadcast tx

For liveness the ordering service supports at least thefollowing ldquoeventualrdquo property

Validity If a correct client invokes broadcast(tx) then ev-ery correct peer eventually delivers a block B that in-cludes tx with some sequence number

However every individual ordering implementation is al-lowed to come with its own liveness and fairness guaranteeswith respect to client requests

Since there may be a large number of peers in the block-chain network but only relatively few nodes are expectedto implement the ordering service Fabric can be config-ured to use a built-in gossip service for disseminating deliv-ered blocks from the ordering service to all peers (Sec 43)The implementation of gossip is scalable and agnostic to theparticular implementation of the ordering service hence itworks with both CFT and BFT ordering services ensuringthe modularity of Fabric

The ordering service may also perform access controlchecks to see if a client is allowed to broadcast messagesor receive blocks on a given channel This and other featuresof the ordering service are further explained in Section 42

7 201821

Discussion on design choices It is very important that theordering service does not maintain any state of the block-chain and neither validates nor executes transactions Thisarchitecture is a crucial defining feature of Fabric andmakes Fabric the first blockchain system to totally sepa-rate consensus from execution and validation This makesconsensus as modular as possible and enables an ecosystemof consensus protocols implementing the ordering service

34 Validation PhaseBlocks are delivered to peers either via a direct connectionto the ordering service or through gossip When a new blockarrives it enters the validation phase consisting of threesequential steps

1 The endorsement policy evaluation occurs in parallelfor all transactions within the block The evaluation isthe task of the so-called validation system chaincode(VSCC) a static library that is part of the blockchainrsquosconfiguration and is responsible for validating the en-dorsement with respect to the endorsement policy config-ured for the chaincode (see Sec 46) If the endorsementis not satisfied the transaction is marked as invalid andits effects are disregarded

2 A read-write conflict check is done for all transactions inthe block sequentially For each transaction it comparesthe versions of the keys in the readset field to those in thecurrent state of the ledger as stored locally by the peerand ensures they are still the same If the versions do notmatch the transaction is marked as invalid and its effectsare disregarded

3 The ledger update phase runs last in which the block isappended to the locally stored ledger and the blockchainstate is updated In particular when adding the blockto the ledger the results of the validity checks in thefirst two steps are persisted as well in the form of a bitmask denoting the transactions that are valid within theblock This facilitates the reconstruction of the state at alater time Furthermore all state updates are applied bywriting all key-value pairs in writeset to the local state

The default VSCC in Fabric allows monotone logical ex-pressions over the set of endorsers configured for a chain-code to be expressed The VSCC evaluation verifies that theset of peers as expressed through valid signatures on en-dorsements of the transaction satisfy the expression Differ-ent VSCC policies can be configured statically however

Discussion on design choices The ledger of Fabric con-tains all transactions including those that are deemed in-valid This follows from the overall design because orderingservice which is agnostic to chaincode state produces thechain of the blocks and because the validation is done by thepeers post-consensus This feature is needed in certain usecases that require tracking of invalid transactions during sub-sequent audits and stands in contrast to other blockchains

Committer

Peer

Endorser

Ledger

Gossip

KVS

Validation configuration sys chaincode

Block store peer trans manager (PTM)

LevelDB CouchDB

Peer-to-peer block gossip

Chaincode execution containers

Figure 5 Components of a Fabric peer

(eg Bitcoin and Ethereum) where the ledger contains onlyvalid transactions

4 Fabric ComponentsFabric is written in Go and uses the gRPC framework(httpgrpcio) for communication between clientspeers and orderers In the following we describe some im-portant components in more detail Figure 5 shows the com-ponents of a peer

41 Membership ServiceThe membership service provider (MSP) maintains the iden-tities of all nodes in the system (clients peers and order-ers) and is responsible for issuing node credentials that areused for authentication and authorization Since Fabric ispermissioned all interactions among nodes occur throughmessages that are authenticated typically with digital sig-natures The membership service comprises a componentat each node where it may authenticate transactions ver-ify the integrity of transactions sign endorsements validateendorsements and authenticate other blockchain operationsTools for key management and registration of nodes are alsopart of the MSP

The MSP is an abstraction for which different instantia-tions are possible The default MSP implementation in Fab-ric handles standard PKI methods for authentication basedon digital signatures and can accommodate commercial cer-tification authorities (CAs) A stand-alone CA is provided aswell with Fabric called Fabric-CA Furthermore alternativeMSP implementations are envisaged such as one relying onanonymous credentials for authorizing a client to invoke atransaction without linking this to an identity [10]

Fabric allows two modes for setting up a blockchain net-work In offline mode credentials are generated by a CAand distributed out-of-band to all nodes Peers and order-ers can only be registered in offline mode For enrollingclients Fabric-CA provides an online mode that issues cryp-tographic credentials to them The MSP configuration mustensure that all nodes especially all peers recognize the sameidentities and authentications as valid

8 201821

The MSP permits identity federation for example whenmultiple organizations operate a blockchain network Eachorganization issues identities to its own members and everypeer recognizes members of all organizations This can beachieved with multiple MSP instantiations for example bycreating a mapping between each organization and an MSP

42 Ordering ServiceThe ordering service manages multiple channels On everychannel it provides the following services

1 Atomic broadcast for establishing order on transactionsimplementing the broadcast and deliver calls (Sec 33)

2 Reconfiguration of a channel when its members mod-ify the channel by broadcasting a configuration updatetransaction (Sec 46)

3 Optionally access control in those configurations wherethe ordering service acts as a trusted entity restrictingbroadcasting of transactions and receiving of blocks tospecified clients and peers

The ordering service is bootstrapped with a genesis blockon the system channel This block carries a configurationtransaction that defines the properties of the ordering ser-vice

The current production implementation consists of or-dering-service nodes (OSNs) that implement the opera-tions described here and communicate through the sys-tem channel The actual atomic broadcast function is pro-vided by an instance of Apache Kafka (httpkafkaapacheorg) which offers scalable publish-subscribe mes-saging and strong consistency despite node crashes basedon ZooKeeper Kafka may run on physical nodes separatefrom the OSNs The OSNs act as proxies between the peersand Kafka

An OSN directly injects a newly received transaction tothe atomic broadcast (eg to the Kafka broker) On the otherhand the nodes batch transactions received from the atomicbroadcast and form blocks A block is cut as soon as one ofthree conditions is met (1) the block contains the specifiedmaximal number of transactions (2) the block has reached amaximal size (in bytes) or (3) an amount of time has elapsedsince the first transaction of a new block was received asexplained below

This batching process is deterministic and therefore pro-duces the same blocks at all nodes It is easy to see thatthe first two conditions are trivially deterministic given thestream of transactions received from the atomic broadcastTo ensure deterministic block production in the third casea node starts a timer when it reads the first transaction ina block from the atomic broadcast If the block is not yetcut when the timer expires the node broadcasts a specialtime-to-cut transaction on the channel which indicates thesequence number of the block which it intends to cut On theother hand every node immediately cuts a new block upon

receiving the first time-to-cut transaction for the given blocknumber Since this transaction is atomically delivered to allconnected nodes they all include the same list of transac-tions in the block (To deploy this scheme in the presence off Byzantine-faulty OSNs the block is cut only when receiv-ing f + 1 time-to-cut transactions)

The orderers persist a range of the most recently deliveredblocks directly to their filesystem so they can answer topeers retrieving blocks through deliver

The orderer using Kafka is one of three ordering serviceimplementations currently available A centralized orderercalled Solo runs on one node and is used for development Aproof-of-concept orderer based on BFT-SMaRt [3] has alsobeen made available [34] it ensures the atomic broadcastservice but not yet reconfiguration and access control Thisillustrates the modularity of consensus in Fabric

43 Peer GossipOne advantage of separating the execution ordering andvalidation phases is that they can be scaled independentlyHowever since most consensus algorithms (in the CFT andBFT models) are bandwidth-bound the throughput of the or-dering service is capped by the network capacity of its nodesConsensus cannot be scaled up by adding more nodes [1436] rather throughput will decrease However since order-ing and validation are decoupled we are interested in ef-ficiently broadcasting the execution results to all peers forvalidation after the ordering phase This is precisely thegoal of the gossip component which utilizes epidemic mul-ticast [15] for this purpose The blocks are signed by theordering service This means that a peer can upon receiv-ing all blocks independently assemble the blockchain andverify its integrity

Dissemination through gossip is robust and resistant tothe node failures in contrast to overlay networks Makingrandom selections allows gossip to reduce the overhead ofmaintaining connectivity among peers and moreover re-duces the attack surface Gossip works well in the permis-sioned environment where it can withstand sybil attacks andmessage forgery

The communication layer for gossip is based on gRPCand utilizes TLS with mutual authentication which enableseach side to bind the TLS credentials to the identity of theremote peer The gossip component maintains an up-to-datemembership view of the online peers in the system All peersindependently build a local view from periodically dissem-inated membership data Furthermore a peer can reconnectto the view after a crash or a network outage

The main purpose of gossip is to reliably distribute mes-sages ie blocks from the ordering phase among the peerstaking advantage of a push-pull protocol It uses two phasesfor information dissemination during push each peer se-lects a random set of active neighbors from the membershipview and forwards them the message during pull each peerperiodically probes a set of randomly selected peers and re-

9 201821

quests missing messages It has been shown [15 22] thatusing both methods in tandem is crucial to optimally utilizethe available bandwidth and to ensure that all peers receiveall messages with high probability

In order to reduce the load of sending blocks from the or-dering nodes to the network the protocol also elects a leaderpeer that pulls blocks from the ordering service on their be-half and initiates the gossip distribution This mechanism isresilient to leader failures

Another task of gossip is to transfer the state to newlyjoining peers and peers that were disconnected for a longtime They need to receive all blocks in the chain This fea-ture relies on the fact that the largest block-sequence num-ber stored by each peer is disseminated with the membershipdata

44 LedgerThe ledger component at each peer maintains the ledgerand the blockchain state on persistent storage and enablessimulation validation and ledger-update phases Broadlyit consists of a block store and a peer transaction manager(PTM)

Ledger block store The ledger block store persists trans-action blocks and is implemented as a set of append-onlyfiles Since the blocks are immutable and arrive in a defi-nite order an append-only structure gives maximum perfor-mance In addition the block store maintains a few indicesfor random access to a block or to a transaction in a block

Peer transaction manager (PTM) The PTM maintainsthe latest state in a versioned key-value store It stores onetuple of the form (key val ver) for each unique entry keystored by any chaincode containing its most recently storedvalue val and its latest version ver The version consistsof the block sequence number and the sequence number ofthe transaction (that stores the entry) within the block Thismakes the version unique and monotonically increasing

The PTM uses a local key-value store to realize the ver-sioned key-value store implemented by a LevelDB key-value database implemented in Go (httpsgithubcomsyndtrgoleveldb) or Apache CouchDB (httpcouchdbapacheorg)

During simulation the PTM provides a stable snapshotof the latest state to the transaction As mentioned in Sec-tion 32 the PTM records in readset a tuple (key ver) foreach entry accessed by GetState and in writeset a tuple(key val) for each entry updated with PutState by the trans-action In addition the PTM supports range queries forwhich it computes a cryptographic hash of the query results(a set of tuples (key ver)) and adds the query string itself andthe hash to readset

For transaction validation (Sec 34) the PTM validatesall transactions in a block sequentially This checks whethera transaction conflicts with any preceding transaction (withinthe block or earlier) For any key in readset if the version

recorded in readset differs from the version present in thelatest state (assuming that all preceding valid transactionsare committed) then the PTM marks the transaction as in-valid For range queries the PTM re-executes the query andcompares the hash with the one present in readset to ensurethat no phantom reads occur This read-write conflict seman-tics results in one-copy serializability [23]

The ledger component tolerates a crash of the peer duringthe ledger update as follows After receiving a new block thePTM has already performed validation and marked transac-tions as valid or invalid within the block using a bit mask asmentioned in Section 34 The ledger now writes the blockto the ledger block store flushes it to disk and subsequentlyupdates the block store indices Then the PTM applies thestate changes from writeset of all valid transactions to the lo-cal versioned store Finally it computes and persists a valuesavepoint which denotes the largest successfully appliedblock number The value savepoint is used to recover theindices and the latest state from the persisted blocks whenrecovering from a crash

45 Chaincode ExecutionChaincode is executed within an environment that is looselycoupled with the rest of the peer and that supports pluginsfor adding new languages for programming chaincodes Cur-rently three languages are supported for chaincode Go Javaand Node

Every user-level or application chaincode runs in a sepa-rate process within a Docker container environment whichisolates the chaincodes from each other and from the peerThis also simplifies the management of the lifecycle forchaincodes (ie starting stopping or aborting chaincode)The chaincode and the peer communicate using gRPC mes-sages Through this loose coupling the peer is agnostic ofthe actual language in which chaincode is implemented

In contrast to application chaincode system chaincoderuns directly in the peer process System chaincode canimplement specific functions needed by Fabric and may beused in situations where the isolation among user chaincodesis overly restrictive More details on system chaincodes aregiven in the next section

46 Configuration and System ChaincodesFabricrsquos basic behavior is customized through channel con-figuration and through special chaincodes known as systemchaincodes

Channel configuration Recall that a channel forms onelogical blockchain The configuration of a channel is main-tained in metadata persisted in special configuration blocksEach configuration block contains the full channel config-uration and does not contain any other transactions Eachblockchain begins with a configuration block known as thegenesis block which is used to bootstrap the channel Thechannel configuration includes

10 201821

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

Discussion on design choices As the endorsers simulatethe proposal without synchronizing with other endorserstwo endorsers may execute it on different states of the ledgerand produce different outputs For the standard endorse-ment policy which requires multiple endorsers to producethe same result this implies that under high contention ofoperations accessing the same keys a client may not beable to satisfy the endorsement policy This is a new ele-ment compared to primary-backup replication in replicateddatabases with synchronization through middleware [24] aconsequence of the assumption that no single peer is trustedfor correct execution in a blockchain

We consciously adopted this design as it considerablysimplifies the architecture and is adequate for typical block-chain applications As demonstrated by the approach of Bit-coin distributed applications can be formulated such thatcontention by operations accessing the same state can be re-duced or eliminated completely in the normal case (eg inBitcoin two operations that modify the same ldquoobjectrdquo arenot allowed and represent a double-spending attack [28])

Executing a transaction before the ordering phase is crit-ical to tolerating non-deterministic chaincodes as discussedin Section 2 A chaincode in Fabric with non-deterministictransactions can only endanger the liveness of its own oper-ations because a client cannot gather a sufficient number ofendorsements for instance This is much more acceptable inpractice than the situation in an order-execute architecturewhere non-deterministic operations lead to inconsistenciesin the state of the peers

Finally tolerating non-deterministic execution also ad-dresses DoS attacks from untrusted chaincode as an endorsercan simply abort an execution according to a local policy ifit suspects a DoS attack This will not endanger the consis-tency of the system and again such unilateral abortion ofexecution is not possible in order-execute architectures

33 Ordering PhaseWhen a client has collected enough endorsements on a pro-posal it assembles a transaction and submits this to the or-dering service The transaction contains the transaction pay-load (ie the chaincode operation including parameters)transaction metadata and a set of endorsements The order-ing phase establishes a total order on all submitted transac-tions per channel In other words ordering atomically broad-casts [7] endorsements and thereby establishes consensus ontransactions despite faulty orderers Moreover the orderingservice batches multiple transactions into blocks and outputsa hash-chained sequence of blocks containing transactionsGrouping or batching transactions into blocks improves thethroughput of the broadcast protocol which is a well-knowntechnique in the context of fault-tolerant broadcasts

At a high level the interface of the ordering service onlysupports the following two operations These operations areinvoked by a peer and implicitly parameterized by a channelidentifier

bull broadcast(tx) A client calls this operation to broadcastan arbitrary transaction tx which usually contains thetransaction payload and a signature of the client fordissemination

bull B larr deliver(s) A client calls this to retrieve block Bwith non-negative sequence number s The block con-tains a list of transactions [tx1 txk] and a hash-chain value h representing the block with sequence num-ber sminus1 ie B = ([tx1 txk] h) As the client maycall this multiple times and always returns the same blockonce it is available we say the peer delivers block B withsequence number s when it receives B for the first timeupon invoking deliver(s)

The ordering service ensures that the delivered blocks onone channel are totally ordered More specifically orderingensures the following safety properties for each channel

Agreement For any two blocks B delivered with sequencenumber s and Bprime delivered with sprime at correct peers suchthat s = sprime it holds B = Bprime

Hashchain integrity If some correct peer delivers a block Bwith number s and another correct peer delivers blockBprime = ([tx1 txk] h

prime) with number s + 1 then itholds hprime = H(B) where H(middot) denotes the cryptographichash function

No skipping If a correct peer p delivers a block with num-ber s gt 0 then for each i = 0 s minus 1 peer p hasalready delivered a block with number i

No creation When a correct peer delivers block B withnumber s then for every tx isin B some client has alreadybroadcast tx

For liveness the ordering service supports at least thefollowing ldquoeventualrdquo property

Validity If a correct client invokes broadcast(tx) then ev-ery correct peer eventually delivers a block B that in-cludes tx with some sequence number

However every individual ordering implementation is al-lowed to come with its own liveness and fairness guaranteeswith respect to client requests

Since there may be a large number of peers in the block-chain network but only relatively few nodes are expectedto implement the ordering service Fabric can be config-ured to use a built-in gossip service for disseminating deliv-ered blocks from the ordering service to all peers (Sec 43)The implementation of gossip is scalable and agnostic to theparticular implementation of the ordering service hence itworks with both CFT and BFT ordering services ensuringthe modularity of Fabric

The ordering service may also perform access controlchecks to see if a client is allowed to broadcast messagesor receive blocks on a given channel This and other featuresof the ordering service are further explained in Section 42

7 201821

Discussion on design choices It is very important that theordering service does not maintain any state of the block-chain and neither validates nor executes transactions Thisarchitecture is a crucial defining feature of Fabric andmakes Fabric the first blockchain system to totally sepa-rate consensus from execution and validation This makesconsensus as modular as possible and enables an ecosystemof consensus protocols implementing the ordering service

34 Validation PhaseBlocks are delivered to peers either via a direct connectionto the ordering service or through gossip When a new blockarrives it enters the validation phase consisting of threesequential steps

1 The endorsement policy evaluation occurs in parallelfor all transactions within the block The evaluation isthe task of the so-called validation system chaincode(VSCC) a static library that is part of the blockchainrsquosconfiguration and is responsible for validating the en-dorsement with respect to the endorsement policy config-ured for the chaincode (see Sec 46) If the endorsementis not satisfied the transaction is marked as invalid andits effects are disregarded

2 A read-write conflict check is done for all transactions inthe block sequentially For each transaction it comparesthe versions of the keys in the readset field to those in thecurrent state of the ledger as stored locally by the peerand ensures they are still the same If the versions do notmatch the transaction is marked as invalid and its effectsare disregarded

3 The ledger update phase runs last in which the block isappended to the locally stored ledger and the blockchainstate is updated In particular when adding the blockto the ledger the results of the validity checks in thefirst two steps are persisted as well in the form of a bitmask denoting the transactions that are valid within theblock This facilitates the reconstruction of the state at alater time Furthermore all state updates are applied bywriting all key-value pairs in writeset to the local state

The default VSCC in Fabric allows monotone logical ex-pressions over the set of endorsers configured for a chain-code to be expressed The VSCC evaluation verifies that theset of peers as expressed through valid signatures on en-dorsements of the transaction satisfy the expression Differ-ent VSCC policies can be configured statically however

Discussion on design choices The ledger of Fabric con-tains all transactions including those that are deemed in-valid This follows from the overall design because orderingservice which is agnostic to chaincode state produces thechain of the blocks and because the validation is done by thepeers post-consensus This feature is needed in certain usecases that require tracking of invalid transactions during sub-sequent audits and stands in contrast to other blockchains

Committer

Peer

Endorser

Ledger

Gossip

KVS

Validation configuration sys chaincode

Block store peer trans manager (PTM)

LevelDB CouchDB

Peer-to-peer block gossip

Chaincode execution containers

Figure 5 Components of a Fabric peer

(eg Bitcoin and Ethereum) where the ledger contains onlyvalid transactions

4 Fabric ComponentsFabric is written in Go and uses the gRPC framework(httpgrpcio) for communication between clientspeers and orderers In the following we describe some im-portant components in more detail Figure 5 shows the com-ponents of a peer

41 Membership ServiceThe membership service provider (MSP) maintains the iden-tities of all nodes in the system (clients peers and order-ers) and is responsible for issuing node credentials that areused for authentication and authorization Since Fabric ispermissioned all interactions among nodes occur throughmessages that are authenticated typically with digital sig-natures The membership service comprises a componentat each node where it may authenticate transactions ver-ify the integrity of transactions sign endorsements validateendorsements and authenticate other blockchain operationsTools for key management and registration of nodes are alsopart of the MSP

The MSP is an abstraction for which different instantia-tions are possible The default MSP implementation in Fab-ric handles standard PKI methods for authentication basedon digital signatures and can accommodate commercial cer-tification authorities (CAs) A stand-alone CA is provided aswell with Fabric called Fabric-CA Furthermore alternativeMSP implementations are envisaged such as one relying onanonymous credentials for authorizing a client to invoke atransaction without linking this to an identity [10]

Fabric allows two modes for setting up a blockchain net-work In offline mode credentials are generated by a CAand distributed out-of-band to all nodes Peers and order-ers can only be registered in offline mode For enrollingclients Fabric-CA provides an online mode that issues cryp-tographic credentials to them The MSP configuration mustensure that all nodes especially all peers recognize the sameidentities and authentications as valid

8 201821

The MSP permits identity federation for example whenmultiple organizations operate a blockchain network Eachorganization issues identities to its own members and everypeer recognizes members of all organizations This can beachieved with multiple MSP instantiations for example bycreating a mapping between each organization and an MSP

42 Ordering ServiceThe ordering service manages multiple channels On everychannel it provides the following services

1 Atomic broadcast for establishing order on transactionsimplementing the broadcast and deliver calls (Sec 33)

2 Reconfiguration of a channel when its members mod-ify the channel by broadcasting a configuration updatetransaction (Sec 46)

3 Optionally access control in those configurations wherethe ordering service acts as a trusted entity restrictingbroadcasting of transactions and receiving of blocks tospecified clients and peers

The ordering service is bootstrapped with a genesis blockon the system channel This block carries a configurationtransaction that defines the properties of the ordering ser-vice

The current production implementation consists of or-dering-service nodes (OSNs) that implement the opera-tions described here and communicate through the sys-tem channel The actual atomic broadcast function is pro-vided by an instance of Apache Kafka (httpkafkaapacheorg) which offers scalable publish-subscribe mes-saging and strong consistency despite node crashes basedon ZooKeeper Kafka may run on physical nodes separatefrom the OSNs The OSNs act as proxies between the peersand Kafka

An OSN directly injects a newly received transaction tothe atomic broadcast (eg to the Kafka broker) On the otherhand the nodes batch transactions received from the atomicbroadcast and form blocks A block is cut as soon as one ofthree conditions is met (1) the block contains the specifiedmaximal number of transactions (2) the block has reached amaximal size (in bytes) or (3) an amount of time has elapsedsince the first transaction of a new block was received asexplained below

This batching process is deterministic and therefore pro-duces the same blocks at all nodes It is easy to see thatthe first two conditions are trivially deterministic given thestream of transactions received from the atomic broadcastTo ensure deterministic block production in the third casea node starts a timer when it reads the first transaction ina block from the atomic broadcast If the block is not yetcut when the timer expires the node broadcasts a specialtime-to-cut transaction on the channel which indicates thesequence number of the block which it intends to cut On theother hand every node immediately cuts a new block upon

receiving the first time-to-cut transaction for the given blocknumber Since this transaction is atomically delivered to allconnected nodes they all include the same list of transac-tions in the block (To deploy this scheme in the presence off Byzantine-faulty OSNs the block is cut only when receiv-ing f + 1 time-to-cut transactions)

The orderers persist a range of the most recently deliveredblocks directly to their filesystem so they can answer topeers retrieving blocks through deliver

The orderer using Kafka is one of three ordering serviceimplementations currently available A centralized orderercalled Solo runs on one node and is used for development Aproof-of-concept orderer based on BFT-SMaRt [3] has alsobeen made available [34] it ensures the atomic broadcastservice but not yet reconfiguration and access control Thisillustrates the modularity of consensus in Fabric

43 Peer GossipOne advantage of separating the execution ordering andvalidation phases is that they can be scaled independentlyHowever since most consensus algorithms (in the CFT andBFT models) are bandwidth-bound the throughput of the or-dering service is capped by the network capacity of its nodesConsensus cannot be scaled up by adding more nodes [1436] rather throughput will decrease However since order-ing and validation are decoupled we are interested in ef-ficiently broadcasting the execution results to all peers forvalidation after the ordering phase This is precisely thegoal of the gossip component which utilizes epidemic mul-ticast [15] for this purpose The blocks are signed by theordering service This means that a peer can upon receiv-ing all blocks independently assemble the blockchain andverify its integrity

Dissemination through gossip is robust and resistant tothe node failures in contrast to overlay networks Makingrandom selections allows gossip to reduce the overhead ofmaintaining connectivity among peers and moreover re-duces the attack surface Gossip works well in the permis-sioned environment where it can withstand sybil attacks andmessage forgery

The communication layer for gossip is based on gRPCand utilizes TLS with mutual authentication which enableseach side to bind the TLS credentials to the identity of theremote peer The gossip component maintains an up-to-datemembership view of the online peers in the system All peersindependently build a local view from periodically dissem-inated membership data Furthermore a peer can reconnectto the view after a crash or a network outage

The main purpose of gossip is to reliably distribute mes-sages ie blocks from the ordering phase among the peerstaking advantage of a push-pull protocol It uses two phasesfor information dissemination during push each peer se-lects a random set of active neighbors from the membershipview and forwards them the message during pull each peerperiodically probes a set of randomly selected peers and re-

9 201821

quests missing messages It has been shown [15 22] thatusing both methods in tandem is crucial to optimally utilizethe available bandwidth and to ensure that all peers receiveall messages with high probability

In order to reduce the load of sending blocks from the or-dering nodes to the network the protocol also elects a leaderpeer that pulls blocks from the ordering service on their be-half and initiates the gossip distribution This mechanism isresilient to leader failures

Another task of gossip is to transfer the state to newlyjoining peers and peers that were disconnected for a longtime They need to receive all blocks in the chain This fea-ture relies on the fact that the largest block-sequence num-ber stored by each peer is disseminated with the membershipdata

44 LedgerThe ledger component at each peer maintains the ledgerand the blockchain state on persistent storage and enablessimulation validation and ledger-update phases Broadlyit consists of a block store and a peer transaction manager(PTM)

Ledger block store The ledger block store persists trans-action blocks and is implemented as a set of append-onlyfiles Since the blocks are immutable and arrive in a defi-nite order an append-only structure gives maximum perfor-mance In addition the block store maintains a few indicesfor random access to a block or to a transaction in a block

Peer transaction manager (PTM) The PTM maintainsthe latest state in a versioned key-value store It stores onetuple of the form (key val ver) for each unique entry keystored by any chaincode containing its most recently storedvalue val and its latest version ver The version consistsof the block sequence number and the sequence number ofthe transaction (that stores the entry) within the block Thismakes the version unique and monotonically increasing

The PTM uses a local key-value store to realize the ver-sioned key-value store implemented by a LevelDB key-value database implemented in Go (httpsgithubcomsyndtrgoleveldb) or Apache CouchDB (httpcouchdbapacheorg)

During simulation the PTM provides a stable snapshotof the latest state to the transaction As mentioned in Sec-tion 32 the PTM records in readset a tuple (key ver) foreach entry accessed by GetState and in writeset a tuple(key val) for each entry updated with PutState by the trans-action In addition the PTM supports range queries forwhich it computes a cryptographic hash of the query results(a set of tuples (key ver)) and adds the query string itself andthe hash to readset

For transaction validation (Sec 34) the PTM validatesall transactions in a block sequentially This checks whethera transaction conflicts with any preceding transaction (withinthe block or earlier) For any key in readset if the version

recorded in readset differs from the version present in thelatest state (assuming that all preceding valid transactionsare committed) then the PTM marks the transaction as in-valid For range queries the PTM re-executes the query andcompares the hash with the one present in readset to ensurethat no phantom reads occur This read-write conflict seman-tics results in one-copy serializability [23]

The ledger component tolerates a crash of the peer duringthe ledger update as follows After receiving a new block thePTM has already performed validation and marked transac-tions as valid or invalid within the block using a bit mask asmentioned in Section 34 The ledger now writes the blockto the ledger block store flushes it to disk and subsequentlyupdates the block store indices Then the PTM applies thestate changes from writeset of all valid transactions to the lo-cal versioned store Finally it computes and persists a valuesavepoint which denotes the largest successfully appliedblock number The value savepoint is used to recover theindices and the latest state from the persisted blocks whenrecovering from a crash

45 Chaincode ExecutionChaincode is executed within an environment that is looselycoupled with the rest of the peer and that supports pluginsfor adding new languages for programming chaincodes Cur-rently three languages are supported for chaincode Go Javaand Node

Every user-level or application chaincode runs in a sepa-rate process within a Docker container environment whichisolates the chaincodes from each other and from the peerThis also simplifies the management of the lifecycle forchaincodes (ie starting stopping or aborting chaincode)The chaincode and the peer communicate using gRPC mes-sages Through this loose coupling the peer is agnostic ofthe actual language in which chaincode is implemented

In contrast to application chaincode system chaincoderuns directly in the peer process System chaincode canimplement specific functions needed by Fabric and may beused in situations where the isolation among user chaincodesis overly restrictive More details on system chaincodes aregiven in the next section

46 Configuration and System ChaincodesFabricrsquos basic behavior is customized through channel con-figuration and through special chaincodes known as systemchaincodes

Channel configuration Recall that a channel forms onelogical blockchain The configuration of a channel is main-tained in metadata persisted in special configuration blocksEach configuration block contains the full channel config-uration and does not contain any other transactions Eachblockchain begins with a configuration block known as thegenesis block which is used to bootstrap the channel Thechannel configuration includes

10 201821

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

Discussion on design choices It is very important that theordering service does not maintain any state of the block-chain and neither validates nor executes transactions Thisarchitecture is a crucial defining feature of Fabric andmakes Fabric the first blockchain system to totally sepa-rate consensus from execution and validation This makesconsensus as modular as possible and enables an ecosystemof consensus protocols implementing the ordering service

34 Validation PhaseBlocks are delivered to peers either via a direct connectionto the ordering service or through gossip When a new blockarrives it enters the validation phase consisting of threesequential steps

1 The endorsement policy evaluation occurs in parallelfor all transactions within the block The evaluation isthe task of the so-called validation system chaincode(VSCC) a static library that is part of the blockchainrsquosconfiguration and is responsible for validating the en-dorsement with respect to the endorsement policy config-ured for the chaincode (see Sec 46) If the endorsementis not satisfied the transaction is marked as invalid andits effects are disregarded

2 A read-write conflict check is done for all transactions inthe block sequentially For each transaction it comparesthe versions of the keys in the readset field to those in thecurrent state of the ledger as stored locally by the peerand ensures they are still the same If the versions do notmatch the transaction is marked as invalid and its effectsare disregarded

3 The ledger update phase runs last in which the block isappended to the locally stored ledger and the blockchainstate is updated In particular when adding the blockto the ledger the results of the validity checks in thefirst two steps are persisted as well in the form of a bitmask denoting the transactions that are valid within theblock This facilitates the reconstruction of the state at alater time Furthermore all state updates are applied bywriting all key-value pairs in writeset to the local state

The default VSCC in Fabric allows monotone logical ex-pressions over the set of endorsers configured for a chain-code to be expressed The VSCC evaluation verifies that theset of peers as expressed through valid signatures on en-dorsements of the transaction satisfy the expression Differ-ent VSCC policies can be configured statically however

Discussion on design choices The ledger of Fabric con-tains all transactions including those that are deemed in-valid This follows from the overall design because orderingservice which is agnostic to chaincode state produces thechain of the blocks and because the validation is done by thepeers post-consensus This feature is needed in certain usecases that require tracking of invalid transactions during sub-sequent audits and stands in contrast to other blockchains

Committer

Peer

Endorser

Ledger

Gossip

KVS

Validation configuration sys chaincode

Block store peer trans manager (PTM)

LevelDB CouchDB

Peer-to-peer block gossip

Chaincode execution containers

Figure 5 Components of a Fabric peer

(eg Bitcoin and Ethereum) where the ledger contains onlyvalid transactions

4 Fabric ComponentsFabric is written in Go and uses the gRPC framework(httpgrpcio) for communication between clientspeers and orderers In the following we describe some im-portant components in more detail Figure 5 shows the com-ponents of a peer

41 Membership ServiceThe membership service provider (MSP) maintains the iden-tities of all nodes in the system (clients peers and order-ers) and is responsible for issuing node credentials that areused for authentication and authorization Since Fabric ispermissioned all interactions among nodes occur throughmessages that are authenticated typically with digital sig-natures The membership service comprises a componentat each node where it may authenticate transactions ver-ify the integrity of transactions sign endorsements validateendorsements and authenticate other blockchain operationsTools for key management and registration of nodes are alsopart of the MSP

The MSP is an abstraction for which different instantia-tions are possible The default MSP implementation in Fab-ric handles standard PKI methods for authentication basedon digital signatures and can accommodate commercial cer-tification authorities (CAs) A stand-alone CA is provided aswell with Fabric called Fabric-CA Furthermore alternativeMSP implementations are envisaged such as one relying onanonymous credentials for authorizing a client to invoke atransaction without linking this to an identity [10]

Fabric allows two modes for setting up a blockchain net-work In offline mode credentials are generated by a CAand distributed out-of-band to all nodes Peers and order-ers can only be registered in offline mode For enrollingclients Fabric-CA provides an online mode that issues cryp-tographic credentials to them The MSP configuration mustensure that all nodes especially all peers recognize the sameidentities and authentications as valid

8 201821

The MSP permits identity federation for example whenmultiple organizations operate a blockchain network Eachorganization issues identities to its own members and everypeer recognizes members of all organizations This can beachieved with multiple MSP instantiations for example bycreating a mapping between each organization and an MSP

42 Ordering ServiceThe ordering service manages multiple channels On everychannel it provides the following services

1 Atomic broadcast for establishing order on transactionsimplementing the broadcast and deliver calls (Sec 33)

2 Reconfiguration of a channel when its members mod-ify the channel by broadcasting a configuration updatetransaction (Sec 46)

3 Optionally access control in those configurations wherethe ordering service acts as a trusted entity restrictingbroadcasting of transactions and receiving of blocks tospecified clients and peers

The ordering service is bootstrapped with a genesis blockon the system channel This block carries a configurationtransaction that defines the properties of the ordering ser-vice

The current production implementation consists of or-dering-service nodes (OSNs) that implement the opera-tions described here and communicate through the sys-tem channel The actual atomic broadcast function is pro-vided by an instance of Apache Kafka (httpkafkaapacheorg) which offers scalable publish-subscribe mes-saging and strong consistency despite node crashes basedon ZooKeeper Kafka may run on physical nodes separatefrom the OSNs The OSNs act as proxies between the peersand Kafka

An OSN directly injects a newly received transaction tothe atomic broadcast (eg to the Kafka broker) On the otherhand the nodes batch transactions received from the atomicbroadcast and form blocks A block is cut as soon as one ofthree conditions is met (1) the block contains the specifiedmaximal number of transactions (2) the block has reached amaximal size (in bytes) or (3) an amount of time has elapsedsince the first transaction of a new block was received asexplained below

This batching process is deterministic and therefore pro-duces the same blocks at all nodes It is easy to see thatthe first two conditions are trivially deterministic given thestream of transactions received from the atomic broadcastTo ensure deterministic block production in the third casea node starts a timer when it reads the first transaction ina block from the atomic broadcast If the block is not yetcut when the timer expires the node broadcasts a specialtime-to-cut transaction on the channel which indicates thesequence number of the block which it intends to cut On theother hand every node immediately cuts a new block upon

receiving the first time-to-cut transaction for the given blocknumber Since this transaction is atomically delivered to allconnected nodes they all include the same list of transac-tions in the block (To deploy this scheme in the presence off Byzantine-faulty OSNs the block is cut only when receiv-ing f + 1 time-to-cut transactions)

The orderers persist a range of the most recently deliveredblocks directly to their filesystem so they can answer topeers retrieving blocks through deliver

The orderer using Kafka is one of three ordering serviceimplementations currently available A centralized orderercalled Solo runs on one node and is used for development Aproof-of-concept orderer based on BFT-SMaRt [3] has alsobeen made available [34] it ensures the atomic broadcastservice but not yet reconfiguration and access control Thisillustrates the modularity of consensus in Fabric

43 Peer GossipOne advantage of separating the execution ordering andvalidation phases is that they can be scaled independentlyHowever since most consensus algorithms (in the CFT andBFT models) are bandwidth-bound the throughput of the or-dering service is capped by the network capacity of its nodesConsensus cannot be scaled up by adding more nodes [1436] rather throughput will decrease However since order-ing and validation are decoupled we are interested in ef-ficiently broadcasting the execution results to all peers forvalidation after the ordering phase This is precisely thegoal of the gossip component which utilizes epidemic mul-ticast [15] for this purpose The blocks are signed by theordering service This means that a peer can upon receiv-ing all blocks independently assemble the blockchain andverify its integrity

Dissemination through gossip is robust and resistant tothe node failures in contrast to overlay networks Makingrandom selections allows gossip to reduce the overhead ofmaintaining connectivity among peers and moreover re-duces the attack surface Gossip works well in the permis-sioned environment where it can withstand sybil attacks andmessage forgery

The communication layer for gossip is based on gRPCand utilizes TLS with mutual authentication which enableseach side to bind the TLS credentials to the identity of theremote peer The gossip component maintains an up-to-datemembership view of the online peers in the system All peersindependently build a local view from periodically dissem-inated membership data Furthermore a peer can reconnectto the view after a crash or a network outage

The main purpose of gossip is to reliably distribute mes-sages ie blocks from the ordering phase among the peerstaking advantage of a push-pull protocol It uses two phasesfor information dissemination during push each peer se-lects a random set of active neighbors from the membershipview and forwards them the message during pull each peerperiodically probes a set of randomly selected peers and re-

9 201821

quests missing messages It has been shown [15 22] thatusing both methods in tandem is crucial to optimally utilizethe available bandwidth and to ensure that all peers receiveall messages with high probability

In order to reduce the load of sending blocks from the or-dering nodes to the network the protocol also elects a leaderpeer that pulls blocks from the ordering service on their be-half and initiates the gossip distribution This mechanism isresilient to leader failures

Another task of gossip is to transfer the state to newlyjoining peers and peers that were disconnected for a longtime They need to receive all blocks in the chain This fea-ture relies on the fact that the largest block-sequence num-ber stored by each peer is disseminated with the membershipdata

44 LedgerThe ledger component at each peer maintains the ledgerand the blockchain state on persistent storage and enablessimulation validation and ledger-update phases Broadlyit consists of a block store and a peer transaction manager(PTM)

Ledger block store The ledger block store persists trans-action blocks and is implemented as a set of append-onlyfiles Since the blocks are immutable and arrive in a defi-nite order an append-only structure gives maximum perfor-mance In addition the block store maintains a few indicesfor random access to a block or to a transaction in a block

Peer transaction manager (PTM) The PTM maintainsthe latest state in a versioned key-value store It stores onetuple of the form (key val ver) for each unique entry keystored by any chaincode containing its most recently storedvalue val and its latest version ver The version consistsof the block sequence number and the sequence number ofthe transaction (that stores the entry) within the block Thismakes the version unique and monotonically increasing

The PTM uses a local key-value store to realize the ver-sioned key-value store implemented by a LevelDB key-value database implemented in Go (httpsgithubcomsyndtrgoleveldb) or Apache CouchDB (httpcouchdbapacheorg)

During simulation the PTM provides a stable snapshotof the latest state to the transaction As mentioned in Sec-tion 32 the PTM records in readset a tuple (key ver) foreach entry accessed by GetState and in writeset a tuple(key val) for each entry updated with PutState by the trans-action In addition the PTM supports range queries forwhich it computes a cryptographic hash of the query results(a set of tuples (key ver)) and adds the query string itself andthe hash to readset

For transaction validation (Sec 34) the PTM validatesall transactions in a block sequentially This checks whethera transaction conflicts with any preceding transaction (withinthe block or earlier) For any key in readset if the version

recorded in readset differs from the version present in thelatest state (assuming that all preceding valid transactionsare committed) then the PTM marks the transaction as in-valid For range queries the PTM re-executes the query andcompares the hash with the one present in readset to ensurethat no phantom reads occur This read-write conflict seman-tics results in one-copy serializability [23]

The ledger component tolerates a crash of the peer duringthe ledger update as follows After receiving a new block thePTM has already performed validation and marked transac-tions as valid or invalid within the block using a bit mask asmentioned in Section 34 The ledger now writes the blockto the ledger block store flushes it to disk and subsequentlyupdates the block store indices Then the PTM applies thestate changes from writeset of all valid transactions to the lo-cal versioned store Finally it computes and persists a valuesavepoint which denotes the largest successfully appliedblock number The value savepoint is used to recover theindices and the latest state from the persisted blocks whenrecovering from a crash

45 Chaincode ExecutionChaincode is executed within an environment that is looselycoupled with the rest of the peer and that supports pluginsfor adding new languages for programming chaincodes Cur-rently three languages are supported for chaincode Go Javaand Node

Every user-level or application chaincode runs in a sepa-rate process within a Docker container environment whichisolates the chaincodes from each other and from the peerThis also simplifies the management of the lifecycle forchaincodes (ie starting stopping or aborting chaincode)The chaincode and the peer communicate using gRPC mes-sages Through this loose coupling the peer is agnostic ofthe actual language in which chaincode is implemented

In contrast to application chaincode system chaincoderuns directly in the peer process System chaincode canimplement specific functions needed by Fabric and may beused in situations where the isolation among user chaincodesis overly restrictive More details on system chaincodes aregiven in the next section

46 Configuration and System ChaincodesFabricrsquos basic behavior is customized through channel con-figuration and through special chaincodes known as systemchaincodes

Channel configuration Recall that a channel forms onelogical blockchain The configuration of a channel is main-tained in metadata persisted in special configuration blocksEach configuration block contains the full channel config-uration and does not contain any other transactions Eachblockchain begins with a configuration block known as thegenesis block which is used to bootstrap the channel Thechannel configuration includes

10 201821

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

The MSP permits identity federation for example whenmultiple organizations operate a blockchain network Eachorganization issues identities to its own members and everypeer recognizes members of all organizations This can beachieved with multiple MSP instantiations for example bycreating a mapping between each organization and an MSP

42 Ordering ServiceThe ordering service manages multiple channels On everychannel it provides the following services

1 Atomic broadcast for establishing order on transactionsimplementing the broadcast and deliver calls (Sec 33)

2 Reconfiguration of a channel when its members mod-ify the channel by broadcasting a configuration updatetransaction (Sec 46)

3 Optionally access control in those configurations wherethe ordering service acts as a trusted entity restrictingbroadcasting of transactions and receiving of blocks tospecified clients and peers

The ordering service is bootstrapped with a genesis blockon the system channel This block carries a configurationtransaction that defines the properties of the ordering ser-vice

The current production implementation consists of or-dering-service nodes (OSNs) that implement the opera-tions described here and communicate through the sys-tem channel The actual atomic broadcast function is pro-vided by an instance of Apache Kafka (httpkafkaapacheorg) which offers scalable publish-subscribe mes-saging and strong consistency despite node crashes basedon ZooKeeper Kafka may run on physical nodes separatefrom the OSNs The OSNs act as proxies between the peersand Kafka

An OSN directly injects a newly received transaction tothe atomic broadcast (eg to the Kafka broker) On the otherhand the nodes batch transactions received from the atomicbroadcast and form blocks A block is cut as soon as one ofthree conditions is met (1) the block contains the specifiedmaximal number of transactions (2) the block has reached amaximal size (in bytes) or (3) an amount of time has elapsedsince the first transaction of a new block was received asexplained below

This batching process is deterministic and therefore pro-duces the same blocks at all nodes It is easy to see thatthe first two conditions are trivially deterministic given thestream of transactions received from the atomic broadcastTo ensure deterministic block production in the third casea node starts a timer when it reads the first transaction ina block from the atomic broadcast If the block is not yetcut when the timer expires the node broadcasts a specialtime-to-cut transaction on the channel which indicates thesequence number of the block which it intends to cut On theother hand every node immediately cuts a new block upon

receiving the first time-to-cut transaction for the given blocknumber Since this transaction is atomically delivered to allconnected nodes they all include the same list of transac-tions in the block (To deploy this scheme in the presence off Byzantine-faulty OSNs the block is cut only when receiv-ing f + 1 time-to-cut transactions)

The orderers persist a range of the most recently deliveredblocks directly to their filesystem so they can answer topeers retrieving blocks through deliver

The orderer using Kafka is one of three ordering serviceimplementations currently available A centralized orderercalled Solo runs on one node and is used for development Aproof-of-concept orderer based on BFT-SMaRt [3] has alsobeen made available [34] it ensures the atomic broadcastservice but not yet reconfiguration and access control Thisillustrates the modularity of consensus in Fabric

43 Peer GossipOne advantage of separating the execution ordering andvalidation phases is that they can be scaled independentlyHowever since most consensus algorithms (in the CFT andBFT models) are bandwidth-bound the throughput of the or-dering service is capped by the network capacity of its nodesConsensus cannot be scaled up by adding more nodes [1436] rather throughput will decrease However since order-ing and validation are decoupled we are interested in ef-ficiently broadcasting the execution results to all peers forvalidation after the ordering phase This is precisely thegoal of the gossip component which utilizes epidemic mul-ticast [15] for this purpose The blocks are signed by theordering service This means that a peer can upon receiv-ing all blocks independently assemble the blockchain andverify its integrity

Dissemination through gossip is robust and resistant tothe node failures in contrast to overlay networks Makingrandom selections allows gossip to reduce the overhead ofmaintaining connectivity among peers and moreover re-duces the attack surface Gossip works well in the permis-sioned environment where it can withstand sybil attacks andmessage forgery

The communication layer for gossip is based on gRPCand utilizes TLS with mutual authentication which enableseach side to bind the TLS credentials to the identity of theremote peer The gossip component maintains an up-to-datemembership view of the online peers in the system All peersindependently build a local view from periodically dissem-inated membership data Furthermore a peer can reconnectto the view after a crash or a network outage

The main purpose of gossip is to reliably distribute mes-sages ie blocks from the ordering phase among the peerstaking advantage of a push-pull protocol It uses two phasesfor information dissemination during push each peer se-lects a random set of active neighbors from the membershipview and forwards them the message during pull each peerperiodically probes a set of randomly selected peers and re-

9 201821

quests missing messages It has been shown [15 22] thatusing both methods in tandem is crucial to optimally utilizethe available bandwidth and to ensure that all peers receiveall messages with high probability

In order to reduce the load of sending blocks from the or-dering nodes to the network the protocol also elects a leaderpeer that pulls blocks from the ordering service on their be-half and initiates the gossip distribution This mechanism isresilient to leader failures

Another task of gossip is to transfer the state to newlyjoining peers and peers that were disconnected for a longtime They need to receive all blocks in the chain This fea-ture relies on the fact that the largest block-sequence num-ber stored by each peer is disseminated with the membershipdata

44 LedgerThe ledger component at each peer maintains the ledgerand the blockchain state on persistent storage and enablessimulation validation and ledger-update phases Broadlyit consists of a block store and a peer transaction manager(PTM)

Ledger block store The ledger block store persists trans-action blocks and is implemented as a set of append-onlyfiles Since the blocks are immutable and arrive in a defi-nite order an append-only structure gives maximum perfor-mance In addition the block store maintains a few indicesfor random access to a block or to a transaction in a block

Peer transaction manager (PTM) The PTM maintainsthe latest state in a versioned key-value store It stores onetuple of the form (key val ver) for each unique entry keystored by any chaincode containing its most recently storedvalue val and its latest version ver The version consistsof the block sequence number and the sequence number ofthe transaction (that stores the entry) within the block Thismakes the version unique and monotonically increasing

The PTM uses a local key-value store to realize the ver-sioned key-value store implemented by a LevelDB key-value database implemented in Go (httpsgithubcomsyndtrgoleveldb) or Apache CouchDB (httpcouchdbapacheorg)

During simulation the PTM provides a stable snapshotof the latest state to the transaction As mentioned in Sec-tion 32 the PTM records in readset a tuple (key ver) foreach entry accessed by GetState and in writeset a tuple(key val) for each entry updated with PutState by the trans-action In addition the PTM supports range queries forwhich it computes a cryptographic hash of the query results(a set of tuples (key ver)) and adds the query string itself andthe hash to readset

For transaction validation (Sec 34) the PTM validatesall transactions in a block sequentially This checks whethera transaction conflicts with any preceding transaction (withinthe block or earlier) For any key in readset if the version

recorded in readset differs from the version present in thelatest state (assuming that all preceding valid transactionsare committed) then the PTM marks the transaction as in-valid For range queries the PTM re-executes the query andcompares the hash with the one present in readset to ensurethat no phantom reads occur This read-write conflict seman-tics results in one-copy serializability [23]

The ledger component tolerates a crash of the peer duringthe ledger update as follows After receiving a new block thePTM has already performed validation and marked transac-tions as valid or invalid within the block using a bit mask asmentioned in Section 34 The ledger now writes the blockto the ledger block store flushes it to disk and subsequentlyupdates the block store indices Then the PTM applies thestate changes from writeset of all valid transactions to the lo-cal versioned store Finally it computes and persists a valuesavepoint which denotes the largest successfully appliedblock number The value savepoint is used to recover theindices and the latest state from the persisted blocks whenrecovering from a crash

45 Chaincode ExecutionChaincode is executed within an environment that is looselycoupled with the rest of the peer and that supports pluginsfor adding new languages for programming chaincodes Cur-rently three languages are supported for chaincode Go Javaand Node

Every user-level or application chaincode runs in a sepa-rate process within a Docker container environment whichisolates the chaincodes from each other and from the peerThis also simplifies the management of the lifecycle forchaincodes (ie starting stopping or aborting chaincode)The chaincode and the peer communicate using gRPC mes-sages Through this loose coupling the peer is agnostic ofthe actual language in which chaincode is implemented

In contrast to application chaincode system chaincoderuns directly in the peer process System chaincode canimplement specific functions needed by Fabric and may beused in situations where the isolation among user chaincodesis overly restrictive More details on system chaincodes aregiven in the next section

46 Configuration and System ChaincodesFabricrsquos basic behavior is customized through channel con-figuration and through special chaincodes known as systemchaincodes

Channel configuration Recall that a channel forms onelogical blockchain The configuration of a channel is main-tained in metadata persisted in special configuration blocksEach configuration block contains the full channel config-uration and does not contain any other transactions Eachblockchain begins with a configuration block known as thegenesis block which is used to bootstrap the channel Thechannel configuration includes

10 201821

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

quests missing messages It has been shown [15 22] thatusing both methods in tandem is crucial to optimally utilizethe available bandwidth and to ensure that all peers receiveall messages with high probability

In order to reduce the load of sending blocks from the or-dering nodes to the network the protocol also elects a leaderpeer that pulls blocks from the ordering service on their be-half and initiates the gossip distribution This mechanism isresilient to leader failures

Another task of gossip is to transfer the state to newlyjoining peers and peers that were disconnected for a longtime They need to receive all blocks in the chain This fea-ture relies on the fact that the largest block-sequence num-ber stored by each peer is disseminated with the membershipdata

44 LedgerThe ledger component at each peer maintains the ledgerand the blockchain state on persistent storage and enablessimulation validation and ledger-update phases Broadlyit consists of a block store and a peer transaction manager(PTM)

Ledger block store The ledger block store persists trans-action blocks and is implemented as a set of append-onlyfiles Since the blocks are immutable and arrive in a defi-nite order an append-only structure gives maximum perfor-mance In addition the block store maintains a few indicesfor random access to a block or to a transaction in a block

Peer transaction manager (PTM) The PTM maintainsthe latest state in a versioned key-value store It stores onetuple of the form (key val ver) for each unique entry keystored by any chaincode containing its most recently storedvalue val and its latest version ver The version consistsof the block sequence number and the sequence number ofthe transaction (that stores the entry) within the block Thismakes the version unique and monotonically increasing

The PTM uses a local key-value store to realize the ver-sioned key-value store implemented by a LevelDB key-value database implemented in Go (httpsgithubcomsyndtrgoleveldb) or Apache CouchDB (httpcouchdbapacheorg)

During simulation the PTM provides a stable snapshotof the latest state to the transaction As mentioned in Sec-tion 32 the PTM records in readset a tuple (key ver) foreach entry accessed by GetState and in writeset a tuple(key val) for each entry updated with PutState by the trans-action In addition the PTM supports range queries forwhich it computes a cryptographic hash of the query results(a set of tuples (key ver)) and adds the query string itself andthe hash to readset

For transaction validation (Sec 34) the PTM validatesall transactions in a block sequentially This checks whethera transaction conflicts with any preceding transaction (withinthe block or earlier) For any key in readset if the version

recorded in readset differs from the version present in thelatest state (assuming that all preceding valid transactionsare committed) then the PTM marks the transaction as in-valid For range queries the PTM re-executes the query andcompares the hash with the one present in readset to ensurethat no phantom reads occur This read-write conflict seman-tics results in one-copy serializability [23]

The ledger component tolerates a crash of the peer duringthe ledger update as follows After receiving a new block thePTM has already performed validation and marked transac-tions as valid or invalid within the block using a bit mask asmentioned in Section 34 The ledger now writes the blockto the ledger block store flushes it to disk and subsequentlyupdates the block store indices Then the PTM applies thestate changes from writeset of all valid transactions to the lo-cal versioned store Finally it computes and persists a valuesavepoint which denotes the largest successfully appliedblock number The value savepoint is used to recover theindices and the latest state from the persisted blocks whenrecovering from a crash

45 Chaincode ExecutionChaincode is executed within an environment that is looselycoupled with the rest of the peer and that supports pluginsfor adding new languages for programming chaincodes Cur-rently three languages are supported for chaincode Go Javaand Node

Every user-level or application chaincode runs in a sepa-rate process within a Docker container environment whichisolates the chaincodes from each other and from the peerThis also simplifies the management of the lifecycle forchaincodes (ie starting stopping or aborting chaincode)The chaincode and the peer communicate using gRPC mes-sages Through this loose coupling the peer is agnostic ofthe actual language in which chaincode is implemented

In contrast to application chaincode system chaincoderuns directly in the peer process System chaincode canimplement specific functions needed by Fabric and may beused in situations where the isolation among user chaincodesis overly restrictive More details on system chaincodes aregiven in the next section

46 Configuration and System ChaincodesFabricrsquos basic behavior is customized through channel con-figuration and through special chaincodes known as systemchaincodes

Channel configuration Recall that a channel forms onelogical blockchain The configuration of a channel is main-tained in metadata persisted in special configuration blocksEach configuration block contains the full channel config-uration and does not contain any other transactions Eachblockchain begins with a configuration block known as thegenesis block which is used to bootstrap the channel Thechannel configuration includes

10 201821

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

bull Definitions of the MSPs for the participating nodesbull The network addresses of the OSNsbull Shared configuration for the consensus implementation

and the ordering service such as batch size and timeoutsbull Rules governing access to the ordering service operations

(broadcast and deliver)bull Rules governing how each part the channel configuration

may be modified

The configuration of a channel may be updated using achannel configuration update transaction This transactioncontains a representation of the changes to be made to theconfiguration as well as a set of signatures The orderingservice nodes evaluate whether the update is valid by usingthe current configuration to verify that the modifications areauthorized using the signatures The orderers then generate anew configuration block which embeds the new configura-tion and the configuration update transaction Peers receiv-ing this block validate whether the configuration update isauthorized based on the current configuration if valid theyupdate their current configuration

System chaincodes The application chaincodes are de-ployed with a reference to an endorsement system chaincode(ESCC) and to a validation system chaincode (VSCC) Thesetwo chaincodes are selected in a symmetric way such thatthe output of the ESCC (an endorsement) may be validatedas part of the input to the VSCC

The ESCC takes as input a proposal and the proposalsimulation results If the results are satisfactory then theESCC produces a response containing the results and theendorsement For the default ESCC this endorsement issimply a signature by the peerrsquos local signing identity

The VSCC takes as input a transaction and outputswhether that transaction is valid For the default VSCCthe endorsements are collected and evaluated against theendorsement policy specified for the chaincode

Further system chaincodes implement other support func-tions such as configuration and chaincode lifecycle

5 EvaluationEven though Fabric is not yet performance-tuned and opti-mized we report in this section on some preliminary per-formance numbers Fabric is a complex distributed systemits performance depends on many parameters including thechoice of a distributed application and transaction size theordering service and consensus implementation and their pa-rameters the network parameters and topology of nodes inthe network the hardware on which nodes run the num-ber of nodes and channels further configuration parametersand the network dynamics Therefore in-depth performanceevaluation of Fabric is postponed to future work

In the absence of a standard benchmark for blockchainswe use the most prominent blockchain application for eval-uating Fabric a simple authority-minted cryptocurrency that

uses the data model of Bitcoin which we call Fabric coin(abbreviated hereafter as Fabcoin) This allows us to put theperformance of Fabric in the context of other permissionedblockchains which are often derived from Bitcoin or Ethe-reum For example it is also the application used in bench-marks of other permissioned blockchains [19 32]

In the following we first describe Fabcoin (Sec 51)which also demonstrates how to customize the validationphase and endorsement policy In Section 52 we present thebenchmark and discuss our results

51 Fabric Coin (Fabcoin)UTXO cryptocurrencies The data model introduced byBitcoin [28] has become known as ldquounspent transaction out-putrdquo or UTXO and is also used by many other cryptocur-rencies and distributed applications UTXO represents eachstep in the evolution of a data object as a separate atomicstate on the ledger Such a state is created by a transactionand destroyed (or ldquoconsumedrdquo) by another unique transac-tion occurring later Every given transaction destroys a num-ber of input states and creates one or more output states Aldquocoinrdquo in Bitcoin is initially created by a coinbase transac-tion that rewards the ldquominerrdquo of the block This appears onthe ledger as a coin state designating the miner as the ownerAny coin can be spent in the sense that the coin is assignedto a new owner by a transaction that atomically destroys thecurrent coin state designating the previous owner and createsanother coin state representing the new owner

We capture the UTXO model in the key-value store ofFabric as follows Each UTXO state corresponds to a uniqueKVS entry that is created once (the coin state is ldquounspentrdquo)and destroyed once (the coin state is ldquospentrdquo) Equivalentlyevery state may be seen as a KVS entry with logical ver-sion 0 after creation when it is destroyed again it receivesversion 1 There should not be any concurrent updates tosuch entries (eg attempting to update a coin state in differ-ent ways amounts to double-spending the coin)

Value in the UTXO model is transferred through transac-tions that refer to several input states that all belong to theentity issuing the transaction An entity owns a state becausethe public key of the entity is contained in the state itselfEvery transaction creates one or more output states in theKVS representing the new owners deletes the input states inthe KVS and ensures that the sum of the values in the inputstates equals the sum of the output statesrsquo values There isalso a policy determining how value is created (eg coin-base transactions in Bitcoin or specific mint operations inother systems) or destroyed (ie as a fee consumed by theexecution)

Fabcoin implementation Each state in Fabcoin is a tupleof the form (key val) = (txidj (amount owner label)) de-noting the coin state created as the j-th output of a transac-tion with identifier txid and allocating amount units labeledwith label to the entity whose public key is owner Labels are

11 201821

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

strings used to identify a given type of a coin (eg lsquoUSDlsquolsquoEURlsquo lsquoFBClsquo) Transaction identifiers are short values thatuniquely identify every Fabric transaction The Fabcoin im-plementation consists of three parts (1) a client wallet (2)the Fabcoin chaincode and (3) a custom VSCC for Fabcoinimplementing its endorsement policy

Client wallet By default each Fabric client maintains aFabcoin wallet that locally stores a set of cryptographic keysallowing the client to spend coins For creating a SPENDtransaction that transfers one or more coins the client walletcreates a Fabcoin request request = (inputs outputs sigs)containing (1) a list of input coin states inputs = [in ]that specify coin states (in ) the client wishes to spend aswell as (2) a list of output coin states outputs = [(amount ownerlabel) ] The client wallet signs with the private keys thatcorrespond to the input coin states the concatenation of theFabcoin request and a nonce which is a part of every Fabrictransaction and adds the signatures in a set sigs A SPENDtransaction is valid when the sum of the amounts in the in-put coin states is at least the sum of the amounts in the out-puts and when the output amounts are positive For a MINTtransaction that creates new coins inputs contains only anidentifier (ie a reference to a public key) of a special en-tity called Central Bank (CB) whereas outputs contains anarbitrary number of coin states To be considered valid thesignatures of a MINT transaction in sigs must be a crypto-graphic signature under the public key of CB over the con-catenation of the Fabcoin request and of the aforementionednonce Fabcoin may be configured to use multiple CBs orspecify a threshold number of signatures from a set of CBsFinally the client wallet includes the Fabcoin request into atransaction and sends this to a peer of its choice

Fabcoin chaincode A peer runs the chaincode of Fab-coin which simulates the transaction and creates readsets andwritesets In a nutshell in the case of a SPEND transactionfor every input coin state in isin inputs the chaincode first per-forms GetState(in) this puts in into the readset along with itscurrent version in Fabricrsquos versioned KVS (Sec 44) Thenthe chaincode executes DelState(in) for every input state inwhich also adds in to the writeset and effectively marks thecoin state as ldquospentrdquo Finally for j = 1 |outputs| thechaincode executes PutState(txidj out) with the j-th out-put out = (amount owner label) in outputs In addition apeer optionally runs the transaction validation code as de-scribed next in the VSCC step for Fabcoin this is not neces-sary since the custom VSCC actually validates transactionsbut it allows the (correct) peers to filter out potentially mal-formed transactions In our implementation the chaincoderuns the Fabcoin VSCC without cryptographically verifyingthe signatures

Custom VSCC Finally every peer validates Fabcoin trans-actions using custom VSCC This verifies first the cryp-tographic signature(s) in sigs under the respective public

key(s) and performs semantic validation as follows For aMINT transaction it checks that the output states are createdunder the matching transaction identifier (txid) and that alloutput amounts are positive For a SPEND transaction theVSCC additionally verifies (1) that for all input coin statesan entry in the readset has been created and that it was alsoadded to the writeset and marked as deleted (2) that the sumof the amounts for all input coin states equals the sum ofamounts of all output coin states and (3) that input and out-put coin labels match Here the VSCC obtains the amountsfor the input coins by retrieving their current values from theledger

Notice that the Fabcoin VSCC does not check transac-tions for double spending as this occurs through Fabricrsquosstandard validation that runs after the custom VSCC In par-ticular if two transactions attempt to assign the same un-spent coin state to a new owner both would pass the VSCClogic but would be caught subsequently in the read-writeconflict check performed by the PTM According to Sec-tions 34 and 44 the PTM verifies that the current versionnumber stored in the ledger matches the one in the readsethence after the first transaction has changed the version ofthe coin state the transaction ordered second will be recog-nized as invalid

52 ExperimentsSetup Unless explicitly mentioned differently in our ex-periments (1) nodes run on Fabric version v11-preview2

instrumented for performance evaluation through local log-ging (2) nodes are hosted in a single IBM Cloud (SoftLayer)datacenter as dedicated VMs interconnected with 1Gbps net-working (3) all nodes are 20 GHz 16-vCPU VMs runningUbuntu with 8GB of RAM and SSDs as local disks (4) asingle-channel ordering service runs a typical Kafka orderersetup with 3 ZooKeeper nodes 4 Kafka brokers and 3 Fab-ric orderers all on distinct VMs (5) there are 5 peers in to-tal all of which are Fabcoin endorsers and (6) signaturesuse the default 256-bit ECDSA scheme In order to measureand stage latencies in the transaction flow spanning multiplenodes the node clocks are synchronized with an NTP ser-vice throughout the experiments All communication amongFabric nodes is configured to use TLS

Methodology In every experiment in the first phase weinvoke transactions that contain only Fabcoin MINT opera-tions to produce the coins and then run a second phase ofthe experiment in which we invoke Fabcoin SPEND opera-tion on previously minted coins (effectively running single-input single-output SPEND transactions) When reportingthroughput measurements we use an increasing number ofFabric CLI clients (modified to issue concurrent requests)running on a single VM until the end-to-end throughputis saturated and state the throughput just before saturationThroughput numbers are reported as an average throughput

2Patched with commit ID 9e770062 in the Fabric master branch

12 201821

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

measured throughout the duration of an experiment disre-garding the ldquotailrdquo of the experiment where throughput dropsdue to some client threads finishing submitting their share oftransactions In every experiment client threads submit atleast 500k MINT and SPEND transactions

Choosing the block size A critical Fabric configurationparameter that impacts both throughput and latency is blocksize To fix the block size for subsequent experiments andto evaluate the impact of block size on performance we ranexperiments varying block size from 05MB to 4MB Resultsare depicted in Fig 6 showing peak throughput measured atthe peers along with the corresponding average end-to-end(e2e) latency

Figure 6 Impact of block size on throughput and latency

We can observe that throughput does not significantlyimprove beyond a block size of 2MB but latency gets worse(as expected) Therefore we adopt 2MB as the block sizefor the following experiments with the goal of maximizingthe measured throughput assuming the end-to-end latencyof roughly 500ms is acceptable

Size of transactions During this experiment we also ob-served the size MINT and SPEND transactions In particularthe 2MB-blocks contained 473 MINT or 670 SPEND transac-tions ie the average transaction size is 306kB for SPENDand 433kB for MINT In general transactions in Fabric arelarge because they carry certificate information BesidesMINT transactions of Fabcoin are larger than SPEND trans-actions because they carry CB certificates This is an avenuefor future improvement of both Fabric and Fabcoin

Impact of peer CPU Fabric peers run many CPU-intensivecryptographic operations To estimate the impact of CPU onthroughput we performed a set of experiments in which 4peers run on 4 8 16 and 32 vCPU VMs while also doingcoarse-grained latency staging of block validation to betteridentify bottlenecks Our experiment focused on the vali-dation phase as ordering with Kafka ordering service hasnever been a bottleneck in our experiments The validation

phase and in particular the VSCC validation of Fabcoinis computationally intensive due to many digital signatureverifications performed in this phase

Figure 7 Impact of peer CPU on end-to-end throughputand block validation latency

The results with 2MB blocks are shown in Fig 7 SPENDlatency is not shown in Fig 7 as it scales similarly to MINTWe can observe that Fabcoin VSCC validation scales quasi-linearly with CPU as Fabric VSCC validation is embarrass-ingly parallel However the read-write-check and ledger-access stages are sequential and become dominant with alarger number of cores (vCPUs) This suggests that futureversions of Fabric could profit from pipelining validationstages (which are now sequential) optimizing stable-storageaccess and parallelizing read-write dependency checks

Finally in this experiment we measured over 3560 trans-actions per second (tps) average SPEND throughput at the32-vCPU peer The MINT throughput is in general slightlylower than that of SPEND but the difference remains within10

Latency profiling by stages We further performed coarse-grained profiling of latency during our previous experimentat the peak reported throughput Results are depicted inTable 1 The ordering phase comprises broadcast-deliverlatency and internal latency within a peer before valida-tion starts The table reports average latencies for MINTand SPEND standard deviation and tail latencies (99 and999)

We can observe that ordering dominates the overall la-tency We observe that average latencies are below 550mswith sub-second tail latencies In particular the highestend-to-end latencies in our experiment come from the firstblocks during the load build-up Latency under lower loadcan be regulated and reduced by leveraging the time-to-cutparameter of the orderer (see Sec 33) which we basicallydo not use in our experiments as we set it to a large value

SSD vs RAM disk To evaluate the overhead of using lo-cal SSDs for stable storage we repeated the previous exper-iment with RAM disks (tmpfs) mounted as stable storage at

13 201821

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

all peer VMs The benefits are limited as tmpfs only helpswith the ledger phase of the validation at the peer We mea-sured sustained peak throughput at 3870 SPEND tps at 32-vCPU peer roughly a 9 improvement over SSD

avg stdev 99 999(1) endorsement 56 75 24 42 15 21 19 26(2) ordering 248 365 600 920 484 624 523 636(3) VSCC val 310 353 102 90 727 570 113 1084(4) RW check 348 615 39 93 470 885 590 933(5) ledger 506 722 62 88 701 975 725 105(6) validation (3+4+5) 116 169 128 178 156 216 199 230(7) end-to-end (1+2+6) 371 542 63 94 612 805 646 813

Table 1 Latency statistics in milliseconds (ms) for MINTand SPEND broken down into five stages at a 32-vCPUpeer with 2MB blocks Validation (6) comprises stages 34 and 5 the end-to-end latency contains stages 1ndash5

6 Related WorkThe architecture of Fabric resembles that of a middleware-replicated database as pioneered by Kemme and Alonso [24]However all existing work on this addressed only crash fail-ures not the setting of distributed trust that corresponds to aBFT system For instance a replicated database with asym-metric update processing [25 Sec 63] relies on one node toexecute each transaction which would not work on a block-chain The execute-order-validate architecture of Fabric canbe interpreted as a generalization of this work to the Byzan-tine model with practical applications to distributed ledgers

Byzantium [17] and HRDB [35] are two further prede-cessors of Fabric from the viewpoint of BFT database repli-cation Byzantium allows transactions to run in parallel anduses active replication but totally orders BEGIN and COM-MITROLLBACK using a BFT middleware In its opti-mistic mode every operation is coordinated by a single mas-ter replica if the master is suspected to be Byzantine allreplicas execute the transaction operations for the master andit triggers a costly protocol to change the master HRDB re-lies in an even stronger way on a correct master In contrastto Fabric both systems use active replication cannot handlea flexible trust model and rely on deterministic operationsby the replicas However their database API is richer thanthe KVS model of Fabric

In Eve [21] a related architecture for SMR has been ex-plored also in the BFT model Its peers execute transactionsconcurrently and then verify that they all reach the same out-put state using a consensus protocol If the states divergethey roll back and execute operations sequentially Eve con-tains the element of independent execution which also existsin Fabric but offers none of its other features

A large number of distributed ledger platforms in thepermissioned model have come out recently which makesit impossible to compare to all (some prominent ones areTendermint [5] Quorum [19] Chain Core [12] Multi-chain (httpswwwmultichaincom) HyperledgerSawtooth (httpssawtoothhyperledgerorg) the

Volt proposal [32] and more see references in recent over-views [9 16]) All platforms follow the order-execute archi-tecture as discussed in Section 2 As a representative exam-ple take the Quorum platform [19] an enterprise-focusedversion of Ethereum With its consensus based on Raft [29]it disseminates a transaction to all peers using gossip and theRaft leader (called minter) assembles valid transactions to ablock and distributes this using Raft All peers execute thetransaction in the order decided by the leader Therefore itsuffers from the limitations mentioned in Sections 1ndash2

References[1] P-L Aublin R Guerraoui N Knezevic V Quema and

M Vukolic The next 700 BFT protocols ACM Trans Com-put Syst 32(4)121ndash1245 Jan 2015

[2] E Ben-Sasson A Chiesa C Garman M Green I MiersE Tromer and M Virza Zerocash Decentralized anonymouspayments from bitcoin In IEEE Symposium on Security ampPrivacy pages 459ndash474 2014

[3] A N Bessani J Sousa and E A P Alchieri State ma-chine replication for the masses with BFT-SMART In In-ternational Conference on Dependable Systems and Networks(DSN) pages 355ndash362 2014

[4] G Bracha and S Toueg Asynchronous consensus and broad-cast protocols J ACM 32(4)824ndash840 1985

[5] E Buchman Tendermint Byzantine fault tolerance in the ageof blockchains MSc Thesis University of Guelph CanadaJune 2016

[6] N Budhiraja K Marzullo F B Schneider and S Toueg Theprimary-backup approach In S Mullender editor DistributedSystems (2nd Ed) pages 199ndash216 ACM PressAddison-Wesley 1993

[7] C Cachin R Guerraoui and L E T Rodrigues Introductionto Reliable and Secure Distributed Programming (2 ed)Springer 2011

[8] C Cachin S Schubert and M Vukolic Non-determinismin byzantine fault-tolerant replication In 20th InternationalConference on Principles of Distributed Systems (OPODIS)2016

[9] C Cachin and M Vukolic Blockchain consensus protocolsin the wild In A W Richa editor 31st Intl Symposium onDistributed Computing (DISC 2017) pages 11ndash116 2017

[10] J Camenisch and E V Herreweghen Design and imple-mentation of the idemix anonymous credential system InACM Conference on Computer and Communications Security(CCS) pages 21ndash30 2002

[11] M Castro and B Liskov Practical Byzantine fault toler-ance and proactive recovery ACM Trans Comput Syst20(4)398ndash461 Nov 2002

[12] Chain Chain protocol whitepaper httpschaincom

docs12protocolpaperswhitepaper 2017

[13] B Charron-Bost F Pedone and A Schiper editors Replica-tion Theory and Practice volume 5959 of Lecture Notes inComputer Science Springer 2010

14 201821

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work

[14] K Croman C Decker I Eyal A E Gencer A JuelsA Kosba A Miller P Saxena E Shi E G Sirer et alOn scaling decentralized blockchains In International Con-ference on Financial Cryptography and Data Security (FC)pages 106ndash125 Springer 2016

[15] A Demers D Greene C Hauser W Irish J LarsonS Shenker H Sturgis D Swinehart and D Terry Epidemicalgorithms for replicated database maintenance In ACMSymposium on Principles of Distributed Computing (PODC)pages 1ndash12 ACM 1987

[16] T T A Dinh R Liu M Zhang G Chen B C Ooi andJ Wang Untangling blockchain A data processing viewof blockchain systems e-print arXiv170805665 [csDB]2017

[17] R Garcia R Rodrigues and N M Preguica Efficient mid-dleware for Byzantine fault tolerant database replication InEuropean Conference on Computer Systems (EuroSys) pages107ndash122 2011

[18] R Guerraoui R R Levy B Pochon and V Quema Through-put optimal total order broadcast for cluster environmentsACM Trans Comput Syst 28(2)51ndash532 2010

[19] J P Morgan Quorum whitepaper httpsgithubcom

jpmorganchasequorum-docs 2016

[20] F P Junqueira B C Reed and M Serafini Zab High-performance broadcast for primary-backup systems In In-ternational Conference on Dependable Systems amp Networks(DSN) pages 245ndash256 2011

[21] M Kapritsos Y Wang V Quema A Clement L Alvisiand M Dahlin All about Eve Execute-verify replicationfor multi-core servers In Symposium on Operating SystemsDesign and Implementation (OSDI) pages 237ndash250 2012

[22] R Karp C Schindelhauer S Shenker and B Vocking Ran-domized rumor spreading In Symposium on Foundations ofComputer Science (FOCS) pages 565ndash574 IEEE 2000

[23] B Kemme One-copy-serializability In Encyclopedia ofDatabase Systems pages 1947ndash1948 Springer 2009

[24] B Kemme and G Alonso A new approach to developingand implementing eager database replication protocols ACMTransactions on Database Systems 25(3)333ndash379 2000

[25] B Kemme R Jimenez-Peris and M Patino-MartınezDatabase Replication Synthesis Lectures on Data Manage-ment Morgan amp Claypool Publishers 2010

[26] A E Kosba A Miller E Shi Z Wen and C PapamanthouHawk The blockchain model of cryptography and privacy-preserving smart contracts In 37th IEEE Symposium onSecurity amp Privacy 2016

[27] S Liu P Viotti C Cachin V Quema and M Vukolic XFTpractical fault tolerance beyond crashes In Symposium onOperating Systems Design and Implementation (OSDI) pages485ndash500 2016

[28] S Nakamoto Bitcoin A peer-to-peer electronic cash systemhttpwwwbitcoinorgbitcoinpdf 2009

[29] D Ongaro and J Ousterhout In search of an understandableconsensus algorithm In USENIX Annual Technical Confer-ence (ATC) pages 305ndash320 2014

[30] F Pedone and A Schiper Handling message semanticswith Generic Broadcast protocols Distributed Computing15(2)97ndash107 2002

[31] F B Schneider Implementing fault-tolerant services usingthe state machine approach A tutorial ACM Comput Surv22(4)299ndash319 1990

[32] S Setty S Basu L Zhou M L Roberts and R VenkatesanEnabling secure and resource-efficient blockchain networkswith VOLT Technical Report MSR-TR-2017-38 MicrosoftResearch 2017

[33] A Singh T Das P Maniatis P Druschel and T Roscoe BFTprotocols under fire In Symposium on Networked SystemsDesign amp Implementation (NSDI) pages 189ndash204 2008

[34] J Sousa A Bessani and M Vukolic A Byzantine fault-tolerant ordering service for the Hyperledger Fabric block-chain platform Technical Report arXiv170906921 CoRR2017

[35] B Vandiver H Balakrishnan B Liskov and S MaddenTolerating Byzantine faults in transaction processing systemsusing commit barrier scheduling In ACM Symposium onOperating Systems Principles (SOSP) pages 59ndash72 2007

[36] M Vukolic The quest for scalable blockchain fabric Proof-of-work vs BFT replication In International Workshop onOpen Problems in Network Security (iNetSec) pages 112ndash125 2015

[37] J Yin J Martin A Venkataramani L Alvisi and M DahlinSeparating agreement from execution for Byzantine fault tol-erant services In ACM Symposium on Operating SystemsPrinciples (SOSP) pages 253ndash267 2003

15 201821

  • 1 Introduction
  • 2 Background
    • 21 Order-Execute Architecture for Blockchains
    • 22 Limitations of Order-Execute
    • 23 Further Limitations of Existing Architectures
    • 24 Experience with Order-Execute Blockchain
      • 3 Architecture
        • 31 Fabric Overview
        • 32 Execution Phase
        • 33 Ordering Phase
        • 34 Validation Phase
          • 4 Fabric Components
            • 41 Membership Service
            • 42 Ordering Service
            • 43 Peer Gossip
            • 44 Ledger
            • 45 Chaincode Execution
            • 46 Configuration and System Chaincodes
              • 5 Evaluation
                • 51 Fabric Coin (Fabcoin)
                • 52 Experiments
                  • 6 Related Work