automatic software dependency management …trap.ncirl.ie/3300/1/gavindmello.pdfautomatic software...

20
Automatic Software Dependency Management using Blockchain Msc Research Project Cloud Computing Gavin D’mello x17110483 School of Computing National College of Ireland Supervisor: Horacio Gonz´ alez–V´ elez

Upload: others

Post on 20-Apr-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Automatic Software DependencyManagement using Blockchain

Msc Research Project

Cloud Computing

Gavin D’mellox17110483

School of Computing

National College of Ireland

Supervisor: Horacio Gonzalez–Velez

www.ncirl.ie

Page 2: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

National College of IrelandProject Submission Sheet – 2017/2018

School of Computing

Student Name: Gavin D’melloStudent ID: x17110483Programme: Cloud ComputingYear: 2018Module: Msc Research ProjectLecturer: Horacio Gonzalez–VelezSubmission DueDate:

13/07/2018

Project Title: Automatic Software Dependency Management using Block-chain

Word Count: 5649

I hereby certify that the information contained in this (my submission) is informationpertaining to research I conducted for this project. All information other than my owncontribution will be fully referenced and listed in the relevant bibliography section at therear of the project.

ALL internet material must be referenced in the bibliography section. Studentsare encouraged to use the Harvard Referencing Standard supplied by the Library. Touse other author’s written or electronic work is illegal (plagiarism) and may result indisciplinary action. Students may be required to undergo a viva (oral examination) ifthere is suspicion about the validity of their submitted work.

Signature:

Date: 16th September 2018

PLEASE READ THE FOLLOWING INSTRUCTIONS:1. Please attach a completed copy of this sheet to each project (including multiple copies).2. You must ensure that you retain a HARD COPY of ALL projects, both foryour own reference and in case a project is lost or mislaid. It is not sufficient to keep acopy on the computer. Please do not bind projects or place in covers unless specificallyrequested.3. Assignments that are submitted to the Programme Coordinator office must be placedinto the assignment box located outside the office.

Office Use OnlySignature:

Date:Penalty Applied (ifapplicable):

Page 3: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Automatic Software Dependency Management usingBlockchain

Gavin D’mellox17110483

Msc Research Project in Cloud Computing

16th September 2018

Abstract

Contemporary software deployments rely on cloud-based package managers forinstallation, where existing packages are installed on demand from remote coderepositories. Usually, frameworks or common utilities, packages increase the codereusability within the ecosystem, whilst keeping the code base small. However,disruptions in the package management services can potentially affect developmentand deployment workflows. Furthermore, cloud package managers have arguablyan ambiguous ownership model and offer limited visibility of packages to the users.This work describes the development of a blockchain-based package control systemwhich is decentralised, reliable and transparent. Blockchain nodes are installedwithin the infrastructure to provide immutability, and then a dependency graph isconstructed to trace the software provenance and package reliance. Our system hasbeen tested with 4338 packages from NPM, 950 out of which are the top depended-upon packages.

Contents

1 Introduction 2

2 Related Work 3

3 Contribution 4

4 Methodology 64.1 Data structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64.2 Storage solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84.3 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94.4 Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

5 Implementation 13

6 Evaluation 15

7 Conclusion and Future Work 17

1

Page 4: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

1 Introduction

Long considered a key practice in the industry, software reuse entails the creation of newsoftware systems using existing software packages Krueger (1992). Most software pack-ages are made available as common utility tools or frameworks which are used by millionsof users. With the advent of a microservices and cloud architectures, package reuse hasincreased as each service has its lifetime and own state to manage independently of otherservices. Each language and community tend to have a different operating package man-ager, and the proliferation of online version control tools such as Github and Bitbuckethave led to the creation wider range of interdependent software components Decan et al.(2016).

Package managers provide a platform for code sharing and reliability. Reliable applic-ation package managers are of prime importance to software developers. Most packagesneed to be installed right before the deployment phase. Software packages tend to havedirect and transitive dependencies on other packages, which make them vulnerable an-d/or prone to failure if any dependency is unpublished or compromised. Dependenciesare not necessarily straightforward and can have multiple nesting levels. For example, thepackage libcurl, which is used for sending HTTP requests, depends on other packageslike zlib which is used to compress data. Any failure in an application package managercould lead to build failures and hinder the development processes.

Different software package managers offer mirrors and streaming. Nonetheless, mostpackage managers are heavily centralised in their architectures, which can present a singlepoint of failure and, more relevant to this work, a source of inconsistencies when packagecomponents or versions change. It is therefore important to check if existing packagemanagers can be decentralised to improve the reliability of the ecosystem.

Blockchain smart contracts allow us to store data and execute functions on them ina decentralised setup. Once the smart contract is deployed, transactions can be sent tothe contract. The changes made by the transactions can be mined and broadcasted tothe entire network. This work focuses on the application of smart contracts to maintainan immutable decentralised change control system for packages and versions. It canassure the provenance of a given set of packages to assure the correct development anddeployment for a given set of new packages. We have evaluated our work using 4338packages from NPM.

This paper is organised as follows. Section 2 discusses package managers and existingBlockchain systems. Section 4 outlines our proposed method to manage software packageswith Blockchain, including the proposed algorithm for smart contracts. Section 5 showsthe implementation of versions on the Smart contracts. Solidity was used to write theSmart contract and peer to peer storage was used to upload the packages. The versiontree is described which shows how one version is different from another. The patternused to store the contract also helps us to seamlessly change the processing logic fromthe storage. Section 6 presents our evaluation using 4338 packages along with some oftheir versions. These 4338 packages include packages which are the most depended-uponand key utilities. The section also outlines the bandwidth requirements of the blockchainnode and the latency involved in pulling the packages from the system. It also showsa part of a network graph modeled directly from the data coming from the blockchainnode. Finally, Section 7 presents some concluding remarks.

Page 5: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

2 Related Work

Traditionally, software has been created using a waterfall model where every change goesthrough some pre-requisite number of stages. With the advent of open-source and rapidcollaborative environments, rapid application development has increasingly become thenorm for applicative environments Ruparelia (2010). Agile, Scrum, Extreme Program-ming, and other rapid application development methodologies have become more popular,since they allow changes to be added dynamically, leading to continuous implementationusing a backlog. Developers pick software features from the backlog and releases aremade at shorter durations compared to the traditional model, leading to adaptive soft-ware development Highsmith and Cockburn (2001).

Code bases are constantly evolving over time and version control tools like Git andSubversion are widely used to manage versions and control changes. Github and Bitbucketare widely employed in the software industry for hosting remote code repositories basedon such services. These are centralised stores keep all the repositories published and alsoimprove code discovery.

Github and Bitbucket encourage reuse of application code and, arguably, save re-sources and configuration efforts as the same package can be globally used by manydifferent programs. However, Github and BitBucket package management can be com-plex. Packages have to be typically downloaded using source code and each client needsto explicitly keep version control on each package.

Cloud computing enables the provision of software on demand but requires applica-tions to be delivered as a consistent lightweight service over the Internet. Consequently,monolithic software architectures are constantly getting divided into microservices Balalaieet al. (2016). The important decision developers have to make is what constitutes a ser-vice. The separation of concerns has made life difficult for developers because some codecan be duplicated across services.

Managing dynamic versions and software dependencies is complicated, as the programdependency graph for large programs is long known to be difficult to handle Ottensteinand Ottenstein (1984). Microservice-based cloud architecture—where package depend-encies can be linked to different packages—then increases the challenge at hand Toffettiet al. (2017), since the search space is significantly large to completely understand con-flicts between dependencies.

The term ’DLL hell’ has been coined to describe many different versions of the samelibrary Tucker et al. (2007). Programs using the different versions of the same librarytend to break in case there are major changes in the package, so package managers areexpected to be ‘intelligent enough’ to handle different versions of the same package. ACUDF (Common Upgrade Description Format) document was proposed to keep a trackof the package definition and its dependencies Abate et al. (2012), similar to PyPI’srequirement.txt file or NPM’s package.json. However, modern-day application pack-age managers such as NPM can have multiple versions of the package Schlueter (2010),and common version requirements are kept in a common directory and alternate versionsare kept local to the package which helps to eliminate collisions of different versions.Some package managers use semantic versioning or ’semver’ principles. These principlesshould be clearly understood by the authors and users. Authors must understand thatthe breaking changes must always be released as major changes. Users must carefully re-view the changes in the packages to avoid build errors. Failure to understand these lawsputs build systems at risk. Versions are divided into three parts Semantic Versioning

Page 6: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Table 1: List of languages and their package managersLanguage Package manager

Java MavenNodejs NpmPython Pypi

C# NugetPHP PackagistRuby Ruby gems

user guide (2013): Major.Minor.PatchA security-oriented management framework, CHAINIAC has been used to verify integ-

rity and authenticity for software-release processes based on decentralised nodes Nikitinet al. (2017). However, CHAINIAC does not appear to address the immutability issue aschanges in versions may eventually break compilation and software components. Majorversion bumps are for breaking changes in the API or when the package big parts of thepackage are being rewritten. Minor versions are for new features additions to existing setwithout breaking changes. A patch is bug fixes without breaking changes.

Every programming language has its own package manager. Table 1 has a list ofselected languages along with their package managers

Having openly released their architecture for dependency management, NPM is thepackage manager for JavaScript and is also widely considered among the largest coderepositories in the world Wittern et al. (2016).

3 Contribution

This is a brief comparison between the existing systems and the proposed method.

Table 2: Comparison of existing systems and the proposed method

Variable name Different Package ManagersHead Maven Nuget NPM Proposed Method

Decentralized No No No YesWrite Throughput High High High LowRead Throughput High High High High

Immutable No No No Yes

The proposed method involves using a blockchain-based smart contract to decentralisethe package management. The idea is to propose a model which can work well withboth centralized and decentralized systems. Decentralisation is offered by a blockchainnetwork, in our case Ethereum. Ethereum uses Solidity for creating smart contracts.The language has similar syntax to Javascript. The contract is compiled by a Soliditycompiler. The Solidity compiler returns the bytecode which needs to be deployed to theEthereum node. language is yet under heavy development. There are several featureslike the dynamic passing of structures and mappings which are experimental under thecurrent version.

Page 7: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Figure 1: Blockchain data flow.

A blockchain can be construed as a network rather than a technology or system. Thesmart contract storage is essentially a public database which is immutable and decentral-ized. It has primarily been used in financial systems. It only came into prominence whenNakamoto released the Bitcoin paper Nakamoto (2008). The system did not have a cent-ral authority for verifying transactions and involves having a group of miners to whichverify transactions. The miners are paid for the electricity and CPU spent in verifyingthe transactions. Figure 1 shows how data flows between different nodes in a blockchainnetwork. The work was substantial enough to gain traction. Bitcoin is not Turing com-plete and potential blockchain adopters could not do anything but get inspired by thesystem and reinvent the wheel by building something similar.

Ethereum is considered a natural extension of Bitcoin which is also a cryptocurrency.Ethereum is a Turing complete solution where the user can program and deploy bytecodeto the Ethereum nodes Buterin et al. (2014). This led to interesting opportunities whereusers could actually harness the actual strength of the network. The concept of smartcontracts allowed the user to add custom logic to the Ethereum nodes. The smart contractis written in Solidity which is a general-purpose programming language. The smartcontract functions can be called from thin clients like browsers which we call decentralizedapplications. Any change to data on the Blockchain is mined by the Ethereum communityminers and there is a gas price which the miner gets for blocks that they add to thesystem. Ethereum uses a distributed ledger for transactions and provides a decentralizedarchitecture for smart contract execution. After the smart contract is deployed onto theEthereum node, the contract is executed via RPC calls from thin clients.

Page 8: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

F

C3

x4:1x5:0x2:1

C2

x4:0x3:0x2:0

C1

x3:0x2:0x1:0

Figure 2: Transitive nature of packages.

4 Methodology

Equation (1) presents the clauses which act as dependencies of a package to be installed.

(¬x1 ∨ ¬x2 ∨ ¬x3) ∧ (¬x2 ∨ x3 ∨ x4) ∧ (¬x2 ∨ ¬x5 ∨ x3) (1)

Each clause has three versions which have all literals as sub-dependencies of depend-encies. Each literal has two versions x1:0 and x1:1, where x1:0 is given as the negationin the package. To install a package F, we need to install all its sub packages given byclauses in the equation. Assuming we can install only one version of the package at atime, we need to find a satisfying assignment to Equation (1) such that we get a suc-cessful response where all clauses can be installed for the main package to be installed.It has been shown that the problem is NP-complete by converting the 3SAT instanceto a dependency problem instance Cox (2016). The NP-completeness can be avoided byattacking some assumptions like only a single package can be installed on the instance ata time and having a versioning system which does not specify package range.

Every package manager has to deal with direct and transitive dependencies. The morethe transitivity, the more the complicated the network. The transitivity can be seen from2. Here, all the nodes are packages and the edges represent the dependence. We can seethat package F depends on C1, C2 and C3. Also, packages C1, C2 and C3 directly dependon other packages.

Here, each C clause has three versions each of which depends on different packages.For example, C1:0 depends on x1:0 and C1:1 depends on x2:0. A dependency chart was usedfor the 3SAT reduction as shown by Cox (2016). If the x packages were to be removed,it would affect F and all C packages. The ’left-pad’ problem was pointed out by Decanet al. (2016), where a package named ’left-pad’ was used by major packages like Atom andBabel. The owner of ’left-pad’ unpublished this package which caused all installationsand builds to fail. Two percent of the transitive packages installations failed because ofthis event. Developers started rethinking their dependency structures and including onlywhich they dire needed.

Smart contracts are similar to stored procedures in SQL. The bytecode of the smartcontract is deployed to all the network nodes. The functions of the bytecode are theninvoked via RPC calls on the client. New data to be mined requires proof of work fromthe Ethereum node. Proof of work here means the node will have to perform somecomputational work before adding a block to the network. This reduces the number ofthe blocks a node can add to the network and gets fairness to the system.

4.1 Data structure

The data schema can be best seen as a tree. The root node acts the package name andthe leaf node is the actual package information. The nodes are version numbers. The

Page 9: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

gcc

0

0

1

v0.0.1

2

v0.0.2

1

0

v0.1.0

Figure 3: Placement of new versions on version tree

{owner : ’ example ownerid ’ ,dependenc ies : {

’ Package1 ’ : ’ 1 . 1 . 2 ’ ,’ Package2 ’ : ’ 1 . 2 . 1 ’

} ,l i n k : ’ example l ink ’ ,checksum : ’ example checksum ’

}

Figure 4: Data structure for package versioning.

Page 10: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Figure 5: Smart contract interaction.

data structure provided in Figure 4 gives a gist of the data structure. Every packagewould have its package tree. The data structure is flexible to both the minor majorversioning technique as well as semantic versioning where we have three versions. Anoversimplification of this data structure would be to have to nested map pointing to aresource. Using as hashmap keeps the time complexity to O(1).

The package information will contain information which is important to install thepackage like dependencies, link, dependents. The package dependencies are the packageswhich will be installed with the package. The version of the dependencies is importanthere to exactly install the dependency the package needs. The data can be fed to theclient either in a single go or differently for every package. The ownerId helps us toidentify the owner of the package. The package ecosystem can be looked upon as a giantgraph where each can depend on other packages or other packages depend on it. The linkis the pointer to the package which will be stored with IPFS.

The developers can host the Ethereum nodes in their environment or use the network.Some users would like to keep a copy of the blockchain in the event the network goesdown. The usage of the node or the network will be configurable via the package managerclient.

4.2 Storage solution

We need to keep the bare minimum information on the ledger so that the intermediationof metadata on the peer to peer network is fast. A compressed version of the packagewould be kept on the IPFS. IPFS(Interplanetary file system) is peer to peer storagesolution. While uploading the package file to IPFS we receive an immutable hash. Thisis hash can be used to retrieve the file in the future. IPFS uses a Distributed hash tableto store the hash and the data is stored locally in the node where it is published. Anyother node which requests a file has to download the file from the nearest node and keepit locally with its self for a definite period of time.

We use a Coral Distributed hash table to store the data which concentrates moreon locality Benet (2014). A replication factor can be added to have some replication ofblocks. The hash returned is stored on the Smart contract. IPFS internally uses the Bit

Page 11: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Torrent protocol where it opens up many connections to the different peers and downloadsbits and pieces of the file simultaneously. By using IPFS we make sure that no part ofthe architecture is centralised. IPFS is used for storing web archival records where thepayloads are stored in IPFS and the indexes are stored in a format called CDXJ Alamet al. (2016). CDXJ is an extension to CDX which has JSON support. CDXJ plays asimilar role to the blockchain node in our system except that it is mutable.

4.3 Algorithms

The algorithm is for storing package which the developer wants to make available. Theclient side would have to provide the owner name, package name, version, and depend-encies. The package version, name, and dependencies are extracted from the dependencyfile on the client. The package to be published is stored in the data structure mentionedabove which will contain trees for all packages. The complexity of this algorithm is O(1).The package Info is the same as the mentioned above. It will contain the dependencies,dependency versions and link to the current package.

Algorithm 1 Publish algorithm

1: procedure publishPackage. *Smart contract storage

2: packages ← package map. *inputs

3: pn← name of the package4: packageInfo← packageInfo5: v ← version6: major ← version major7: minor ← version minor8: patch← version patch9: if packages[pn] then

10: if packages[pn][major][minor][patch] == null then11: packages[pn][major][minor][patch]← packageInfo12: success← true13: else14: success← false15: end if16: end if

return success17: end procedure

The algorithm for downloading an installed package is given. The asymptotic com-plexity of installing the entire package depends on the number of dependencies in thepackage. Requesting a package with its version from the storage layer would have a timecomplexity of O(1). Web3, which is the client used to connect to Ethereum node is doesnot support passing structures downstream yet. As and when support is added for dy-namic structures being returned downstream, the model contract can have the processinglogic to collect the package with all its dependencies. This would decouple the storageand the processing and the model contract can be changed with time.

Page 12: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Algorithm 2 Install package algorithm

1: procedure getpackage. *Smart contract storage

2: packages← map of packages. *inputs

3: major ← version major4: minor ← version minor5: patch← version patch6: result← []7: if packages[pn][major][minor][patch] != null then8: packageInfo← packages[pn][major][minor][patch]9: dependencies← packageInfo[’dependencies’]

10: result← packageInfo, dependencies11: success← true12: else13: success← false14: end if

return success, result15: end procedure

Figure 6: Contract Deployment pattern.

4.4 Diagrams

Class diagram

Contracts are immutable in nature. In a centralized setup, we just update the code withand deploy to all the servers. While using contracts cannot be updated, new contractsneed to be deployed. We need to make sure we do not lose references to our storagelayer. The storage layer is separated to ensure a reference is maintained. Every changeto the contract would require an update to the client side binaries, this would be highlyinefficient. To overcome this issue, we propose a register-contract which contains a refer-ence to the main contract. The client function will have to execute the register-contractand get the address of the main contract. Once the address is received, it will be able toget the latest code of the contract. Separation of concerns is of prime importance in theblockchain.

Page 13: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Figure 7: Package installation.

Figure 8: Publishing a package.

Sequence diagrams

The installation sequence diagram shows the installation of a package. The diagramshows the interaction between the client, the network, and the IPFS server. In HDFS,the blocks do not flow through the name-node. However, the name-node does keep a logof all the blocks on the data nodes Shvachko et al. (2010). Similar to HDFS, the file datadoes not go through the blockchain nodes. The file data is fetched from the IPFS servers.This would increase the throughput and also avoid the blockchain intermediation issue.

The package to be published is uploaded is converted to a tar file. Once the packageis compressed, it is uploaded to IPFS. IPFS returns a hash link which helps us uniquelyidentify the file when needed. We take this hash link and send it to the blockchain nodewhich stores the package information.

The client state diagram shows the flow of events on the client side library. The client-side library will not installed packages that are already installed unless the developer triesto install a different version of the same package. By doing this we save a considerableamount of bandwidth. The client maintains a local state which keeps a track of thepackages that are installed. Top level packages will directly replace the other version bydesign. The packages and their dependencies will be installed in a shared space to avoidduplication. Dependencies of the different versions will be kept local to the package thatrequires them and not in a shared space.

Page 14: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Figure 9: State diagram for the client.

Page 15: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

5 Implementation

The proposed solution uses Ethereum smart contracts to store package metadata andbinaries in IPFS. Table 3 shows the software requirements for the implementation. TheGeth binary was used to run the Ethereum node. The web3 package was used for theclient side implementation. Solidity was used to write the smart contract. The web3client gives a nice interface to interact with Ethereum based smart contract. A CentOSinstance was created on Open stack to run the Geth binary and also the P2P storageserver. The Geth binary is built in Go which means it is compatible with operatingsystem.

The setup can be used as a standalone system or in collaboration with another packagemanager as shown in figure 10. The standalone system would require users identifyingthemselves to the network and would require sufficient funds to submit transactions. Thecollaboration with other package managers would allow anyone to submit transactions asthere would only group of services publishing for everyone. The copies of the transactionswould be maintained in both the centralized and the decentralized systems.

The NPM listener pushes metadata to interested services. The metadata consists ofthe name, version, license etc. A daemon was created to get the metadata from NPM.This daemon also excludes packages which don’t have a valid version. It was noticedthat the speed at which the packages arrived was too fast. This would cause transactionclouding on the Ethereum node. A queue was placed to improve the flow control betweenthe workers and the listener.

Another CentOS instance was used to run the workers and the service. The messagesare given to the workers in a round robin fashion. The workers are the processes whichupload the binaries to the peer to peer storage and push the metadata on the Ethereumnetwork. The workers can be horizontally scaled out to improve job throughput. Theworkers are decoupled from the listener.

Table 3: Software requirements.

Software Version

Geth v1.8.13Solidity v0.4.24Node.js v6.9.1IPFS v0.4.17web3 v0.18.4

The smart contract models the dependencies graph of each package. A graph networkcan be visualized with contract storing all the dependencies and the dependents on thepackage. The public nature of the Ethereum makes it well suited for open source packages.The storage contract has data operations and is primarily the data layer of the system.As of now, web3 does not have a stable way of handling dynamic structures. Once thesupport for the this is stable a model contract can be used to leverage this feature. Amodel contract as shown in figure 6 can be used to improve the algorithm or add otherinteresting features on top of the storage layer. This pattern is also used to ensure thatwe do not lose the reference to our storage data as contracts are immutable and structurescannot be changed once the contract is deployed.

Page 16: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Web3 v0.20 was chosen over v1.0 beta because it’s been around longer. Also, some ofthe experimental solidity features on string arrays were tried. The bytes array was foundto be more stable than the string array in solidity. This, however, has an additionalcost of converting all the byte elements to string elements after getting them from thecontract.

Figure 10: System Architecture for publishing new packages

A test simulator was used for the creation of the contract and initial testing. Thetest simulator has fake accounts and fake Ethereum which can be easy to check thefunctionality before deploying the main or any test Ethereum network.

The Rinkeby test network was used for the actual execution of the smart contracts.Rinkeby uses Proof of Authority, which means that there are authorized set of minerswho verify transactions and add blocks. Ether needs to be expended while submittingtransactions to the Smart contract. This Ether was received from the Ethereum Rinkebyfaucet.

The block synchronisation of Ethereum nodes depends on other peers in the testnetwork. The Ethereum node first synchronizes all the blocks and then starts acceptingcontracts and transactions. The full synchronization mode requires a lot of time as thechain is very large. The light mode, on the other hand, is the fastest option but does nothave much support from peers. The fast synchronization mode just downloads the blockheaders instead of the entire block and it also has enough support from the communityso the synchronization in this mode is fast.

The main network uses the Proof of work concept where the miners need to expendCPU to solve a computation problem and then add blocks to the network. The minersare rewarded with Ether once they solve the problem. The test network instead workson a proof of authority concept. It means there are predefined nodes which are used toverify transactions and adding blocks on the network. There are no miners on the testRinkeby network. The implementation of proof of authority reduces the risk of the chainbecoming big quickly because the authorized nodes include blocks at a fixed rate.

In all its essence, the Ethereum network is similar to a PaaS system. The contracts aredistributed to all the miners, who execute the function with the transaction parameters forus and broadcast the results over the network. The clients pay for submitting transactionsand deploying contracts, similar to how we pay the cloud service providers for deployingservices on their platform. We do not control the environment in which the transactionis mined. In case we need more control on the mining activity, it would be good to optfor a private network rather than a public one.

Page 17: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

6 Evaluation

Our evaluation has been performed using 4338 packages, including the top 950 packageswhich are used directly and transitively by other packages as documented in Kashcha(2018).

It has been noticed that if the Ethereum node is bombarded with transactions, thenode does not broadcast those transactions immediately and these get stuck on the nodeas pending transactions. Miners get an incentive for mining blocks on the main network,so the main network has more miners. The Rinkeby network does not have any miners asit works on a Proof of Authority system. Only authorized nodes can verify transactionson the Rinkeby network. The authorized nodes ensure that the Rinkeby network growsat a steady pace.

Consequently, we have to set the gas price appropriately in order for the transactions tobe mined quickly. Had the gas price been low, the transactions would not have been pickedup for mining because the miners would have found other more lucrative transactions.Consequently, our initial gas price was 1 gwei for transactions and, by using a simpledemand-supply iterative function, it was finally increased to 5 gwei.

Our tests ran for a week. It was initially noticed that the workers would crash becauseof insufficient funds. As the funds come from the faucet, which happens to be rate limited,we decided to slow submit transactions and keep adaptively funding the address with moreether.

Bandwidth monitoring was kept on the instance where the blockchain node was in-stalled. The results are plotted in Figure 11. The evaluation involved requesting a list ofpackages along with their versions present on the node. The metadata for these packageswas then requested individually.

0 20 40 60 80 1000

100

200

300

400

500

600

700

seconds

Ban

dw

idth

inkB

it/s

Bandwidth measurement

Figure 11: Empirical evaluation of Bandwidth usage

Nload was used to track the bandwidth of the instance. As seen in the figure, thereis a sharp increase in the outgoing bandwidth, as much as 650.48 Kbit/s. The eventhappens because all the events are requested from the node. Requesting the events forthe packages caused the bandwidth to spike up. The outgoing bandwidth graduallytrickled down to 396.32 Kbit/s as other package metadata was requested.

Latency test was conducted on the Blockchain node. The results for the test are givenin 4.

Page 18: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Table 4: Latency test for getting package meta data from the node.

Number of packages Statistics (ms)Head Mean Median standard deviation

250 148.3 146 8.73500 149.578 147 9.231000 148.991 147 9.05

It was noticed that the mean latency to get the package metadata from the Blockchainnode is approximately 148 ms. The cost to get the overall package would be higher asthere is no handling for parsing dynamic structures on the web3 client. Once there is astable support for dynamic structures, metadata for many packages can be requested atonce.

Figure 12: Network graph of packages (partial view).

Referring to 12, we can see at many packages in the Javascript ecosystem can havemultiple levels of dependencies. The graph is modeled directly from the data receivedfrom the Blockchain node. It was noticed that some packages have cyclic dependencies.The impact of the cyclic nature of the dependencies has not been fully studied, but weargue that it may be detrimental to the overall traceability and version control for arepository.

The highest number of direct dependents observed in the subset which was testedis 361. Some packages such as lodash, commander and body-parser show an in-

Page 19: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

creasing number of first level dependents. This number is likely to grow as previouslyshown Kashcha (2018). These packages are critical to the community as they form thebasis of any sort of development.

7 Conclusion and Future Work

The proposed system is a reliable decentralised architecture for package management.The system relies on the Ethereum for functioning. Smart contracts have been used forintegrating our logic onto the blockchain network, where IPFS has been proposed forstoring actual binaries.

The blockchain network can have any number of nodes in the network at a time. Theusers can install the Ethereum node on their infrastructure or use it as a service in order tokeep track of the dependencies. Also, the underlying architecture can be also be used forany kind of dependency management with tweaks to the client and version changes in thecontract. The native client will, however, change according to the platform. Blockchainnetworks have good read throughout compared to the write throughput which could beideal for dependency management.

Further work is envisioned to study dependencies at multiple levels as our currentwork has only taken into account single-level. Ideally, it may be useful to study in detailcliques which can uncover not only functional software properties, but also non-functionalcharacteristics such as developer relationships, code styles, and other useful patterns.

References

Abate, P., Cosmo, R. D., Treinen, R. and Zacchiroli, S. (2012). Dependency solving: Aseparate concern in component evolution management, Journal of Systems and Soft-ware 85(10): 2228–2240.

Alam, S., Kelly, M. and Nelson, M. L. (2016). Interplanetary wayback: The permanentweb archive, JCDL ’16, ACM, Newark, pp. 273–274.

Balalaie, A., Heydarnoori, A. and Jamshidi, P. (2016). Microservices architecture enablesdevops: Migration to a cloud-native architecture, IEEE Software 33(3): 42–52.

Benet, J. (2014). Ipfs-content addressed, versioned, p2p file system. Accessed on19.2.2018.URL: https://arxiv.org/pdf/1407.3561.pdf

Buterin, V. et al. (2014). A next-generation smart contract and decentralized applicationplatform, Technical report. Accessed on 19.2.2018; Cited by 215.URL: https://github.com/ethereum/wiki/wiki/White-Paper

Cox, R. (2016). Version SAT. Accessed on 19.2.2018.URL: https://research.swtch.com/version-sat

Decan, A., Mens, T. and Claes, M. (2016). On the topology of package dependencynetworks: A comparison of three programming language ecosystems, ECSA ’16, ACM,Copenhagen, pp. 21:1–21:4.

Page 20: Automatic Software Dependency Management …trap.ncirl.ie/3300/1/gavindmello.pdfAutomatic Software Dependency Management using Blockchain Gavin D’mello x17110483 Msc Research Project

Highsmith, J. and Cockburn, A. (2001). Agile software development: the business ofinnovation, Computer 34(9): 120–127.

Kashcha, A. (2018). Npm rank. Accessed on 25.07.2018.URL: https://gist.github.com/anvaka/8e8fa57c7ee1350e3491

Krueger, C. W. (1992). Software reuse, ACM Computing Surveys 24(2): 131–183.

Nakamoto, S. (2008). Bitcoin: A peer-to-peer electronic cash system. Accessed on19.2.2018; Cited by 3059.URL: https://bitcoin.org/bitcoin.pdf

Nikitin, K., Kokoris-Kogias, E., Jovanovic, P., Gailly, N., Gasser, L., Khoffi, I., Cap-pos, J. and Ford, B. (2017). CHAINIAC: proactive software-update transparency viacollectively signed skipchains and verified builds, USENIX Security 2017, USENIXAssociation, Vancouver, pp. 1271–1287.

Ottenstein, K. J. and Ottenstein, L. M. (1984). The program dependence graph in asoftware development environment, SIGPLAN Not. 19(5): 177–184.

Ruparelia, N. B. (2010). Software development lifecycle models, SIGSOFT Softw. Eng.Notes 35(3): 8–13.

Schlueter, I. (2010). The node package manager and registry. Accessed on 19.2.2018.URL: https://www.npmjs.org

Semantic Versioning user guide (2013). Accessed on 10.3.2018.URL: https://semver.org/

Shvachko, K., Kuang, H., Radia, S. and Chansler, R. (2010). The hadoop distributed filesystem, MSST’10, IEEE, Lake Tahoe, pp. 1–10.

Toffetti, G., Brunner, S., Blchlinger, M., Spillner, J. and Bohnert, T. M. (2017). Self-managing cloud-native applications: Design, implementation, and experience, FutureGeneration Computer Systems 72: 165 – 179.

Tucker, C., Shuffelton, D., Jhala, R. and Lerner, S. (2007). OPIUM: Optimal packageinstall/uninstall manager, ICSE ’07, IEEE, Minneapolis, pp. 178–188.

Wittern, E., Suter, P. and Rajagopalan, S. (2016). A look at the dynamics of the JavaS-cript package ecosystem, MSR’16, ACM/IEEE, Austin, pp. 351–361.