Visualizing Dynamic Bitcoin Transaction Patterns

For Peer Review Only/Not for Distribution
Big Data
Visualizing Dynamic Bitcoin Transaction Patterns
Journal: Big Data
Manuscript ID BIG-2015-0056.R3
Manuscript Type: Original Article
Date Submitted by the Author: n/a
Complete List of Authors:
McGinn, Dan; Imperial College London, Computing
Birch, David; Imperial College London, Computing
Akroyd, David; Imperial College London, Computing
Molina-Solana, Miguel; Imperial College London, Computing
Guo, Yi-ke; Imperial College London, Computing
Knottenbelt, William; Imperial College London, Computing
Keywords: Big data analytics, Structured data, Data mining
ScholarOne Support phone: 434-964-4100 email: [email protected]
Mary Ann Liebert, Inc.




Dan McGinn, David Birch, David Akroyd, Miguel Molina-Solana, Yike Guo, and William J. Knottenbelt

Data Science Institute, Imperial College London

Dated: May 14, 2016

Abstract

This work presents a systemic top-down visualization of Bitcoin transaction activity to explore dynamically generated patterns of algorithmic behaviour. Bitcoin dominates the cryptocurrency markets and presents researchers with a rich source of real-time transactional data. The pseudonymous yet public nature of the data presents opportunities for the discovery of human and algorithmic behavioural patterns of interest to many parties such as financial regulators, protocol designers and security analysts. However, retaining visual fidelity to the underlying data in order to preserve a fuller understanding of activity within the network remains challenging, particularly in real-time.

We expose an effective force-directed graph visualization employed in our large-scale data observation facility to accelerate this data exploration and derive useful insight amongst domain experts and the general public alike. The high fidelity visualizations demonstrated in this paper allowed for collaborative discovery of unexpected high frequency transaction patterns including automated laundering operations and the evolution of multiple distinct algorithmic denial of service attacks on the Bitcoin network.

Introduction

Deriving insight into the dense data sets generated by modern computational and sensing systems is still primarily performed by humans in possession of domain knowledge and the necessary mathematical and statistical tools. Visualization has also been shown to be an effective way of gaining insights into the available data. In that regard, the volume edited by Card, Mackinlay and Shneiderman⁹ is still a valuable reference and provides plenty of examples of such visualizations.

A system of interest which generates a large amount of connected data and lacks meaningful systemic visualization tools is that of Bitcoin.¹⁰ This cryptocurrency system is primarily composed of a permissionless public database to which anyone with a tokenized pseudonymous identity may write protocol conformant data. Since identity is obfuscated through the use of tokenized addresses, the ability to identify and classify anomalous patterns of behaviour in the data has utility to many interested parties such as financial regulators (in the case of money laundering activity for example) or protocol developers (in the case of attacks on the system’s resilience). Conducting an initial graphical observation is a useful first step in the data-analysis workflow to investigate the structural properties of such repeated anomalous behaviours. We investigated different visualizations able to provide this useful exploratory insight into the underlying behaviours observable in the data.

This paper describes the design and development of tools for dynamically visualizing Bitcoin transactions. The visualizations demonstrated in this paper have enabled the discovery of unexpected transaction patterns such as money laundering activity and the observation of several distinct denial of service attacks on the Bitcoin network. This allowed rapid understanding amongst researchers of the structure of such behavioural patterns for accelerated analysis and classification investigation.

The tools have been successfully deployed in our data observatory facility: a high-resolution 64 screen distributed rendering cluster with a canvas of 132M pixels. We reflect upon how the employment of such a large-scale observatory environment benefits a more effective data visualization and provides for greater insight into the data.

Bitcoin Network and Data

Bitcoin, with its inception dating from 2009, is the dominant cryptocurrency implementation. The system is primarily composed of an agreed protocol for broadcasting exchanges of value between tokenized participants of a peer-to-peer network. These transaction records are subsequently regularly verified by specialist ‘mining’ nodes on the network, whose honesty is ensured through economic jeopardy, and recorded into a publicly distributed tamperproof ledger known as the blockchain. By design,


bottom-up approaches. The first interesting deployment of small-scale visualization to directly analyse transaction data in the blockchain is presented by Di Battista et al,²¹ which exposes a tool to perform a bottom-up visual analysis of the influence of selected source transactions on subsequent flows in the transaction graph.

With 132M pixels at our disposal, our motivation was to generate a top-down system-wide visualization in order to explain Bitcoin to a lay-audience and to begin an explorative analysis of algorithmic patterns of associated behaviours in the transaction data.

Total Bytes:              54,814,349,473
Total Blocks:             391,570
Total Transactions:       101,533,304
Total Inputs:             267,860,693
Total Outputs:            301,970,961
Implied UTXOs:            34,110,268
BTC Minted:               15,039,250
Market Cap @ $431/BTC:    $6.48bn

Table 1: Bitcoin blockchain summary statistics at the 7 year anniversary of the genesis block on 03 Jan 2016
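The two derived rows of Table 1 can be reproduced from the raw counts. The short sketch below is illustrative only: the variable names are ours, and it assumes (as the figures in the table suggest) that the "implied" UTXO count is simply total outputs minus total inputs, without separating out coinbase inputs.

```python
# Figures copied from Table 1 (blockchain state on 3 January 2016).
total_inputs = 267_860_693
total_outputs = 301_970_961
btc_minted = 15_039_250
usd_per_btc = 431  # exchange rate quoted in the table

# Assumption: the table's "implied" unspent-output count is the plain
# difference between outputs created and inputs that reference them.
implied_utxos = total_outputs - total_inputs

# Market capitalisation is coins in existence times the quoted price.
market_cap_usd = btc_minted * usd_per_btc

print(f"Implied UTXOs: {implied_utxos:,}")            # Implied UTXOs: 34,110,268
print(f"Market cap: ${market_cap_usd / 1e9:.2f}bn")   # Market cap: $6.48bn
```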

The Bitcoin blockchain, with its canonical ordering of sequences of transactions and associations between spending addresses, naturally lends itself to graph visualization, and that is the focus of our work. However, faced with the large size of the full transaction graph described in Table 1, any visualization effort is forced to compromise between which discrete subset of data to visualize and how to abstract away unnecessary detail. Previous bottom-up approaches have achieved this by restricting the scope of their analyses to identifying a limited subset of starting points of interest in the blockchain from which to visualize. Address based graph visualizations have typically been separated from transaction based graphs. Furthermore, details of the particular associations in transaction graphs are usually abstracted away into summary form. Specifically, a transaction is the only type of node represented in typical transaction graph visualizations, with its edge associations between its inputs and any number of other transactions and their outputs abstracted to a single labelled edge between transaction nodes. Whilst retaining enough information for quantitative analysis, the visual fidelity to the underlying data is much reduced. Concretely, visually identifying a transaction with an unusually large number of outputs, or an anomalous amount of bitcoin sourced from a previous transaction, becomes an arduous visual operation on textual data in such abstracted form.

With the full benefit of the large-scale digital canvas available in our data observatory, our visualization goal was to remain as faithful to the underlying data as possible in order to retain the richest observational insight into the identification of anomalies and patterns of behaviour. In particular we found it important to retain visual impact regarding the input and output structure of a transaction, the relative value of transactions, and to maintain associations between both transactions and addresses within the scope of a single visualization. We chose to restrict our subset of blockchain data based on sequential series of blocks without abstraction. In order to lay out our graph in a force-directed minimum energy equilibrium state to visually discern its structure we used the continuous ForceAtlas2¹⁹ algorithm available in the SigmaJS²³ library. The implementation provides for Barnes-Hut optimisation familiar to n-body simulations in order to reduce the computational complexity from O(N^2) to O(N log N). To that end the basic design of our graph visualization is as follows:

• Transactions are visualized as nodes in a neutral colour whose size is fixed at the value of the current coinbase reward (25 BTC) in order to give a fixed sense of scale, since the size of input and output nodes is variable depending on value. A transaction node’s only purpose though is to provide a local focus for its associated inputs and outputs.

• Inputs are nodes of an orange colour whose size is proportional to their value. They are associated to their containing transaction by an orange edge.

• Outputs are nodes blue in colour whose size is also proportional to their value. They are associated to their containing transaction by a blue edge and, if an output should become referenced as an input in a subsequent transaction within the scope of the visualization, it is joined to that transaction by an orange input edge, thus forming a chain of spends.

• Addresses are visualized as a grey associative edge only if more than one input or output references the same address within the scope of the visualization.
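The four rules above can be sketched as a small data model. This is a minimal illustration rather than the authors' implementation: the `Graph` class, its method names, the use of `txid:index` identifiers for outputs, the choice to chain each grey edge to the previous element that used the same address, and the use of the BTC value directly as node size are all assumptions made for the example. The first transaction mirrors the stylized case of five equal inputs from one address paying 25 BTC to a new address.

```python
from dataclasses import dataclass, field

COINBASE_REWARD_BTC = 25.0  # fixed size of every transaction node, per the text


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)       # node id -> attributes
    edges: list = field(default_factory=list)       # (source, target, colour)
    by_address: dict = field(default_factory=dict)  # address -> last element seen

    def _link_address(self, element_id, address):
        # A grey edge appears only when an address is referenced more than once.
        previous = self.by_address.get(address)
        if previous is not None:
            self.edges.append((previous, element_id, "grey"))
        self.by_address[address] = element_id

    def add_transaction(self, txid, inputs, outputs):
        """inputs: (outpoint_id, address, btc) tuples; outputs: (address, btc) tuples."""
        self.nodes[txid] = {"kind": "tx", "colour": "neutral",
                            "size": COINBASE_REWARD_BTC}
        for outpoint, address, value in inputs:
            if outpoint not in self.nodes:
                # Funding output is outside the visualized scope: new orange node.
                self.nodes[outpoint] = {"kind": "input", "colour": "orange",
                                        "size": value}
                self._link_address(outpoint, address)
            # If the outpoint already exists as a blue output node, this orange
            # edge is what chains the two transactions into a chain of spends.
            self.edges.append((outpoint, txid, "orange"))
        for index, (address, value) in enumerate(outputs):
            out_id = f"{txid}:{index}"
            self.nodes[out_id] = {"kind": "output", "colour": "blue", "size": value}
            self.edges.append((txid, out_id, "blue"))
            self._link_address(out_id, address)


graph = Graph()
# Five equal inputs from one address paying 25 BTC to a new address.
graph.add_transaction("tx1",
                      [(f"prev{i}:0", "addrA", 5.0) for i in range(5)],
                      [("addrB", 25.0)])
# A later transaction spends that output, extending the chain of spends.
graph.add_transaction("tx2", [("tx1:0", "addrB", 25.0)], [("addrC", 25.0)])

grey_edges = [e for e in graph.edges if e[2] == "grey"]
print(len(graph.nodes), len(graph.edges), len(grey_edges))  # 9 12 4
```

Keeping a spent output as a single blue node with an extra orange edge, rather than duplicating it as an input node, is what makes chains of spends appear as connected paths in the layout.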

Figure 2: Stylized transaction visualization sourcing 5 equal input amounts from a single address and paying 25 BTC to a new address


It can be seen from the stylized representation shown in Figure 2 that all contextual and association information from the transaction data structure can be visualized in one graph and thus any amounts, structures of individual transactions, high-frequency chains of spends or address associations of an anomalous nature will be immediately apparent by visual inspection.

Visualizing Bitcoin Transactions

We now take our transaction representation and apply it to an animated graph whose layout evolves in real-time to visualize transactions and their associations as they are broadcast into the network and join all peers’ mempools. Furthermore we apply the same animated force-directed visualization to explore individual blocks of static data laid out on request in order to explore past behaviours. In order to gently introduce a lay audience to some of the abstract concepts of Bitcoin, we also produced a global visual manifestation of the activity on the peer-to-peer network, less intimidating in its complexity.

Mempool Visualization

The aim of this animated visualization∗ (e.g. Figure 4(A)) was to demonstrate the current activity and degree of connectivity as transactions enter the mempool in real-time through a continuously updated force-directed graph layout. By interacting with the Bitcoin network through known stained addresses, it is also possible to conduct an active data analysis by identifying one’s own transactions and the network’s responses.
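To make the idea of a continuously updated force-directed layout concrete, the following is a minimal sketch of one layout iteration. It is a generic repulsion-plus-spring scheme written for illustration, not the ForceAtlas2 implementation in SigmaJS; the function name, parameters and force constants are assumptions. The pairwise repulsion loop is the O(N^2) term that a Barnes-Hut approximation reduces to O(N log N).

```python
import math


def layout_step(positions, edges, repulsion=1.0, attraction=0.05, step=0.1):
    """Advance a naive force-directed layout by one iteration, in place.

    positions: dict mapping node -> [x, y]; edges: iterable of (u, v) pairs.
    """
    nodes = list(positions)
    forces = {n: [0.0, 0.0] for n in nodes}

    # Pairwise repulsion between every pair of nodes: the O(N^2) part.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = positions[a][0] - positions[b][0]
            dy = positions[a][1] - positions[b][1]
            dist = math.hypot(dx, dy) or 1e-9   # avoid division by zero
            push = repulsion / dist             # magnitude decays with distance
            fx, fy = push * dx / dist, push * dy / dist
            forces[a][0] += fx
            forces[a][1] += fy
            forces[b][0] -= fx
            forces[b][1] -= fy

    # Spring-like attraction along edges, linear in the edge length.
    for u, v in edges:
        dx = positions[v][0] - positions[u][0]
        dy = positions[v][1] - positions[u][1]
        forces[u][0] += attraction * dx
        forces[u][1] += attraction * dy
        forces[v][0] -= attraction * dx
        forces[v][1] -= attraction * dy

    # Move every node a small step along its net force.
    for n in nodes:
        positions[n][0] += step * forces[n][0]
        positions[n][1] += step * forces[n][1]


# New mempool transactions can be added to `positions` and `edges` between
# calls; repeated calls let the layout relax towards a new equilibrium.
positions = {"tx": [0.0, 0.0], "in": [1.0, 0.0], "out": [0.0, 1.0]}
edges = [("in", "tx"), ("tx", "out")]
for _ in range(200):
    layout_step(positions, edges)
```

For a single connected pair under these assumed forces, the equilibrium separation is sqrt(repulsion / attraction), which gives a quick sanity check on the constants.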

Figure 3: Visualizing a simple chain of spends in the mempool with blue outputs from one transaction becoming orange inputs to the next, from a source coinbase transaction in red.

Independent transactions are visually associated to each other in two ways: either directly through an existing output becoming an input to a new transaction within the timeframe of the visualization, or indirectly through the re-use of the same cryptographic public key within an element of a transaction, which we connect with a grey edge.

Interacting with the visualization is simple. We provide for pan, zoom and hover over methods to display uncluttered textual data such as transaction references and address information. We facilitate further detailed data analysis by highlighting connected components along with the ability to transmit such sub-component data in JSON by PeerJS to hand-held tablet displays for a more detailed, localized analysis directly linked to online Bitcoin exploration tools such as blockchain.info. Filtering the visualized data set by amount, address or reference is also possible from the hand-held tablet display.

∗ A low resolution video demonstrating the system can be found at: https://imperialcollegelondon.app.box.com/v/bitcoinVis

Figure 4: (A) High resolution (8k) visualization of a standard block; (B) detail of both a low (small node) and a high (large node) value transaction, (C) known and linked Bitcoin addresses, (D) a payout system, and (E) a highly associated disconnected component believed to be a coin-tumbling service to move amounts rapidly between addresses, obfuscating the source and destination of funds.
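Isolating a disconnected component such as the one in Figure 4(E), and packaging it as JSON for a secondary display, can be sketched as follows. This is an illustrative outline only: the function names and the payload shape (`nodes` and `edges` keys) are assumptions, and the actual transport (PeerJS in the deployed system) is not modelled here.

```python
import json
from collections import defaultdict


def connected_component(edges, start):
    """Return the set of nodes reachable from start, treating edges as undirected."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, stack = {start}, [start]
    while stack:                      # iterative depth-first search
        node = stack.pop()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen


def component_to_json(edges, start):
    """Serialize the component containing start as a JSON string."""
    members = connected_component(edges, start)
    payload = {
        "nodes": sorted(members),
        "edges": [[u, v] for u, v in edges if u in members and v in members],
    }
    return json.dumps(payload)


# Hypothetical edge list: two chained transactions plus one unrelated transaction.
sample_edges = [("tx1", "out1"), ("out1", "tx2"), ("tx3", "out3")]
print(component_to_json(sample_edges, "tx1"))
# {"nodes": ["out1", "tx1", "tx2"], "edges": [["tx1", "out1"], ["out1", "tx2"]]}
```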


able to understand the layout of the linking between transactions far more clearly than the raw data, and the majority of people were then able to spot anomalous patterns in the visualisation and question their significance based on oral feedback after the initial presentation.

For visiting executives, the conversations tended towards questioning the anonymity of the data to ascertain the feasibility of tracking transactions across time to determine their origin. They were able to identify the majority of formed structures, though generally they were more interested in the ability to apply the visualization to alternative financial transactions.

For researchers from different fields, a large number of observations were made about the resemblance to phenomena in their own areas of expertise. In particular, those in medical and biological fields made reference to the visual similarities between the network attacks and parasitic organisms. Again, there was ease in the recognition of structures as well as the ability to identify them in further block illustrations.

The greatest benefit however was to researchers, both internal and external, specifically working in the field of cryptocurrencies. As with previous groups, the large size of the visualization allowed viewing as a group rather than an individual, but in addition it gave the ability to identify an individual transaction in a block that might contain several thousand. This can then be recorded for later study, or investigated within the space. The ability to identify large transactions, as well as identify the patterns for hostile algorithms, coin-tumbling services, payment services and otherwise unknown transaction patterns allowed for continuing research.

Conclusions

This paper presents the development of tools to gain an exploratory understanding of associated patterns of behaviour in the densely connected dataset of all Bitcoin transactions. Compared to previous bottom-up approaches exploring data from singular source transactions, our approach has been to generate a top-down system-wide visualization enabling pattern detection and subsequently allowing drill-down detail into any transaction. Furthermore we have shown how we combine both the transaction and address graphs into one high-fidelity visualization of associations.

Precisely, these visualizations have elegantly revealed the structure of the recurring high frequency patterns of an algorithmic denial of service attack on the Bitcoin system, and revealed previously hidden insights into the multiple distinct phases of such an attack. Identification and classification of such observable patterns of behaviour amongst other recurring patterns such as money laundering have provided useful kernels for analysis and discussion amongst multi-disciplinary researchers.

In brief, the described visualizations have proved their usefulness for three distinct purposes: 1) understanding transaction patterns, 2) collaboratively evaluating and exploring these patterns with groups of experts, and 3) providing an introductory educational primer on the operation of the Bitcoin system to the general public.

Acknowledgements

The infrastructure for the KPMG Data Observatory to which these visualizations were deployed has been partially funded by KPMG and Imperial College. This work is supported by the Digital City Exchange project funded by RCUK Grant ref: EP/I038837/1. The authors wish to acknowledge the support of Imperial College’s Centre for Cryptocurrency Research and Engineering in preparing this work.

Author Disclosure Statement

No competing financial interests exist.

References

1 Cruz-Neira C, Sandin DJ, De Fanti TA, Kenyon RV, Hart JC. The CAVE: Audio visual experience automatic virtual environment. Commun ACM 1992; 35(6):64–72. ISSN 0001-0782. doi:10.1145/129888.129892.

2 Febretti A, Nishimoto A, Thigpen T, Talandis J, Long L, Pirtle J, Peterka T, Verlo A, Brown M, Plepys D, Sandin D, Renambot L, Johnson A, Leigh J. CAVE2: A hybrid reality environment for immersive simulation and information analysis. In Proc. IS&T / SPIE Electronic Imaging, The Engineering Reality of Virtual Reality 2013. San Francisco, US 2013.

3 Leigh J, Johnson A, Renambot L, Peterka T, Jeong B, Sandin DJ, Talandis J, Jagodic R, Nam S, Hur H, Sun Y. Scalable resolution display walls. Proc IEEE 2013; 101(1):115–129. doi:10.1109/JPROC.2012.2191609.

4 Reda K, Febretti A, Knoll A, Aurisano J, Leigh J, Johnson A, Papka ME, Hereld M. Visualizing Large, Heterogeneous Data in Hybrid-Reality Environments. IEEE Comput Graph Appl 2013; 33(4):38–48. ISSN 0272-1716. doi:10.1109/MCG.2013.37.

5 Li K, Hibbs M, Wallace G, Troyanskaya O. Dynamic scalable visualization for collaborative scientific applications. Proc. 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05). doi:10.1109/IPDPS.2005.183.


6 Wallace G, Anshus O-J, Bi P, Chen H, Chen Y, Clark D, Cook P, Finkelstein A, Funkhouser T, Gupta A, Hibbs M, Li K, Liu Z, Samanta R, Sukthankar R, Troyanskaya O. Tools and Applications for Large-Scale Display Walls. IEEE Comput Graph Appl 2005; 25(4):24–33. ISSN 0272-1716. doi:10.1109/MCG.2005.89.

7 Doerr K-U, Kuester F. CGLX: A Scalable, High-Performance Visualization Framework for Networked Display Environments. IEEE Trans Vis Comput Graph 2011; 17(3):320–332. ISSN 1077-2626. doi:10.1109/TVCG.2010.59.

8 Shneiderman B. The eyes have it: a task by data type taxonomy for information visualizations. Proc IEEE Symposium on Visual Languages 1996; 336–343. doi:10.1109/VL.1996.545307.

9 Card S-K, Mackinlay J-D, Shneiderman B. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann Publishers Inc. 1999. ISBN 1-55860-533-9.

10 Nakamoto S. Bitcoin: A Peer-to-Peer Electronic Cash System. Technical report 2008. Available at https://bitcoin.org/bitcoin.pdf Accessed on 2016-04-15.

11 Reid F, Harrigan M. An Analysis of Anonymity in the Bitcoin System. In Altshuler Y, Elovici Y, Cremers AB, Aharony N, Pentland A (eds.), Security and Privacy in Social Networks. Springer New York 2013. ISBN 978-1-4614-4138-0, 197–223. doi:10.1007/978-1-4614-4139-7_10.

12 Ron D, Shamir A. Quantitative analysis of the full bitcoin transaction graph. In Sadeghi AR (ed.), Financial Cryptography and Data Security. Springer Berlin Heidelberg 2013, volume 7859 of Lecture Notes in Computer Science. ISBN 978-3-642-39883-4, 6–24. doi:10.1007/978-3-642-39884-1_2.

13 Meiklejohn S, Pomarole M, Jordan G, Levchenko K, McCoy D, Voelker GM, Savage S. A fistful of bitcoins: Characterizing payments among men with no names. In Proc. 2013 Internet Measurement Conference. New York, NY, USA: ACM 2013. ISBN 978-1-4503-1953-9, 127–140. doi:10.1145/2504730.2504747.

14 Ober M, Katzenbeisser S, Hamacher K. Structure and Anonymity of the Bitcoin Transaction Graph. Future Internet 2013; 5(2):237–250. doi:10.3390/fi5020237.

15 Baumann A, Fabian B, Lischke M. Exploring the Bitcoin network. In Proc. 10th Int. Conf. on Web Information Systems and Technologies (WEBIST). Barcelona, Spain 2014. 369–374.

16 Miller A, Litton J, Pachulski A, Gupta N, Spring N, Bhattacharjee B, Levin D. Discovering Bitcoin’s Public Topology and Influential Nodes. Pre-publication 2015.

17 Biryukov A, Khovratovich D, Pustogarov I. Deanonymisation of clients in bitcoin P2P network. In Proc. ACM SIGSAC Conf. on Computer and Communications Security. New York, NY, USA: ACM 2014. ISBN 978-1-4503-2957-6, 15–29. doi:10.1145/2660267.2660379.

18 Bitnodes. Available at https://getaddr.bitnodes.io Accessed on 2015-12-15.

19 Jacomy M, Venturini T, Heymann S, Bastian M. ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software. PLoS ONE 2014; 9(6):e98679. doi:10.1371/journal.pone.0098679.

20 BBC Click. BBC News Channel, first shown 5th December 2015.

21 Battista G-D, Donato V-D, Patrignani M, Pizzonia M, Roselli V, Tamassia R. BitconeView: visualization of flows in the bitcoin transaction graph. Proc IEEE Symposium on Visualization for Cyber Security (VizSec) 2015; 1–8. ISBN 978-1-4673-7600-6. doi:10.1109/VIZSEC.2015.7312773.

22 Moser M. Anonymity of bitcoin transactions: An analysis of mixing services. Proc of Munster Bitcoin Conference 2013; 17–18.

23 SigmaJS. SigmaJS Javascript Library 2015. https://github.com/jacomyal/sigma.js/
