Sequence CRDT: A Scalable Sequence Encoding for Massive Collaborative Editing Brice Nédelec, Pascal Molli & Achour Mostefaoui GDD – LINA – University of Nantes Workshop on Highly-Scalable Distributed Systems Wednesday 14 January 2015, Paris France.

TRANSCRIPT

  • Slide 1

Sequence CRDT: A Scalable Sequence Encoding for Massive Collaborative Editing
Brice Nédelec, Pascal Molli & Achour Mostefaoui
GDD, LINA, University of Nantes
Workshop on Highly-Scalable Distributed Systems, Wednesday 14 January 2015, Paris, France.

  • Slide 2

Distributed Collaborative Editors
Distributed collaborative editors allow people to work together while distributed in space, time and organizations.
Examples: Google Docs, Etherpad, Google Wave.
190M users on Google Drive (which includes Google Docs).

  • Slide 3

Google Docs is great, but...
Single point of failure: if the provider is down, there is no collaboration.
Privacy, economic intelligence: what if Google searches for ANR on 15 October ;) ?
Mass editing: Google limits simultaneous editors to 50; users beyond 50 are read-only.

  • Slide 4

Is it possible to build a fully decentralized editor that supports 1M simultaneous users?
Why? Because it is hard ;)
"We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard." (Kennedy, 1962)
Because it can also be useful: mass collaboration for MOOCs, webinars and events. Google Wave has already been used like that.

  • Slide 5

Distributed Collaborative Editors: Principles (OT or CRDT)
Based on optimistic replication algorithms.
Operations are generated locally: no lock, no communication with other sites.
Operations are broadcast to the other sites; every operation is eventually delivered.
Operations are re-executed when received.
The system is correct if it ensures causality, convergence and intention preservation (OT definition), i.e. it preserves the partial orders in the sequence.

  • Slide 6

Principles of Sequence CRDT
Encode the order of the sequence in the ID of the elements (remember ;)
    10 LET B=A
    15 FOR I=1 TO 27
    20 LET A=A*A
    21 NEXT I
Arghh, I forgot LET B=B^2 before NEXT I. No way to use line 20,5??

  • Slide 7

Insert alpha between p and q:
Create an ID for alpha.
Create a disambiguator for alpha (so that path + disambiguator is unique).
The space and time complexity of a sequence CRDT is mainly decided here!
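The insertion step on Slide 7 can be illustrated with a minimal sketch. It is not the authors' allocator: it assumes a fixed arity per tree level (`BASE`, a made-up value), always picks the midpoint of the free range, and uses a hypothetical `(site, counter)` pair as the disambiguator. The names `path_between` and `make_id` are ours.

```python
from typing import List, Tuple

BASE = 100  # assumption: every tree level offers positions 0..BASE-1

Path = List[int]
Ident = Tuple[Tuple[int, ...], str, int]  # (path, site name, per-site counter)


def path_between(p: Path, q: Path) -> Path:
    """Return a path strictly between p and q in lexicographic order.

    Assumes p < q and that no existing path ends with 0.
    """
    assert p < q
    path: Path = []
    q_active = True  # q bounds us only while we still follow q's prefix
    depth = 0
    while True:
        lo = p[depth] if depth < len(p) else 0
        hi = q[depth] if (q_active and depth < len(q)) else BASE
        if hi - lo > 1:
            # Room at this level: take the midpoint and stop.
            path.append(lo + (hi - lo) // 2)
            return path
        # No room (like BASIC lines 20 and 21): keep p's digit, go deeper.
        path.append(lo)
        if lo < hi:
            q_active = False  # we are now strictly below q at this level
        depth += 1


def make_id(p: Path, q: Path, site: str, counter: int) -> Ident:
    """Attach a disambiguator so that two sites never produce the same ID."""
    return (tuple(path_between(p, q)), site, counter)


print(path_between([20], [21]))        # [20, 50], the "line 20,5" wished for above
print(make_id([20], [21], "siteA", 1))
print(make_id([20], [21], "siteB", 1))  # same path, different disambiguator
```

Two sites inserting concurrently at the same place obtain the same path here, and only the disambiguator keeps the identifiers distinct and ordered. Inserting between two elements that share one path is outside the scope of this sketch.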
  • Slide 8

Scientific problem
Write an ID allocation strategy for sequence elements that is independent of the insertion order.
There are many ways to type QWERTY: how can we compute the smallest IDs for each character, whatever the insertion order?

  • Slide 9

Problem: order of insertions
Typed left to right: Q;W;E;R;T;Y
Typed right to left: Y;T;R;E;W;Q

  • Slides 10 to 16

(Figures only; no text in the transcript.)

  • Slide 17

Combine exponential tree & random allocation.

  • Slide 18

LSEQ complexities
O((log n)^2) -> avoids rebalancing IDs.

  • Slide 19

Experiments
We built the CRATE editor [1]:
LSEQ for ID allocation.
Gossip for broadcast.
Anti-entropy for missed deliveries.
Interval version vectors for causal reception [2].

[1] https://github.com/Chat-Wane/CRATE.git
[2] M. Mukund, G. Shenoy R., S. Suresh, Optimized or-sets without ordering constraints, in: M. Chatterjee, J.-n. Cao, K. Kothapalli, S. Rajsbaum (Eds.), Distributed Computing and Networking, Vol. 8314 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2014, pp. 227-241. doi:10.1007/978-3-642-45249-9_15.

  • Slide 20

1st setup
Objective: validate the space complexity analysis of LSEQ. When the editing behaviour is monotonic, LSEQ has a polylogarithmic upper bound on space complexity with respect to the number of insert operations. When the editing behaviour is random, LSEQ has a logarithmic space complexity.
Setup: a single machine with 2 peers. Together the peers produce 166 char/s to create a document of 500000 characters, with monotonic editing behaviour.

  • Slide 21

Evaluation (figure only; no text in the transcript.)

  • Slide 22

2nd setup
Objective: show that CRATE scales in terms of the number of peers. In other words, the size of the network does not impact the upper bound on the space complexity of messages.
Setup: on Grid'5000, the number of peers grows from 2 to 450, with 166 char/s uniformly distributed among the peers.

  • Slide 23

(Figure only; no text in the transcript.)

  • Slide 24

3rd setup
Objective: show that concurrency does not negatively impact the size of identifiers. Hence, scenarios without concurrency show the upper bound on the size of identifiers.
Setup: a single machine emulates 10 peers running the CRATE application. 10000 characters are inserted at 3 insertions/s, uniformly distributed among the peers. 5 runs with approximately the following latencies: 0.02 ms, 100 ms, 500 ms, 1 s and 10 s.

  • Slides 25 and 26

(Figures only; no text in the transcript.)

  • Slide 27

Conclusions
LSEQ computes IDs for sequence CRDTs with an upper bound of O((log n)^2) on their size.
The number of peers and concurrency do not negatively impact the performance of CRATE.
One million users is reachable.

Nédelec, B., Molli, P., Mostefaoui, A., & Desmontils, E. (2013, September). LSEQ: an adaptive structure for sequences in distributed collaborative editing. In Proceedings of the 2013 ACM symposium on Document engineering (pp. 37-46). ACM.
Nédelec, B., Molli, P., Mostefaoui, A., & Desmontils, E. (2013). Concurrency Effects Over Variable-size Identifiers in Distributed Collaborative Editing. In Proceedings of the International workshop on Document Changes: Modeling, Detection, Storage and Visualization, Florence, Italy, September 10, 2013 (Vol. 1008, pp. 0-7).

  • Slide 28

Perspectives
Deploy a 1M-user editor on a network of browsers: 1M users editing 1M characters.
Measure its performance.
In progress, nearly ready.
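As a recap of the allocation strategy named on Slides 17 and 18 (an exponential tree combined with random allocation), here is a small sketch written from our reading of the cited LSEQ paper, not taken from the CRATE code base. The constants `BASE_BITS` and `BOUNDARY`, the class name `LSeqAllocator`, and the choice of letting each allocator draw its own strategy per depth are assumptions made for illustration; disambiguators are left out.

```python
import random
from typing import Dict, List, Optional

BASE_BITS = 4  # assumption: 2**4 slots at depth 0, doubling at each deeper level
BOUNDARY = 10  # assumption: largest offset taken from the chosen neighbour


def base(depth: int) -> int:
    """Arity of the tree at a given depth (exponential tree)."""
    return 2 ** (BASE_BITS + depth)


def prefix(path: List[int], depth: int) -> int:
    """Read the first depth+1 digits of path as one integer, zero-padded."""
    value = 0
    for d in range(depth + 1):
        digit = path[d] if d < len(path) else 0
        value = value * base(d) + digit
    return value


def to_path(value: int, depth: int) -> List[int]:
    """Inverse of prefix: split an integer back into depth+1 digits."""
    digits = []
    for d in range(depth, -1, -1):
        value, digit = divmod(value, base(d))
        digits.append(digit)
    return digits[::-1]


class LSeqAllocator:
    def __init__(self, seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        # depth -> True: allocate near p (boundary+), False: near q (boundary-)
        self.strategy: Dict[int, bool] = {}

    def alloc(self, p: List[int], q: List[int]) -> List[int]:
        """Return a path strictly between p and q (p before q)."""
        depth = 0
        interval = prefix(q, depth) - prefix(p, depth) - 1
        while interval < 1:  # no free slot at this depth: go one level down
            depth += 1
            interval = prefix(q, depth) - prefix(p, depth) - 1
        step = min(BOUNDARY, interval)
        if depth not in self.strategy:  # random strategy, fixed per depth
            self.strategy[depth] = self.rng.random() < 0.5
        offset = self.rng.randint(1, step)
        if self.strategy[depth]:
            return to_path(prefix(p, depth) + offset, depth)
        return to_path(prefix(q, depth) - offset, depth)


BEGIN, END = [0], [base(0) - 1]  # virtual first and last elements

allocator = LSeqAllocator(seed=42)
doc = [BEGIN, END]
for _ in range(1000):  # monotonic editing: always append at the end
    doc.insert(len(doc) - 1, allocator.alloc(doc[-2], doc[-1]))
print("longest path after 1000 appends:", max(len(path) for path in doc))
```

Under these assumptions a path of k digits costs k * BASE_BITS + k(k-1)/2 bits, so the identifier size is quadratic in the depth; the (log n)^2 bound on the slides then corresponds to the depth growing logarithmically with the number of insertions.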