
Office 365 DevDays

2017.11.4-6 | Shanghai


Office Co-Authoring Chalk Talk (MS-FSSHTTP)

Jinghui Zhang

Software Engineer @Microsoft

Contents

What is MS-FSSHTTP?

Requests

File Model and Co-auth

Storage/Sync Model

The documents (MS-FSSHTTP, MS-FSSHTTPB, MS-FSSHTTPD)

MS-FSSHTTP

• Co-authoring and incremental file transfer for Word, Excel, PowerPoint and OneNote.

• ~75% reduction in network bandwidth and server I/O for Office documents (~45% across all file types)

MS-FSSHTTP

• A file access and collaboration model for cloud-based document applications.
  • Bytes over the wire and storage costs scale with the size of the change, not the size of the file.

• High-frequency incremental sync.

• Optimized for the WAN.
  • Minimizes chattiness.

• Support for multi-user editing and co-authoring.
  • Shared locking.
  • Metadata storage for co-authoring scenarios.

• Decouples apps from the constraints that traditional file formats place on efficient, fast access.
  • e.g. OneNote.

Components

• A mechanism for packaging a set of requests (verbs) into a single package for transmission and execution in a single round trip.

• Co-authoring/Multi-user Services
  • Exclusive and shared locks
  • Editors table
  • Metadata storage

• Storage/Sync Model
  • Object graph
  • Storage graph
  • Application schemas

Requests and File Model

[Diagram: a URL-addressable file (e.g. http://server/foo.doc) exposes, through MS-FSSHTTP/B/D, a main content partition, metadata partitions 1 and 2, and its lock state. An FSSHTTP request package carries several requests, and one request can depend on another.]
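The request package can be sketched as follows. This is a minimal illustration and does not reproduce the MS-FSSHTTP wire format. The `Request` class, `execute_package`, and the rule that a request runs only if its dependency succeeded are all hypothetical. MS-FSSHTTP defines its own dependency semantics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    # Hypothetical request record: an ID, the work to perform, and the ID of
    # an earlier request in the same package that it depends on (if any).
    request_id: str
    action: Callable[[], bool]  # returns True on success
    depends_on: Optional[str] = None


def execute_package(package: List[Request]) -> Dict[str, str]:
    """Execute a whole package (one round trip) and report a status per request.

    Assumed rule: a request runs only if its dependency succeeded; otherwise it is skipped.
    """
    results: Dict[str, str] = {}
    for req in package:
        if req.depends_on is not None and results.get(req.depends_on) != "success":
            results[req.request_id] = "skipped"
            continue
        try:
            ok = req.action()
        except Exception:
            ok = False
        results[req.request_id] = "success" if ok else "failed"
    return results


package = [
    Request("lock", lambda: True),
    Request("put", lambda: False, depends_on="lock"),
    Request("get", lambda: True, depends_on="put"),
]
print(execute_package(package))  # {'lock': 'success', 'put': 'failed', 'get': 'skipped'}
```

Because every request and its status travel together, the client learns in one round trip which requests ran, which failed, and which were skipped.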

Locks

• Traditional exclusive lock
  • Single writer, multiple readers.
  • Used for single-user editing.

• Shared lock
  • Many writers.
  • Ensures that co-authoring-capable and non-capable clients interact smoothly, e.g. an older client cannot disrupt a co-authoring session by taking an exclusive lock.

• Requests to take, refresh and release shared and exclusive locks.
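The interaction between the two lock types can be sketched as a small state object. This is a hypothetical model of the rules on this slide, not the server's actual implementation. `LockState` and its methods are illustrative names, and lock timeouts and refresh are left out.

```python
from typing import Optional, Set


class LockState:
    """Hypothetical per-file lock state: one exclusive holder or many shared holders."""

    def __init__(self) -> None:
        self.exclusive_holder: Optional[str] = None
        self.shared_holders: Set[str] = set()

    def take_exclusive(self, client_id: str) -> bool:
        # Refused while any shared lock is held (so an older client cannot break
        # into a co-authoring session) or while another client holds it.
        if self.shared_holders or self.exclusive_holder not in (None, client_id):
            return False
        self.exclusive_holder = client_id
        return True

    def take_shared(self, client_id: str) -> bool:
        # Refused while an exclusive lock is held; otherwise any number of writers may join.
        if self.exclusive_holder is not None:
            return False
        self.shared_holders.add(client_id)
        return True

    def release(self, client_id: str) -> None:
        self.shared_holders.discard(client_id)
        if self.exclusive_holder == client_id:
            self.exclusive_holder = None


locks = LockState()
assert locks.take_shared("alice") and locks.take_shared("bob")
print(locks.take_exclusive("legacy-client"))  # False: co-authors hold shared locks
```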

Editors Table

• Entry per user
  • GUID client ID
  • Time-out
  • Edit permissions
  • SIP/email address
  • Arbitrary properties (key/value pairs)

• Requests to add, remove and enumerate entries in the editors table
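A minimal in-memory editors table with the fields listed above might look like the sketch below. The names `EditorEntry`, `EditorsTable` and `list_editors` are hypothetical. The assumption that an entry disappears once its time-out passes without a refresh is inferred from the slide's "Time-out" field.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class EditorEntry:
    # One entry per user, mirroring the fields listed on the slide.
    client_id: uuid.UUID          # GUID client ID
    expires_at: float             # time-out, stored as an absolute deadline
    can_edit: bool                # edit permissions
    sip_address: str              # SIP/email address
    properties: Dict[str, str] = field(default_factory=dict)  # arbitrary key/value pairs


class EditorsTable:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[uuid.UUID, EditorEntry] = {}

    def add(self, client_id: uuid.UUID, timeout_s: float, can_edit: bool,
            sip_address: str, properties: Optional[Dict[str, str]] = None) -> None:
        self._entries[client_id] = EditorEntry(
            client_id, self._clock() + timeout_s, can_edit, sip_address,
            dict(properties or {}))

    def remove(self, client_id: uuid.UUID) -> None:
        self._entries.pop(client_id, None)

    def list_editors(self) -> List[EditorEntry]:
        # Entries whose time-out has passed are dropped before enumerating.
        now = self._clock()
        self._entries = {cid: e for cid, e in self._entries.items() if e.expires_at > now}
        return list(self._entries.values())
```

The injectable `clock` makes the time-out behaviour easy to test without waiting in real time.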

Co-authoring Requests

• JoinCoauthoring
  • Takes a shared lock.
  • Adds the user to the editors table.

• RefreshCoauthoring
  • Updates the timeout on the lock and the editors table entry.

• ExitCoauthoring
  • Removes the user from the editors table.
  • Releases the shared lock.
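The three requests combine the shared lock and the editors table. The sketch below keeps both in one hypothetical `CoauthSession` object. It uses a single expiry per client for both the lock and the editors entry, which is a simplifying assumption. The method names only echo JoinCoauthoring, RefreshCoauthoring and ExitCoauthoring and are not protocol APIs.

```python
from typing import Dict, Set


class CoauthSession:
    """Hypothetical server-side state for one file's co-authoring session."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.shared_lock_holders: Set[str] = set()
        self.editors: Dict[str, float] = {}  # client_id -> expiry time

    def join(self, client_id: str, now: float) -> None:
        # JoinCoauthoring: take a shared lock and add the user to the editors table.
        self.shared_lock_holders.add(client_id)
        self.editors[client_id] = now + self.timeout_s

    def refresh(self, client_id: str, now: float) -> bool:
        # RefreshCoauthoring: extend the timeout on the lock and the editors table entry.
        if client_id not in self.shared_lock_holders:
            return False
        self.editors[client_id] = now + self.timeout_s
        return True

    def exit(self, client_id: str) -> None:
        # ExitCoauthoring: remove the user from the editors table and release the shared lock.
        self.editors.pop(client_id, None)
        self.shared_lock_holders.discard(client_id)

    def expire(self, now: float) -> None:
        # Clients that stopped refreshing are treated as having exited.
        for client_id, expiry in list(self.editors.items()):
            if expiry <= now:
                self.exit(client_id)
```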

Metadata Storage

• Metadata ‘partitions’ for storing temporary co-authoring data.
  • Each partition has a GUID ID.
  • Used to store and transfer presence and other app-specific co-authoring data.
  • Similar to NTFS alternate data streams.
  • Rapid sync via the storage/sync model.
  • A different SLA (service-level agreement) from the main content.

[Diagram: a file made up of a main content partition plus metadata partitions 1 and 2.]

Storage/Sync Model

• The layer in the MS-FSSHTTP protocol suite responsible for incremental file storage.

• Instead of storing a file as an array of bytes, we store a graph of objects (the “object” graph). This graph is designed to be easy for applications to use and modify, but is not optimized for storage/sync.

• The object graph is stored in another graph, the storage graph, which is optimized for storage and sync.

[Diagram: the object graph stored within the storage graph.]

Storage/Sync Layers

[Diagram: layered stack, top to bottom:
• Applications: OneNote; file-stream-based apps (Word/PPT/Excel).
• Object graph, optimized for application use: application-specific schemas, namely the OneNote graph and the generic file stream tree (MS-FSSHTTP(D)/MS-ONE).
• Storage graph, optimized for storage and sync (MS-FSSHTTPB).]

Storage Graph

• Two types of node in the graph:
  • Immutable “Data Element” nodes.
  • Mutable “Index” nodes.

• All nodes are identified by a GUID-based ID.

• Data element nodes contain the data and are serialized into binary blobs.

• Index nodes carry no data and are serialized as key/value pairs.

[Diagram: index nodes pointing to immutable data element nodes that hold “The”, “Cat” and “sat”; the nodes are numbered 1 to 6.]
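The split between immutable data elements and mutable index nodes can be sketched as two dictionaries. This is an illustrative model only. `DataElement`, `StorageGraph`, and the string keys in the index are assumptions. The slide says all nodes have GUID-based IDs, and MS-FSSHTTPB defines the real serialization.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataElement:
    # Immutable node: a GUID-based ID plus an opaque binary blob.
    element_id: uuid.UUID
    data: bytes


class StorageGraph:
    def __init__(self) -> None:
        self.data_elements: Dict[uuid.UUID, DataElement] = {}
        # Mutable index: key/value pairs mapping a (hypothetical) key to a data element ID.
        self.index: Dict[str, uuid.UUID] = {}

    def put(self, key: str, data: bytes) -> uuid.UUID:
        # Changing content never edits an existing data element: a new element is
        # created under a new ID and the index entry is repointed at it.
        element = DataElement(uuid.uuid4(), data)
        self.data_elements[element.element_id] = element
        self.index[key] = element.element_id
        return element.element_id

    def get(self, key: str) -> bytes:
        return self.data_elements[self.index[key]].data
```

Only the small index entry changes on an edit, while the data elements themselves are never modified.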

Storage Graph Sync Example

[Animation: the server and the client start with identical storage graphs (index nodes plus data elements “The”, “Cat” and “Sat”; nodes 1 to 6). One side then adds new nodes 8 to 11, including data elements “On” and “The” and a new index node, with node 8 taking the place of node 2 in the edited graph. Syncing transfers only these new nodes, after which both sides hold nodes 8 to 11; the unchanged nodes are not resent.]

Storage Graph

• Sync is all about receiving/pushing changes to this graph.
  • Made easier because most nodes are immutable.

• Server implementations typically implement the graph using one table for the index node key/value pairs and another for the immutable data element blobs.

• Data elements, which make up the majority of the file data, are immutable and therefore ideal for caching.
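Because data elements never change, syncing reduces to sending the index plus any data element IDs the other side does not yet have. The sketch below models this with two hypothetical `Replica` objects and a `sync` function. Always sending the whole index is an assumed simplification, and garbage collection of unreferenced elements is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Replica:
    # Mutable index: key -> data element ID. Small, so this sketch always sends it.
    index: Dict[str, str] = field(default_factory=dict)
    # Immutable data elements: ID -> blob. Sent only when the other side lacks the ID.
    elements: Dict[str, bytes] = field(default_factory=dict)


def sync(source: Replica, target: Replica) -> int:
    """Bring `target` up to date with `source`; return the number of blob bytes sent."""
    # An ID the target already holds needs no transfer, since its blob can never change.
    missing = {eid: blob for eid, blob in source.elements.items()
               if eid not in target.elements}
    target.elements.update(missing)
    target.index = dict(source.index)
    return sum(len(blob) for blob in missing.values())


server = Replica({"w1": "e2", "w2": "e4", "w3": "e6"},
                 {"e2": b"The", "e4": b"Cat", "e6": b"Sat"})
client = Replica(dict(server.index), dict(server.elements))
server.elements.update({"e9": b"On", "e10": b"The"})
server.index.update({"w4": "e9", "w5": "e10"})
print(sync(server, client))  # 5: only "On" and "The" are sent
```

The same model explains why stable IDs matter. If something rewrote the IDs in transit, every element would look missing and the whole file would be resent.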

Syncing File Data

• File data is divided into variable-length blocks.

• Each block is assigned:
  • A GUID identifier
  • A unique binary signature (usually an MD5 hash)

[Diagram: file data split into Block 1, Block 2, …, Block N.]
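Assigning each block an ID and a signature can be sketched as follows, given a list of split offsets. The `Block` type and `make_blocks` helper are hypothetical. MD5 is used because the slide names it as the usual choice, and here it serves as a change-detection signature rather than a security measure. Offsets are assumed to lie within the data.

```python
import hashlib
import uuid
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    block_id: uuid.UUID  # GUID identifier
    signature: bytes     # binary signature (MD5 digest in this sketch)
    data: bytes


def make_blocks(data: bytes, split_points: List[int]) -> List[Block]:
    """Cut `data` at the given offsets; give each block a GUID and an MD5 signature."""
    bounds = [0, *sorted(split_points), len(data)]
    return [
        Block(uuid.uuid4(), hashlib.md5(data[a:b]).digest(), data[a:b])
        for a, b in zip(bounds, bounds[1:])
        if b > a  # ignore empty blocks from duplicate offsets
    ]


blocks = make_blocks(b"The cat sat", [4, 8])
print([b.data for b in blocks])  # [b'The ', b'cat ', b'sat']
```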

Syncing File Data (using the graphs)

• Zip files are broken into blocks based on the zip headers and item data.

• All other file types are chunked using the FilterMax algorithm from Microsoft RDC (Remote Differential Compression).

[Diagram: a PDF document split at FilterMax split points; a zip file split into zip item headers, zip item data and the zip directory.]
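The idea behind content-defined split points can be sketched with a simplified local-maximum chunker in the spirit of FilterMax. This is not Microsoft's RDC implementation. The polynomial hash, the `window` and `horizon` values, and the function names are all assumptions. A boundary is placed after any byte whose window hash is strictly larger than every other hash within `horizon` positions on either side.

```python
from typing import List


def window_hashes(data: bytes, window: int) -> List[int]:
    """Hash of the `window` bytes ending at each offset.

    Recomputed from scratch at every offset for clarity; a production chunker
    would use a rolling hash instead.
    """
    hashes = []
    for i in range(len(data)):
        h = 0
        for byte in data[max(0, i - window + 1): i + 1]:
            h = (h * 31 + byte) & 0xFFFFFFFF
        hashes.append(h)
    return hashes


def split_points(data: bytes, window: int = 8, horizon: int = 16) -> List[int]:
    """Cut after every offset whose hash is a strict local maximum within +/- horizon."""
    hashes = window_hashes(data, window)
    points = []
    for i, h in enumerate(hashes):
        lo, hi = max(0, i - horizon), min(len(hashes), i + horizon + 1)
        if i + 1 < len(data) and all(h > hashes[j] for j in range(lo, hi) if j != i):
            points.append(i + 1)  # block boundary falls just after byte i
    return points
```

Split points depend only on nearby content, not on absolute offsets. Inserting bytes near the start therefore shifts later boundaries along with the data, and only the blocks around the edit change.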


• The blocks, IDs and signatures are arranged in a tree. This is the ‘object graph’.

• A depth-first traversal can be used to access the file data.

[Diagram: a root node whose children are Block 1, Block 2, Block 3, Block 4, …, Block N, each with a GUID ID and a signature.]
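Reading a file back from such a tree is a depth-first traversal that concatenates the leaf blocks in order. The `Node` type below is a hypothetical stand-in for an object-graph node. IDs and signatures are omitted so the example can focus on the traversal.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    # Hypothetical object-graph node: leaves carry block data, inner nodes carry children.
    data: Optional[bytes] = None
    children: List["Node"] = field(default_factory=list)


def leaves_depth_first(node: Node) -> Iterator[bytes]:
    if not node.children:
        yield node.data or b""
        return
    for child in node.children:
        yield from leaves_depth_first(child)


def extract_file(root: Node) -> bytes:
    return b"".join(leaves_depth_first(root))


root = Node(children=[Node(b"The "), Node(children=[Node(b"cat "), Node(b"sat")])])
print(extract_file(root))  # b'The cat sat'
```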


• Nodes from the tree are stored in data elements and arranged to form a graph with index nodes. This is the ‘storage graph’:

[Diagram: an index node and Data Element 1 referencing Data Elements 2, 3 and 4, which hold Block 1, Block 2, …, Block N together with their signatures.]

All this sounds complicated…

• Sync and storage:
  • Immutable blobs with GUID IDs
  • 3 key/value pairs

• Extracting the file:
  • MS-FSSHTTP(B/D) describes where the blocks are in the data elements.
  • A depth-first traversal that concatenates the leaf nodes yields the file.

• ‘Importing’ a file:
  • You can represent small files as a single block using a trivial graph.
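The trivial single-block import can be sketched as one index entry that points at one data element holding the whole file. This does not reproduce the MS-FSSHTTPD layout. `Graph`, `import_small_file`, `export_file` and the `"root"` key are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Graph:
    # Index: the root key maps to the ordered list of data element IDs for the file.
    index: Dict[str, List[uuid.UUID]] = field(default_factory=dict)
    # Immutable data elements: GUID -> blob.
    elements: Dict[uuid.UUID, bytes] = field(default_factory=dict)


def import_small_file(data: bytes) -> Graph:
    """Represent a small file as a single block in a trivial graph."""
    element_id = uuid.uuid4()
    return Graph(index={"root": [element_id]}, elements={element_id: data})


def export_file(graph: Graph) -> bytes:
    # Concatenate the blocks in order; a one-block graph yields the blob as-is.
    return b"".join(graph.elements[eid] for eid in graph.index["root"])
```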

The Documents

• MS-FSSHTTP
  • XML based.
  • Request packaging and dependencies.
  • Requests for managing locks and co-authoring.
  • Describes how MS-FSSHTTPB requests are embedded.

• MS-FSSHTTPB
  • Binary based.
  • Request packaging.
  • Requests for incremental sync (cell storage requests).
  • Embedded into MS-FSSHTTP.

• MS-FSSHTTPD
  • Describes how to represent an arbitrary file in the cell storage model (described in MS-FSSHTTPB) for synchronization.
  • Generally only required if implementing a client.

Practically…

• The protocol is a transport for batched requests. Most of these requests are small and manage collaboration.

• User data travels as immutable binary blobs (within the protocol), identified by GUIDs.
  • Blobs are only replaced or retrieved when needed, which makes them ideal for caching. Think of them as immutable “sub-files” with GUIDs as “filenames”.
  • The MS-FSSHTTPB documentation defines where in the protocol to find the data.

• Sync and conflict detection rely on stable IDs. If an implementation or a man-in-the-middle changes these IDs, excessive bandwidth use and unnecessary conflicts will result.

• The system is inherently incremental, and it is often not possible to examine the entire file; e.g. examining or changing the file from over-the-wire data alone is typically not possible.

• Allow sufficient time for testing. The system is complex and user data is at stake.

Office Inspectors for Fiddler

• Fiddler parsers are available for the FSSHTTP/B and WOPI protocols.

• Github: https://github.com/OfficeDev/Office-Inspectors-for-Fiddler



Thank you