Office 365 DevDays
2017.11.4-6 | Shanghai
Office Co-Authoring Chalk Talk (MS-FSSHTTP)
Jinghui Zhang
Software Engineer @Microsoft
Contents
• What is MS-FSSHTTP?
• Requests
• File Model and Co-auth
• Storage/Sync Model
• The documents (MS-FSSHTTP, MS-FSSHTTPB, MS-FSSHTTPD)
MS-FSSHTTP
• Co-authoring and incremental file transfer for Word, Excel, PowerPoint and OneNote.
• ~75% reduction in network bandwidth and server I/O for Office documents (~45% for all file types)
MS-FSSHTTP
• A file access and collaboration model for cloud based document applications
  • Drive bytes over the wire and storage costs to size of change not size of file.
  • High frequency incremental sync.
• Optimized for the WAN
  • Minimize chattiness.
• Support for multi-user/co-authoring.
  • Shared locking
  • Metadata storage for coauthoring scenarios.
• Allow apps to be decoupled from the constraints of efficient/fast access using traditional file formats.
  • OneNote
Components
• A mechanism for packaging a set of requests (verbs) into a single package for transmission and execution in a single roundtrip.
• Co-authoring/Multi-user Services
• Exclusive and shared locks
• Editors table
• Metadata storage
• Storage/Sync Model
• Object graph
• Storage graph
• Application schemas
Requests and File Model
URL addressable file, e.g. http://server/foo.doc
[Diagram: the file is reached through MS-FSSHTTP/B/D and consists of a Main Content Partition, Metadata Partition 1, Metadata Partition 2 and Lock State. An FSSHTTP request package contains several requests; one request can depend on another.]
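The packaging idea in the diagram can be sketched as a small in-memory model. This is not the MS-FSSHTTP wire format: the names `Request`, `token`, `depends_on` and `run_package` are hypothetical, and the sketch assumes the simplest dependency rule, namely that a request is skipped unless the request it depends on succeeded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    token: str                        # hypothetical id, unique within the package
    action: Callable[[], bool]        # returns True on success
    depends_on: Optional[str] = None  # token of an earlier request, if any


def run_package(requests: List[Request]) -> Dict[str, str]:
    """Execute every request of one package in order, as in a single round trip."""
    results: Dict[str, str] = {}
    for req in requests:
        if req.depends_on is not None and results.get(req.depends_on) != "ok":
            results[req.token] = "skipped"  # dependency failed or was itself skipped
            continue
        results[req.token] = "ok" if req.action() else "failed"
    return results


package = [
    Request("1", lambda: True),                   # e.g. take a lock
    Request("2", lambda: False, depends_on="1"),  # e.g. a write that fails
    Request("3", lambda: True, depends_on="2"),   # only meaningful if "2" worked
]
print(run_package(package))  # {'1': 'ok', '2': 'failed', '3': 'skipped'}
```

Batching the three requests keeps the exchange to one round trip, which is the point of the "minimize chattiness" goal.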
Locks
• Traditional exclusive lock
  • Single writer. Multiple readers.
  • Used for single user editing.
• Shared lock
  • Many writers.
  • Ensure co-authoring capable and non-capable clients interact smoothly, i.e. an older client can’t disrupt a co-authoring session by taking an exclusive lock.
• Requests to take, refresh and release shared and exclusive locks
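A minimal sketch of how the two lock types exclude each other, assuming a single in-memory lock state per file. The class `FileLock` and its method names are hypothetical; time-outs and the refresh request are left out.

```python
from typing import Optional, Set


class FileLock:
    """Toy lock state for one file: one exclusive holder or many shared holders."""

    def __init__(self) -> None:
        self.exclusive_holder: Optional[str] = None
        self.shared_holders: Set[str] = set()

    def take_exclusive(self, client_id: str) -> bool:
        # Refused while any shared lock is held or another client is exclusive.
        if self.shared_holders or self.exclusive_holder not in (None, client_id):
            return False
        self.exclusive_holder = client_id
        return True

    def take_shared(self, client_id: str) -> bool:
        # Refused while any client holds the exclusive lock.
        if self.exclusive_holder is not None:
            return False
        self.shared_holders.add(client_id)
        return True

    def release(self, client_id: str) -> None:
        self.shared_holders.discard(client_id)
        if self.exclusive_holder == client_id:
            self.exclusive_holder = None


lock = FileLock()
print(lock.take_shared("alice"), lock.take_shared("bob"))  # True True
print(lock.take_exclusive("old-client"))                   # False
```

The refused exclusive request is the case the slide describes: an older client cannot break up a running co-authoring session.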
Editors Table
• Entry per user
  • GUID client id
  • Time-out
  • Edit permissions
  • SIP/Email address
  • Arbitrary properties (key/value pairs)
• Requests to add/remove/enumerate the editors table
Co-authoring Requests
• JoinCoauthoring
  • Takes a shared lock
  • Adds the user to the editors table
• RefreshCoauthoring
  • Updates the timeout on the lock and editors table entry
• ExitCoauthoring
  • Removes the user from the editors table
  • Releases the shared lock
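The three requests can be sketched as operations on one per-file session object. This is an illustration under assumptions: `CoauthSession` and `EditorEntry` are hypothetical names, the shared lock and the editors table entry are modelled as a single record, the clock is passed in by the caller, and the default time-out value is arbitrary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EditorEntry:
    client_id: str     # GUID client id
    expires_at: float  # time-out, in seconds on the caller's clock
    properties: Dict[str, str] = field(default_factory=dict)


class CoauthSession:
    """Toy per-file state: holding an entry means holding the shared lock."""

    def __init__(self, timeout: float = 3600.0) -> None:
        self.timeout = timeout
        self.editors: Dict[str, EditorEntry] = {}

    def join(self, client_id: str, now: float) -> None:
        # JoinCoauthoring: take the shared lock and add the editors table entry.
        self.editors[client_id] = EditorEntry(client_id, now + self.timeout)

    def refresh(self, client_id: str, now: float) -> bool:
        # RefreshCoauthoring: extend the time-out; fails if already expired.
        entry = self.editors.get(client_id)
        if entry is None or entry.expires_at <= now:
            self.editors.pop(client_id, None)
            return False
        entry.expires_at = now + self.timeout
        return True

    def exit(self, client_id: str) -> None:
        # ExitCoauthoring: remove the entry and release the shared lock.
        self.editors.pop(client_id, None)

    def active_editors(self, now: float) -> List[str]:
        return sorted(c for c, e in self.editors.items() if e.expires_at > now)
```

A client that stops refreshing simply ages out of `active_editors`, so a crashed editor does not hold the lock forever.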
Metadata Storage
• Metadata ‘partitions’ for storing temporary co-authoring data.
  • Each partition has a GUID ID.
  • Used to store and transfer presence and other app specific co-auth data.
  • Similar to NTFS’ alternate streams.
  • Rapid sync via the storage/sync model
  • Different SLA.
[Diagram: Main Content Partition, Metadata Partition 1, Metadata Partition 2]
Storage/Sync Model
• The layer in the MS-FSSHTTP protocol suite responsible for incremental file storage.
• Instead of storing a file as an array of bytes, we store a graph of objects (the “object graph”). This graph is designed to be easy for applications to use and modify, but is not optimized for storage/sync.
• The object graph is stored in another graph, the storage graph, which is optimized for storage and sync.
[Diagram: Object Graph, Storage Graph]
Storage/Sync Layers
[Diagram: layer stack, top to bottom]
• Applications: OneNote; file stream based apps (Word/PPT/Excel)
• Application specific schemas: OneNote graph; generic file stream tree
• Object graph: optimized for application use
• Storage graph: optimized for storage and sync
• Document labels on the diagram: MS-FSSHTTPB; MS-FSSHTTP(D)/MS-ONE
Storage Graph
• Two types of node in the graph:
  • Immutable “Data Element” nodes.
  • Mutable “Index” nodes.
• All nodes are identified by GUID based ID.
• Data element nodes contain the data and are serialized into binary blobs.
• Index nodes carry no data and are serialized as key/value pairs.
[Diagram: a storage graph of three index nodes and three data element nodes (Data=“The”, Data=“Cat”, Data=“sat”); the nodes are labelled 1 to 6.]
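The split between immutable data elements and mutable index entries can be sketched as follows. This is a simplification under assumptions: the index is flattened to one key/value map with string keys (in the slides, index nodes are themselves graph nodes with GUID based IDs), and the names `DataElement`, `StorageGraph`, `add_data`, `point` and `read` are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataElement:
    """Immutable node: once created, its id and bytes never change."""
    element_id: uuid.UUID
    data: bytes


class StorageGraph:
    def __init__(self) -> None:
        self.data_elements: Dict[uuid.UUID, DataElement] = {}  # immutable blobs
        self.index: Dict[str, uuid.UUID] = {}                  # mutable key/value pairs

    def add_data(self, data: bytes) -> uuid.UUID:
        element = DataElement(uuid.uuid4(), data)
        self.data_elements[element.element_id] = element
        return element.element_id

    def point(self, key: str, element_id: uuid.UUID) -> None:
        # An edit never rewrites a blob; it only repoints an index entry.
        self.index[key] = element_id

    def read(self, key: str) -> bytes:
        return self.data_elements[self.index[key]].data


graph = StorageGraph()
old = graph.add_data(b"sat")
graph.point("word3", old)
graph.point("word3", graph.add_data(b"slept"))  # the edit adds a blob and repoints
print(graph.read("word3"), len(graph.data_elements))  # b'slept' 2
```

The old blob is still present after the edit, which is what makes blobs safe to cache by ID.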
Storage Graph Sync Example
[Diagrams, three steps, each showing a Server graph and a Client graph made of index nodes and data element nodes]
1. Server and client hold identical graphs: data element nodes Data=“The”, Data=“Cat”, Data=“Sat”, with nodes labelled 1 to 6.
2. One copy is edited: data element nodes Data=“On” and Data=“The” are added together with new nodes labelled 8 to 11, and node 8 takes the place of node 2.
3. After the sync, server and client again hold identical graphs, now including nodes 8 to 11.
Storage Graph
• Sync is all about receiving/pushing changes to this graph.
  • Made easier because most nodes are immutable.
• Server implementations typically implement the graph using a table for the index node key/value pairs and a table for the immutable data element blobs.
• Data elements, which comprise the majority of the file data, are immutable, which makes them ideal for caching.
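A minimal sketch of a sync over the two tables described above, assuming plain dictionaries stand in for the tables and short strings stand in for GUIDs. The function names `blobs_to_send` and `apply_sync` are hypothetical, and replacing the whole index is a simplification of how index changes would really be exchanged.

```python
from typing import Dict, Set

Blobs = Dict[str, bytes]  # data element id -> immutable blob
Index = Dict[str, str]    # index key -> data element id


def blobs_to_send(sender_blobs: Blobs, receiver_ids: Set[str]) -> Blobs:
    """Only blobs the receiver has never seen need to cross the wire."""
    return {i: b for i, b in sender_blobs.items() if i not in receiver_ids}


def apply_sync(receiver_blobs: Blobs, receiver_index: Index,
               new_blobs: Blobs, sender_index: Index) -> None:
    receiver_blobs.update(new_blobs)     # adds new ids only; old blobs are untouched
    receiver_index.clear()
    receiver_index.update(sender_index)  # the small mutable part is replaced


server_blobs = {"g1": b"The", "g2": b"Cat", "g3": b"Sat"}
server_index = {"w1": "g1", "w2": "g2", "w3": "g3"}

# The client starts from the same state, then appends "On The".
client_blobs = dict(server_blobs, g4=b"On", g5=b"The")
client_index = dict(server_index, w4="g4", w5="g5")

delta = blobs_to_send(client_blobs, set(server_blobs))
apply_sync(server_blobs, server_index, delta, client_index)
print(sorted(delta))  # ['g4', 'g5']
```

Only the two new blobs travel, so the cost follows the size of the change rather than the size of the file.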
Syncing File Data
• File data is divided into variable length blocks
• Each block is assigned
  • A GUID identifier
  • A unique binary signature (usually an MD5 hash)
[Diagram: Block 1 | Block 2 | … | Block N]
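A short sketch of attaching an identifier and a signature to each block, using Python's standard `uuid` and `hashlib` modules. The names `Block`, `describe_blocks` and `changed_blocks` are hypothetical, and using the signature to spot blocks that are already known is an assumption made for illustration.

```python
import hashlib
import uuid
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    block_id: uuid.UUID  # GUID identifier
    signature: bytes     # MD5 digest of the block contents
    data: bytes


def describe_blocks(chunks: List[bytes]) -> List[Block]:
    return [Block(uuid.uuid4(), hashlib.md5(c).digest(), c) for c in chunks]


def changed_blocks(old: List[Block], new_chunks: List[bytes]) -> List[bytes]:
    """Chunks whose signature was not present before."""
    known = {b.signature for b in old}
    return [c for c in new_chunks if hashlib.md5(c).digest() not in known]


v1 = describe_blocks([b"The ", b"Cat ", b"Sat"])
print(changed_blocks(v1, [b"The ", b"Cat ", b"Sat ", b"On The Mat"]))
# [b'Sat ', b'On The Mat']
```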
Syncing File Data (using the graphs)
• Zip files are broken into blocks based on the zip headers and item data
• All other file types are chunked using Microsoft RDC’s FilterMax algorithm.
[Diagram: a zip file split into blocks: Zip Item Header | Zip Item Data | Zip Item Header | Zip Item Data | … | Zip Directory. A PDF document (.PDF) split at FilterMax split points.]
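The zip case can be sketched with Python's standard `zipfile` and `struct` modules. This only illustrates cutting at item headers, item data and the directory; it is not the chunking procedure as specified in MS-FSSHTTPD, and the function name `split_zip` is hypothetical. FilterMax is not shown.

```python
import io
import struct
import zipfile
from typing import List


def split_zip(raw: bytes) -> List[bytes]:
    """Cut a zip archive at each item header, each item's data, and the directory."""
    cuts = {0, len(raw)}
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        for info in zf.infolist():
            start = info.header_offset
            # Local file header: 30 fixed bytes, then the file name and extra field,
            # whose lengths are stored at byte offsets 26 and 28 of the header.
            name_len, extra_len = struct.unpack_from("<HH", raw, start + 26)
            data_start = start + 30 + name_len + extra_len
            cuts.update((start, data_start, data_start + info.compress_size))
    points = sorted(cuts)
    # Consecutive cut points give contiguous blocks that rebuild the archive.
    return [raw[a:b] for a, b in zip(points, points[1:])]
```

Because the blocks are contiguous slices, joining them always reproduces the original bytes, and editing one item inside the archive leaves the blocks of the other items unchanged.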
Syncing File Data (using the graphs)
• The blocks, IDs and signatures are arranged in a tree. This is the ‘object graph’.
• A depth first traversal can be used to access the file data
[Diagram: a tree under a Root node; each node below it carries a GUID ID and a Signature and leads to one of Block 1, Block 2, Block 3, Block 4, … Block N.]
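The depth first traversal can be sketched as follows, assuming only leaf nodes carry file bytes. The class `Node` and the function names are hypothetical, and the GUID IDs and signatures are left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    data: bytes = b""  # only leaf nodes carry file bytes
    children: List["Node"] = field(default_factory=list)


def leaves_depth_first(node: Node) -> Iterator[bytes]:
    if not node.children:
        yield node.data
        return
    for child in node.children:
        yield from leaves_depth_first(child)


def extract_file(root: Node) -> bytes:
    return b"".join(leaves_depth_first(root))


root = Node(children=[
    Node(children=[Node(b"The "), Node(b"Cat ")]),
    Node(b"Sat"),
])
print(extract_file(root))           # b'The Cat Sat'
print(extract_file(Node(b"tiny")))  # b'tiny', a one-node tree holding a whole file
```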
Syncing File Data (using the graphs)
• Nodes from the tree are stored in data elements and arranged to form a graph with index nodes. This is the ‘storage graph’:
[Diagram: an Index Node and Data Elements 1 to 4; the data elements hold Block 1, Block 2, … Block N together with each block’s signature.]
All this sounds complicated…
• Sync and storage:
  • Immutable blobs with GUID IDs
  • 3 key/value pairs
• Extracting the file
  • MS-FSSHTTP(B/D) describes where the blocks are in the data elements
  • A depth first traversal that concatenates the leaf nodes yields the file.
• ‘Importing’ a file
  • You can represent small files as a single block using a trivial graph.
The Documents
• MS-FSSHTTP
  • XML based.
  • Request packaging and dependencies.
  • Requests for managing locks and coauthoring.
  • Describes how MS-FSSHTTPB requests are embedded.
• MS-FSSHTTPB
  • Binary based.
  • Request packaging.
  • Requests for incremental sync (cell storage requests)
  • Embedded into MS-FSSHTTP
• MS-FSSHTTPD
  • Describes how to represent an arbitrary file in the cell storage model (described in MS-FSSHTTPB) for synchronization.
  • Generally, only required if implementing a client.
Practically…
• The protocol is a transport for batched requests. Most of these requests are small and manage collaboration.
• User data travels as immutable binary blobs (within the protocol), identified by GUIDs.
  • Blobs are only replaced or retrieved when needed, which is ideal for caching. Think immutable “sub-files” with GUIDs as “filenames”.
  • MS-FSSHTTPB documentation defines where in the protocol to find the data.
• Sync and conflict detection rely on stable IDs. If an implementation or man-in-the-middle changes these IDs then excessive bandwidth and unnecessary conflicts will occur.
• The system is inherently incremental and it’s often not possible to examine the entire file, e.g. changing/examining the file from over the wire data is typically not possible.
• Allow sufficient time for testing. The system is complex and user data is at stake.
Office Inspectors for Fiddler
• Fiddler parsers available for FSSHTTP/B and WOPI protocols
• GitHub: https://github.com/OfficeDev/Office-Inspectors-for-Fiddler
Thank you