9p code walkthrough
Walkthrough of 9P protocol traces and Linux 9P client code overview
© 2010 IBM Corporation
IBM Research
9P Trace and Code Walkthrough
Eric Van Hensbergen
IBM Austin Research Lab ([email protected])
Agenda
• 9P Trace analysis for common operations
  • mount
  • open + write + close
  • open + read + close
  • chmod
  • ls -l
• High level code organization
• Client and Transport Interfaces
• Important data structures and their accounting
• Code Review
  • VFS Code Review
  • Network Code Review
9P Trace: Mount 9P /mnt (Plan 9)
mount 9p /mnt

Tversion(NOFID, 8216, 9P2000, "")
Tattach(1, 70, 4294967295, ericvh, "")
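To make the Tversion line in the trace concrete, here is a minimal Python sketch of how such a message could be packed, assuming the 9P2000 wire layout size[4] type[1] tag[2] msize[4] version[s] with little-endian integers. The helper names `encode_string` and `encode_tversion` are hypothetical; the type number (100) and the NOTAG value (0xFFFF) are taken from the 9P2000 specification, and the msize of 8216 comes from the trace.

```python
import struct

TVERSION = 100   # 9P2000 message type for Tversion
NOTAG = 0xFFFF   # tag value used during version negotiation

def encode_string(s: str) -> bytes:
    """9P string: 2-byte little-endian length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data

def encode_tversion(msize: int, version: str) -> bytes:
    """Hypothetical helper: build size[4] type[1] tag[2] msize[4] version[s]."""
    body = struct.pack("<BHI", TVERSION, NOTAG, msize) + encode_string(version)
    # The size field counts itself, hence the extra 4 bytes.
    return struct.pack("<I", 4 + len(body)) + body

msg = encode_tversion(8216, "9P2000")
print(msg.hex())
```

The size field is computed last because it covers the whole message, including its own four bytes.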
9P Trace: Mount 9P /mnt (Linux)
mount 9p /mnt

Tversion(NOFID, 8216, 9P2000, "")
Tattach(1, 70, -1, ericvh, "")
Twalk(1, 70, 102, array[] of {})
Tstat(1, 102)
Tclunk(1, 102)
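The third Tattach argument prints as 4294967295 in the Plan 9 trace and as -1 in the Linux trace. A likely reading is that both are the same 32-bit NOFID value shown as unsigned and signed respectively. The short Python sketch below illustrates that equivalence; it is not taken from either client.

```python
import struct

NOFID = 0xFFFFFFFF  # 9P2000 "no fid" value: all 32 bits set

# The same four bytes on the wire...
wire = struct.pack("<I", NOFID)

# ...print differently depending on whether they are read as unsigned or signed.
as_unsigned = struct.unpack("<I", wire)[0]
as_signed = struct.unpack("<i", wire)[0]

print(as_unsigned, as_signed)
```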
9P Trace: Write a File (Plan 9)
echo hello > /mnt/tmp/hello.txt

fd = create("/mnt/tmp/hello.txt");
pwrite(fd, "hello", 5, 0);
close(fd);

Twalk(1, 70, 59, array[] of {"tmp"})
Twalk(1, 59, 86, array[] of {"hello.txt"})
Rerror(1, "file does not exist")
Twalk(1, 59, 86, nil)
Tcreate(1, 86, "hello.txt", 8r666, 1)
Tclunk(1, 59)
Twrite(1, 86, 0, array[6] of {"hello"})
Tclunk(1, 86)
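The Rerror in this trace carries its error as a string, which a Linux client has to turn into a numeric errno (the deck later names p9_errstr2errno for this). The Python sketch below shows the idea only. The table entries, the `errstr2errno` name, and the EIO fallback are assumptions made for illustration, not a copy of the kernel's mapping.

```python
import errno

# Illustrative table only: these entries are assumptions for the sketch,
# not the kernel's actual mapping.
ERRSTR_TO_ERRNO = {
    "file does not exist": errno.ENOENT,
    "permission denied": errno.EACCES,
    "file already exists": errno.EEXIST,
}

def errstr2errno(errstr: str, default: int = errno.EIO) -> int:
    """Hypothetical helper: map a 9P error string to a POSIX errno value."""
    return ERRSTR_TO_ERRNO.get(errstr.strip().lower(), default)

print(errno.errorcode[errstr2errno("file does not exist")])
```

An unknown string falls back to a generic error here, so a server with unfamiliar error text still produces a usable errno.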
9P Trace: Write a File (Linux)

echo hello > /mnt/tmp/hello.txt

fd = create("/mnt/tmp/hello.txt");
pwrite(fd, "hello", 5, 0);
close(fd);

Twalk(1, 70, 103, array[] of {"tmp"})
Tstat(1, 103)
Twalk(1, 103, 109, nil)
Twalk(1, 109, 75, nil)
Twalk(1, 75, 97, array[] of {"hello.txt"})
Rerror(1, "file does not exist")
Tclunk(1, 75)
Tclunk(1, 109)
Twalk(1, 103, 109, nil)
Twalk(1, 109, 75, nil)
Tcreate(1, 75, "hello.txt", 8r666, 1)
Twalk(1, 109, 99, nil)
Twalk(1, 99, 107, nil)
Twalk(1, 107, 110, array[] of {"hello.txt"})
Tclunk(1, 107)
Tclunk(1, 99)
Tclunk(1, 109)
Tstat(1, 110)
Twrite(1, 75, 0, array[6] of {"hello"})
Tclunk(1, 75)
Tclunk(1, 110)
Tclunk(1, 103)
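Many of the Twalk messages in this trace carry nil as the name list. In 9P2000 a walk with zero names makes newfid refer to the same file as fid, so these are fid clones. The Python sketch below packs both kinds of walk, assuming the 9P2000 layout size[4] type[1] tag[2] fid[4] newfid[4] nwname[2] followed by nwname strings. The helper name `encode_twalk` is hypothetical.

```python
import struct

TWALK = 110  # 9P2000 message type for Twalk

def encode_twalk(tag: int, fid: int, newfid: int, names: list) -> bytes:
    """Hypothetical helper: size[4] type[1] tag[2] fid[4] newfid[4] nwname[2] wname[s]..."""
    body = struct.pack("<BHIIH", TWALK, tag, fid, newfid, len(names))
    for name in names:
        raw = name.encode("utf-8")
        body += struct.pack("<H", len(raw)) + raw   # each name is a 9P string
    return struct.pack("<I", 4 + len(body)) + body

clone = encode_twalk(1, 103, 109, [])           # Twalk(1, 103, 109, nil)
step = encode_twalk(1, 75, 97, ["hello.txt"])   # Twalk(1, 75, 97, {"hello.txt"})
print(len(clone), len(step))
```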
9P Trace: Read a File (Plan 9)
% cat /mnt/tmp/hello.txt

fd = open("/mnt/tmp/hello.txt");
n = 0;
do {
    result = pread(fd, buf+n, 255-n, n);
    n += result;
} while (result > 0);
close(fd);

Twalk(1, 70, 85, array[] of {"tmp", "hello.txt"})
Topen(1, 85, 0)
Tread(1, 85, 0, 255)
Rread(1, array[6] of "hello")
Tread(1, 85, 6, 249)
Rread(1, array[0] of "")
Tclunk(1, 85)
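The offsets and counts in the two Tread messages follow directly from the pread loop. The Python sketch below replays that loop against an in-memory stand-in for the server and records each (offset, count) pair. The file contents (six bytes, matching array[6] in the trace) and the names `fake_tread` and `read_all` are assumptions of the sketch.

```python
FILE_DATA = b"hello\n"  # assumed contents: six bytes, matching array[6] in the trace

def fake_tread(offset: int, count: int) -> bytes:
    """Stand-in for a Tread/Rread round trip against an in-memory file."""
    return FILE_DATA[offset:offset + count]

def read_all(bufsize: int = 255):
    """Mirror the slide's pread loop and record each (offset, count) request."""
    buf = b""
    requests = []
    while True:
        offset, count = len(buf), bufsize - len(buf)
        requests.append((offset, count))
        chunk = fake_tread(offset, count)
        if not chunk:          # a zero-length Rread signals end of file
            break
        buf += chunk
    return buf, requests

data, requests = read_all()
print(requests)
```

The second request is what tells the client it has reached end of file, which is why even a six-byte read costs two round trips.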
9P Trace: Read a File (Linux)
% cat /mnt/tmp/hello.txt

fd = open("/mnt/tmp/hello.txt");
n = 0;
do {
    result = pread(fd, buf+n, 255-n, n);
    n += result;
} while (result > 0);
close(fd);

Twalk(1, 70, 106, array[] of {"tmp"})
Tstat(1, 106)
Twalk(1, 106, 104, nil)
Twalk(1, 104, 75, nil)
Twalk(1, 75, 100, array[] of {"hello.txt"})
Tclunk(1, 75)
Tclunk(1, 104)
Tstat(1, 100)
Twalk(1, 100, 104, nil)
Topen(1, 104, 0)
Tstat(1, 100)
Tread(1, 104, 0, 255)
Rread(1, array[6] of "hello")
Tread(1, 104, 6, 249)
Rread(1, array[0] of "")
Tclunk(1, 104)
Tclunk(1, 100)
Tclunk(1, 106)
9P Trace: Get/Set Attributes (Linux)
% chmod ugo+rwx /mnt/tmp/hello.txt

s = stat("/mnt/tmp/hello.txt");
chmod("/mnt/tmp/hello.txt", 0777);

Twalk(1, 70, 114, array[] of {"tmp"})
Tstat(1, 114)
Twalk(1, 114, 113, nil)
Twalk(1, 113, 104, nil)
Twalk(1, 104, 75, array[] of {"hello.txt"})
Tclunk(1, 104)
Tclunk(1, 113)
Tstat(1, 75)
Tclunk(1, 75)
Tclunk(1, 114)

Twalk(1, 70, 102, array[] of {"tmp"})
Tstat(1, 102)
Twalk(1, 102, 112, nil)
Twalk(1, 112, 113, nil)
Twalk(1, 113, 104, array[] of {"hello.txt"})
Tclunk(1, 113)
Tclunk(1, 112)
Tstat(1, 104)
Twstat(1, 104, Dir(..."",8r777,-1,-1,...))
Tclunk(1, 104)
Tclunk(1, 102)
9P Trace: Read Directory (Plan 9)
% ls -l /mnt/tmp

Twalk(1, 70, 85, array[] of {"tmp"})
Tstat(1, 85)
Tclunk(1, 85)

Twalk(1, 70, 85, array[] of {"tmp"})
Topen(1, 85, 0)
Tread(1, 85, 2048)
Rread(1, array[69] of Dir(...))
Tread(1, 85, 2048)
Rread(1, array[0] of "")
Tclunk(1, 85)
9P Trace: Read Directory (Linux)
% ls -l /mnt/tmp

Walk(2,70,103,array[] of {"tmp"})
Stat(2,103)
Stat(2,103)
Clunk(2,103)
Walk(2,70,114,array[] of {"tmp"})
Stat(2,114)
Clunk(2,114)
Walk(2,70,102,array[] of {"tmp"})
Stat(2,102)
Clunk(2,102)
Walk(2,70,103,array[] of {"tmp"})
Stat(2,103)
Walk(2,103,109,nil)
Open(2,109,0)
Read(2,109,0,8168)
Read(2,109,69,8168)
Walk(2,103,112,nil)
Walk(2,112,100,nil)
Walk(2,100,97,array[] of {"hello.txt"})
Clunk(2,100)
Clunk(2,112)
Stat(2,97)
Stat(2,97)
Clunk(2,97)
Walk(2,103,97,nil)
Walk(2,97,112,nil)
Walk(2,112,100,array[] of {"hello.txt"})
Clunk(2,112)
Clunk(2,97)
Stat(2,100)
Clunk(2,100)
Walk(2,103,100,nil)
Walk(2,100,97,nil)
Walk(2,97,112,array[] of {"hello.txt"})
Clunk(2,97)
Clunk(2,100)
Stat(2,112)
Clunk(2,112)
Read(2,109,69,8168)
Clunk(2,109)
Clunk(2,103)
High Level Code Organization
[Diagram: fs/9p provides the VFS Hooks; net/9p provides the Core Protocol and the Transports (fd, rdma, virtio).]
Core Network Interfaces (client.h)
• p9_client_create(dev_name, options)
  • create a new client instance (mount)
• p9_client_destroy(client)
  • called by VFS interface to destroy a client (after umount)
• p9_client_disconnect(client)
  • called by transport if client session is interrupted or has a fatal error
• p9_client_<operation>: execute a 9P operation
  • (version, attach, open, read, write, etc.)
  • almost all are called with a p9_fid structure as an argument
• p9_client_cb(client, request)
  • called when a response is received to wake up client thread
Transport Interface (in transport.h)
• create(client_struct, device name, options)
  • create a new connection for client on the transport
  • return value indicates success/failure
• close(client_struct)
  • release a connection for client on the transport
  • no return
• request(client_struct, p9_req_t)
  • issue a request on the transport
  • return value indicates success
• cancel(client_struct, p9_req_t)
  • cancel a request (if it hasn't been sent)
  • return value indicates success/failure (if req already sent)
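The four operations above can be pictured as an abstract interface that each transport fills in. The Python sketch below is only an analogy for the C structure of function pointers. The class names, the in-memory `LoopbackTransport`, and the 0/1 return convention are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod

class Transport(ABC):
    """Hypothetical Python rendering of the four transport operations."""

    @abstractmethod
    def create(self, client, dev_name: str, options: str) -> int: ...
    @abstractmethod
    def close(self, client) -> None: ...
    @abstractmethod
    def request(self, client, req) -> int: ...
    @abstractmethod
    def cancel(self, client, req) -> int: ...

class LoopbackTransport(Transport):
    """Toy transport that queues requests in memory instead of sending them."""

    def __init__(self):
        self.pending = []

    def create(self, client, dev_name, options):
        self.pending = []
        return 0                      # 0 = success (assumed convention)

    def close(self, client):
        self.pending.clear()          # no return value, as on the slide

    def request(self, client, req):
        self.pending.append(req)
        return 0

    def cancel(self, client, req):
        if req in self.pending:       # not yet sent: can be cancelled
            self.pending.remove(req)
            return 0
        return 1                      # already sent: cancel fails

t = LoopbackTransport()
t.create(None, "demo", "")
t.request(None, "req-1")
print(t.cancel(None, "req-1"), t.cancel(None, "req-1"))
```

Keeping the interface this small is what lets the core protocol code stay unaware of whether it is talking over a file descriptor, RDMA, or virtio.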
Data Structure Overview
[Diagram: v9fs_session, p9_client, transport private data, fid pool, tag pool, request array (request: request fcall, response fcall), fid list (fid: dentry, file)]
Client Accounting (p9_client) in client.h
• Client and session information accounting
  • lock: protect client structure
  • dotu: whether or not extensions are active
  • trans_mod: transport for this session
  • status: current status (connected, error, etc.)
  • trans: transport private information
  • conn: trans_fd specific tracking structure
  • fidpool: per session fid accounting
  • fidlist: list of active fid handles (for cleanup)
  • tagpool: per session tag tracking
  • reqs[]: double array of requests for quick lookup
  • max_tag: maximum number of outstanding requests so far
Request (p9_req_t) Structure (in client.h)
• Passed between core network and transport to track ops
  • status: status of this request slot
  • t_err: transport error reporting
  • wq: wait queue (client thread blocks while waiting for response)
  • tc: request fcall
  • rc: response fcall
  • aux: transport specific data
  • req_list: link for higher level objects to chain requests
• Allocated and released by core network code
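The role of the wait queue can be sketched in Python with a threading.Event: the caller issues a request and blocks, and a callback fills in the response and wakes it. The `Request` dataclass, `client_cb`, `rpc`, and `echo_transport` below are hypothetical stand-ins that borrow the field names from the list above. They are not the kernel's code.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class Request:
    """Hypothetical analogue of a request slot: tc/rc plus something to wait on."""
    tag: int
    tc: bytes                               # request fcall
    rc: bytes = b""                         # response fcall, filled in on reply
    t_err: int = 0                          # transport error, 0 if none
    wq: threading.Event = field(default_factory=threading.Event)

def client_cb(req: Request, response: bytes) -> None:
    """Called when a reply arrives: store it and wake the waiting thread."""
    req.rc = response
    req.wq.set()

def rpc(req: Request, send) -> bytes:
    """Issue the request, then block until client_cb signals completion."""
    send(req)
    if not req.wq.wait(timeout=5):
        raise TimeoutError("no response")
    return req.rc

def echo_transport(req: Request) -> None:
    # Reply from another thread to imitate an asynchronous transport.
    threading.Timer(0.01, client_cb, args=(req, b"R" + req.tc)).start()

print(rpc(Request(tag=1, tc=b"stat"), echo_transport))
```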
Fcall (p9_fcall) Structure (in 9p.h)
• Encapsulates protocol message (either request or response)
  • size: size of entire message
  • id: protocol operation
  • tag: multiplexer identifier
  • offset: used by marshalling to track current buffer pos
  • capacity: used by marshalling to track total buffer capacity
  • sdata: actual protocol buffer
• Allocated and paired with buffer for tracking purposes
• Usually grouped inside a request structure
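The offset and capacity fields describe a cursor into a fixed-size buffer. The Python sketch below shows one way such bookkeeping could work while marshalling a Tclunk message (type 120 in 9P2000). The `Fcall` class and its `put` and `finalize` methods are hypothetical names chosen for this illustration.

```python
import struct

class Fcall:
    """Hypothetical buffer wrapper with offset/capacity bookkeeping."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.sdata = bytearray(capacity)
        self.offset = 0

    def put(self, fmt: str, *values) -> None:
        """Pack little-endian values at the current offset, refusing to overflow."""
        raw = struct.pack("<" + fmt, *values)
        if self.offset + len(raw) > self.capacity:
            raise OverflowError("fcall buffer full")
        self.sdata[self.offset:self.offset + len(raw)] = raw
        self.offset += len(raw)

    def finalize(self) -> bytes:
        """Write the total size into the first four bytes and return the message."""
        struct.pack_into("<I", self.sdata, 0, self.offset)
        return bytes(self.sdata[:self.offset])

fc = Fcall(capacity=32)
fc.put("I", 0)          # placeholder for size[4]
fc.put("BH", 120, 1)    # type (Tclunk = 120) and tag
fc.put("I", 86)         # fid
print(fc.finalize().hex())
```

Writing the size last mirrors the fact that the total length is only known once every field has been appended.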
FID (p9_fid) Structure (in client.h)
• Encapsulates file handle and user credentials
  • client: client back-pointer
  • fid: numeric identifier
  • mode: if open, the mode it was opened with
  • qid: current qid for fid
  • iounit: max data per packet on this fid
  • uid: user associated with this fid
  • rdir: accounting structure for dirread
  • flist: per-client-instance fid tracking
  • dlist: per-dentry fid tracking
• FIDs associated with dentries on Client for tracking purposes
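Per-dentry fid tracking can be pictured as a map from a dentry to the fids attached to it, searched by user. The Python sketch below is a simplified illustration under that assumption. The `Fid` dataclass keeps only a few of the fields listed above, and `DentryFids`, the string dentry keys, and the uid values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Fid:
    """Hypothetical subset of the fields listed above."""
    fid: int                 # numeric identifier
    uid: int                 # user associated with this fid
    mode: int = -1           # -1 means "not open" (an assumption of this sketch)
    iounit: int = 0          # max data per packet on this fid

class DentryFids:
    """Track fids per dentry so a later operation can find one for the same user."""

    def __init__(self):
        self._by_dentry: Dict[str, List[Fid]] = {}

    def add(self, dentry: str, fid: Fid) -> None:
        self._by_dentry.setdefault(dentry, []).append(fid)

    def find(self, dentry: str, uid: int) -> Optional[Fid]:
        for fid in self._by_dentry.get(dentry, []):
            if fid.uid == uid:
                return fid
        return None

fids = DentryFids()
fids.add("/mnt/tmp", Fid(fid=103, uid=1000))
print(fids.find("/mnt/tmp", 1000), fids.find("/mnt/tmp", 0))
```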
Support Interfaces
• p9_errstr2errno: used to map Plan 9 error strings to errno
• fid accounting
  • p9_fid_create - allocate numeric fid and initialize fid struct
  • p9_fid_destroy - release numeric fid & its resources
• request/tag accounting
  • p9_tag_allocate - allocate a request
  • p9_tag_lookup - lookup a request by tag
  • p9_free_req - release a tag and cleanup memory
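The request/tag accounting calls can be illustrated with a small table keyed by tag. The Python sketch below models allocate, lookup, and free. The `TagPool` class, the dictionary-based request records, and the lowest-free-tag policy are assumptions for illustration and do not describe how the kernel allocates tags.

```python
class TagPool:
    """Hypothetical tag/request table: allocate, look up by tag, free."""

    def __init__(self, max_tags: int = 4):
        self.free_tags = list(range(max_tags))
        self.reqs = {}                       # tag -> request record

    def tag_allocate(self, op: str) -> dict:
        if not self.free_tags:
            raise RuntimeError("no free tags")
        tag = self.free_tags.pop(0)          # lowest free tag first
        req = {"tag": tag, "op": op, "status": "allocated"}
        self.reqs[tag] = req
        return req

    def tag_lookup(self, tag: int):
        return self.reqs.get(tag)

    def free_req(self, req: dict) -> None:
        del self.reqs[req["tag"]]
        self.free_tags.append(req["tag"])
        self.free_tags.sort()

pool = TagPool()
a = pool.tag_allocate("Twalk")
b = pool.tag_allocate("Tstat")
pool.free_req(a)
c = pool.tag_allocate("Tclunk")              # reuses the tag released by a
print(a["tag"], b["tag"], c["tag"])
```

Lookup by tag is what lets a response arriving from the transport be matched to the request that is waiting for it.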
Code Review
• http://lxr.linux.no/linux/include/net/9p/
• http://lxr.linux.no/linux/fs/9p/
  • fid.c/fid.h - fid management
  • v9fs.c/v9fs.h - session management
  • vfs_super.c - superblock ops (mount, unmount)
  • vfs_inode.c - inode operations (lookup, stat, wstat, create..)
  • vfs_file.c - file operations (open, read, write, close)
  • vfs_dir.c - dirread
  • vfs_addr.c - address space operations (mmap, etc.)
  • vfs_dentry.c - dentry operations (mostly fid releasing)
  • cache.c - fscache code
• http://lxr.linux.no/linux/net/9p/
  • client.c - core client code
  • protocol.c - marshaling functions
  • trans_[fd,rdma,virtio].c - transport implementation
  • util.c - misc utility functions (pool accounting)
  • mod.c - module accounting and dynamic transport registration
  • error.c - error mapping