Block Device Driver
Dr A Sahu
Dept of Comp Sc & Engg.
IIT Guwahati
Outline
• File System, Block Devices
• Block Device Registration
• Initialization of Sbull
• Block Device Operation
• Request processing
File System & Block Devices
• Block devices (disk)
  – Sector, inode
• File systems (operations)
  – read/write, open, close, lseek, type
What is the VFS?
• The kernel component that handles file systems and directory and file access.
• Abstracts the tasks common to many file systems.
• Presents the user with a unified interface through the file-related system calls (open, stat, chmod, etc.).
• Filesystem-specific operations are vectored to the file system in charge of the file.
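The "vectoring" idea can be illustrated in user space with a table of function pointers. This is only a sketch: fs_ops, toy_file, vfs_read and the two toy file systems are hypothetical names invented for the illustration, not the kernel's own structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-filesystem operation table (not the kernel's struct). */
struct fs_ops {
    const char *name;
    long (*read)(const char *path, char *buf, size_t len);
};

/* Copy a fixed text into buf, truncating to len - 1 characters. */
static long toy_fill(char *buf, size_t len, const char *text)
{
    size_t n = strlen(text);

    if (len == 0)
        return 0;
    if (n > len - 1)
        n = len - 1;
    memcpy(buf, text, n);
    buf[n] = '\0';
    return (long)n;
}

/* Two toy "file systems" that answer a read differently. */
static long toy_ext2_read(const char *path, char *buf, size_t len)
{
    (void)path;
    return toy_fill(buf, len, "ext2 data");
}

static long toy_iso_read(const char *path, char *buf, size_t len)
{
    (void)path;
    return toy_fill(buf, len, "iso9660 data");
}

static const struct fs_ops toy_ext2_ops = { "ext2", toy_ext2_read };
static const struct fs_ops toy_iso_ops  = { "iso9660", toy_iso_read };

/* An open file remembers which file system is in charge of it. */
struct toy_file {
    const struct fs_ops *ops;
    const char *path;
};

/* The unified entry point: one call for every file, vectored through ops. */
static long vfs_read(const struct toy_file *f, char *buf, size_t len)
{
    if (f == NULL || f->ops == NULL || f->ops->read == NULL)
        return -1;
    return f->ops->read(f->path, buf, len);
}
```

The caller only ever sees vfs_read(); which file system runs is decided by the ops pointer stored with the file.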
Mounting a device
• $ mount -t iso9660 -o ro /dev/cdrom /mnt/cdrom
• Steps involved:
  – Find the file system (in the file_systems list).
  – Find the VFS inode of the directory that is to be the new file system's mount point.
  – Allocate a VFS superblock and call the file system-specific read_super function.
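A user-space sketch of the first and last mount steps may help: look the type up in a registered list, then call its read_super. All toy_* names are hypothetical, and the mount-point inode lookup (the middle step) is deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/* Toy superblock filled in by the file system's read_super. */
struct toy_super {
    const char *fs_name;
    const char *dev;
    int read_only;
};

/* Toy counterpart of a registered file system type. */
struct toy_fs_type {
    const char *name;
    int (*read_super)(struct toy_super *sb, const char *dev, int ro);
    struct toy_fs_type *next;
};

static struct toy_fs_type *toy_file_systems;   /* head of the registered list */

static void toy_register_filesystem(struct toy_fs_type *fs)
{
    fs->next = toy_file_systems;
    toy_file_systems = fs;
}

/* Step 1: find the file system by name in the list. */
static struct toy_fs_type *toy_find_filesystem(const char *name)
{
    struct toy_fs_type *fs;

    for (fs = toy_file_systems; fs != NULL; fs = fs->next)
        if (strcmp(fs->name, name) == 0)
            return fs;
    return NULL;
}

/* Step 3: hand a superblock to the file system's read_super. */
static int toy_mount(const char *type, const char *dev, int ro,
                     struct toy_super *sb)
{
    struct toy_fs_type *fs = toy_find_filesystem(type);

    if (fs == NULL)
        return -1;              /* unknown file system type */
    return fs->read_super(sb, dev, ro);
}

static int toy_iso_read_super(struct toy_super *sb, const char *dev, int ro)
{
    sb->fs_name = "iso9660";
    sb->dev = dev;
    sb->read_only = ro;
    return 0;
}

static struct toy_fs_type toy_iso_type = { "iso9660", toy_iso_read_super, NULL };
```

After toy_register_filesystem(&toy_iso_type), a call such as toy_mount("iso9660", "/dev/cdrom", 1, &sb) mirrors the mount command shown above.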
What will we learn?
• The details of how a block device works
  – ll_rw_block(): triggers the I/O transfer
  – __make_request(): turns a buffer into a request on the request queue
  – Task queue: the mechanism used to plug/unplug a device
  – Request service routine
• How to write a block device driver
  – Writing a module: init / exit
  – Implementing the necessary operations: block_device_operations, request_fn_proc
Common Block Device Operations
• In fs/block_dev.c:
    struct file_operations def_blk_fops = {
        open:    blkdev_open,
        release: blkdev_close,
        llseek:  block_llseek,
        read:    generic_file_read,
        write:   generic_file_write,
        mmap:    generic_file_mmap,
        fsync:   block_fsync,
        ioctl:   blkdev_ioctl,
    };
Block Device Specific Operations
• Additional operations for block devices only
• In include/linux/fs.h:
    struct block_device_operations {
        int (*open) (struct inode *, struct file *);
        int (*release) (struct inode *, struct file *);
        int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
        int (*check_media_change) (kdev_t);
        int (*revalidate) (kdev_t);
    };
• In include/linux/blkdev.h:
    typedef void (request_fn_proc) (request_queue_t *q);
[Figure: read path of a file on EXT2 through the generic block device layer. Labels: generic_file_read, do_generic_file_read, generic_readahead, cache search, readpage (ext2_readpage, from ext2_aops, the address space operation table), block_read_full_page, ext2_get_block (get logical block number), bread, ll_rw_block, submit_bh. bh = buffer head.]
[Figure: write path of a file on EXT2, prepare_write step. Labels: generic_file_write, prepare_write (ext2_prepare_write, from ext2_aops, the address space operation table), block_prepare_write, ext2_get_block (get logical block number), cache search, bread, read request, ll_rw_block, submit_bh.]
[Figure: write path of a file on EXT2, commit_write step. Labels: generic_file_write, commit_write (generic_commit_write, from ext2_aops, the address space operation table), __block_commit_write, __mark_dirty, balance_dirty, wakeup_bdflush, bdflush, write_some_buffers, submit_bh; the buffer heads are marked dirty.]
Generic Block Device Layer
• Provides common functionality for all block devices in Linux
  – Uniform interface (to the file system), e.g. bread(), block_prepare_write(), block_read_full_page(), ll_rw_block()
  – Buffer management and disk caching
  – Block I/O request scheduling
• Generates and queues the actual I/O requests in a per-device request queue
  – The individual device driver services this queue (usually interrupt driven)
Request Queue
• Data structure: in include/linux/blkdev.h
• Queue header: type request_queue_t
    typedef struct request_queue request_queue_t;
  – queue_head: doubly linked list of pending requests
  – request_fn: pointer to the request service routine
• Queue element: struct request
  – cmd: read or write
  – Number of requested sectors and segments
  – bh, bhtail: a list of buffer heads
  – Memory area (for the I/O transfer)
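The shape of this data structure can be sketched in user space as a doubly linked list with a sentinel head and a service-routine pointer. The toy_* types below are illustrative stand-ins, not the definitions from include/linux/blkdev.h, and the serviced counter exists only so the sketch can be checked.

```c
#include <stddef.h>

enum toy_cmd { TOY_READ = 0, TOY_WRITE = 1 };

/* Queue element: one pending I/O request. */
struct toy_request {
    enum toy_cmd cmd;
    unsigned long sector;
    unsigned long nr_sectors;
    char *buffer;                       /* memory area for the transfer */
    struct toy_request *prev, *next;
};

struct toy_queue;
typedef void (toy_request_fn)(struct toy_queue *q);

/* Queue header: list of pending requests plus the service routine. */
struct toy_queue {
    struct toy_request head;            /* sentinel; head.next is the first request */
    toy_request_fn *request_fn;
    int serviced;                       /* illustration-only counter */
};

static void toy_queue_init(struct toy_queue *q, toy_request_fn *fn)
{
    q->head.next = q->head.prev = &q->head;
    q->request_fn = fn;
    q->serviced = 0;
}

static int toy_queue_empty(const struct toy_queue *q)
{
    return q->head.next == &q->head;
}

static void toy_queue_add_tail(struct toy_queue *q, struct toy_request *rq)
{
    rq->prev = q->head.prev;
    rq->next = &q->head;
    q->head.prev->next = rq;
    q->head.prev = rq;
}

/* Unlink and return the first pending request, or NULL if none. */
static struct toy_request *toy_queue_del_first(struct toy_queue *q)
{
    struct toy_request *rq = q->head.next;

    if (rq == &q->head)
        return NULL;
    q->head.next = rq->next;
    rq->next->prev = &q->head;
    rq->next = rq->prev = NULL;
    return rq;
}

/* A trivial service routine: drain the queue and count the requests. */
static void toy_service_all(struct toy_queue *q)
{
    while (toy_queue_del_first(q) != NULL)
        q->serviced++;
}
```

The upper layer would call q->request_fn(q) without knowing which driver installed the routine.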
[Figure: Request Queue]
Invoking the Lower Layer
• Generic block device layer
  – Generates and queues the I/O request
  – If the request queue is initially empty, schedules the plug_tq task into the tq_disk task queue
• Asynchronous run of the task queue tq_disk
  – Run in a few places (e.g., in kswapd)
  – Takes a request from the queue and calls the request_fn function: q->request_fn(q);
Request Service Routine
• Services all the I/O requests in the queue
• Typical interrupt-driven procedure
  – Service the first request in the queue
  – Set up the hardware so that it raises an interrupt when it is done
  – Return
• Interrupt handler tasklet
  – Remove the just-finished request from the queue
  – Re-enter the request service routine (to service the next request)
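The interrupt-driven procedure above can be modelled in user space with a small state machine. This is a simulation under stated assumptions: toy_dev, toy_start_next and toy_interrupt are hypothetical names, and the "hardware" is just a busy flag that the caller clears by invoking the handler.

```c
#include <stddef.h>

#define TOY_MAX_REQ 8

/* Toy device: a FIFO of request ids plus a "hardware busy" flag. */
struct toy_dev {
    int pending[TOY_MAX_REQ];
    int head;        /* index of the first pending request */
    int count;       /* number of pending requests */
    int busy;        /* 1 while the hardware works on pending[head] */
    int completed;   /* number of finished requests */
};

/* Queue a request id; returns -1 when the FIFO is full. */
static int toy_submit(struct toy_dev *d, int id)
{
    if (d->count == TOY_MAX_REQ)
        return -1;
    d->pending[(d->head + d->count) % TOY_MAX_REQ] = id;
    d->count++;
    return 0;
}

/* Request service routine: start the first request, then return at once. */
static void toy_start_next(struct toy_dev *d)
{
    if (d->busy || d->count == 0)
        return;
    d->busy = 1;     /* stands in for programming the hardware */
}

/* "Interrupt handler": retire the finished request, then re-enter. */
static void toy_interrupt(struct toy_dev *d)
{
    if (!d->busy)
        return;      /* spurious interrupt, nothing in flight */
    d->head = (d->head + 1) % TOY_MAX_REQ;
    d->count--;
    d->completed++;
    d->busy = 0;
    toy_start_next(d);
}
```

Each call to toy_interrupt() plays the role of one completion interrupt, so a queue of n requests needs one toy_start_next() followed by n interrupts.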
Request submission
• ll_rw_block()
• submit_bh()
• generic_make_request()
• __make_request()
  – generic_plug_device()
  – elevator algorithm
  – __get_request_wait()
ll_rw_block()
• void ll_rw_block(int rw, int nr, struct buffer_head *bhs[])
  – rw: read/write
  – nr: number of buffer_head structures in the array
  – bhs: array of buffer_head structures
• Top-level function to submit the I/O request
• Checks whether the requested operation is permitted by the device
  – e.g. rejects a write operation on a read-only device
• Checks that the buffer size is a multiple of the sector size of the device
• Locks the buffer and verifies whether the operation is required
  – If the dirty bit is not set on the buffer, a write is not necessary
  – A read into a buffer that is already up to date is redundant
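The checks listed above can be condensed into one decision function. The sketch below is a user-space model, not the kernel routine: toy_bh and toy_should_submit are hypothetical, buffer locking is omitted, and the return convention (1 submit, 0 skip, -1 reject) is an assumption made for the example.

```c
#include <stddef.h>

enum { TOY_READ = 0, TOY_WRITE = 1 };

/* Toy buffer head carrying only the fields the checks need. */
struct toy_bh {
    unsigned int size;   /* buffer size in bytes */
    int uptodate;        /* buffer already holds valid data */
    int dirty;           /* buffer has changes not yet written */
};

/*
 * Returns  1 if the buffer should be submitted for I/O,
 *          0 if the operation is unnecessary and can be skipped,
 *         -1 if the request is rejected outright.
 */
static int toy_should_submit(int rw, const struct toy_bh *bh,
                             unsigned int sector_size, int dev_read_only)
{
    /* A write to a read-only device is not permitted. */
    if (rw == TOY_WRITE && dev_read_only)
        return -1;

    /* The buffer size must be a multiple of the device sector size. */
    if (sector_size == 0 || bh->size % sector_size != 0)
        return -1;

    /* A clean buffer need not be written. */
    if (rw == TOY_WRITE)
        return bh->dirty ? 1 : 0;

    /* An up-to-date buffer need not be read again. */
    return bh->uptodate ? 0 : 1;
}
```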
submit_bh()
• void submit_bh(int rw, struct buffer_head *bh)
• Submits a buffer_head to the block device layer for I/O
• Sets the BH_Req and BH_Launder flags on the buffer
• Sets the real device and sector values
  – count = bh->b_size >> 9;
  – bh->b_rdev = bh->b_dev;
  – bh->b_rsector = bh->b_blocknr * count;
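The sector arithmetic is easy to check by hand: shifting the buffer size right by 9 divides it by 512, giving the number of 512-byte sectors per block. A minimal user-space sketch, with toy_bh as a hypothetical stand-in that carries only the fields involved:

```c
/* Toy buffer head with the fields used by the sector computation. */
struct toy_bh {
    unsigned long b_blocknr;   /* logical block number */
    unsigned int  b_size;      /* block size in bytes */
    unsigned long b_rsector;   /* resulting 512-byte sector number */
};

/* Mirrors the three assignments above; returns sectors per block. */
static unsigned int toy_set_rsector(struct toy_bh *bh)
{
    unsigned int count = bh->b_size >> 9;      /* bytes / 512 */

    bh->b_rsector = bh->b_blocknr * count;
    return count;
}
```

For a 4096-byte block, count is 8, so block 10 starts at sector 80; for a 1024-byte block, block 5 starts at sector 10.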
generic_make_request()
• void generic_make_request(int rw, struct buffer_head *bh)
• Hands a buffer head to its device driver for I/O
• Checks that the requested sector is within range (blk_size[major][minor])
• Gets the request queue of the device and calls make_request_fn to put the buffer in the request queue (in most cases, this handler is __make_request)
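A simplified version of the range check might look as follows. It assumes the device size is recorded in 1 KB units (so one unit is two 512-byte sectors) and is not the kernel's exact arithmetic; toy_in_range and its parameters are hypothetical.

```c
/*
 * Returns 1 if a transfer of `bytes` bytes starting at `sector`
 * fits on a device of `size_kb` kilobytes, 0 otherwise.
 */
static int toy_in_range(unsigned long size_kb, unsigned long sector,
                        unsigned int bytes)
{
    unsigned long total = size_kb * 2;    /* device size in 512-byte sectors */
    unsigned long count = bytes >> 9;     /* sectors in this transfer */

    if (count > total)
        return 0;                         /* transfer larger than the device */
    return sector <= total - count;       /* written to avoid overflow */
}
```

Comparing sector against total - count, rather than sector + count against total, keeps the test safe when sector is close to the largest representable value.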
__make_request()
• static int __make_request(request_queue_t *q, int rw, struct buffer_head *bh)
• Inserts the buffer in the request queue
• Plugs the device by calling the plug_device_fn handler of the request queue
  – In most cases, this is generic_plug_device()
  – Submits plug_tq to the disk task queue tq_disk
• Enlarges an existing request where possible (elevator algorithm)
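"Enlarging an existing request" means merging a new buffer into a request that covers the adjacent sectors. The sketch below shows only the contiguity test for a back merge and a front merge; the real elevator also has to match the device and the transfer direction, and the names here (toy_request, toy_try_merge, max_sectors as a plain parameter) are assumptions for illustration.

```c
/* Toy request: a run of consecutive sectors. */
struct toy_request {
    unsigned long sector;       /* first sector of the run */
    unsigned long nr_sectors;   /* length of the run */
};

enum toy_merge { TOY_NO_MERGE = 0, TOY_BACK_MERGE, TOY_FRONT_MERGE };

/* Try to grow rq by `count` sectors starting at `sector`. */
static enum toy_merge toy_try_merge(struct toy_request *rq,
                                    unsigned long sector,
                                    unsigned long count,
                                    unsigned long max_sectors)
{
    /* Never let a request grow beyond the per-request limit. */
    if (rq->nr_sectors + count > max_sectors)
        return TOY_NO_MERGE;

    /* New buffer starts exactly where the request ends. */
    if (rq->sector + rq->nr_sectors == sector) {
        rq->nr_sectors += count;
        return TOY_BACK_MERGE;
    }

    /* New buffer ends exactly where the request starts. */
    if (sector + count == rq->sector) {
        rq->sector = sector;
        rq->nr_sectors += count;
        return TOY_FRONT_MERGE;
    }

    return TOY_NO_MERGE;
}
```

When no merge is possible, a new request object has to be allocated, which is where __get_request_wait() comes in.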
__make_request() (cont.)
• __get_request_wait()
  – If a new request has to be created and there are no free request objects, it waits on the request queue until a free request object becomes available
  – Code:
    static struct request *__get_request_wait(request_queue_t *q, int rw)
    {
        register struct request *rq;
        DECLARE_WAITQUEUE(wait, current);

        generic_unplug_device(q);
        add_wait_queue(&q->wait_for_request, &wait);
        do {
            set_current_state(TASK_UNINTERRUPTIBLE);
            if (q->rq[rw].count < batch_requests)
                schedule();
            spin_lock_irq(&io_request_lock);
            rq = get_request(q, rw);
            spin_unlock_irq(&io_request_lock);
        } while (rq == NULL);
        remove_wait_queue(&q->wait_for_request, &wait);
        current->state = TASK_RUNNING;
        return rq;
    }
Request processing
• __generic_unplug_device()
  – Called by __get_request_wait() or generic_unplug_device()
  – Marks the queue as unplugged
  – Calls the request_fn handler of the request queue
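Plugging lets requests accumulate (and merge) before the driver sees them; unplugging releases them in one go. A user-space model of that behaviour, assuming a queue reduced to three counters and with all toy_* names hypothetical:

```c
/* Toy queue state: plugged flag plus simple counters. */
struct toy_queue {
    int plugged;      /* 1 while requests are being held back */
    int pending;      /* requests waiting in the queue */
    int dispatched;   /* requests handed to the driver so far */
};

/* Stand-in for the driver's request_fn: take everything pending. */
static void toy_request_fn(struct toy_queue *q)
{
    q->dispatched += q->pending;
    q->pending = 0;
}

/* Adding to an empty queue plugs it instead of starting I/O at once. */
static void toy_add_request(struct toy_queue *q)
{
    if (q->pending == 0 && !q->plugged)
        q->plugged = 1;
    q->pending++;
}

/* Unplug: mark the queue unplugged, then call the service routine. */
static void toy_unplug(struct toy_queue *q)
{
    if (!q->plugged)
        return;
    q->plugged = 0;
    if (q->pending > 0)
        toy_request_fn(q);
}
```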
[Figure: request submission. Labels: VFS, submit_bh, generic_make_request, __make_request, loop_make_request, generic block device layer, I/O task queue, request queues holding I/O requests, block device driver (IDE, SCSI).]
[Figure: request processing. Labels: generic block layer, generic_unplug_device, run_task_queue(tq_disk), I/O task queue, request queues holding I/O requests, block device driver (IDE, SCSI).]
Block Device in 2.4 Kernel
• Not completely rid of major/minor numbers yet
  – Still keeps queues and device-driver-related parameters in arrays indexed by major number
• In include/linux/blkdev.h
    struct blk_dev_struct {
        request_queue_t request_queue;
        queue_proc *queue;
        void *data;
    };
    struct blk_dev_struct blk_dev[MAX_BLKDEV];
    #define BLK_DEFAULT_QUEUE(_MAJOR) &blk_dev[_MAJOR].request_queue
Block Device in 2.4 Kernel (2)
• Matrices for device parameters
  – Type: int *xxx[MAX_BLKDEV];
  – Indexed by major, then minor number
  – blk_size, blksize_size, hardsect_size, max_readahead, max_sectors, max_segments
• Read-ahead parameters
  – int read_ahead[] (include/linux/fs.h)
  – Indexed by major number
• You will need to set these parameters for any device in your init and open functions
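How a driver fills in these tables can be sketched with user-space arrays of the same shape: one pointer per major number, each pointing at a per-minor array owned by the driver. Everything prefixed toy_ is hypothetical, the major number 42 and the 255-entry table size are chosen only for illustration, and the units (KB for sizes, bytes for block and sector sizes) are an assumption.

```c
#include <stddef.h>

#define TOY_MAX_BLKDEV 255   /* table size, chosen for illustration */
#define TOY_MAJOR       42   /* hypothetical major number */
#define TOY_MINORS       4

/* Global tables: indexed by major, then by minor. */
static int *toy_blk_size[TOY_MAX_BLKDEV];       /* device size, in KB */
static int *toy_blksize_size[TOY_MAX_BLKDEV];   /* block size, in bytes */
static int *toy_hardsect_size[TOY_MAX_BLKDEV];  /* sector size, in bytes */
static int toy_read_ahead[TOY_MAX_BLKDEV];      /* indexed by major only */

/* Per-minor arrays owned by the driver. */
static int toy_sizes[TOY_MINORS];
static int toy_blksizes[TOY_MINORS];
static int toy_hardsects[TOY_MINORS];

/* What an init function would do: fill the arrays, then publish them. */
static void toy_setup_params(int size_kb)
{
    int i;

    for (i = 0; i < TOY_MINORS; i++) {
        toy_sizes[i] = size_kb;
        toy_blksizes[i] = 1024;
        toy_hardsects[i] = 512;
    }
    toy_blk_size[TOY_MAJOR] = toy_sizes;
    toy_blksize_size[TOY_MAJOR] = toy_blksizes;
    toy_hardsect_size[TOY_MAJOR] = toy_hardsects;
    toy_read_ahead[TOY_MAJOR] = 8;      /* 8 sectors */
}

/* Look up a device's size in sectors; -1 if no driver set the tables. */
static int toy_device_sectors(int major, int minor)
{
    if (major < 0 || major >= TOY_MAX_BLKDEV)
        return -1;
    if (minor < 0 || minor >= TOY_MINORS)
        return -1;
    if (toy_blk_size[major] == NULL || toy_hardsect_size[major] == NULL)
        return -1;
    return toy_blk_size[major][minor] * 1024 / toy_hardsect_size[major][minor];
}
```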
include/linux/blk.h
• Assigns a major number to each device driver
• Defines macros for each device (by major number)
  – MAJOR_NR: major number
  – DEVICE_NAME: name of your device
  – DEVICE_INTR: device interrupt handler
  – DEVICE_REQUEST: request service routine
  – DEVICE_NR(): how to calculate the device number from the minor number
• You may have to add a set of these for each new device driver (major number) you introduce
Skeleton Block Device
• Device operation structure:
    static struct block_device_operations xxx_fops = {
        open:               xxx_open,
        release:            xxx_release,
        ioctl:              xxx_ioctl,
        check_media_change: xxx_check_change,
        revalidate:         xxx_revalidate,
        owner:              THIS_MODULE,
    };
Skeleton Block Device
• xxx_open()
  – MOD_INC_USE_COUNT;
• xxx_release()
  – MOD_DEC_USE_COUNT;
• xxx_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
  – switch (cmd)
    • BLKGETSIZE
    • HDIO_GETGEO
    • …
  – default: return blk_ioctl(inode->i_rdev, cmd, arg);
Skeleton "Init" Function
    #define MAJOR_NR XXX_MAJOR

    static int __init xxx_init(void)
    {
        /* probe the hardware, request irq, … */
        devfs_dir = devfs_mk_dir(NULL, "xxx_dir", NULL);
        /* old way: register_blkdev(MAJOR_NR, "xxx", &xxx_bdops); */
        devfs_handle = devfs_register_blk(devfs_dir, "xxx", ......);
        blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), xxx_request);
        read_ahead[MAJOR_NR] = 8;   /* 8 sectors (4K) read ahead */
        /* you may also set up:
           blk_size[MAJOR_NR]
           blksize_size[MAJOR_NR]
           hardsect_size[MAJOR_NR]
        */
        /* rest of the initial setup */
        printk( … );
        return 0;
    }
Skeleton "Exit" Function
    static void __exit xxx_exit(void)
    {
        /* clean up */
        blk_cleanup_queue(BLK_DEFAULT_QUEUE(MAJOR_NR));
        /* old way: unregister_blkdev(MAJOR_NR, "xxx"); */
        devfs_unregister(devfs_handle);
        devfs_unregister(devfs_dir);
        /* clean up */
    }

    module_init(xxx_init);
    module_exit(xxx_exit);
Skeleton "Request" Operation
    static void xxx_request(request_queue_t *q)
    {
        int status;

        while (1) {
            INIT_REQUEST;   /* a macro: returns when the request queue is empty */
            status = 1;     /* assume success */
            switch (CURRENT->cmd) {
            case READ:
                /* do the read request, e.g.: memcpy(CURRENT->buffer, mem_block, size); */
                break;
            case WRITE:
                /* do the write request, e.g.: memcpy(mem_block, CURRENT->buffer, size); */
                break;
            default:
                /* impossible */
                status = 0;
                break;
            }
            end_request(status);    /* when a request is finished, remove it from the queue */
        }
    }
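The memcpy-based read and write cases can be tried out in user space with a RAM-backed toy disk. This is a simulation under stated assumptions: the toy_* names, the 16-sector disk and the array of requests replace the kernel's queue, CURRENT and end_request(), and a status of 1 means success as in the skeleton.

```c
#include <stddef.h>
#include <string.h>

#define TOY_SECTOR 512
#define TOY_NSECT   16

enum { TOY_READ = 0, TOY_WRITE = 1 };

/* Toy request: what to do, where, and the caller's buffer. */
struct toy_request {
    int cmd;
    unsigned long sector;
    unsigned long nr_sectors;
    char *buffer;
    int status;                  /* 1 = success, 0 = failure */
};

static char toy_disk[TOY_NSECT * TOY_SECTOR];   /* the RAM "disk" */

/* Carry out one request; returns the status to report. */
static int toy_transfer(struct toy_request *rq)
{
    size_t off, len;

    /* Reject transfers that run past the end of the device. */
    if (rq->sector >= TOY_NSECT || rq->nr_sectors > TOY_NSECT - rq->sector)
        return 0;

    off = (size_t)rq->sector * TOY_SECTOR;
    len = (size_t)rq->nr_sectors * TOY_SECTOR;

    switch (rq->cmd) {
    case TOY_READ:
        memcpy(rq->buffer, toy_disk + off, len);
        return 1;
    case TOY_WRITE:
        memcpy(toy_disk + off, rq->buffer, len);
        return 1;
    default:
        return 0;                /* unknown command */
    }
}

/* Service every request in order; returns how many succeeded. */
static int toy_request_loop(struct toy_request *reqs, size_t n)
{
    size_t i;
    int ok = 0;

    for (i = 0; i < n; i++) {
        reqs[i].status = toy_transfer(&reqs[i]);
        ok += reqs[i].status;
    }
    return ok;
}
```

Writing a sector and reading it back through toy_request_loop() should return the same bytes, while a request beyond sector 15 is completed with status 0.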
To Write a Block Device Driver (Summary)
• Write all the device operation functions
  – xxx_open(), xxx_release(), xxx_ioctl(), ...
• Write a request service routine
  – xxx_request()
• Write the interrupt handler and related tasklets
• Write the module "init" and "exit" functions to
  – Register and unregister the device driver
  – Set up and clean up the request queue and parameters
  – Set up and clean up the interrupt line and handler
Thanks
Ref: Chap 16, LDD 3e, Rubini-Corbet
Wishing you a happy Diwali