a4: layered block-structured file system · 2018. 1. 26. · slides originally by robbert van...

Post on 09-Oct-2020


A4: Layered Block-Structured File System

CS 4410 Operating Systems

Slides originally by Robbert van Renesse.

Introduction

Layered Abstractions to access storage (HIGHLY SIMPLIFIED FIGURE 11.7 from book)

• File System: abstraction that provides persistent, named data
• Block Store: abstraction providing access to a sequence of numbered blocks. (No names.)
• Physical Device (e.g., DISK): sectors identified with logical block addresses, specifying surface, track, and sector to be accessed.

Block Store Abstraction

Provides a disk-like interface:
• a sequence of blocks numbered 0, 1, … (typically a few KB)
• you can read or write 1 block at a time

nblocks() returns size of the block store in #blocks
read(block_num) returns contents of given block number
write(block_num, block) writes block contents at given block num
setsize(size) sets the size of the block store

A4 has you work with multiple versions/instantiations of this abstraction.

Heads up about the code!
This entire code base is what happens when you want object-oriented programming, but you only have C.

Put on your C++/Java Goggles!

block_store_t (a block store type) is essentially an abstract class

Contents of block_store.h

    #define BLOCK_SIZE 512          // # bytes in a block

    typedef unsigned int block_no;  // index of a block

    typedef struct block {
        char bytes[BLOCK_SIZE];
    } block_t;

    typedef struct block_store {    // ← poor man's class
        void *state;
        int (*nblocks)(struct block_store *this_bs);
        int (*read)(struct block_store *this_bs, block_no offset, block_t *block);
        int (*write)(struct block_store *this_bs, block_no offset, block_t *block);
        int (*setsize)(struct block_store *this_bs, block_no size);
        void (*destroy)(struct block_store *this_bs);
    } block_store_t;

None of this is data! All typedefs!
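To make the "abstract class" idea concrete, here is a minimal sketch of one possible "subclass": a block store kept entirely in memory. The names memstore_init and struct memstore_state are hypothetical (they are not part of the A4 code), and the return convention (0 on success, -1 on failure) is an assumption, since the header does not state one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 512
typedef unsigned int block_no;
typedef struct block { char bytes[BLOCK_SIZE]; } block_t;

typedef struct block_store {
    void *state;
    int (*nblocks)(struct block_store *this_bs);
    int (*read)(struct block_store *this_bs, block_no offset, block_t *block);
    int (*write)(struct block_store *this_bs, block_no offset, block_t *block);
    int (*setsize)(struct block_store *this_bs, block_no size);
    void (*destroy)(struct block_store *this_bs);
} block_store_t;

/* Hypothetical layer-specific state: a heap-allocated array of blocks. */
struct memstore_state {
    block_t *blocks;
    block_no nblocks;
};

static int memstore_nblocks(block_store_t *this_bs) {
    struct memstore_state *ms = this_bs->state;
    return (int) ms->nblocks;
}

static int memstore_read(block_store_t *this_bs, block_no offset, block_t *block) {
    struct memstore_state *ms = this_bs->state;
    if (offset >= ms->nblocks) return -1;      /* assumed error convention */
    *block = ms->blocks[offset];
    return 0;
}

static int memstore_write(block_store_t *this_bs, block_no offset, block_t *block) {
    struct memstore_state *ms = this_bs->state;
    if (offset >= ms->nblocks) return -1;
    ms->blocks[offset] = *block;
    return 0;
}

static int memstore_setsize(block_store_t *this_bs, block_no size) {
    struct memstore_state *ms = this_bs->state;
    block_t *nb = realloc(ms->blocks, (size ? size : 1) * sizeof(block_t));
    if (nb == NULL) return -1;
    if (size > ms->nblocks)                    /* zero-fill newly added blocks */
        memset(&nb[ms->nblocks], 0, (size - ms->nblocks) * sizeof(block_t));
    ms->blocks = nb;
    ms->nblocks = size;
    return 0;
}

static void memstore_destroy(block_store_t *this_bs) {
    struct memstore_state *ms = this_bs->state;
    free(ms->blocks);
    free(ms);
    free(this_bs);
}

/* "Constructor": allocates the state and fills in the function pointers.
 * Allocation failures simply trip an assert in this sketch. */
block_store_t *memstore_init(block_no nblocks) {
    struct memstore_state *ms = calloc(1, sizeof(*ms));
    block_store_t *this_bs = calloc(1, sizeof(*this_bs));
    assert(ms != NULL && this_bs != NULL);
    ms->blocks = calloc(nblocks ? nblocks : 1, sizeof(block_t));
    assert(ms->blocks != NULL);
    ms->nblocks = nblocks;
    this_bs->state = ms;
    this_bs->nblocks = memstore_nblocks;
    this_bs->read = memstore_read;
    this_bs->write = memstore_write;
    this_bs->setsize = memstore_setsize;
    this_bs->destroy = memstore_destroy;
    return this_bs;
}
```

A caller only ever sees the block_store_t pointer and calls through it, for example (*bs->read)(bs, 2, &block), without knowing which implementation sits behind it.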

Block Store Instructions
• block_store_t *xxx_init(…)   ← "constructor"
  – Name & signature varies, sets up the fn pointers
• int nblocks(…)
• read(…)
• write(…)
• setsize(…)
• destroy()   ← "destructor"
  – frees everything associated with this block store

sample.c -- just a lone disk

    #include ...
    #include "block_store.h"

    int main(){
        block_store_t *disk = disk_init("disk.dev", 1024);
        block_t block;
        strcpy(block.bytes, "Hello World");
        (*disk->write)(disk, 0, &block);
        (*disk->destroy)(disk);
        return 0;
    }

RUN IT! IT'S COOL!

    > gcc -g block_store.c sample.c
    > ./a.out
    > less disk.dev

Block Stores can be Layered!

Each layer presents a block store abstraction (block_store), from top to bottom:

• CACHEDISK: keeps a cache of recently used blocks
• STATDISK: keeps track of #reads and #writes for statistics
• DISK: keeps blocks in a Linux file

A Cache for the Disk?
Yes! All requests for a given block go through the block cache.

File System (AKA treedisk) → Block Cache (AKA cachedisk) → Disk

• Benefit #1: Performance
  – Caches recently read blocks
  – Buffers recently written blocks (to be written later)
• Benefit #2: Synchronization: For each entry, OS adds information to:
  – prevent a process from reading block while another writes
  – ensure that a given block is only fetched from storage device once, even if it is simultaneously read by many processes

layer.c -- code with layers

    #define CACHE_SIZE 10           // #blocks in cache

    block_t cache[CACHE_SIZE];

    int main(){
        block_store_t *disk = disk_init("disk2.dev", 1024);
        block_store_t *sdisk = statdisk_init(disk);
        block_store_t *cdisk = cachedisk_init(sdisk, cache, CACHE_SIZE);

        block_t block;
        strcpy(block.bytes, "Farewell World!");
        (*cdisk->write)(cdisk, 0, &block);
        (*cdisk->destroy)(cdisk);
        (*sdisk->destroy)(sdisk);
        (*disk->destroy)(disk);

        return 0;
    }

RUN IT! IT'S COOL!

    > gcc -g block_store.c statdisk.c cachedisk.c layer.c
    > ./a.out
    > less disk2.dev

Layers, top to bottom: CACHEDISK, STATDISK, DISK

Example Layers

    block_store_t *statdisk_init(block_store_t *below);
    // counts all reads and writes

    block_store_t *debugdisk_init(block_store_t *below, char *descr);
    // prints all reads and writes

    block_store_t *checkdisk_init(block_store_t *below);
    // checks that what's read is what was written

    block_store_t *disk_init(char *filename, int nblocks)
    // simulated disk stored on a Linux file
    // (could also use real disk using /dev/*disk devices)

    block_store_t *ramdisk_init(block_t *blocks, nblocks)
    // a simulated disk in memory, fast but volatile

How to write a layer

    struct statdisk_state {             // layer-specific data
        block_store_t *below;           // block store below
        unsigned int nread, nwrite;     // stats
    };

    block_store_t *statdisk_init(block_store_t *below){
        struct statdisk_state *sds = calloc(1, sizeof(*sds));
        sds->below = below;

        block_store_t *this_bs = calloc(1, sizeof(*this_bs));
        this_bs->state = sds;
        this_bs->nblocks = statdisk_nblocks;
        this_bs->setsize = statdisk_setsize;
        this_bs->read = statdisk_read;
        this_bs->write = statdisk_write;
        this_bs->destroy = statdisk_destroy;
        return this_bs;
    }

statdisk implementation (cont'd)

    // records the stats and passes the request to the layer below
    int statdisk_read(block_store_t *this_bs, block_no offset, block_t *block){
        struct statdisk_state *sds = this_bs->state;
        sds->nread++;
        return (*sds->below->read)(sds->below, offset, block);
    }

    int statdisk_write(block_store_t *this_bs, block_no offset, block_t *block){
        struct statdisk_state *sds = this_bs->state;
        sds->nwrite++;
        return (*sds->below->write)(sds->below, offset, block);
    }

    void statdisk_destroy(block_store_t *this_bs){
        free(this_bs->state);
        free(this_bs);
    }

Another Possible Layer: Treedisk
• A file system, similar to Unix file systems
• Initialized to support N virtual block stores (AKA files)
• Underlying block store (below) partitioned into 3 sections:
  1. Superblock: block #0
  2. Fixed number of i-node blocks: starts at block #1
     – Function of N (enough to store N i-nodes)
  3. Remaining blocks: starts after i-node blocks
     – data blocks, free blocks, indirect blocks, freelist blocks

[Figure: block numbers 0–15; block 0 is the superblock, followed by the i-node blocks, then the remaining blocks.]
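The partitioning above implies some simple index arithmetic. The following is a minimal sketch of it; the helper names are hypothetical, and rounding the number of i-node blocks up is an assumption about what "enough to store N i-nodes" means, not something taken from the A4 code.

```c
#include <assert.h>

typedef unsigned int block_no;

/* Number of i-node blocks needed to hold n_inodes i-nodes (rounded up). */
block_no inode_blocks_needed(unsigned int n_inodes, unsigned int inodes_per_block) {
    assert(inodes_per_block > 0);
    return (n_inodes + inodes_per_block - 1) / inodes_per_block;
}

/* Block in the underlying store that holds i-node inode_no.
 * I-node blocks start at block #1, right after the superblock. */
block_no inode_block(unsigned int inode_no, unsigned int inodes_per_block) {
    assert(inodes_per_block > 0);
    return 1 + inode_no / inodes_per_block;
}

/* Index of i-node inode_no within its i-node block. */
unsigned int inode_slot(unsigned int inode_no, unsigned int inodes_per_block) {
    assert(inodes_per_block > 0);
    return inode_no % inodes_per_block;
}

/* First block of the "remaining blocks" section. */
block_no first_remaining_block(block_no n_inodeblocks) {
    return 1 + n_inodeblocks;
}
```

With 2 i-nodes per block and 8 i-nodes, this gives 4 i-node blocks (blocks 1 to 4), so the remaining blocks start at block 5.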

Types of Blocks in Treedisk

• Superblock: the 0th block below
• Freelist block: list of all unused blocks below
• I-node block: list of inodes
• Indir block: list of blocks
• Data block: just data

    union treedisk_block {
        block_t datablock;
        struct treedisk_superblock superblock;
        struct treedisk_inodeblock inodeblock;
        struct treedisk_freelistblock freelistblock;
        struct treedisk_indirblock indirblock;
    };

treedisk Superblock

    // one per underlying block store
    struct treedisk_superblock {
        block_no n_inodeblocks;
        block_no free_list;     // 1st block on free list
                                // 0 means no free blocks
    };

Notice: there are no pointers. Everything is a block number.

[Figure: blocks 0–15 (superblock, inode blocks, remaining blocks). In the example superblock, n_inodeblocks is 4 and free_list is the number of some block in the remaining section (a green box in the original figure).]

treedisk Free List

    struct treedisk_freelistblock {
        block_no refs[REFS_PER_BLOCK];
    };

refs[0]: # of another freelist block, or 0 if end of list
refs[i]: # of a free block for i ≥ 1, or 0 if slot empty

[Figure: blocks 0–15 (superblock, inode blocks, remaining blocks). Suppose REFS_PER_BLOCK = 4. The superblock holds 4 13 (n_inodeblocks, free_list); the three freelist blocks hold 0 6 7 8, 5 10 11 12, and 9 14 15 0.]
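A minimal sketch of how blocks could be taken from and returned to such a free list, using a toy in-memory "disk". All names here are hypothetical and the allocation policy is only one reasonable choice, not necessarily what treedisk.c does. The example layout places the three freelist blocks at blocks 5, 9, and 13, which is an assumption about the figure.

```c
#include <assert.h>
#include <string.h>

#define REFS_PER_BLOCK 4    /* as in the slide's example */
#define NBLOCKS 16

typedef unsigned int block_no;

struct freelistblock { block_no refs[REFS_PER_BLOCK]; };

/* Toy disk: every block is viewed as a freelist block for simplicity. */
struct toy_disk {
    block_no free_list;                     /* superblock field, 0 = none */
    struct freelistblock blocks[NBLOCKS];
};

/* Take one block off the free list; returns 0 if nothing is free. */
block_no toy_alloc(struct toy_disk *d) {
    block_no head = d->free_list;
    if (head == 0) return 0;
    assert(head < NBLOCKS);
    struct freelistblock *fb = &d->blocks[head];
    for (int i = REFS_PER_BLOCK - 1; i >= 1; i--) {
        if (fb->refs[i] != 0) {             /* hand out a listed free block */
            block_no b = fb->refs[i];
            fb->refs[i] = 0;
            return b;
        }
    }
    /* No entries left: hand out the freelist block itself. */
    d->free_list = fb->refs[0];
    return head;
}

/* Return block b to the free list. */
void toy_free(struct toy_disk *d, block_no b) {
    assert(b != 0 && b < NBLOCKS);
    block_no head = d->free_list;
    if (head != 0) {
        struct freelistblock *fb = &d->blocks[head];
        for (int i = 1; i < REFS_PER_BLOCK; i++) {
            if (fb->refs[i] == 0) { fb->refs[i] = b; return; }
        }
    }
    /* Head is full or absent: b becomes the new head freelist block. */
    memset(&d->blocks[b], 0, sizeof(d->blocks[b]));
    d->blocks[b].refs[0] = head;
    d->free_list = b;
}

/* Build the slide's example (block placement is assumed). */
void toy_example(struct toy_disk *d) {
    memset(d, 0, sizeof(*d));
    d->free_list = 13;
    d->blocks[13] = (struct freelistblock){{9, 14, 15, 0}};
    d->blocks[9]  = (struct freelistblock){{5, 10, 11, 12}};
    d->blocks[5]  = (struct freelistblock){{0, 6, 7, 8}};
}
```

Note that a freelist block is itself a free block: once its entries are used up, it can be handed out too, so the example holds 11 allocatable blocks (5 through 15).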

treedisk freelist

[Figure: the superblock (n_inodeblocks, free_list) points to the first freelist block; each freelist block points to the next freelist block (0 marks the end of the list) and to free blocks; unused slots hold 0.]

treedisk I-node block

    struct treedisk_inodeblock {
        struct treedisk_inode inodes[INODES_PER_BLOCK];
    };

    struct treedisk_inode {
        block_no nblocks;   // # blocks in virtual block store
        block_no root;      // block # of root node of tree (or 0)
    };

[Figure: blocks 0–15 (superblock, inode blocks, remaining blocks). Suppose REFS_PER_BLOCK = 4. One i-node block holds 1 15 0 0 (inode[0] and inode[1]); another block holds 9 14 0 0.]

What if the file is bigger than 1 block?

treedisk Indirect block

    struct treedisk_indirblock {
        block_no refs[REFS_PER_BLOCK];
    };

[Figure: blocks 0–15 (superblock, inode blocks, remaining blocks). Suppose INODES_PER_BLOCK = 2. An i-node block holds 1 15 3 14 (inode[0]: nblocks 1, root 15; inode[1]: nblocks 3, root 14); an indirect block holds 13 12 11 0.]

virtual block store: 3 blocks

[Figure: i-node with nblocks 3; its root points to an indirect block, which points to three data blocks.]

What if the file is bigger than 3 blocks?

treedisk virtual block store

[Figure: i-node (nblocks ####, root) → (double) indirect block → indirect blocks → data blocks.]

How do I know if this is data or a block number?

treedisk virtual block store

• all data blocks at bottom level
• #levels: ceil(log_RPB(#blocks)) + 1, where RPB = REFS_PER_BLOCK
• For example, if rpb = 16:

    #blocks       #levels
    0             0
    1             1
    2 - 16        2
    17 - 256      3
    257 - 4096    4

REFS_PER_BLOCK more commonly at least 128 or so
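The table can be reproduced with integer arithmetic instead of a floating-point logarithm. A minimal sketch; tree_levels is a hypothetical helper, not a function from the A4 code.

```c
#include <assert.h>

/* Number of tree levels needed to address nblocks data blocks when each
 * indirect block holds rpb references. An empty file needs no levels. */
unsigned int tree_levels(unsigned int nblocks, unsigned int rpb) {
    assert(rpb >= 2);
    if (nblocks == 0) return 0;
    unsigned int levels = 1;
    unsigned long long capacity = 1;    /* data blocks reachable with `levels` levels */
    while (capacity < nblocks) {
        capacity *= rpb;                /* one more level of indirect blocks */
        levels++;
    }
    return levels;
}
```

Each extra level multiplies the reachable number of data blocks by rpb, which is why the ranges in the table grow as 1, 16, 256, 4096.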

virtual block store: with hole

[Figure: i-node with nblocks 3; its root points to an indirect block that references two data blocks and has one 0 entry (the hole).]

• Hole appears as a virtual block filled with null bytes
• pointer to indirect block can be 0 too
• virtual block store can be much larger than the "physical" block store underneath!
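A minimal sketch of how a virtual block offset could be mapped to a block number in the underlying store by walking the tree, with 0 signalling a hole. The function tree_lookup and the toy in-memory disk are hypothetical; the real treedisk code reads blocks through the layer below instead of indexing an array.

```c
#include <assert.h>

#define REFS_PER_BLOCK 4    /* small value, as in the slides' examples */
#define NBLOCKS 16

typedef unsigned int block_no;

struct indirblock { block_no refs[REFS_PER_BLOCK]; };

/* Walk from root down to the data level. nlevels counts the data level
 * too, so nlevels == 1 means root is itself the data block.
 * Returns the data block's number, or 0 if the offset falls in a hole.
 * The caller is expected to have checked offset against the file size. */
block_no tree_lookup(const struct indirblock disk[], block_no root,
                     unsigned int nlevels, block_no offset) {
    block_no b = root;
    for (unsigned int level = nlevels; level > 1 && b != 0; level--) {
        /* Number of data blocks covered by one reference at this level. */
        block_no span = 1;
        for (unsigned int i = 2; i < level; i++)
            span *= REFS_PER_BLOCK;
        assert(b < NBLOCKS);
        b = disk[b].refs[(offset / span) % REFS_PER_BLOCK];
    }
    return b;   /* 0 here means: hole, reads as a block of null bytes */
}
```

A reader that gets 0 back would fill the caller's buffer with null bytes rather than touch the underlying store, which is how a file can be larger than the store beneath it.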

Putting it all together

[Figure: blocks 0–15 (superblock, inode blocks, remaining blocks). Superblock: 4 9 (n_inodeblocks, free_list). An i-node block: 1 15 3 14 (inode[0]: nblocks 1, root 15; inode[1]: nblocks 3, root 14). Other blocks shown: 0 6 7 8; 5 10 0 0; 13 12 11 0.]

A short-lived treedisk file system

    #define DISK_SIZE 1024
    #define MAX_INODES 128

    int main(){
        block_store_t *disk = disk_init("disk.dev", DISK_SIZE);

        treedisk_create(disk, MAX_INODES);

        treedisk_check(disk);   // optional: check integrity of file system

        (*disk->destroy)(disk);

        return 0;
    }

Example code with treedisk

    block_t cache[CACHE_SIZE];

    int main(){
        block_store_t *disk = disk_init("disk.dev", 1024);
        block_store_t *cdisk = cachedisk_init(disk, cache, CACHE_SIZE);
        treedisk_create(disk, MAX_INODES);
        block_store_t *file0 = treedisk_init(cdisk, 0);
        block_store_t *file1 = treedisk_init(cdisk, 1);

        block_t block;
        (*file0->read)(file0, 4, &block);
        (*file1->read)(file1, 4, &block);

        // destroy the layers from the top down
        (*file0->destroy)(file0);
        (*file1->destroy)(file1);
        (*cdisk->destroy)(cdisk);
        (*disk->destroy)(disk);

        return 0;
    }

Layering on top of treedisk

    block_store_t *treedisk_init(block_store_t *below, unsigned int inode_no);
    // creates a new file associated with inode_no

[Figure: one TREEDISK instance per i-node (inode 0, inode 1, inode …), all layered on a single CACHEDISK, which sits on DISK.]

trace utility

[Figure: the stack of layers used by the trace utility, with TRACEDISK at the top and RAMDISK at the bottom; the layers in between are several CHECKDISK and TREEDISK instances, CACHEDISK, and STATDISK.]

tracedisk
• ramdisk is bottom-level block store
• tracedisk is a top-level block store
  – or "application-level" if you will
  – you can't layer on top of it

    block_store_t *tracedisk_init(
        block_store_t *below,
        char *trace,                // trace file name
        unsigned int n_inodes);

Trace file Commands

    W:0:3   // write inode 0, block 3
If nothing is known about the file associated with inode 0 prior to this line, by writing to block 3, you are implicitly setting the size of the file to 4 blocks.

    W:0:4   // write to inode 0, block 4
By the same logic, you now set the size to 5 since you've written to block 4.

    N:0:2   // checks if inode 0 is of size 2
This will fail b/c the size should be 5.

    S:1:0   // set size of inode 1 to 0

    R:1:1   // read inode 1, block 1
This will fail b/c you're reading past the end of the file (there is no block 1 for the file associated with inode 1).
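The size rules above can be captured in a small model that tracks the expected size of each file as commands are applied. This is a hedged sketch: struct trace_model and trace_apply are hypothetical names (this is not the chktrace.c shipped with A4), and it assumes every file starts out with size 0.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_INODES 128

/* Expected size (in blocks) of each file; assumed to start at 0. */
struct trace_model { unsigned int size[MAX_INODES]; };

/* Apply one command of the form "C:inode:arg".
 * Returns 1 if the command is expected to succeed, 0 if it is expected
 * to fail, and -1 if the line is malformed. */
int trace_apply(struct trace_model *m, const char *line) {
    char cmd;
    unsigned int inode, arg;
    if (sscanf(line, "%c:%u:%u", &cmd, &inode, &arg) != 3 || inode >= MAX_INODES)
        return -1;
    switch (cmd) {
    case 'W':                       /* writing block arg grows the file if needed */
        if (arg >= m->size[inode]) m->size[inode] = arg + 1;
        return 1;
    case 'R':                       /* reading past the end fails */
        return arg < m->size[inode];
    case 'S':                       /* set size */
        m->size[inode] = arg;
        return 1;
    case 'N':                       /* check size */
        return m->size[inode] == arg;
    default:
        return -1;
    }
}
```

Feeding it the five commands above yields success, success, failure (size is 5, not 2), success, and failure (read past the end), matching the explanations on the slide.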

Example trace file

    W:0:0   // write inode 0, block 0
    N:0:1   // checks if inode 0 is of size 1
    W:1:1   // write inode 1, block 1
    N:1:2   // checks if inode 1 is of size 2
    R:1:1   // read inode 1, block 1
    S:1:0   // set size of inode 1 to 0
    N:1:0   // checks if inode 1 is of size 0

If N fails, prints "!!CHKSIZE .."

Compiling and Running
• run "make" in the release directory
  – this generates an executable called "trace"
• run "./trace"
  – this reads trace file "trace.txt"
  – you can pass another trace file as argument
    • ./trace myowntracefile

Output to be expected

    $ make
    cc -Wall -c -o trace.o trace.c
    . . .
    cc -Wall -c -o treedisk_chk.o treedisk_chk.c
    cc -o trace trace.o block_store.o cachedisk.o checkdisk.o debugdisk.o ramdisk.o statdisk.o tracedisk.o treedisk.o treedisk_chk.o
    $ ./trace
    blocksize: 512
    refs/block: 128
    !!TDERR: setsize not yet supported
    !!ERROR: tracedisk_run: setsize(1, 0) failed
    !!CHKSIZE 10: nblocks 1: 0 != 2
    !$STAT: #nnblocks: 0
    !$STAT: #nsetsize: 0
    !$STAT: #nread: 32
    !$STAT: #nwrite: 20

Trace (Cmd:inode:block):

    W:0:0
    N:0:1
    W:0:1
    N:0:2
    W:1:0
    N:1:1
    W:1:1
    N:1:2
    S:1:0
    N:1:0

A4: Part 1/3
Implement treedisk_setsize(0)
– currently it generates an error
– what you need to do:
  • iterate through all the blocks in the inode
  • put them on the free list

Useful functions:
• treedisk_get_snapshot
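The two steps can be illustrated on a toy in-memory model: recursively visit every block reachable from the root and push each one onto the free list. This is only a sketch of the idea under stated assumptions. The names toy_fs, push_free, and free_tree are hypothetical, the free-list policy (every freed block becomes a new head freelist block) is a simplification, and the real code must go through the block store below rather than an array.

```c
#include <assert.h>
#include <string.h>

#define REFS_PER_BLOCK 4
#define NBLOCKS 16

typedef unsigned int block_no;

/* A block viewed as an array of block numbers (indirect or freelist). */
struct refblock { block_no refs[REFS_PER_BLOCK]; };

struct toy_fs {
    block_no free_list;                 /* head of the free list, 0 = empty */
    struct refblock blocks[NBLOCKS];
};

/* Simplest possible policy: b becomes the new head freelist block. */
static void push_free(struct toy_fs *fs, block_no b) {
    assert(b != 0 && b < NBLOCKS);
    memset(&fs->blocks[b], 0, sizeof(fs->blocks[b]));
    fs->blocks[b].refs[0] = fs->free_list;
    fs->free_list = b;
}

/* Free the subtree rooted at b, which has nlevels levels (1 = data block).
 * Returns the number of blocks put on the free list. */
unsigned int free_tree(struct toy_fs *fs, block_no b, unsigned int nlevels) {
    if (b == 0 || nlevels == 0) return 0;       /* hole or empty file */
    unsigned int freed = 0;
    if (nlevels > 1) {
        /* Copy the references first, since freeing overwrites blocks. */
        struct refblock copy = fs->blocks[b];
        for (int i = 0; i < REFS_PER_BLOCK; i++)
            freed += free_tree(fs, copy.refs[i], nlevels - 1);
    }
    push_free(fs, b);                           /* children first, then b */
    return freed + 1;
}
```

After freeing the tree, the caller would still need to reset the i-node itself (nblocks and root both 0) and write the updated i-node block and superblock back.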

A4: Part 2/3
Implement cachedisk
– currently it doesn't actually do anything
– what you need to do:
  • pick a caching algorithm: LRU, MFU, or design your own
    – go wild!
  • implement it within cachedisk.c
  • write-through cache!!
  • consult the web for caching algorithms!
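For orientation, here is a minimal sketch of one of the named options, a write-through cache with LRU replacement, on a toy in-memory "disk". Everything here is hypothetical (struct toy_cache, cache_read, cache_write, and the tiny CACHE_SIZE of 2); the real cachedisk.c must use the block_store_t interface and the cache array passed to cachedisk_init.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 512
#define NBLOCKS 64
#define CACHE_SIZE 2        /* tiny on purpose, to make evictions visible */

typedef unsigned int block_no;
typedef struct block { char bytes[BLOCK_SIZE]; } block_t;

struct toy_cache {
    block_t disk[NBLOCKS];              /* stand-in for the layer below */
    unsigned int disk_reads, disk_writes;
    block_t data[CACHE_SIZE];           /* cached block contents */
    block_no tag[CACHE_SIZE];           /* which block each slot holds */
    int valid[CACHE_SIZE];
    unsigned long last_used[CACHE_SIZE];
    unsigned long clock;                /* logical time for LRU ordering */
};

/* Slot holding block no, or -1 if it is not cached. */
static int find_slot(struct toy_cache *c, block_no no) {
    for (int i = 0; i < CACHE_SIZE; i++)
        if (c->valid[i] && c->tag[i] == no) return i;
    return -1;
}

/* An empty slot if there is one, else the least recently used slot. */
static int victim_slot(struct toy_cache *c) {
    int v = 0;
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (!c->valid[i]) return i;
        if (c->last_used[i] < c->last_used[v]) v = i;
    }
    return v;
}

static void install(struct toy_cache *c, int slot, block_no no, const block_t *b) {
    c->data[slot] = *b;
    c->tag[slot] = no;
    c->valid[slot] = 1;
    c->last_used[slot] = ++c->clock;
}

void cache_read(struct toy_cache *c, block_no no, block_t *out) {
    assert(no < NBLOCKS);
    int s = find_slot(c, no);
    if (s >= 0) {                       /* hit: no disk access */
        *out = c->data[s];
        c->last_used[s] = ++c->clock;
        return;
    }
    *out = c->disk[no];                 /* miss: fetch from below */
    c->disk_reads++;
    install(c, victim_slot(c), no, out);
}

void cache_write(struct toy_cache *c, block_no no, const block_t *in) {
    assert(no < NBLOCKS);
    c->disk[no] = *in;                  /* write-through: always goes below */
    c->disk_writes++;
    int s = find_slot(c, no);
    install(c, s >= 0 ? s : victim_slot(c), no, in);
}
```

Because the cache is write-through, an evicted slot never needs to be written back: the layer below already holds the latest contents, so eviction is just overwriting the slot.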

A4: Part 3/3
Implement your own trace file that:
• is at least 10 lines long
• uses all 4 commands (RWNS)
• has an edit distance of at least 6 from the trace we gave you
• is well-formed. For example, it should not try to verify that a file has a size X when the previous commands have in fact determined that it should have size Y. You may find the chktrace.c file useful
• At most: 10,000 commands, 128 inodes, 1<<27 block size

Step 1: use it to convince yourself that your cache is working correctly.
Optional Step: make a trace that makes it hard for a caching layer to be effective (random reads/writes) so that it can be used to distinguish good caches from bad ones.

What to submit
• treedisk.c   // with treedisk_setsize(0)
• cachedisk.c
• trace.txt

The Big Red Caching Contest!!!
• We will run everybody's trace against everybody's treedisk and cachedisk
• We will run this on top of a statdisk
• We will count the total number of read operations
• The winner is whoever ends up doing the fewest read operations to the underlying disk
• Does not count towards grade of A4, but you may win fame and glory
