zfs talk part 1
Posted on 22-Jun-2015
zfs to developers
zfs, a modern file system built for large-scale data integrity
wikipedia to the rescue!
NFS, Lustre, OpenOffice
Sun Microsystems
They also made a large amount of hardware
http://zfsonlinux.org/docs/LUG12_ZFS_Lustre_for_Sequoia.pdf
2005: integrated into the Solaris kernel
2008: first commit to ZoL
2010: illumos founded
2013: OpenZFS
45 commits to ZoL → 1174 commits to ZoL
- File systems should be large
- Storage media is not to be trusted
- Storage maintenance should be easy
- Disk storage should be more like RAM
File systems should be large
Our largest system was 144 TB of storage.
disks * capacity = 36 * 4 TB = 144 TB
ZFS can address hard drives so large they could not be stored on this planet.
File systems should be large
ext4: 1 EB
HFS+: 1 EB
Btrfs: 16 EB
zfs: 256 × 1024 EB
File systems should be large
Who cares?
Storage media is not to be trusted
- Spinning disks have a bit error rate
- Sometimes the head writes to the wrong place
- "Modern hard disks write so fast and so faint that they are only guessing that what they read is what you wrote"
- Cables go bad
- Cosmic rays (!!!)
Storage media is not to be trusted
zfs overcomes these problems with checksumming. Every block is run through fletcher4 before it is written, and that checksum is combined with other metadata and written "far away" from the data when they are written out.
sha256
future: Edon-R, Skein
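The write-then-verify idea can be illustrated with ordinary shell tools. This is only a sketch of the concept: ZFS actually checksums each block with fletcher4 (or sha256) and stores the checksum in the parent block pointer, not in a variable.

```shell
# Toy version of write-then-verify, NOT ZFS internals:
# record a checksum at write time, compare it at read time.
f=$(mktemp)
printf 'important data\n' > "$f"
sum_at_write=$(sha256sum "$f" | cut -d' ' -f1)
# ... time passes, bits may rot ...
sum_at_read=$(sha256sum "$f" | cut -d' ' -f1)
[ "$sum_at_write" = "$sum_at_read" ] && echo OK || echo "CORRUPTION DETECTED"
rm -f "$f"
```

The point is that corruption is detected on read, instead of silently returning bad data.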
Storage media is not to be trusted
This does not happen too often, and is usually just a great early warning that the drive is failing.
Storage maintenance should be easy
zpool create name disks
zfs create filesystem
zfs set compression=off filesystem
zfs set sync=disabled filesystem
zpool status
zfs destroy
Storage maintenance should be easy
Is it intuitive?
zfs snapshot
zfs send/receive
zfs create/destroy
Storage maintenance should be easy
Is it intuitive?
zpool add VS zpool attach
Storage maintenance should be easy
Is it easy?
I think so
Disk storage should be more like ram
You should be able to open a computer up, throw some disks in there, and be running. Never need to mess with it, never need to tune it.
Disk storage should be more like ram
FAIL
tuning is not recommended
Disk storage should be more like ram
“Tuning is evil, yes, in the way that doing something against the will of the creator is evil”
zfs sits above your hard drives and below your directories; it adds features you might like.
data integrity
transparent compression (LZ4)
improved throughput
snapshotting; replication via snapshots
speed via ARC
easy maintenance
choice in raid setup
Command overview
zfs
zpool
zdb
Command overview
zfs: every week
zpool: every month
zdb depends on the day
Command overview
zfs: awesome man page
zpool: awesome man page
zdb meh...
zpool create
zpool create tank -o ashift=12 -O compression=lz4 mirror ata-WDC_WD1002FAEX-00Y9A0_WD-WCAW32714185 ata-WDC_WD1002FAEX-00Z3A0_WD-WMATR0443468
/dev/disk/by-id/ata-*
http://zfsonlinux.org/faq.html#WhatDevNamesShouldIUseWhenCreatingMyPool
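The same command, broken out with comments on what each flag does (as documented in the zpool man page):

```shell
# -o ashift=12: tell ZFS the disks use 4 KiB (2^12) physical sectors
# -O compression=lz4: filesystem property, inherited by every dataset
# mirror: make the two disks a two-way mirror vdev
# the long ata-* names are stable /dev/disk/by-id device names
zpool create tank \
    -o ashift=12 \
    -O compression=lz4 \
    mirror \
    ata-WDC_WD1002FAEX-00Y9A0_WD-WCAW32714185 \
    ata-WDC_WD1002FAEX-00Z3A0_WD-WMATR0443468
```

Using by-id names means the pool survives disks being reshuffled across SATA ports, which /dev/sdX names do not.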
zpool status
/home/sburgess > zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 19h39m with 0 errors on Tue Jul 15 10:23:16 2014
config:

	NAME                                           STATE     READ WRITE CKSUM
	tank                                           ONLINE       0     0     0
	  mirror-0                                     ONLINE       0     0     0
	    ata-WDC_WD1002FAEX-00Y9A0_WD-WCAW32714185  ONLINE       0     0     0
	    ata-WDC_WD1002FAEX-00Z3A0_WD-WMATR0443468  ONLINE       0     0     0
so far
/home/sburgess > zpool get all tank
NAME  PROPERTY  VALUE   SOURCE
tank  size      928G    -
tank  capacity  34%     -
tank  health    ONLINE  -
so far
/home/sburgess > zfs get all tank
NAME  PROPERTY          VALUE                  SOURCE
tank  type              filesystem             -
tank  creation          Thu Jan  3 15:55 2013  -
tank  used              325G                   -
tank  available         589G                   -
tank  referenced        184K                   -
tank  compressratio     1.54x                  -
tank  mounted           yes                    -
tank  recordsize        128K                   default
tank  mountpoint        /tank                  default
tank  compression       lz4                    local
tank  sync              standard               default
tank  refcompressratio  1.00x                  -
so far
/home/sburgess > ls /tank/
zfs create
zfs create tank/home
zfs create
zfs create -o mountpoint=/home/sburgess tank/home/sburgess
zfs create
zfs create tank/home/sburgess/downloads
zfs create tank/home/sburgess/projects
zfs create tank/home/sburgess/tools
chown -R sburgess: /home/sburgess
zfs create
zfs list -o name,refer,used,compressratio -r tank/home/sburgess
NAME                          REFER  USED   RATIO
tank/home/sburgess            4.37G  114G   1.73x
tank/home/sburgess/downloads  34.8G  36.0G  1.66x
tank/home/sburgess/projects   2.08G  11.7G  1.30x
tank/home/sburgess/tools      583M   635M   1.54x
zfs create
mv Pictures pic
zfs create tank/home/sburgess/Pictures
chown -R sburgess: Pictures
mv pic/* Pictures
zfs create
/home/sburgess > zfs list -o name,refer,used,compressratio -r tank/home/sburgess
NAME                          REFER  USED   RATIO
tank/home/sburgess            4.36G  114G   1.73x
tank/home/sburgess/Pictures   11.3M  11.3M  1.16x
tank/home/sburgess/downloads  34.8G  36.0G  1.66x
tank/home/sburgess/projects   2.08G  11.7G  1.30x
tank/home/sburgess/tools      583M   635M   1.54x
zfs create
shopt -s dotglob
du -hs *
2.9G .kde
1.3G .cache
uberblock
The root of the zfs hash tree
“A Merkle tree is a tree in which every non-leaf node is labelled with the hash of the labels of its children nodes.”
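A toy version of that definition in shell (illustration only; ZFS's real tree lives in block pointers and uses fletcher4 or sha256 over on-disk blocks, not this layout):

```shell
# Two "blocks" as leaves; the parent is labelled with the hash of
# its children's labels. Change any leaf and the root label changes,
# which is how the uberblock can vouch for everything below it.
h1=$(printf 'block-1' | sha256sum | cut -d' ' -f1)
h2=$(printf 'block-2' | sha256sum | cut -d' ' -f1)
root=$(printf '%s%s' "$h1" "$h2" | sha256sum | cut -d' ' -f1)
echo "$root"
```

The uberblock is exactly this root label for the whole pool.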
uberblock
zdb -u poolName
zdb -u test
Uberblock:
	magic = 0000000000bab10c
	version = 5000
	txg = 5
	guid_sum = 16411893724316372364
	timestamp = 1392754246 UTC = Tue Feb 18 15:10:46 2014
… cat /dev/urandom > file …
Uberblock:
	magic = 0000000000bab10c
	version = 5000
	txg = 163
	guid_sum = 16411893724316372364
	timestamp = 1392755035 UTC = Tue Feb 18 15:23:55 2014
… zpool attach pool disk1 disk2…
Uberblock:
	magic = 0000000000bab10c
	version = 5000
	txg = 197
	guid_sum = 16865875370843337150
	timestamp = 1392755190 UTC = Tue Feb 18 15:26:30 2014
uberblock
Go back in time via
zpool import -F
snapshotting
zfs snapshot tank/home/sburgess@now
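Snapshot names after the @ are free-form. The epoch-style names that appear later in this talk (e.g. @1387825261) suggest scripted snapshots; a cron-friendly sketch of that convention (my assumption, the talk does not show the script):

```shell
# Name each snapshot by the current Unix time,
# producing names like tank/home/sburgess@1387825261
zfs snapshot "tank/home/sburgess@$(date +%s)"
```

Epoch names sort chronologically and never collide, which matters when a cron job takes one every hour.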
snapshotting
zfs list -o name,creation,used -t all -r tank/home/sburgess
What to do with snapshots
.zfs directory
It is always there; whether or not it shows up in ls -a is controlled by
zfs set snapdir=hidden|visible filesystem
.zfs directory
Contains .zfs/snapshot, which has a directory for each snapshot. When you access any of them, that snapshot is temporarily mounted read-only there.
.zfs directory
Use case:
-Test if/when a file was created
-Easily restore a file or two, for large complicated restores, use clone.
zfs rollback
zfs rollback tank/home/sburgess@then
Should be the most recent snapshot, but you can use -r to roll back further
zfs rollback
Use case:
Being too bold with tar -x
zfs clone
zfs clone tank/home/sburgess@now tank/other
tank/other is a read/write, snapshottable, cloneable file system
Initially shares all blocks with the parent, takes 0 space, and amplifies ARC hits
zfs clone
Use case:
Virtual Machine base images
All configs, modules, programs and OS data shared
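That use case as commands (a sketch; the dataset names are hypothetical, not from the talk):

```shell
# One golden base image, then per-VM clones that initially share
# every block with it: near-zero time and space to create each VM.
zfs snapshot tank/images/base@golden
zfs clone tank/images/base@golden tank/images/vm1
zfs clone tank/images/base@golden tank/images/vm2
```

Because all the clones read the same underlying blocks, one cached copy in the ARC serves every VM.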
zfs clone
zfs clone -o readonly=on -o mountpoint=/tmp/ro tank/home/sburgess@now tank/other
zfs clone
- safe (readonly)
- 0 time
- 0 space
zfs clone
Use case:
- large file restore
- diffing files across both
zfs clone
What clones of this snapshot exist?
zfs get clones filesystem@snapshot
What snapshot was this filesystem cloned from?
zfs get origin filesystem
a note on -
“-” is zfs none/null/not applicable
zfs get clones tank
NAME  PROPERTY  VALUE  SOURCE
tank  clones    -      -

zfs get origin tank@now
NAME      PROPERTY  VALUE  SOURCE
tank@now  origin    -      -
zpool get version
NAME  PROPERTY  VALUE  SOURCE
tank  version   -      default
a note on 5000
zpool version numbers no longer increase with features
zfs send
zfs send
Original idea:
Send the changes I made today across the ocean
zfs send
Create a file detailing the changes that need to be made to transition a filesystem from one snapshot to another.
zfs send
zfs send is a dictation, not a conversation
zfs send
zpool create -O compression=off -O copies=2 -o ashift=12
zpool create -O compression=lz4 -O checksum=sha256 -o ashift=9
zfs send
zfs send tank/currr@1387825261
Error: Stream can not be written to a terminal.
You must redirect standard output.
zfs send
-n (dry run)
-v (verbose)
zfs send
zfs send -n -v tank/home/sburgess@now
send from @ to tank/home/sburgess@now
total estimated size is 9.22G
zfs send
zfs send tank/home/sburgess@now
What does this send? What does it create when it's received?
It sends a "full" filesystem: everything that is needed to create tank/home/sburgess@now
The receiving side gets a new FS with a single snapshot named now
zfs send
Can be used with the -i and -I options to send incremental changes. Only send the blocks that changed between the first and second snapshots.
zfs send
-i do not send intermediate snapshots
-I send intermediate snapshots
zfs send -I early file/system/path@late
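Put together, the usual replication pattern looks like this (a sketch; the host and dataset names are hypothetical):

```shell
# One-time full send to seed the remote pool:
zfs send tank/fs@monday | ssh backuphost zfs receive backup/fs
# Then cheap incrementals: only blocks that changed between the
# two snapshots cross the wire. -I instead of -i would also carry
# any snapshots taken between monday and tuesday.
zfs send -i tank/fs@monday tank/fs@tuesday | ssh backuphost zfs receive backup/fs
```

This is the "send the changes I made today across the ocean" idea: after the seed, each transfer is proportional to what changed, not to the filesystem's size.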
zfs get vs zfs list
When working interactively use zfs list
zfs list -t all -o name,written,used,mounted
NAME                                 WRITTEN  USED   MOUNTED
tank/home/sburgess/tools@1387825261  0        0      -
tank/images                          590M     8.82G  no
tank/images@base                     8.25G    369M   -
tank/other                           8K       8K     yes
tank/trick                           0        136K   yes
zfs get vs zfs list
zfs list is the same as
zfs list -o name,used,avail,refer,mountpoint
zfs get vs zfs list
zfs list | grep/awk/??
zfs get vs zfs list
When looking at an FS or snapshot, I call
zfs get all item | less
zfs get vs zfs list
For programmatic use, use zfs get -H -p
zfs get used tank
NAME  PROPERTY  VALUE  SOURCE
tank  used      484G   -
zfs get used -o value -H -p tank
519265562624
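Why those flags matter in scripts: -H drops the header and separates columns with tabs, and -p prints exact byte counts instead of human-readable values. A parsing sketch, using a literal sample line (the value from the slide above) so it runs without a pool:

```shell
# Sample of the tab-separated 4-column form `zfs get -H -p used tank`
# would emit, embedded literally here for illustration:
line=$(printf 'tank\tused\t519265562624\t-')
# -H guarantees tab separators, so cut -f3 reliably grabs the value:
bytes=$(printf '%s\n' "$line" | cut -f3)
echo "$bytes"
```

With -o value, as on the slide, zfs prints only the value column and even the cut is unnecessary.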
Learn more
read the zpool man page
read the zfs man page
subscribe to the ZoL mailing list, and just read new messages as they come in