Difference between revisions of "ZFS"
ZFS is a combined file system and logical volume manager created by Sun Microsystems. For more official information see: [http://en.wikipedia.org/wiki/ZFS Wikipedia ZFS Entry] This means that not only is ZFS a file system but it also functions as a software raid. While ZFS has many features and is a solution for things such as the RAID 5 write hole it is also extremely simplistic. ZFS is easy, fast, flexible, and under development. | ZFS is a combined file system and logical volume manager created by Sun Microsystems. For more official information see: [http://en.wikipedia.org/wiki/ZFS Wikipedia ZFS Entry] This means that not only is ZFS a file system but it also functions as a software raid. While ZFS has many features and is a solution for things such as the RAID 5 write hole it is also extremely simplistic. ZFS is easy, fast, flexible, and under development. | ||
Line 9: | Line 7: | ||
*Nexenta (This OS claims to be the Solaris kernel with the Ubuntu (Linux) userland which seems nice but it has no window manager by default) | *Nexenta (This OS claims to be the Solaris kernel with the Ubuntu (Linux) userland which seems nice but it has no window manager by default) | ||
*StormOS (This OS is the same as Nexenta but has GNOME installed by default) | *StormOS (This OS is the same as Nexenta but has GNOME installed by default) | ||
+ | |||
+ | =Sending Pools Across Network Simple= | ||
+ | <pre> | ||
+ | |||
+ | Snapshot | ||
+ | zfs snapshot eh/data@###### | ||
+ | |||
+ | Initial Send | ||
+ | zfs send eh/data@130120 | ssh 10.0.0.6 /usr/sbin/zfs recv eh/data | ||
+ | |||
+ | Initial Send nc | ||
+ | Server | ||
+ | zfs send eh/data@130120 | pv | nc -w 20 10.0.0.6 8023 | ||
+ | Client | ||
+ | nc -w 120 -l 8023 | zfs recv eh/data | ||
+ | |||
+ | Sync Send | ||
+ | Server | ||
+ | zfs send -v -i 130120 eh/data@130428 | pv | nc -w 20 10.0.0.6 8023 | ||
+ | |||
+ | Client | ||
+ | nc -w 120 -l 8023 | zfs recv eh/data | ||
+ | |||
+ | </pre> | ||
+ | |||
+ | =NFS Specific Share Options= | ||
+ | *http://docs.oracle.com/cd/E26502_01/html/E28997/rfsrefer-13.html#rfsrefer-103 | ||
+ | zfs set sharenfs=rw,anon=uidofuseremulated share/name | ||
+ | |||
+ | =ZFS Performance/Slow ZFS Pool= | ||
+ | iostat -mX | ||
+ | zpool iostat eh 2 | ||
+ | zpool iostat -v | ||
+ | |||
+ | svc_t - average response time of transactions, in milliseconds | ||
+ | |||
+ | *http://docs.oracle.com/cd/E19253-01/819-5461/gammt/index.html | ||
+ | |||
+ | =Hotplug= | ||
+ | *http://docs.oracle.com/cd/E23824_01/html/821-1459/devconfig2-25.html | ||
+ | *https://blogs.oracle.com/sa/entry/hotplugging_sata_drives | ||
+ | |||
+ | cfgadm -a | ||
+ | cfgadm -c unconfigure sata0/1 | ||
+ | cfgadm -c configure sata0/1 | ||
+ | |||
+ | =Mirror rpool= | ||
+ | *http://docs.oracle.com/cd/E19253-01/819-5461/gkdep/index.html | ||
+ | *http://www.nickebo.net/making-your-zfs-root-pool-a-mirror-post-installation/ | ||
+ | |||
+ | *List disks | ||
+ | format -e c0d1 (or whatever your new disk is called) | ||
+ | *you probably do not need fdisk | ||
+ | >fdisk | ||
+ | >y | ||
+ | *k? | ||
+ | >label | ||
+ | >0 (SMI) | ||
+ | >y | ||
+ | >quit | ||
+ | prtvtoc /dev/rdsk/c5t0d0s2 | fmthard -s - /dev/rdsk/c5t1d0s2 | ||
+ | |||
+ | *Attach drive | ||
+ | zpool attach -f rpool c5t0d0s0 c5t1d0s0 | ||
+ | Make sure to wait until resilver is done before rebooting. | ||
+ | |||
+ | *Check | ||
+ | zpool status | ||
+ | |||
+ | *Install grub bootloader on 2nd disk: | ||
+ | installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c5t1d0s0 | ||
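The mirror steps above can be strung together as a dry run. This is only a sketch: `mirror_rpool_cmds` is a hypothetical helper that prints the commands from this section for a given pair of disks (names without the slice suffix) so they can be reviewed before being run by hand. It does not wait for the resilver, so check `zpool status` before rebooting.

```shell
# Hypothetical helper: print (do not run) the rpool mirror commands.
# $1 = existing rpool disk, $2 = new disk, both without slice suffix.
mirror_rpool_cmds() {
    # Copy the slice table from the existing disk to the new one.
    echo "prtvtoc /dev/rdsk/${1}s2 | fmthard -s - /dev/rdsk/${2}s2"
    # Attach slice 0 of the new disk to slice 0 of the existing one.
    echo "zpool attach -f rpool ${1}s0 ${2}s0"
    # Check resilver progress.
    echo "zpool status"
    # Make the second disk bootable.
    echo "installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/${2}s0"
}

mirror_rpool_cmds c5t0d0 c5t1d0
```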
+ | |||
+ | =OS Support= | ||
+ | |||
+ | * I have been using ZFS and OpenIndiana for over a year now! Works great!!! | ||
+ | * I have been using ZFS and FreeBSD for a while now! Works great!!! | ||
+ | * I have not tested on Linux but here is a full port of it: http://zfsonlinux.org/ | ||
=ZFS Background= | =ZFS Background= | ||
Line 46: | Line 121: | ||
ZFS also has its own share maker. That is the zfs command creates your NFS or CIFS shares...look it up. | ZFS also has its own share maker. That is the zfs command creates your NFS or CIFS shares...look it up. | ||
+ | |||
+ | =Create ZFS Raidz= | ||
+ | |||
+ | |||
+ | ==Basics== | ||
+ | *First find the names of your disks: | ||
+ | |||
+ | format | ||
+ | AVAILABLE DISK SELECTIONS: | ||
+ | 0. c5t0d0 <ATA-ST920217AS-3.01 cyl 2429 alt 2 hd 255 sec 63> | ||
+ | /pci@0,0/pci8086,202d@1f,2/disk@0,0 | ||
+ | 1. c7t50014EE2B1448CFEd0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252> | ||
+ | /scsi_vhci/disk@g50014ee2b1448cfe | ||
+ | 2. c7t50014EE25C1A7300d0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252> | ||
+ | /scsi_vhci/disk@g50014ee25c1a7300 | ||
+ | 3. c7t50014EE25CC0D3EEd0 <ATA-WDCWD20EARX-32P-AB51 cyl 60798 alt 2 hd 255 sec 252> | ||
+ | /scsi_vhci/disk@g50014ee25cc0d3ee | ||
+ | 4. c7t50014EE20699B771d0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252> | ||
+ | /scsi_vhci/disk@g50014ee20699b771 | ||
+ | |||
+ | *No redundancy (plain stripe, no raidz keyword) | ||
+ | zpool create tank c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0 | ||
+ | *Single parity (raidz is the same as raidz1; survives 1 disk failure) | ||
+ | zpool create tank raidz1 c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0 | ||
+ | *Double parity (survives 2 disk failures) | ||
+ | zpool create tank raidz2 c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0 | ||
+ | *etc | ||
+ | |||
+ | Everyone recommends raidz2 nowadays. I guess if you do not keep backups, that is the way to go. Keep in mind that drives do fail. Really, they do. The big risk is another drive going bad during a rebuild, which single parity cannot survive. But then what if all the disks go bad? No parity level replaces a backup. | ||
+ | |||
+ | ==Advanced Format Drives and the Future== | ||
+ | So new drives use 4k sectors now to save on allocation of ECC data on a physical disk. | ||
+ | |||
+ | To make a long story short<ref>http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks</ref>, if your system truly recognizes 4k disks, and all your disks are 4k, zfs should do the rest of the work. | ||
+ | |||
+ | But let's say you want to future-proof everything (4k sizes on 512-byte disks, no performance impact) or have a mixed array that you plan to grow and eventually replace with all 4k disks. You are going to need to manually add the ashift parameter to the create line (you cannot change your ashift later). | ||
+ | |||
+ | This value is actually a bit shift value, so an ashift value for 512 bytes is 9 (2^9 = 512) while the ashift value for 4,096 bytes is 12 (2^12 = 4,096). To force the pool to use 4,096 byte sectors we must specify this at pool creation time<ref>http://zfsonlinux.org/faq.html</ref>: | ||
+ | |||
+ | zpool create -o ashift=12 tank raidz1 diskbla bladisk whatever | ||
+ | |||
+ | You can see what your drives are reporting with<ref>http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks</ref>: | ||
+ | |||
+ | echo ::sd_state | mdb -k | egrep '(^un|_blocksize)' | ||
+ | |||
+ | The unit number (un) corresponds to the SCSI driver (sd) instance number. In the above example, "un 0" is also known as "sd0." Sizes are in bytes, written as hex: 0x200 = 512, 0x1000 = 4096. | ||
+ | |||
+ | To inspect the ashift value actually used by ZFS for a particular pool, you can use zdb (without parameters it dumps labels of imported pools), for example: | ||
+ | |||
+ | <pre> | ||
+ | zdb | egrep 'ashift| name' | ||
+ | name='pond' | ||
+ | ashift=9 | ||
+ | ashift=9 | ||
+ | ashift=9 | ||
+ | ashift=9 | ||
+ | ashift=9 | ||
+ | ashift=9 | ||
+ | ashift=9 | ||
+ | ashift=9 | ||
+ | ashift=9 | ||
+ | name='rpool' | ||
+ | ashift=9 | ||
+ | name='temp' | ||
+ | ashift=9 | ||
+ | </pre> | ||
+ | |||
+ | Note that each pool can consist of one or several Top-level VDEVs, and each of those can have an individual composition of devices, i.e. not only mixing of mirrors and raidzN sets is possible, but also of devices with different hardware sector size and thus ashift as set at TLVDEV creation time. However, in order to avoid unbalanced IO and "unpleasant surprises" which might be difficult to explain and debug, it is discouraged to build pools from such mixtures. | ||
+ | |||
+ | ===Advanced Format Drives and the Future with Openindiana and sd.conf=== | ||
+ | OpenIndiana (illumos-gate) does not accept the ashift value. Instead you have to edit /kernel/drv/sd.conf to override the block size that the sd driver reports. I am telling the system to treat my 512-byte disks as 4k so that the pool is ready for 4k drives in the future. | ||
+ | |||
+ | This all comes from: http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks starting @ Overriding the Physical Block Size | ||
+ | |||
+ | Ways to get the drive info: | ||
+ | |||
+ | format -e | ||
+ | Select disk | ||
+ | inquiry | ||
+ | |||
+ | Vendor: ATA | ||
+ | Product: SAMSUNG HD204UI | ||
+ | Revision: 0001 | ||
+ | Or | ||
+ | iostat -Er | ||
+ | |||
+ | sd6 ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0 | ||
+ | Vendor: ATA ,Product: SAMSUNG HD204UI ,Revision: 0001 ,Serial No: S2H7J1CB112293 | ||
+ | Size: 2000.40GB <2000398934016 bytes> | ||
+ | |||
+ | Or | ||
+ | echo "::walk sd_state | ::grep '.!=0' | ::print struct sd_lun un_sd | \ | ||
+ | ::print struct scsi_device sd_inq | ::print struct scsi_inquiry \ | ||
+ | inq_vid inq_pid" | mdb -k | ||
+ | |||
+ | inq_vid = [ "ATA " ] | ||
+ | inq_pid = [ "SAMSUNG HD204UI " ] | ||
+ | |||
+ | So now that we have some information we have to edit sd.conf: | ||
+ | |||
+ | nano /kernel/drv/sd.conf | ||
+ | |||
+ | sd-config-list = | ||
+ | "ATA SAMSUNG HD204UI", "physical-block-size:4096"; | ||
+ | |||
+ | Another example: | ||
+ | |||
+ | sd-config-list = | ||
+ | "SEAGATE ST3300657SS", "physical-block-size:4096", | ||
+ | "DGC RAID", "physical-block-size:4096", | ||
+ | "NETAPP LUN", "physical-block-size:4096"; | ||
+ | |||
+ | If this<ref>https://www.illumos.org/issues/2665</ref> is still accurate, the format of sd-config-list works as follows: | ||
+ | |||
+ | Padding is actually not necessary. Extra spaces (not tabs!) are ignored. | ||
+ | |||
+ | Asterisks have special meaning if they appear at the beginning and end of the ID string only. sd searches the product ID (not the vendor ID) for the substring enclosed by asterisks. | ||
+ | Examples: | ||
+ | |||
+ | "*ST9320*" will match any vendor ID and product ID "ST9320423AS". | ||
+ | "**" will match any vendor ID and product ID. | ||
+ | |||
+ | The ATA in my disk string had a bunch of spaces after it, it also had spaces after the model number. Looks like it does not matter. | ||
+ | |||
+ | |||
+ | Then: | ||
+ | update_drv -vf sd | ||
+ | |||
+ | Cannot unload module: sd | ||
+ | Will be unloaded upon reboot. | ||
+ | Forcing update of sd.conf | ||
+ | sd.conf updated in the kernel. | ||
+ | |||
+ | <ref>http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/</ref>Don't worry about the errors; we only care that sd.conf was re-read by the kernel. Unfortunately the new setting will not take effect while the disk is still attached, so we must force it to detach and reattach: | ||
+ | |||
+ | Match up your devices: | ||
+ | |||
+ | format output: | ||
+ | <pre> | ||
+ | 3. c2t50024E90049754DAd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252> | ||
+ | /scsi_vhci/disk@g50024e90049754da | ||
+ | 4. c2t50024E90049754DBd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252> | ||
+ | /scsi_vhci/disk@g50024e90049754db | ||
+ | 5. c2t50024E900497550Bd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252> | ||
+ | /scsi_vhci/disk@g50024e900497550b | ||
+ | 6. c2t50024E9004975524d0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252> | ||
+ | /scsi_vhci/disk@g50024e9004975524 | ||
+ | </pre> | ||
+ | |||
+ | cfgadm output: | ||
+ | |||
+ | <pre> | ||
+ | c0 scsi-sas connected configured unknown | ||
+ | c0::w50024e90049754da,0 disk-path connected configured unknown | ||
+ | c1 scsi-sas connected configured unknown | ||
+ | c1::w50024e900497550b,0 disk-path connected configured unknown | ||
+ | c3 scsi-sas connected configured unknown | ||
+ | c3::w50024e9004975524,0 disk-path connected configured unknown | ||
+ | c4 scsi-sas connected configured unknown | ||
+ | c4::w50014ee65ac710bc,0 disk-path connected configured unknown | ||
+ | c5 scsi-sas connected configured unknown | ||
+ | c5::w50014ee65ac68d37,0 disk-path connected configured unknown | ||
+ | c6 scsi-sas connected unconfigured unknown | ||
+ | c7 scsi-sas connected configured unknown | ||
+ | c7::w50024e90049754db,0 disk-path connected configured unknown | ||
+ | c8 scsi-sas connected configured unknown | ||
+ | c8::w50014ee20c18549c,0 disk-path connected configured unknown | ||
+ | </pre> | ||
+ | |||
+ | Looks like I need to reconfigure: | ||
+ | |||
+ | c0::w50024e90049754da,0 | ||
+ | c1::w50024e900497550b,0 | ||
+ | c3::w50024e9004975524,0 | ||
+ | c7::w50024e90049754db,0 | ||
+ | |||
+ | cfgadm -al | ||
+ | |||
+ | cfgadm -c unconfigure c0::w50024e90049754da,0 | ||
+ | cfgadm -c configure c0::w50024e90049754da,0 | ||
+ | |||
+ | Do all the ones that changed. | ||
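Matching the format device names to cfgadm attachment points by eye is error-prone, so here is a small sketch of the same lookup. It assumes device names of the form cNt<WWN>dN, as in the format output above, and attachment points that contain the lower-case WWN, as in the cfgadm output above. `wwn_from_disk` and `reconfigure_cmds` are hypothetical helper names, and the second one only prints the commands.

```shell
# Hypothetical helper: extract the lower-case WWN from a cNt<WWN>dN device name.
wwn_from_disk() {
    printf '%s\n' "$1" |
        sed 's/^c[0-9]*t\([0-9A-Fa-f]*\)d[0-9]*$/\1/' |
        tr 'A-F' 'a-f'
}

# Hypothetical helper: print the unconfigure/configure pair for one
# cfgadm attachment point ID (dry run, nothing is executed).
reconfigure_cmds() {
    echo "cfgadm -c unconfigure $1"
    echo "cfgadm -c configure $1"
}

wwn_from_disk c2t50024E90049754DAd0        # prints 50024e90049754da
reconfigure_cmds 'c0::w50024e90049754da,0'
```

The printed WWN can then be searched for in the `cfgadm -al` output to find the matching attachment point.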
+ | |||
+ | |||
+ | This DOES NOT work to Check: | ||
+ | |||
+ | echo ::sd_state | mdb -k | egrep '(^un|_blocksize)' | ||
+ | |||
+ | NO NO NO | ||
+ | |||
+ | I had to add the disks to a pool and use zdb -l on the disk. Same end step written here: http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/ | ||
+ | |||
+ | So the sd.conf modification does this: | ||
+ | |||
+ | 22:34 < tsoome> it will set sd driver options, so when you do ioctl( DIOCGMEDIASIZE), you get this value | ||
+ | 22:34 < tsoome> or DIOCGMEDIASIZEEXT | ||
+ | |||
+ | =sd.conf disable idle park heads= | ||
+ | |||
+ | You can also stop the 'green drives' from parking heads: | ||
+ | |||
+ | nano /kernel/drv/sd.conf | ||
+ | |||
+ | sd-config-list = "SEAGATE ST3300657SS", "power-condition:false", | ||
+ | "SEAGATE ST2000NM0001", "power-condition:false"; | ||
+ | |||
+ | ==Autoexpand== | ||
+ | It looks like you can enable and disable this one after creation<ref>https://docs.oracle.com/cd/E19253-01/819-5461/githb/index.html</ref>: | ||
+ | |||
+ | zpool set autoexpand=on tank | ||
+ | |||
+ | But on creation: | ||
+ | |||
+ | zpool create -o autoexpand=on tank c1t13d0 | ||
+ | |||
+ | ==Case Sensitive== | ||
+ | You may want to look at the casesensitivity property. Windows file systems are case-insensitive. | ||
+ | |||
+ | https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gftgr/index.html | ||
+ | |||
+ | ==Other Cool Options== | ||
+ | listsnaps=on | ||
+ | Controls whether information about snapshots associated with this | ||
+ | pool is output when "zfs list" is run without the -t option. The | ||
+ | default value is off. | ||
+ | |||
+ | ==An Example Of A Final Command== | ||
+ | |||
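+ | Note that illumos-gate does not support the ashift option, so on that platform drop -o ashift=12 from the command below and use the sd.conf override instead. See: http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks and look for Overriding the Physical Block Size | ||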
+ | |||
+ | zpool create -o ashift=12 -o autoexpand=on -o listsnaps=on data raidz1 c2t50014EE20C18549Cd0 c2t50024E90049754DAd0 c2t50024E90049754DBd0 c2t50024E900497550Bd0 c2t50024E9004975524d0 | ||
=ZFS Snapshots= | =ZFS Snapshots= | ||
Line 126: | Line 432: | ||
==Replication== | ==Replication== | ||
+ | ===Problem?=== | ||
+ | *http://hobby.keluargareski.net/tag/zfs/ | ||
+ | I was having a problem with receive through ssh. If you get an error such as "zfs: command not found", the solution is to write the full path of zfs on the remote side. | ||
+ | zfs send DV/Sn03 | ssh reski@zfsserver /usr/sbin/zfs receive DV/Sn03 | ||
+ | |||
+ | *I always run the send and receive commands in screen. | ||
+ | ===How=== | ||
Below text copied from: http://www.markround.com/archives/38-ZFS-Replication.html | Below text copied from: http://www.markround.com/archives/38-ZFS-Replication.html | ||
Line 192: | Line 505: | ||
[root@solaris]$ zfs set readonly=on slave/data | [root@solaris]$ zfs set readonly=on slave/data | ||
+ | *You may want to grant full permissions to username with this command (http://docs.oracle.com/cd/E19082-01/817-2271/gbchv/index.html) | ||
+ | zfs allow username send,receive,clone,create,destroy,hold,mount,promote,rename,rollback,share,snapshot tank | ||
So, let's look in the slave to see if our files are there : | So, let's look in the slave to see if our files are there : | ||
Line 227: | Line 542: | ||
And that's it. All that remains to turn this into a production system between two hosts is for a periodic cron job to be written that runs at the appropriate intervals (daily, or even every minute if need be) and snapshots the filesystem before transferring it. You'll also likely want to have another job that clears out old snapshots, or maybe archives them off somewhere. | And that's it. All that remains to turn this into a production system between two hosts is for a periodic cron job to be written that runs at the appropriate intervals (daily, or even every minute if need be) and snapshots the filesystem before transferring it. You'll also likely want to have another job that clears out old snapshots, or maybe archives them off somewhere. | ||
+ | |||
+ | ==ZFS Share== | ||
+ | * Enable Sharing and set sharename | ||
+ | zfs set sharesmb=name=myshare yourpool/shares/bob | ||
+ | * Share Filesystem | ||
+ | zfs set sharesmb=on fsname | ||
+ | *Check if shared | ||
+ | sharemgr show -vp | ||
+ | |||
+ | *Enable "pam_smb_passwd" to make regular OpenIndiana users have smb passwords. To do so, add the following line to the end of the file "/etc/pam.conf": | ||
+ | other password required pam_smb_passwd.so.1 nowarn | ||
+ | ==Quick Reference== | ||
+ | *Initial Send | ||
+ | zfs send tank/data@blabla | ssh remoteserver /usr/sbin/zfs recv tank/data | ||
+ | |||
+ | zfs list -t snapshot | ||
+ | zfs snapshot bla/bla@?????? | ||
+ | zfs send -v -i bla/bla@?????? bla/bla@?????? | ssh 10.0.0.6 zfs recv bla/bla/blackhole0 | ||
+ | ( zfs send -v -i bla/bla@older bla/bla@newer | ssh 10.0.0.6 zfs recv bla/bla/blackhole0 ) | ||
+ | |||
+ | =freebsd zfs creation= | ||
+ | |||
+ | *to list drives | ||
+ | |||
+ | atacontrol list | ||
+ | or | ||
+ | camcontrol devlist | ||
+ | |||
+ | *identify sector size | ||
+ | camcontrol identify ada2 | ||
+ | etc (the rest of your drives) | ||
+ | *If it is 4096 create nop devices, to make zfs use 4096 sized sectors | ||
+ | gnop create -S 4096 /dev/ada2 | ||
+ | etc (the rest of your drives) | ||
+ | *create raidz pool | ||
+ | zpool create tank raidz ada2.nop ada3.nop | ||
+ | *create raid0 pool | ||
+ | zpool create tank ada2.nop ada3.nop | ||
+ | |||
+ | *zfs vs zpool | ||
+ | *export zpool | ||
+ | zpool export data | ||
+ | gnop destroy /dev/ada2.nop /dev/ada3.nop | ||
+ | zpool import data | ||
+ | You can check the configuration of the pool by using the "zdb" command on the pool: | ||
+ | zdb -C data | grep ashift | ||
+ | The ashift should be "12" for 4K alignment. This works because ZFS writes the ashift value in its metadata | ||
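The whole gnop sequence can be sketched as a dry run that prints the commands for any set of drives. `gnop_pool_cmds` is a hypothetical helper, not part of FreeBSD. It uses only the commands listed in this section, with `zpool create` for the pool itself, and nothing is executed.

```shell
# Hypothetical helper: print (do not run) the 4K-aligned pool creation steps.
# $1 = pool name, remaining arguments = drive names such as ada2 ada3.
gnop_pool_cmds() {
    pool=$1
    shift
    nops=""
    for d in "$@"; do
        # Temporary nop device that reports 4096-byte sectors.
        echo "gnop create -S 4096 /dev/$d"
        nops="$nops $d.nop"
    done
    echo "zpool create $pool raidz$nops"
    # Export, drop the nop devices, and re-import on the raw drives.
    echo "zpool export $pool"
    for d in "$@"; do
        echo "gnop destroy /dev/$d.nop"
    done
    echo "zpool import $pool"
    # The ashift written at creation time should still read 12.
    echo "zdb -C $pool | grep ashift"
}

gnop_pool_cmds data ada2 ada3
```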
+ | ==URL Reference== | ||
+ | *http://forums.freebsd.org/showthread.php?t=21644 | ||
+ | *https://wiki.freebsd.org/ZFSQuickStartGuide | ||
+ | |||
+ | *http://www.mailpile.is/ | ||
+ | *https://hakshop.myshopify.com/products/wifi-pineapple | ||
+ | *http://www.i-programmer.info/news/105-artificial-intelligence/6197-anonymouth-hides-identity.html | ||
+ | *https://lavabit.com/ | ||
+ | *http://ivoras.sharanet.org/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html (Section 3) | ||
+ | |||
+ | =zfs clone= | ||
+ | *http://thegeekdiary.com/zfs-tutorials-creating-zfs-snapshot-and-clones/ | ||
+ | |||
+ | Clones can only be created from snapshots. A snapshot can't be destroyed until the clones created from it have been destroyed. Snapshots themselves are read-only; a clone is a writable file system. | ||
=ZFS Resources= | =ZFS Resources= | ||
Line 241: | Line 617: | ||
[http://mail.opensolaris.org/mailman/listinfo] - Open Solaris Mailing Lists | [http://mail.opensolaris.org/mailman/listinfo] - Open Solaris Mailing Lists | ||
+ | |||
+ | [http://docs.oracle.com/cd/E19082-01/817-2271/gbcve/index.html] - Understanding the zpool status Output | ||
[[File:ZFS_Command_Quick_Reference.odt]] - Sun ZFS Command Quick Reference | [[File:ZFS_Command_Quick_Reference.odt]] - Sun ZFS Command Quick Reference | ||
[[File:819-5461.pdf]] - Solaris ZFS Administration Guide | [[File:819-5461.pdf]] - Solaris ZFS Administration Guide | ||
+ | |||
+ | [http://resilvered.blogspot.com/2011/07/how-to-shrink-zfs-root-pool.html] - Shrink rpool | ||
+ | |||
+ | [http://www.128bitstudios.com/2010/07/23/fun-with-zfs-send-and-receive/] - Fun with ZFS send and receive | ||
+ | |||
+ | [http://www.markround.com/archives/38-ZFS-Replication.html] - ZFS Send/Rec | ||
+ | |||
+ | [http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script] - ZFS Rollback Forensics |
Latest revision as of 07:28, 29 August 2015
ZFS is a combined file system and logical volume manager created by Sun Microsystems (see the Wikipedia ZFS entry for more information). This means that ZFS is not only a file system but also functions as software RAID. While ZFS has many features and solves problems such as the RAID 5 write hole, it is also extremely simple to use. ZFS is easy, fast, flexible, and under development.
Operating systems that can use ZFS:
- Solaris (Sun's main OS; no longer free, only a trial version)
- OpenSolaris (Sun's open source OS)
- FreeBSD (ZFS is being ported to this OS and while it is stable it lags behind the Solaris release for obvious reasons)
- Nexenta (This OS claims to be the Solaris kernel with the Ubuntu (Linux) userland which seems nice but it has no window manager by default)
- StormOS (This OS is the same as Nexenta but has GNOME installed by default)
Contents
- 1 Sending Pools Across Network Simple
- 2 NFS Specific Share Options
- 3 ZFS Performance/Slow ZFS Pool
- 4 Hotplug
- 5 Mirror rpool
- 6 OS Support
- 7 ZFS Background
- 8 ZFS Possibilities + A Poor Man's Raid
- 9 ZFS Misc Notes
- 10 Create ZFS Raidz
- 11 sd.conf disable idle park heads
- 12 ZFS Snapshots
- 13 freebsd zfs creation
- 14 zfs clone
- 15 ZFS Resources
Sending Pools Across Network Simple
Snapshot
zfs snapshot eh/data@######

Initial Send
zfs send eh/data@130120 | ssh 10.0.0.6 /usr/sbin/zfs recv eh/data

Initial Send nc
Server
zfs send eh/data@130120 | pv | nc -w 20 10.0.0.6 8023
Client
nc -w 120 -l 8023 | zfs recv eh/data

Sync Send
Server
zfs send -v -i 130120 eh/data@130428 | pv | nc -w 20 10.0.0.6 8023
Client
nc -w 120 -l 8023 | zfs recv eh/data
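The snapshot, initial send, and sync send cycle can be sketched as a dry run that only prints the commands. `snapshot_cmd` and `incremental_send_cmd` are hypothetical helper names, and the snapshot labels are whatever naming scheme you already use (the examples here reuse 130120 and 130428). The receive side uses the full /usr/sbin/zfs path, as in the ssh example.

```shell
# Hypothetical helper: print the snapshot command.
# $1 = dataset, $2 = snapshot label.
snapshot_cmd() {
    echo "zfs snapshot $1@$2"
}

# Hypothetical helper: print an incremental send over ssh.
# $1 = dataset, $2 = older label, $3 = newer label, $4 = receiving host.
incremental_send_cmd() {
    echo "zfs send -v -i $1@$2 $1@$3 | ssh $4 /usr/sbin/zfs recv $1"
}

snapshot_cmd eh/data 130428
incremental_send_cmd eh/data 130120 130428 10.0.0.6
```

Printing first makes it easy to confirm that the older and newer snapshots are in the right order before anything is sent.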
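NFS Specific Share Options
- http://docs.oracle.com/cd/E26502_01/html/E28997/rfsrefer-13.html#rfsrefer-103
zfs set sharenfs=rw,anon=uidofuseremulated share/name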
ZFS Performance/Slow ZFS Pool
iostat -mX
zpool iostat eh 2
zpool iostat -v
svc_t - average response time of transactions, in milliseconds
Hotplug
- http://docs.oracle.com/cd/E23824_01/html/821-1459/devconfig2-25.html
- https://blogs.oracle.com/sa/entry/hotplugging_sata_drives
cfgadm -a
cfgadm -c unconfigure sata0/1
cfgadm -c configure sata0/1
Mirror rpool
- http://docs.oracle.com/cd/E19253-01/819-5461/gkdep/index.html
- http://www.nickebo.net/making-your-zfs-root-pool-a-mirror-post-installation/
- List disks
format -e c0d1 (or whatever your new disk is called)
- you probably do not need fdisk
>fdisk
>y
- k?
>label
>0 (SMI)
>y
>quit
prtvtoc /dev/rdsk/c5t0d0s2 | fmthard -s - /dev/rdsk/c5t1d0s2
- Attach drive
zpool attach -f rpool c5t0d0s0 c5t1d0s0
Make sure to wait until resilver is done before rebooting.
- Check
zpool status
- Install grub bootloader on 2nd disk:
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c5t1d0s0
OS Support
- I have been using ZFS and OpenIndiana for over a year now! Works great!!!
- I have been using ZFS and FreeBSD for a while now! Works great!!!
- I have not tested on Linux but here is a full port of it: http://zfsonlinux.org/
ZFS Background
During my research I found some of the information on ZFS confusing and have decided that it was for two reasons. The first was that it is still relatively new and people have many different questions about it. The second is that from its development it seems to have undergone a lot of changes. Even when I went on Freenode and was in the #ZFS IRC room and asking questions about it I was getting opinionated answers and no solid fact from some of the people using ZFS.
From what I have managed to gather, ZFS prefers whole disks and is used most often with them. It can also be used with slices (the term for partitions in BSD and Solaris) and even with files. Files were the most interesting to me because they let you experiment with and understand ZFS without destroying anything, and they open up some interesting opportunities, such as ignoring performance entirely in order to use the full size of every disk. Like RAID, ZFS cannot make full use of different-sized disks in one group: it takes the size of the smallest disk and applies that as the maximum for every drive. I think it is a limitation that ZFS should overcome, but one step at a time I suppose.
As far as commands go try 'man zpool' etc and look at the links at the bottom of the page.
ZFS Possibilities + A Poor Man's Raid
As I said, ZFS can use files and slices, so I could divide drives into whatever sizes I want and use them as I please. With files and slices, though, one loses performance because the disk write cache, which ZFS enables by default for whole disks, is not enabled with this method. You could have two 1 terabyte drives and one 500GB drive and end up with 1.5 terabytes of usable space. To do that you would use raidz2 and divide all the drives into 500GB slices or files.
Raid 5 or raidz (as ZFS calls it) Equation from 'man zpool':
A raidz group with N disks of size X with P parity disks can hold approximately (N-P)*X bytes and can withstand P device(s) failing before data integrity is compromised. The minimum number of devices in a raidz group is one more than the number of parity disks. The recommended number is between 3 and 9 to help increase performance.
Ex:
If you have five 500GB drives, or two 1TB drives and one 500GB drive divided into 500GB slices or files, you would be fine if any one drive failed.
(5-2)*500GB = 1.5 terabytes
This has the redundancy of two 500GB devices. So if one 1 terabyte drive failed (taking two slices with it) you would be fine, but you would have no redundancy left.
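The (N-P)*X estimate from the man page is easy to check with shell arithmetic. A minimal sketch, assuming all devices are the same size and ignoring metadata overhead; `raidz_usable` is a hypothetical helper name.

```shell
# Hypothetical helper: approximate usable space of one raidz group.
# $1 = number of devices (N), $2 = parity devices (P), $3 = device size (X).
raidz_usable() {
    n=$1
    p=$2
    x=$3
    # A raidz group needs at least one more device than its parity count.
    if [ "$n" -le "$p" ]; then
        echo "need more than $p devices" >&2
        return 1
    fi
    echo $(( (n - p) * x ))
}

raidz_usable 5 2 500   # prints 1500 (GB), i.e. 1.5 terabytes
raidz_usable 4 1 2000  # prints 6000 (GB) for four 2TB disks in raidz1
```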
You could do this, but currently ZFS would not perform optimally (3-5x supposedly) and it may be simpler just to buy more drives. Still, from reading this one can see what is possible with ZFS. One could even build a Poor Man's Raid this way and still be safe. The only consideration is the performance of such a raid. Slices seem to be easy to use if they are not part of a root ZFS filesystem, and with files one has to worry about outside corruption of the files and possible outside config issues.
The way ZFS was meant to be used is with full disks. It enables HD write caching by default and is extremely simple in most cases. It is as simple as finding out your device names and doing 'zpool create mirrorname mirror devicename1 devicename2' or 'zpool create mirrorname mirror filepathname1 filepathname2' or 'zpool create raidmirror raidz file/device/slicename file/device/slicename' on any system with ZFS installed. The 'device' that ZFS creates will be located at /mirrororraidname/ on the root filesystem.
ZFS Misc Notes
In OpenSolaris, format -e or just format will give you your device names. Also, if you use entire drives with ZFS and you are not root, it will not find the devices. Run su first and then issue the command, and you should be fine.
ZFS also has its own share maker. That is the zfs command creates your NFS or CIFS shares...look it up.
Create ZFS Raidz
Basics
- First find the names of your disks:
format
AVAILABLE DISK SELECTIONS:
0. c5t0d0 <ATA-ST920217AS-3.01 cyl 2429 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,202d@1f,2/disk@0,0
1. c7t50014EE2B1448CFEd0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252>
/scsi_vhci/disk@g50014ee2b1448cfe
2. c7t50014EE25C1A7300d0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252>
/scsi_vhci/disk@g50014ee25c1a7300
3. c7t50014EE25CC0D3EEd0 <ATA-WDCWD20EARX-32P-AB51 cyl 60798 alt 2 hd 255 sec 252>
/scsi_vhci/disk@g50014ee25cc0d3ee
4. c7t50014EE20699B771d0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252>
/scsi_vhci/disk@g50014ee20699b771
- No redundancy (plain stripe, no raidz keyword)
zpool create tank c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0
- Single parity (raidz is the same as raidz1; survives 1 disk failure)
zpool create tank raidz1 c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0
- Double parity (survives 2 disk failures)
zpool create tank raidz2 c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0
- etc
Everyone recommends raidz2 nowadays. I guess if you do not keep backups, that is the way to go. Keep in mind that drives do fail. Really, they do. The big risk is another drive going bad during a rebuild, which single parity cannot survive. But then what if all the disks go bad? No parity level replaces a backup.
Advanced Format Drives and the Future
So new drives use 4k sectors now to save on allocation of ECC data on a physical disk.
To make a long story short[1], if your system truly recognizes 4k disks, and all your disks are 4k, zfs should do the rest of the work.
But let's say you want to future-proof everything (4k sizes on 512-byte disks, no performance impact) or have a mixed array that you plan to grow and eventually replace with all 4k disks. You are going to need to manually add the ashift parameter to the create line (you cannot change your ashift later).
This value is actually a bit shift value, so an ashift value for 512 bytes is 9 (2^9 = 512) while the ashift value for 4,096 bytes is 12 (2^12 = 4,096). To force the pool to use 4,096 byte sectors we must specify this at pool creation time[2]:
zpool create -o ashift=12 tank raidz1 diskbla bladisk whatever
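Since ashift is just the base-2 logarithm of the sector size, the value can be double-checked in the shell. A minimal sketch; `ashift_for` is a hypothetical helper name and it only accepts power-of-two sizes.

```shell
# Hypothetical helper: print the ashift (log2) for a power-of-two sector size.
ashift_for() {
    size=$1
    result=0
    # Reject zero, negative values, and anything that is not a power of two.
    if [ "$size" -le 0 ] || [ $(( size & (size - 1) )) -ne 0 ]; then
        echo "not a power of two: $size" >&2
        return 1
    fi
    # Count how many times the size can be halved before reaching 1.
    while [ "$size" -gt 1 ]; do
        size=$(( size >> 1 ))
        result=$(( result + 1 ))
    done
    echo "$result"
}

ashift_for 512    # prints 9
ashift_for 4096   # prints 12
```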
You can see what your drives are reporting with[3]:
echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'
The unit number (un) corresponds to the SCSI driver (sd) instance number. In the above example, "un 0" is also known as "sd0." Sizes are in bytes, written as hex: 0x200 = 512, 0x1000 = 4096.
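To avoid converting the hex sizes by hand, the output can be piped through a small filter. This sketch assumes each size line has the shape "name = 0xVALUE". The field names in the sample input are made up for illustration and are not taken from real mdb output.

```shell
# Hypothetical filter: rewrite "name = 0xVALUE" lines with decimal values.
hex_sizes_to_decimal() {
    while read -r name eq value; do
        case $value in
            # printf accepts C-style hex constants for %d.
            0x*) printf '%s = %d\n' "$name" "$value" ;;
        esac
    done
}

# Sample input with made-up field names:
printf 'blk_a = 0x200\nblk_b = 0x1000\n' | hex_sizes_to_decimal
# prints:
#   blk_a = 512
#   blk_b = 4096
```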
To inspect the ashift value actually used by ZFS for a particular pool, you can use zdb (without parameters it dumps labels of imported pools), for example:
zdb | egrep 'ashift| name'
name='pond'
ashift=9
ashift=9
ashift=9
ashift=9
ashift=9
ashift=9
ashift=9
ashift=9
ashift=9
name='rpool'
ashift=9
name='temp'
ashift=9
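With many vdevs the repeated ashift lines get hard to read. The sketch below condenses output of the shape shown above (name='pool' followed by ashift=N lines) into one line per pool and ashift value. `pool_ashifts` is a hypothetical helper, and other zdb versions may format these lines differently.

```shell
# Hypothetical helper: summarise "name='pool'" / "ashift=N" lines read from
# stdin as unique "pool ashift" pairs.
pool_ashifts() {
    awk -F= '
        # Remember the most recent pool name, stripping the single quotes.
        $1 ~ /(^| )name$/ { pool = $2; gsub(/\047/, "", pool) }
        # Emit one pair for every ashift line under that pool.
        $1 ~ /ashift$/    { print pool, $2 }
    ' | sort -u
}

printf "name='pond'\nashift=9\nashift=9\nname='rpool'\nashift=9\n" | pool_ashifts
# prints:
#   pond 9
#   rpool 9
```

A pool that shows more than one line here has top-level vdevs with different ashift values, which is the mixture discouraged in the note that follows.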
Note that each pool can consist of one or several Top-level VDEVs, and each of those can have an individual composition of devices, i.e. not only mixing of mirrors and raidzN sets is possible, but also of devices with different hardware sector size and thus ashift as set at TLVDEV creation time. However, in order to avoid unbalanced IO and "unpleasant surprises" which might be difficult to explain and debug, it is discouraged to build pools from such mixtures.
Advanced Format Drives and the Future with Openindiana and sd.conf
OpenIndiana (illumos-gate) does not accept the ashift value. Instead you have to edit /kernel/drv/sd.conf to override the block size that the sd driver reports. I am telling the system to treat my 512-byte disks as 4k so that the pool is ready for 4k drives in the future.
This all comes from: http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks starting @ Overriding the Physical Block Size
Ways to get the drive info:
format -e
Select disk
inquiry
Vendor: ATA
Product: SAMSUNG HD204UI
Revision: 0001
Or
iostat -Er
sd6 ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: ATA ,Product: SAMSUNG HD204UI ,Revision: 0001 ,Serial No: S2H7J1CB112293
Size: 2000.40GB <2000398934016 bytes>
Or
echo "::walk sd_state | ::grep '.!=0' | ::print struct sd_lun un_sd | \
::print struct scsi_device sd_inq | ::print struct scsi_inquiry \
inq_vid inq_pid" | mdb -k

inq_vid = [ "ATA " ]
inq_pid = [ "SAMSUNG HD204UI " ]
So now that we have some information we have to edit sd.conf:
nano /kernel/drv/sd.conf
sd-config-list = "ATA SAMSUNG HD204UI", "physical-block-size:4096";
Another example:
sd-config-list = "SEAGATE ST3300657SS", "physical-block-size:4096", "DGC RAID", "physical-block-size:4096", "NETAPP LUN", "physical-block-size:4096";
If this[4] is still accurate, the format of sd-config-list works as follows:
Padding is actually not necessary. Extra spaces (not tabs!) are ignored.
Asterisks have special meaning if they appear at the beginning and end of the ID string only. sd searches the product ID (not the vendor ID) for the substring enclosed by asterisks. Examples:
"*ST9320*" will match any vendor ID and product ID "ST9320423AS".
"**" will match any vendor ID and product ID.
The ATA in my disk string had a bunch of spaces after it, and there were also spaces after the model number. It looks like that does not matter.
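When several drive models need the override, the sd-config-list line gets long and easy to mistype. Here is a small sketch that assembles it from "VENDOR PRODUCT" strings in the same layout as the examples above; the function name is made up, and the result still has to be pasted into /kernel/drv/sd.conf by hand:

```shell
# sd_config_list SIZE "VENDOR PRODUCT" ["VENDOR PRODUCT" ...]
# Print an sd-config-list line that sets physical-block-size to SIZE
# for every given ID string. Hypothetical helper, output only; it does
# not touch sd.conf.
sd_config_list() {
    size=$1
    shift
    entries=""
    for id in "$@"; do
        if [ -n "$entries" ]; then
            entries="$entries, "
        fi
        entries="$entries\"$id\", \"physical-block-size:$size\""
    done
    printf 'sd-config-list = %s;\n' "$entries"
}

# Usage:
#   sd_config_list 4096 "ATA SAMSUNG HD204UI"
#   sd_config_list 4096 "SEAGATE ST3300657SS" "DGC RAID" "NETAPP LUN"
```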
Then:
update_drv -vf sd
Cannot unload module: sd
Will be unloaded upon reboot.
Forcing update of sd.conf
sd.conf updated in the kernel.
[5]Don't worry about the errors; we only care that sd.conf was re-read by the kernel. Unfortunately the new setting does not take effect while the disk is still attached, so we must force it to detach and reattach:
Match up your devices:
format output:
3. c2t50024E90049754DAd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
/scsi_vhci/disk@g50024e90049754da
4. c2t50024E90049754DBd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
/scsi_vhci/disk@g50024e90049754db
5. c2t50024E900497550Bd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
/scsi_vhci/disk@g50024e900497550b
6. c2t50024E9004975524d0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
/scsi_vhci/disk@g50024e9004975524
cfgadm output:
c0                       scsi-sas   connected  configured    unknown
c0::w50024e90049754da,0  disk-path  connected  configured    unknown
c1                       scsi-sas   connected  configured    unknown
c1::w50024e900497550b,0  disk-path  connected  configured    unknown
c3                       scsi-sas   connected  configured    unknown
c3::w50024e9004975524,0  disk-path  connected  configured    unknown
c4                       scsi-sas   connected  configured    unknown
c4::w50014ee65ac710bc,0  disk-path  connected  configured    unknown
c5                       scsi-sas   connected  configured    unknown
c5::w50014ee65ac68d37,0  disk-path  connected  configured    unknown
c6                       scsi-sas   connected  unconfigured  unknown
c7                       scsi-sas   connected  configured    unknown
c7::w50024e90049754db,0  disk-path  connected  configured    unknown
c8                       scsi-sas   connected  configured    unknown
c8::w50014ee20c18549c,0  disk-path  connected  configured    unknown
Looks like I need to reconfigure:
c0::w50024e90049754da,0
c1::w50024e900497550b,0
c3::w50024e9004975524,0
c7::w50024e90049754db,0
cfgadm -al
cfgadm -c unconfigure c0::w50024e90049754da,0
cfgadm -c configure c0::w50024e90049754da,0
Do all the ones that changed.
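Repeating the unconfigure/configure pair by hand for every attachment point gets tedious. A sketch of a loop that only prints the commands so they can be reviewed first; reconfigure_cmds is a made-up name, and the attachment points are the ones from the cfgadm output above:

```shell
# reconfigure_cmds AP_ID [AP_ID ...]
# Print the cfgadm unconfigure/configure pair for each attachment point.
# Dry run by design: nothing is executed until the output is piped to sh.
# Hypothetical helper, not part of cfgadm.
reconfigure_cmds() {
    for ap in "$@"; do
        echo "cfgadm -c unconfigure $ap"
        echo "cfgadm -c configure $ap"
    done
}

# Usage (review the output, then append "| sh" to run it as root):
#   reconfigure_cmds c0::w50024e90049754da,0 c1::w50024e900497550b,0 \
#       c3::w50024e9004975524,0 c7::w50024e90049754db,0
```

Printing instead of executing is a deliberate choice here: unconfiguring the wrong attachment point takes a disk offline, so it is worth reading the list before running it.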
The following does NOT work as a way to check the result:
echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'
Instead, I had to add the disks to a pool and run zdb -l on the disk to see the ashift actually used. The same final step is described here: http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/
So the sd.conf modification does this:
22:34 < tsoome> it will set sd driver options, so when you do ioctl( DIOCGMEDIASIZE), you get this value
22:34 < tsoome> or DIOCGMEDIASIZEEXT
sd.conf disable idle park heads
You can also stop the 'green drives' from parking heads:
nano /kernel/drv/sd.conf
sd-config-list = "SEAGATE ST3300657SS", "power-condition:false", "SEAGATE ST2000NM0001", "power-condition:false";
Autoexpand
It looks like you can enable and disable this one after creation[6]:
zpool set autoexpand=on tank
Or set it at creation time:
zpool create -o autoexpand=on tank c1t13d0
Case Sensitive
You may want to look at the casesensitivity property. Windows file systems are case-insensitive.
https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gftgr/index.html
Other Cool Options
listsnaps=on Controls whether information about snapshots associated with this pool is output when "zfs list" is run without the -t option. The default value is off.
An Example Of A Final Command
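Note that illumos-gate does not support the ashift option; on illumos, leave out -o ashift=12 and override the physical block size in sd.conf instead. See: http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks and look for Overriding the Physical Block Size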
zpool create -o ashift=12 -o autoexpand=on -o listsnaps=on data raidz1 c2t50014EE20C18549Cd0 c2t50024E90049754DAd0 c2t50024E90049754DBd0 c2t50024E900497550Bd0 c2t50024E9004975524d0
ZFS Snapshots
You can take snapshots to back up information. Snapshots can be destroyed, held, and sent across the network to another ZFS system.
Rolling Back
To discard all changes made since a snapshot was taken and revert the filesystem back to its state at the time the snapshot was taken:
# zfs rollback <snapshot_to_roll_back_to>
# zfs rollback test_pool/fs1@monday
Note: if the filesystem you want to roll back is currently mounted, it will need to be unmounted and remounted. Use -f to force the unmount.
You can only roll back to the most recent snapshot. If you want to roll back to an earlier snapshot, either delete the snapshots in between or use the -r option, which destroys them for you.
# zfs rollback test_pool/fs1@monday
cannot rollback to 'test_pool/fs1@monday': more recent snapshots exist
use '-r' to force deletion of the following snapshots:
test_pool/fs1@tuesday
test_pool/fs1@wednesday
# zfs rollback -r test_pool/fs1@monday
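Because -r destroys every snapshot newer than the target, it can be worth listing those first. A minimal sketch, assuming the snapshot names arrive on stdin sorted oldest to newest; snapshots_after is a made-up helper name:

```shell
# snapshots_after DATASET@SNAP
# Read snapshot names (one per line, oldest first) on stdin and print
# the snapshots of the same dataset that come after the given one,
# i.e. the ones `zfs rollback -r` would destroy.
# Hypothetical helper; it only filters text and never calls zfs itself.
snapshots_after() {
    target=$1
    dataset=${target%@*}
    awk -v target="$target" -v prefix="$dataset@" '
        index($0, prefix) != 1 { next }       # skip other datasets
        found                  { print }      # newer than the target
        $0 == target           { found = 1 }
    '
}

# Usage:
#   zfs list -H -t snapshot -o name -s creation -r test_pool/fs1 \
#       | snapshots_after test_pool/fs1@monday
```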
Basics
Below text copied from: http://docs.oracle.com/cd/E19253-01/819-5461/gbcya/index.html
Creating and Destroying ZFS Snapshots
Snapshots are created by using the zfs snapshot command, which takes as its only argument the name of the snapshot to create. The snapshot name is specified as follows:
filesystem@snapname
volume@snapname
The snapshot name must satisfy the naming requirements in ZFS Component Naming Requirements.
In the following example, a snapshot of tank/home/ahrens that is named friday is created.
# zfs snapshot tank/home/ahrens@friday
You can create snapshots for all descendent file systems by using the -r option. For example:
# zfs snapshot -r tank/home@now
# zfs list -t snapshot
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/zfs2BE@zfs2BE  78.3M      -  4.53G  -
tank/home@now                 0      -    26K  -
tank/home/ahrens@now          0      -   259M  -
tank/home/anne@now            0      -   156M  -
tank/home/bob@now             0      -   156M  -
tank/home/cindys@now          0      -   104M  -
Snapshots have no modifiable properties. Nor can dataset properties be applied to a snapshot. For example:
# zfs set compression=on tank/home/ahrens@now
cannot set compression property for 'tank/home/ahrens@now': snapshot properties cannot be modified
Snapshots are destroyed by using the zfs destroy command. For example:
# zfs destroy tank/home/ahrens@now
A dataset cannot be destroyed if snapshots of the dataset exist. For example:
# zfs destroy tank/home/ahrens
cannot destroy 'tank/home/ahrens': filesystem has children
use '-r' to destroy the following datasets:
tank/home/ahrens@tuesday
tank/home/ahrens@wednesday
tank/home/ahrens@thursday
In addition, if clones have been created from a snapshot, then they must be destroyed before the snapshot can be destroyed.
For more information about the destroy subcommand, see Destroying a ZFS File System.
Replication
Problem?
I was having a problem with receive over ssh. If you get an error such as zfs command not found, the solution is to write the full path of zfs on the remote side.
zfs send DV/Sn03 | ssh reski@zfsserver /usr/sbin/zfs receive DV/Sn03
- I always run the send and receive commands in screen.
How
Below text copied from: http://www.markround.com/archives/38-ZFS-Replication.html
ZFS Replication Sysadmin
As I've been investigating ZFS for use on production systems, I've been making a great deal of notes, and jotting down little "cookbook recipes" for various tasks. One of the coolest systems I've created recently utilised the zfs send & receive commands, along with incremental snapshots to create a replicated ZFS environment across two different systems. True, all this is present in the zfs manual page, but sometimes a quick demonstration makes things easier to understand and follow.
While this isn't true filesystem replication (you'd have to look at something like StorageTek AVS for that) it does provide periodic snapshots and incremental updates; these can be run every minute if you're driving this from cron - or, at even more granular intervals if you write your own daemon. Nonetheless, this suffices for disaster recovery and redundancy if you don't need up-to-the second replication between systems.
I've typed up my notes in blog format so you can follow along with this example yourself, all you'll need is a Solaris system running ZFS. Read more for the full demonstration...
First, as with my last walkthrough, I'll create a couple of files to use for testing purposes. In a real-life scenario, these would most likely be pools of disks in a RAIDZ configuration, and the two pools would also be on physically separate systems. I'm only using 100Mb files for each, as that's all I need for this proof of concept.
[root@solaris]$ mkfile 100m master
[root@solaris]$ mkfile 100m slave
[root@solaris]$ zpool create master $PWD/master
[root@solaris]$ zpool create slave $PWD/slave
[root@solaris]$ zpool list
NAME     SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
master  95.5M  84.5K  95.4M   0%  ONLINE  -
slave   95.5M  52.5K  95.4M   0%  ONLINE  -
[root@solaris]$ zfs list
NAME     USED  AVAIL  REFER  MOUNTPOINT
master    77K  63.4M  24.5K  /master
slave   52.5K  63.4M  1.50K  /slave
There we go. The naming should be pretty self-explanatory : The "master" is the primary storage pool, which will replicate and push data through to the backup "slave" pool.
Now, I'll create a ZFS filesystem and add something to it. I had a few source tarballs knocking around, so I just unpacked one (GNU grep) to give me a set of files to use as a test :
[root@solaris]$ zfs create master/data
[root@solaris]$ cd /master/data/
[root@solaris]$ gtar xzf ~/grep-2.5.1.tar.gz
[root@solaris]$ ls
grep-2.5.1
We can also see from "zfs list" we've now taken up some space :
[root@solaris]$ zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
master       3.24M  60.3M  25.5K  /master
master/data  3.15M  60.3M  3.15M  /master/data
slave        75.5K  63.4M  24.5K  /slave
Now, we'll transfer all this over to the "slave", and start the replication going. We first need to take an initial snapshot of the filesystem, as that's what "zfs send" works on. It's also worth noting here that in order to transfer the data to the slave, I simply piped it to "zfs receive". If you're doing this between two physically separate systems, you'd most likely just pipe this through SSH between the systems and set up keys to avoid the need for passwords. Anyway, enough talk :
[root@solaris]$ zfs snapshot master/data@1
[root@solaris]$ zfs send master/data@1 | zfs receive slave/data
This now sent it through to the slave. It's also worth pointing out that I didn't have to recreate the exact same pool or zfs structure on the slave (which may be useful if you are replicating between dissimilar systems), but I chose to keep the filesystem layout the same for the sake of legibility in this example. I also simply used a numeric identifier for each snapshot; in a production system, timestamps may be more appropriate.
Anyway, let's take a quick look at "zfs list", where we'll see the slave has now gained a snapshot utilising exactly the same amount of space as the master :
[root@solaris]$ zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
master         3.25M  60.3M  25.5K  /master
master/data    3.15M  60.3M  3.15M  /master/data
master/data@1      0      -  3.15M  -
slave          3.25M  60.3M  24.5K  /slave
slave/data     3.15M  60.3M  3.15M  /slave/data
slave/data@1       0      -  3.15M  -
Now, here comes a big "gotcha". You now have to set the "readonly" attribute on the slave. I discovered that if this was not set, even just cd-ing into the slave's mountpoints would cause things to break in subsequent replication operations; presumably down to metadata (access times and the like) being altered.
[root@solaris]$ zfs set readonly=on slave/data
- You may want to grant full permissions to username with this command (http://docs.oracle.com/cd/E19082-01/817-2271/gbchv/index.html)
zfs allow username send,receive,clone,create,destroy,hold,mount,promote,rename,rollback,share,snapshot tank
So, let's look in the slave to see if our files are there :
[root@solaris]$ ls /slave/data
grep-2.5.1
Excellent stuff! However, the real coolness starts with the incremental transfers - instead of transferring the whole lot again, we can just send only the bits of data that actually changed - this will drastically reduce bandwidth and the time taken to replicate data, making a "cron" based system of periodic snapshots and transfers feasible. To demonstrate this, I'll unpack another tarball (this time, GNU bison) on the master so I have some more data to send :
[root@solaris]$ cd /master/data
[root@solaris]$ gtar xzf ~/bison-2.3.tar.gz
And we'll now make a second snapshot, and transfer differences between this one and the last :
[root@solaris]$ zfs snapshot master/data@2
[root@solaris]$ zfs send -i master/data@1 master/data@2 | zfs receive slave/data
Checking to see what's happened, we see the slave has gained another snapshot:
[root@solaris]$ zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
master         10.2M  53.3M  25.5K  /master
master/data    10.1M  53.3M  10.1M  /master/data
master/data@1  32.5K      -  3.15M  -
master/data@2      0      -  10.1M  -
slave          10.2M  53.3M  25.5K  /slave
slave/data     10.1M  53.3M  10.1M  /slave/data
slave/data@1   32.5K      -  3.15M  -
slave/data@2       0      -  10.1M  -
And our new data is now there as well :
[root@solaris]$ ls /slave/data/
bison-2.3 grep-2.5.1
And that's it. All that remains to turn this into a production system between two hosts is for a periodic cron job to be written that runs at the appropriate intervals (daily, or even every minute if need be) and snapshots the filesystem before transferring it. You'll also likely want to have another job that clears out old snapshots, or maybe archives them off somewhere.
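The periodic job described above could start from something like the following. This is only a sketch: the function name, the host name backuphost and the choice to print the commands rather than run them are all assumptions for illustration, and a real job would also need locking and error handling.

```shell
# replication_cmds DATASET PREV_SNAP NEW_SNAP REMOTE_HOST TARGET_DATASET
# Print the snapshot and send/receive commands for one replication run.
# With an empty PREV_SNAP a full send is printed, otherwise an
# incremental one (-i). Hypothetical helper; nothing is executed.
replication_cmds() {
    dataset=$1
    prev=$2
    new=$3
    remote=$4
    target=$5
    echo "zfs snapshot $dataset@$new"
    if [ -n "$prev" ]; then
        echo "zfs send -i $dataset@$prev $dataset@$new | ssh $remote /usr/sbin/zfs receive $target"
    else
        echo "zfs send $dataset@$new | ssh $remote /usr/sbin/zfs receive $target"
    fi
}

# Usage from cron, with a date-based snapshot name (review the output,
# then pipe it to sh):
#   replication_cmds master/data 130120 "$(date +%y%m%d)" backuphost slave/data
```

The full path to zfs on the remote side follows the ssh note earlier on this page.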
- Enable Sharing and set sharename
zfs set sharesmb=name=myshare yourpool/shares/bob
- Share Filesystem
zfs set sharesmb=on fsname
- Check if shared
sharemgr show -vp
- Enable "pam_smb_passwd" to make regular OpenIndiana users have smb passwords. To do so, add the following line to the end of the file "/etc/pam.conf":
other password required pam_smb_passwd.so.1 nowarn
Quick Reference
- Initial Send
zfs send tank/data@blabla | ssh remoteserver /usr/sbin/zfs recv tank/data
zfs list -t snapshot
zfs snapshot bla/bla@??????
zfs send -v -i bla/bla@?????? bla/bla@?????? | ssh 10.0.0.6 zfs recv bla/bla/blackhole0
( zfs send -v -i bla/bla@older bla/bla@newer | ssh 10.0.0.6 zfs recv bla/bla/blackhole0 )
freebsd zfs creation
- to list drives
atacontrol list
or
camcontrol devlist
- identify sector size
camcontrol identify ada2
etc (the rest of your drives)
- If it is 4096 create nop devices, to make zfs use 4096 sized sectors
gnop create -S 4096 /dev/ada2
etc (the rest of your drives)
- create raidz pool
zpool create tank raidz ada2.nop ada3.nop
- create raid0 pool
zpool create tank ada2.nop ada3.nop
- zfs vs zpool: pools are created with zpool; zfs create only creates datasets inside an existing pool
- export zpool
zpool export data
gnop destroy /dev/ada2.nop /dev/ada3.nop
zpool import data
You can check the configuration of the pool by using the "zdb" command on the pool:
zdb -C data | grep ashift
The ashift should be "12" for 4K alignment. This works because ZFS writes the ashift value in its metadata, so it is kept after the .nop devices are destroyed and the pool is re-imported.
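The whole gnop sequence above can be generated for any number of drives. A sketch that prints the commands for review; gnop_pool_cmds is a made-up name and it assumes a single raidz vdev:

```shell
# gnop_pool_cmds POOL DISK [DISK ...]
# Print the command sequence for creating a 4K-aligned raidz pool on
# FreeBSD through temporary gnop devices: create the .nop devices,
# create the pool on them, export, destroy the .nop devices, import.
# Hypothetical helper; nothing is executed.
gnop_pool_cmds() {
    pool=$1
    shift
    nops=""
    for disk in "$@"; do
        echo "gnop create -S 4096 /dev/$disk"
        nops="$nops $disk.nop"
    done
    echo "zpool create $pool raidz$nops"
    echo "zpool export $pool"
    for disk in "$@"; do
        echo "gnop destroy /dev/$disk.nop"
    done
    echo "zpool import $pool"
}

# Usage (review the output, then pipe it to sh as root):
#   gnop_pool_cmds data ada2 ada3
```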
URL Reference
- http://www.mailpile.is/
- https://hakshop.myshopify.com/products/wifi-pineapple
- http://www.i-programmer.info/news/105-artificial-intelligence/6197-anonymouth-hides-identity.html
- https://lavabit.com/
- http://ivoras.sharanet.org/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html (Section 3)
zfs clone
Clones can only be created from snapshots. A snapshot can't be destroyed until every clone created from it has been destroyed. Snapshots themselves are read-only; cloning one is how you get a writable copy.
ZFS Resources
[1] - Great for beginners. Lets one understand how to use ZFS without having any spare hard drives.
[2] - Turning a ZFS mirror into a raidz array.
[3] - Official ZFS Admin Guide Link
[4] - ZFS Command Quick Reference
[5] - ZFS Best Practices Guide
[6] - Open Solaris Mailing Lists
[7] - Understanding the zpool status Output
File:ZFS Command Quick Reference.odt - Sun ZFS Command Quick Reference
File:819-5461.pdf - Solaris ZFS Administration Guide
[8] - Shrink rpool
[9] - Fun with ZFS send and receive
[10] - ZFS Send/Rec
[11] - ZFS Rollback Forensics
- ↑ http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks
- ↑ http://zfsonlinux.org/faq.html
- ↑ http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks
- ↑ https://www.illumos.org/issues/2665
- ↑ http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/
- ↑ https://docs.oracle.com/cd/E19253-01/819-5461/githb/index.html