Difference between revisions of "ZFS"

From Hack Sphere Labs Wiki
(Replication)
(freebsd zfs creation)
 
(48 intermediate revisions by the same user not shown)
 
*Nexenta (This OS claims to be the Solaris kernel with the Ubuntu (Linux) userland which seems nice but it has no window manager by default)
 
 
*StormOS (This OS is the same as Nexenta but has GNOME installed by default)
 
 +
 +
=Sending Pools Across Network Simple=
 +
<pre>
 +
 +
Snapshot
 +
zfs snapshot eh/data@######
 +
 +
Initial Send
 +
zfs send eh/data@130120 | ssh 10.0.0.6 /usr/sbin/zfs recv eh/data
 +
 +
Initial Send nc
 +
Server
 +
zfs send eh/data@130120 | pv | nc -w 20 10.0.0.6 8023
 +
Client
 +
nc -w 120 -l 8023 | zfs recv eh/data
 +
 +
Sync Send
 +
Server
 +
zfs send -v -i 130120 eh/data@130428 | pv | nc -w 20 10.0.0.6 8023
 +
 +
Client
 +
nc -w 120 -l 8023 | zfs recv eh/data
 +
 +
</pre>
 +
 +
=NFS Specific Share Options=
 +
*http://docs.oracle.com/cd/E26502_01/html/E28997/rfsrefer-13.html#rfsrefer-103
 +
zfs set sharenfs=rw,anon=uidofuseremulated share/name
 +
 +
=ZFS Performance/Slow ZFS Pool=
 +
iostat -mX
 +
zpool iostat eh 2
 +
zpool iostat -v
 +
 +
svc_t - average response time  of  transactions,  in  milliseconds
 +
 +
*http://docs.oracle.com/cd/E19253-01/819-5461/gammt/index.html
 +
 +
=Hotplug=
 +
*http://docs.oracle.com/cd/E23824_01/html/821-1459/devconfig2-25.html
 +
*https://blogs.oracle.com/sa/entry/hotplugging_sata_drives
 +
 +
cfgadm -a
 +
cfgadm -c unconfigure sata0/1
 +
cfgadm -c configure sata0/1
 +
 +
=Mirror rpool=
 +
*http://docs.oracle.com/cd/E19253-01/819-5461/gkdep/index.html
 +
*http://www.nickebo.net/making-your-zfs-root-pool-a-mirror-post-installation/
 +
 +
*List disks
 +
format -e c0d1 (or whatever your new disk is called)
 +
*you probably do not need fdisk
 +
>fdisk
 +
>y
 +
*k?
 +
>label
 +
>0 (SMI)
 +
>y
 +
>quit
 +
prtvtoc /dev/rdsk/c5t0d0s2 | fmthard -s - /dev/rdsk/c5t1d0s2
 +
 +
*Attach drive
 +
zpool attach -f rpool c5t0d0s0 c5t1d0s0
 +
Make sure to wait until resilver is done before rebooting.
 +
 +
*Check
 +
zpool status
 +
 +
*Install grub bootloader on 2nd disk:
 +
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c5t1d0s0
  
 
=OS Support=
 
  
 
ZFS also has its own share maker.  That is the zfs command creates your NFS or CIFS shares...look it up.
 
 +
 +
=Create ZFS Raidz=
 +
 +
 +
==Basics==
 +
*First find the names of your disks:
 +
 +
format
 +
AVAILABLE DISK SELECTIONS:
 +
      0. c5t0d0 <ATA-ST920217AS-3.01 cyl 2429 alt 2 hd 255 sec 63>
 +
          /pci@0,0/pci8086,202d@1f,2/disk@0,0
 +
      1. c7t50014EE2B1448CFEd0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252>
 +
          /scsi_vhci/disk@g50014ee2b1448cfe
 +
      2. c7t50014EE25C1A7300d0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252>
 +
          /scsi_vhci/disk@g50014ee25c1a7300
 +
      3. c7t50014EE25CC0D3EEd0 <ATA-WDCWD20EARX-32P-AB51 cyl 60798 alt 2 hd 255 sec 252>
 +
          /scsi_vhci/disk@g50014ee25cc0d3ee
 +
      4. c7t50014EE20699B771d0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252>
 +
          /scsi_vhci/disk@g50014ee20699b771
 +
 +
*No redundancy (a plain stripe; "raidz" on its own is the same as raidz1, so it is left out here)
 +
zpool create tank c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0
 +
*1 disk
 +
zpool create tank raidz1 c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0
 +
*2 disk
 +
zpool create tank raidz2 c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0
 +
*etc
 +
 +
Everyone is recommending raidz2 nowadays.  I dunno, I guess if you do not back up, that is the way to go.  But do keep in mind, drives do fail.  Really, they do.  I think the big issue is another drive going bad during a rebuild.  But then what if all the disks go bad?  See?
 +
 +
==Advanced Format Drives and the Future==
 +
So new drives use 4k sectors now to save on allocation of ECC data on a physical disk. 
 +
 +
To make a long story short<ref>http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks</ref>, if your system truly recognizes 4k disks, and all your disks are 4k, zfs should do the rest of the work.
 +
 +
But let's say you want to future proof everything (4k sizes on 512 byte disks, no performance impact) or have a mixed array that you plan to grow and eventually replace with all 4k disks.  You are going to need to manually add the ashift parameter to the create line (you cannot change your ashift later).
 +
 +
This value is actually a bit shift value, so the ashift value for 512 bytes is 9 (2<sup>9</sup> = 512) while the ashift value for 4,096 bytes is 12 (2<sup>12</sup> = 4,096). To force the pool to use 4,096 byte sectors we must specify this at pool creation time<ref>http://zfsonlinux.org/faq.html</ref>:
 +
 +
zpool create -o ashift=12 tank raidz1 diskbla bladisk whatever
 +
 +
You can see what your drives are reporting with<ref>http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks</ref>:
 +
 +
echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'
 +
 +
The unit number (un) corresponds to the SCSI driver (sd) instance number. In the above example, "un 0" is also known as "sd0." Sizes are in bytes, written as hex: 0x200 = 512, 0x1000 = 4096.
 +
 +
To inspect the ashift value actually used by ZFS for a particular pool, you can use zdb (without parameters it dumps labels of imported pools), for example:
 +
 +
<pre>
 +
zdb | egrep 'ashift| name'
 +
    name='pond'
 +
                ashift=9
 +
                ashift=9
 +
                ashift=9
 +
                ashift=9
 +
                ashift=9
 +
                ashift=9
 +
                ashift=9
 +
                ashift=9
 +
                ashift=9
 +
    name='rpool'
 +
                ashift=9
 +
    name='temp'
 +
                ashift=9
 +
</pre>
 +
 +
Note that each pool can consist of one or several Top-level VDEVs, and each of those can have an individual composition of devices, i.e. not only mixing of mirrors and raidzN sets is possible, but also of devices with different hardware sector size and thus ashift as set at TLVDEV creation time. However, in order to avoid unbalanced IO and "unpleasant surprises" which might be difficult to explain and debug, it is discouraged to build pools from such mixtures.
 +
 +
===Advanced Format Drives and the Future with Openindiana and sd.conf===
 +
So openindiana or illumos-gate does not accept the ashift value.  Instead you have to edit /kernel/drv/sd.conf to force the driver.  I am telling the system to treat my 512 byte disks as 4k so in the future I am happy.
 +
 +
This all comes from:  http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks starting @ Overriding the Physical Block Size
 +
 +
Ways to get the drive info:
 +
 +
format -e
 +
Select disk
 +
inquiry
 +
 +
Vendor:  ATA   
 +
Product:  SAMSUNG HD204UI
 +
Revision: 0001
 +
Or
 +
iostat -Er
 +
 +
sd6      ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
 +
Vendor: ATA      ,Product: SAMSUNG HD204UI  ,Revision: 0001 ,Serial No: S2H7J1CB112293 
 +
Size: 2000.40GB <2000398934016 bytes>
 +
 +
Or
 +
echo "::walk sd_state | ::grep '.!=0' | ::print struct sd_lun un_sd | \
 +
  ::print struct scsi_device sd_inq | ::print struct scsi_inquiry \
 +
  inq_vid inq_pid" | mdb -k
 +
 +
inq_vid = [ "ATA    " ]
 +
inq_pid = [ "SAMSUNG HD204UI " ]
 +
 +
So now that we have some information we have to edit sd.conf:
 +
 +
nano /kernel/drv/sd.conf
 +
 +
sd-config-list =
 +
        "ATA SAMSUNG HD204UI", "physical-block-size:4096";
 +
 +
Another example:
 +
 +
sd-config-list =
 +
        "SEAGATE ST3300657SS", "physical-block-size:4096",
 +
        "DGC    RAID", "physical-block-size:4096",
 +
        "NETAPP  LUN", "physical-block-size:4096";
 +
 +
If this<ref>https://www.illumos.org/issues/2665</ref> is still right the format of sd-config-list:
 +
 +
Padding is actually not necessary. Extra spaces (not tabs!) are ignored.
 +
 +
Asterisks have special meaning if they appear at the beginning and end of the ID string only. sd searches the product ID (not the vendor ID) for the substring enclosed by asterisks.
 +
Examples:
 +
 +
    "*ST9320*" will match any vendor ID and product ID "ST9320423AS".
 +
    "**" will match any vendor ID and product ID.
 +
 +
The ATA in my disk string had a bunch of spaces after it, it also had spaces after the model number.  Looks like it does not matter.
 +
 +
 +
Then:
 +
update_drv -vf sd
 +
 +
Cannot unload module: sd
 +
Will be unloaded upon reboot.
 +
Forcing update of sd.conf
 +
sd.conf updated in the kernel.
 +
 +
<ref>http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/</ref>Don’t worry about the errors we only care that the sd.conf was re-read by the kernel. Unfortunately this new setting will not take effect as long as the current disk is attached so we must force it to unattach and reattach:
 +
 +
Match up your devices:
 +
 +
format output:
 +
<pre>
 +
      3. c2t50024E90049754DAd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
 +
          /scsi_vhci/disk@g50024e90049754da
 +
      4. c2t50024E90049754DBd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
 +
          /scsi_vhci/disk@g50024e90049754db
 +
      5. c2t50024E900497550Bd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
 +
          /scsi_vhci/disk@g50024e900497550b
 +
      6. c2t50024E9004975524d0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
 +
          /scsi_vhci/disk@g50024e9004975524
 +
</pre>
 +
 +
cfgadm output:
 +
 +
<pre>
 +
c0                            scsi-sas    connected    configured  unknown
 +
c0::w50024e90049754da,0        disk-path    connected    configured  unknown
 +
c1                            scsi-sas    connected    configured  unknown
 +
c1::w50024e900497550b,0        disk-path    connected    configured  unknown
 +
c3                            scsi-sas    connected    configured  unknown
 +
c3::w50024e9004975524,0        disk-path    connected    configured  unknown
 +
c4                            scsi-sas    connected    configured  unknown
 +
c4::w50014ee65ac710bc,0        disk-path    connected    configured  unknown
 +
c5                            scsi-sas    connected    configured  unknown
 +
c5::w50014ee65ac68d37,0        disk-path    connected    configured  unknown
 +
c6                            scsi-sas    connected    unconfigured unknown
 +
c7                            scsi-sas    connected    configured  unknown
 +
c7::w50024e90049754db,0        disk-path    connected    configured  unknown
 +
c8                            scsi-sas    connected    configured  unknown
 +
c8::w50014ee20c18549c,0        disk-path    connected    configured  unknown
 +
</pre>
 +
 +
Looks like I need to reconfigure:
 +
 +
c0::w50024e90049754da,0
 +
c1::w50024e900497550b,0
 +
c3::w50024e9004975524,0
 +
c7::w50024e90049754db,0
 +
 +
cfgadm -al
 +
 +
cfgadm -c unconfigure c0::w50024e90049754da,0
 +
cfgadm -c configure c0::w50024e90049754da,0
 +
 +
Do all the ones that changed.
 +
 +
 +
This DOES NOT work to Check:
 +
 +
echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'
 +
 +
NO NO NO
 +
 +
I had to add the disks to a pool and use zdb -l on the disk.  Same end step written here:  http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/
 +
 +
So the sd.conf modification does this:
 +
 +
22:34 < tsoome> it will set sd driver options, so when you do ioctl( DIOCGMEDIASIZE), you get this value
 +
22:34 < tsoome> or DIOCGMEDIASIZEEXT
 +
 +
=sd.conf disable idle park heads=
 +
 +
You can also stop the 'green drives' from parking heads:
 +
 +
nano /kernel/drv/sd.conf
 +
 +
sd-config-list = "SEAGATE ST3300657SS", "power-condition:false",
 +
                "SEAGATE ST2000NM0001", "power-condition:false";
 +
 +
==Autoexpand==
 +
It looks like you can enable and disable this one after creation<ref>https://docs.oracle.com/cd/E19253-01/819-5461/githb/index.html</ref>:
 +
 +
zpool set autoexpand=on tank
 +
 +
But on creation:
 +
 +
zpool create -o autoexpand=on tank c1t13d0
 +
 +
==Case Sensitive==
 +
You may want to look at the casesensitivity property.  Windows file systems are case-insensitive.
 +
 +
https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gftgr/index.html
 +
 +
==Other Cool Options==
 +
listsnaps=on
 +
      Controls whether information about snapshots associated with this
 +
      pool is output when "zfs list" is run without the -t option. The
 +
      default value is off.
 +
 +
==An Example Of A Final Command==
 +
 +
illumos-gate does not support ashift values btw. See:  http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks and look for Overriding the Physical Block Size
 +
 +
zpool create -o ashift=12 -o autoexpand=on -o listsnaps=on data raidz1 c2t50014EE20C18549Cd0 c2t50024E90049754DAd0 c2t50024E90049754DBd0 c2t50024E900497550Bd0 c2t50024E9004975524d0
  
 
=ZFS Snapshots=
 
  
 
==Replication==
 
 +
===Problem?===
 +
*http://hobby.keluargareski.net/tag/zfs/
 +
I was having a problem with receive through ssh.  If you receive an error such as zfs command not found, the solution is to write the full path of zfs on the remote side.
 +
zfs send DV/Sn03 | ssh reski@zfsserver /usr/sbin/zfs receive DV/Sn03
 +
 +
*I always run the send and receive commands in screen.
  
 +
===How===
 
Below text copied from:  http://www.markround.com/archives/38-ZFS-Replication.html
 
  
 
     [root@solaris]$ zfs set readonly=on slave/data
 
 +
*You may want to grant full permissions to username with this command (http://docs.oracle.com/cd/E19082-01/817-2271/gbchv/index.html)
 +
zfs allow username send,receive,clone,create,destroy,hold,mount,promote,rename,rollback,share,snapshot tank
  
 
So, let's look in the slave to see if our files are there :
 
 
 
And that's it. All that remains to turn this into a production system between two hosts is for a periodic cron job to be written that runs at the appropriate intervals (daily, or even every minute if need be) and snapshots the filesystem before transferring it. You'll also likely want to have another job that clears out old snapshots, or maybe archives them off somewhere.
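
The periodic job described above could start from something like the sketch below. It only prints the commands it would run. The YYMMDD snapshot naming follows the examples earlier on this page; the function names, the dataset eh/data and the host 10.0.0.6 are placeholders and not anything ZFS requires.

```shell
#!/bin/sh
# Dry-run sketch of a cron-driven snapshot-and-send job.
# snapshot_name and replication_cmds are hypothetical helper names.

# Build a dated snapshot name such as eh/data@130428 (YYMMDD).
snapshot_name() {
    echo "$1@$(date +%y%m%d)"
}

# Print the commands for one replication run. PREVIOUS must be the
# newest snapshot that already exists on both sides.
replication_cmds() {
    fs=$1; previous=$2; host=$3
    new=$(snapshot_name "$fs")
    echo "zfs snapshot $new"
    echo "zfs send -i $previous $new | ssh $host /usr/sbin/zfs recv $fs"
}

replication_cmds eh/data eh/data@130120 10.0.0.6
```

Once the printed commands look right, the echo wrappers can be dropped and the script called from cron. Cleaning up old snapshots is left out on purpose.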
  
 +
==ZFS Share==
 +
* Enable Sharing and set sharename
 +
zfs set sharesmb=name=myshare yourpool/shares/bob
 +
* Share Filesystem
 +
zfs set sharesmb=on fsname
 +
*Check if shared
 +
sharemgr show -vp
 +
 +
*Enable "pam_smb_passwd" to make regular OpenIndiana users have smb passwords. To do so, add the following line to the end of the file "/etc/pam.conf": 
 +
other password required pam_smb_passwd.so.1 nowarn
 
==Quick Reference==
 
 +
*Initial Send
 +
zfs send tank/data@blabla | ssh remoteserver /usr/sbin/zfs recv tank/data
 +
 +
zfs list -t snapshot
 +
zfs snapshot bla/bla@??????
 +
zfs send -v -i bla/bla@?????? bla/bla@?????? | ssh 10.0.0.6 zfs recv bla/bla/blackhole0
 +
( zfs send -v -i bla/bla@older bla/bla@newer | ssh 10.0.0.6 zfs recv bla/bla/blackhole0 )
 +
 +
=freebsd zfs creation=
 +
 +
*to list drives
 +
 +
atacontrol list
 +
or
 +
camcontrol devlist
 +
 +
*identify sector size
 +
camcontrol identify ada2
 +
etc (the rest of your drives)
 +
*If it is 4096 create nop devices, to make zfs use 4096 sized sectors
 +
gnop create -S 4096 /dev/ada2
 +
etc (the rest of your drives)
 +
*create raidz pool
 +
zpool create tank raidz ada2.nop ada3.nop
 +
*create raid0 pool
 +
zpool create tank ada2.nop ada3.nop
 +
 +
*zfs vs zpool
 +
*export zpool
 +
zpool export data
 +
gnop destroy /dev/ada2.nop /dev/ada3.nop
 +
zpool import data
 +
You can check the configuration of the pool by using the "zdb" command on the pool:
 +
zdb -C data | grep ashift
 +
The ashift should be "12" for 4K alignment. This works because ZFS writes the ashift value in its metadata
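
The whole gnop sequence above can be written out for any list of disks. This is only a dry-run sketch: gnop_pool_cmds is a made-up helper name, it prints the commands instead of running them, and the pool name data and the disks ada2 and ada3 are just the ones used in the notes above.

```shell
#!/bin/sh
# Dry-run sketch: print the gnop/zpool steps for a 4K-aligned raidz pool.
# gnop_pool_cmds POOL DISK... is a hypothetical helper, not a FreeBSD tool.
gnop_pool_cmds() {
    pool=$1; shift
    nops=""   # e.g. " ada2.nop ada3.nop"
    devs=""   # e.g. " /dev/ada2.nop /dev/ada3.nop"
    for d in "$@"; do
        echo "gnop create -S 4096 /dev/$d"
        nops="$nops $d.nop"
        devs="$devs /dev/$d.nop"
    done
    echo "zpool create $pool raidz$nops"
    # Export, drop the temporary nop devices, re-import on the real disks.
    echo "zpool export $pool"
    echo "gnop destroy$devs"
    echo "zpool import $pool"
    echo "zdb -C $pool | grep ashift"
}

gnop_pool_cmds data ada2 ada3
```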
 +
==URL Reference==
 +
*http://forums.freebsd.org/showthread.php?t=21644
 +
*https://wiki.freebsd.org/ZFSQuickStartGuide
 +
 +
*http://www.mailpile.is/
 +
*https://hakshop.myshopify.com/products/wifi-pineapple
 +
*http://www.i-programmer.info/news/105-artificial-intelligence/6197-anonymouth-hides-identity.html
 +
*https://lavabit.com/
 +
*http://ivoras.sharanet.org/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html (Section 3)
 +
 +
=zfs clone=
 +
*http://thegeekdiary.com/zfs-tutorials-creating-zfs-snapshot-and-clones/
 +
 +
Clones can only be created from snapshots. A snapshot can't be destroyed until you have destroyed the clones created from it.  Snapshots are read-only; a clone is what gives you a writable copy.
  
 
=ZFS Resources=
 
  
 
[http://mail.opensolaris.org/mailman/listinfo] - Open Solaris Mailing Lists
 
 +
 +
[http://docs.oracle.com/cd/E19082-01/817-2271/gbcve/index.html] - Understanding the zpool status Output
  
 
[[File:ZFS_Command_Quick_Reference.odt‎]] - Sun ZFS Command Quick Reference
 
  
 
[[File:819-5461.pdf]] - Solaris ZFS Administration Guide
 
 +
 +
[http://resilvered.blogspot.com/2011/07/how-to-shrink-zfs-root-pool.html] - Shrink rpool
 +
 +
[http://www.128bitstudios.com/2010/07/23/fun-with-zfs-send-and-receive/] - Fun with ZFS send and receive
 +
 +
[http://www.markround.com/archives/38-ZFS-Replication.html] - ZFS Send/Rec
 +
 +
[http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script] - ZFS Rollback Forensics

Latest revision as of 08:28, 29 August 2015

ZFS is a combined file system and logical volume manager created by Sun Microsystems. For more official information see the Wikipedia ZFS entry. This means that ZFS is not only a file system but also functions as a software RAID. While ZFS has many features and solves problems such as the RAID 5 write hole, it is also extremely simple to use. ZFS is easy, fast, flexible, and under development.

ZFS Usable OS's:

  • Solaris (Sun's main OS, no longer free but available as a trial version)
  • OpenSolaris (Sun's open-source OS)
  • FreeBSD (ZFS is being ported to this OS and while it is stable it lags behind the Solaris release for obvious reasons)
  • Nexenta (This OS claims to be the Solaris kernel with the Ubuntu (Linux) userland which seems nice but it has no window manager by default)
  • StormOS (This OS is the same as Nexenta but has GNOME installed by default)

Sending Pools Across Network Simple


Snapshot
zfs snapshot eh/data@######

Initial Send
zfs send eh/data@130120 | ssh 10.0.0.6 /usr/sbin/zfs recv eh/data

Initial Send nc
Server
zfs send eh/data@130120 | pv | nc -w 20 10.0.0.6 8023
Client
nc -w 120 -l 8023 | zfs recv eh/data

Sync Send
Server
zfs send -v -i 130120 eh/data@130428 | pv | nc -w 20 10.0.0.6 8023

Client
nc -w 120 -l 8023 | zfs recv eh/data
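
The nc sync above has two halves that have to be started in the right order (listener first). A small sketch that prints both lines for a given pair of snapshots; nc_sync_cmds is a made-up name, nothing is executed, and the timeouts and port are simply the ones used above.

```shell
#!/bin/sh
# Dry-run sketch: print both halves of an incremental send over nc.
# nc_sync_cmds is a hypothetical helper. Start the receiver line first,
# then run the sender line on the machine that holds the snapshots.
nc_sync_cmds() {
    fs=$1; old=$2; new=$3; host=$4; port=$5
    echo "receiver: nc -w 120 -l $port | zfs recv $fs"
    echo "sender:   zfs send -v -i $fs@$old $fs@$new | pv | nc -w 20 $host $port"
}

nc_sync_cmds eh/data 130120 130428 10.0.0.6 8023
```

The sketch spells the incremental source out as a full snapshot name (eh/data@130120), which avoids any ambiguity about which dataset the old snapshot belongs to.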

NFS Specific Share Options

zfs set sharenfs=rw,anon=uidofuseremulated share/name

ZFS Performance/Slow ZFS Pool

iostat -mX
zpool iostat eh 2
zpool iostat -v
svc_t - average response time  of  transactions,  in  milliseconds

Hotplug

cfgadm -a
cfgadm -c unconfigure sata0/1
cfgadm -c configure sata0/1

Mirror rpool

  • List disks
format -e c0d1 (or whatever your new disk is called)
  • you probably do not need fdisk
>fdisk
>y
  • k?
>label
>0 (SMI)
>y
>quit
prtvtoc /dev/rdsk/c5t0d0s2 | fmthard -s - /dev/rdsk/c5t1d0s2

  • Attach drive
zpool attach -f rpool c5t0d0s0 c5t1d0s0
Make sure to wait until resilver is done before rebooting.
  • Check
zpool status
  • Install grub bootloader on 2nd disk:
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c5t1d0s0

OS Support

  • I have been using ZFS and OpenIndiana for over a year now! Works great!!!
  • I have been using ZFS and FreeBSD for a while now! Works great!!!
  • I have not tested on Linux but here is a full port of it: http://zfsonlinux.org/

ZFS Background

During my research I found some of the information on ZFS confusing and have decided that it was for two reasons. The first was that it is still relatively new and people have many different questions about it. The second is that from its development it seems to have undergone a lot of changes. Even when I went on Freenode and was in the #ZFS IRC room and asking questions about it I was getting opinionated answers and no solid fact from some of the people using ZFS.

From what I have managed to gather, ZFS likes and is used most with whole disks. It can be used with slices (the term for partitions in BSD and Solaris) and even files. Files were the most interesting to me because they allow one to experiment with and understand ZFS without destroying anything, and they also open up interesting opportunities, such as completely ignoring performance and utilising the full sizes of disks. Like RAID, ZFS can't make full use of different-sized disks in one group; it takes the size of the smallest and applies that as the maximum size for all your drives. I think it is a limitation that ZFS should overcome but one step at a time I suppose.

As far as commands go try 'man zpool' etc and look at the links at the bottom of the page.

ZFS Possibilities + A Poor Man's Raid

Like I said though, ZFS can use files and slices. That is, I could divide drives into whatever size I want and use them how I please. With files and slices, though, one loses performance because write caching, which is enabled by default with whole disks, is not enabled with this method. You could have two 1 terabyte drives and one 500GB drive and have 1.5 terabytes of usable space. That is, you would use the raidz2 level of ZFS and divide all the drives into 500GB slices or files.

Raid 5 or raidz (as ZFS calls it) Equation from 'man zpool':

A raidz group with N disks of size X with P parity disks can hold approximately (N-P)*X bytes and can withstand P device(s) failing before data integrity is compromised. The minimum number of devices in a raidz group is one more than the number of parity disks. The recommended number is between 3 and 9 to help increase performance.

Ex:

If you have 5 500GB drives, or 2x 1TB drives and 1x 500GB drive divided into 500GB slices or files, you would be fine if any one drive failed.

(5-2)*500GB = 1.5 terabytes

This has the redundancy of 2x 500GB drives. So if one terabyte drive failed you would be fine but have no redundancy left.
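
The (N-P)*X estimate from the man page is easy to check in the shell. A minimal sketch; raidz_usable is a made-up helper name and the result is only the approximate figure the formula gives, in whatever unit X is given in.

```shell
#!/bin/sh
# raidz_usable N P X: approximate usable space of a raidz group with
# N devices of size X and P parity devices, i.e. (N-P)*X.
# Hypothetical helper, not part of ZFS.
raidz_usable() {
    disks=$1; parity=$2; size=$3
    # A raidz group needs at least one more device than parity devices.
    if [ "$disks" -le "$parity" ]; then
        echo "need more than $parity devices" >&2
        return 1
    fi
    echo $(( (disks - parity) * size ))
}

raidz_usable 5 2 500   # prints 1500 (GB), the 1.5 terabytes above
```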

You could do this, but currently ZFS would not perform optimally (3-5x supposedly) and it may be simpler just to buy more drives. Still, from reading this one can see what is possible with ZFS. One could even build a Poor Man's Raid with this and still be safe. The only consideration is the performance of this raid. Slices seem to be easy to use if they are not part of a root ZFS filesystem, and with files one has to worry about outside data corruption of the files and possible outside config issues.

The way ZFS was meant to be used is with full disks. It enables HD write caching by default and is extremely simple in most cases. It is as simple as finding out your device names and doing 'zpool create mirrorname mirror devicename1 devicename2' or 'zpool create mirrorname mirror filepathname1 filepathname2' or 'zpool create raidmirror raidz file/device/slicename file/device/slicename' on any system with ZFS installed. The 'device' that ZFS creates will be located at /mirrororraidname/ on the root filesystem.

ZFS Misc Notes

In OpenSolaris, format -e or just format will give you your device names. Also, if you use entire drives with ZFS and you are not root, it will not find the devices. Use su and then issue the command. You should be fine.

ZFS also has its own share maker. That is the zfs command creates your NFS or CIFS shares...look it up.

Create ZFS Raidz

Basics

  • First find the names of your disks:
format
AVAILABLE DISK SELECTIONS:
      0. c5t0d0 <ATA-ST920217AS-3.01 cyl 2429 alt 2 hd 255 sec 63>
         /pci@0,0/pci8086,202d@1f,2/disk@0,0
      1. c7t50014EE2B1448CFEd0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252>
         /scsi_vhci/disk@g50014ee2b1448cfe
      2. c7t50014EE25C1A7300d0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252>
         /scsi_vhci/disk@g50014ee25c1a7300
      3. c7t50014EE25CC0D3EEd0 <ATA-WDCWD20EARX-32P-AB51 cyl 60798 alt 2 hd 255 sec 252>
         /scsi_vhci/disk@g50014ee25cc0d3ee
      4. c7t50014EE20699B771d0 <ATA-WDCWD20EARX-00P-AB51 cyl 60798 alt 2 hd 255 sec 252>
         /scsi_vhci/disk@g50014ee20699b771
  • No redundancy (a plain stripe; "raidz" on its own is the same as raidz1, so it is left out here)
zpool create tank c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0
  • 1 disk
zpool create tank raidz1 c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0
  • 2 disk
zpool create tank raidz2 c7t50014EE2B1448CFEd0 c7t50014EE25C1A7300d0 c7t50014EE25CC0D3EEd0 c7t50014EE20699B771d0
  • etc

Everyone is recommending raidz2 nowadays. I dunno, I guess if you do not back up, that is the way to go. But do keep in mind, drives do fail. Really, they do. I think the big issue is another drive going bad during a rebuild. But then what if all the disks go bad? See?

Advanced Format Drives and the Future

So new drives use 4k sectors now to save on allocation of ECC data on a physical disk.

To make a long story short[1], if your system truly recognizes 4k disks, and all your disks are 4k, zfs should do the rest of the work.

But let's say you want to future proof everything (4k sizes on 512 byte disks, no performance impact) or have a mixed array that you plan to grow and eventually replace with all 4k disks. You are going to need to manually add the ashift parameter to the create line (you cannot change your ashift later).

This value is actually a bit shift value, so the ashift value for 512 bytes is 9 (2^9 = 512) while the ashift value for 4,096 bytes is 12 (2^12 = 4,096). To force the pool to use 4,096 byte sectors we must specify this at pool creation time[2]:

zpool create -o ashift=12 tank raidz1 diskbla bladisk whatever
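
Since ashift is just the base-2 logarithm of the sector size, it can be worked out for any size with a few lines of shell. A small sketch; ashift_for is a made-up helper and not a ZFS command.

```shell
#!/bin/sh
# ashift_for BYTES: print log2(BYTES) for a power-of-two sector size.
# Hypothetical helper, not part of ZFS.
ashift_for() {
    size=$1
    shift_bits=0
    # Reject zero, negatives and anything that is not a power of two.
    if [ "$size" -le 0 ] || [ $(( size & (size - 1) )) -ne 0 ]; then
        echo "not a power of two: $size" >&2
        return 1
    fi
    # Halve the size until it reaches 1, counting the steps.
    while [ "$size" -gt 1 ]; do
        size=$(( size >> 1 ))
        shift_bits=$(( shift_bits + 1 ))
    done
    echo "$shift_bits"
}

ashift_for 512    # prints 9
ashift_for 4096   # prints 12
```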

You can see what your drives are reporting with[3]:

echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'

The unit number (un) corresponds to the SCSI driver (sd) instance number. In the above example, "un 0" is also known as "sd0." Sizes are in bytes, written as hex: 0x200 = 512, 0x1000 = 4096.

To inspect the ashift value actually used by ZFS for a particular pool, you can use zdb (without parameters it dumps labels of imported pools), for example:

zdb | egrep 'ashift| name'
     name='pond'
                 ashift=9
                 ashift=9
                 ashift=9
                 ashift=9
                 ashift=9
                 ashift=9
                 ashift=9
                 ashift=9
                 ashift=9
     name='rpool'
                 ashift=9
     name='temp'
                 ashift=9

Note that each pool can consist of one or several Top-level VDEVs, and each of those can have an individual composition of devices, i.e. not only mixing of mirrors and raidzN sets is possible, but also of devices with different hardware sector size and thus ashift as set at TLVDEV creation time. However, in order to avoid unbalanced IO and "unpleasant surprises" which might be difficult to explain and debug, it is discouraged to build pools from such mixtures.
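
To spot such a mixture quickly, the zdb output above can be boiled down to one line per pool and ashift value, so a pool that shows up twice has mixed vdevs. A sketch only: ashift_by_pool is a made-up name and it assumes the input has already been filtered the way the egrep above does it.

```shell
#!/bin/sh
# ashift_by_pool: read "name='pool'" / "ashift=N" lines on stdin and print
# each distinct pool/ashift pair once. Hypothetical helper.
ashift_by_pool() {
    tr -d "'" | awk '
        /name=/   { sub(/.*name=/, ""); pool = $0; next }
        /ashift=/ { sub(/.*ashift=/, ""); seen[pool " ashift=" $0] = 1 }
        END       { for (k in seen) print k }
    ' | sort
}

# Sample input in the same shape as the zdb output above.
ashift_by_pool <<'EOF'
    name='pond'
                ashift=9
                ashift=9
    name='rpool'
                ashift=9
EOF
```

On a live system the input would come from zdb | egrep 'ashift| name' instead of the sample text.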

Advanced Format Drives and the Future with Openindiana and sd.conf

So openindiana or illumos-gate does not accept the ashift value. Instead you have to edit /kernel/drv/sd.conf to force the driver. I am telling the system to treat my 512 byte disks as 4k so in the future I am happy.

This all comes from: http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks starting @ Overriding the Physical Block Size

Ways to get the drive info:

format -e

Select disk

inquiry
Vendor:   ATA     
Product:  SAMSUNG HD204UI 
Revision: 0001

Or

iostat -Er
sd6       ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0 
Vendor: ATA      ,Product: SAMSUNG HD204UI  ,Revision: 0001 ,Serial No: S2H7J1CB112293  
Size: 2000.40GB <2000398934016 bytes>

Or

echo "::walk sd_state | ::grep '.!=0' | ::print struct sd_lun un_sd | \
 ::print struct scsi_device sd_inq | ::print struct scsi_inquiry \
 inq_vid inq_pid" | mdb -k
inq_vid = [ "ATA     " ]
inq_pid = [ "SAMSUNG HD204UI " ]

So now that we have some information we have to edit sd.conf:

nano /kernel/drv/sd.conf
sd-config-list =
       "ATA SAMSUNG HD204UI", "physical-block-size:4096";

Another example:

sd-config-list =
       "SEAGATE ST3300657SS", "physical-block-size:4096",
       "DGC     RAID", "physical-block-size:4096",
       "NETAPP  LUN", "physical-block-size:4096";

If this[4] is still right the format of sd-config-list:

Padding is actually not necessary. Extra spaces (not tabs!) are ignored.

Asterisks have special meaning if they appear at the beginning and end of the ID string only. sd searches the product ID (not the vendor ID) for the substring enclosed by asterisks. Examples:

   "*ST9320*" will match any vendor ID and product ID "ST9320423AS".
   "**" will match any vendor ID and product ID.

The ATA in my disk string had a bunch of spaces after it, it also had spaces after the model number. Looks like it does not matter.
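
Going by the notes above (padding does not matter), the sd.conf entry can be put together straight from the vendor and product strings that inquiry or iostat -Er report. A sketch that only prints a single-entry block; sd_config_list is a made-up helper name, and the result still has to be pasted into /kernel/drv/sd.conf by hand.

```shell
#!/bin/sh
# sd_config_list VENDOR PRODUCT BYTES: print a one-entry sd-config-list
# block in the same layout as the examples above. Hypothetical helper.
sd_config_list() {
    printf 'sd-config-list =\n        "%s %s", "physical-block-size:%s";\n' "$1" "$2" "$3"
}

sd_config_list ATA "SAMSUNG HD204UI" 4096
```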


Then:

update_drv -vf sd

Cannot unload module: sd
Will be unloaded upon reboot.
Forcing update of sd.conf
sd.conf updated in the kernel.

[5]Don’t worry about the errors we only care that the sd.conf was re-read by the kernel. Unfortunately this new setting will not take effect as long as the current disk is attached so we must force it to unattach and reattach:

Match up your devices:

format output:

       3. c2t50024E90049754DAd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g50024e90049754da
       4. c2t50024E90049754DBd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g50024e90049754db
       5. c2t50024E900497550Bd0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g50024e900497550b
       6. c2t50024E9004975524d0 <ATA-SAMSUNGHD204UI-0001 cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g50024e9004975524

cfgadm output:

c0                             scsi-sas     connected    configured   unknown
c0::w50024e90049754da,0        disk-path    connected    configured   unknown
c1                             scsi-sas     connected    configured   unknown
c1::w50024e900497550b,0        disk-path    connected    configured   unknown
c3                             scsi-sas     connected    configured   unknown
c3::w50024e9004975524,0        disk-path    connected    configured   unknown
c4                             scsi-sas     connected    configured   unknown
c4::w50014ee65ac710bc,0        disk-path    connected    configured   unknown
c5                             scsi-sas     connected    configured   unknown
c5::w50014ee65ac68d37,0        disk-path    connected    configured   unknown
c6                             scsi-sas     connected    unconfigured unknown
c7                             scsi-sas     connected    configured   unknown
c7::w50024e90049754db,0        disk-path    connected    configured   unknown
c8                             scsi-sas     connected    configured   unknown
c8::w50014ee20c18549c,0        disk-path    connected    configured   unknown

Looks like I need to reconfigure:

c0::w50024e90049754da,0
c1::w50024e900497550b,0
c3::w50024e9004975524,0
c7::w50024e90049754db,0
cfgadm -al
cfgadm -c unconfigure c0::w50024e90049754da,0
cfgadm -c configure c0::w50024e90049754da,0

Do all the ones that changed.
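
The unconfigure/configure pair above can be generated for every affected disk instead of being typed by hand. This is a minimal sketch, assuming the cfgadm -al output format shown above; reconfigure_cmds is a made-up helper name, and it only prints the commands so they can be checked before anything is run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solaris): reads `cfgadm -al` output on
# stdin and prints an unconfigure/configure pair for each disk-path entry
# whose WWN begins with the prefix given as $1.
reconfigure_cmds() {
    awk -v prefix="$1" '
        $2 == "disk-path" && index($1, "::w" prefix) > 0 {
            print "cfgadm -c unconfigure " $1
            print "cfgadm -c configure " $1
        }'
}

# Usage (prints only; review the list, then pipe it to sh as root):
#   cfgadm -al | reconfigure_cmds 50024e90
```

The prefix 50024e90 matches the Samsung disks in the listing above and skips the others.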


This DOES NOT work as a check:

echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'

NO NO NO

I had to add the disks to a pool and use zdb -l on the disk. Same end step written here: http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/
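
A rough way to pull the value out of the zdb -l output, assuming the label dump contains lines of the form "ashift: 12" (one per label). label_ashift is a made-up helper name and the device path in the usage line is only an example.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct ashift values found in
# `zdb -l` output read from stdin, one value per line.
label_ashift() {
    awk '$1 == "ashift:" { print $2 }' | sort -u
}

# Usage (example device path, adjust to your disk):
#   zdb -l /dev/dsk/c2t50024E90049754DAd0s0 | label_ashift
```

A single line of output reading 12 means all labels on the disk agree on 4K sectors.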

So the sd.conf modification does this:

22:34 < tsoome> it will set sd driver options, so when you do ioctl( DIOCGMEDIASIZE), you get this value
22:34 < tsoome> or DIOCGMEDIASIZEEXT

sd.conf disable idle park heads

You can also stop the 'green drives' from parking heads:

nano /kernel/drv/sd.conf
sd-config-list = "SEAGATE ST3300657SS", "power-condition:false",
                "SEAGATE ST2000NM0001", "power-condition:false";

Autoexpand

It looks like you can enable and disable this one after creation[6]:

zpool set autoexpand=on tank

But on creation:

zpool create -o autoexpand=on tank c1t13d0

Case Sensitive

You may want to look at the casesensitivity property. Windows file systems are case-insensitive.

https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gftgr/index.html

Other Cool Options

listsnaps=on
     Controls whether information about snapshots associated with this
     pool is output when "zfs list" is run without the -t option. The
     default value is off.

An Example Of A Final Command

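illumos-gate does not support setting ashift values this way, by the way, so there the -o ashift=12 in the command below has to be replaced by the sd.conf override described earlier. See: http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks and look for "Overriding the Physical Block Size".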

zpool create -o ashift=12 -o autoexpand=on -o listsnaps=on data raidz1 c2t50014EE20C18549Cd0 c2t50024E90049754DAd0 c2t50024E90049754DBd0 c2t50024E900497550Bd0 c2t50024E9004975524d0

ZFS Snapshots

You can take snapshots to back up information. Snapshots can be destroyed, held, and sent across the network to another ZFS system.

Rolling Back

To discard all changes made since a snapshot was taken and revert the filesystem back to its state at the time the snapshot was taken:

# zfs rollback <snapshot_to_roll_back_to>
# zfs rollback test_pool/fs1@monday


Note: if the filesystem you want to roll back is currently mounted, it will need to be unmounted and remounted. Use -f to force the unmount.

You can only roll back to the most recent snapshot. To roll back to an earlier snapshot, either destroy the snapshots in between or use the -r option, which destroys them for you.

# zfs rollback test_pool/fs1@monday
cannot rollback to 'test_pool/fs1@monday': more recent snapshots exist
use '-r' to force deletion of the following snapshots:
test_pool/fs1@tuesday
test_pool/fs1@wednesday
# zfs rollback -r test_pool/fs1@monday
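
Since -r destroys every snapshot newer than the target, it can help to list them first. A minimal sketch, assuming the snapshot names arrive oldest first; newer_than is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: given snapshot names on stdin, oldest first
# (e.g. from `zfs list -H -t snapshot -o name -s creation`), print the
# snapshots of the same dataset that are newer than the target $1,
# i.e. the ones `zfs rollback -r` would destroy.
newer_than() {
    awk -v target="$1" '
        BEGIN { split(target, t, "@"); prefix = t[1] "@" }
        index($0, prefix) != 1 { next }   # skip other datasets
        found { print }
        $0 == target { found = 1 }'
}

# Usage:
#   zfs list -H -t snapshot -o name -s creation | newer_than test_pool/fs1@monday
```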

Basics

Below text copied from: http://docs.oracle.com/cd/E19253-01/819-5461/gbcya/index.html

Creating and Destroying ZFS Snapshots

Snapshots are created by using the zfs snapshot command, which takes as its only argument the name of the snapshot to create. The snapshot name is specified as follows:

filesystem@snapname
volume@snapname

The snapshot name must satisfy the naming requirements in ZFS Component Naming Requirements.

In the following example, a snapshot of tank/home/ahrens that is named friday is created.

# zfs snapshot tank/home/ahrens@friday

You can create snapshots for all descendent file systems by using the -r option. For example:

# zfs snapshot -r tank/home@now
# zfs list -t snapshot
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/zfs2BE@zfs2BE  78.3M      -  4.53G  -
tank/home@now                 0      -    26K  -
tank/home/ahrens@now          0      -   259M  -
tank/home/anne@now            0      -   156M  -
tank/home/bob@now             0      -   156M  -
tank/home/cindys@now          0      -   104M  -

Snapshots have no modifiable properties. Nor can dataset properties be applied to a snapshot. For example:

# zfs set compression=on tank/home/ahrens@now
cannot set compression property for 'tank/home/ahrens@now': snapshot
properties cannot be modified

Snapshots are destroyed by using the zfs destroy command. For example:

# zfs destroy tank/home/ahrens@now

A dataset cannot be destroyed if snapshots of the dataset exist. For example:

# zfs destroy tank/home/ahrens
cannot destroy 'tank/home/ahrens': filesystem has children
use '-r' to destroy the following datasets:
tank/home/ahrens@tuesday
tank/home/ahrens@wednesday
tank/home/ahrens@thursday

In addition, if clones have been created from a snapshot, then they must be destroyed before the snapshot can be destroyed.

For more information about the destroy subcommand, see Destroying a ZFS File System.


Replication

Problem?

I was having a problem with receive through ssh. If you get an error such as "zfs: command not found", the solution is to write the full path of zfs on the remote side.

zfs send DV/Sn03 | ssh reski@zfsserver /usr/sbin/zfs receive DV/Sn03
  • I always run the send and receive commands in screen.

How

Below text copied from: http://www.markround.com/archives/38-ZFS-Replication.html

ZFS Replication Sysadmin

As I've been investigating ZFS for use on production systems, I've been making a great deal of notes, and jotting down little "cookbook recipes" for various tasks. One of the coolest systems I've created recently utilised the zfs send & receive commands, along with incremental snapshots to create a replicated ZFS environment across two different systems. True, all this is present in the zfs manual page, but sometimes a quick demonstration makes things easier to understand and follow.

While this isn't true filesystem replication (you'd have to look at something like StorageTek AVS for that) it does provide periodic snapshots and incremental updates; these can be run every minute if you're driving this from cron - or, at even more granular intervals if you write your own daemon. Nonetheless, this suffices for disaster recovery and redundancy if you don't need up-to-the second replication between systems.

I've typed up my notes in blog format so you can follow along with this example yourself, all you'll need is a Solaris system running ZFS. Read more for the full demonstration...

First, as with my last walkthrough, I'll create a couple of files to use for testing purposes. In a real-life scenario, these would most likely be pools of disks in a RAIDZ configuration, and the two pools would also be on physically separate systems. I'm only using 100Mb files for each, as that's all I need for this proof of concept.

   [root@solaris]$ mkfile 100m master
   [root@solaris]$ mkfile 100m slave
   [root@solaris]$ zpool create master $PWD/master
   [root@solaris]$ zpool create slave $PWD/slave
   [root@solaris]$ zpool list
   NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
   master                 95.5M   84.5K   95.4M     0%  ONLINE     -
   slave                  95.5M   52.5K   95.4M     0%  ONLINE     -
   [root@solaris]$ zfs list
   NAME                   USED  AVAIL  REFER  MOUNTPOINT
   master                  77K  63.4M  24.5K  /master
   slave                 52.5K  63.4M  1.50K  /slave

There we go. The naming should be pretty self-explanatory : The "master" is the primary storage pool, which will replicate and push data through to the backup "slave" pool.

Now, I'll create a ZFS filesystem and add something to it. I had a few source tarballs knocking around, so I just unpacked one (GNU grep) to give me a set of files to use as a test :

   [root@solaris]$ zfs create master/data
   [root@solaris]$ cd /master/data/
   [root@solaris]$ gtar xzf ~/grep-2.5.1.tar.gz
   [root@solaris]$ ls
   grep-2.5.1

We can also see from "zfs list" we've now taken up some space :

   [root@solaris]$ zfs list
   NAME                   USED  AVAIL  REFER  MOUNTPOINT
   master                3.24M  60.3M  25.5K  /master
   master/data           3.15M  60.3M  3.15M  /master/data
   slave                 75.5K  63.4M  24.5K  /slave

Now, we'll transfer all this over to the "slave", and start the replication going. We first need to take an initial snapshot of the filesystem, as that's what "zfs send" works on. It's also worth noting here that in order to transfer the data to the slave, I simply piped it to "zfs receive". If you're doing this between two physically separate systems, you'd most likely just pipe this through SSH between the systems and set up keys to avoid the need for passwords. Anyway, enough talk :

   [root@solaris]$ zfs snapshot master/data@1
   [root@solaris]$ zfs send master/data@1 | zfs receive slave/data

This now sent it through to the slave. It's also worth pointing out that I didn't have to recreate the exact same pool or zfs structure on the slave (which may be useful if you are replicating between dissimilar systems), but I chose to keep the filesystem layout the same for the sake of legibility in this example. I also simply used a numeric identifier for each snapshot; in a production system, timestamps may be more appropriate.
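
For the timestamp naming mentioned above, something like the following could work. It is only a sketch: snap_name is a made-up helper, and the stamp format (UTC, YYYYMMDD-HHMMSS) is an assumption chosen so that names sort in creation order.

```shell
#!/bin/sh
# Hypothetical helper: build a snapshot name from a dataset and a
# timestamp. The stamp defaults to the current UTC time; pass a second
# argument to supply one explicitly.
snap_name() {
    dataset="$1"
    stamp="${2:-$(date -u +%Y%m%d-%H%M%S)}"
    printf '%s@%s\n' "$dataset" "$stamp"
}

# Usage:
#   zfs snapshot "$(snap_name master/data)"
```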

Anyway, let's take a quick look at "zfs list", where we'll see the slave has now gained a snapshot utilising exactly the same amount of space as the master :

   [root@solaris]$ zfs list
   NAME                   USED  AVAIL  REFER  MOUNTPOINT
   master                3.25M  60.3M  25.5K  /master
   master/data           3.15M  60.3M  3.15M  /master/data
   master/data@1             0      -  3.15M  -
   slave                 3.25M  60.3M  24.5K  /slave
   slave/data            3.15M  60.3M  3.15M  /slave/data
   slave/data@1              0      -  3.15M  -

Now, here comes a big "gotcha". You now have to set the "readonly" attribute on the slave. I discovered that if this was not set, even just cd-ing into the slave's mountpoints would cause things to break in subsequent replication operations; presumably down to metadata (access times and the like) being altered.

   [root@solaris]$ zfs set readonly=on slave/data
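
Not part of the copied article: to run the replication as a non-root user, delegate the needed permissions to that user (here on the pool tank):

   zfs allow username send,receive,clone,create,destroy,hold,mount,promote,rename,rollback,share,snapshot tank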

So, let's look in the slave to see if our files are there :

   [root@solaris]$ ls /slave/data
   grep-2.5.1

Excellent stuff! However, the real coolness starts with the incremental transfers - instead of transferring the whole lot again, we can just send only the bits of data that actually changed - this will drastically reduce bandwidth and the time taken to replicate data, making a "cron" based system of periodic snapshots and transfers feasible. To demonstrate this, I'll unpack another tarball (this time, GNU bison) on the master so I have some more data to send :

   [root@solaris]$ cd /master/data
   [root@solaris]$ gtar xzf ~/bison-2.3.tar.gz

And we'll now make a second snapshot, and transfer differences between this one and the last :

   [root@solaris]$ zfs snapshot master/data@2
   [root@solaris]$ zfs send -i master/data@1 master/data@2 | zfs receive slave/data

Checking to see what's happened, we see the slave has gained another snapshot:

   [root@solaris]$ zfs list
   NAME                   USED  AVAIL  REFER  MOUNTPOINT
   master                10.2M  53.3M  25.5K  /master
   master/data           10.1M  53.3M  10.1M  /master/data
   master/data@1         32.5K      -  3.15M  -
   master/data@2             0      -  10.1M  -
   slave                 10.2M  53.3M  25.5K  /slave
   slave/data            10.1M  53.3M  10.1M  /slave/data
   slave/data@1          32.5K      -  3.15M  -
   slave/data@2              0      -  10.1M  -

And our new data is now there as well :

   [root@solaris]$ ls /slave/data/
   bison-2.3   grep-2.5.1

And that's it. All that remains to turn this into a production system between two hosts is for a periodic cron job to be written that runs at the appropriate intervals (daily, or even every minute if need be) and snapshots the filesystem before transferring it. You'll also likely want to have another job that clears out old snapshots, or maybe archives them off somewhere.
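
The cron job and the cleanup job described above could start from something like this. It is a dry-run sketch under stated assumptions, not the article's script: the function names, the host name and the argument order are all made up, and the functions only print commands so they can be reviewed before being wired into cron.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for one replication cycle.
# All names here (replicate_cmds, prune_cmds, the arguments) are hypothetical.

# replicate_cmds DATASET PREV_SNAP NEW_SNAP HOST TARGET
# Prints the snapshot command and the incremental send, using the full
# path to zfs on the remote side.
replicate_cmds() {
    dataset="$1"; prev="$2"; new="$3"; host="$4"; target="$5"
    echo "zfs snapshot ${dataset}@${new}"
    echo "zfs send -i ${dataset}@${prev} ${dataset}@${new} | ssh ${host} /usr/sbin/zfs receive ${target}"
}

# prune_cmds KEEP
# Reads snapshot names (oldest first) on stdin and prints a destroy
# command for all but the newest KEEP of them.
prune_cmds() {
    awk -v keep="$1" '
        { names[NR] = $0 }
        END {
            for (i = 1; i <= NR - keep; i++)
                print "zfs destroy " names[i]
        }'
}

# Usage:
#   replicate_cmds master/data 1 2 backuphost slave/data
#   zfs list -H -t snapshot -o name -s creation | grep '^master/data@' | prune_cmds 7
```

Keep at least the newest snapshot that exists on both sides, because the next incremental send needs it as its starting point.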

ZFS Share

  • Enable Sharing and set sharename

zfs set sharesmb=name=myshare yourpool/shares/bob

  • Share Filesystem
zfs set sharesmb=on fsname
  • Check if shared
sharemgr show -vp
  • Enable "pam_smb_passwd" to make regular OpenIndiana users have smb passwords. To do so, add the following line to the end of the file "/etc/pam.conf":
other password required pam_smb_passwd.so.1 nowarn

Quick Reference

  • Initial Send
zfs send tank/data@blabla | ssh remoteserver /usr/sbin/zfs recv tank/data
  • List snapshots
zfs list -t snapshot
  • Take a new snapshot
zfs snapshot bla/bla@??????
  • Incremental send (older snapshot first, then the newer one)
zfs send -v -i bla/bla@older bla/bla@newer | ssh 10.0.0.6 /usr/sbin/zfs recv bla/bla/blackhole0

freebsd zfs creation

  • to list drives
atacontrol list

or

camcontrol devlist
  • identify sector size
camcontrol identify ada2
etc (the rest of your drives)
  • If it is 4096, create nop devices to make ZFS use 4096-byte sectors
gnop create -S 4096 /dev/ada2

etc (the rest of your drives)

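  • create raidz pool
zpool create data raidz ada2.nop ada3.nop
  • create raid0 pool
zpool create data ada2.nop ada3.nop
  • zfs vs zpool: pools are created with zpool; zfs manages the datasets inside a pool
  • export the pool, remove the nop devices, and re-import it
zpool export data
gnop destroy /dev/ada2.nop /dev/ada3.nop
zpool import data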
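
The whole gnop sequence above can be printed for any list of drives. This is a dry-run sketch; gnop_pool_cmds is a made-up name, and the arguments are the pool name, the vdev type ("raidz", or "" for a plain stripe) and then the drive names.

```shell
#!/bin/sh
# Dry-run sketch (hypothetical helper): print the gnop-based creation
# sequence for a pool without running anything.
gnop_pool_cmds() {
    pool="$1"; vdev="$2"; shift 2
    nops=""
    for d in "$@"; do
        echo "gnop create -S 4096 /dev/${d}"
        nops="${nops} ${d}.nop"
    done
    # ${vdev:+ ...} adds the vdev type only when one was given
    echo "zpool create ${pool}${vdev:+ ${vdev}}${nops}"
    echo "zpool export ${pool}"
    for d in "$@"; do
        echo "gnop destroy /dev/${d}.nop"
    done
    echo "zpool import ${pool}"
}

# Usage:
#   gnop_pool_cmds data raidz ada2 ada3
```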

You can check the configuration of the pool by using the "zdb" command on the pool:

zdb -C data | grep ashift

The ashift should be "12" for 4K alignment (2^12 = 4096). This works because ZFS writes the ashift value into its metadata, so it is kept after the nop devices are gone.
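
ashift is the base-2 logarithm of the sector size, so the value reported by zdb can be converted back to bytes. A tiny sketch; ashift_to_bytes is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert an ashift value to a sector size in bytes.
ashift_to_bytes() {
    echo $((1 << $1))
}

# Usage:
#   ashift_to_bytes 12
#   ashift_to_bytes 9
```

An ashift of 12 corresponds to 4096-byte sectors and an ashift of 9 to 512-byte sectors.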

URL Reference

zfs clone

Clones can only be created from snapshots. A snapshot can't be destroyed until the clones created from it have been destroyed. Snapshots are read-only; a clone is a writable copy of one.

ZFS Resources

[1] - Great for beginners. Lets one understand how to use ZFS without having any spare hard drives.

[2] - Turning a ZFS mirror into a raidz array.

[3] - Official ZFS Admin Guide Link

[4] - ZFS Command Quick Reference

[5] - ZFS Best Practices Guide

[6] - Open Solaris Mailing Lists

[7] - Understanding the zpool status Output

File:ZFS Command Quick Reference.odt - Sun ZFS Command Quick Reference

File:819-5461.pdf - Solaris ZFS Administration Guide

[8] - Shrink rpool

[9] - Fun with ZFS send and receive

[10] - ZFS Send/Rec

[11] - ZFS Rollback Forensics
  1. http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks
  2. http://zfsonlinux.org/faq.html
  3. http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks
  4. https://www.illumos.org/issues/2665
  5. http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/
  6. https://docs.oracle.com/cd/E19253-01/819-5461/githb/index.html