ZFS Slow Performance Fix
Update: The following information could be beneficial to some; however, my issues were actually caused by Caviar Black drives, which ship with TLER disabled. You need to pay Western Digital a premium for their “RAID” drives with TLER enabled. So for anyone reading this: avoid consumer Western Digital drives if you plan on using them for RAID.
zfs_vdev_max_pending
I can’t believe how long I have been tolerating horrible concurrent I/O performance on OpenSolaris running ZFS. Whenever I/O-intensive writes are happening, any other I/O on the system slows to a crawl. Running “ls” on an uncached directory is just painful.
victori@opensolaris:/opt# iostat -xnz 1
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   87.0    0.0 2878.1  0.0  0.0    0.0    0.4   0 100 c4t0d0
    0.0   83.0    0.0 2878.1  0.0  0.1    0.2    0.7   1  50 c4t1d0
    1.0    0.0   28.0    0.0  0.0  0.0    0.0    5.4   0   1 c4t2d0
Notice that c4t0d0 is blocking at 100% (the %b column, the percentage of time the disk is busy). Once a disk is pinned at 100%, good luck getting it to do anything else, such as servicing reads.
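To spot a saturated disk without eyeballing every interval, the iostat output can be parsed and filtered on %b. This is a hypothetical helper, not part of any Solaris tool; it assumes the iostat -xn column layout shown above and an arbitrary 90% threshold.

```python
from typing import Dict, List

# Column order of Solaris "iostat -xn" output, as in the sample above.
COLUMNS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
           "wsvc_t", "asvc_t", "%w", "%b", "device"]

SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   87.0    0.0 2878.1  0.0  0.0    0.0    0.4   0 100 c4t0d0
    0.0   83.0    0.0 2878.1  0.0  0.1    0.2    0.7   1  50 c4t1d0
    1.0    0.0   28.0    0.0  0.0  0.0    0.0    5.4   0   1 c4t2d0
"""


def parse_iostat_xn(text: str) -> Dict[str, Dict[str, float]]:
    """Map device name -> {column: value} for every data row in text.

    If text holds several intervals, later rows overwrite earlier ones.
    """
    stats: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # banner line, prompt line, blank line, etc.
        try:
            values = [float(f) for f in fields[:-1]]
        except ValueError:
            continue  # the column-header line has 11 fields but no numbers
        stats[fields[-1]] = dict(zip(COLUMNS[:-1], values))
    return stats


def saturated(stats: Dict[str, Dict[str, float]], threshold: float = 90.0) -> List[str]:
    """Devices whose %b (percent of time busy) is at or above threshold."""
    return [dev for dev, cols in stats.items() if cols["%b"] >= threshold]


if __name__ == "__main__":
    print(saturated(parse_iostat_xn(SAMPLE)))  # ['c4t0d0']
```

Feeding it a single interval at a time (rather than a whole capture) keeps a transient spike from being hidden by later rows.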
SATA disks use Native Command Queuing (NCQ), while SAS disks use Tagged Command Queuing (TCQ), and this is an important distinction. OpenSolaris/Solaris seems to be optimized for the latter, with a command queue 32 commands deep by default. This completely saturates SATA disks with I/O commands, which in turn makes the system unusable for short periods of time.
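Why a deep queue hurts the “ls” case can be seen with back-of-the-envelope arithmetic. This is a toy model under stated assumptions, not how ZFS or the drive firmware actually schedules I/O: the disk serves outstanding commands first-in, first-out, every command takes the same hypothetical 10 ms, and a new read has to wait behind everything already queued at the disk.

```python
# Toy FIFO model, an assumption for illustration only: the drive serves
# outstanding commands in arrival order and each command takes the same
# fixed time. Real drives with NCQ/TCQ can reorder queued commands.


def read_wait_ms(outstanding: int, service_ms: float) -> float:
    """Time until a new read completes if `outstanding` commands are ahead of it."""
    if outstanding < 0 or service_ms <= 0:
        raise ValueError("need outstanding >= 0 and service_ms > 0")
    # Wait for every command already queued at the disk, then for the read itself.
    return (outstanding + 1) * service_ms


if __name__ == "__main__":
    SERVICE_MS = 10.0  # hypothetical per-command service time
    for depth in (1, 10, 32):
        wait = read_wait_ms(depth, SERVICE_MS)
        print(f"queue depth {depth:2d}: read completes after {wait:.0f} ms")
```

Under these assumptions a read stuck behind 32 outstanding writes waits 330 ms instead of 20 ms with a depth of 1, and a directory listing that needs several uncached metadata reads pays that cost repeatedly.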
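Dynamically set the ZFS command queue depth to 1 to suit NCQ:
echo zfs_vdev_max_pending/W0t1 | mdb -kw
And add the following to /etc/system so the setting survives a reboot:
set zfs:zfs_vdev_max_pending = 1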
Enjoy your OpenSolaris server on cheap SATA disks!


Hey, thanks for the info.
I had the same problem as you: while writing to the zpool, reads would become impossible.
I spent a whole week on it: I went from zpool v6 running on FreeBSD 7.2 to OpenSolaris r127, upgraded the pool to v13, added a dedicated external log device, and changed vdev_max_pending to 1, all to no avail… although vdev_max_pending=1 did help a lot.
In the end, however, I discovered the problem was not vdev_max_pending (which on osol 127 was already set to 10 by default for me, not 32; perhaps the kernel now defaults to 10 for SATA disks?).
The real culprit was a bad hard disk. Using iostat -xn 1, I discovered that ZFS freezes when it gets stuck writing to only one of the drives in the pool, which shows up as 100% blocking in iostat. That drive was a lot slower than the others, even though it never reported any errors, and scrubbing the pool actually went through fine (although it was slower too).
I noticed that in your example, too, just one drive is blocking at 100%, so I thought I’d let you know in case this is the problem for you as well.
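The “one drive is much busier than its peers” check can be written down as a small sketch. The function name and the 1.25 factor are hypothetical choices, and it assumes all members of the vdev receive roughly the same workload, so a member that is far busier than the group median is more likely slow than overworked.

```python
from statistics import median
from typing import Dict, List


def slow_members(busy_pct: Dict[str, float], factor: float = 1.25) -> List[str]:
    """Return vdev members whose %b is more than `factor` x the group median.

    Assumes all members receive roughly the same workload (true for the two
    sides of a mirror, approximately true for raidz members), so a member that
    is much busier than its peers is likely slow rather than just doing more work.
    """
    if len(busy_pct) < 2:
        raise ValueError("need at least two members to compare")
    typical = median(busy_pct.values())
    # Strict ">" so an idle group (all zeros) flags nothing.
    return [dev for dev, pct in busy_pct.items() if pct > factor * typical]


if __name__ == "__main__":
    # %b values from the iostat output in the post. Both disks wrote 2878.1 kB/s,
    # which suggests (but does not prove) that they are the two halves of a mirror.
    print(slow_members({"c4t0d0": 100.0, "c4t1d0": 50.0}))  # ['c4t0d0']
```

Comparing against the median rather than a fixed threshold means the check still works when the whole pool is legitimately busy.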
Actually, for SATA disks the ZFS folks required that only a single I/O be sent down to the driver at a time, as they saw a drop-off in concurrent sequential I/O performance. The problem is likely an assumption in ZFS that the disk completes I/Os at least roughly in the order ZFS submitted them. This is not true with either SATA NCQ or SCSI/SAS simple tagged queueing. Nikolay was correct in pointing out that something else is going on. Look at the size of the active queue (“actv”): it is essentially 0, not 32.
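Once more than one command is outstanding, the drive rather than ZFS decides the completion order. A toy simulation makes that concrete. It assumes a shortest-seek-first policy inside the drive, which is only a stand-in: real NCQ/TCQ firmware scheduling is proprietary, and the block addresses below are made up.

```python
from typing import List


def completion_order(lbas: List[int], depth: int, head: int = 0) -> List[int]:
    """Indices of requests in the order a toy queued disk completes them.

    Requests are submitted in list order. Up to `depth` of them sit in the
    drive's queue at once; the drive always services the queued request
    nearest the current head position (shortest-seek-first).
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    pending = list(range(len(lbas)))  # not yet handed to the drive
    queue: List[int] = []             # handed to the drive, not yet completed
    done: List[int] = []
    while pending or queue:
        # Refill the drive's queue in submission order.
        while pending and len(queue) < depth:
            queue.append(pending.pop(0))
        # Service the queued request with the shortest seek from the head.
        nxt = min(queue, key=lambda i: abs(lbas[i] - head))
        queue.remove(nxt)
        head = lbas[nxt]
        done.append(nxt)
    return done


if __name__ == "__main__":
    lbas = [900, 100, 500, 50]  # hypothetical block addresses, in submission order
    for depth in (1, 4):
        print(depth, completion_order(lbas, depth))
    # depth 1 -> [0, 1, 2, 3]; depth 4 -> [3, 1, 2, 0]
```

With a depth of 1 the drive never has a choice, so completion order always equals submission order; any deeper queue gives up that guarantee.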
I completely forgot about this post. I actually figured out the root cause of our performance problem: a high TLER value on the Caviar Black hard disks, which let a drive spend a long time on internal error recovery instead of answering I/O. See the link below for more information on TLER.
http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery
The point of the story is: don’t go cheap when setting up a RAID system. Buy only RAID edition drives; they are tuned for RAID setups. Live and learn, I guess.