<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: ZFS Slow Performance Fix</title>
	<atom:link href="http://letsgetdugg.com/2009/10/21/zfs-slow-performance-fix/feed/" rel="self" type="application/rss+xml" />
	<link>http://letsgetdugg.com/2009/10/21/zfs-slow-performance-fix/</link>
	<description>Random tech jargon</description>
	<lastBuildDate>Mon, 06 Sep 2010 09:29:35 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: victori</title>
		<link>http://letsgetdugg.com/2009/10/21/zfs-slow-performance-fix/comment-page-1/#comment-208</link>
		<dc:creator>victori</dc:creator>
		<pubDate>Mon, 08 Feb 2010 22:05:20 +0000</pubDate>
		<guid isPermaLink="false">http://letsgetdugg.com/?p=377#comment-208</guid>
		<description>I completely forgot about this post. I actually figured out the root cause of our performance problem. The issue was a high TLER value defined on the Caviar Black hard disks. See the link below for more information on TLER.

http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

The moral of the story: don&#039;t go cheap when setting up a RAID system. Buy only RAID edition drives; they are tuned for RAID setups. Live and learn, I guess.</description>
		<content:encoded><![CDATA[<p>I completely forgot about this post. I actually figured out the root cause of our performance problem. The issue was a high TLER value defined on the Caviar Black hard disks. See the link below for more information on TLER.</p>
<p><a href="http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery" rel="nofollow">http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery</a></p>
<p>The moral of the story: don&#8217;t go cheap when setting up a RAID system. Buy only RAID edition drives; they are tuned for RAID setups. Live and learn, I guess.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Horn</title>
		<link>http://letsgetdugg.com/2009/10/21/zfs-slow-performance-fix/comment-page-1/#comment-206</link>
		<dc:creator>Brian Horn</dc:creator>
		<pubDate>Sun, 07 Feb 2010 17:57:31 +0000</pubDate>
		<guid isPermaLink="false">http://letsgetdugg.com/?p=377#comment-206</guid>
		<description>Actually, for SATA disks the ZFS folks required that
only a single I/O be sent down to the driver at a time,
as they saw a fall-off in concurrent sequential I/O
performance.  The problem is likely an assumption
in ZFS that the disk completes the I/Os
at least roughly in the order that ZFS submitted
them.  This is not true with either SATA NCQ or SCSI/SAS
simple tagged queueing.  Nikolay was correct in pointing
out that something else is going on.  Look at the size
of the active queue (&quot;actv&quot;).  It is 0, not 32.</description>
		<content:encoded><![CDATA[<p>Actually, for SATA disks the ZFS folks required that<br />
only a single I/O be sent down to the driver at a time,<br />
as they saw a fall-off in concurrent sequential I/O<br />
performance.  The problem is likely an assumption<br />
in ZFS that the disk completes the I/Os<br />
at least roughly in the order that ZFS submitted<br />
them.  This is not true with either SATA NCQ or SCSI/SAS<br />
simple tagged queueing.  Nikolay was correct in pointing<br />
out that something else is going on.  Look at the size<br />
of the active queue (&#8220;actv&#8221;).  It is 0, not 32.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nikolay Botev</title>
		<link>http://letsgetdugg.com/2009/10/21/zfs-slow-performance-fix/comment-page-1/#comment-152</link>
		<dc:creator>Nikolay Botev</dc:creator>
		<pubDate>Mon, 30 Nov 2009 04:45:19 +0000</pubDate>
		<guid isPermaLink="false">http://letsgetdugg.com/?p=377#comment-152</guid>
		<description>Hey thanks for the info.

I had the same problem as you: while writing to the zpool, reads would become impossible.

I spent a whole week, going from zpool v6 running on FreeBSD 7.2, to OpenSolaris r127, upgraded pool to v13, added dedicated external log device, changed vdev_max_pending to 1, all to no avail... although, vdev_max_pending=1 did help a lot.

In the end, however, I discovered the problem was not in vdev_max_pending (which on osol 127 was set to 10 by default for me, not 32, so they have probably fixed that now in the kernel to use 10 by default for SATA disks?).

The real culprit was a bad hard disk - using iostat -xn 1, I discovered that ZFS freezes when it gets stuck writing to only one of the drives in the pool, which shows as 100% blocking in iostat =&gt; one of the drives was a lot slower, even though it never reported any errors, and scrubbing the pool actually went through fine (although slower too).

I noticed that in your example also it is just one drive that is blocking at 100%, so I thought I&#039;d let you know, just in case this might be the problem in your case too.</description>
		<content:encoded><![CDATA[<p>Hey thanks for the info.</p>
<p>I had the same problem as you: while writing to the zpool, reads would become impossible.</p>
<p>I spent a whole week, going from zpool v6 running on FreeBSD 7.2, to OpenSolaris r127, upgraded pool to v13, added dedicated external log device, changed vdev_max_pending to 1, all to no avail&#8230; although, vdev_max_pending=1 did help a lot.</p>
<p>In the end, however, I discovered the problem was not in vdev_max_pending (which on osol 127 was set to 10 by default for me, not 32, so they have probably fixed that now in the kernel to use 10 by default for SATA disks?).</p>
<p>The real culprit was a bad hard disk &#8211; using iostat -xn 1, I discovered that ZFS freezes when it gets stuck writing to only one of the drives in the pool, which shows as 100% blocking in iostat =&gt; one of the drives was a lot slower, even though it never reported any errors, and scrubbing the pool actually went through fine (although slower too).</p>
<p>I noticed that in your example also it is just one drive that is blocking at 100%, so I thought I&#8217;d let you know, just in case this might be the problem in your case too.</p>
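<p>A minimal sketch of the check described above, for anyone who wants to automate it. It assumes the Solaris-style column layout of iostat -xn (with %b as the busy percentage and the device name in the last column). The function names, thresholds, device names and sample numbers are all made up for illustration and are not taken from a real pool.</p>

```python
def parse_iostat_xn(text):
    """Parse one sample of Solaris-style `iostat -xn` output into {device: stats}."""
    columns = None
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[-1] == "device":
            # Header row: every field except the trailing "device" is a column name.
            columns = fields[:-1]
            continue
        if columns is None or len(fields) != len(columns) + 1:
            continue  # banner line or anything that is not a data row
        try:
            values = [float(v) for v in fields[:-1]]
        except ValueError:
            continue
        stats[fields[-1]] = dict(zip(columns, values))
    return stats


def lone_busy_devices(stats, busy=90.0, idle=50.0):
    """Return devices at or above `busy` %b while all other devices stay below `idle`.

    The thresholds are arbitrary illustrative defaults, not recommended values.
    """
    hot = sorted(d for d, s in stats.items() if s["%b"] >= busy)
    others = [s["%b"] for d, s in stats.items() if d not in hot]
    if hot and others and all(b < idle for b in others):
        return hot
    return []


# Hypothetical sample, hand-written to mimic the layout (not real measurements).
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   40.0    0.0  5120.0  0.0  0.2    0.0    5.0   0  12 c7t0d0
    0.0   41.0    0.0  5248.0  0.0  0.2    0.0    4.9   0  11 c7t1d0
    0.0    9.0    0.0  1152.0  0.0  1.0    0.0  111.0   0 100 c7t2d0
"""

if __name__ == "__main__":
    print(lone_busy_devices(parse_iostat_xn(SAMPLE)))
```

<p>A single drive pinned at 100% busy while its siblings sit mostly idle is only a hint that the drive is slow; it is worth confirming over several samples before swapping hardware.</p>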
]]></content:encoded>
	</item>
</channel>
</rss>
