<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SSD Performance Blog</title>
	<atom:link href="http://www.ssdperformanceblog.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ssdperformanceblog.com</link>
	<description>Percona&#039;s blog about SSD performance and MySQL</description>
	<lastBuildDate>Tue, 14 May 2013 17:36:27 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5</generator>
		<item>
		<title>Virident vCache vs. FlashCache: Part 2</title>
		<link>http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-2/</link>
		<comments>http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-2/#comments</comments>
		<pubDate>Tue, 14 May 2013 17:36:27 +0000</pubDate>
		<dc:creator>Ernie Souhrada</dc:creator>
				<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[MLC]]></category>
		<category><![CDATA[PCIe]]></category>
		<category><![CDATA[ssd]]></category>
		<category><![CDATA[Ernie Souhrada]]></category>
		<category><![CDATA[flashcache]]></category>
		<category><![CDATA[sysbench]]></category>
		<category><![CDATA[vCache]]></category>
		<category><![CDATA[virident]]></category>

		<guid isPermaLink="false">http://www.ssdperformanceblog.com/?p=346</guid>
		<description><![CDATA[This is the second part in a two-part series comparing Virident&#8217;s vCache to FlashCache. The first part was focused on usability and feature comparison; in this post, we&#8217;ll look at some sysbench test results. Disclosure: The research and testing conducted for this post were sponsored by Virident. First, some background information. All tests were conducted [...]]]></description>
				<content:encoded><![CDATA[<p>This is the second part in a two-part series comparing Virident&#8217;s vCache to FlashCache.  The first part was focused on usability and feature comparison; in this post, we&#8217;ll look at some sysbench test results.</p>
<p>Disclosure: The research and testing conducted for this post were sponsored by Virident.</p>
<p>First, some background information.  All tests were conducted on <A TARGET="_blank" HREF="http://www.percona.com/docs/wiki/benchmark%3Ahardware%3Acisco_ucs_c250">Percona&#8217;s Cisco UCS C250</A> test machine, and both the vCache and FlashCache tests used the same 2.2TB Virident FlashMAX II as the cache storage device.  EXT4 is the filesystem, and CentOS 6.4 the operating system, although the pre-release modules I received from Virident required the use of the CentOS 6.2 kernel, 2.6.32-220, so that was the kernel in use for all of the benchmarks on both systems.  The benchmark tool used was sysbench 0.5 and the version of MySQL used was Percona Server 5.5.30-rel30.1-465.  Each test was allowed to run for 7200 seconds, and the first 3600 seconds were discarded as warmup time; the remaining 3600 seconds were averaged into 10-second intervals.  All tests were conducted with approximately 78GiB of data (32 tables, 10M rows each) and a 4GiB buffer pool.  The cache devices were flushed to disk immediately prior to and immediately following each test run.</p>
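<p>The warmup-and-averaging procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the author&#8217;s actual post-processing script. It assumes the per-second throughput samples have already been parsed out of sysbench&#8217;s --report-interval=1 output into a list, and the function name is hypothetical.</p>

```python
from statistics import mean


def steady_state_intervals(per_second_tps, warmup_s=3600, interval_s=10):
    """Drop the warmup samples, then average the rest into fixed-size buckets.

    per_second_tps is assumed to hold one throughput sample per second,
    e.g. values already parsed from sysbench --report-interval=1 output.
    """
    steady = per_second_tps[warmup_s:]
    # Average only complete buckets; a trailing partial bucket is dropped.
    usable = len(steady) // interval_s * interval_s
    return [mean(steady[i:i + interval_s]) for i in range(0, usable, interval_s)]
```

<p>For a 7200-second run, this yields 360 ten-second averages from the final 3600 seconds.</p>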
<p>With that out of the way, let&#8217;s look at some numbers.</p>
<h3>vCache vs. vCache &#8211; MySQL parameter testing</h3>
<p>The first test was designed to look solely at vCache performance under some different sets of MySQL configuration parameters.  For example, given that the front-end device is a very fast PCIe SSD, would it make more sense to configure MySQL as if it were using SSD storage or to just use an optimized HDD storage configuration?  After creating a vCache device with the default configuration, I started with a baseline HDD configuration for MySQL (configuration A, listed at the bottom of this post) and then tried three additional sets of experiments.  First, the baseline configuration plus:<br />
<code><br />
  innodb_read_io_threads = 16<br />
  innodb_write_io_threads = 16<br />
</code><br />
We call this configuration B.  The next one contained four SSD-specific optimizations based partially on some earlier work that I&#8217;d done with this Virident card (configuration C):<br />
<code><br />
  innodb_io_capacity = 30000<br />
  innodb_adaptive_flushing_method = keep_average<br />
  innodb_flush_neighbor_pages=none<br />
  innodb_max_dirty_pages_pct = 60<br />
</code><br />
Finally, a fourth test (configuration D) combined the parameter changes from tests B and C.  The graph below shows the sysbench throughput (tps) for these four configurations:<br />
<a href="http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-2/vcache_trx_params/" rel="attachment wp-att-347"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/04/vcache_trx_params.png" alt="vcache_trx_params" width="640" height="480" class="aligncenter size-full wp-image-347" /></a><br />
As we can see, all of the configuration options produce numbers that, in the absence of outliers, are roughly identical, but it&#8217;s configuration C (shown in the graph as the blue line &#8211; SSD config) which shows the most consistent performance.  The others all have assorted performance drops scattered throughout the graph.  We see the exact same pattern when looking at transaction latency; the baseline numbers are roughly identical for all four configurations, but configuration C avoids the spikes and produces a very constant and predictable result.<br />
<a href="http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-2/vcache_response_params/" rel="attachment wp-att-350"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/04/vcache_response_params.png" alt="vcache_response_params" width="640" height="480" class="aligncenter size-full wp-image-350" /></a></p>
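<p>Since configurations B, C, and D are all defined as overrides on top of configuration A, the layering can be expressed as a simple dictionary merge. This is a sketch: CONFIG_A below is an abbreviated stand-in for the full configuration A listed at the end of this post, the helper names are hypothetical, and the [mysqld] section header is an assumption about where these settings would go in my.cnf.</p>

```python
# Abbreviated stand-in for configuration A (the full list appears at the end of the post).
CONFIG_A = {
    "innodb_buffer_pool_size": "4G",
    "innodb_flush_method": "O_DIRECT",
    "innodb_flush_log_at_trx_commit": "1",
}

# Configuration B adds more InnoDB background I/O threads.
CONFIG_B_DELTA = {"innodb_read_io_threads": "16", "innodb_write_io_threads": "16"}

# Configuration C adds SSD-oriented flushing settings (Percona Server 5.5 options).
CONFIG_C_DELTA = {
    "innodb_io_capacity": "30000",
    "innodb_adaptive_flushing_method": "keep_average",
    "innodb_flush_neighbor_pages": "none",
    "innodb_max_dirty_pages_pct": "60",
}


def build_config(base, *deltas):
    """Layer parameter overrides on top of a base config; later deltas win."""
    merged = dict(base)
    for delta in deltas:
        merged.update(delta)
    return merged


def render_mysqld_section(config):
    """Render the settings as a my.cnf [mysqld] section."""
    lines = ["[mysqld]"]
    lines += [f"{key} = {value}" for key, value in config.items()]
    return "\n".join(lines)


# Configuration D = A + B + C.
CONFIG_D = build_config(CONFIG_A, CONFIG_B_DELTA, CONFIG_C_DELTA)
print(render_mysqld_section(CONFIG_D))
```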
<h3>vCache vs. FlashCache &#8211; the basics</h3>
<p>Once I&#8217;d determined that configuration C appeared to produce the best results, I moved on to comparing FlashCache performance with that of vCache, and I also included a &#8220;no cache&#8221; test run using the base HDD MySQL configuration for purposes of comparison.  Because vCache and FlashCache appear to implement time-based flushing differently, time-based flushing was disabled on both cache devices.  Both devices were also set up so that all IO would be cached (i.e., no special treatment of sequential writes) and with a 50% dirty page threshold.  Again, for comparison purposes, I also include the numbers from the vCache test with time-based flushing enabled.<br />
<a href="http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-2/vcache_fcache_trx_params/" rel="attachment wp-att-351"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/04/vcache_fcache_trx_params.png" alt="vcache_fcache_trx_params" width="640" height="480" class="aligncenter size-full wp-image-351" /></a><br />
As we&#8217;d expect, the HDD-only solution barely registers on the graph.  With a buffer pool that&#8217;s much smaller than the working set, the no-cache approach is crippled.  FlashCache does substantially better, averaging around 600 tps, but vCache is about 3x better.  Interestingly, vCache with time-based flushing enabled produces better and more consistent performance than vCache without it, but even at its worst, vCache without time-based flushing still outperforms FlashCache by over 2x on average.</p>
<p>Looking just at sysbench reads, vCache with time-based flushing consistently hit about 27000 per second, whereas without time-based flushing it averaged about 12500.  FlashCache came in around 7500.  Sysbench writes came in just under 8000 per second for vCache with time-based flushing, around 6000 for vCache without it, and around 2500 for FlashCache.<br />
<a href="http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-2/vcache_fcache_read_write/" rel="attachment wp-att-352"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/04/vcache_fcache_read_write.png" alt="vcache_fcache_read_write" width="1000" height="375" class="aligncenter size-full wp-image-352" /></a></p>
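<p>The read and write figures above are easier to compare as multiples of the FlashCache baseline. The sketch below simply divides the approximate per-second numbers quoted in the text; RATES and ratios_vs are hypothetical names, and because the inputs are eyeballed averages, the resulting ratios are approximate too.</p>

```python
# Approximate sysbench operations per second, as quoted in the text.
RATES = {
    "vCache, time-based flushing": {"reads": 27000, "writes": 8000},
    "vCache, no time-based flushing": {"reads": 12500, "writes": 6000},
    "FlashCache": {"reads": 7500, "writes": 2500},
}


def ratios_vs(rates, baseline_name):
    """Express every configuration's rates as multiples of one baseline."""
    base = rates[baseline_name]
    return {
        name: {metric: value / base[metric] for metric, value in metrics.items()}
        for name, metrics in rates.items()
    }


for name, multiples in ratios_vs(RATES, "FlashCache").items():
    print(f"{name}: {multiples['reads']:.1f}x reads, {multiples['writes']:.1f}x writes")
```

<p>With these inputs, vCache with time-based flushing works out to roughly 3.6x FlashCache&#8217;s reads and 3.2x its writes; without time-based flushing, about 1.7x and 2.4x.</p>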
<p>We can take a look at some vmstat data to see what&#8217;s actually happening on the system during all these various tests.  Clockwise from the top left in the next graph, we have &#8220;no cache&#8221;, &#8220;FlashCache&#8221;, &#8220;vCache with no time-based flushing&#8221;, and &#8220;vCache with time-based flushing.&#8221;  As the images demonstrate, the no-cache system is being crushed by IO wait.  FlashCache and vCache both show improvements, but it&#8217;s not until we get to vCache with the time-based flushing that we see some nice, predictable, constant performance.<br />
<a href="http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-2/cpu-usage-all/" rel="attachment wp-att-354"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/04/cpu-usage-all-1024x768.png" alt="cpu-usage-all" width="625" height="468" class="aligncenter size-large wp-image-354" /></a></p>
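<p>The raw vmstat data behind these graphs is not included in the post, but reducing a capture to an average IO-wait figure is straightforward. The sketch below assumes standard procps vmstat output, where the CPU &#8220;wa&#8221; column is the percentage of time spent waiting on IO; the function name is hypothetical.</p>

```python
def average_iowait(vmstat_text):
    """Average the 'wa' (CPU % waiting on IO) column of `vmstat N` output.

    Header lines are recognised by their column names; every other
    non-empty line is treated as a sample row.
    """
    wa_index = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # blank line or the first (group) header line
        if "wa" in fields:
            wa_index = fields.index("wa")  # column-name header line
            continue
        if wa_index is not None:
            samples.append(int(fields[wa_index]))
    return sum(samples) / len(samples) if samples else 0.0
```

<p>Because the column index is taken from the header line each time one appears, repeated headers in a long capture are handled as well.</p>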
<p>So why does vCache with time-based flushing appear to outperform all the rest?  My hypothesis is that time-based flushing allows the backing store to be written to at a steadier and potentially submaximal rate, compared to dirty-page-threshold flushing, which kicks in at a given level and then attempts to flush as quickly as possible to bring the dirty pages back within acceptable bounds.  This is, however, only a hypothesis.</p>
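<p>The hypothesis can be illustrated with a toy model. Both policies below write back the same average volume, but the threshold policy idles until the dirty level crosses a high-water mark and then flushes at full speed, while the time-based policy writes each block a fixed delay after it was dirtied. This is a deliberately simplified sketch, not the vCache or FlashCache algorithm; the function names, the low-water mark, and every numeric parameter are hypothetical.</p>

```python
from statistics import pstdev


def threshold_flush_writes(ticks, incoming, capacity, high_pct, low_pct, max_rate):
    """Writes to the backing store per tick under a dirty-threshold policy.

    Nothing is flushed until dirty blocks exceed high_pct of the cache; then
    the cache flushes at max_rate until dirty blocks fall to low_pct.
    """
    dirty, flushing, writes = 0, False, []
    for _ in range(ticks):
        dirty += incoming
        if dirty > capacity * high_pct / 100:
            flushing = True
        out = 0
        if flushing:
            out = min(max_rate, dirty)
            dirty -= out
            if dirty <= capacity * low_pct / 100:
                flushing = False
        writes.append(out)
    return writes


def time_based_flush_writes(ticks, incoming, flush_period):
    """Writes per tick when each dirty block is written flush_period ticks later."""
    return [incoming if t >= flush_period else 0 for t in range(ticks)]


thr = threshold_flush_writes(600, incoming=10, capacity=1000,
                             high_pct=50, low_pct=40, max_rate=100)
tb = time_based_flush_writes(600, incoming=10, flush_period=50)

# Compare peak and variability after the initial fill-up period.
print("threshold: ", max(thr[100:]), round(pstdev(thr[100:]), 1))
print("time-based:", max(tb[100:]), pstdev(tb[100:]))
```

<p>In this model, the threshold policy alternates between idle ticks and bursts at max_rate, while the time-based policy writes at a flat rate equal to the incoming rate.</p>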
<h3>vCache vs. FlashCache &#8211; dirty page threshold</h3>
<p>Finally, we examine the impact of different dirty-page ratios on device performance, since this is the only parameter that can be reliably varied in the same way on both systems.  The following graph shows sysbench OLTP performance for FlashCache and vCache with a 10% dirty threshold and with a 50% dirty threshold, with time-based flushing disabled.  Both systems perform better when the dirty-page threshold is set to 50%, but once again, vCache at 10% outperforms FlashCache at 10%.</p>
<p><a href="http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-2/vcache-dirty_trx_params/" rel="attachment wp-att-357"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/04/vcache-dirty_trx_params.png" alt="vcache-dirty_trx_params" width="640" height="480" class="aligncenter size-full wp-image-357" /></a></p>
<p>The one interesting item here is that vCache actually appears to get <em>better</em> over time.  I&#8217;m not entirely sure why that&#8217;s the case, or at what point performance would level off, since these tests were all run for only 2 hours.  Even so, with a vCache volume whose dirty ratio is only 10%, as might be the case in a deployment where the data set is massive relative to both the working set and the cache device, the numbers are encouraging.</p>
<h3>Conclusion</h3>
<p>Overall, I think the graphs speak for themselves. When the working set outstrips the available buffer pool memory but still fits into the cache device, vCache shines. FlashCache still does quite well compared to a deployment with no SSD cache whatsoever, massively outperforming the HDD-only setup, but it doesn&#8217;t come close to the numbers obtained with vCache. There may be ways to tune the FlashCache configuration to produce better or more consistent results, or results more in line with those of vCache, but considering that overall usability was one of the evaluation criteria, and that the best vCache results were obtained with the default vCache configuration, I think vCache can be declared the clear winner.</p>
<h3>Base MySQL &#038; Benchmark Configuration</h3>
<p>All benchmarks were conducted with the following:<br />
<code><br />
sysbench --num-threads=32 --test=tests/db/oltp.lua --oltp_tables_count=32 \<br />
--oltp-table-size=10000000 --rand-init=on --report-interval=1 --rand-type=pareto \<br />
--forced-shutdown=1 --max-time=7200 --max-requests=0 --percentile=95 \<br />
--mysql-user=root --mysql-socket=/tmp/mysql.sock --mysql-table-engine=innodb \<br />
--oltp-read-only=off run<br />
</code></p>
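<p>The data-set size quoted at the top of the post follows from the table count and table size in the sysbench command above. The arithmetic below assumes the approximate 78GiB figure is the on-disk footprint, including indexes and page overhead, so the bytes-per-row value is only a rough average.</p>

```python
TABLES = 32
ROWS_PER_TABLE = 10_000_000
DATA_GIB = 78          # approximate data size quoted in the post
BUFFER_POOL_GIB = 4    # innodb_buffer_pool_size = 4G

total_rows = TABLES * ROWS_PER_TABLE
bytes_per_row = DATA_GIB * 2**30 / total_rows
bp_fraction = BUFFER_POOL_GIB / DATA_GIB

print(f"{total_rows:,} rows, ~{bytes_per_row:.0f} bytes/row on disk")
print(f"buffer pool holds ~{bp_fraction:.1%} of the data")
```

<p>That is 320 million rows at roughly 262 bytes each, with the 4GiB buffer pool able to hold only about 5% of the data, which is why the cache device matters so much in these tests.</p>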
<p>The base MySQL configuration (configuration A) appears below:<br />
<code><br />
#####fixed innodb options <br />
innodb_file_format = barracuda <br />
innodb_buffer_pool_size = 4G <br />
innodb_file_per_table = true <br />
innodb_data_file_path = ibdata1:100M<br />
innodb_flush_method = O_DIRECT <br />
innodb_log_buffer_size = 128M <br />
innodb_flush_log_at_trx_commit = 1 <br />
innodb_log_file_size = 1G <br />
innodb_log_files_in_group = 2 <br />
innodb_purge_threads = 1 <br />
innodb_fast_shutdown = 1 <br />
#not innodb options (fixed) <br />
back_log = 50 <br />
wait_timeout = 120 <br />
max_connections = 5000 <br />
max_prepared_stmt_count=500000 <br />
max_connect_errors = 10 <br />
table_open_cache = 10240 <br />
max_allowed_packet = 16M <br />
binlog_cache_size = 16M <br />
max_heap_table_size = 64M <br />
sort_buffer_size = 4M <br />
join_buffer_size = 4M <br />
thread_cache_size = 1000 <br />
query_cache_size = 0 <br />
query_cache_type = 0 <br />
ft_min_word_len = 4 <br />
thread_stack = 192K <br />
tmp_table_size = 64M <br />
server-id = 101 <br />
key_buffer_size = 8M <br />
read_buffer_size = 1M <br />
read_rnd_buffer_size = 4M <br />
bulk_insert_buffer_size = 8M <br />
myisam_sort_buffer_size = 8M <br />
myisam_max_sort_file_size = 10G <br />
myisam_repair_threads = 1 <br />
myisam_recover <br />
</code></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Virident vCache vs. FlashCache: Part 1</title>
		<link>http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-1/</link>
		<comments>http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-1/#comments</comments>
		<pubDate>Wed, 01 May 2013 23:53:05 +0000</pubDate>
		<dc:creator>Ernie Souhrada</dc:creator>
				<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[PCIe]]></category>
		<category><![CDATA[Virident]]></category>
		<category><![CDATA[Ernie Souhrada]]></category>
		<category><![CDATA[flashcache]]></category>
		<category><![CDATA[flashmax II]]></category>
		<category><![CDATA[vCache]]></category>
		<category><![CDATA[virident]]></category>

		<guid isPermaLink="false">http://www.ssdperformanceblog.com/?p=337</guid>
		<description><![CDATA[This is part one of a two part series. Over the past few weeks I have been looking at a preview release of Virident&#8217;s vCache software, which is a kernel module and set of utilities designed to provide functionality similar to that of FlashCache. In particular, Virident engaged Percona to do a usability and feature-set [...]]]></description>
				<content:encoded><![CDATA[<p>This is part one of a two-part series.</p>
<p>Over the past few weeks I have been looking at a preview release of Virident&#8217;s vCache software, which is a kernel module and set of utilities designed to provide functionality similar to that of FlashCache.  In particular, Virident engaged Percona to do a usability and feature-set comparison between vCache and FlashCache and also to conduct some benchmarks for the use case where the MySQL working set is significantly larger than the InnoDB buffer pool (thus leading to a lot of buffer pool disk reads) but still small enough to fit into the cache device.  In this post and the next, I&#8217;ll present some of those results.</p>
<p>Disclosure: The research and testing for this post series was sponsored by Virident.</p>
<p>Usability is, to some extent, a subjective call: I may have preferences for or against a certain mode of operation that others don&#8217;t share, so readers may reach a different conclusion than mine.  On this point, I call it an overall draw between vCache and FlashCache.</p>
<p>Ease of Basic Installation: Setup was simply a matter of installing two RPMs and running a couple of commands to enable vCache on the PCIe flash card (a Virident FlashMAX II) and set up the cache device with the command-line utilities supplied with one of the RPMs.  Moreover, the vCache software is built into the Virident driver, so there is no additional module to install.  FlashCache, on the other hand, requires building a separate kernel module in addition to whatever flash memory driver you&#8217;ve already had to install, and further configuration requires modifying assorted sysctls.  I would also argue that the vCache documentation is superior.  <strong>Winner: vCache.</strong></p>
<p>Ease of Post-setup Modification / Advanced Installation: Many of the FlashCache device parameters can be easily modified by echoing the desired value to the appropriate sysctl; with vCache, a command-line binary can modify many of the same parameters, but doing so requires a cache flush, detach, and reattach.  <strong>Winner: FlashCache.</strong></p>
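<p>The FlashCache &#8220;echo a value into a sysctl&#8221; workflow is easy to script. The sketch below assumes the per-device sysctl layout described in the FlashCache sysadmin guide, one file per tunable under /proc/sys/dev/flashcache/CACHEDEV/, with tunables such as dirty_thresh_pct; verify the exact path and names against your FlashCache version. The helper names and the device name in the usage comment are hypothetical.</p>

```python
from pathlib import Path

# Assumed layout: /proc/sys/dev/flashcache/CACHEDEV/PARAM, one file per tunable.
FLASHCACHE_SYSCTL_ROOT = Path("/proc/sys/dev/flashcache")


def set_flashcache_param(cachedev, param, value, root=FLASHCACHE_SYSCTL_ROOT):
    """Python equivalent of `echo VALUE > ROOT/CACHEDEV/PARAM` (needs root)."""
    path = Path(root) / cachedev / param
    path.write_text(f"{value}\n")
    return path


def get_flashcache_param(cachedev, param, root=FLASHCACHE_SYSCTL_ROOT):
    """Read the current value of a FlashCache tunable as a string."""
    return (Path(root) / cachedev / param).read_text().strip()


# Hypothetical cache device name; list the sysctl directory to find yours.
# set_flashcache_param("fioa+sdb", "dirty_thresh_pct", 50)
```

<p>The root parameter exists so the helpers can be pointed at a scratch directory for testing instead of the live /proc tree.</p>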
<p>Operational Flexibility: Both solutions share many features here; both allow whitelisting and blacklisting of PIDs or simply running in a &#8220;cache everything&#8221; mode.  Both support not caching sequential IO, adjusting the dirty page threshold, flushing the cache on demand, and time-based cache flushing, but some of these features operate differently in vCache than in FlashCache.  For example, a manual cache flush with vCache is a blocking operation.  With FlashCache, echoing &#8220;1&#8221; to the do_sync sysctl of the cache device triggers a cache flush, but it happens in the background, and while countdown messages are written to syslog as the operation proceeds, the device never reports that it has actually finished.  I think both kinds of flushing are useful in different situations, and I&#8217;d like to see a non-blocking background flush in vCache, but if I had to choose one or the other, I&#8217;d take blocking and modal over fire-and-forget any day.  FlashCache does have the nice ability to switch between FIFO and LRU for its flushing algorithm; vCache does not.  This could prove useful in certain situations.  <strong>Winner: FlashCache.</strong></p>
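<p>For readers unfamiliar with the FIFO/LRU distinction, the toy cache below shows how the two orders can pick a different victim for the same access sequence. It is a generic illustration built on Python&#8217;s collections.OrderedDict, not FlashCache&#8217;s implementation; BlockCache and evictions are hypothetical names.</p>

```python
from collections import OrderedDict


class BlockCache:
    """Toy fixed-size cache that evicts in FIFO or LRU order."""

    def __init__(self, capacity, policy="fifo"):
        if policy not in ("fifo", "lru"):
            raise ValueError("policy must be 'fifo' or 'lru'")
        self.capacity = capacity
        self.policy = policy
        self.blocks = OrderedDict()  # oldest entry first

    def access(self, block):
        """Touch a block; return the evicted block number, or None."""
        if block in self.blocks:
            if self.policy == "lru":
                # LRU: a hit moves the block to the most-recently-used end.
                self.blocks.move_to_end(block)
            # FIFO: hits leave the insertion order unchanged.
            return None
        evicted = None
        if len(self.blocks) >= self.capacity:
            evicted, _ = self.blocks.popitem(last=False)
        self.blocks[block] = True
        return evicted


def evictions(sequence, capacity, policy):
    """Return the blocks evicted while replaying an access sequence."""
    cache = BlockCache(capacity, policy)
    return [e for e in (cache.access(b) for b in sequence) if e is not None]
```

<p>For the sequence 1, 2, 3, 1, 4 with room for three blocks, FIFO evicts block 1 (the oldest insertion, even though it was just reused), while LRU evicts block 2 (the least recently touched).</p>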
<p>Operational Monitoring: Both solutions offer plenty of statistics; the main difference is that FlashCache stats can be pulled from /proc but vCache stats have to be retrieved by running the vgc-vcache-monitor command.  Personally, I prefer &#8220;cat /proc/something&#8221; but I&#8217;m not sure that&#8217;s sufficient to award this category to FlashCache.  <strong>Winner: None.</strong></p>
<p>Time-based Flushing: This wouldn&#8217;t seem like it should be a separate category, but because the behavior seems to be so different between the two cache solutions, I&#8217;m listing it here.  The vCache manual indicates that “flush period” specifies the time after which dirty blocks will be written to the backing store, whereas FlashCache has a setting called “fallow_delay”, defined in the documentation as the time period before “idle” dirty blocks are cleaned from the cache device. It is not entirely clear whether or not these mechanisms operate in the same fashion, but based on the documentation, it appears that they do not.  I find the vCache implementation more useful than the one present in FlashCache.  <strong>Winner: vCache.</strong></p>
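<p>The distinction drawn above can be made concrete. The sketch below encodes one reading of the two documentation quotes: under a vCache-style &#8220;flush period&#8221;, a block becomes due once it has been dirty for that long; under a FlashCache-style &#8220;fallow_delay&#8221;, only once it has gone untouched for that long. This is an interpretation of the documentation wording, not code from either product, and the class and function names are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class DirtyBlock:
    dirtied_at: float   # time the block first became dirty (seconds)
    last_access: float  # time of the most recent read or write (seconds)


def due_by_flush_period(block, now, flush_period):
    """Flush-period reading: write back once the block has been dirty this long."""
    return now - block.dirtied_at >= flush_period


def due_by_fallow_delay(block, now, fallow_delay):
    """Fallow-delay reading: write back only once the block has been idle this long."""
    return now - block.last_access >= fallow_delay
```

<p>Under this reading, a hot block dirtied at t=0 and rewritten at t=99.5 is due at t=100 under a 60-second flush period but not under a 60-second fallow delay, so continuously rewritten blocks could stay dirty indefinitely with the latter.</p>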
<p>Although nobody likes a tie, if you add up the scores, usability is a 2-2-1 draw between vCache and FlashCache.  There are things that I really liked better with FlashCache, and there are other things that I thought vCache did a much better job with.  If I absolutely must pick a winner in terms of usability, then I&#8217;d give a slight edge to FlashCache due to configuration flexibility, but if the GA release of vCache added some of FlashCache&#8217;s additional configuration options and exposed statistics via /proc, I&#8217;d vote in the other direction.</p>
<p>Stay tuned for part two of this series, wherein we&#8217;ll take a look at some benchmarks.  There&#8217;s no razor-thin margin of victory for either side here: vCache outperforms FlashCache by a landslide.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssdperformanceblog.com/2013/05/virident-vcache-vs-flashcache-part-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Testing the Micron P320h</title>
		<link>http://www.ssdperformanceblog.com/2013/04/testing-the-micron-p320h/</link>
		<comments>http://www.ssdperformanceblog.com/2013/04/testing-the-micron-p320h/#comments</comments>
		<pubDate>Fri, 12 Apr 2013 20:05:24 +0000</pubDate>
		<dc:creator>Vadim Tkachenko</dc:creator>
				<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[Micron]]></category>
		<category><![CDATA[PCIe]]></category>
		<category><![CDATA[SLC]]></category>
		<category><![CDATA[P320h]]></category>

		<guid isPermaLink="false">http://www.ssdperformanceblog.com/?p=310</guid>
		<description><![CDATA[The Micron P320h SSD is an SLC-based PCIe solid-state storage device which claims to provide the highest read throughput of any server-grade SSD, and at Micron&#8217;s request, I recently took some time to put the card through its paces, and the numbers are indeed quite impressive. For reference, the benchmarks for this device were performed [...]]]></description>
				<content:encoded><![CDATA[<p>The Micron P320h SSD is an SLC-based PCIe solid-state storage device that Micron claims provides the highest read throughput of any server-grade SSD.  At Micron&#8217;s request, I recently put the card through its paces, and the numbers are indeed quite impressive.</p>
<p>For reference, the benchmarks for this device were performed primarily on a Dell R710 with 192GB of RAM and two Xeon E5-2660 processors that yield a total of 32 virtual cores.  This is the same machine which was used in my previous benchmark run.  A small handful of additional tests were also performed using the <a href="http://www.percona.com/docs/wiki/benchmark:hardware:cisco_ucs_c250" target="_blank">Cisco UCS C250</a>. The operating system in use was CentOS 6.3, and for the sysbench fileIO tests, the EXT4 filesystem was used.  The card itself is the 700GB model.</p>
<p>So let&#8217;s take a look at the data.</p>
<p>With the sysbench fileIO test in asynchronous mode, read performance is an extremely steady <strong>3202MiB/sec</strong> with almost no deviation. Write performance is also both very strong and very steady, coming in at a bit over <strong>1730MiB/sec</strong> with a standard deviation of a bit less than 13MiB/sec.</p>
<p><a href="http://www.ssdperformanceblog.com/2013/04/testing-the-micron-p320h/realssd-asyncio/" rel="attachment wp-att-311"><img class="aligncenter size-large wp-image-311" alt="realssd-asyncIO" src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/03/realssd-asyncIO-791x1024.png" width="625" height="809" /></a></p>
<p>Given that the block size in use here is 16KiB, these numbers equate to over <strong>110,000 write IOPS</strong> and almost <strong>205,000 read IOPS.</strong></p>
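<p>The conversion from throughput to IOPS is a single division by the block size. A minimal sketch, assuming the 16KiB block size used in these sysbench fileIO runs (the helper name is hypothetical):</p>

```python
KIB_PER_MIB = 1024
BLOCK_KIB = 16  # sysbench fileIO block size used in these tests


def mib_per_sec_to_iops(mib_per_sec, block_kib=BLOCK_KIB):
    """Convert throughput in MiB/s to operations/s for a fixed block size."""
    return mib_per_sec * KIB_PER_MIB / block_kib


print(round(mib_per_sec_to_iops(1730)))  # async write: 110720
print(round(mib_per_sec_to_iops(3202)))  # async read: 204928
```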
<p>When we switch over to synchronous IO, we find that the card is quite capable of matching the asynchronous performance:</p>
<p><a href="http://www.ssdperformanceblog.com/2013/04/testing-the-micron-p320h/syncio-throughput/" rel="attachment wp-att-312"><img class="aligncenter size-full wp-image-312" alt="syncIO-throughput" src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/03/syncIO-throughput.png" width="611" height="859" /></a></p>
<p>Synchronous read reaches peak capacity somewhere between 32 and 64 threads, and synchronous write tops out somewhere between 64 and 128 threads. The latency numbers are equally impressive; the next two graphs show 95th and 99th-percentile response time, but there really isn&#8217;t much difference between the two.</p>
<p><a href="http://www.ssdperformanceblog.com/2013/04/testing-the-micron-p320h/syncio-latency/" rel="attachment wp-att-316"><img class="aligncenter size-large wp-image-316" alt="syncIO-latency" src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/03/syncIO-latency-1024x574.png" width="625" height="350" /></a></p>
<p>At 64 read threads, we reach peak performance with latency of roughly <strong>0.5 milliseconds</strong>; and at 128 write threads we have maximum throughput with latency just over <strong>3ms</strong>.</p>
<p>How well does it perform with MySQL?  Exact results vary, depending upon the usual factors (read/write ratio, working set size, buffer pool size, etc.) but overall the card is extremely quick and handily outperforms the other cards that it was tested against.  For example, in the graph below we compare the performance of the P320h on a standard TPCC-MySQL test to the original FusionIO and the Intel i910 with assorted buffer pool sizes:</p>
<p><a href="http://www.ssdperformanceblog.com/2013/04/testing-the-micron-p320h/tpcc-mysql-devicecompare/" rel="attachment wp-att-319"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/03/tpcc-mysql-devicecompare.png" alt="tpcc-mysql-devicecompare" width="502" height="498" class="aligncenter size-full wp-image-319" /></a></p>
<p>And in this graph we look at the card&#8217;s performance on sysbench OLTP:</p>
<p><a href="http://www.ssdperformanceblog.com/2013/04/testing-the-micron-p320h/sysbench-oltp-ext4xfs/" rel="attachment wp-att-317"><img class="aligncenter size-full wp-image-317" alt="sysbench-oltp-ext4xfs" src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/03/sysbench-oltp-ext4xfs.png" width="602" height="753" /></a></p>
<p>It is worth noting here that EXT4 outperforms XFS by a fairly significant margin.  The approximate raw numbers, in tabular format, are:</p>
<table border="1" width="198">
<tr>
<th>Buffer pool (BP)</th>
<th>EXT4</th>
<th>XFS</th>
</tr>
<tr>
<td>13GiB BP</td>
<td>22000</td>
<td>7500</td>
</tr>
<tr>
<td>25GiB BP</td>
<td>17000</td>
<td>9000</td>
</tr>
<tr>
<td>50GiB BP</td>
<td>21000</td>
<td>11000</td>
</tr>
<tr>
<td>75GiB BP</td>
<td>25000</td>
<td>15000</td>
</tr>
<tr>
<td>100GiB BP</td>
<td>31000</td>
<td>19000</td>
</tr>
<tr>
<td>125GiB BP</td>
<td>36000</td>
<td>25000</td>
</tr>
</table>
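<p>The relative gap between the two filesystems is easier to see as a ratio. The sketch below computes EXT4/XFS for each buffer pool size using the approximate values from the table above; RESULTS and ext4_to_xfs_ratios are hypothetical names.</p>

```python
# Approximate sysbench OLTP results: buffer pool size (GiB) -> (EXT4, XFS).
RESULTS = {
    13: (22000, 7500),
    25: (17000, 9000),
    50: (21000, 11000),
    75: (25000, 15000),
    100: (31000, 19000),
    125: (36000, 25000),
}


def ext4_to_xfs_ratios(results):
    """Map each buffer pool size to the EXT4/XFS throughput ratio."""
    return {bp: ext4 / xfs for bp, (ext4, xfs) in results.items()}


for bp_gib, ratio in ext4_to_xfs_ratios(RESULTS).items():
    print(f"{bp_gib:>3} GiB BP: EXT4/XFS = {ratio:.2f}")
```

<p>The ratio is roughly 2.9 at a 13GiB buffer pool and generally narrows as the buffer pool grows, to about 1.4 at 125GiB.</p>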
<p>In the final analysis, there may or may not be faster cards out there, but the Micron P320h is the fastest one that I have personally seen to date.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssdperformanceblog.com/2013/04/testing-the-micron-p320h/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Testing the Virident FlashMax II</title>
		<link>http://www.ssdperformanceblog.com/2013/03/testing-virident-flashmax-ii/</link>
		<comments>http://www.ssdperformanceblog.com/2013/03/testing-virident-flashmax-ii/#comments</comments>
		<pubDate>Mon, 18 Mar 2013 10:53:19 +0000</pubDate>
		<dc:creator>Ernie Souhrada</dc:creator>
				<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[MLC]]></category>
		<category><![CDATA[PCIe]]></category>
		<category><![CDATA[Virident]]></category>

		<guid isPermaLink="false">http://www.ssdperformanceblog.com/?p=277</guid>
		<description><![CDATA[Approximately 11 months ago, Vadim reported some test results from the Virident FlashMax 1400M, an MLC PCIe SSD device. Since that time, Virident has released the FlashMAX II, which promises both increased capacity and increased performance over the previous model. In this post, we present some benchmark results comparing this new model to its predecessor, [...]]]></description>
				<content:encoded><![CDATA[<p>Approximately 11 months ago, <a href="http://www.ssdperformanceblog.com/2012/05/testing-virident-flashmax-1400">Vadim reported some test results from the Virident FlashMax 1400M</a>, an MLC PCIe SSD device. Since that time, Virident has released the FlashMAX II, which promises both increased capacity and increased performance over the previous model. In this post, we present some benchmark results comparing this new model to its predecessor, and we find that indeed, the FlashMax II is a significant upgrade.</p>
<p>For reference, all of the FlashMax II benchmarks were performed on a Dell R710 with 192GB of RAM. This is a dual-socket Xeon E5-2660 machine with 16 physical and 32 virtual cores. (I had originally planned to use the <a href="http://www.percona.com/docs/wiki/benchmark:hardware:cisco_ucs_c250" target="_blank">Cisco UCS C250</a> that is often used for our test runs, but that machine ran into some unrelated hardware difficulties and was ultimately unavailable.)  The operating system in use was CentOS 6.3, and the filesystem used for the test was XFS, mounted with the noatime and nodiratime options.  The card was physically formatted back to factory default settings between the synchronous and asynchronous test suites. Note that factory default settings for the FlashMax II format the card in &#8220;maxcapacity&#8221; mode rather than &#8220;maxperformance&#8221; mode (maxperformance reserves some additional space internally to provide better write performance). In &#8220;maxcapacity&#8221; mode, the device tested provides approximately 2200GB of space; in &#8220;maxperformance&#8221; mode, it provides a bit less than 1900GB.</p>
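<p>The capacity figures above imply how much extra space maxperformance mode sets aside. A quick calculation from the approximate numbers in the text; since maxperformance provides a bit less than 1900GB, the result is a lower bound:</p>

```python
MAXCAPACITY_GB = 2200     # approximate usable space in "maxcapacity" mode
MAXPERFORMANCE_GB = 1900  # "a bit less than" this in "maxperformance" mode

reserved_gb = MAXCAPACITY_GB - MAXPERFORMANCE_GB
print(f"extra internal reserve: at least ~{reserved_gb} GB "
      f"({reserved_gb / MAXCAPACITY_GB:.1%} of the maxcapacity space)")
```

<p>That works out to at least roughly 300GB, or about 13.6% of the maxcapacity space.</p>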
<p>Without further ado, then, here are the numbers.</p>
<p>First, asynchronous random writes:</p>
<p><a href="http://www.ssdperformanceblog.com/2013/03/testing-virident-flashmax-ii-part-1/async-rndwr-warmup-lg/" rel="attachment wp-att-299"><img class="aligncenter size-full wp-image-299" alt="async-rndwr-warmup-lg" src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/03/async-rndwr-warmup-lg.png" width="942" height="1109" /></a></p>
<p>There is a warmup period of roughly 18 minutes, and after about 45 minutes the performance stabilizes and remains effectively constant, as shown by the next graph.</p>
<p><a href="http://www.ssdperformanceblog.com/2013/03/testing-virident-flashmax-ii-part-1/async-rndwr-8-64-lg/" rel="attachment wp-att-288"><img class="aligncenter size-full wp-image-288" alt="async-rndwr-8-64-lg" src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/03/async-rndwr-8-64-lg.png" width="719" height="736" /></a></p>
<p>Once the write performance reaches equilibrium, it does so at just under <strong>780MiB/sec</strong>, which is approximately <strong>40% higher</strong> than the 550MiB/sec exhibited by the FlashMax 1400M.</p>
<p>Asynchronous random read is up next:</p>
<p><a href="http://www.ssdperformanceblog.com/2013/03/testing-virident-flashmax-ii-part-1/async-rndrd-128-lg/" rel="attachment wp-att-289"><img class="aligncenter size-full wp-image-289" alt="async-rndrd-128-lg" src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/03/async-rndrd-128-lg.png" width="942" height="1109" /></a></p>
<p>The behavior of the FlashMax II is very similar to that of the FlashMax 1400M in terms of predictable performance; the standard deviation on the asynchronous random read throughput measurement is only 5.7MiB/sec.  However, the overall read throughput is over <strong>1000MiB/sec better with the FlashMax II</strong>: we see a read throughput of approximately <strong>2580MiB/sec</strong> vs. 1450MiB/sec with the previous generation of hardware, an <strong>improvement of roughly 80%</strong>.</p>
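<p>The generational improvements quoted for asynchronous random I/O can be checked with a simple percentage-change calculation. The inputs are the approximate throughput figures from the text, in MiB/sec, and pct_change is a hypothetical helper name.</p>

```python
def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Asynchronous random I/O in MiB/s: FlashMax II (new) vs. FlashMax 1400M (old).
print(f"write: {pct_change(780, 550):+.0f}%")
print(f"read:  {pct_change(2580, 1450):+.0f}%")
```

<p>This gives about +42% for writes and +78% for reads, consistent with the &#8220;approximately 40%&#8221; and &#8220;roughly 80%&#8221; figures above.</p>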
<p>Finally, we take a look at synchronous random read.</p>
<p><a href="http://www.ssdperformanceblog.com/2013/03/testing-virident-flashmax-ii-part-1/sync-rndrd-128-lg/" rel="attachment wp-att-291"><img class="aligncenter size-full wp-image-291" alt="sync-rndrd-128-lg" src="http://www.ssdperformanceblog.com/wp-content/uploads/2013/03/sync-rndrd-128-lg.png" width="942" height="1109" /></a>At 256 threads, read throughput tops out at <strong>2090MiB/sec</strong>, which is about <strong>20% less</strong> than the asynchronous result. Given the small gain in throughput from 128 to 256 threads, and the doubling of latency that came with it, this is likely about as good as it gets.</p>
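<p>The reasoning above (more threads stop helping once latency grows in step with them) can be illustrated with Little&#8217;s law: requests in flight = throughput x latency, so throughput is roughly threads divided by mean latency. The sketch below is a simplified model that assumes each benchmark thread keeps exactly one synchronous IO outstanding and treats latency as a mean; the thread counts match the test, but the latency values are hypothetical round numbers, not measurements.</p>

```python
def iops_from_littles_law(threads, mean_latency_ms):
    """Little's law for a closed loop: each thread has one synchronous IO in
    flight, so completed IOs per second = threads / mean latency (in seconds)."""
    return threads / (mean_latency_ms / 1000.0)


# Hypothetical latencies: doubling the thread count while mean latency also
# doubles leaves throughput unchanged, i.e. the device is already saturated.
before = iops_from_littles_law(128, 1.0)
after = iops_from_littles_law(256, 2.0)
print(round(before), round(after))  # 128000 128000
```

<p>If latency had stayed flat while threads doubled, the same formula would predict twice the throughput; the measured small bump therefore points to a device that is close to its limit.</p>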
<p>For comparison, the FlashMax 1400M synchronous random read test stopped after 64 threads, reaching a synchronous random read throughput of 1345MiB/sec and a 95th-percentile latency of 1.49ms. With those same 64 threads, the FlashMax II reaches <strong>1883MiB/sec</strong> with a 95th-percentile latency of <strong>1.105ms</strong>. This represents approximately <strong>40% more throughput with about 25% lower 95th-percentile latency.</strong></p>
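<p>The percentages quoted in this post can be re-derived from the reported figures with a few lines of Python. This only re-checks the arithmetic on numbers already given above; it adds no new measurements.</p>

```python
def pct_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100


# Figures quoted in this post (MiB/sec for throughput, ms for latency).
async_write = pct_change(780, 550)       # FlashMax II vs 1400M, async random write
async_read = pct_change(2580, 1450)      # FlashMax II vs 1400M, async random read
sync_vs_async = pct_change(2090, 2580)   # FlashMax II: sync (256 threads) vs async read
sync64_thrpt = pct_change(1883, 1345)    # FlashMax II vs 1400M, sync read at 64 threads
sync64_p95 = pct_change(1.105, 1.49)     # same comparison, 95th-percentile latency

for name, value in [("async write", async_write), ("async read", async_read),
                    ("sync vs async", sync_vs_async), ("sync64 throughput", sync64_thrpt),
                    ("sync64 p95 latency", sync64_p95)]:
    print(f"{name}: {value:+.1f}%")
```

<p>This prints +41.8%, +77.9%, -19.0%, +40.0% and -25.8% respectively, matching the rounded figures in the text.</p>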
<p>In every area tested, the FlashMax II outperforms the original FlashMax 1400M by a significant margin, and can be considered a worthy successor.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssdperformanceblog.com/2013/03/testing-virident-flashmax-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On SSDs &#8211; Lifespans, Health Measurement and RAID</title>
		<link>http://www.ssdperformanceblog.com/2012/10/on-ssds-lifespans-health-measurement-and-raid/</link>
		<comments>http://www.ssdperformanceblog.com/2012/10/on-ssds-lifespans-health-measurement-and-raid/#comments</comments>
		<pubDate>Fri, 12 Oct 2012 16:51:46 +0000</pubDate>
		<dc:creator>ovais.tariq</dc:creator>
				<category><![CDATA[Intel]]></category>
		<category><![CDATA[MLC]]></category>
		<category><![CDATA[ssd]]></category>
		<category><![CDATA[Available_Reservd_Space]]></category>
		<category><![CDATA[endurance]]></category>
		<category><![CDATA[erasure]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[intel]]></category>
		<category><![CDATA[Media_Wearout_Indicator]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[RAID]]></category>
		<category><![CDATA[S.M.A.R.T.]]></category>
		<category><![CDATA[smartctl]]></category>

		<guid isPermaLink="false">http://www.ssdperformanceblog.com/?p=269</guid>
		<description><![CDATA[Solid State Drives (SSDs) have made it big and have made their way not only into desktop computing but also into mission-critical servers. SSDs have proved to be a breakthrough in IO performance and leave HDDs far behind in terms of random IO performance. Random IO is what most database administrators would [...]]]></description>
				<content:encoded><![CDATA[<p>Solid State Drives (SSDs) have made it big and have made their way not only into desktop computing but also into mission-critical servers. SSDs have proved to be a breakthrough in IO performance and leave HDDs far behind in terms of random IO performance. Random IO is what most database administrators are concerned about, as it makes up about 90% of the IO pattern seen on database servers like MySQL. I have found the Intel 520 series and Intel 910 series to be quite popular, and they do deliver very good random IOPS numbers. However, performance is not the only thing you should be concerned about: failure prediction and health gauges are also very important, as loss of data is a big NO-NO. There is a great deal of misconception about SSD endurance, because SSDs are mostly compared to rotating disks even when measuring endurance. However, SSDs and HDDs work very differently, and that has a direct impact on SSD endurance.</p>
<p>I will mostly be talking about MLC SSDs. Let&#8217;s start off with an SSD primer.</p>
<h3>SSD Primer</h3>
<p>The smallest unit of SSD storage that can be read or written is a page, which is typically 4KB or 8KB in size. Pages are organized into blocks, which are typically between 256KB and 1MB in size. SSDs have no mechanical parts and no heads, so no seeks are needed as with conventional rotating disks. Reads simply involve reading pages from the SSD; it&#8217;s the writes that are trickier. Once a page on an SSD has been written, you cannot simply overwrite it with new data the way you can on an HDD. Instead, its contents must be erased first and then written again. However, an SSD can only erase at the block level, not the page level, which means the SSD must relocate any valid data in a block before that block can be erased and have new data written to it. To summarize, rewriting data means erase+write. Nowadays SSD controllers are intelligent and do erasures in the background, so that write latency is largely unaffected. These background erasures are typically done by a process known as garbage collection; if they were not done in the background, writes would be far too slow.</p>
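<p>To make the erase-before-write constraint concrete, here is a deliberately naive toy model in Python. It assumes 4KB pages and 256KB erase blocks (the low end of the sizes mentioned above) and updates a page in place, so rewriting a single 4KB page forces the whole block to be copied out, erased and re-programmed. Real controllers avoid this by writing out-of-place and reclaiming stale pages through garbage collection; the class and function names here are hypothetical.</p>

```python
# Toy model of NAND erase-before-write (illustrative only). Sizes follow the
# low end of the ranges mentioned above.
PAGE_KB = 4
BLOCK_KB = 256
PAGES_PER_BLOCK = BLOCK_KB // PAGE_KB  # 64 pages per erase block


class Block:
    def __init__(self, pages):
        self.pages = [None] * pages  # None means the page is erased (writable)
        self.erase_count = 0

    def program(self, idx, value):
        # A page can only be written while it is in the erased state.
        if self.pages[idx] is not None:
            raise ValueError("page must be erased before it can be rewritten")
        self.pages[idx] = value

    def erase(self):
        # Erasure always applies to the whole block, never to a single page.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


def naive_in_place_update(block, idx, value):
    """Overwrite one page in place; returns how many pages had to be programmed."""
    saved = list(block.pages)  # relocate (copy out) the block's current contents
    saved[idx] = value
    block.erase()              # erase the entire block
    programmed = 0
    for i, v in enumerate(saved):
        if v is not None:      # write back every valid page, including the new one
            block.program(i, v)
            programmed += 1
    return programmed


blk = Block(PAGES_PER_BLOCK)
for i in range(PAGES_PER_BLOCK):
    blk.program(i, f"page-{i}")
programmed = naive_in_place_update(blk, 5, "new-data")
print(programmed, blk.erase_count)  # 64 1
```

<p>One logical 4KB write turned into 64 page programs and one block erase, which is exactly the cost that background garbage collection and out-of-place writes are designed to hide.</p>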
<p>Of course, every SSD has a lifespan after which it should be considered unusable. Let&#8217;s see which factors matter here.</p>
<h3>SSD Lifespans</h3>
<p>The lifespan of the blocks that make up an SSD is really the number of times erasures and writes can be performed on those blocks, so lifespan is measured in erase/write cycles. Typically, enterprise-grade MLC SSDs have a lifespan of about 30000 erase/write cycles, while consumer-grade MLC SSDs have a lifespan of 5000 to 10000 erase/write cycles. This makes it clear that the lifespan of an SSD depends on how much it is written to: with a write-intensive workload you should expect the SSD to wear out much more quickly than with a read-heavy workload. This is by design.<br />
To offset this effect of writes on SSD life, engineers use two techniques: wear-levelling and over-provisioning. Wear-levelling makes sure that erases and writes are distributed evenly across all the blocks in an SSD, so that some blocks do not wear out more quickly than others. Over-provisioning increases endurance by providing a larger population of blocks to distribute erases and writes over (a bigger-capacity SSD) and a large spare area. Many SSD models over-provision space; for example, an 80GB SSD could have 10GB of over-provisioned space, so that while it is actually 90GB in size, it is reported as an 80GB SSD. While manufacturers do this over-provisioning themselves, you can also do it by not utilising the entire SSD, for example by partitioning only about 75% to 80% of the SSD and leaving the rest as raw space that is not visible to the OS/filesystem (on a new or secure-erased drive, so that the controller knows that space is free). So while over-provisioning takes away some of the disk capacity, it gives back increased endurance and performance.</p>
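<p>The effect of write volume and over-provisioning on lifespan can be estimated with simple arithmetic. The sketch below is a rough model, not a vendor formula: it assumes ideal wear-levelling and introduces a write amplification factor (NAND writes divided by host writes, which grows with garbage-collection overhead and generally shrinks as spare area increases). The 80GB/90GB figures come from the example above; the workload numbers are hypothetical.</p>

```python
def overprovisioning_pct(physical_gb, user_gb):
    """Spare area as a percentage of the user-visible capacity."""
    return (physical_gb - user_gb) / user_gb * 100


def estimated_lifetime_days(physical_gb, pe_cycles, host_writes_gb_per_day,
                            write_amplification):
    """Back-of-the-envelope endurance estimate. Assumes ideal wear-levelling,
    so every physical block reaches its rated P/E cycle count at the same time."""
    total_nand_writes_gb = physical_gb * pe_cycles
    nand_writes_gb_per_day = host_writes_gb_per_day * write_amplification
    return total_nand_writes_gb / nand_writes_gb_per_day


# The 80GB-visible / 90GB-physical example from the text:
print(overprovisioning_pct(90, 80))               # 12.5
# Hypothetical workload: consumer MLC (5000 cycles), 100GB/day, write amplification 3
print(estimated_lifetime_days(90, 5000, 100, 3))  # 1500.0
```

<p>Halving the write amplification (for instance through more spare area) doubles the estimate in this model, which is the trade-off the paragraph above describes: capacity is given up in exchange for endurance.</p>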
<p>Now comes the important part of the post that I would like to discuss.</p>
<h3>Health Measurement and Failure Predictability</h3>
<p>As you may have noticed from the above, it&#8217;s all the more important to be able to predict when an SSD will fail and to see health-related information about it. Yet I haven&#8217;t found much written about how to gauge the health of an SSD. RAID controllers used with SSDs tend to be very limited in how much information they provide that could help predict when an SSD will fail. However, most SSDs expose a lot of information via S.M.A.R.T., and this can be leveraged to good effect.<br />
Let&#8217;s consider the example of Intel SSDs: these drives have two S.M.A.R.T. attributes that can be leveraged to predict when the SSD will fail. These attributes are:</p>
<ul>
<li>Available_Reservd_Space: This attribute reports the number of reserve blocks remaining. The value of the attribute starts at 100, which means that the reserved space is 100 percent available. The threshold value for this attribute is 10 which means 10 percent availability, which indicates that the drive is close to its end of life.</li>
<li>Media_Wearout_Indicator: This attribute reports the number of erase/write cycles the NAND media has performed. The value of the attribute decreases from 100 to 1, as the average erase cycle count increases from 0 to the maximum rated cycles. Once the value of this attribute reaches 1, the number will not decrease, although it is likely that significant additional wear can be put on the device. A value of 1 should be thought of as the threshold value for this attribute.</li>
</ul>
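<p>Because Media_Wearout_Indicator tracks the average erase count against the rated maximum, its value can be turned into a rough cycle count. This helper assumes the decrease from 100 to 1 is linear, which the attribute description above does not guarantee, and the rated cycle count has to come from the drive&#8217;s specification (30000 below is simply the enterprise MLC figure mentioned earlier, used as a hypothetical rating).</p>

```python
def approx_avg_erase_cycles(media_wearout_value, rated_cycles):
    """Rough average erase/write cycle count implied by Media_Wearout_Indicator.

    Assumes the normalized value falls linearly from 100 (new drive) to 1
    (rated cycle count reached); the attribute stays at 1 after that, so the
    result is capped at rated_cycles."""
    value = max(1, min(100, media_wearout_value))  # clamp to the documented 1..100 range
    return (100 - value) / 99 * rated_cycles


# Example: a drive reporting 67, with a hypothetical rating of 30000 cycles
print(round(approx_avg_erase_cycles(67, 30000)))  # 10000
```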
<p>Using the <a href="http://smartmontools.sourceforge.net/man/smartctl.8.html">smartctl</a> tool (part of the smartmontools package), we can easily read the values of these attributes and then use them to predict failures. For example, for SATA SSDs attached to an LSI MegaRAID controller, we could read the current values of those attributes using the following bash snippet:</p>
<pre class="mysql">
# ${device_id}: MegaRAID device ID of the SSD; /dev/sda: a device on that controller
smart_attrs=$(smartctl -d sat+megaraid,${device_id} -A /dev/sda)
Available_Reservd_Space_current=$(echo "$smart_attrs" | awk '$2 == "Available_Reservd_Space" {print $4}')
Media_Wearout_Indicator_current=$(echo "$smart_attrs" | awk '$2 == "Media_Wearout_Indicator" {print $4}')
</pre>
<p>This information can then be used in different ways: we could raise an alert when a value nears its threshold, or measure how quickly the values decrease and use the rate of decrease to estimate when the drive could fail.</p>
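<p>Both ideas (a threshold alert and a rate-of-decrease estimate) can be prototyped in a few lines of Python. This is a sketch under stated assumptions: the sample smartctl -A text and the daily readings are made up, the alert margin is arbitrary, and a straight-line fit is only a crude model of how wear accumulates. In practice the parser would be fed the real output of the smartctl command shown above, sampled regularly.</p>

```python
# Hypothetical `smartctl -A` excerpt for an Intel SSD; the values are made up.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail Always       -       0
233 Media_Wearout_Indicator 0x0032   097   097   000    Old_age  Always       -       0
"""

# Thresholds described above; ALERT_MARGIN is an arbitrary choice for this sketch.
THRESHOLDS = {"Available_Reservd_Space": 10, "Media_Wearout_Indicator": 1}
ALERT_MARGIN = 10


def parse_smart_values(output):
    """Map ATTRIBUTE_NAME -> normalized VALUE (4th column, same as awk's $4)."""
    values = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].isdigit():  # attribute rows start with an ID
            values[fields[1]] = int(fields[3])
    return values


def near_threshold(values):
    """Names of attributes that are within ALERT_MARGIN of their threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if name in values and limit + ALERT_MARGIN >= values[name]]


def predict_day(samples, threshold):
    """Least-squares fit of value = a + b*day over (day, value) samples; returns
    the day the fitted line reaches `threshold`, or None if it is not decreasing.
    Needs at least two samples taken on different days."""
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(v for _, v in samples) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in samples)
    if sxx == 0:
        raise ValueError("need samples from at least two different days")
    sxy = sum((d - mean_x) * (v - mean_y) for d, v in samples)
    slope = sxy / sxx
    if slope >= 0:
        return None
    intercept = mean_y - slope * mean_x
    return (threshold - intercept) / slope


print(parse_smart_values(SAMPLE_OUTPUT))
# {'Available_Reservd_Space': 99, 'Media_Wearout_Indicator': 97}
print(near_threshold({"Available_Reservd_Space": 18, "Media_Wearout_Indicator": 40}))
# ['Available_Reservd_Space']
# Hypothetical daily readings of Media_Wearout_Indicator: 100, 97, 94 over 60 days.
print(predict_day([(0, 100), (30, 97), (60, 94)], 1))  # roughly 990 days
```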
<h3>SSDs and RAID Levels</h3>
<p>RAID has typically been used with HDDs for data protection via redundancy and for increased performance, and it has found its use with SSDs as well. It&#8217;s common to see RAID 5 or 6 used with SSDs for mixed read/write workloads, because the write penalty these levels carry on rotating disks is much smaller on SSDs: there is no disk seek involved, so the read-modify-write cycle of parity-based RAID levels does not cause much of a performance hit. Striping and mirroring also improve SSD read performance a lot, and redundant arrays built from SSDs deliver far better performance than HDD arrays.<br />
But what about data protection? Do parity-based RAID levels and mirroring provide the same level of data protection for SSDs that they are thought to provide? I am skeptical, because, as mentioned above, the endurance of an SSD depends a lot on how much it has been written to. In parity-based RAID configurations, parity updates generate a lot of extra writes, which of course decrease the lifespan of the SSDs. Similarly, I am not sure mirroring provides any protection against wear-out if both SSDs in the mirror are the same age. Why? Because both SSDs in a mirror receive the same amount of writes, so their lifespans decrease at the same rate and they can be expected to wear out at around the same time.<br />
I think some drastic changes are needed in how we think about data protection and RAID levels, because to me neither parity-based nor mirrored configurations provide extra data protection when the SSDs used are of similar age. It might actually be a good idea to periodically replace drives with younger ones, so that all the drives do not age together.</p>
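<p>The wear argument can be illustrated with a simplified model. It assumes small random writes (so RAID 5 and RAID 6 do a read-modify-write that rewrites the data block plus one or two parity blocks), writes and parity spread evenly across drives, and an identical write budget per drive; the budget and daily write volumes are hypothetical. It shows parity doubling the per-drive write load of a 4-drive array compared with plain striping, and two same-age mirror members exhausting their budget on the same day, while a member swapped for a new drive midway reaches its limit much later.</p>

```python
# Device writes generated per small random host write (read-modify-write for parity).
DEVICE_WRITES_PER_HOST_WRITE = {
    "RAID0": 1,   # data only
    "RAID1": 2,   # data written to both mirrors
    "RAID10": 2,  # data written to both halves of a mirrored pair
    "RAID5": 2,   # new data block + new parity block
    "RAID6": 3,   # new data block + two new parity blocks
}


def per_drive_writes(host_tb_per_day, level, n_drives):
    """TB/day each drive absorbs, assuming writes (and parity) are spread evenly."""
    return host_tb_per_day * DEVICE_WRITES_PER_HOST_WRITE[level] / n_drives


def end_of_life_day(budget_tb, drive_tb_per_day, installed_day):
    """Day on which a drive installed new on `installed_day` uses up its write budget."""
    return installed_day + budget_tb / drive_tb_per_day


# Hypothetical 4-drive arrays receiving 1 TB/day of small random host writes.
print(per_drive_writes(1.0, "RAID0", 4), per_drive_writes(1.0, "RAID5", 4))  # 0.25 0.5

# Two-drive mirror: each member absorbs 1 TB/day, hypothetical 700 TB budget per drive.
rate = per_drive_writes(1.0, "RAID1", 2)
same_age = [end_of_life_day(700, rate, 0), end_of_life_day(700, rate, 0)]
# Second member swapped for a new drive on day 350.
staggered = [end_of_life_day(700, rate, 0), end_of_life_day(700, rate, 350)]
print(same_age)   # [700.0, 700.0]
print(staggered)  # [700.0, 1050.0]
```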
<p>I would like to know what my readers think!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssdperformanceblog.com/2012/10/on-ssds-lifespans-health-measurement-and-raid/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Intel SSD 910 vs HDD RAID in tpcc-mysql benchmark</title>
		<link>http://www.ssdperformanceblog.com/2012/09/intel-ssd-910-vs-hdd-raid-in-tpcc-mysql-benchmark/</link>
		<comments>http://www.ssdperformanceblog.com/2012/09/intel-ssd-910-vs-hdd-raid-in-tpcc-mysql-benchmark/#comments</comments>
		<pubDate>Wed, 12 Sep 2012 04:44:36 +0000</pubDate>
		<dc:creator>Vadim Tkachenko</dc:creator>
				<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[MLC]]></category>

		<guid isPermaLink="false">http://www.ssdperformanceblog.com/?p=261</guid>
		<description><![CDATA[I am continuing my benchmarks of the Intel SSD 910. Last time I compared it with the Fusion-io ioDrive http://www.mysqlperformanceblog.com/2012/09/07/intel-ssd-910-in-tpcc-mysql-benchmark/. Now I want to test this card against RAID over spinning disks. Benchmark date: Sep-2012 Benchmark goal: Test Intel SSD 910 under tpcc-mysql workload and compare with HDD RAID10 Hardware specification Server: Dell PowerEdge R710 CPU: 2x Intel(R) [...]]]></description>
				<content:encoded><![CDATA[<p>I am continuing my benchmarks of the Intel SSD 910. Last time I compared it with the Fusion-io ioDrive (<a href="http://www.mysqlperformanceblog.com/2012/09/07/intel-ssd-910-in-tpcc-mysql-benchmark/">http://www.mysqlperformanceblog.com/2012/09/07/intel-ssd-910-in-tpcc-mysql-benchmark/</a>). Now I want to test this card against RAID over spinning disks.</p>
<p><span id="more-261"></span></p>
<ul>
<li>Benchmark date: Sep-2012</li>
<li>Benchmark goal: Test Intel SSD 910 under tpcc-mysql workload and compare with HDD RAID10</li>
<li>Hardware specification
<ul>
<li>Server: Dell PowerEdge R710</li>
<li>CPU: 2x Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz</li>
<li>Memory: 192GB</li>
<li>Storage: hardware RAID10 over 8 disks (controller: PERC H710; disks: Seagate ST9750420AS 750GB, 2.5&#8243;, 7200RPM, 16MB cache, SATA); Intel SSD 910 (software RAID over 2x200GB devices)</li>
<li>Filesystem: ext4</li>
</ul>
</li>
<li>Software
<ul>
<li>OS: Ubuntu 12.04.1</li>
<li>MySQL Version: Percona Server 5.5.27-28.1</li>
</ul>
</li>
<li>Benchmark specification
<ul>
<li>Benchmark name: tpcc-mysql</li>
<li>Scale factor: 2500W (~250GB of data)</li>
<li>Benchmark length: 2h for SSD, 4h for HDD RAID; only the last 1h is used for the results, to exclude the warm-up phase</li>
</ul>
</li>
<li>Parameters to vary: we vary <strong>innodb_buffer_pool_size</strong>: <em>25, 50, 75GB</em> to get different memory/data ratios, and we test on two storage setups: HDD RAID10 and Intel SSD 910</li>
</ul>
<p><strong>Results</strong><br />
Here is a jitter graph of throughput, sampled every 10 seconds:</p>
<p><a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/09/res_10sec_thrp.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/09/res_10sec_thrp.png" alt="" title="res_10sec_thrp" width="640" height="398" class="aligncenter size-full wp-image-11023" /></a></p>
<p>To estimate the performance gain, I took the median throughput for each case.</p>
<p>Here are the results in text form:</p>
<table border=1>
<tr>
<td>Buffer pool size</td>
<td>HDD RAID10</td>
<td>Intel SSD 910</td>
<td>Ratio (i910/raid)</td>
</tr>
<tr>
<td>25 GB</td>
<td>228</td>
<td>1620</td>
<td>7.11</td>
</tr>
<tr>
<td>50 GB</td>
<td>552</td>
<td>3182</td>
<td>5.76</td>
</tr>
<tr>
<td>75 GB</td>
<td>1094</td>
<td>5729</td>
<td>5.24</td>
</tr>
</table>
<p>So the gain is in the 5-7x range, which is quite decent.</p>
<p>One thing to pay attention to is the density of the results: with RAID they are much more tightly clustered.<br />
So I built a graph where throughput is shown every second:</p>
<p><a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/09/res_1sec_thrp.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/09/res_1sec_thrp.png" alt="" title="res_1sec_thrp" width="640" height="398" class="aligncenter size-full wp-image-11022" /></a></p>
<p>The variation in throughput with the Intel SSD 910 is much bigger, though I am not entirely sure what the main contributor is: the card itself, or MySQL internals plus flushing logic.</p>
<p>All of these results were obtained with <strong>innodb_flush_log_at_trx_commit=2</strong>, which some comments on the previous post called cheating. So I ran another round with <strong>innodb_flush_log_at_trx_commit=1</strong> to see what kind of penalty to expect.<br />
<a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/09/res_10sec_trx12.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/09/res_10sec_trx12.png" alt="" title="res_10sec_trx12" width="640" height="398" class="aligncenter size-full wp-image-11024" /></a></p>
<p>There is some penalty for using <strong>innodb_flush_log_at_trx_commit=1</strong>, but it is not significant.</p>
<p><strong>Conclusion</strong></p>
<p>In conclusion, I see that for its price (around $2000 at the time of publishing) the Intel SSD 910 handles a MySQL workload quite well, and I did not face any problems working with this card. I think the Intel SSD 910 is suitable for use with MySQL / Percona Server, especially if you are looking for a quick performance boost for an IO-heavy workload.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssdperformanceblog.com/2012/09/intel-ssd-910-vs-hdd-raid-in-tpcc-mysql-benchmark/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Intel SSD 910 in tpcc-mysql benchmark</title>
		<link>http://www.ssdperformanceblog.com/2012/09/intel-ssd-910-in-tpcc-mysql-benchmark/</link>
		<comments>http://www.ssdperformanceblog.com/2012/09/intel-ssd-910-in-tpcc-mysql-benchmark/#comments</comments>
		<pubDate>Fri, 07 Sep 2012 16:43:14 +0000</pubDate>
		<dc:creator>Vadim Tkachenko</dc:creator>
				<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[MLC]]></category>

		<guid isPermaLink="false">http://www.ssdperformanceblog.com/?p=255</guid>
		<description><![CDATA[I am continuing my benchmarks of the Intel SSD 910; the raw IO results are available in my previous experiment. Now I want to test this card under a MySQL workload to see whether it is suitable for use with MySQL. Benchmark date: Sep-2012 Benchmark goal: Test Intel SSD 910 under tpcc-mysql workload and compare with baseline [...]]]></description>
				<content:encoded><![CDATA[<p>I am continuing my benchmarks of the Intel SSD 910; the raw IO results are available in <a href="http://www.mysqlperformanceblog.com/2012/09/04/testing-intel-ssd-910/">my previous experiment</a>. Now I want to test this card under a MySQL workload to see whether it is suitable for use with MySQL.</p>
<ul>
<li>Benchmark date: Sep-2012</li>
<li>Benchmark goal: Test Intel SSD 910 under tpcc-mysql workload and compare with baseline Fusion-io ioDrive card</li>
<li>Hardware specification
<ul>
<li>Server: Dell PowerEdge R710</li>
<li>CPU: 2x Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz</li>
<li>Memory: 192GB</li>
<li>Storage: Fusion-io ioDrive 640GB, Intel SSD 910 (software RAID over 2x200GB devices)</li>
<li>Filesystem: ext4</li>
</ul>
</li>
<li>Software
<ul>
<li>OS: Ubuntu 12.04.1</li>
<li>MySQL Version: Percona Server 5.5.27-28.1</li>
</ul>
</li>
<li>Benchmark specification
<ul>
<li>Benchmark name: tpcc-mysql</li>
<li>Scale factor: 2500W (~250GB of data)</li>
<li>Benchmark length: 2h; only the last 1h is used for the results, to exclude the warm-up phase</li>
</ul>
</li>
<li>Parameters to vary: we vary <strong>innodb_buffer_pool_size</strong>: <em>13, 25, 50, 75GB</em> to get different memory/data ratios, and we test on two storage devices: Fusion-io ioDrive and Intel SSD 910</li>
</ul>
<p><strong>Results</strong><br />
Here is a graph of throughput, sampled every 10 seconds:</p>
<p><a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/09/intel910-res.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/09/intel910-res.png" alt="" title="intel910-res" width="640" height="775" class="aligncenter size-full wp-image-10926" /></a></p>
<p>Jitter graph:<br />
<a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/09/jitter-res.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/09/jitter-res.png" alt="" title="jitter-res" width="640" height="398" class="aligncenter size-full wp-image-10934" /></a></p>
<p>To get the final results, I took the total number of transactions over the 1h measurement window:</p>
<table border=1>
<tr>
<td>Buffer pool size</td>
<td>Fusion-io</td>
<td>Intel SSD 910</td>
<td>Ratio (fio/i910)</td>
</tr>
<tr>
<td>13 GB</td>
<td>397157</td>
<td>352750</td>
<td>1.13</td>
</tr>
<tr>
<td>25 GB</td>
<td>724011</td>
<td>497769</td>
<td>1.45</td>
</tr>
<tr>
<td>50 GB</td>
<td>1466559</td>
<td>1124223</td>
<td>1.30</td>
</tr>
<tr>
<td>75 GB</td>
<td>2464135</td>
<td>1939415</td>
<td>1.27</td>
</tr>
</table>
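<p>The ratio column, and the rough 30% figure in the conclusion, can be checked against the raw transaction counts. This snippet only redoes the arithmetic on the numbers in the table: Fusion-io completes about 1.29x as many transactions on average, which corresponds to the Intel card completing roughly 11% to 31% fewer transactions depending on buffer pool size.</p>

```python
# Total transactions over the 1h measurement window, from the table above.
results = {  # buffer pool size (GB): (Fusion-io, Intel SSD 910)
    13: (397157, 352750),
    25: (724011, 497769),
    50: (1466559, 1124223),
    75: (2464135, 1939415),
}

ratios = {bp: fio / i910 for bp, (fio, i910) in results.items()}
for bp, ratio in ratios.items():
    print(f"{bp} GB: Fusion-io/Intel = {ratio:.2f}, "
          f"Intel completes {(1 - 1 / ratio) * 100:.0f}% fewer transactions")

mean_ratio = sum(ratios.values()) / len(ratios)
print(f"mean ratio: {mean_ratio:.2f}")
```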
<p><strong>Conclusion</strong></p>
<p>In conclusion, I see that the Intel SSD 910 handles a MySQL workload quite well, and I did not face any problems working with this card.<br />
The stability of the results is about the same as with the Fusion-io card. The Intel SSD 910 is slower: the Fusion-io card completes 13% to 45% more transactions depending on buffer pool size (about 30% more on average), but that is expected at this price level. I think the Intel SSD 910 is suitable for use with MySQL / Percona Server.</p>
<p><strong>Link to raw results and stats</strong><br />
Raw results, config, OS and MySQL metrics are available from <a href="https://code.launchpad.net/~percona-dev/percona-benchmark-result/intel910-tpcc">Benchmarks Launchpad</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssdperformanceblog.com/2012/09/intel-ssd-910-in-tpcc-mysql-benchmark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Testing Intel® SSD 910</title>
		<link>http://www.ssdperformanceblog.com/2012/09/testing-intel-ssd-910/</link>
		<comments>http://www.ssdperformanceblog.com/2012/09/testing-intel-ssd-910/#comments</comments>
		<pubDate>Wed, 05 Sep 2012 03:15:27 +0000</pubDate>
		<dc:creator>Vadim Tkachenko</dc:creator>
				<category><![CDATA[Intel]]></category>
		<category><![CDATA[MLC]]></category>

		<guid isPermaLink="false">http://www.ssdperformanceblog.com/?p=235</guid>
		<description><![CDATA[Intel has entered the PCIe SSD market with its Intel SSD 910 card. Given the slogan &#8220;The ultimate data center SSD&#8221;, I assume Intel is targeting server-grade hardware rather than the consumer level. I got one of these cards into our lab. I should say it is very price-competitive compared with other enterprise-level PCIe [...]]]></description>
				<content:encoded><![CDATA[<p>Intel has entered the PCIe SSD market with its <a href="http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-910-series.html">Intel SSD 910 card</a>. Given the slogan &#8220;The ultimate data center SSD&#8221;, I assume Intel is targeting server-grade hardware rather than the consumer level.<br />
I got one of these cards into our lab, and I should say it is very price-competitive compared with other enterprise-level PCIe vendors: I paid $2100 for a 400GB card, which works out to $5.25/GB. Of course, I also have some performance numbers I&#8217;d like to share.</p>
<p>But first, a few words on the card internals. Intel uses separate 200GB modules, so the 400GB card is visible to the operating system as 2 x 200GB devices, and the 800GB card as 4 separate devices. You can then combine them with software RAID0, RAID1 or RAID10, whichever you prefer.</p>
<p>For my tests I used a single 200GB device, and a pair of devices combined in software RAID0 (Duo).</p>
<p>For raw IO performance I followed the scripts I used for other reviews, e.g. <a href="http://www.ssdperformanceblog.com/2012/05/testing-intel-ssd-520/">Testing Intel SSD 520</a>.</p>
<p>The first results are for <em>asynchronous</em> writes:<br />
<a href="http://www.ssdperformanceblog.com/wp-content/uploads/2012/09/writes.async_.png"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2012/09/writes.async_.png" alt="" title="writes.async" width="640" height="398" class="aligncenter size-full wp-image-247" /></a></p>
<p>The result averages <strong>150 MiB/sec</strong> for a single device and <strong>250 MiB/sec</strong> for Duo.<br />
I find this interesting, as with the SATA-based Intel 520 I was able to get <strong>300 MiB/sec</strong>.</p>
<p>Now <em>asynchronous</em> reads:</p>
<p><a href="http://www.ssdperformanceblog.com/wp-content/uploads/2012/09/reads.async_.png"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2012/09/reads.async_.png" alt="" title="reads.async" width="640" height="398" class="aligncenter size-full wp-image-244" /></a></p>
<p>The result line is quite stable at <strong>270 MiB/sec</strong> for a single device and <strong>530 MiB/sec</strong> for Duo.<br />
In the same workload, the Intel 520 reaches <strong>370 MiB/sec</strong>.</p>
<p>Now we get to synchronous reads, to see how many threads are needed to reach peak throughput and what the corresponding response times are:</p>
<p>Throughput:<br />
<a href="http://www.ssdperformanceblog.com/wp-content/uploads/2012/09/reads.sync_.thrp_.png"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2012/09/reads.sync_.thrp_.png" alt="" title="reads.sync.thrp" width="640" height="398" class="aligncenter size-full wp-image-245" /></a></p>
<p>Response time:<br />
<a href="http://www.ssdperformanceblog.com/wp-content/uploads/2012/09/reads.sync_.tr_.png"><img src="http://www.ssdperformanceblog.com/wp-content/uploads/2012/09/reads.sync_.tr_.png" alt="" title="reads.sync.tr" width="640" height="398" class="aligncenter size-full wp-image-246" /></a></p>
<p>I would say that for a single device, throughput peaks at 8 threads with a 95th-percentile response time of <strong>0.68ms</strong>, and for Duo at 16 threads with <strong>0.84ms</strong>.</p>
<p>In conclusion, I have mixed feelings after this experiment. On the one hand, the performance results are definitely lower than those of alternative PCIe cards on the market; on the other hand, the price is very attractive.</p>
<p>I am going to run corresponding MySQL-based benchmarks to see how the card compares to alternatives under a database workload.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssdperformanceblog.com/2012/09/testing-intel-ssd-910/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel 520 SSD in MySQL sysbench oltp benchmark</title>
		<link>http://www.ssdperformanceblog.com/2012/05/intel-520-ssd-in-mysql-sysbench-oltp-benchmark/</link>
		<comments>http://www.ssdperformanceblog.com/2012/05/intel-520-ssd-in-mysql-sysbench-oltp-benchmark/#comments</comments>
		<pubDate>Tue, 22 May 2012 00:58:44 +0000</pubDate>
		<dc:creator>Vadim Tkachenko</dc:creator>
				<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[MLC]]></category>

		<guid isPermaLink="false">http://www.ssdperformanceblog.com/?p=233</guid>
		<description><![CDATA[In my raw IO benchmark of the Intel 520 SSD we saw that the drive does not provide uniform throughput and response time, so it is interesting to see how that affects a workload coming from MySQL. I prepared benchmark results for the Sysbench OLTP workload with MySQL running on the Intel 520. You can download them here. [...]]]></description>
				<content:encoded><![CDATA[<p>In my raw <a href="http://www.mysqlperformanceblog.com/2012/05/01/testing-intel-ssd-520/">IO benchmark of the Intel 520 SSD</a> we saw that the drive does not provide uniform throughput and response time, so it is interesting to see how that affects a workload coming from MySQL.<br />
I prepared benchmark results for the Sysbench OLTP workload with MySQL running on the Intel 520.<br />
You can download them <a href="http://www.percona.com/about-us/mysql-white-paper/intel-520-ssd-sysbench-report/">here</a>.<br />
<span id="more-233"></span><br />
Here I want to publish graphs comparing the Intel 520 with a regular RAID10.</p>
<p>Throughput:<br />
<a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/throughput.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/throughput.png" alt="" title="throughput" width="640" height="398" class="aligncenter size-full wp-image-9585" /></a></p>
<p>Response time:<br />
<a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/response-time.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/response-time.png" alt="" title="response-time" width="640" height="398" class="aligncenter size-full wp-image-9586" /></a></p>
<p>So despite the big variation in raw IO, it does not seem to affect the MySQL workload significantly: a single Intel 520 SSD gives much better throughput and response time than a traditional SAS RAID and, interestingly, it is also much cheaper.<br />
The downside of the Intel 520 is that it does not have a capacitor to protect its write cache, so if you worry about data protection in case of a power outage, it is better to disable the write cache on the drive and use the write cache of the RAID controller instead (e.g. LSI-9260).</p>
<p>Benchmarks specification, hardware, scripts and raw results are available in <a href="http://www.percona.com/about-us/mysql-white-paper/intel-520-ssd-sysbench-report/">the full report</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssdperformanceblog.com/2012/05/intel-520-ssd-in-mysql-sysbench-oltp-benchmark/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Testing Fusion-io ioDrive2 Duo</title>
		<link>http://www.ssdperformanceblog.com/2012/05/testing-fusion-io-iodrive2-duo/</link>
		<comments>http://www.ssdperformanceblog.com/2012/05/testing-fusion-io-iodrive2-duo/#comments</comments>
		<pubDate>Thu, 10 May 2012 20:30:48 +0000</pubDate>
		<dc:creator>Vadim Tkachenko</dc:creator>
				<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[MLC]]></category>

		<guid isPermaLink="false">http://www.ssdperformanceblog.com/?p=230</guid>
		<description><![CDATA[I was lucky enough to get my hands on the new Fusion-io ioDrive2 Duo card, so I decided to run the same series of tests I did for other Flash devices. This is the 2.4TB ioDrive2 Duo card, and it is visible to the OS as two devices (1.2TB each), which can be combined via software RAID. [...]]]></description>
				<content:encoded><![CDATA[<p>I was lucky enough to get my hands on the new Fusion-io ioDrive2 Duo card, so I decided to run the same series of tests I did for <a href="http://www.mysqlperformanceblog.com/2012/05/07/testing-fusion-io-iodrive-now-with-driver-3-1/">other Flash devices</a>. This is the 2.4TB ioDrive2 Duo card, and it is visible to the OS as two devices (1.2TB each), which can be combined via software RAID. So I tested in two modes: a single drive, and software RAID-0 over both drives.<br />
<span id="more-230"></span><br />
I should note that to run this card you need external power, for the same reason I mentioned in <a href="http://www.mysqlperformanceblog.com/2012/05/04/testing-virident-flashmax-1400/">the previous post</a>: the PCIe slot can provide only 25W, which is not enough for the ioDrive2 Duo to deliver full performance. I mention this because it may be a challenge for some servers: some models may not have a connector for external power, and for others you may need a special &#8220;power kit&#8221;. So make sure you have compatible hardware before getting the Duo card. I personally ended up with this setup: I use <a href="http://dl.dropbox.com/u/9893083/ppt/SSD/DSCF6739.JPG">a separate power supply</a>.</p>
<p>Fusion-io ioDrive2 Firmware v6.0.0, rev 107004 Public, Fusion-io driver version: 3.1.1.</p>
<p>Now to the results.<br />
For this test I again used a <a href="http://www.percona.com/docs/wiki/benchmark:hardware:cisco_ucs_c250">Cisco UCS C250</a> server, and the graphs show results for both the single drive and the RAID-0 (Duo) setup.</p>
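<p>In the Duo mode, software RAID-0 stripes data across the card's two devices, which is why it can approach twice the single-drive throughput. Below is a simplified Python model of RAID-0 address mapping for equal-size members, offered as an illustration rather than the RAID driver's actual code. The 64 KiB chunk size and the function name <code>map_offset</code> are hypothetical, since the post does not state which chunk size was used.</p>

```python
# Simplified model of RAID-0 striping across equal-size member devices.
# CHUNK_SIZE is a hypothetical value for illustration; the chunk size used
# in the test is not stated in the post.
CHUNK_SIZE = 64 * 1024  # bytes per chunk (assumption)
NUM_DEVICES = 2         # the ioDrive2 Duo is exposed as two devices


def map_offset(logical_offset, chunk_size=CHUNK_SIZE, num_devices=NUM_DEVICES):
    """Map a byte offset on the RAID-0 array to (device index, offset on that device)."""
    chunk_index, within_chunk = divmod(logical_offset, chunk_size)
    device = chunk_index % num_devices   # chunks rotate round-robin across devices
    stripe = chunk_index // num_devices  # stripe row that holds this chunk
    return device, stripe * chunk_size + within_chunk


if __name__ == "__main__":
    # Print where each of the first four chunks lands.
    for chunk in range(4):
        print(chunk, map_offset(chunk * CHUNK_SIZE))
```

<p>Consecutive chunks alternate between device 0 and device 1, so a workload spread over the whole array keeps both devices busy.</p>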
<p>Random writes, async:<br />
<a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/rand-write4.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/rand-write4.png" alt="" title="rand-write" width="640" height="396" class="aligncenter size-full wp-image-9468" /></a></p>
<p>Write performance is stable and predictable, with throughput of <strong>660 MiB/s</strong> for the single drive and <strong>1300 MiB/s</strong> for Duo.</p>
<p>Random reads:<br />
<a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/rand-read3.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/rand-read3.png" alt="" title="rand-read" width="640" height="396" class="aligncenter size-full wp-image-9467" /></a></p>
<p>Again, both modes provide a stable level of throughput: <strong>1350 MiB/s</strong> for the single drive and <strong>2300 MiB/s</strong> for Duo.</p>
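<p>A quick way to read the single vs. Duo numbers is to compute how close the RAID-0 array comes to perfect two-way scaling. The sketch below uses only the throughput figures quoted above; the &#8220;efficiency&#8221; metric (speedup divided by the number of members) and the helper name <code>raid0_scaling</code> are illustrative choices, not something defined in the post.</p>

```python
# Throughput figures (MiB/s) for (single drive, Duo) as quoted in this post.
RESULTS = {
    "random write (async)": (660, 1300),
    "random read": (1350, 2300),
}


def raid0_scaling(single, duo, members=2):
    """Return (speedup, efficiency) of the RAID-0 array relative to one member."""
    speedup = duo / single
    efficiency = speedup / members  # 1.0 would mean perfectly linear scaling
    return speedup, efficiency


for workload, (single, duo) in RESULTS.items():
    speedup, efficiency = raid0_scaling(single, duo)
    print(f"{workload}: {speedup:.2f}x speedup, {efficiency:.0%} of linear")
```

<p>With these figures, random writes scale by about 1.97x (roughly 98% of linear), while random reads scale by about 1.70x (roughly 85% of linear).</p>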
<p>Next, random reads with synchronous IO, broken down by number of threads:<br />
<a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/read-sync-thrp1.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/read-sync-thrp1.png" alt="" title="read-sync-thrp" width="640" height="396" class="aligncenter size-full wp-image-9470" /></a></p>
<p><a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/read-sync-rt-1.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2012/05/read-sync-rt-1.png" alt="" title="read-sync-rt " width="640" height="396" class="aligncenter size-full wp-image-9471" /></a></p>
<p>Response times are also excellent: <strong>0.25ms and 0.19ms</strong> at 8 threads for the single-drive and Duo cases, respectively.</p>
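<p>For a synchronous benchmark, response time and throughput are linked by Little's law: if each thread keeps exactly one request outstanding and has no think time, then IOPS is roughly the number of threads divided by the mean latency. The sketch below applies this to the 8-thread response times quoted above, assuming they are mean latencies; the helper name <code>sync_iops</code> is hypothetical, and the result is a back-of-the-envelope cross-check, not a measured number.</p>

```python
def sync_iops(threads, latency_ms):
    """Estimate IOPS of a closed-loop synchronous benchmark via Little's law.

    Assumes each thread has exactly one IO in flight and no think time,
    so concurrency == threads and IOPS == concurrency / latency.
    """
    return threads / (latency_ms / 1000.0)


# 8 threads with the response times quoted in the post (assumed to be means).
for label, latency_ms in (("single", 0.25), ("Duo", 0.19)):
    print(f"{label}: ~{sync_iops(8, latency_ms):,.0f} IOPS")
```

<p>This gives roughly 32,000 IOPS for the single drive and roughly 42,000 IOPS for Duo at 8 threads; converting these to MiB/s would require the block size, which is not stated here.</p>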
<p>In general, the ioDrive2 appears to deliver better and more stable performance than the <a href="http://www.mysqlperformanceblog.com/2012/05/07/testing-fusion-io-iodrive-now-with-driver-3-1/">previous-generation ioDrive</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssdperformanceblog.com/2012/05/testing-fusion-io-iodrive2-duo/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
