Category Archives: SLC

Testing the Micron P320h

The Micron P320h SSD is an SLC-based PCIe solid-state storage device which claims to provide the highest read throughput of any server-grade SSD, and at Micron’s request, I recently took some time to put the card through its paces, and the numbers are indeed quite impressive.

For reference, the benchmarks for this device were performed primarily on a Dell R710 with 192GB of RAM and two Xeon E5-2660 processors that yield a total of 32 virtual cores.  This is the same machine which was used in my previous benchmark run.  A small handful of additional tests were also performed using the Cisco UCS C250. The operating system in use was CentOS 6.3, and for the sysbench fileIO tests, the EXT4 filesystem was used.  The card itself is the 700GB model.

So let’s take a look at the data.

With the sysbench fileIO test in asynchronous mode, read performance is an extremely steady 3202MiB/sec with almost no deviation. Write performance is also both very strong and very steady, coming in at a bit over 1730MiB/sec with a standard deviation of a bit less than 13MiB/sec.

realssd-asyncIO

When we calculate in the fact that the block size in use here is 16KiB, these numbers equate to over 110,000 write IOPS and almost 205,000 read IOPS.

When we switch over to synchronous IO, we find that the card is quite capable of matching the asynchronous performance:

syncIO-throughput

Synchronous read reaches peak capacity somewhere between 32 and 64 threads, and synchronous write tops out somewhere between 64 and 128 threads. The latency numbers are equally impressive; the next two graphs show 95th and 99th-percentile response time, but there really isn’t much difference between the two.

syncIO-latency

At 64 read threads, we reach peak performance with latency of roughly 0.5 milliseconds; and at 128 write threads we have maximum throughput with latency just over 3ms.

How well does it perform with MySQL?  Exact results vary, depending upon the usual factors (read/write ratio, working set size, buffer pool size, etc.) but overall the card is extremely quick and handily outperforms the other cards that it was tested against. For example, in the graph below we compare the performance of the P320h on a standard TPCC-MySQL test to the original FusionIO and the Intel i910 with assorted buffer pool sizes:

tpcc-mysql-devicecompare

And in this graph we look at the card’s performance on sysbench OLTP:

sysbench-oltp-ext4xfs

It is worth noting here that EXT4 outperforms XFS by a fairly significant margin. The approximate raw numbers, in tabular format, are:

- EXT4 XFS
13GiB BP 22000 7500
25GiB BP 17000 9000
50GiB BP 21000 11000
75GiB BP 25000 15000
100GiB BP 31000 19000
125GiB BP 36000 25000

In the final analysis, there may or may not be faster cards out there, but the Micron P320h is the fastest one that I have personally seen to date.

MySQL 5.5.8 and Percona Server on Fast Flash card (Virident tachIOn)

Crossposted from MySQLPerformanceBlog.

This is to follow up on my previous post and show the results for MySQL 5.5.8 and Percona Server on the fastest hardware I have in our lab: a Cisco UCS C250 server with 384GB of RAM, powered by a Virident tachIOn 400GB SLC card.

To see different I/O patterns, I used different innodb_buffer_pool_size settings: 13G, 52G, an 144G on a tpcc-mysql workload with 1000W (around 100GB of data). This combination of buffer pool sizes gives us different data/memory ratios (for 13G – an I/O intensive workload, for 52G – half of the data fits into the buffer pool, for 144G – the data all fits into memory). For the cases when the data fits into memory, it is especially important to have big transactional log files, as in these cases the main I/O pressure comes from checkpoint activity, and the smaller the log size, the more I/O per second InnoDB needs to perform.

So let me point out the optimizations I used for Percona Server:

  • innodb_log_file_size=4G (innodb_log_files_in_group=2)
  • innodb_flush_neighbor_pages=0
  • innodb_adaptive_checkpoint=keep_average
  • innodb_read_ahead=none

For MySQL 5.5.8, I used:

  • innodb_log_file_size=2000M (innodb_log_files_in_group=2), as the maximal available setting
  • innodb_buffer_pool_instances=8 (for a 13GB buffer pool); 16 (for 52 and 144GB buffer pools), as it is seems in this configuration this setting provides the best throughput
  • innodb_io_capacity=20000; a difference from the FusionIO case, it gives better results for MySQL 5.5.8.

For both servers I used:

  • innodb_flush_log_at_trx_commit=2
  • ibdata1 and innodb_log_files located on separate RAID10 partitions, InnoDB datafiles on the Virident tachIOn 400G card

The raw results, config, and script are in our Benchmarks Wiki.
Here are the graphs:

13G innodb_buffer_pool_size:

In this case, both servers show a straight line, and it seems having 8 innodb_buffer_pool_instances was helpful.

52G innodb_buffer_pool_size:

144G innodb_buffer_pool_size:

The final graph shows the difference between different settings of innodb_io_capacity for MySQL 5.5.8.

Small innodb_io_capacity values are really bad, while 20000 allows us to get a more stable line.

In summary, if we take the average NOTPM for the final 30 minutes of the runs (to avoid the warmup stage), we get the following results:

  • 13GB: MySQL 5.5.8 – 23,513 NOTPM, Percona Server – 30,436 NOTPM, advantage: 1.29x
  • 52GB: MySQL 5.5.8 – 71,774 NOTPM, Percona Server – 88,792 NOTPM, advantage: 1.23x
  • 144GB: MySQL 5.5.8 – 78,091 NOTPM, Percona Server – 109,631 NOTPM, advantage: 1.4x

This is actually the first case where I’ve seen NOTPM greater than 100,000 for a tpcc-mysql workload with 1000W.

The main factors that allow us to get a 1.4x improvement in Percona Server are:

  • Big log files. Total size of logs are: innodb_log_file_size=8G
  • Disabling flushing of neighborhood pages: innodb_flush_neighbor_pages=0
  • New adaptive checkpointing algorithm innodb_adaptive_checkpoint=keep_average
  • Disabled read-ahead logic: innodb_read_ahead=none
  • Buffer pool scalability fixes (different from innodb_buffer_pool_instances)

We recognize that hardware like the Cisco UCS C250 and the Virident tachIOn card may not be for the mass market yet, but
it is a good choice for if you are looking for high MySQL performance, and we tune Percona Server to get the most from such hardware. Actually, from my benchmarks, I see that the Virident card is not fully loaded, and we may benefit from running two separate instances of MySQL on a single card. This is a topic for another round.

(Edited by: Fred Linhoss)

Write performance on Virident tachIOn card

One of the biggest problems with solid state drives is that write performance may drop significantly with decreasing free space. I wrote about this before (http://www.ssdperformanceblog.com/2010/07/free-space-and-write-performance/), using a
FusionIO 320GB Duo card as the example. In that case, when space utilization increased from 100GB to 200GB, the write performance
dropped 2.6 times.

In this regard, Virident claims that tachIOn cards provide “Sustained, predictable random IOPS – Best in the Industry”. Virident generously provided me model 400GB, and I ran the benchmark using the
same methodology as in my experiment with FusionIO, which was stripped for performance. Also using my script, Virident made runs on tachIOn 200GB and 800GB model cards and shared numbers with me ( to be clear I can certify only numbers for 400GB card, but I do not have reasons to question the numbers for 200GB and 800GB, as they corresponding to my results).

The benchmarks was done on Cisco UCS C250 box and raw results are on Benchmarks Wiki



Visually, the drop is not as drastic as it was in the case using FusionIO, but let’s get some numbers.
I am going to take the performance numbers at the points where the available space of the card is 1/3, 1/2, and 2/3 filled, as well as at the point where the card is full. Then I will compute the ratio of each of those IOS numbers to the IOS at the 1/3 point.

**For the 400GB tachIOn card:**

size Throughput, MiB/sec ratio
130 959.17
200 849.58 1.13
260 685.18 1.40
360 417.33 2.29

That is, at the 2/3 point, the 400GB card is slower by 29% than at the 1/3 point, and at full capacity it is slower by 57%.

Observations from looking at the graph:

* You can also see the card never goes below 400MB/sec, even when working at full capacity. This characteristic (i.e., high throughput at full capacity) is very important to know if you are looking to use an SSD card as a cache layer (say, with FlashCache), as, usually for caching, you will try to fill all available space.
* The ratio between the 1/3 capacity point and full capacity point is much smaller compared with FusionIO Duo (without additional spare capacity reservation).
* Also, looking at the graphs for Virident and comparing with the graphs for FusionIO, one might be tempted to say that Virident just has a lot of space reserved internally which is not exposed to the end user, and this is what they use to guarantee a high level of performance. I checked with Virident and they tell me that this is not the case. Actually from diagnostic info on Wiki page you can see: tachIOn0: 0×8000000000 bytes (512 GB), which I assume total installed memory. Regardless, it is not a point to worry about. For pricing, Virident defines GB as the capacity available for end users. So, a competitive $/GB level is maintained, and it does not matter how much space is reserved internally.

Now it would be interesting to compare results with results for FusionIO. As cards have different capacity I made graph which shows
throughput vs percentage of used capacity for both cards, FusionIO 320GB Duo SLC and Virident 400GB SLC

Util % Duo 320GB tachIOn 400GB advantage percent
20% 1,095 990 90%
30% 1,006 980 97%
40% 825 964 117%
50% 613 872 142%
60% 397 783 197%
70% 308 669 217%
80% 237 611 258%
90% 117 502 429%
100% 99 417 421%

In conclusion:
* On single Virident card I see random write throughput close or over 1GB/sec in with low space usage and it is comparable with throughput I’ve got on stripped FusionIO card. I assume Virident maintain good level of parallelism internally.
* Virident card maintains very good throughput level in close to full capacity mode, and that means you do not need to worry ( or worry less) about space reservation or formatting card with less space.

SLC and MLC

All modern solid state drives use NAND memory based on SLC (single level cell) or MLC (multi level cell) technologies.
Not going into physical details – SLC basically stores 1 bit of information, while MLC can do more. Most popular option for MLC is 2 bit, and there is movement into 3 bit direction.

This fact gives us next characteristics:

  • SLC provides less capacity
  • SLC is more expensive
  • SLC is know to have better quality chip, it fails less than MLC

Along with that there is also limitation on amount of write operations. SLC can handle about 100,000 write cycles, while MLC is 10,000 ( the numbers are rough, and changing with technology improvement)

No wonder that vendor very quickly come with next separation:

  • SLC for enterprise market ( servers )
  • MLC for consumer market ( desktops, workstations, laptops)

As obvious example here is Intel SSD cards: X25-E ( SLC) is sold as enterprise level card, and X25-M ( MLC ) is  sold for mass market. As another example of difference in capacity and price:

  • FusionIO 160GB SLC card price $6,308.99
  • FusionIO 320GB MLC card price $6,729.99

That is for the same price MLC card comes with doubled capacity.

However with increasing capacity difference between MLC and SLC is getting fuzzier. For MLC most critical part is software (firmware) algorithm which ensures a uniform usage of available NAND chips, and with bigger capacity it is much easier to implement.
This problem with handling lifetime and manage write cycle for MLC opened way for hardware solution like SandForce controller and recently Anabit announced “Memory Signal Processing (MSP™) technology enables MLC-based solutions at SLC-grade reliability and performance”.

Also important is increasing capacity for MLC devices, for example, if we take 10,000 writes vs 100,000 writes than to provide the same life time MLC would need about 10x more capacity, and
it seems not problem. I expect soon we will see MLC cards with 1600GB, which ideally will have the same lifetime as SLC 160GB cards.

On this way interesting to see Intel announces enterprise line for SSD card will be based on
eMLC
( enterprise MLC ), where each cell has 30,000 writes lifetime and with maximal capacity 400GB

So it seems market is gradually moving into “MLC is ready for enterprise” direction, and sounds as good option to have devices with high capacity and reasonable price in near future.

Some articles on this topics: