I was recently asked what maximum number of transactions per second we can get from MySQL with the XtraDB / InnoDB storage engine on a high-end server. A good question, though not an easy one to answer, as it depends on:

– the durability setting ( innodb_flush_log_at_trx_commit = 0 or 1 )
– whether we use binary logs ( I used ROW-based replication for 5.1 )
– whether we use the sync_binlog option

So why not take these as variable parameters and run a simple benchmark?
I took the sysbench update_key scenario ( updating an indexed field on a simple table )
and used a Dell PowerEdge R900 with 16 cores, FusionIO as storage for the table, and RAID 10 with BBU as storage for the InnoDB log files, the InnoDB system tablespace, and the binary logs. I used Percona-XtraDB-5.1.43-9.1 for the benchmarks. All partitions used are formatted as XFS and mounted with the nobarrier option.
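
For completeness, the filesystem setup amounts to something like this ( a minimal sketch; the device name and mount point are placeholders, and nobarrier is only safe because the controller write cache is battery-backed ):

  # Format the data partition as XFS and mount it without write barriers.
  # Disabling barriers is acceptable only with a BBU-protected write cache.
  mkfs.xfs /dev/fioa1
  mount -o nobarrier /dev/fioa1 /mnt/data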

I ran update_key for various numbers of threads and with the following parameters:

  • trx_commit=0 : innodb_flush_log_at_trx_commit = 0 and no binary logs
  • trx_commit=1 : innodb_flush_log_at_trx_commit = 1 and no binary logs
  • trx_commit=0 & binlog : innodb_flush_log_at_trx_commit = 0 and binary logs
  • trx_commit=1 & binlog : innodb_flush_log_at_trx_commit = 1 and binary logs
  • trx_commit=1 & binlog & sync_bin : innodb_flush_log_at_trx_commit = 1 and binary logs and sync_binlog=1
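
For reference, one way to drive such a run looks roughly like this ( a sketch assuming sysbench 0.4's nontrx OLTP mode; the table size, thread count, and credentials are placeholders, not necessarily what I used ):

  # Set the durability knobs for a given combination; both variables are
  # dynamic in 5.1. Binary logging itself (log-bin) is enabled in my.cnf
  # and requires a server restart.
  mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 1"
  mysql -e "SET GLOBAL sync_binlog = 1"

  # update_key scenario: single-row UPDATE of an indexed column.
  # Assumes the test table was created first with "sysbench ... prepare".
  sysbench --test=oltp --oltp-test-mode=nontrx --oltp-nontrx-mode=update_key \
    --oltp-table-size=1000000 --num-threads=64 \
    --max-time=60 --max-requests=0 \
    --mysql-user=root --mysql-db=sbtest run
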
    Here are the results I got:

    I found the results quite interesting.
    With innodb_flush_log_at_trx_commit = 0 the maximal result is 36332.02 tps, which drops to 23115.04 tps as
    we switch to innodb_flush_log_at_trx_commit = 1. As we use RAID10 with BBU, I did not expect the drop to be this significant. In the second case InnoDB spends time flushing the transaction log to disk on each commit, which serializes commits.

    With binary logs enabled, the result drops to 17451.01 tps with innodb_flush_log_at_trx_commit = 0 and to 12097.39 tps with innodb_flush_log_at_trx_commit = 1. So with binary logs the serialization gets even worse.

    Enabling sync_binlog makes things really bad, and the maximal result I got is
    3086.7 tps. So it is only a good decision if binary log protection is worth such a drop.

    UPDATE ( 3/4/2010 )

    Results with innodb_flush_log_at_trx_commit = 2

    Results with innodb_flush_log_at_trx_commit = 2 and binlogs

    10 Comments

    What about innodb_flush_log_at_trx_commit=2 ? That’s a common optimization technique used to reduce risk (although it’s still technically not 100% ACID compliant, it’s consistent/durable to within a second, give or take, depending on the filesystem’s journaling, etc.)
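
    A minimal sketch of switching to that mode between runs ( the variable is dynamic in 5.1, so no restart is needed ):

    # innodb_flush_log_at_trx_commit=2: write the log to the OS cache at each
    # commit, but fsync it to disk only about once per second.
    mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2"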

    Nicolas Steinmetz

    Hi,

    A few questions:
    – should innodb_flush_log_at_trx_commit = 2 change anything here?
    – if you use replication, by default tutorials put you on the green curve. Would you advise moving to “trx_commit=1 & binlog” or even “trx_commit=0 & binlog”?

    Thanks,
    Nicolas

    Raine

    Hi Vadim! Really cool article!

    BTW, I was wondering how XtraDB and the like behave with different RAID-10 stripe sizes, given that a lot of Linux distros don’t have their default partition boundaries aligned on RAID stripe sizes (due to the 63-sector offset from an msdos partition table). Have you ever considered (or have any plans for) making those tests? It would be great to see how XtraDB behaves on unaligned vs aligned systems.

    I’ve found some interesting links:
    http://www.linux.com/archive/articles/140734
    http://www.linuxfoundation.org/collaborate/workgroups/linux-raid/raid_setup
    http://kbase.redhat.com/faq/docs/DOC-2893 { unfortunately didn’t find anything like this for XFS 🙁 }

    Best regards,
    Raine
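
    Regarding the alignment question above, a minimal sketch of what an aligned setup could look like ( the device name, the 64 KB chunk size, and the 2 effective data disks are assumptions ):

    # Start the partition at 1 MiB so it is aligned to the RAID stripe,
    # instead of the legacy 63-sector offset of an msdos label.
    parted /dev/sdb mklabel gpt
    parted /dev/sdb mkpart primary 1MiB 100%

    # Tell XFS the stripe geometry: su = chunk size, sw = number of data disks.
    mkfs.xfs -d su=64k,sw=2 /dev/sdb1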

    VJ Kumar

    3086.7 tps looks pretty reasonable. Clearly, synchronous IO is bottlenecked on the RAID controller/disk. I am getting similar performance on a RAID-10 with a simple dd command:

    time dd if=/dev/zero of=a.dat oflag=sync count=100000
    100000+0 records in
    100000+0 records out
    51200000 bytes (51 MB) copied, 27.9191 s, 1.8 MB/s

    real 0m27.931s
    user 0m0.047s
    sys 0m3.906s

    About 3.5K writes per sec (100000 writes / 27.9 s ≈ 3580).

    Darius Jahandarie

    That hump at 64 threads in trx_commit=1 is pretty interesting, assuming it isn’t a fluke in the data.

    Andy

    Interesting results.

    1) The huge drop in performance (sometimes over 50%) due to binlog is a surprise. According to the MySQL Manual “Running a server with binary logging enabled makes performance about 1% slower.” (http://dev.mysql.com/doc/refman/5.1/en/binary-log.html)

    What caused the huge discrepancy between the 50% you measured and the 1% in the MySQL manual?

    2) If the binlog IO is the bottleneck, would putting binlog on FusionIO (instead of RAID 10 HDD) speed things up?

    3) What about innodb_support_xa? Will enabling that cause a further significant drop in performance?
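
    A minimal sketch of how 2) and 3) could be tried ( the binlog path is hypothetical; log-bin needs a restart, while innodb_support_xa is dynamic in 5.1 ):

    # my.cnf fragment to move binary logs onto the FusionIO volume:
    #   [mysqld]
    #   log-bin = /mnt/fusionio/binlog/mysql-bin

    # Disable XA coordination between InnoDB and the binlog for a test run.
    # Unsafe if the binlog must stay consistent with InnoDB after a crash.
    mysql -e "SET GLOBAL innodb_support_xa = 0"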

    Peter Zaitsev

    Vadim,

    What filesystem did you use here? I remember EXT3 performing surprisingly badly with sync_binlog=1

    http://www.mysqlperformanceblog.com/2009/01/21/beware-ext3-and-sync-binlog-do-not-play-well-together/

    In any case it looks like sync_binlog has a very high overhead. With innodb_flush_log_at_trx_commit=1 you should have 2 fsyncs per commit for a single client (because of XA); with the binlog it should be 3 as I understand, so the fact that the performance gap is much larger is surprising.

    Does group commit work at the stage of writing to the binary log, or is it only on transaction log commit?
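
    One way to check the fsync-per-commit arithmetic is to count sync calls on the running server; a minimal sketch ( assumes a single mysqld process ):

    # Summarize fsync/fdatasync calls while the benchmark runs; stop with
    # Ctrl-C and divide the call counts by the number of committed transactions.
    strace -f -c -e trace=fsync,fdatasync -p $(pidof mysqld)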

    Eric Stone

    I have been wondering about Power7 / IBM architecture and so-called “high-end” server testing for MySQL.

    Obviously, the Power7 architecture is pound-for-pound more powerful than any Intel platform — Sparc isn’t even a consideration.

    I’d love to see the same tests run on the new 750 series Power7 CPU’s and hardware from IBM.

    My guess is that we’d see another full CLASS of hardware / performance benchmarks unlike any yet seen….

    Mark Callaghan

    I would also like to see results for Power CPUs, at least the ones that have high clock rates.

    Isaac Egglestone

    An important thing here I’m not sure of:

    The CPU doesn’t really matter much here, actually.

    Do you have the write cache on your RAID controller set to write-through or write-back?

    Some of the tests seem too slow for the hardware you have, which is why I’m asking.

    With write-back you don’t usually see that much difference if the RAID controller is properly configured and working; it’s nearly like writing to memory rather than disk.

    Was your BBU in re-learn mode at the time?
    If so, how much write-back cache does your RAID card have?
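
    A minimal sketch of checking both, assuming an LSI controller managed with MegaCli ( other controllers ship their own tools ):

    # Current cache policy (WriteBack vs WriteThrough) for all logical drives.
    MegaCli -LDInfo -Lall -aALL | grep -i 'cache policy'

    # Battery state; a relearn cycle typically forces the cache to WriteThrough.
    MegaCli -AdpBbuCmd -GetBbuStatus -aALL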