Maximum Performance of MySQL and Q4M
I usually blog my rough ideas on my Japanese blog (id:kazuhooku's memos). When I wrote down my thoughts on how to further optimize Q4M, Nishida-san asked me, "how fast is the raw performance without client overhead?" Although that is a bit difficult to answer directly, it is easy to measure the performance of the MySQL core and the storage engine interface, and by deducting their overhead, the raw performance of the I/O operations in Q4M can be estimated. All the benchmarks below were taken on Linux 2.6.18 running on two Opteron 2218s.
First, I measured the raw performance of the MySQL core on my testbed using mysqlslap, which came to 115k queries per second.
$ perl -e 'print "select 1;\n" for 1..10000' > /tmp/select10k.sql && \
  /usr/local/mysql51/bin/mysqlslap --query=/tmp/select10k.sql \
    --socket=/tmp/mysql51.sock --iterations=1 --concurrency=40
Benchmark
        Average number of seconds to run all queries: 3.470 seconds
        Minimum number of seconds to run all queries: 3.470 seconds
        Maximum number of seconds to run all queries: 3.470 seconds
        Number of clients running queries: 40
        Average number of queries per client: 10000
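For reference, the headline number falls out of the mysqlslap output above: 40 concurrent clients each ran 10,000 queries in 3.470 seconds of wall time. A quick sanity check of the arithmetic (all figures are taken from the output above):

```python
# Aggregate throughput of the MySQL core, from the mysqlslap run above:
# 40 concurrent clients x 10000 queries each, 3.470 seconds elapsed.
clients = 40
queries_per_client = 10000
elapsed_sec = 3.470

qps = clients * queries_per_client / elapsed_sec
print(round(qps))  # ~115k queries per second
```

The same arithmetic applies to the later runs; e.g. 400,000 queries in 5.282 seconds gives roughly 76k queries per second for the Q4M select benchmark below.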
And the throughput of single-row selects through the Q4M storage engine was 76k queries per second.
$ perl -e 'print "select * from test.q4m_t limit 1;\n" for 1..10000' > /tmp/select10k.sql && \
  /usr/local/mysql51/bin/mysqlslap --query=/tmp/select10k.sql \
    --socket=/tmp/mysql51.sock --iterations=1 --concurrency=40
Benchmark
        Average number of seconds to run all queries: 5.282 seconds
        Minimum number of seconds to run all queries: 5.282 seconds
        Maximum number of seconds to run all queries: 5.282 seconds
        Number of clients running queries: 40
        Average number of queries per client: 10000
And finally, the queue consumption speed of Q4M (configure options: --with-mt-pwrite --with-sync=no) was 28k messages per second. When I set the --with-sync flag to fsync, the speed was 20k messages per second. Considering that consuming a single row requires two queries (one to retrieve the row and one to remove it), the numbers look quite good to me, although further optimization should be possible.
$ MESSAGES=200000 CONCURRENCY=40 \
  DBI='dbi:mysql:test;mysql_socket=/tmp/mysql51.sock' t/05-multireader.t
1..4
ok 1
ok 2
ok 3
ok 4
Multireader benchmark result:
  Number of messages: 200000
  Number of readers: 40
  Elapsed: 7.040 seconds
  Throughput: 28410.198 mess./sec.
And regarding the question about the raw performance of Q4M, the answer would be that the overhead of consuming a single row is about 30 microseconds in the Q4M core with fsync enabled, and about 15 microseconds when only pwrites are issued.
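The deduction behind those figures can be sketched as follows. This is my reconstruction of the arithmetic, not an exact calculation: the per-message cost is the inverse of the measured consumption throughput, and subtracting the cost of two no-op queries through the MySQL core (measured at about 115k qps above) leaves an estimate of the time spent inside Q4M itself.

```python
# Estimate Q4M's internal overhead per consumed row by deducting the cost
# of two queries through the MySQL core from the measured per-message time.
mysql_core_qps = 115_000     # "select 1" throughput measured above
consume_pwrite_mps = 28_410  # messages/sec with --with-sync=no (pwrite only)
consume_fsync_mps = 20_000   # messages/sec with --with-sync=fsync

# Two queries per consumed message: one to retrieve it, one to remove it.
core_cost_us = 2 / mysql_core_qps * 1e6  # ~17 microseconds of MySQL overhead

pwrite_overhead_us = 1e6 / consume_pwrite_mps - core_cost_us
fsync_overhead_us = 1e6 / consume_fsync_mps - core_cost_us
print(round(pwrite_overhead_us), round(fsync_overhead_us))  # ~18 and ~33
```

These come out around 18 and 33 microseconds, roughly matching the ~15 and ~30 microsecond estimates above once rounding is accounted for.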