We many times wrote about InnoDB scalability problems, this time We are faced with one for MyISAM tables. We saw that several times in synthetic benchmarks but never in production, that’s why we did not escalate MyISAM scalability question. This time working on the customer system we figured out that box with 1 CPU Core is able to handle more queries per second than identical box, but with 4 CPU Cores.
The main query which showed this problem was similar to this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | SELECT name FROM t1, t2 WHERE t2.t1_id = t1.id AND t1.stat=1 AND t2.val = 5 LIMIT 1206,18; mysql> explain SELECT name FROM t1, t2 WHERE t2.t1_id = t1.id AND t1.stat=1 AND t2.val = 5 LIMIT 1206,18\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t2 type: ref possible_keys: val key: val key_len: 4 ref: const rows: 4092 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: t1 type: eq_ref possible_keys: PRIMARY,id key: PRIMARY key_len: 4 ref: scale.t2.t1_id rows: 1 Extra: Using where 2 rows in set (0.00 sec) where CREATE TABLE `t1` ( `id` int(11) unsigned NOT NULL auto_increment, `stat` int(11) unsigned NOT NULL, PRIMARY KEY (`id`), KEY `id` (`id`,`stat`) ) ENGINE=MyISAM; CREATE TABLE `t2` ( `id` int(11) unsigned NOT NULL auto_increment, `val` int(11) unsigned NOT NULL, `name` varchar(100) NOT NULL, `t1_id` int(10) unsigned NOT NULL, PRIMARY KEY (`id`), KEY `val` (`val`) ) ENGINE=MyISAM AUTO_INCREMENT=4097 |
Table t1 contains about 260,000 records, all with stat=1, and t2 contains 4000 records, all with val=5 and different t1_id. It is surely not smart index structure for such data distribution but good enough for performance gotcha illustration purposes.
The benchmark shows following results for MyISAM using MySQL 5.0.45 run on 4 Core System:
Threads | queries/sec |
1 | 161 |
2 | 107 |
4 | 110 |
8 | 121 |
16 | 138 |
As you see running 2-4 threads concurrently we get result by 30% worse than with 1 thread, although it’s only simple select query which should be executed without exclusive table locking. Even on 16 threads we’re getting performance worse than with single query.
The problem in this case is key buffer contention which unlike popular belief not fully fixed by changes done in MySQL 4.1
As Monty explained us in MySQL 4.1 the change to key cache locking was done so disk IO is not done while lock is held, while lock is still held when key block is copied to processing thread local storage on Key Read Request. This lock is per key cache so if you have contention while multiple indexes are used you can create multiple key caches and map those to them. In this case however single index had most of the load.
This is partially proved by oprofile data (case with 4 threads):
123456 samples % app name symbol name2312008 31.3752 libpthread-2.3.4.so pthread_mutex_lock2235465 30.3364 no-vmlinux (no symbols)723200 9.8142 libpthread-2.3.4.so pthread_mutex_unlock237062 3.2171 mysqld key_cache_read215254 2.9211 mysqld find_key_block
As you see 40% of effective CPU time is spent in pthread_mutex_lock / pthread_mutex_unlock.
We could not get oprofile call tree to work on this box so we can only guess where these mutex lock requests come from.
A second confirmation that key_cache is a problem is benchmark run with disabled key_cache (=0).
Results for MyISAM with key_buffer_size=0
Threads | queries/sec |
1 | 128 |
2 | 113 |
4 | 193 |
8 | 196 |
16 | 195 |
The result for 1 thread is decreased and it is expected, but, funny, we have more queries per second for 4, 8, 16 with disabled key_cache.
The results for 2 threads is however quite unexpected. Though we did not have a time to profile it in more details.
Note however even in this case scalability is far from perfect giving only 1.5 times gain with.
The solution we proposed in this case was converting table t1 into InnoDB, and results:
Threads | Queries/sec |
1 | 296 |
2 | 341 |
4 | 544 |
8 | 493 |
16 | 498 |
InnoDB both performs much better in this case (not surprisingly as there is a lot of primary key lookups) but its scalability is not perfect giving less than 2x in peak which happens to be at 4 concurrent threads. So there is still work to do in addition to fixes done in later MySQL 5.0 versions.
We also decided to take a time and see may be brand new Falcon (significantly updated in 6.0.2 release) handles this query:
Threads | Queries/sec |
1 | 51 |
2 | 79 |
4 | 116 |
8 | 142 |
16 | 164 |
As you can see Falcon perform extremely poorly when single query executed being 1/3rd of MyISAM and 1/6th of Innodb. On other hand it scales quite nicely as number of threads increase.
The interesting thing is it shows best performance at 16 threads, showing 50% gain from 4 threads – which is quite unexpected for CPU bound load on system with 4 Cores.
Here is comparison of MyISAM Innodb and Falcon results in the graphical form:
I’ve created bug, for MyISAM key cache contention issue and lets see if there are any plans to have it fixed.
Unlike InnoDb, MyISAM loads up the entire row because it may need ‘name’. This is a problem when you have that ‘LIMIT 1206,18’. That first number is too high. You will see far greater performance when you break it into two queries — create a temp table then join. All that (unused) data will ruin the caches. I had a discussion about this with Money back in the 4.0 days. It is just not important enough since they don’t expect people to use high limits like that. And there is the temp table workaround. Might be able to use subselects to avoid the huge data load as well. In my case there were big blobs that got loaded up though they weren’t needed.
Monty, not Money! Heh… oops.
Steven,
The question in this case not the query optimization – we simply took query (a bit obfuscating) from users application and checked how well different storage engines can run it.
But note in this case there is no blobs so both Innodb and MyISAM will need to read full rows. Innodb has advantage doing join via primary key but this is other story.
OK, though I would be interested to see if you find any differences if you had your tests run with “LIMIT 18” instead of “LIMIT 1206,18”. I find that a high index into a limit plus a join kills all my caches. But then again, I may have blobs on all the tables involved… !
Dear Steven,
May I ask how you would re-write that same query into two queries, a temp table, and a join?
I’d like to see the example.
Michael
Hi Peter,
Thanks for the information. I’m not sure I fully understand the sentence:
“As Monty explained us in MySQL 4.1 the change to key cache locking was done so disk IO is not done while lock is held, while lock is still held when key block is copied to processing thread local storage on Key Read Request.”
Does that mean that the contention doesn’t happen when the index is read from disk to the buffer, but rather each time some thread try to read from the buffer? Put another way, does the amount of contention relate more to the status variable Key_read_requests rather than Key_reads?
Bests, Morten
I have been hearing a lot of anecdotal evidence that multiple cores are either bad or not good. Thanks for the detailed analysis of one case. Here are the three main causes of poor performance in multiple cores:
* Query Cache has locking problems. What was the setting for Query Cache? If it was even turned on and not used, then there were a locks there. (Granted, perhaps only 1 per SELECT.)
* Key_buffer has locking problems, as you clearly note.
* InnoDB has malloc problems — the MySQL version is less efficient than plain ‘ole malloc when you have multiple cores.
I wish there would be a fix in 5.1; I don’t want to have to wait for 6.x and Maria and Falcon.
@Peter,
Thanks for the good article.
@Rick,
> I wish there would be a fix in 5.1;
> I don’t want to have to wait for 6.x and Maria and Falcon.
I could be wrong but Peters results do not give me the impression that Maria or Falcon will fix this.