This is Part 3 of our series. In Part 1 we talked about boosting performance with memcached on top of MySQL, in Part 2 we talked about running 100% outside the database with memcached, and now in Part 3 we are going to look at a possible solution to free you from the database. The solution I am going to discuss here is Tokyo Cabinet and Tyrant.
I am not going to give you a primer or tutorial on Tyrant and Cabinet; there are plenty of those out there already. Instead I want to see what sort of performance we can get compared to MySQL and memcached, and later on other NoSQL solutions. Tokyo actually supports several types of databases: there are hash databases, which are very similar to memcached, a table database, which is similar to your classic database tables where you can add a where clause and search individual columns, and a ton more “database options” beyond just those two. Again, my goal is not to make this a Tokyo Tyrant tutorial but rather to show one potential role it can play.
More details can be found here:
http://1978th.net/tokyotyrant/
http://1978th.net/tokyocabinet/
So if we can get performance similar to memcached with Tokyo Tyrant when using disk-based hash tables, it would be a compelling replacement for our application here. It should provide the same interface and access we saw in memcached, but with disk persistence. So let’s look at the numbers:
Tyrant’s disk-based hash was almost 2x faster than combining memcached and MySQL, and about 20% slower than the all-memory memcached approach. So for this particular application I would have been much better off not storing my data in MySQL and instead looking outside the database for an answer. Now sure, there are other reasons you may want to keep data in the database… but I am trying to get you to think about your application and whether those reasons are really valid. Helping clients pick the right solution is one of the things we do here at Percona. If an application requires a database, great, but if there is a better solution we want to suggest it. It’s our goal to make your application perform optimally.
Finally, one concern you have to have is the scalability of your storage solution. As load, number of threads, and data size increase, how does performance change? One knock on Tokyo vs. memcached is that Tokyo is not distributed by default. That’s not to say we could not shard it based on a hash, or even build an API with the capability built in (or use the memcached clients, which works!)… but native support is lacking. It does support replication, which could make for some rather interesting architectures in the future.
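To make “shard it based on a hash” concrete, here is a minimal sketch of client-side sharding. It is not the code used in these tests: the server addresses and the function names are made up for illustration, and simple modulo placement is only one of several possible schemes. The second function estimates how many keys land on a different server when the server list grows.

```python
import hashlib

# Hypothetical server addresses, for illustration only.
OLD_SERVERS = ["tt1.example.com:1978", "tt2.example.com:1978"]
NEW_SERVERS = OLD_SERVERS + ["tt3.example.com:1978", "tt4.example.com:1978"]


def shard_for(key, servers):
    """Pick a server by hashing the key and taking it modulo the server count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def moved_fraction(keys, old_servers, new_servers):
    """Fraction of keys that map to a different server after the list changes."""
    keys = list(keys)
    moved = sum(
        1 for k in keys if shard_for(k, old_servers) != shard_for(k, new_servers)
    )
    return moved / len(keys)


if __name__ == "__main__":
    sample_keys = ["user:%d" % i for i in range(10000)]
    print(shard_for("user:42", OLD_SERVERS))
    print(moved_fraction(sample_keys, OLD_SERVERS, NEW_SERVERS))
```

With plain modulo placement, going from two servers to four moves roughly half of the keys, so a persistent store needs a migration plan (or a scheme such as consistent hashing) before the server list changes.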
So let’s look at some scalability benchmarks. My server resources are rather limited, but I thought I should try throwing more threads and work at the server until it hit its limit and fell over dead. It’s interesting to see the number of transactions that occur with a given number of threads. Let’s look at some of these:
As expected, the smaller buffer pool struggled. (Why a smaller buffer pool? It simulates a much larger data set: a BP of 256M with 1GB of data can give performance similar to 20GB of data with a 5GB BP.) So with a 256M BP and 4GB of memcached we were well off the numbers we hit with a 4GB BP plus 4GB of memcached, which is expected. Adding more threads, even up to 128, increased overall throughput, but my load average on the server hit 40 and my CPU was pegged across the board. Also interesting: I started to hit bottlenecks in MySQL/InnoDB when I had enough memory but increased the threads from 64 to 128. As time permits I should revisit this with larger datasets and look for areas where Tyrant may stumble a bit.
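The reasoning behind the smaller buffer pool is just a ratio: what matters for this comparison is the share of the data that fits in memory. A quick sketch of that arithmetic (the function name is mine, and the sizes are the ones quoted above):

```python
def buffer_pool_ratio(buffer_pool_mb, data_mb):
    """Share of the data set that fits in the buffer pool."""
    return buffer_pool_mb / data_mb


# 256M buffer pool over 1GB of data (the test setup).
small = buffer_pool_ratio(256, 1024)
# 5GB buffer pool over 20GB of data (the larger system being simulated).
large = buffer_pool_ratio(5 * 1024, 20 * 1024)

if __name__ == "__main__":
    print(small, large)
```

Both setups keep a quarter of the data in the buffer pool, which is why the small one is a reasonable stand-in for the large one. This ignores effects that do not scale linearly, such as index depth and the shape of the access pattern.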
Bottom line: given a specific application and data pattern, sometimes a relational database is not the appropriate place for storing data. A tool like Tokyo Tyrant may not be for everyone or every application, but neither is a relational database. Before building your next application, try to understand whether an RDBMS is really needed or not.
How did I do these tests?
The above numbers were run with 32 threads. Tyrant was started with 8 threads and 128M of memory, memcached (1.4) was started with 16 threads, and MySQL was 5.1 XtraDB. Each environment had 2 tables, each with 2 million rows, and the data was identical. memcached and Tyrant stored a comma delimited string to represent each row. MySQL was running with 256M allocated to the InnoDB buffer pool unless otherwise noted.
What’s next? I am going to try to continue this series by exploring and benchmarking other NoSQL options and comparing them to database-based solutions. I think showing the performance of a couple of different Tokyo database formats would also be interesting. What other solutions are people interested in? I know I have gotten a lot of requests for Cassandra numbers, but what else? Drop a comment and let me know!
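For readers who want to picture what “a comma delimited string to represent the row” looks like in a key-value store, here is a minimal sketch. It is not the benchmark code: the column values are invented, and using the csv module for quoting is my assumption so that values containing commas survive a round trip.

```python
import csv
import io


def encode_row(values):
    """Turn a list of column values into one comma delimited string."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(values)
    return buf.getvalue()


def decode_row(text):
    """Split a comma delimited string back into column values (all strings)."""
    return next(csv.reader([text]))


if __name__ == "__main__":
    # Hypothetical row: id, name, and a free-text field containing a comma.
    stored = encode_row([42, "alice", "likes a, b"])
    print(stored)
    print(decode_row(stored))
```

Note that everything comes back as a string, so the application has to convert types itself, and the text form takes more space than a packed binary encoding would.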
Interested in Redis: http://code.google.com/p/redis/
Sorry guys, I had to yank this back and replace one of the graphs as I had an old version out there… it’s fixed now.
Great articles! I’m interested in Project Voldemort. 🙂
Cassandra, Redis (if there are any improvements on the massive memory consumption), MongoDB, HamsterDB..
Here’s a big list if you need inspiration:
http://internetmindmap.com/database_software#key-value
I thought that Tokyo Cabinet was limited to 1 writer or N readers. Is that a performance problem when the writer has to do disk IO? For example, can the writer block waiting for IO and does it continue to have exclusive access while blocked?
Comparing MySQL with Tokyo Tyrant using a comma delimited string is not fair, since it takes way more space per row. Try using a compact binary serialization.
Great article! It would be interesting to see how Memcached compares to an in memory Tokyo Tyrant hash. For persistence the in memory Tokyo Tyrant instance can replicate to a disk based tyrant hash on another server.
Can you post the ttserver arguments you started Tokyo Tyrant with?
I think you’re glossing over the scale-out concerns. People should be sternly aware that having two TT servers, then moving to four, is going to be a real bitch. Yes, you can use the memcached client, which distributes the queries, but if you’re going from two to four and you expect that data to persist, you’re screwed. You end up redistributing your data and treating half of it as though it were new, despite it being in the old cluster.
Hi! Very interesting indeed.
I would be interested in a comparison to Berkeley DB, a non-SQL database which used to be supported as an engine in MySQL and is now also provided by Oracle.
Thanks for the articles.
I’ve been struggling with and tweaking MySQL 5.1.39 with InnoDB plugin 1.0.4 for high performance. It indeed scored way better than vanilla 5.1.x, but it’s not just overall performance that I’m looking for. My app has to respond well before 100ms.
So in all these tests, how quickly was the app able to respond? What could be a solution or approach (rather than putting more and more stuff in RAM), as my app is using well over 50G? The performance of Tokyo Tyrant and the other variants that you are testing could be useful for people who have a time constraint on their app’s response.
Definitely mongo !
Redis is much less interesting, because it needs to fit everything in RAM.
Hi,
What about Riak ?
I’m interested in MongoDB
@Uriel, I understand the issues with uncompressed comma delimited strings… but a compact format should only be even faster, right? So in the worst case it’s still faster than stock MySQL.
@dormando Not glossing over the scalability difficulties… any project requires you to plan for future growth. If you do not do this ahead of time, the potential for disaster or a painful scale-out increases dramatically. The plan to scale out Tyrant or MySQL will hit similar challenges at a certain point. For instance, if you did not plan to shard MySQL and now have to, that’s a big, painful task for a large dataset. Similarly for Tyrant. I am not aiming this blog post to be an all-encompassing Tyrant playbook; there are too many variables out there for that. Instead, this is designed to show there are alternatives to MySQL and that some of these can be faster in certain situations. I agree performance is one factor of many when deciding on an architecture; it’s a big one, but only one. If Tyrant were not faster than MySQL, most people would not even consider it… so hopefully this solicits feedback and gets people talking, thinking outside the database, etc.
would be interested in couchdb and mongodb
One more vote for testing Redis
We tried Tokyo a few months ago to replace some MySQL databases and found that requests block and time out when you do a sync to disk. Normally Tokyo will work in memory and only when you call sync will it flush everything to disk. Response times which are normally around a few milliseconds go up to many seconds while Tokyo is synching. Have you taken sync into account for your benchmarks?
I think we should examine redis
Redis is a key-value database, but it supports persistence and has clients for PHP, Ruby, Perl, Java, Python, Tcl…
Some of the clients support sharding (no need to re-implement that).
and there is also terracotta …
Best Regards
Great articles! I appreciate the thoroughness of each experiment and the discussion of methodology and outcomes. Looking forward to the next tests.
Voldemort + BDB, please and pure BDB (1 server).
I second the CouchDB request.
Hi, maybe you should try MemcacheDB (http://memcachedb.org/) too; it uses Berkeley DB as its persistence layer.
+1 for MongoDB