This document describes techniques for scaling out a web application across multiple database shards. It introduces Incline, which handles denormalization and data replication between shards transparently using triggers and queue tables. It also discusses Pacific utilities like mysqld_jumpstart for provisioning MySQL instances and pacific_divide for splitting shards without downtime. Incline addresses issues with complex queries and consistency across shards under an eventual consistency model.
1. A Clever Way to Scale-out a Web Application
Cybozu Labs, Inc.
Kazuho Oku
2. RDB sharding
denormalization is inevitable
[Figure: three shards (uid:1-2000, uid:2001-4000, uid:4001-6000), each holding tweet, following, followed_by and timeline tables]
when uid:123 tweets, write his tweet, read the uids of his followers, and update the timeline table of each follower
Sep 11 2009 A Clever Way to Scale-out a Web Application 2
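Below is a minimal in-memory sketch of this write path. It assumes the three uid ranges from the figure and that a user's followed_by rows live on that user's own shard. The names shard_for, new_shards and post_tweet are hypothetical, and Python dicts stand in for the MySQL tables.

```python
from bisect import bisect_left

# Hypothetical range-based shard map from the figure:
# shard 0 = uid 1-2000, shard 1 = uid 2001-4000, shard 2 = uid 4001-6000.
SHARD_UPPER_BOUNDS = [2000, 4000, 6000]


def shard_for(uid: int) -> int:
    """Return the index of the shard whose (inclusive) uid range holds uid."""
    idx = bisect_left(SHARD_UPPER_BOUNDS, uid)
    if uid < 1 or idx == len(SHARD_UPPER_BOUNDS):
        raise KeyError(f"uid {uid} is outside every shard range")
    return idx


def new_shards():
    """Each shard holds its own copy of every table (lists of dict rows)."""
    return [{"tweet": [], "followed_by": [], "timeline": []} for _ in SHARD_UPPER_BOUNDS]


def post_tweet(shards, uid, tweet_id, ctime):
    """Fan-out on write: returns the set of shard indexes that were written."""
    home = shards[shard_for(uid)]
    # 1. write the tweet to the author's own shard
    home["tweet"].append({"tweet_id": tweet_id, "user_id": uid, "ctime": ctime})
    # 2. read the author's followers (stored on the author's shard)
    followers = [r["follower_id"] for r in home["followed_by"] if r["user_id"] == uid]
    # 3. denormalize: add a timeline row on each follower's shard
    touched = {shard_for(uid)}
    for follower in followers:
        target = shard_for(follower)
        shards[target]["timeline"].append(
            {"user_id": follower, "ctime": ctime, "tweet_id": tweet_id, "tweet_user_id": uid}
        )
        touched.add(target)
    return touched


shards = new_shards()
shards[shard_for(123)]["followed_by"] += [
    {"user_id": 123, "follower_id": f} for f in (456, 2500, 5000)
]
print(sorted(post_tweet(shards, 123, tweet_id=1, ctime=1252619462)))  # [0, 1, 2]
```

Because uid:123 has followers on all three shards, this single tweet writes to every node. That cost is why the choice between synchronous and asynchronous updates matters.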
3. Two methods to update the shards
eventual consistency
asynchronous updates using worker processes
pros: fast response, high scalability
cons: hard to maintain
2-phase commit
synchronous updates
pros: synchronous, doesn't require an external daemon
cons: slow response
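The following toy sketch contrasts the two methods using in-memory participants. Participant, two_phase_commit, enqueue_async and run_worker are hypothetical names. A real 2-phase commit (for example, MySQL's XA transactions) would also need durable prepare records and crash recovery, and this sketch omits both.

```python
from collections import deque


class Participant:
    """A hypothetical in-memory shard that can stage a row before committing it."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.rows = []
        self.staged = None

    def prepare(self, row):
        if self.fail_prepare:
            return False  # vote "no"
        self.staged = row  # vote "yes": staged, not yet visible
        return True

    def commit(self):
        self.rows.append(self.staged)
        self.staged = None

    def rollback(self):
        self.staged = None


def two_phase_commit(participants, row):
    """Synchronous: the caller waits for every vote, then every commit."""
    prepared = []
    for p in participants:  # phase 1: prepare
        if not p.prepare(row):
            for q in prepared:
                q.rollback()
            return False
        prepared.append(p)
    for p in prepared:  # phase 2: commit
        p.commit()
    return True


def enqueue_async(queue, participants, row):
    """Eventual consistency: record the work and return at once."""
    for p in participants:
        queue.append((p, row))


def run_worker(queue):
    """A worker drains the queue later; until then the shards lag behind."""
    while queue:
        p, row = queue.popleft()
        p.rows.append(row)
```

With 2-phase commit, the response time depends on the slowest participant. With the queue, the caller returns immediately, but somebody has to run and maintain the worker.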
4. The problems
complex queries
reading from / writing to multiple DB nodes
cannot use secondary indexes across shards
need to maintain per-user views (denormalized tables)
maintaining consistency between the nodes
when using the eventual consistency model
dynamic scaling
adding new nodes without stopping the service
6. Incline
solution for the two problems of
eventual consistency:
complex update queries
maintenance of the denormalized tables
basic idea
do not let app. developers write denormalization logic
handle denormalization below the SQL layer by using triggers and queue tables
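To illustrate the idea, here is a single-node sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The node is assumed to own uids 1-2000, and the schema is simplified. The trigger (hypothetically named tweet_insert) follows the same pattern as the generated _INCLINE_tweet_INSERT trigger. Followers on this node get their timeline rows synchronously. Followers on other nodes are written to the _iq_timeline queue table for asynchronous forwarding.

```python
import sqlite3

# Hypothetical node owning uids 1-2000; SQLite stands in for MySQL.
SCHEMA = """
CREATE TABLE tweet (tweet_id INTEGER, user_id INTEGER, ctime INTEGER);
CREATE TABLE followed_by (user_id INTEGER, follower_id INTEGER);
CREATE TABLE timeline (user_id INTEGER, ctime INTEGER, tweet_id INTEGER, tweet_user_id INTEGER);
CREATE TABLE _iq_timeline (user_id INTEGER, ctime INTEGER, tweet_id INTEGER,
                           tweet_user_id INTEGER, _iq_action TEXT);

CREATE TRIGGER tweet_insert AFTER INSERT ON tweet FOR EACH ROW BEGIN
  -- followers on this node: update their timelines synchronously
  INSERT INTO timeline (user_id, ctime, tweet_id, tweet_user_id)
    SELECT follower_id, NEW.ctime, NEW.tweet_id, NEW.user_id FROM followed_by
    WHERE followed_by.user_id = NEW.user_id AND follower_id BETWEEN 1 AND 2000;
  -- followers on other nodes: queue the change for asynchronous forwarding
  INSERT INTO _iq_timeline (user_id, ctime, tweet_id, tweet_user_id, _iq_action)
    SELECT follower_id, NEW.ctime, NEW.tweet_id, NEW.user_id, 'I' FROM followed_by
    WHERE followed_by.user_id = NEW.user_id AND follower_id NOT BETWEEN 1 AND 2000;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO followed_by VALUES (?, ?)", [(123, 456), (123, 2500)])
# The application writes only the tweet; the trigger does the denormalization.
conn.execute("INSERT INTO tweet VALUES (1, 123, 1252619462)")
conn.commit()
```

In this sketch, follower 456 lives on the same node and gets a timeline row at once. Follower 2500 lives on another node, so the tweet appears only in _iq_timeline until something forwards it.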
7. Incline – illustrated
insert / update / delete rows of related tables automatically
[Figure: the three uid-range shards (uid:1-2000, uid:2001-4000, uid:4001-6000), each holding tweet, following, followed_by, timeline and queue tables]
when uid:123 tweets, write only to his tweet table; Incline updates the other tables automatically
8. Incline – illustrated (cont'd)
insert / update / delete rows of related tables automatically
[Figure: the three uid-range shards (uid:1-2000, uid:2001-4000, uid:4001-6000), each holding tweet, following, followed_by, timeline and queue tables]
when uid:2431 starts following uid:940, only write to his following table
9. Incline – details
triggers are generated from definition files
sync. updates within each node
async. updates between the nodes
each DB node has a queue table
a helper program (C++) applies the queued events to other nodes
uses a fault-tolerant algorithm
the application only needs to write to the user's shard
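Here is a minimal sketch of a forwarder loop. It assumes a single destination node and uses SQLite in place of MySQL. It is not Incline's C++ helper, and it does not reproduce Incline's fault-tolerant algorithm. Instead, it shows one simple at-least-once scheme:

1. Apply the queued events idempotently on the destination.
2. Commit there.
3. Only then delete the events from the source queue.

If the forwarder crashes between the two commits, the events are replayed harmlessly instead of being lost. forward_once and both schemas are hypothetical.

```python
import sqlite3

# Hypothetical schemas: the source node's queue table and the destination's
# timeline table. The primary key makes re-applying an 'I'/'U' event harmless.
QUEUE_DDL = (
    "CREATE TABLE _iq_timeline (user_id INTEGER, ctime INTEGER, tweet_id INTEGER,"
    " tweet_user_id INTEGER, _iq_action TEXT)"
)
TIMELINE_DDL = (
    "CREATE TABLE timeline (user_id INTEGER, ctime INTEGER, tweet_id INTEGER,"
    " tweet_user_id INTEGER, PRIMARY KEY (user_id, tweet_user_id, tweet_id))"
)


def forward_once(src, dst, batch=100):
    """Move up to `batch` queued events from src to dst; return how many."""
    events = src.execute(
        "SELECT rowid, user_id, ctime, tweet_id, tweet_user_id, _iq_action"
        " FROM _iq_timeline ORDER BY rowid LIMIT ?",  # oldest events first
        (batch,),
    ).fetchall()
    for _, user_id, ctime, tweet_id, tweet_user_id, action in events:
        if action == "D":
            dst.execute(
                "DELETE FROM timeline WHERE user_id=? AND tweet_user_id=? AND tweet_id=?",
                (user_id, tweet_user_id, tweet_id),
            )
        else:  # 'I' (insert) or 'U' (update): upsert by primary key
            dst.execute(
                "INSERT OR REPLACE INTO timeline VALUES (?, ?, ?, ?)",
                (user_id, ctime, tweet_id, tweet_user_id),
            )
    dst.commit()  # commit on the destination first...
    src.executemany(
        "DELETE FROM _iq_timeline WHERE rowid=?", [(e[0],) for e in events]
    )
    src.commit()  # ...then drop the events from the queue; a crash in between replays them
    return len(events)
```

A real forwarder would also route each event to the shard that owns its user_id, and it would run forward_once in a loop.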
10. Incline – the commands
# create queue tables
% incline --mode=shard --rdbms=mysql --database=microblog \
    --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
    create-queue
# create triggers
% incline --mode=shard --rdbms=mysql --database=microblog \
    --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
    create-trigger
# run forwarder (transfers data from specified host to other shards)
% incline --mode=shard --rdbms=mysql --database=microblog \
    --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
    forward
12. Incline – FYI the generated triggers
CREATE TRIGGER _INCLINE_followed_by_INSERT AFTER INSERT ON followed_by FOR EACH ROW BEGIN
  IF (((1<=NEW.follower_id AND NEW.follower_id<2001))) THEN
    INSERT INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT
      NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id FROM tweet WHERE
      tweet.user_id=NEW.user_id;
  ELSE
    INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
      SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id,'I' FROM
      tweet WHERE tweet.user_id=NEW.user_id;
  END IF;
END

CREATE TRIGGER _INCLINE_followed_by_UPDATE AFTER UPDATE ON followed_by FOR EACH ROW BEGIN
  IF (((1<=NEW.follower_id AND NEW.follower_id<2001))) THEN
    REPLACE INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT
      NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id FROM tweet WHERE
      tweet.user_id=NEW.user_id;
  ELSE
    INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
      SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id,'U' FROM
      tweet WHERE tweet.user_id=NEW.user_id;
  END IF;
END

CREATE TRIGGER _INCLINE_followed_by_DELETE AFTER DELETE ON followed_by FOR EACH ROW BEGIN
  IF (((1<=OLD.follower_id AND OLD.follower_id<2001))) THEN
    DELETE FROM timeline WHERE timeline.user_id=OLD.follower_id AND
      tweet_user_id=OLD.user_id;
  ELSE
    INSERT INTO _iq_timeline (user_id,tweet_id,tweet_user_id,_iq_action) SELECT
      OLD.follower_id,tweet.tweet_id,tweet.user_id,'D' FROM tweet WHERE
      tweet.user_id=OLD.user_id;
  END IF;
END

CREATE TRIGGER _INCLINE_following_INSERT AFTER INSERT ON following FOR EACH ROW BEGIN
  IF (((1<=NEW.following_id AND NEW.following_id<2001))) THEN
    INSERT INTO followed_by (user_id,follower_id) SELECT
      NEW.following_id,NEW.user_id;
  ELSE
    INSERT INTO _iq_followed_by (user_id,follower_id,_iq_action) SELECT
      NEW.following_id,NEW.user_id,'I';
  END IF;
END

CREATE TRIGGER _INCLINE_following_DELETE AFTER DELETE ON following FOR EACH ROW BEGIN
  IF (((1<=OLD.following_id AND OLD.following_id<2001))) THEN
    DELETE FROM followed_by WHERE followed_by.user_id=OLD.following_id AND
      followed_by.follower_id=OLD.user_id;
  ELSE
    INSERT INTO _iq_followed_by (user_id,follower_id,_iq_action) SELECT
      OLD.following_id,OLD.user_id,'D';
  END IF;
END

CREATE TRIGGER _INCLINE_tweet_INSERT AFTER INSERT ON tweet FOR EACH ROW BEGIN
  INSERT INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT
    followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id FROM
    followed_by WHERE ((1<=followed_by.follower_id AND
    followed_by.follower_id<2001)) AND NEW.user_id=followed_by.user_id;
  INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
    SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id,'I' FROM
    followed_by WHERE NOT (((1<=followed_by.follower_id AND
    followed_by.follower_id<2001))) AND NEW.user_id=followed_by.user_id;
END

CREATE TRIGGER _INCLINE_tweet_UPDATE AFTER UPDATE ON tweet FOR EACH ROW BEGIN
  REPLACE INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT
    followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id FROM
    followed_by WHERE ((1<=followed_by.follower_id AND
    followed_by.follower_id<2001)) AND NEW.user_id=followed_by.user_id;
  INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
    SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id,'U' FROM
    followed_by WHERE NOT (((1<=followed_by.follower_id AND
    followed_by.follower_id<2001))) AND NEW.user_id=followed_by.user_id;
END

CREATE TRIGGER _INCLINE_tweet_DELETE AFTER DELETE ON tweet FOR EACH ROW BEGIN
  DELETE FROM timeline WHERE timeline.tweet_id=OLD.tweet_id AND
    timeline.tweet_user_id=OLD.user_id;
  INSERT INTO _iq_timeline (tweet_id,tweet_user_id,user_id,_iq_action) SELECT
    OLD.tweet_id,OLD.user_id,followed_by.follower_id,'D' FROM followed_by
    WHERE OLD.user_id=followed_by.user_id AND NOT
    (((1<=followed_by.follower_id AND followed_by.follower_id<2001)));
END
14. Range-based sharding vs. hash-based
Range-based sharding is better
range queries are sometimes necessary
manual tuning is easy
the number of nodes can increase continuously (one node at a time)
with hash-based sharding, you have to double the number of servers each time, i.e. add
1,2,4,8,16,32,64,... servers at once
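This sketch shows the growth argument by counting how many rows change nodes. It assumes that "hash-based" means simple modulo hashing (uid % n), which is not consistent hashing, and that every uid from 1 to 6000 exists. The node names A to D are hypothetical.

```python
UIDS = range(1, 6001)  # assume every uid in 1-6000 exists

# Hypothetical nodes: B's range 2001-4000 is split onto a new node D.
OLD_RANGES = {"A": (1, 2000), "B": (2001, 4000), "C": (4001, 6000)}
NEW_RANGES = {"A": (1, 2000), "B": (2001, 3000), "D": (3001, 4000), "C": (4001, 6000)}


def range_node(uid, ranges):
    """Return the node whose inclusive [lo, hi] range contains uid."""
    for node, (lo, hi) in ranges.items():
        if lo <= uid <= hi:
            return node
    raise KeyError(uid)


def moved_by_hash(old_n, new_n):
    """Count uids whose node changes when modulo hashing goes old_n -> new_n."""
    return sum(u % old_n != u % new_n for u in UIDS)


def moved_by_range(old, new):
    """Count uids whose node changes when the range map goes old -> new."""
    return sum(range_node(u, old) != range_node(u, new) for u in UIDS)


print(moved_by_hash(4, 5))  # 4800: adding one node relocates 80% of the rows
print(moved_by_hash(4, 8))  # 3000: doubling relocates half, all onto the new nodes
print(moved_by_range(OLD_RANGES, NEW_RANGES))  # 1000: only uid 3001-4000 move, B -> D
```

Under modulo hashing, adding a single server reshuffles most rows, but doubling splits each node cleanly in half. That is why hash-based clusters tend to grow in powers of two, while a range split can add exactly one node.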
15. Pacific
utility programs for dynamic scaling
mysqld_jumpstart
pacific_divide
16. mysqld_jumpstart – summary
create a mysqld instance with a single command
service automatically started by daemontools
setup of master and slave nodes
auto-generated backup script: install_dir/etc/backup.sh
uses XtraBackup for hot backups
17. mysqld_jumpstart – the commands
# create and start a master database
% mysqld_jumpstart --mysql-install-db=/usr/local/mysql/bin/mysql_install_db \
    --mysqld=/usr/local/mysql/libexec/mysqld --base-dir=/var/servicedb \
    --server-id=1252619462 --socket=/tmp/mysql-servicedb.sock \
    --service-dir=/service/mysql-servicedb \
    --replication-network='10.0.200.0/255.255.255.0'
# backup
% /var/servicedb/etc/backup.sh /var/backup/servicedb.backup.20090911
# create and start a slave database
% mysqld_jumpstart --mysql-install-db=/usr/local/mysql/bin/mysql_install_db \
    --mysqld=/usr/local/mysql/libexec/mysqld --base-dir=/var/servicedb \
    --server-id=1252619493 --socket=/tmp/mysql-servicedb.sock \
    --service-dir=/service/mysql-servicedb \
    --replication-network='10.0.200.0/255.255.255.0' --master-host=10.0.200.1 \
    --from-innobackupex
18. Splitting a MySQL shard
use replication to prepare, then promote a slave to master
Before: shards 1-2,000, 2,001-4,000 and 4,001-6,000, plus a replication slave of the 2,001-4,000 shard
After: shards 1-2,000, 2,001-3,000, 3,001-4,000 and 4,001-6,000
19. Problems in splitting a shard
speed vs. safety
downtime should be minimal
guarantee that all the application servers write to the new node
reads may switch to the new node eventually
20. Pacific_divide – the blurbs
fail-safe
application servers using the old sharding definition cannot access the split nodes
app. servers reload the definition when this happens
minimal impact on users
no read locks during division
in eventual-consistency mode
acquires a write lock only on the dividing node
write lock time < 10 seconds
if there is no replication delay
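The reload-on-failure behaviour can be sketched as a small client wrapper. AccessDenied, ShardClient and the load_definition callback are all hypothetical. In practice, the error would be whatever the MySQL driver raises when a revoked username tries to access a node.

```python
class AccessDenied(Exception):
    """Hypothetical error raised when a node rejects a revoked username."""


class ShardClient:
    def __init__(self, load_definition):
        # load_definition: callable returning the current shard definition
        self._load_definition = load_definition
        self.definition = load_definition()

    def write(self, uid, do_write, max_reloads=1):
        """Run do_write(definition, uid); on AccessDenied, reload and retry."""
        for attempt in range(max_reloads + 1):
            try:
                return do_write(self.definition, uid)
            except AccessDenied:
                if attempt == max_reloads:
                    raise  # still denied: surface a maintenance error to the user
                # the node was split under us; pick up the new shard definition
                self.definition = self._load_definition()
```

A stale client never writes through the old definition: the node refuses the write, the client reloads the definition, and then it retries.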
21. Pacific_divide – the split algorithm
1. create a new slave node
2. drop the write privileges of the existing username on the dividing node
3. wait until the new node is in sync
4. update the Incline triggers
5. create a new username and grant it read / write privileges
6. update the shard definition
7. drop the read privileges granted to the old username
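The toy model below walks through the seven steps. MySQL privileges are modeled as a Python dict, and dividing node B is split onto new node D. The node names, the usernames app_v1 and app_v2, and the choice to grant the new username on both nodes in step 5 are assumptions. Replication and trigger regeneration are not modeled.

```python
def can(grants, node, user, priv):
    """True if user holds priv on node in this toy privilege table."""
    return priv in grants.get(node, {}).get(user, set())


def divide(grants, wait_in_sync, publish_definition):
    """Toy walk-through of the split steps, splitting node B onto new node D.

    Returns what the OLD username (app_v1) may still do on B after
    steps 2 and 7, as (step, can_write, can_read) tuples.
    """
    log = []
    grants["D"] = {}  # 1. new slave D of B (replication itself is not modeled)
    grants["B"]["app_v1"].discard("write")  # 2. stale clients can no longer write to B
    log.append((2, can(grants, "B", "app_v1", "write"), can(grants, "B", "app_v1", "read")))
    wait_in_sync()  # 3. block until D has applied every write B accepted
    # 4. regenerating the Incline triggers is not modeled here
    for node in ("B", "D"):  # 5. new username; granting on both nodes is an assumption
        grants[node]["app_v2"] = {"read", "write"}
    publish_definition()  # 6. distribute the new shard definition (shard.json)
    grants["B"]["app_v1"].discard("read")  # 7. stale clients lose reads as well
    log.append((7, can(grants, "B", "app_v1", "write"), can(grants, "B", "app_v1", "read")))
    return log
```

After step 2, the old username can still read B but can no longer write to it. In this model, the write lock therefore lasts from step 2 until clients pick up the definition published in step 6, and reads are never blocked.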
22. Pacific_divide – the command
# upgrade 10.0.200.18 to a master with range uid:3,000-
#
# when instructed by pacific_divide, transmit shard.json to all
# application servers and mysql shards (or you may use nfs, etc.)
% pacific_divide --shard-def=shard.json --database=microblog \
    --new-host=10.0.200.18 --from-id=3000 --incline-source=microblog.json
Before: shards 1-2,000, 2,001-4,000 and 4,001-6,000, plus a replication slave of the 2,001-4,000 shard
After: shards 1-2,000, 2,001-3,000, 3,001-4,000 and 4,001-6,000
25. DBIx::ShardManager – the code
# create manager object
my $mgr = DBIx::ShardManager->new(
definition => DBIx::ShardManager::Definition::JSON->new(
file => 'etc/user_shard_def.json',
auto_reload => 1,
),
connector => DBIx::ShardManager::Connector::DBI->new(
driver => 'mysql',
dbname => 'microblog',
attr => {
mysql_enable_utf8 => 1,
RaiseError => 1,
},
),
);
26. DBIx::ShardManager – the code (cont'd)
# read user's timeline
# first, read my timeline table
my $timeline = $mgr->rw_handle($user_id)->selectall_arrayref(
'SELECT * FROM timeline WHERE user_id=? ORDER BY ctime DESC LIMIT 20',
{ Slice => {} },
$user_id,
);
# fetch the tweets using (tweet_user_id,tweet_id) from other shards
$mgr->shard_inner_join(
$timeline,
tweet_user_id => {
'tweet.tweet_id' => 'tweet_id',
},
);
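For readers unfamiliar with Perl, here is a hypothetical Python analogue of what a cross-shard inner join over the timeline rows has to do. It is not DBIx::ShardManager's implementation. It groups the (tweet_user_id, tweet_id) keys by owning shard, issues one batched lookup per shard, and drops rows whose tweet is missing. The callback fetch_tweets and the shard map are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical range map: shard 0 = uid 1-2000, 1 = 2001-4000, 2 = 4001-6000.
SHARD_UPPER_BOUNDS = [2000, 4000, 6000]


def shard_for(uid):
    idx = bisect_left(SHARD_UPPER_BOUNDS, uid)
    if uid < 1 or idx == len(SHARD_UPPER_BOUNDS):
        raise KeyError(uid)
    return idx


def shard_inner_join(rows, fetch_tweets):
    """Attach a 'tweet' to each timeline row; one batched lookup per shard.

    fetch_tweets(shard, keys) must return {(tweet_user_id, tweet_id): tweet}
    for the keys found on that shard.
    """
    keys_by_shard = defaultdict(set)
    for row in rows:
        key = (row["tweet_user_id"], row["tweet_id"])
        keys_by_shard[shard_for(row["tweet_user_id"])].add(key)
    found = {}
    for shard, keys in keys_by_shard.items():
        found.update(fetch_tweets(shard, keys))
    # inner join: keep the original order (ctime DESC), drop rows with no tweet
    return [
        dict(row, tweet=found[(row["tweet_user_id"], row["tweet_id"])])
        for row in rows
        if (row["tweet_user_id"], row["tweet_id"]) in found
    ]
```

Batching by shard keeps the number of round trips equal to the number of shards involved rather than the number of timeline rows.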
27. DBIx::ShardManager – blurbs
access to raw DBI handles
easy to use ORM above DBIx::ShardManager
detects changes and reloads shard def.
but may throw exceptions on writes during node divisions by pacific_divide
display a maintenance error and let the user retry
shard_join to be optimized
with Net::Drizzle or mycached
29. Conclusion
RDB sharding is not difficult when using Incline, Pacific and DBIx::ShardManager
IMO it is as easy as writing code for a standalone database system
app. developers can use 2-phase commit if necessary
or rely on Incline for async. updates
30. Current Status & ToDo
Incline - early beta
ToDo: add support for multiple shard keys, add recovery support for data loss
Pacific - early beta
ToDo: make it a distribution
DBIx::ShardManager - still alpha
ToDo: write more join functions, support concurrent access, etc.
31. Miscellaneous
Mycached
currently in alpha status
access MySQL tables using the memcached protocol
higher concurrency (thousands of connections)
higher throughput (2x that of SQL)
32. For more information
see my blog
http://developer.cybozu.co.jp/kazuho/
DBIx::ShardManager is in coderepos.org/share/lang/perl
come to BPStudy #25 on 9/25
2h30m talk on Incline, Pacific and DBIx::ShardManager (hopefully including demos)