A Clever Way to Scale-out
    a Web Application
       Cybozu Labs, Inc.
         Kazuho Oku
RDB sharding

    denormalization is inevitable

    [diagram: three shards (uid:1-2000, uid:2001-4000, uid:4001-6000, ...);
     each shard has its own tweet, following, followed_by, and timeline tables]



    when uid:123 tweets, write his tweet, read uids of his followers, and
    update the timeline table of his followers
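
The write path described above can be sketched as follows. This is a minimal illustration in Python, not code from the talk: `SHARD_STARTS` mirrors the three uid ranges in the diagram, while `shard_for`, `post_tweet`, and the in-memory dictionaries standing in for database tables are hypothetical names chosen for the example.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower bounds of the uid ranges from the slide: 1-2000, 2001-4000, 4001-6000.
SHARD_STARTS = [1, 2001, 4001]

def shard_for(uid):
    """Return the index of the shard whose range contains uid."""
    idx = bisect_right(SHARD_STARTS, uid) - 1
    if idx < 0:
        raise ValueError(f"uid {uid} is below the first range")
    return idx

# Each shard holds its own copies of the tables (plain dicts here).
shards = [
    {"tweet": [], "followed_by": defaultdict(set), "timeline": defaultdict(list)}
    for _ in SHARD_STARTS
]

def post_tweet(uid, tweet_id, text):
    """Hand-written denormalization: the application touches several shards."""
    home = shards[shard_for(uid)]
    home["tweet"].append((uid, tweet_id, text))       # 1. write the tweet
    followers = sorted(home["followed_by"][uid])      # 2. read follower uids
    for follower in followers:                        # 3. update each timeline
        shards[shard_for(follower)]["timeline"][follower].append((uid, tweet_id))
    return followers

# uid:123 is followed by uid:50 (same shard) and uid:2431 (another shard).
shards[shard_for(123)]["followed_by"][123].update({50, 2431})
touched = post_tweet(123, 1, "hello")
```

Even in this toy form, one tweet makes the application write to two different shards, which is the kind of logic the rest of the talk tries to move out of application code.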

Sep 11 2009                      A Clever Way to Scale-out a Web Application                           2
Two methods to update the shards

    eventual consistency
         asynchronous updates using worker processes
         pros: fast response, high scalability
         cons: hard to maintain
    2-phase commit
         synchronous updates
         pros: synchronous, doesn't require external
          daemon
         cons: slow response
The problems

    complex queries
         reading from / writing to multiple DB nodes
         cannot use secondary indexes
               need to maintain per-user views (denormalized tables)

    maintain consistency between the nodes
         when using eventual consistency model
    dynamic scaling
         adding new nodes without stopping the service

Incline




Incline

    solution for the two problems of
     eventual consistency:
         complex update queries
         maintenance of the denormalized tables
    basic idea
         do not let app. developers write denormalization
          logic
         handle denormalization below the SQL layer
               by using triggers and queue tables

Incline – illustrated

    insert / update / delete rows of related
     tables automatically
    [diagram: three shards (uid:1-2000, uid:2001-4000, uid:4001-6000, ...);
     each shard has its own tweet, following, followed_by, timeline, and queue tables]



    when uid:123 tweets, write only to his tweet table. Incline updates
    other tables automatically

Incline – illustrated (cont'd)

    insert / update / delete rows of related
     tables automatically
    [diagram: three shards (uid:1-2000, uid:2001-4000, uid:4001-6000, ...);
     each shard has its own tweet, following, followed_by, timeline, and queue tables]




    when uid:2431 starts following uid:940, write only to his following table

Incline – details

    triggers generated from def. files
    sync. updates within each node
    async. updates between the nodes
         each DB node has a queue table
         helper program (C++) applies the queued events
          to other nodes
         uses a fault tolerant algorithm
    application only needs to write to the
     user's shard
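
The division of labour listed above (synchronous updates inside a node, queued updates between nodes) can be sketched in Python. This is only an in-memory model under stated assumptions: the real triggers are SQL and the real forwarder is a C++ program, and the names `Node`, `insert_tweet`, and `forward` are hypothetical. The "apply, then dequeue" order and the duplicate check are one simple way to tolerate a retry after a crash; the slides do not describe the actual fault-tolerant algorithm.

```python
from collections import defaultdict, deque

SHARD_RANGES = [(1, 2001), (2001, 4001), (4001, 6001)]  # [low, high) per node

def node_of(uid):
    for i, (low, high) in enumerate(SHARD_RANGES):
        if low <= uid < high:
            return i
    raise ValueError(f"no shard for uid {uid}")

class Node:
    def __init__(self):
        self.tweet = []
        self.followed_by = defaultdict(set)
        self.timeline = defaultdict(list)
        self.queue = deque()  # stands in for the per-node queue table

nodes = [Node() for _ in SHARD_RANGES]

def insert_tweet(uid, tweet_id):
    """The only write the application issues; the rest mimics a trigger."""
    me = node_of(uid)
    node = nodes[me]
    node.tweet.append((uid, tweet_id))
    for follower in sorted(node.followed_by[uid]):
        if node_of(follower) == me:
            node.timeline[follower].append((tweet_id, uid))     # sync, same node
        else:
            node.queue.append(("I", (follower, tweet_id, uid)))  # left for forwarder

def forward(node_index):
    """Drain one node's queue, applying each event on its destination node."""
    queue = nodes[node_index].queue
    applied = 0
    while queue:
        action, (follower, tweet_id, uid) = queue[0]
        dest = nodes[node_of(follower)]
        if action == "I" and (tweet_id, uid) not in dest.timeline[follower]:
            dest.timeline[follower].append((tweet_id, uid))
        queue.popleft()  # dequeue only after the event has been applied
        applied += 1
    return applied

nodes[0].followed_by[123].update({50, 2431})
insert_tweet(123, 1)
```

After `insert_tweet`, the follower on the same node already sees the tweet, while the follower on the other node sees it only once `forward(0)` has run.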
Incline – the commands
   # create queue tables
   % incline --mode=shard --rdbms=mysql --database=microblog \
       --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
       create-queue

   # create triggers
   % incline --mode=shard --rdbms=mysql --database=microblog \
       --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
       create-trigger

   # run forwarder (transfers data from specified host to other shards)
   % incline --mode=shard --rdbms=mysql --database=microblog \
       --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
       forward




Incline – the definition files
   # view def. file
   [
      {
          "source"      : [ "tweet", "followed_by" ],
          "destination" : "timeline",
          "pk_columns"  : {
            "followed_by.follower_id" : "user_id",
            "tweet.user_id"           : "tweet_user_id",
            "tweet.tweet_id"          : "tweet_id"
          },
          "npk_columns" : {
            "tweet.ctime" : "ctime"
          },
          "merge"       : {
            "tweet.user_id" : "followed_by.user_id"
          },
          "shard-key"   : "user_id"
      }, {
          "source"      : "following",
          "destination" : "followed_by",
          "pk_columns"  : {
              "following.following_id" : "user_id",
              "following.user_id"      : "follower_id"
          },
          "shard-key"   : "user_id"
      }
   ]

   # shard def. file
   {
      "algorithm" : "range-int",
      "map"       : {
        "1"    : {
          "host"     : "10.0.200.10",
          "username" : "pac1251781019"
        },
        "2001" : {
          "host"     : "10.0.200.11",
          "username" : "pac1251781332"
        },
        "4001" : {
          "host"     : "10.0.200.12",
          "username" : "pac1251781408"
        }
      }
   }


Incline – FYI the generated triggers
   CREATE TRIGGER _INCLINE_followed_by_INSERT AFTER INSERT ON followed_by FOR EACH ROW BEGIN
     IF (((1<=NEW.follower_id AND NEW.follower_id<2001))) THEN
       INSERT INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT
         NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id FROM tweet WHERE
         tweet.user_id=NEW.user_id;
     ELSE
       INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
         SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id,'I' FROM
         tweet WHERE tweet.user_id=NEW.user_id;
     END IF;
   END

   CREATE TRIGGER _INCLINE_followed_by_UPDATE AFTER UPDATE ON followed_by FOR EACH ROW BEGIN
     IF (((1<=NEW.follower_id AND NEW.follower_id<2001))) THEN
       REPLACE INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT
         NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id FROM tweet WHERE
         tweet.user_id=NEW.user_id;
     ELSE
       INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
         SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id,'U' FROM
         tweet WHERE tweet.user_id=NEW.user_id;
     END IF;
   END

   CREATE TRIGGER _INCLINE_followed_by_DELETE AFTER DELETE ON followed_by FOR EACH ROW BEGIN
     IF (((1<=OLD.follower_id AND OLD.follower_id<2001))) THEN
       DELETE FROM timeline WHERE timeline.user_id=OLD.follower_id AND
         tweet_user_id=OLD.user_id;
     ELSE
       INSERT INTO _iq_timeline (user_id,tweet_id,tweet_user_id,_iq_action) SELECT
         OLD.follower_id,tweet.tweet_id,tweet.user_id,'D' FROM tweet WHERE
         tweet.user_id=OLD.user_id;
     END IF;
   END

   CREATE TRIGGER _INCLINE_following_INSERT AFTER INSERT ON following FOR EACH ROW BEGIN
     IF (((1<=NEW.following_id AND NEW.following_id<2001))) THEN
       INSERT INTO followed_by (user_id,follower_id) SELECT
         NEW.following_id,NEW.user_id;
     ELSE
       INSERT INTO _iq_followed_by (user_id,follower_id,_iq_action) SELECT
         NEW.following_id,NEW.user_id,'I';
     END IF;
   END

   CREATE TRIGGER _INCLINE_following_DELETE AFTER DELETE ON following FOR EACH ROW BEGIN
     IF (((1<=OLD.following_id AND OLD.following_id<2001))) THEN
       DELETE FROM followed_by WHERE followed_by.user_id=OLD.following_id AND
         followed_by.follower_id=OLD.user_id;
     ELSE
       INSERT INTO _iq_followed_by (user_id,follower_id,_iq_action) SELECT
         OLD.following_id,OLD.user_id,'D';
     END IF;
   END

   CREATE TRIGGER _INCLINE_tweet_INSERT AFTER INSERT ON tweet FOR EACH ROW BEGIN
     INSERT INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT
       followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id FROM
       followed_by WHERE ((1<=followed_by.follower_id AND
       followed_by.follower_id<2001)) AND NEW.user_id=followed_by.user_id;
     INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
       SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id,'I' FROM
       followed_by WHERE NOT (((1<=followed_by.follower_id AND
       followed_by.follower_id<2001))) AND NEW.user_id=followed_by.user_id;
   END

   CREATE TRIGGER _INCLINE_tweet_UPDATE AFTER UPDATE ON tweet FOR EACH ROW BEGIN
     REPLACE INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT
       followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id FROM
       followed_by WHERE ((1<=followed_by.follower_id AND
       followed_by.follower_id<2001)) AND NEW.user_id=followed_by.user_id;
     INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
       SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id,'U' FROM
       followed_by WHERE NOT (((1<=followed_by.follower_id AND
       followed_by.follower_id<2001))) AND NEW.user_id=followed_by.user_id;
   END

   CREATE TRIGGER _INCLINE_tweet_DELETE AFTER DELETE ON tweet FOR EACH ROW BEGIN
     DELETE FROM timeline WHERE timeline.tweet_id=OLD.tweet_id AND
       timeline.tweet_user_id=OLD.user_id;
     INSERT INTO _iq_timeline (tweet_id,tweet_user_id,user_id,_iq_action) SELECT
       OLD.tweet_id,OLD.user_id,followed_by.follower_id,'D' FROM followed_by
       WHERE OLD.user_id=followed_by.user_id AND NOT
       (((1<=followed_by.follower_id AND followed_by.follower_id<2001)));
   END


Pacific




Range-based sharding vs. hash-based

    Range-based sharding is better
         range queries are sometimes necessary
         manual tuning is easy
         the number of nodes can be increased gradually
               with hash-based sharding, you have to add
                1,2,4,8,16,32,64,... servers at once
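
One way to see the cost difference is to count how many uids change owner when a single node is added. The slide does not say which hashing scheme it has in mind; the Python sketch below assumes plain uid-modulo-N placement, and the node labels A to D are hypothetical.

```python
from bisect import bisect_right

def modulo_shard(uid, node_count):
    """Naive hash-based placement: uid modulo the number of nodes."""
    return uid % node_count

def owner(uid, mapping):
    """Range-based placement: owner of the last range start <= uid."""
    starts = sorted(mapping)
    return mapping[starts[bisect_right(starts, uid) - 1]]

uids = range(1, 6001)

# Hash-based: grow from 3 to 4 nodes and count the uids whose node changes.
moved_hash = sum(1 for u in uids if modulo_shard(u, 3) != modulo_shard(u, 4))

# Range-based: split 2001-4000 at 3001; the new node D takes over 3001-4000.
before = {1: "A", 2001: "B", 4001: "C"}
after = {1: "A", 2001: "B", 3001: "D", 4001: "C"}
moved_range = sum(1 for u in uids if owner(u, before) != owner(u, after))

print(moved_hash, moved_range)  # prints "4500 1000"
```

Under this assumption, adding one node to a modulo scheme relocates three quarters of the users, whereas splitting one range relocates only the users in the half that moves.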




Pacific

    utility programs for dynamic scaling
         mysqld_jumpstart
         pacific_divide




mysqld_jumpstart – summary

    create a mysqld instance in a single
     command
         service automatically started by daemontools
         setup of primary nodes and slaves
         auto-generated backup script: install_dir/etc/backup.sh
               uses XtraBackup for hot-backup 




mysqld_jumpstart – the commands
   # create and start a master database
   % mysqld_jumpstart \
       --mysql-install-db=/usr/local/mysql/bin/mysql_install_db \
       --mysqld=/usr/local/mysql/libexec/mysqld \
       --base-dir=/var/servicedb --server-id=1252619462 \
       --socket=/tmp/mysql-servicedb.sock \
       --service-dir=/service/mysql-servicedb \
       --replication-network='10.0.200.0/255.255.255.0'

   # backup
   % /var/servicedb/etc/backup.sh /var/backup/servicedb.backup.20090911

   # create and start a slave database
   % mysqld_jumpstart \
       --mysql-install-db=/usr/local/mysql/bin/mysql_install_db \
       --mysqld=/usr/local/mysql/libexec/mysqld \
       --base-dir=/var/servicedb --server-id=1252619493 \
       --socket=/tmp/mysql-servicedb.sock \
       --service-dir=/service/mysql-servicedb \
       --replication-network='10.0.200.0/255.255.255.0' \
       --master-host=10.0.200.1 --from-innobackupex



Splitting a MySQL shard

    use replication to prepare, then upgrade
     a slave to master
         Before:  [1 - 2,000]  [2,001 - 4,000] --replication--> [slave]  [4,001 - 6,000]

         After:   [1 - 2,000]  [2,001 - 3,000]  [3,001 - 4,000]  [4,001 - 6,000]




Problems in splitting a shard

    speed vs. safety
         downtime should be minimal
         guarantee that all the application servers write to
          the new node
               reads may switch to the new node eventually




Pacific_divide – the blurbs

    fail-safe
         application servers using the old sharding
          definition cannot access the split nodes
               app. servers reload the definition in such a case

    minimum impact on users
         no read-locks during division
               in eventual-consistency mode
         acquires write lock only against the dividing node
         write lock time < 10 seconds
               if no delay in replication
Pacific_divide – the split algorithm

   1.         create a new slave node
   2.         drop write privileges of existing username on the dividing
              node
   3.         wait until the new node is in sync
   4.         update incline triggers
   5.         create new user and give read / write privileges
   6.         update shard def.
   7.         drop read privileges granted to the old username
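
The steps above can be modelled in a few lines of Python to show why an application server holding a stale definition is locked out and has to reload. This is a sketch under assumptions, not the tool's implementation: the `grants` dictionary stands in for MySQL privileges, `divide` and `write_with_reload` are hypothetical names, steps 1, 3, and 4 are not modelled, and granting the new username on both halves is inferred from the shard definition shown elsewhere in the slides.

```python
import copy

class AccessDenied(Exception):
    pass

# Privileges per host and username (a stand-in for MySQL grants).
grants = {
    "10.0.200.11": {"pac1251781332": {"read", "write"}},
    "10.0.200.18": {},  # the new slave, not yet serving the application
}

# The published shard definition, reduced to the range being divided.
published = {2001: ("10.0.200.11", "pac1251781332")}

def lookup(shard_def, uid):
    start = max(s for s in shard_def if s <= uid)
    return shard_def[start]

def access(shard_def, uid, privilege):
    host, user = lookup(shard_def, uid)
    if privilege not in grants[host].get(user, set()):
        raise AccessDenied(f"{user}@{host} lacks {privilege}")
    return host

def divide(new_host, from_id, new_user):
    start = max(s for s in published if s < from_id)
    old_host, old_user = published[start]
    grants[old_host][old_user].discard("write")       # step 2
    for host in (old_host, new_host):                 # step 5
        grants[host][new_user] = {"read", "write"}
    published[start] = (old_host, new_user)           # step 6
    published[from_id] = (new_host, new_user)
    grants[old_host].pop(old_user)                    # step 7

stale = copy.deepcopy(published)  # an app server's cached definition
divide("10.0.200.18", 3001, "pac1252624011")

def write_with_reload(cached, uid):
    """Retry once with a freshly loaded definition if access is denied."""
    try:
        return access(cached, uid, "write"), cached
    except AccessDenied:
        fresh = copy.deepcopy(published)
        return access(fresh, uid, "write"), fresh
```

Because the old username loses its privileges, a server that missed the update cannot silently write uid 3500 to the wrong node; it fails, reloads, and is then routed to 10.0.200.18.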




Pacific_divide – the command
   #   upgrade 10.0.200.18 to a master with range uid:3,000-
   #   
   #   when instructed by pacific_divide, transmit shard.json to all
   #   application servers and mysql shards (or you may use nfs, etc.)

   % pacific_divide --shard-def=shard.json --database=microblog \
       --new-host=10.0.200.18 --from-id=3000 --incline-source=microblog.json



         Before:  [1 - 2,000]  [2,001 - 4,000] --replication--> [slave]  [4,001 - 6,000]

         After:   [1 - 2,000]  [2,001 - 3,000]  [3,001 - 4,000]  [4,001 - 6,000]




Pacific_divide – how the shard def. changes
    # before
    {
         "algorithm" : "range-int",
         "map"       : {
          "1"      : {
            "host"     : "10.0.200.10",
            "username" : "pac1251781019"
          },
          "2001" : {
            "host"     : "10.0.200.11",
            "username" : "pac1251781332"
          },
          "4001" : {
            "host"     : "10.0.200.12",
            "username" : "pac1251781408"
          }
         }
    }

    # after
    {
         "algorithm" : "range-int",
         "map"       : {
          "1"      : {
            "host"     : "10.0.200.10",
            "username" : "pac1251781019"
          },
          "2001" : {
            "host"     : "10.0.200.11",
            "username" : "pac1252624011"
          },
          "3001" : {
            "host"     : "10.0.200.18",
            "username" : "pac1252624011"
          },
          "4001" : {
            "host"     : "10.0.200.12",
            "username" : "pac1251781408"
          }
         }
    }
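
The change between the two definitions can be expressed as a small transformation. The Python sketch below reproduces the "after" map from the "before" map; `split_shard_def` is a hypothetical helper written for this illustration and is not part of Pacific.

```python
import json

before = json.loads("""
{
  "algorithm": "range-int",
  "map": {
    "1":    {"host": "10.0.200.10", "username": "pac1251781019"},
    "2001": {"host": "10.0.200.11", "username": "pac1251781332"},
    "4001": {"host": "10.0.200.12", "username": "pac1251781408"}
  }
}
""")

def split_shard_def(shard_def, from_id, new_host, new_username):
    """Return a new definition in which the range containing from_id is split."""
    ranges = shard_def["map"]
    starts = sorted(int(k) for k in ranges)
    if from_id in starts:
        raise ValueError(f"a range already starts at {from_id}")
    owners = [s for s in starts if s < from_id]
    if not owners:
        raise ValueError(f"{from_id} is below the first range")
    divided = str(owners[-1])
    new_map = {k: dict(v) for k, v in ranges.items()}
    # Both halves get the new username, so stale definitions stop working.
    new_map[divided]["username"] = new_username
    new_map[str(from_id)] = {"host": new_host, "username": new_username}
    ordered = {k: new_map[k] for k in sorted(new_map, key=int)}
    return {"algorithm": shard_def["algorithm"], "map": ordered}

after = split_shard_def(before, 3001, "10.0.200.18", "pac1252624011")
```

Note that the untouched ranges (starting at 1 and 4001) keep their usernames, so only clients of the divided range are forced to reload.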




DBIx::ShardManager




DBIx::ShardManager – the code
   # create manager object
   my $mgr = DBIx::ShardManager->new(
       definition => DBIx::ShardManager::Definition::JSON->new(
           file         => 'etc/user_shard_def.json',
           auto_reload => 1,
       ),
       connector => DBIx::ShardManager::Connector::DBI->new(
           driver => 'mysql',
           dbname => 'microblog',
           attr    => {
                mysql_enable_utf8 => 1,
                RaiseError => 1,
           },
       ),
   );



DBIx::ShardManager – the code (cont'd)
   # read user's timeline

   # first, read my timeline table
   my $timeline = $mgr->rw_handle($user_id)->selectall_arrayref(
       'SELECT * FROM timeline WHERE user_id=? ORDER BY ctime DESC LIMIT 20',
       { Slice => {} },
       $user_id,
   );
   # fetch the tweets using (tweet_user_id,tweet_id) from other shards
   $mgr->shard_inner_join(
       $timeline,
       tweet_user_id => {
            'tweet.tweet_id' => 'tweet_id',
       },
   );
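
The slides do not show how shard_inner_join works internally. The Python sketch below is only a guess at the kind of work a cross-shard inner join has to do: group the timeline rows by the shard that owns each tweet, look the tweets up shard by shard, and drop rows without a match. The sample data, `tweet_tables`, and the function signature are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

SHARD_STARTS = [1, 2001, 4001]

def shard_for(uid):
    return bisect_right(SHARD_STARTS, uid) - 1

# tweet tables per shard, keyed by (user_id, tweet_id); hypothetical sample data
tweet_tables = [
    {(123, 1): {"body": "hello"}},
    {(2431, 7): {"body": "hi from shard 2"}},
    {},
]

def shard_inner_join(rows, key_column, id_column):
    """Attach tweet columns to rows, visiting each shard once."""
    by_shard = defaultdict(list)
    for row in rows:
        by_shard[shard_for(row[key_column])].append(row)
    matched = set()
    for shard, group in by_shard.items():
        table = tweet_tables[shard]  # stands in for one query per shard
        for row in group:
            tweet = table.get((row[key_column], row[id_column]))
            if tweet is not None:
                row.update(tweet)    # rows are modified in place
                matched.add(id(row))
    # inner-join semantics: drop unmatched rows, keep the input order
    return [row for row in rows if id(row) in matched]

timeline = [
    {"user_id": 50, "tweet_user_id": 2431, "tweet_id": 7},
    {"user_id": 50, "tweet_user_id": 123, "tweet_id": 1},
    {"user_id": 50, "tweet_user_id": 123, "tweet_id": 99},  # no such tweet
]
joined = shard_inner_join(timeline, "tweet_user_id", "tweet_id")
```

Grouping by shard first keeps the number of round trips proportional to the number of shards involved rather than to the number of timeline rows.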


DBIx::ShardManager – blurbs

    access to raw DBI handles
         easy to use ORM above DBIx::ShardManager
    detects changes and reloads shard def.
         but may throw exceptions on writes during node
          divisions by pacific_divide
               display maintenance error, and let the user retry

    shard_join to be optimized
         with Net::Drizzle, or mycached


Conclusion




Conclusion

    RDB sharding is not difficult when using
     Incline, Pacific, DBIx::ShardManager
         IMO it is as easy as writing code for a standalone
          database system
    app. developers can use 2-phase commit
     if necessary
         or rely on Incline for async. updates



Current Status & ToDo

    Incline - early beta
         ToDo: add support for multiple shard keys, add
          recovery support on data-loss
    Pacific - early beta
         ToDo: make it a distribution
    DBIx::ShardManager - still alpha
         ToDo: write more join functions, concurrent
          access, etc.


Miscellaneous

    Mycached
         currently in alpha status
         access MySQL tables using memcached protocol
         higher concurrency (thousands of connections)
         higher throughput (2x SQL)




For more information

    see my blog
     http://developer.cybozu.co.jp/kazuho/
         DBIx::ShardManager is in coderepos.org/share/
          lang/perl
    come to BPStudy #25 on 9/25
         2h30m talk on Incline, Pacific,
          DBIx::ShardManager (hopefully including demos)




More Related Content

A Clever Way to Scale-out a Web Application

  • 1. A Clever Way to Scale-out a Web Application Cybozu Labs, Inc. Kazuho Oku
  • 2. RDB sharding  denormalization is inevitable uid:1-2000 uid:2001-4000 uid:4001-6000 tweet tweet tweet following following following ... followed_by followed_by followed_by timeline timeline timeline when uid:123 tweets, write his tweet, read uids of his followers, and update the timeline table of his followers Sep 11 2009 A Clever Way to Scale-out a Web Application 2
  • 3. Two methods to update the shards  eventual consistency  asynchonous updates using worker processes  pros: fast response, high scalability  cons: hard to maintain  2-phase commit  synchronous updates  pros: synchronous, doesn't require external daemon  cons: slow response Sep 11 2009 A Clever Way to Scale-out a Web Application 3
  • 4. The problems  complex queries  reading from / writing to multiple DB nodes  cannot use secondary indexes  need to maintain per-user views (denormalized tables)  maintain consistency between the nodes  when using eventual consistency model  dynamic scaling  adding new nodes without stopping the service Sep 11 2009 A Clever Way to Scale-out a Web Application 4
  • 5. Incline Sep 11 2009 A Clever Way to Scale-out a Web Application 5
  • 6. Incline  solution for the two problems of eventual consistency:  complex update queries  maintenance of the denormalized tables  basic idea  do not let app. developers write denormalization logic  handle denormalization below the SQL layer  by using triggers and queue tables Sep 11 2009 A Clever Way to Scale-out a Web Application 6
  • 7. Incline – illustrated  insert / update / delete rows of related tables automatically uid:1-2000 uid:2001-4000 uid:4001-6000 tweet tweet tweet following following following followed_by followed_by followed_by ... timeline timeline timeline queue queue queue when uid:123 tweets, write only to his tweet table. Incline updates other tables automatically Sep 11 2009 A Clever Way to Scale-out a Web Application 7
  • 8. Incline – illustrated (cont'd)  insert / update / delete rows of related tables automatically uid:1-2000 uid:2001-4000 uid:4001-6000 tweet tweet tweet following following following followed_by followed_by followed_by ... timeline timeline timeline queue queue queue when uid:2431 starts following uid:940 only write to his following table Sep 11 2009 A Clever Way to Scale-out a Web Application 8
  • 9. Incline – details  triggers generated from def. files  sync. updates within each node  async. updates between the nodes  each DB node has a queue table  helper program (C++) applies the queued events to other nodes  uses a fault tolerant algorithm  application only needs to write to the user's shard Sep 11 2009 A Clever Way to Scale-out a Web Application 9
  • 10. Incline – the commands # create queue tables % incline --mode=shard --rdbms=mysql --database=microblog --host=10.0.200.10 --source=microblog.json --shard-source=shard.json create-queue # create triggers % incline --mode=shard --rdbms=mysql --database=microblog --host=10.0.200.10 --source=microblog.json --shard-source=shard.json create-trigger # run forwarder (transfers data from specified host to other shards) % incline --mode=shard --rdbms=mysql --database=microblog --host=10.0.200.10 --source=microblog.json --shard-source=shard.json forward Sep 11 2009 A Clever Way to Scale-out a Web Application 10
  • 11. Incline – the definition files # view def. file # shard def. file [ { { "algorithm" : "range-int", "source" : [ "tweet", "followed_by" ], "map" : { "destination" : "timeline", "1" : { "pk_columns" : { "host" : "10.0.200.10", "followed_by.follower_id" : "user_id", "username" : "pac1251781019" "tweet.user_id" : "tweet_user_id", }, "tweet.tweet_id" : "tweet_id" "2001" : { }, "host" : "10.0.200.11", "npk_columns" : { "username" : "pac1251781332" "tweet.ctime" : "ctime" }, }, "4001" : { "merge" : { "host" : "10.0.200.12", "tweet.user_id" : "followed_by.user_id" "username" : "pac1251781408" }, } "shard-key" : "user_id" } }, { "source" : "following", "destination" : "followed_by", "pk_columns" : { "following.following_id" : "user_id", "following.user_id" : "follower_id" }, "shard-key" : "user_id" } ] Sep 11 2009 A Clever Way to Scale-out a Web Application 11
  • 12. Incline – FYI the generated triggers CREATE TRIGGER _INCLINE_followed_by_INSERT AFTER INSERT ON followed_by FOR EACH NEW.following_id,NEW.user_id,'I'; ROW BEGIN END IF; IF (((1<=NEW.follower_id AND NEW.follower_id<2001))) THEN ENDCREATE TRIGGER _INCLINE_following_DELETE AFTER DELETE ON following FOR EACH INSERT INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT ROW BEGIN NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id FROM tweet WHERE IF (((1<=OLD.following_id AND OLD.following_id<2001))) THEN tweet.user_id=NEW.user_id; DELETE FROM followed_by WHERE followed_by.user_id=OLD.following_id AND ELSE followed_by.follower_id=OLD.user_id; INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action) ELSE SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id,'I' FROM INSERT INTO _iq_followed_by (user_id,follower_id,_iq_action) SELECT tweet WHERE tweet.user_id=NEW.user_id; OLD.following_id,OLD.user_id,'D'; END IF; END IF; END END CREATE TRIGGER _INCLINE_followed_by_UPDATE AFTER UPDATE ON followed_by FOR EACH CREATE TRIGGER _INCLINE_tweet_INSERT AFTER INSERT ON tweet FOR EACH ROW BEGIN ROW BEGIN INSERT INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT IF (((1<=NEW.follower_id AND NEW.follower_id<2001))) THEN followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id FROM REPLACE INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT followed_by WHERE ((1<=followed_by.follower_id AND NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id FROM tweet WHERE followed_by.follower_id<2001)) AND NEW.user_id=followed_by.user_id; tweet.user_id=NEW.user_id; INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action) ELSE SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id,'I' FROM INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action) followed_by WHERE NOT (((1<=followed_by.follower_id AND SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id,'U' FROM 
followed_by.follower_id<2001))) AND NEW.user_id=followed_by.user_id; tweet WHERE tweet.user_id=NEW.user_id; END END IF; CREATE TRIGGER _INCLINE_tweet_UPDATE AFTER UPDATE ON tweet FOR EACH ROW BEGIN END REPLACE INTO timeline (user_id,ctime,tweet_id,tweet_user_id) SELECT CREATE TRIGGER _INCLINE_followed_by_DELETE AFTER DELETE ON followed_by FOR EACH followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id FROM ROW BEGIN followed_by WHERE ((1<=followed_by.follower_id AND IF (((1<=OLD.follower_id AND OLD.follower_id<2001))) THEN followed_by.follower_id<2001)) AND NEW.user_id=followed_by.user_id; DELETE FROM timeline WHERE timeline.user_id=OLD.follower_id AND INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action) tweet_user_id=OLD.user_id; SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id,'U' FROM ELSE followed_by WHERE NOT (((1<=followed_by.follower_id AND followed_by.follower_id<2001))) AND NEW.user_id=followed_by.user_id; INSERT INTO _iq_timeline (user_id,tweet_id,tweet_user_id,_iq_action) SELECT OLD.follower_id,tweet.tweet_id,tweet.user_id,'D' FROM tweet WHERE END tweet.user_id=OLD.user_id; CREATE TRIGGER _INCLINE_tweet_DELETE AFTER DELETE ON tweet FOR EACH ROW BEGIN END IF; DELETE FROM timeline WHERE timeline.tweet_id=OLD.tweet_id AND timeline.tweet_user_id=OLD.user_id; END INSERT INTO _iq_timeline (tweet_id,tweet_user_id,user_id,_iq_action) SELECT CREATE TRIGGER _INCLINE_following_INSERT AFTER INSERT ON following FOR EACH ROW OLD.tweet_id,OLD.user_id,followed_by.follower_id,'D' FROM followed_by BEGIN WHERE OLD.user_id=followed_by.user_id AND NOT IF (((1<=NEW.following_id AND NEW.following_id<2001))) THEN (((1<=followed_by.follower_id AND followed_by.follower_id<2001))); INSERT INTO followed_by (user_id,follower_id) SELECT END NEW.following_id,NEW.user_id; ELSE INSERT INTO _iq_followed_by (user_id,follower_id,_iq_action) SELECT Sep 11 2009 A Clever Way to Scale-out a Web Application 12
  • 13. Pacific Sep 11 2009 A Clever Way to Scale-out a Web Application 13
  • 14. Range-based sharding vs. hash-based  Range-based sharding is better  range queries are sometimes necessary  manual tuning is easy  number of nodes increase continuously  with hash-based sharding, you have to add 1,2,4,8,16,32,64,... servers at once Sep 11 2009 A Clever Way to Scale-out a Web Application 14
  • 15. Pacific
        utility programs for dynamic scaling
            mysqld_jumpstart
            pacific_divide
  • 16. mysqld_jumpstart – summary
        create a mysqld instance in a single command
            service automatically started by daemontools
            setup of primary nodes and slaves
        auto-generated backup script: install_dir/etc/backup.sh
            uses XtraBackup for hot-backup
  • 17. mysqld_jumpstart – the commands
        # create and start a master database
        % mysqld_jumpstart \
            --mysql-install-db=/usr/local/mysql/bin/mysql_install_db \
            --mysqld=/usr/local/mysql/libexec/mysqld \
            --base-dir=/var/servicedb \
            --server-id=1252619462 \
            --socket=/tmp/mysql-servicedb.sock \
            --service-dir=/service/mysql-servicedb \
            --replication-network='10.0.200.0/255.255.255.0'

        # backup
        % /var/servicedb/etc/backup.sh /var/backup/servicedb.backup.20090911

        # create and start a slave database
        % mysqld_jumpstart \
            --mysql-install-db=/usr/local/mysql/bin/mysql_install_db \
            --mysqld=/usr/local/mysql/libexec/mysqld \
            --base-dir=/var/servicedb \
            --server-id=1252619493 \
            --socket=/tmp/mysql-servicedb.sock \
            --service-dir=/service/mysql-servicedb \
            --replication-network='10.0.200.0/255.255.255.0' \
            --master-host=10.0.200.1 \
            --from-innobackupex
  • 18. Splitting a MySQL shard
        use replication to prepare, then upgrade a slave to master
        [Figure: shard ranges before and after the split]
            Before: 1-2,000 | 2,001-4,000 | 4,001-6,000, plus a replication slave
            After:  1-2,000 | 2,001-3,000 | 3,001-4,000 | 4,001-6,000
  • 19. Problems in splitting a shard
        speed vs. safety
            downtime should be kept to a minimum
            guarantee that all the application servers write to the new node
                reads may switch to the new node eventually
  • 20. pacific_divide – the blurbs
        fail-safe
            application servers using the old sharding definition cannot access the split nodes
                app. servers reload the definition when that happens
        minimum impact on users
            no read-locks during division
                in eventual-consistency mode
            acquires a write lock only against the node being divided
            write lock time < 10 seconds
                if there is no delay in replication
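The fail-safe behaviour described on this slide (a stale definition is refused, so the application server reloads and retries) could be sketched roughly as follows. This is a Python illustration under stated assumptions, not how DBIx::ShardManager or pacific_divide is implemented: `ShardClient`, `AccessDenied`, and the fake `connect` function are hypothetical names, and the host and username values are borrowed from the shard definition shown later in the deck.

```python
class AccessDenied(Exception):
    """Raised by the fake connector when a username has been revoked."""


class ShardClient:
    """Routes by a cached definition and reloads it once on access failure."""

    def __init__(self, load_definition, connect):
        self._load = load_definition  # callable returning {range_start: node}
        self._connect = connect       # callable(host, username) -> handle
        self._map = self._load()

    def _node_for(self, uid):
        # The owning range is the one with the largest start <= uid.
        start = max(s for s in self._map if s <= uid)
        return self._map[start]

    def handle(self, uid):
        node = self._node_for(uid)
        try:
            return self._connect(node["host"], node["username"])
        except AccessDenied:
            # The cached definition is stale: reload and retry exactly once.
            self._map = self._load()
            node = self._node_for(uid)
            return self._connect(node["host"], node["username"])


OLD = {
    1: {"host": "10.0.200.10", "username": "pac1251781019"},
    2001: {"host": "10.0.200.11", "username": "pac1251781332"},
    4001: {"host": "10.0.200.12", "username": "pac1251781408"},
}
NEW = {
    1: {"host": "10.0.200.10", "username": "pac1251781019"},
    2001: {"host": "10.0.200.11", "username": "pac1252624011"},
    3001: {"host": "10.0.200.18", "username": "pac1252624011"},
    4001: {"host": "10.0.200.12", "username": "pac1251781408"},
}

current = {"definition": OLD}
# Usernames still accepted after the split; the old one for 2001- is revoked.
valid_users = {"pac1251781019", "pac1252624011", "pac1251781408"}


def connect(host, username):
    """Fake connector: returns a tuple instead of a real database handle."""
    if username not in valid_users:
        raise AccessDenied(username)
    return (host, username)


client = ShardClient(lambda: current["definition"], connect)
current["definition"] = NEW   # the split happens after the client started
print(client.handle(3500))    # first attempt is refused, the retry succeeds
```

Because the old username is revoked rather than merely deprecated, a client that missed the update cannot silently write to the wrong node; the worst case is one failed attempt followed by a reload.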
  • 21. pacific_divide – the split algorithm
        1.  create a new slave node
        2.  drop write privileges of the existing username on the node being divided
        3.  wait until the new node is in sync
        4.  update Incline triggers
        5.  create a new user and give it read / write privileges
        6.  update the shard def.
        7.  drop read privileges granted to the old username
  • 22. pacific_divide – the command
        # upgrade 10.0.200.18 to a master with range uid:3,000-
        #
        # when instructed by pacific_divide, transmit shard.json to all
        # application servers and mysql shards (or you may use nfs, etc.)
        % pacific_divide \
            --shard-def=shard.json \
            --database=microblog \
            --new-host=10.0.200.18 \
            --from-id=3000 \
            --incline-source=microblog.json

        [Figure: shard ranges before and after the split]
            Before: 1-2,000 | 2,001-4,000 | 4,001-6,000, plus a replication slave
            After:  1-2,000 | 2,001-3,000 | 3,001-4,000 | 4,001-6,000
  • 23. pacific_divide – how the shard def. changes
        # before
        {
          "algorithm" : "range-int",
          "map" : {
            "1" : {
              "host" : "10.0.200.10",
              "username" : "pac1251781019"
            },
            "2001" : {
              "host" : "10.0.200.11",
              "username" : "pac1251781332"
            },
            "4001" : {
              "host" : "10.0.200.12",
              "username" : "pac1251781408"
            }
          }
        }

        # after
        {
          "algorithm" : "range-int",
          "map" : {
            "1" : {
              "host" : "10.0.200.10",
              "username" : "pac1251781019"
            },
            "2001" : {
              "host" : "10.0.200.11",
              "username" : "pac1252624011"
            },
            "3001" : {
              "host" : "10.0.200.18",
              "username" : "pac1252624011"
            },
            "4001" : {
              "host" : "10.0.200.12",
              "username" : "pac1251781408"
            }
          }
        }
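The change to the definition can be expressed as a small transformation. The following is a Python sketch for illustration only: `split_shard` is a hypothetical helper, not part of pacific_divide, and it simply mirrors what the before/after definitions on this slide show, namely that a new range start is added and both halves of the divided range receive the same new username.

```python
import copy

before = {
    "algorithm": "range-int",
    "map": {
        "1": {"host": "10.0.200.10", "username": "pac1251781019"},
        "2001": {"host": "10.0.200.11", "username": "pac1251781332"},
        "4001": {"host": "10.0.200.12", "username": "pac1251781408"},
    },
}


def split_shard(definition, from_id, new_host, new_username):
    """Return a new definition in which the range containing from_id is split.

    The input definition is left untouched.
    """
    new_def = copy.deepcopy(definition)
    shard_map = new_def["map"]
    starts = sorted(int(key) for key in shard_map)
    if from_id in starts:
        raise ValueError(f"a range already starts at {from_id}")
    owners = [start for start in starts if start < from_id]
    if not owners:
        raise ValueError(f"{from_id} is below the first range")
    old_start = str(owners[-1])
    # Both halves get the new username, so holders of the old one are refused.
    shard_map[old_start]["username"] = new_username
    shard_map[str(from_id)] = {"host": new_host, "username": new_username}
    # Keep the keys in numeric order for readability.
    new_def["map"] = {key: shard_map[key] for key in sorted(shard_map, key=int)}
    return new_def


after = split_shard(before, 3001, "10.0.200.18", "pac1252624011")
print(list(after["map"]))
```

Rotating the username on the existing half as well is what lets the database itself reject any application server still holding the old definition.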
  • 24. DBIx::ShardManager
  • 25. DBIx::ShardManager – the code
        # create manager object
        my $mgr = DBIx::ShardManager->new(
            definition => DBIx::ShardManager::Definition::JSON->new(
                file        => 'etc/user_shard_def.json',
                auto_reload => 1,
            ),
            connector => DBIx::ShardManager::Connector::DBI->new(
                driver => 'mysql',
                dbname => 'microblog',
                attr   => {
                    mysql_enable_utf8 => 1,
                    RaiseError        => 1,
                },
            ),
        );
  • 26. DBIx::ShardManager – the code (cont'd)
        # read user's timeline
        # first, read my timeline table
        my $timeline = $mgr->rw_handle($user_id)->selectall_arrayref(
            'SELECT * FROM timeline WHERE user_id=? ORDER BY ctime DESC LIMIT 20',
            { Slice => {} },
            $user_id,
        );
        # fetch the tweets using (tweet_user_id,tweet_id) from other shards
        $mgr->shard_inner_join(
            $timeline,
            tweet_user_id => {
                'tweet.tweet_id' => 'tweet_id',
            },
        );
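What a cross-shard inner join has to do can be sketched in a few lines. This is a Python illustration of the general idea only, not the implementation of shard_inner_join: the in-memory `SHARDS` dictionary stands in for the per-shard tweet tables, and every name in it is hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory shards: range start -> {(user_id, tweet_id): body}
SHARDS = {
    1: {(123, 1): "hello", (123, 2): "second tweet"},
    2001: {(2500, 7): "from another shard"},
}


def shard_start_for(uid):
    """Return the start of the range that owns uid."""
    return max(start for start in SHARDS if start <= uid)


def inner_join_tweets(timeline_rows):
    """Attach tweet bodies to timeline rows, one lookup batch per shard."""
    by_shard = defaultdict(list)
    for row in timeline_rows:
        by_shard[shard_start_for(row["tweet_user_id"])].append(row)

    joined = []
    for start, rows in by_shard.items():
        tweets = SHARDS[start]  # stands in for one SELECT per shard
        for row in rows:
            key = (row["tweet_user_id"], row["tweet_id"])
            if key in tweets:   # inner join: rows without a match are dropped
                joined.append({**row, "body": tweets[key]})

    # Grouping by shard loses the timeline order, so restore it.
    joined.sort(key=lambda r: r["ctime"], reverse=True)
    return joined


timeline = [
    {"user_id": 42, "ctime": 300, "tweet_id": 7, "tweet_user_id": 2500},
    {"user_id": 42, "ctime": 200, "tweet_id": 2, "tweet_user_id": 123},
    {"user_id": 42, "ctime": 100, "tweet_id": 9, "tweet_user_id": 123},
]
result = inner_join_tweets(timeline)
print([row["body"] for row in result])
```

Grouping the timeline rows by owning shard first keeps the number of round trips proportional to the number of shards touched rather than to the number of rows.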
  • 27. DBIx::ShardManager – blurbs
        access to raw DBI handles
            easy to use an ORM on top of DBIx::ShardManager
        detects changes and reloads the shard def.
            but may throw exceptions on writes during node divisions by pacific_divide
                display a maintenance error, and let the user retry
        shard_join to be optimized
            with Net::Drizzle, or mycached
  • 28. Conclusion
  • 29. Conclusion
        RDB sharding is not difficult when using Incline, Pacific, DBIx::ShardManager
            IMO it is as easy as writing code for a standalone database system
            app. developers can use 2-phase commit if necessary
                or rely on Incline for async. updates
  • 30. Current Status & ToDo
        Incline - early beta
            ToDo: add support for multiple shard keys, add recovery support on data-loss
        Pacific - early beta
            ToDo: make it a distribution
        DBIx::ShardManager - still alpha
            ToDo: write more join functions, concurrent access, etc.
  • 31. Miscellaneous
        Mycached
            currently in alpha status
            access MySQL tables using memcached protocol
            higher concurrency (thousands of connections)
            higher throughput (2x SQL)
  • 32. For more information
        see my blog http://developer.cybozu.co.jp/kazuho/
        DBIx::ShardManager is in coderepos.org/share/lang/perl
        come to BPStudy #25 on 9/25
            2h30m talk on Incline, Pacific, DBIx::ShardManager (hopefully including demos)