More Mastering the Art of Indexing


                                       Yoshinori Matsunobu

                               Lead of MySQL Professional Services APAC
                                           Sun Microsystems
                                     Yoshinori.Matsunobu@sun.com




Table of contents

       •    Case 1: Lock contention and indexing
       •    Case 2: Deadlock caused by indexes
       •    Case 3: Covering index and range scan / LIMIT
       •    Case 4: Covering index and long text/blob
       •    Case 5: Sorting, indexing and query execution plans


                        This is the second half of the “Mastering the Art of Indexing” session.
                                       (Independent of the first half; no prerequisites.)
                                      The first half was presented last year:
                   http://www.mysqlconf.com/mysql2009/public/schedule/detail/6661



Speaker’s profile
       •    Name: Yoshinori Matsunobu

       •    Living in Tokyo, Japan

       •    Leading MySQL Consulting Business in Japan and APAC at Sun
            Microsystems (Oracle)

       •    Joined MySQL inc as a consultant in September 2006
              – Before joining MySQL, I worked at Sony Corporation as a software
                architect for 5.5 years, using Oracle/WebLogic/SAP R3 and
                MySQL/JBoss

       •    Published a couple of MySQL/Linux/Java books/articles (but all
            written in Japanese)

       •    Contact:
              – E-mail: Yoshinori.Matsunobu@gmail.com
              – Blog http://yoshinorimatsunobu.blogspot.com
              – @matsunobu on Twitter
Case 1: Lock contention
                                                user_id(PK)       name           status           …
  •    “user” table (InnoDB)                    1                 Ronald         0                …
  •    1 million records                        …                 …              …                …
  •    PK on user_id                            100               Messi          0
  •    No index on status column
                                                …                 …              …
  •    Auto-committed queries
                                                1000000           Raul           9                …
Session 1                                                 Session 2

1) DELETE FROM user WHERE status = 9;
…                                                             2) UPDATE user SET status=9
                                                              WHERE user_id=100;
… (taking long time to scan & delete)
                                                              … (waiting for query 1 to finish)
…                                                             …
Query OK, 1000 rows affected (13.66 sec)                   Query OK, 1 row affected (11.27 sec)
                                                           Rows matched: 1 Changed: 1 Warnings: 0
                                       Why was query 2 blocked by query 1?

Understanding how statement-based binary logging works
 1) DELETE FROM user WHERE status = 9;
    Fully scans the user table, deleting rows where status == 9

       user_id(PK)   name     status   …
       1             Ronald   0        …
       …             …        …        …
       100           Messi    0        …
       …             …        …        …
       1000000       Raul     9        …

 2) UPDATE user SET status=9 WHERE user_id=100;

 What happens if query 1) does not lock row user_id=100?
 - Query 2) finishes before query 1)
 - Query 1) has already checked row user_id=100; the row is not deleted because status == 0
 - The final result of the row on the master --> exists, status=9

 Binary log (statements are logged in commit order):
 1. UPDATE user SET status=9 WHERE user_id=100;
 2. DELETE FROM user WHERE status = 9;

 Replaying the binary log on slaves: user_id=100 does not exist
                                 Data consistency is broken!
Next-Key Locking in InnoDB
       • Locking not only “modified” rows, but also “scanned”
         rows
       • To avoid master/slave mismatch
              – And to make point-in-time-recovery from binary logs work


       • UPDATE/DELETE … WHERE … sets an exclusive next-key lock on every record
         the search encounters.
       • INSERT … SELECT … puts a shared next-key lock on all the selected rows.

       • Disadvantage: low concurrency
         (one way to observe these lock waits is sketched below)
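
       While query 1 from the previous slide is scanning, the blocked session can be observed
       in the InnoDB monitor output (a hedged sketch, not from the slides; look for “LOCK WAIT”
       and the record locks on the user table in the TRANSACTIONS section):

         mysql> SHOW ENGINE INNODB STATUS\G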


If status column is indexed

1) DELETE FROM user
      WHERE status = 9;                2) UPDATE user SET status=1 WHERE user_id=100;
              status      PK
                                               user_id(PK)     name             status           …
              0           100
                                               1               Ronald           0                …
              1           100
                                               …               …                …                …
              …           …
                                               100             Messi            0 -> 1
              9           12345
              9           …                    …               …                …

              9           1000000              1000000         Raul             9                …


         • 1) and 2) can run in parallel (concurrency improved!)
                – If query 2 sets status=9, it is blocked by query 1
         • Even though the cardinality of the status column is very low, indexing is
           still helpful to avoid massive row-lock contention (a minimal sketch of adding
           the index follows)
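
         A minimal sketch, assuming the user table above (the index name idx_status is illustrative):

           ALTER TABLE user ADD INDEX idx_status (status);

         With this index, the DELETE next-key locks only the index entries with status = 9 (and their
         corresponding primary-key records), so an UPDATE that keeps a row outside that range is not blocked.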
Sequential Access vs Random Access
     •       Updating/deleting rows by full table scan does sequential reads/writes
     •       Updating/deleting rows by index scan does random reads/writes

     •       Full table scan is not always worse than index scan



                                 Index scan for 2mil rows vs Full scan for 100mil rows
                                 (rows processed per second; elapsed time was plotted on a 0-8000 second scale)

               Index scan (buffer pool=10G):   900 rows/s
               Index scan (buffer pool=5G):    297 rows/s
               Full table scan:                502,310 rows/s

Read Committed + Row Based Binary Logging
       •    Next-key locking can be disabled with --binlog-format=row
            and --transaction-isolation=read-committed in MySQL 5.1
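
             The same settings can also be applied at runtime (a hedged sketch; binlog_format and
             tx_isolation are the corresponding MySQL 5.1 system variables, and SET GLOBAL requires
             the SUPER privilege):

               mysql> SET GLOBAL binlog_format = 'ROW';
               mysql> SET GLOBAL tx_isolation  = 'READ-COMMITTED';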

    Session 1                                    Session 2
    1) DELETE FROM user WHERE status = 9;
    …                                            2) UPDATE user SET status=9 WHERE
    … (taking long time to scan & delete)        user_id=100;
    …                                            Query OK, 1 row affected (0.00 sec)
    …                                            Rows matched: 1 Changed: 1 Warnings: 0
    Query OK, 1000 rows affected (13.66 sec)


•     Performance disadvantages:
       – Row based binary logging
       – Read Committed is less efficient than repeatable read with many concurrent sessions
         in InnoDB (See http://www.facebook.com/note.php?note_id=244956410932)

•     In 5.0 or earlier: innodb_locks_unsafe_for_binlog
       – Enable only if you do not use binlog or you do not care about data consistency
Table of contents

       •    Case 1: Lock contention and indexing
       •    Case 2: Deadlock caused by indexes
       •    Case 3: Covering index and range scan / LIMIT
       •    Case 4: Covering index and long text/blob
       •    Case 5: Sorting, indexing and query execution plans




Case 2: Deadlock
                                       user_id(PK)   name        status         …
- user table                           1             Ronald      0              …

- Index on status column 2                           John        1              …
                                       …             …           …
- Auto committed
                                       …             …           …

                                       10000000      Raul        9              …

Session 1                                            Session 2
 mysql> UPDATE user SET status=4                      mysql> UPDATE user SET status=2
 where status=1 ORDER BY user_id LIMIT 1 ;            where user_id=2;

 Query OK, 1 rows affected (0.00 sec)                 ERROR 1213 (40001): Deadlock found when
 Rows matched: 1 Changed: 1 Warnings: 0               trying to get lock; try
                                                      restarting transaction


          Why did the auto-committed, single-row UPDATE statements cause a deadlock error?

Updating (or deleting) a single row is not a single step
 1) UPDATE user SET status=4,… where                   2) UPDATE user SET status=2
  status=1 ORDER BY user_id LIMIT 1 ;                  where user_id=2;
                                                          b
                status     PK                    user_id(PK)       name             status           …
     a          0          100                   1                 Ronald           0                …
                                      c (wait)
                1          2                     2                 John             1                …
                1          10         d (wait)   100               Vieri            0 -> 1
                                        -> DL
                1          13
                                                 …                 …                …
                …          …
                                                 1000000           Raul             9                …
       Secondary Index on status                               Clustered Index
1)
 a: Exclusive lock on the secondary-index entry status=1, getting the PK (user_id=2)
 c: Exclusive lock on PRIMARY = 2
    -> Waiting for b

2)
 b: Exclusive lock on PRIMARY = 2, getting the row values (status=1)
 d: Exclusive lock on the secondary-index entry status=1
    -> Waiting for c -> Deadlock!
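
A common workaround (a hedged sketch, not from the slides): look up the target primary key first,
then update by primary key, so that both sessions take the clustered-index lock first; re-check the
condition in the UPDATE and retry (or simply retry on ERROR 1213) if no row matches:

  SELECT user_id INTO @id FROM user WHERE status=1 ORDER BY user_id LIMIT 1;
  UPDATE user SET status=4 WHERE user_id=@id AND status=1;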
Table of contents

       •    Case 1: Lock contention and indexing
       •    Case 2: Deadlock caused by indexes
       •    Case 3: Covering index and range scan / LIMIT
       •    Case 4: Covering index and long text/blob
       •    Case 5: Sorting, indexing and query execution plans




Example table
                     “diary” table

                     CREATE TABLE diary (
                       diary_id INT UNSIGNED AUTO_INCREMENT,
                       user_id INT UNSIGNED NOT NULL,
                       post_date TIMESTAMP NOT NULL,
                       status TINYINT UNSIGNED NOT NULL,
                       rating FLOAT NOT NULL,
                       title VARCHAR(100) NOT NULL,
                       body TEXT,
                       PRIMARY KEY (diary_id),
                       INDEX user_date(user_id)
                     ) CHARSET utf8 ENGINE=InnoDB;



Wide-range queries are not fast
 SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND
 post_date >= '2009-03-01 00:00:00';

                               Branch 1
                            - 20      Leaf 1
                            - 40      Leaf 2

       Leaf Block 1                            5: post_date=‘2009-03-01..’, status=0
     user_id    RowID
        1          5                                       10000: post_date=‘2009-04-04..’, status=0
        1       10000
        1       15321
                                               15321: post_date=‘2009-04-23…’, status=0
     …                                …
        10               10                                  table records


     - If 100 entries match user_id=1, 100 random disk reads might happen
           - One random read for a leaf block, but 100 random reads for table records
     - Single HDD can do only 100-200 random disk reads per second (very slow)

Covering Index: Reading only an index
  SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND
  post_date >= '2009-03-01 00:00:00';
                                Branch 1
                              20       Leaf 1
                            - 120      Leaf 2

                    Leaf 1
                                                5: post_date=‘2009-03-01..’, status=0
user_id       post_date    status       RowID
   1         2009-03-29       0          1000
   1         2009-03-30       0         10000               10000: post_date=‘2009-04-04..’, status=0
   1         2009-03-31       0            5
   1         2009-04-01       0         15321   15321: post_date=‘2009-04-23…’, status=0
   1         2009-04-30       0          100
   ..            …            ..         400                  table records

- If all columns referenced by the SQL statement (SELECT list/WHERE/etc) are contained within a single
index, MySQL chooses the “Covering Index” execution plan
- Very efficient because random disk i/o on table records does not happen
- In InnoDB, the RowID is the PK (diary_id), so a covering index can be used more frequently
- The status column is not useful for filtering records, but it is useful for making the index covering
  (one way to build such an index is sketched below)
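
 A minimal sketch, assuming the diary table from the previous slides (the index name is illustrative;
 InnoDB implicitly appends the PK diary_id to every secondary index, so this index covers the query):

   ALTER TABLE diary ADD INDEX user_date_status (user_id, post_date, status);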
Covering Index = “Using index”

> explain select count(ind) from t     > explain select count(c) from t
           id: 1                                  id: 1
  select_type: SIMPLE                    select_type: SIMPLE
        table: t                               table: t
         type: index                            type: ALL
possible_keys: NULL                    possible_keys: NULL
          key: ind                               key: NULL
      key_len: 5                             key_len: NULL
          ref: NULL                              ref: NULL
         rows: 100000181                        rows: 100000181
        Extra: Using index                     Extra:

mysql> select count(ind) from t;       mysql> select count(c) from t;
+---------------+                      +-----------+
|   count(ind) |                       | count(c) |
+---------------+                      +-----------+
|     100000000 |                      | 100000000 |
+---------------+                      +-----------+
1 row in set (15.98 sec)               1 row in set (28.99 sec)

LIMIT without covering index is not fast
 SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND
 post_date >= '2009-03-01 00:00:00' ORDER BY post_date LIMIT 30, 10;

                               Branch 1
                            - 20      Leaf 1
                            - 40      Leaf 2
                                                      checking status=0 or not
               Leaf 1
user_id      post_date           RowID 1
   1        2009-03-29             4       1
   1        2009-03-30           10000          2
   1        2009-03-31             5
                      …                    …
                                           40
                                                    table records
   1        2009-04-30            200      …
   1        2009-05-13           20000
   ..           …                 400

- LIMIT 30,10 requires at least 40 random reads
- If most of the records do not have status=0, many more random reads will happen
LIMIT with covering index is fast
 SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND
 post_date >= '2009-03-01 00:00:00' ORDER BY post_date LIMIT 30, 10;
                               Branch 1
                             20       Leaf 1
                           - 120      Leaf 2

                   Leaf 1
                                                   5: post_date=‘2009-03-01..’, status=0
user_id      post_date    status       RowID
   1        2009-03-29       0           4
   1        2009-03-30       0         10000                   10000: post_date=‘2009-04-04..’, status=0
   1        2009-03-31       0           5
   1        2009-04-01       0         15321      15321: post_date=‘2009-04-23…’, status=0
   1        2009-04-30       0          100
   1        2009-05-30       0          200                      table records
   1        2009-06-13       0         20000
   ..           …            ..         400
              - Covering index scan is sequential access
              - All entries are likely to reside in the same leaf block
              - Single disk i/o is enough to get all data
              - Reading 40 records or 10 records does not matter
LIMIT performance example
       SELECT diary_id FROM diary WHERE user_id=? AND status=0 AND
       post_date >= '2009-03-01 00:00:00' ORDER BY post_date LIMIT X, 10;

    Time to execute from 50 clients
                                      Normal Index    Covering Index
       LIMIT 0, 10                    1.787s          0.800s
       LIMIT 30, 10                   5.173s          0.831s


   •    Disk read happened in both cases (worst case example)
   •    # of random disk reads was N times higher on Normal Index + LIMIT
   •    # of disk reads was almost equal on Covering Index + LIMIT

   •    Similar effects apply when counting records (COUNT queries)

What if a covering index cannot be used?
 1) SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND
 post_date >= '2009-03-01 00:00:00' ORDER BY post_date LIMIT 0, 10;

 2) Remember the highest post_date (in a hidden HTML tag, etc.), e.g. 2009-04-29
 3) SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND
 post_date > '2009-04-29 00:00:00' ORDER BY post_date LIMIT 0, 10;


               Leaf 1
user_id      post_date           RowID 1               checking status=0 or not
   1        2009-03-29             4
   1        2009-03-30           10000
   1        2009-03-31             5
                      …
   1        2009-04-29             5
   1        2009-04-30            200                 table records
                                           1
                                           …
   1        2009-05-13           20000
   ..           …                 400

- Use LIMIT 0, X (OFFSET 0) so that you can minimize the number of random reads


Table of contents

       •    Case 1: Lock contention and indexing
       •    Case 2: Deadlock caused by indexes
       •    Case 3: Covering index and range scan / LIMIT
       •    Case 4: Covering index and long text/blob
       •    Case 5: Sorting, indexing and query execution plans




“diary” table example

CREATE TABLE diary (
  diary_id INT UNSIGNED AUTO_INCREMENT,
  user_id INT UNSIGNED NOT NULL,
  post_date TIMESTAMP NOT NULL,
  status TINYINT UNSIGNED NOT NULL,
  rating FLOAT NOT NULL,
  title VARCHAR(100) NOT NULL,
  body TEXT,
  PRIMARY KEY (diary_id),
  INDEX user_date(user_id, post_date),
  INDEX user_rating(user_id, rating)
) CHARSET utf8 ENGINE=InnoDB;

- body is about 1KB/row
- The remaining columns are less than 50B/row in total
- 20 million rows (20+GB) in total
- Almost INSERT or SELECT only
- 90% of SELECT statements do not fetch body:
  SELECT user_id, post_date, title FROM diary WHERE diary_id=?
- 10% of SELECT statements fetch body:
  SELECT body FROM diary WHERE diary_id=?



Page(block) structure in InnoDB
One page (block) = 16KB, laid out as: Page Header, rows, row position info, Page Trailer.
- Each row stores diary_id, user_id, post_date, rating, title, … and a prefix of body
  (768B; only a 20B pointer in the InnoDB Plugin’s DYNAMIC format)
- The rest of body is stored within the same page if space is available
- If not, it is stored in a separate page called an “Overflow Page”
- The “diary” table is insert-mostly, so the rest of body is stored in the same page in most cases
  (a sketch of switching to the DYNAMIC row format follows)
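
To use the 20B pointer instead of the 768B in-page prefix, the table can be switched to the InnoDB
Plugin’s DYNAMIC row format (a hedged sketch; it assumes the InnoDB Plugin with innodb_file_per_table
enabled, and the ALTER rebuilds the table):

  mysql> SET GLOBAL innodb_file_format = 'Barracuda';
  mysql> ALTER TABLE diary ROW_FORMAT=DYNAMIC;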
Large TEXT/BLOB slows down all queries
diary_id            user_id               post_date             title                             body
(BIGINT PK)         (BIGINT, INDEX)       (DATETIME, INDEX)     (VARCHAR(100))                    (TEXT)
1                   5544321               2009/09/13 21:10:14   MySQL Cluster overview            …….(2000bytes)
2                   5544321               2009/10/13 22:13:34   UEFA Champions League             …….(700bytes)
3                   2345                  2009/11/7 22:12:23    Giants win first Japan Series title in 7 years    …….(3000bytes)

SELECT user_id, post_date, title FROM diary                                 SELECT body FROM diary
 WHERE diary_id=?                                                            WHERE diary_id=?
 90% queries                                                                                  10% queries

                 diary_id     user_id     post_date    title
                                                                     body

    Block        diary_id     user_id     post_date    title                         body
                 diary_id     user_id     post_date    title
                                                                     body

                                                      InnoDB Buffer Pool

                                                      InnoDB Data File
- Even though 90% of queries don’t fetch body, body is loaded into the buffer pool
  because it resides in the same block
- Body values occupy most of the InnoDB buffer pool space, so fewer records can be cached
Optimization approach - 1 to 1 relationship
  CREATE TABLE diary_head (
    diary_id INT UNSIGNED AUTO_INCREMENT,
    user_id INT UNSIGNED NOT NULL,
    post_date TIMESTAMP NOT NULL,
    status TINYINT UNSIGNED NOT NULL,
    rating FLOAT NOT NULL,
    title VARCHAR(100) NOT NULL,
    PRIMARY KEY (diary_id),
    INDEX user_date(user_id, post_date),
    INDEX user_rating(user_id, rating)
  ) CHARSET utf8 ENGINE=InnoDB;

  CREATE TABLE diary_body (
    diary_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    body TEXT
  ) CHARSET utf8 ENGINE=InnoDB;

  - Two tables, both have diary_id as the primary key
  - diary_head has all columns except body
  - diary_body has only body and the PK
  - diary_body can be NoSQL
  - Normalization is broken
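
  A hedged sketch (not from the slides) of populating the two tables from the original diary table:

    INSERT INTO diary_head (diary_id, user_id, post_date, status, rating, title)
      SELECT diary_id, user_id, post_date, status, rating, title FROM diary;
    INSERT INTO diary_body (diary_id, body)
      SELECT diary_id, body FROM diary;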

Table Size
                      PRIMARY KEY (+ records)   Secondary Indexes
     diary            24GB                      1.2GB
     diary_head       1.2GB                     1.2GB
     diary_body       22GB                      0



     - Since body is by far the biggest column, the diary_head table becomes much smaller
     - 90% of SELECT statements access diary_head:
       SELECT user_id, post_date, title FROM diary_head WHERE diary_id=?
     - 10% of SELECT statements access diary_body:
       SELECT body FROM diary_body WHERE diary_id=?


Why is the 1:1 relationship effective?
SELECT user_id, post_date, title FROM diary_head                    SELECT body FROM diary_body
 WHERE diary_id=?                                                    WHERE diary_id=?
 90% queries                                                                          10% queries


             diary_id    user_id       post_date    title
                                                            …
 Block       diary_id    user_id       post_date    title       Block          body                      body
             diary_id    user_id       post_date    title

                …
                                                   InnoDB Buffer Pool



                                      InnoDB Data File, InnoDB Log File

      - 90% of queries do not read blocks that contain body
      - Blocks of the diary_head table are (much) less likely to be evicted from the buffer pool

Queries per second
     Queries that contain body column   diary table (qps)   1:1 relationship (qps)   Improvement
     2%                                 323.61              15166.22                 46.9x
     5%                                 333.78              6567.11                  19.7x
     10%                                352.27              3215.27                  9.13x
     20%                                395.81              1370.24                  3.46x
     33%                                474.84              782.51                   1.45x
     50%                                632.87              539.86                   0.85x

SELECT user_id, post_date, title FROM diary_head WHERE diary_id=?
2,5,..50%: SELECT body FROM diary_body WHERE diary_id=?


       • If only a small fraction of queries (20% or less) fetches the body column,
         the 1:1 relationship is beneficial in this case

Covering Index as an alternative
       • A covering index can be a replacement for the 1:1
         relationship, without breaking normalization

       • ALTER TABLE diary
         ADD INDEX diary_covering (diary_id, user_id,
         post_date, status, rating, title);
              – Including all columns except body


       • Execution plan of “SELECT user_id, post_date, …
         FROM diary WHERE diary_id=?” should be “Using
         Index (Covering Index)” !


Covering index that covers all columns except body
            SELECT user_id, post_date, rating, status, title FROM diary
            WHERE diary_id=5
                               Branch 1
                           -100       Leaf 1
                           - 200      Leaf 2

                     Leaf 1
diary_id       user_id            …    RowID          5: post_date=‘2009-03-01..’, … body
   1            1000              …      1
   2              2               …      2
   3           10000              …      3             10000: post_date=‘2009-04-04..’, … body
   4             351              …      4
   5            1352              …      5        15321: post_date=‘2009-04-23…’, … body
   6             930              …      6
   7             444              …      7                   table records
   ..             …               ..    …

        •    This query’s execution plan should be “Using Index” (covering index)
        •    Index size is much smaller than diary table (close to diary_head)
        •    Should be very well cached, so it should be fast

Be careful about query execution plan, always
mysql> EXPLAIN SELECT user_id, post_date, title FROM diary WHERE diary_id=5;

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: diary
         type: const
possible_keys: PRIMARY,diary_covering
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra:
1 row in set (0.00 sec)

        • PRIMARY is the Clustered Index in InnoDB
               – All columns *including body* will be accessed
        • Why did MySQL not use “diary_covering”?
Difference between type=const and type=ref
       • const: Unique lookup (guaranteed by Primary/Unique
         index)
       • ref: Non-unique lookup

       • MySQL prioritizes “const” plan over “ref” plan
              – Unique key lookup reads at most one record so this is the
                fastest if we do not consider row length at all

       • Using diary_covering is non-unique lookup
              – Even though it actually returns at most one record

       • Control optimizer plan by FORCE INDEX
              – SELECT user_id, post_date, title FROM diary FORCE
                INDEX (diary_covering) WHERE diary_id=?

Optimized query execution plan

 mysql> EXPLAIN SELECT user_id, post_date, title FROM diary
 FORCE INDEX (diary_covering) WHERE diary_id = 5\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: diary
          type: ref
 possible_keys: diary_covering
           key: diary_covering
       key_len: 4
           ref: const
          rows: 1
         Extra: Using index
 1 row in set (0.00 sec)




Queries per second – Covering index
Queries that contain   diary table   Covering Index   1:1 relationship   Up (normal -> covering)
body column
2%                     323.61        14275.93         15166.22           44.11x
5%                     333.78        6174.59          6567.11            18.50x
10%                    352.27        3198.16          3215.27            9.08x
20%                    395.81        1557.52          1370.24            3.94x
33%                    474.84        852.98           782.51             1.80x
50%                    632.87        550.08           539.86             0.87x

 SELECT user_id, post_date, title FROM diary FORCE
 INDEX (diary_covering) WHERE diary_id=?
 2,5,..50%: SELECT body FROM diary WHERE diary_id=?

        • Covering index is very effective – close to or above the 1:1
             relationship
What about insertion time and table size?
        - In general, adding indexes slows down insertion
        - The 1:1 relationship requires executing INSERT twice

                                  Normal diary table   1:1 relationship   Covering index version
 Time to insert 20 million rows   3 hours 4 min        3 hours 17 min     3 hours 9 min
 Table size                       24GB                 1.4GB + 22GB       24GB
 Secondary index size             1.2GB                1.2GB + 0          2.2GB

 •    No big difference on insertion time
 •    SQL statement parsing overhead is relatively small on disk i/o bound workloads
 •    “diary_covering” index was sorted by diary_id (AUTO_INC PK), so index entries
      were sequentially inserted
 •    Secondary Index size was about 1.0GB bigger. It depends on covered columns

Note: Sequential order INSERT is fast
   INSERT INTO diary (diary_id, user_id, …) VALUES (NULL, 3, …)

        Leaf Block 1                               Leaf Block 1                     Leaf Block 2
diary_id user_id, etc RowID                diary_id user_id, etc RowID      diary_id user_id, etc RowID
   1         …          1                     1         …          1           61        …         61
   2         …          2                     2         …          2
   3         …          3                     3         …          3
   …                                          …                                            Empty

  60            …             60              60       …         60
“diary_covering” index
                                          - No fragmentation
                                          - Small number of blocks, small size
                                          - Highly recommended for InnoDB PRIMARY KEY



                               All entries are inserted here: cached in memory
Table of contents

       •    Case 1: Lock contention and indexing
       •    Case 2: Deadlock caused by indexes
       •    Case 3: Covering index and range scan / LIMIT
       •    Case 4: Covering index and long text/blob
       •    Case 5: Sorting, indexing and query execution plans




Sorting and Indexing
  SELECT * FROM tbl WHERE key1 < 30 ORDER BY key1


                                  Branch 1
                               - 60      Leaf 1
                              - 120      Leaf 2

                Leaf 1                            5: col2=‘aaa’, col3=10
          key1             PK
           1              10000                                        10000: col2=‘abc’, col3=100
           2                5
           3              15321                    15321: col2=‘a’, col3=7
        …
           60              431                                      Table Records


       - Index entries are sorted. When an indexed column is used with ORDER BY,
       filesort (sorting all records) can be skipped




Sorting and Indexing (2)
  SELECT * FROM tbl WHERE key1 < 30 ORDER BY col2


                                  Branch 1
                               - 60      Leaf 1                              sorting by col2
                              - 120      Leaf 2

                Leaf 1                            5: col2=‘aaa’, col3=10
          key1             PK
           1              10000                                        10000: col2=‘abc’, col3=100
           2                5
           3              15321                    15321: col2=‘a’, col3=7
        …
           60              431                                      Table Records


- If the column(s) in ORDER BY are not indexed, sorting all matched entries is required
- EXPLAIN shows Extra: Using filesort
- Sorting time is O(N log N)


Sorting and Indexing (3)
  SELECT * FROM tbl WHERE key1 < 30 ORDER BY key2


                                  Branch 1
                               - 60      Leaf 1                              Sorting by key2
                              - 120      Leaf 2

                Leaf 1                            5: col2=‘aaa’, col3=10
          key1             PK
           1              10000                                        10000: col2=‘abc’, col3=100
           2                5
           3              15321                    15321: col2=‘a’, col3=7
        …
           60              431                                      Table Records


  - You have two indexes, key1 and key2
  - Either key1 or key2 is used; both indexes cannot be used at the same time
  - If key2 is used, filesort does not happen, but key1 is not used to filter records,
    so a full index scan happens
  - The MySQL optimizer chooses key1 or key2 (cost based); the choice can be steered
    with index hints, as sketched below
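
  A hedged sketch of steering the choice (it assumes the indexes are actually named key1 and key2):

    SELECT * FROM tbl FORCE INDEX (key2) WHERE key1 < 30 ORDER BY key2;  -- skip filesort, full index scan
    SELECT * FROM tbl FORCE INDEX (key1) WHERE key1 < 30 ORDER BY key2;  -- range scan on key1, then filesort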

Note: Index merge
 SELECT * FROM tbl WHERE key1 = 2 AND key2 = 3

 Key1’s Leaf Block                             Key2’s Leaf Block             5: col2=‘aaa’, col3=10
  key1    RowID                                 key2    RowID
   1      10000                                  1         10
   2         4                                   1         20
   2        537                                  1         30
                                                                                999: col2=‘a’, col3=7
   2        999                        merge     2        500
   3        100                                  3        100                       table records
   3        200                                  3        200
   3        300                  4      100      3        300
   4        400                     999 200      3        999
    …                           537    300


- key1 and key2 are different indexes
- One access for key1, one access for key2, merging the 7 matching entries, then one access
  to the table data (only RowID 999 appears in both)
- The more records match, the more overhead is added
- Index Merge can be used to filter records, but cannot be used to skip sorting
  (a sample EXPLAIN is sketched below)
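
 A hedged sketch of the EXPLAIN output typically reported when an index-merge intersection is chosen
 (abridged; the estimates shown are illustrative):

  mysql> EXPLAIN SELECT * FROM tbl WHERE key1 = 2 AND key2 = 3\G
           type: index_merge
  possible_keys: key1,key2
            key: key1,key2
        key_len: 5,5
          Extra: Using intersect(key1,key2); Using where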

ORDER BY LIMIT N

           SELECT * FROM tbl WHERE cond ORDER BY keyX LIMIT 20



                What MySQL query execution plans can be considered ?




Plan A: Using cond as an index
         Using cond as an index, sorting matched records, returning top 20

         mysql> EXPLAIN SELECT * FROM tbl WHERE cond < 10
              -> ORDER BY keyX LIMIT 20\G
         *************************** 1. row ***************************
                    id: 1
           select_type: SIMPLE
                 table: tbl
                  type: range
         possible_keys: cond
                   key: cond
               key_len: 5
                   ref: NULL
                  rows: 10
                 Extra: Using where; Using filesort
         1 row in set (0.00 sec)

                 If cond is very complex, this plan might not be possible
Plan B: Using keyX as an index
          Using keyX as an index, skipping sorting, checking cond one by one,
         stopping scanning when 20 records match criteria


        mysql> EXPLAIN SELECT * FROM tbl WHERE cond < 10
             -> ORDER BY keyX LIMIT 20\G
        *************************** 1. row ***************************
                   id: 1
          select_type: SIMPLE
                table: tbl
                 type: index
        possible_keys: NULL
                  key: keyX
              key_len: 5
                  ref: NULL
                 rows: 20
                Extra: Using where
        1 row in set (0.00 sec)

Plan C: Full table scan
 Scanning whole table, filtering by cond, sorting, then returning top 20 records


mysql> EXPLAIN SELECT * FROM tbl WHERE cond < 10
    -> ORDER BY keyX LIMIT 20\G
*************************** 1. row ***************************
 id: 1
  select_type: SIMPLE
         table: tbl
          type: ALL
possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: 4012
         Extra: Using where; Using filesort
1 row in set (0.00 sec)


Which plan is the fastest ?
           SELECT * FROM tbl WHERE cond ORDER BY keyX LIMIT 20

  A: Using cond as an index, sorting matched records, returning top 20
 (type=range, key=cond, Using filesort)

 B: Using keyX as an index, skipping sorting, checking cond one by one,
stopping scanning when 20 records match criteria
(type=index, key=keyX)

 C: Scanning whole table, filtering by cond, sorting, then returning top 20 records
(type=ALL, key=NULL, Using filesort)




The fastest execution plan depends on data
  SELECT * FROM tbl WHERE cond < 10 ORDER BY keyX LIMIT 20

A. Using cond as an index
                                                          When small # of records match
     Leaf Block
                                                          cond < 10, plan A should be the fastest.
  cond       RowID
    1         10000                                       Otherwise massive random reads happen
    2           5                             Sorting by keyX
    3         15321
 …                                              Returning top20
   10              431           Rows                     When many records match
                                                          cond < 10, plan B should be the fastest.
                                                          Otherwise massive random reads happen
B. Using keyX as an index
                                                          When both A and B are slow,
      Leaf Block                                          C (full table scan) should be the fastest.
   keyX       RowID
    aaa         250                          Stopping when 20 rows
    bbb        5553                          meet criteria: cond < 10
    ccc          51
  …
     zzz              732                 Rows

                                                   - The database optimizer cannot decide the
                                                     fastest plan without reading records
                                                   - Sometimes the optimizer chooses a slower plan
Example case: DBT-1 (similar to TPC-W)
SELECT i_id, i_title, a_fname, a_lname FROM item, author
WHERE item.i_title LIKE '%AAA%' AND item.i_a_id = author.a_id
ORDER BY item.i_title ASC LIMIT 50;
*************************** 1. row ***************************
  select_type: SIMPLE
        table: item
         type: index
possible_keys: i_i_a_id
          key: i_i_title
      key_len: 63
          ref: NULL
         rows: 10005
        Extra: Using where
*************************** 2. row ***************************
  select_type: SIMPLE              - Item table has 10000 records
        table: author              - Author table has 2500 records
         type: eq_ref              - Index i_title on item
possible_keys: PRIMARY             - Join from item to author
          key: PRIMARY             - Primary key a_id on author
      key_len: 5                   - Index on i_title can not be used for filtering
          ref: test.item.i_a_id    - Plan B was chosen in this case
         rows: 1                   - But almost no record matched WHERE condition
        Extra:
type=index, without covering index is not efficient
              SELECT i_id, i_title, a_fname, a_lname FROM item, author
              WHERE i_title LIKE '%aaa%' AND i_a_id = a_id
              ORDER BY i_title ASC LIMIT 50;


                 Leaf 1                         PK, i_a_id, …
          i_title      i_id(PK)
            …             …                                               PK, i_a_id, …
            …             …           …
            …             …                      PK, i_a_id, …
         …
            …                …                                  Table Records

- type=index means a full index scan
- Records are read one by one (random access!)
- WHERE conditions are checked
- The author table is joined
- Scanning stops when 50 records meet the criteria
- In this case, only 5 records meet the criteria -> all index entries are scanned, with random accesses
- A full table scan is a better plan here (ideally, fulltext search is the best); one way to force it
  is sketched below
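
A hedged sketch of forcing the full-table-scan plan that the next chart labels “type=ALL (IGNORE INDEX)”
(it assumes the index on i_title is named i_i_title, as in the EXPLAIN above):

  SELECT i_id, i_title, a_fname, a_lname
    FROM item IGNORE INDEX (i_i_title), author
   WHERE item.i_title LIKE '%AAA%' AND item.i_a_id = author.a_id
   ORDER BY item.i_title ASC LIMIT 50;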


Query execution plan, TPS, and CPU scalability
              [Chart: DBT-1 Throughput (BT/s, 0-3500) vs. # of connections (4-28);
               lines: 8 cores, 4 cores, 8 cores with bad index, 4 cores with bad index]
              - type=ALL (IGNORE INDEX): the upper group of lines
              - type=index (default plan): the lower group of lines



             - Full index scan and massive random reads caused serious
             global mutex contentions inside InnoDB, which degraded CPU scalability




Conclusion

       • Query Execution Plan is very important
              – Always be careful about EXPLAIN plan


        • In some cases, control the query execution plan
          yourself
               – Covering index on the primary key, etc


       • Index can be used to reduce record-lock contentions




Enjoy the conference!

        • The slides will be published on SlideShare very soon

       • My talks on Thursday
              – SSD Deployment Strategies for MySQL
                     • April 15th (Thu), 14:00-14:45, Ballroom E




Copyright 2010 Sun Microsystems inc                          The World’s Most Popular Open Source Database   53
Copyright 2010 Sun Microsystems inc   The World’s Most Popular Open Source Database   54

More Related Content

More mastering the art of indexing

  • 1. More Mastering the Art of Indexing Yoshinori Matsunobu Lead of MySQL Professional Services APAC Sun Microsystems [email protected] Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 1
  • 2. Table of contents • Case 1: Lock contention and indexing • Case 2: Deadlock caused by indexes • Case 3: Covering index and range scan / LIMIT • Case 4: Covering index and long text/blob • Case 5: Sorting, indexing and query execution plans This is a second half of “Mastering the art of indexing” session. (Independent from the first half. No prerequisite) The first half session was presented last year. http://www.mysqlconf.com/mysql2009/public/schedule/detail/6661 Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 2
  • 3. Speaker’s profile • Name: Yoshinori Matsunobu • Living in Tokyo, Japan • Leading MySQL Consulting Business in Japan and APAC at Sun Microsystems (Oracle) • Joined MySQL inc as a consultant in September 2006 – Before joining MySQL, I worked at Sony Corporation as a software architect for 5.5 years, using Oracle/WebLogic/SAP R3 and MySQL/JBoss • Published a couple of MySQL/Linux/Java books/articles (but all written in Japanese) • Contact: – E-mail: [email protected] – Blog http://yoshinorimatsunobu.blogspot.com – @matsunobu on Twitter Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 3
  • 4. Case 1: Lock contention user_id(PK) name status … • “user” table (InnoDB) 1 Ronald 0 … • 1 million records … … … … • PK on user_id 100 Messi 0 • No index on status column … … … • Auto-committed queries 1000000 Raul 9 … Session 1 Session 2 1) DELETE FROM user WHERE status = 9; … 2) UPDATE user SET status=9 WHERE user_id=100; … (taking long time to scan & delete) … (waiting for query 1 to finish) … … Query OK, 1000 rows affected (13.66 sec) Query OK, 1 row affected (11.27 sec) Rows matched: 1 Changed: 1 Warnings: 0 Why query 2 was blocked by query 1? Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 4
  • 5. Understanding how statement based binary logging works 1) DELETE FROM user user_id(PK) name status … WHERE status = 9; a) 1 Ronald 0 … … … … … b) 100 Messi 0 … … … d) Fully scanning user table, 1000000 Raul 9 … deleting if status == 9 c) 2) UPDATE user SET status=9 WHERE user_id=100; What happens if query 1) does not lock row: user_id=100 ? - 2) finishes before 1) - 1) already checked row: user_id=100, Binary log: the row was not deleted because status==0 1. UPDATE user SET status=9 WHERE user_id=100; - The final result of the row 2. DELETE FROM user WHERE status = 9; --> exists, status=9 user_id=100 on slaves: Not exists Data consistency is broken! Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 5
  • 6. Next-Key Locking in InnoDB • Locking not only “modified” rows, but also “scanned” rows • To avoid master/slave mismatch – And to make point-in-time-recovery from binary logs work • UPDATE/DELETE … WHERE … sets exclusive next- key lock on every record the search encounters. • INSERT … SELECT … put a shared next key lock on all the selected rows. • Disadvantage: low concurrency Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 6
  • 7. If status column is indexed 1) DELETE FROM user WHERE status = 9; 2) UPDATE user SET status=1 WHERE user_id=100; status PK user_id(PK) name status … 0 100 1 Ronald 0 … 1 100 … … … … … … 100 Messi 0 -> 1 9 12345 9 … … … … 9 1000000 1000000 Raul 9 … • 1) and 2) can run in parallel (concurrency improved!) – If query 2 sets status=9, it is blocked by query 1 • Even though cardinality on status column is very low, indexing is still helpful to avoid massive row-lock contentions Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 7
  • 8. Sequential Access vs Random Access • Updating/deleting rows by full table scan does sequential reads/writes • Updating/deleting rows by index scan does random reads/writes • Full table scan is not always worse than index scan Index scan for 2mil rows vs Full scan for 100mil rows 900rows/s Index scan (buffer pool=10G) 297rows/s Index scan (buffer pool=5G) 502,310rows/s Full table scan 0 1000 2000 3000 4000 5000 6000 7000 8000 seconds Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 8
  • 9. Read Committed + Row Based Binary Logging • Next-key locking can be disabled with --binlog-format=row and --transaction-isolation=read-committed in MySQL 5.1 Session 1 Session 2 1) DELETE FROM user WHERE status = 9; … 2) UPDATE user SET status=9 WHERE … (taking long time to scan & delete) user_id=100; … Query OK, 1 row affected (0.00 sec) … Rows matched: 1 Changed: 1 Warnings: 0 Query OK, 1000 rows affected (13.66 sec) • Performance disadvantages: – Row based binary logging – Read Committed is less efficient than repeatable read with many concurrent sessions in InnoDB (See http://www.facebook.com/note.php?note_id=244956410932) • In 5.0 or earlier: innodb_locks_unsafe_for_binlog – Enable only if you do not use binlog or you do not care about data consistency Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 9
  • 10. Table of contents • Case 1: Lock contention and indexing • Case 2: Deadlock caused by indexes • Case 3: Covering index and range scan / LIMIT • Case 4: Covering index and long text/blob • Case 5: Sorting, indexing and query execution plans Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 10
  • 11. Case 2: Deadlock user_id(PK) name status … - user table 1 Ronald 0 … - Index on status column 2 John 1 … … … … - Auto committed … … … 10000000 Raul 9 … Session 1 Session 2 mysql> UPDATE user SET status=4 mysql> UPDATE user SET status=2 where status=1 ORDER BY user_id LIMIT 1 ; where user_id=2; Query OK, 1 rows affected (0.00 sec) ERROR 1213 (40001): Deadlock found when Rows matched: 1 Changed: 1 Warnings: 0 trying to get lock; try restarting transaction Why auto-committed, single-row updating queries caused deadlock error? Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 11
  • 12. Updating(Deleting) single row is not single step 1) UPDATE user SET status=4,… where 2) UPDATE user SET status=2 status=1 ORDER BY user_id LIMIT 1 ; where user_id=2; b status PK user_id(PK) name status … a 0 100 1 Ronald 0 … c (wait) 1 2 2 John 1 … 1 10 d (wait) 100 Vieri 0 -> 1 -> DL 1 13 … … … … … 1000000 Raul 9 … Secondary Index on status Clustered Index 1) a Exclusive Lock on status = 1, getting PK 2) b Exclusive Lock on PRIMARY = 2, c Exclusive Lock on PRIMARY = 2 getting values (status=1) -> Waiting for b d Exclusive Lock on status = 1 -> Waiting for c -> Deadlock! Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 12
  • 13. Table of contents • Case 1: Lock contention and indexing • Case 2: Deadlock caused by indexes • Case 3: Covering index and range scan / LIMIT • Case 4: Covering index and long text/blob • Case 5: Sorting, indexing and query execution plans Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database 13
• 14. Example table
“diary” table
CREATE TABLE diary (
  diary_id INT UNSIGNED AUTO_INCREMENT,
  user_id INT UNSIGNED NOT NULL,
  post_date TIMESTAMP NOT NULL,
  status TINYINT UNSIGNED NOT NULL,
  rating FLOAT NOT NULL,
  title VARCHAR(100) NOT NULL,
  body TEXT,
  PRIMARY KEY (diary_id),
  INDEX user_date(user_id)
) CHARSET utf8 ENGINE=InnoDB;
• 15. Wide-range queries are not fast
SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND post_date >= '2009-03-01 00:00:00';
(Diagram: B+ tree on user_id; one leaf block holds entries (user_id, RowID) such as (1, 5), (1, 10000), (1, 15321), each pointing to a separate table record)
– If 100 entries match user_id=1, 100 random disk reads might happen
– One random read for the leaf block, but 100 random reads for the table records
– A single HDD can do only 100-200 random disk reads per second (very slow)
• 16. Covering Index: Reading only an index
SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND post_date >= '2009-03-01 00:00:00';
(Diagram: leaf block holds entries (user_id, post_date, status, RowID); the query is answered from the index alone, without touching the table records)
– If all columns in the SQL statement (SELECT/WHERE/etc.) are contained within a single index, MySQL chooses a “Covering Index” execution plan
– Very efficient because random disk I/O does not happen
– In InnoDB, the RowID is the PK (diary_id), so a covering index can be used more frequently
– The status column is not useful for filtering records, but it is useful for making this a covering index
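A sketch of an index definition that covers this query (the index name and exact column order are my illustration based on the leaf layout above; diary_id does not need to be listed because, as noted, the InnoDB PK is stored in every secondary index entry):

  ALTER TABLE diary ADD INDEX user_date_status (user_id, post_date, status);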
• 17. Covering Index = “Using index”
> explain select count(ind) from t
  id: 1  select_type: SIMPLE  table: t  type: index  possible_keys: NULL  key: ind  key_len: 5  ref: NULL  rows: 100000181  Extra: Using index
mysql> select count(ind) from t;  ->  100000000  (1 row in set (15.98 sec))
> explain select count(c) from t
  id: 1  select_type: SIMPLE  table: t  type: ALL  possible_keys: NULL  key: NULL  key_len: NULL  ref: NULL  rows: 100000181  Extra:
mysql> select count(c) from t;  ->  100000000  (1 row in set (28.99 sec))
• 18. LIMIT without covering index is not fast
SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND post_date >= '2009-03-01 00:00:00' ORDER BY post_date LIMIT 30, 10;
(Diagram: leaf block holds entries (user_id, post_date, RowID); status=0 can only be checked by reading each table record)
– LIMIT 30,10 requires at least 40 random reads
– If most of the records are not status=0, many more random reads will happen
• 19. LIMIT with covering index is fast
SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND post_date >= '2009-03-01 00:00:00' ORDER BY post_date LIMIT 30, 10;
(Diagram: leaf block holds entries (user_id, post_date, status, RowID); all entries needed by the query sit next to each other in the index)
– A covering index scan is sequential access
– All entries are likely to reside in the same leaf block
– A single disk I/O is enough to get all the data
– Reading 40 records or 10 records does not matter
• 20. LIMIT performance example
SELECT diary_id FROM diary WHERE user_id=? AND status=0 AND post_date >= '2009-03-01 00:00:00' ORDER BY post_date LIMIT X, 10;
Time to execute from 50 clients:
                Normal Index   Covering Index
  LIMIT 0, 10   1.787s         0.800s
  LIMIT 30, 10  5.173s         0.831s
• Disk reads happened in both cases (worst case example)
• The number of random disk reads was N times higher with Normal Index + LIMIT
• The number of disk reads was almost equal with Covering Index + LIMIT
• Similar effects apply to counting records (COUNT)
• 21. What if a covering index can not be used?
1) SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND post_date >= '2009-03-01 00:00:00' ORDER BY post_date LIMIT 0, 10;
2) Remember the highest post_date (in a HIDDEN HTML tag, etc.), e.g. 2009-04-29
3) SELECT diary_id FROM diary WHERE user_id=1 AND status=0 AND post_date > '2009-04-29 00:00:00' ORDER BY post_date LIMIT 0, 10;
(Diagram: leaf block holds entries (user_id, post_date, RowID); status=0 is still checked against the table records, but only about 10 entries per page need to be read)
– Use LIMIT 0, X (OFFSET 0) so that you can minimize the number of random reads
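One caveat not covered on the slide: if several rows can share the same post_date, the "post_date >" condition may skip or repeat rows at page boundaries. A common refinement (my sketch; the literal values are placeholders) is to remember the last (post_date, diary_id) pair and use it as a composite cursor:

  SELECT diary_id FROM diary
  WHERE user_id=1 AND status=0
    AND (post_date > '2009-04-29 00:00:00'
         OR (post_date = '2009-04-29 00:00:00' AND diary_id > 12345))  -- 12345 = last diary_id shown
  ORDER BY post_date, diary_id
  LIMIT 0, 10;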
• 22. Table of contents
• Case 1: Lock contention and indexing
• Case 2: Deadlock caused by indexes
• Case 3: Covering index and range scan / LIMIT
• Case 4: Covering index and long text/blob
• Case 5: Sorting, indexing and query execution plans
• 23. “diary” table example
CREATE TABLE diary (
  diary_id INT UNSIGNED AUTO_INCREMENT,
  user_id INT UNSIGNED NOT NULL,
  post_date TIMESTAMP NOT NULL,
  status TINYINT UNSIGNED NOT NULL,
  rating FLOAT NOT NULL,
  title VARCHAR(100) NOT NULL,
  body TEXT,
  PRIMARY KEY (diary_id),
  INDEX user_date(user_id, post_date),
  INDEX user_rating(user_id, rating)
) CHARSET utf8 ENGINE=InnoDB;
– body is about 1KB/row; the rest of the columns are less than 50B/row in total
– 20 million rows (20+GB) in total
– Almost INSERT or SELECT only
– 90% of SELECT statements do not fetch body:
  SELECT user_id, post_date, title FROM diary WHERE diary_id=?
– 10% of SELECT statements fetch body:
  SELECT body FROM diary WHERE diary_id=?
• 24. Page (block) structure in InnoDB
(Diagram: one 16KB page = page header, rows, row position info, page trailer; each row stores a 768-byte prefix of body, or a 20-byte pointer in the InnoDB Plugin’s DYNAMIC format)
– The rest of body is stored within the same page if space is available
– If not, it is stored in a separate page called an “Overflow Page”
– The “diary” table is insert-mostly, so the rest of body is stored in the same page in most cases
• 25. Large TEXT/BLOB slows down all queries
(Sample rows: diary_id BIGINT PK, user_id BIGINT INDEX, post_date DATETIME INDEX, title VARCHAR(100), body TEXT — e.g. “MySQL Cluster overview” (2000 bytes), “UEFA Champions League” (700 bytes), “Giants win the Japan Series for the first time in 7 years” (3000 bytes))
90% of queries: SELECT user_id, post_date, title FROM diary WHERE diary_id=?
10% of queries: SELECT body FROM diary WHERE diary_id=?
• Even though 90% of queries don’t fetch body, body is loaded into the buffer pool because it resides in the same block
• body values occupy most of the InnoDB buffer pool space, so fewer records can be cached
• 26. Optimization approach - 1 to 1 relationship
CREATE TABLE diary_head (
  diary_id INT UNSIGNED AUTO_INCREMENT,
  user_id INT UNSIGNED NOT NULL,
  post_date TIMESTAMP NOT NULL,
  status TINYINT UNSIGNED NOT NULL,
  rating FLOAT NOT NULL,
  title VARCHAR(100) NOT NULL,
  PRIMARY KEY (diary_id),
  INDEX user_date(user_id, post_date),
  INDEX user_rating(user_id, rating)
) CHARSET utf8 ENGINE=InnoDB;

CREATE TABLE diary_body (
  diary_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  body TEXT
) CHARSET utf8 ENGINE=InnoDB;
– Two tables, both with diary_id as the primary key
– diary_head has all columns except body
– diary_body has only body and the PK
– diary_body can be NoSQL
– Normalization is broken
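A sketch of how a write changes with this split (the deck only notes later that INSERT has to be executed twice; wrapping the two statements in one transaction and the literal values are my additions):

  BEGIN;
  INSERT INTO diary_head (user_id, post_date, status, rating, title)
    VALUES (1, NOW(), 0, 0, 'example title');
  INSERT INTO diary_body (diary_id, body)
    VALUES (LAST_INSERT_ID(), 'example body ...');
  COMMIT;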
• 27. Table Size
  Table        PRIMARY KEY (+ records)   Secondary Indexes
  diary        24GB                      1.2GB
  diary_head   1.2GB                     1.2GB
  diary_body   22GB                      0
– Since body is by far the biggest column, the diary_head table becomes much smaller
– 90% of SELECT statements access diary_head:
  SELECT user_id, post_date, title FROM diary_head WHERE diary_id=?
– 10% of SELECT statements access diary_body:
  SELECT body FROM diary_body WHERE diary_id=?
• 28. Why is the 1:1 relationship effective?
90% of queries: SELECT user_id, post_date, title FROM diary_head WHERE diary_id=?
10% of queries: SELECT body FROM diary_body WHERE diary_id=?
(Diagram: diary_head blocks contain no body column, so the buffer pool holds far more diary_head rows per block)
– 90% of queries do not read blocks that contain body
– Blocks of the diary_head table are (much) less frequently evicted from the cache
• 29. Queries per second
  Queries that contain body column   diary table (qps)   1:1 relationship (qps)   Improvement
  2%                                 323.61              15166.22                 46.9x
  5%                                 333.78              6567.11                  19.7x
  10%                                352.27              3215.27                  9.13x
  20%                                395.81              1370.24                  3.46x
  33%                                474.84              782.51                   1.65x
  50%                                632.87              539.86                   0.85x
Workload: SELECT user_id, post_date, title FROM diary_head WHERE diary_id=?  plus 2, 5, ... 50% of  SELECT body FROM diary_body WHERE diary_id=?
• If only a small fraction of queries (20% or less) fetch the body column, the 1:1 relationship is beneficial in this case
• 30. Covering Index as an alternative
• A covering index can be a replacement for the 1:1 relationship, without breaking normalization
• ALTER TABLE diary ADD INDEX diary_covering (diary_id, user_id, post_date, status, rating, title);
  – Including all columns except body
• The execution plan of “SELECT user_id, post_date, … FROM diary WHERE diary_id=?” should be “Using index” (covering index)!
• 31. Covering index that covers all columns except body
SELECT user_id, post_date, rating, status, title FROM diary WHERE diary_id=5;
(Diagram: diary_covering leaf block holds entries (diary_id, user_id, post_date, …, RowID); body stays in the table records only)
• This query’s execution plan should be “Using index” (covering index)
• The index size is much smaller than the diary table (close to diary_head)
• It should be very well cached, so it should be fast
• 32. Be careful about the query execution plan, always
mysql> EXPLAIN SELECT user_id, post_date, title FROM diary WHERE diary_id=5;
  id: 1  select_type: SIMPLE  table: diary  type: const  possible_keys: PRIMARY,diary_covering  key: PRIMARY  key_len: 4  ref: const  rows: 1  Extra:
1 row in set (0.00 sec)
• PRIMARY is the clustered index in InnoDB
  – All columns *including body* will be accessed
• Why did MySQL not use “diary_covering”?
• 33. Difference between type=const and type=ref
• const: unique lookup (guaranteed by a PRIMARY/UNIQUE index)
• ref: non-unique lookup
• MySQL prioritizes the “const” plan over the “ref” plan
  – A unique key lookup reads at most one record, so it is the fastest if we do not consider row length at all
• Using diary_covering is a non-unique lookup
  – Even though it actually returns at most one record
• Control the optimizer plan with FORCE INDEX
  – SELECT user_id, post_date, title FROM diary FORCE INDEX (diary_covering) WHERE diary_id=?
• 34. Optimized query execution plan
mysql> EXPLAIN SELECT user_id, post_date, title FROM diary FORCE INDEX (diary_covering) WHERE diary_id = 5\G
  id: 1  select_type: SIMPLE  table: diary  type: ref  possible_keys: diary_covering  key: diary_covering  key_len: 4  ref: const  rows: 1  Extra: Using index
1 row in set (0.00 sec)
• 35. Queries per second – Covering index
  Queries that contain body column   diary table (normal)   Covering Index   1:1 relationship   Up (normal -> covering)
  2%                                 323.61                 14275.93         15166.22           44.11x
  5%                                 333.78                 6174.59          6567.11            18.50x
  10%                                352.27                 3198.16          3215.27            9.08x
  20%                                395.81                 1557.52          1370.24            3.94x
  33%                                474.84                 852.98           782.51             1.80x
  50%                                632.87                 550.08           539.86             0.87x
Workload: SELECT user_id, post_date, title FROM diary FORCE INDEX (diary_covering) WHERE diary_id=?  plus 2, 5, ... 50% of  SELECT body FROM diary_body WHERE diary_id=?
• The covering index is very effective – close to or above the 1:1 relationship
• 36. What about insertion time and table size?
– In general, adding indexes slows down insertion
– The 1:1 relationship requires executing INSERT twice
                                   Normal diary table   1:1 relationship   Covering index version
  Time to insert 20 million rows   3 hours 4 min        3 hours 17 min     3 hours 9 min
  Table size                       24GB                 1.4GB + 22GB       24GB
  Secondary index size             1.2GB                1.2GB + 0          2.2GB
• No big difference in insertion time
  – SQL statement parsing overhead is relatively small on disk-I/O-bound workloads
  – The “diary_covering” index is sorted by diary_id (the AUTO_INCREMENT PK), so index entries were inserted sequentially
• The secondary index size was about 1.0GB bigger; it depends on the covered columns
• 37. Note: Sequential-order INSERT is fast
INSERT INTO diary (diary_id, user_id, …) VALUES (NULL, 3, …)
(Diagram: entries of the “diary_covering” index are appended to the right-most leaf block, which stays cached in memory; older leaf blocks fill up completely)
– No fragmentation
– Small number of blocks, small size
– Highly recommended for the InnoDB PRIMARY KEY
• 38. Table of contents
• Case 1: Lock contention and indexing
• Case 2: Deadlock caused by indexes
• Case 3: Covering index and range scan / LIMIT
• Case 4: Covering index and long text/blob
• Case 5: Sorting, indexing and query execution plans
• 39. Sorting and Indexing
SELECT * FROM tbl WHERE key1 < 30 ORDER BY key1
(Diagram: B+ tree on key1; leaf entries (key1, PK) are already stored in key1 order)
– Index entries are sorted. When an indexed column is used with ORDER BY, filesort (sorting all matched records) can be skipped
• 40. Sorting and Indexing (2)
SELECT * FROM tbl WHERE key1 < 30 ORDER BY col2
(Diagram: the key1 index filters the rows, but the result still has to be sorted by col2)
– If the column(s) in ORDER BY are not indexed, sorting all matched entries is required
– EXPLAIN shows Extra: Using filesort
– Sorting time is O(N log N)
• 41. Sorting and Indexing (3)
SELECT * FROM tbl WHERE key1 < 30 ORDER BY key2
(Diagram: either the key1 index or the key2 index is scanned, never both)
– You have two indexes, key1 and key2
– key1 or key2 is used; both indexes can not be used at the same time
– If key2 is used, filesort does not happen, but key1 is not used to filter records, so a full scan (full index scan) happens
– The MySQL optimizer chooses key1 or key2 (cost based); a sketch of checking and overriding the choice follows below
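A small sketch of checking which index was picked and forcing the other one (this assumes the indexes are literally named key1 and key2, matching this example's column names):

  EXPLAIN SELECT * FROM tbl WHERE key1 < 30 ORDER BY key2;
  SELECT * FROM tbl FORCE INDEX (key2) WHERE key1 < 30 ORDER BY key2;  -- sort-avoiding plan
  SELECT * FROM tbl FORCE INDEX (key1) WHERE key1 < 30 ORDER BY key2;  -- filtering plan, filesort happens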
• 42. Note: Index merge
SELECT * FROM tbl WHERE key1 = 2 AND key2 = 3
(Diagram: the leaf block of key1 and the leaf block of key2 are read separately, the matching RowIDs are merged, then the table records are accessed)
– key1 and key2 are two different indexes
– One access for key1, one access for key2, merging the matched entries, then one access to the data
– The more records match, the more overhead is added
– Index merge can be used to filter records, but can not be used to skip sorting
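For reference, an intersection index-merge plan typically shows up in EXPLAIN roughly like this (an illustrative sketch; the exact row estimates depend on the data):

  mysql> EXPLAIN SELECT * FROM tbl WHERE key1 = 2 AND key2 = 3\G
    type: index_merge   possible_keys: key1,key2   key: key1,key2
    Extra: Using intersect(key1,key2); Using where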
• 43. ORDER BY ... LIMIT N
SELECT * FROM tbl WHERE cond ORDER BY keyX LIMIT 20
What MySQL query execution plans can be considered?
• 44. Plan A: Using cond as an index
Using cond as an index, sorting the matched records, returning the top 20
mysql> EXPLAIN SELECT * FROM tbl WHERE cond < 10 ORDER BY keyX LIMIT 20\G
  id: 1  select_type: SIMPLE  table: tbl  type: range  possible_keys: cond  key: cond  key_len: 5  ref: NULL  rows: 10  Extra: Using where; Using filesort
If cond is very complex, this plan might not be possible
• 45. Plan B: Using keyX as an index
Using keyX as an index, skipping the sort, checking cond one by one, stopping the scan when 20 records match the criteria
mysql> EXPLAIN SELECT * FROM tbl WHERE cond < 10 ORDER BY keyX LIMIT 20\G
  id: 1  select_type: SIMPLE  table: tbl  type: index  possible_keys: NULL  key: keyX  key_len: 5  ref: NULL  rows: 20  Extra: Using where
• 46. Plan C: Full table scan
Scanning the whole table, filtering by cond, sorting, then returning the top 20 records
mysql> EXPLAIN SELECT * FROM tbl WHERE cond < 10 ORDER BY keyX LIMIT 20\G
  id: 1  select_type: SIMPLE  table: tbl  type: ALL  possible_keys: NULL  key: NULL  key_len: NULL  ref: NULL  rows: 4012  Extra: Using where; Using filesort
• 47. Which plan is the fastest?
SELECT * FROM tbl WHERE cond ORDER BY keyX LIMIT 20
A: Using cond as an index, sorting the matched records, returning the top 20 (type=range, key=cond, Using filesort)
B: Using keyX as an index, skipping the sort, checking cond one by one, stopping the scan when 20 records match the criteria (type=index, key=keyX)
C: Scanning the whole table, filtering by cond, sorting, then returning the top 20 records (type=ALL, key=NULL, Using filesort)
• 48. The fastest execution plan depends on the data
SELECT * FROM tbl WHERE cond < 10 ORDER BY keyX LIMIT 20
A. Using cond as an index: when a small number of records match cond < 10, plan A should be the fastest; the matched rows are sorted by keyX and the top 20 returned. Otherwise massive random reads happen.
B. Using keyX as an index: when many records match cond < 10, plan B should be the fastest; the keyX index is scanned in order and the scan stops when 20 rows meet cond < 10. Otherwise massive random reads happen.
C. When both A and B are slow, C (full table scan) should be the fastest.
– The database optimizer can not decide the fastest plan without reading records
– Sometimes the optimizer chooses a slower plan (index hints, sketched below, are one way to override it)
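A sketch of steering the optimizer between these plans with index hints (here cond and keyX stand for both the columns and their index names, as in the EXPLAIN output above):

  SELECT * FROM tbl FORCE INDEX (cond)  WHERE cond < 10 ORDER BY keyX LIMIT 20;        -- prefer plan A
  SELECT * FROM tbl FORCE INDEX (keyX)  WHERE cond < 10 ORDER BY keyX LIMIT 20;        -- prefer plan B
  SELECT * FROM tbl IGNORE INDEX (cond, keyX) WHERE cond < 10 ORDER BY keyX LIMIT 20;  -- prefer plan C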
• 49. Example case: DBT-1 (similar to TPC-W)
SELECT i_id, i_title, a_fname, a_lname FROM item, author WHERE item.i_title LIKE '%AAA%' AND item.i_a_id = author.a_id ORDER BY item.i_title ASC LIMIT 50;
1. row: table: item  type: index  possible_keys: i_i_a_id  key: i_i_title  key_len: 63  ref: NULL  rows: 10005  Extra: Using where
2. row: table: author  type: eq_ref  possible_keys: PRIMARY  key: PRIMARY  key_len: 5  ref: test.item.i_a_id  rows: 1  Extra:
– The item table has 10000 records, the author table has 2500 records
– Index i_title on item; primary key a_id on author; the join goes from item to author
– The index on i_title can not be used for filtering (the LIKE pattern starts with a wildcard)
– Plan B was chosen in this case
– But almost no record matched the WHERE condition
• 50. type=index without a covering index is not efficient
SELECT i_id, i_title, a_fname, a_lname FROM item, author WHERE i_title LIKE '%aaa%' AND i_a_id = a_id ORDER BY i_title ASC LIMIT 50;
(Diagram: the i_title index is scanned entry by entry; each entry requires a random access to the table record)
– type=index means a full index scan
– Reading records one by one (random access!), checking the WHERE conditions, joining the author table
– The scan stops when 50 records meet the criteria
– In this case, only 5 records meet the criteria -> all index entries are scanned, with a random access for each
– A full table scan is the better plan here (ideally, fulltext search is the best)
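The “type=ALL (IGNORE INDEX)” line in the next slide’s chart refers to the plan obtained by forcing the full scan; a sketch of such a query (the hint placement is my illustration, the index name is taken from the EXPLAIN output two slides back):

  SELECT i_id, i_title, a_fname, a_lname
  FROM item IGNORE INDEX (i_i_title), author
  WHERE item.i_title LIKE '%aaa%' AND item.i_a_id = author.a_id
  ORDER BY item.i_title ASC LIMIT 50;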
• 51. Query execution plan, TPS, and CPU scalability
(Chart: DBT-1 throughput (BT/s) vs number of connections (4–28) on 4 and 8 cores; the “type=ALL (IGNORE INDEX)” plan reaches roughly 3000+ BT/s, while the default “type=index” plan — the “bad index” lines — stays far lower)
– The full index scan and massive random reads caused serious global mutex contention inside InnoDB, which degraded CPU scalability
• 52. Conclusion
• The query execution plan is very important
  – Always be careful about the EXPLAIN plan
• In some cases, control the query execution plan yourself
  – Covering index on the primary key, etc.
• Indexes can be used to reduce record-lock contention
• 53. Enjoy the conference!
• The slides will be published at Slideshare very soon
• My talks on Thursday
  – SSD Deployment Strategies for MySQL
  – April 15th (Thu), 14:00-14:45, Ballroom E