ぼくとMySQLの3日間戦争(予告編)
こんにちは、DBAのたなかです。
聞いてください。
ある日、突然、MySQLが
Number of processes running now: 0 130731 00:28:59 mysqld restarted 130731 0:28:59 InnoDB: Database was not shut down normally. InnoDB: Starting recovery from log files... InnoDB: Starting log scan based on checkpoint at InnoDB: log sequence number 141 925270908 InnoDB: Doing recovery: scanned up to log sequence number 141 925276705 130731 0:28:59 InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed InnoDB: In a MySQL replication slave the last master binlog file InnoDB: position 0 53414028, file name mysql-bin.238 InnoDB: Last MySQL binlog file position 0 922822642, file name ./mysql-bin.060 130731 0:29:00 InnoDB: Flushing modified pages from the buffer pool... 130731 0:29:00 InnoDB: Started /usr/local/mysql4026/libexec/mysqld: ready for connections. Version: '4.0.26-log' socket: '/xxx/mysql.sock' port: 3306 Source distribution 130731 0:29:00 Slave SQL thread initialized, starting replication in log 'mysql-bin.238' at position 53414028, relay log './mysql-relay-bin.735' position: 5648238 130731 0:29:00 Slave I/O thread: connected to master '[email protected]:3306', replication started in log 'mysql-bin.238' at position 53414028 mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=67108864 read_buffer_size=2093056 max_used_connections=0 max_connections=3000 threads_connected=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 30773512 K bytes of memory Hope that's ok; if not, decrease some variables in the equation.
落ちた。
しかもまたすぐ
Number of processes running now: 0 130731 00:36:20 mysqld restarted 130731 0:36:20 InnoDB: Database was not shut down normally. InnoDB: Starting recovery from log files... InnoDB: Starting log scan based on checkpoint at InnoDB: log sequence number 141 926344408 InnoDB: Doing recovery: scanned up to log sequence number 141 926357179 130731 0:36:20 InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed InnoDB: In a MySQL replication slave the last master binlog file InnoDB: position 0 58547047, file name mysql-bin.238 InnoDB: Last MySQL binlog file position 0 922822642, file name ./mysql-bin.060 130731 0:36:21 InnoDB: Flushing modified pages from the buffer pool... 130731 0:36:21 InnoDB: Started /usr/local/mysql4026/libexec/mysqld: ready for connections. Version: '4.0.26-log' socket: '/xxx/mysql.sock' port: 3306 Source distribution 130731 0:36:21 Slave SQL thread initialized, starting replication in log 'mysql-bin.238' at position 58547047, relay log './mysql-relay-bin.736' position: 5133063 130731 0:36:21 Slave I/O thread: connected to master '[email protected]:3306', replication started in log 'mysql-bin.238' at position 58547047 mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=67108864 read_buffer_size=2093056 max_used_connections=0 max_connections=3000 threads_connected=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 30773512 K bytes of memory Hope that's ok; if not, decrease some variables in the equation.
落ちた。
幸いにしてサービス系ではなくバックアップ用のスレーブだが、MySQL4.0.26ときたもんだ。。。
まあ、夜だし、mysqld_safeの自動再起動で上がったみたいだし、寝るか。明日ゆっくり見ましょう。再現しないと思うけど。
と思って、次の日、見てみたら、
$ grep "InnoDB: Database was not shut down normally." error.log .. 130731 0:06:54 InnoDB: Database was not shut down normally. 130731 0:14:17 InnoDB: Database was not shut down normally. 130731 0:21:39 InnoDB: Database was not shut down normally. 130731 0:28:59 InnoDB: Database was not shut down normally. 130731 0:36:20 InnoDB: Database was not shut down normally. 130731 0:43:43 InnoDB: Database was not shut down normally. 130731 0:51:06 InnoDB: Database was not shut down normally. 130731 0:58:27 InnoDB: Database was not shut down normally. 130731 1:05:47 InnoDB: Database was not shut down normally. 130731 1:13:09 InnoDB: Database was not shut down normally. 130731 1:20:30 InnoDB: Database was not shut down normally. 130731 1:27:52 InnoDB: Database was not shut down normally. 130731 1:35:18 InnoDB: Database was not shut down normally. 130731 1:42:39 InnoDB: Database was not shut down normally. 130731 1:49:59 InnoDB: Database was not shut down normally. 130731 1:57:18 InnoDB: Database was not shut down normally. ..
( д ) ゚ ゚
`この日だけで'60件。この人は一応nagiosさんに監視されているものの、00:06以外は一度も通知されていない。
遡ってみたら、2013年3月6日は 4 0 0 回 落 ち て た 。というか、5時間くらい連続で 毎 分 落 ち て た 。
(; ゚д゚) なにこれ。。
先輩DBA(お互い`となりの人'と呼び合う仲)に聞いてみる。
- そのバックアップインスタンスは昔別のサーバーに載っかってて、以前も同じ(クラッシュを繰り返す)状態になったことがある
- データが壊れてるんだと思って、マスターのデータファイルをまるっとコピーしてきたら再発しなくなった
- しばらくしたらまた起きた
- 今度はハードウェアを疑って、バックアップインスタンスを別のサーバー(これが今載っかってるサーバー)に移した、というか、マスターのデータから新しく作成して再発しなくなった
- そして訪れた今日
Σ(゚д゚lll) なにそれ!?
このエントリーは、DBAとこの謎の事象の戦いを描いたノンフィクションである。。(つづく)
第2話(調査編)はこちら
⇒ ぼくとMySQLの3日間戦争(調査編)
3 Notes/ Hide
- ogata2104 reblogged this from gmomedia-engineer
- atm09td reblogged this from gmomedia-engineer
- nekogeruge987 reblogged this from gmomedia-engineer
- gmomedia-engineer posted this