Skip to content

Commit 9039dc1

Browse files
author
Marko Mäkelä
committed
Bug#16216513 INPLACE ALTER DISABLED FOR PARTITIONED TABLES
Due to Bug#14760210, INPLACE ALTER TABLE was disabled for partitioned tables. But it is possible to support this within InnoDB (our only transactional engine that can use the partitioning engine, ha_partition). The fix is to atomically commit the ALTER TABLE changes to all partitions of the table, within a single data dictionary transaction. The groundwork for this was laid out when fixing Bug#15989081 INNODB ALTER TABLE IS NOT ATOMIC. This patch modifies the solution so that changes to all partitions will be committed using the same data dictionary transaction, and (if needed) piggy-backing the rename of all partition *.ibd files in the same mini-transaction with the data dictionary transaction commit. In this way, crash recovery will either replay or roll back the ALTER TABLE. The worst possible scenario should be that the *.frm file (for which there is no log-based recovery) gets out of sync with the InnoDB data dictionary. This problem affects all tables. handler::commit_inplace_alter_table(): Slightly modify the API. Add ha_alter_info->group_commit_ctx. When it is NULL, the behaviour is as usual (commit only the table handle). When ha_alter_info->group_commit_ctx != NULL, it is assumed to be a pointer to a NULL-terminated array of ha_alter_info->handler_ctx, belonging to the partitions of a table. All partitions of the table will share the TABLE and altered_table objects. If handler::commit_inplace_alter_table() supports this operation, it must set ha_alter_info->group_commit_ctx = NULL. In this case, ha_partition::commit_inplace_alter_table(commit=true) will assume that the handler::commit_inplace_alter_table(commit=true) for the first partition committed the changes to all partitions. The rollback of partitions is unaffected by this. That is, handler::commit_inplace_alter_table(commit=false) will be invoked on every partition separately. The InnoDB changes are described below in detail. INNOBASE_ALTER_REBUILD: Renamed from INNOBASE_INPLACE_REBUILD. INNOBASE_ALTER_DATA: Renamed from INNOBASE_INPLACE_CREATE. INNOBASE_ALTER_NOREBUILD: Replaces INNOBASE_ONLINE_OPERATIONS. Does not include INNOBASE_INPLACE_IGNORE (no change inside InnoDB). ha_innobase_inplace_ctx: Replace user_trx with a reference to prebuilt. ha_innobase::commit_inplace_alter_table() needs to replace the ha_innobase::prebuilt in every partition. Add old_table and rename indexed_table to new_table. Add max_autoinc, so that we can preserve the current auto-increment value for every partition. Add the predicate ctx->need_rebuild(). Rename add, drop to add_index, drop_index. ha_innobase::prepare_inplace_alter_table(): Allocate ctx for every case of CHANGE_CREATE_OPTION, because all changes to InnoDB data (including the auto-increment value) will now require a ctx object. innobase_rollback_sec_index(): Replace prebuilt with user_table. Add the parameter ibool locked instead of hard-wiring locked=FALSE, because this can be invoked during a commit. innobase_rename_columns_try(): Replace new_clustered, user_table with ctx. commit_get_autoinc(), innobase_update_foreign_try(): commit_try_rebuild(): Replace user_table with ctx. innobase_update_foreign_cache(): commit_cache_rebuild(): Replace all parameters with ctx. commit_try_norebuild(): Add the ctx parameter, because there is a 1:N mapping between ha_alter_info and ctx. commit_cache_norebuild(): Replace ha_alter_info, user_table with ctx. commit_update_stats(): Split to alter_stats_norebuild() and alter_stats_rebuild(). ha_innobase::commit_inplace_alter_table(): Support them ha_alter_info->group_commit_ctx API. Remove the UNIV_DDL_DEBUG code that would check the table after a successful ALTER TABLE. (It would be better done in the caller of commit_inplace_alter_table().) Before starting the data dictionary modifications, invoke dict_stats_stop_bg() on every affected dict_table_t object. dict_stats_wait_bg_to_stop_using_table(): Renamed and simplified from from dict_stats_wait_bg_to_stop_using_tables(). dict_stats_stop_bg(): New predicate. DICT_STATS_BG_YIELD(trx): New auxiliary macro. rb#1939 approved by Jon Olav Hauglid, Sunny Bains
1 parent 0e8390c commit 9039dc1

10 files changed

Lines changed: 1061 additions & 855 deletions

File tree

sql/ha_partition.cc

Lines changed: 57 additions & 124 deletions
Original file line numberDiff line numberDiff line change
@@ -7835,15 +7835,13 @@ class ha_partition_inplace_ctx : public inplace_alter_handler_ctx
78357835
{
78367836
public:
78377837
inplace_alter_handler_ctx **handler_ctx_array;
7838-
bool rollback_done;
78397838
private:
78407839
uint m_tot_parts;
78417840

78427841
public:
78437842
ha_partition_inplace_ctx(THD *thd, uint tot_parts)
78447843
: inplace_alter_handler_ctx(),
78457844
handler_ctx_array(NULL),
7846-
rollback_done(false),
78477845
m_tot_parts(tot_parts)
78487846
{}
78497847

@@ -7862,14 +7860,11 @@ enum_alter_inplace_result
78627860
ha_partition::check_if_supported_inplace_alter(TABLE *altered_table,
78637861
Alter_inplace_info *ha_alter_info)
78647862
{
7865-
#ifdef PARTITION_SUPPORTS_INPLACE_ALTER
78667863
uint index= 0;
78677864
enum_alter_inplace_result result= HA_ALTER_INPLACE_NO_LOCK;
78687865
ha_partition_inplace_ctx *part_inplace_ctx;
7866+
bool first_is_set= false;
78697867
THD *thd= ha_thd();
7870-
#else
7871-
enum_alter_inplace_result result= HA_ALTER_INPLACE_NOT_SUPPORTED;
7872-
#endif
78737868

78747869
DBUG_ENTER("ha_partition::check_if_supported_inplace_alter");
78757870
/*
@@ -7878,34 +7873,20 @@ ha_partition::check_if_supported_inplace_alter(TABLE *altered_table,
78787873
prep_alter_part_table() in mysql_alter_table().
78797874
*/
78807875
if (ha_alter_info->alter_info->flags == Alter_info::ALTER_PARTITION)
7881-
result= HA_ALTER_INPLACE_NO_LOCK;
7882-
7883-
#ifndef PARTITION_SUPPORTS_INPLACE_ALTER
7884-
/*
7885-
Due to bug#14760210 partitions can be out-of-sync in case
7886-
commit_inplace_alter_table fails after the first partition.
7887-
7888-
Until we can either commit all partitions at the same time or
7889-
have an atomic recover on failure/crash we don't support any
7890-
inplace alter.
7876+
DBUG_RETURN(HA_ALTER_INPLACE_NO_LOCK);
78917877

7892-
TODO: investigate what happens when indexes are out-of-sync
7893-
between partitions. If safe and possible to recover from,
7894-
then we could allow ADD/DROP INDEX.
7895-
*/
7896-
DBUG_RETURN(result);
7897-
#else
78987878
part_inplace_ctx=
78997879
new (thd->mem_root) ha_partition_inplace_ctx(thd, m_tot_parts);
79007880
if (!part_inplace_ctx)
79017881
DBUG_RETURN(HA_ALTER_ERROR);
79027882

79037883
part_inplace_ctx->handler_ctx_array= (inplace_alter_handler_ctx **)
7904-
thd->alloc(sizeof(inplace_alter_handler_ctx *) * m_tot_parts);
7884+
thd->alloc(sizeof(inplace_alter_handler_ctx *) * (m_tot_parts + 1));
79057885
if (!part_inplace_ctx->handler_ctx_array)
79067886
DBUG_RETURN(HA_ALTER_ERROR);
79077887

7908-
for (index= 0; index < m_tot_parts; index++)
7888+
/* Set all to NULL, including the terminating one. */
7889+
for (index= 0; index <= m_tot_parts; index++)
79097890
part_inplace_ctx->handler_ctx_array[index]= NULL;
79107891

79117892
for (index= 0; index < m_tot_parts; index++)
@@ -7915,15 +7896,32 @@ ha_partition::check_if_supported_inplace_alter(TABLE *altered_table,
79157896
ha_alter_info);
79167897
part_inplace_ctx->handler_ctx_array[index]= ha_alter_info->handler_ctx;
79177898

7899+
if (index == 0)
7900+
{
7901+
first_is_set= (ha_alter_info->handler_ctx != NULL);
7902+
}
7903+
else if (first_is_set != (ha_alter_info->handler_ctx != NULL))
7904+
{
7905+
/* Either none or all partitions must set handler_ctx! */
7906+
DBUG_ASSERT(0);
7907+
DBUG_RETURN(HA_ALTER_ERROR);
7908+
}
79187909
if (p_result < result)
79197910
result= p_result;
79207911
if (result == HA_ALTER_ERROR)
79217912
break;
79227913
}
7914+
79237915
ha_alter_info->handler_ctx= part_inplace_ctx;
7916+
/*
7917+
To indicate for future inplace calls that there are several
7918+
partitions/handlers that need to be committed together,
7919+
we set group_commit_ctx to the NULL terminated array of
7920+
the partitions handlers.
7921+
*/
7922+
ha_alter_info->group_commit_ctx= part_inplace_ctx->handler_ctx_array;
79247923

79257924
DBUG_RETURN(result);
7926-
#endif
79277925
}
79287926

79297927

@@ -8004,8 +8002,8 @@ bool ha_partition::commit_inplace_alter_table(TABLE *altered_table,
80048002
Alter_inplace_info *ha_alter_info,
80058003
bool commit)
80068004
{
8007-
uint index= 0;
80088005
ha_partition_inplace_ctx *part_inplace_ctx;
8006+
bool error= false;
80098007

80108008
DBUG_ENTER("ha_partition::commit_inplace_alter_table");
80118009

@@ -8019,117 +8017,52 @@ bool ha_partition::commit_inplace_alter_table(TABLE *altered_table,
80198017
part_inplace_ctx=
80208018
static_cast<class ha_partition_inplace_ctx*>(ha_alter_info->handler_ctx);
80218019

8022-
if (!commit && part_inplace_ctx->rollback_done)
8023-
DBUG_RETURN(false); // We have already rolled back changes.
8024-
8025-
for (index= 0; index < m_tot_parts; index++)
8020+
if (commit)
80268021
{
8027-
ha_alter_info->handler_ctx= part_inplace_ctx->handler_ctx_array[index];
8028-
if (m_file[index]->ha_commit_inplace_alter_table(altered_table,
8029-
ha_alter_info, commit))
8022+
DBUG_ASSERT(ha_alter_info->group_commit_ctx ==
8023+
part_inplace_ctx->handler_ctx_array);
8024+
ha_alter_info->handler_ctx= part_inplace_ctx->handler_ctx_array[0];
8025+
error= m_file[0]->ha_commit_inplace_alter_table(altered_table,
8026+
ha_alter_info, commit);
8027+
if (error)
8028+
goto end;
8029+
if (ha_alter_info->group_commit_ctx)
80308030
{
8031-
part_inplace_ctx->handler_ctx_array[index]= ha_alter_info->handler_ctx;
8032-
goto err;
8033-
}
8034-
part_inplace_ctx->handler_ctx_array[index]= ha_alter_info->handler_ctx;
8035-
DBUG_EXECUTE_IF("ha_partition_fail_final_add_index", {
8036-
/* Simulate failure by rollback of the second partition */
8037-
if (m_tot_parts > 1)
8031+
/*
8032+
If ha_alter_info->group_commit_ctx is not set to NULL,
8033+
then the engine did only commit the first partition!
8034+
The engine is probably new, since both innodb and the default
8035+
implementation of handler::commit_inplace_alter_table sets it to NULL
8036+
and simply return false, since it allows metadata changes only.
8037+
Loop over all other partitions as to follow the protocol!
8038+
*/
8039+
uint i;
8040+
DBUG_ASSERT(0);
8041+
for (i= 1; i < m_tot_parts; i++)
80388042
{
8039-
index++;
8040-
ha_alter_info->handler_ctx= part_inplace_ctx->handler_ctx_array[index];
8041-
m_file[index]->ha_commit_inplace_alter_table(altered_table,
8042-
ha_alter_info, false);
8043-
part_inplace_ctx->handler_ctx_array[index]= ha_alter_info->handler_ctx;
8044-
goto err;
8043+
ha_alter_info->handler_ctx= part_inplace_ctx->handler_ctx_array[i];
8044+
error|= m_file[i]->ha_commit_inplace_alter_table(altered_table,
8045+
ha_alter_info,
8046+
true);
80458047
}
8046-
});
8047-
}
8048-
ha_alter_info->handler_ctx= part_inplace_ctx;
8049-
8050-
DBUG_RETURN(false);
8051-
8052-
err:
8053-
ha_alter_info->handler_ctx= part_inplace_ctx;
8054-
/*
8055-
Reverting committed changes is (for now) only possible for ADD INDEX
8056-
For other changes we will just try to rollback changes.
8057-
*/
8058-
if (index > 0 &&
8059-
ha_alter_info->handler_flags & (Alter_inplace_info::ADD_INDEX |
8060-
Alter_inplace_info::ADD_UNIQUE_INDEX |
8061-
Alter_inplace_info::ADD_PK_INDEX))
8062-
{
8063-
Alter_inplace_info drop_info(ha_alter_info->create_info,
8064-
ha_alter_info->alter_info,
8065-
NULL, 0,
8066-
ha_alter_info->modified_part_info,
8067-
ha_alter_info->ignore);
8068-
8069-
if (ha_alter_info->handler_flags & Alter_inplace_info::ADD_INDEX)
8070-
drop_info.handler_flags|= Alter_inplace_info::DROP_INDEX;
8071-
if (ha_alter_info->handler_flags & Alter_inplace_info::ADD_UNIQUE_INDEX)
8072-
drop_info.handler_flags|= Alter_inplace_info::DROP_UNIQUE_INDEX;
8073-
if (ha_alter_info->handler_flags & Alter_inplace_info::ADD_PK_INDEX)
8074-
drop_info.handler_flags|= Alter_inplace_info::DROP_PK_INDEX;
8075-
drop_info.index_drop_count= ha_alter_info->index_add_count;
8076-
drop_info.index_drop_buffer=
8077-
(KEY**) ha_thd()->alloc(sizeof(KEY*) * drop_info.index_drop_count);
8078-
if (!drop_info.index_drop_buffer)
8079-
{
8080-
sql_print_error("Failed with error handling of adding index:\n"
8081-
"committing index failed, and when trying to revert "
8082-
"already committed partitions we failed allocating\n"
8083-
"memory for the index for table '%s'",
8084-
table_share->table_name.str);
8085-
DBUG_RETURN(true);
8086-
}
8087-
for (uint i= 0; i < drop_info.index_drop_count; i++)
8088-
drop_info.index_drop_buffer[i]=
8089-
&ha_alter_info->key_info_buffer[ha_alter_info->index_add_buffer[i]];
8090-
8091-
// Drop index for each partition where we already committed new index.
8092-
for (uint i= 0; i < index; i++)
8093-
{
8094-
bool error= m_file[i]->ha_prepare_inplace_alter_table(altered_table,
8095-
&drop_info);
8096-
error|= m_file[i]->ha_inplace_alter_table(altered_table, &drop_info);
8097-
error|= m_file[i]->ha_commit_inplace_alter_table(altered_table,
8098-
&drop_info, true);
8099-
if (error)
8100-
sql_print_error("Failed with error handling of adding index:\n"
8101-
"committing index failed, and when trying to revert "
8102-
"already committed partitions we failed removing\n"
8103-
"the index for table '%s' partition nr %d",
8104-
table_share->table_name.str, i);
81058048
}
8106-
8107-
// Rollback uncommitted changes.
8108-
for (uint i= index+1; i < m_tot_parts; i++)
8049+
}
8050+
else
8051+
{
8052+
uint i;
8053+
for (i= 0; i < m_tot_parts; i++)
81098054
{
8055+
/* Rollback, commit == false, is done for each partition! */
81108056
ha_alter_info->handler_ctx= part_inplace_ctx->handler_ctx_array[i];
81118057
if (m_file[i]->ha_commit_inplace_alter_table(altered_table,
81128058
ha_alter_info, false))
8113-
{
8114-
/* How could this happen? */
8115-
sql_print_error("Failed with error handling of adding index:\n"
8116-
"Rollback of add_index failed for table\n"
8117-
"'%s' partition nr %d",
8118-
table_share->table_name.str, i);
8119-
}
8120-
part_inplace_ctx->handler_ctx_array[i]= ha_alter_info->handler_ctx;
8059+
error= true;
81218060
}
8122-
8123-
// We have now reverted/rolled back changes. Set flag to prevent
8124-
// it from being done again.
8125-
part_inplace_ctx->rollback_done= true;
8126-
8127-
print_error(HA_ERR_NO_PARTITION_FOUND, MYF(0));
81288061
}
8129-
8062+
end:
81308063
ha_alter_info->handler_ctx= part_inplace_ctx;
81318064

8132-
DBUG_RETURN(true);
8065+
DBUG_RETURN(error);
81338066
}
81348067

81358068

sql/handler.h

Lines changed: 22 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1264,6 +1264,18 @@ class Alter_inplace_info
12641264
*/
12651265
inplace_alter_handler_ctx *handler_ctx;
12661266

1267+
/**
1268+
If the table uses several handlers, like ha_partition uses one handler
1269+
per partition, this contains a Null terminated array of ctx pointers
1270+
that should all be committed together.
1271+
Or NULL if only handler_ctx should be committed.
1272+
Set to NULL if the low level handler::commit_inplace_alter_table uses it,
1273+
to signal to the main handler that everything was committed as atomically.
1274+
1275+
@see inplace_alter_handler_ctx for information about object lifecycle.
1276+
*/
1277+
inplace_alter_handler_ctx **group_commit_ctx;
1278+
12671279
/**
12681280
Flags describing in detail which operations the storage engine is to execute.
12691281
*/
@@ -1311,6 +1323,7 @@ class Alter_inplace_info
13111323
index_add_count(0),
13121324
index_add_buffer(NULL),
13131325
handler_ctx(NULL),
1326+
group_commit_ctx(NULL),
13141327
handler_flags(0),
13151328
modified_part_info(modified_part_info_arg),
13161329
ignore(ignore_arg),
@@ -2926,6 +2939,10 @@ class handler :public Sql_alloc
29262939
29272940
@note In case of partitioning, this function might be called for rollback
29282941
without prepare_inplace_alter_table() having been called first.
2942+
Also partitioned tables sets ha_alter_info->group_commit_ctx to a NULL
2943+
terminated array of the partitions handlers and if all of them are
2944+
committed as one, then group_commit_ctx should be set to NULL to indicate
2945+
to the partitioning handler that all partitions handlers are committed.
29292946
@see prepare_inplace_alter_table().
29302947
29312948
@param altered_table TABLE object for new version of table.
@@ -2940,7 +2957,11 @@ class handler :public Sql_alloc
29402957
virtual bool commit_inplace_alter_table(TABLE *altered_table,
29412958
Alter_inplace_info *ha_alter_info,
29422959
bool commit)
2943-
{ return false; }
2960+
{
2961+
/* Nothing to commit/rollback, mark all handlers committed! */
2962+
ha_alter_info->group_commit_ctx= NULL;
2963+
return false;
2964+
}
29442965

29452966

29462967
/**

storage/innobase/dict/dict0stats_bg.cc

Lines changed: 13 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
/*****************************************************************************
22
3-
Copyright (c) 2012, Oracle and/or its affiliates. All Rights Reserved.
3+
Copyright (c) 2012, 2013, Oracle and/or its affiliates. All Rights Reserved.
44
55
This program is free software; you can redistribute it and/or modify it under
66
the terms of the GNU General Public License as published by the Free Software
@@ -28,6 +28,10 @@ Created Apr 25, 2012 Vasil Dimov
2828
#include "dict0stats.h"
2929
#include "dict0stats_bg.h"
3030

31+
#ifdef UNIV_NONINL
32+
# include "dict0stats_bg.ic"
33+
#endif
34+
3135
#include <vector>
3236

3337
/** Minimum time interval between stats recalc for a given table */
@@ -174,37 +178,24 @@ dict_stats_recalc_pool_del(
174178
}
175179

176180
/*****************************************************************//**
177-
Wait until background stats thread has stopped using the specified table(s).
181+
Wait until background stats thread has stopped using the specified table.
178182
The caller must have locked the data dictionary using
179183
row_mysql_lock_data_dictionary() and this function may unlock it temporarily
180184
and restore the lock before it exits.
181-
The background stats thead is guaranteed not to start using the specified
182-
tables after this function returns and before the caller unlocks the data
185+
The background stats thread is guaranteed not to start using the specified
186+
table after this function returns and before the caller unlocks the data
183187
dictionary because it sets the BG_STAT_IN_PROGRESS bit in table->stats_bg_flag
184188
under dict_sys->mutex. */
185189
UNIV_INTERN
186190
void
187-
dict_stats_wait_bg_to_stop_using_tables(
188-
/*====================================*/
189-
dict_table_t* table1, /*!< in/out: table1 */
190-
dict_table_t* table2, /*!< in/out: table2, could be NULL */
191+
dict_stats_wait_bg_to_stop_using_table(
192+
/*===================================*/
193+
dict_table_t* table, /*!< in/out: table */
191194
trx_t* trx) /*!< in/out: transaction to use for
192195
unlocking/locking the data dict */
193196
{
194-
ut_ad(!srv_read_only_mode);
195-
196-
while ((table1->stats_bg_flag & BG_STAT_IN_PROGRESS)
197-
|| (table2 != NULL
198-
&& (table2->stats_bg_flag & BG_STAT_IN_PROGRESS))) {
199-
200-
table1->stats_bg_flag |= BG_STAT_SHOULD_QUIT;
201-
if (table2 != NULL) {
202-
table2->stats_bg_flag |= BG_STAT_SHOULD_QUIT;
203-
}
204-
205-
row_mysql_unlock_data_dictionary(trx);
206-
os_thread_sleep(250000);
207-
row_mysql_lock_data_dictionary(trx);
197+
while (!dict_stats_stop_bg(table)) {
198+
DICT_STATS_BG_YIELD(trx);
208199
}
209200
}
210201

0 commit comments

Comments
 (0)