-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure schema is consistent while bootstrapping #583
base: palantir-cassandra-2.2.18
Are you sure you want to change the base?
Conversation
… before bootstrapping, and assert that the schema does not change while bootstrapping
@@ -1048,6 +1060,18 @@ private void joinTokenRing(int delay, boolean autoBootstrap, Collection<String> | |||
unsafeDisableNode(); | |||
throw new BootstrappingSafetyException("Bootstrap streaming failed."); | |||
} | |||
|
|||
if(!localSchemaVersion.equals(Schema.instance.getVersion().toString()) || !isSchemaConsistent(localSchemaVersion)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if we are being too restrictive by requiring this. as far as i can tell, it's actually okay for the schema to change bootstrap has started as long as the joining node is receiving forwarded writes + the new keyspace/columnfamily. this is already required for correctness in normal bootstraps (without schema changes), so i'm not sure we should be more strict here.
maybe the more "correct" check here is that other nodes see the joining node and are forwarding writes (presumably if the joining node is in their gossip, this is true). i think the current code already tries to do this via sleeping for 30 seconds to let gossip work and checking that gossip has seen the seeds
Gossiper.instance.addLocalApplicationStates(states);
setMode(Mode.JOINING, "sleeping " + RING_DELAY + " ms for pending range setup", true);
Uninterruptibles.sleepUninterruptibly(RING_DELAY, TimeUnit.MILLISECONDS);
[...]
if (!Gossiper.instance.seenAnySeed())
throw new IllegalStateException("Unable to contact any seeds!");
but doesn't appear to guarantee that all writes are forwarded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that this check is more restrictive than we need to be.
@@ -83,6 +86,11 @@ public ListenableFuture<StreamState> bootstrap(StreamStateStore stateStore, bool | |||
streamer.addRanges(keyspaceName, strategy.getPendingAddressRanges(tokenMetadata, tokens, address)); | |||
} | |||
|
|||
if (!initialLocalSchemaVersion.equals(Schema.instance.getVersion()) || !MigrationManager.isReadyForBootstrap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the order of this check is actually important. MigrationManager.isReadyForBootstrap()
must be evaluated to true before initialLocalSchemaVersion.equals(Schema.instance.getVersion())
is evaluated to true. the way it's written now, it's possible for a partially applied schema to complete in between the two checks and not throw here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also what's the reason for putting these checks in Bootstrapper
as opposed to StorageService#bootstrap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original idea was to put the second check in the callback for handling a PREPARE, but I decided that that was not necessary.
Moved the checks to SS
@@ -966,6 +962,15 @@ private void joinTokenRing(int delay, boolean autoBootstrap, Collection<String> | |||
setMode(Mode.JOINING, "waiting for schema information to complete", true); | |||
Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS); | |||
} | |||
|
|||
while(!isSchemaConsistent()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't see a reason not to just combine this with the previous check and have it be
while (!MigrationManager.isReadyForBootstrap() || !isSchemaConsistent())
{
setMode(Mode.JOINING, "waiting for schema information to complete... local schema version is {Schema.instance.getVersion().toString()}", true);
Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also means we can remove the above // first sleep the delay to make sure we see all our peers
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided to switch the first loop to just a wait with no short circuit condition
nice! i think these checks are sufficient to guard us against the theoretical edge case we talked about. as discussed, let's hold off on merging this "fix" until we have a repro of the edge case to make sure we aren't missing something here |
does this affect spinning up zombie or new clusters? |
} | ||
Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS); | ||
} | ||
Uninterruptibles.sleepUninterruptibly(delay, TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we might as well just remove this given the below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having some delay here is still important because we know from experience that you should wait at least some amount of time here before proceeding. The intention of this change is to make the bootstrapping node wait
- at least 30s
- wait longer if it thinks the schema is consistent
A bad scenario would be if we get the non-empty schema by gossiping with a node in a network partition, think we have gossiped with everyone because we haven't waited long enough, and then pass the subsequent checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does that risk exist even if we wait for isSchemaConsistent
? in that case, we should make our check more strict instead of just relying on a 30 second wait to hope it's safe
private static boolean isSchemaConsistent() | ||
{ | ||
String localSchemaVersion = Schema.instance.getVersion().toString(); | ||
Set<Entry<InetAddress, EndpointState>> endpointStates = Gossiper.instance.getEndpointStates(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: return Gossiper.instance.getEndpointStates().stream().map(Entry::getValue).map(entry -> entry.getApplicationState(ApplicationState.SCHEMA).value).allMatch(Schema.instance.getVersion().toString()::equals);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
String localSchemaVersion = Schema.instance.getVersion().toString(); | ||
return Gossiper.instance.getEndpointStates().stream() | ||
.map(entry -> entry.getValue().getApplicationState(ApplicationState.SCHEMA).value) | ||
.allMatch(localSchemaVersion::equals); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah wait doesn't this check cause problems bc the endpoint states can have LEFT
nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a change to exclude LEFT nodes, and also gracefully handle cases where ApplicationStates are null
Confirmed from testing that this does not impact spinning up new clusters. On the first startup of seed nodes, this codepath is not flexed because they do not bootstrap. |
This PR addresses a potential edge case where the Schema is updated while bootstrapping. The following changes are made
Gossiper
before BootstrappingThis PR make no change to the logic here 4c831cc
Making that change would not work for the first node in a new cluster.