
fix(restoreIndices): batchSize vs limit #10178

Merged

Conversation

@david-leifker (Collaborator) commented Apr 1, 2024

  • restoreIndices batchSize was being used as a limit instead of a batch size
  • added timestamp filters based on createdOn

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added, a Usage Guide has been added for it.
  • For any breaking change, potential downtime, deprecation, or other big change, an entry has been made in Updating DataHub

@github-actions github-actions bot added product PR or Issue related to the DataHub UI/UX devops PR or Issue related to DataHub backend & deployment labels Apr 1, 2024
david-leifker and others added 2 commits April 1, 2024 14:18
restoreIndices batchSize was being used as a limit instead of batchSize
@@ -85,7 +85,10 @@ private List<RestoreIndicesResult> iterateFutures(List<Future<RestoreIndicesResu
private RestoreIndicesArgs getArgs(UpgradeContext context) {
RestoreIndicesArgs result = new RestoreIndicesArgs();
result.batchSize = getBatchSize(context.parsedArgs());
// this class assumes batch size == limit
result.limit = getBatchSize(context.parsedArgs());
Collaborator commented:
Hmm, I'm confused by this. Is this just meant to be more explicit about what is going on? I would expect limit to use its own variable rather than just copying the batch size configuration.

@david-leifker (Collaborator, Author) commented Apr 1, 2024

This class has batch logic that sets start + batchSize as if it were a limit, and can do so for different batches across multiple threads. Most other places where restore indices is used don't have this logic; for those, it makes more sense to simply treat batchSize as a batch size and run through all the aspects, assuming the REST API doesn't time out. In the case of the CLL upgrade job there is no expected timeout, so it completes after fully iterating through all the aspects without having to use or implement this class's more complex batching logic. Setting this one line allows the code to run as it was designed for now; eventually I'd replace it with a simpler implementation. I don't have evidence one way or the other on whether the threading logic in this class is effective, so I'm preserving the way it works for now.
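To make the distinction the thread is debating concrete, here is a minimal sketch of batchSize-vs-limit semantics in a pagination loop. This is not DataHub's actual RestoreIndicesArgs code; the field names mirror the diff, but the loop and method are hypothetical, assuming batchSize means "rows per fetch" and limit means "total cap" (0 = uncapped):

```java
// Hypothetical illustration, not DataHub's actual implementation.
public class BatchVsLimit {
    // batchSize: rows fetched per query; limit: total rows to process (0 = no cap).
    static int processAll(int totalRows, int batchSize, int limit) {
        int processed = 0;
        int start = 0;
        while (start < totalRows) {
            int fetch = Math.min(batchSize, totalRows - start);
            if (limit > 0) {
                // Respect the overall cap, independent of the per-batch size.
                fetch = Math.min(fetch, limit - processed);
            }
            if (fetch <= 0) break;
            processed += fetch; // stand-in for re-indexing one batch of aspects
            start += fetch;
        }
        return processed;
    }

    public static void main(String[] args) {
        // With no cap, all 2500 rows are processed in batches of 1000.
        System.out.println(processAll(2500, 1000, 0));
        // When the batch size is copied into limit (as in the one-line fix's
        // surrounding class, which assumes batch size == limit), a single
        // invocation only covers one batch; the caller must page externally.
        System.out.println(processAll(2500, 1000, 1000));
    }
}
```

This shows why copying batchSize into limit is only safe in code that already pages start forward itself, which is exactly the behavior the author says this class implements.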

@david-leifker david-leifker merged commit 9a0a53b into datahub-project:master Apr 1, 2024
41 checks passed
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024