Search indices can become out-of-sync with the SQL state #8295

Closed as not planned
@tkdrahn

Describe the bug

There seem to be a couple of scenarios in which the search/graph indexes can become out-of-sync with the SQL database:

  1. If the MCL publish fails, entityService appears to increment a metric and move on without rolling back the SQL persist, so the row is written but no MCL event is ever emitted for it
  2. If something fails during the Elasticsearch update, the mae-consumer increments a metric and skips past the MCL event, so the change never reaches the index

To Reproduce
These scenarios could presumably occur any time there is a transient or network-level error with the Elasticsearch or Kafka cluster, which is not uncommon.

Expected behavior

  • In the 1st case, I would expect entityService to roll back the SQL update and return a failure code to the client. If the client was the mce-consumer, I would expect the mce-consumer to keep retrying the failed message (without committing offsets) until the transient issue goes away and processing succeeds. Ideally the entire persist + MCL publish is atomic (both succeed or both fail)

  • In the 2nd case, I would expect the mae-consumer to keep retrying the failed message (without committing offsets) until the transient issue goes away and processing succeeds

  • Alternatively, an easier solution might be to automatically call the restoreIndices GMS endpoint when one of these failures is detected
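
To make the first two expectations concrete, here is a minimal, self-contained sketch in plain Java. It is not DataHub code: `SyncSketch`, `Publisher`, `IndexWriter`, `persistAndPublish` and `consume` are hypothetical names, an in-memory map stands in for the SQL table, and a list stands in for a Kafka partition. It only illustrates the two behaviors described above: undo the persist when the publish fails, and never advance the offset past an event that has not been indexed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the DataHub codebase.
class SyncSketch {

    /** Stand-in for the MCL producer; may fail transiently. */
    public interface Publisher {
        void publish(String urn) throws Exception;
    }

    /** Stand-in for the search/graph index writer; may fail transiently. */
    public interface IndexWriter {
        void apply(String event) throws Exception;
    }

    /**
     * Case 1: write the row, then publish. If the publish fails, restore the
     * previous value (the "rollback") and report failure to the caller.
     */
    public static boolean persistAndPublish(
            Map<String, String> store, String urn, String value, Publisher publisher) {
        boolean existed = store.containsKey(urn);
        String previous = store.put(urn, value); // stands in for the SQL write
        try {
            publisher.publish(urn);
            return true; // both steps succeeded
        } catch (Exception e) {
            // Undo the write so the store and the event stream stay consistent.
            if (existed) {
                store.put(urn, previous);
            } else {
                store.remove(urn);
            }
            return false; // caller should surface an error or retry
        }
    }

    /**
     * Case 2: process events in order. The offset advances only after a
     * successful write, so a failing event is retried instead of skipped.
     * Returns the committed offset (number of events fully processed).
     */
    public static int consume(
            List<String> events, IndexWriter writer, int maxAttempts, long backoffMillis) {
        int committedOffset = 0;
        while (committedOffset < events.size()) {
            String event = events.get(committedOffset);
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    writer.apply(event);
                    done = true;
                } catch (Exception e) {
                    try {
                        Thread.sleep(backoffMillis * attempt); // linear backoff
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return committedOffset;
                    }
                }
            }
            if (!done) {
                break; // stop here rather than skip; the event is re-read later
            }
            committedOffset++;
        }
        return committedOffset;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        boolean ok = persistAndPublish(store, "urn:1", "v1",
                urn -> { throw new Exception("publish failed"); });
        System.out.println("persisted=" + ok + " store=" + store);

        List<String> indexed = new ArrayList<>();
        int[] calls = {0};
        int offset = consume(List.of("mcl-1", "mcl-2", "mcl-3"), event -> {
            if (++calls[0] <= 2) throw new Exception("transient index error");
            indexed.add(event);
        }, 5, 1);
        System.out.println("offset=" + offset + " indexed=" + indexed);
    }
}
```

Note that the compensation in `persistAndPublish` is only an approximation of atomicity: a crash between the publish and the commit can still leave the two sides disagreeing. A transactional outbox (writing the event to a table in the same SQL transaction and publishing it from there) is the usual way to close that gap. In `consume`, stopping on a permanently failing event trades availability for consistency; a real consumer would probably pair this with alerting or a dead-letter path.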

Screenshots
N/A

Desktop (please complete the following information):
N/A

Additional context
N/A
