Search indices can become out-of-sync with the SQL state

**Describe the bug**

There seem to be a couple of scenarios where the search/graph indexes can become out-of-sync with the SQL database:

1.  If the MCL publish fails, it seems entityService increments a metric and moves on without rolling back the SQL persist
    * https://github.com/datahub-project/datahub/blob/master/metadata-dao-impl/kafka-producer/src/main/java/com/linkedin/metadata/dao/producer/KafkaHealthChecker.java#L34
2.  If something fails during Elasticsearch updates, the mae-consumer increments a metric and skips past the MCL event
    * https://github.com/datahub-project/datahub/blob/c074d2fdc1fefec1369a9af49da5b713e270b7da/metadata-jobs/mae-consumer/src/main/java/com/linkedin/metadata/kafka/MetadataChangeLogProcessor.java#L87C19-L87C19

**To Reproduce**
These scenarios seem like they could happen any time there is a transient or network-level error with the Elasticsearch or Kafka cluster, which is not uncommon.

**Expected behavior**
* In the 1st case, I would expect entityService to roll back the SQL update and return a failure code to the client. If the client is the mce-consumer, I would expect it to keep retrying the failed message (without committing offsets) until the transient issue goes away and processing succeeds. Ideally the entire persist + MCL publish is atomic (both succeed or both fail).
* In the 2nd case, I would expect the mae-consumer to keep retrying the failed message (without committing offsets) until the transient issue goes away and processing succeeds.

* Alternatively, an easier solution might be to automatically call the restoreIndices GMS endpoint when one of these failures is detected
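
To make the retry expectation concrete, here is a minimal sketch of "only commit the offset once processing succeeds". It is not DataHub code: `RetryingEventProcessor`, `EventHandler`, and `processWithRetry` are hypothetical names, the handler stands in for the index update, and the bounded attempt count with capped exponential backoff is an assumption (the behavior described above would retry indefinitely).

```java
// Hypothetical sketch: none of these names exist in DataHub.
final class RetryingEventProcessor {

    /** Stand-in for the search/graph index update; may throw on transient errors. */
    interface EventHandler<T> {
        void handle(T event) throws Exception;
    }

    /**
     * Runs the handler, retrying with capped exponential backoff on failure.
     *
     * @return true if the event was processed, so the caller may commit the offset;
     *         false if attempts ran out or the thread was interrupted, in which case
     *         the caller should NOT commit, so the event is redelivered later.
     */
    static <T> boolean processWithRetry(T event, EventHandler<T> handler,
                                        int maxAttempts, long initialBackoffMs,
                                        long maxBackoffMs) {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.handle(event);
                return true; // success: safe to commit
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
                // Any other failure is treated as transient in this sketch.
            }
            if (attempt == maxAttempts) {
                break; // no point sleeping after the final attempt
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
            backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
        }
        return false; // still failing: do not commit the offset
    }
}
```

A consumer loop would call `processWithRetry` for each MCL event and acknowledge or commit only when it returns `true`. On `false` it would leave the offset uncommitted (or seek back to it) rather than incrementing a metric and moving on.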

**Screenshots**
N/A

**Desktop (please complete the following information):**
N/A

**Additional context**
N/A
