fix(proxy): prevent stale-key collision in treedb.update() after SetUnFilteredMocks re-indexes#3946
fix(proxy): prevent stale-key collision in treedb.update() after SetUnFilteredMocks re-indexes#3946AkashKumar7902 wants to merge 2 commits intomainfrom
Conversation
…nFilteredMocks re-indexes
During replay, SetUnFilteredMocks is called twice per test-set:
Phase 1 (wide window, once at test-set start):
SendMockFilterParamsToAgent(ctx, [], BaseTime, now, ...)
→ FilterTcsMocks marks ALL mocks IsFiltered=true
→ SetUnFilteredMocks: deleteAll() + re-insert; assigns SortOrder/ID 0..N-1
Phase 2 (narrow window, once per test case):
SendMockFilterParamsToAgent(ctx, expectedNames, reqTime, respTime, ...)
→ FilterTcsMocks re-evaluates; startup mocks (outside window) become IsFiltered=false
→ SetUnFilteredMocks: deleteAll() + re-insert again; startup mocks move to end,
remaining mocks shift: what was ID=10 at Phase 1 is now ID=7 at Phase 2
Meanwhile, the integrations-side globalCache (processedMockCache in
pkg/postgres/v2/replayer/replayer.go) caches mock pointers together with their
Phase-1 TestModeInfo{SortOrder, ID} values. The cache is invalidated only when
MockManager.Revision() changes — but in-flight connections that already read the
old cache continue to use stale TestModeInfo values.
When such a connection matches a mock and calls UpdateUnFilteredMock → treedb.update(),
it supplies the stale Phase-1 key (e.g. SortOrder=11, ID=10). After Phase 2 re-indexed
the tree, that key slot now holds a completely different mock (e.g. mock-13). The old
update() had no identity check — it trusted the caller's key unconditionally, removed
mock-13, and left mock-10 (the actual target) untouched in the tree. Subsequent DB
connections that needed mock-10 (e.g. the StartupMessage/auth for connID="2") found
nothing and failed with "No matching mock found", causing exit code 1.
This was the root cause of test-set-132 failing while test-set-135 (single connID,
no second DB connection) passed — test-set-135 never needed mock-10's equivalent.
Add three-layer identity validation inside treedb.update(), all executed under the
existing mutex so the check-and-remove is atomic:
Layer 1 — Exact key match with name validation:
Look up oldKey in the RBT. If found, check mock.Name == expectedName before
proceeding. If the name doesn't match (stale key collision), mark as not-found
and fall through instead of removing the wrong mock.
Layer 2 — ID-index fallback with name validation:
If exact key failed, look up oldInfo.ID in idIndex to get the current key for
this mock (handles the case where only SortOrder shifted but ID is stable).
Validate name again before removing.
Layer 3 — Linear name scan fallback:
If both key and ID are stale (full re-index after deleteAll), iterate the RBT
to find the node whose mock.Name == expectedName. Use the found node's ID in
the new key to keep idIndex consistent.
expectedName is extracted from newObj (*models.Mock).Name. This is always correct
because UpdateUnFilteredMock deep-copies the matched mock into updateMock before
calling treedb.update(), so newObj.Name always carries the identity of the mock
that was actually matched by the integrations layer.
1. Abstraction boundary: treedb is the authoritative store. Store integrity
(only remove what was asked for) must be enforced by the store itself, not
scattered across every caller.
2. Atomicity: any integrations-side fix would require a read-then-update pattern
(re-fetch current key, then call update) across two separate operations, leaving
a TOCTOU window where another goroutine could call SetUnFilteredMocks between
the read and the update. The treedb fix happens entirely under the mutex —
the name check and the removal are a single atomic operation.
pkg/agent/proxy/treedb_test.go (new file, 4 tests):
TestUpdateExactMatch
Normal happy path: mock at matching key is found and updated.
TestUpdateRejectsKeyCollision
Bug scenario: stale key points to a different mock after re-index.
update() must reject the operation and leave the innocent mock intact.
TestUpdateNameFallbackAfterReindex
Linear scan fallback: target mock exists under a different key after
SetUnFilteredMocks shifted all IDs. update() must find it by name and
update it correctly.
TestUpdateIDIndexRejectsCollision
ID-index path also validates identity: if idIndex[id] points to a node
holding a different mock, the operation is rejected.
All 4 tests pass: ok go.keploy.io/server/v3/pkg/agent/proxy 0.008s
pkg/agent/proxy/treedb.go — update() method rewritten with 3-layer validation
pkg/agent/proxy/treedb_test.go — new unit tests for update() identity semantics
Signed-off-by: Akash Kumar <[email protected]>
There was a problem hiding this comment.
Pull request overview
This PR hardens TreeDb.update() against stale-key collisions caused by SetUnFilteredMocks re-indexing, preventing accidental removal of the wrong mock during replay.
Changes:
- Adds multi-layer identity validation to
treedb.update()(exact-key → ID-index → name-scan fallback). - Introduces unit tests covering the collision scenario and fallback behaviors.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/agent/proxy/treedb.go | Adds identity checks and fallback lookup paths to avoid removing the wrong mock after re-indexing. |
| pkg/agent/proxy/treedb_test.go | Adds tests that exercise exact match, key-collision rejection, name fallback after reindex, and ID-index collision rejection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| currentKey, exists := db.idIndex[oldInfo.ID] | ||
| if exists { | ||
| // Verify identity before removing | ||
| if expectedName != "" { | ||
| idVal, idFound := db.rbt.Get(currentKey) | ||
| if idFound { | ||
| if existingMock, ok := idVal.(*models.Mock); ok && existingMock != nil { | ||
| if existingMock.Name != expectedName { | ||
| exists = false // Different mock at this ID, skip | ||
| } | ||
| } | ||
| } | ||
| } | ||
| if exists { | ||
| db.rbt.Remove(currentKey) | ||
| db.rbt.Put(newKey, newObj) | ||
| delete(db.idIndex, oldInfo.ID) |
There was a problem hiding this comment.
idFound == false currently leaves exists == true, so the ID-index path can still proceed and return success even when the idIndex points to a key that no longer exists in the tree. That can create “phantom updates” (inserting newKey without actually removing/replacing the intended old entry) and can corrupt idIndex consistency. Set exists = false (or return false) when db.rbt.Get(currentKey) does not find a node.
| db.rbt.Remove(foundKey) | ||
| // Use the found mock's ID in the new key to keep idIndex consistent | ||
| adjustedNewKey := newInfo | ||
| if foundInfoOk { | ||
| adjustedNewKey.ID = foundInfo.ID | ||
| delete(db.idIndex, foundInfo.ID) | ||
| } | ||
| db.rbt.Put(adjustedNewKey, newObj) | ||
| if okNew { | ||
| db.idIndex[adjustedNewKey.ID] = adjustedNewKey | ||
| } |
There was a problem hiding this comment.
The fallback path always derives the new key from newInfo (a models.TestModeInfo), even if okNew is false. In that case newInfo is the zero value and db.rbt.Put(adjustedNewKey, ...) can insert under an unintended key type/value for the tree/comparator. This path should require okNew before using newInfo (otherwise avoid updating / return false), or build the adjusted key from the original newKey in a type-safe way.
| db.rbt.Remove(foundKey) | |
| // Use the found mock's ID in the new key to keep idIndex consistent | |
| adjustedNewKey := newInfo | |
| if foundInfoOk { | |
| adjustedNewKey.ID = foundInfo.ID | |
| delete(db.idIndex, foundInfo.ID) | |
| } | |
| db.rbt.Put(adjustedNewKey, newObj) | |
| if okNew { | |
| db.idIndex[adjustedNewKey.ID] = adjustedNewKey | |
| } | |
| // We can only safely construct a new tree key if the caller provided | |
| // a valid TestModeInfo-based newKey (okNew) and the found key is also | |
| // a TestModeInfo (foundInfoOk). Otherwise, avoid mutating the tree. | |
| if !okNew || !foundInfoOk { | |
| return false | |
| } | |
| db.rbt.Remove(foundKey) | |
| // Use the found mock's ID in the new key to keep idIndex consistent | |
| adjustedNewKey := newInfo | |
| adjustedNewKey.ID = foundInfo.ID | |
| delete(db.idIndex, foundInfo.ID) | |
| db.rbt.Put(adjustedNewKey, newObj) | |
| db.idIndex[adjustedNewKey.ID] = adjustedNewKey |
| // First try exact key match | ||
| existingVal, found := db.rbt.Get(oldKey) | ||
| if found && expectedName != "" { | ||
| if existingMock, ok := existingVal.(*models.Mock); ok && existingMock != nil { | ||
| if existingMock.Name != expectedName { | ||
| found = false // Different mock at this key, skip | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The collision-prevention logic is skipped when expectedName == "", meaning a stale key can still remove the wrong entry (the original bug class) if a mock name is empty or if newObj isn’t a *models.Mock. If the goal is “never remove a different mock than intended,” consider treating missing expectedName as a hard failure (return false) or adding an alternate identity signal (e.g., validate against oldInfo.ID/other invariant) before allowing a remove.
| // Stale reference uses old key for mock-7 | ||
| staleKey := models.TestModeInfo{SortOrder: 8, ID: 7} | ||
| newMock7 := mockWith("mock-7", 99, 7) | ||
| newMock7.TestModeInfo.IsFiltered = false | ||
|
|
||
| ok := db.update(staleKey, newMock7.TestModeInfo, newMock7) |
There was a problem hiding this comment.
TestUpdateNameFallbackAfterReindex doesn’t assert that (a) the updated mock is present under the actual inserted key after fallback, and (b) the stored mock’s internal TestModeInfo is consistent with the key chosen by update() (especially since the implementation may adjust the ID). Adding assertions for the final key presence + mock.TestModeInfo/idIndex consistency would catch subtle regressions in the fallback logic.
Describe the changes that are made
During replay, SetUnFilteredMocks is called twice per test-set:
Phase 1 (wide window, once at test-set start):
SendMockFilterParamsToAgent(ctx, [], BaseTime, now, ...)
→ FilterTcsMocks marks ALL mocks IsFiltered=true
→ SetUnFilteredMocks: deleteAll() + re-insert; assigns SortOrder/ID 0..N-1
Phase 2 (narrow window, once per test case):
SendMockFilterParamsToAgent(ctx, expectedNames, reqTime, respTime, ...)
→ FilterTcsMocks re-evaluates; startup mocks (outside window) become IsFiltered=false
→ SetUnFilteredMocks: deleteAll() + re-insert again; startup mocks move to end,
remaining mocks shift: what was ID=10 at Phase 1 is now ID=7 at Phase 2
Meanwhile, the integrations-side globalCache (processedMockCache in pkg/postgres/v2/replayer/replayer.go) caches mock pointers together with their Phase-1 TestModeInfo{SortOrder, ID} values. The cache is invalidated only when MockManager.Revision() changes — but in-flight connections that already read the old cache continue to use stale TestModeInfo values.
When such a connection matches a mock and calls UpdateUnFilteredMock → treedb.update(), it supplies the stale Phase-1 key (e.g. SortOrder=11, ID=10). After Phase 2 re-indexed the tree, that key slot now holds a completely different mock (e.g. mock-13). The old update() had no identity check — it trusted the caller's key unconditionally, removed mock-13, and left mock-10 (the actual target) untouched in the tree. Subsequent DB connections that needed mock-10 (e.g. the StartupMessage/auth for connID="2") found nothing and failed with "No matching mock found", causing exit code 1.
This was the root cause of test-set-132 failing while test-set-135 (single connID, no second DB connection) passed — test-set-135 never needed mock-10's equivalent.
Add three-layer identity validation inside treedb.update(), all executed under the existing mutex so the check-and-remove is atomic:
Layer 1 — Exact key match with name validation:
Look up oldKey in the RBT. If found, check mock.Name == expectedName before
proceeding. If the name doesn't match (stale key collision), mark as not-found
and fall through instead of removing the wrong mock.
Layer 2 — ID-index fallback with name validation:
If exact key failed, look up oldInfo.ID in idIndex to get the current key for
this mock (handles the case where only SortOrder shifted but ID is stable).
Validate name again before removing.
Layer 3 — Linear name scan fallback:
If both key and ID are stale (full re-index after deleteAll), iterate the RBT
to find the node whose mock.Name == expectedName. Use the found node's ID in
the new key to keep idIndex consistent.
expectedName is extracted from newObj (*models.Mock).Name. This is always correct because UpdateUnFilteredMock deep-copies the matched mock into updateMock before calling treedb.update(), so newObj.Name always carries the identity of the mock that was actually matched by the integrations layer.
Abstraction boundary: treedb is the authoritative store. Store integrity (only remove what was asked for) must be enforced by the store itself, not scattered across every caller.
Atomicity: any integrations-side fix would require a read-then-update pattern (re-fetch current key, then call update) across two separate operations, leaving a TOCTOU window where another goroutine could call SetUnFilteredMocks between the read and the update. The treedb fix happens entirely under the mutex — the name check and the removal are a single atomic operation.
pkg/agent/proxy/treedb_test.go (new file, 4 tests):
TestUpdateExactMatch
Normal happy path: mock at matching key is found and updated.
TestUpdateRejectsKeyCollision
Bug scenario: stale key points to a different mock after re-index.
update() must reject the operation and leave the innocent mock intact.
TestUpdateNameFallbackAfterReindex
Linear scan fallback: target mock exists under a different key after
SetUnFilteredMocks shifted all IDs. update() must find it by name and
update it correctly.
TestUpdateIDIndexRejectsCollision
ID-index path also validates identity: if idIndex[id] points to a node
holding a different mock, the operation is rejected.
All 4 tests pass: ok go.keploy.io/server/v3/pkg/agent/proxy 0.008s
pkg/agent/proxy/treedb.go — update() method rewritten with 3-layer validation
pkg/agent/proxy/treedb_test.go — new unit tests for update() identity semantics
Links & References
Closes: #[issue number that will be closed through this PR]
🔗 Related PRs
🐞 Related Issues
📄 Related Documents
What type of PR is this? (check all applicable)
Added e2e test pipeline?
Added comments for hard-to-understand areas?
Added to documentation?
Are there any sample code or steps to test the changes?
Self Review done?
Any relevant screenshots, recordings or logs?
🧠 Semantics for PR Title & Branch Name
Please ensure your PR title and branch name follow the Keploy semantics:
📌 PR Semantics Guide
📌 Branch Semantics Guide
Examples:
fix: patch MongoDB document update bugfeat/#1-login-flow(You may skip mentioning the issue number in the branch name if the change is small and the PR description clearly explains it.)Additional checklist: