Skip to content

fix(proxy): prevent stale-key collision in treedb.update() after SetUnFilteredMocks re-indexes#3946

Open
AkashKumar7902 wants to merge 2 commits intomainfrom
fix/treedb-stale-key-collision-on-reindex
Open

fix(proxy): prevent stale-key collision in treedb.update() after SetUnFilteredMocks re-indexes#3946
AkashKumar7902 wants to merge 2 commits intomainfrom
fix/treedb-stale-key-collision-on-reindex

Conversation

@AkashKumar7902
Copy link
Contributor

Describe the changes that are made

During replay, SetUnFilteredMocks is called twice per test-set:

Phase 1 (wide window, once at test-set start):
SendMockFilterParamsToAgent(ctx, [], BaseTime, now, ...)
→ FilterTcsMocks marks ALL mocks IsFiltered=true
→ SetUnFilteredMocks: deleteAll() + re-insert; assigns SortOrder/ID 0..N-1

Phase 2 (narrow window, once per test case):
SendMockFilterParamsToAgent(ctx, expectedNames, reqTime, respTime, ...)
→ FilterTcsMocks re-evaluates; startup mocks (outside window) become IsFiltered=false
→ SetUnFilteredMocks: deleteAll() + re-insert again; startup mocks move to end,
remaining mocks shift: what was ID=10 at Phase 1 is now ID=7 at Phase 2

Meanwhile, the integrations-side globalCache (processedMockCache in pkg/postgres/v2/replayer/replayer.go) caches mock pointers together with their Phase-1 TestModeInfo{SortOrder, ID} values. The cache is invalidated only when MockManager.Revision() changes — but in-flight connections that already read the old cache continue to use stale TestModeInfo values.

When such a connection matches a mock and calls UpdateUnFilteredMock → treedb.update(), it supplies the stale Phase-1 key (e.g. SortOrder=11, ID=10). After Phase 2 re-indexed the tree, that key slot now holds a completely different mock (e.g. mock-13). The old update() had no identity check — it trusted the caller's key unconditionally, removed mock-13, and left mock-10 (the actual target) untouched in the tree. Subsequent DB connections that needed mock-10 (e.g. the StartupMessage/auth for connID="2") found nothing and failed with "No matching mock found", causing exit code 1.

This was the root cause of test-set-132 failing while test-set-135 (single connID, no second DB connection) passed — test-set-135 never needed mock-10's equivalent.

Add three-layer identity validation inside treedb.update(), all executed under the existing mutex so the check-and-remove is atomic:

Layer 1 — Exact key match with name validation:
Look up oldKey in the RBT. If found, check mock.Name == expectedName before
proceeding. If the name doesn't match (stale key collision), mark as not-found
and fall through instead of removing the wrong mock.

Layer 2 — ID-index fallback with name validation:
If exact key failed, look up oldInfo.ID in idIndex to get the current key for
this mock (handles the case where only SortOrder shifted but ID is stable).
Validate name again before removing.

Layer 3 — Linear name scan fallback:
If both key and ID are stale (full re-index after deleteAll), iterate the RBT
to find the node whose mock.Name == expectedName. Use the found node's ID in
the new key to keep idIndex consistent.

expectedName is extracted from newObj (*models.Mock).Name. This is always correct because UpdateUnFilteredMock deep-copies the matched mock into updateMock before calling treedb.update(), so newObj.Name always carries the identity of the mock that was actually matched by the integrations layer.

  1. Abstraction boundary: treedb is the authoritative store. Store integrity (only remove what was asked for) must be enforced by the store itself, not scattered across every caller.

  2. Atomicity: any integrations-side fix would require a read-then-update pattern (re-fetch current key, then call update) across two separate operations, leaving a TOCTOU window where another goroutine could call SetUnFilteredMocks between the read and the update. The treedb fix happens entirely under the mutex — the name check and the removal are a single atomic operation.

pkg/agent/proxy/treedb_test.go (new file, 4 tests):

TestUpdateExactMatch
Normal happy path: mock at matching key is found and updated.

TestUpdateRejectsKeyCollision
Bug scenario: stale key points to a different mock after re-index.
update() must reject the operation and leave the innocent mock intact.

TestUpdateNameFallbackAfterReindex
Linear scan fallback: target mock exists under a different key after
SetUnFilteredMocks shifted all IDs. update() must find it by name and
update it correctly.

TestUpdateIDIndexRejectsCollision
ID-index path also validates identity: if idIndex[id] points to a node
holding a different mock, the operation is rejected.

All 4 tests pass: ok go.keploy.io/server/v3/pkg/agent/proxy 0.008s

pkg/agent/proxy/treedb.go — update() method rewritten with 3-layer validation
pkg/agent/proxy/treedb_test.go — new unit tests for update() identity semantics

Links & References

Closes: #[issue number that will be closed through this PR]

  • NA (if very small change like typo, linting, etc.)

🔗 Related PRs

  • NA

🐞 Related Issues

  • NA

📄 Related Documents

  • NA

What type of PR is this? (check all applicable)

  • 📦 Chore
  • 🍕 Feature
  • 🐞 Bug Fix
  • 📝 Documentation Update
  • 🎨 Style
  • 🧑‍💻 Code Refactor
  • 🔥 Performance Improvements
  • ✅ Test
  • 🔁 CI
  • ⏩ Revert

Added e2e test pipeline?

  • 👍 yes (added unit test)
  • 🙅 no, because they aren't needed
  • 🙋 no, because I need help

Added comments for hard-to-understand areas?

  • 👍 yes
  • 🙅 no, because the code is self-explanatory

Added to documentation?

  • 📜 README.md
  • 📓 Wiki
  • 🙅 no documentation needed

Are there any sample code or steps to test the changes?

  • 👍 yes, mentioned below
  • 🙅 no, because it is not needed

Self Review done?

  • ✅ yes
  • ❌ no, because I need help

Any relevant screenshots, recordings or logs?

  • NA

🧠 Semantics for PR Title & Branch Name

Please ensure your PR title and branch name follow the Keploy semantics:

📌 PR Semantics Guide
📌 Branch Semantics Guide

Examples:

  • PR Title: fix: patch MongoDB document update bug
  • Branch Name: feat/#1-login-flow (You may skip mentioning the issue number in the branch name if the change is small and the PR description clearly explains it.)

Additional checklist:

…nFilteredMocks re-indexes

During replay, SetUnFilteredMocks is called twice per test-set:

  Phase 1 (wide window, once at test-set start):
    SendMockFilterParamsToAgent(ctx, [], BaseTime, now, ...)
    → FilterTcsMocks marks ALL mocks IsFiltered=true
    → SetUnFilteredMocks: deleteAll() + re-insert; assigns SortOrder/ID 0..N-1

  Phase 2 (narrow window, once per test case):
    SendMockFilterParamsToAgent(ctx, expectedNames, reqTime, respTime, ...)
    → FilterTcsMocks re-evaluates; startup mocks (outside window) become IsFiltered=false
    → SetUnFilteredMocks: deleteAll() + re-insert again; startup mocks move to end,
      remaining mocks shift: what was ID=10 at Phase 1 is now ID=7 at Phase 2

Meanwhile, the integrations-side globalCache (processedMockCache in
pkg/postgres/v2/replayer/replayer.go) caches mock pointers together with their
Phase-1 TestModeInfo{SortOrder, ID} values. The cache is invalidated only when
MockManager.Revision() changes — but in-flight connections that already read the
old cache continue to use stale TestModeInfo values.

When such a connection matches a mock and calls UpdateUnFilteredMock → treedb.update(),
it supplies the stale Phase-1 key (e.g. SortOrder=11, ID=10). After Phase 2 re-indexed
the tree, that key slot now holds a completely different mock (e.g. mock-13). The old
update() had no identity check — it trusted the caller's key unconditionally, removed
mock-13, and left mock-10 (the actual target) untouched in the tree. Subsequent DB
connections that needed mock-10 (e.g. the StartupMessage/auth for connID="2") found
nothing and failed with "No matching mock found", causing exit code 1.

This was the root cause of test-set-132 failing while test-set-135 (single connID,
no second DB connection) passed — test-set-135 never needed mock-10's equivalent.

Add three-layer identity validation inside treedb.update(), all executed under the
existing mutex so the check-and-remove is atomic:

  Layer 1 — Exact key match with name validation:
    Look up oldKey in the RBT. If found, check mock.Name == expectedName before
    proceeding. If the name doesn't match (stale key collision), mark as not-found
    and fall through instead of removing the wrong mock.

  Layer 2 — ID-index fallback with name validation:
    If exact key failed, look up oldInfo.ID in idIndex to get the current key for
    this mock (handles the case where only SortOrder shifted but ID is stable).
    Validate name again before removing.

  Layer 3 — Linear name scan fallback:
    If both key and ID are stale (full re-index after deleteAll), iterate the RBT
    to find the node whose mock.Name == expectedName. Use the found node's ID in
    the new key to keep idIndex consistent.

expectedName is extracted from newObj (*models.Mock).Name. This is always correct
because UpdateUnFilteredMock deep-copies the matched mock into updateMock before
calling treedb.update(), so newObj.Name always carries the identity of the mock
that was actually matched by the integrations layer.

1. Abstraction boundary: treedb is the authoritative store. Store integrity
   (only remove what was asked for) must be enforced by the store itself, not
   scattered across every caller.

2. Atomicity: any integrations-side fix would require a read-then-update pattern
   (re-fetch current key, then call update) across two separate operations, leaving
   a TOCTOU window where another goroutine could call SetUnFilteredMocks between
   the read and the update. The treedb fix happens entirely under the mutex —
   the name check and the removal are a single atomic operation.

pkg/agent/proxy/treedb_test.go (new file, 4 tests):

  TestUpdateExactMatch
    Normal happy path: mock at matching key is found and updated.

  TestUpdateRejectsKeyCollision
    Bug scenario: stale key points to a different mock after re-index.
    update() must reject the operation and leave the innocent mock intact.

  TestUpdateNameFallbackAfterReindex
    Linear scan fallback: target mock exists under a different key after
    SetUnFilteredMocks shifted all IDs. update() must find it by name and
    update it correctly.

  TestUpdateIDIndexRejectsCollision
    ID-index path also validates identity: if idIndex[id] points to a node
    holding a different mock, the operation is rejected.

All 4 tests pass: ok go.keploy.io/server/v3/pkg/agent/proxy 0.008s

  pkg/agent/proxy/treedb.go       — update() method rewritten with 3-layer validation
  pkg/agent/proxy/treedb_test.go  — new unit tests for update() identity semantics

Signed-off-by: Akash Kumar <[email protected]>
Copilot AI review requested due to automatic review settings March 20, 2026 09:24
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens TreeDb.update() against stale-key collisions caused by SetUnFilteredMocks re-indexing, preventing accidental removal of the wrong mock during replay.

Changes:

  • Adds multi-layer identity validation to treedb.update() (exact-key → ID-index → name-scan fallback).
  • Introduces unit tests covering the collision scenario and fallback behaviors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
pkg/agent/proxy/treedb.go Adds identity checks and fallback lookup paths to avoid removing the wrong mock after re-indexing.
pkg/agent/proxy/treedb_test.go Adds tests that exercise exact match, key-collision rejection, name fallback after reindex, and ID-index collision rejection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +107 to +123
currentKey, exists := db.idIndex[oldInfo.ID]
if exists {
// Verify identity before removing
if expectedName != "" {
idVal, idFound := db.rbt.Get(currentKey)
if idFound {
if existingMock, ok := idVal.(*models.Mock); ok && existingMock != nil {
if existingMock.Name != expectedName {
exists = false // Different mock at this ID, skip
}
}
}
}
if exists {
db.rbt.Remove(currentKey)
db.rbt.Put(newKey, newObj)
delete(db.idIndex, oldInfo.ID)
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idFound == false currently leaves exists == true, so the ID-index path can still proceed and return success even when the idIndex points to a key that no longer exists in the tree. That can create “phantom updates” (inserting newKey without actually removing/replacing the intended old entry) and can corrupt idIndex consistency. Set exists = false (or return false) when db.rbt.Get(currentKey) does not find a node.

Copilot uses AI. Check for mistakes.
Comment on lines +148 to +158
db.rbt.Remove(foundKey)
// Use the found mock's ID in the new key to keep idIndex consistent
adjustedNewKey := newInfo
if foundInfoOk {
adjustedNewKey.ID = foundInfo.ID
delete(db.idIndex, foundInfo.ID)
}
db.rbt.Put(adjustedNewKey, newObj)
if okNew {
db.idIndex[adjustedNewKey.ID] = adjustedNewKey
}
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback path always derives the new key from newInfo (a models.TestModeInfo), even if okNew is false. In that case newInfo is the zero value and db.rbt.Put(adjustedNewKey, ...) can insert under an unintended key type/value for the tree/comparator. This path should require okNew before using newInfo (otherwise avoid updating / return false), or build the adjusted key from the original newKey in a type-safe way.

Suggested change
db.rbt.Remove(foundKey)
// Use the found mock's ID in the new key to keep idIndex consistent
adjustedNewKey := newInfo
if foundInfoOk {
adjustedNewKey.ID = foundInfo.ID
delete(db.idIndex, foundInfo.ID)
}
db.rbt.Put(adjustedNewKey, newObj)
if okNew {
db.idIndex[adjustedNewKey.ID] = adjustedNewKey
}
// We can only safely construct a new tree key if the caller provided
// a valid TestModeInfo-based newKey (okNew) and the found key is also
// a TestModeInfo (foundInfoOk). Otherwise, avoid mutating the tree.
if !okNew || !foundInfoOk {
return false
}
db.rbt.Remove(foundKey)
// Use the found mock's ID in the new key to keep idIndex consistent
adjustedNewKey := newInfo
adjustedNewKey.ID = foundInfo.ID
delete(db.idIndex, foundInfo.ID)
db.rbt.Put(adjustedNewKey, newObj)
db.idIndex[adjustedNewKey.ID] = adjustedNewKey

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +91
// First try exact key match
existingVal, found := db.rbt.Get(oldKey)
if found && expectedName != "" {
if existingMock, ok := existingVal.(*models.Mock); ok && existingMock != nil {
if existingMock.Name != expectedName {
found = false // Different mock at this key, skip
}
}
}
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The collision-prevention logic is skipped when expectedName == "", meaning a stale key can still remove the wrong entry (the original bug class) if a mock name is empty or if newObj isn’t a *models.Mock. If the goal is “never remove a different mock than intended,” consider treating missing expectedName as a hard failure (return false) or adding an alternate identity signal (e.g., validate against oldInfo.ID/other invariant) before allowing a remove.

Copilot uses AI. Check for mistakes.
Comment on lines +95 to +100
// Stale reference uses old key for mock-7
staleKey := models.TestModeInfo{SortOrder: 8, ID: 7}
newMock7 := mockWith("mock-7", 99, 7)
newMock7.TestModeInfo.IsFiltered = false

ok := db.update(staleKey, newMock7.TestModeInfo, newMock7)
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestUpdateNameFallbackAfterReindex doesn’t assert that (a) the updated mock is present under the actual inserted key after fallback, and (b) the stored mock’s internal TestModeInfo is consistent with the key chosen by update() (especially since the implementation may adjust the ID). Adding assertions for the final key presence + mock.TestModeInfo/idIndex consistency would catch subtle regressions in the fallback logic.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants