
Fix/replay after startup failure #3964

Open

AuraReaper wants to merge 6 commits into main from fix/replay-after-startup-failure

Conversation

@AuraReaper
Contributor

Describe the changes that are made

  • Preserve per-test-set startup failure status during replay so Docker Compose app shutdowns are reported as APP_HALTED instead of being downgraded to INTERNAL_ERR.
  • Continue replaying the remaining test sets in Docker Compose mode when one test set causes the app to stop during startup.
  • Add focused unit tests for replay status resolution and abort behavior.
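The status-preservation and abort rules described above can be sketched roughly as follows. This is a simplified illustration, not the PR's actual code: the real `resolveTestSetStatus` and `shouldAbortTestRun` helpers operate on `models.TestSetStatus` and a command type, while this sketch models statuses as plain strings and Docker Compose mode as a boolean.

```go
package main

import "fmt"

// Simplified stand-ins for models.TestSetStatus values.
type TestSetStatus string

const (
	StatusAppHalted   TestSetStatus = "APP_HALTED"
	StatusAppFault    TestSetStatus = "APP_FAULT"
	StatusInternalErr TestSetStatus = "INTERNAL_ERR"
)

// resolveTestSetStatus keeps an already-recorded startup-failure status
// (APP_HALTED/APP_FAULT) instead of downgrading it to a later generic error.
func resolveTestSetStatus(current, proposed TestSetStatus) TestSetStatus {
	if current == StatusAppHalted || current == StatusAppFault {
		return current
	}
	return proposed
}

// shouldAbortTestRun decides whether the whole replay run must stop.
// In Docker Compose mode an app halt fails only that test set; the
// remaining test sets keep running.
func shouldAbortTestRun(status TestSetStatus, isDockerCompose bool) bool {
	switch status {
	case StatusAppHalted, StatusAppFault:
		return !isDockerCompose
	case StatusInternalErr:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(resolveTestSetStatus(StatusAppHalted, StatusInternalErr)) // APP_HALTED
	fmt.Println(shouldAbortTestRun(StatusAppHalted, true))                // false
}
```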

What type of PR is this? (check all applicable)

  • 📦 Chore
  • 🍕 Feature
  • 🐞 Bug Fix
  • 📝 Documentation Update
  • 🎨 Style
  • 🧑‍💻 Code Refactor
  • 🔥 Performance Improvements
  • ✅ Test
  • 🔁 CI
  • ⏩ Revert

Added e2e test pipeline?

  • 👍 yes
  • 🙅 no, because they aren't needed
  • 🙋 no, because I need help

Added comments for hard-to-understand areas?

  • 👍 yes
  • 🙅 no, because the code is self-explanatory

Added to documentation?

  • 📜 README.md
  • 📓 Wiki
  • 🙅 no documentation needed

Are there any sample code or steps to test the changes?

  • 👍 yes, mentioned below
  • 🙅 no, because it is not needed

Self Review done?

  • ✅ yes
  • ❌ no, because I need help

Any relevant screenshots, recordings or logs?

  • Manual cloud replay verification was done locally.
  • Verified that the broken test set is marked as APP_HALTED and the subsequent test sets continue to run.

🧠 Semantics for PR Title & Branch Name

PR Title: fix: continue replay after docker compose startup failure
Branch Name: fix/replay-after-startup-failure


Copilot AI review requested due to automatic review settings March 23, 2026 17:18
@github-actions

github-actions bot commented Mar 23, 2026

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


0 out of 2 committers have signed the CLA.
@yash
@AuraReaper
Yash does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting 'recheck' in this pull request.

@github-actions

The CLA check failed. Please ensure you have:

  • Signed the CLA by commenting 'I have read the CLA Document and I hereby sign the CLA.'
  • Used the correct email address in your commits (matches the one you used to sign the CLA).

After fixing these issues, comment 'recheck' to trigger the workflow again.




Copilot AI left a comment


Pull request overview

This PR adjusts replay status handling so that startup failures (notably in Docker Compose mode) preserve an APP_HALTED/APP_FAULT classification instead of being overwritten as INTERNAL_ERR, and so that a single failing test set doesn’t necessarily abort the entire Docker Compose replay run.

Changes:

  • Added helper functions to map application/run-time errors to TestSetStatus, and to decide when a replay run should abort.
  • Updated replay flow to avoid aborting the full run on APP_HALTED / APP_FAULT when running in Docker Compose mode.
  • Added logic to resolve/override status in several error paths (consumed mocks, mock params updates, test case results retrieval, and final status resolution).
Comments suppressed due to low confidence (1)

pkg/service/replay/replay.go:450

  • In Docker Compose mode, APP_HALTED/APP_FAULT no longer aborts the full run, but the flaky-check attempt loop will still continue rerunning the same failing test set (since abortTestRun stays false). That can waste time and may restart the same broken compose stack multiple times before moving on. Consider breaking out of the per-testset attempt loop immediately for terminal statuses (APP_HALTED/APP_FAULT/INTERNAL_ERR), while still allowing the outer loop to continue to the next test set in Docker Compose mode.
			switch testSetStatus {
			case models.TestSetStatusAppHalted:
				testSetResult = false
				abortTestRun = shouldAbortTestRun(testSetStatus, cmdType)
			case models.TestSetStatusInternalErr:
				testSetResult = false
				abortTestRun = shouldAbortTestRun(testSetStatus, cmdType)
			case models.TestSetStatusFaultUserApp:
				testSetResult = false
				abortTestRun = shouldAbortTestRun(testSetStatus, cmdType)
			case models.TestSetStatusUserAbort:
				return nil
			case models.TestSetStatusFailed:
				testSetResult = false
			case models.TestSetStatusPassed:
				testSetResult = true
			case models.TestSetStatusIgnored:
				testSetResult = false
			case models.TestSetStatusNoTestsToRun:
				testSetResult = false
			}

			if testSetStatus != models.TestSetStatusIgnored {
				testRunResult = testRunResult && testSetResult
				if abortTestRun {
					break
				}
			}
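The early exit the reviewer suggests could look roughly like this. The `isTerminalStatus` helper is hypothetical (it does not exist in the PR); the loop simulates the flaky-check attempt loop under that assumption.

```go
package main

import "fmt"

type TestSetStatus string

const (
	StatusAppHalted    TestSetStatus = "APP_HALTED"
	StatusFaultUserApp TestSetStatus = "APP_FAULT"
	StatusInternalErr  TestSetStatus = "INTERNAL_ERR"
	StatusFailed       TestSetStatus = "FAILED"
)

// isTerminalStatus reports whether retrying the same test set is pointless,
// e.g. because the compose stack is already down. (Hypothetical helper.)
func isTerminalStatus(s TestSetStatus) bool {
	switch s {
	case StatusAppHalted, StatusFaultUserApp, StatusInternalErr:
		return true
	}
	return false
}

func main() {
	// Simulated per-test-set attempt loop: a terminal status breaks out
	// immediately instead of restarting the broken app on each retry.
	attempts := 0
	for attempt := 1; attempt <= 3; attempt++ {
		attempts = attempt
		status := StatusAppHalted // result of this attempt (simulated)
		if isTerminalStatus(status) {
			break
		}
	}
	fmt.Println("attempts made:", attempts) // attempts made: 1
}
```

An ordinary `FAILED` status would still go through all retries, so genuinely flaky test sets keep their rerun chances.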



strings.Contains(errMsg, "connection reset by peer") ||
strings.Contains(errMsg, "broken pipe") ||
strings.Contains(errMsg, "server closed") ||
strings.Contains(errMsg, "eof") ||

WARNING: Overly broad string match

The check for "eof" is very short and could inadvertently match error messages that happen to contain "eof" as a substring (e.g., "malloc of size...", or any error mentioning "EOF" in a different context). Consider using a more specific match like "eof" at word boundaries, or checking for the exact string "eof" (as the full error message) or "unexpected eof".

default:
testSetStatusByErrChan = models.TestSetStatusAppHalted
}
utils.LogError(r.logger, err, "application failed to run")

SUGGESTION: Error log lacks actionable next step

Per coding guidelines, error logs should provide a logical next step for the user, not just state what failed. Consider enhancing this message to guide users on troubleshooting, e.g., "application failed to run; check the application logs for details or verify the app command is correct".

default:
testSetStatusByErrChan = models.TestSetStatusAppHalted
}
utils.LogError(r.logger, err, "application failed to run")

SUGGESTION: Error log lacks actionable next step

Same as line 855 - this error log should provide guidance to the user. Consider: "application failed to run; check the application logs for details or verify the app command is correct".

@kilo-code-bot
Contributor

kilo-code-bot bot commented Mar 24, 2026

Code Review Summary

Status: 3 Issues Found (unchanged from previous review) | Recommendation: Address before merge

Overview

Severity    Count
CRITICAL    0
WARNING     1
SUGGESTION  2

Issue Details

WARNING

File Line Issue
pkg/service/replay/replay.go 92 Overly broad string match for "eof" in isDockerComposeReplayShutdown could match unintended errors

SUGGESTION

File Line Issue
pkg/service/replay/replay.go 855 Error log "application failed to run" lacks actionable next step for the user
pkg/service/replay/replay.go 1107 Error log "application failed to run" lacks actionable next step for the user
Incremental Review Notes

New commits reviewed since 8e6a0a9:

  • Performance test workflow added: New CI workflow and shell scripts for performance testing with k6. No issues found - scripts include proper error handling with actionable messages.

  • DNS mock lookup refactored: pkg/agent/proxy/dns.go simplified by removing retry loop and updateDNSMock. Now uses stateless lookup via GetStatelessMocks. This is a good simplification that removes complexity and potential race conditions.

  • MockManager optimized: Added GetStatelessMocks method with optimized O(1) lookup maps (statelessFiltered/statelessUnfiltered). Implementation is thread-safe with proper mutex protection.

  • util.go improved: Added nil checks in filterByTimeStamp and filterByMapping. Also optimized filterByMapping to use map-based lookup (O(1)) instead of nested loops (O(n*m)).
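The map-based rewrite described for `filterByMapping` follows a standard Go pattern; a generic sketch with illustrative names and string IDs (the PR's actual signatures differ) contrasts the two shapes:

```go
package main

import "fmt"

// filterByIDsNested is the original O(n*m) shape: one inner scan per mock.
func filterByIDsNested(mocks, wanted []string) []string {
	var out []string
	for _, m := range mocks {
		for _, w := range wanted {
			if m == w {
				out = append(out, m)
				break
			}
		}
	}
	return out
}

// filterByIDsMap builds a set once in O(m), then filters in O(n) with
// constant-time membership checks.
func filterByIDsMap(mocks, wanted []string) []string {
	set := make(map[string]struct{}, len(wanted))
	for _, w := range wanted {
		set[w] = struct{}{}
	}
	var out []string
	for _, m := range mocks {
		if _, ok := set[m]; ok {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	mocks := []string{"dns-1", "http-2", "dns-3"}
	wanted := []string{"dns-3", "dns-1"}
	fmt.Println(filterByIDsMap(mocks, wanted)) // [dns-1 dns-3]
}
```

Both versions preserve the input order of `mocks`, so the optimization changes complexity but not results.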

Previous issues remain: The 3 issues from the initial review are still present in unchanged code and should still be addressed.

Other Observations (not in diff)

Other error logs in unchanged code also lack actionable next steps (lines 1182, 1309, 1347, 1638). Per custom instructions, error logs should provide a logical next step for the user, not just state what failed. Consider a follow-up to enhance these messages.

Positive Notes
  • The mutex protection for testSetStatusByErrChan via setErrStatus/getErrStatus properly addresses the data race concern
  • The mapAppErrorToTestSetStatus function correctly handles ErrTestBinStopped and has a sensible default case
  • The refactoring consolidates error-to-status mapping logic into reusable helper functions
  • The status resolution logic in resolveTestSetStatus properly handles precedence of terminal statuses
  • New GetStatelessMocks provides efficient O(1) DNS mock lookup, improving performance
  • Nil checks added to mock filtering functions improve robustness
Files Reviewed (7 files)
  • pkg/service/replay/replay.go - 3 issues (unchanged from previous review)
  • pkg/agent/proxy/dns.go - 0 new issues (simplified stateless lookup)
  • pkg/agent/proxy/mockmanager.go - 0 new issues (added GetStatelessMocks)
  • pkg/util.go - 0 new issues (added nil checks, optimized filterByMapping)
  • .github/workflows/keploy-performance-test.yml - 0 issues (new CI workflow)
  • .github/workflows/test_workflow_scripts/performance/*.sh - 0 issues (CI scripts with proper error handling)



Reviewed by claude-4.5-opus-20251124 · 447,999 tokens


@AuraReaper AuraReaper self-assigned this Mar 25, 2026
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run  P50     P90     P99       RPS     Error Rate  Status
1    3.02ms  6.1ms   606.16ms  98.93   0.00%       ❌ FAIL
2    2.63ms  3.31ms  4.9ms     100.00  0.00%       ✅ PASS
3    2.55ms  3.23ms  4.95ms    100.00  0.00%       ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 1 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers
