glretrace: Add possibility to check for a lost context #794

Draft
wants to merge 4 commits into
base: master
from

gerddie
Contributor

@gerddie gerddie commented May 20, 2022

Add a command line option to create an EGL context that supports
lost context notification, and check with each rendered frame whether
the context was lost. In that case abort replaying and return an error
code -2.

Closes: #787

Signed-off-by: Gert Wollny [email protected]
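
To illustrate the behaviour described in this description, here is a minimal sketch of a per-frame check that aborts the replay with -2. It is not the code from this PR: `replayFrames`, `renderFrame` and `resetStatus` are hypothetical names, and the two callbacks stand in for the real swap-buffers replay and for glGetGraphicsResetStatus() so that the sketch compiles without any GL headers.

```cpp
#include <cassert>
#include <functional>
#include <iostream>

constexpr unsigned kGlNoError = 0;     // same value as GL_NO_ERROR
constexpr int kLostContextError = -2;  // error code named in the PR description

// Replays frameCount frames and checks for a lost context after each one.
// renderFrame and resetStatus are placeholders (assumptions of this sketch).
int replayFrames(int frameCount, bool notifyLostContext,
                 const std::function<void(int)> &renderFrame,
                 const std::function<unsigned()> &resetStatus)
{
    assert(frameCount >= 0);
    for (int frame = 0; frame < frameCount; ++frame) {
        renderFrame(frame);
        // Only query the reset status when the option was requested.
        if (notifyLostContext && resetStatus() != kGlNoError) {
            std::cerr << "error: context was lost, aborting replay after frame "
                      << frame << std::endl;
            return kLostContextError;
        }
    }
    return 0;
}
```

If a value like this is used as the process exit status, note that on POSIX systems only the low 8 bits reach the parent, so -2 shows up as 254 in the shell.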

@gerddie gerddie self-assigned this May 20, 2022
@gerddie gerddie marked this pull request as draft May 20, 2022 14:19
@gerddie
Contributor Author

gerddie commented May 20, 2022

Completely untested ...

Member

@jrfonseca jrfonseca left a comment

Minor things, otherwise looks great. Thanks!

@@ -268,6 +268,8 @@ static void retrace_eglCreateContext(trace::Call &call) {
break;
}

if (retrace::notifyLostContext)
    profile.notifyLostContext = 1;

Member

1 -> true
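
For context, a flag like the one set in this hunk eventually has to become context creation attributes. The sketch below shows one way that could look, assuming the EGL_EXT_create_context_robustness extension. `SketchProfile` and `buildContextAttribs` are hypothetical names, not apitrace's, and the constants repeat the values from the EGL headers so the sketch compiles without EGL.

```cpp
#include <cassert>
#include <vector>

// Values copied from the EGL headers (EGL 1.x and EGL_EXT_create_context_robustness).
constexpr int kEglNone = 0x3038;                   // EGL_NONE
constexpr int kEglContextClientVersion = 0x3098;   // EGL_CONTEXT_CLIENT_VERSION
constexpr int kEglResetStrategyExt = 0x3138;       // EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_EXT
constexpr int kEglLoseContextOnResetExt = 0x31BF;  // EGL_LOSE_CONTEXT_ON_RESET_EXT

// Hypothetical stand-in for the profile object in the hunk above.
struct SketchProfile {
    int major = 2;
    bool notifyLostContext = false;  // a bool, as the review asks (true, not 1)
};

// Builds an EGL_NONE-terminated attribute list for eglCreateContext().
// The reset strategy is only requested when the extension is available.
std::vector<int> buildContextAttribs(const SketchProfile &profile, bool hasRobustnessExt)
{
    assert(profile.major >= 1);
    std::vector<int> attribs = { kEglContextClientVersion, profile.major };
    if (profile.notifyLostContext && hasRobustnessExt) {
        attribs.push_back(kEglResetStrategyExt);
        attribs.push_back(kEglLoseContextOnResetExt);
    }
    attribs.push_back(kEglNone);
    return attribs;
}
```

The resulting vector's data() pointer could then be passed as the attribute list of eglCreateContext(); whether the real code builds the list this way is an assumption.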

@@ -326,13 +328,20 @@ static void retrace_eglSwapBuffers(trace::Call &call) {
}
} else {
glFlush();
}
}

Member

spurious whitespace


if (retrace::profilingFrameTimes) {
    // Wait for presentation to finish
    glFinish();
    std::cout << "rendering_finished " << glretrace::getCurrentTime() << std::endl;
}

if (retrace::notifyLostContext) {

Member

&& ... has_EXT_create_context_robustness
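
The guard suggested here can be sketched as follows: the reset status is only queried when the option is set and the context actually advertises robustness support. `SketchFeatures`, `ResetCheck` and `checkResetStatus` are hypothetical names for this illustration, and the callback stands in for glGetGraphicsResetStatus().

```cpp
#include <cassert>
#include <functional>

enum class ResetCheck { Skipped, ContextOk, ContextLost };

// Hypothetical stand-in for the extension flags of the current context.
struct SketchFeatures {
    bool hasRobustness = false;  // e.g. has_EXT_create_context_robustness
};

// Queries the reset status only if the option is enabled AND the extension
// is supported, so the query is never issued on a context that lacks it.
ResetCheck checkResetStatus(bool notifyLostContext,
                            const SketchFeatures &features,
                            const std::function<unsigned()> &queryResetStatus)
{
    if (!notifyLostContext || !features.hasRobustness) {
        return ResetCheck::Skipped;
    }
    assert(static_cast<bool>(queryResetStatus));
    // 0 is GL_NO_ERROR; any other value reports a context reset.
    return queryResetStatus() == 0u ? ResetCheck::ContextOk
                                    : ResetCheck::ContextLost;
}
```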

@gerddie gerddie force-pushed the check-context-reset branch from 565ab6d to a78f75a Compare May 23, 2022 10:39
@jrfonseca
Member

@okias, do you have a trace that triggers GPU resets? If so, please see if this works/helps.

@@ -159,6 +159,15 @@ static void retrace_glXSwapBuffers(trace::Call &call) {
glFlush();
}

if (retrace::notifyLostContext) {
    if (glGetGraphicsResetStatus() != GL_NO_ERROR) {
        std::cout << "Context was lost aborting rendering" << std::endl;

Member

Thanks for the update.

This should be std::cerr << "error: Context ...
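
A minimal sketch of what that could look like, with the message prefixed by "error: " and sent to stderr, presumably so that stdout stays reserved for regular replay output such as the rendering_finished lines. The helper names are hypothetical and the exact wording of the message is an assumption; the three constants repeat the GL reset status values so the sketch compiles without GL headers.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

constexpr unsigned kGuiltyContextReset = 0x8253;    // GL_GUILTY_CONTEXT_RESET
constexpr unsigned kInnocentContextReset = 0x8254;  // GL_INNOCENT_CONTEXT_RESET
constexpr unsigned kUnknownContextReset = 0x8255;   // GL_UNKNOWN_CONTEXT_RESET

// Maps a reset status to a short human-readable description.
std::string describeResetStatus(unsigned status)
{
    switch (status) {
    case kGuiltyContextReset:   return "guilty context reset";
    case kInnocentContextReset: return "innocent context reset";
    case kUnknownContextReset:  return "unknown context reset";
    default:                    return "unrecognized reset status";
    }
}

// Builds the message separately from printing it, which keeps it testable.
std::string formatLostContextError(unsigned status)
{
    assert(status != 0);  // 0 is GL_NO_ERROR, i.e. nothing to report
    std::ostringstream os;
    os << "error: context was lost (" << describeResetStatus(status)
       << "), aborting rendering";
    return os.str();
}

// Writes the message to stderr rather than stdout.
void reportLostContext(unsigned status)
{
    std::cerr << formatLostContextError(status) << std::endl;
}
```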

@okias
Contributor

okias commented May 24, 2022

@okias, do you have a trace that triggers GPU resets? If so, please see if this works/helps.

I have one on Mesa3D CI, but since that involves a lot of work, I'm first looking for a way to kill the TGL GPU in my laptop :)

@okias
Contributor

okias commented May 24, 2022

OK, with ./apitrace retrace --notify-lost-context -w /home/projects/collabora/traces-db/unvanquished/unvanquished-ultra.trace it runs the trace, but it doesn't seem to trigger the warning while this is running in the background:

while true; do sudo ./i915_hangman --run-subtest error-state-basic; sleep 1; done

from xorg-intel-gpu-tools on TGL.

And it does nothing when running

 time ./apitrace retrace --notify-lost-context /home/projects/collabora/traces-db/unvanquished/unvanquished-ultra.trace
./apitrace retrace --notify-lost-context   0.02s user 0.02s system 101% cpu 0.039 total

@okias
Contributor

okias commented May 24, 2022

I tried iterating over the whole ./i915_hangman from https://github.com/freedesktop/xorg-intel-gpu-tools while running apitrace in a loop, but no luck.

@gerddie
Contributor Author

gerddie commented May 25, 2022

@okias what backend is used to create the context when running the trace?

@okias
Contributor

okias commented May 25, 2022

@okias what backend is used to create the context when running the trace?

My bad, it's GLX. I thought I had at least something traced with EGL, but I don't see any EGL trace.

@@ -729,6 +729,7 @@ usage(const char *argv0) {
" --no-context-check don't check that the actual GL context version matches the requested version\n"
" --min-cpu-time=NANOSECONDS ignore calls with less than this CPU time when profiling (default is 1000)\n"
" --ignore-calls=CALLSET ignore calls in CALLSET\n"
" --notify-context-lost enable_context_lost_notify\n"
Contributor

@okias okias May 25, 2022

This should be --notify-lost-context; the word order here is incorrect.

@okias
Contributor

okias commented May 25, 2022

❯ cd piglit/bin
❯ apitrace trace -o egl.trace ./egl-gl-colorspace 
❯ ./apitrace replay --notify-lost-context -w egl.trace
/home/projects/collabora/piglit/bin/egl-gl-colorspace
renderableType = 1
53: message: api performance issue 1: memory mapping a busy "miptree" BO stalled and took 0.839 ms.
55: message: api performance issue 1: memory mapping a busy "miptree" BO stalled and took 0.230 ms.
56: message: major api error 2: GL_INVALID_OPERATION in unsupported function called (unsupported extension or deprecated function?)
Rendered 1 frames in 0.0256996 secs, average of 38.9111 fps

That is for a regular run, without crashing it.
Without --notify-lost-context, there is no GL_INVALID_OPERATION.

@gerddie gerddie force-pushed the check-context-reset branch 3 times, most recently from b360656 to 341e9c0 Compare May 27, 2022 12:10
@okias
Contributor

okias commented Jun 1, 2022

I tried error-state-basic and hangcheck-unterminated with traces made with GLX, but had no luck getting a failure report from apitrace.

gerddie added 4 commits August 7, 2023 10:42
Add a command line option to create an EGL context that supports
lost context notification, and check with each rendered frame whether
the context was lost. In that case abort replaying and return an error
code -2.

v2: Check for support before querying the reset status (EGL)

Signed-off-by: Gert Wollny <[email protected]>
v2: check for support before using the reset status query (GLX)

Signed-off-by: Gert Wollny <[email protected]>
@gerddie gerddie force-pushed the check-context-reset branch from 341e9c0 to f4ed1e2 Compare August 7, 2023 08:53

Successfully merging this pull request may close these issues.

replay: use EGL_CONTEXT_LOST to catch when GPU hangs
3 participants