Using Fibers causes epic crash #46
Comments
This is the bare minimum required to make fibers work within the go runtime.
@dunglas the following Dockerfile (props @cdaguerre in #374) appears to "fix" fibers, at least for this reproducer with manual testing. It needs more testing:
🥳 🤞 🤞 still testing...
Great news! Don't hesitate to open a PR with these changes, so we can see if this fixes the issue for all architectures.
I'll do some proper testing by Monday (by updating the fiber branch), but I haven't seen a crash yet via manual testing.
@withinboredom I had issues with fibers, so I could also test this on my Cloud Run service, but I'm not sure where I can get a Docker image that includes this fix.
It doesn't fix it, per se, more-or-less just reduces the probability of a crash. Edit to add: the best way to prevent a crash is to just not output anything at all inside a fiber.
I've just encountered this issue, and using the workaround from @withinboredom did resolve the exception. In this project the culprit seems to be the Monolog logger, as that is the only place fibers are being used.
I started working on a cgo library several weeks ago to allow output from C to Go without calling Go. It's still a WIP: https://github.com/withinboredom/cgoc There's a segfault once the number of concurrent requests gets high (due to usage of some C synchronization primitives from Go), and a memory leak, but it's pretty fast by itself (~8gbs on my machine). I hope to have it working sometime in the next few months as a potential solution.
@withinboredom IMHO the best option would be to fix the issue directly in Go!
@dunglas I highly doubt it will ever be fixable, for very valid reasons. The reason it is failing boils down to the following:
According to the CL (https://go-review.googlesource.com/c/go/+/530480) this means changing the stack for an If we can fix the ncgo issue, then we are free to muck around with the stack as much as we want. |
One way to fix it might be to have |
According to golang/go#62130 (comment), this seems fixable directly in Go for our case.
This would work: I've been tearing apart the Fiber/boost context implementation to see if I can pop the stack back to the original and jump to Go, then, on returning, replace the stack. The only problem with this approach (and fwiw, I do have it mostly working) is that it requires assembly, and I am only familiar with x86-64 assembly. We would need to write assembly for every architecture (and there are some big perf hits here).
It turns out the patch to get it working is pretty darn simple.

diff --git a/src/runtime/cgocall.go b/src/runtime/cgocall.go
index 0d3cc40903..609c5dbc52 100644
--- a/src/runtime/cgocall.go
+++ b/src/runtime/cgocall.go
@@ -215,34 +215,6 @@ func cgocall(fn, arg unsafe.Pointer) int32 {
func callbackUpdateSystemStack(mp *m, sp uintptr, signal bool) {
g0 := mp.g0
- inBound := sp > g0.stack.lo && sp <= g0.stack.hi
- if mp.ncgo > 0 && !inBound {
- // ncgo > 0 indicates that this M was in Go further up the stack
- // (it called C and is now receiving a callback).
- //
- // !inBound indicates that we were called with SP outside the
- // expected system stack bounds (C changed the stack out from
- // under us between the cgocall and cgocallback?).
- //
- // It is not safe for the C call to change the stack out from
- // under us, so throw.
-
- // Note that this case isn't possible for signal == true, as
- // that is always passing a new M from needm.
-
- // Stack is bogus, but reset the bounds anyway so we can print.
- hi := g0.stack.hi
- lo := g0.stack.lo
- g0.stack.hi = sp + 1024
- g0.stack.lo = sp - 32*1024
- g0.stackguard0 = g0.stack.lo + stackGuard
- g0.stackguard1 = g0.stackguard0
-
- print("M ", mp.id, " procid ", mp.procid, " runtime: cgocallback with sp=", hex(sp), " out of bounds [", hex(lo), ", ", hex(hi), "]")
- print("\n")
- exit(2)
- }
-
if !mp.isextra {
// We allocated the stack for standard Ms. Don't replace the
// stack bounds with estimated ones when we already initialized

It turns out, because of a few conditions, nothing fancy is required:
If we are OK with maintaining a custom version of Go forever, then this is likely the best solution, but I highly doubt it would be accepted into Go.

Note that this will probably be a very ugly crash if output is sent from a thread from the parallel extension, because (3) above will be violated. This can probably be mitigated by marshaling the output in C to the "main" thread whenever the current thread isn't the "main" thread. This needs some further testing.

Before I go into this further, are we OK with a custom Go patch for the foreseeable future, @dunglas? I will create a PR to Go arguing for this patch, but I suspect it won't be accepted. If we are, this is what I propose:

A. testing for (3) above and verifying whether any further work is required
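For anyone following along, the condition removed by the patch above can be exercised in isolation. This is only a rough sketch in plain Go, not runtime code: `stackBounds` and `wouldThrow` are hypothetical names, and the addresses are made up. It copies the two expressions from the diff (`sp > g0.stack.lo && sp <= g0.stack.hi` and `mp.ncgo > 0 && !inBound`) to show why a callback arriving on a fiber stack trips the check.

```go
package main

import "fmt"

// stackBounds is a hypothetical stand-in for the runtime's g0.stack,
// which records the low and high addresses of the system stack.
type stackBounds struct {
	lo, hi uintptr
}

// inBound copies the expression from the diff:
// sp > g0.stack.lo && sp <= g0.stack.hi
func (s stackBounds) inBound(sp uintptr) bool {
	return sp > s.lo && sp <= s.hi
}

// wouldThrow copies the removed guard: mp.ncgo > 0 && !inBound.
// ncgo > 0 means this thread was already in Go further up the stack,
// called into C, and is now receiving a callback.
func wouldThrow(ncgo int32, s stackBounds, sp uintptr) bool {
	return ncgo > 0 && !s.inBound(sp)
}

func main() {
	// Made-up addresses, for illustration only.
	g0 := stackBounds{lo: 0x70000000, hi: 0x70008000}
	systemSP := uintptr(0x70004000) // callback arrives on the original stack
	fiberSP := uintptr(0x50001000)  // callback arrives on a fiber's own stack

	fmt.Println(wouldThrow(1, g0, systemSP)) // false
	fmt.Println(wouldThrow(1, g0, fiberSP))  // true
	fmt.Println(wouldThrow(0, g0, fiberSP))  // false
}
```

The last case shows that the guard only fires when the thread is already inside a cgo call (`ncgo > 0`), which is always the situation when a fiber produces output during a request.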
Minimal code to reproduce:
With some slight modifications, it can also be reproduced in worker mode.