runtime: fatal error: found bad pointer in Go heap #29362
Comments
It would help if you could try with the 1.12 beta1 and see whether the crash continues to happen. Also, I guess it's not possible to reduce the code to a short reproducer that we can use? |
A short reproducer, or any more information would be great. It's really hard to tell what could be going wrong with just this information, though lowering I think it's still too early to rule out a cgo and unsafe issue until we get more information. Although you say you don't use cgo, perusing at some of your dependencies in The fact that it succeeds with As for an example of where it could still be a cgo issue: consider if one of your dependencies was making some false assumption about cgo that's mostly true on a certain platform in an older toolchain version, but with a toolchain release that assumption became true less often. Similar case with unsafe. Not saying that this is exactly what's going on (in fact, it's probably not) but at this point it's still a possibility until we can figure out where that pointer came from. Figuring out where the pointer came from is a bit tricky, because in this case the error is being caught during a write barrier buffer flush, which could be happening long after the write barrier for the pointer value was executed. One thing that might help fish this out is to open this up in gdb and try and see what values are at the offending address, and see if that looks like anything you might recognize? As @agnivade suggests, could you run it with the 1.12 beta as well? |
When I said "We don't use cgo", it meant we can build our application with On the other hand, it's difficult to build without unsafe. Even |
Got it! Thanks for confirming and I'm sorry for the noise.
For sure, and I'm certainly not suggesting you stop using unsafe. All I was trying to get at was that maybe a dependency isn't using the unsafe package right. I only mention it to leave it on the table as a possibility, because until we know where that pointer came from, or have other new information, it's really up in the air. It could very well be a bug in the runtime, but it could also be in user code someplace. The fact that all of your backtraces have this failing in Another thing that might be helpful is finding which commit between 1.10 and 1.11 caused the issue to start happening via bisection. This would also give us a lot of information (though I suspect this might take a long time, if it takes ~30 minutes to reproduce). |
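The bisection suggested above boils down to a binary search over an ordered list of commits. The following is only a sketch of that idea, not the tooling anyone in this thread used: `firstBad`, the revision names, and the `isBad` predicate are all hypothetical stand-ins. In practice `isBad` would mean "build the toolchain at this commit and run the load test until it crashes or a time limit passes".

```go
package main

import "fmt"

// firstBad returns the index of the first revision for which isBad reports
// true. It assumes revisions[0] is good, the last revision is bad, and that
// every revision after the first bad one is also bad.
func firstBad(revisions []string, isBad func(rev string) bool) int {
	// Invariant: revisions[lo] is good, revisions[hi] is bad.
	lo, hi := 0, len(revisions)-1
	for hi-lo > 1 {
		mid := lo + (hi-lo)/2
		if isBad(revisions[mid]) {
			hi = mid
		} else {
			lo = mid
		}
	}
	return hi
}

func main() {
	// Hypothetical revision names, ordered oldest to newest.
	revs := []string{"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"}

	// Hypothetical predicate standing in for a long-running crash test.
	isBad := func(rev string) bool { return rev >= "r5" }

	i := firstBad(revs, isBad)
	fmt.Printf("first bad revision: %s (index %d)\n", revs[i], i)
}
```

Each step costs one full test run, so the number of runs grows with the logarithm of the number of commits. Because the crash here is intermittent, "no crash within the time limit" is only weak evidence that a commit is good, which is part of why this could take a long time.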
I have a reproducer at https://github.com/mark-rushakoff/go-issue-24993 that produces a similar trace:
Some commentary ongoing in #24993. Not 100% clear to me if the reproducer is related, but it does at least produce the |
Updated: No crash with |
Updated: Crash with Go 1.12beta1 |
Huh. I'm only loosely familiar with the precise semantics of
Awesome, I'll dig into this tomorrow and see if I can figure anything out. |
I wonder if we could have a debug mode in which the write barrier is always eager, i.e. does not go through the buffer. Then, if the bad pointer comes from the write barrier, we would see a fault immediately. Effectively, maybe we just set the buffer size to 0 so that every write barrier triggers a flush? |
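To illustrate why an eager mode would help, here is a toy model (not the runtime's actual write barrier code) of a buffer that validates values only on flush. The `wbBuf` type, its `capacity` field, and the validity check are assumptions made up for this sketch. With a non-zero capacity, a bad value is reported during some later, unrelated write; with capacity 0, it is reported at the write that introduced it.

```go
package main

import "fmt"

// record remembers where a value was written (toy model only).
type record struct {
	site  string
	value uintptr
}

// wbBuf is a toy stand-in for a write barrier buffer: it queues written
// values and validates them only when it is flushed.
type wbBuf struct {
	capacity int
	pending  []record
	valid    func(v uintptr) bool
}

// write queues a value and flushes once the buffer is full. It returns the
// original write sites of any invalid values found by that flush.
func (b *wbBuf) write(site string, v uintptr) []string {
	b.pending = append(b.pending, record{site, v})
	if len(b.pending) >= b.capacity {
		return b.flush()
	}
	return nil
}

// flush validates and discards everything queued so far.
func (b *wbBuf) flush() []string {
	var bad []string
	for _, r := range b.pending {
		if !b.valid(r.value) {
			bad = append(bad, r.site)
		}
	}
	b.pending = b.pending[:0]
	return bad
}

// detectedAt returns the write site that was executing when the first bad
// value was reported, for a buffer of the given capacity.
func detectedAt(capacity int) string {
	b := &wbBuf{
		capacity: capacity,
		// Hypothetical validity rule: small values are not heap pointers.
		valid: func(v uintptr) bool { return v >= 0x1000 },
	}
	writes := []record{{"A", 0x2a}, {"B", 0x2000}, {"C", 0x3000}, {"D", 0x4000}}
	for _, w := range writes {
		if bad := b.write(w.site, w.value); len(bad) > 0 {
			return w.site
		}
	}
	return ""
}

func main() {
	fmt.Println("buffered (capacity 4): reported during write", detectedAt(4))
	fmt.Println("eager (capacity 0):    reported during write", detectedAt(0))
}
```

In this model the bad value is always written at site A, but the buffered variant only reports it while site D is executing, which mirrors why the backtraces in this issue point at the flush rather than at the code that produced the pointer.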
My reproducer is able to get the Full stack trace:
|
I got more interesting information.
Note that the bad pointer is
The bad pointer is the argument of
func Keep(playerID int32) {
	go func() {
		err := updateAliveTime(playerID)
		if err != nil {
			logger.Errorf("Failed to keep connection. err=[%v]", err)
		}
	}()
}
Note that |
I added quick check in
|
@aclements Would you take a look? I'm not sure about Go's GC implementation. In case of
|
Change https://golang.org/cl/155779 mentions this issue: |
@methane yeah that's definitely problematic; non-existent (or "zero-sized") stack maps don't seem to be treated properly. The compiler seems to forgo emitting a stack map because the entrypoint of Nice catch, and thanks for putting up a fix! This got me worried about this happening elsewhere, but I poked around and it seems OK. The bitvector's Just to be clear, does this fix the issue you've been seeing? There was also a bunch of stuff figured out/solved in #24993 which could result in a similar error but I assume you saw that and the patch there didn't help. |
I created a small reproducer for this. (amd64 only) |
The original issue was found in our company's load testing environment. We will run the load test today. I will share the result if we hit the fatal error again. |
@aclements @randall77 Do you think golang.org/cl/155779 fixes #26243 as well? |
I'm not sure, but I don't think so. CL 155779 fixes the case where a non-pointer is put into the write barrier buffer. |
@FiloSottile I don't think it's related to #26243, unfortunately. |
Change https://golang.org/cl/156122 mentions this issue: |
Change https://golang.org/cl/156123 mentions this issue: |
Derived from Naoki's reproducer.
Update #29362
Change-Id: I1cbd33b38a2f74905dbc22c5ecbad4a87a24bdd1
Reviewed-on: https://go-review.googlesource.com/c/156122
Run-TryBot: Keith Randall <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Use the length of the bitmap to decide how much to pass to the write barrier, not the total length of the arguments. The test needs enough arguments so that two distinct bitmaps get interpreted as a single longer bitmap.
Update #29362
Change-Id: I78f3f7f9ec89c2ad4678f0c52d3d3def9cac8e72
Reviewed-on: https://go-review.googlesource.com/c/156123
Run-TryBot: Keith Randall <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Austin Clements <[email protected]>
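The second commit message describes reading past the end of one bitmap into the next. The sketch below is a toy model of that failure mode, not the runtime's data structures: the `bitmap` type, the shared `bits` slice, and `pointerWords` are all made up for illustration. Two bitmaps sit next to each other in one backing array; scanning by the total argument length instead of the bitmap's own length treats the neighbour's bits as if they described this frame's arguments.

```go
package main

import "fmt"

// bitmap marks which words of an argument area hold pointers. In this toy
// model all bitmaps live back to back in one shared bit slice.
type bitmap struct {
	start int // index of this bitmap's first bit in the shared slice
	n     int // number of bits that belong to this bitmap
}

// pointerWords returns the indices of the words that would be treated as
// pointers when the first count words are scanned using bm.
func pointerWords(bits []bool, bm bitmap, count int) []int {
	var out []int
	for i := 0; i < count; i++ {
		if bits[bm.start+i] {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	// Bitmap A covers 2 words (pointer, scalar); the unrelated bitmap B
	// that follows it covers 2 words (pointer, pointer).
	bits := []bool{true, false, true, true}
	a := bitmap{start: 0, n: 2}

	argWords := 4 // hypothetical total argument size in words, larger than a.n

	// Scanning by argument length runs into B's bits.
	fmt.Println("by argument length:", pointerWords(bits, a, argWords))

	// Scanning by the bitmap's own length stays within A.
	fmt.Println("by bitmap length:  ", pointerWords(bits, a, a.n))
}
```

In the first call, words 2 and 3 are reported as pointers even though bitmap A says nothing about them, which is how a plain scalar could end up being handed to the write barrier as if it were a pointer.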
Would you backport this to Go 1.11? |
Yes, we should backport these. @gopherbot please open an issue for backporting to 1.11. |
Backport issue(s) opened: #29565 (for 1.11). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
@gopherbot please open an issue for backporting to 1.10. |
@randall77 You can ask gopherbot to open backport issues for both releases at the same time, but in general if "compile with 1.11" is a valid workaround, we are not backporting to 1.10 anymore. (Second paragraph of https://github.com/golang/go/wiki/MinorReleases.) |
Apologies for my limited imagination, but under what scenarios would “compile with 1.11” not be a sufficient workaround? |
I suppose if 1.11 dropped support for a relevant OS, but in practice I suspect the new policy is equivalent to "only security fixes are backported to two releases ago". |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
We use GOOS=linux GOARCH=amd64 when building. We ran the program on a different machine.
Running machine info
What did you do?
We're building an online game server with Go.
We hit a random crash like this.
crash report
Other info:
-race flag, without any race report.
span.state is always 1.
runtime.wbBufFlush() ... runtime.findObject(). But the caller of wbBufFlush() varies.
GOMAXPROCS=1. It happens in less than 30 min.
GOMAXPROCS=1 GODEBUG=invalidptr=0, more than 6 hours (false positive?)
GOMAXPROCS=1 more than 9 hours. (Go 1.11 regression?)
GODEBUG=gcstoptheworld=1
I don't think this issue is the same as #26243, because the stack trace and environment are different.
Can we do anything to investigate this issue?