Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: re-enable TestCgoCallbackGC flaky on FreeBSD #16396

Closed
bradfitz opened this issue Jul 17, 2016 · 29 comments
Closed

runtime: re-enable TestCgoCallbackGC flaky on FreeBSD #16396

bradfitz opened this issue Jul 17, 2016 · 29 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-FreeBSD
Milestone

Comments

@bradfitz
Copy link
Contributor

I see a number of recent TestCgoCallbackGC flakes on FreeBSD

amd64:
https://build.golang.org/log/c47d9b4fe69df9c91af66122e7633f77cb623794
https://build.golang.org/log/2c93bb5f172cf8082a80bd567628bcea4f437c20
https://build.golang.org/log/92ad760a01e814bb0021486856b4d3123a4a11b7

386:
https://build.golang.org/log/47a3e8462838b94a69459a3e56de5ea852b5e86e
https://build.golang.org/log/d6604ec508fd4166c46bd6fb3e04a4b0fb945959
https://build.golang.org/log/85e6aab2ff5692ee7372f03dd187fc555f8a685c

Most are:

--- FAIL: TestCgoCallbackGC (93.64s)
    crash_test.go:105: testprogcgo CgoCallbackGC exit status: signal: quit (core dumped)
    crash_cgo_test.go:61: expected "OK\n", but got:
FAIL
FAIL    runtime 147.968s

Some are:

--- FAIL: TestCgoCallbackGC (102.14s)
    crash_test.go:105: testprogcgo CgoCallbackGC exit status: signal: quit (core dumped)
    crash_cgo_test.go:61: expected "OK\n", but got:
panic: test timed out after 5m0s

goroutine 232809 [running]:
panic(0x81f2c80, 0x39220b20)
    /tmp/workdir/go/src/runtime/panic.go:500 +0x334 fp=0x392b1fa0 sp=0x392b1f5c
testing.startAlarm.func1()
    /tmp/workdir/go/src/testing/testing.go:918 +0xef fp=0x392b1fe0 sp=0x392b1fa0
runtime.goexit()
    /tmp/workdir/go/src/runtime/asm_386.s:1612 +0x1 fp=0x392b1fe4 sp=0x392b1fe0
created by time.goFunc
    /tmp/workdir/go/src/time/sleep.go:154 +0x30

goroutine 1 [chan receive]:
runtime.gopark(0x8246704, 0x39632d30, 0x8224051, 0xc, 0x17, 0x3)
    /tmp/workdir/go/src/runtime/proc.go:259 +0x12d fp=0x39233cd4 sp=0x39233cc0
runtime.goparkunlock(0x39632d30, 0x8224051, 0xc, 0x17, 0x3)
    /tmp/workdir/go/src/runtime/proc.go:265 +0x45 fp=0x39233cf0 sp=0x39233cd4
...

Why did these start recently?

@bradfitz bradfitz added this to the Go1.7Maybe milestone Jul 17, 2016
@ianlancetaylor
Copy link
Member

I think these are other versions of #15658 and #16136. There is something wrong on FreeBSD but I don't understand what it is and I have not been able to recreate it myself.

If anybody has a FreeBSD machine I can ssh into that would help.

@bradfitz
Copy link
Contributor Author

Our FreeBSD builders run on GCE. I can create you an instance there.

@ianlancetaylor
Copy link
Member

Our FreeBSD builders run on GCE. I can create you an instance there.

That would help if you have time. Thanks.

@bradfitz
Copy link
Contributor Author

@ianlancetaylor, done. Emailed you details.

@ianlancetaylor
Copy link
Member

I think I was incorrect in thinking that this was related to the other FreeBSD problems. The other FreeBSD problems are less frequent, and I have not been able to recreate them at all.

I now think this is the same as issue #13926: a problem that arises when there are many simultaneous callbacks from C to Go, which is exactly what the CgoCallbackGC test case does. The problem is that on FreeBSD you can surprisingly large delays during the callback. My theory is that when enough of those delays occur, the test times out. That is consistent with the failures reported on the builder, which are in all cases the program crashing from SIGQUIT; SIGQUIT is sent by the runtime test harness when the test program runs for more than 1 minute (runTestProg in runtime/crash_test.go).

Working on a patch.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/25047 mentions this issue.

@ianlancetaylor
Copy link
Member

TestCgoCallbackGC is still failing on FreeBSD on the builders. The change did not fix it, at least not completely.

@adg
Copy link
Contributor

adg commented Jul 25, 2016

We may just have to punt on this for Go 1.7, which should be okay since IIUC this is not a regression in Go 1.7. (Or is it?)

@ianlancetaylor
Copy link
Member

To the best of my knowledge this is not a regression in 1.7.

@bradfitz
Copy link
Contributor Author

@adg, @broady, should we at least document this as a known issue in the release notes for Go 1.7? And then punt this to Go 1.8 for actually fixing.

It'd be a pity to ship something with known crashes for a "first class port". I'd rather warn FreeBSD users about this bug so they don't deploy something with cgo callbacks in production and see crashes.

@adg
Copy link
Contributor

adg commented Jul 27, 2016

Agree, sounds like the right move.

On 27 July 2016 at 15:46, Brad Fitzpatrick [email protected] wrote:

@adg https://github.com/adg, @broady https://github.com/broady,
should we at least document this as a known issue in the release notes for
Go 1.7? And then punt this to Go 1.8 for actually fixing.

It'd be a pity to ship something with known crashes for a "first class
port". I'd rather warn FreeBSD users about this bug so they don't deploy
something with cgo callbacks in production and see crashes.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#16396 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AIDilbbWTN44lyUFsJ__eMiYKdLcEpZgks5qZvCsgaJpZM4JOHyK
.

@broady
Copy link
Contributor

broady commented Jul 27, 2016

Maybe, but if it's not a regression then the release notes isn't really the place for this.

We could say something like "Reduced runtime crashes on FreeBSD, but the issue is still present. See Issue 16396."

@bradfitz
Copy link
Contributor Author

bradfitz commented Jul 27, 2016

I don't think it matters whether it's a regression. Had I known about this in Go 1.6 I would have also argued we need a dedicated "Known Issues" section in our release notes, especially for first class ports.

Or: we demote FreeBSD to no longer be a first-class port.

@broady
Copy link
Contributor

broady commented Jul 27, 2016

I suppose it wasn't known at the time, so we couldn't document as a known issue. SGTM.

Perhaps we should also amend the 1.6 release notes (with a date for said addendum).

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/25283 mentions this issue.

@bradfitz bradfitz modified the milestones: Go1.8, Go1.7Maybe Jul 28, 2016
gopherbot pushed a commit that referenced this issue Jul 28, 2016
Updates #16396

Change-Id: I7b4f85610e66f2c77c17cf8898cc41d81b2efc8c
Reviewed-on: https://go-review.googlesource.com/25283
Reviewed-by: Chris Broadfoot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Andrew Gerrand <[email protected]>
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/27313 mentions this issue.

gopherbot pushed a commit that referenced this issue Aug 18, 2016
The trybot flakes are a nuisance.

Updates #16396

Change-Id: I8202adb554391676ba82bca44d784c6a81bf2085
Reviewed-on: https://go-review.googlesource.com/27313
Run-TryBot: Josh Bleecher Snyder <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
@quentinmit quentinmit added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 29, 2016
@rsc rsc modified the milestones: Go1.9, Go1.8 Nov 11, 2016
@stevenh
Copy link
Contributor

stevenh commented Dec 1, 2016

Does anyone have any tips on reproducing this?
I'm currently running with the following and haven't seen the test take over 7 seconds.

setenv GOMAXPROCS 2
runtime> sh -c 'while true; do go test -run=TestCgoCallbackGC -cpu=1,2,4; sleep 1; done'

Test machine is dual E5-2640 @ 2.50GHz running FreeBSD 11.0-RELEASE-p3 go version go1.7.3 freebsd/amd64

@stevenh
Copy link
Contributor

stevenh commented Dec 1, 2016

Also testing this on 10.1 (metal) and 10.2 (gce VM) currently nothing longer than 8 seconds, tests are still running.

@stevenh
Copy link
Contributor

stevenh commented Dec 2, 2016

Test has been running in a loop on three different machines for over 28 hours not a single failure so far.

@ianlancetaylor
Copy link
Member

@stevenh Which version of FreeBSD? It's possible that changes to the kernel scheduler have eliminated the problem. As it says in https://golang.org/cl/25047 the problem was much worse on FreeBSD than on GNU/Linux for some reason.

Or of course it's possible that something we've changed in the runtime has fixed the problem, though I don't know what that could be.

@stevenh
Copy link
Contributor

stevenh commented Dec 3, 2016

I tested on 10.1 (metal), 10.2 (GCE) and 11 (metal) all under golang 1.7.3.

With all machines clearing well over 24 hours (coming up to 32 now) on tight loop running this test I think it may be good to remove the 604efe1 for 1.8 to see if we can confirm it is indeed no longer an issue.

@stevenh
Copy link
Contributor

stevenh commented Dec 5, 2016

All three machines have now been running the test in loop for over 3 days no errors.

@ianlancetaylor
Copy link
Member

Thanks for trying it so thoroughly. Sounds like this issue can be closed.

@bradfitz
Copy link
Contributor Author

bradfitz commented Dec 5, 2016

@ianlancetaylor, did you mean to close this? It sounds like we need to at least remove the t.Skip for this test on FreeBSD.

@bradfitz bradfitz reopened this Dec 5, 2016
@bradfitz bradfitz changed the title runtime: TestCgoCallbackGC flaky on FreeBSD runtime: re-enable TestCgoCallbackGC flaky on FreeBSD Dec 5, 2016
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/33903 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/33926 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 6, 2016
Change-Id: I811e76c9f42505e974bea634d4ded2499e4893db
Reviewed-on: https://go-review.googlesource.com/33926
Reviewed-by: Ian Lance Taylor <[email protected]>
@paulzhol
Copy link
Member

Hello,
I've starting running builds with CGO_ENABLED=1 on freebsd-arm-paulzhol and it fails consistently:

ok  	regexp/syntax	4.107s
--- FAIL: TestCgoCallbackGC (180.11s)
	crash_test.go:105: testprogcgo CgoCallbackGC exit status: exit status 2
	crash_cgo_test.go:61: expected "OK\n", but got:
		SIGQUIT: quit
		PC=0x20263cd0 m=0
		signal arrived during cgo execution
		
		goroutine 166 [syscall, locked to thread]:
		runtime.cgocall(0xc0c78, 0x30a7d7c4, 0x0)
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/cgocall.go:131 +0x130 fp=0x30a7d7ac sp=0x30a7d790
		main._Cfunc_foo()
			_/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/testdata/testprogcgo/_obj/_cgo_gotypes.go:176 +0x38 fp=0x30a7d7c0 sp=0x30a7d7ac
		main.CgoCallbackGC.func2(0x30a5a040)
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/testdata/testprogcgo/callback.go:81 +0x14 fp=0x30a7d7d4 sp=0x30a7d7c0
		runtime.goexit()
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/asm_arm.s:998 +0x4 fp=0x30a7d7d4 sp=0x30a7d7d4
		created by main.CgoCallbackGC
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/testdata/testprogcgo/callback.go:83 +0xd8
		
		goroutine 1 [chan receive]:
		main.CgoCallbackGC()
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/testdata/testprogcgo/callback.go:86 +0x114
		main.main()
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/testdata/testprogcgo/main.go:34 +0x1ac
		
		goroutine 17 [runnable, locked to thread]:
		runtime.goexit()
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/asm_arm.s:998 +0x4

System is an ALLWINNER A20 running FreeBSD 11.0-RELEASE-p2. I've noticed CGO_ENABLED was off by default because the bootstrap 1.4.3 compilers need to be built with CGO_ENABLED=0.

@ianlancetaylor
Copy link
Member

@paulzhol This issue is closed and the problem you are reporting has different symptoms. Please open a new issue with the details. Thanks.

@paulzhol
Copy link
Member

paulzhol commented Jan 27, 2017

@ianlancetaylor I've opened #18811 but I do have the following in my log which look like the "Most are:" example in the original issue:

ok      regexp/syntax   4.205s
--- FAIL: TestCgoCallbackGC (204.17s)
        crash_test.go:106: testprogcgo CgoCallbackGC exit status: signal: quit
        crash_cgo_test.go:66: expected "OK\n", but got:
FAIL
FAIL    runtime 427.836s
ok      runtime/debug   0.088s

But the builder failed to push that:

2017/01/27 02:25:42 Failed: exit status 1
Build complete, duration 41m53.837957541s. Result: error: exit status 1
 RunTime:41m53.837957541s Builder:freebsd-arm-paulzhol PackagePath:]
2017/01/27 02:25:45 Post https://build.golang.org/result?builder=freebsd-arm-paulzhol&key=XXXXXXXXXXXX&version=1: dial tcp: lookup build.golang.org on 192.168.XXX.XXX:53: server misbehaving

@golang golang locked and limited conversation to collaborators Jan 27, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-FreeBSD
Projects
None yet
Development

No branches or pull requests

9 participants