Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: comparison tool swap #6305

Merged
merged 1 commit into from
Jun 19, 2015
Merged

Conversation

theuni
Copy link
Member

@theuni theuni commented Jun 19, 2015

As suggested here: #6278 (comment)

This should be functionally identical to what's in place now. It was built from
theuni/bitcoinj@be0eef7

That commit is the same as this pruned commit in TheBlueMatt's repo:
TheBlueMatt/bitcoinj@0f7b5d8

Now we'll be able to trust the line numbers in the stack traces.

@theuni
Copy link
Member Author

theuni commented Jun 19, 2015

While there are no other builds in the queue, I'll force-push a few times to try to trigger the NPE

@theuni theuni force-pushed the comparison-tool-fresh-jar branch 7 times, most recently from e158c04 to a4d9f95 Compare June 19, 2015 05:51
This should be functionally identical to what's in place now. It was built from
theuni/bitcoinj@be0eef7

That commit is the same as this pruned commit in TheBlueMatt's repo:
TheBlueMatt/bitcoinj@0f7b5d8

Now we'll be able to trust the line numbers in the stack traces.
@theuni
Copy link
Member Author

theuni commented Jun 19, 2015

Grr, 7 forced rebuilds and no errors. I'm not sure if that's a good thing or not :)

@laanwj
Copy link
Member

laanwj commented Jun 19, 2015

Well it tends to happen in droves, as if it depends on the load on the Travis server.
I wouldn't applaud yet :)

@laanwj
Copy link
Member

laanwj commented Jun 19, 2015

Going ahead and merging this for troubleshooting improvement.

@laanwj laanwj merged commit a4d9f95 into bitcoin:master Jun 19, 2015
laanwj added a commit that referenced this pull request Jun 19, 2015
a4d9f95 build: comparison tool swap (Cory Fields)
@dexX7
Copy link
Contributor

dexX7 commented Jun 19, 2015

The issue is still present. I looped it locally, until it failed (took around 30 min):

06:34:40 15 BitcoindComparisonTool$1.onPreMessageReceived: Requested 230386124745549b4a14cb801c16c308568fcc5679b5afd9df0d8601cf938f7e
06:34:40 15 Peer.processMessage: Received unhandled message: Reject: block 0f863347c378794f3a04e2899158a2f952c3720159fe71b6e13d23cca538f4d2 for reason 'bad-txns-inputs-missingorspent' (16)
06:34:40 15 Peer.processMessage: Received unhandled message: Reject: block 230386124745549b4a14cb801c16c308568fcc5679b5afd9df0d8601cf938f7e for reason 'bad-blk-sigops' (16)
06:34:40 1 BitcoindComparisonTool.main: Sent inv with block 0f863347c378794f3a04e2899158a2f952c3720159fe71b6e13d23cca538f4d2
06:34:40 15 BitcoindComparisonTool$1.onPreMessageReceived: Got empty header message from bitcoind
Exception in thread "main" java.lang.NullPointerException
    at com.google.bitcoin.core.BitcoindComparisonTool.main(BitcoindComparisonTool.java:311)

debug.log:

2015-06-19 16:34:40 received: getheaders (69 bytes) peer=1
2015-06-19 16:34:40 getheaders -1 to 0000000000000000000000000000000000000000000000000000000000000000 from peer=1
2015-06-19 16:34:40 sending: headers (1 bytes) peer=1
2015-06-19 16:34:40 received: ping (8 bytes) peer=1
2015-06-19 16:34:40 sending: pong (8 bytes) peer=1
2015-06-19 16:34:41 trying connection 0.0.0.0 lastseen=0.0hrs

It's caused by BitcoindComparisonTool.java#L311:

bitcoind.ping().get();

More specifically, because future of PendingPing (Peer.java#L1424) was null, which was returned by ping(), presumably because a pong was received, and processed via PendingPing#complete() (Peer.java#L1383), before it went through:

/**
* Sends the peer a ping message and returns a future that will be invoked when the pong is received back.
* The future provides a number which is the number of milliseconds elapsed between the ping and the pong.
* Once the pong is received the value returned by {@link com.google.bitcoin.core.Peer#getLastPingTime()} is
* updated.
* @throws ProtocolException if the peer version is too low to support measurable pings.
*/
public ListenableFuture<Long> ping()

@theuni
Copy link
Member Author

theuni commented Jun 19, 2015

@dexX7 Thanks for having a look. The line numbers didn't change, so it's the same as before.

I came to the same conclusion when I looked into it last time, but @mikehearn was confident that the issue couldn't be here. My guess was that it could be caused by a non-so-random PRNG causing ping lookups to race. In that case, replacing the bitcoind.ping().get() with bitcoind.ping(counter++).get() would avoid the issue I think.

@mikehearn Mind having another look, now that we're sure the line#s are good? Even better, since @dexX7 can reproduce locally, maybe he'll be able to test proposed changes more easily.

@dexX7
Copy link
Contributor

dexX7 commented Jun 19, 2015

My guess was that it could be caused by a non-so-random PRNG causing ping lookups to race.

Ah, just to clarify: I'm not sure, how it's caused, and this was just a guess.

But regarding the nonce lookup "collision": I gave it a try and it looks like the nonce can be anything, there always seems to be only one ping in the pending pings list (at least in these tests, for what I saw).

I quickly created a stripped ping-pong only version of the tool with additional log output, and when running via:

qa/pull-tester/run-bitcoind-for-test.sh /usr/bin/java -jar /path/to/bitcoinj/core/target/pull-tests.jar qa/tmp/compTool 0 2>&1

Then the flipped created/clearing log entries further indicate the pong was indeed processed, before ping() finished:

09:44:37 1 Peer.ping: created ping with nonce: 1102
09:44:37 11 Peer.processPong: clearing pong with nonce: 1102 (pending: 1)
09:44:37 1 Peer.ping: created ping with nonce: 1103
09:44:37 11 Peer.processPong: clearing pong with nonce: 1103 (pending: 1)
09:44:37 1 Peer.ping: created ping with nonce: 1104
09:44:37 11 Peer.processPong: clearing pong with nonce: 1104 (pending: 1)
09:44:37 11 Peer.processPong: clearing pong with nonce: 1105 (pending: 1) <-- uh oh
09:44:37 1 Peer.ping: created ping with nonce: 1105
09:44:37 1 BitcoindComparisonTool.main: ERROR: ping() with nonce 1105 returned null

@theuni
Copy link
Member Author

theuni commented Jun 19, 2015

@dexX7 Ah, great work. Looks like we have an answer :)

@mikehearn
Copy link
Contributor

@dexX7 Thanks! The race is fixed in this commit:

bitcoinj/bitcoinj@ce50e0b

Running "mvn package" inside a git master checkout of bitcoinj should create a file core/target/pull-tests.jar that contains a fresh pull tester.

theuni added a commit to theuni/bitcoin that referenced this pull request Jun 21, 2015
This should fix the spurious comparison tool failures.
See discussion here: bitcoin#6305

The race fix was cherry-picked on top of the version we're currently using, so
it should be functionally identical otherwise.
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants