Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bloom filters #1795

Merged
merged 17 commits into from
Jan 17, 2013
Merged

Bloom filters #1795

merged 17 commits into from
Jan 17, 2013

Conversation

TheBlueMatt
Copy link
Contributor

Filter tx invs and add MSG_FILTERED_BLOCK to provide blocks as header+vector (not including tx itself)

@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/bd93f61c2abe5b19e77030fa5859bc370c789a3a for binaries and test log.

@gavinandresen
Copy link
Contributor

Very exciting.

Did you see this:
http://matthias.vallentin.net/blog/2011/06/a-garden-variety-of-bloom-filters/

He has a really interesting looking C++11 headers-only implementation here at github that might give you inspiration (e.g. I think he uses boost to do the hashing; he seems to get a ton done in a very small number of lines of code).
https://github.com/mavam/libbf

@jgarzik
Copy link
Contributor

jgarzik commented Sep 27, 2012

implementation ACK, I like it a lot. This seems to match the initial sketch from @mikehearn and myself on IRC.

@jgarzik
Copy link
Contributor

jgarzik commented Oct 3, 2012

Poke, let's get some additional review on this.

@mikehearn
Copy link
Contributor

Hey Matt, could you rebase this on top of ultraprune? I'm ready to start merging the bitcoinj side but want to test this with the latest code. Will do a review now-ish.

@mikehearn
Copy link
Contributor

ACK on this. Most of my comments are purely cosmetic except for the unit test commit being in the wrong place (it wouldn't compile in the order given).

I guess we'll need a BIP? I'll start on one that documents what we've come up with.

@TheBlueMatt
Copy link
Contributor Author

Thanks mike for the good review (as usual). I updated the code to addresses most of the issues you mentioned.
Changes:

  • many additional comments
  • filteradd is now limited to 1MB, any larger and the node gets a Misbehaving(100).
  • fRelayTxn now defaults to false - this means we will no longer relay tx inv message before receiving the remote peer's version message, if this irks anyone it can be discussed.
    More changes to come (sipa's new merkle block stuff, more unit tests and a seed twiddle value for bloom filters)

@TheBlueMatt
Copy link
Contributor Author

@mikehearn there are a few comments which note what the protocol defines as spec (these are obviously up for discussion):

  • there is no formal definition for filter parameters for a filteradd command if no filter is loaded - it is up to the serving node (if the client doest care, why should the spec determine the parameters?)
  • as noted above, 1MB limit to filteradd
  • txes in MSG_FILTERED_BLOCK can be relayed even when a node already has seen it (the current code does this) however it MUST always forward txn that the node has not seen (I think the bip says this, but its not really clearly worded imho, I updated the BIP to make this more clear)

@BitcoinPullTester
Copy link

Automatic sanity-testing: FAILED BUILD/TEST, see http://jenkins.bluematt.me/pull-tester/1860b72e356f0d4279b25bf1ca562ac00511eded for binaries and test log.

This could happen for one of several reasons:

  1. It chanages paths in makefile.linux-mingw or otherwise changes build scripts in a way that made them incompatible with the automated testing scripts
  2. It does not build on either Linux i386 or Win32 (via MinGW cross compile)
  3. The test suite fails on either Linux i386 or Win32
  4. The block test-cases failed (lookup the first bNN identifier which failed in https://github.com/TheBlueMatt/test-scripts/blob/master/FullBlockTestGenerator.java)

@TheBlueMatt
Copy link
Contributor Author

@BitcoinPullTester hey, I wasnt done yet!

@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/5be9d251b5901399c6f70b99f1c2bddc4e14e7ec for binaries and test log.

@TheBlueMatt
Copy link
Contributor Author

Added the requested nTweak value and merged sipa's work on partial merkle tree representations. The BIP now needs rewritten and this is ready for second (third?) round reviews.

@sipa
Copy link
Member

sipa commented Nov 18, 2012

Mostly ACK;

  • I'd prefer to have the Murmurhash3 implementation separate from CBloomFilter
  • The 0xFFFFFFFF/(n-1)_i seed value seems intended to result in large bit differences between the different hash function's seeds. Together with the tweak, however, the first and the last now get seeds tweak and tweak-1. I think something simpler like k1_i+k2*n+tweak is better (with k1 and k2 arbitrary large odd 32-bit integers).
  • I feel uneasy about the arbitrary filter parameters used for the implicitly created filter when sending filteradd without filterload. The server cannot be expected to make a reasonable guess how the client is going to use the filter, and the client still has to track how large the server-side filter grows anyway. I'd prefer to make this simply illegal (DoS 100): filteradd always requires an active filter. Maybe the actual filter data in filterload can be made optional: if it is omitted, it's assumed to be all zeroes (though that would mean the size has to be specified).

@BitcoinPullTester
Copy link

Automatic sanity-testing: FAILED MERGE, see http://jenkins.bluematt.me/pull-tester/21e780e05c7b18ab318e57bd9e24961a34378e2d for test log.

This pull does not merge cleanly onto current master

@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/a5a72aa691296af6de92658ff998da120c4b34d8 for binaries and test log.


// Nodes must NEVER send a data item > 520 bytes (the max size for a script data object,
// and thus, the maximum size any matched object can have) in a filteradd message
if (vData.size() > 520)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If vData is too large, it will still be added to the filter. There needs to be an else block here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, fixed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be nice if that 520 was a constant somewhere or a define? So it could be used for the max size for a script data object and in here.

@mikehearn
Copy link
Contributor

Can you add a new feature that lets clients opt out of auto-adding? Sipa points out that it's only necessary for wallets that contain outputs which don't require predictable data in the corresponding inputs, which (for now) is most of them. This should resolve the excessive expansion issues.

@gmaxwell
Copy link
Contributor

gmaxwell commented Jan 9, 2013

@mikehearn Kind of an an unfortunate privacy loss for ones who do set it, and it still leaves the feature potentially useless for any who set it. (But I agree on having the option, if for nothing else because it makes the load-balancing lark I suggested viable)

@mikehearn
Copy link
Contributor

Well not really, you can still set the FP rate to whatever you want. If all
your outputs are pay-to-address then it's a no-op functionality-wise, the
auto-adding didn't buy you anything. For the case where you have
pay-to-pubkey or other scripts, the client has to refresh the filter every
so often .... it can be implemented in bcj later.

@gmaxwell
Copy link
Contributor

gmaxwell commented Jan 9, 2013

Setting that bit or not is still another anonymity set reduction, but I'm splitting hairs there ignore it. I guess I don't have a good feel for how fast the filter fails and needs to be reset. If it's too fast periodic refreshes won't be sufficient (E.g. will the size of the refresh plus the unwanted data end up being more than just not using the filter).

@mikehearn
Copy link
Contributor

I think we can explore different algorithms later. The exact choices will
depend on the users privacy preferences and device situation anyway.

On Wed, Jan 9, 2013 at 7:09 PM, Gregory Maxwell [email protected]:

Setting that bit or not is still another anonymity set reduction, but I'm
splitting hairs there ignore it. I guess I don't have a good feel for how
fast the filter fails and needs to be reset. If it's too fast periodic
refreshes won't be sufficient (E.g. will the size of the refresh plus the
unwanted data end up being more than just not using the filter).


Reply to this email directly or view it on GitHubhttps://github.com//pull/1795#issuecomment-12058432.

@TheBlueMatt
Copy link
Contributor Author

Added a nFlags to let the peer pick how/when it wants the filter updated...also added a second two commits to make it possible to match an arbitrary script template as a discussion point (that second part is not yet tested).
Would like comments.

@gavinandresen
Copy link
Contributor

Code review notes:

I don't like the entire implementation of CPartialMerkleTree being stuffed into main.h, it should be its own .cpp/.h

Message handling looks good from a potential vulnerability point of view.

NACK on the script.cpp / script.h changes -- they are unused (yes? I don't see where MatchesTemplate is used outside of script.cpp/.h) and I don't see unit tests for the new features like OP_OPCODE.

@TheBlueMatt
Copy link
Contributor Author

@gavinandresen Yes, after discussion I believe we are currently targeting skipping the 2 MatchesTemplate commits for 0.8 and maybe skipping those entirely depending on what we may need in the future.

@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/e1a4f3778cb90ba9f0d4e736752f78dad1703caa for binaries and test log.

@TheBlueMatt
Copy link
Contributor Author

@gavinandresen moved it to main.cpp, is that ok for you?

@gavinandresen
Copy link
Contributor

ACK, looks good.

gavinandresen added a commit that referenced this pull request Jan 17, 2013
@gavinandresen gavinandresen merged commit 91f70a7 into bitcoin:master Jan 17, 2013
@rebroad
Copy link
Contributor

rebroad commented Mar 27, 2013

So.. what was the BIP for these protocol changes? It would be nice if it could be referenced in the pull requests.

@@ -2815,6 +2969,10 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv)
vRecv >> pfrom->strSubVer;
if (!vRecv.empty())
vRecv >> pfrom->nStartingHeight;
if (!vRecv.empty())
vRecv >> pfrom->fRelayTxes; // set to true after we get the first filter* message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow this doesn't feel like the best way to implement this....

@mikehearn
Copy link
Contributor

https://en.bitcoin.it/wiki/BIP_0037 as was discussed on the mailing list. Regardless, this code was already merged and shipped months ago, it's a bit late to be commenting on it now. You can always open up new pull reqs if you want to change how the protocol works.

laudney pushed a commit to reddcoin-project/reddcoin-3.10 that referenced this pull request Mar 19, 2014
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants