spec: bencode codec spec #227

Open
wants to merge 1 commit into
base: master

aschmahmann

This is a proposal for an IPLD Codec for the Bencode format. The Bencode format is commonly used within the BitTorrent ecosystem.

There's likely room for discussion here as to when someone would want to use a Bencode IPLD codec rather than some custom codec for formats like BitTorrent-infohash-v1 or BitTorrent-infohash-v2 that would also expose IPLD Links.

However, the existence of this format may help us explore more of the IPLD Data Model and also has utility in terms of being able to work with BitTorrent data without necessarily making lots of new codecs.

This was referenced Jun 28, 2022
@BigLep BigLep requested a review from rvagg June 28, 2022 22:45

### Strings

Bencode Strings are represented as IPLD Data Model Strings. Although Bencode Strings are not necessarily UTF-8, this representation makes it simple to use a Bencode String as a key in a Bencode Dictionary, given that IPLD Data Model maps require String keys. As a result, these Strings rely on the IPLD Data Model's flexibility to carry non-UTF-8 data in Data Model Strings.
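To make the key problem concrete, here is a minimal sketch of a Bencode decoder in Python. It is an illustration only, not the codec proposed in this PR: the function name `bdecode` is hypothetical, and it assumes well-formed input with no validation of key ordering or integer formatting. It shows that Bencode Strings are length-prefixed byte strings, so dictionary keys come out as raw bytes.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos).

    Hypothetical helper for illustration; assumes well-formed input.
    """
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)      # keys are Bencode Strings
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # string: <length>:<raw bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {pos}")


# A dictionary whose single key is three bytes that are not valid UTF-8.
decoded, _ = bdecode(b"d3:\xff\x00\x01i1ee")
```

Here `decoded` has the key `b"\xff\x00\x01"`, which cannot be turned into a UTF-8 string without some extra encoding step.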
Author

@vmx here's an example where strings are not necessarily UTF-8: bencode strings aren't required to be UTF-8, and BitTorrent will even do things like concatenate SHA1 hashes and put them into a single string.
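As a rough illustration of the concatenated-hash case, the sketch below builds a string in the style of the BitTorrent v1 `pieces` field by joining raw SHA1 digests, then checks whether the result is valid UTF-8. The piece contents and the helper name `is_valid_utf8` are placeholders chosen for this example; a real torrent hashes fixed-size chunks of the payload.

```python
import hashlib

# Placeholder piece contents (assumption for illustration only).
piece_data = [b"", b"abc"]

# Concatenate the raw 20-byte SHA1 digests into one byte string.
pieces = b"".join(hashlib.sha1(piece).digest() for piece in piece_data)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

The first digest begins with the bytes `0xDA 0x39`. `0xDA` opens a two-byte UTF-8 sequence but `0x39` is not a continuation byte, so the concatenated value is not valid UTF-8 even though Bencode treats it as an ordinary string.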

At the moment the BitTorrent ADL (in Go, and the further-along one in Rust) is based on the Bencode codec rather than a specific BitTorrent codec. There are some tradeoffs here, but I wanted to flag how non-UTF-8 keys are used since you were interested.

You might also want to see some other related materials, where I mention some of these tradeoffs before the demo towards the end.

Member

Thanks for pinging. Yes, that's the problem of having no real support for bytes in map keys. Possible workarounds could be:

  • Storing dictionaries as a list of two-tuples instead of maps
    • Con: this makes pathing ugly, as you would need to know the position of the key
  • If a bencode string is used as a map key, encoding it as a UTF-8-compatible string
    • Con: again, pathing would be more difficult, as you would need to know how those binary bytes are encoded. On the other hand, I guess current pathing implementations support UTF-8 only anyway, so you'd need to encode/escape the path somehow. You could use that as your "native" encoding.
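Both workarounds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the key encoding used here (mapping each byte to the code point U+0000 to U+00FF, i.e. Latin-1) is just one reversible choice, not something the comment above prescribes.

```python
from typing import Dict, List, Tuple


def dict_to_pairs(d: Dict[bytes, object]) -> List[Tuple[bytes, object]]:
    """Workaround 1: store the dictionary as a list of (key, value) pairs.

    Keys stay as raw bytes and are sorted as byte strings.
    """
    return [(key, d[key]) for key in sorted(d)]


def lookup_pair(pairs: List[Tuple[bytes, object]], key: bytes) -> object:
    """Pathing now needs a scan, or prior knowledge of the key's position."""
    for candidate, value in pairs:
        if candidate == key:
            return value
    raise KeyError(key)


def key_to_text(key: bytes) -> str:
    """Workaround 2: map each byte to one code point (assumed encoding)."""
    return key.decode("latin-1")


def text_to_key(text: str) -> bytes:
    """Inverse of key_to_text; fails for code points above U+00FF."""
    return text.encode("latin-1")


raw = {b"\xff\x00": 1, b"name": 2}
pairs = dict_to_pairs(raw)
text_keyed = {key_to_text(k): v for k, v in raw.items()}
```

With the first approach a reader has to search `pairs` (or know the index) to reach a value. With the second, every key is a valid Unicode string, but anyone building a path has to know which encoding was applied.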

Contributor

Regarding ugly pathing, I've written some ideas about how one could escape arbitrary binary data in paths using URL escaping with IPLD URLs. https://github.com/ipld/ipld/blob/e6cfab631d2bd24bf158d3a85e126514c98de5ce/notebook/exploration-reports/2022.03-ipld-url-scheme.md#special-characters-and-escaping
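For a sense of what URL escaping of binary path segments could look like, here is a small Python sketch using standard percent-encoding from `urllib.parse`. It is generic percent-encoding, not necessarily the exact scheme described in the linked exploration report, and the helper names are hypothetical.

```python
from typing import List
from urllib.parse import quote_from_bytes, unquote_to_bytes


def escape_segment(segment: bytes) -> str:
    """Percent-encode one path segment, including any '/' inside it."""
    return quote_from_bytes(segment, safe="")


def unescape_segment(text: str) -> bytes:
    """Recover the raw bytes of one escaped path segment."""
    return unquote_to_bytes(text)


def build_path(segments: List[bytes]) -> str:
    """Join escaped segments with '/' (hypothetical helper)."""
    return "/".join(escape_segment(s) for s in segments)


def parse_path(path: str) -> List[bytes]:
    """Split on '/' first, then unescape each segment."""
    return [unescape_segment(part) for part in path.split("/")]


path = build_path([b"info", b"\xff\x00/a"])
```

Here `path` is `info/%FF%00%2Fa`, and `parse_path(path)` returns the original two byte strings. Escaping `/` inside a segment is what keeps a key that contains that byte from being split into two path segments.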

It'd be nice if Node implementations could support modes where keys are loaded as byte buffers. 😅

Projects
Status: 🔎 In Review
3 participants