spec: bencode codec spec #227
### Strings
Bencode Strings are represented as IPLD Data Model Strings. Although Bencode Strings are not necessarily UTF-8, this representation makes it simple to use a Bencode String as a key in a Bencode Dictionary, because IPLD Data Model maps require String keys. As a result, these Strings rely on the IPLD Data Model's flexibility to carry non-UTF-8 data in Data Model Strings.
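As a rough illustration of the paragraph above, here is a minimal Python sketch of a Bencode decoder that keeps every string, including dictionary keys, as raw bytes. It is not the codec proposed in this PR: `decode` and `is_utf8` are hypothetical names, and Python `bytes` stands in for an IPLD Data Model String that is allowed to hold non-UTF-8 data.

```python
def decode(data: bytes, pos: int = 0):
    """Decode one Bencode value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end].decode("ascii")), end + 1
    if lead == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = decode(data, pos)
            if not isinstance(key, bytes):
                raise ValueError("dictionary keys must be byte strings")
            value, pos = decode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # string: <length>:<raw bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon].decode("ascii"))
        start = colon + 1
        if start + length > len(data):
            raise ValueError("string runs past end of input")
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {pos}")


def is_utf8(raw: bytes) -> bool:
    """Report whether raw is valid UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up input: a dictionary with one ASCII key and one non-UTF-8 key.
encoded = b"d3:key5:value2:\xff\xfei1ee"
decoded, consumed = decode(encoded)
non_utf8_keys = [k for k in decoded if not is_utf8(k)]
```

Here `decoded` contains both `b"key"` and `b"\xff\xfe"` as map keys, and only the second ends up in `non_utf8_keys`. A Data Model that insisted on UTF-8 map keys could not hold that dictionary without some extra encoding step.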
@vmx here's an example where strings are not necessarily UTF-8: Bencode strings aren't required to be UTF-8, and BitTorrent will even do things like put concatenated SHA-1 hashes into a string.
At the moment the BitTorrent ADL (in Go, and the further-along one in Rust) is built on the Bencode codec rather than a BitTorrent-specific codec. There are some tradeoffs here, but I wanted to flag how non-UTF-8 keys are used since you were interested.
You might also want to look at some related materials, where I mention some of these tradeoffs before the demo towards the end.
Thanks for pinging. Yes, that's the problem with having no real support for bytes in map keys. Possible workarounds could be:
- Storing dictionaries as a list of two-tuples instead of maps
  - Con: this makes pathing ugly, as you would need to know the position of the key
- If a Bencode string is used as a map key, encoding it as a UTF-8-compatible string
  - Con: again, pathing would be more difficult, as you would need to know how those binary bytes are encoded. On the other hand, I guess current pathing implementations support UTF-8 only anyway, so you'd need to encode/escape the path somehow. You could use that as your "native" encoding.
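The two workarounds above can be sketched in Python as follows. This is only an illustration under assumptions: the sample dictionary is made up, all function names are hypothetical, and Latin-1 is just one possible byte-to-text mapping (each byte becomes the code point with the same value), not something proposed in this thread.

```python
# Made-up decoded Bencode dictionary with one non-UTF-8 key.
decoded = {b"name": b"example", b"\xff\xfe": 1}


def as_pairs(d):
    """Workaround 1: a list of [key, value] two-tuples, sorted by key bytes."""
    return [[k, v] for k, v in sorted(d.items())]


def lookup_in_pairs(pairs, key):
    """Pathing into the pair list means finding the key's position first."""
    for index, (k, v) in enumerate(pairs):
        if k == key:
            return index, v
    raise KeyError(key)


def key_to_text(key: bytes) -> str:
    """Workaround 2: map every byte to one code point (assumed scheme)."""
    return key.decode("latin-1")


def text_to_key(text: str) -> bytes:
    """Inverse of key_to_text; the reader must know the scheme to undo it."""
    return text.encode("latin-1")


def as_text_map(d):
    """A map whose keys are ordinary Unicode strings, safe to store as UTF-8."""
    return {key_to_text(k): v for k, v in d.items()}


pairs = as_pairs(decoded)
position, value = lookup_in_pairs(pairs, b"\xff\xfe")
text_map = as_text_map(decoded)
```

Both cons show up directly. With `pairs`, a path has to refer to `position` rather than to the key itself. With `text_map`, the key `b"\xff\xfe"` is only reachable as the string `"\u00ff\u00fe"`, which a reader can map back to bytes only by knowing which encoding was chosen.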
Regarding ugly pathing, I've written some ideas about how one could escape arbitrary binary data in paths using URL escaping with IPLD URLs. https://github.com/ipld/ipld/blob/e6cfab631d2bd24bf158d3a85e126514c98de5ce/notebook/exploration-reports/2022.03-ipld-url-scheme.md#special-characters-and-escaping
It'd be nice if Node implementations could support modes where keys are loaded as byte buffers. 😅
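To make the URL-escaping idea concrete, here is a small Python sketch that percent-encodes arbitrary key bytes into path segments and back, using the standard library's `urllib.parse`. It uses generic percent-encoding and is not necessarily the exact scheme described in the linked exploration report; `build_path` and `split_path` are hypothetical helpers.

```python
from urllib.parse import quote, unquote_to_bytes


def build_path(segments):
    """Join raw byte keys into one path, percent-escaping each segment.

    safe="" makes quote() escape "/" too, so a key containing a slash
    cannot be mistaken for two segments.
    """
    return "/" + "/".join(quote(segment, safe="") for segment in segments)


def split_path(path):
    """Split an escaped path back into raw byte keys.

    Empty segments are skipped, so this sketch cannot express an empty key.
    """
    return [unquote_to_bytes(part) for part in path.split("/") if part]


# Made-up keys: plain ASCII, non-UTF-8 bytes, and a key containing "/".
keys = [b"info", b"\xff\xfe", b"a/b"]
path = build_path(keys)          # "/info/%FF%FE/a%2Fb"
round_tripped = split_path(path)
```

The escaped path is plain ASCII, so it survives any UTF-8-only pathing layer, and `round_tripped` recovers the original byte keys.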
This is a proposal for an IPLD Codec for the Bencode format. The Bencode format is commonly used within the BitTorrent ecosystem.
There's likely room for discussion here as to when someone would want to use a Bencode IPLD codec rather than some custom codec for formats like BitTorrent-infohash-v1 or BitTorrent-infohash-v2 that would also expose IPLD Links.
However, having this codec may help us explore more of the IPLD Data Model, and it is also useful for working with BitTorrent data without necessarily creating lots of new codecs.