an incredibly simple protocol for reliably and correctly downloading large immutable files.
(this is a follow-up and successor to my RTP proposal)
the ritp
url scheme will be described in a follow-up post.
Downloading large files is hard. Networks are messy, and while there are plenty of protocols that aim to abstract away this messiness, they all have their own problems.
If implemented properly, HTTP can handle essentially anything you throw at it... however, to my knowledge, there is no full http implementation that correctly implements every corner-case of error handling. (wget is probably the closest)
The main difficulty of HTTP is that it's too flexible for its own good: all the error recovery and detection mechanisms are optional headers, so a complete implementation has to have several layers of fallbacks, and testing all these rarely used code paths is a significant challenge. RITP addresses this complexity by limiting its scope and choosing a small number of mandatory fields that all servers must support.
One problem with http is that of mid-air collisions: it is entirely possible for a server to implement Range
but not provide a Last-Modified
or ETag
field. RITP solves this by not allowing files/resources to be modified.
the gemini protocol is essentially a response to the complexity of http, but it limits its scope to “serving small markup files”, and explicitly recommends using another protocol to handle large downloads.
RITP can be used with gemini as this “protocol for large file downloads”. this especially works well with gemini's cross-protocol redirects, giving a layer of mutable identifiers above the immutable RITP.
BitTorrent is a great protocol at what it does, but it has one major downside: poor single-peer performance. Currently it solves this with webseeds, but then you have to deal with the issues of HTTP.
RITP can be used with BitTorrent as an alternative protocol for webseeds, reducing the overhead of all the http headers.
The main problem with 9p is poor performance over high-latency links.
RITP is a stateful request/response protocol, similar to 9p.
all messages (both requests and responses) start with the same fields:
* a 32 bit length
field that stores the length of the entire message, including the length field itself
* an 8 bit type
field that identifies the type of message
* a 24 bit token
field that identifies the message batch the message belongs to. this is described in further detail in the batches section. the token is unique within the connection.
whenever a field specifies a length
or offset
of/within a file, this length/offset is in units of bytes.
the high bit (0x80) of the type
field marks a message as a response (server-sent).
depending on the type of a message, a message has any number of fixed-size fields, followed by a variable-length “tail field”, which consists of all following bytes of the message. the interpretation of the tail field depends on the type of the message. if no special interpretation of tail field is specified, it should be ignored. the tail field takes advantage of the fact that messages already describe their total length.
the remaining bytes (tail field) of the message are interpreted as a multihash, which is then looked up by the server and the corresponding file is associated with the message batch.
if the batch id has been used before, the previous batch is closed and replaced.
fixed-size fields:
* (64 bits) offset
* (32 bits) length
requests a byte slice of the file associated with the message batch.
fixed-size fields:
* (8 bits) code
: an error code, possible values listed in the errors section.
tail field:
* description
: a human readable utf8 string providing a description of the error
when an error response is given, all future messages sent to that batch are canceled, and will not be given a response.
used to indicate that an OPEN
request has been successful.
fixed-width fields:
* (64 bits) length
: the total length of the file that is now associated with this batch.
given in response to a READ
.
fixed-size fields:
* (64 bits) offset
: the offset field of the corresponding READ
request.
tail fields:
* payload
: a substring of the overall file, starting at offset
.
the payload
should be the empty string if (and only if) offset
is greater than or equal to the length of the associated file. this mirrors the UNIX method of signaling end-of-file.
other error
: an error occured that cannot be accurately described by any of the other error codes.not found
: no file with the given hash is stored on the server. this error code is also used if the hash is malformed or uses an unsupported hash algorithm.unknown request type
: the type
was not recognized by the server. this allows the protocol to be extended in the future by adding message types.batch does not exist
: the type
was recognized, and the batch unexpectedly does not exist.A batch is a group of messages that represent operations on a single file. A batch is identified by it's token
.
A batch begins with an OPEN
request and is followed by any number of READ
requests.
If the OPEN
request encounters an error, other requests with the same batch token will be ignored.
If other message types are introduced in the future, they will be specified as either introducing a new batch (like OPEN
) or being part of an existing batch (like READ
).
future versions of this specification may define new message types.
clients may attempt to use these new message types with any servers, and servers that do not support them MUST respond with error 0x03 unknown request type
.
servers MUST NOT respond with new response codes unless the client previously sent a request that indicates that response code is supported (what requests imply support for what responses will be defined in future specifications)
“batch does not exist” errors are only returned for recognized message types to potentially allow adding new message types that create a batch.
batch tokens are client-assigned so that a client can “pipeline” an OPEN
and a READ
in a single transmission, reducing round-trip delays.
errors cancel a batch to prevent an error in a large pipelined batch from causing a bunch of error responses.
there is no need to identify which request in a batch caused the error, as properly formed (i.e. part of an OPEN
ed batch) READ
requests are infallible. if they were not infallible, a READ ERROR
response would exist, which would contain the offset of the read that caused the error.
the server is allowed to respond with short reads to allow for servers that use fixed-size buffers or have limited memory.
OPEN
requests originally used urn
s, but this was changed to multihash after discovering that sha256
and md5
are not actually in the list of urn namespaces
messages describe their total length to allow new request codes to be added.
All clients and servers are RECOMMENDED to support at least sha2-256.
Before a connection is initiated, both the server address(es) and the hash of the file to download are communicated to the client out-of-band, such as through a magnet:
url or similar. The exact nature of this communication is left up to future specifications.
You can follow this blog via its RSS feed or by searching for @[email protected] on your Mastodon/ActivityPub instance.
]]>refining the ideas behind HTTP, BitTorrent, Gemini, and 9p to create a simple protocol for reliably transferring large immutable files. (this document is a first draft, and is not intended to be implemented in its current form)
NOTE: this document has been obsolesced by RITP
all of these can be trivially handled via an outer protocol (eg. mutable identifiers can be handled with gemini cross-protocol redirects)
a stateful request/response protocol similar to 9p, but with hash-identified urls similar to magnet links (these encode length and content hash)
streams are encoded over tcp, tcp/tls, or quic to ensure reliable delivery.
each request has a number of fields. each field is marked either as a fixed number of bits, or as a “string” field.
“string” fields consist of a 64 bit length value, followed by that many bytes.
“string” fields do not have to be valid utf-8 unless specified.
all integers are little-endian. note that tokens are not integers, they are opaque client chosen identifiers with a length of 63 bits (followed by a 1 bit flag, which takes the place of the LSB of the final bytes). servers must take to mask the correct bit.
there are 2 request types: * OPEN * READ
there are 2 response types: * OK * ERROR
each request has the following structure:
* (63 bits) new_token
: a token identifying the results of the request
* (1 bit) request_type
: integer representing type of request
* (n bits) type-dependent fields
each response has the following structure:
* (63 bits) request_token
: the client-chosen token found in the request that generated this responce
* (1 bit) is_error
: set if the corresponding request generated an error (such as the server being unable to find the requested resource)
* (string) payload
: if is_error
is set, then a human and machine readable UTF-8 string representing an error. otherwise, its interpretation depends on the type of the request.
open requests one extra “string” field:
* (string) uri
: this field represents the uri of the resource to be downloaded. what schemes are supported depends on the server, but it is recommended to support at least urn:sha256:*
uris.
the payload
of non-error responses to OPEN requests is ignored, and SHOULD be empty.
read requests have two extra fields: * (64 bits) offset: at what point to begin reading * (64 bits) length: the maximum number of payload bytes the server is allowed respond with.
the payload
of non-error responses to READ
requests MUST be the empty string if and only if offset
is greater than or equal to the total number of bytes in the resource, or if length
is 0. if neither of these conditions are met, the payload
MUST NOT be the empty string.
when the server generates an error in response to a token, all further READ
requests targeted at that token are canceled. this allows a client to begin sending READ requests before it has received a response to the OPEN request.
rtp
much like the magnet
url scheme, the rtp
scheme consists entirly of predefined query parameters:
u
: the uri/urn of the underlying resource. can be specified multiple times to specify multiple hashes for the resource (the client is expected to verify the hash, so specifying multiple u
allows slowly migrating to a new hash algo). if multiple values are specified, they must correspond to the same resource.l
: the length of the resources
: the server(s) that the resource can be retrieved from. if specified multiple times, the client may choose one, or perform a swarm download from several at once. these servers take the form of proto!addr!port
, for example, tcp!example.com!7777
. (this is based off of plan9 dial strings, since it seems to be the only well-specified way of specifying a method of transport)t
: list of mime types that the resource may be interpreted as. clients MAY ignore values other than the first.v
: protocol version. if not specified, it defaults to version 1
, which is the version specified in this document. clients MUST reject urls with an unrecognized version.it is RECOMMENDED that every value of u
is recognized by every server s
. if a client encounters an error when downloading from one server, it SHOULD try downloading from another server.
clients SHOULD NOT use rtp
urls in OPEN
requests, instead they should choose a value listed in u
.
unrecognized fields SHOULD be ignored.
gemini+rtp
use the Gemini Protocol in order to implement mutable identifiers.
gemini is used in “proxy mode”, that is, the sent url has a scheme of gemini+rtp
and not gemini
. the gemini server then uses a cross-protocol redirect to return an rtp
url.
READ.length
is defined as a maximum so that clients that do not know the length of the resource they are downloading can use a value of 2^64 - 1
to request as much of the rest of the resource as the server is able to provide.
READ.offset
exists both so that downloads can be resumed, and also to allow seeking within complex formats (eg. allowing you do download just one file out of a zip archive). It also allows doing swarm downloads from multiple equally trusted sources.
an error string consists of a machine readable string representing the kind of error, optionally followed by a colon, and then a human-readable string further clarifying the error.
the following predefined error strings:
* error
: a generic error kind usable when nothing else is applicaple
* not found
: no resource with the given uri is known to the server.
* unsupported scheme
: the given uri or urn scheme is not supported
READ
needs an additional field, open_token
, which corresponds to the the request_token
of an OPEN
request.RTP
is commonly used as an abbreviation for the Real-time Transport Protocol, and is not descriptive enough.I am currently drafting a successor proposal that addresses these issues.
You can follow this blog via its RSS feed or by searching for @[email protected] on your Mastodon/ActivityPub instance.
]]>imagine you have two devices, a client and a server, connected in a peculiar way:
You've just been tasked with building a layer on top of this so that the server can send messages to the client, what do you do?
[DISCLAIMER: read the errata section before you go out and implement this]
the client sends 1-bit tokens. when the server receives that token, if it has generated new messages since it last sent that token, it responds with the boolean negation of that token.
the client always sends the last token it received. when it receives a token that is different from the one it sent, it retrieves a message from the side channel.
When the system start up, the devices initialize themselves to the following state:
1. the client sets it's client_token
variable to false
.
2. the server has an empty message queue
3. the server sets its server_token
to false, and new_messages
to false.
the client periodically sends its client_token
value along the unreliable channel.
when the server receives client_token
, it performs the following operation:
if client_token = server_token then
set new_messages false
end if
it then responds with server_token
.
whenever the server generates a new message, it performs the following operation:
if not new_messages then
set server_token (not server_token)
set new_messages true
end if
when the client receives server_token
, it performs the following operation:
if server_token ≠ client_token then
pull messages
set client_token server_token
turns out this absurd hypothetical isn't actually that far off of how TCP+TLS and UDP behave, especially in the context of mobile and embedded devices.
this system also has the nice property that the 1-bit messages don't depend on the side-channel messages, useful if there are actually multiple clients, and perhaps multiple secure servers the client wants to be able to receive messages from, and the presence of new messages is multiplexed through a single unsecured server.
as shown here, using a single bit means the server can't tell the difference between a client sending the previous token and the next token.
however, if you replace the 1-bit bools with 2-bit unsigned integers, and replace the negation with a wrapping increment, the protocol works as intended.
additionally, i should have added a timing requirement to the infallible channel, otherwise long polling is a simpler and equally valid solution to the posed problem.
You can follow this blog via its RSS feed or by searching for @[email protected] on your Mastodon/ActivityPub instance.
]]>