Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bolt04: Variable hop_payload for the sphinx onion #619

Merged
merged 10 commits into from
Jul 26, 2019
Merged

Conversation

cdecker
Copy link
Collaborator

@cdecker cdecker commented Jun 6, 2019

This is the variable payload length variant of #593. It removes the concept of fixed-length frames, in favor of just prefixing the payload with its varint-encoded length, followed by the HMAC.

The varint is limited to 1 and 3 byte encodings, since the maximum length of the payload is 1300 - 32 - 3 bytes, and the 5 and 9 byte varints are not useful. The HMAC is not part of the payload since it allows us to fit more of the shorter lengths in the single byte varint.

I didn't update the example code yet, since the Go version was not migrated yet, but it should not be too much work to update that later.

Copy link
Collaborator

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff, only nit comments from me.

Maybe the only thing that we could discuss is the option of including the HMAC inside the hop_payload to be able to remove it entirely from the last hop and save 32 bytes (because the fact that a hop is the last hop can be encoded more efficiently than with 32 0x00 bytes)?

EDIT: in fact it probably isn't a good idea to put the HMAC inside the hop_payload, because it would cost 2 additional bytes per hop (the type and length, and potentially might require 3 bytes instead of 1 for the hop_payload length) so it ends up costing more than 32 0x00 bytes in the last hop for long routes. You can ignore that comment.

04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
bolt04/onion-test-multi-frame.json Outdated Show resolved Hide resolved
@t-bast
Copy link
Collaborator

t-bast commented Jun 11, 2019

@cdecker I think the feature flag that was included in #593 didn't make it to this PR, could you re-add it?

@t-bast
Copy link
Collaborator

t-bast commented Jun 14, 2019

@Roasbeef would you have time to review this PR to unblock merging it?

Copy link
Collaborator

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to lack of the specification for the varint, I think this PR is also missing:

  • the new feature bit selection
  • an updated section on the new feature generation

I also think that we should keep the existing realm field in place. Otherwise, we're more or less locked into this new format for the current onion version, and it also makes code that handles the current and legacy format more complex.

One other thing that stands out is that since the encoded integers for the hop payload are variably sized, there isn't a clear "max route length" any longer. Instead, it's now dependent on the encoded size of each of these fields which makes things a bit harder to analyze.

04-onion-routing.md Outdated Show resolved Hide resolved

The `length` field determines both the length and the format of the `hop_payload` field; the following formats are defined:

- Legacy `hop_data` format, identified by a `0x00` length.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by a single byte 0x00 length?

Otherwise will 0x00 and 0x00 0x00 0x00 both mean to parse the legacy format?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a single byte 0x00 is more precise. The second option would be invalid for minimal encoding anyway :-)


For each hop in the route, in reverse order, the sender applies the
following operations:

- The _rho_-key and _mu_-key are generated using the hop's shared secret.
- The `hops_data` field is right-shifted by 65 bytes, discarding the last 65
- `shift_size` is defined as the length of the `hop_payload` plus the varint encoding of the length and the length of that HMAC. Thus if the payload is `l` then the `shift_size` is `1 + l + 32` for `l < 253`, otherwise `3 + l + 32` due to the varint encoding of `l`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This varint encoding isn't defined anywhere in this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go code below also needs to be updated as it still references constants which are fixed values rather than variable due to the new payload format.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't implement the Go version, so I can replace the code with the C version only.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, after I update can slot it in.


For each hop in the route, in reverse order, the sender applies the
following operations:

- The _rho_-key and _mu_-key are generated using the hop's shared secret.
- The `hops_data` field is right-shifted by 65 bytes, discarding the last 65
- `shift_size` is defined as the length of the `hop_payload` plus the varint encoding of the length and the length of that HMAC. Thus if the payload is `l` then the `shift_size` is `1 + l + 32` for `l < 253`, otherwise `3 + l + 32` due to the varint encoding of `l`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be:

if the length of the payload is l

@Roasbeef
Copy link
Collaborator

Roasbeef commented Jun 18, 2019

Hey @t-bast, I don't think I'm blocking anything here. This PR as is is underspecified, and there're comments that are a week or two old which are currently unaddressed.

@t-bast
Copy link
Collaborator

t-bast commented Jun 18, 2019

@Roasbeef there probably was some miscommunication here, but we had an action item in the IRC meeting to get your feedback on this PR (because the rest of us were ok with the proposal), which is why I think @cdecker was waiting for your comments before fixing the others (which were mainly nits).

@cdecker
Copy link
Collaborator Author

cdecker commented Jun 18, 2019

@Roasbeef there probably was some miscommunication here, but we had an action item in the IRC meeting to get your feedback on this PR (because the rest of us were ok with the proposal), which is why I think @cdecker was waiting for your comments before fixing the others (which were mainly nits).

No, I'm just working on other stuff in the meantime, since we won't merge this before monday anyway, I thought I'd better work on more pressing issues first. @Roasbeef wasn't really blocking anything here :-)

@t-bast
Copy link
Collaborator

t-bast commented Jun 18, 2019

since we won't merge this before monday anyway

Damn, I had misunderstood the last meeting's outcome, I got my hopes up for nothing then 😬

I'd love to help move this forward though as TLV + variable-length onion are the two features that unblock many other cool things (with non-negligible user impact).

Is there something I can help with or are you good? I can for example pick up the update of the Go implementation if no-one has time to work on it.

@cdecker
Copy link
Collaborator Author

cdecker commented Jun 18, 2019

Added a bunch of fixups mostly addressing @t-bast's feedback. Will see if it makes sense to keep the old payload section in place, or whether it just reads better by restructuring.

I'm not planning on implementing the Go version, so if @t-bast wants to do it, that'd work for me 😁

Sorry for the mixup, I was under the impression we wanted to discuss one last time on Monday.

@cdecker
Copy link
Collaborator Author

cdecker commented Jun 18, 2019

In addition to lack of the specification for the varint, I think this PR is also missing:

  • the new feature bit selection

I cherry-picked the commits defining the feature bit from #593 now.

  • an updated section on the new feature generation

I'm not sure I understand what you mean by this.

I also think that we should keep the existing realm field in place. Otherwise, we're more or less locked into this new format for the current onion version, and it also makes code that handles the current and legacy format more complex.

I can restore that section if it makes sense in the text. As for lockin, that's pretty much guaranteed at this point, sadly. That was one of the advantages of the multi-frame proposal, namely that we'd be left with some bits to define future, non-TLV, payload types.

One other thing that stands out is that since the encoded integers for the hop payload are variably sized, there isn't a clear "max route length" any longer. Instead, it's now dependent on the encoded size of each of these fields which makes things a bit harder to analyze.

Indeed, I have now removed any reference to 20 being the maximum route length, since that's no longer true. This was incidentally also the main reason I was holding on to the 65 frame size, it made sure that 20 would really be the maximum length. Not sure if this is a feature or a bug with the variable length payload tbh 😉

Copy link
Collaborator

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdecker a couple things from me, see inline comments

@t-bast i don't think we are blocked on the implementation when we're still hammering out the details of the proposal


1. type: `per_hop` (for `realm` 0)
- Legacy `hop_data` format, identified by a `0x00` length.
- `tlv_payload` format, identified by any other length.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo it'd be preferable to keep the realm byte/version in-tact, and just call this the version 1 hop_data format. i'm hesitant to nerf future versioning of the inner payload format, given that the upgrade path of the outer version is unclear at this point.

we could even move to signal the final hop by setting the high order realm bit instead of using an empty hmac, saving 32-num_hops bytes

04-onion-routing.md Outdated Show resolved Hide resolved
- The legacy `hop_data` is identified by a single `0x00`-byte prefix
- The variable length `hop_payload` is prefixed with a `varint` encoding
the length in bytes, excluding the prefix and the trailing HMAC.
- This specification makes us of available length encoded integer format,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- This specification makes us of available length encoded integer format,
- This specification makes use of available length encoded integer format,

04-onion-routing.md Show resolved Hide resolved
## Legacy `hop_data` payload format

The `hop_data` format is identified by a single `0x00`-byte length, for backward compatibility.
It's payload is defined as:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
It's payload is defined as:
Its payload is defined as:

04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Show resolved Hide resolved
bytes that exceed its 1300-byte size.
- The `version`, `short_channel_id`, `amt_to_forward`, `outgoing_cltv_value`,
`padding`, and `HMAC` are copied into the following 65 bytes.
- The varint-serialized length, serialized `hop_payload` and `HMAC` are copied into the following `shift_size` bytes.
- The _rho_-key is used to generate 1300 bytes of pseudo-random byte stream
which is then applied, with `XOR`, to the `hops_data` field.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
which is then applied, with `XOR`, to the `hops_data` field.
which is then applied, with `XOR`, to the `hop_payloads` field.

@t-bast
Copy link
Collaborator

t-bast commented Jun 19, 2019

@t-bast i don't think we are blocked on the implementation when we're still hammering out the details of the proposal

I think that most of the proposal makes sense to implement now to be able to provide spec feedback. For example there's a mention that dropping the realm byte makes it hard in the code to handle legacy and new format. That wasn't really an issue for my implementation and I don't see how re-adding the realm byte would make the implementation simpler (it does provide a way out of TLV though, that argument remains valid).

But if you feel this is not necessary right now, I won't do anything on the go implementation, let me know if you need a hand at some point.

04-onion-routing.md Outdated Show resolved Hide resolved
@t-bast t-bast added the Meeting Discussion Raise at next meeting label Jun 19, 2019

The reader:
- MUST return an error if `amt_to_forward` or `outgoing_cltv_value` are not present.
- MUST return an error if it is not the final node and `short_channel_id` is not present.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the proposal currently uses the HMAC to signal the exit hop, enforcing these requires a lookahead since the HMAC follows the TLV payload or remembering which types the TLV payload contained when checking the HMAC :( Ideally the final hop would be signaled before parsing the TLV payload, making it possible to enforce these requirements while parsing the packet linearly.

Perhaps this is another argument for signaling the final hop in the realm byte and dropping the final HMAC?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a signal in the TLV (making it a low type such that it gets processed first)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means that the onion processing layer needs to parse and interpret the TLV stream to look for a particular TLV type though, isn't this a layering violation? Whereas signaling via the realm byte might be a bit cleaner, don't you think?

@Roasbeef
Copy link
Collaborator

an updated section on the new feature generation

Oops, meant filler generation.

As for lockin, that's pretty much guaranteed at this point, sadly.

I don't think we're locked in if we leave the existing realm in place. I think all the prior iterations we went through have allowed us to explore the level of extensibility the current "realm" field as to offer. Would be shame if we closed off all future paths prematurely. I think we can get the best of both worlds by keeping the realm in place, and just using 2 fixed bytes for a length. 65KB is far larger than the max payload size in the onion, it simplifies parsing, and also allows us to avoid unnecessarily adding a var int into the protocol.

@t-bast
Copy link
Collaborator

t-bast commented Jun 25, 2019

I think all design comments (realm vs no realm, termination signaling via TLV) have been addressed during yesterday's meeting right?

There are still a few typos/nits and a clarification on the TLVs' integer notation, but apart from that are we missing something? What are the next steps before merging this?

@Roasbeef @cfromknecht are you happy with the current state of the proposal? :)

04-onion-routing.md Outdated Show resolved Hide resolved
@cdecker
Copy link
Collaborator Author

cdecker commented Jun 29, 2019

I think we can get the best of both worlds by keeping the realm in place, and just using 2 fixed bytes for a length. 65KB is far larger than the max payload size in the onion, it simplifies parsing, and also allows us to avoid unnecessarily adding a var int into the protocol.

While I think we can discuss about leaving the realm byte intact, I'd definitely want to use the varint in there, it's a tiny change and it saves us up to 20 bytes in total. I'd actually argue that for a vast majority of cases we'll have payload lengths < 253 which results so we'd rarely ever use the 2 byte length prefix at all.

@cdecker
Copy link
Collaborator Author

cdecker commented Jun 29, 2019

an updated section on the new feature generation

Oops, meant filler generation.

The filler generation literally doesn't change, we just generate a bit more since we might be shifting a bit more.

09-features.md Outdated
|------|-------------------|-------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| Bits | Name | Description | Link |
|------|-------------------|-------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| 0/1 | `var_onion_optin` | This node requires/supports variable payload routing onions | [routing onion specification](https://github.com/lightningnetwork/lightning-rfc/blob/master/04-onion-routing.md) |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to be able to combine those with the local features (such as in #571) it would be better to use bits that aren't currently assigned (instead of 0/1).
I'll let you sync with @rustyrussell to see what bit values would make sense: many are tentatively assigned in #605 but I think var_onion_optin should replace one of those (since it's coming first). We probably need to update the list in #605.


1. tlv: `tlv_payload`
2. types:
1. type: 0 (`destination_signal`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we have both a signal here and also a final set of zero bytes in the HMAC? I'm not sure we need another signal on this level, given that in the underlying Sphinx format, the zero'd HMAC is used as a tag for the final destination. This would also create two ways that packet processors are required to know of in order to determine if this is the final hop or not. This framing layer sits above the current onion layer IMO, meaning we're crossing layers here in order to detect a terminal point in the route.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or will the HMAC be left in place, and this is just for the higher level TLV parser?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I currently kept both conditions in my implementation. If either the next packet's hmac is 0x00 or if the first byte of the TLV stream is 0x00 it means we are the last hop.
Instead of using TLV code inside the onion processor, I translated this condition as the first byte after the varint length of the payload is 0x00 (since TLV streams are ordered by TLV type values and this one as type 0x00).
I don't have a strong opinion on what we should do here because it's not necessarily the onion processor layer that need to figure out that we are the destination, it might make more sense that it's the layer that interprets the payload that should figure out that we are the destination.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we leave the HMAC, we should continue to use it to signal termination.

Half-parsing the TLV this way is just nasty, as is the alternative of using type 0xFFFFFFFFFFFFFFFF to signal the HMAC (which, of course would always put it at the end), which loses more bytes across any normal route than skipping the terminal HMAC would!

* Empty
1. type: 2 (`amt_to_forward`)
2. data:
* [`integer`:`amt_to_forward`]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not clear what integer is from strictly reading this document. I think using varint here makes it more clear when combined with #622.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be tu64, agreed.

The clarifications were tacked on after the fact, but they should really be
part of the conventions. I also updated the links to use the reference style,
which results in better text flow and makes it easier to read the source.

Signed-off-by: Christian Decker <[email protected]>
This actually introduces the variable size shift and filler generation.

Signed-off-by: Christian Decker <[email protected]>
As discussed during the spec meeting this allows us not to use the 32 byte
HMAC to identify the last hop, and use a 2-byte signal instead.

Signed-off-by: Christian Decker <[email protected]>
The tables were a bit unreadable in the source view, and the globalfeatures
table was not rendering correctly. This is just a minor cleanup pass.

Signed-off-by: Christian Decker <[email protected]>
Suggested-by: Janus Troelsen <@ysangkok>
As discussed during the IRC meeting on 2019-07-22 this would have been a
duplication of signals. It was decided to use one for now, with the option of
coming back should we ever need the last 32 bytes of the onion.
@cdecker
Copy link
Collaborator Author

cdecker commented Jul 25, 2019

Rebased ^^

Here's the range-diff, to make it easier to look for changes during the rebase:

$ git range-diff $(git merge-base master origin/variable-onion)..origin/variable-onion master..variable-onion
 1:  4f4c3c3 =  1:  a93cc7f bolt04: Formatting cleanup and fold clarifications into conventions
 2:  dc7c2fc !  2:  29e7be9 bolt04: Describe the variable size `hop_payload`
    @@ -117,6 +117,6 @@
      When building the route, the origin node MUST use a payload for
      the final node with the following values:
     +
    - * `outgoing_cltv_value`: set to the final expiry specified by the recipient
    - * `amt_to_forward`: set to the final amount specified by the recipient
    - 
    + * `outgoing_cltv_value`: set to the final expiry specified by the recipient (e.g.
    +   `min_final_cltv_expiry` from a [BOLT #11](11-payment-encoding.md) payment invoice)
    + * `amt_to_forward`: set to the final amount specified by the recipient (e.g. `amount`
 3:  62d8d7b =  3:  2bff8e5 bolt04: Amend the filler generation and onion decoding to varpayload
 4:  cf0527b =  4:  3be749c bolt04: Remove in-spec test vector in favor of JSON test vector
 5:  be5add0 =  5:  74093fa bolt04: Add the TLV types for the new payload format
 6:  d0862a5 !  6:  6b2144c bolt04: Shut the spellchecker up
    @@ -14,12 +14,11 @@
      verifier
      verifiers
     @@
    - structs
    - CompactSize
    - encodings
    + BigSize
    + namespaces
    + tlvs
     +fips
     +rfc
     +varint
     +CompactSize
     +tlvs
    - \ No newline at end of file
 7:  b70ae1f =  7:  d3bb7f2 bolt07: Add feature bits for multi-frame support
 8:  82a73d2 =  8:  88e476f bolt04: Introduce the destination_signal to the tlv_payload
 9:  157a708 =  9:  35edd4e bolt09: Cleanup the feature bits tables and extract URLs into footer
10:  b06386f = 10:  437d79b bolt04: Remove TLV based termination signal

Copy link
Collaborator

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 437d79b

@rustyrussell
Copy link
Collaborator

Ack 437d79b

@cdecker please apply. I'd do it, but you deserve the honor :)

@cdecker cdecker merged commit d23f4b0 into master Jul 26, 2019
@cdecker
Copy link
Collaborator Author

cdecker commented Jul 26, 2019

Congrats everyone, this has been a long time in the making, but it got substantially better through the extensive feedback 👍 🎆

cdecker added a commit to cdecker/lightning that referenced this pull request Jul 26, 2019
cdecker added a commit to cdecker/lightning that referenced this pull request Jul 26, 2019
rustyrussell pushed a commit to ElementsProject/lightning that referenced this pull request Jul 30, 2019
@t-bast t-bast deleted the variable-onion branch September 22, 2021 06:52
t-bast added a commit to ACINQ/eclair that referenced this pull request Feb 25, 2022
We previously supported a 65-bytes fixed-size sphinx payload, which has
been deprecated in favor of variable length payloads containing a tlv
stream (see lightning/bolts#619).

It looks like the whole network now supports the variable-length format,
so we can stop accepting the old one. It is also being removed from the
spec (see lightning/bolts#962).
t-bast added a commit to ACINQ/eclair that referenced this pull request Sep 5, 2022
We previously supported a 65-bytes fixed-size sphinx payload, which has
been deprecated in favor of variable length payloads containing a tlv
stream (see lightning/bolts#619).

It looks like the whole network now supports the variable-length format,
so we can stop accepting the old one. It is also being removed from the
spec (see lightning/bolts#962).
t-bast added a commit to ACINQ/eclair that referenced this pull request Sep 5, 2022
We previously supported a 65-bytes fixed-size sphinx payload, which has
been deprecated in favor of variable length payloads containing a tlv
stream (see lightning/bolts#619).

It looks like the whole network now supports the variable-length format,
so we can stop accepting the old one. It is also being removed from the
spec (see lightning/bolts#962).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants