SaltyRTC – End-to-End-Encrypted Signalling

SaltyRTC is a signalling protocol that uses end-to-end encryption techniques based on the Networking and Cryptography library (NaCl) and the WebSocket protocol. It lets the user freely choose from a range of signalling tasks, such as setting up a WebRTC or ORTC peer-to-peer connection. SaltyRTC is completely open to new and custom signalling tasks for everything feasible. The protocol has been designed in a way that no third party needs to be trusted. Moreover, it is able to protect the clients' signalling data even in case the underlying TLS encryption of the WebSocket protocol has been completely broken.

This document describes the protocol for both client and server.

Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Terminology

Client

A SaltyRTC compliant client. The client uses the signalling channel to establish a WebRTC or ORTC peer-to-peer connection.

Server

A SaltyRTC compliant server. The server provides the signalling channel through which clients may communicate with one another.

Peer

The term peer is being used for protocol descriptions that need to be followed by both SaltyRTC compliant clients and servers.

Initiator

An initiator is a SaltyRTC compliant client who wants to establish a WebRTC or ORTC peer-to-peer connection to a responder.

Responder

The responder is a SaltyRTC compliant client who wants to establish a WebRTC or ORTC peer-to-peer connection to an initiator.

Task

A SaltyRTC task is a protocol extension to this protocol that will be negotiated during the client-to-client authentication phase. Once a task has been negotiated and the authentication is complete, the task protocol defines further procedures, messages, etc.

Signalling Path

A signalling path is a simple ASCII string and consists of the hex value of the initiator's public key. Initiator and responder connect to the same WebSocket path.

MessagePack Object

MessagePack is an object serialisation format that is very similar to JSON but uses a binary encoding. See the MessagePack specification for details.

Object, Array, Map, Bin and Nil

As we are using MessagePack for data serialisation, the terms Object, Array, Map, Bin and Nil represent the terms from the MessagePack specification:

  • Object represents a value of any type.
  • Array represents a sequence of objects.
  • Map represents a collection of key-value pairs of objects.
  • Bin represents arbitrary binary data.
  • Nil represents nil (null or None in other programming languages).

NaCl Key Pairs

NaCl (pronounced salt) is the Networking and Cryptography library which is being utilised by SaltyRTC to encrypt and authenticate its messages. See nacl.cr.yp.to for details.

In this specification, we will use both secret and public key authenticated encryption provided by NaCl.

NaCl key pairs SHALL always be generated by using a cryptographically secure random number generator.

Client's Permanent Key

The permanent key pair is a NaCl key pair for public key authenticated encryption. Each client MUST have or generate a permanent key pair that is valid beyond sessions.

Server's Session Key

A SaltyRTC compliant server MUST generate a new NaCl key pair for public key authenticated encryption for each connected client. The key is being exchanged in the handshake and is valid for the lifetime of the connection.

Server's Permanent Key

A SaltyRTC compliant server SHOULD have at least one permanent NaCl key pair for public key authenticated encryption. If the server has such a key pair, it will be used to sign¹ the server's session key and the client's permanent key to mitigate man-in-the-middle attacks. In order to validate this signature, a client that connects to a server SHOULD know the server's public permanent key.

In order to facilitate the change of the server's public permanent key without breaking backward compatibility, a server can have multiple permanent key pairs. The clients announce the server's public permanent key they're going to use for verification in the 'client-auth' message. Note, however, that old permanent keys SHOULD be phased out after a transitional period (e.g. if they were compromised).

If multiple server public permanent keys are specified, one of them MUST be marked as the primary key. It is RECOMMENDED that the first key specified in the server configuration is treated as the primary key, while all others are treated as fallback keys.

1: The signature is done implicitly by using NaCl's authenticated public key encryption, because public key signatures in NaCl are still subject to change. Authenticated encryption achieves the same goals, while avoiding incompatible sign/verify implementations.

Client's Session Key

A SaltyRTC compliant client MUST generate a new session NaCl key pair for public key authenticated encryption for each other client. More precisely:

  • An initiator MUST generate such a key pair for each responder that has sent a valid 'token' message during the client-to-client authentication process.
  • A responder MUST generate such a key pair for the initiator once it has received a valid 'key' message during the client-to-client authentication process.

The key pair is valid for the lifetime of the client-to-client connection.

Authentication Token

An authentication token consists of a NaCl secret key generated by the initiator that is valid for a single encrypted message during the authentication process between initiator and responder. The token MUST be exchanged over a secure channel which is not defined in this document.

Address

The address is a single byte that identifies a specific peer on a WebSocket path. It is being used to indicate to which client a server should relay a message. In this document, the byte will be represented in hexadecimal notation (base 16) starting with 0x. The server (0x00) and the initiator (0x01) have a static identifier. For responders, the server MUST dynamically assign identifiers (in the range of 0x02..0xff). A server-assigned address becomes invalid as soon as the connection to the server has been severed.

Exchanging Signalling Information

Please note that this section is informational only.

In order to establish a signalling channel using SaltyRTC, the following information has to be available to both peers:

  • WebSocket URI scheme (ws or wss),
  • Server's public permanent key (optional but recommended),
  • Server host (as defined in RFC3986, 3.2.2)
  • Server port (as defined in RFC3986, 3.2.3)
  • Signalling path, and
  • Authentication token (only if responder and if not trusted)

How this information should be exchanged is deliberately not defined by this document. However, we will provide an idea of how the data can be encoded and exchanged by extending the WebSocket URI in the following way:

<scheme>://<server-host>:<server-port>/<signalling-path>
?<server-permanent-public-key>#<authentication-token-hex>

Note that exchanging the server's public permanent key from initiator to responder may or may not be a viable way to distribute the server's public permanent key depending on whether the initiator is fully trusted by the responder or not.

An example of such a URI:

wss://example.com:4567/11c7...0495?afc0...e589#23b7...6564

The initiator could encode this URI into a QR code which the responder will decode back to a URI. The responder can then extract the initiator's public permanent key, the server's public permanent key and the authentication token from the URI. Furthermore, it must strip the ? and everything that follows it and can then use the result to connect to the SaltyRTC server. (If an implementation omits stripping the authentication token from the WebSocket URI, most WebSocket implementations will raise an error as # is not allowed in WebSocket URIs. This will help to prevent leakage of an authentication token.)
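As a rough illustration of the informational URI format above, the following Python sketch splits such a URI into the plain WebSocket URI and the hex-encoded values it carries. The function name and the keys of the returned dictionary are hypothetical; the specification deliberately leaves the exchange format undefined.

```python
from urllib.parse import urlsplit, urlunsplit


def parse_signalling_uri(uri):
    """Split an extended signalling URI into its parts (illustrative only)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError("expected a ws:// or wss:// URI")
    # Drop the query and fragment: they must not be sent to the server.
    connect_uri = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return {
        "connect_uri": connect_uri,
        # The signalling path is the hex value of the initiator's public key.
        "initiator_key_hex": parts.path.lstrip("/"),
        # Optional server public permanent key, carried after '?'.
        "server_key_hex": parts.query or None,
        # Optional authentication token, carried after '#'.
        "auth_token_hex": parts.fragment or None,
    }
```

Only the value of `connect_uri` would be handed to the WebSocket implementation; the other values stay local to the responder.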

WebSocket

The SaltyRTC protocol has been designed to work on top of the WebSocket protocol. For more information about the WebSocket protocol, see RFC 6455.

Security Recommendation

Although the SaltyRTC protocol takes many security measures to prevent eavesdropping, it is still highly RECOMMENDED to use WebSocket in its secure mode (e.g. provide a valid certificate). This measure will ensure that the signalling path is hidden from eavesdroppers and generally hardens the protocol against potential attacks.

Subprotocol

It is REQUIRED to provide the following subprotocol when connecting to a server:

v1.saltyrtc.org

This protocol SHALL be applied only if the server chose the subprotocol above. If another shared subprotocol that is not related to SaltyRTC has been found, continue with that subprotocol. Otherwise, the underlying WebSocket connection will be closed automatically with a close code of 1002 (WebSocket Protocol Error).

Message Structure

SaltyRTC messages are encoded in binary using network-oriented format (most significant byte first, also known as big-endian). Unless otherwise noted, numeric constants are in decimal (base 10).

All signalling messages MUST start with a 24-byte nonce followed by either:

  • an NaCl public-key authenticated encrypted MessagePack object,
  • an NaCl secret-key authenticated encrypted MessagePack object or
  • an unencrypted MessagePack object.

Which case applies is always known by the communicating parties. In some scenarios, more than one case is possible. For these scenarios, a description of how the multiple cases must be handled will be provided.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                                                               |
|                            Nonce                              |
|                                                               |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Data                          ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The nonce is exactly 24 bytes long and SHALL only be used once per shared secret. A nonce can also be seen as the header of SaltyRTC messages as it is used by every single signalling message. It contains the following fields:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                            Cookie                             |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Source     |  Destination  |        Overflow Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Cookie: 16 bytes

This field contains 16 cryptographically secure random bytes. For SaltyRTC clients, the cookie SHALL be different for each new communication partner on the signalling path. Precisely, SaltyRTC clients generate a cookie for communication with the server and for each other client they communicate with. SaltyRTC servers MUST generate and use a random cookie for each client.

Source: 1 byte

Contains the SaltyRTC address of the sender.

Destination: 1 byte

Contains the SaltyRTC address of the receiver.

Overflow Number: 2 bytes

This field contains the 16 bit unsigned overflow number used in combination with the sequence number. Starts with 0.

Sequence Number: 4 bytes

Contains the 32 bit unsigned sequence number. Starts with a cryptographically secure random number and MUST be incremented by 1 for each message.

Note: The overflow and the sequence number have been defined separately considering that some programming languages do not have a native 48 bit unsigned integer type. However, treating the overflow and the sequence number as a single 48 bit unsigned integer is possible and supported by this protocol. In further sections, the combined number will be called Combined Sequence Number.
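The nonce layout described above can be sketched with Python's struct module. This is a minimal illustration, not a reference implementation; the names Nonce, pack_nonce, unpack_nonce and combined_sequence_number are hypothetical.

```python
import struct
from collections import namedtuple

# Cookie (16 bytes), source (1), destination (1), overflow number (2) and
# sequence number (4), all in network byte order (big-endian).
NONCE_FORMAT = ">16sBBHI"
NONCE_LENGTH = struct.calcsize(NONCE_FORMAT)

Nonce = namedtuple("Nonce", "cookie source destination overflow sequence")


def pack_nonce(nonce):
    """Serialise a Nonce into its 24-byte wire representation."""
    return struct.pack(NONCE_FORMAT, *nonce)


def unpack_nonce(message):
    """Parse the nonce from the first 24 bytes of a signalling message."""
    if len(message) < NONCE_LENGTH:
        raise ValueError("message is too short to contain a nonce")
    return Nonce(*struct.unpack(NONCE_FORMAT, message[:NONCE_LENGTH]))


def combined_sequence_number(nonce):
    """Treat overflow and sequence number as one 48 bit unsigned integer."""
    return (nonce.overflow << 32) | nonce.sequence
```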

Connecting To a Signalling Server

A server MUST listen and accept incoming WebSocket connections. Clients SHALL connect to a server by using the WebSocket protocol and supplying a valid signalling path where they want to meet the other client. Servers MUST separate communication of clients between paths. This will be described in detail in the Client-to-Client Communication section.

The path MUST be set as part of the WebSocket URI directly after the hostname, separated by a forward slash. A signalling path is a simple ASCII string and MUST be the lowercase hex value of the initiator's public key. Therefore, the resulting path MUST contain exactly 64 characters. Initiator and responder connect to the same WebSocket path.

Example of a WebSocket URI including a valid signalling path:

wss://example.com/debc3a6c9a630f27eae6bc3fd962925bdeb63844c09103f609bf7082bc383610
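A server or client might check the path constraints above roughly as follows. This is a small sketch; the helper names are hypothetical.

```python
import re

# Exactly 64 lowercase hexadecimal characters (a 32-byte public key).
_SIGNALLING_PATH = re.compile(r"[0-9a-f]{64}")


def is_valid_signalling_path(path):
    """Return True if the path is a lowercase hex-encoded 32-byte key."""
    return _SIGNALLING_PATH.fullmatch(path) is not None


def signalling_path_from_key(public_key):
    """Derive the signalling path from the initiator's public key bytes."""
    if len(public_key) != 32:
        raise ValueError("a NaCl public key is exactly 32 bytes long")
    return public_key.hex()  # bytes.hex() always produces lowercase output
```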

Sending a Signalling Message

A peer that wants to send a signalling message needs to go through the process of creating the nonce part of the message first.

If the server announced a new initiator or responder with the same address as a previous client OR this is the first message to the destination peer in general:

  • A server SHALL generate a new cryptographically secure random cookie to be used for the client until the connection has been severed.
  • A client SHALL also generate a new cryptographically secure random cookie to be used for the other peer. In case the other peer is a server, the cookie is valid until the connection has been closed. For other clients, this cookie is valid until the server announces a new initiator or responder with the same address or until the connection to the server has been closed.

The cookie SHALL be set to the previously generated cookie.

Set the source address:

  • A server SHALL use 0x00 as source address.
  • A client SHALL use its assigned identity as source address. If it has not been assigned an identity yet, the source address MUST be set to 0x00.

Set the destination address:

  • A server SHALL use the identity it has assigned to the client as destination address if the client-to-server authentication process has been completed. Otherwise, the destination address SHALL be 0x00.
  • A client MUST use the identity of the peer the message should be sent to. Initially, initiators and responders SHALL ONLY send messages to the server (0x00). As soon as an identity (address) has been assigned to it, an initiator MAY also send messages to responders (0x02..0xff) and a responder MAY also send messages to the initiator (0x01).

If this is the first message to the destination peer:

  • Set the overflow number to 0 and the sequence number to a 32 bit cryptographically secure random number, or
  • Alternatively, if the implementation is using the combined sequence number, the upper 16 bit SHALL be 0 and the lower 32 bit MUST be cryptographically secure random.
  • The above number(s) SHALL be stored and updated separately for each other peer by its identity (destination address in this case).

In case this is not the first message to the destination peer, the peer does the following:

  • In case that the peer does make use of the combined sequence number, it MUST increase the combined sequence number of the destination peer by 1 and check that it has not reset to 0. Implementations that use the combined sequence number SHALL ignore the following two procedures.
  • Increment the sequence number of the destination peer by 1. In case that it overflows (and resets to 0), the overflow number of the destination peer MUST be increased by 1 as well, and
  • In case the overflow number of the destination peer has been incremented by 1, it SHALL NOT reset to 0 if it was greater than 0 before.

In case any of the checks listed above failed, the peer MUST close the connection with a close code of 3001 (Protocol Error).
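For implementations that use the combined sequence number, the outgoing counter described above could look like the following sketch. One such counter would be kept per destination peer. The class and function names are hypothetical, and the caller is assumed to close the connection with 3001 when OverflowError is raised.

```python
import secrets

CSN_MAX = 2**48 - 1  # largest 48 bit unsigned value


class OutgoingSequence:
    """Combined sequence number for messages sent to one destination peer."""

    def __init__(self, initial=None):
        # First message: upper 16 bits are 0, lower 32 bits are random.
        # The 'initial' parameter exists only to make the sketch testable.
        self._next = secrets.randbits(32) if initial is None else initial

    def next(self):
        """Return the number for the next message and advance the counter."""
        if self._next > CSN_MAX:
            # Incrementing further would wrap around to 0, which is forbidden.
            raise OverflowError("combined sequence number exhausted")
        value = self._next
        self._next += 1
        return value


def split_csn(csn):
    """Split a combined sequence number into (overflow, sequence)."""
    return csn >> 32, csn & 0xFFFFFFFF
```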

The peer MUST serialise and encrypt (only if that is required by the message type) the MessagePack object. The resulting sequence of bytes represents the data section of the message and MUST contain more than 0 bytes.

The concatenation of the nonce and the data section represents the whole message and SHALL be sent as a whole.

Receiving a Signalling Message

When a peer receives a signalling message, it first checks that the message contains more than 24 bytes. It then checks that the destination address is sane:

  • A server MUST check that the destination address is 0x00 until the sender is authenticated. In case that the sender is authenticated, relaying is ONLY allowed between an initiator (0x01) and a responder (0x02..0xff). Note that the server MUST follow the Sending a Relay Message section for relay messages after it has completed the procedures of this section.
  • A client MUST check that the destination address targets its assigned identity (or 0x00 during authentication). The first message received with a destination address different to 0x00 SHALL be accepted as the client's assigned identity. However, the client MUST validate that the identity fits its role – initiators SHALL ONLY accept 0x01 and responders SHALL ONLY accept an identity from the range 0x02..0xff. The identity MUST be stored as the client's assigned identity.

Furthermore, a peer checks that the source address is sane:

  • A server MUST check that the source address is 0x00 until a specific identity has been assigned to the sender. In case that the sender is authenticated, the server MUST check that the source address equals the sender's assigned identity.
  • An initiator SHALL ONLY process messages from the server (0x00). As soon as the initiator has been assigned an identity, it SHALL also accept messages from responders (0x02..0xff). Other messages SHALL be discarded and SHOULD trigger a warning.
  • A responder SHALL ONLY process messages from the server (0x00). As soon as the responder has been assigned an identity, it SHALL also accept messages from the initiator (0x01). Other messages SHALL be discarded and SHOULD trigger a warning.

In case this is the first message received from the sender, the peer:

  • MUST check that the overflow number of the source peer is 0 (or the upper 16 bits of the combined sequence number of the source peer are 0, in code: (csn & 0xffff00000000) == 0) and,
  • if the peer has already sent a message to the sender, MUST check that the sender's cookie is different from its own cookie, and
  • MUST store cookie, overflow number and sequence number (or the combined sequence number) for checks on further messages.
  • The above number(s) SHALL be stored and updated separately for each other peer by its identity (source address in this case).

If the message is received by a client or received by and intended for a server (the destination address is 0x00), the peer does the following checks:

  • Ensure that the 16 byte cookie of the sender has not changed.
  • In case that the peer does make use of the combined sequence number, it MUST check that the combined sequence number of the source peer has been increased by 1 and has not reset to 0. Implementations that use the combined sequence number SHALL ignore the following three checks.
  • In case incrementing the sequence number of the source peer would not overflow that number, the sequence number MUST be incremented by 1 and the overflow number of the source peer MUST remain the same.
  • In case incrementing the sequence number of the source peer would overflow, the sequence number MUST be 0 and the overflow number of the source peer MUST be increased by 1.
  • The overflow number of the source peer SHALL NOT reset to 0 if it was greater than 0 before.

In case that any check fails, the peer MUST close the connection with a close code of 3001 (Protocol Error) unless otherwise stated.
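The cookie and sequence number checks for incoming messages can be sketched as below, using the combined sequence number. One state object would be kept per source address. ProtocolError and IncomingState are hypothetical names; the caller is assumed to close the connection with 3001 when ProtocolError is raised.

```python
class ProtocolError(Exception):
    """Raised when a check fails; the caller closes with close code 3001."""


class IncomingState:
    """Cookie and combined sequence number last seen from one source peer."""

    def __init__(self):
        self.cookie = None
        self.csn = None

    def validate(self, cookie, overflow, sequence, own_cookie=None):
        """Check the nonce fields of a received message and update the state.

        own_cookie is the cookie this peer uses towards the sender, or None
        if it has not sent a message to the sender yet.
        """
        csn = (overflow << 32) | sequence
        if self.csn is None:
            # First message from this sender.
            if overflow != 0:
                raise ProtocolError("overflow number must start at 0")
            if own_cookie is not None and cookie == own_cookie:
                raise ProtocolError("sender reuses our own cookie")
            self.cookie = cookie
        else:
            if cookie != self.cookie:
                raise ProtocolError("cookie has changed")
            # Requiring exactly previous + 1 also rules out a reset to 0.
            if csn != self.csn + 1:
                raise ProtocolError("unexpected combined sequence number")
        self.csn = csn
```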

The peer MUST proceed by decrypting (only if that is required by the expected message type) and unserialising the MessagePack object from the data section of the message. In case that decryption or unserialisation fails (bear in mind that there is a corner case for the server when expecting 'client-hello' or 'client-auth'), the peer MUST close the connection with a close code of 3001 (Protocol Error). Further processing depends on the message type described below.

Sending a Relay Message

Once a server has received and validated a relay message from an initiator to a responder or the other way around, it SHALL validate that both sender and receiver are authenticated towards the server. It SHALL continue by sending the unmodified relay message to the destined client without going through the Sending a Signalling Message procedure. In case the message could not be relayed (e.g. the destined client closed the connection or sending the message timed out), the server MUST send a 'send-error' message back to the original sender of the message.

Encrypting a Message

If a message type requires encrypting, the data section of the message MUST be encrypted by using either NaCl public key cryptography or NaCl secret key cryptography. Which case applies depends on the message type.

Public Key Cryptography

Encrypt data with the nonce of the nonce section, the sender's private key and the receiver's public key of the corresponding key pair specified by the message type.

Secret Key Cryptography

Encrypt data with the nonce of the nonce section and the secret key specified by the message type.

Decrypting a Message

In case the current state of the message flow indicates that a message must be decrypted, the data section of the message MUST be decrypted by using either NaCl public key cryptography or NaCl secret key cryptography. Which case applies depends on the current state in the message flow.

Public Key Cryptography

Decrypt data with the nonce of the nonce section, the sender's public key and the receiver's private key of the corresponding key pair specified by the message type.

Secret Key Cryptography

Decrypt data with the nonce of the nonce section and the secret key specified by the message type.

Client-to-Server Messages

This section describes the various messages that will be exchanged between server and client.

Client-to-server messages are distinguishable from client-to-client messages by looking at the destination address field of the nonce. If the destination address is the server's address (0x00), the message is a client-to-server message. Message types between client and server SHALL NOT be repeated.

The messages are serialised MessagePack objects. We will provide an example for each message in an extended JSON format where a string value denoted with 'b' indicates that the content is binary data (MessagePack Bin format). For ease of reading, binary data of the examples is represented as a hex-encoded string. However, binary data SHALL NOT be hex-encoded in implementations. Unless otherwise noted, all non-binary strings MUST be interpreted as UTF-8 encoded strings. Furthermore, field values SHALL NOT be Nil. The type field is REQUIRED for all messages. Other required fields will be described in the messages' section. In case a field is missing or contains invalid data, the incident MUST be treated as a protocol error. This also applies to unexpected messages that deviate from the message flow.

In case that any check fails, the peer MUST close the connection with a close code of 3001 (Protocol Error) unless otherwise stated.

Path Cleaning

An initiator that is connected to the server MUST keep its path clean by dropping inactive responders (i.e. responders that have not sent a client-to-client message to the initiator, yet). To achieve that, the initiator MAY store the responders in a FIFO queue and drop the oldest responder that did not send a message to it once the path is congested (253 responders are connected). Another solution would be to drop responders that have not sent any messages to the initiator after 60 seconds. However, a combination of both is RECOMMENDED.
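The recommended combination of a FIFO queue and a timeout might be sketched as follows. This is only one possible bookkeeping scheme; the class name, its methods and the injectable clock are hypothetical, and actually dropping a responder is left to the caller.

```python
import time
from collections import OrderedDict


class ResponderTracker:
    """Initiator-side bookkeeping to decide which responders to drop."""

    def __init__(self, max_responders=253, timeout=60.0, clock=time.monotonic):
        self._max = max_responders
        self._timeout = timeout
        self._clock = clock
        self._inactive = OrderedDict()  # address -> arrival time, oldest first
        self._active = set()  # responders that have sent us a message

    def add(self, address):
        """Register a responder that has been announced by the server."""
        self._active.discard(address)
        self._inactive.pop(address, None)
        self._inactive[address] = self._clock()

    def mark_active(self, address):
        """Call when a client-to-client message arrives from the responder."""
        if address in self._inactive:
            del self._inactive[address]
            self._active.add(address)

    def remove(self, address):
        """Forget a responder that has been dropped or has disconnected."""
        self._inactive.pop(address, None)
        self._active.discard(address)

    def to_drop(self):
        """Return the addresses of inactive responders to drop right now."""
        now = self._clock()
        drop = [a for a, since in self._inactive.items()
                if now - since >= self._timeout]
        congested = len(self._inactive) + len(self._active) >= self._max
        if congested and self._inactive:
            oldest = next(iter(self._inactive))
            if oldest not in drop:
                drop.append(oldest)
        return drop
```

After acting on the result of `to_drop()`, the caller would call `remove()` for each dropped address so that the tracker stays in sync with the path.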

Message States (Towards/From Initiator)

      +--------------+     +-------------+
  --->+ server-hello +---->+ client-auth |
      +--------------+     +------+------+
                                  |
                                  v
                           +------+------+
                           | server-auth |
                           +------+------+
                                  |
                                  v
+---------------------------------+--------------------+
| new-responder/drop-responder/send-error/disconnected |
+-------------+-------------------------+--------------+
              |                         ^
              +-------------------------+

Message States (Towards/From Responder)

    +--------------+     +-------------+
--->+ server-hello |  +->+ client-auth |
    +------+-------+  |  +------+------+
           |          |         |
           v          |         v
    +------+-------+  |  +------+------+
    | client-hello +--+  | server-auth |
    +--------------+     +------+------+
                                |
                                v
            +-------------------+-------------------+
            | new-initiator/send-error/disconnected |
            +--------------+----------+-------------+
                           |          ^
                           +----------+

'server-hello' Message

This message MUST be sent by the server after a client connected to the server using a valid signalling path. The server MUST generate a new cryptographically secure random NaCl key pair for each client. The public key (32 bytes) of that key pair MUST be set in the key field of this message.

A receiving client MUST check that the message contains a valid NaCl public key (the size of the key MUST be exactly 32 bytes). In case the client has knowledge of the server's public permanent key, it SHALL ensure that the server's public session key is different to the server's public permanent key.

The message SHALL NOT be encrypted.

{
  "type": "server-hello",
  "key": b"debc3a6c9a630f27eae6bc3fd962925bdeb63844c09103f609bf7082bc383610"
}

'client-hello' Message

As soon as the client has received the 'server-hello' message, it MUST ONLY respond with this message in case the client takes the role of a responder. The initiator MUST skip this message. The responder MUST set the public key (32 bytes) of the permanent key pair in the key field of this message.

A receiving server MUST check that the message contains a valid NaCl public key (the size of the key MUST be exactly 32 bytes). Note that the server does not know whether the client will send a 'client-hello' message (the client is a responder) or a 'client-auth' message (the client is the initiator). Therefore, the server MUST be prepared to handle both message types at that particular point in the message flow. This is also the intended way to differentiate between initiator and responder.

The message SHALL NOT be encrypted.

{
  "type": "client-hello",
  "key": b"55e7dd57a01974ca31b6e588909b7b501cdc7694f21b930abb1600241b2ddb27"
}

'client-auth' Message

After the 'client-hello' message has been sent (responder) or after the 'server-hello' message has been received (initiator) the client MUST send this message to the server.

  • The client MUST set the your_cookie field to the cookie the server has used in the nonce of the 'server-hello' message.
  • It SHALL also set the subprotocols field to the exact same Array of subprotocol strings it has provided to the WebSocket client implementation for subprotocol negotiation.
  • If the user application requests to be pinged (see RFC 6455 section 5.5.2) in a specific interval, the client SHALL set the field ping_interval to the requested interval in seconds. Otherwise, ping_interval MUST be set to 0 indicating that no WebSocket ping messages SHOULD be sent.
  • If the client has stored the server's public permanent key (32 bytes), it SHOULD set it in the your_key field.

When the server receives a 'client-auth' message, it MUST check that the cookie provided in the your_cookie field contains the cookie the server has used in its previous messages to that client. The server SHALL check that the subprotocols field contains an Array of subprotocol strings, and:

  • If the server has access to the subprotocol selection function used by the underlying WebSocket implementation, it SHALL use the same function to select the subprotocol from the server's list and the client's list. The resulting selected subprotocol MUST be equal to the initially negotiated subprotocol.
  • If the server does not have access to the subprotocol selection function of the underlying WebSocket implementation but it does have access to the list of subprotocols provided by the client to the WebSocket implementation, it SHALL validate that the lists contain the same subprotocol strings in the same order.
  • If the server is not able to apply either of the above mechanisms, it SHALL validate that the negotiated subprotocol is present in the subprotocols field.
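The three subprotocol checks above could be combined into one helper, as in this sketch. The function name and its parameters are hypothetical; in particular, the signature select(client_list, server_list) is an assumption and will differ between WebSocket implementations.

```python
def subprotocols_consistent(announced, negotiated, *, select=None,
                            server_supported=None, offered=None):
    """Validate the 'subprotocols' field of a 'client-auth' message.

    announced:        value of the subprotocols field
    negotiated:       subprotocol chosen during the WebSocket handshake
    select:           selection function of the WebSocket implementation
    server_supported: the server's own list of subprotocols
    offered:          list the client sent during the WebSocket handshake
    """
    if not isinstance(announced, list):
        return False
    if not all(isinstance(name, str) for name in announced):
        return False
    if select is not None and server_supported is not None:
        # Case 1: repeat the selection and compare the outcome.
        return select(announced, server_supported) == negotiated
    if offered is not None:
        # Case 2: same strings in the same order.
        return list(offered) == announced
    # Case 3: the negotiated subprotocol must at least be present.
    return negotiated in announced
```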

Furthermore, the server SHALL validate that the ping_interval field contains a non-negative integer. If the value is 0, the server SHOULD NOT send WebSocket ping messages to the client. Otherwise, the server SHOULD send a WebSocket ping message in the requested interval in seconds to the client and wait for a corresponding pong message (as described in RFC 6455 section 5.5.3). An unanswered ping MUST result in a protocol error and the connection SHALL be closed with a close code of 3008 (Timeout). A timeout of 30 seconds for unanswered ping messages is RECOMMENDED.

If the 'client-auth' message contains a your_key field, it MUST be compared to the list of server public permanent keys. Then:

  • If the server does not have a permanent key pair, it SHALL drop the client with a close code of 3007 (Invalid Key).
  • If the server does have at least one permanent key pair and if the key sent by the client does not match any of the public keys, it SHALL drop the client with a close code of 3007 (Invalid Key).
  • If the key sent by the client matches a public permanent key of the server, then that key pair SHALL be selected for further usage of the server's permanent key pair towards that client.

In case the 'client-auth' message did not contain a your_key field but the server does have at least one permanent key pair, the server SHALL select the primary permanent key pair for further usage of the server's permanent key pair towards the client.
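The key selection rules above can be sketched as follows, assuming the server keeps its public permanent keys in a list with the primary key first (the RECOMMENDED convention). InvalidKey and select_permanent_key are hypothetical names; the caller is assumed to drop the client with close code 3007 when InvalidKey is raised.

```python
class InvalidKey(Exception):
    """Raised when the client must be dropped with close code 3007."""


def select_permanent_key(permanent_public_keys, your_key=None):
    """Pick the server's permanent key to use towards one client.

    permanent_public_keys: list of 32-byte public keys, primary key first
    your_key:              value of the your_key field, or None if absent
    Returns the selected public key, or None if the server has no permanent
    key pair and the client did not announce one.
    """
    if your_key is None:
        # No key announced: fall back to the primary key, if there is one.
        return permanent_public_keys[0] if permanent_public_keys else None
    for key in permanent_public_keys:
        if key == your_key:
            return key
    # Either the server has no permanent key pair or none of them matches.
    raise InvalidKey("announced server key is unknown")
```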

The message SHALL be NaCl public-key encrypted by the server's session key pair (public key sent in 'server-hello') and the client's permanent key pair (public key as part of the WebSocket path or sent in 'client-hello').

{
  "type": "client-auth",
  "your_cookie": b"af354da383bba00507fa8f289a20308a",
  "subprotocols": [
    "v1.saltyrtc.org",
    "some.other.protocol"
  ],
  "ping_interval": 30,
  "your_key": b"2659296ce03993e876d5f2abcaa6d19f92295ff119ee5cb327498d2620efc979"
}

'server-auth' Message

Once the server has received the 'client-auth' message, it SHALL reply with this message. Depending on the client's role, the server SHALL choose and assign an identity to the client by setting the destination address accordingly:

  • In case the client is the initiator, a previous initiator on the same path SHALL be dropped by closing its connection with a close code of 3004 (Dropped by Initiator) immediately. The new initiator SHALL be assigned the initiator address (0x01).
  • In case the client is a responder, the server SHALL choose a responder identity from the range 0x02..0xff. If no identity can be assigned because each identity is being held by an authenticated responder, the server SHALL close the connection to the client with a close code of 3000 (Path Full).
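
As an illustration of the responder case, the following sketch picks the lowest free address. The specification only requires that some free identity from the range is chosen, so the lowest-first strategy and the name `assign_responder_address` are assumptions of this example.

```python
PATH_FULL = 3000  # close code to use when no identity is available


def assign_responder_address(taken):
    """Return a free responder address from 0x02..0xff, or None if the path is full.

    taken: set of addresses currently held by authenticated responders.
    """
    for address in range(0x02, 0x100):
        if address not in taken:
            return address
    # Caller should close the connection with PATH_FULL (3000).
    return None
```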

After the procedure above has been followed, the client SHALL be marked as authenticated towards the server. The server MUST set the following fields:

  • The your_cookie field SHALL contain the cookie the client has used in its previous messages.
  • The signed_keys field SHALL be set in case the server has at least one permanent key pair. Its value MUST contain the concatenation of the server's public session key and the client's public permanent key (in that order). The content of this field SHALL be NaCl public key encrypted using the previously selected private permanent key of the server and the client's public permanent key. For encryption, the message's nonce SHALL be used.
  • ONLY in case the client is an initiator, the responders field SHALL be set containing an Array of the active responder addresses on that path. An active responder is a responder that has already completed the authentication process and is still connected to the same path as the initiator.
  • ONLY in case the client is a responder, the initiator_connected field SHALL be set to a boolean whether an initiator is active on the same path. An initiator is considered active if it has completed the authentication process and is still connected.

When the client receives a 'server-auth' message, it MUST have accepted and set its identity as described in the Receiving a Signalling Message section. This identity is valid until the connection has been severed. It MUST check that the cookie provided in the your_cookie field contains the cookie the client has used in its previous messages to the server. If the client has knowledge of the server's public permanent key, it SHALL decrypt the signed_keys field by using the message's nonce, the client's private permanent key and the server's public permanent key. The decrypted message MUST match the concatenation of the server's public session key and the client's public permanent key (in that order). If the signed_keys field is present but the client does not have knowledge of the server's permanent key, it SHALL log a warning. Moreover, the client MUST do the following checks depending on its role:

  • In case the client is the initiator, it SHALL check that the responders field is set and contains an Array of responder identities. The responder identities MUST be validated and SHALL neither contain addresses outside the range 0x02..0xff nor SHALL an address be repeated in the Array. An empty Array SHALL be considered valid. However, Nil SHALL NOT be considered a valid value of that field. It SHOULD store the responders' identities in its internal list of responders. Additionally, the initiator MUST keep its path clean by following the procedure described in the Path Cleaning section.
  • In case the client is the responder, it SHALL check that the initiator_connected field contains a boolean value. In case the field's value is true, the responder MUST proceed with sending a 'token' or 'key' client-to-client message described in the Client-to-Client Messages section.

After the procedure above has been followed by the client, it SHALL mark the server as authenticated.

The message SHALL be NaCl public-key encrypted by the server's session key pair and the client's permanent key pair.

{
  "type": "server-auth",
  "your_cookie": b"18b96fd5a151eae23e8b5a1aed2fe30d",
  "signed_keys": b"e42bfd8c5bc9870ae1a0d928d52810983ac7ddf69df013a7621d072aa9633616cfd...",
  "initiator_connected": true,  // ONLY towards responders
  "responders": [  // ONLY towards initiators
    0x02,
    0x03
  ]
}
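
The initiator's validation of the responders field could look roughly like this. The function name `validate_responders` is hypothetical, and the decoded MessagePack Array is assumed to arrive as a Python list.

```python
def validate_responders(responders):
    """Validate the responders field and return the identities as a set."""
    # Nil (None) and non-Array values are invalid; an empty Array is fine.
    if not isinstance(responders, list):
        raise ValueError("responders must be an Array")
    seen = set()
    for address in responders:
        if isinstance(address, bool) or not isinstance(address, int):
            raise ValueError("responder address must be an integer")
        if not 0x02 <= address <= 0xFF:
            raise ValueError("responder address out of range")
        if address in seen:
            raise ValueError("responder address repeated")
        seen.add(address)
    return seen
```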

'new-initiator' Message

When a new initiator has authenticated itself towards the server on a path, the server MUST send this message to all currently authenticated responders on the same path. No additional field needs to be set. The server MUST ensure that a 'new-initiator' message has been sent before the corresponding initiator is able to send messages to any responder.

A responder who receives a 'new-initiator' message MUST proceed by deleting all currently cached information about and for the previous initiator (such as cookies and the sequence numbers) and continue by sending a 'token' or 'key' client-to-client message described in the Client-to-Client Messages section.

The message SHALL be NaCl public-key encrypted by the server's session key pair and the responder's permanent key pair.

{
  "type": "new-initiator"
}

'new-responder' Message

As soon as a new responder has authenticated itself towards the server on a path, the server MUST send this message to an authenticated initiator on the same path. The field id MUST be set to the assigned identity of the newly connected responder. The server MUST ensure that a 'new-responder' message has been sent before the corresponding responder is able to send messages to the initiator.

An initiator who receives a 'new-responder' message SHALL validate that the id field contains a valid responder address (0x02..0xff). It SHOULD store the responder's identity in its internal list of responders. If a responder with the same id already exists, all currently cached information about and for the previous responder (such as cookies and the sequence number) MUST be deleted first. Furthermore, the initiator MUST keep its path clean by following the procedure described in the Path Cleaning section.

The message SHALL be NaCl public-key encrypted by the server's session key pair and the initiator's permanent key pair.

{
  "type": "new-responder",
  "id": 0x04
}

'drop-responder' Message

At any time, an authenticated initiator MAY request to drop an authenticated responder from the path the initiator is connected to by sending this message. The initiator MUST include the id field and set its value to the identity of the responder it wants to drop. In addition, it MAY include the reason field which contains an optional close code the server SHALL close the connection to the responder with. Before the message is sent, the initiator SHALL delete all currently cached information (such as cookies and sequence numbers) about and for the previous responder that used the same address.

Upon receiving a 'drop-responder' message, the server MUST validate that the message has been received from an authenticated initiator. The server MUST validate that the id field contains a valid responder address (0x02..0xff). If a reason field exists, it MUST contain a valid close code (see Close Code Enumeration, listing of close codes that are valid for 'drop-responder' messages). It proceeds by looking up the WebSocket connection of the provided responder identity. If no connection can be found, the message SHALL be silently discarded but MAY generate an informational logging entry. If the WebSocket connection has been found, the connection SHALL be closed with the provided close code of the reason field. If no reason field has been provided, the connection SHALL be closed with a close code of 3004 (Dropped by Initiator). Closing the connection MUST NOT trigger a 'disconnected' message.

The message SHALL be NaCl public-key encrypted by the server's session key pair and the initiator's permanent key pair.

{
  "type": "drop-responder",
  "id": 0x02,
  "reason": 3005
}
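
The server-side handling of a 'drop-responder' message can be sketched as below. The function name `drop_responder_close_code` and the representation of connected responders as a collection of addresses are assumptions; the check that the sender is an authenticated initiator is left to the caller.

```python
DROPPED_BY_INITIATOR = 3004
# Close codes listed as valid for 'drop-responder' messages.
DROP_RESPONDER_CLOSE_CODES = frozenset({3001, 3002, 3004, 3005})


def drop_responder_close_code(message, connected_responders):
    """Return the close code to drop the responder with, or None to discard."""
    responder_id = message.get("id")
    if (isinstance(responder_id, bool) or not isinstance(responder_id, int)
            or not 0x02 <= responder_id <= 0xFF):
        raise ValueError("id is not a valid responder address")
    # Without a reason field, 3004 (Dropped by Initiator) is used.
    reason = message.get("reason", DROPPED_BY_INITIATOR)
    if (isinstance(reason, bool) or not isinstance(reason, int)
            or reason not in DROP_RESPONDER_CLOSE_CODES):
        raise ValueError("reason is not a valid close code")
    if responder_id not in connected_responders:
        # No connection found: silently discard (optionally log).
        return None
    return reason
```

The caller would close the responder's WebSocket connection with the returned code and must not emit a 'disconnected' message for it.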

'disconnected' Message

If an initiator that has been authenticated towards the server terminates the connection with the server, the server SHALL send this message towards all connected and authenticated responders.

If a responder that has been authenticated towards the server terminates the connection with the server, the server SHALL send this message towards the initiator (if present).

An initiator who receives a 'disconnected' message SHALL validate that the id field contains a valid responder address (0x02..0xff).

A responder who receives a 'disconnected' message SHALL validate that the id field contains a valid initiator address (0x01).

A receiving client MUST delete all cached information about and for the other client with the identity of the id field (such as cookies and sequence numbers). The client MAY stay on the path and wait for a new initiator/responder to connect. However, the client-to-client handshake MUST start from the beginning. In addition, the client MUST notify the user application that the client with the identity id has disconnected.

The message SHALL be NaCl public-key encrypted by the server's session key pair and the client's permanent key pair.

{
  "type": "disconnected",
  "id": 0x02
}
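
A receiving client's handling of 'disconnected' might be sketched as follows. The names `handle_disconnected` and `peer_state` (a dict mapping a peer's address to its cached cookies and sequence numbers) are hypothetical.

```python
INITIATOR_ADDRESS = 0x01


def handle_disconnected(own_role, message, peer_state):
    """Validate the id field, purge cached state, and return the peer's id."""
    peer_id = message.get("id")
    if isinstance(peer_id, bool) or not isinstance(peer_id, int):
        raise ValueError("id must be an integer")
    if own_role == "initiator":
        valid = 0x02 <= peer_id <= 0xFF  # must be a responder address
    elif own_role == "responder":
        valid = peer_id == INITIATOR_ADDRESS
    else:
        raise ValueError("unknown role")
    if not valid:
        raise ValueError("id is not valid for this role")
    # Delete cookies, sequence numbers, etc. cached for that peer.
    peer_state.pop(peer_id, None)
    # The caller notifies the user application about this id.
    return peer_id
```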

'send-error' Message

In case the server could not relay a client-to-client message (meaning that the connection between server and the receiver has been severed), the server MUST send this message to the original sender of the message that should have been relayed. The server SHALL set the id field to the concatenation of the source address, the destination address, the overflow number and the sequence number (or the combined sequence number) of the nonce section from the original message.

A receiving client MUST treat this incident by raising an error event to the user's application and deleting all cached information about and for the other client (such as cookies and sequence numbers). The client MAY stay on the path and wait for a new initiator/responder to connect. However, the client-to-client handshake MUST start from the beginning.

The message SHALL be NaCl public-key encrypted by the server's session key pair and the client's permanent key pair.

{
  "type": "send-error",
  "id": b"010200000000000f"
}
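
The id field of the example above is 8 bytes long. A client could split it as sketched below, assuming a 1-byte source, 1-byte destination, 2-byte overflow number and 4-byte sequence number in big-endian byte order. These widths match the example but are assumptions of this sketch, as is the function name `parse_send_error_id`.

```python
import struct


def parse_send_error_id(raw):
    """Split the 8-byte id of a 'send-error' message into its parts."""
    if not isinstance(raw, bytes) or len(raw) != 8:
        raise ValueError("id must be exactly 8 bytes")
    # >BBHI: big-endian, two unsigned bytes, one uint16, one uint32.
    source, destination, overflow, sequence = struct.unpack(">BBHI", raw)
    return {
        "source": source,
        "destination": destination,
        "overflow": overflow,
        "sequence": sequence,
        # 48-bit combined sequence number: overflow in the upper 16 bits.
        "combined_sequence": (overflow << 32) | sequence,
    }
```

For the example value `010200000000000f`, this yields source 0x01, destination 0x02 and a combined sequence number of 15.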

Client-to-Client Messages

The following messages are messages that will be exchanged between two clients (initiator and responder).

Client-to-client messages are distinguishable from client-to-server messages by looking at the address fields of the nonce. If both fields contain a client address (an address different to 0x00), the message is a client-to-client message. SaltyRTC servers MUST relay these messages to the corresponding destination once sender and receiver are authenticated towards the server and the address sections in the nonce have been validated. In case the message could not be relayed, the server MUST send a 'send-error' message back to the sender (see previous section). In this section and all its subsections, authentication means authentication towards the other client unless otherwise stated.

Identical to client-to-server messages, the messages are serialised MessagePack objects. We will provide an example for each message in an extended JSON format where a string value denoted with 'b' indicates that the content is binary data (MessagePack Bin format). For ease of reading, binary data of the examples is represented as a hex-encoded string. However, binary data SHALL NOT be hex-encoded in implementations. Unless otherwise noted, all non-binary strings MUST be interpreted as UTF-8 encoded strings. Furthermore, field values SHALL NOT be Nil. The type field is REQUIRED for all messages. Other required fields will be described in the messages' section. In case a field is missing or contains invalid data, the incident MUST be treated as a protocol error. This also applies to unexpected messages that deviate from the message flow.

Compared to client-to-server messages, protocol errors for client-to-client messages MUST be handled differently. In case that any check fails, the procedure below MUST be followed unless otherwise stated:

  • If the other client is not authenticated yet, an initiator SHALL drop the corresponding responder by sending a 'drop-responder' message with the responder's address in the id field to the server and a close code of 3001 (Protocol Error) in the reason field. A responder SHALL close the connection to the server with a close code of 3001.
  • If the other client is authenticated, the client SHALL send a 'close' message to the other client containing the close code 3001 (Protocol Error). Both clients SHALL terminate the connection to the server (normal close code).
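
The two cases above can be expressed as a small decision function. This is an illustrative sketch only: the function name `protocol_error_action` and the action labels in the returned tuples are hypothetical.

```python
PROTOCOL_ERROR = 3001


def protocol_error_action(own_role, peer_id, peer_authenticated):
    """Return (action, payload) describing how to react to a protocol error."""
    if peer_authenticated:
        # Send 'close' to the other client; afterwards both clients
        # terminate their connection to the server (normal close code).
        return ("send-to-peer", {"type": "close", "reason": PROTOCOL_ERROR})
    if own_role == "initiator":
        # Ask the server to drop the unauthenticated responder.
        return ("send-to-server",
                {"type": "drop-responder", "id": peer_id, "reason": PROTOCOL_ERROR})
    # An unauthenticated initiator cannot be dropped by a responder,
    # so the responder closes its own connection with 3001.
    return ("close-server-connection", PROTOCOL_ERROR)
```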

Client Handshake

Before the client-to-client handshake can take place, the initiator SHALL issue a token which is a cryptographically secure, randomly generated NaCl secret key (32 bytes) that is valid for a single successfully decrypted message – the 'token' message. The token MUST be exchanged securely between initiator and responder. This specification deliberately does not define how the token should be exchanged.
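
Issuing a single-use token could be sketched as follows, using Python's `secrets` module as the source of randomness. The class name `AuthToken` is hypothetical, and the NaCl secret-key encryption itself is out of scope for this example.

```python
import secrets

TOKEN_LENGTH = 32  # size of a NaCl secret key in bytes


class AuthToken:
    """A token that can be used until it is explicitly invalidated."""

    def __init__(self):
        self._key = secrets.token_bytes(TOKEN_LENGTH)

    @property
    def key(self):
        if self._key is None:
            raise RuntimeError("token has already been used")
        return self._key

    def invalidate(self):
        """Call once a 'token' message has been decrypted successfully."""
        self._key = None
```

Invalidation is tied to a successful decryption rather than to the first attempt, since a message that fails to decrypt does not consume the token.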

Once the authentication process of the two clients has been completed (after both clients have sent each other a valid 'auth' message), the clients MAY choose to trust each other by storing each other's public key and the path securely (note that this trusting procedure must be handled by the application).

The API of the clients MUST be able to handle trusted public keys. If a trusted key is passed to the client, the initiator SHALL omit generating a token and both clients SHALL skip the 'token' message during the handshake.

If one of the clients is out of sync to the other (one has a trusted public key but the other has not), the initiator will receive a different message (either 'token' or 'key') than expected which it cannot decrypt. Therefore, the initiator MUST react by sending a 'drop-responder' message with the reason field set to 3005 (Initiator Could Not Decrypt) in case it cannot decrypt the first message sent by a responder. Once a responder's connection to the server is being terminated by that close code, the application of the responder SHALL be notified that the initiator could not decrypt the message. The easiest way to resolve such a conflict would be to untrust public keys of that path and let the initiator generate a new token.

To mitigate brute-force attacks, the initiator SHALL introduce a timeout of at least one second between handshake attempts. Furthermore, the initiator SHALL delete all cached information about and for a responder (such as cookies and sequence numbers) in case a responder fails to authenticate itself towards the initiator.

Message States

     +-----------------+    +------------------+
     | key (initiator) +--->+ auth (responder) |
     +--------+--------+    +---------+--------+
              ^                       |
              |                       v
     +--------+--------+    +---------+--------+
--+->+ key (responder) |    | auth (initiator) +.....> Task
  |  +---+-------------+    +--+-----------+---+  :
  |      ^                     |           |      :
  |      |                     v           v      :
  |  +---+---+      +----------+--+    +---+---+  :
  +->+ token |      | application +--->+ close |  :
     +-------+      +--+-----+--+-+    +-------+  :
                       |     ^  :                 :
                       +-----+  :.................:

'token' Message

Once a responder has authenticated itself towards the server and an initiator is present on that path, it SHALL ONLY send this message to the initiator in case it holds an authentication token issued by the initiator on that path. This message SHALL be skipped in case the responder knows that the initiator already trusts it and previously stored the responder's public key.

The responder MUST set the public key (32 bytes) of the permanent key pair in the key field of this message.

A receiving initiator MUST check that the message contains a valid NaCl public key (32 bytes) in the key field. In case the initiator expects a 'token' message but could not decrypt the message's content, it SHALL send a 'drop-responder' message containing the id of the responder who sent the message and a close code of 3005 (Initiator Could Not Decrypt) in the reason field.

The message SHALL be NaCl secret key encrypted by the token the initiator created and issued to the responder. In case the initiator has successfully decrypted the 'token' message, the secret key MUST be invalidated immediately and SHALL NOT be used for any other message.

{
  "type": "token",
  "key": b"55e7dd57a01974ca31b6e588909b7b501cdc7694f21b930abb1600241b2ddb27"
}

'key' Message

This message is sent by both initiator and responder. The responder SHALL send this message as its first message or directly after the 'token' message. The initiator MUST wait until it has successfully processed the responder's 'key' message before it sends a 'key' message to that responder.

The client MUST generate a session key pair (a new NaCl key pair for public key authenticated encryption) for further communication with the other client. The client's session key pair SHALL NOT be identical to the client's permanent key pair. It MUST set the public key (32 bytes) of that key pair in the key field.

Once the other client receives a 'key' message, it MUST validate the key field: The key SHALL be 32 bytes and SHALL NOT be identical to the other client's public permanent key. Further messages from the other client will use the session key pair for encryption unless otherwise specified (e.g. by a task). In case an initiator expects a 'key' message but could not decrypt the message's content, it SHALL send a 'drop-responder' message containing the id of the responder who sent the message and a close code of 3005 (Initiator Could Not Decrypt) in the reason field.

The message SHALL be NaCl public-key encrypted by the client's permanent key pair and the other client's permanent key pair.

{
  "type": "key",
  "key": b"bbbf470d283a9a4a0828e3fb86340fcbd19efe75f63a2e51ad0b16d20c3a0c02"
}
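
The validation of the key field may be sketched like this. The function name `validate_session_key` is hypothetical, and the key is assumed to have been decoded from MessagePack Bin into a Python `bytes` object.

```python
KEY_LENGTH = 32  # size of a NaCl public key in bytes


def validate_session_key(key, peer_permanent_key):
    """Return the peer's public session key if it passes the checks."""
    if not isinstance(key, bytes) or len(key) != KEY_LENGTH:
        raise ValueError("key must be 32 bytes")
    # The session key pair must differ from the permanent key pair.
    if key == peer_permanent_key:
        raise ValueError("session key equals the peer's permanent key")
    return key
```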

'auth' Message

This message is sent by both initiator and responder. The responder SHALL send this message after it has received and processed a 'key' message from the initiator. The initiator MUST wait until it has successfully processed the 'auth' message before it sends an 'auth' message to that responder.

The client MUST set the following fields:

  • Set the your_cookie field to the cookie the other client has used in the nonce of its previous message(s).
  • A responder MUST set the tasks field to an Array of SaltyRTC task protocol names the responder offers to utilise.
  • An initiator MUST include the task field and set it to the name of the SaltyRTC task protocol it has chosen from the Array the responder provided.
  • Both clients SHALL set the data field to a Map containing the selected tasks' names as keys and another Map or Nil as the task's value. The content of these Maps depends on the task and SHALL be specified by the task's protocol specification. For each task, there MUST be a field in the data field.

When the client receives an 'auth' message, it MUST check the following fields:

  • The cookie provided in the your_cookie field SHALL contain the cookie it has used in its previous messages to the other client.
  • An initiator SHALL validate that the tasks field contains an Array with at least one element. Each element in the Array SHALL be a string. The initiator SHALL continue by comparing the provided tasks to its own Array of supported tasks. It MUST choose the first task in its own list of supported tasks that is also contained in the list of supported tasks provided by the responder. In case no common task could be found, the initiator SHALL send a 'close' message to the responder containing the close code 3006 (No Shared Task Found) as reason and raise an error event indicating that no common signalling task could be found². The initiator SHALL then proceed with the termination of the connection as described in the section 'close' Message.
  • A responder SHALL validate that the task field is present and contains one of the tasks it has previously offered to the initiator.
  • Both initiator and responder SHALL verify that the data field contains a Map and SHALL look up the chosen task's data value. The value MUST be handed over to the corresponding task after processing this message is complete.

2: SaltyRTC is designed with the expectation that two peers will attempt to establish a 1:1 connection. While there is a mechanism for dropping invalid responders without disconnecting (using the 'drop-responder' message) to prevent simple DoS schemes, by the time the proposed tasks are compared the responder has already authenticated itself towards the initiator. Thus, we can expect that this was a serious connection attempt, not a spammer trying to flood random WebSocket endpoints with connections.

After the above procedure has been followed, the other client has successfully authenticated itself towards the client. The other client's public key MAY be stored as trusted for that path if the application desires it. The initiator MUST drop all other connected responders with a 'drop-responder' message containing the close code 3004 (Dropped by Initiator) in the reason field.

Both initiator and responder MUST continue by following the protocol specification of the chosen task after processing this message is complete.

The message SHALL be NaCl public-key encrypted by the client's session key pair and the other client's session key pair.

{
  "type": "auth",
  "your_cookie": b"957c92f0feb9bae1b37cb7e0d9989073",
  "tasks": [  // ONLY towards an initiator
    "v1.ortc.tasks.saltyrtc.org",
    "v1.webrtc.tasks.saltyrtc.org"
  ],
  "task": "v1.ortc.tasks.saltyrtc.org",  // ONLY towards a responder
  "data": {
     "v1.ortc.tasks.saltyrtc.org": {
        ...
      },
      "v1.webrtc.tasks.saltyrtc.org": {
        ...
      }
  }
}
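
The initiator's task selection described above (first task in its own list that the responder also offers) can be sketched as follows. The function name `choose_task` is hypothetical.

```python
NO_SHARED_TASK_FOUND = 3006


def choose_task(own_tasks, offered_tasks):
    """Return the negotiated task name, or None if no task is shared.

    own_tasks: the initiator's supported tasks in order of preference.
    offered_tasks: the tasks field received from the responder.
    """
    if not isinstance(offered_tasks, list) or not offered_tasks:
        raise ValueError("tasks must be an Array with at least one element")
    if not all(isinstance(task, str) for task in offered_tasks):
        raise ValueError("each task must be a string")
    # The initiator's own order decides, not the responder's.
    for task in own_tasks:
        if task in offered_tasks:
            return task
    # Caller sends 'close' with NO_SHARED_TASK_FOUND (3006).
    return None
```

Note that the order of the responder's Array has no influence on the outcome; only the initiator's preference order matters.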

'application' Message

Once the client-to-client handshake has been completed, the user application of a client MAY trigger sending this message.

This message type allows user applications to send simple control messages or early data without having to modify an existing task. However, this message SHOULD NOT be abused to write custom protocols. Tasks MUST support this message type and SHOULD support a message of this type to be sent or received at any time.

A client who sends an 'application' message SHALL set the data field to whatever data the user application provided. Therefore, data MAY be of any type.

A receiving client SHALL validate that the data field is set. It MUST pass that data to the user application.

The message SHALL be NaCl public-key encrypted by the client's session key pair and the other client's session key pair.

{
  "type": "application",
  "data": ...
}

'close' Message

Both initiator and responder SHALL trigger sending this message any time the application or a task requests to terminate the signalling connection between the clients over the server and to the server. However, this message SHALL ONLY be sent in case the client-to-client handshake has been completed.

A client who sends a 'close' message MUST set the reason field to a valid close code (as enumerated in Close Code Enumeration). 1001 SHALL be used for normal close cases. Once the message has been sent, the client SHALL remove all cached data (such as cookies and sequence numbers) of and for the other client. The client SHALL also terminate the connection to the server with a close code of 1001 (Going Away) if the connection is still open.

A receiving client SHALL validate that the reason field contains a valid close code (as enumerated in Close Code Enumeration). The client SHALL remove all cached data (such as cookies and sequence numbers) of and for the other client. The client SHALL also terminate the connection to the server (no specific close code) if the connection is still open.

The message SHALL be NaCl public-key encrypted by the client's session key pair and the other client's session key pair.

{
  "type": "close",
  "reason": 3003
}

Tasks

To choose a signalling task, client implementations MUST provide an API for the user to choose a list of signalling tasks/solutions (in order of preference) that shall be negotiated between the clients. At least one signalling task MUST be selected by the user.

As soon as the authentication procedure between initiator and responder has been completed successfully, the specification of the negotiated task takes over.

Message Flow Example

This example provides the message flow of an initiator and a responder that connect to the same signalling path. The responder starts communicating with the initiator once it has completed the authentication towards the server. Then, both clients proceed with the client-to-client handshake.

Initiator                     Server                      Responder
 |                               |                               :
 |   wss://saltytc.org/01ff...   |                               :
 |------------------------------>|                               :
 |         server-hello          |                               :
 |<------------------------------|                               |
 |          client-auth          |   wss://saltytc.org/01ff...   |
 |------------------------------>|<------------------------------|
 |          server-auth          |         server-hello          |
 |<------------------------------|------------------------------>|
 |                               |         client-hello          |
 |                               |<------------------------------|
 |                               |          client-auth          |
 |                               |<------------------------------|
 |         new-responder         |          server-auth          |
 |<------------------------------|------------------------------>|
 |                               |             token             |
 |             token             |<------------------------------|
 |<------------------------------|              key              |
 |              key              |<------------------------------|
 |<------------------------------|                               |
 |              key              |                               |
 |------------------------------>|              key              |
 |                               |------------------------------>|
 |                               |             auth              |
 |             auth              |<------------------------------|
 |<------------------------------|                               |
 |             auth              |                               |
 |------------------------------>|             auth              |
 :                               |------------------------------>|
 :                               :                               :
 :                               :                               :

Errors

A protocol error MUST be treated by closing the connection with a close code of 3001 (Protocol Error) unless otherwise stated. For client-to-client messages, the behaviour depends on the other client's authentication status (see the Client-to-Client Messages section).

In any case, errors SHOULD raise an error event to the application if the error cannot be resolved by the implementation itself.

Close Code Enumeration

The following close codes are being used by the protocol:

  • 1000: Normal closure (WebSocket internal close code)
  • 1001: Going Away (WebSocket internal close code)
  • 1002: Protocol Error (WebSocket internal close code)
  • 3000: Path Full
  • 3001: Protocol Error
  • 3002: Internal Error
  • 3003: Handover of the Signalling Channel
  • 3004: Dropped by Initiator
  • 3005: Initiator Could Not Decrypt
  • 3006: No Shared Task Found
  • 3007: Invalid Key
  • 3008: Timeout

The following close codes are available for 'drop-responder' messages:

  • 3001: Protocol Error
  • 3002: Internal Error
  • 3004: Dropped by Initiator
  • 3005: Initiator Could Not Decrypt
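
For implementations, the two listings above could be captured in an enumeration such as the following sketch. The member names and the helper `is_valid_close_code` are hypothetical; only the numeric values come from this document.

```python
from enum import IntEnum


class CloseCode(IntEnum):
    NORMAL_CLOSURE = 1000
    GOING_AWAY = 1001
    WEBSOCKET_PROTOCOL_ERROR = 1002
    PATH_FULL = 3000
    PROTOCOL_ERROR = 3001
    INTERNAL_ERROR = 3002
    HANDOVER = 3003
    DROPPED_BY_INITIATOR = 3004
    INITIATOR_COULD_NOT_DECRYPT = 3005
    NO_SHARED_TASK_FOUND = 3006
    INVALID_KEY = 3007
    TIMEOUT = 3008


# Subset that may appear in the reason field of 'drop-responder'.
DROP_RESPONDER_CODES = frozenset({
    CloseCode.PROTOCOL_ERROR,
    CloseCode.INTERNAL_ERROR,
    CloseCode.DROPPED_BY_INITIATOR,
    CloseCode.INITIATOR_COULD_NOT_DECRYPT,
})


def is_valid_close_code(value, for_drop_responder=False):
    """Check a received close code against the enumeration."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    try:
        code = CloseCode(value)
    except ValueError:
        return False
    return code in DROP_RESPONDER_CODES if for_drop_responder else True
```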

Security Mechanisms

Authentication

Client-to-Client

When the authentication token has been exchanged in a secure manner, both peers can assure authentication of each other.

The initiator either authenticates the responder by receiving the responder’s public permanent key encrypted with the authentication token, or it already knows the public permanent key of the responder. For both cases, only the initiator and the responder know the shared secret that can decrypt messages.

The other peer, the responder, also knows the public permanent key of the initiator before it connects to the server. Again, only the initiator and the responder know the shared secret to decrypt messages.

Client-to-Server

Authentication towards the server is only necessary to be able to establish another security layer for transport encryption. However, only the initiator can be authenticated towards the server. The responder is able to claim any public permanent key it has the corresponding private key for.

Clients can only authenticate the server in case the client knows the public permanent key of the server and the server uses this feature. In this case, a valid signature of the server for the keys it signs in 'server-auth' authenticates the server.

Message Integrity

For unencrypted messages ('server-hello' and 'client-hello'), the underlying WebSocket implementation may or may not provide message integrity. However, for encrypted messages, the Message Authentication Code (MAC) of NaCl ensures the message's integrity. Because a modified nonce would lead to an error during decryption, the nonce is implicitly protected as well.

Protection Against Replay Attacks

The cookie resembles a challenge that needs to be repeated by the other peer. A peer can thereby prove that it owns the private key for the public key it transmitted. This technique is being applied for both client-to-server authentication and client-to-client authentication. In combination with the overflow and sequence number and the source and destination bytes, the implementations are able to mitigate replay attacks.

Uniqueness of Nonces

The random 16 byte cookie should contain enough randomness to ensure that a nonce is not being reused for a shared secret as long as the protocol is being followed closely. To ensure that nonces are unique per shared secret, peers communicating with one another use different cookies.
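
The cookie handling described in the two sections above might be sketched as follows. The helper names `new_cookie` and `cookie_repeated` are hypothetical; the regeneration loop is one possible way to guarantee that a peer's own cookie differs from the cookie of the other peer.

```python
import hmac
import secrets

COOKIE_LENGTH = 16


def new_cookie(peer_cookie=None):
    """Generate a random 16-byte cookie that differs from the peer's cookie."""
    while True:
        cookie = secrets.token_bytes(COOKIE_LENGTH)
        if cookie != peer_cookie:
            return cookie


def cookie_repeated(own_cookie, your_cookie):
    """Check that the peer echoed our cookie in its your_cookie field."""
    if not isinstance(your_cookie, bytes):
        return False
    # compare_digest is not strictly required here; it is a cautious default.
    return hmac.compare_digest(own_cookie, your_cookie)
```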

Forward Secrecy

The shared secret of client and server is different each time the client connects to the server. Although the permanent key of the client does not change, the server always generates a new session key pair.

Two clients that communicate with each other establish a session key immediately during the handshake. The long-term keys (permanent keys) are only used for a single message ('key') before the session keys have been established.