libsodium is a modern, easy-to-use software library for encryption, decryption, signatures, password hashing, and more. Its design emphasizes security and ease of use. Compared to the popular OpenSSL library, the libsodium API is fairly high-level and offers only a few primitives, which is one of the reasons it is easy to use. As a learning exercise, I built a toy remote login protocol (similar to SSH) with libsodium. I was also annoyed by slow SSH logins, which pushed me to dig into the SSH protocol.

SSH is a cryptographic network protocol that provides authentication, confidentiality, and integrity. It uses public-key cryptography for authentication.

Getting Started

Since I am not an expert in security, I started by browsing the libsodium documentation. Obviously, I need public-key cryptography for this purpose. Under the public-key cryptography section there are three primitives:

  1. Authenticated encryption.
  2. Public-key signatures.
  3. Sealed boxes.

I used all three, and only these three, in my remote login protocol.

It also helps to learn how the SSH protocol works. The specification is scattered across several RFCs, so watching an SSH session in Wireshark is an easier way to get started. Running the SSH client with the -vvv option helps too.

The SSH Handshake

The SSH protocol starts by exchanging version strings between the client and the server. Let’s just skip this. There is no need to waste an RTT on a toy protocol.

The next step is the key exchange. The client sends a list of algorithms, the server replies with its own list, an algorithm acceptable to both sides is chosen, and then the actual key exchange happens. In my case the algorithm is ECDH, and the whole step takes 2 RTTs.

After the key exchange, the communication is encrypted, so Wireshark is of no further use. The next step is authentication. Based on my reading of the RFCs and the SSH client logs, the process looks like this:

  1. client: I need to authenticate myself as user foo. server: OK, list of authentication methods.
  2. client: I’m going to authenticate myself with this public key. server: OK, key accepted.
  3. client: I’m authenticating myself with a signature signed by my key. server: Done.

Lots of RTTs for an SSH login.

The Handshake Protocol with libsodium

Let’s go back to the beginning and review the situation before the handshake:

  1. Both the client and the server have a key pair.
  2. The server has a list of accepted client public keys.
  3. The client knows the server’s public key. (communicating with an unknown server is unwise for privacy reasons)

With the server’s public key, libsodium provides a way (and the only way) to talk to the server: sealed boxes.

The client sends a message sealed with the server’s public key, and only the server can open it. If we put the client’s public key into that message and sign it, the message both identifies and authenticates the client. After authenticating the client, the server can reply using the same scheme, which authenticates the server in turn.

The primitive for authentication (signing) is public-key signatures.

This handshake protocol is vastly simpler than the SSH handshake: it has no support for negotiating algorithms, and the only authentication method is public keys. Authentication is done after a single RTT.

The Transport Protocol with libsodium

Now that the handshake is done using sealed boxes, can we just keep using sealed boxes for the rest of the transmission? This is not done in practice because public-key operations are expensive. SSH uses a key exchange to derive a key for symmetric encryption, which protects the traffic during and after the handshake. Messages transmitted by SSH also need to be authenticated, so that tampering is detected.

libsodium covers this use case with the authenticated encryption primitive, a construction that combines authentication with encryption and is easy to use and hard to misuse.

To use the authenticated encryption primitive:

  1. Compute the shared key using my secret key and the other side’s public key. (The key exchange.)
  2. The shared key is then used for both symmetric encryption and authentication.

This doesn’t require extra RTTs, since each side knows the other’s public key after the handshake (the server learns the client’s key from the handshake message).

The Handshake Protocol Implementation

Here is the actual handshake message used in my implementation.

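struct __attribute__((packed)) Hello {
    uint8_t sign[crypto_sign_BYTES] = {};             // detached signature (covers the padding too)
    uint8_t src_pk[crypto_sign_PUBLICKEYBYTES] = {};  // sender's public key
    uint8_t dst_pk[crypto_sign_PUBLICKEYBYTES] = {};  // receiver's public key
    uint8_t session[k_session_size] = {};             // random bytes generated by the client
    uint64_t ts_msec = 0;                             // client's timestamp in milliseconds
    uint64_t flag = 0;                                // reserved for extensions and optional features
    uint64_t padding_sz = 0;                          // size of the trailing padding
    uint8_t padding[0];                               // optional padding (zero-length array, a compiler extension)
};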

This message is sealed using the sealed box primitive. Besides the public key and the signature, it also contains a few extra fields:

  1. The session is some random bytes generated by the client, and the ts_msec is the client’s timestamp. The two fields are crucial in preventing replayed messages.
  2. The padding_sz is the size of the optional padding at the end of the handshake message. The purpose of this padding is obfuscation, making the protocol less identifiable and classifiable.
  3. The flag could be potentially used for extending the protocol in a backward-compatible way, or be used for optional features.

For the sender: after filling in the message, use crypto_sign_detached to generate the signature (which also covers the padding), then seal the message with crypto_box_seal and transmit it.

For the receiver, the process is:

  1. Open the message with crypto_box_seal_open.
  2. Verify the signature with crypto_sign_verify_detached.
  3. Authentication.
    • For the client: the message itself is an authentication.
    • For the server: check whether the client’s key is acceptable.
  4. Preventing replayed messages.
    • For the client: the session and timestamp field should be a copy of what the client sent to the server. Also, check the public keys in the message.
    • For the server: it maintains persistent storage containing a list of recent sessions. The server must reject the message if the session is already in the list, or if the timestamp is too old to be checked against the list.
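The server-side replay check can be sketched like this. It is a minimal in-memory version with assumptions of my own: `ReplayGuard`, the 16-byte `Session` size, and the acceptance window are all hypothetical, and a real server would keep the list in persistent storage as described above.

```cpp
#include <array>
#include <cstdint>
#include <map>

// Assumed session size; the post does not state k_session_size.
using Session = std::array<uint8_t, 16>;

class ReplayGuard {
public:
    explicit ReplayGuard(uint64_t window_msec) : window_msec_(window_msec) {}

    // Returns true if the handshake is fresh (and records it),
    // false if it must be rejected.
    bool accept(const Session& session, uint64_t ts_msec, uint64_t now_msec) {
        // Reject timestamps outside the window, in either direction.
        uint64_t skew = now_msec > ts_msec ? now_msec - ts_msec
                                           : ts_msec - now_msec;
        if (skew > window_msec_) return false;

        // Forget sessions whose timestamp would be rejected anyway.
        for (auto it = seen_.begin(); it != seen_.end();) {
            if (now_msec > it->second && now_msec - it->second > window_msec_)
                it = seen_.erase(it);
            else
                ++it;
        }

        // emplace reports whether the session was new.
        return seen_.emplace(session, ts_msec).second;
    }

private:
    uint64_t window_msec_;
    std::map<Session, uint64_t> seen_;  // session -> client timestamp
};
```

Because old entries can be dropped as soon as their timestamp falls outside the window, the list only has to cover the window's duration, which is why the timestamp check and the session list work together.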

The Transport Protocol Implementation

The transport protocol is just the authenticated encryption primitive. To use the authenticated encryption, first generate the shared key using crypto_box_beforenm. Encryption and decryption are done with crypto_box_detached_afternm and crypto_box_open_detached_afternm.

A minor problem is that public-key signatures and authenticated encryption use different key formats (Ed25519 and X25519, respectively), so the signing keys need to be converted with crypto_sign_ed25519_pk_to_curve25519 and crypto_sign_ed25519_sk_to_curve25519.

Authenticated encryption takes a nonce, which must never repeat for a given key. Luckily, the session and timestamp combination is already considered unique after a successful handshake, so we can append a counter to it to create as many nonces as needed. To keep the server's and the client's nonces apart, the server uses even counter values while the client uses odd ones. The nonce is NOT transmitted, since the receiver can infer the counter.
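One possible nonce layout is sketched below. The layout itself is an assumption for illustration (an 8-byte session, then the timestamp, then the counter, filling the 24 bytes that crypto_box expects), and `make_nonce`, `Role`, and `kSessionSize` are hypothetical names rather than the ones in my implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNonceSize = 24;   // same value as crypto_box_NONCEBYTES
constexpr std::size_t kSessionSize = 8;  // assumption; k_session_size is not given here

using Nonce = std::array<uint8_t, kNonceSize>;
using SessionId = std::array<uint8_t, kSessionSize>;

// The server uses even counters, the client odd ones.
enum class Role : uint64_t { Server = 0, Client = 1 };

// Write a 64-bit value in little-endian byte order.
static void store_le64(uint8_t* out, uint64_t v) {
    for (int i = 0; i < 8; ++i) out[i] = static_cast<uint8_t>(v >> (8 * i));
}

// `n` is the index of the message within one direction: 0, 1, 2, ...
Nonce make_nonce(const SessionId& session, uint64_t ts_msec, Role who,
                 uint64_t n) {
    Nonce nonce{};
    for (std::size_t i = 0; i < kSessionSize; ++i) nonce[i] = session[i];
    store_le64(nonce.data() + 8, ts_msec);
    store_le64(nonce.data() + 16, 2 * n + static_cast<uint64_t>(who));
    return nonce;
}
```

Each side keeps one send index and one receive index per connection, so the receiver can rebuild the expected nonce without it ever appearing on the wire.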

There is another problem: the size of the message. Like the handshake message, which consists of a fixed-size part and variable-size padding, the transport protocol uses a fixed-size header that carries the size of the message and the size of the padding. The message data and the padding then follow as a second message. Both the header and the message are protected by the authenticated encryption primitive.

struct __attribute__((packed)) Header {
    uint32_t payload_sz = 0;  // size of the message data
    uint32_t padding_sz = 0;  // size of the padding that follows the data
    // more fields ...
};

As mentioned before, padding is a form of obfuscation. It also reduces the information leaked by the length of a transmission.
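The framing can be sketched as follows, leaving the encryption step out. `Frame`, `make_frame`, and `parse_frame` are hypothetical names, the header is reduced to the two size fields, and the sizes are stored in host byte order for simplicity.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct __attribute__((packed)) Header {
    uint32_t payload_sz = 0;
    uint32_t padding_sz = 0;
};

// Two plaintext parts; each would be encrypted as its own message.
struct Frame {
    std::vector<uint8_t> header;  // always sizeof(Header) bytes
    std::vector<uint8_t> body;    // payload followed by padding
};

Frame make_frame(const std::vector<uint8_t>& payload, uint32_t padding_sz) {
    Header h;
    h.payload_sz = static_cast<uint32_t>(payload.size());
    h.padding_sz = padding_sz;

    Frame f;
    f.header.resize(sizeof h);
    std::memcpy(f.header.data(), &h, sizeof h);
    f.body = payload;
    // Zero padding is enough here: once encrypted, its content is not visible.
    f.body.resize(payload.size() + padding_sz, 0);
    return f;
}

// Recover the payload and drop the padding.
std::vector<uint8_t> parse_frame(const Frame& f) {
    if (f.header.size() != sizeof(Header))
        throw std::runtime_error("bad header size");
    Header h;
    std::memcpy(&h, f.header.data(), sizeof h);
    if (static_cast<uint64_t>(h.payload_sz) + h.padding_sz != f.body.size())
        throw std::runtime_error("body size does not match header");
    return std::vector<uint8_t>(f.body.begin(), f.body.begin() + h.payload_sz);
}
```

Since the encrypted header always has the same length, the receiver knows how many bytes to read first, and the decrypted header then tells it how many bytes the second message occupies.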

Comparing with the SSH Protocol

There are some major differences compared with the SSH protocol:

  1. The client cannot communicate with a server at all without knowing its public key. This rules out some non-secure use cases, but it is an advantage when privacy matters: the protocol doesn’t reveal whom the client is communicating with, which allows bi-directional anonymous communication.
  2. No negotiation between the server and the client. The protocol is pretty much fixed. However, future versions of the protocol could use the reserved flag field to provide backward compatibility.
  3. Replay prevention depends on persistent storage and timestamps, which is a downside: a misconfigured device clock can prevent the client from logging in, and the persistent storage adds another failure mode to the server. I can think of ways to prevent replay without timestamps, but not without adding an extra RTT.
  4. All these design differences lead to significantly fewer RTTs: the protocol is 1-RTT.
  5. The protocol is not obviously identifiable or classifiable. It contains no low-entropy bytes, and the length of a transmission can be obfuscated with padding. I’m not sure about this one, though: the complete lack of low-entropy bytes may itself be unusual among protocols and usable for classification.

Comparing with the TLS Protocol

Rather than building a new protocol, we could just use TLS with client certificates for authentication. This is a solid choice and needs far less work. There is a privacy concern with client certificates, though: in TLS 1.2 and earlier the client certificate is transmitted in plain text, which may upset some folks (TLS 1.3 encrypts it). TLS was not designed for anonymous communication and privacy, after all.

Further Explorations

While the handshake protocol is 1-RTT and logging into a server is significantly faster than with SSH, there is still noticeable latency. To reduce it further, I explored the TCP fast open options.

On a modern Linux kernel, TCP fast open can be done purely on the application side, and sysctl is not needed.

To use TCP fast open on the client, set the TCP_FASTOPEN_CONNECT and TCP_FASTOPEN_NO_COOKIE socket options to 1. After that, connect does nothing by itself; the TCP handshake only begins when the initial data is written, with a SYN packet that carries the data. Note that we are using TCP fast open without cookies, which makes SYN floods more damaging.

On the server side, set the TCP_FASTOPEN option to the queue size and TCP_FASTOPEN_NO_COOKIE to 1.
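A minimal Linux-only sketch of both sides is shown below. `enable_tfo_client` and `enable_tfo_server` are hypothetical helper names, and the fallback #define values are the Linux constants, included only in case older libc headers lack them. Whether the calls succeed depends on the running kernel.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Fallbacks for older headers; these are the Linux option numbers.
#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN 23
#endif
#ifndef TCP_FASTOPEN_CONNECT
#define TCP_FASTOPEN_CONNECT 30
#endif
#ifndef TCP_FASTOPEN_NO_COOKIE
#define TCP_FASTOPEN_NO_COOKIE 34
#endif

// Client: call before connect(). The SYN then goes out with the first write.
bool enable_tfo_client(int fd) {
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN_CONNECT, &one,
                      sizeof one) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN_NO_COOKIE, &one,
                      sizeof one) == 0;
}

// Server: call on the listening socket. queue_len bounds the number of
// pending fast-open connections.
bool enable_tfo_server(int listen_fd, int queue_len) {
    int one = 1;
    return setsockopt(listen_fd, IPPROTO_TCP, TCP_FASTOPEN, &queue_len,
                      sizeof queue_len) == 0 &&
           setsockopt(listen_fd, IPPROTO_TCP, TCP_FASTOPEN_NO_COOKIE, &one,
                      sizeof one) == 0;
}
```

Both helpers return false instead of failing hard, so the caller can fall back to a regular TCP handshake on kernels that do not support these options.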

As with TCP fast open, we can also start sending transport protocol messages before the handshake response arrives. Nothing in the protocol prevents this, and it saves another RTT.


Thanks to libsodium, I now have my own remote login protocol. The actual implementation of the remote login server might be covered in a future post.