Introduction
libsodium is a modern, easy-to-use software library for encryption, decryption, signatures, password hashing, and more. Its design choices emphasize security and ease of use. Compared to the popular OpenSSL library, the libsodium API is fairly high-level and offers only a few primitives, which is one of the reasons for its ease of use. As part of the learning process, I built a toy remote login protocol (SSH-like) with libsodium. I was also annoyed by slow SSH logins, which drove me to dig into the SSH protocol.
SSH is a cryptographic network protocol that provides authentication, confidentiality, and integrity. It uses public-key cryptography for authentication.
Getting Started
Since I am not an expert in security, I started by browsing the libsodium documentation. Obviously, I need public-key cryptography for this purpose. Under the public-key cryptography section there are three primitives:
- Authenticated encryption.
- Public-key signatures.
- Sealed boxes.
I used all three, and only these three, primitives for my remote login protocol.
It is also helpful to learn how the SSH protocol works. The specification of the SSH protocol is scattered across several RFCs, so watching an SSH session in Wireshark is an easier way to get started. Adding the -vvv option to the SSH client is helpful too.
The SSH Handshake
The SSH protocol starts by exchanging version strings between the client and the server. Let’s just skip this. There is no need to waste an RTT on a toy protocol.
The next step is the key exchange. The client sends a list of algorithms, the server replies with its own list, the client chooses an acceptable algorithm, and then the actual key exchange happens. In my case, the algorithm is ECDH, and the exchange takes 2 RTTs in total.
After the key exchange, the communication is encrypted, so Wireshark is of no use from here on. The next thing to do is authentication. Based on my reading of the RFCs and the SSH client logs, the process looks like this:
- client: I need to authenticate myself as user foo. server: OK, list of authentication methods.
- client: I’m going to authenticate myself with this public key. server: OK, key accepted.
- client: I’m authenticating myself with a signature signed by my key. server: Done.
Lots of RTTs for an SSH login.
The Handshake Protocol with libsodium
Let’s go back to the beginning and review the situation before the handshake:
- Both the client and the server have a key pair.
- The server has a list of accepted client public keys.
- The client knows the server’s public key. (communicating with an unknown server is unwise for privacy reasons)
With the server’s public key, libsodium provides a way (and the only way) to talk to the server: sealed boxes.
The client sends a message sealed with the server’s public key, and only the server can open it. If we put the client’s public key into that message and sign the message, the message effectively identifies and authenticates the client. After the authentication, the server can talk back to the client using the same protocol (to authenticate the server itself).
The primitive for authentication (signing) is public-key signatures.
This handshake protocol is vastly simpler than the SSH handshake. It has no support for negotiating algorithms, and the only authentication method is via public keys. Authentication completes in a single RTT.
The Transport Protocol with libsodium
Now that the handshake is done using the sealed box, can we just use the sealed box for the following transmissions? This is not done in practice because public-key cryptography operations are expensive. The SSH protocol uses key exchange algorithms to create the key for symmetric encryption, which is used for transmissions during and after the handshake. Messages transmitted by SSH also need to be authenticated, which SSH does with a message authentication code rather than a public-key signature.
libsodium covers this use case with the authenticated encryption primitive. That is, a construction that combines authentication with encryption, which is easy to use and hard to misuse.
To use the authenticated encryption primitive:
- Compute the shared key using my secret key and the other side’s public key. (The key exchange.)
- The shared key is then used for both symmetric encryption and authentication.
This doesn’t require extra RTTs, since each side already knows the other’s public key: the client knows the server’s key in advance, and the server learns the client’s key from the handshake message.
The Handshake Protocol Implementation
Here is the actual handshake message used in my implementation.
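struct __attribute__((packed)) Hello {
    uint8_t sign[crypto_sign_BYTES];                   // detached signature (covers the padding too)
    uint8_t src_pk[crypto_sign_PUBLICKEYBYTES] = {};   // sender's public key
    uint8_t dst_pk[crypto_sign_PUBLICKEYBYTES] = {};   // receiver's public key
    uint8_t session[k_session_size] = {};              // random bytes generated by the client
    uint64_t ts_msec = 0;                              // client's timestamp
    uint64_t flag = 0;                                 // reserved for extensions and optional features
    uint64_t padding_sz = 0;                           // size of the trailing padding
    uint8_t padding[0];                                // optional padding (zero-length array, a compiler extension)
};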
This message is sealed using the sealed box primitive. Besides the public key and the signature, it also contains a few extra fields:
- The session is some random bytes generated by the client, and ts_msec is the client’s timestamp. The two fields are crucial in preventing replayed messages.
- The padding_sz is the size of the optional padding at the end of the handshake message. The purpose of this padding is obfuscation, making the protocol less identifiable and classifiable.
- The flag could potentially be used for extending the protocol in a backward-compatible way, or for optional features.
For the sender, after filling in the message, use crypto_sign_detached to generate the signature (which covers the padding), then use crypto_box_seal to seal the message, and transmit it.
For the receiver, the process is:
- Open the message with crypto_box_seal_open.
- Verify the signature with crypto_sign_verify_detached.
- Authentication.
  - For the client: the message itself is an authentication.
  - For the server: check whether the client’s key is acceptable.
- Preventing replayed messages.
  - For the client: the session and timestamp fields should be a copy of what the client sent to the server. Also, check the public keys in the message.
  - For the server: it maintains a persistent storage containing a list of recent sessions. The server must reject the message if the session is in the list, or the timestamp is too old to verify.
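The server-side replay check can be sketched as follows. This is only an illustration under assumptions: the names ReplayGuard, Session, and kSessionSize are hypothetical, the session size is made up (the real k_session_size is not shown), and a std::map stands in for the persistent storage that the real server would need.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical session size; the real k_session_size is not shown in the post.
constexpr std::size_t kSessionSize = 16;
using Session = std::array<uint8_t, kSessionSize>;

// In-memory stand-in for the server's persistent list of recent sessions.
class ReplayGuard {
public:
    explicit ReplayGuard(uint64_t window_msec) : window_msec_(window_msec) {
        assert(window_msec > 0);
    }

    // Returns true and records the session if the handshake looks fresh.
    // Returns false if the timestamp is outside the window or the session
    // has been seen before.
    bool accept(const Session& session, uint64_t ts_msec, uint64_t now_msec) {
        const uint64_t skew =
            now_msec > ts_msec ? now_msec - ts_msec : ts_msec - now_msec;
        if (skew > window_msec_) return false;

        // Forget sessions whose timestamp has expired: a replay of those
        // would already fail the timestamp check above.
        for (auto it = seen_.begin(); it != seen_.end();) {
            const bool expired =
                now_msec > it->second && now_msec - it->second > window_msec_;
            if (expired) {
                it = seen_.erase(it);
            } else {
                ++it;
            }
        }
        // emplace() reports false in .second when the session already exists.
        return seen_.emplace(session, ts_msec).second;
    }

private:
    uint64_t window_msec_;
    std::map<Session, uint64_t> seen_;  // session -> timestamp it carried
};
```

The timestamp window is what keeps the list bounded: once a session is too old to pass the timestamp check, it no longer needs to be remembered.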
The Transport Protocol Implementation
The transport protocol is just the authenticated encryption primitive. To use the authenticated encryption, first generate the shared key using crypto_box_beforenm. Encryption and decryption are done with crypto_box_detached_afternm and crypto_box_open_detached_afternm.
A minor problem is that the key format for public-key signatures is different from the one for authenticated encryption, so the keys need to be converted using crypto_sign_ed25519_pk_to_curve25519 and crypto_sign_ed25519_sk_to_curve25519.
Authenticated encryption also takes a nonce, which must never be reused with the same key. Luckily, the session and timestamp combination is already considered unique after a successful handshake, so we can add a counter to it to create as many nonces as needed. To differentiate nonces from the server and the client, the server uses even numbers for the counter while the client uses odd ones. The nonce is NOT transmitted since the counter can be inferred by the receiver.
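One possible way to lay out such a nonce is sketched below. The layout is an assumption, not the format used by the actual implementation: it assumes an 8-byte session (the real k_session_size is not shown) so that session, timestamp, and counter exactly fill the 24 bytes of crypto_box_NONCEBYTES. The names make_nonce, Role, and put_u64_le are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNonceSize = 24;   // same value as crypto_box_NONCEBYTES
constexpr std::size_t kSessionSize = 8;  // assumed; the real k_session_size is not shown
static_assert(kSessionSize + 16 == kNonceSize, "layout must fill the nonce");

using Nonce = std::array<uint8_t, kNonceSize>;
using Session = std::array<uint8_t, kSessionSize>;

enum class Role : uint64_t { Server = 0, Client = 1 };

// Writes v as 8 little-endian bytes starting at out.
inline void put_u64_le(uint8_t* out, uint64_t v) {
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<uint8_t>(v >> (8 * i));
    }
}

// Nonce for the seq-th message (0, 1, 2, ...) sent by `sender`.
// The counter is 2 * seq + role, so the server only produces even counters
// and the client only odd ones.
inline Nonce make_nonce(const Session& session, uint64_t ts_msec,
                        Role sender, uint64_t seq) {
    assert(seq < (UINT64_MAX >> 1));  // the counter must never wrap around
    Nonce n{};
    for (std::size_t i = 0; i < kSessionSize; ++i) {
        n[i] = session[i];
    }
    put_u64_le(n.data() + kSessionSize, ts_msec);
    put_u64_le(n.data() + kSessionSize + 8,
               2 * seq + static_cast<uint64_t>(sender));
    return n;
}
```

Both sides keep their own send and receive sequence numbers, so the receiver can rebuild the expected nonce locally, which is why the nonce never has to go on the wire.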
There is another problem: the size of the message. Like the handshake message, which consists of a fixed-size part and variable-size padding, the transport protocol uses a fixed-size header that carries the size of the message and the size of the padding. The message data and the padding then follow as a second message. Both the header and the message are encrypted with the authenticated encryption primitive.
struct __attribute__((packed)) Header {
    uint32_t payload_sz = 0;
    uint32_t padding_sz = 0;
    // more fields ...
};
As mentioned before, padding is a form of obfuscation. It also reduces the information leaked by the length of a transmission.
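How much padding to add is a policy decision that this post does not specify. As one hypothetical policy, the sketch below rounds every message up to a multiple of a fixed bucket size, so an observer only learns the length to the nearest bucket. The function make_header and the bucket parameter are made up for illustration, and the Header here repeats only the two fields shown above.

```cpp
#include <cassert>
#include <cstdint>

// Same two fields as the transport header above; other fields are omitted.
struct __attribute__((packed)) Header {
    uint32_t payload_sz = 0;
    uint32_t padding_sz = 0;
};
static_assert(sizeof(Header) == 8, "packed header has a fixed size");

// Hypothetical policy: choose the padding so that payload_sz + padding_sz
// is a multiple of `bucket`.
inline Header make_header(uint32_t payload_sz, uint32_t bucket) {
    assert(bucket > 0);
    Header h;
    h.payload_sz = payload_sz;
    h.padding_sz = (bucket - payload_sz % bucket) % bucket;
    return h;
}
```

For example, with a bucket of 256 bytes, a 300-byte payload gets 212 bytes of padding, for 512 bytes in total. Randomized padding would be another option; this sketch makes no claim about which is better.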
Comparing with the SSH Protocol
There are some major differences compared with the SSH protocol:
- The client cannot communicate with a server at all without knowing its public key. This prevents some non-secure use cases, but it is a pro when privacy is a consideration: the protocol doesn’t reveal whom the client is communicating with, which allows bi-directional anonymous communication.
- No negotiation between the server and the client. The protocol is pretty much fixed. However, future versions of the protocol could use the reserved flag field to provide backward compatibility.
- The replay prevention depends on persistent storage and timestamps, which is a downside: a misconfigured device clock can prevent the client from logging in, and the persistent storage adds another failure mode to the server. I can think of ways to prevent replay without using timestamps, but not without adding an extra RTT.
- All these design differences lead to significantly fewer RTTs: the protocol is 1-RTT.
- The protocol is not obviously identifiable and classifiable. It does not contain low-entropy bytes, and the length of the transmission can be obfuscated via padding. I’m not sure about this one; perhaps the complete lack of low-entropy bytes is a unique feature among protocols and can itself be used for classification.
Comparing with the TLS Protocol
Rather than build a new protocol, we could just use the TLS protocol, and use client certificates for authentication. This is a solid choice and far less work is needed. There is a privacy concern when using client certificates, though: in TLS 1.2 and earlier, the client certificate is transmitted in plain text, which may upset some folks (TLS 1.3 encrypts it). TLS was not designed for anonymous communication after all.
Further Explorations
While the handshake protocol is 1-RTT, and logging into a server is significantly faster than with SSH, there is still a noticeable latency. To reduce the latency further, I explored the TCP fast open options.
On a modern Linux kernel, TCP fast open can be enabled purely on the application side, and sysctl is not needed.
To use TCP fast open on the client, set the TCP_FASTOPEN_CONNECT and TCP_FASTOPEN_NO_COOKIE socket options to 1. After setting the two options, connect does nothing; the TCP handshake only begins when the initial data is written, with a SYN packet that contains data. Note that we are using TCP fast open without cookies, which makes SYN floods more damaging.
On the server side, the TCP_FASTOPEN option sets the queue size, and TCP_FASTOPEN_NO_COOKIE is set to 1.
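The socket options above could be applied roughly like this on Linux. It is a sketch, not the code from my implementation: the helper names (set_tcp_opt, enable_tfo_client, enable_tfo_server) are made up, the fallback option numbers are the ones from the Linux UAPI headers for the case where an older libc does not define them, and whether the kernel accepts each option depends on its version and configuration.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Fallbacks for older libc headers; values match Linux's <linux/tcp.h>.
#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN 23
#endif
#ifndef TCP_FASTOPEN_CONNECT
#define TCP_FASTOPEN_CONNECT 30
#endif
#ifndef TCP_FASTOPEN_NO_COOKIE
#define TCP_FASTOPEN_NO_COOKIE 34
#endif

// Sets an integer TCP-level socket option; returns true on success.
inline bool set_tcp_opt(int fd, int opt, int value) {
    return setsockopt(fd, IPPROTO_TCP, opt, &value, sizeof(value)) == 0;
}

// Client: call on a fresh TCP socket before connect(). With these options
// the SYN is deferred until the first write, so it can carry data.
inline bool enable_tfo_client(int fd) {
    return set_tcp_opt(fd, TCP_FASTOPEN_CONNECT, 1) &&
           set_tcp_opt(fd, TCP_FASTOPEN_NO_COOKIE, 1);
}

// Server: call on the listening socket. queue_len is the fast open
// queue size mentioned above.
inline bool enable_tfo_server(int listen_fd, int queue_len) {
    return set_tcp_opt(listen_fd, TCP_FASTOPEN, queue_len) &&
           set_tcp_opt(listen_fd, TCP_FASTOPEN_NO_COOKIE, 1);
}
```

Both helpers return false when the kernel rejects an option, in which case the application can simply carry on with a regular TCP handshake.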
In the same spirit as TCP fast open, we can also start sending transport protocol messages before the handshake response arrives. Nothing in the protocol prevents us from doing so, and it saves another RTT.
Conclusion
Thanks to libsodium, I now have my own remote login protocol. The actual implementation of the remote login server might be covered in a future post.