09. Data Serialization
So far, a response in our server protocol is an error code plus a string.
What if we need to return more complicated data? For example, we might
add a keys command that returns a list of strings. We have already
encoded list-of-strings data in the request protocol. In this chapter,
we will generalize the encoding to handle different types of data.
This is often called “serialization”.
Our serialization protocol consists of five types of data:
enum {
    SER_NIL = 0,
    SER_ERR = 1,
    SER_STR = 2,
    SER_INT = 3,
    SER_ARR = 4,
};
SER_NIL is like NULL, SER_ERR is for returning an error code and
message, SER_STR and SER_INT are for strings and int64 values, and
SER_ARR is for arrays.
The code listing starts with the try_one_request function:
static bool try_one_request(Conn *conn) {
    // code omitted...

    // parse the request
    std::vector<std::string> cmd;
    if (0 != parse_req(&conn->rbuf[4], len, cmd)) {
        msg("bad req");
        conn->state = STATE_END;
        return false;
    }

    // got one request, generate the response.
    std::string out;
    do_request(cmd, out);

    // pack the response into the buffer
    if (4 + out.size() > k_max_msg) {
        out.clear();
        out_err(out, ERR_2BIG, "response is too big");
    }
    uint32_t wlen = (uint32_t)out.size();
    memcpy(&conn->wbuf[0], &wlen, 4);
    memcpy(&conn->wbuf[4], out.data(), out.size());
    conn->wbuf_size = 4 + wlen;

    // code omitted...
}
For convenience, std::string was used to hold the response data.
Production-grade projects often have more sophisticated ways to manage
buffers.
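The packing step can also be looked at in isolation. The following is a minimal sketch, not the server's actual code: the helper name frame_response is hypothetical, it writes into a std::vector instead of conn->wbuf, and the values of k_max_msg and ERR_2BIG are assumptions since this chapter does not state them.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <string>
#include <vector>

const size_t k_max_msg = 4096;  // assumed limit; not stated in this chapter
enum { SER_ERR = 1 };           // type tag for errors
enum { ERR_2BIG = 2 };          // assumed numeric value

static void out_err(std::string &out, int32_t code, const std::string &msg) {
    out.push_back(SER_ERR);
    out.append((const char *)&code, 4);
    uint32_t len = (uint32_t)msg.size();
    out.append((const char *)&len, 4);
    out.append(msg);
}

// Hypothetical helper: wrap a serialized response in a 4-byte length prefix.
// An oversized response is replaced by a short ERR_2BIG error.
static std::vector<uint8_t> frame_response(std::string out) {
    if (4 + out.size() > k_max_msg) {
        out.clear();
        out_err(out, ERR_2BIG, "response is too big");
    }
    uint32_t wlen = (uint32_t)out.size();
    std::vector<uint8_t> wbuf(4 + wlen);
    memcpy(wbuf.data(), &wlen, 4);
    memcpy(wbuf.data() + 4, out.data(), out.size());
    assert(wbuf.size() <= k_max_msg);  // the frame always fits the buffer
    return wbuf;
}
```

Calling frame_response("abc") yields 7 bytes: the length 3 followed by the body. A 5000-byte body is replaced by the error message, so the frame never exceeds k_max_msg.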
A new command, keys, was added to the do_request handler:
static void do_request(std::vector<std::string> &cmd, std::string &out) {
    if (cmd.size() == 1 && cmd_is(cmd[0], "keys")) {
        do_keys(cmd, out);
    } else if (cmd.size() == 2 && cmd_is(cmd[0], "get")) {
        do_get(cmd, out);
    } else if (cmd.size() == 3 && cmd_is(cmd[0], "set")) {
        do_set(cmd, out);
    } else if (cmd.size() == 2 && cmd_is(cmd[0], "del")) {
        do_del(cmd, out);
    } else {
        // cmd is not recognized
        out_err(out, ERR_UNKNOWN, "Unknown cmd");
    }
}
The code for our serialization protocol:
static void out_nil(std::string &out) {
    out.push_back(SER_NIL);
}

static void out_str(std::string &out, const std::string &val) {
    out.push_back(SER_STR);
    uint32_t len = (uint32_t)val.size();
    out.append((char *)&len, 4);
    out.append(val);
}

static void out_int(std::string &out, int64_t val) {
    out.push_back(SER_INT);
    out.append((char *)&val, 8);
}

static void out_err(std::string &out, int32_t code, const std::string &msg) {
    out.push_back(SER_ERR);
    out.append((char *)&code, 4);
    uint32_t len = (uint32_t)msg.size();
    out.append((char *)&len, 4);
    out.append(msg);
}

static void out_arr(std::string &out, uint32_t n) {
    out.push_back(SER_ARR);
    out.append((char *)&n, 4);
}
As we can see, each serialized value starts with a one-byte type tag, followed by a type-specific payload. Lengths and integers are copied as raw bytes, so they are in the host's byte order. Arrays store their element count first, then the elements themselves, which may be nested arrays.
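To make the byte layout concrete, here is a small self-contained sketch that serializes a two-element array and dumps it as hex. The out_* helpers mirror the ones in this chapter; to_hex and example_bytes are hypothetical names used only for illustration, and the hex dump in the comment assumes a little-endian machine.

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string>

enum { SER_NIL = 0, SER_ERR = 1, SER_STR = 2, SER_INT = 3, SER_ARR = 4 };

static void out_str(std::string &out, const std::string &val) {
    out.push_back(SER_STR);
    uint32_t len = (uint32_t)val.size();
    out.append((const char *)&len, 4);
    out.append(val);
}

static void out_int(std::string &out, int64_t val) {
    out.push_back(SER_INT);
    out.append((const char *)&val, 8);
}

static void out_arr(std::string &out, uint32_t n) {
    out.push_back(SER_ARR);
    out.append((const char *)&n, 4);
}

// Hypothetical helper: render a byte string as space-separated hex.
static std::string to_hex(const std::string &bytes) {
    std::string hex;
    char buf[4];
    for (unsigned char c : bytes) {
        snprintf(buf, sizeof(buf), "%02x ", (unsigned)c);
        hex += buf;
    }
    if (!hex.empty()) {
        hex.pop_back();  // drop the trailing space
    }
    return hex;
}

// Hypothetical example: serialize the array ["k", 1].
static std::string example_bytes() {
    std::string out;
    out_arr(out, 2);     // 1-byte tag + 4-byte element count
    out_str(out, "k");   // 1-byte tag + 4-byte length + 1 byte of data
    out_int(out, 1);     // 1-byte tag + 8-byte integer
    assert(out.size() == 5 + 6 + 9);
    return out;
}

// On a little-endian machine, to_hex(example_bytes()) is:
// 04 02 00 00 00 02 01 00 00 00 6b 03 01 00 00 00 00 00 00 00
```

The array header only says how many values follow. Each element carries its own tag and length, which is what allows arrays to nest.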
The do_keys function generates a response consisting of a list of
strings:
static void h_scan(HTab *tab, void (*f)(HNode *, void *), void *arg) {
    if (tab->size == 0) {
        return;
    }
    for (size_t i = 0; i < tab->mask + 1; ++i) {
        HNode *node = tab->tab[i];
        while (node) {
            f(node, arg);
            node = node->next;
        }
    }
}

static void cb_scan(HNode *node, void *arg) {
    std::string &out = *(std::string *)arg;
    out_str(out, container_of(node, Entry, node)->key);
}

static void do_keys(std::vector<std::string> &cmd, std::string &out) {
    (void)cmd;
    out_arr(out, (uint32_t)hm_size(&g_data.db));
    h_scan(&g_data.db.ht1, &cb_scan, &out);
    h_scan(&g_data.db.ht2, &cb_scan, &out);
}
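The scan relies on two C-style idioms: a callback that receives its state through a void pointer, and container_of to get from an embedded node back to its enclosing struct. The sketch below shows both on a single linked chain. It is a simplified illustration, not the hashtable from this book: the container_of definition is the usual offsetof-based form and is an assumption here, Entry holds a const char * instead of a std::string, and scan_chain, cb_collect, and example_scan are hypothetical names.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string>

// Assumed definition: the usual offsetof-based intrusive-container idiom.
#define container_of(ptr, T, member) \
    ((T *)((char *)(ptr) - offsetof(T, member)))

struct HNode {
    HNode *next;
};

// Simplified entry: the node is embedded in the struct that owns the key.
struct Entry {
    HNode node;
    const char *key;
};

// Visit every node in one chain, handing each to the callback.
static void scan_chain(HNode *head, void (*f)(HNode *, void *), void *arg) {
    assert(f != nullptr);
    for (HNode *node = head; node; node = node->next) {
        f(node, arg);
    }
}

// The callback recovers its state from `arg` and the Entry from the node.
static void cb_collect(HNode *node, void *arg) {
    std::string &out = *(std::string *)arg;
    out += container_of(node, Entry, node)->key;
    out += ' ';
}

// Hypothetical example: a chain b -> a, scanned from the head.
static std::string example_scan() {
    Entry a = {{nullptr}, "a"};
    Entry b = {{&a.node}, "b"};
    std::string out;
    scan_chain(&b.node, &cb_collect, &out);
    return out;  // "b a "
}
```

Note that do_keys writes the array header before scanning, so the count passed to out_arr has to match the number of strings the callbacks emit.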
The del command responds with an integer indicating whether the
deletion took place.
static void do_del(std::vector<std::string> &cmd, std::string &out) {
    Entry key;
    key.key.swap(cmd[1]);
    key.node.hcode = str_hash((uint8_t *)key.key.data(), key.key.size());

    HNode *node = hm_pop(&g_data.db, &key.node, &entry_eq);
    if (node) {
        delete container_of(node, Entry, node);
    }
    return out_int(out, node ? 1 : 0);
}
The code for the other commands is unremarkable, so it is not listed here.
The client-side “deserialization” code:
static int32_t on_response(const uint8_t *data, size_t size) {
    if (size < 1) {
        msg("bad response");
        return -1;
    }
    switch (data[0]) {
    case SER_NIL:
        printf("(nil)\n");
        return 1;
    case SER_ERR:
        if (size < 1 + 8) {
            msg("bad response");
            return -1;
        }
        {
            int32_t code = 0;
            uint32_t len = 0;
            memcpy(&code, &data[1], 4);
            memcpy(&len, &data[1 + 4], 4);
            if (size < 1 + 8 + len) {
                msg("bad response");
                return -1;
            }
            printf("(err) %d %.*s\n", code, len, &data[1 + 8]);
            return 1 + 8 + len;
        }
    case SER_STR:
        // code omitted...
    case SER_INT:
        // code omitted...
    case SER_ARR:
        if (size < 1 + 4) {
            msg("bad response");
            return -1;
        }
        {
            uint32_t len = 0;
            memcpy(&len, &data[1], 4);
            printf("(arr) len=%u\n", len);
            // recursively decode each element, tracking bytes consumed
            size_t arr_bytes = 1 + 4;
            for (uint32_t i = 0; i < len; ++i) {
                int32_t rv = on_response(&data[arr_bytes], size - arr_bytes);
                if (rv < 0) {
                    return rv;
                }
                arr_bytes += (size_t)rv;
            }
            printf("(arr) end\n");
            return (int32_t)arr_bytes;
        }
    default:
        msg("bad response");
        return -1;
    }
}
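The string and integer cases follow the same pattern as the error case: check that the header fits, read the fields with memcpy, then check that the payload fits. The following is a sketch of how they might look, not the book's omitted code. The function decode is a hypothetical variant of on_response that appends text to a std::string instead of printing, and it leaves out the SER_ERR case for brevity.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <string>

enum { SER_NIL = 0, SER_ERR = 1, SER_STR = 2, SER_INT = 3, SER_ARR = 4 };

// Hypothetical variant of on_response: appends the decoded text to `text`.
// Returns the number of bytes consumed, or -1 on malformed input.
static int32_t decode(const uint8_t *data, size_t size, std::string &text) {
    assert(data != nullptr || size == 0);
    if (size < 1) {
        return -1;
    }
    switch (data[0]) {
    case SER_NIL:
        text += "(nil)\n";
        return 1;
    case SER_STR: {
        if (size < 1 + 4) {
            return -1;
        }
        uint32_t len = 0;
        memcpy(&len, &data[1], 4);
        if (size - (1 + 4) < len) {  // payload must fit in what is left
            return -1;
        }
        text += "(str) ";
        text.append((const char *)&data[1 + 4], len);
        text += "\n";
        return (int32_t)(1 + 4 + len);
    }
    case SER_INT: {
        if (size < 1 + 8) {
            return -1;
        }
        int64_t val = 0;
        memcpy(&val, &data[1], 8);
        text += "(int) " + std::to_string(val) + "\n";
        return 1 + 8;
    }
    case SER_ARR: {
        if (size < 1 + 4) {
            return -1;
        }
        uint32_t n = 0;
        memcpy(&n, &data[1], 4);
        text += "(arr) len=" + std::to_string(n) + "\n";
        size_t used = 1 + 4;
        for (uint32_t i = 0; i < n; ++i) {
            int32_t rv = decode(&data[used], size - used, text);
            if (rv < 0) {
                return rv;
            }
            used += (size_t)rv;
        }
        text += "(arr) end\n";
        return (int32_t)used;
    }
    default:
        return -1;  // SER_ERR and unknown tags are not handled in this sketch
    }
}
```

Returning the number of bytes consumed is what makes the array case work: the caller does not know where one element ends and the next begins until the element has been decoded.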
Testing our new server/client:
$ ./client asdf
(err) 1 Unknown cmd
$ ./client get asdf
(nil)
$ ./client set k v
(nil)
$ ./client get k
(str) v
$ ./client keys
(arr) len=1
(str) k
(arr) end
$ ./client del k
(int) 1
$ ./client del k
(int) 0
$ ./client keys
(arr) len=0
(arr) end