09. Data Serialization
So far, a response in our server protocol is an error code plus a string.
What if we need to return more complicated data? For example, we could
add the keys command, which returns a list of strings. We
have already encoded the list-of-strings data in the request protocol.
In this chapter, we will generalize the encoding to handle different
types of data. This is often called “serialization”.
9.1 The Command Interface
Our serialization protocol consists of five data types:
enum {
    SER_NIL = 0,    // Like `NULL`
    SER_ERR = 1,    // An error code and message
    SER_STR = 2,    // A string
    SER_INT = 3,    // An int64
    SER_ARR = 4,    // Array
};
The array type can contain any type of data, even nested arrays.
The code listing starts with the try_one_request function:
static bool try_one_request(Conn *conn) {
    // code omitted...

    // parse the request
    std::vector<std::string> cmd;
    if (0 != parse_req(&conn->rbuf[4], len, cmd)) {
        msg("bad req");
        conn->state = STATE_END;
        return false;
    }

    // got one request, generate the response.
    std::string out;
    do_request(cmd, out);

    // pack the response into the buffer
    if (4 + out.size() > k_max_msg) {
        out.clear();
        out_err(out, ERR_2BIG, "response is too big");
    }
    uint32_t wlen = (uint32_t)out.size();
    memcpy(&conn->wbuf[0], &wlen, 4);
    memcpy(&conn->wbuf[4], out.data(), out.size());
    conn->wbuf_size = 4 + wlen;

    // code omitted...
}
For convenience, std::string is used to hold the
response data. Production-grade projects often have more sophisticated
ways to manage buffers.
The new command keys is added to the do_request handler:
static void do_request(std::vector<std::string> &cmd, std::string &out) {
    if (cmd.size() == 1 && cmd_is(cmd[0], "keys")) {
        do_keys(cmd, out);
    } else if (cmd.size() == 2 && cmd_is(cmd[0], "get")) {
        do_get(cmd, out);
    } else if (cmd.size() == 3 && cmd_is(cmd[0], "set")) {
        do_set(cmd, out);
    } else if (cmd.size() == 2 && cmd_is(cmd[0], "del")) {
        do_del(cmd, out);
    } else {
        // cmd is not recognized
        out_err(out, ERR_UNKNOWN, "Unknown cmd");
    }
}
9.2 Data Encoding Scheme
The code for our serialization protocol:
static void out_nil(std::string &out) {
    out.push_back(SER_NIL);
}

static void out_str(std::string &out, const std::string &val) {
    out.push_back(SER_STR);
    uint32_t len = (uint32_t)val.size();
    out.append((char *)&len, 4);
    out.append(val);
}

static void out_int(std::string &out, int64_t val) {
    out.push_back(SER_INT);
    out.append((char *)&val, 8);
}

static void out_err(std::string &out, int32_t code, const std::string &msg) {
    out.push_back(SER_ERR);
    out.append((char *)&code, 4);
    uint32_t len = (uint32_t)msg.size();
    out.append((char *)&len, 4);
    out.append(msg);
}

static void out_arr(std::string &out, uint32_t n) {
    out.push_back(SER_ARR);
    out.append((char *)&n, 4);
}
As we can see, every serialized value starts with one byte for the data type, followed by a payload whose layout depends on the type. An array writes its element count first, then its elements, which may themselves be arrays. Integers and lengths are copied in the host's native byte order, so the client and the server must agree on endianness.
The serialization scheme can be summarized as “type-length-value” (TLV): “Type” indicates the type of the value; “Length” is for variable-length data such as strings or arrays; “Value” is the encoded data, which comes last.
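As a worked example of this layout, the sketch below re-declares the type tags and three of the encoders from this chapter so that it compiles on its own, then encodes the nested value ["k", [42]]. The helper name encode_example is hypothetical, and the byte counts in the comments simply add up the tag, length, and payload sizes written by each call.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum { SER_NIL = 0, SER_ERR = 1, SER_STR = 2, SER_INT = 3, SER_ARR = 4 };

static void out_str(std::string &out, const std::string &val) {
    out.push_back(SER_STR);
    uint32_t len = (uint32_t)val.size();
    out.append((char *)&len, 4);
    out.append(val);
}

static void out_int(std::string &out, int64_t val) {
    out.push_back(SER_INT);
    out.append((char *)&val, 8);
}

static void out_arr(std::string &out, uint32_t n) {
    out.push_back(SER_ARR);
    out.append((char *)&n, 4);
}

// Hypothetical helper: encodes the nested value ["k", [42]].
static std::string encode_example() {
    std::string out;
    out_arr(out, 2);    // outer array header: 1 tag byte + 4 count bytes
    out_str(out, "k");  // 1 tag byte + 4 length bytes + 1 data byte
    out_arr(out, 1);    // inner array header: 1 + 4 bytes
    out_int(out, 42);   // 1 tag byte + 8 value bytes
    assert(out.size() == 5 + 6 + 5 + 9);
    return out;
}
```

Note that the array header only stores the element count, not a byte length, so a decoder has to walk every element to find where the array ends.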
TLV is the basis of many real-world serialization protocols. It has many advantages:
- Like JSON or XML, it can be decoded without a schema, which enables some types of middleware.
- It can encode arbitrarily nested data.
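To illustrate the first point, here is a minimal sketch of schema-less decoding: a function that measures the encoded size of any value in this chapter's format using only the type tags and length fields, without knowing what the data means. The name tlv_size is hypothetical and the function is not part of the chapter's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum { SER_NIL = 0, SER_ERR = 1, SER_STR = 2, SER_INT = 3, SER_ARR = 4 };

// Hypothetical helper: returns the encoded size of the value at `data`,
// or -1 if the buffer is truncated or the type tag is unknown.
static int64_t tlv_size(const uint8_t *data, size_t size) {
    assert(data != nullptr);
    if (size < 1) {
        return -1;
    }
    uint32_t len = 0;
    switch (data[0]) {
    case SER_NIL:
        return 1;
    case SER_INT:
        return size >= 1 + 8 ? 1 + 8 : -1;
    case SER_STR:   // tag, 4-byte length, data
        if (size < 1 + 4) {
            return -1;
        }
        memcpy(&len, data + 1, 4);
        return size - 5 >= len ? (int64_t)5 + len : -1;
    case SER_ERR:   // tag, 4-byte code, 4-byte length, message
        if (size < 1 + 8) {
            return -1;
        }
        memcpy(&len, data + 1 + 4, 4);
        return size - 9 >= len ? (int64_t)9 + len : -1;
    case SER_ARR: { // tag, 4-byte count, then `len` nested values
        if (size < 1 + 4) {
            return -1;
        }
        memcpy(&len, data + 1, 4);
        size_t pos = 1 + 4;
        for (uint32_t i = 0; i < len; ++i) {
            int64_t rv = tlv_size(data + pos, size - pos);
            if (rv < 0) {
                return -1;
            }
            pos += (size_t)rv;
        }
        return (int64_t)pos;
    }
    default:
        return -1;
    }
}
```

A proxy could use a function like this to split a byte stream into whole values and forward them untouched. The recursion depth follows the nesting depth of the input, so code that handles untrusted data would probably want a depth limit.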
The Thrift RPC framework includes 2 serialization schemes, both derived from the TLV scheme. You can learn more by reading the specification and comparing it to the popular Protobuf scheme.
9.3 Command Responses
The do_keys function generates a response consisting of
a list of strings:
static void h_scan(HTab *tab, void (*f)(HNode *, void *), void *arg) {
    if (tab->size == 0) {
        return;
    }
    for (size_t i = 0; i < tab->mask + 1; ++i) {
        HNode *node = tab->tab[i];
        while (node) {
            f(node, arg);
            node = node->next;
        }
    }
}

static void cb_scan(HNode *node, void *arg) {
    std::string &out = *(std::string *)arg;
    out_str(out, container_of(node, Entry, node)->key);
}

static void do_keys(std::vector<std::string> &cmd, std::string &out) {
    (void)cmd;
    out_arr(out, (uint32_t)hm_size(&g_data.db));
    h_scan(&g_data.db.ht1, &cb_scan, &out);
    h_scan(&g_data.db.ht2, &cb_scan, &out);
}
The del command responds with an integer indicating
whether the deletion took place.
static void do_del(std::vector<std::string> &cmd, std::string &out) {
    Entry key;
    key.key.swap(cmd[1]);
    key.node.hcode = str_hash((uint8_t *)key.key.data(), key.key.size());

    HNode *node = hm_pop(&g_data.db, &key.node, &entry_eq);
    if (node) {
        delete container_of(node, Entry, node);
    }
    return out_int(out, node ? 1 : 0);
}
The code for the other commands is unremarkable, so we’ll skip it.
9.4 The Client and Testing
The client “deserialization” code:
static int32_t on_response(const uint8_t *data, size_t size) {
    if (size < 1) {
        msg("bad response");
        return -1;
    }
    switch (data[0]) {
    case SER_NIL:
        printf("(nil)\n");
        return 1;
    case SER_ERR:
        if (size < 1 + 8) {
            msg("bad response");
            return -1;
        }
        {
            int32_t code = 0;
            uint32_t len = 0;
            memcpy(&code, &data[1], 4);
            memcpy(&len, &data[1 + 4], 4);
            if (size < 1 + 8 + len) {
                msg("bad response");
                return -1;
            }
            printf("(err) %d %.*s\n", code, len, &data[1 + 8]);
            return 1 + 8 + len;
        }
    case SER_STR:
        // code omitted...
    case SER_INT:
        // code omitted...
    case SER_ARR:
        if (size < 1 + 4) {
            msg("bad response");
            return -1;
        }
        {
            uint32_t len = 0;
            memcpy(&len, &data[1], 4);
            printf("(arr) len=%u\n", len);
            size_t arr_bytes = 1 + 4;
            // decode each element recursively and advance by its size
            for (uint32_t i = 0; i < len; ++i) {
                int32_t rv = on_response(&data[arr_bytes], size - arr_bytes);
                if (rv < 0) {
                    return rv;
                }
                arr_bytes += (size_t)rv;
            }
            printf("(arr) end\n");
            return (int32_t)arr_bytes;
        }
    default:
        msg("bad response");
        return -1;
    }
}
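The SER_STR and SER_INT cases are omitted from the listing above. The following is a minimal sketch of how they could be decoded, assuming the layouts written by out_str and out_int and the "(str)" and "(int)" output format that appears in the test session. The function name on_scalar is hypothetical, and this is not the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

enum { SER_NIL = 0, SER_ERR = 1, SER_STR = 2, SER_INT = 3, SER_ARR = 4 };

// Hypothetical helper: prints one SER_STR or SER_INT value and returns the
// number of bytes consumed, or -1 on malformed or unsupported input.
static int32_t on_scalar(const uint8_t *data, size_t size) {
    assert(data != nullptr);
    if (size < 1) {
        return -1;
    }
    switch (data[0]) {
    case SER_STR: {
        if (size < 1 + 4) {
            return -1;
        }
        uint32_t len = 0;
        memcpy(&len, data + 1, 4);
        if (size - (1 + 4) < len) {
            return -1;  // the string body is truncated
        }
        // the precision argument of %.*s must be an int
        printf("(str) %.*s\n", (int)len, (const char *)(data + 1 + 4));
        return (int32_t)(1 + 4 + len);
    }
    case SER_INT: {
        if (size < 1 + 8) {
            return -1;
        }
        int64_t val = 0;
        memcpy(&val, data + 1, 8);
        printf("(int) %lld\n", (long long)val);
        return 1 + 8;
    }
    default:
        return -1;
    }
}
```

The casts to int and int32_t assume that responses are bounded by a small message limit such as k_max_msg, so the length always fits.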
Testing our new server/client:
$ ./client asdf
(err) 1 Unknown cmd
$ ./client get asdf
(nil)
$ ./client set k v
(nil)
$ ./client get k
(str) v
$ ./client keys
(arr) len=1
(str) k
(arr) end
$ ./client del k
(int) 1
$ ./client del k
(int) 0
$ ./client keys
(arr) len=0
(arr) end
Source code: