12. Timer and Timeout
12.1 The need for timers
There are many uses for timers in a Redis server:
- Time-to-live (TTL) and cache expiration.
- Network IO timeouts.
- Closing idle connections.
Cache expiration
TTL is often used in caching because:
- Memory is finite. TTL is a cache eviction method.
- Some lazy systems do not synchronize the DB with the cache and rely on cache expiration to keep the data fresh.
Network IO timeouts
Network timeouts are an often-overlooked necessity in real-world software. We need to handle the case where the remote peer is unreachable or simply gone.
A TCP connection will eventually time out and be closed by the OS, assuming TCP keep-alive works. But this can take a very long time, measured in minutes or hours, while applications use timeouts measured in seconds or milliseconds. So timeouts are an explicit consideration for networked apps.
Idle timeouts
While in the middle of reading a request, the reasonable timeout value is measured in seconds or milliseconds, since the typical case is just a few TCP retransmissions. It’s better to report errors than to be stuck for a long time.
However, while waiting for a new request, the timeout value should be larger, since long-lived connections are the intended use case. Still, it’s not infinite.
12.2 Timeouts in syscalls
Blocking IO
For regular files, timeouts are not possible on Linux.
For blocking sockets, timeouts are set with `setsockopt()` using the `SO_RCVTIMEO` and `SO_SNDTIMEO` options. See `socket(7)` for details.
- A blocking `read()` or `write()` will return `EAGAIN` after the timeout.
- A blocking `connect()` will return `ETIMEDOUT` instead.
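For example, giving a blocking socket a 5-second timeout in both directions might look like this (a minimal sketch; `fd` is assumed to be a connected blocking socket):

```
#include <sys/socket.h>
#include <sys/time.h>

// give a blocking socket a 5-second receive and send timeout
struct timeval tv = {5, 0};     // 5 s, 0 us
setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
```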
Nonblocking IO
Timeouts are irrelevant for nonblocking syscalls. In an event loop, the readiness API is the only blocking syscall, so it’s the only syscall that can time out.
```
int poll(struct pollfd *fds, nfds_t nfds, int timeout); // milliseconds
```

A timeout value of `-1` means no timeout.
Sleeping
In blocking code, you can wait for a future time by sleeping. This is not possible in an event loop. However, `poll()` can wake up from timeouts, which is a form of sleeping.
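In fact, a `poll()` call with no file descriptors is effectively a sleep (a trivial illustration):

```
#include <poll.h>

// with no fds to watch, poll() can only return on timeout: sleep ~100 ms
poll(NULL, 0, 100);
```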
`timerfd`
`timerfd_create()` creates an in-kernel timer and returns a timer handle called a timerfd. Expiring timers can be notified by `poll()` using the timer handles, along with the socket handles for IO readiness.
```
// return a timer handle
int timerfd_create(int clockid, int flags);
// set the expiration time
int timerfd_settime(int fd, int flags,
    const struct itimerspec *new_value, struct itimerspec *old_value);
```
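For instance, arming a one-shot 5-second timer could look like this (a sketch with error handling omitted; we don’t use this in our server):

```
#include <sys/timerfd.h>

int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
struct itimerspec ts = {};          // it_interval = 0 means one-shot
ts.it_value.tv_sec = 5;             // first expiration in 5 s
timerfd_settime(tfd, 0, &ts, NULL);
// add `tfd` to the poll() list; it becomes readable when the timer expires
```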
`timerfd` can be used with event loops, but this isn’t very useful because timers can be implemented in userspace without any syscalls.
12.3 Timer data structures
Wait for the nearest timer
`poll()` has a single timeout value, but we want to use multiple timers. The naive way is to wake from `poll()` periodically, say every 1s, then loop through each timer to check for expirations. This solution has limited resolution and is O(N) per wakeup.
There are many special data structures invented for timers. But the general solution is based on sorting; having things happen at a certain time means having things happen in a certain order, and ordering things by time is a sorting problem. So the problem can be restated as: output a list of timestamps in sorted order.
If timers are sorted, `poll()` only needs the timeout value of the nearest timer. This is how to implement timers in an event loop.
Fully sorted data structures
We can use the AVL tree to sort timers:
- Adding, updating, or removing a timer is O(log N).
- Finding the nearest timer (the leftmost node) is O(log N).
The minimum node can be cached to make finding the nearest timer O(1).
Partially sorted data structures
Timers don’t need to be fully sorted; the only requirement is to find the nearest timer. This is satisfied by the heap data structure. A heap can find the minimum in O(1) and updates are O(log N). It’s an array-based implicit tree without child or parent pointers, which takes up less space. It’s the most popular choice for timers.
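As a small illustration (not part of our server; `TimerHeap` and `heap_next_timeout_ms` are made-up names), a min-heap of expiration timestamps can drive the `poll()` timeout:

```
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// min-heap of expiration timestamps in ms; top() is the nearest timer
using TimerHeap = std::priority_queue<
    uint64_t, std::vector<uint64_t>, std::greater<uint64_t>>;

// poll() timeout: -1 if there are no timers, 0 if one already expired
static int32_t heap_next_timeout_ms(const TimerHeap &heap, uint64_t now_ms) {
    if (heap.empty()) {
        return -1;
    }
    uint64_t next_ms = heap.top();  // O(1)
    return next_ms <= now_ms ? 0 : (int32_t)(next_ms - now_ms);
}
```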
Fixed timeout values and FIFO
The socket timeout problem can be solved with much simpler data structures. Since the timeout value is the same for all sockets, a newly added timer is always the farthest one. So if the timeout values are fixed, the timers are ordered by the time they were added.
This is just a FIFO (first-in-first-out). New timers are added to the back and the nearest timer is removed from the front, which can be done with a linked list in O(1).
If you have multiple timeout values, e.g., IO timeout and idle timeout, just use multiple linked lists.
12.4 Implement timers
Getting the time
`clock_gettime()` is the syscall to get the time.

```
int clock_gettime(clockid_t clockid, struct timespec *tp);

struct timespec {
    time_t tv_sec;  /* Seconds */
    long   tv_nsec; /* Nanoseconds [0, 999'999'999] */
};
```
The `clockid` argument is the type of timestamp. The most common types are wall time and monotonic time.
```
#define CLOCK_REALTIME  0   // wall time, the Unix timestamp
#define CLOCK_MONOTONIC 1   // monotonic time
```
A wall timestamp is related to real-world timekeeping. The most common is the Unix timestamp: the number of days since 1970 multiplied by 24×3600, which is roughly the number of seconds since 1970 (leap seconds are ignored). The wall time in a computer can be adjusted, so the duration between 2 timestamps can be unstable or even negative. It’s not usable for timers.
A monotonic clock cannot be adjusted; it only moves forward. Its value is not related to any real-world timekeeping. It can only be used to measure durations.
`poll()` uses `CLOCK_MONOTONIC` internally for timeouts, so that’s what we will use.
```
static uint64_t get_monotonic_msec() {
    struct timespec tv = {0, 0};
    clock_gettime(CLOCK_MONOTONIC, &tv);
    return uint64_t(tv.tv_sec) * 1000 + tv.tv_nsec / 1000 / 1000;
}
```
Doubly-linked list
We’ll time out idle connections using a linked list. Each time a socket does IO, its timeout is set to a fixed offset from now, and the socket is moved to the back of the list.
The linked list must be doubly-linked so that nodes can be detached from the middle.
```
struct DList {
    DList *prev = NULL;
    DList *next = NULL;
};

inline void dlist_detach(DList *node) {
    DList *prev = node->prev;
    DList *next = node->next;
    prev->next = next;
    next->prev = prev;
}
```
The list node is embedded into `struct Conn`.
```
struct Conn {
    // ...
    // timer
    uint64_t last_active_ms = 0;
    DList idle_node;
};
```
The list head is a dummy node, not a list member. It’s used to avoid the empty list special case.
```
// global states
static struct {
    HMap db;
    // a map of all client connections, keyed by fd
    std::vector<Conn *> fd2conn;
    // timers for idle connections
    DList idle_list;    // list head
} g_data;
```
The dummy node is linked to itself, forming a circle:
```
inline void dlist_init(DList *node) {
    node->prev = node->next = node;
}
```
An empty list is a list with only the dummy node:
```
inline bool dlist_empty(DList *node) {
    return node->next == node;
}
```
Because of the dummy node, insertion doesn’t need to handle the empty list case.
```
inline void dlist_insert_before(DList *target, DList *rookie) {
    DList *prev = target->prev;
    prev->next = rookie;
    rookie->prev = prev;
    rookie->next = target;
    target->prev = rookie;
}
```
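A quick sanity check of these helpers (illustrative code, not part of the server):

```
#include <cassert>

DList head;
dlist_init(&head);
assert(dlist_empty(&head));

DList a, b;
dlist_insert_before(&head, &a);     // list: head <-> a
dlist_insert_before(&head, &b);     // list: head <-> a <-> b
assert(head.next == &a && head.prev == &b);

dlist_detach(&a);                   // list: head <-> b
assert(head.next == &b && !dlist_empty(&head));
```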
Create and update timers
A new `Conn` is added to the back of the list by inserting it before the dummy node:

```
// create a `struct Conn`
Conn *conn = new Conn();
conn->fd = connfd;
conn->want_read = true;
conn->last_active_ms = get_monotonic_msec();
dlist_insert_before(&g_data.idle_list, &conn->idle_node);
```
Each time the socket is ready for IO, it’s moved to the back of the list:
```
// handle connection sockets
for (size_t i = 1; i < poll_args.size(); ++i) {
    uint32_t ready = poll_args[i].revents;
    if (ready == 0) {
        continue;
    }
    Conn *conn = g_data.fd2conn[poll_args[i].fd];
    // update the idle timer by moving conn to the end of the list
    conn->last_active_ms = get_monotonic_msec();
    dlist_detach(&conn->idle_node);
    dlist_insert_before(&g_data.idle_list, &conn->idle_node);
    // ...
}
```
Find the timeout value from the nearest timer
The `poll()` timeout is calculated from the first list item:
```
static int32_t next_timer_ms() {
    if (dlist_empty(&g_data.idle_list)) {
        return -1;  // no timers, no timeouts
    }
    uint64_t now_ms = get_monotonic_msec();
    Conn *conn = container_of(g_data.idle_list.next, Conn, idle_node);
    uint64_t next_ms = conn->last_active_ms + k_idle_timeout_ms;
    if (next_ms <= now_ms) {
        return 0;   // missed?
    }
    return (int32_t)(next_ms - now_ms);
}
```
```
int32_t timeout_ms = next_timer_ms();
int rv = poll(poll_args.data(), (nfds_t)poll_args.size(), timeout_ms);
// ...
process_timers();
```
Process timers at each event loop iteration
Check the nearest timers to see if they have expired. `poll()` can wake up from IO, so there may be no expired timers. The timers are sorted, so stop at the first non-expired one.
```
static void process_timers() {
    uint64_t now_ms = get_monotonic_msec();
    while (!dlist_empty(&g_data.idle_list)) {
        Conn *conn = container_of(g_data.idle_list.next, Conn, idle_node);
        uint64_t next_ms = conn->last_active_ms + k_idle_timeout_ms;
        if (next_ms >= now_ms) {
            break;  // not expired
        }
        fprintf(stderr, "removing idle connection: %d\n", conn->fd);
        conn_destroy(conn);
    }
}
```
Timed-out sockets are destroyed. Remember to remove them from the list.
```
static void conn_destroy(Conn *conn) {
    (void)close(conn->fd);
    g_data.fd2conn[conn->fd] = NULL;
    dlist_detach(&conn->idle_node);
    delete conn;
}
```
Done
The server should close idle connections after a timeout. Test it with `socat`:

```
socat tcp:127.0.0.1:1234 -
```
Exercise: Add IO timeouts for `read()` and `write()` (with a different timeout value).
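One possible direction, sketched with made-up names rather than a definitive solution: keep a second list with its own, shorter timeout constant for connections with a read or write in flight.

```
// hypothetical additions for the exercise (names are made up)
const uint64_t k_io_timeout_ms = 5 * 1000;  // shorter than k_idle_timeout_ms

static DList io_list;   // a second timer list for in-flight reads/writes

// when a read or write starts, move the connection to the IO list:
//     dlist_detach(&conn->idle_node);
//     dlist_insert_before(&io_list, &conn->idle_node);
// next_timer_ms() then returns the smaller of the two next expirations,
// and process_timers() scans both lists with their own timeout constants.
```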
Source code: