05. Concurrent IO Models
Prerequisites: Basic operating system concepts like threads, processes, concurrency.
5.1 Thread-based concurrency
A connection-oriented request-response protocol can be used for any number of request-response pairs, and the client can hold the connection as long as it wants. So there is a need to handle multiple connections simultaneously, because while the server is waiting on one client, it cannot do anything with the other clients. This is solved by multi-threading. Pseudo code:
= socket()
fd
bind(fd, address)
listen(fd)while True:
= accept(fd)
conn_fd
new_thread(do_something_with, conn_fd)# continue to accept the next client without blocking
def do_something_with(conn_fd):
while not_quiting(conn_fd):
= read_request(conn_fd) # blocks thread
req = process(req)
res # blocks thread
write_response(conn_fd, res) close(conn_fd)
Why aren’t threads enough?
We will not bother with threading because most modern server apps use event loops to handle concurrent IO without creating new threads. What are the drawbacks of thread-based IO?
- Memory usage: Many threads means many stacks. Stacks are used for local variables and function calls, memory usage per thread is hard to control.
- Overhead: Stateless clients like PHP apps will create many short-lived connections, adding overhead to both latency and CPU usage.
Forking new processes is older than multi-threading, and it costs even more. It’s in the same league as multi-threading. Multi-threading and multi-processing are still used when there is no need to scale to large numbers of connections, and they have a big advantage over event loops: they are easier and less error-prone.
5.2 Event-based concurrency
Concurrent IO is possible without threading. Let’s start by examining
the read()
syscall. The Linux TCP stack handles sending and
receiving IP packets transparently, placing incoming data in a
per-socket kernel-side buffer. read()
merely copies data
from the kernel-side buffer, and when the buffer is empty,
read()
suspends the calling thread until more data is
ready.
Similarly, write()
does not interact directly with the
network; it merely put the data into a kernel-side buffer for the TCP
stack to consume, and when the buffer is full, write()
suspends the calling thread until there is room.
The need for multi-threading comes from the need to wait for each
socket to become ready (to read or write). If there is a way to
wait for multiple sockets at once, and then
read
/write
whichever ones are ready, only a
single thread is needed!
while running:
= [...] # socket fds
want_read = [...] # socket fds
want_write = wait_for_readiness(want_read, want_write) # blocks!
can_read, can_write for fd in can_read:
= read_nb(fd) # non-blocking, only consume from the buffer
data # application logic without IO
handle_data(fd, data) for fd in can_write:
= pending_data(fd) # produced by the application
data = write_nb(fd, data) # non-blocking, only append to the buffer
n # n <= len(data), limited by the available space data_written(fd, n)
This involves 3 operating system mechanisms:
- Readiness notification: Wait for multiple sockets, return when one or more are ready. “Ready” means the read buffer is not empty or the write buffer is not full.
- Non-blocking read: Assuming the read buffer is not empty, consume from it.
- Non-blocking write: Assuming the write buffer is not full, put some data into it.
This is called an event loop. Each loop iteration waits for any readiness events, then reacts to events without blocking, so that all sockets are processed without delay.
Callback-based programming
Callbacks are common in JS. To read from something in JS, first register a callback on some event, then the data is delivered to the callback. This is what we will do next. Except in JS, the event loop is hidden, while in this project, the event loop is coded by us. We will have a better understanding of this important mechanism.
5.3 Non-blocking IO
Non-blocking read & write behavior
If the read buffer is not empty, both blocking and non-blocking reads
will return the data immediately. Otherwise, a non-blocking read will
return with errno = EAGAIN
, while a blocking read will wait
for more data. Non-blocking reads can be called repeatedly to fully
drain the read buffer.
If the write buffer is not full, both blocking and non-blocking
writes will fill the write buffer and return immediately. Otherwise, a
non-blocking write will return with errno = EAGAIN
, while a
blocking write will wait for more room. Non-blocking writes can be
called repeatedly to fully fill the write buffer. If the data is larger
than the available write buffer, a non-blocking write will do a partial
write, while a blocking write may block.
Non-blocking `accept()`
accept()
is similar to read()
in that it
just consumes an item from a queue, so it has a non-blocking mode and
can provide readiness notifications.
for fd in can_read:
if fd is a listening socket:
= accept_nb(fd) # non-blocking accept()
conn_fd
handle_new_conn(conn_fd)else: # fd is a connection socket
= read_nb(fd) # non-blocking read()
data handle_data(fd, data)
Enable non-blocking mode
Non-blocking reads & writes use the same syscalls as blocking
reads & writes. The O_NONBLOCK
flag puts a socket in
non-blocking mode.
static void fd_set_nonblock(int fd) {
int flags = fcntl(fd, F_GETFL, 0); // get the flags
|= O_NONBLOCK; // modify the flags
flags (fd, F_SETFL, flags); // set the flags
fcntl// TODO: handle errno
}
The fcntl()
syscall gets and sets file flags. Only
O_NONBLOCK
is accepted for sockets.
5.4 Readiness API
Waiting for IO readiness is platform specific, and there are several ones on Linux.
= wait_for_readiness(want_read, want_write) can_read, can_write
The simplest one on Linux is poll()
.
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
struct pollfd {
int fd;
short events; // request: want to read, write, or both?
short revents; // returned: can read? can write?
};
poll()
takes an array of fds, each with an input flag
and an output flag:
- The
events
flag indicates whether you want to read (POLLIN
), write (POLLOUT
), or both (POLLIN|POLLOUT
). - The
revents
flag returned from the syscall indicates the readiness.
The timeout
argument is used to implement timers
later.
Other readiness APIs
select()
is likepoll()
and is present on both Windows and Unix, but it can only use 1024 fds, which is a tiny number. It should not be used!epoll_wait()
is Linux specific. Unlikepoll()
, the fd list is not passed as an argument, but stored in the kernel.epoll_ctl()
is used to add or modify the fd list. It’s more scalable thanpoll()
because passing a huge number of fds is inefficient.kqueue()
is BSD specific. It’s likeepoll
, but requires fewer syscalls because it can batch update the fd list.
We will use poll()
because it’s the simplest. But note
that epoll
is the default choice on Linux as it’s more
scalable and should be used in real projects instead. All the readiness
APIs are just differs in shapes, using them isn’t much different.
Readiness APIs cannot be used with files
All the readiness APIs can only be used with sockets, pipes, and some
special stuff like signalfd
. They cannot be used with disk
files! Why? Because when a socket is ready to read, it means that the
data is in the read buffer, so the read is guaranteed not to block, but
for a disk file, no such buffer exists in the kernel, so the readiness
for a disk file is undefined.
These APIs will always report a disk file as ready, but the IO will block. So file IO must be done outside the event loop, in a thread pool, which we’ll learn later.
On Linux, file IO within an event loop may be possible with
io_uring
, which is a unified interface for both file IO and
socket IO. But io_uring
is a very different API, so we will
not pursue it.
5.5 Summary of concurrent IO techniques
Type | Method | API | Scalability |
---|---|---|---|
Socket | Thread per connection | pthread |
Low |
Socket | Process per connection | fork() |
Low |
Socket | Event loop | poll() ,
epoll |
High |
File | Thread pool | pthread |
|
Any | Event loop | io_uring |
High |