03. TCP Server and Client
We’ll write a TCP server and client. The 2 programs are incomplete and incorrect in many ways, but are sufficient to learn the API.
3.1 Tips for Learning Socket Programming
Look Up Stuff in Man Pages
This is not a reference or beginner’s book, so we will not include every detail of using the socket API. You are expected to look things up.
man socket.2
The above command shows the man page for the socket() syscall. Man pages are divided into several sections, as specified by the numerical suffix. Examples:
- man read.2 returns the read() syscall (section 2 is for syscalls).
- man read returns the read shell command (in section 1; not what you want).
- man socket.7 returns the socket interface overview, not the syscall.
Find Online Resources
Man pages are for looking things up, not for learning; they are not tutorials, and they rarely explain anything. You may not be sure what is relevant, or the man page simply does not have the answer. However, there are some good online resources that can fill the gap, such as Beej’s Guide.
And Googling is still effective in 2024 if you have a specific question.
3.2 Create a TCP Server
What the server will do: read data from the client, write a response, then close the connection.
Step 1: Obtain a Socket Handle
The socket() syscall takes 3 integer arguments.
int fd = socket(AF_INET, SOCK_STREAM, 0);
- AF_INET is for IPv4. Use AF_INET6 for IPv6 or dual-stack sockets. This selects the IP level protocol. For simplicity, we’ll only consider IPv4.
- SOCK_STREAM is for TCP. Use SOCK_DGRAM for UDP, which is not our concern.
- The 3rd argument is 0 and useless for our purposes.
There are other types of sockets that are created with different arguments, such as Unix domain sockets for IPC. We’ll only use TCP, so you can forget about those arguments for now.
The socket.2 man page is not relevant at this point. How to create a TCP or UDP socket is documented in tcp.7 and udp.7 respectively.
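As a minimal sketch of this step with error checking added, the helper below wraps the socket() call. The name make_tcp_socket is hypothetical and not part of the socket API; it only assumes that socket() returns -1 and sets errno on failure.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <cstdio>

// Hypothetical helper (not part of the socket API): create an IPv4 TCP
// socket. Returns the fd, or -1 after printing the reason from errno.
static int make_tcp_socket() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket()");
    }
    return fd;
}
```

The returned fd is an ordinary file descriptor, so it should be released with close() when no longer needed.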
Step 2: Configure the Socket
There are many options that change the behavior of a socket, such as TCP keepalive and Nagle’s algorithm (neither of which is our concern). However, the socket() syscall does not have a way to pass these options, so another syscall, setsockopt(), is used to configure socket options after the socket has been created.
int val = 1;
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
- The 2nd and 3rd arguments specify which option to set.
- The 4th argument is the option value.
- Different options use different types, so the size of the option value is also needed.
Socket options are scattered over several man pages. TCP-specific options are in tcp.7, while many generic options are in socket.7.
Most options are optional, except SO_REUSEADDR, which should be enabled (set to 1) on every listening socket. Without it, bind() can fail with EADDRINUSE when you restart your server. This is related to delayed packets and TIME_WAIT, and you can easily find explanations by Googling.
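To illustrate how the option value and its size are passed, here is a small sketch that enables SO_REUSEADDR and then reads it back with getsockopt(). The helper name enable_reuseaddr is an assumption made for this example.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

// Hypothetical helper: enable SO_REUSEADDR on fd, then read the option back.
// Returns true if the kernel reports the option as enabled.
static bool enable_reuseaddr(int fd) {
    int val = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val)) != 0) {
        return false;
    }
    int out = 0;
    socklen_t len = sizeof(out);    // in: buffer size, out: bytes filled in
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &out, &len) != 0) {
        return false;
    }
    return out != 0;
}
```

Reading the value back is not required in a real server; it is only here to show that getsockopt() mirrors setsockopt() with an in/out length argument.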
Step 3: Bind to an Address
We’ll bind to the wildcard address 0.0.0.0:1234.
// bind, this is the syntax that deals with IPv4 addresses
struct sockaddr_in addr = {};
addr.sin_family = AF_INET;
addr.sin_port = htons(1234);
addr.sin_addr.s_addr = htonl(0);    // wildcard address 0.0.0.0
int rv = bind(fd, (const struct sockaddr *)&addr, sizeof(addr));
if (rv) {
    die("bind()");
}
struct sockaddr_in holds an IPv4 address and port. You must initialize the structure as shown in the sample code. The htons() and htonl() functions convert numbers from host byte order to the required big-endian (network) format. This is the ugliest part of the API. Details are explained later.
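To make the byte order requirement concrete, the following sketch shows how a port number is laid out in memory after the host-to-network conversion with htons(). PortBytes and port_bytes are hypothetical names used only for this illustration.

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <cstring>

// Hypothetical type: the two bytes of a port as stored in sin_port.
struct PortBytes {
    unsigned char first;    // byte at the lower memory address
    unsigned char second;   // byte at the higher memory address
};

// Convert a port to network byte order and expose its in-memory bytes.
static PortBytes port_bytes(uint16_t port) {
    uint16_t be = htons(port);          // host order -> big endian
    unsigned char raw[2];
    memcpy(raw, &be, sizeof(raw));
    return PortBytes{raw[0], raw[1]};
}
```

For port 1234 (0x04D2), the most significant byte 0x04 comes first regardless of the host CPU, which is what "big endian" means here.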
For IPv6, use struct sockaddr_in6 instead. The addr argument accepts either address type, so the call also needs the struct size because the two structs differ in size.
Step 4: Listen
All the previous steps are just configuring options. The socket only becomes effective after listen(). The OS will automatically handle TCP handshakes and place established connections in a queue. The application can then retrieve them via accept().
// listen
rv = listen(fd, SOMAXCONN);
if (rv) {
    die("listen()");
}
The backlog argument is the size of the queue, which in our case is SOMAXCONN. SOMAXCONN was long defined as 128 on Linux (newer kernels use 4096), which is sufficient for us.
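Steps 1 to 4 can be put together as one function. This is only a sketch under the assumptions of this chapter (IPv4, wildcard address); the name listen_on is hypothetical, and it returns -1 instead of calling die() so that the caller decides how to report errors.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper combining steps 1-4.
// Returns a listening fd, or -1 on failure.
static int listen_on(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    int val = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));

    struct sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   // wildcard 0.0.0.0
    if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(fd, SOMAXCONN) != 0) {
        close(fd);      // do not leak the fd on failure
        return -1;
    }
    return fd;
}
```

Calling listen_on(1234) matches the server in this chapter; passing 0 asks the OS to pick a free port.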
Step 5: Accept Connections
The server enters a loop that accepts and processes each client connection.
while (true) {
    // accept
    struct sockaddr_in client_addr = {};
    socklen_t addrlen = sizeof(client_addr);
    int connfd = accept(fd, (struct sockaddr *)&client_addr, &addrlen);
    if (connfd < 0) {
        continue;   // error
    }
    do_something(connfd);
    close(connfd);
}
The accept() syscall also returns the peer’s address. The addrlen argument is both the input size and the output size.
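The peer address filled in by accept() can be turned into readable text. The sketch below assumes an IPv4 peer; format_ipv4 is a hypothetical helper name, while inet_ntop() and ntohs() are the standard conversion functions.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string>

// Hypothetical helper: render an IPv4 socket address as "a.b.c.d:port".
static std::string format_ipv4(const struct sockaddr_in &addr) {
    char ip[INET_ADDRSTRLEN] = {};
    if (!inet_ntop(AF_INET, &addr.sin_addr, ip, sizeof(ip))) {
        return "?";     // conversion failed
    }
    // sin_port is stored in network byte order, so convert it back
    return std::string(ip) + ":" + std::to_string(ntohs(addr.sin_port));
}
```

In the accept loop, this could be called as format_ipv4(client_addr) right after accept() succeeds, for example to log who connected.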
Step 6: Read & Write
Our dummy processing is just 1 read() and 1 write().
static void do_something(int connfd) {
    char rbuf[64] = {};
    ssize_t n = read(connfd, rbuf, sizeof(rbuf) - 1);
    if (n < 0) {
        msg("read() error");
        return;
    }
    printf("client says: %s\n", rbuf);

    char wbuf[] = "world";
    write(connfd, wbuf, strlen(wbuf));
}
For now, we’re ignoring the return value of write(), which is the number of bytes written. This chapter is just to familiarize you with the socket API; we’ll write real programs later.
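Since write() may report fewer bytes than requested, a program that checks the return value typically loops until everything is sent. The following is one possible sketch of such a loop, not necessarily the approach used later in this book; write_all is a hypothetical name.

```cpp
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: keep calling write() until all n bytes are written.
// Returns 0 on success, -1 on error.
static int write_all(int fd, const char *buf, size_t n) {
    while (n > 0) {
        ssize_t rv = write(fd, buf, n);
        if (rv <= 0) {
            if (rv < 0 && errno == EINTR) {
                continue;   // interrupted by a signal, retry
            }
            return -1;      // real error, or no progress
        }
        buf += rv;          // skip the bytes already written
        n -= (size_t)rv;
    }
    return 0;
}
```

With this helper, the last line of do_something() could become write_all(connfd, wbuf, strlen(wbuf)).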
3.3 Create a TCP Client
Write something, read back from the server, then close the connection.
int fd = socket(AF_INET, SOCK_STREAM, 0);
if (fd < 0) {
    die("socket()");
}
struct sockaddr_in addr = {};
addr.sin_family = AF_INET;
addr.sin_port = htons(1234);
addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1
int rv = connect(fd, (const struct sockaddr *)&addr, sizeof(addr));
if (rv) {
    die("connect");
}
char msg[] = "hello";
write(fd, msg, strlen(msg));

char rbuf[64] = {};
ssize_t n = read(fd, rbuf, sizeof(rbuf) - 1);
if (n < 0) {
    die("read");
}
printf("server says: %s\n", rbuf);
close(fd);
Compile our programs with the following command line:
g++ -Wall -Wextra -O2 -g 03_server.cpp -o server
g++ -Wall -Wextra -O2 -g 03_client.cpp -o client
Run ./server in one window and then run ./client in another window:
$ ./server
client says: hello
$ ./client
server says: world
3.4 More on socket API
Some important but not immediately relevant things.
Using `struct sockaddr`
The API actually takes struct sockaddr * as an argument, but we used struct sockaddr_in and cast the pointer type. What are these?
int accept(int sockfd, struct sockaddr *addr, socklen_t *len);
int connect(int sockfd, const struct sockaddr *addr, socklen_t len);
int bind(int sockfd, const struct sockaddr *addr, socklen_t len);
These structs are defined as follows:
struct sockaddr {
    unsigned short sa_family;   // AF_INET, AF_INET6
    char sa_data[14];           // useless
};
struct sockaddr_in {
    short sin_family;           // AF_INET
    unsigned short sin_port;    // port number, big endian
    struct in_addr sin_addr;    // IPv4 address
    char sin_zero[8];           // useless
};
struct sockaddr_in6 {
    uint16_t sin6_family;       // AF_INET6
    uint16_t sin6_port;         // port number, big endian
    uint32_t sin6_flowinfo;
    struct in6_addr sin6_addr;  // IPv6 address
    uint32_t sin6_scope_id;
};
struct sockaddr_storage {
    sa_family_t ss_family;      // AF_INET, AF_INET6
    // enough space for both IPv4 and IPv6
    char __ss_pad1[_SS_PAD1SIZE];
    int64_t __ss_align;
    char __ss_pad2[_SS_PAD2SIZE];
};
The addr argument can be either IPv4 or IPv6. So all of these structs start with a 16-bit integer *_family (despite the different types) to indicate the address type. They just emulate a tagged union:
struct sane_sockaddr {  // fictional
    uint16_t family;    // AF_INET, AF_INET6
    union {
        // whatever ...
    };
};
How to use these structures:
- struct sockaddr is pointless; you just cast your structs to this pointer type.
- struct sockaddr_in and struct sockaddr_in6 are the concrete structures for IPv4 and IPv6.
- struct sockaddr_storage is large enough for all address types. Cast this to a concrete struct to initialize it. It is used when you don’t know the address type beforehand.
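The tagged-union idea can be sketched as a function that inspects ss_family and then casts struct sockaddr_storage to the matching concrete struct. The helper name format_addr and the "[ip]:port" output style for IPv6 are assumptions made for this example.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string>

// Hypothetical helper: format an address whose family is only known at runtime.
static std::string format_addr(const struct sockaddr_storage &ss) {
    char ip[INET6_ADDRSTRLEN] = {};     // large enough for IPv4 and IPv6 text
    if (ss.ss_family == AF_INET) {
        const struct sockaddr_in *a = (const struct sockaddr_in *)&ss;
        inet_ntop(AF_INET, &a->sin_addr, ip, sizeof(ip));
        return std::string(ip) + ":" + std::to_string(ntohs(a->sin_port));
    }
    if (ss.ss_family == AF_INET6) {
        const struct sockaddr_in6 *a = (const struct sockaddr_in6 *)&ss;
        inet_ntop(AF_INET6, &a->sin6_addr, ip, sizeof(ip));
        return "[" + std::string(ip) + "]:" + std::to_string(ntohs(a->sin6_port));
    }
    return "unknown";   // some other address family
}
```

A server that accepts both IPv4 and IPv6 peers could pass a struct sockaddr_storage to accept() and hand the result to a function like this.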
Syscalls, APIs, and Libraries
When you call any of these functions on Linux, you are actually calling a thin wrapper in libc around the stable Linux syscall interface. On Windows, the API is mostly the same, with minor differences such as different function names.
There are also socket libraries, but they are not as useful as you might think; the main complexity is not the API, but everything around it, such as protocols and event loops. So a library won’t do much. The socket API is simple and reasonable, except for the struct sockaddr mess.
Get the address of each side
You can get the address (IP + port) for both local and peer.
- getpeername(): Get the remote peer’s address (also returned by accept()).
- getsockname(): Get my exact address (after binding to a wildcard address).
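As a small sketch of getsockname(), the helper below returns the local port of an IPv4 socket. This is handy after binding to port 0, when the OS picks the port. The name local_port is hypothetical.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

// Hypothetical helper: return the local port of an IPv4 socket, or -1.
static int local_port(int fd) {
    struct sockaddr_in addr = {};
    socklen_t len = sizeof(addr);       // in: buffer size, out: address size
    if (getsockname(fd, (struct sockaddr *)&addr, &len) != 0) {
        return -1;
    }
    return ntohs(addr.sin_port);        // back to host byte order
}
```

getpeername() has the same signature, so the equivalent helper for the remote side only differs in the function called.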
Configure the local address
bind() can also be used on the client socket before connect() to specify the source address. Without this, the OS will automatically select a source address. This is useful for selecting a particular source address if multiple ones are available. If the port in bind() is zero, the OS will automatically choose a port.
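A client-side bind() might look like the sketch below, which pins the source IPv4 address and leaves the port to the OS. The helper name bind_source is an assumption; the address string is parsed with inet_pton().

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

// Hypothetical helper: choose the source IPv4 address of a client socket.
// Call it after socket() and before connect(). Returns 0 on success, -1 on error.
static int bind_source(int fd, const char *ip) {
    struct sockaddr_in src = {};
    src.sin_family = AF_INET;
    src.sin_port = htons(0);    // 0: let the OS choose the source port
    if (inet_pton(AF_INET, ip, &src.sin_addr) != 1) {
        return -1;              // not a valid dotted-quad address
    }
    return bind(fd, (const struct sockaddr *)&src, sizeof(src));
}
```

The address passed in must belong to one of the machine's interfaces, otherwise bind() fails.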
The Linux-specific SO_BINDTODEVICE option selects the source interface by name.
Domain name resolution
getaddrinfo() resolves a domain name into IP addresses. There is a sample program in its man page.
Unlike other socket APIs, this is not a Linux syscall and is implemented in libc because name resolution is a complicated and high-level function on Linux. It involves reading a bunch of files such as /etc/resolv.conf and /etc/hosts and then querying a DNS server with UDP.
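A shorter sketch than the man page sample: the function below collects the IPv4 addresses that getaddrinfo() returns for a name. The helper name resolve_ipv4 and the choice to restrict results to IPv4 and TCP are assumptions made to keep the example small.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <string>
#include <vector>

// Hypothetical helper: resolve a host name to IPv4 address strings.
// Returns an empty list if resolution fails.
static std::vector<std::string> resolve_ipv4(const char *host) {
    std::vector<std::string> out;
    struct addrinfo hints = {};
    hints.ai_family = AF_INET;          // IPv4 only, for simplicity
    hints.ai_socktype = SOCK_STREAM;    // avoid one entry per socket type
    struct addrinfo *res = nullptr;
    if (getaddrinfo(host, nullptr, &hints, &res) != 0) {
        return out;
    }
    for (struct addrinfo *p = res; p; p = p->ai_next) {
        char ip[INET_ADDRSTRLEN] = {};
        const struct sockaddr_in *a = (const struct sockaddr_in *)p->ai_addr;
        if (inet_ntop(AF_INET, &a->sin_addr, ip, sizeof(ip))) {
            out.push_back(ip);
        }
    }
    freeaddrinfo(res);                  // the result list is heap-allocated
    return out;
}
```

Each entry in the result list also carries ai_addr and ai_addrlen, which can be passed directly to connect() instead of filling in a struct sockaddr_in by hand.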
Socket and inter-process communication
There are mechanisms that allow processes within the same machine to communicate such as Unix domain sockets, pipes, etc. They are just a computer network limited to a single machine, so the programming techniques are the same.
Unix domain sockets share the same API with network sockets. You can create either packet-based or byte-stream-based Unix domain sockets, like UDP or TCP. A Unix domain socket is created with different arguments to the socket() call and uses a different type of address, but the rest is the same as a network socket. Read man unix.7 for more info.
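To show that the read/write side is unchanged, here is a sketch that sends a message through a pair of connected Unix domain stream sockets. It uses socketpair() as a shortcut so no address is needed; a real server would instead call socket(AF_UNIX, SOCK_STREAM, 0) and bind to a struct sockaddr_un. The function name unix_echo_once is hypothetical.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// Hypothetical demo: write msg into one end of a Unix domain socket pair
// and return what the other end reads (empty string on failure).
static std::string unix_echo_once(const std::string &msg) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return "";
    }
    std::string out;
    if (write(sv[0], msg.data(), msg.size()) == (ssize_t)msg.size()) {
        char buf[64] = {};
        ssize_t n = read(sv[1], buf, sizeof(buf));  // same call as for TCP
        if (n > 0) {
            out.assign(buf, (size_t)n);
        }
    }
    close(sv[0]);
    close(sv[1]);
    return out;
}
```

Like the TCP examples in this chapter, this does a single read() and assumes the short message arrives in one piece.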
A pipe is a one-way byte stream. So, just as with a TCP socket, you need a protocol on top of it, which is not as trivial as you might think. You’ll learn about protocols in the next chapter.