01. Introduction

ELF, the acronym for Executable and Linkable Format, is the standard binary executable file format for Unix and Unix-like systems.

This article collects my notes on ELF, taken while building a mini compiler. It contains a high-level overview of how ELF works, and instructions for crafting a minimal ELF executable.

I have also documented some surprises in the ELF format.

02. A High-Level Overview

At the highest level, an ELF executable is roughly a container for machine code. However, this level of understanding is too high to be useful, so let’s go lower.

The Program Header

To execute a piece of machine code, the code must first be loaded into memory. This is the job of the so-called “program header” in ELF: it maps a portion of the file into memory at a specific virtual address.

A program header contains:

  • The virtual address of the start of the mapping.
  • The size of the mapping.
  • The offset in the ELF file.

There can be multiple program headers and they create multiple memory mappings, possibly with different access modes. The mapping for machine code can be read and executed, but not written. In addition, an ELF may have a read-only and non-executable mapping for constants, and/or a read-write mapping for global variables. Access modes are also specified in program headers.
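As a rough illustration of the fields above, here is a small Python sketch that decodes one 64-bit little-endian program header with the standard struct module. The names ProgramHeader, parse_phdr and describe_flags are my own and purely illustrative; the sample bytes describe a read-and-execute mapping at virtual address 0x401000, the same values that appear in the hex dump in section 03.

```python
import struct
from typing import NamedTuple

# Elf64_Phdr, little-endian: type, flags, offset, vaddr, paddr,
# filesz, memsz, align. 56 bytes in total.
PHDR = struct.Struct("<IIQQQQQQ")

PT_LOAD = 1                  # segment type: load this range into memory
PF_X, PF_W, PF_R = 1, 2, 4   # access-mode flag bits


class ProgramHeader(NamedTuple):   # illustrative name, not from the article
    type: int
    flags: int
    offset: int    # where the mapping starts in the file
    vaddr: int     # where the mapping starts in memory
    paddr: int
    filesz: int    # bytes taken from the file
    memsz: int     # bytes occupied in memory
    align: int


def parse_phdr(raw: bytes) -> ProgramHeader:
    """Decode exactly one 56-byte program header."""
    return ProgramHeader(*PHDR.unpack(raw))


def describe_flags(flags: int) -> str:
    """Render the access mode in the familiar 'rwx' style."""
    pairs = (("r", PF_R), ("w", PF_W), ("x", PF_X))
    return "".join(ch if flags & bit else "-" for ch, bit in pairs)


# Sample header: PT_LOAD, flags 5, file offset 0x1000, address 0x401000,
# 12 bytes in the file and in memory, 4K alignment.
raw = bytes.fromhex(
    "01000000" "05000000"
    "0010000000000000"
    "0010400000000000"
    "0010400000000000"
    "0c00000000000000"
    "0c00000000000000"
    "0010000000000000"
)
ph = parse_phdr(raw)
print(hex(ph.vaddr), ph.filesz, describe_flags(ph.flags))  # 0x401000 12 r-x
```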

The Section Header

In addition to program headers, there are also “section headers”. Section headers are not required for ELF executables. They describe the role of each data section (machine code, constants, or global variables), which is useful to ELF manipulation tools such as linkers and disassemblers. For example, the machine code section is named “.text”; when you run objdump -d to disassemble an executable, it looks for the “.text” section.

A section header contains:

  • The name of the section.
  • The size of the section.
  • The offset in the ELF file.
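One detail worth spelling out: the name is not stored in the section header itself. The header holds an offset into a string table section (“.shstrtab”), where names are stored as NUL-terminated strings. The following Python sketch shows that lookup for a 64-bit little-endian section header; section_name is a hypothetical helper, and the sample values mirror the “.text” header in the hex dump in section 03.

```python
import struct

# Elf64_Shdr, little-endian: name, type, flags, addr, offset, size,
# link, info, addralign, entsize. 64 bytes in total.
SHDR = struct.Struct("<IIQQQQIIQQ")


def section_name(shstrtab: bytes, name_offset: int) -> str:
    """Read the NUL-terminated name starting at name_offset."""
    end = shstrtab.index(b"\x00", name_offset)
    return shstrtab[name_offset:end].decode("ascii")


# Contents of a string table holding two names (17 bytes).
shstrtab = b"\x00.shstrtab\x00.text\x00"

# A section header whose name offset is 0x0b, file offset 0x1000, size 12.
raw = SHDR.pack(0x0B, 1, 6, 0x401000, 0x1000, 12, 0, 0, 16, 0)

(name_off, sh_type, sh_flags, addr, offset, size,
 link, info, addralign, entsize) = SHDR.unpack(raw)
print(section_name(shstrtab, name_off), offset, size)  # .text 4096 12
```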

The ELF Header

Where do all these headers go, and how many of them are there? This is defined in the “ELF header” at the beginning of the file.

The ELF header contains:

  • The number of program headers.
  • The offset of the first program header. All program headers are placed together.
  • The number of section headers.
  • The offset of the first section header. Section headers are also placed together.
  • The “entry point”.

The entry point is the virtual address where the machine code execution begins, and it is usually inside the “.text” section.
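To make the field list concrete, here is a minimal Python sketch that decodes a 64-bit little-endian ELF header with the struct module. ElfHeader and parse_ehdr are illustrative names of my own, and the sample bytes are the first 64 bytes of the hex dump in section 03. It assumes a 64-bit little-endian file and does no validation beyond the magic number.

```python
import struct
from typing import NamedTuple

# Elf64_Ehdr, little-endian. 64 bytes in total.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")


class ElfHeader(NamedTuple):   # illustrative name, not from the article
    ident: bytes
    type: int
    machine: int
    version: int
    entry: int       # virtual address where execution begins
    phoff: int       # file offset of the first program header
    shoff: int       # file offset of the first section header
    flags: int
    ehsize: int
    phentsize: int
    phnum: int       # number of program headers
    shentsize: int
    shnum: int       # number of section headers
    shstrndx: int


def parse_ehdr(data: bytes) -> ElfHeader:
    """Decode the ELF header found at the start of data."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return ElfHeader(*EHDR.unpack_from(data))


sample = bytes.fromhex(
    "7f454c4602010100" "0000000000000000"
    "02003e0001000000" "0010400000000000"
    "4000000000000000" "2010000000000000"
    "0000000040003800" "0200400003000200"
)
eh = parse_ehdr(sample)
print(f"entry={eh.entry:#x} phoff={eh.phoff:#x} phnum={eh.phnum} "
      f"shoff={eh.shoff:#x} shnum={eh.shnum}")
# entry=0x401000 phoff=0x40 phnum=2 shoff=0x1020 shnum=3
```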

The File Layout

So far, we have enough information to read an ELF file. What about creating one: where do these sections and headers go? The only requirement for the layout is that the ELF header is at the beginning. Everything else can be placed anywhere, as long as it can be reached by offsets.

As a file format, this sounds crazy and overly flexible. In fact, headers and sections can even overlap with each other.

03. ELF by Example

Let’s examine a hex dump of an ELF executable.

First, we create a tiny Linux program that exits with status 42.

$ cat t.asm
BITS 64
GLOBAL main
SECTION .text
main:
    mov     rax, 60
    mov     rdi, 42
    syscall

Compile it:

$ nasm -f elf64 t.asm -o t.o
$ gcc -Wall -Wextra -nostdlib -nostartfiles -static -s -Wl,--build-id=none -Os -o t t.o
$ wc -c t
4320 t

A hex dump with annotations:

// the ELF header begins
│0000│ 7f 45 4c 46 02 01 01 00 ┊ 00 00 00 00 00 00 00 00 │•ELF••••┊••••••••│
│0010│ 02 00 3e 00 01 00 00 00 ┊ 00 10 40 00 00 00 00 00 │••>•••••┊••@•••••│
                                 [^^^^^entry point^^^^^]
                                program start at 0x401000
│0020│ 40 00 00 00 00 00 00 00 ┊ 20 10 00 00 00 00 00 00 │@•••••••┊ •••••••│
       [program header offset]   [section header offset]
                0x40                     0x1020
│0030│ 00 00 00 00 40 00 38 00 ┊ 02 00 40 00 03 00 02 00 │••••@•8•┊••@•••••│
                                 ^^^^^       ^^^^^
                                 |           [number of section headers]
                                 [number of program headers]
// the ELF header ends

// program headers begin
// program header 0, maps the ELF header and program headers, useless for now.
│0040│ 01 00 00 00 04 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
                   [^^flags^^]   [^^^^^^^offset^^^^^^^^]
                    read-only
│0050│ 00 00 40 00 00 00 00 00 ┊ 00 00 40 00 00 00 00 00 │••@•••••┊••@•••••│
       [^^^virtual address^^^]
│0060│ b0 00 00 00 00 00 00 00 ┊ b0 00 00 00 00 00 00 00 │••••••••┊••••••••│
       [^^^^^^file size^^^^^^]   [^^^^^^mem size^^^^^^^]
│0070│ 00 10 00 00 00 00 00 00 ┊                         │••••••••┊        │

// program header 1, maps the ".text" section into 0x401000
│0070│                         ┊ 01 00 00 00 05 00 00 00 │        ┊••••••••│
                                             [^^flags^^]
                                           read and execute
│0080│ 00 10 00 00 00 00 00 00 ┊ 00 10 40 00 00 00 00 00 │••••••••┊••@•••••│
       [^^^^^^^offset^^^^^^^^]   [^^^virtual address^^^]
│0090│ 00 10 40 00 00 00 00 00 ┊ 0c 00 00 00 00 00 00 00 │••@•••••┊••••••••│
                                 [^^^^^^file size^^^^^^]
│00a0│ 0c 00 00 00 00 00 00 00 ┊ 00 10 00 00 00 00 00 00 │••••••••┊••••••••│
       [^^^^^^mem size^^^^^^^]
// program headers end

// pad zeros to 4K
... zeros ...

// the .text section at the file offset 4096
    401000:       b8 3c 00 00 00          mov    $0x3c,%eax
    401005:       bf 2a 00 00 00          mov    $0x2a,%edi
    40100a:       0f 05                   syscall
│1000│ b8 3c 00 00 00 bf 2a 00 ┊ 00 00 0f 05             │•<••••*•┊•••••.sh│
// the .shstrtab section, a list of strings of section names
│1000│                         ┊             00 2e 73 68 │•<••••*•┊•••••.sh│
│1010│ 73 74 72 74 61 62 00 2e ┊ 74 65 78 74 00 00 00 00 │strtab•.┊text••••│

// section header 0, all zero
│1020│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
│1030│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
│1040│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
│1050│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│

// section header 1, .text
│1060│ 0b 00 00 00 01 00 00 00 ┊ 06 00 00 00 00 00 00 00 │••••••••┊••••••••│
│1070│ 00 10 40 00 00 00 00 00 ┊ 00 10 00 00 00 00 00 00 │••@•••••┊••••••••│
                                 [^^^^^^^offset^^^^^^^^]
                                          4096
│1080│ 0c 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
       [^^^^^^file size^^^^^^]
                 12
│1090│ 10 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│

// section header 2, .shstrtab
│10a0│ 01 00 00 00 03 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
│10b0│ 00 00 00 00 00 00 00 00 ┊ 0c 10 00 00 00 00 00 00 │••••••••┊••••••••│
│10c0│ 11 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
│10d0│ 01 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│

As mentioned before, section headers are not required for the executable to function. We can remove all the section headers from the file, and set the “number of section headers” and “section header offset” to zero. The ELF will still run, but tools like objdump -d will be confused and won’t know what to do with the file.
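The removal described above can be sketched in a few lines of Python. This is only a sketch under assumptions: it handles 64-bit little-endian files, strip_section_headers is a hypothetical helper name, it cuts the table off only when the table sits at the very end of the file, and it also zeroes the string table index field, which is my own addition. The demo uses a synthetic image with the same header and size as the dump above rather than a real binary.

```python
import struct

# Field offsets inside the 64-bit ELF header.
E_SHOFF = 0x28       # 8 bytes: section header table offset
E_SHENTSIZE = 0x3A   # 2 bytes: size of one section header
E_SHNUM = 0x3C       # 2 bytes: number of section headers
                     # (the 2 bytes after it hold the string table index)


def strip_section_headers(elf: bytes) -> bytes:
    """Return a copy of elf that no longer advertises section headers."""
    (shoff,) = struct.unpack_from("<Q", elf, E_SHOFF)
    shentsize, shnum = struct.unpack_from("<HH", elf, E_SHENTSIZE)
    out = bytearray(elf)
    # Physically drop the table only if it is the last thing in the file.
    if shoff and shoff + shentsize * shnum == len(out):
        del out[shoff:]
    struct.pack_into("<Q", out, E_SHOFF, 0)       # section header offset
    struct.pack_into("<HH", out, E_SHNUM, 0, 0)   # count and string index
    return bytes(out)


# Synthetic stand-in for the file above: the real 64-byte ELF header,
# zero filler up to 0x1020, then room for three 64-byte section headers.
header = bytes.fromhex(
    "7f454c4602010100" "0000000000000000"
    "02003e0001000000" "0010400000000000"
    "4000000000000000" "2010000000000000"
    "0000000040003800" "0200400003000200"
)
image = header + bytes(0x1020 - len(header)) + bytes(3 * 64)
stripped = strip_section_headers(image)
print(len(image), len(stripped))  # 4320 4128
```

Note that the “.shstrtab” bytes themselves stay in the file in this sketch; removing them as well would shrink it slightly further.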

There are also 2 program headers, but only the one that maps the machine code is needed to run this example. We can remove the other one.

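Optionally, reduce the padding after the program headers. The file offset and the virtual address of the code mapping must then be updated so that they stay congruent modulo the page size.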

04. Crafting a Minimal ELF Executable

This is a minimal ELF executable layout:

  1. The ELF header.
  2. Followed by a program header that maps your machine code into memory.
  3. Followed by your machine code.

The above hex dump serves as a template for crafting your own ELF executables. The annotated fields are the ones you need to modify. Section headers can be ignored.

This is my minimal ELF writer in Python.
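Separately from that writer, and only as an illustration of the three-part layout listed above, here is a short Python sketch. It is not the author's code. The base address 0x400000, the choice to map the whole file (headers included) with a single read-and-execute program header, and the names make_elf and write_executable are all assumptions of mine. The machine code is the 12-byte exit(42) sequence from section 03, so the result targets x86-64 Linux only.

```python
import os
import struct

BASE = 0x400000                   # assumed load address
EHDR_SIZE, PHDR_SIZE = 64, 56     # sizes of the two 64-bit headers

# mov $60,%eax ; mov $42,%edi ; syscall  (exit with status 42)
CODE = bytes.fromhex("b83c000000" "bf2a000000" "0f05")


def make_elf(code: bytes) -> bytes:
    """ELF header, then one program header, then the machine code."""
    size = EHDR_SIZE + PHDR_SIZE + len(code)
    entry = BASE + EHDR_SIZE + PHDR_SIZE    # code starts right after headers
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        b"\x7fELF\x02\x01\x01",   # magic, 64-bit, little-endian; zero-padded
        2,            # type: executable
        0x3E,         # machine: x86-64
        1,            # version
        entry,        # entry point
        EHDR_SIZE,    # program headers follow the ELF header directly
        0,            # no section headers: offset 0
        0,            # flags
        EHDR_SIZE, PHDR_SIZE, 1,   # header sizes, one program header
        64, 0, 0,     # section header size, count 0, string index 0
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,            # PT_LOAD
        5,            # read and execute
        0,            # file offset: map from the start of the file
        BASE, BASE,   # virtual and physical address
        size, size,   # file size and memory size
        0x1000,       # alignment
    )
    return ehdr + phdr + code


def write_executable(path: str, code: bytes) -> None:
    """Write the image to path and mark it executable."""
    with open(path, "wb") as f:
        f.write(make_elf(code))
    os.chmod(path, 0o755)


elf = make_elf(CODE)
print(len(elf))  # 132
```

Calling write_executable("tiny", CODE) and then running ./tiny should, under the assumptions above, exit with status 42. Mapping from file offset 0 keeps the offset and the virtual address aligned without any padding, which is why the file can be this small.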

05. Surprises of the ELF Format

  1. The format is not rigid — it has no constraints on where sections and headers go — which allows for:

    • Gaps between sections and headers.
    • Overlapping sections or headers.

    I’m no security expert, but I think loosely defined file formats can be a source of security problems. A more rigid design would require headers and sections to be placed in order, and padding to be explicitly defined.

  2. The program header can be used to create a mapping at virtual address 0. If you do that, a null pointer dereference will no longer be an immediate error. I have tried this on Linux and it works. This can be used to bypass the minimum mmap address enforced by /proc/sys/vm/mmap_min_addr.