Crafting an ELF Executable
2023-02-19
01. Introduction
ELF, the acronym for Executable and Linkable Format, is the standard binary executable file format for Unix and Unix-like systems.
This article contains my notes on ELF, written while building a mini compiler. It gives a high-level overview of how ELF works, along with instructions for crafting a minimal ELF executable.
I have also documented some surprises in the ELF format.
02. A High-Level Overview
At the highest level, an ELF executable is roughly a container for machine code. However, this level of understanding is too high to be useful, so let's go lower.
The Program Header
To execute a piece of machine code, the code must first be loaded into memory. This is the job of the so-called “program header” in ELF: it maps a portion of the file into memory at a specific virtual address.
A program header contains:
- The virtual address of the start of the mapping.
- The size of the mapping.
- The offset in the ELF file.
There can be multiple program headers, each creating a separate memory mapping, possibly with different access modes. The mapping for machine code can be read and executed, but not written. In addition, an ELF may have a read-only, non-executable mapping for constants and/or a read-write mapping for global variables. The access mode of each mapping is also specified in its program header.
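The list above names the key fields, but the on-disk structure (called Elf64_Phdr in the ELF specification) carries a few more: a segment type, the access-mode flags, a physical address, separate file and memory sizes, and an alignment. Below is a minimal Python sketch that packs one 64-bit program header with the standard `struct` module. The helper name `pack_phdr` is hypothetical, and the sketch assumes a little-endian x86-64 target with no zero-filled tail, so the file and memory sizes are equal.

```python
import struct

# Elf64_Phdr: p_type, p_flags (u32), then p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align (u64). "<" = little-endian, no padding.
PHDR_FORMAT = "<IIQQQQQQ"

PT_LOAD = 1                  # segment type: a loadable mapping
PF_X, PF_W, PF_R = 1, 2, 4   # access-mode flag bits

def pack_phdr(offset, vaddr, size, flags, align=0x1000):
    """Pack a PT_LOAD header mapping `size` bytes at file `offset`
    to virtual address `vaddr` with the given access flags."""
    return struct.pack(
        PHDR_FORMAT,
        PT_LOAD, flags,
        offset, vaddr,
        vaddr,        # p_paddr: conventionally the same as vaddr
        size, size,   # p_filesz == p_memsz: nothing is zero-filled
        align,
    )

# The code mapping from the section 03 example: read + execute,
# 12 bytes at file offset 0x1000 mapped to 0x401000.
text_phdr = pack_phdr(0x1000, 0x401000, 12, PF_R | PF_X)
print(len(text_phdr))  # 56
```

When `p_memsz` is larger than `p_filesz`, the loader zero-fills the difference, which is how uninitialized global variables can take no space in the file.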
The Section Header
In addition to program headers, there are also “section headers”.
Section headers are not required for ELF executables. They describe the roles of different data sections (machine code, constants, or global variables), which are used by ELF manipulation tools such as linkers and disassemblers. For example, the machine code section is named “.text”; when you run objdump -d to disassemble an executable, it disassembles the sections marked as containing instructions, such as “.text”.
A section header contains:
- The name of the section.
- The size of the section.
- The offset in the ELF file.
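In the file, a section's name is not stored inline: the name field (sh_name) is an offset into a string table section, “.shstrtab” in the example of section 03, whose section index is recorded in the ELF header. Here is a minimal Python sketch that builds such a table and packs one 64-bit section header (Elf64_Shdr). `build_shstrtab` and `pack_shdr` are hypothetical helpers, and the values assume the “.text” section from the hex dump in section 03.

```python
import struct

# Elf64_Shdr: sh_name, sh_type (u32), sh_flags, sh_addr, sh_offset,
# sh_size (u64), sh_link, sh_info (u32), sh_addralign, sh_entsize (u64).
SHDR_FORMAT = "<IIQQQQIIQQ"

SHT_PROGBITS, SHT_STRTAB = 1, 3    # section types
SHF_ALLOC, SHF_EXECINSTR = 0x2, 0x4

def build_shstrtab(names):
    """Concatenate NUL-terminated names; return the table bytes and the
    offset of each name, which is what sh_name stores."""
    table = bytearray(b"\0")       # offset 0 is the empty name
    offsets = {}
    for name in names:
        offsets[name] = len(table)
        table += name.encode() + b"\0"
    return bytes(table), offsets

def pack_shdr(name_off, sh_type, flags, addr, offset, size, addralign):
    # sh_link, sh_info and sh_entsize are 0 for these simple sections.
    return struct.pack(SHDR_FORMAT, name_off, sh_type, flags, addr,
                       offset, size, 0, 0, addralign, 0)

shstrtab, name_offsets = build_shstrtab([".shstrtab", ".text"])
text_shdr = pack_shdr(name_offsets[".text"], SHT_PROGBITS,
                      SHF_ALLOC | SHF_EXECINSTR,
                      0x401000, 0x1000, 12, 16)
print(name_offsets)  # {'.shstrtab': 1, '.text': 11}
```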
The ELF Header
Where do all these headers go, and how many are there? This is defined in the “ELF header” at the beginning of the file.
The ELF header contains:
- The number of program headers.
- The offset of the first program header. All program headers are placed together.
- The number of section headers.
- The offset of the first section header. Section headers are also placed together.
- The “entry point”.
The entry point is the virtual address where execution of the machine code begins; it is usually inside the “.text” section.
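All of these fields live in a fixed 64-byte structure (Elf64_Ehdr). Besides the fields listed above, it also records the machine type, the sizes of the headers, and the index of the section-name string table. A minimal Python sketch that packs it, assuming a 64-bit little-endian x86-64 executable; `pack_ehdr` is a hypothetical helper, and the arguments are the values read off the hex dump in section 03.

```python
import struct

# Elf64_Ehdr: e_ident[16], e_type, e_machine (u16), e_version (u32),
# e_entry, e_phoff, e_shoff (u64), e_flags (u32), then six u16 fields.
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"

ET_EXEC = 2        # file type: executable
EM_X86_64 = 0x3e   # machine: AMD x86-64

def pack_ehdr(entry, phoff, phnum, shoff=0, shnum=0, shstrndx=0):
    ident = (b"\x7fELF"          # magic number
             + bytes([2,         # EI_CLASS: 64-bit
                      1,         # EI_DATA: little-endian
                      1,         # EI_VERSION: current
                      0])        # EI_OSABI: System V
             ).ljust(16, b"\0")  # EI_ABIVERSION and padding
    return struct.pack(EHDR_FORMAT, ident, ET_EXEC, EM_X86_64, 1,
                       entry, phoff, shoff,
                       0,        # e_flags: none for x86-64
                       64, 56,   # e_ehsize, e_phentsize
                       phnum,
                       64,       # e_shentsize
                       shnum, shstrndx)

ehdr = pack_ehdr(entry=0x401000, phoff=0x40, phnum=2,
                 shoff=0x1020, shnum=3, shstrndx=2)
print(len(ehdr))  # 64
```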
The File Layout
So far, we have enough information to read an ELF file. What about creating one: where should the sections and headers go? The only layout requirement is that the ELF header comes first; everything else can be placed anywhere, as long as it can be reached through offsets.
As a file format, this sounds crazy and overly flexible. In fact, headers and sections can even overlap with each other.
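Reading works the same way regardless of where things are placed: start at offset 0, then follow the offsets. Below is a minimal Python sketch that locates the program headers of a 64-bit ELF image through e_phoff, e_phnum and e_phentsize. `read_program_headers` is a hypothetical helper name, and it does no validation beyond the magic number, class byte, and the bounds checks `struct.unpack_from` performs.

```python
import struct

EHDR_FORMAT = "<16sHHIQQQIHHHHHH"   # Elf64_Ehdr, 64 bytes
PHDR_FORMAT = "<IIQQQQQQ"           # Elf64_Phdr, 56 bytes

def read_program_headers(data):
    """Find the program headers of a 64-bit little-endian ELF image by
    following the offsets stored in its ELF header."""
    if data[:4] != b"\x7fELF" or data[4:5] != b"\x02":
        raise ValueError("not a 64-bit ELF file")
    fields = struct.unpack_from(EHDR_FORMAT, data, 0)
    entry, phoff = fields[4], fields[5]          # e_entry, e_phoff
    phentsize, phnum = fields[9], fields[10]     # e_phentsize, e_phnum
    headers = []
    for i in range(phnum):
        # Program headers sit back to back, starting at e_phoff.
        (p_type, p_flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = struct.unpack_from(
            PHDR_FORMAT, data, phoff + i * phentsize)
        headers.append({"type": p_type, "flags": p_flags,
                        "offset": p_offset, "vaddr": p_vaddr,
                        "filesz": p_filesz, "memsz": p_memsz})
    return entry, headers
```

For example, `read_program_headers(open("t", "rb").read())` on the binary built in section 03 should report the entry point 0x401000 and the two program headers shown in its hex dump.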
03. ELF by Example
Let's examine a hex dump of an ELF executable.
First, we create a tiny Linux program that exits with 42.
$ cat t.asm
BITS 64
GLOBAL _start           ; ld's default entry symbol
SECTION .text
_start:
    mov rax, 60         ; syscall number of exit
    mov rdi, 42         ; exit status
    syscall
Compile it:
$ nasm -f elf64 t.asm -o t.o
$ gcc -Wall -Wextra -nostdlib -nostartfiles -static -s -Wl,--build-id=none -Os -o t t.o
$ wc -c t
4320 t
A hex dump with annotations:
// the ELF header begins
│0000│ 7f 45 4c 46 02 01 01 00 ┊ 00 00 00 00 00 00 00 00 │•ELF••••┊••••••••│
│0010│ 02 00 3e 00 01 00 00 00 ┊ 00 10 40 00 00 00 00 00 │••>•••••┊••@•••••│
                                 [^^^^^entry point^^^^^]
                                 program start at 0x401000
│0020│ 40 00 00 00 00 00 00 00 ┊ 20 10 00 00 00 00 00 00 │@•••••••┊ •••••••│
       [program header offset]   [section header offset]
       0x40                      0x1020
│0030│ 00 00 00 00 40 00 38 00 ┊ 02 00 40 00 03 00 02 00 │••••@•8•┊••@•••••│
                                 ^^^^^       ^^^^^
                                   |         [number of section headers]
                                 [number of program headers]
// the ELF header ends
// program headers begins
// program header 0, maps the ELF header and program headers, useless for now.
│0040│ 01 00 00 00 04 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
                   [^^flags^^]   [^^^^^^^offset^^^^^^^^]
                   read-only
│0050│ 00 00 40 00 00 00 00 00 ┊ 00 00 40 00 00 00 00 00 │••@•••••┊••@•••••│
       [^^^virtual address^^^]
│0060│ b0 00 00 00 00 00 00 00 ┊ b0 00 00 00 00 00 00 00 │••••••••┊••••••••│
       [^^^^^^file size^^^^^^]   [^^^^^^mem size^^^^^^^]
│0070│ 00 10 00 00 00 00 00 00 ┊                         │••••••••┊        │
// program header 1, maps the ".text" section into 0x401000
│0070│                         ┊ 01 00 00 00 05 00 00 00 │        ┊••••••••│
                                             [^^flags^^]
                                             read and execute
│0080│ 00 10 00 00 00 00 00 00 ┊ 00 10 40 00 00 00 00 00 │••••••••┊••@•••••│
       [^^^^^^^offset^^^^^^^^]   [^^^virtual address^^^]
│0090│ 00 10 40 00 00 00 00 00 ┊ 0c 00 00 00 00 00 00 00 │••@•••••┊••••••••│
                                 [^^^^^^file size^^^^^^]
│00a0│ 0c 00 00 00 00 00 00 00 ┊ 00 10 00 00 00 00 00 00 │••••••••┊••••••••│
       [^^^^^^mem size^^^^^^^]
// program headers ends
// pad zeros to 4K
... zeros ...
// the .text section at the file offset 4096
401000:  b8 3c 00 00 00    mov    $0x3c,%eax
401005:  bf 2a 00 00 00    mov    $0x2a,%edi
40100a:  0f 05             syscall
│1000│ b8 3c 00 00 00 bf 2a 00 ┊ 00 00 0f 05             │•<••••*•┊••••    │
// the .shstrtab section, a list of strings of section names
│1000│                         ┊             00 2e 73 68 │        ┊    •.sh│
│1010│ 73 74 72 74 61 62 00 2e ┊ 74 65 78 74 00 00 00 00 │strtab•.┊text••••│
// section header 0, all zero
│1020│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
│1030│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
│1040│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
│1050│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
// section header 1, .text
│1060│ 0b 00 00 00 01 00 00 00 ┊ 06 00 00 00 00 00 00 00 │••••••••┊••••••••│
│1070│ 00 10 40 00 00 00 00 00 ┊ 00 10 00 00 00 00 00 00 │••@•••••┊••••••••│
                                 [^^^^^^^offset^^^^^^^^]
                                 4096
│1080│ 0c 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
       [^^^^^^file size^^^^^^]
       12
│1090│ 10 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
// section header 2, .shstrtab
│10a0│ 01 00 00 00 03 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
│10b0│ 00 00 00 00 00 00 00 00 ┊ 0c 10 00 00 00 00 00 00 │••••••••┊••••••••│
│10c0│ 11 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
│10d0│ 01 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │••••••••┊••••••••│
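The 12 bytes of “.text” are simple enough to produce by hand, which is what a mini compiler ultimately has to do. Note that NASM emitted the shorter 32-bit forms (the disassembly shows %eax and %edi); on x86-64, writing a 32-bit register also zeroes the upper half of the 64-bit register, so the effect matches `mov rax, 60`. Below is a minimal sketch of the encoding. `mov_r32_imm32` is a hypothetical helper that only handles the eight legacy registers (numbers 0 to 7), which need no REX prefix; `bytes.hex` with a separator needs Python 3.8 or later.

```python
def mov_r32_imm32(reg, imm):
    """Encode `mov r32, imm32`: opcode 0xB8 + register number,
    followed by the immediate as 4 little-endian bytes."""
    assert 0 <= reg <= 7, "registers r8d..r15d would need a REX prefix"
    return bytes([0xB8 + reg]) + imm.to_bytes(4, "little")

EAX, EDI = 0, 7          # x86 register numbers used in the opcode
SYSCALL = b"\x0f\x05"    # the syscall instruction
SYS_EXIT = 60            # Linux x86-64 syscall number for exit

code = mov_r32_imm32(EAX, SYS_EXIT) + mov_r32_imm32(EDI, 42) + SYSCALL
print(code.hex(" "))  # b8 3c 00 00 00 bf 2a 00 00 00 0f 05
```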
As mentioned before, section headers are not required for the executable to function. We can remove all the section headers from the file and set the “number of section headers” and “section header offset” to zero. The ELF will still run, but tools like objdump -d, which rely on section headers, will no longer find any code to disassemble.
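A minimal Python sketch of that edit for a 64-bit little-endian ELF image, using the ELF header field offsets (e_shoff at 0x28, e_shnum at 0x3c, e_shstrndx at 0x3e). `strip_section_headers` is a hypothetical helper: it only truncates the file when the section header table is the last thing in it, and it leaves section contents such as “.shstrtab” in place as unused bytes.

```python
import struct

def strip_section_headers(data):
    """Drop the section header table of a 64-bit little-endian ELF image.
    The header fields are zeroed; the table bytes are cut off only when
    the table sits at the very end of the file."""
    out = bytearray(data)
    (shoff,) = struct.unpack_from("<Q", out, 0x28)           # e_shoff
    shentsize, shnum = struct.unpack_from("<HH", out, 0x3a)  # e_shentsize, e_shnum
    struct.pack_into("<Q", out, 0x28, 0)                     # e_shoff = 0
    struct.pack_into("<HH", out, 0x3c, 0, 0)                 # e_shnum = e_shstrndx = 0
    if shoff and shoff + shnum * shentsize == len(out):
        del out[shoff:]                                      # trailing table: truncate
    return bytes(out)
```

Applied to the 4320-byte `t` above, whose three 64-byte section headers occupy the last 192 bytes, this should produce a 4128-byte file.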
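There are also 2 program headers, but only the one that maps the machine code is needed to run this example, so we can remove the other one.
Optionally, reduce the padding after the program headers. The code's file offset and virtual address must still agree modulo the page size (4096), because the loader maps whole pages; for example, code placed right after the headers at file offset 0x78 can be mapped at 0x400078, with the entry point updated to match.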
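A loadable segment's file offset and virtual address must have the same position within a page; that is why the linker put the code at file offset 0x1000 for virtual address 0x401000. A tiny sketch of the check, assuming 4 KiB pages; `mapping_is_valid` and `code_vaddr` are hypothetical helper names.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages, as on typical x86-64 Linux

def mapping_is_valid(p_offset, p_vaddr, page_size=PAGE_SIZE):
    """The loader maps whole pages, so the file offset and the virtual
    address must have the same remainder modulo the page size."""
    return p_offset % page_size == p_vaddr % page_size

def code_vaddr(code_offset, base=0x400000):
    """With one mapping of the whole file at a page-aligned `base`,
    code at file offset `code_offset` lands at base + code_offset."""
    return base + code_offset

print(mapping_is_valid(0x1000, 0x401000))   # True: the linker's layout
print(mapping_is_valid(0x78, 0x400078))     # True: no padding at all
print(mapping_is_valid(0x78, 0x401000))     # False: not page-congruent
```

With a single program header and no padding, the code starts after the 64-byte ELF header and the 56-byte program header, at offset 0x78, so `code_vaddr(0x78)` gives 0x400078, which also becomes the entry point.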
04. Crafting a Minimal ELF Executable
A minimal ELF executable has this layout:
- The ELF header.
- Followed by a program header that maps your machine code into memory.
- Followed by your machine code.
The above hex dump serves as a template for crafting your own ELF executables; the annotated fields are the ones you need to modify. Section headers can be omitted.
This is my minimal ELF writer in Python.
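Without reproducing the author's writer, here is an independent minimal sketch (not the author's code) that follows the layout above: ELF header, one program header, then the machine code. Assumptions: x86-64 Linux, a load address of 0x400000, no section headers, and a single read+execute mapping that covers the whole file, so there is no room for writable global variables. `build_elf`, `write_executable` and `EXIT_42` are hypothetical names.

```python
import os
import struct

BASE = 0x400000                # assumption: conventional non-PIE load address
EHDR_SIZE, PHDR_SIZE = 64, 56  # sizes of Elf64_Ehdr and Elf64_Phdr

def build_elf(code):
    """ELF header, then one PT_LOAD program header, then the code.
    The mapping covers the whole file (offset 0 -> BASE), so the code
    at file offset 0x78 ends up at virtual address BASE + 0x78."""
    code_offset = EHDR_SIZE + PHDR_SIZE      # 0x78: no padding at all
    entry = BASE + code_offset
    file_size = code_offset + len(code)
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)  # 64-bit, LE, v1, SysV
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH", ident,
        2, 0x3e, 1,               # e_type=ET_EXEC, e_machine=x86-64, e_version
        entry, EHDR_SIZE, 0,      # e_entry, e_phoff, e_shoff (no sections)
        0, EHDR_SIZE, PHDR_SIZE,  # e_flags, e_ehsize, e_phentsize
        1, 64, 0, 0)              # e_phnum, e_shentsize, e_shnum, e_shstrndx
    phdr = struct.pack(
        "<IIQQQQQQ",
        1, 5,                     # p_type=PT_LOAD, p_flags=PF_R|PF_X
        0, BASE, BASE,            # p_offset, p_vaddr, p_paddr
        file_size, file_size,     # p_filesz, p_memsz
        0x1000)                   # p_align
    return ehdr + phdr + code

def write_executable(path, code):
    with open(path, "wb") as f:
        f.write(build_elf(code))
    os.chmod(path, 0o755)         # mark the file executable

# exit(42) from section 03: mov eax, 60; mov edi, 42; syscall
EXIT_42 = bytes.fromhex("b83c000000bf2a0000000f05")
```

For example, `write_executable("tiny", EXIT_42)` should produce a 132-byte file; on x86-64 Linux, running `./tiny; echo $?` should then print 42.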
05. Surprises of the ELF Format
The format is not rigid (it places no constraints on where sections and headers go), which allows for:
- Gaps between sections and headers.
- Overlapping sections or headers.
I'm no security expert, but I think loosely defined file formats can be a source of security problems. A more rigid design would require headers and sections to be placed in order, with any padding explicitly defined.
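To make this looseness concrete, one can list the file regions of the section 03 example as (name, offset, size) and look for gaps and overlaps. A minimal sketch; `check_layout` is a hypothetical helper that takes the regions as input rather than parsing them from the headers, reports overlaps as name pairs, and reports gaps as (start, end) byte ranges.

```python
def check_layout(regions, file_size=None):
    """regions: list of (name, offset, size) file ranges.
    Returns (gaps, overlaps): gaps as (start, end) ranges covered by no
    region, overlaps as (name_a, name_b) pairs of intersecting regions."""
    ordered = sorted(regions, key=lambda r: r[1])
    overlaps = []
    # Pairwise check (quadratic, fine for a handful of headers/sections).
    for i, (name_a, off_a, size_a) in enumerate(ordered):
        for name_b, off_b, _ in ordered[i + 1:]:
            if off_b < off_a + size_a:       # b starts before a ends
                overlaps.append((name_a, name_b))
    gaps = []
    covered_to = 0                           # end of the bytes seen so far
    for _, off, size in ordered:
        if off > covered_to:
            gaps.append((covered_to, off))
        covered_to = max(covered_to, off + size)
    if file_size is not None and file_size > covered_to:
        gaps.append((covered_to, file_size))  # unreferenced trailing bytes
    return gaps, overlaps

# Regions of the 4320-byte example from section 03.
example = [
    ("ELF header", 0x0, 0x40),
    ("program headers", 0x40, 2 * 56),
    (".text", 0x1000, 12),
    (".shstrtab", 0x100c, 0x11),
    ("section headers", 0x1020, 3 * 64),
]
gaps, overlaps = check_layout(example, file_size=4320)
print([(hex(a), hex(b)) for a, b in gaps], overlaps)
```

For the example, this should find no overlaps and two gaps: the zero padding from 0xb0 to 0x1000, and 3 padding bytes from 0x101d to 0x1020.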
A program header can create a mapping at virtual address 0. If you do that, a null pointer dereference is no longer an immediate error. I have tried this on Linux and it works. This can be used to bypass the minimum mmap address enforced by /proc/sys/vm/mmap_min_addr.
Welcome to build-your-own.org.
A website for free educational software development materials.