Several years ago I read an excellent guide to ELF executables called “A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux.” It outlines some of the factors that contribute to overhead in ELF executables, and goes to great lengths to make the smallest-possible ELF program. Unfortunately, the tutorial only covers 32-bit ELF executables. By hand-coding the ELF and program headers and compiling them with NASM, it’s also possible to create very small 64-bit ELF programs.
For instance, a 64-bit “Hello World!” program written in C and compiled with gcc takes up 8.2 kilobytes on my system. When the same program is written in assembly and assembled using the techniques from Brian Raiter’s tutorial, it only takes up 152 bytes. Further improvements are possible, but they rely on hacks that may stop working across different systems and versions of Linux.
tl;dr: Download the zip file containing the sources and makefile here, or get the 32-bit and 64-bit sources and compile them individually (here are the fully optimized 32-bit and 64-bit sources). To compile all of the examples, extract the files from the archive and type “make”, or compile them yourself with:
$ nasm -f bin [filename]
Teensy 64-bit ELF Executables
The original guide to teensy ELF executables shows that you can cut most of the fat from a compiled program by creating an executable that provides an ELF header, a program header, a main routine and nothing else. A program that doesn’t link to external libraries or rely on other resources doesn’t need anything else, but regular compilers don’t bother cutting out the unnecessary sections. The techniques in the Teensy ELF guide can be applied almost directly to the 64-bit ELF format with a few key adjustments.
The 64-bit ELF Header
The ELF specification defines the 64-bit ELF header with the following struct:
    typedef struct
    {
      unsigned char e_ident[EI_NIDENT]; // ELF magic bytes: 0x7f,E,L,F + info
      Elf64_Half    e_type;             // Object file type
      Elf64_Half    e_machine;          // CPU architecture
      Elf64_Word    e_version;          // Object file version
      Elf64_Addr    e_entry;            // Program entry point
      Elf64_Off     e_phoff;            // Program header offset
      Elf64_Off     e_shoff;            // Section header offset
      Elf64_Word    e_flags;            // Processor flags
      Elf64_Half    e_ehsize;           // ELF header size
      Elf64_Half    e_phentsize;        // Program header size
      Elf64_Half    e_phnum;            // Number of program header entries
      Elf64_Half    e_shentsize;        // Section header size
      Elf64_Half    e_shnum;            // Number of section header entries
      Elf64_Half    e_shstrndx;         // String table section header index
    } Elf64_Ehdr;
The 64-bit ELF header has the same fields in the same order as the 32-bit ELF header; only the size of its address and offset entries differs. The word and halfword entries are the same width on 32-bit and 64-bit architectures, but Elf64_Off and Elf64_Addr entries are 8 bytes long (64 bits), whereas Elf32_Off and Elf32_Addr entries each take up 4 bytes (32 bits). Since three fields are affected (e_entry, e_phoff and e_shoff), the header grows from 52 bytes to 64 bytes.
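The size difference can be checked against the definitions in the system's `<elf.h>`. This is only a sketch that assumes a Linux/glibc-style `<elf.h>` is available; the helper names `ehdr_growth` and `addr_off_growth` are made up for illustration and are not part of any library.

```c
#include <elf.h>     /* Elf32_Ehdr, Elf64_Ehdr and friends (Linux/glibc header) */
#include <stddef.h>  /* size_t */

/* Hypothetical helper: how many bytes the ELF header gains when moving
 * from the 32-bit layout to the 64-bit layout. */
size_t ehdr_growth(void)
{
    return sizeof(Elf64_Ehdr) - sizeof(Elf32_Ehdr);
}

/* Hypothetical helper: the growth explained by the three fields that
 * widen from 4 to 8 bytes (e_entry, e_phoff, e_shoff). */
size_t addr_off_growth(void)
{
    return (sizeof(Elf64_Addr) - sizeof(Elf32_Addr))      /* e_entry          */
         + 2 * (sizeof(Elf64_Off) - sizeof(Elf32_Off));   /* e_phoff, e_shoff */
}
```

Both helpers should return the same number, which is another way of saying that the three wider fields account for the whole difference between the two headers.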
The 64-bit ELF Program Header
The 64-bit ELF program header is similar to the 32-bit phdr, except that in addition to the different address and offset widths, the segment’s flag values are in a new location: p_flags moves from near the end of the struct (after p_memsz) to directly after p_type.
    typedef struct elf64_phdr
    {
      Elf64_Word  p_type;   // Type of segment
      Elf64_Word  p_flags;  // Segment's flag values (!)
      Elf64_Off   p_offset; // Segment's offset from start of file
      Elf64_Addr  p_vaddr;  // Segment's virtual address in memory
      Elf64_Addr  p_paddr;  // Segment's physical address (ignored)
      Elf64_Xword p_filesz; // Segment's size on disk
      Elf64_Xword p_memsz;  // Segment's size in memory
      Elf64_Xword p_align;  // Segment's alignment
    } Elf64_Phdr;
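To see where the flags field sits in each layout, `offsetof` can be applied to the two program header structs. As before, this is a small sketch that assumes a Linux/glibc-style `<elf.h>`; the function names are hypothetical.

```c
#include <elf.h>     /* Elf32_Phdr, Elf64_Phdr (Linux/glibc header) */
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical helper: byte offset of p_flags in the 32-bit program header,
 * where it comes after p_memsz. */
size_t phdr32_flags_offset(void)
{
    return offsetof(Elf32_Phdr, p_flags);
}

/* Hypothetical helper: byte offset of p_flags in the 64-bit program header,
 * where it comes directly after p_type. */
size_t phdr64_flags_offset(void)
{
    return offsetof(Elf64_Phdr, p_flags);
}
```

A plausible reason for the move is alignment: putting the 4-byte p_flags right after the 4-byte p_type lets the 8-byte fields that follow start on an 8-byte boundary without padding.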
Creating a Tiny 64-bit ELF Header in NASM
Porting the data from the ELF spec into NASM allows us to create a 64-bit NASM ELF template. This is a 64-bit version of the Teensy ELF proposed in the original tutorial. The main routine is taken from the “64-bit NASM Hello World” tutorial:
    BITS 64
      org 0x00400000        ; Program load offset

    ; 64-bit ELF header
    ehdr:
      ; ELF Magic + 2 (64-bit), 1 (LSB), 1 (ELF ver. 1), 0 (ABI ver.)
      db 0x7F, "ELF", 2, 1, 1, 0   ; e_ident
      times 8 db 0                 ; reserved (zeroes)

      dw 2                  ; e_type: Executable file
      dw 0x3e               ; e_machine: AMD64
      dd 1                  ; e_version: current version
      dq _start             ; e_entry: program entry address (offset 0x78)
      dq phdr - $$          ; e_phoff: program header offset (0x40)
      dq 0                  ; e_shoff: no section headers
      dd 0                  ; e_flags: no flags
      dw ehdrsize           ; e_ehsize: ELF header size (0x40)
      dw phdrsize           ; e_phentsize: program header size (0x38)
      dw 1                  ; e_phnum: one program header
      dw 0                  ; e_shentsize
      dw 0                  ; e_shnum
      dw 0                  ; e_shstrndx

    ehdrsize equ $ - ehdr

    ; 64-bit ELF program header
    phdr:
      dd 1                  ; p_type: loadable segment
      dd 5                  ; p_flags: read and execute
      dq 0                  ; p_offset
      dq $$                 ; p_vaddr: start of the current section
      dq $$                 ; p_paddr: "     "
      dq filesize           ; p_filesz
      dq filesize           ; p_memsz
      dq 0x200000           ; p_align: 2^21 = 0x200000 = segment alignment

    ; program header size
    phdrsize equ $ - phdr

    ; Hello World!/your program here
    _start:

      ; sys_write(stdout, message, length)
      mov rax, 1            ; sys_write
      mov rdi, 1            ; stdout
      mov rsi, message      ; message address
      mov rdx, length       ; message string length
      syscall

      ; sys_exit(return_code)
      mov rax, 60           ; sys_exit
      mov rdi, 0            ; return 0 (success)
      syscall

    message: db 'Hello, world!',0x0A  ; message and newline
    length:  equ $-message            ; message length calculation

    ; File size calculation
    filesize equ $ - $$
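One way to sanity-check a file assembled from this template is to read its headers back and compare them with the values the listing is supposed to produce. The function below is a rough sketch of that idea, not a general ELF validator: `check_teensy_elf64` and its return codes are invented for this example, it assumes a Linux/glibc-style `<elf.h>`, and it assumes a little-endian host so the fields can be copied straight out of the buffer.

```c
#include <elf.h>     /* Elf64_Ehdr, Elf64_Phdr, ELFMAG, PT_LOAD, ... */
#include <stddef.h>  /* size_t */
#include <string.h>  /* memcpy, memcmp */

/* Hypothetical checker: returns 0 if buf (the whole file, len bytes) has the
 * layout of the template above, or a nonzero code naming the failed check.
 * Assumes a little-endian host. */
int check_teensy_elf64(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;

    if (len < sizeof eh + sizeof ph) return 1;            /* too short      */
    memcpy(&eh, buf, sizeof eh);

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) return 2;   /* 0x7F "ELF" */
    if (eh.e_ident[EI_CLASS] != ELFCLASS64) return 3;         /* 64-bit     */
    if (eh.e_type != ET_EXEC || eh.e_machine != EM_X86_64) return 4;

    /* The program header follows the ELF header directly, and there is one. */
    if (eh.e_phoff != sizeof eh || eh.e_phnum != 1) return 5;
    if (eh.e_ehsize != sizeof eh || eh.e_phentsize != sizeof ph) return 6;

    memcpy(&ph, buf + eh.e_phoff, sizeof ph);
    if (ph.p_type != PT_LOAD || ph.p_flags != (PF_R | PF_X)) return 7;

    /* The single segment maps the whole file, starting at offset 0. */
    if (ph.p_offset != 0 || ph.p_filesz != len || ph.p_memsz != len) return 8;

    /* The entry point must land inside the segment, after both headers. */
    if (eh.e_entry < ph.p_vaddr + sizeof eh + sizeof ph ||
        eh.e_entry >= ph.p_vaddr + len) return 9;

    return 0;
}
```

For the template above, the entry check corresponds to `_start` sitting at offset 0x78 (64 + 56 bytes of headers) from the load address 0x00400000.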
Compiling and running
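Download the package containing the source files here, or get the 32-bit and 64-bit sources and compile them individually. To compile all of the examples, extract the files and type “make”, or assemble a single file with:
$ nasm -f bin [filename]
When assembling by hand, note that NASM does not mark its output as executable, so run chmod +x on the resulting file before executing it.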
wc shows the dramatic difference in the compiled program sizes:
    $ wc normal
       5   68 8377 normal

    $ wc hello32
      1   3 134 hello32

    $ wc hello64
      1   3 198 hello64
The NASM programs are 40-60 times smaller than the program compiled by gcc with default settings.
Assembly Optimizations
By carefully choosing instructions, a few bytes can be shaved from each executable. For instance, as François-Renaud Escriva pointed out in the comments, instead of using 64-bit register instructions, specifying 32-bit, 16-bit or even 8-bit registers can make the instructions smaller and save a few bytes. Also, as olsner pointed out on r/osdev, the CDQ instruction will sign-extend eax into edx, effectively doing the same thing as xor edx, edx in fewer bytes (one byte instead of two) as long as eax contains a non-negative value. The optimized 64-bit program looks like this:
    ; Space-optimized Hello World!
    _start:

      ; sys_write(stdout, message, length)
      mov al, 1             ; sys_write (relies on the rest of rax being 0 at entry)
      mov edi, eax          ; stdout
      mov esi, message      ; message address
      mov dl, length        ; message string length
      syscall

      ; sys_exit(return_code)
      mov al, 60            ; sys_exit
      cdq                   ; Sign-extend eax into edx (edx = 0); edi is untouched
                            ; and still holds 1, so the exit status is 1, not 0
      syscall

    message: db 'Hello, world!',0x0A  ; message and newline
    length:  equ $-message            ; message length calculation
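The effect of CDQ on edx is easy to model in C. The function below is purely illustrative (the name `cdq_edx` is made up): it returns what edx would hold after CDQ for a given eax, which shows why the instruction can stand in for xor edx, edx whenever bit 31 of eax is clear.

```c
#include <stdint.h>

/* Hypothetical model of CDQ: given the value in eax, return the value that
 * edx holds afterwards. CDQ copies the sign bit of eax into every bit of
 * edx and leaves eax itself unchanged. */
uint32_t cdq_edx(uint32_t eax)
{
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}
```

With eax set to 60 (the sys_exit number), the model gives edx = 0. Note that the model says nothing about edi, which CDQ never touches.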
Using the optimized routine shaves 46 bytes from the 64-bit executable (198 down to 152), resulting in a 152 byte ELF. Similar optimization shaves only 7 bytes from the 32-bit ELF.
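The file sizes can be accounted for byte by byte: 64 bytes of ELF header, 56 bytes of program header, the code, and the 14-byte message. The sketch below adds up instruction lengths for both routines. The optimized lengths are the standard x86-64 encodings of the instructions shown. The unoptimized lengths are an inference from the reported 198 bytes, which fits an assembler emitting the 10-byte mov r64, imm64 form for every mov; the source does not state this, and a newer NASM with optimization enabled may pick shorter encodings. All names here are hypothetical.

```c
#include <stddef.h>  /* size_t */

enum { EHDR_SIZE = 64, PHDR_SIZE = 56, MESSAGE_SIZE = 14 };

/* Instruction lengths in bytes, in program order.
 * Optimized: mov al,1 / mov edi,eax / mov esi,imm32 / mov dl,imm8 /
 *            syscall / mov al,60 / cdq / syscall */
const unsigned char optimized_lens[8] = { 2, 2, 5, 2, 2, 2, 1, 2 };

/* Unoptimized (ASSUMED encodings): six 10-byte mov r64, imm64 instructions
 * and two 2-byte syscalls. */
const unsigned char unoptimized_lens[8] = { 10, 10, 10, 10, 2, 10, 10, 2 };

/* Sum an array of instruction lengths. */
size_t code_bytes(const unsigned char *lens, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += lens[i];
    return total;
}

/* Total file size: both headers, the code, and the message. */
size_t file_size(size_t code)
{
    return EHDR_SIZE + PHDR_SIZE + code + MESSAGE_SIZE;
}
```

Under these assumptions the code shrinks from 64 bytes to 18 bytes, which matches the drop from 198 to 152 bytes in the file sizes.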
The original Teensy ELF tutorial provides additional improvements by overlapping the ELF and program headers, but the resulting executables may not work consistently across different systems and versions. If you want an even smaller ELF, playing with the headers is your next step.
Further Reading
System V Application Binary Interface - ELF Header and Program Header specs
A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux (again)