A deep understanding of object files reveals how C supports modular programming and how the linker constructs a final executable. We'll explore this through the lens of the ELF (Executable and Linkable Format), the standard format on Linux and many other Unix-like systems.
1. The ELF (Executable and Linkable Format) Structure
An object file isn't a monolithic block of code; it's a structured file containing multiple sections, indexed by a section header table.
路聽聽聽聽聽聽聽聽 ELF Header: The file's blueprint, containing metadata like whether it's a 32/64-bit file, its endianness, and its type (e.g., relocatable object, executable).
路聽聽聽聽聽聽聽聽
.text
: Contains
the compiled, read-only machine code for your functions.
路聽聽聽聽聽聽聽聽
.rodata
: Contains
read-only data, such as string literals and const
-qualified variables.
路聽聽聽聽聽聽聽聽
.data
: Contains
initialized global and static variables.
路聽聽聽聽聽聽聽聽
.bss
: A
placeholder for uninitialized global and static variables. It specifies a size
but occupies no space in the file itself, saving disk space.
路聽聽聽聽聽聽聽聽
.symtab
: The symbol
table, which lists all global symbols (functions and variables) that are
defined or referenced by the file.
路聽聽聽聽聽聽聽聽
.rel.text
/ .rel.data
: The relocation
sections. These are the key to making object files modular and are
discussed next.
2. Relocation: Making Code Position-Independent
A compiler
generating an object file for main.c
doesn't know the final memory address of a function add()
that's defined in calc.c
. So how does it generate the machine code for call add
?
The answer is relocation. The compiler generates a placeholder instruction and leaves a note for the linker in a relocation section.
The process works like this:
1.聽聽 Compiler:
When compiling a call to an external function, the compiler generates a call
instruction with a temporary (often zero) address.
2.聽聽 Compiler: It
then adds an entry to the .rel.text
section. This entry essentially says: "Dear
Linker, when you figure out the final address for the symbol add
, please come back to this specific location in the .text
section and patch the placeholder address with the
real one."
3.聽聽 Linker: During linking, the linker determines the final addresses for all functions. It then reads the relocation entries in each object file and performs the requested patches, "stitching" the code together.
Example with objdump
Let's inspect the
object file for a main.c
that calls a function add()
from another file.
C
// main.c
int add(int, int);
// Prototype for a function in another file
int main() {
聽聽聽
return add(
5,
10);
}
Compile
to an object file: gcc
-c main.c -o main.o
Inspect with objdump -d -r
main.o
:
Code snippet
0000000000000000 <main>:
聽聽 0:聽聽 ...
聽聽 b:聽聽 e8 00 00 00 00聽聽聽聽聽聽聽聽聽 callq聽 10 <main+0x10>
聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽 c: R_X86_64_PLT32聽聽聽聽聽聽 add-0x4
聽聽 ...
路聽聽聽聽聽聽聽聽
e8 00 00
00 00
: This is the machine
code for the call
instruction. Note the address is 00 00 00 00
鈥攁 placeholder.
路聽聽聽聽聽聽聽聽
R_X86_64_PLT32 add-0x4
: This is the relocation entry. It tells the
linker to find the address of the symbol add
and use it to patch the 4 bytes at offset c
.
3. Symbol Resolution and Linkage Types
The linker uses the symbol tables from all object files to connect function calls with function definitions.
路聽聽聽聽聽聽聽聽 Strong and Weak Symbols: This is the mechanism the linker uses to handle multiple definitions of the same global symbol.
o聽聽聽 Strong: Functions and initialized global variables.
o聽聽聽 Weak:
Uninitialized global variables or symbols explicitly marked as weak (e.g.,
using __attribute__((weak))
in GCC).
o聽聽聽 Linker Rules:
1.聽聽 Multiple strong symbols with the same name result in a linker error.
2.聽聽 Given one strong symbol and multiple weak symbols, the strong symbol is chosen.
3.聽聽 Given only multiple weak symbols, the linker picks one without an error.
路聽聽聽聽聽聽聽聽
C vs. C++
Linkage: C uses a simple linkage
model where a function's symbol is just its name (e.g., add
). This is why you cannot have two global C functions
with the same name. C++ supports function overloading and uses name mangling
to encode the function's parameter types into the symbol name (e.g., add(int, int)
might become the symbol _Z3addii
). This makes the symbols unique, allowing the linker
to differentiate them.
聽
聽
Executable code is a self-contained, machine-readable file that the operating system can load directly into memory and run. It's the final output of the linker in the compilation process.