StudyLover
  • Home
  • Study Zone
  • Profiles
  • Typing Tutor
  • B Tree
  • Contact us
  • Sign in
StudyLover Advanced Concepts of C Object Files 馃敩
Download
  1. C Programming
  2. Unit 1: Foundations of Problem Solving & C Language Basics
Understanding Object Code in C 馃敩 : Understanding Executable Code in C 馃殌
Unit 1: Foundations of Problem Solving & C Language Basics

A deep understanding of object files reveals how C supports modular programming and how the linker constructs a final executable. We'll explore this through the lens of the ELF (Executable and Linkable Format), the standard format on Linux and many other Unix-like systems.

1. The ELF (Executable and Linkable Format) Structure

An object file isn't a monolithic block of code; it's a structured file containing multiple sections, indexed by a section header table.

路聽聽聽聽聽聽聽聽 ELF Header: The file's blueprint, containing metadata like whether it's a 32/64-bit file, its endianness, and its type (e.g., relocatable object, executable).

路聽聽聽聽聽聽聽聽 .text: Contains the compiled, read-only machine code for your functions.

路聽聽聽聽聽聽聽聽 .rodata: Contains read-only data, such as string literals and const-qualified variables.

路聽聽聽聽聽聽聽聽 .data: Contains initialized global and static variables.

路聽聽聽聽聽聽聽聽 .bss: A placeholder for uninitialized global and static variables. It specifies a size but occupies no space in the file itself, saving disk space.

路聽聽聽聽聽聽聽聽 .symtab: The symbol table, which lists all global symbols (functions and variables) that are defined or referenced by the file.

路聽聽聽聽聽聽聽聽 .rel.text / .rel.data: The relocation sections. These are the key to making object files modular and are discussed next.


2. Relocation: Making Code Position-Independent

A compiler generating an object file for main.c doesn't know the final memory address of a function add() that's defined in calc.c. So how does it generate the machine code for call add?

The answer is relocation. The compiler generates a placeholder instruction and leaves a note for the linker in a relocation section.

The process works like this:

1.聽聽 Compiler: When compiling a call to an external function, the compiler generates a call instruction with a temporary (often zero) address.

2.聽聽 Compiler: It then adds an entry to the .rel.text section. This entry essentially says: "Dear Linker, when you figure out the final address for the symbol add, please come back to this specific location in the .text section and patch the placeholder address with the real one."

3.聽聽 Linker: During linking, the linker determines the final addresses for all functions. It then reads the relocation entries in each object file and performs the requested patches, "stitching" the code together.

Example with objdump

Let's inspect the object file for a main.c that calls a function add() from another file.

C

// main.c

int add(int, int); // Prototype for a function in another file

int main() {

聽聽聽 return add(5, 10);

}

Compile to an object file: gcc -c main.c -o main.o Inspect with objdump -d -r main.o:

Code snippet

0000000000000000 <main>:

聽聽 0:聽聽 ...

聽聽 b:聽聽 e8 00 00 00 00聽聽聽聽聽聽聽聽聽 callq聽 10 <main+0x10>

聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽 c: R_X86_64_PLT32聽聽聽聽聽聽 add-0x4

聽聽 ...

路聽聽聽聽聽聽聽聽 e8 00 00 00 00: This is the machine code for the call instruction. Note the address is 00 00 00 00鈥攁 placeholder.

路聽聽聽聽聽聽聽聽 R_X86_64_PLT32 add-0x4: This is the relocation entry. It tells the linker to find the address of the symbol add and use it to patch the 4 bytes at offset c.


3. Symbol Resolution and Linkage Types

The linker uses the symbol tables from all object files to connect function calls with function definitions.

路聽聽聽聽聽聽聽聽 Strong and Weak Symbols: This is the mechanism the linker uses to handle multiple definitions of the same global symbol.

o聽聽聽 Strong: Functions and initialized global variables.

o聽聽聽 Weak: Uninitialized global variables or symbols explicitly marked as weak (e.g., using __attribute__((weak)) in GCC).

o聽聽聽 Linker Rules:

1.聽聽 Multiple strong symbols with the same name result in a linker error.

2.聽聽 Given one strong symbol and multiple weak symbols, the strong symbol is chosen.

3.聽聽 Given only multiple weak symbols, the linker picks one without an error.

路聽聽聽聽聽聽聽聽 C vs. C++ Linkage: C uses a simple linkage model where a function's symbol is just its name (e.g., add). This is why you cannot have two global C functions with the same name. C++ supports function overloading and uses name mangling to encode the function's parameter types into the symbol name (e.g., add(int, int) might become the symbol _Z3addii). This makes the symbols unique, allowing the linker to differentiate them.

聽

聽

Executable code is a self-contained, machine-readable file that the operating system can load directly into memory and run. It's the final output of the linker in the compilation process.


Understanding Object Code in C 馃敩 Understanding Executable Code in C 馃殌
Our Products & Services
  • Home
Connect with us
  • Contact us
  • +91 82955 87844
  • Rk6yadav@gmail.com

StudyLover - About us

The Best knowledge for Best people.

Copyright © StudyLover
Powered by Odoo - Create a free website