A compiler works in phases to translate high-level source code into machine code or an intermediate representation (i.e., the orange box in the image below). Compilation phases are categorized into two parts: analysis and synthesis.
Within these two phases, there are seven sub-phases (1-3: analysis; 4-7: synthesis):
- Lexical analysis – tokenizes source code.
- Syntax analysis – builds an AST based on grammar.
- Semantic analysis – ensures correct meaning and types.
- Intermediate code generation – produces a machine-independent representation.
- Optimization – improves intermediate code.
- Code generation – produces machine-specific assembly code.
- Code optimization (optional) – improves generated code.
Let’s explore each compilation phase.
Analysis phase
1. Lexical analysis (scanner)
The lexical analysis phase is responsible for reading the source code character by character and grouping them into sequences called tokens.
Tokens represent syntax components (e.g., keywords, identifiers, literals, and operators).
- Output – a stream of tokens passed to the next phase.
- Example – from the line int x = 10;, the lexer might produce tokens like: int (keyword), x (identifier), = (operator), 10 (literal), ; (semicolon).
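To make this concrete, here is a minimal lexer sketch in Python. The token categories and regular expressions are a simplification for this one statement, not taken from any real compiler:

```python
import re

# A toy token specification for statements like "int x = 10;".
# The categories and patterns are simplified for illustration.
TOKEN_SPEC = [
    ("KEYWORD",    r"\bint\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("LITERAL",    r"\d+"),
    ("OPERATOR",   r"="),
    ("SEMICOLON",  r";"),
    ("SKIP",       r"\s+"),
]

def tokenize(source):
    """Scan the source string left to right and yield (kind, text) tokens."""
    pattern = "|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC)
    for match in re.finditer(pattern, source):
        kind = match.lastgroup
        if kind != "SKIP":          # whitespace is discarded, not tokenized
            yield kind, match.group()

print(list(tokenize("int x = 10;")))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'x'), ('OPERATOR', '='),
#  ('LITERAL', '10'), ('SEMICOLON', ';')]
```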
2. Syntax analysis (parser)
Abstract Syntax Tree (source).
The parser takes the stream of tokens produced by the lexical analyzer and checks if they follow the language’s grammatical rules. It typically produces an abstract syntax tree (AST) or parse tree.
- Output – a tree-like structure (AST) representing the source code’s grammatical structure.
- Example – the tokens from int x = 10; might be parsed into an AST representing the variable’s declaration and assignment.
Read more: Parse Tree vs Syntax Tree – GeeksforGeeks
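Here is a minimal recursive-descent-style sketch in Python that turns the token stream from the previous phase into one AST node. The VarDecl node and the single-rule grammar are simplifications invented for this example:

```python
from dataclasses import dataclass

# Simplified AST node for a declaration like "int x = 10;".
@dataclass
class VarDecl:
    type_name: str
    name: str
    value: int

def parse_declaration(tokens):
    """Parse the rule: KEYWORD IDENTIFIER '=' LITERAL ';' into a VarDecl node."""
    def expect(kind):
        actual_kind, text = tokens.pop(0)
        if actual_kind != kind:
            raise SyntaxError(f"expected {kind}, got {actual_kind} ({text!r})")
        return text

    type_name = expect("KEYWORD")
    name = expect("IDENTIFIER")
    expect("OPERATOR")                 # the '=' sign
    value = int(expect("LITERAL"))
    expect("SEMICOLON")
    return VarDecl(type_name, name, value)

tokens = [("KEYWORD", "int"), ("IDENTIFIER", "x"),
          ("OPERATOR", "="), ("LITERAL", "10"), ("SEMICOLON", ";")]
print(parse_declaration(tokens))   # VarDecl(type_name='int', name='x', value=10)
```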
3. Semantic analysis
In this phase, the compiler ensures that the parsed structure (AST) adheres to the language’s semantic rules.
Semantic analysis includes type checking, checking scope rules, and ensuring variable declarations and uses are correct.
- Output – an annotated AST that includes type information and other semantic details.
- Example – ensuring that I am not trying to assign a string to an integer variable, and that the variable x has been declared before use.

Syntax and semantics (source).

Annotated AST (source).
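As a rough illustration, here is a tiny semantic-check sketch in Python. The symbol-table layout and the error types are my own simplification of what a real type checker does:

```python
# Toy semantic checks: declared-before-use and type compatibility.
symbol_table = {}

def declare(name, type_name):
    """Record a variable's declared type, rejecting duplicate declarations."""
    if name in symbol_table:
        raise TypeError(f"variable {name!r} already declared")
    symbol_table[name] = type_name

def check_assignment(name, value):
    """Ensure the variable exists and the value matches its declared type."""
    if name not in symbol_table:
        raise NameError(f"variable {name!r} used before declaration")
    if symbol_table[name] == "int" and not isinstance(value, int):
        raise TypeError(f"cannot assign {type(value).__name__} to int variable {name!r}")

declare("x", "int")
check_assignment("x", 10)        # fine
check_assignment("x", "hello")   # raises TypeError: cannot assign str to int variable 'x'
```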
Synthesis phase
4. Intermediate code generation
After the semantic analysis phase completes, the compiler generates an intermediate representation (IR) of the source code.
The IR is usually independent of the target machine, making it easier to optimize later on, and retarget the compiler for different architectures.
- Output – intermediate code, such as three-address code (TAC), control flow graphs (CFGs), or static single assignment (SSA).
- Example – high-level code like x = a + b might be translated to a lower-level IR like t1 = a + b, followed by x = t1.

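Here is a small Python sketch of lowering an expression tree to three-address code. The tuple-based AST and the t1, t2 temporary-naming scheme are assumptions made for illustration:

```python
# Sketch of lowering a small expression tree to three-address code (TAC).
temp_counter = 0

def new_temp():
    global temp_counter
    temp_counter += 1
    return f"t{temp_counter}"

def lower(node, code):
    """Return the name holding the node's value, appending TAC lines to `code`."""
    if isinstance(node, str):            # a plain variable reference
        return node
    op, left, right = node               # e.g. ("+", "a", "b")
    l = lower(left, code)
    r = lower(right, code)
    t = new_temp()
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
result = lower(("+", "a", ("*", "b", "c")), code)   # x = a + b * c
code.append(f"x = {result}")
print("\n".join(code))
# t1 = b * c
# t2 = a + t1
# x = t2
```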
5. Optimization
In this phase, the compiler tries to improve the intermediate code to make it more efficient, reducing resource usage (e.g., CPU and memory) or execution time.
Optimization can happen at multiple levels, including peephole optimization, loop optimization, constant folding, dead code elimination, etc.
- Output – an optimized version of the intermediate code.
- Example – if a variable’s value is constant, the compiler may replace all instances of that variable with the constant itself (constant propagation).
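As a rough sketch, here is what a constant-propagation pass over a tiny list of TAC instructions could look like in Python. The two-field instruction format is a simplification I chose for this example:

```python
# Sketch of constant propagation over a small list of TAC instructions,
# where each instruction is a (target, expression) pair.
def propagate_constants(instructions):
    constants = {}
    optimized = []
    for target, expr in instructions:
        # Replace any operand whose value is already known to be constant.
        expr = " ".join(str(constants.get(p, p)) for p in expr.split())
        if expr.isdigit():                 # the whole right-hand side is a constant
            constants[target] = int(expr)
        optimized.append((target, expr))
    return optimized

before = [("n", "10"), ("t1", "n + 5")]
print(propagate_constants(before))
# [('n', '10'), ('t1', '10 + 5')]  -- 'n' replaced by its constant value
```

A constant-folding pass (see the vocab list below) could then evaluate 10 + 5 down to 15 at compile time.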
Vocab
- Peephole optimization – local technique where small sequences of instructions are examined and replaced with more efficient ones, improving performance in a small code region.
- Loop optimization – enhances loop performance through unrolling, invariant code motion, or loop fusion, reducing the number of iterations or redundant computations. Here is an explanation of these loop optimization techniques:
- Unrolling – expands a loop’s iterations by duplicating the loop body, reducing loop-control overhead and potentially enabling further optimizations.
- Invariant code motion – moves computations or statements that produce the same result on every iteration (loop-invariant) outside the loop, avoiding redundant calculations within the loop.
- Loop fusion – merges two or more adjacent loops that iterate over the same range into a single loop, reducing loop overhead and improving cache locality.
- Constant folding – a process where constant expressions are evaluated at compile-time rather than runtime, reducing calculations during execution.
- Dead code elimination – removing code that is never executed or whose results are never used, reducing unnecessary computations and resource usage.
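To illustrate loop-invariant code motion specifically, here is a source-level before/after sketch in Python. A real compiler performs this transformation on the IR rather than on source code, and the function names here are made up for the example:

```python
# Before: the invariant expression is recomputed on every iteration.
def scale_all_before(values, scale):
    result = []
    for v in values:
        factor = scale * scale      # does not depend on the loop variable
        result.append(v * factor)
    return result

# After: the invariant computation is hoisted out of the loop.
def scale_all_after(values, scale):
    factor = scale * scale          # computed once, before the loop
    result = []
    for v in values:
        result.append(v * factor)
    return result

assert scale_all_before([1, 2, 3], 4) == scale_all_after([1, 2, 3], 4) == [16, 32, 48]
```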
6. Code generation
Then, the compiler translates the optimized intermediate code into target machine code (e.g., assembly or binary code) for the specific hardware architecture. This phase involves selecting instructions, allocating registers, and mapping variables to memory locations.
- Output – target machine code (e.g., assembly code).
- Example – a high-level statement like x = 10 might be translated into assembly instructions such as MOV R1, #10 and MOV [x], R1.
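Here is a naive code-generation sketch in Python that maps simple TAC-style assignments onto pseudo-assembly. The MOV mnemonics and the single scratch register are illustrative assumptions, not a real instruction set or register allocator:

```python
# Naive code generation: map simple (target, value) assignments to pseudo-assembly.
def generate(instructions):
    asm = []
    for target, value in instructions:
        if str(value).isdigit():
            asm.append(f"MOV R1, #{value}")   # load the constant into a register
        else:
            asm.append(f"MOV R1, [{value}]")  # load from the source variable's memory
        asm.append(f"MOV [{target}], R1")     # store the register into the target
    return asm

print("\n".join(generate([("x", "10")])))
# MOV R1, #10
# MOV [x], R1
```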
7. Code optimization (optional)
After generating the target code, additional machine-level optimizations may be performed (e.g., minimizing register usage, instruction reordering, or eliminating unnecessary instructions).
- Output – an even more efficient version of machine code.
- Example – removing redundant load and store instructions that don’t affect the program’s outcome.
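As a sketch of that kind of clean-up, the following Python pass drops a load that immediately re-reads a value just stored from the same register. It assumes the same made-up pseudo-assembly syntax as the earlier examples:

```python
# Peephole pass: remove a load that immediately re-reads a just-stored value.
def peephole(asm):
    optimized = []
    for line in asm:
        if optimized:
            prev = optimized[-1]
            # "MOV [x], R1" followed by "MOV R1, [x]" -- the load is redundant.
            if prev.startswith("MOV [") and line.startswith("MOV R"):
                dest, src = prev[4:].split(", ")      # "[x]", "R1"
                reg, mem = line[4:].split(", ")       # "R1", "[x]"
                if dest == mem and src == reg:
                    continue                          # skip the redundant load
        optimized.append(line)
    return optimized

code = ["MOV R1, #10", "MOV [x], R1", "MOV R1, [x]", "ADD R2, R1, #1"]
print(peephole(code))
# ['MOV R1, #10', 'MOV [x], R1', 'ADD R2, R1, #1']
```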
After the compiler: assembler and linker
The generated code might still need to be assembled and linked into an executable format. The assembly phase converts assembly code into machine-readable object code, while the linking phase resolves external references (e.g., function calls to external libraries) and produces the final executable.
- Output – the final executable file.
- Example – linking compiled code with external libraries and system calls to produce the final executable binary.
Read more: Converting High Level Languages to Machine Language – Olivia A. Gallucci
Compilation phases
Overall, each compilation phase is needed (or at least helpful) for converting source code into an executable program that adheres to the syntax and semantics of the original high-level language. In short, there are two primary phases of compilation, analysis and synthesis, and these divide into sub-phases (1-3: analysis; 4-7: synthesis):
- Lexical analysis – tokenizes source code.
- Syntax analysis – builds an AST based on grammar.
- Semantic analysis – ensures correct meaning and types.
- Intermediate code generation – produces a machine-independent representation.
- Optimization – improves intermediate code.
- Code generation – produces machine-specific assembly code.
- Code optimization (optional) – improves generated code.
If you enjoyed this post on compilation phases, consider reading how to use ROP to bypass security mechanisms.

