Compilation is the task of transforming a computer program written in some source language into a program with equivalent semantics in some target language. Compilation is performed by another computer program called a compiler. The source language is usually a programming language such as C, C++, or Java. The target language is most often some form of machine language. Exceptions exist, the most notable being the Java bytecode compiler, javac, whose target is Java bytecode rather than machine code, and the Java virtual machine with just-in-time compilation, whose source is bytecode rather than a programming language.
For the common case of compiling source code to machine code, compilation occurs in a series of stages, as diagrammed below:
source code
↓ lexical analysis (or scanning)
token stream
↓ syntactic analysis (or parsing)
parse tree & symbol table
↓ semantic analysis
parse tree & symbol table
↓ language-specific optimizations (or high-level optimizations)
parse tree & symbol table
↓ intermediate format generation
intermediate format
↓ general optimizations and additional program transformations
intermediate format
↓ code generation
machine code
↓ machine-specific optimizations (or low-level optimizations)
machine code (or object code)
The first set of steps makes up the front end of the compiler. The front end is responsible for transforming the source code into an intermediate format. The last set of steps makes up the back end of the compiler. The back end transforms the intermediate format into machine code.
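The split between front end and back end can be illustrated with a toy compiler for integer arithmetic expressions. This is only a minimal sketch: the function names, the tuple-based parse tree, the stack-style intermediate format, and the PUSH/ADD/MUL target instructions are all assumptions made for this illustration, and the semantic analysis and optimization stages are omitted.

```python
import re

# Hypothetical token pattern: integers and the symbols + - * / ( ).
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")
# Hypothetical mapping from source operators to intermediate-format operations.
IR_OPS = {"+": "add", "-": "sub", "*": "mul", "/": "div"}

def lex(source):
    """Lexical analysis: source text -> list of token strings."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Syntactic analysis: token list -> nested-tuple parse tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return ("num", int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def to_ir(tree):
    """Intermediate format generation: parse tree -> list of stack operations."""
    if tree[0] == "num":
        return [("push", tree[1])]
    op, left, right = tree
    return to_ir(left) + to_ir(right) + [(IR_OPS[op],)]

def front_end(source):
    """Front end: source code -> intermediate format."""
    return to_ir(parse(lex(source)))

def back_end(ir):
    """Back end: intermediate format -> instructions for a made-up stack machine."""
    return [f"PUSH {instr[1]}" if instr[0] == "push" else instr[0].upper()
            for instr in ir]

def compile_source(source):
    return back_end(front_end(source))

if __name__ == "__main__":
    for line in compile_source("1 + 2 * 3"):
        print(line)
```

Under these assumptions, compiling "1 + 2 * 3" yields PUSH 1, PUSH 2, PUSH 3, MUL, ADD. Only back_end knows anything about the target instructions, and only front_end knows anything about the source syntax.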
The intermediate format is an abstract representation of the program being compiled. It is independent of both the source programming language and the instruction-set architecture of the target machine. The purpose of having an intermediate format is to decouple the front end from the back end. This way, the front end can be made independent of the instruction-set architecture, and the back end can be made independent of the source language. Doing this makes it easier to write general optimizations and transformations that can operate on any program. It also makes it easier to write compiler systems that support many languages and architectures, such as GCC.
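The decoupling described above can be sketched with two front ends and two back ends that share one intermediate format. Everything here is hypothetical and chosen only for illustration: the postfix and prefix source languages, the stack-style intermediate format, the two target notations, and all function names. Constant folding stands in for a general optimization; it sees only the intermediate format, so it works regardless of which front end or back end is used.

```python
# Shared intermediate format (an assumption of this sketch): a list of
# stack operations ("push", n), ("load", name), ("add",) or ("mul",).
OPS = {"+": "add", "*": "mul"}

def operand(word):
    """Turn an integer literal or a variable name into one IR operation."""
    return ("push", int(word)) if word.isdigit() else ("load", word)

def front_end_postfix(source):
    """Hypothetical front end for a postfix language, e.g. "x 2 3 + *"."""
    return [(OPS[w],) if w in OPS else operand(w) for w in source.split()]

def front_end_prefix(source):
    """Hypothetical front end for a prefix language, e.g. "* x + 2 3"."""
    words = source.split()
    pos = 0

    def expr():
        nonlocal pos
        word = words[pos]
        pos += 1
        if word in OPS:
            left = expr()
            right = expr()
            return left + right + [(OPS[word],)]
        return [operand(word)]

    return expr()

def fold_constants(ir):
    """General optimization: precompute operations on two constant operands."""
    out = []
    for instr in ir:
        foldable = (instr[0] in ("add", "mul") and len(out) >= 2
                    and out[-1][0] == "push" and out[-2][0] == "push")
        if foldable:
            right = out.pop()[1]
            left = out.pop()[1]
            out.append(("push", left + right if instr[0] == "add" else left * right))
        else:
            out.append(instr)
    return out

def back_end_stack(ir):
    """Hypothetical stack-machine target."""
    return [" ".join(str(part) for part in instr).upper()
            if instr[0] == "push" else
            f"LOAD {instr[1]}" if instr[0] == "load" else instr[0].upper()
            for instr in ir]

def back_end_register(ir):
    """Hypothetical register-machine target with unlimited registers r0, r1, ..."""
    lines, stack, next_reg = [], [], 0
    for instr in ir:
        if instr[0] in ("push", "load"):
            reg = f"r{next_reg}"
            next_reg += 1
            src = f"#{instr[1]}" if instr[0] == "push" else f"[{instr[1]}]"
            lines.append(f"mov {reg}, {src}")
            stack.append(reg)
        else:
            right = stack.pop()
            left = stack.pop()
            lines.append(f"{instr[0]} {left}, {right}")
            stack.append(left)  # the result stays in the left register
    return lines

def compile_program(front_end, back_end, source):
    """Any front end can be paired with any back end through the shared IR."""
    return back_end(fold_constants(front_end(source)))

if __name__ == "__main__":
    print(compile_program(front_end_postfix, back_end_stack, "x 2 3 + *"))
    print(compile_program(front_end_prefix, back_end_register, "* x + 2 3"))
```

With two front ends and two back ends, four source-to-target combinations are available from four components plus one shared optimization pass. Without the shared format, each combination would need its own translator and its own copy of the optimization.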