1 Chapter - 5: Intermediate Code Generation Bahir Dar Institute of Technology


Chapter 5

Intermediate code generation



Introduction

Phases of compiler
Introduction to intermediate code generation
• Intermediate code is the interface between front end and back end in a
compiler
• It receives input from its predecessor phase, the semantic analyzer, in the
form of an annotated syntax tree.

• Translates the annotated abstract-syntax tree to intermediate code

• Ideally the details of source language are confined to the front end and
the details of target machines to the back end
▪ This means that m * n compilers can be built by writing just m front ends and n
back ends, which saves a considerable amount of effort
▪ In a compiler,
• the front end translates source program into an
intermediate representation,
• and the back end generates the target code from this
intermediate representation.



Introduction to intermediate code generation
• Although a compiler can directly produce a target language
(i.e. machine code or assembly of the target machine),
producing a machine independent intermediate representation
has the following benefits.
• Retargeting to another machine is facilitated.
▪ Intermediate code representation is neutral in relation to
target machine, so the same intermediate code generator can
be shared for all target languages (machines).
▪ Build a compiler for a new machine by attaching a new
code generator to an existing front-end
• Machine independent code optimization can be applied to
intermediate code.
• See the next two slides for more elaboration about benefits of IR
Why IR?
Portability - Suppose we have n source languages and m target
languages. Without intermediate code, each source language must be
translated into each target language directly, so every source-target
pair needs its own compiler: n * m compilers in total. With
intermediate code, we need only n front ends to translate each source
language into intermediate code and m back ends to translate
intermediate code into each target language, that is, only n + m
translators.
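As a quick check of the arithmetic, the sketch below counts translators for the four source languages and four target machines named on this slide. The function names are illustrative only.

```python
def translators_without_ir(n_sources: int, m_targets: int) -> int:
    # One dedicated compiler per (source, target) pair.
    return n_sources * m_targets


def translators_with_ir(n_sources: int, m_targets: int) -> int:
    # One front end per source language plus one back end per target.
    return n_sources + m_targets


sources = ["C", "Pascal", "FORTRAN", "C++"]
targets = ["SPARC", "HP PA", "x86", "IBM PPC"]
print(translators_without_ir(len(sources), len(targets)))  # 16
print(translators_with_ir(len(sources), len(targets)))     # 8
```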

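[Figure: without an IR, each source language (C, Pascal, FORTRAN, C++) is compiled directly to each target machine (SPARC, HP PA, x86, IBM PPC), giving 4 * 4 = 16 compilers.]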



Why IR?

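[Figure: with an IR, each source language (C, Pascal, FORTRAN, C++) is translated to a single IR, and the IR is translated to each target machine (SPARC, HP PA, x86, IBM PPC), giving 4 + 4 = 8 translators.]

▪ Retargeting - Build a compiler for a new machine by attaching a new
code generator to an existing front-end.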
▪ Optimization - reuse intermediate code optimizers in compilers for
different languages and different machines.
▪ Program understanding - Intermediate code is simple enough to be
easily converted to any target code, yet rich enough to represent all
the complex structures of a high-level language.
Intermediate Languages/code Types
• An intermediate language is an abstract programming
language used by a compiler as an in-between step
when translating a computer program into machine
code.
• Before compiling the program into code for an actual,
physical machine, the compiler first translates it into
intermediate code suitable for a theoretical, abstract
machine.
• This code is analyzed by the compiler, and if any
opportunities for optimization are identified the
compiler can perform the optimizations when making
the translation into assembly language.



Intermediate Languages/code Types
• The intermediate language can take many different forms; the
designer of the compiler decides which one to use.
• Graphical IRs:
– Abstract syntax trees
– Directed acyclic graphs (DAGs)
– Control flow graphs
• Linear IRs:
– Postfix (suffix or reverse Polish) notation
– Three-address code (quadruples)
– Quadruples are close to machine instructions, but they are not actual machine instructions.

• Some programming languages have well-defined intermediate languages:
• Java – Java Virtual Machine
• Prolog – Warren Abstract Machine
• In fact, there are byte-code emulators that execute instructions in
these intermediate languages.
Graphical IRs
• Abstract Syntax Trees (AST) – retain the essential structure of
the parse tree, eliminating unneeded nodes.

• Directed Acyclic Graphs (DAG) – give the same information as the AST
in a more compact form: because common subexpressions are identified,
duplication is avoided and the footprint is smaller.

• Control flow graphs (CFG) – explicitly model control flow
• Used in the translation of statements such as if-else and while statements.
• In programming languages, Boolean expressions are used to:
• Alter the flow of control (as conditional expressions in
statements that alter the flow of control).
• Compute logical values (represent true or false). Such expressions
can be evaluated in analogy to arithmetic expressions, using
three-address instructions with logical operators.



Graphical IRs: Generating DAG
• Check whether an operand is already present
▫ if not, create a leaf for it
• Check whether there is a parent of the operand that represents
the same operation
▫ if not create one, then label the node representing the result
with the name of the destination variable, and remove that
label from all other nodes in the DAG
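Example string: a := b * -c + b * -c
[Figure: AST and DAG for the statement. In the AST, the root := has children a and +; the + node has two separate * subtrees, each with children b and unary minus applied to c. In the DAG, both operands of + point to a single shared * node with children b and unary minus applied to c.]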
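The two checks above can be sketched in Python as a lookup before every node creation. This is a minimal illustration, not the slides' own implementation: expressions are assumed to be nested tuples `(op, operand, ...)`, the names `Node`, `build` and `count_nodes` are hypothetical, and the assignment is kept as an explicit `assign` node (as in the figure) instead of a label attached to the result node.

```python
class Node:
    """One node of the tree or DAG: an operator or operand plus its children."""

    def __init__(self, op, children=()):
        self.op = op
        self.children = children


def build(expr, share, table=None):
    """Build an AST (share=False) or a DAG (share=True) from nested tuples."""
    table = {} if table is None else table
    if isinstance(expr, str):
        op, kids = expr, ()                      # leaf: a name
    else:
        op = expr[0]
        kids = tuple(build(e, share, table) for e in expr[1:])
    # A node is identified by its operator and the identity of its children.
    key = (op,) + tuple(id(k) for k in kids)
    if share and key in table:
        return table[key]                        # already present: reuse it
    node = Node(op, kids)
    table[key] = node
    return node


def count_nodes(root):
    """Count the distinct nodes reachable from root."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if id(n) not in seen:
            seen.add(id(n))
            stack.extend(n.children)
    return len(seen)


# a := b * -c + b * -c
term = ("*", "b", ("uminus", "c"))
stmt = ("assign", "a", ("+", term, term))
print(count_nodes(build(stmt, share=False)))  # 11 nodes in the AST
print(count_nodes(build(stmt, share=True)))   # 7 nodes in the DAG
```

In the DAG both operands of `+` are the same `*` node, which is why four nodes disappear compared with the tree.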
Constructing DAG/AST using Value Number Method
• Nodes of a syntax tree or DAG are stored in an array of records.
• Each row of the array represents one record, and therefore one node.
• In each record, the first field is an operation code, indicating the label
of the node.
• Leaves have one additional field, which holds the lexical value (either a
symbol-table pointer or a constant, in this case), and
• interior nodes have two additional fields indicating the left and right
children
• E.g. representation of the statement: i = i + 10

• Each node is referred to by the index of its record within the array;
this index is called the value number of the node.
• E.g. node + has value number 3, and its left and right
children have value numbers 1 and 2, respectively.

[Figure: Nodes of a DAG for i = i + 10 allocated in an array]
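A minimal Python sketch of the value-number method for i = i + 10 follows. The class name `ValueNumberTable`, the record layout `(op, left, right)` and the dictionary used to find an existing record are assumptions made for illustration; the slide only specifies an array of records indexed by value number.

```python
class ValueNumberTable:
    def __init__(self):
        self.records = []   # records[k - 1] is the node with value number k
        self.lookup = {}    # record -> value number (assumed lookup structure)

    def node(self, op, left=None, right=None):
        """Return the value number of (op, left, right), creating it if needed."""
        record = (op, left, right)
        if record not in self.lookup:
            self.records.append(record)
            self.lookup[record] = len(self.records)
        return self.lookup[record]


table = ValueNumberTable()
i = table.node("id", "i")          # leaf: lexical value is the name i
ten = table.node("num", 10)        # leaf: lexical value is the constant 10
plus = table.node("+", i, ten)     # interior node: children are value numbers
assign = table.node("=", i, plus)

for number, record in enumerate(table.records, start=1):
    print(number, record)
```

Asking for `table.node("+", i, ten)` a second time returns 3 again instead of creating a new record, which is what turns the tree into a DAG.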



Constructing DAG/AST using Value Number Method
• E.g. 2: a = b * -c + b * -c



Graphical IRs: control flow graphs
▪ Nodes in the control flow graph are basic blocks
• A basic block is a sequence of statements always entered
at the beginning of the block and exited at the end
▪ Edges in the control flow graph represent the control flow
E.g.:
Source code:
if (x < y)
    x = 5*y + 5*y/3;
else
    y = 5;
x = x+y;

Control flow graph:
B0: if (x < y) goto B1 else goto B2
B1: x = 5*y + 5*y/3
B2: y = 5
B3: x = x+y
Edges: B0 -> B1, B0 -> B2, B1 -> B3, B2 -> B3

• Each block has a sequence of statements
• No jump from or to the middle of the block
• Once a block starts executing, it will execute till the end
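The block properties above suggest one common way to find basic blocks: mark every jump target and every instruction that follows a jump as a block start (a "leader"). The sketch below applies that idea to the example; the tuple encoding of instructions (`branch`, `goto`, `stmt`) and the function names are assumptions made for illustration, not notation from the slides.

```python
def jump_targets(ins):
    """Return the instruction indices this instruction may jump to."""
    if ins[0] == "goto":
        return [ins[1]]
    if ins[0] == "branch":           # two-way branch: (branch, cond, then, else)
        return [ins[2], ins[3]]
    return []


def basic_blocks(code):
    """Split code into basic blocks, each a list of instruction indices."""
    leaders = {0}                    # the first instruction starts a block
    for i, ins in enumerate(code):
        targets = jump_targets(ins)
        if targets:
            leaders.update(targets)  # every jump target starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)   # so does the instruction after a jump
    starts = sorted(leaders)
    return [list(range(s, e)) for s, e in zip(starts, starts[1:] + [len(code)])]


def edges(code, blocks):
    """Return control flow edges between blocks as (from, to) index pairs."""
    block_of = {b[0]: k for k, b in enumerate(blocks)}
    result = set()
    for k, b in enumerate(blocks):
        targets = jump_targets(code[b[-1]])
        if targets:
            result.update((k, block_of[t]) for t in targets)
        elif k + 1 < len(blocks):
            result.add((k, k + 1))   # fall through to the next block
    return sorted(result)


code = [
    ("branch", "x < y", 1, 3),       # B0
    ("stmt", "x = 5*y + 5*y/3"),     # B1
    ("goto", 4),                     # B1 (jump over the else part)
    ("stmt", "y = 5"),               # B2
    ("stmt", "x = x + y"),           # B3
]
blocks = basic_blocks(code)
print(blocks)               # [[0], [1, 2], [3], [4]]
print(edges(code, blocks))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```

The explicit `goto` at the end of B1 is needed in a linear encoding so that the then-branch skips the else-branch; in the graph it simply becomes the edge from B1 to B3.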
Linear IRs: Postfix notation (PN)
• Postfix notation is a linearized representation of a syntax
tree;
• it is a list of the nodes of the tree in which a node appears
immediately after its children
• In postfix notation the operands appear first, followed by the
operator that applies to them.
◼ Form Rules:
◼ If E is a variable/constant, the PN of E is E itself.
◼ If E is an expression of the form E1 op E2, the PN of E is E’1
E’2 op (E’1 and E’2 are the PN of E1 and E2, respectively.)
◼ If E is a parenthesized expression of form (E1), the PN of E
is the same as the PN of E1.
Ex: (A + B) * (C + D), then
PN: A B + C D + *
Ex: a * (b + c), then
PN: a b c + *
Exercise: how about (a + b) / (c - d)?
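The form rules translate almost directly into a recursive function. In this sketch an expression is assumed to be either a string (variable or constant) or a tuple `(op, E1, E2)`; parentheses only shape the tree, so rule 3 needs no code. The name `to_postfix` is illustrative.

```python
def to_postfix(e):
    """Return the postfix notation of expression tree e as a list of tokens."""
    if isinstance(e, str):
        return [e]                                   # rule 1: E itself
    op, left, right = e
    return to_postfix(left) + to_postfix(right) + [op]   # rule 2: E1' E2' op


# (A + B) * (C + D)
print(" ".join(to_postfix(("*", ("+", "A", "B"), ("+", "C", "D")))))
# A B + C D + *

# (a + b) / (c - d)
print(" ".join(to_postfix(("/", ("+", "a", "b"), ("-", "c", "d")))))
# a b + c d - /
```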
Linear IRs: Three-Address Code
• A three-address code is a linearized representation of a syntax
tree or a DAG in which explicit names correspond to the interior
nodes of the graph.
• Has the form: x := y op z where x, y and z are names,
constants or compiler- generated temporaries; op is any operator.
• For example expression x+y*z can be translated into the
sequence of three-address instructions:
t1 = y * z
t2 = x + t1
• But we may also use the following notation for three-address
code (it looks like a machine code instruction):
op y,z,x
which applies operator op to y and z, and stores the result in x.
• We use the term “three-address code” because each
statement usually contains three addresses (two for
operands, one for the result).
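A small sketch of how such a sequence can be produced: walk the expression tree bottom-up and give each interior node a fresh compiler-generated temporary. The tuple encoding of expressions and the name `gen_tac` are assumptions made for this illustration.

```python
from itertools import count


def gen_tac(expr):
    """Return (instructions, name holding the value) for an expression tree."""
    code = []
    temps = count(1)                 # t1, t2, ... generated on demand

    def visit(e):
        if isinstance(e, str):
            return e                 # names and constants are their own address
        op, left, right = e
        l = visit(left)
        r = visit(right)
        t = f"t{next(temps)}"
        code.append(f"{t} = {l} {op} {r}")
        return t

    return code, visit(expr)


code, result = gen_tac(("+", "x", ("*", "y", "z")))
print("\n".join(code))
# t1 = y * z
# t2 = x + t1
```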
Three address Representation of DAG/AST
• Source Code1: a = b * -c + b * -c

• Three address code: [Figure: three-address code for Source Code1]

Note that the statements t1 = minus c and a = t5 have only two addresses.
minus c appears two times because this code is for the abstract syntax tree.

• Tree and DAG Representation: [Figure: syntax tree and DAG for Source Code1]



Three address Representation of DAG/AST
• Source Code2: a + a * (b - c) + d * (b - c)

• DAG representation: [Figure: DAG for Source Code2]
• Three address code representation: [Figure: three-address code for Source Code2]

b - c appears once because this code is for the DAG.
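The difference between code generated from a syntax tree and from a DAG can be reproduced with one generator and a `share` switch, as sketched below. This is an illustration under assumptions: expressions are nested tuples, `three_address` is a hypothetical name, and sharing is implemented by remembering which temporary already holds each subexpression. The temporary numbering it produces is one plausible numbering, not necessarily the one in the missing figures.

```python
def three_address(expr, target=None, share=False):
    """Emit three-address code; with share=True reuse common subexpressions."""
    code, memo, n = [], {}, 0

    def visit(e):
        nonlocal n
        if isinstance(e, str):
            return e
        if share and e in memo:
            return memo[e]                       # DAG: node already computed
        args = [visit(a) for a in e[1:]]
        n += 1
        t = f"t{n}"
        if len(args) == 1:
            code.append(f"{t} = {e[0]} {args[0]}")            # unary operator
        else:
            code.append(f"{t} = {args[0]} {e[0]} {args[1]}")  # binary operator
        memo[e] = t
        return t

    result = visit(expr)
    if target is not None:
        code.append(f"{target} = {result}")      # copy statement
    return code


# Source Code1 from the syntax tree: a = b * -c + b * -c
term = ("*", "b", ("minus", "c"))
print("\n".join(three_address(("+", term, term), target="a")))
# t1 = minus c
# t2 = b * t1
# t3 = minus c
# t4 = b * t3
# t5 = t2 + t4
# a = t5

# Source Code2 from the DAG: a + a * (b - c) + d * (b - c)
diff = ("-", "b", "c")
expr2 = ("+", ("+", "a", ("*", "a", diff)), ("*", "d", diff))
print("\n".join(three_address(expr2, share=True)))
# t1 = b - c
# t2 = a * t1
# t3 = a + t2
# t4 = d * t1
# t5 = t3 + t4
```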



Types of Three-Address Statements
1. Binary Operator: op y,z,result or
result := y op z
where op is a binary arithmetic or logical operator.
This binary operator is applied to y and z, and the
result of the operation is stored in result.
Ex: add a,b,c
gt a,b,c
addr a,b,c
addi a,b,c
2. Unary Operator: op y, result or
result := op y
where op is a unary arithmetic or logical operator.
This unary operator is applied to y, and the result of
the operation is stored in result.
Ex: uminus a,c
Types of Three-Address Instruction
3. Assignment Type 1: x := y op z
op is a binary arithmetic or logical operation
x, y and z are addresses
4. Assignment Type 2: x := op z
op is a unary arithmetic or logical operation
x and z are addresses
5. Copy Instruction: x:= y
x and y are addresses and x is assigned the value of y

6. Unconditional Jump: goto L
We will jump to the three-address code with the label L, and
the execution continues from that statement.
Ex: goto L1 // jump to L1
jmp 7 // jump to the statement 7
Types of Three-Address Statements (cont.)
8. Procedure Parameters: param x
Procedure Calls: call p,n
where x is an actual parameter, we invoke the procedure
p with n parameters.



Types of Three-Address Statements (cont.)
9. Indexed Assignments:
x := y[i]
sets x to the value in the location i memory units beyond location y
y[i] := x
sets contents of the location i memory units beyond location y to
the value of x
10. Address and Pointer Assignments:
x := &y
sets the r-value of x to l-value of y
x := *y where y is a pointer whose r-value is a location
sets the r-value of x equal to the contents of that location
*x := y
sets the r-value of the object pointed to by x to the r-value of y
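To make the meaning of several statement types concrete, here is a tiny interpreter sketch. It is an illustration under assumptions, not part of the slides: instructions are encoded as Python tuples in the `op y,z,result` order used above, jump labels are instruction indices, an array is modelled as a Python list stored under its name, and only binary, unary, copy, goto and indexed statements are covered (no procedure calls or pointers).

```python
import operator

BINARY = {"add": operator.add, "sub": operator.sub,
          "mul": operator.mul, "gt": operator.gt}
UNARY = {"uminus": operator.neg}


def run(program, env):
    """Execute a list of instruction tuples against the variable store env."""
    def val(a):
        return env[a] if isinstance(a, str) else a   # name or constant

    pc = 0
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        pc += 1
        if kind in BINARY:                 # op y,z,result
            _, y, z, result = ins
            env[result] = BINARY[kind](val(y), val(z))
        elif kind in UNARY:                # op y,result
            _, y, result = ins
            env[result] = UNARY[kind](val(y))
        elif kind == "copy":               # x := y
            _, y, x = ins
            env[x] = val(y)
        elif kind == "goto":               # goto L
            pc = ins[1]
        elif kind == "load":               # x := y[i]
            _, x, y, i = ins
            env[x] = env[y][val(i)]
        elif kind == "store":              # y[i] := x
            _, y, i, x = ins
            env[y][val(i)] = val(x)
        else:
            raise ValueError(f"unknown instruction: {ins!r}")
    return env


program = [
    ("uminus", "c", "t1"),         # t1 := -c
    ("mul", "b", "t1", "t2"),      # t2 := b * t1
    ("goto", 4),                   # skip the next instruction
    ("copy", 0, "t2"),             # never executed
    ("store", "arr", 1, "t2"),     # arr[1] := t2
    ("load", "x", "arr", 1),       # x := arr[1]
]
env = run(program, {"b": 3, "c": 4, "arr": [0, 0]})
print(env["x"])    # -12
```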



Representing three-Address Statements

• A three-address statement is an abstract form of intermediate code.

• It has three representations:
• quadruples,
• triples, and
• indirect triples



Quadruples
▪ A quadruple is a structure with at most four fields:
op, arg1, arg2 and result.
▪ The op field holds the internal code for the operator.
▪ The arg1 and arg2 fields hold the two operands.
▪ The result field stores the result of the expression.
• Example-1: For the statement a := x + y * z, the subexpression y * z is
computed first into a temporary t0, and then x + t0 is computed.



Quadruples
• Store each field directly
• A benefit of quadruples over triples can be seen in an optimizing
compiler, where instructions are often moved around.
• t0 = y * z
• t1 = x + t0
• a = t1

op   arg1   arg2   result
*    y      z      t0
+    x      t0     t1
=    t1            a

• Using an array: less space
• Using a linked list: easy to re-order
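The array form of the quadruples for a = x + y * z can be written down directly. In this sketch the `Quad` record type and the `show` helper are illustrative names, and an unused arg2 field is represented by `None`.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]   # None when the operator needs one operand only
    result: str


quads = [
    Quad("*", "y", "z", "t0"),
    Quad("+", "x", "t0", "t1"),
    Quad("=", "t1", None, "a"),
]


def show(q: Quad) -> str:
    """Render a quadruple back as a three-address statement."""
    if q.op == "=":
        return f"{q.result} = {q.arg1}"
    return f"{q.result} = {q.arg1} {q.op} {q.arg2}"


for q in quads:
    print(show(q))
# t0 = y * z
# t1 = x + t0
# a = t1
```

Because every quadruple names its own result, operands refer to values by name and not by position, so a quadruple can be moved without rewriting the others.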



Quadruples
• Example-2: Three-address code for the assignment a = b * - c + b * - c ;
• The special operator minus is used to distinguish the unary minus operator (- c) from
the binary minus operator (b – c).
NB: the unary-minus "three-address" statement has only two addresses, like the copy
statement a = t5.
• Why do we need a copy instruction such as a = t5, which copies t5 into a, rather than
assigning t2 + t4 to a directly?
• Each subexpression typically gets its own new temporary to hold its result, and
only when the assignment operator = is processed do we learn where to put the
value of the complete expression.

[Figure: three-address code and its quadruple representation]


Triples
A triple has only three fields, which we call op, arg1, and arg2.
• Example-1:
• a := x + y * z
Solution: t0 := y * z
t1 := x + t0
a := t1

[Table: triple representation, with columns op, arg1, arg2]

• Example-2: x[i] := y
• This instruction is more difficult: it takes two triples.

      op    arg1   arg2
(0)   []=   x      i
(1)   :=    (0)    y
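Triples drop the result field and refer to an earlier value by the position of the triple that computes it. The sketch below converts the quadruples of Example-1 into triples. It assumes each temporary is assigned exactly once, writes a position reference as the string `"(k)"`, and encodes a copy by placing the destination in arg1; `quads_to_triples` is a hypothetical helper name.

```python
def quads_to_triples(quads):
    """Convert (op, arg1, arg2, result) quadruples to (op, arg1, arg2) triples."""
    position = {}      # temporary name -> index of the triple that computes it
    triples = []

    def ref(arg):
        # Replace a temporary by a reference to the triple's position.
        return f"({position[arg]})" if arg in position else arg

    for op, arg1, arg2, result in quads:
        if op == "=":
            # Copy: destination goes in arg1, the source value in arg2.
            triples.append((op, result, ref(arg1)))
        else:
            position[result] = len(triples)
            triples.append((op, ref(arg1), ref(arg2)))
    return triples


quads = [
    ("*", "y", "z", "t0"),
    ("+", "x", "t0", "t1"),
    ("=", "t1", None, "a"),
]
for index, triple in enumerate(quads_to_triples(quads)):
    print(f"({index})", *triple)
# (0) * y z
# (1) + x (0)
# (2) = a (1)
```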
Triples
Triple representations of statement: a = b*- c + b*- c

In the triple representation in Fig. (b), the copy statement a = t5 is
encoded by placing a in the arg1 field and (4) in the arg2 field.



Indirect Triples
• Indirect triples consist of a listing of pointers to triples, rather than a listing
of the triples themselves, i.e. the instruction list holds pointers instead of the statements.
• With indirect triples, an optimizing compiler can move an instruction by reordering
the instruction list, without affecting the triples themselves.



Indirect Triples
• Example-2:
• Triple representations of statement: a = b*- c + b*- c
• Let us use an array instruction to list pointers to triples in the desired
order.

[Figure: Indirect triples representation of three-address code]

To avoid entering temporary names into the symbol table, we might refer to a
temporary value by the position of the statement that computes it.
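The following sketch illustrates why the extra level of indirection helps. The triples for a = b * -c + b * -c are stored once, and a separate `instruction` list holds their execution order. Writing a position reference as the string `"(k)"` and using plain list indices as pointers are assumptions made for this illustration.

```python
# Triples: each refers to earlier results by position, e.g. "(0)".
triples = [
    ("minus", "c", None),    # (0)
    ("*", "b", "(0)"),       # (1)
    ("minus", "c", None),    # (2)
    ("*", "b", "(2)"),       # (3)
    ("+", "(1)", "(3)"),     # (4)
    ("=", "a", "(4)"),       # (5)
]

# Instruction list: pointers (here, indices) to triples in execution order.
instruction = [0, 1, 2, 3, 4, 5]

# Move the second "minus c" ahead of the first multiplication. Only the
# instruction list changes; no triple is edited, so references such as
# "(2)" inside triple (3) remain valid.
before = list(triples)
instruction[1], instruction[2] = instruction[2], instruction[1]

for pointer in instruction:
    print(f"({pointer})", triples[pointer])
```

With plain triples the same move would change the positions of the triples and force every reference to them to be rewritten.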
Reading assignment

• Declarations
• Declarations in procedures
• Flow of control statements
• Backpatching and Procedure calls
