BCS402 Module 2
Module-2
Introduction to the ARM Instruction Set
Different ARM architecture revisions support different instructions. However, new revisions
usually add instructions and remain backward compatible, so code you write for architecture
ARMv4T should execute on an ARMv5TE processor.
In the following sections, the hexadecimal numbers are represented with the prefix 0x and
binary numbers with the prefix 0b.
The examples follow this format:
PRE <pre-conditions>
<instruction/s>
POST <post-conditions>
In the pre- and post-conditions, memory is denoted as: mem<data_size>[address]
This refers to data_size bits of memory starting at the given byte address. For example,
mem32[1024] is the 32-bit value starting at address 1 KB.
ARM instructions process data held in registers; memory is accessed only through load and
store instructions.
ARM instructions commonly take two or three operands. For instance, the instruction
ADD r3, r1, r2 adds the two values stored in registers r1 and r2 (the source registers) and
writes the result to register r3 (the destination register).
ARM instructions are classified as data processing instructions, branch instructions, load-
store instructions, the software interrupt instruction, and program status register instructions.
Example: 1. This example shows a simple move instruction. The MOV instruction takes the
contents of register r5 and copies them into register r7, in this case taking the value 5 and
overwriting the value 8 in register r7.
2. The MVN instruction complements (bitwise inverts) the contents of register r5 and copies the
result to register r7.
• There are data processing instructions that do not use the barrel shifter, for example, the
MUL (multiply), CLZ (count leading zeros), and QADD (signed saturated 32-bit add)
instructions.
The figure shows the data flow between the ALU and the barrel shifter. The following example
illustrates the barrel shifter.
• Register Rn enters the ALU without any preprocessing. We apply a logical shift left (LSL)
to register Rm before moving it to the destination register. This is the same as applying
the standard C language shift operator << to the register.
• The MOV instruction copies the shift operator result N into register Rd. N represents
the result of the LSL operation described in the following Table.
The example multiplies register r5 by four and then places the result (20 in decimal = 14 in
hexadecimal) into register r7.
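As a rough illustration, the effect of the shifted move can be modeled in Python by shifting and then masking the value to 32 bits. The instruction form MOV r7, r5, LSL #2 is inferred from the description above, and the helper name mov_lsl is an assumption made for this sketch; it is not part of any ARM tool.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide

def mov_lsl(rm: int, shift: int) -> int:
    """Model MOV Rd, Rm, LSL #shift and return the value written to Rd."""
    return (rm << shift) & MASK32

r5 = 5
r7 = mov_lsl(r5, 2)  # a left shift by two multiplies by four
print(r7, hex(r7))   # prints: 20 0x14
```

The mask matters only when bits are shifted past bit 31; for small values the result matches the C expression r5 << 2.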
Barrel shifter operations
For example, in the figure above (a logical shift left by one), the contents of bit 0 are shifted
to bit 1 and bit 0 is cleared. The C flag is updated with the last bit shifted out of the register.
This is bit (32-y) of the original value, where y is the shift amount. When y is greater than one,
a shift by y positions is the same as a shift by one position executed y times.
Example:
1. PRE R5=0x80000000
R7=0x0
MOVS R7, R5, LSR #2
POST R5=0x80000000
R7=0x20000000, N=0, Z=0, C=0 (the last bit shifted out, bit 1 of R5, is 0)
2. PRE R5=0x80000000
R7=0x0
MOVS R7, R5, ASR #2
POST R5=0x80000000
R7=0xE0000000, N=1 (bit 31 of the result is set), Z=0, C=0
3. PRE R5=0x80000002
R7=0x0
MOVS R7, R5, ROR #2
POST R5=0x80000002
R7=0xA0000000, N=1, Z=0, C=1 (the last bit rotated out, bit 1 of R5, is 1)
4. PRE R5=0x80000003
R7=0x0, C=0
MOVS R7, R5, RRX
POST R5=0x80000003
R7=0x40000001, N=0, Z=0, C=1 (bit 0 of R5 is shifted into the carry flag)
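The four shift examples above can be checked with a small Python model. This is only a sketch for shift amounts from 1 to 31; the function names are assumptions made for illustration, and each function returns the shifted result together with the carry-out bit that MOVS would copy into the C flag.

```python
MASK32 = 0xFFFFFFFF

def lsr(x, n):
    """Logical shift right by n (1..31): returns (result, carry_out)."""
    return (x >> n) & MASK32, (x >> (n - 1)) & 1

def asr(x, n):
    """Arithmetic shift right by n (1..31): the sign bit is replicated."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32, (x >> (n - 1)) & 1

def ror(x, n):
    """Rotate right by n (1..31): bits leaving bit 0 re-enter at bit 31."""
    return ((x >> n) | (x << (32 - n))) & MASK32, (x >> (n - 1)) & 1

def rrx(x, carry_in):
    """Rotate right extended: a 33-bit rotate through the carry flag."""
    return ((carry_in << 31) | (x >> 1)) & MASK32, x & 1

def n_flag(result):
    """The N flag is a copy of bit 31 of the result."""
    return (result >> 31) & 1

print([hex(v) for v in lsr(0x80000000, 2)])  # ['0x20000000', '0x0']
print([hex(v) for v in asr(0x80000000, 2)])  # ['0xe0000000', '0x0']
print([hex(v) for v in ror(0x80000002, 2)])  # ['0xa0000000', '0x1']
print([hex(v) for v in rrx(0x80000003, 0)])  # ['0x40000001', '0x1']
```

Because MOVS carries the S suffix, it also updates N and Z from the result, which is why the ASR and ROR cases set the N flag.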
Example: 1. The following simple subtract instruction subtracts a value stored in register r2
from a value stored in register r1. The result is stored in register r0.
2. The following reverse subtract instruction (RSB) subtracts r1 from the constant value #0,
writing the result to r0. You can use this instruction to negate numbers.
3. The SUBS instruction is useful for decrementing loop counters. In this example, we subtract
the immediate value one from the value one stored in register r1. The result, zero, is written
to register r1. The CPSR is updated with the Z and C flags set.
4. PRE R0=0X00001234
R1=0X00123456
R2=0X00000000
ADD R2, R1, R0 ;R2=R1+R0
POST R0=0X00001234
R1=0X00123456
R2=0X0012468A
5. PRE R0=0X00001234
R1=0X00123456
R2=0X00000000
C=1 in CPSR
ADC R2, R1, R0 ;R2=R1+R0+C
POST R0=0X00001234
R1=0X00123456
R2=0X0012468B
6. PRE R0=0X00001234
R1=0X00123456
R2=0X00000000
C=1 in CPSR
SBC R2, R1, R0 ;R2=R1-R0-!C
POST R0=0X00001234
R1=0X00123456
R2=0X00122222
7. PRE R0=0X00001234
R1=0X00000012
R2=0X00000000
C=1 in CPSR
RSC R2, R1, R0 ;R2=R0-R1-!C
POST R0=0X00001234
R1=0X00000012
R2=0X00001222
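The carry-dependent results in examples 5 to 7 can be reproduced with the short Python sketch below. The function names are assumptions made for illustration; each takes the first operand Rn, the second operand, and the current value of the C flag, and returns the 32-bit result.

```python
MASK32 = 0xFFFFFFFF

def adc(rn, op2, c):
    """ADC Rd, Rn, op2: Rd = Rn + op2 + C."""
    return (rn + op2 + c) & MASK32

def sbc(rn, op2, c):
    """SBC Rd, Rn, op2: Rd = Rn - op2 - !C."""
    return (rn - op2 - (1 - c)) & MASK32

def rsc(rn, op2, c):
    """RSC Rd, Rn, op2: Rd = op2 - Rn - !C."""
    return (op2 - rn - (1 - c)) & MASK32

print(hex(adc(0x00123456, 0x00001234, 1)))  # 0x12468b
print(hex(sbc(0x00123456, 0x00001234, 1)))  # 0x122222
print(hex(rsc(0x00000012, 0x00001234, 1)))  # 0x1222
```

With C=0 the subtract variants produce a result that is one lower, which is how a borrow is propagated in multiword subtraction.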
2.1.4 Using the Barrel Shifter with Arithmetic Instructions
The wide range of second operand shifts available on arithmetic and logical instructions is a
very powerful feature of the ARM instruction set. The following Example illustrates the use of
the inline barrel shifter with an arithmetic instruction. The instruction multiplies the value
stored in register r1 by three.
Example: Register r1 is first shifted one location to the left to give the value of twice r1. The
ADD instruction then adds the result of the barrel shift operation to register r1. The result
transferred into register r0 is equal to three times the value stored in register r1.
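A minimal Python sketch of this multiply-by-three follows. The instruction form ADD r0, r1, r1, LSL #1 is inferred from the description above, and the value 5 in r1 and the helper name add_lsl are assumptions chosen for illustration.

```python
MASK32 = 0xFFFFFFFF

def add_lsl(rn, rm, shift):
    """Model ADD Rd, Rn, Rm, LSL #shift."""
    return (rn + ((rm << shift) & MASK32)) & MASK32

r1 = 5
r0 = add_lsl(r1, r1, 1)  # r1 + (r1 << 1) = 3 * r1
print(r0)                # prints: 15
```

The same pattern with a shift of n multiplies by 2^n + 1, so a single instruction can also multiply by 5, 9, 17, and so on.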
3. PRE R1=0X000F00FF
R2=0X0000000F
R9=0X00000000
EOR R9, R2, R1 ;R9=R2 EX-OR R1
POST R1=0X000F00FF
R2=0X0000000F
R9=0X000F00F0
4. This example shows a more complicated logical instruction called BIC, which carries out a
logical bit clear.
In this example, register r2 contains a binary pattern where every binary 1 in r2 clears a
corresponding bit location in register r1. This instruction is particularly useful when clearing
status bits and is frequently used to change interrupt masks in the CPSR.
2.1.6 Comparison instructions
The comparison instructions are used to compare or test a register with a 32-bit value. They
update the CPSR flag bits according to the result, but do not affect other registers.
After the bits have been set, the information can then be used to change program flow by using
conditional execution. It is not required to apply the S suffix for comparison instructions to
update the flags.
2. PRE CPSR=nzcvqiFt_USER
R1=0XF0000000
R2=0X80000000
CMN R1, R2 ; R1+R2
POST CPSR=nzCVqiFt_USER
3. PRE CPSR=nzcvqiFt_USER
R0=0XF0000000
R9=0X08000000
TST R0, R9 ; R0&R9
POST CPSR=nZcvqiFt_USER
4. PRE CPSR=nzcvqiFt_USER
R0=0XF0000000
R9=0X08000000
TEQ R0, R9 ; R0^R9 = 0xF8000000, so the N flag is set
POST CPSR=NzcvqiFt_USER
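The flag results of examples 2 to 4 can be derived with the Python sketch below. The function names and the dictionary representation of the flags are assumptions made for illustration. For TST and TEQ only N and Z are modeled, because without a shifted operand these instructions leave C and V as they were.

```python
MASK32 = 0xFFFFFFFF

def nz(result):
    """N is bit 31 of the result; Z is set when the result is zero."""
    return {"N": (result >> 31) & 1, "Z": int(result == 0)}

def cmn(rn, op2):
    """CMN Rn, op2: flags from Rn + op2; the sum itself is discarded."""
    total = rn + op2
    result = total & MASK32
    flags = nz(result)
    flags["C"] = int(total > MASK32)  # unsigned carry out of bit 31
    # Signed overflow: both operands differ in sign from the result.
    flags["V"] = (((rn ^ result) & (op2 ^ result)) >> 31) & 1
    return flags

def tst(rn, op2):
    """TST Rn, op2: flags from Rn AND op2."""
    return nz(rn & op2)

def teq(rn, op2):
    """TEQ Rn, op2: flags from Rn EOR op2."""
    return nz(rn ^ op2)

print(cmn(0xF0000000, 0x80000000))  # {'N': 0, 'Z': 0, 'C': 1, 'V': 1}
print(tst(0xF0000000, 0x08000000))  # {'N': 0, 'Z': 1}
print(teq(0xF0000000, 0x08000000))  # {'N': 1, 'Z': 0}
```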
2.1.7 Multiply instructions
The multiply instructions multiply the contents of a pair of registers and, depending upon the
instruction, accumulate the result with another register.
The long multiplies accumulate onto a pair of registers representing a 64-bit value. The result
is placed in a destination register or a pair of registers.
Example: 1. This example shows a simple multiply instruction that multiplies registers r1 and
r2 together and places the result into register r0. In this example, register r1 is equal to the value
2, and r2 is equal to 2. The result, 4, is then placed into register r0.
2. PRE R1=0X00000001
R2=0X00000001
R5=0X0000000F
R10=0X00000000
MLA R10, R2, R1, R5 ;R10=(R1*R2)+R5=(1*1)+15=16=0x10
POST R1=0X00000001
R2=0X00000001
R5=0X0000000F
R10=0X00000010
The long multiply instructions (SMLAL, SMULL, UMLAL, and UMULL) produce a 64-bit
result. The result is too large to fit a single 32-bit register so the result is placed in two registers
labeled RdLo and RdHi. RdLo holds the lower 32 bits of the 64-bit result, and RdHi holds the
higher 32 bits of the 64-bit result.
Example: 1. The following shows an example of a long unsigned multiply instruction. The
instruction multiplies registers r2 and r3 and places the result into register r0 and r1. Register
r0 contains the lower 32 bits and register r1 contains the higher 32 bits of the 64-bit result.
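The split of a 64-bit product into RdLo and RdHi can be sketched in Python as follows. The operand values are assumptions chosen so that the product does not fit in 32 bits, and the function name umull is used only for this illustration.

```python
MASK32 = 0xFFFFFFFF

def umull(rm, rs):
    """Model UMULL RdLo, RdHi, Rm, Rs: returns (RdLo, RdHi)."""
    product = rm * rs  # full unsigned 64-bit product
    return product & MASK32, (product >> 32) & MASK32

r2, r3 = 0xF0000002, 0x00000002
r0, r1 = umull(r2, r3)
print(hex(r0), hex(r1))  # prints: 0xe0000004 0x1
```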
The address label is stored in the instruction as a signed pc-relative offset and must be within
approximately 32 MB of the branch instruction. T refers to the Thumb bit in the CPSR. When
instructions set T, the ARM switches to Thumb state.
Example: 1. This example shows a forward and backward branch. Because these loops are
address specific, we do not include the pre- and post-conditions. The forward branch skips
three instructions. The backward branch creates an infinite loop.
In this example, forward and backward are the labels. The branch labels are placed at the
beginning of the line and are used to mark an address that can be used later by the assembler
to calculate the branch offset.
The branch with link, or BL, instruction is like the B instruction but overwrites the link
register LR with a return address. It performs a subroutine call.
Example: This example shows a simple fragment of code that branches to a subroutine using
the BL instruction. To return from a subroutine, you copy the link register to the pc.
The BX instruction uses an absolute address stored in register Rm. It is primarily used to
branch to and from Thumb code. The T bit in the CPSR is updated by the least significant bit
of the branch register.
Similarly, the BLX instruction updates the T bit of the CPSR with the least significant bit and
additionally sets the link register with the return address.
i.e., Bit[0] of Rm is copied into the T bit in the CPSR and bits[31:1] are moved into the PC:
• If Rm[0] is 1, the processor switches to execute Thumb instructions and begins
executing at the address in Rm aligned to a half-word boundary by clearing the bottom
bit.
• If Rm[0] is 0, the processor continues executing ARM instructions and begins
executing at the address in Rm aligned to a word boundary by clearing Rm[1].
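The two cases above can be summarized in a small Python sketch. It follows the description in these notes literally (including the clearing of Rm[1] in the ARM case); the function name bx and the sample addresses are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def bx(rm):
    """Model BX Rm as described above: returns (new_pc, t_bit)."""
    t_bit = rm & 1                 # Rm[0] is copied into the T bit
    if t_bit:
        pc = rm & ~1 & MASK32      # Thumb: halfword aligned, bit 0 cleared
    else:
        pc = rm & ~3 & MASK32      # ARM: word aligned, bits 1 and 0 cleared
    return pc, t_bit

print([hex(v) for v in bx(0x00008001)])  # ['0x8000', '0x1'] -> Thumb state
print([hex(v) for v in bx(0x00008000)])  # ['0x8000', '0x0'] -> ARM state
```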
Example: This example shows a load from a memory address contained in register r1, followed
by a store back to the same address in memory.
The first instruction loads a word from the address stored in register r1 and places it into register
r0. The second instruction goes the other way by storing the contents of register r0 to the
address contained in register r1. The offset from register r1 is zero. Register r1 is called the
base address register.
2.3.2 Single-Register Load-Store Addressing Modes
The ARM instruction set provides different modes for addressing memory. These modes
incorporate one of the indexing methods: preindex with writeback, preindex, and postindex.
• Pre-index with writeback calculates an address from a base register plus address offset
and then updates that address base register with the new address.
PRE r0=0x00000000
R1=0x00009000
• Pre-index offset is the same as the pre-index with writeback but does not update the
address base register. The pre-index mode is useful for accessing an element in a data
structure.
• Postindex uses the unmodified base register as the address and only updates the address
base register after the address is used.
This example shows how each indexing method affects the address held in register r1, as well
as the data loaded into register r0.
The postindex and preindex with writeback modes are useful for traversing an array.
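The difference between the three indexing methods can be seen in the Python sketch below. Memory is modeled as a dictionary from byte address to 32-bit word. The memory contents, the offset of 4, and the function names are assumptions made for illustration; only the starting values r0=0x00000000 and r1=0x00009000 come from the PRE condition above.

```python
def make_state():
    """Fresh copy of the assumed PRE condition."""
    mem = {0x9000: 0x01010101, 0x9004: 0x02020202}
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    return mem, regs

def ldr_pre_writeback(mem, regs, offset):
    """LDR r0, [r1, #offset]!  (address = base + offset, base updated)"""
    regs["r1"] += offset
    regs["r0"] = mem[regs["r1"]]

def ldr_pre(mem, regs, offset):
    """LDR r0, [r1, #offset]   (address = base + offset, base unchanged)"""
    regs["r0"] = mem[regs["r1"] + offset]

def ldr_post(mem, regs, offset):
    """LDR r0, [r1], #offset   (address = base, base updated afterwards)"""
    regs["r0"] = mem[regs["r1"]]
    regs["r1"] += offset

for op in (ldr_pre_writeback, ldr_pre, ldr_post):
    mem, regs = make_state()
    op(mem, regs, 4)
    print(op.__name__, hex(regs["r0"]), hex(regs["r1"]))
# ldr_pre_writeback 0x2020202 0x9004
# ldr_pre 0x2020202 0x9000
# ldr_post 0x1010101 0x9004
```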
The addressing modes available with a particular load or store instruction depend on the
instruction class.
The following Table shows the addressing modes available for load and store of a 32-bit word
or an unsigned byte.
The following Table shows the addressing modes available on load and store instructions using
16-bit halfword or signed byte data.
These operations cannot use the barrel shifter. There are no STRSB or STRSH instructions
since STRH stores both a signed and unsigned halfword; similarly, STRB stores signed and
unsigned bytes.
The following Table shows the variations for STRH instructions.
The following Table shows the different addressing modes for the load-store multiple
instructions. Here N is the number of registers in the list of registers.
Example: 1. In this example, register r0 is the base register Rn and is followed by !, indicating
that the register is updated after the instruction is executed. You will notice within the load
multiple instruction that the registers are not individually listed. Instead, the “-” character is
used to identify a range of registers. In this case the range is from register r1 to r3 inclusive.
Each register can also be listed, using a comma to separate each register within “{” and “}”
brackets.
The following Figure shows a graphical representation. The base register r0 points to memory
address 0x80010 in the PRE condition.
Memory addresses 0x80010, 0x80014, and 0x80018 contain the values 1, 2, and 3 respectively.
After the load multiple instruction executes, registers r1, r2, and r3 contain these values as
shown in the following Figure. The base register r0 now points to memory address 0x8001c
after the last loaded word.
LDMIB r0!,{r1-r3}
POST r0=0x0008001c
r1=0x00000002
r2=0x00000003
r3=0x00000004
The first word pointed to by register r0 is ignored and register r1 is loaded from the next
memory location as shown in the following Figure.
After execution, register r0 now points to the last loaded memory location. This is in contrast
with the LDMIA example, which pointed to the next memory location.
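The contrast between increment after and increment before can be modeled with the Python sketch below. Registers are keyed by register number and memory by byte address; the function name is an assumption made for illustration, and the word at 0x8001c is taken to hold 4 so that it matches the LDMIB result.

```python
def ldm_increment(mem, regs, base, reg_list, before):
    """Model LDMIA (before=False) or LDMIB (before=True) with writeback (!)."""
    addr = regs[base]
    for r in sorted(reg_list):   # the lowest register uses the lowest address
        if before:
            addr += 4            # IB: step to the next word, then load
        regs[r] = mem[addr]
        if not before:
            addr += 4            # IA: load, then step to the next word
    regs[base] = addr            # base update requested by '!'

mem = {0x80010: 1, 0x80014: 2, 0x80018: 3, 0x8001C: 4}
for name, before in (("LDMIA", False), ("LDMIB", True)):
    regs = {0: 0x80010, 1: 0, 2: 0, 3: 0}
    ldm_increment(mem, regs, base=0, reg_list=[1, 2, 3], before=before)
    print(name, hex(regs[0]), regs[1], regs[2], regs[3])
# LDMIA 0x8001c 1 2 3
# LDMIB 0x8001c 2 3 4
```

In both cases the base ends at 0x8001c: after LDMIA that is the word following the last one loaded, while after LDMIB it is the last word loaded.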
The decrement versions DA and DB of the load-store multiple instructions decrement the
start address and then store to ascending memory locations.
This is equivalent to descending memory but accessing the register list in reverse order.
With the increment and decrement load multiples, you can access arrays forwards or
backwards. They also allow for stack push and pop operations.
The following Table shows a list of load-store multiple instruction pairs when base update is
used. If you use a store with base update, then the paired load instruction of the same number
of registers will reload the data and restore the base address pointer.
This is useful when you need to temporarily save a group of registers and restore them later.
Example: This example shows an STM increment before instruction followed by an LDM
decrement after instruction.
The STMIB instruction stores the values 7, 8, 9 to memory. We then corrupt register r1 to r3.
The LDMDA reloads the original values and restores the base pointer r0.
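The pairing of STMIB with LDMDA can be checked with the Python sketch below. The base address 0x9000, the assignment of the values 9, 8, and 7 to r1, r2, and r3, and the function names are assumptions made for illustration.

```python
def stmib(mem, regs, base, reg_list):
    """Model STMIB Rn!, {list}: increment the address, then store."""
    addr = regs[base]
    for r in sorted(reg_list):
        addr += 4
        mem[addr] = regs[r]
    regs[base] = addr

def ldmda(mem, regs, base, reg_list):
    """Model LDMDA Rn!, {list}: the highest register is loaded from Rn."""
    addr = regs[base]
    for r in sorted(reg_list, reverse=True):
        regs[r] = mem[addr]
        addr -= 4
    regs[base] = addr

mem = {}
regs = {0: 0x9000, 1: 9, 2: 8, 3: 7}
stmib(mem, regs, 0, [1, 2, 3])       # save r1-r3; r0 moves up to 0x900c
regs[1] = regs[2] = regs[3] = 0      # corrupt the saved registers
ldmda(mem, regs, 0, [1, 2, 3])       # restore r1-r3 and the base pointer
print(hex(regs[0]), regs[1], regs[2], regs[3])  # prints: 0x9000 9 8 7
```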
Example: We illustrate the use of the load-store multiple instructions with a block memory
copy example. This example is a simple routine that copies blocks of 32 bytes from a source
address location to a destination address location.
The example has two load-store multiple instructions, which use the same increment after
addressing mode.
This routine relies on registers r9, r10, and r11 being set up before the code is executed.
Registers r9 and r11 determine the data to be copied, and register r10 points to the destination
in memory for the data.
• LDMIA loads the data pointed to by register r9 into registers r0 to r7. It also updates r9
to point to the next block of data to be copied.
• STMIA copies the contents of registers r0 to r7 to the destination memory address
pointed to by register r10. It also updates r10 to point to the next destination location.
• CMP and BNE compare pointers r9 and r11 to check whether the end of the block copy
has been reached.
• If the block copy is complete, then the routine finishes; otherwise, the loop repeats with
the updated values of register r9 and r10.
• The BNE is the branch instruction B with a condition mnemonic NE (not equal). If the
previous compare instruction sets the condition flags to not equal, the branch instruction
is executed.
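The steps above can be sketched in Python as a loop that moves eight words per iteration. The loop structure (LDMIA, STMIA, CMP, BNE) is inferred from the description; the function name, the dictionary memory model, and the sample addresses are assumptions made for illustration.

```python
def block_copy(mem, src, dst, end):
    """Copy 32-byte blocks from src to dst until src reaches end."""
    r9, r10, r11 = src, dst, end
    while True:
        # LDMIA r9!, {r0-r7}: load eight words, then advance r9 by 32 bytes
        block = [mem[r9 + 4 * i] for i in range(8)]
        r9 += 32
        # STMIA r10!, {r0-r7}: store eight words, then advance r10 by 32 bytes
        for i, word in enumerate(block):
            mem[r10 + 4 * i] = word
        r10 += 32
        # CMP r9, r11 / BNE loop: repeat until the source pointer hits the end
        if r9 == r11:
            break
    return r9, r10

# Two 32-byte blocks at 0x1000 are copied to 0x2000.
mem = {0x1000 + 4 * i: i for i in range(16)}
final_src, final_dst = block_copy(mem, 0x1000, 0x2000, 0x1040)
print(hex(final_src), hex(final_dst))  # prints: 0x1040 0x2040
```

Because the comparison happens at the bottom of the loop, at least one block is always copied, and the length must be a multiple of 32 bytes for r9 to meet r11 exactly.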
The following Figure shows the memory map of the block memory copy and how the routine
moves through memory.
Theoretically this loop can transfer 32 bytes (8 words) in two instructions, for a maximum
possible throughput of 46 MB/second being transferred at 33 MHz. These numbers assume a
perfect memory system with fast memory.
Stack Operations
The ARM architecture uses the load-store multiple instructions to carry out stack operations.
• The pop operation (removing data from a stack) uses a load multiple instruction.
• The push operation (placing data onto the stack) uses a store multiple instruction.
When using a stack you have to decide whether the stack will grow up or down in memory. A
stack is either ascending (A), growing towards higher memory addresses, or descending (D),
growing towards lower memory addresses.
When you use a full stack (F), the stack pointer sp points to an address that is the last used or
full location (i.e., sp points to the last item on the stack).
If you use an empty stack (E) the sp points to an address that is the first unused or empty
location (i.e., it points after the last item on the stack).
There are a number of load-store multiple addressing mode aliases available to support stack
operations (see the following Table).
Next to the pop column is the actual load multiple instruction equivalent.
For example, a full ascending stack would have the notation FA appended to the load multiple
instruction—LDMFA. This would be translated into an LDMDA instruction.
Example: The STMFD instruction pushes registers onto the stack, updating the sp. The
following Figure shows a push onto a full descending stack. You can see that when the stack
grows the stack pointer points to the last full entry in the stack.
Example: The following Figure shows a push operation on an empty stack using the STMED
instruction.
The STMED instruction pushes the registers onto the stack but updates register sp to point to
the next empty location.
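The difference between a full and an empty descending push can be seen in the Python sketch below. The starting stack pointer 0x80018, the pushed values, and the function names are assumptions made for illustration; the values are passed in register order, lowest register first.

```python
def stmfd(mem, sp, values):
    """Push onto a full descending stack: decrement sp, then store."""
    for v in reversed(values):   # the highest register lands at the highest address
        sp -= 4
        mem[sp] = v
    return sp

def stmed(mem, sp, values):
    """Push onto an empty descending stack: store, then decrement sp."""
    for v in reversed(values):
        mem[sp] = v
        sp -= 4
    return sp

full, empty = {}, {}
sp_full = stmfd(full, 0x80018, [2, 3])
sp_empty = stmed(empty, 0x80018, [2, 3])
print(hex(sp_full), sp_full in full)     # prints: 0x80010 True
print(hex(sp_empty), sp_empty in empty)  # prints: 0x80010 False
```

After the full descending push, sp addresses the last item stored; after the empty descending push, sp addresses the first unused word below it.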
Stack error
When handling a checked stack there are three attributes that need to be preserved: the stack
base, the stack pointer, and the stack limit.
• The stack base is the starting address of the stack in memory.
• The stack pointer initially points to the stack base; as data is pushed onto the stack, the
stack pointer descends through memory and continuously points to the top of stack.
• If the stack pointer passes the stack limit, then a stack overflow error has occurred.
Here is a small piece of code that checks for stack overflow errors for a descending stack.
The ATPCS defines register r10 as the stack limit or sl. This is optional since it is only used
when stack checking is enabled.
• The BLLO instruction is a branch with link instruction plus the condition mnemonic
LO.
• If sp is less than register r10 after the new items are pushed onto the stack, then stack
overflow error has occurred.
• If the stack pointer goes back past the stack base, then a stack underflow error has
occurred.
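The overflow and underflow conditions in the bullets above can be sketched in Python for a descending stack. The function names, the sample addresses, and the exact form of the underflow test are assumptions made for illustration; the overflow test mirrors a subtract from sp followed by a compare against r10 and a BLLO.

```python
def push_with_check(sp, size, stack_limit):
    """Reserve size bytes, then test sp against the stack limit (r10)."""
    sp -= size
    overflow = sp < stack_limit   # LO: unsigned lower than
    return sp, overflow

def pop_with_check(sp, size, stack_base):
    """Release size bytes, then test whether sp went back past the base."""
    sp += size
    underflow = sp > stack_base
    return sp, underflow

STACK_BASE, STACK_LIMIT = 0x9000, 0x8F00   # hypothetical 256-byte stack
print(push_with_check(0x8F10, 8, STACK_LIMIT))   # (36616, False): 0x8f08
print(push_with_check(0x8F10, 32, STACK_LIMIT))  # (36592, True): 0x8ef0
print(pop_with_check(0x8FFC, 8, STACK_BASE))     # (36868, True): 0x9004
```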
2.3.4 Swap Instruction
The swap instruction is a special case of a load-store instruction. It swaps the contents of
memory with the contents of a register.
This instruction is an atomic operation—it reads and writes a location in the same bus
operation, preventing any other instruction from reading or writing to that location until it
completes.
Syntax: SWP{B}{<cond>} Rd, Rm, [Rn]
Swap cannot be interrupted by any other instruction or any other bus access. We say the system
“holds the bus” until the transaction is complete. The swap instruction supports both a word
swap (SWP) and a byte swap (SWPB).
Example: The swap instruction loads a word from memory into register r0 and overwrites the
memory with register r1.
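The data movement of the swap can be modeled in Python as shown below. The sketch captures only the load and store; it cannot express the atomic bus behavior. The use of r2 as the address register, the register and memory values, and the function name are assumptions made for illustration.

```python
def swp(mem, regs, rd, rm, rn):
    """Model SWP Rd, Rm, [Rn]: Rd = mem[Rn], then mem[Rn] = Rm."""
    addr = regs[rn]
    old = mem[addr]        # read first, so the swap also works when Rd is Rm
    mem[addr] = regs[rm]
    regs[rd] = old

mem = {0x9000: 0x12345678}
regs = {"r0": 0x00000000, "r1": 0x11112222, "r2": 0x00009000}
swp(mem, regs, "r0", "r1", "r2")
print(hex(regs["r0"]), hex(mem[0x9000]))  # prints: 0x12345678 0x11112222
```

Passing the same register as Rd and Rm exchanges that register with the memory word, which is the pattern used when implementing a semaphore.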
Summary
When the processor executes an SWI instruction, it sets the program counter pc to the offset
0x8 in the vector table. The instruction also forces the processor mode to SVC, which allows
an operating system routine to be called in a privileged mode.
Each SWI instruction has an associated SWI number, which is used to represent a particular
function call or feature.
Example: Here we have a simple example of an SWI call with SWI number 0x123456, used
by ARM toolkits as a debugging SWI. Typically, the SWI instruction is executed in user mode.
Since SWI instructions are used to call operating system routines, you need some form of
parameter passing. This is achieved using registers. In this example, register r0 is used to pass
the parameter 0x12. The return values are also passed back via registers.
Code called the SWI handler is required to process the SWI call. The handler obtains the SWI
number using the address of the executed instruction, which is calculated from the link register.
The SWI number is determined by
SWI_Number = <SWI instruction> AND NOT(0xff000000)
Here the SWI instruction is the actual 32-bit SWI instruction executed by the processor.
Example: This example shows the start of an SWI handler implementation. The code fragment
determines what SWI number is being called and places that number into register r10.
You can see from this example that the load instruction first copies the complete SWI
instruction into register r10. The BIC instruction masks off the top bits of the instruction,
leaving the SWI number. We assume the SWI has been called from ARM state.
The number in register r10 is then used by the SWI handler to call the appropriate SWI service
routine.
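The masking step performed by the handler can be sketched in Python as follows. The encoding 0xEF123456 is the unconditional ARM-state form of SWI 0x123456; the function name and the idea of passing the instruction word directly (instead of loading it from the address lr - 4) are simplifications made for illustration.

```python
MASK32 = 0xFFFFFFFF

def swi_number(instruction):
    """Clear the top 8 bits (condition and opcode), as BIC with 0xff000000 does."""
    return instruction & ~0xFF000000 & MASK32

swi_instruction = 0xEF123456   # word the handler would load from lr - 4
print(hex(swi_number(swi_instruction)))  # prints: 0x123456
```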
The following Example shows how to enable IRQ interrupts by clearing the I mask. This
operation involves using both the MRS and MSR instructions to read from and then write to
the CPSR.
Example: The MRS first copies the CPSR into register r1. The BIC instruction clears bit 7 of
r1. The MSR then copies register r1 back into the CPSR, which enables IRQ interrupts. You
can see from this example that this code preserves all the other settings in the CPSR and only
modifies the I bit in the control field.
#0x80 = 128 in decimal = 0b10000000 in binary; it is used to clear the I bit (bit 7) in the control field.
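The read-modify-write sequence can be modeled in Python as a single bit-clear on a copy of the CPSR. The starting value 0x000000D3 (SVC mode with the I and F bits set) and the function name are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF
I_BIT = 0x80   # bit 7 of the CPSR masks IRQ interrupts when set

def enable_irq(cpsr):
    """Model MRS r1, cpsr / BIC r1, r1, #0x80 / MSR cpsr_c, r1."""
    r1 = cpsr                   # MRS: read the CPSR
    r1 = r1 & ~I_BIT & MASK32   # BIC: clear only the I bit
    return r1                   # MSR: write the value back

print(hex(enable_irq(0x000000D3)))  # prints: 0x53
```

Only bit 7 changes; the F bit, the T bit, and the mode bits keep their previous values.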
Here CP15 register-0 contains the processor identification number. This register is copied into
the general-purpose register r10.
• The first pseudo-instruction writes a 32-bit constant to a register using whatever instructions
are available. It defaults to a memory read if the constant cannot be encoded using other
instructions.
• The second pseudo-instruction writes a relative address into a register, which will be encoded
using a pc-relative expression.
Example: This example shows an LDR instruction loading a 32-bit constant 0xff00ffff into
register r0.
This example involves a memory access to load the constant, which can be expensive for time-
critical routines.
The following Example shows an alternative method to load the same constant into register r0
by using an MVN instruction.
Example: Loading the constant 0xff00ffff using an MVN.
As you can see, there are alternatives to accessing memory, but they depend upon the constant
you are trying to load.
The LDR pseudo-instruction either inserts an MOV or MVN instruction to generate a value (if
possible) or generates an LDR instruction with a pc-relative address to read the constant from
a literal pool—a data area embedded within the code.
The following Table shows two pseudo-instruction conversions.
The first conversion produces a simple MOV instruction; the second conversion produces a pc-
relative load.
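The choice the assembler makes can be approximated with the Python sketch below. It relies on the ARM immediate format, in which a data processing immediate is an 8-bit value rotated right by an even number of bits. The function names are assumptions made for illustration, and a real assembler may apply further rules.

```python
MASK32 = 0xFFFFFFFF

def rotate_left(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def encodable(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    return any(rotate_left(value, rot) <= 0xFF for rot in range(0, 32, 2))

def choose_load(value):
    """Pick the instruction an LDR Rd, =value pseudo-instruction could become."""
    if encodable(value):
        return "MOV"
    if encodable(~value & MASK32):
        return "MVN"
    return "LDR"   # pc-relative load from a literal pool

for constant in (0x000000FF, 0xFF00FFFF, 0x55555555):
    print(hex(constant), choose_load(constant))
# 0xff MOV
# 0xff00ffff MVN
# 0x55555555 LDR
```

The constant 0xff00ffff is not encodable directly, but its complement 0x00ff0000 is, which is why an MVN can generate it without a memory access.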
Another useful pseudo-instruction is the ADR instruction or address relative. This instruction
places the address of the given label into register Rd, using a pc-relative add or subtract.