
[AArch64] Variety of incorrect disassembly info (mainly SVE) #2472

@FinnWilkinson

Description


Work environment

OS/arch/bits: macOS AArch64
Architecture: armv8
Source of Capstone: git clone
Version/git commit: df72286

Sorry for not using the template fully, and sorry in advance for the long issue. I have identified a fair few instructions (mainly SVE) with incorrect access types, and others with incorrect implicit register reads/writes or missing immediate values.

Incorrect implicit destinations

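  • Opcode AArch64_MRS (0x42d03bd5) currently has the NZCV register as an implicit write register. This isn't correct: MRS only copies a system register into a general-purpose register and does not modify the condition flags.
  • Opcode AArch64_BLR (0x20003fd6) has an implicit read of SP. Again, this isn't correct: BLR reads only its register operand and implicitly writes X30 (LR).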
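To double-check what these two byte strings are, the small standalone C sketch below (not Capstone code; the helper names are hypothetical) assembles the little-endian bytes into a 32-bit word and extracts the fields. Per the A64 encoding, 42 d0 3b d5 decodes to op0=3, op1=3, CRn=13, CRm=0, op2=2, Rt=2, i.e. mrs x2, tpidr_el0, which only reads a system register. 20 00 3f d6 decodes to blr x1.

```c
#include <assert.h>
#include <stdint.h>

/* Byte strings like "42 d0 3b d5" are little-endian: this one is 0xd53bd042. */
static uint32_t word_from_le_bytes(const uint8_t b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Fields of MRS <Xt>, S<op0>_<op1>_C<n>_C<m>_<op2>. */
struct mrs_fields {
    unsigned op0, op1, crn, crm, op2, rt;
};

static struct mrs_fields decode_mrs(uint32_t w) {
    /* MRS (register): bits[31:20] == 0xd53 (L = 1, op0<1> = 1). */
    assert((w >> 20) == 0xd53);
    struct mrs_fields f;
    f.op0 = 2 + ((w >> 19) & 1);
    f.op1 = (w >> 16) & 0x7;
    f.crn = (w >> 12) & 0xf;
    f.crm = (w >> 8) & 0xf;
    f.op2 = (w >> 5) & 0x7;
    f.rt = w & 0x1f;
    return f;
}

/* BLR <Xn>: 0xd63f0000 with Rn in bits[9:5]. */
static unsigned decode_blr_rn(uint32_t w) {
    assert((w & 0xfffffc1fu) == 0xd63f0000u);
    return (w >> 5) & 0x1f;
}
```

Nothing in either encoding names NZCV or SP, which is consistent with the two implicit operands being spurious.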

Incorrect access permissions

For SVE instructions of the form fsub zdn, pg, zdn, zm (that is, where the first and second z-regs must be the same), both of these operands have their access set to READ | WRITE.

Although this is technically correct, since register zdn is both read and written, I think it can be confusing. My reasoning is that operand[0] represents the register being written to, so its access should be just WRITE, while operand[2] (in the example above) is the first source vector and should be just READ.
If someone does not know that the ISA spec mandates that the destination vector register and the first source vector are the same register, it could be confusing to see two registers being written to.
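The reason the two operands always match is visible in the encoding. These predicated destructive forms have a single 5-bit Zdn field that supplies both operand[0] and operand[2]. The standalone C sketch below is not Capstone code, and the struct and function names are hypothetical. It decodes that layout for the eor example in this section: bytes 20 00 19 04, which form the little-endian word 0x04190020.

```c
#include <assert.h>
#include <stdint.h>

/* Register fields of SVE predicated destructive binary operations such as
 * EOR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>. */
struct sve_zpmz {
    unsigned size; /* bits[23:22]: 0 = .B, 1 = .H, 2 = .S, 3 = .D */
    unsigned pg;   /* bits[12:10]: governing predicate p0-p7 */
    unsigned zm;   /* bits[9:5]: second source vector */
    unsigned zdn;  /* bits[4:0]: destination AND first source */
};

static struct sve_zpmz decode_zpmz(uint32_t w) {
    /* bits[28:25] == 0b0010 is the SVE encoding space. */
    assert(((w >> 25) & 0xf) == 0x2);
    struct sve_zpmz d;
    d.size = (w >> 22) & 0x3;
    d.pg = (w >> 10) & 0x7;
    d.zm = (w >> 5) & 0x1f;
    d.zdn = w & 0x1f;
    return d;
}
```

For 0x04190020 this yields zdn = 0, pg = 0, zm = 1 and size = 0 (.B), i.e. eor z0.b, p0/m, z0.b, z1.b. There is simply no second field from which operand[2] could differ.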

Here is an example of this occurring:

 0  20 00 19 04  eor	z0.b, p0/m, z0.b, z1.b
	ID: 311 (eor)
	op_count: 4
		operands[0].type: REG = z0
		operands[0].access: READ | WRITE
			Vector Arrangement Specifier: 0x8
		operands[1].type: PREDICATE
		operands[1].pred.reg: p0
		operands[1].access: READ
		operands[2].type: REG = z0
		operands[2].access: READ | WRITE
			Vector Arrangement Specifier: 0x8
		operands[3].type: REG = z1
		operands[3].access: READ
			Vector Arrangement Specifier: 0x8
	Write-back: True
	Registers read: z0 p0 z1
	Registers modified: z0
	Groups: HasSVEorSME

This could instead be:

 0  20 00 19 04  eor	z0.b, p0/m, z0.b, z1.b
	ID: 311 (eor)
	op_count: 4
		operands[0].type: REG = z0
		operands[0].access: WRITE
			Vector Arrangement Specifier: 0x8
		operands[1].type: PREDICATE
		operands[1].pred.reg: p0
		operands[1].access: READ
		operands[2].type: REG = z0
		operands[2].access: READ
			Vector Arrangement Specifier: 0x8
		operands[3].type: REG = z1
		operands[3].access: READ
			Vector Arrangement Specifier: 0x8
	Write-back: True
	Registers read: z0 p0 z1
	Registers modified: z0
	Groups: HasSVEorSME
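To show that the proposed access flags lose no information, here is a hypothetical C sketch. The operand struct and helper are illustrative, not Capstone's API; only the flag values are meant to mirror Capstone's cs_ac_type. It rebuilds the "Registers read" and "Registers modified" sets from per-operand access flags. With either the current or the proposed flags, the result is the same: z0, p0 and z1 read, and z0 written.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values mirror Capstone's cs_ac_type (CS_AC_READ, CS_AC_WRITE). */
enum { AC_READ = 1u << 0, AC_WRITE = 1u << 1 };

/* Hypothetical minimal operand model: a register id plus access flags. */
struct op {
    int reg;        /* illustrative register id, must be < 64 */
    uint8_t access; /* combination of AC_READ / AC_WRITE */
};

/* Bit masks of registers read and written, derived from operand access. */
struct reg_sets {
    uint64_t read, written;
};

static struct reg_sets collect(const struct op *ops, int n) {
    struct reg_sets s = {0, 0};
    for (int i = 0; i < n; i++) {
        assert(ops[i].reg >= 0 && ops[i].reg < 64);
        if (ops[i].access & AC_READ)
            s.read |= UINT64_C(1) << ops[i].reg;
        if (ops[i].access & AC_WRITE)
            s.written |= UINT64_C(1) << ops[i].reg;
    }
    return s;
}
```

Because z0 appears as both operand[0] (WRITE) and operand[2] (READ), the union still marks it as read and written. The proposal therefore only changes how the information is split across operands.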

I don't have a comprehensive list of all affected instructions, but they generally seem to be SVE only and of the form zdn, pg, zdn, <zm|#imm>.
Below is a list of the opcode enums I have run into so far (with example bytecodes where I made a note of them):

  • AArch64_FSUB_ZPmI_D
  • AArch64_FSUB_ZPmI_H
  • AArch64_FSUB_ZPmI_S // Example bytecode - 00849965
  • AArch64_FMUL_ZPmI_D
  • AArch64_FMUL_ZPmI_H
  • AArch64_FMUL_ZPmI_S // Example bytecode - 00809a65
  • AArch64_FADD_ZPmI_D // Example bytecode - 0584d865
  • AArch64_FADD_ZPmI_H
  • AArch64_FADD_ZPmI_S
  • AArch64_AND_ZPmZ_D // Example bytecode - 4901da04
  • AArch64_AND_ZPmZ_H
  • AArch64_AND_ZPmZ_S
  • AArch64_AND_ZPmZ_B
  • AArch64_SMULH_ZPmZ_B // Example bytecode - 20001204
  • AArch64_SMULH_ZPmZ_D
  • AArch64_SMULH_ZPmZ_H
  • AArch64_SMULH_ZPmZ_S
  • AArch64_SMIN_ZPmZ_B
  • AArch64_SMIN_ZPmZ_D
  • AArch64_SMIN_ZPmZ_H
  • AArch64_SMIN_ZPmZ_S // Example bytecode - 01008a04
  • AArch64_SMAX_ZPmZ_B
  • AArch64_SMAX_ZPmZ_D
  • AArch64_SMAX_ZPmZ_H
  • AArch64_SMAX_ZPmZ_S // Example bytecode - 01008804
  • AArch64_MUL_ZPmZ_B // Example bytecode - 40001004
  • AArch64_MUL_ZPmZ_D
  • AArch64_MUL_ZPmZ_H
  • AArch64_MUL_ZPmZ_S
  • AArch64_FSUBR_ZPmZ_D
  • AArch64_FSUBR_ZPmZ_H
  • AArch64_FSUBR_ZPmZ_S // Example bytecode - 24808365
  • AArch64_FSUB_ZPmZ_D
  • AArch64_FSUB_ZPmZ_H
  • AArch64_FSUB_ZPmZ_S // Example bytecode - 24808165
  • AArch64_FMUL_ZPmZ_D
  • AArch64_FMUL_ZPmZ_H
  • AArch64_FMUL_ZPmZ_S // Example bytecode - 83808265
  • AArch64_FDIV_ZPmZ_D // Example bytecode - 0184cd65
  • AArch64_FDIV_ZPmZ_H
  • AArch64_FDIV_ZPmZ_S
  • AArch64_FDIVR_ZPmZ_D // Example bytecode - 0184cc65
  • AArch64_FDIVR_ZPmZ_H
  • AArch64_FDIVR_ZPmZ_S
  • AArch64_FADDA_VPZ_D
  • AArch64_FADDA_VPZ_H
  • AArch64_FADDA_VPZ_S // Example bytecode - 01249865
  • AArch64_FADD_ZPmZ_D // Example bytecode - 6480c065
  • AArch64_FADD_ZPmZ_H
  • AArch64_FADD_ZPmZ_S
  • AArch64_FCADD_ZPmZ_D // Example bytecode - 2080c064
  • AArch64_FCADD_ZPmZ_H
  • AArch64_FCADD_ZPmZ_S
  • AArch64_ADD_ZPmZ_B // Example bytecode - 00000004
  • AArch64_ADD_ZPmZ_D
  • AArch64_ADD_ZPmZ_H
  • AArch64_ADD_ZPmZ_S
  • AArch64_EOR_ZPmZ_B // Example bytecode - 20001904
  • AArch64_EOR_ZPmZ_D
  • AArch64_EOR_ZPmZ_H
  • AArch64_EOR_ZPmZ_S

The same has also been seen with unpredicated SVE instructions, where operand[0] and operand[1] must be the same SVE vector register:

  • AArch64_SMAX_ZI_B
  • AArch64_SMAX_ZI_D
  • AArch64_SMAX_ZI_H
  • AArch64_SMAX_ZI_S // Example bytecode - 03c0a825
  • AArch64_AND_ZI // Example bytecode - 00068005
  • AArch64_ADD_ZI_B // Example bytecode - 00c12025
  • AArch64_ADD_ZI_D
  • AArch64_ADD_ZI_H
  • AArch64_ADD_ZI_S

Incorrect access permissions pt. 2

There are some other instructions I have found with wrong access information.

AArch64_CASALX and AArch64_CASALW // Example bytecode - 02fce188

 0  02 fc e1 88  casal	w1, w2, [x0]
	ID: 127 (casal)
	op_count: 3
		operands[0].type: REG = w1
		operands[0].access: READ | WRITE
		operands[1].type: REG = w2
		operands[1].access: READ
		operands[2].type: MEM
			operands[2].mem.base: REG = x0
		operands[2].access: READ | WRITE
	Write-back: True
	Registers read: w1 w2 x0
	Registers modified: w1 x0
	Groups: HasLSE

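CASAL does not update the base register x0, so Write-back should be False and x0 should not appear under Registers modified. w1 is correctly READ | WRITE: CASAL compares w1 with the value loaded from [x0] and then overwrites w1 with that loaded value. The memory operand is also correctly READ | WRITE, since memory is written when the comparison succeeds. The expected output is therefore:

 0  02 fc e1 88  casal	w1, w2, [x0]
	ID: 127 (casal)
	op_count: 3
		operands[0].type: REG = w1
		operands[0].access: READ | WRITE
		operands[1].type: REG = w2
		operands[1].access: READ
		operands[2].type: MEM
			operands[2].mem.base: REG = x0
		operands[2].access: READ | WRITE
	Registers read: w1 w2 x0
	Registers modified: w1
	Groups: HasLSE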
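For reference, the data flow of CASAL Ws, Wt, [Xn] can be sketched in C. This is a simplified, single-threaded model that ignores atomicity and acquire/release ordering, and the function name is hypothetical. It makes explicit what is read and written:

- memory is always read;
- the new value is stored when the comparison succeeds;
- Ws receives the value loaded from memory;
- Xn itself is never changed, so there is no write-back.

```c
#include <stdint.h>

/* Simplified model of CASAL Ws, Wt, [Xn]: no atomicity, no memory ordering.
 * ws: compare register (read, then overwritten with the loaded value)
 * wt: new value (read only)
 * mem: the location addressed by Xn; the pointer itself is never updated. */
static void casal_w(uint32_t *ws, uint32_t wt, uint32_t *mem) {
    uint32_t old = *mem;   /* load from [Xn] */
    if (old == *ws)
        *mem = wt;         /* store Wt when the comparison succeeds */
    *ws = old;             /* Ws receives the value that was loaded */
}
```

For the 64-bit CASAL Xs, Xt, [Xn] form the model is the same with uint64_t.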

AArch64_FCVTNv4i32 // Example bytecode - 0168614e

 0  01 68 61 4e  fcvtn2	v1.4s, v0.2d
	ID: 367 (fcvtn2)
	op_count: 2
		operands[0].type: REG = q1 (vreg)
		operands[0].access: READ | WRITE
			Vector Arrangement Specifier: 0x420
		operands[1].type: REG = q0 (vreg)
		operands[1].access: READ
			Vector Arrangement Specifier: 0x240
	Write-back: True
	Registers read: fpcr q1 q0
	Registers modified: q1
	Groups: HasNEON

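operands[0] should be WRITE only, in line with the SVE cases above. FCVTN2 writes only the upper half of v1 and leaves the lower half unchanged, so q1 still belongs under Registers read. More variants of this instruction may be affected; I just haven't verified them: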

 0  01 68 61 4e  fcvtn2	v1.4s, v0.2d
	ID: 367 (fcvtn2)
	op_count: 2
		operands[0].type: REG = q1 (vreg)
		operands[0].access: WRITE
			Vector Arrangement Specifier: 0x420
		operands[1].type: REG = q0 (vreg)
		operands[1].access: READ
			Vector Arrangement Specifier: 0x240
	Write-back: True
	Registers read: fpcr q1 q0
	Registers modified: q1
	Groups: HasNEON
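A simplified C model of FCVTN2 Vd.4S, Vn.2D makes the lower-half behaviour concrete: only lanes 2 and 3 of Vd are written, and lanes 0 and 1 keep their previous values. The function name is hypothetical. Rounding follows the C environment's current mode rather than modelling FPCR.

```c
/* Simplified model of FCVTN2 Vd.4S, Vn.2D: narrow the two doubles in vn and
 * write them to the upper half of vd (lanes 2 and 3). Lanes 0 and 1 are left
 * as they were, which is why Vd is also an input. Conversion uses the C
 * environment's current rounding mode, not FPCR. */
static void fcvtn2_4s(float vd[4], const double vn[2]) {
    for (int i = 0; i < 2; i++)
        vd[2 + i] = (float)vn[i];
    /* vd[0] and vd[1] are intentionally not touched. */
}
```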

Imm not set when a shift is present

Many instructions that take an immediate can optionally take a shift as well. When no shift is provided, the disassembly information is correct.
However, the shift amount is often fixed or restricted to a range, so when a shift is provided, Capstone / the LLVM disassembler folds it into the immediate. The shifted immediate appears correctly in the operand string, but not in the disassembly info.

Example: AArch64_CPY_ZPzI_H: // Example bytecode - 01215005

 0  01 21 50 05  mov	z1.h, p0/z, #0x800
	ID: 273 (cpy)
	Is alias: 1429 (mov) with REAL operand set
	op_count: 4
		operands[0].type: REG = z1
		operands[0].access: WRITE
			Vector Arrangement Specifier: 0x10
		operands[1].type: PREDICATE
		operands[1].pred.reg: p0
		operands[1].access: READ
		operands[2].type: IMM = 0x0
		operands[2].access: READ
		operands[3].type: IMM = 0x0
		operands[3].access: READ
	Registers read: p0
	Registers modified: z1
	Groups: HasSVEorSME

Here there is an extra operand (operands[3]), and the immediate is not set: both IMM operands are 0x0. The expected output would be:

 0  01 21 50 05  mov	z1.h, p0/z, #0x800
	ID: 273 (cpy)
	Is alias: 1429 (mov) with REAL operand set
	op_count: 4
		operands[0].type: REG = z1
		operands[0].access: WRITE
			Vector Arrangement Specifier: 0x10
		operands[1].type: PREDICATE
		operands[1].pred.reg: p0
		operands[1].access: READ
		operands[2].type: IMM = 0x800
		operands[2].access: READ
	Registers read: p0
	Registers modified: z1
	Groups: HasSVEorSME

An alternative assembly for this instruction (and the one I used to generate the bytecode) is cpy z1.h, p0/z, #8, lsl #8, where the only LSL available is by #8.
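For the SVE CPY (immediate) and ADD (immediate) encodings, the value that should end up in the single IMM operand can be computed from two fields: sh (bit 13) and imm8 (bits[12:5]). When sh is 1, the immediate is imm8 LSL #8. The C sketch below is standalone and uses a hypothetical helper name. imm8 is signed for CPY and unsigned for ADD.

```c
#include <stdint.h>

/* Immediate of SVE CPY (immediate) / ADD (immediate):
 * sh = bit 13, imm8 = bits[12:5]; sh == 1 means imm8 LSL #8. */
static int32_t sve_shifted_imm(uint32_t w, int imm8_is_signed) {
    unsigned sh = (w >> 13) & 0x1;
    uint32_t imm8 = (w >> 5) & 0xff;
    int32_t v = imm8_is_signed ? (int32_t)(int8_t)imm8 : (int32_t)imm8;
    /* Multiply rather than shift so negative values stay well defined. */
    return sh ? v * 256 : v;
}
```

Bytes 01 21 50 05 form the word 0x05502101. That word has sh = 1 and imm8 = 8, giving 0x800, which matches mov z1.h, p0/z, #0x800. The ADD_ZI_B example 00c12025 (0x2520c100) has sh = 0 and imm8 = 8, so its immediate is plain 8.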

The instructions we have found to be affected are:

  • AArch64_ADD_ZI_B
  • AArch64_ADD_ZI_D
  • AArch64_ADD_ZI_H
  • AArch64_ADD_ZI_S
  • AArch64_CPY_ZPzI_B
  • AArch64_CPY_ZPzI_D
  • AArch64_CPY_ZPzI_H
  • AArch64_CPY_ZPzI_S

This issue likely affects all instructions that use immediates with optional shifts in this way.

FP immediate not shown in disassembly information

For instructions that take a fixed floating-point immediate, Capstone correctly identifies that one exists and populates the EXACTFPIMM field. However, the cs_aarch64_op union also has a .fp field. Populating it alongside the enum would give better clarity and make in-project usage easier.

Example: AArch64_FADD_ZPmI_D // Example bytecode - 0584d865

 0  05 84 d8 65  fadd	z5.d, p1/m, z5.d, #0.5
	ID: 332 (fadd)
	op_count: 4
		operands[0].type: REG = z5
		operands[0].access: READ | WRITE
			Vector Arrangement Specifier: 0x40
		operands[1].type: PREDICATE
		operands[1].pred.reg: p1
		operands[1].access: READ
		operands[2].type: REG = z5
		operands[2].access: READ | WRITE
			Vector Arrangement Specifier: 0x40
		operands[3].type: SYS IMM:
		operands[3].subtype EXACTFPIMM = 1
	Write-back: True
	Registers read: z5 p1
	Registers modified: z5
	Groups: HasSVEorSME

This could instead be:

 0  05 84 d8 65  fadd	z5.d, p1/m, z5.d, #0.5
	ID: 332 (fadd)
	op_count: 4
		operands[0].type: REG = z5
		operands[0].access: READ | WRITE
			Vector Arrangement Specifier: 0x40
		operands[1].type: PREDICATE
		operands[1].pred.reg: p1
		operands[1].access: READ
		operands[2].type: REG = z5
		operands[2].access: READ | WRITE
			Vector Arrangement Specifier: 0x40
		operands[3].type: SYS IMM:
		operands[3].subtype EXACTFPIMM = 1
		operands[3].fp = 0.5
	Write-back: True
	Registers read: z5 p1
	Registers modified: z5
	Groups: HasSVEorSME
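The mapping from encoding to value is small enough to sketch. In the SVE predicated immediate forms, a single i1 bit (bit 5) chooses between two fixed constants, and which pair applies depends on the instruction. The C helper below is hypothetical (not Capstone's API) and computes the value that could go into operands[3].fp.

```c
#include <assert.h>
#include <stdint.h>

/* Which pair of exact FP constants an SVE immediate form selects from. */
enum fpimm_pair {
    PAIR_HALF_ONE, /* FADD, FSUB, FSUBR (immediate): #0.5 or #1.0 */
    PAIR_HALF_TWO, /* FMUL (immediate): #0.5 or #2.0 */
    PAIR_ZERO_ONE  /* FMAX, FMAXNM, FMIN, FMINNM (immediate): #0.0 or #1.0 */
};

/* i1 (bit 5) picks the first (0) or second (1) constant of the pair. */
static double sve_exact_fpimm(uint32_t w, enum fpimm_pair pair) {
    static const double table[3][2] = {
        {0.5, 1.0},
        {0.5, 2.0},
        {0.0, 1.0},
    };
    assert((unsigned)pair < 3);
    unsigned i1 = (w >> 5) & 0x1;
    return table[pair][i1];
}
```

Bytes 05 84 d8 65 form the word 0x65d88405. That word has i1 = 0, and FADD uses the 0.5/1.0 pair, giving 0.5 as in fadd z5.d, p1/m, z5.d, #0.5.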

Thanks in advance!

Metadata


Assignees

No one assigned

Labels

AArch64, LLVM, bug

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
