3. Myself
• Was a Kaffe (world's first open source JVM) developer
● Threaded interpreter, JIT, AWT for embedded systems, robustness
• Was a GCJ (Java frontend for GCC) and GNU Classpath developer
• Is an AOSP (Android Open Source Project) contributor
● 45+ patches officially merged
● bionic libc, ARM optimizations
4. Goals of This Presentation
• Understand how a virtual machine works
• Analyze the Dalvik VM using existing tools
• VM hacking is really interesting!
6. Reference Hardware and Host Configurations
• Android Phone: Nexus S
– http://www.google.com/phone/detail/nexus-s
– Install CyanogenMod (CM9)
http://www.cyanogenmod.com/
• Host: Lenovo x200
– Ubuntu Linux 11.10+ (32-bit)
• AOSP/CM9 source code: 4.0.3
• Follow the instructions in Wiki
http://wiki.cyanogenmod.com/wiki/Building_from_source
7. Build CyanogenMod from Source
• cyanogen-ics$ source build/envsetup.sh
including device/moto/stingray/vendorsetup.sh
including device/moto/wingray/vendorsetup.sh
including device/samsung/maguro/vendorsetup.sh
including device/samsung/toro/vendorsetup.sh
including device/ti/panda/vendorsetup.sh
including vendor/cm/vendorsetup.sh
including sdk/bash_completion/adb.bash
• cyanogen-ics$ lunch
You're building on Linux
Lunch menu... pick a combo:
1. full-eng
…
8. full_panda-eng
9. cm_crespo-userdebug
(cm_crespo-userdebug means Target: cm_crespo, Configuration: userdebug)
8. Nexus S Device Configurations
• Which would you like? [full-eng] 9
============================================
PLATFORM_VERSION_CODENAME=REL
PLATFORM_VERSION=4.0.3
TARGET_PRODUCT=cm_crespo
TARGET_BUILD_VARIANT=userdebug
TARGET_BUILD_TYPE=release
TARGET_BUILD_APPS=
TARGET_ARCH=arm
TARGET_ARCH_VARIANT=armv7-a-neon
HOST_ARCH=x86
HOST_OS=linux
HOST_BUILD_TYPE=release
BUILD_ID=MR1
============================================
9. Build Dalvik VM (ARM Target + x86 Host)
• cyanogen-ics$ make dalvikvm dalvik
============================================
PLATFORM_VERSION_CODENAME=REL
PLATFORM_VERSION=4.0.3
TARGET_PRODUCT=cm_crespo
TARGET_BUILD_VARIANT=userdebug
TARGET_BUILD_TYPE=release
…
Install: out/host/linux-x86/lib/libdvm.so
Install: out/target/product/crespo/system/bin/dalvikvm
host C++: dalvikvm <= dalvik/dalvikvm/Main.cpp
host Executable: dalvikvm
Install: out/host/linux-x86/bin/dalvikvm
Copy: dalvik (out/host/linux-x86/obj/EXECUTABLES/dalvik_intermediates/dalvik)
Install: out/host/linux-x86/bin/dalvik
• libdvm.so is the VM engine
• "dalvik" is a shell script that launches dalvikvm
10. Dalvik VM requires core APIs for runtime
cyanogen-ics$ out/host/linux-x86/bin/dalvik
E( 6983) No valid entries found in bootclasspath '/tmp/cyanogen-ics/out/host/linux-x86/framework/core-hostdex.jar:/tmp/cyanogen-ics/out/host/linux-x86/framework/bouncycastle-hostdex.jar:/tmp/cyanogen-ics/out/host/linux-x86/framework/apache-xml-hostdex.jar' (dalvikvm)
E( 6983) VM aborting (dalvikvm)
...
out/host/linux-x86/bin/dalvik: line 28: 6983 Segmentation fault (core dumped)
ANDROID_PRINTF_LOG=tag ANDROID_LOG_TAGS=""
ANDROID_DATA=/tmp/android-data
ANDROID_ROOT=$ANDROID_BUILD_TOP/out/host/linux-x86
LD_LIBRARY_PATH=$ANDROID_BUILD_TOP/out/host/linux-x86/lib
$ANDROID_BUILD_TOP/out/host/linux-x86/bin/dalvikvm -Xbootclasspath:$ANDROID_BUILD_TOP/out/host/linux-x86/framework/core-hostdex.jar:$ANDROID_BUILD_TOP/out/host/linux-x86/framework/bouncycastle-hostdex.jar:$ANDROID_BUILD_TOP/out/host/linux-x86/framework/apache-xml-hostdex.jar $*
11. Satisfy Dalvik Runtime Dependency
cyanogen-ics$ make bouncycastle bouncycastle-hostdex
cyanogen-ics$ make sqlite-jdbc mockwebserver
cyanogen-ics$ make sqlite-jdbc-host
cyanogen-ics$ make mockwebserver-hostdex
cyanogen-ics$ make apache-xml-hostdex
cyanogen-ics$ (cd libcore && make)
cyanogen-ics$ out/host/linux-x86/bin/dalvik
...
I(19820) Unable to open or create cache for /tmp/cyanogen-ics/out/host/linux-x86/framework/core-hostdex.jar (/data/dalvik-cache/tmp@cyanogen-ics@out@host@linux-x86@framework@core-hostdex.jar@classes.dex) (dalvikvm)
E(19820) Could not stat dex cache directory '/data/dalvik-cache': No such file or directory (dalvikvm)
A "dalvik-cache" directory is required.
12. Host-side Dalvik VM
cyanogen-ics$ make dexopt
cyanogen-ics$ sudo mkdir -p /data/dalvik-cache
cyanogen-ics$ sudo chmod 777 /data/dalvik-cache
cyanogen-ics$ out/host/linux-x86/bin/dalvik
Dalvik VM requires a class name
Finally, the host-side Dalvik VM is ready.
It only complains that no class was given.
cyanogen-ics$ ls /data/dalvik-cache/
tmp@cyanogen-ics@out@host@linux-x86@framework@apache-xml-hostdex.jar@classes.dex
tmp@cyanogen-ics@out@host@linux-x86@framework@bouncycastle-hostdex.jar@classes.dex
tmp@cyanogen-ics@out@host@linux-x86@framework@core-hostdex.jar@classes.dex
Optimized DEX files generated by "dexopt"
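The file names in /data/dalvik-cache follow from the jar paths: the leading "/" is dropped, every remaining "/" becomes "@", and "@classes.dex" is appended. A minimal Java sketch of that mapping, assuming only this naming rule; the class and method names (DexCacheName, cacheFileName) are hypothetical and this is not the VM's own code:

```java
// Hypothetical helper: reproduces the file naming seen in /data/dalvik-cache.
final class DexCacheName {
    private DexCacheName() {}

    /** Maps an absolute jar/apk path to its file name inside the dalvik-cache directory. */
    static String cacheFileName(String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + absolutePath);
        }
        // Drop the leading '/', turn the remaining separators into '@',
        // and append the name of the DEX entry inside the archive.
        return absolutePath.substring(1).replace('/', '@') + "@classes.dex";
    }

    public static void main(String[] args) {
        System.out.println(cacheFileName(
            "/tmp/cyanogen-ics/out/host/linux-x86/framework/core-hostdex.jar"));
    }
}
```

Because the name encodes the full source path, two jars with the same file name in different directories do not collide in the cache.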
13. Agenda
(1) How a Virtual Machine Works
(2) Dalvik VM
(3) Utilities
15. What is a Virtual Machine
• A virtual machine (VM) is a software implementation
of a machine (i.e. a computer) that executes programs
like a physical machine.
• Basic parts
– A set of registers
– A stack (optional)
– An execution environment
– A garbage-collected heap
– A constant pool
– A method storage area
– An instruction set
16. VM Types
• Based on its functionality
– System Virtual Machine
supports execution of a complete OS
– Process Virtual Machine
supports execution of a single process
• Based on its architecture
– Stack-based VM (instructions push operands onto a stack and operate on the values at its top)
– Register-based VM (instructions encode their source and destination registers explicitly)
17. JVM Conceptual Architecture
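[Figure: JVM conceptual architecture. Classfiles enter through the class loader. The memory space, maintained by an automatic memory manager, holds the method area, Java heap, Java stack, and native method stack. The execution engine, with its instruction counter and implicit registers, exchanges addresses, data, and instructions with the memory space and reaches the native libraries through the native method interface.]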
18. Segment
[Figure: a stack segment. A javaframe holds optop, method, class fields, pc, and others; vars holds the local variables.]
19. Segment
[Figure: a stack segment showing javaframe, vars, the environment context (javaframe_i, optop_i), optop, and the registers.]
20. Example: JVM
• Example Java source: Foo.java
class Foo {
    public static void main(String[] args) {
        System.out.println("Hello, world");
    }
    int calc(int a, int b) {
        int c = 2 * (a + b);
        return c;
    }
}
22. Bytecode execution
c := 2 * (a + b)
• Example bytecode
– iconst 2
– iload a
– iload b
– iadd
– imul
– istore c
23. Example bytecode: step-by-step execution
Computes: c := 2 * (a + b), starting with a = 42, b = 7, c = 0
Instruction   Stack afterwards (top first)   Variables afterwards
iconst 2      2                              a=42 b=7 c=0
iload a       42, 2                          a=42 b=7 c=0
iload b       7, 42, 2                       a=42 b=7 c=0
iadd          49, 2                          a=42 b=7 c=0
imul          98                             a=42 b=7 c=0
istore c      (empty)                        a=42 b=7 c=98
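The walkthrough can be reproduced with a toy stack interpreter. This is a minimal Java sketch, not a real JVM: the opcode names follow the slide, while the class ToyStackVm, the Insn type, and the slot numbering (a, b, c in slots 0, 1, 2) are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack machine; opcode names follow the slide, not real JVM encodings.
final class ToyStackVm {
    enum Op { ICONST, ILOAD, IADD, IMUL, ISTORE }

    /** One instruction: an opcode plus an operand (a constant or a local-variable slot). */
    static final class Insn {
        final Op op;
        final int operand;
        Insn(Op op, int operand) { this.op = op; this.operand = operand; }
    }

    /** Runs the program against the given local variables and returns them. */
    static int[] run(Insn[] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn insn : program) {
            switch (insn.op) {
                case ICONST: stack.push(insn.operand); break;
                case ILOAD:  stack.push(locals[insn.operand]); break;
                case IADD: {
                    int value2 = stack.pop();
                    int value1 = stack.pop();
                    stack.push(value1 + value2);
                    break;
                }
                case IMUL: {
                    int value2 = stack.pop();
                    int value1 = stack.pop();
                    stack.push(value1 * value2);
                    break;
                }
                case ISTORE: locals[insn.operand] = stack.pop(); break;
            }
        }
        return locals;
    }

    /** c := 2 * (a + b), with a, b, c in slots 0, 1, 2. */
    static int[] example(int a, int b) {
        Insn[] program = {
            new Insn(Op.ICONST, 2),
            new Insn(Op.ILOAD, 0),
            new Insn(Op.ILOAD, 1),
            new Insn(Op.IADD, 0),   // operand unused
            new Insn(Op.IMUL, 0),   // operand unused
            new Insn(Op.ISTORE, 2),
        };
        return run(program, new int[] { a, b, 0 });
    }
}
```

Calling ToyStackVm.example(42, 7) leaves 98 in slot 2, after six dispatches, one per instruction.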
29. iadd in specification and implementation
Specification: ① pop value2, ② pop value1, ③ add value1 + value2, ④ push the result
case SVM_INSTRUCTION_IADD: {
    /* instruction body */
    jint value1 = stack[stack_size - 2].jint;       /* ② */
    jint value2 = stack[--stack_size].jint;         /* ① */
    stack[stack_size - 1].jint = value1 + value2;   /* ③ add, ④ push */
    /* dispatch */
    goto dispatch;
}
Taken from SableVM: sablevm/src/libsablevm/instructions_switch.c
30. Example: Dalvik VM
$ dx --dex --output=Foo.dex Foo.class
$ dexdump -d Foo.dex
Processing 'Foo.dex'...
Opened 'Foo.dex', DEX version '035'
...
Virtual methods -
#0 : (in LFoo;)
name : 'calc'
type : '(II)I'
...
00018c: |[00018c] Foo.calc:(II)I
00019c: 9000 0203 |0000: add-int v0, v2, v3
0001a0: da00 0002 |0002: mul-int/lit8 v0, v0, #int 2
0001a4: 0f00 |0004: return v0
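The hex column of the dump can be decoded by hand: each instruction starts with an opcode byte, followed by register numbers or a literal. The Java sketch below decodes and executes exactly these three instructions. The opcode values (0x90 add-int, 0xda mul-int/lit8, 0x0f return) come from the Dalvik bytecode specification; the class ToyRegisterVm and the assumption that the arguments arrive in v2 and v3 (read off the operands in the dump) are illustrative, not Dalvik's implementation.

```java
// Toy decoder/executor for the three Dalvik instructions in the dump above.
final class ToyRegisterVm {
    static final int OP_RETURN = 0x0f;        // format 11x: op, AA
    static final int OP_ADD_INT = 0x90;       // format 23x: op, AA, BB, CC
    static final int OP_MUL_INT_LIT8 = 0xda;  // format 22b: op, AA, BB, signed literal CC

    /** Executes the code bytes (in file order) and returns the value given to return. */
    static int run(byte[] code, int[] regs) {
        int pc = 0;
        while (true) {
            int op = code[pc] & 0xff;
            int a = code[pc + 1] & 0xff;
            switch (op) {
                case OP_ADD_INT:
                    regs[a] = regs[code[pc + 2] & 0xff] + regs[code[pc + 3] & 0xff];
                    pc += 4;
                    break;
                case OP_MUL_INT_LIT8:
                    regs[a] = regs[code[pc + 2] & 0xff] * code[pc + 3]; // literal stays signed
                    pc += 4;
                    break;
                case OP_RETURN:
                    return regs[a];
                default:
                    throw new IllegalStateException(
                        "unsupported opcode 0x" + Integer.toHexString(op));
            }
        }
    }

    /** Foo.calc(a, b) as dumped; assumes the arguments are in v2 and v3. */
    static int calc(int a, int b) {
        byte[] code = {
            (byte) 0x90, 0x00, 0x02, 0x03,  // add-int v0, v2, v3
            (byte) 0xda, 0x00, 0x00, 0x02,  // mul-int/lit8 v0, v0, #int 2
            0x0f, 0x00,                     // return v0
        };
        int[] regs = new int[4];
        regs[2] = a;
        regs[3] = b;
        return run(code, regs);
    }
}
```

ToyRegisterVm.calc(42, 7) yields 98 in three dispatches, with no operand stack involved.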
31. Java bytecode vs. Dalvik bytecode
(stack vs. register)
public int method(int i1, int i2)
{
    int i3 = i1 * i2;
    return i3 * 2;
}
Java:
.var 0 is "this"
.var 1 is argument #1
.var 2 is argument #2
.method public method(II)I
    iload_1
    iload_2
    imul
    istore_3
    iload_3
    iconst_2
    imul
    ireturn
.end method
Dalvik:
this: v1 (Ltest2;)
parameter[0] : v2 (I)
parameter[1] : v3 (I)
.method public method(II)I
    mul-int v0,v2,v3
    mul-int/lit8 v0,v0,2
    return v0
.end method
32. Dalvik is register based
• Dalvik uses a 3-operand form, which is what a real processor actually uses
33. Dalvik is register based
• To execute "int foo = 1 + 2", the VM does:
– const/4 to store 1 into register 0
– add-int/lit8 to sum the value in register 0 (1) with the literal 2 and store the result into register 1, namely "foo"
34. Dalvik is register based
• This takes only 2 dispatches, but Dalvik bytecode is measured in 2-byte units
• The Java bytecode is 4 bytes; the Dalvik bytecode is actually 6 bytes
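The 4-byte versus 6-byte figures can be checked against the encodings. In the Java sketch below, the opcode values are taken from the JVM and Dalvik bytecode specifications; the choice of local slot 1 for "foo" on the JVM side and the class name CodeSizeDemo are assumptions for illustration.

```java
// Byte-level encodings of "int foo = 1 + 2" as described on the slide.
final class CodeSizeDemo {
    /** JVM: iconst_1, iconst_2, iadd, istore_1 (one byte each, four dispatches). */
    static byte[] jvmCode() {
        return new byte[] { 0x04, 0x05, 0x60, 0x3c };
    }

    /** Dalvik: const/4 v0, #1 then add-int/lit8 v1, v0, #2 (two dispatches). */
    static byte[] dalvikCode() {
        return new byte[] {
            0x12, 0x10,                     // const/4: op 0x12, then (literal << 4) | register
            (byte) 0xd8, 0x01, 0x00, 0x02,  // add-int/lit8: op 0xd8, dest v1, src v0, literal 2
        };
    }

    /** Dalvik code is counted in 16-bit code units. */
    static int dalvikCodeUnits() {
        return dalvikCode().length / 2;
    }

    public static void main(String[] args) {
        System.out.println("JVM bytes:    " + jvmCode().length);
        System.out.println("Dalvik bytes: " + dalvikCode().length
            + " (" + dalvikCodeUnits() + " code units)");
    }
}
```

The trade-off is visible in the arrays: Dalvik spends two extra bytes naming registers and saves two dispatches.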
35. Code Size
• Generally speaking, the code size of register-based
VM instructions is larger than that of the
corresponding stack VM instructions
• On average, the register code is 25.05% larger than
the original stack code
36. Execution Time
• Register architecture requires an average of 47%
fewer executed VM instructions
Source: Virtual Machine Showdown: Stack Versus Registers
Yunhe Shi, David Gregg, Andrew Beatty, M. Anton Ertl
41. Best Dispatch Implementation
• The computed GOTO can be further optimized if we rewrite it in assembly.
• Computed-GOTO dispatch typically uses two memory reads: one to fetch the opcode and one to look up the handler address in a table. If we lay out all bytecode handlers in memory so that each one takes exactly the same amount of memory, we can calculate the handler address directly from the opcode index.
• An added benefit is cache-line warm-up for frequently used bytecodes.
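Java has no computed goto, so the sketch below only models the address arithmetic, with handler addresses represented as plain integers. HANDLER_BASE and HANDLER_SIZE are assumed values chosen for illustration, and DispatchDemo is a hypothetical class name.

```java
// Models two ways of locating an opcode handler; addresses are plain ints here.
final class DispatchDemo {
    static final int HANDLER_BASE = 0x1000;  // hypothetical start of the handler region
    static final int HANDLER_SIZE = 64;      // assumed fixed size per handler, in bytes

    /** Table dispatch: needs an extra memory read to fetch the handler address. */
    static int viaTable(int[] handlerTable, int opcode) {
        return handlerTable[opcode];
    }

    /** Fixed-size layout: the handler address is pure arithmetic on the opcode. */
    static int viaArithmetic(int opcode) {
        return HANDLER_BASE + opcode * HANDLER_SIZE;
    }

    /** Builds the table a table-based interpreter would need for the same layout. */
    static int[] buildTable(int opcodeCount) {
        int[] table = new int[opcodeCount];
        for (int op = 0; op < opcodeCount; op++) {
            table[op] = viaArithmetic(op);
        }
        return table;
    }
}
```

When HANDLER_SIZE is a power of two, the multiplication becomes a shift, which is what makes the assembly version cheap. The cost is padding: every handler occupies the full slot even if its code is shorter.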
43. Class file example
import java.io.Serializable;
public class Foo implements Serializable {    // structure: declarations and constants
    public void bar() {
        int i = 31;                            // code: statements and expressions
        if (i > 0) {
            int j = 42;
        }
    }
}
Compile the Java source, emitting debug symbol information:
javac -g Foo.java
Disassemble the class file:
javap -c -s -l -verbose Foo
44. Class file example
public Foo();
  Signature: ()V
Method metadata:
  LineNumberTable:
    line 2: 0
  LocalVariableTable:
    Start Length Slot Name Signature
    0     5      0    this LFoo;
Bytecode:
  Code:
    Stack=1, Locals=1, Args_size=1
    0: aload_0
    1: invokespecial #1; //Method java/lang/Object."<init>":()V
    4: return
45. Class file example
public void bar();
  Signature: ()V
Method metadata:
  LineNumberTable:
    line 4: 0
    line 5: 3
    line 6: 7
    line 8: 10
  LocalVariableTable:
    Start Length Slot Name Signature
    10    0      2    j    I
    0     11     0    this LFoo;
    3     8      1    i    I
  StackMapTable: number_of_entries = 1
    frame_type = 252 /* append */
    offset_delta = 10
    locals = [ int ]
Bytecode:
  Code:
    Stack=1, Locals=3, Args_size=1
    0: bipush 31
    2: istore_1
    3: iload_1
    4: ifle 10
    7: bipush 42
    9: istore_2
    10: return
Note: starting with Java 6, methods with branching control flow carry a StackMapTable, which records the type state of the operand stack at the start of each basic block.
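46. Differences between stack-based and register-based architectures
public class Demo {
    public static void foo() {
        int a = 1;
        int b = 2;
        int c = (a + b) * 5;
    }
}
[Figures: the conceptual Java virtual machine and the conceptual Dalvik virtual machine]
Source: Java Program in Action: Compiling, Loading, and Executing Java Programs, 莫枢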
48. Dalvik VM
• Dalvik architecture is register based
• Optimized to use less space
• Executes its own Dalvik bytecode rather than Java bytecode
• Class library taken from Apache Harmony
– A compatible, independent implementation of the Java SE 5 JDK under the Apache License v2
– A community-developed modular runtime (VM and class library) architecture (now retired)
49. Reasons to Choose Dalvik
• Dalvik (register based) executes on average 47% fewer VM instructions than the JVM (stack based).
• Register code is 25% larger than the corresponding stack code.
• The increased cost of fetching more VM code due to the larger code size amounts to only 1.07% extra real machine loads per VM instruction, which is negligible.
• Some marketing reasons too
– Oracle lawsuit against Google
50. Dalvik and the ARM CPU
[Figure: hello.java (source) is compiled by javac into hello.class, then by dx into hello.dex. The DEX file is loaded into memory (DDR), which holds the "Text", "Data", "BSS", and "Stack" sections. On the Dalvik side, the interpreter (ARM code) fetches, decodes, and executes bytecode; on the ARM CPU side, the pipeline runs PC fetch, decode, execute, memory access, and register write-back.]
51. Constant Pool:
– References to other classes
– Method names
– Numerical constants
Class Definition:
– Access flags
– Class names
Data:
– Method code
– Info related to methods
– Variables
52. Dalvik Architecture
• Register architecture
• 2^16 available registers
• Instruction set has 218 opcodes
– JVM: 200 opcodes
• 30% fewer instructions, but 35% larger code size
(bytes) compared to JVM
53. Constant Pool
• Dalvik
– Single pool
– dx eliminates some constants by inlining their
values directly into the bytecode
• JVM
– Multiple pools (one per class file)
54. Primitive Types
• Ambiguous primitive types
– Dalvik: int/float and long/double use the same opcodes; the bytecode does not distinguish int from float, long from double, or 0 from null
– JVM: different, because JVM bytecode is typed
• Null references
– Dalvik: does not specify a null type; uses the zero value
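One way to picture the ambiguity: a register is just 32 bits, and only the consuming instruction decides whether they mean an int, a float, or a reference. A small Java sketch of that idea follows; UntypedRegisterDemo and its method names are hypothetical and do not correspond to Dalvik internals.

```java
// A Dalvik-style register modeled as 32 untyped bits; the consumer picks the meaning.
final class UntypedRegisterDemo {
    /** Loading a float constant stores its raw bit pattern, exactly like an int constant. */
    static int loadFloatConstant(float value) {
        return Float.floatToRawIntBits(value);
    }

    /** A float-consuming instruction reinterprets the same bits as a float. */
    static float readAsFloat(int register) {
        return Float.intBitsToFloat(register);
    }

    /** The integer 0 and the null reference share one representation. */
    static boolean isZeroOrNull(int register) {
        return register == 0;
    }
}
```

This is why a tool that reconstructs types from Dalvik bytecode has to infer them from how each register is used later.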
55. Object Reference
• Comparison of object references
• Dalvik
– Comparison between two integers
– Comparison of integer and zero
• JVM
– if_acmpeq / if_acmpne
– ifnull / ifnonnull
56. Dalvik
• Storage of primitive types in arrays
• Dalvik
– Ambiguous opcodes
– aget for int/float, aget-wide for long/double
57. Dalvik
• Dalvik uses annotations to store:
– signatures
– inner classes
– interfaces
– throws declarations
• Dalvik is more compact in instruction count: on average 30% fewer instructions than the JVM.
61. Shared constant pool
• Zapper.java
public interface Zapper {
public String zap(String s, Object o);
}
public class Blort implements Zapper {
public String zap(String s, Object o) { … }
}
public class ZapUser {
public void useZap(Zapper z) { z.zap(...); }
}
70. Efficient Interpreter in Android
• There are 3 forms of Dalvik
– dexopt: optimized DEX
– Zygote
– libdvm + JIT
71. Efficient Interpreter: Optimized DEX
• Apply platform-specific optimizations:
– specific bytecode
– vtables for methods
– offsets for attributes
– method inlining
• Common operations like String.length have their own special instruction, execute-inline; the VM has special code just for those common operations
• Example: calling Object's constructor is optimized to nothing because the method is empty
73. • Virtual (non-private, non-constructor, non-static methods)
invoke-virtual <symbolic method name> → invoke-virtual-quick <vtable index>
Before:
invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
After:
+invoke-virtual-quick {v0, v1}, [002c] // vtable #002c
• Can change invoke-virtual to invoke-virtual-quick
– because we know the layout of the v-table
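The effect of the rewrite can be sketched in Java: the symbolic form searches for the method by name on every call, while the quick form indexes straight into the vtable. Everything here (VtableDemo, ToyClass, the signature strings) is a toy model for illustration, not Dalvik's data structures.

```java
// Toy model of symbolic vs. vtable-index method resolution.
final class VtableDemo {
    static final class ToyClass {
        final String[] signatures; // slot -> method signature (same order in subclasses)
        final String[] targets;    // slot -> implementation chosen for this class
        ToyClass(String[] signatures, String[] targets) {
            this.signatures = signatures;
            this.targets = targets;
        }
    }

    /** Symbolic resolution: compare names until the slot is found. */
    static int resolveSymbolic(ToyClass cls, String signature) {
        for (int slot = 0; slot < cls.signatures.length; slot++) {
            if (cls.signatures[slot].equals(signature)) {
                return slot;
            }
        }
        throw new IllegalArgumentException("no such method: " + signature);
    }

    /** invoke-virtual: resolve by name, then call through the receiver's vtable. */
    static String invokeVirtual(ToyClass receiver, String signature) {
        return receiver.targets[resolveSymbolic(receiver, signature)];
    }

    /** invoke-virtual-quick: the slot was fixed ahead of time, so only an index remains. */
    static String invokeVirtualQuick(ToyClass receiver, int vtableIndex) {
        return receiver.targets[vtableIndex];
    }
}
```

The quick form still dispatches on the receiver's class, so overriding keeps working. It is only valid while the vtable layout stays the same, which is why the optimized file is tied to the VM that produced it.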
74. DEX Optimizations
• Before being executed by Dalvik, DEX files are optimized.
– Normally it happens before the first execution of code from the DEX file
– Combined with the bytecode verification
– In case of DEX files from APKs, when the application is launched for
the first time.
• Process
– The dexopt process (which is actually a backdoor of Dalvik) loads the
DEX, replaces certain instructions with their optimized counterparts
– Then writes the resulting optimized DEX (ODEX) file into the
/data/dalvik-cache directory
– It is assumed that the optimized DEX file will be executed on the same
VM that optimized it. ODEX files are NOT portable across VMs.
76. Meaning of DEX Optimizations
• Sets byte ordering and structure alignment
• Aligns the member variables to 32-bit / 64-bit boundaries (the structures in the DEX/ODEX file itself are 32-bit aligned)
• Significant optimizations because of the elimination of symbolic field/method lookup at runtime.
• Aids the Just-In-Time compiler
77. Efficient Interpreter: Zygote
• Zygote is a VM process that starts at system boot time.
• The boot loader loads the kernel, which starts the init process.
• init starts the Zygote process.
• Zygote initializes a Dalvik VM, which preloads and pre-initializes the core library classes.
• The system keeps Zygote idle, waiting for socket requests.
• When a request to run an application arrives, Zygote forks itself and creates a new process with the Dalvik VM already loaded.
79. Efficient Interpreter:
Just-In-Time Compilation
• Just-in-time compilation (JIT), also known as
dynamic translation, is a technique for improving
the runtime performance of a computer program.
• A hybrid approach, with translation occurring
continuously, as with interpreters, but with caching of
translated code to minimize performance
degradation
80. JIT Types
• When to compile
– install time, launch time, method invoke time, instruction
fetch time
• What to compile
– whole program, shared library, page, method, trace, single
instruction
• Android needs a combination that meets the needs of a mobile device
– Minimal additional memory usage
– Coexist with Dalvik’s container-based security model
– Quick delivery of performance boost
– Smooth transition between interpretation & compiled code
81. Android system_server example
Source: Google I/O 2010 - A JIT Compiler for Android's Dalvik VM
• Compiled code takes up memory; we want the benefits of a JIT with a small memory footprint
• A small amount of compilation provides a big benefit
• In the test program, with 4.5 MB of bytecode, 8% of the methods (390 KB) were hot, and only 25% of the code in those methods was hot, so about 2% in the end
• "90% of the time in 10% of the code" may be generous
82. Trace JIT
• Trace : String of Instructions
• Minimizing memory usage critical for mobile devices
• Important to deliver performance boost quickly
– User might give up on new app if we wait too long to JIT
• Leave open the possibility of supplementing with method
based JIT
– The two styles can co-exist
– A mobile device looks more like a server when it's
plugged in
– Best of both worlds
• Trace JIT when running on battery
• Method JIT in background while charging
84. Dalvik JIT Overview
• Tight integration with interpreter
– Useful to think of the JIT as an extension of the
interpreter
• Interpreter profiles and triggers trace selection mode
when a potential trace head goes hot
• Trace request is built during interpretation
• Trace requests handed off to compiler thread, which
compiles and optimizes into native code
• Compiled traces chained together in translation
cache
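The profiling step can be pictured as a counter per potential trace head: once a counter reaches a threshold, the head is considered hot and a request is queued for the compiler thread. The Java sketch below shows only that counting logic; TraceProfiler, its threshold parameter, and the queue are hypothetical and do not reflect Dalvik's actual data structures or threshold values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy profiler: counts visits to potential trace heads and queues hot ones for compilation.
final class TraceProfiler {
    private final int hotThreshold;  // assumed tunable; not Dalvik's real value
    private final Map<Integer, Integer> counters = new HashMap<>();
    private final List<Integer> compileQueue = new ArrayList<>();

    TraceProfiler(int hotThreshold) {
        this.hotThreshold = hotThreshold;
    }

    /**
     * Called by the interpreter whenever it reaches a potential trace head.
     * Returns true exactly once per head: when it becomes hot and is queued.
     */
    boolean onTraceHead(int bytecodeOffset) {
        int count = counters.merge(bytecodeOffset, 1, Integer::sum);
        if (count == hotThreshold) {
            compileQueue.add(bytecodeOffset);  // stand-in for the hand-off to the compiler thread
            return true;
        }
        return false;
    }

    /** Trace requests waiting for the compiler. */
    List<Integer> pendingRequests() {
        return compileQueue;
    }
}
```

Cold code never reaches the threshold and stays interpreted, which keeps the memory cost of compiled code proportional to the small hot fraction of the program.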
85. Dalvik JIT Features
• Per-process translation caches (sharing only within
security sandboxes)
• Simple traces - generally 1 to 2 basic blocks long
• Local optimizations
– Register promotion
– Load/store elimination
– Redundant null-check elimination
• Loop optimizations
– Simple loop detection
– Invariant code motion
– Induction variable optimization
99. Dexmaker: bytecode generator
http://code.google.com/p/dexmaker/
• A Java-language API for doing compile time or
runtime code generation targeting the Dalvik VM.
Unlike cglib or ASM, this library creates Dalvik .dex
files instead of Java .class files.
• It has a small, close-to-the-metal API. This API
mirrors the Dalvik bytecode specification giving you
tight control over the bytecode emitted.
• Code is generated instruction-by-instruction; you
bring your own abstract syntax tree if you need one.
And since it uses Dalvik's dx tool as a backend, you
get efficient register allocation and regular/wide
instruction selection for free.
100. Reference
• Dalvik VM Internals, Dan Bornstein (2008)
http://sites.google.com/site/io/dalvik-vm-internals
• Analysis of Dalvik Virtual Machine and Class Path Library,
Institute of Management Sciences, Peshawar, Pakistan (2009)
http://serg.imsciences.edu.pk
• Reconstructing Dalvik applications, Marc Schonefeld (2009)
• A Study of Android Application Security, William Enck,
Damien Octeau, Patrick McDaniel, and Swarat Chaudhuri (2011)
• dalvik の GC をのぞいてみた (A Peek into Dalvik's GC), @akachochin (2011)
• 《Android 惡意代碼分析教程》 (Android Malicious Code Analysis Tutorial), Claud Xiao (2012)
http://code.google.com/p/amatutor/