Virtual Machines For Abstraction: The Dalvik VM

Droid time

Droid time

With the rise of heterogeneous systems, there is a requirement for a scalable software system that is cost-effective and maintenance-free. Virtual machines (VMs) provide abstraction from the heterogeneity, and present a low-cost, scalable software development environment, as in the case of VM-based modern programming languages like Java. In this article, we will cover the fundamental concepts of a VM, using one of the critical components of Google’s Android OS — the Dalvik VM.

A virtual machine (or VM) is an isolated, self-contained implementation that abstracts hardware resources. Typically, it simulates an actual physical machine capable of executing instruction sets. Broadly, VMs are classified based on how they work and their  functionality. There are process VMs, which support the execution of a single process/program (e.g., Java, .NET), and system VMs, which support execution of a complete operating system (e.g., VirtualBox, VMWare).

How does a VM work?

In general, a VM is an interpreter for an instruction-set. The fundamental task of a compute machine is to carry out an operation. An operation is performed with the execution of a set of instructions. A basic instruction goes through the following cycle of events:

  • Instruction fetch, where an instruction is fetched from memory.
  • Decoding, which determines the type of the instruction — the opcode, or the operation it is supposed to perform. Additionally, decoding may also involve fetching operand(s) from memory.
  • Execute. The decoded instruction is then executed; the operation is performed on the operands.
  • Storing the result in the mentioned register.

Which VM implementations to choose

Mainly, VMs are implemented in one of two ways — a register-based VM, or a stack-based VM. The register-based VM computation model assumes registers as the memory, and operands are accessed with explicit addresses coded in an instruction. In the stack-based VM model, memory takes the form of a Last-In-First-Out (LIFO) stack. Every operation boils down to a stack PUSH/POP or STORE/LOAD. These instructions do not need explicit operand addresses, because the operand is always available with the stack pointer (top-of-stack). Examples are Microsoft .NET and the JVM.

Traditionally, when it came to choosing the VM architecture, people preferred stack-based machines, rather than the register-based architecture prevalent in most real processors. The main reason for this choice was the simplicity of the compiler, which emanates from the fact that stack instructions are small, and have implicit addressing. The basic criteria for choosing a particular VM implementation is the code size of the machine implementation (the smaller the better); and machine efficiency — which is measured in terms of the number of instructions needed to carry out an operation.

Implementation-wise, instructions for a stack-based VM are shorter (since it’ll have only POP/PUSH instructions) and are hence faster to fetch. A register-based VM has long instructions, and usually a quadruple of {opcode, result, operand1, operand2}. However, the advantage of the stack-based VM diminishes in terms of the number of instructions needed to perform an operation. A given task can often be expressed in fewer register-based VM instructions than what is required for a stack VM. For example, an addition operation “a = b + c” would need the following instructions for a stack VM:


The equivalent operation in a register VM can be expressed in a single instruction:

I1 "add, a, b, c"

The fewer the instructions, the smaller the number of instruction dispatches needed. (Instruction dispatch involves fetching/reading the instruction from memory, and jumping to the corresponding segment of VM code that implements the VM instruction.) Instruction dispatch is an expensive and often unpredictable operation, because in implementation it is essentially a “switch-case”, and hence a hard-to-predict indirect branch, unlike an “if-else” condition.

There is a cost involved in the operand access too. In stack VMs, operands are accessed relative to the stack pointer, and hence explicit addressing for operands is not required. On the contrary, register VMs need to provide the location of all operands, by specifying the registers that carry those operands.

Having all operands in the instruction has its benefits; the execution is faster compared to the stack VM, which needs a small calculation to find out the operand, while register VMs just read the registers.
Also, a stack VM might consume more memory cycles, since it uses a set of registers to simulate a stack. Besides, temporary values often spill out into the main memory, adding more memory cache cycles.

In register VMs, temporary values usually remain in registers. Stack VMs are unable to use a few optimisations too. For example, in the case of common sub-expressions, which are recalculated each time they appear in the code, a register VM can calculate an expression once, and keep that in a register for all future references.

These reasons made a register VM model the preferred choice of implementation for the Dalvik VM, where quick execution is the foremost requirement.

The Android OS stack

Android is a complete software stack that comprises the Linux kernel, middleware and a few useful applications. It is developed, promoted, and maintained by Google. Its software stack is shown in Figure 1.

Android OS stack
Figure 1: Android OS stack

The top layer comprises portable user applications, programmed in Java. These rely on the next layer, which exports a rich set of Java APIs, and provides a set of services and a system to facilitate applications, e.g., the content manager, resource manager, view services, etc.

The next layer comprises C/C++ system libraries and the Android runtime. The common set of system libraries includes:

  • System C library (libc)
  • Media library
  • SQLite (database engine)
  • SGL (to support 2-D graphics)

The Android runtime is a combination of the Dalvik VM and Java core libraries.

The Linux kernel is the last layer in the software stack, and provides an abstraction of the underlying hardware, also taking care of networking, process and memory management.

Dalvik VM

Dalvik VM is Google’s implementation of the Java VM, targeted to run on mobile devices. Dalvik is a process VM; each Android application runs in its own Linux process. Every process on Android has its own instance of the Dalvik VM, which is created along with the process, and destroyed during process termination.

“init” is the parent process for all Android processes. During startup (see Figure 2), in addition to a handful of daemons, “init” forks a process named zygote, which loads and initialises an instance of the Dalvik VM and core services. zygote then creates a read socket, and runs a select loop on it. When we run an application, a message is sent to zygote, which forks, and contacts the Android Activity manager service to run the application’s Java class. Thus, all processes share a copy of DVM, created by the Zygote process, and avoid keeping duplicate pages of DVM.

The Android runtime
Figure 2: The Android runtime

Actually, pages for DVM are COW (copy-on-write) among processes; as soon as a process writes to a shared, read-only page, that page becomes local to the process.

Why a process VM?

This design has many benefits compared to a single system VM instance that serves all processes in the system. If the VM crashes, the whole system would crash, killing all processes. In Android, this problem has been narrowed down to the granularity of a single process.

VM internal details

The Dalvik VM accepts Dalvik byte-code (*.dex files) as input, instead of the Java byte-code expected by a JVM. The *.dex files are created by the dxtool, which is part of the Android system (see Figure 3). This tool takes Java class files created by the Java compiler as the input.

The compilation flow of a Java application on Android
Figure 3: The compilation flow of a Java application on Android

How is a Dalvik-like register VM implemented?

To answer that, let’s implement a register VM in C — myvm.c:

 * myvm.c
#include "stdio.h"

#define NUM_REGS 4
#define TRUE    1
#define FALSE   0
#define INVALID -1
enum opCodes {
    HALT  = 0x0,
    LOAD  = 0x1,
    ADD   = 0x2,

 * Register set of the VM
int regs[NUM_REGS];

 * VM specific data for an instruction
struct VMData_ {
  int reg0;
  int reg1;
  int reg2;
  int reg3;
  int op;
  int scal;
typedef struct VMData_ VMData;

int  fetch();
void decode(int instruction, VMData *data);
void execute(VMData *data);
void run();

 * Addressing Modes:
 * - Registers used as r0, r1,..rn.
 * - Scalar/ Constant (immediate) values represented as #123
 * - Memory addresses begin with @4556

 * Instruction set:
 * - Load an immediate number (a constant) into a register
 * - Perform an arithmetic sum of two registers (in effect,
 *   adding two numbers)
 * - Halt the machine
 * LOAD reg0 #100
 * LOAD reg1 #200
 * ADD reg2 reg1 reg0  // 'reg2' is destination register

 * Instruction codes:
 * Since we have very small number of instructions, we can have
 * instructions that have following structure:
 * - 16-bit instructions
 * Operands get 8-bits, so range of number supported by our VM
 * will be 0-255.
 * The operands gets place from LSB bit position
 * |7|6|5|4|3|2|1|0|
 * Register number can we encoded in 4-bits 
 * |11|10|9|8|
 * Rest 4-bits will be used by opcode encoding.
 * |15|14|13|12|
 * So an "LOAD reg0 #20" instruction would assume following encoding:
 * <0001> <0000> <00010100>
 * or 0x1014 is the hex representation of given instruction.

  * Sample program with an instruction set

unsigned int code[] = {0x1014,

 * Instruction cycle: Fetch, Decode, Execute

 * Current state of machine: It's a binary true/false

int running = TRUE;

 * Fetch
int fetch()
   * Program Counter
  static int pc = 0;

  if (pc == NUM_REGS)
    return INVALID;

  return code[pc++];

void decode(int instr, VMData *t)
  t->op   = (instr & 0xF000) >> 12;
  t->reg1 = (instr & 0x0F00) >> 8;
  t->reg2 = (instr & 0x00F0) >> 4;
  t->reg3 = (instr & 0x000F);
  t->scal = (instr & 0x00FF);

void execute(VMData *t)
  switch(t->op) {
    case 1:
      /* LOAD */
      printf("\nLOAD REG%d %d\n", t->reg1, t->scal);
      regs[t->reg1] = t->scal;

    case 2:
      /* ADD */
      printf("\nADD %d %d\n", regs[t->reg2], regs[t->reg3]);
      regs[t->reg1] = regs[t->reg2] + regs[t->reg3];
      printf("\nResult: %d\n", regs[t->reg1]);

    case 0:
      /* Halt the machine */
      running = FALSE;

void run()
  int instr;
  VMData t;

    instr = fetch();

    if (INVALID == instr)

    decode(instr, &t);

int main()
  return 0;

The primary components of a register VM are:

  1. A register set: Simulated with an array of integers acting as read/write memory cells.
  2. An instruction set: What operations, and how many operands, should we support?
    • Our primitive VM supports addition, load and halt operations.
    • Addressing mode: Operands are part of the instructions, so an instruction decode would unveil the associated operands.
  3. Execution unit: The engine works similar to a CPU:
    • It fetches the next instruction from the program.
    • It decodes the instruction to figure out the type of operation and operands.
    • It executes the decoded instruction.


  1. Hi,

    As per my understanding this is wrong. “During startup (see Figure 2), in addition to a handful of daemons, “init” forks a process named zygote, which loads and initialises an instance of the Dalvik VM and core services.”

    Runtime.cpp first initializes DVM and Zygote. It is not Zygote which loads and initializes DVM.

    Later, when an application is launched, what is mentioned is correct. Zygote forks itself and creates an instance of DVM for each process.

  2. in “why process vm”, just one reason is mentioned. the other reason is for additional security within each processes. This is mentioned in android developer site.

    However, the article is a great read and thanks for posting.

  3. Dalvik is definitey not Google’s implementation of the JVM. A JVM is a virtual machine capable of interpreting Java bytecode. Dalvik can not interpret Java bytecode, it uses a different (more compact, for one) bytecode format. In principle, you can build tools to create .dex files from various sources, not just Java .class files, or aggregates thereof. .class files as input for dx were a choice of convenience, not one resulting from technical constraints.

    You are clearly aware of these things. Keep saying that Dalvik is a JVM, and then wonder about why people think Google stole Sun’s/Oracle’s intellectual property.


Please enter your comment!
Please enter your name here