Archive for the 'Assembly' Category

Common Intermediate Language

Common Intermediate Language, also known as CIL, IL, or MSIL (Microsoft Intermediate Language, the older name still used by legacy coders), should not be confused with C Intermediate Language, which is also abbreviated CIL. CIL is a stack-based, object-oriented assembly language that serves as the intermediate language for virtual machines such as the CLR (Common Language Runtime) and the Mono runtime. Its fairly simple instruction set can easily be translated into native machine code. As part of the Common Language Infrastructure (CLI, a standard developed by Microsoft and published as ECMA-335), it is one of the central pieces of the .NET Framework. Compilers for CLI languages translate their source code into CIL, either entirely or partially; C++/CLI, for example, can mix CIL with native code. Because all of these languages compile to the same intermediate language, programs written in different CLI languages can interoperate through the virtual machine. CIL is platform-independent and can be executed on different machine architectures without being modified or recompiled; all that is needed is an implementation of the virtual machine for the target processor and operating system. In many respects, CIL is similar to Java bytecode, which is executed by the Java Virtual Machine (JVM).

CIL is stack-based, unlike many native machine languages, which are register-based. Every time you want to operate on a value, you push it onto the evaluation stack and then execute an opcode (an instruction) that operates on the values on top of the stack. For example, add pops the two values on top of the stack (here Int32, i.e. 32-bit integers) and pushes their sum. When a method returns, its evaluation stack must be empty, except for the return value if the method has one. See the example.

IL:

ldc.i4.3 //Pushes the Int32 constant 3 onto the stack
ldc.i4.2 //Pushes the Int32 constant 2 onto the stack
add

Stack before add (top first):

2 //Int32
3 //Int32

After the last operation (add) this will remain on the stack:

5 //Int32
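To make the push/pop behaviour concrete, here is a minimal Python model of an evaluation stack. It is only a sketch under simplifying assumptions: the function name `run` is made up, it understands just the `ldc.i4` forms and `add`, it ignores 32-bit overflow, and it uses a Python list as the stack. The real CLR does not interpret IL this way; it JIT-compiles it to native code.

```python
# Constants encoded in the opcode name itself (ldc.i4.m1 and ldc.i4.0 .. ldc.i4.8).
SHORT_FORMS = {"ldc.i4.m1": -1, **{f"ldc.i4.{n}": n for n in range(9)}}

def run(instructions):
    """Run a tiny, illustrative subset of CIL using a Python list as the stack.

    Supports the ldc.i4 forms and add only. Returns the final stack, bottom first.
    32-bit overflow is ignored to keep the sketch short.
    """
    stack = []
    for line in instructions:
        opcode, _, operand = line.partition(" ")
        if opcode in SHORT_FORMS:
            stack.append(SHORT_FORMS[opcode])
        elif opcode in ("ldc.i4.s", "ldc.i4"):
            stack.append(int(operand))      # constant given as an operand
        elif opcode == "add":
            value2 = stack.pop()            # top of the stack
            value1 = stack.pop()
            stack.append(value1 + value2)
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
        print(f"{line:<10} -> stack (bottom to top): {stack}")
    return stack

final = run(["ldc.i4.3", "ldc.i4.2", "add"])  # final == [5]
```

The trace printed for the example above shows `[3]`, then `[3, 2]`, then `[5]`, matching the stack pictures in the text (printed bottom to top rather than top first).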

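If only one value were on the stack when add executed, the stack would underflow, because add needs two operands.
The opposite error, stack overflow, occurs when a method pushes more values than the maximum stack depth it declares. It is also an error to leave values on the stack when a method returns (apart from the return value). The runtime rejects methods with any of these errors, typically with an InvalidProgramException when it tries to compile them, and such code usually comes from a buggy compiler.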
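A compiler or verifier can catch underflow and leftover values without running the code, just by tracking stack depth. The Python sketch below assumes a small, hypothetical table of stack effects (values popped, values pushed) for a few opcodes and handles only branch-free code; the real rules in ECMA-335 cover every instruction and must follow branches as well. The names `STACK_EFFECT` and `check_balance` are made up for this illustration.

```python
# (values popped, values pushed) for a few opcodes: an illustrative subset only.
STACK_EFFECT = {
    "ldc.i4.2": (0, 1),
    "ldc.i4.3": (0, 1),
    "ldarg.0": (0, 1),
    "add": (2, 1),
    "pop": (1, 0),
}

def check_balance(body, returns_value):
    """Check a branch-free method body by tracking stack depth only.

    Raises ValueError on underflow, on a missing ret, or if ret finds anything
    other than an empty stack (void) or exactly one value (non-void).
    """
    depth = 0
    for index, opcode in enumerate(body):
        if opcode == "ret":
            expected = 1 if returns_value else 0
            if depth != expected:
                raise ValueError(f"ret at {index}: depth {depth}, expected {expected}")
            return
        pops, pushes = STACK_EFFECT[opcode]
        if depth < pops:
            raise ValueError(f"stack underflow at {index} ({opcode})")
        depth += pushes - pops
    raise ValueError("method body does not end with ret")

check_balance(["ldc.i4.3", "ldc.i4.2", "add", "ret"], returns_value=True)  # balanced
try:
    check_balance(["ldc.i4.3", "add", "ret"], returns_value=True)
except ValueError as err:
    print(err)  # stack underflow at 1 (add)
```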

Some common opcodes are:

ldstr "Kazoom" //Pushes a reference to the string "Kazoom"
ldarg.0 //Pushes the method argument at index 0
ldc.i4.s 29 //Pushes the Int32 constant 29 (short form for values from -128 to 127)
ldloc.s lcal //Pushes the value of the local named lcal (ldloca.s lcal would push its address)
call //Calls a method: pops its arguments and pushes its return value, if any
newobj //Allocates a new object, calls its constructor and pushes the object reference

//Arithmetic operations: each pops two values from the top of the stack and pushes the result.
add
sub
mul
div
rem //Remainder; CIL has no "mod" opcode
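Two details of these binary opcodes are easy to get wrong: the operand order (the value on top of the stack is the second operand, so `sub` computes "second from top minus top") and the integer semantics (32-bit wrap-around for `add`, `sub` and `mul`, and truncation toward zero for `div` and `rem`, CIL's remainder opcode). The Python sketch below models those rules for Int32 operands. It is an illustration, not runtime code: the helper names are made up, and it maps division by zero and Int32.MinValue / -1 to Python exceptions, where the CLR throws DivideByZeroException and an ArithmeticException respectively.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrap_int32(n):
    """Reduce an integer to the signed 32-bit range, as add/sub/mul do (no .ovf check)."""
    return (n + 2**31) % 2**32 - 2**31

def trunc_div(a, b):
    """Integer division truncating toward zero (Python's // floors instead)."""
    if b == 0:
        raise ZeroDivisionError("div/rem by zero")
    if a == INT32_MIN and b == -1:
        raise OverflowError("result not representable as Int32")
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

BINARY_OPS = {
    "add": lambda a, b: wrap_int32(a + b),
    "sub": lambda a, b: wrap_int32(a - b),
    "mul": lambda a, b: wrap_int32(a * b),
    "div": trunc_div,
    "rem": lambda a, b: a - trunc_div(a, b) * b,  # sign follows the dividend
}

def binary_op(stack, opcode):
    """Pop value2 (top) and value1, then push value1 <op> value2."""
    value2 = stack.pop()
    value1 = stack.pop()
    stack.append(BINARY_OPS[opcode](value1, value2))

stack = [10, 3]          # 3 is on top
binary_op(stack, "sub")
print(stack)             # [7]
```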

This is the simplest part of the CIL language. Because the CLI is object-oriented, IL supports object-oriented programming directly, and many constructs found in higher-level languages are represented directly in IL. For instance, a class declaration looks like this:

.class public MyClass
{

}

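IL has no static keyword for classes. A class that should never be instantiated, such as a C# static class, is declared abstract (so it cannot be instantiated) and sealed (so it cannot be inherited from):

.class public abstract sealed MyStaticClass
{

}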

Classes contain methods, defined like this:

.method public static int32 Add(int32 a, int32 b) cil managed
{
      //Load both arguments on the stack and add them, then return the value.
      ldarg.0
      ldarg.1
      add
      ret
}

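Methods can also be marked static. A static method has no implicit this argument, so ldarg.0 is its first declared parameter. In an instance method, ldarg.0 is the object reference this, and the declared parameters start at index 1.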
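Argument numbering is a common source of bugs in hand-written IL. The Python sketch below models which argument each `ldarg.<n>` refers to, depending on whether the method is static; the function names are hypothetical and exist only for this illustration. For the Add method above, ldarg.0 and ldarg.1 are the two parameters only when the method is static.

```python
def argument_slots(params, is_static):
    """Return what ldarg.0, ldarg.1, ... refer to, in order."""
    # Instance methods receive the object reference as a hidden first argument.
    return list(params) if is_static else ["this", *params]

def ldarg(index, params, is_static):
    """Name of the argument that ldarg.<index> pushes."""
    return argument_slots(params, is_static)[index]

# The Add(int32 a, int32 b) example:
print(ldarg(0, ["a", "b"], is_static=True))   # a
print(ldarg(0, ["a", "b"], is_static=False))  # this
print(ldarg(1, ["a", "b"], is_static=False))  # a
```

In a non-static Add, the same body (ldarg.0, ldarg.1, add) would therefore try to add this and a, and b would only be reachable as ldarg.2.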

Like the class syntax, this resembles C-style syntax; the difference is that the body contains IL opcodes instead of ordinary statements. Note also that a method may not simply run off the end of its body: every path must end with ret (or another instruction that transfers control, such as throw), even in a method that returns void. In a void method, ret returns to the caller without leaving a value on the stack.

To make a method the entry point of the program, add the .entrypoint directive inside the method body. An assembly can have only one entry point, and the entry-point method must be static.

Local variables (usually referred to as locals) are declared with the .locals directive inside the method body. The init keyword asks the runtime to zero-initialize them.

.locals init (
      [0] string str1,
      [1] int32 int1)

Locals generated by compilers, or shown by a disassembler without debug symbols, often have generated names made of letters and a number, such as V_0.
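A rough way to picture `.locals init` is a numbered array of slots filled with default values before the first instruction runs: `stloc.<n>` pops the top of the stack into slot n, and `ldloc.<n>` pushes a copy of the slot's value. The Python sketch below models that for the two locals declared above. The defaults (0 for int32, None standing in for a null string reference) and the helper names are assumptions made for this illustration.

```python
DEFAULTS = {"int32": 0, "string": None}  # None models a null object reference

def init_locals(declared_types):
    """Zero-initialize every local slot, as the init flag on .locals requests."""
    return [DEFAULTS[t] for t in declared_types]

def stloc(stack, locals_, index):
    locals_[index] = stack.pop()    # pop the top of the stack into the slot

def ldloc(stack, locals_, index):
    stack.append(locals_[index])    # push the slot's current value

locals_ = init_locals(["string", "int32"])  # [0] string str1, [1] int32 int1
stack = [42]
stloc(stack, locals_, 1)   # stloc.1
ldloc(stack, locals_, 1)   # ldloc.1
ldloc(stack, locals_, 1)   # ldloc.1 again
print(locals_, stack)      # [None, 42] [42, 42]
```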

There is also a .maxstack directive, which declares the maximum number of items the method can have on the evaluation stack at any one time. Knowing this bound in advance means the runtime does not have to work it out itself when it verifies and JIT-compiles the method. If the directive is omitted, ilasm uses a default of 8.
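For branch-free code, the smallest valid .maxstack value is simply the highest depth the stack reaches. The Python sketch below computes it from a hypothetical (pops, pushes) table; `call_writeline` is a made-up shorthand for the `call void [mscorlib]System.Console::WriteLine(string)` instruction, and real tools must also follow branches and exception handlers, which this sketch ignores.

```python
# (values popped, values pushed) for a few opcodes: an illustrative subset only.
STACK_EFFECT = {
    "ldstr": (0, 1),
    "ldc.i4.2": (0, 1),
    "ldc.i4.3": (0, 1),
    "add": (2, 1),
    "call_writeline": (1, 0),  # hypothetical shorthand for the WriteLine(string) call
    "ret": (0, 0),             # ret never raises the depth, so it cannot change the peak
}

def max_stack_depth(body):
    """Highest evaluation-stack depth reached by a branch-free instruction list."""
    depth = peak = 0
    for opcode in body:
        pops, pushes = STACK_EFFECT[opcode]
        depth += pushes - pops
        peak = max(peak, depth)
    return peak

print(max_stack_depth(["ldc.i4.3", "ldc.i4.2", "add", "ret"]))  # 2
print(max_stack_depth(["ldstr", "call_writeline", "ret"]))      # 1
```

Since .maxstack is only an upper bound, the `.maxstack 8` in the Hello World program at the end of this post is generous: by this count, any value of at least 1 would do.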

There is a lot more to IL, but I will not dig deeper here because it is too much at once. If you are familiar with a high-level language or with another intermediate language, such as Java bytecode (which is also stack-based) or the register-based Parrot Intermediate Language, you should be able to understand IL too.

Finally, here is the Hello World program in IL. You can try to assemble it with ILASM (the IL Assembler), which ships with the .NET Framework and with Mono.

.assembly extern mscorlib {}
.assembly HelloWorld {}

.method public static void Main() cil managed
{
      .entrypoint
      .maxstack 8
      ldstr "Hello world!"
      call void [mscorlib]System.Console::WriteLine(string)
      ret
}

Hello World! (Assembly x86)

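title Hello World Program
dosseg
.model small
.stack 100h                 ; reserve a 256-byte stack
.data
hello_message db 'Hello, World!',0dh,0ah,'$'   ; CR LF, then '$' ends the string for DOS
.code
main  proc
    mov    ax,@data                 ; load the address of the data segment
    mov    ds,ax                    ; DS must point at .data before hello_message is used
    mov    ah,9                     ; DOS function 09h: write a '$'-terminated string
    mov    dx,offset hello_message  ; DS:DX -> the string
    int    21h                      ; call DOS
    mov    ax,4C00h                 ; DOS function 4Ch: terminate with return code 0
    int    21h
main  endp
end   main                          ; main is the program entry point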

