Demystifying Microsoft Intermediate Language. Part 1 - Introduction
by Kamran Qamar | Published 10/17/2002 | .NET Framework
Kamran Qamar

Kamran Qamar is an independent consultant with extensive experience in all facets of the software development life cycle (SDLC). He specializes in web-enabled application development using Microsoft tools and technologies. He has spent the last seven years architecting and developing web-based telemetry and management applications around the world. Recently he delivered a Bridge Management System for the Ukrainian government built on .NET technologies.

Kamran also enjoys teaching and presenting on .NET technologies. He has an Electronic Engineering background and a Master's degree in computer science.

 


In the .NET Framework, the Common Language Infrastructure (CLI) provides the specifications for executable code and for the execution environment (the Virtual Execution System, or VES) in which it runs. Executable code is presented to the VES as modules. A module is a single file containing executable content in the specified format.

The CLI uses the Common Language Specification (CLS) as an agreement between languages and frameworks: a compliant language implements at least those parts of the Common Type System (CTS) that are covered by the CLS. Hence all the languages that target the .NET Framework (C#, VB.NET, Eiffel for .NET and so on) compile to a single standard language known as Microsoft Intermediate Language (MSIL).

Microsoft Intermediate Language (MSIL) is the intermediate stage in the conversion of source code written in any .NET language to machine language. It is a pseudo-assembly language that sits between the source code you write (such as Visual Basic .NET or C#) and Intel-based assembly language or machine code. When you compile a .NET program, the compiler translates your source code into MSIL, a CPU-independent set of instructions that can be efficiently converted to native code. When we execute the code, MSIL is converted to CPU-specific code, usually by a just-in-time (JIT) compiler. Because the common language runtime supplies one or more JIT compilers, the same MSIL can be JIT-compiled and executed on any supported architecture.

It pays to gain mastery over Intermediate Language, because knowledge of IL lets you read and understand code that may originally have been written in any .NET programming language. MSIL (or simply IL) puts an end to the unending war in the programmer community over the superiority of one language over the others. To this end, IL is a great leveler. In the .NET world one part of the code may be written in Eiffel while another may be written in C# or VB.NET, but it all eventually gets converted to IL. This gives programmers the freedom and flexibility to choose the language they are most familiar with, and does away with the need to constantly learn new languages.

In this series of articles, I will strip away the complexity surrounding IL by presenting complex concepts in a simple and comprehensible manner, supplemented with detailed examples. To make the sample programs easier to follow, I will usually present the source code of the program in C# or VB.NET first, and then we will explore the IL produced by the respective compiler. On other occasions I will write the same code entirely in IL. We will compare the two to better understand the limitations of our compilers and learn to write better and faster code.

The aim of this series is to explain the complexity surrounding IL and to make you adept at understanding IL code. I also want to alleviate your fear of lower-level languages. The power of MSIL lies in its simplicity. A word of warning: don't be misled by the apparent simplicity of the examples I use throughout the series. I would like you to try them out yourself and verify the outcome. Dive right in; I assure you that at the end of this journey you will emerge a conqueror.

Simplest Program written in IL

We will start our journey with the classic example of Hello World, but with a little twist. Open your favorite text editor, create a new text file and name it HelloWorld.il and punch in the following lines of code:

.method void HelloWorld()
{
    ret
} 

This is the smallest (and not yet working) program written in IL. It tells us two important things:

  1. In an IL program, a line may or may not begin with a dot ".".
    Anything that begins with a dot "." is a directive to the assembler, asking it to perform some task, such as creating a function or a class. Anything that does not start with a dot "." is an actual IL instruction, in this case ret.
  2. The structure of a method defined in IL.
    A function in IL is created by the assembler directive .method, followed by the return type of the method (in our case void) and then the name of the function, HelloWorld, with a pair of round brackets "()". The start and end of the function body are marked by curly brackets "{}". A well-formed function written in IL always ends with the "end of function" instruction, ret.

We can compile this program using the ILAsm.exe utility provided by Microsoft. Make sure that your path variable is properly set. If you have Visual Studio .NET installed, then use its command prompt utility, which sets the proper paths. If you have only the .NET SDK installed, then set the path using the following command:

set path=%path%;c:\progra~1\microsoft.net\frameworksdk\bin

Let's compile our program using the following command:

C:\>ilasm Helloworld.il

Doing so produces the error shown in figure 1.

This error message is quite informative. It tells us that the source file is of ANSI type (more on that later). It also gives us one warning message:

HelloWorld.il(2) : warning -- Non-static global method 'HelloWorld', made static

And it throws one error:

Error: No entry point declared for executable
Could not create output file, error code=0x80004005

***** FAILURE *****

One IL file can contain numerous functions, and the assembler has no way of knowing which of them is to be executed first. In conventional languages like C# or VB, the function to be executed first must have a specific name; in C#, for example, the Main function is the entry point of the application. This is what the assembler is complaining about. In IL the first function to be executed is called the entry point function. To tell the assembler that HelloWorld is the entry point function, we need a directive, and that directive is none other than .entrypoint. Add it to our code as shown in the listings below. The assembler doesn't require this directive to be the first statement in the function. It simply has to appear somewhere in the function body; start, middle or end doesn't matter, hence both code listings given below have the same effect. However, only one function in an assembly can have this directive. If more than one function has it, the program will not compile.

.method void HelloWorld()
{
    .entrypoint
    ret
}

or

.method void HelloWorld()
{
    ret
    .entrypoint
}

This time the program compiles into HelloWorld.exe without any error. However, when we try to run it we are greeted with the error message shown in figure 2:

The reason for this error message is that our code is not well formed. IL code always compiles to a module, and a module always belongs to an assembly. The concepts of module and assembly are crucial in the .NET world and should be thoroughly understood. In articles to come, we will explore the architecture of a .NET program and the assembly directive. For the moment I will simply add an assembly declaration to our HelloWorld program. The format of the directive is the keyword .assembly followed by a name and a pair of curly brackets:

.assembly <name of assembly> {}

Hence our updated code will look like this:

.assembly DemystifyingILChapter1 {}
.method static void HelloWorld()
{
    .entrypoint
    ret
}

This program will compile and run without any error. Notice that I have added the keyword static to the .method directive to get rid of the warning the assembler issued the first time. A static method belongs to a class rather than to any instance of it. Remember that in C# we define the Main function as public static void Main(). Likewise, any function that carries the .entrypoint directive must be decorated with the static attribute.

We have created our first application in IL, even though it doesn't produce any output. Don't be disheartened: we have understood the basics of an IL program, and we will build on this structure.

Next we need to find a way to call functions defined in other assemblies. With that we will be able to call the famous WriteLine function to output our "HelloWorld" string on the console. Can you guess whether this will be a directive or an instruction?

Yes, you are right: it should be an instruction. Looking in our bag of IL instructions, we find one that suits this purpose, named call. The format of this instruction is:

call <return type> [<assembly>]<namespace>.<class name>::<function name>(<parameter types>)

There is a significant difference between IL and other programming languages when it comes to calling a function. In IL, when we call a function, we have to specify it completely, including the assembly it lives in, its namespace, its return type and the data types of its parameters. This ensures that the assembler can identify the function unambiguously.

New code listing will be:

.assembly DemystifyingILChapter1 {}
.method static void  HelloWorld()
{
    .entrypoint
    call void [mscorlib]System.Console::WriteLine(class System.String)
    ret
}

This program compiles without error; however, it throws an exception when we execute it, as shown in figure 3:

The reason it throws an exception is that the expected parameter value is missing. WriteLine expects a string parameter, and we called it without supplying one. We shall now see how to pass parameters to a function.

All parameters passed to a function are placed on the stack. IL has an instruction named ldstr, which is short for "Load String". This instruction loads a string onto the stack. The stack is a memory area that facilitates passing parameters to a function; we will talk more about the stack in coming chapters. All functions receive their parameters from the stack, so instructions like ldstr are indispensable. The format of this instruction is:

ldstr <parameter string>

Using this instruction, our updated code listing will be:

.assembly DemystifyingILChapter1 {}
.method static void  HelloWorld()
{
    .entrypoint
    ldstr "Hello World."
    call void [mscorlib]System.Console::WriteLine(class System.String)
    ret
}

This program compiles and produces the well-known output "Hello World."
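
To see the stack carry more than one parameter, here is a small sketch that is not part of the original listings. It assumes a hypothetical assembly name StackDemo and a global method named Main, and it calls System.String::Concat, which the article does not use. It also uses string, ILAsm's built-in keyword for System.String, and declares the mscorlib reference explicitly. Arguments are pushed left to right; each call removes its arguments from the stack and, if it returns a value, leaves that value on the stack for the next instruction.

```
// StackDemo.il: a sketch, all names here are illustrative assumptions
.assembly extern mscorlib {}
.assembly StackDemo {}
.method static void Main()
{
    .entrypoint
    .maxstack 2              // at most two values are on the stack at once
    ldstr "Hello, "          // stack: "Hello, "
    ldstr "World."           // stack: "Hello, ", "World."
    // Concat pops both strings and pushes the combined string
    call string [mscorlib]System.String::Concat(string, string)
    // WriteLine pops the combined string and prints "Hello, World."
    call void [mscorlib]System.Console::WriteLine(string)
    ret                      // the stack is empty again, as a void method requires
}
```

Assemble it with ilasm StackDemo.il, the same way as the HelloWorld example.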

You see, IL programming is not difficult at all. You have written your very first IL program, one that greets you. You have also learned about the structure of an IL program and met some directives and instructions. If you have tried this example hands-on, you should have noticed that IL is a case-sensitive language.

In the next section we will enhance this application. No, we will not enhance its functionality; we will decorate it with some attributes. These attributes will make our code look like the output of another great utility, ILDAsm.exe. I will explain it later.

Enhancing HelloWorld Example

The programming languages that target the .NET Framework are object oriented in nature. Our sample application, however, is more of a structured program. I will convert it into an object-oriented program. In OOP we define everything in a class, so to convert our program I need to direct the assembler to create a class. I can do this using the .class directive, like this:

.class HelloWorld
{
}

The .class directive is followed by the name of the class. Unlike in C#, wrapping methods in a class is optional in IL, which is why our earlier global method assembled at all.

I also need to decorate this directive with some attributes that define its accessibility, layout and interoperability options. Thus my updated code is:

.assembly DemystifyingILChapter1 {}
.class public auto ansi HelloWorld
{
  .method static void  HelloWorld()
  {
      .entrypoint
      ldstr "Hello World."
      call void [mscorlib]System.Console::WriteLine(class System.String)
      ret
  }
}

I have used three attributes:

  • public: This is the visibility attribute. It signifies that the class is visible to code outside its own assembly as well as inside it.
  • auto: This means that the layout of the class in memory will be decided by the runtime and not by our program.
  • ansi: This attribute concerns the transition between managed and unmanaged code. Code that does not target the CLI is termed unmanaged code; languages like C, C++ or VB6 all produce unmanaged code. In managed code a string consists of 2-byte Unicode characters, whereas unmanaged code often uses 1-byte ANSI characters. The ansi attribute specifies that strings are converted to ANSI when they pass from managed to unmanaged code.

Our program still compiles and produces the same output, but now it is an object-oriented application. Before I move on, let me handle another subtle issue. We know that all classes in the .NET Framework directly or indirectly inherit from the System.Object class. We haven't written explicit code for this, but the IL assembler does it for us. To make things clear, let's add this code explicitly. This is a trivial task: I can do so using the keyword extends followed by the full name of the class. Let me take this opportunity to update our code a little further and decorate the method HelloWorld with some attributes. The enhanced version of the HelloWorld example is given below:

.assembly DemystifyingILChapter1 {}
.class public auto ansi HelloWorld extends [mscorlib]System.Object
{
  .method public hidebysig static void HelloWorld() cil managed
  {
      .entrypoint
      ldstr "Hello World."
      call void [mscorlib]System.Console::WriteLine(class System.String)
      ret
  }
}

I have derived our class from System.Object and added some attributes to the method HelloWorld. I shall explain them one by one:

public: In C# or VB.NET, when we define a method, we specify its accessibility using accessibility attributes. One of them is public. Applying this attribute means that the method is accessible from every other part of the program. In Part 3 - Basic IL we will take a detailed look at accessibility attributes.

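hidebysig: A class can be derived from any other class. The attribute hidebysig ("hide by signature") states that this method hides only those base class methods that have the same name and the same signature, rather than every base class method with the same name. In our example, if a function HelloWorld with the same signature were present in the base class, our version would hide it in the derived class. The runtime itself ignores this attribute; it is there for compilers and other tools.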

cil managed: I will explain this attribute later.

These attributes don't affect the output of the program. Our program still compiles and runs as before. Have patience with me and I will show you the effects of these attributes in a minute.

You will remember from your high-level language (C#, VB.NET etc.) experience that each class has to have a constructor, and that the first thing this constructor does is call the base class constructor. If we define no constructor, the language compiler generates a default one that calls the base class constructor for us.

Since I am using a class that extends System.Object, I need to define a constructor that calls the constructor of my base class. To create a constructor, I have to define a special method named .ctor with the attributes specialname, rtspecialname and instance. Our updated code will look like this:

.assembly DemystifyingILChapter1 {}
.class public auto ansi HelloWorld extends [mscorlib]System.Object
{
  .method public hidebysig static void HelloWorld() cil managed
  {
      .entrypoint
      ldstr "Hello World."
      call void [mscorlib]System.Console::WriteLine(class System.String)
      ret
  }

  .method public hidebysig specialname rtspecialname 
    instance void  .ctor() cil managed
  {
      ldarg.0
      call instance void [mscorlib]System.Object::.ctor()
      ret
  }
}
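
The constructor above is defined but never invoked, because HelloWorld is a static method and no object is ever created. The following is a minimal sketch of how such a constructor gets used. It assumes a hypothetical class Greeter with an instance method Greet and an assembly named CtorDemo, none of which appear in the original listing, and it uses ILAsm's built-in string keyword for System.String. The newobj instruction allocates the object and runs .ctor. Inside an instance method, ldarg.0 loads the hidden first argument, which is the reference to the current object (this in C#).

```
// CtorDemo.il: a sketch, class and method names are illustrative assumptions
.assembly extern mscorlib {}
.assembly CtorDemo {}
.class public auto ansi Greeter extends [mscorlib]System.Object
{
  .method public hidebysig specialname rtspecialname
    instance void .ctor() cil managed
  {
      ldarg.0          // push "this" so the base constructor can initialise it
      call instance void [mscorlib]System.Object::.ctor()
      ret
  }

  .method public hidebysig instance void Greet() cil managed
  {
      ldstr "Hello from an instance method."
      call void [mscorlib]System.Console::WriteLine(string)
      ret
  }

  .method public hidebysig static void Main() cil managed
  {
      .entrypoint
      newobj instance void Greeter::.ctor()   // create the object, run .ctor, push the reference
      call instance void Greeter::Greet()     // pop the reference and use it as "this"
      ret
  }
}
```

Note that the entry point is still a static method; only Greet and .ctor need an object to work on.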

In the articles to come, I will introduce you to the IL instruction set and show how IL is used to perform basic operations like selection, iteration and overloading. We will also see how to create reference and value types and how to define methods, properties and indexers. I will show you the basics of exception handling, how to create special classes like delegates and how to define custom events. I will finish this series with a fully functional GUI application written in IL.

But to do all this I need your support and interest. You can post your comments and suggestions here or email me at kamran@kenlogix.com

