Taking Out the Trash - An In-depth Look at the .NET Garbage Collector
by John Spano | Published 08/23/2002 | .NET Framework, Windows Development
John Spano

John Spano is cofounder and CTO of NeoTekSystems, a Greenville, South Carolina technology consulting company. NeoTekSystems offers IT consulting, custom programming, web design and web hosting. We specialize in Microsoft .NET enterprise development and business design.

I have six years of experience in software architecture. My primary focus is on Microsoft technologies, and I have been involved in .NET since beta 1. I currently hold an MCSD certification, two MCTS certifications (Windows, Web), an MCPD in Distributed, two MCITP certifications, and a Microsoft MVP award, and I won the Helper of the Month contest for July 2002 in the devCity.NET forums.

Corporate URL: www.NeoTekSystems.com
Primary email: JSpano@NeoTekSystems.com
Alternate email: Jspano@devcity.net.

 


With the .NET platform, Microsoft has introduced many new features to help programmers produce more robust code, faster. Among them is managed memory or, in simpler terms, garbage collection (GC). This article delves into this feature, exploring how it works and how to write code that takes advantage of it. All code is presented in C#, my language of choice.

GC addresses many of the memory-management chores that come with normal "native" code. Before GC, the programmer was responsible for releasing memory once it was no longer needed. Some languages, such as Visual Basic 6, did this for you; others, like Visual C++, required the programmer to do it. If you forgot to release an object when you were done with it, the result was a memory leak and possibly other undesirable behavior. With GC, the runtime guarantees that your objects are freed at some point after your code can no longer reach them, for example after the last variable that refers to them goes out of scope.

Visual Basic .NET, C#, and Managed Extensions for Visual C++ all use the GC. When an application starts, the GC sets aside a contiguous block of memory called the managed heap. The managed heap is similar to the C heap, except that you don't have to free objects from it and, unlike the C heap, it is one contiguous block of memory.

All objects allocated with the new keyword are created on the managed heap, sequentially after the last object that was made. The GC keeps a pointer to the memory location just past the last object, so it knows where to allocate the next one. This makes memory allocation very fast. Programmers also tend to allocate related objects together. For example, if you were going to write data to a database, you would create a connection object, then a command object, then a dataset. Since these objects sit next to each other in the managed heap, access to them is faster: the CPU doesn't have to hunt around memory for each object.
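The sequential allocation described above can be sketched with a toy model. This is only an illustration of the idea, not how the CLR is implemented; ToyHeap, Allocate and the sizes used are hypothetical names and values invented for this example.

```csharp
using System;

// Toy model of sequential ("bump pointer") allocation. Hypothetical names;
// the real managed heap is internal to the CLR.
public class ToyHeap
{
    private readonly int capacity;
    private int next; // offset just past the last allocated object

    public ToyHeap(int capacity)
    {
        this.capacity = capacity;
    }

    // Returns the offset of the new object, or -1 when there is no room
    // (the point at which a real GC would start a collection).
    public int Allocate(int size)
    {
        if (size > capacity - next)
        {
            return -1;
        }
        int offset = next;
        next += size; // allocating is just an addition
        return offset;
    }
}

public static class ToyHeapDemo
{
    public static void Main()
    {
        ToyHeap heap = new ToyHeap(100);
        Console.WriteLine(heap.Allocate(40)); // 0
        Console.WriteLine(heap.Allocate(40)); // 40
        Console.WriteLine(heap.Allocate(40)); // -1, no room left
    }
}
```

No free list is searched at any point, which is why this style of allocation is cheap.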

When the GC's next-object pointer would move past the end of the managed heap, the GC knows that a garbage collection is needed. To decide what can be collected, the GC uses a system of "roots", or strong references. Roots are objects that are global, static, local and in scope, or pointed to by a CPU register. When a collection starts, the GC assumes that all objects in the managed heap are trash. It then begins to traverse the objects, beginning with the roots. As it finds objects pointed to by the roots, it adds them to a list of good objects. It then checks the newly found good objects for any objects that they reference, and so on until every reachable object has been examined. When done, the GC has a list of all objects that are roots or can be reached from a root, so it can treat all other objects as trash and remove them.
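The traversal from the roots can be sketched as a simple graph walk. This is a simplified model for illustration only; Node, MarkPhase and FindReachable are hypothetical names, and the real GC walks actual object references rather than explicit lists.

```csharp
using System;
using System.Collections.Generic;

// Toy object graph: each node lists the objects it references.
public class Node
{
    public readonly string Name;
    public readonly List<Node> References = new List<Node>();

    public Node(string name)
    {
        Name = name;
    }
}

public static class MarkPhase
{
    // Returns every object reachable from the roots. Anything on the heap
    // that is not in the returned set would be treated as garbage.
    public static HashSet<Node> FindReachable(IEnumerable<Node> roots)
    {
        HashSet<Node> reachable = new HashSet<Node>();
        Stack<Node> pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!reachable.Add(current))
            {
                continue; // already visited, which also handles cycles
            }
            foreach (Node child in current.References)
            {
                pending.Push(child);
            }
        }
        return reachable;
    }

    public static void Main()
    {
        Node a = new Node("A");
        Node b = new Node("B");
        Node c = new Node("C");
        Node d = new Node("D"); // nothing refers to D
        a.References.Add(b);
        b.References.Add(c);
        c.References.Add(a); // cycle back to A

        HashSet<Node> live = FindReachable(new Node[] { a });
        foreach (Node n in new Node[] { a, b, c, d })
        {
            Console.WriteLine(n.Name + (live.Contains(n) ? " is reachable" : " is garbage"));
        }
    }
}
```

Note that the cycle between A, B and C causes no trouble: objects are judged by reachability from a root, not by how many references point at them.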

When an object that implements a Finalize method is added to the managed heap, the GC also places a pointer to it in its finalization list. When a collection later finds that the object is trash, the GC checks the finalization list to see whether a Finalize method needs to be called. If so, it moves the pointer from the finalization list to the f-reachable queue, which holds the objects that are ready to have their Finalize method called. At this point the objects aren't really garbage yet, because the queue still refers to them, so the GC doesn't remove them from the managed heap.

Once objects are in the f-reachable queue, a separate thread dedicated to finalization calls the Finalize method on each of them. When done, the thread removes the pointer from the f-reachable queue so that the GC knows to clean up the finalized object on the next garbage collection. An object with a Finalize method therefore survives at least one extra collection and costs extra bookkeeping when it is created. Avoid giving objects a Finalize method unless they really need one.
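The movement of an object between the finalization list and the f-reachable queue can be modeled in a few lines. This is a toy sketch under simplifying assumptions (objects are represented by strings, and there is no real second thread); FinalizationModel and its members are hypothetical names, not CLR APIs.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the two finalization structures; hypothetical names.
public class FinalizationModel
{
    private readonly List<string> finalizationList = new List<string>();
    private readonly Queue<string> freachableQueue = new Queue<string>();

    public int PendingCount
    {
        get { return freachableQueue.Count; }
    }

    // Called when an object that has a Finalize method is allocated.
    public void Register(string obj)
    {
        finalizationList.Add(obj);
    }

    // Called by a collection for each finalizable object found unreachable:
    // the entry moves to the f-reachable queue, so the object survives.
    public void MarkUnreachable(string obj)
    {
        if (finalizationList.Remove(obj))
        {
            freachableQueue.Enqueue(obj);
        }
    }

    // Models the finalizer thread draining the queue. The returned objects
    // are the ones the next collection is free to reclaim.
    public List<string> RunFinalizers()
    {
        List<string> finalized = new List<string>();
        while (freachableQueue.Count > 0)
        {
            finalized.Add(freachableQueue.Dequeue());
        }
        return finalized;
    }
}

public static class FinalizationDemo
{
    public static void Main()
    {
        FinalizationModel model = new FinalizationModel();
        model.Register("connection");
        model.MarkUnreachable("connection");      // first collection
        Console.WriteLine(model.PendingCount);    // 1, still on the heap
        model.RunFinalizers();                    // finalizer thread runs
        Console.WriteLine(model.PendingCount);    // 0, reclaimable next time
    }
}
```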

To create an object with a Finalize method in C#, you use syntax similar to a C++ destructor; the compiler turns it into an override of Object.Finalize. If we have a class called Class1, you could implement its Finalize method as follows:

~Class1 ()
{
    //do clean up here 
}

A big problem with a Finalize method is that you never know when it is going to be called, or in what order relative to the Finalize methods of other objects. It doesn't behave like a normal C++ destructor, which is called as soon as you delete the object. To handle this, Microsoft provides the IDisposable interface, whose Dispose method lets the programmer release resources at a known point. It is suggested that you follow the pattern below when implementing any object that needs a Finalize method.

using System;

public class Class1 : IDisposable
{
    //set once cleanup has run, so repeated calls do nothing
    private bool disposed = false;

    public Class1()
    {
    }

    ~Class1 ()
    {
        //The garbage collector is calling this method, so pass in false
        Dispose (false);
    }

    //this method satisfies the IDisposable interface.
    public void Dispose ()
    {
        //Programmer is calling the dispose method so pass in true
        Dispose (true);
        //Cleanup is already done, so we don't want the GC to finalize
        //the object as well. Take it off of the finalization list.
        GC.SuppressFinalize (this);
    }

    //all cleanup code is done here; derived classes can override it
    protected virtual void Dispose(bool disposing)
    {
        if (disposed)
        {
            return;
        }

        if (disposing)
        {
            //Programmer is disposing the object.
            //We can access any fields in any other object that this
            //object has a reference to, because we know that the other
            //objects are still valid and haven't had their finalize
            //methods called yet since this object still refers to them.
        }

        //This part runs for both Dispose() and the finalizer.
        //Release unmanaged resources here and don't touch other
        //managed objects, since they may already be finalized.
        disposed = true;
    }
}
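From the caller's side, the usual way to make sure Dispose is called is the C# using statement. The sketch below uses a small hypothetical class, TempResource, invented only for this example so that the effect can be observed.

```csharp
using System;

// Hypothetical resource holder used only for this illustration.
public class TempResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;         // release resources here
        GC.SuppressFinalize(this); // harmless when there is no finalizer
    }
}

public static class UsingDemo
{
    public static TempResource UseAndRelease()
    {
        TempResource resource = new TempResource();
        using (resource)
        {
            // Work with the resource. Dispose runs when this block exits,
            // even if an exception is thrown inside it.
        }
        return resource;
    }

    public static void Main()
    {
        Console.WriteLine(UseAndRelease().IsDisposed); // True
    }
}
```

The using statement is shorthand for a try/finally block that calls Dispose in the finally clause.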

Now that we know how to work with the GC, let's take an in-depth look at how it functions internally. There are several kinds of garbage collectors in use today. Microsoft chose to implement a generational GC. Generational GCs divide their memory into sections, or generations. Microsoft currently uses three generations: 0, 1 and 2.

The three generations begin with thresholds of about 256K, 2M and 10M for generation 0, 1 and 2 respectively. These are only starting values; the GC changes them if it decides different values would help performance. For example, if your application creates many small objects that are released quickly, the GC might decrease generation 0 to 128K to increase the frequency of collections. This makes each collection faster and the GC's work easier. The opposite is also true: if the GC sees that it isn't freeing much memory when it collects generation 0, it will increase the size of generation 0 to reduce the number of collections. Generation 2 also expands to be as large as needed, since it is the highest generation.

Generational GCs follow several principles: the newer an object is, the shorter its life is likely to be; conversely, the older an object is, the longer it is likely to stay around; and collecting part of the managed heap is faster than collecting all of it. Following these principles, the GC spends most of its time collecting generation 0. Objects in the later generations, 1 and 2, are released far less often.

When an application initializes, all generations are empty. As the application initializes objects, they are placed in generation 0. The new objects are placed sequentially, one after the other, to increase speed. This actually makes the managed heap faster than the traditional C heap; it doesn't have to search for a memory location to create the object. The GC keeps a pointer to the memory location after the last object, so it will know where to put the next object allocated. Figure 1 shows a new application that has four objects in generation 0.


Figure 1.

When generation 0 is full, the GC begins a garbage collection. Let's say we create a new object that pushes generation 0 past its 256K threshold. The GC then examines all objects in generation 0, looking for any that can no longer be reached.

When it runs, the GC starts from the assumption that all objects are trash. It then looks at all objects that are roots. It adds the roots to its "good object list", then any objects that they point to, and so on until it has followed every reference. Any objects that are neither roots nor reachable from a root are now considered garbage. The GC then moves all objects in generation 0 that are still valid to generation 1. It sets the generation 0 next-object pointer back to the beginning of the memory block, effectively clearing it. The GC also compacts generation 1 to remove any holes in memory and make all objects sequential. When generation 1 fills up, the same thing happens to it: all of its valid objects are moved to generation 2.

The following illustrations show the GC in action. Let's say that object D went out of scope before the collection took place, and that objects E and F were then created, triggering the collection. After the collection, the heap looks like the following:


Figure 2.

Now we create new objects G and H, which causes another GC. In the meantime, object E has gone out of scope. After generation 0 and 1 have been collected, the managed heap now looks like the following:


Figure 3.
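The promotion of surviving objects can also be observed from code with GC.GetGeneration. The sketch below forces two collections and records the generation reported for one object that stays referenced throughout. On a typical run the object moves up one generation per collection, but the exact numbers depend on the runtime and its GC settings, so no fixed output is claimed here.

```csharp
using System;

public static class GenerationDemo
{
    // Returns the generation reported for one object before and after
    // each of two forced collections.
    public static int[] ObserveGenerations()
    {
        object survivor = new object();
        int[] generations = new int[3];
        generations[0] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[1] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[2] = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor); // keep the object referenced up to this point
        return generations;
    }

    public static void Main()
    {
        Console.WriteLine("Highest generation: " + GC.MaxGeneration);
        foreach (int generation in ObserveGenerations())
        {
            Console.WriteLine(generation);
        }
    }
}
```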

Large objects are the exception to this rule. Any object of roughly 85,000 bytes or more is placed in a separate "large object" block of memory. This memory is set aside by the GC and isn't part of its generations. Objects that reside in it are never moved; they stay there until they become garbage, and the block isn't compacted the way the generations are. Large objects are treated just like other objects with respect to finalization and roots. This arrangement helps performance by limiting the GC to moving only small objects.
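One way to see that large objects are handled differently is to ask the GC which generation it reports for a freshly allocated array. In the sketch below the array sizes are arbitrary choices for the example, and GenerationOfNewArray is a hypothetical helper. On the runtimes I'm aware of, a brand-new large array is reported in the highest generation while a small one starts in generation 0, but treat that as an observation rather than a guarantee.

```csharp
using System;

public static class LargeObjectDemo
{
    // Allocates a byte array of the given size and returns the generation
    // the GC reports for it immediately afterwards.
    public static int GenerationOfNewArray(int byteCount)
    {
        byte[] buffer = new byte[byteCount];
        int generation = GC.GetGeneration(buffer);
        GC.KeepAlive(buffer);
        return generation;
    }

    public static void Main()
    {
        Console.WriteLine("small array: generation " + GenerationOfNewArray(1000));
        Console.WriteLine("large array: generation " + GenerationOfNewArray(200000));
    }
}
```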

Controlling the Garbage Collector

The framework has several methods that give you direct control over the GC. You can force a collection with GC.Collect(), which collects all generations, or GC.Collect(int generation), which collects generation 0 up to and including the generation you pass in. Most of the time you don't need to worry about collecting any of the generations yourself, but you might want to force a collection after a large task to make sure its memory is cleaned up. Another method is GC.WaitForPendingFinalizers(). It suspends the current thread until the finalization thread has emptied the f-reachable queue.
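A common way to combine these calls is collect, wait for finalizers, then collect again so that objects which have just been finalized can be reclaimed. The sketch below uses a hypothetical class, Tracked, whose finalizer only bumps a counter. Whether the object is actually gone afterwards depends on the runtime and build settings, so the program reports what it observed rather than assuming a result.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical class whose finalizer only counts how often it has run.
public class Tracked
{
    public static int FinalizedCount;

    ~Tracked()
    {
        Interlocked.Increment(ref FinalizedCount);
    }
}

public static class ForcedCollectionDemo
{
    // The allocation lives in its own non-inlined method so that no local
    // variable in the caller keeps the object reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference CreateGarbage()
    {
        return new WeakReference(new Tracked());
    }

    // Returns true if the object is still alive after the forced cleanup.
    public static bool CollectAndCheck()
    {
        WeakReference weak = CreateGarbage();
        GC.Collect();                  // finds the object unreachable
        GC.WaitForPendingFinalizers(); // blocks until queued finalizers ran
        GC.Collect();                  // reclaims the finalized object
        return weak.IsAlive;
    }

    public static void Main()
    {
        Console.WriteLine("Still alive: " + CollectAndCheck());
        Console.WriteLine("Finalizers run: " + Tracked.FinalizedCount);
    }
}
```

Forcing collections like this is useful for experiments and tests; in production code it usually costs more than it saves.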

When you program, the best way to work with the GC is to keep track of your objects and let go of what you don't need. In this respect it is almost like any other programming language. If you have an object that you know will be kept for the life of the application, pay attention to the data it holds internally. For example, if the object holds a large string that you know won't be used again after the first call, set that variable to null. This allows the GC to reclaim the string's memory. Also, when you know you are done with an object, make sure you don't keep any references to it, or the GC won't be able to collect it.

Finalize methods should also run as quickly as possible. The runtime gives them a time limit. If a Finalize method exceeds this limit, the runtime can terminate the thread it is running on. Any of the following causes Finalize methods to be called: generation 0 is full, the program explicitly requests a collection, the CLR is unloading an application domain, or the CLR itself is shutting down.

The Perfmon.exe program can be used to monitor the GC. The .NET CLR Memory category has several counters that report on collections and on the size of each generation. Here are some of the more useful ones.

Table 1.
  
Counter: Description
# Bytes in Heaps: Total number of bytes in all generations and the large object heap. This shows how much memory the GC is using for your objects.
# Gen 0 Collections: Number of collections of generation 0. There are also counters for generation 1 and 2.
% Time in GC: Percent of time spent in GC threads.
Gen 0 Heap Size: Size remaining in generation 0. There are also counters for generation 1 and 2.
Large Object Heap Size: Tracks the large object heap remaining bytes.
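Some of the same information is available from inside the program through the System.GC class. The sketch below is a rough in-code counterpart to the collection and heap-size counters, not a replacement for Perfmon; GcStats and Snapshot are hypothetical names chosen for this example.

```csharp
using System;
using System.Text;

public static class GcStats
{
    // Builds a one-line summary of the managed heap size and the number
    // of collections that have happened in each generation so far.
    public static string Snapshot()
    {
        StringBuilder text = new StringBuilder();
        // false = report the current estimate without forcing a collection
        text.Append("bytes allocated: " + GC.GetTotalMemory(false));
        for (int generation = 0; generation <= GC.MaxGeneration; generation++)
        {
            text.Append(", gen " + generation + " collections: "
                + GC.CollectionCount(generation));
        }
        return text.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Snapshot());
        GC.Collect();
        Console.WriteLine(Snapshot()); // collection counts should have grown
    }
}
```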