C# Help - C# Community and Information

Garbage collection

August 27th, 2006 in C# Language by admin


.Net is the much hyped revolutionarytechnology gifted to the programmer's community by Microsoft. Manyfactors make it a must use for most developers. In this article wewould like to discuss one of the primary advantages of .NET framework -the ease in memory and resource management.

Every program uses resources of one sort oranother-memory buffers, network connections, database resources, and soon. In fact, in an object-oriented environment, every type identifiessome resource available for a program's use. To use any of theseresources, memory must be allocated to represent the type.

The steps required to access a resource are as follows:

1. Allocate memory for the type that represents the resource.
2. Initialize the memory to set the initial state of the resource and to make the resource usable.
3. Use the resource by accessing the instance members of the type (repeat as necessary).
4. Tear down the state of the resource to clean up.
5. Free the memory.


The garbage collector (GC) of .NET completely absolves the developer from tracking memory usage and knowing when to free memory.

The Microsoft� .NET CLR (Common LanguageRuntime) requires that all resources be allocated from the managedheap. You never free objects from the managed heap-objects areautomatically freed when they are no longer needed by the application.

Memory is not infinite. The garbage collector must perform a collectionin order to free some memory. The garbage collector's optimizing enginedetermines the best time to perform a collection, (the exact criteriais guarded by Microsoft) based upon the allocations being made. Whenthe garbage collector performs a collection, it checks for objects inthe managed heap that are no longer being used by the application andperforms the necessary operations to reclaim their memory.

However for automatic memory management, thegarbage collector has to know the location of the roots i.e. it shouldknow when an object is no longer in use by the application. Thisknowledge is made available to the GC in .NET by the inclusion of aconcept know as metadata. Every data type used in .NET softwareincludes metadata that describes it. With the help of metadata, the CLRknows the layout of each of the objects in memory, which helps theGarbage Collector in the compaction phase of Garbage collection.Without this knowledge the Garbage Collector wouldn't know where oneobject instance ends and the next begins.


Garbage Collection Algorithm

Application Roots

Every application has a set of roots. Rootsidentify storage locations, which refer to objects on the managed heapor to objects that are set to null.

For example:

� All the global and static object pointers in an application.
� Any local variable/parameter object pointers on a thread's stack.
� Any CPU registers containing pointers to objects in the managed heap.
� Pointers to the objects from Freachable queue
The list of active roots is maintained by the just-in-time (JIT)compiler and common language runtime, and is made accessible to thegarbage collector's algorithm.


Implementation

Garbage collection in .NET is done using tracing collection and specifically the CLR implements the Mark/Compact collector.

This method consists of two phases as described below.

Phase I: Mark
Find memory that can be reclaimed.

When the garbage collector starts running, itmakes the assumption that all objects in the heap are garbage. In otherwords, it assumes that none of the application's roots refer to anyobjects in the heap.

The following steps are included in Phase I:

1. The GC identifies live object references or application roots.
2. It starts walking the roots and building a graph of all objects reachable from the roots.
3. If the GC attempts to add an object already present in the graph,then it stops walking down that path. This serves two purposes. First,it helps performance significantly since it doesn't walk through a setof objects more than once. Second, it prevents infinite loops shouldyou have any circular linked lists of objects. Thus cycles are handlesproperly.

Once all the roots have been checked, thegarbage collector's graph contains the set of all objects that aresomehow reachable from the application's roots; any objects that arenot in the graph are not accessible by the application, and aretherefore considered garbage.

Phase II: Compact

Move all the live objects to the bottom of the heap, leaving free space at the top.

Phase II includes the following steps:
1. The garbage collector now walks through the heap linearly, looking for contiguous blocks of garbage objects (now considered free space).
2. The garbage collector then shifts the non-garbage objects down in memory, removing all of the gaps in the heap.
3. Moving the objects in memory invalidates all pointers to the objects. So the garbage collector modifies the application's roots so that the pointers point to the objects' new locations.
4. In addition, if any object contains a pointer to another object, the garbage collector is responsible for correcting these pointers as well.
After all the garbage has been identified, all the non-garbage has beencompacted, and all the non-garbage pointers have been fixed-up, apointer is positioned just after the last non-garbage object toindicate the position where the next object can be added.


Finalization

.Net Framework's garbage collection implicitlykeeps track of the lifetime of the objects that an application creates,but fails when it comes to the unmanaged resources (i.e. a file, awindow or a network connection) that objects encapsulate.

The unmanaged resources must be explicitlyreleased once the application has finished using them. .Net Frameworkprovides the Object.Finalize method: a method that the garbagecollector must run on the object to clean up its unmanaged resources,prior to reclaiming the memory used up by the object. Since Finalizemethod does nothing, by default, this method must be overridden ifexplicit cleanup is required.

It would not be surprising if you willconsider Finalize just another name for destructors in C++. Though,both have been assigned the responsibility of freeing the resourcesused by the objects, they have very different semantics. In C++,destructors are executed immediately when the object goes out of scopewhereas a finalize method is called once when Garbage collection getsaround to cleaning up an object.

The potential existence of finalizerscomplicates the job of garbage collection in .Net by adding some extrasteps before freeing an object.

Whenever a new object, having a Finalizemethod, is allocated on the heap a pointer to the object is placed inan internal data structure called Finalization queue. When an object isnot reachable, the garbage
collector considers the object garbage. Thegarbage collector scans the finalization queue looking for pointers tothese objects. When a pointer is found, the pointer is removed from thefinalization queue and appended to another internal data structurecalled Freachable queue, making the object no longer a part of thegarbage. At this point, the garbage collector has finished identifyinggarbage. The garbage collector compacts the reclaimable memory and thespecial runtime thread empties the freachable queue, executing eachobject's Finalize method.

The next time the garbage collector isinvoked, it sees that the finalized objects are truly garbage and thememory for those objects is then, simply freed.

Thus when an object requires finalization, itdies, then lives (resurrects) and finally dies again.It is recommended to avoid using Finalize method, unless required.Finalize methods increase memory pressure by not letting the memory andthe resources used by that object to be released, until two garbagecollections. Since you do not have control on the order in which thefinalize methods are executed, it may lead to unpredictable results.


Garbage Collection Performance Optimizations

� Weak references
� Generations


Weak References

Weak references are a means of performance enhancement, used to reduce the pressure placed on the managed heap by large objects.

When a root points to an abject it's called astrong reference to the object and the object cannot be collectedbecause the application's code can reach the object.

When an object has a weak reference to it, itbasically means that if there is a memory requirement & the garbagecollector runs, the object can be collected and when the applicationlater attempts to access the object, the access will fail. On the otherhand, to access a weakly referenced object, the application must obtaina strong reference to the object. If the application obtains thisstrong reference before the garbage collector collects the object, thenthe GC cannot collect the object because a strong reference to theobject exists.

The managed heap contains two internal datastructures whose sole purpose is to manage weak references: the shortweak reference table and the long weak reference table.

Weak references are of two types:
� A short weak reference doesn't track resurrection.
i.e. the object which has a short weak reference to itself is collected immediately without running its finalization method.

� A long weak reference tracks resurrection.
i.e. the garbage collector collects object pointed to by the long weakreference table only after determining that the object's storage isreclaimable. If the object has a Finalize method, the Finalize methodhas been called and the object was not resurrected.


These two tables simply contain pointers to objects allocated withinthe managed heap. Initially, both tables are empty. When you create aWeakReference object, an object is not allocated from the managed heap.Instead, an empty slot in one of the weak reference tables is located;short weak references use the short weak reference table and long weakreferences use the long weak reference table.

Consider an example of what happens when thegarbage collector runs. The diagrams (Figure 1 & 2) below show thestate of all the internal data structures before and after the GC runs.

 

Now, here's what happens when a garbage collection (GC) runs:

 

  1. The garbage collector builds a graph ofall the reachable objects. In the above example, the graph will includeobjects B, C, E, G.
  2. The garbage collector scans the short weak reference table. If apointer in the table refers to an object that is not part of the graph,then the pointer identifies an unreachable object and the slot in theshort weak reference table is set to null. In the above example, slotof object D is set to null since it is not a part of the graph.
  3. The garbage collector scans the finalization queue. If a pointer inthe queue refers to an object that is not part of the graph, then thepointer identifies an unreachable object and the pointer is moved fromthe finalization queue to the freachable queue. At this point, theobject is added to the graph since the object is now consideredreachable. In the above example, though objects A, D, F are notincluded in the graph they are treated as reachable objects becausethey are part of the finalization queue. Finalization queue thus getsemptied.
  4. The garbage collector scans the long weak reference table. If apointer in the table refers to an object that is not part of the graph(which now contains the objects pointed to by entries in the freachablequeue), then the pointer identifies an unreachable object and the slotis set to null. Since both the objects C and F are a part of the graph(of the previous step), none of them are set to null in the longreference table.
  5. The garbage collector compacts the memory, squeezing out the holesleft by the unreachable objects. In the above example, object H is theonly object that gets removed from the heap and it's memory isreclaimed.

Generations
Since garbage collection cannot complete without stopping the entireprogram, they can cause arbitrarily long pauses at arbitrary timesduring the execution of the program. Garbage collection pauses can alsoprevent programs from responding to events quickly enough to satisfythe requirements of real-time systems.

One feature of the garbage collector thatexists purely to improve performance is called generations. Agenerational garbage collector takes into account two facts that havebeen empirically observed in most programs in a variety of languages:
   1. Newly created objects tend to have short lives.
   2. The older an object is, the longer it will survive.

Generational collectors group objects by ageand collect younger objects more often than older objects. Wheninitialized, the managed heap contains no objects. All new objectsadded to the heap can be said to be in generation 0, until the heapgets filled up which invokes garbage collection. As most objects areshort-lived, only a small percentage of young objects are likely tosurvive their first collection. Once an object survives the firstgarbage collection, it gets promoted to generation 1.Newer objectsafter GC can then be said to be in generation 0.The garbage collectorgets invoked next only when the sub-heap of generation 0 gets filledup. All objects in generation 1 that survive get compacted and promotedto generation 2. All survivors in generation 0 also get compacted andpromoted to generation 1. Generation 0 then contains no objects, butall newer objects after GC go into generation 0.

Thus, as objects "mature" (survive multiplegarbage collections) in their current generation, they are moved to thenext older generation. Generation 2 is the maximum generation supportedby the runtime's garbage collector. When future collections occur, anysurviving objects currently in generation 2 simply stay in generation 2.

Thus, dividing the heap into generations ofobjects and collecting and compacting y
ounger generation objectsimproves the efficiency of the basic underlying garbage collectionalgorithm by reclaiming a significant amount of space from the heap andalso being faster than if the collector had examined the objects in allgenerations.

A garbage collector that can performgenerational collections, each of which is guaranteed (or at least verylikely) to require less than a certain maximum amount of time, can helpmake runtime suitable for real-time environment and also prevent pausesthat are noticeable to the user.

Myths Related To Garbage Collection

GC is necessarily slower than manual memory management.
Counter Explanation:
Not necessarily. Modern garbage collectors appear to run as quickly asmanual storage allocators (malloc/free or new/delete). Garbagecollection probably will not run as quickly as customized memoryallocator designed for use in a specific program. On the other hand,the extra code required to make manual memory management work properly(for example, explicit reference counting) is often more expensive thana garbage collector would be.

GC will necessarily make my program pause.
Counter Explanation:
Since garbage collectors usually stop the entire program while seekingand collecting garbage objects, they cause pauses long enough to benoticed by the users. But with the advent of modern optimizationtechniques, these noticeable pauses can be eliminated.

Manual memory management won't cause pauses.
Counter Explanation:
Manual memory management does not guarantee performance. It may causepauses for considerable periods either on allocation or deallocation.

Programs with GC are huge and bloated; GC isn't suitable for small programs or systems.
Counter Explanation:
Though using garbage collection is advantageous in complex systems,there is no reason for garbage collection to introduce any significantoverhead at any scale.

I've heard that GC uses twice as much memory.
Counter Explanation:
This may be true of primitive GCs, but this is not generally true ofgarbage collection. The data structures used for GC need be no largerthan those for manual memory management.

Most Commented Articles :

Leave a Reply

RSS Feed Follow Us on Twitter!