Objective
After reading this article, Java programmers should be able to decipher
and de-jargonize the .NET architecture and relate it to the proposed
ECMA standard.
Target Audience
Java programmers and system architects.
Summary
This article outlines Microsoft's proposed standardization of the .NET
framework in the ECMA forum as the CLI (Common Language Infrastructure),
which Microsoft's documentation refers to as the CLR (Common Language
Runtime). The CLR and the JVM are compared with respect to the market forces
that shaped the CLR definition. The components of the CLR are then examined,
followed by details of Microsoft's implementation of the CLR as the .NET
framework.
Throughout, the .NET framework is compared with the Java architecture.
The material is derived from the author's own experience with Java since
early 1996, Microsoft's MSDN site,
and standards documents from sites such as ECMA
and W3C.org.
Overview
The .NET framework is Microsoft's answer to the Java
community's objections to the "Windowsization" of Java.
Microsoft introduces a new language, C#, designed by the Visual J++ team.
In the process, however, it has done away with DCOM and has also changed
its flagship language, Visual Basic.
In a nutshell, .NET presently consists of three compiled languages
(C#, VB.NET and C++), a Java-like runtime virtual machine environment, and
five execution containers hosting this runtime, namely ASP.NET, the Windows
Shell, the VBA scripting host for the Office suite, the Windows Forms
designer and IE (Internet Explorer). Much like Java, it contains a rich set
of APIs and libraries.
Enhancements over the Java framework include the use of SOAP (Simple Object
Access Protocol) for remoting, and version and security scoping using the
concept of the application assembly (described later). A Common Type System
is introduced to make mixed-language programming easier. For example, a VB
component can inherit from a C# class.
In the longer term, Java and .NET will converge; therefore, an overview
of the new framework is presented here from a Java programmer's perspective.
Comparing CLR with JVM
The .NET framework's Common Language Runtime (CLR) is very similar to the
Java Virtual Machine (JVM) in terms of garbage collection, security and
just-in-time (JIT) compilation.
The fundamental difference, however, arises from the differing perceptions
of Sun's Java design team, headed by James Gosling, and Microsoft's C#
designers, spearheaded by Anders Hejlsberg.
Sun viewed the Internet as a heterogeneous network consisting of multiple
operating systems. Sun therefore had to design the GUI as the lowest common
denominator, supportable by all such platforms. This was also the major
reason for Java's failure in client-side applications. Java has been
successful only on the server side, where there is no great need for a GUI.
Having failed in the client-side desktop application arena, Sun is now
targeting Java at the server-side applications market, which is dominated
by Unix and Linux flavors with approximately 60% of the server market; the
remaining 40% rests with Windows NT.
This view, however, did not suit Microsoft, which holds about 90% of the
client-side desktop market. Microsoft wanted to provide a Windows-centric
Internet development platform, so it added a few Windows-specific features
to its Java implementation, similar to what it had done with its C++
implementation. This, along with Microsoft's refusal to support Java RMI,
which competed with its own floundering remoting technology, DCOM, resulted
in a lawsuit. Microsoft settled the lawsuit in January 2001, agreeing to pay
Sun USD 20 million. This antagonism made Microsoft break away from Java and
float its own language, called C#.
The C# team was carved out of the Microsoft J++ team, and its effort
finally led to the creation of the .NET framework.
Microsoft intends to leverage its desktop leadership to shape Internet
application development by introducing the .NET framework. The supported
languages therefore map the Windows GUI closely in the framework, much as
C++ MFC and J++ WFC (Windows Foundation Classes) do. In spite of the
platform-independence design claims, all three supported languages produce
Windows .exe code by default.
Microsoft played the standardization game better than Sun. Microsoft,
though a US-based company, proposed C# and the Common Language
Infrastructure (CLI), the backbone of the .NET framework, for
standardization to the ECMA (European Computer Manufacturers Association)
TC39 Technical Committee in October 2000. Ironically, Sun also happens to be
a member of this standing committee, which looks after standardization
issues related to computer languages. See http://www.ecma.ch.
Microsoft has also submitted the Simple Object Access Protocol (SOAP) to
the W3C (http://www.w3c.org) for standardization.
SOAP is an XML- and HTTP-based remote object access protocol. SOAP competes
with Java's RMI and Microsoft's own DCOM. RMI has the limitation of being
language specific, and DCOM had limited acceptance outside the Windows
community, despite Microsoft's best efforts to port DCOM to Unix platforms.
CORBA, another remoting contender, which even has an Internet-specific
transport, namely IIOP, is more or less dead because of poor
interoperability between different vendors' implementations.
Because SOAP uses HTTP as its transport, it passes through firewalls easily
and can therefore traverse both LANs and the Internet. However, since SOAP
is XML based, it burdens both client and server with XML parsing, which is
relatively CPU intensive compared to binary protocols like RMI and DCOM.
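To make the parsing cost concrete, here is a minimal Java sketch of a SOAP 1.1 request being built and then parsed on the receiving side with the standard DOM parser (javax.xml.parsers). The envelope namespace is the SOAP 1.1 one; the getQuote operation, the urn:example:quotes namespace and the SUNW symbol are hypothetical, and no HTTP transport is shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SoapSketch {
    // SOAP 1.1 envelope namespace.
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Builds a minimal request. The operation and its namespace are hypothetical,
    // and the symbol is assumed to be plain text (no XML escaping is done).
    static String buildRequest(String symbol) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<soap:Envelope xmlns:soap=\"" + SOAP_ENV + "\">"
            + "<soap:Body>"
            + "<q:getQuote xmlns:q=\"urn:example:quotes\">"
            + "<symbol>" + symbol + "</symbol>"
            + "</q:getQuote>"
            + "</soap:Body>"
            + "</soap:Envelope>";
    }

    // The receiver must parse the whole XML document before it can dispatch
    // the call; this is the CPU cost that binary protocols avoid.
    static String extractSymbol(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList bodies = doc.getElementsByTagNameNS(SOAP_ENV, "Body");
        Element body = (Element) bodies.item(0);
        return body.getElementsByTagName("symbol").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String request = buildRequest("SUNW");
        System.out.println(request);
        System.out.println("Parsed symbol: " + extractSymbol(request));
    }
}
```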
The Java platform views the Internet world as one language running
on different operating systems (OS), whereas the .NET framework views the
world as one OS on which programmers have a choice of multiple languages.
The Java platform therefore abstracts over multiple operating systems, and
the .NET framework abstracts over multiple languages.
As the above discussion shows, market forces rather than technical design
considerations are largely responsible for the current state of the art.
Inside The Common Language Runtime
The Common Language Runtime (CLR) is the runtime environment of the .NET
framework; it manages the execution of code and provides services to it.
The CLR has also been proposed as an ECMA standard. However, the ECMA
documents refer to the CLR as the Common Language Infrastructure (CLI).
It has five components, namely:
- CTS - Common Type System
- CLS - Common Language Specification
- CIL - Common Intermediate Language
- JIT - Just in Time Compiler
- VES - Virtual Execution System
CLI - Common Language Infrastructure
The Common Language Infrastructure (CLI) provides a language-neutral
platform for application development and deployment. The CLI supports the
object-oriented programming (OOP) paradigm and also provides hooks for
modeling procedural and structured languages.
The CLI provides languages with a framework for security, garbage
collection and exception handling, and also provides a platform for
language interoperability. For example, C# objects can inherit from C++
classes, and VB procedures can use C# components.
Note that Microsoft's documentation refers to the CLI as the CLR (Common
Language Runtime).
After reading through the ECMA standard documents, you will probably, like
me, develop the feeling that the CLI is an attempt to standardize a
next-generation Java framework that accommodates older, pre-Internet-era
languages like VB and C++.
The five components of the CLI are briefly described below.
CTS - Common Type System
The Common Type System supports both object-oriented languages like Java
and procedural languages like C. It deals with two kinds of entities:
objects and values. Values are the familiar atomic types like integers
and chars. Objects are self-describing entities containing both methods and
variables.
Objects and values can be categorized into the following hierarchy.
Types can be of two kinds: value types and reference types. Value types
can be further categorized into built-in types (for example, integer and
floating-point types) and user-defined types like enums.
Reference types can be divided into three subcategories: self-describing
reference types, pointers and interfaces. Pointers can be subdivided into
function pointers, managed pointers and unmanaged pointers.
Value types can be converted into reference types; this conversion is
called boxing. Extracting the value back out of the boxed reference type
is called unboxing.
Casting rules from one type to another, for example conversion of char
to integer types, are also defined within the Common Type System.
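Java programmers already know boxing through wrapper classes such as java.lang.Integer. The sketch below is the Java analogue, not CLR code: boxing copies an int into a heap object, unboxing copies it back out, and the last lines show a widening char-to-int conversion next to a narrowing int-to-char cast.

```java
final class BoxingDemo {
    // Boxing: the int value is copied into a heap-allocated Integer object,
    // so it can be stored wherever an Object reference is expected.
    static Object box(int value) {
        return Integer.valueOf(value);
    }

    // Unboxing: the value is copied back out of the wrapper object.
    // A ClassCastException is thrown if the object is not an Integer.
    static int unbox(Object boxed) {
        return (Integer) boxed;
    }

    public static void main(String[] args) {
        Object boxed = box(42);
        int back = unbox(boxed);
        System.out.println(boxed.getClass().getName() + " -> " + back); // java.lang.Integer -> 42

        // Casting rules: char widens to int implicitly; int narrows to char only with a cast.
        char letter = 'A';
        int code = letter;               // widening conversion, no cast needed
        char next = (char) (code + 1);   // narrowing conversion requires a cast
        System.out.println(code + " " + next); // 65 B
    }
}
```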
The Common Type System also defines scopes and assemblies. An assembly
is a configured set of loadable code modules and other resources that
together implement a unit of functionality. A scope is a collection of
grouped names of different kinds of value or reference types.
CLS - Common Language Specification
The Common Language Specification (CLS) aids mixed-language programming.
It defines a subset of the Common Type System to which all class library
providers and language designers targeting the CLR must adhere. If a
component written in one language (say C#) is to be used from another
language (say VB.NET), the component writer must adhere to the types and
structures defined by the CLS.
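One concrete CLS rule is that exported identifiers must differ by more than case, because a case-insensitive language such as VB.NET could not tell them apart. The Java sketch below is a hypothetical checker (not a Microsoft tool) that applies just this one rule to the public methods of a class via reflection; the Account class exists only to trigger it.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

final class ClsNameCheck {
    // Returns groups of public method names that differ only in case.
    static List<List<String>> caseCollisions(Class<?> type) {
        Map<String, TreeSet<String>> byLowerCase = new HashMap<>();
        for (Method m : type.getDeclaredMethods()) {
            if (!Modifier.isPublic(m.getModifiers())) continue; // only exported members matter
            byLowerCase
                .computeIfAbsent(m.getName().toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                .add(m.getName());
        }
        List<List<String>> collisions = new ArrayList<>();
        for (TreeSet<String> names : byLowerCase.values()) {
            if (names.size() > 1) collisions.add(new ArrayList<>(names));
        }
        return collisions;
    }

    // Legal Java, but a case-insensitive caller could not distinguish total/Total.
    static class Account {
        public int total() { return 0; }
        public int Total() { return 1; } // deliberately breaks Java naming style
        public void deposit(int amount) { }
    }

    public static void main(String[] args) {
        System.out.println(caseCollisions(Account.class)); // [[Total, total]]
    }
}
```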
CIL - Common Intermediate Language
All compilers complying with the CLI must generate an intermediate language
representation called Common Intermediate Language (CIL). The CLI uses
this intermediate language either to generate native code ahead of time or
to compile it on the fly with Just In Time (JIT) compilation as it executes.
Microsoft's documents refer to its implementation of this standard as MSIL
(Microsoft Intermediate Language).
JIT - Just in Time Compiler
The JIT, or Just in Time compiler, is the part of the runtime execution
environment that converts the intermediate language contained in
executable files, called assemblies, into native executable code.
The security policy settings are consulted at this stage to decide whether
the code being compiled must be type safe. If it must be and is not, an
exception is thrown and the JIT process is aborted.
VES - Virtual Execution System
The Virtual Execution System (VES) is more or less equivalent to the
JVM (Java Virtual Machine).
The VES loads, links and runs programs written for the Common Language
Infrastructure that are contained in Portable Executable (PE) files.
The VES fulfills its loader function by using information contained in the
metadata, and uses late binding (or linking) to integrate modules compiled
separately, which may even be written in different languages.
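Java's nearest equivalent to this loader behaviour is loading a class by name and resolving a method from its class-file metadata at run time. The sketch below is only an analogy for late binding; the invokeByName helper is hypothetical, and java.lang.StringBuilder is used simply because every JVM has it.

```java
import java.lang.reflect.Method;

final class LateBindingDemo {
    // Resolves a class and a one-argument method by name at run time and invokes it.
    // Nothing about the target is known at compile time except the names.
    static Object invokeByName(String className, String methodName, Object arg) throws Exception {
        Class<?> type = Class.forName(className);                    // locate and load the class
        Method method = type.getMethod(methodName, arg.getClass());  // resolve from metadata
        Object target = type.getDeclaredConstructor().newInstance(); // needs a no-arg constructor
        return method.invoke(target, arg);
    }

    public static void main(String[] args) throws Exception {
        Object result = invokeByName("java.lang.StringBuilder", "append", "late-bound");
        System.out.println(result); // late-bound
    }
}
```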
The VES also provides services during execution of the code, including
automatic memory management, profiling and debugging support, security
sandboxes, and interoperability with unmanaged code, such as COM components.
Managed code is Intermediate Language (IL) code, along with metadata,
contained in Portable Executable (PE) files; these may be .EXE or .DLL
files. It needs a Just in Time (JIT) compiler to convert it into native
executable code. There is also provision for precompiled native executable
code, which is called unmanaged code. The advantage of unmanaged code is
that it does not need JIT compilation, but it has the disadvantage of not
being portable across different operating system (OS) platforms.
Microsoft's Implementation of CLI is CLR
Microsoft's implementation and adaptation of the above standard has
resulted in differences in terminology; for example, Common Intermediate
Language (CIL) is called Microsoft Intermediate Language (MSIL), and Common
Language Infrastructure (CLI) is referred to as Common Language Runtime
(CLR).
These changes in naming convention are, I believe, intended to create a
brand distinction while implementing the standards. They were probably also
intended to avoid the clash that occurred between Java the language
standard, Java the island, Java the coffee and Java the Sun trademark! In
the long run, however, they will only lengthen the already long list of
confusing acronyms and jargon in the programmer's dictionary.
We use CLI and CLR interchangeably; however, it is more correct to say that
the CLR is Microsoft's implementation of the CLI.
Apart from scripting languages like JavaScript and VBScript, the .NET
framework presently supports three compiled languages, namely VB.NET,
VC++ and C# (pronounced C Sharp). These language compilers target this
runtime. The output of a compiler that produces type-verifiable code is
called managed code.
Compilers can also generate unmanaged code, which the runtime cannot
verify. Garbage collection is handled only for managed code.
Managed code has access to Common Language Runtime (CLR) features
such as multi-language integration, exception handling across language
boundaries, security and versioning, and simplified deployment.
An interesting facility Microsoft is experimenting with is cross-language
inheritance. For example, a C# class can inherit from a VB.NET class!
Each of these features will be discussed in detail later.
The CLR provides services to managed code. The language compilers emit
metadata that describes the types, members and references in the code.
Metadata is stored along with the code: every loadable Common Language
Runtime image contains metadata.
The metadata helps the CLR locate and load classes, lay out instances in
memory, resolve method invocations, generate native code, enforce security
and set up runtime context boundaries.
The CLR, much like the Java Virtual Machine (JVM), provides automatic
garbage collection for managed code; data whose memory is reclaimed this
way is called managed data. Unlike the JVM, however, the CLR also lets code
switch off automatic garbage collection for particular data, called
unmanaged data, for which the programmer is responsible for releasing
memory.
The CLR has been designed to facilitate cross-language integration.
Two kinds of integration are possible: tightly coupled and loosely coupled,
the latter also called remoting. Tightly coupled inter-language method calls
are achieved within the CLR; this assumes that the two languages calling
each other are both .NET framework compliant, like VC++, VB.NET or C#, or
are at least COM compliant. Thus C# programs can talk to Java programs
through the JavaBeans ActiveX bridge, provided that both the C# and the
Java code reside on a single computer.
Remoting, or loosely coupled inter-language interaction, is suitable when
the two interacting programs, written in different languages, are on
different operating system (OS) platforms, like a C# client residing on
Windows CE talking to Solaris-based server-side Java code. This integration
is achieved through an XML-based protocol called the Simple Object Access
Protocol (SOAP), which was proposed by Microsoft and submitted to the W3C
consortium (http://www.w3c.org). An open source Java implementation of
SOAP is available from Apache.org at http://xml.apache.org.
SOAP content is XML formatted and independent of the transport layer.
HTTP and SMTP transport implementations are currently available from
Microsoft and Apache.org for the .NET framework and Java platforms
respectively.
All .NET framework components carry information about the components
and resources they use, in a structure called metadata. The runtime uses
this information to dynamically link the components, ensuring version
integrity and security controls. In theory, this makes applications more
resilient against version changes. Only time will tell whether this
innovation is implemented successfully.
Another good feature introduced in this new framework is reduced
dependency on the Windows system registry. Registration information and
state data are no longer stored in the system registry, but inside the
metadata. This should make server-side component deployment much easier.
The .NET framework's Common Language Runtime (CLR) claims the ability to
compile once and run on any CPU and operating system that supports the
runtime. We will see whether this becomes a real possibility in the near
future.
Common Intermediate Language (CIL)
The .NET framework's implementation of Common Intermediate Language
(CIL) is called Microsoft Intermediate Language (MSIL). Unless specified
otherwise, we will use the terms Intermediate Language (IL), MSIL and CIL
interchangeably.
Managed code is produced by one of the three compilers, which translate
the source code into Microsoft Intermediate Language (MSIL).
Common Intermediate Language (CIL), and therefore its Microsoft rendering,
Microsoft Intermediate Language (MSIL), is described as a CPU-independent
set of instructions that can be efficiently converted to native code.
The MSIL instruction set has instructions for loading, storing,
initializing and calling methods on objects, as well as many conventional
instructions for arithmetic and logical operations, control flow, direct
memory access and exception handling. All three languages included in this
framework have Java-like "try/catch" exception handling.
Just like Java, before managed code is executed, the intermediate language
is converted to CPU-specific code by a just-in-time (JIT) compiler. The
runtime supplies one or more JIT compilers for each computer architecture
it supports. Alternatively, the code can be compiled into native form
during installation itself.
When a Common Language Specification (CLS) compliant compiler produces
Common Intermediate Language (CIL), it also produces metadata describing
the Common Type System (CTS) types used in the code, including the
definition of each type, the signatures of each type's members, the
members that the code references, and other data that the runtime uses
at execution time.
The MSIL and metadata are contained in a portable executable (PE) file,
which extends the Microsoft Portable Executable (PE) format, itself derived
from the Unix world's Common Object File Format (COFF) for executable
content. They appear to the user as the familiar .EXE and .DLL files.
One of the fundamental differences between the Java Virtual Machine (JVM)
instruction set and Common Intermediate Language (CIL) is that the JVM uses
big-endian (most significant byte first) and CIL uses little-endian (least
significant byte first) binary representation. This difference will not be
apparent to most programmers; only system-level programmers will have to
deal with it.
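For those system-level programmers, the difference is easy to see with java.nio.ByteBuffer, which can encode the same 32-bit operand in either byte order. This sketch only illustrates the two encodings; it does not parse real class files or CIL.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class EndianDemo {
    // Encodes a 32-bit int in the requested byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decodes four bytes back into an int using the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int operand = 0x01020304;
        byte[] big = encode(operand, ByteOrder.BIG_ENDIAN);       // JVM class-file order
        byte[] little = encode(operand, ByteOrder.LITTLE_ENDIAN); // CIL operand order
        System.out.println(Arrays.toString(big));    // [1, 2, 3, 4]
        System.out.println(Arrays.toString(little)); // [4, 3, 2, 1]
        // Reading little-endian bytes as big-endian yields a different number.
        System.out.println(Integer.toHexString(decode(little, ByteOrder.BIG_ENDIAN))); // 4030201
    }
}
```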
The file format can accommodate either Common Intermediate Language or
native code, as well as metadata. A signature pattern enables the operating
system to recognize Common Language Runtime images.
The presence of metadata in the executable file enables the components
to be self-describing. This eliminates the need for additional type
libraries or the Interface Definition Language (IDL) used in DCOM and
CORBA. The runtime locates and extracts the metadata from the file as
necessary during execution.
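Java class files are self-describing in the same way, which is why Java programmers rarely see IDL either. As an analogy, the sketch below lists a type's public method signatures purely from the compiled class via reflection; the Calculator class is hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class MetadataDump {
    // Hypothetical component; its description lives in the compiled .class file.
    static class Calculator {
        public int add(int a, int b) { return a + b; }
        public double scale(double value, double factor) { return value * factor; }
    }

    // Lists public method signatures read from the compiled type itself, no IDL needed.
    static List<String> describe(Class<?> type) {
        List<String> lines = new ArrayList<>();
        Method[] methods = type.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName)); // reflection order is unspecified
        for (Method m : methods) {
            if (!Modifier.isPublic(m.getModifiers())) continue;
            StringBuilder sig = new StringBuilder(m.getReturnType().getSimpleName())
                .append(' ').append(m.getName()).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) sig.append(", ");
                sig.append(params[i].getSimpleName());
            }
            lines.add(sig.append(')').toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(Calculator.class).forEach(System.out::println);
        // int add(int, int)
        // double scale(double, double)
    }
}
```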
Managed Execution
There are now two kinds of code that can exist inside executable files.
Old machine-dependent code, like existing ActiveX controls, is called
unmanaged code, while code that targets the Common Language Runtime is
called managed code.
As mentioned earlier, Microsoft currently provides three compiled
languages, C#, C++ and VB, which target the Common Language Runtime
(CLR). This runtime is a multi-language execution environment and supports
a common base of data types and language features. However, the language
compiler determines what subset of the runtime's functionality is
available, and the design pattern of the code is influenced by the
features exposed by the compiler.
The coding syntax is determined by the compiler, not by
the runtime. If the component is required to be completely usable by components
written in other languages, it must use only language features that are
included in the Common Language Specification (CLS) in the component's
exported types.
Application Domains
Application domains are lightweight processes. They can be visualized as an
extension of Java's sandbox security and thread model.
The Common Language Runtime provides a secure, lightweight unit
of processing called an application domain. Application domains also enforce
security policy.
Lightweight here means that multiple application domains run in a single
Win32 process, yet they provide a kind of fault isolation: a fault in one
application domain does not corrupt other application domains. This helps
protect execution against viruses and also helps in debugging faulty code.
The Common Language Runtime relies on type safety and verifiability
features of Common Type System (CTS) to provide fault isolation between
application domains. Since type verification can be conducted statically
before execution, it is cost efficient and needs less security support
from microprocessor hardware.
Each application can have multiple application domains associated with
it, and each application domain has a configuration file containing
security permissions. The Common Language Runtime uses this configuration
information to provide sandbox security similar to the Java sandbox model.
Although multiple application domains can run within a process,
no direct calls are allowed between methods of objects in different application
domains. Instead, a proxy mechanism is used for code space isolation.
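In Java terms, such a proxy can be sketched with java.lang.reflect.Proxy: callers hold a reference to the proxy, and every call passes through an invocation handler before reaching the real object. This is an analogy, not the CLR's remoting proxy; the Greeter interface and the call log are hypothetical, and a real cross-domain proxy would also marshal the arguments.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class CrossDomainProxyDemo {
    // Hypothetical service interface exposed across a domain boundary.
    interface Greeter {
        String greet(String name);
    }

    // Calls never reach the real object directly: each one goes through the
    // handler, which is where a runtime could marshal arguments, check
    // permissions or switch execution context. Here it only records the call.
    static Greeter proxyFor(Greeter target, List<String> callLog) {
        InvocationHandler handler = (proxy, method, args) -> {
            callLog.add(method.getName());
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
            Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
    }

    public static void main(String[] args) {
        Greeter real = name -> "Hello, " + name;
        List<String> log = new ArrayList<>();
        Greeter remote = proxyFor(real, log);
        System.out.println(remote.greet("CLR")); // Hello, CLR
        System.out.println(log);                 // [greet]
    }
}
```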
Assemblies
An assembly is the functional unit of sharing and reuse in the Common
Language Runtime. It is roughly the equivalent of a Java JAR (Java Archive)
file.
An assembly is a collection of physical files, which can be packaged for
distribution in .CAB format or the newly introduced .MSI format. Assemblies
stored on disk in this way are called static assemblies; they include .NET
Framework types (interfaces and classes) as well as resources for the
assembly (bitmaps, JPEG files, resource files, etc.). They also include
metadata that eliminates the need for the IDL file descriptors that were
required to describe COM components.
The Common Language Runtime also provides APIs that script engines use
to create dynamic assemblies when executing scripts. These assemblies are
run directly and are never saved to disk.
Microsoft has greatly diminished the role of the Windows Registry with the
introduction of the assembly concept.
Assemblies are an adaptation, but not a copy, of Java's JAR deployment
technology. They have been improved in some ways; for example, they
introduce a versioning system. However, since the .NET framework is skewed
towards the Windows architecture, some of the JAR's portability features
may have been sacrificed.
Again, like JAR files, assemblies contain an entity called a manifest.
However, the manifest plays a somewhat wider role in the .NET framework.
The manifest is metadata describing the interrelationships between the
entities contained in the assembly, like managed code, images and
multimedia resources. The manifest also specifies versioning information.
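On the Java side of the analogy, versioning information lives in the JAR manifest's main attributes. The sketch below uses java.util.jar.Manifest to write and read back Implementation-Title and Implementation-Version; the title and version values are hypothetical, and the .NET assembly manifest itself is not a MANIFEST.MF file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class ManifestDemo {
    // Builds a manifest whose main attributes carry a title and a version.
    static Manifest build(String title, String version) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0"); // needed for a valid manifest
        main.put(Attributes.Name.IMPLEMENTATION_TITLE, title);
        main.put(Attributes.Name.IMPLEMENTATION_VERSION, version);
        return manifest;
    }

    // Writes the manifest in META-INF/MANIFEST.MF syntax and parses it back.
    static Manifest roundTrip(Manifest manifest) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        manifest.write(out);
        return new Manifest(new ByteArrayInputStream(out.toByteArray()));
    }

    public static void main(String[] args) throws IOException {
        Manifest read = roundTrip(build("Quote Service", "1.2.0"));
        Attributes main = read.getMainAttributes();
        System.out.println(main.getValue(Attributes.Name.IMPLEMENTATION_TITLE));   // Quote Service
        System.out.println(main.getValue(Attributes.Name.IMPLEMENTATION_VERSION)); // 1.2.0
    }
}
```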
The manifest is essentially a deployment descriptor. Java programmers can
relate it to the J2EE (Java 2 Enterprise Edition) deployment descriptors
for EJB (Enterprise JavaBeans) applications.
The Microsoft documentation stresses that assemblies are "logical DLLs".
This may be a reasonable paradigm for VB or C++ programmers, but Java
programmers will find it easier to visualize assemblies as an extension of
the JAR concept. However, unlike a JAR, each assembly can have only one
entry point defined, which can be DllMain, WinMain or Main.
As stated earlier, assemblies have manifest metadata. This contains
version information and a digital signature, which are intended to
implement version control and authentication of the software developer.
Version checking and authentication are carried out by the runtime while
loading the assembly into the code execution area.
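The load-time check can be sketched as follows, under simplifying assumptions: the manifest records an exact version and a hash of each module, and the loader rejects a module whose version does not match or whose bytes no longer hash to the recorded value. SHA-256 and the ManifestEntry class are illustrative choices, not the CLR's format, and a real loader would also verify the publisher's signature over the manifest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class LoadTimeCheck {
    // Hypothetical manifest entry: version and content hash recorded at build time.
    static final class ManifestEntry {
        final String version;
        final byte[] digest;

        ManifestEntry(String version, byte[] digest) {
            this.version = version;
            this.digest = digest.clone();
        }
    }

    static byte[] sha256(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // Accepts a module only if the requested version matches exactly and the
    // bytes still hash to the value recorded in the manifest.
    static boolean accept(ManifestEntry recorded, String requestedVersion, byte[] moduleBytes) {
        if (!recorded.version.equals(requestedVersion)) return false;
        return MessageDigest.isEqual(recorded.digest, sha256(moduleBytes));
    }

    public static void main(String[] args) {
        byte[] module = "pretend this is IL".getBytes(StandardCharsets.UTF_8);
        ManifestEntry entry = new ManifestEntry("1.0.0.0", sha256(module));
        byte[] tampered = "pretend this is IL!".getBytes(StandardCharsets.UTF_8);

        System.out.println(accept(entry, "1.0.0.0", module));   // true
        System.out.println(accept(entry, "1.0.0.0", tampered)); // false: content changed
        System.out.println(accept(entry, "2.0.0.0", module));   // false: version mismatch
    }
}
```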
Again, much like Java's trusted library concept, .NET assemblies can be
placed in a secured area called the global assembly cache. This area is
equivalent to Java's trusted class path. Only system administrators can
install or uninstall assemblies in the global assembly cache. There is a
place for downloaded or transient assemblies called the downloaded assembly
cache. Assemblies loaded from the global assembly cache run outside the
sandbox and have faster load times, as well as more freedom to access file
system resources. Assemblies loaded from the downloaded cache area are
subject to more security checks and are therefore slower to load; since
they run inside the sandbox, they enjoy far fewer privileges.
Assembly manifests also contain information regarding the sharing of code
by different applications and application domains.
To summarize, the operating system can have multiple applications running
simultaneously; each such application occupies a separate Win32 process
and can contain multiple application domains. An application domain can
be constructed from multiple assemblies.
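The trust decision described above can be reduced to a small policy function. In this hypothetical Java sketch, code from the global assembly cache receives every permission and downloaded code receives only permission to execute; the Origin and Permission enums are invented for illustration and are not the .NET security classes.

```java
import java.util.EnumSet;
import java.util.Set;

final class CacheTrustPolicy {
    // Where an assembly was loaded from, per the description above.
    enum Origin { GLOBAL_ASSEMBLY_CACHE, DOWNLOADED_ASSEMBLY_CACHE }

    // Hypothetical permission names, not the .NET permission classes.
    enum Permission { EXECUTE, READ_FILES, WRITE_FILES, OPEN_NETWORK }

    // Trusted cache: everything, outside the sandbox. Download cache: sandboxed minimum.
    static Set<Permission> grant(Origin origin) {
        switch (origin) {
            case GLOBAL_ASSEMBLY_CACHE:
                return EnumSet.allOf(Permission.class);
            case DOWNLOADED_ASSEMBLY_CACHE:
                return EnumSet.of(Permission.EXECUTE);
            default:
                throw new IllegalArgumentException("unknown origin: " + origin);
        }
    }

    // Throws if code from the given origin tries an operation it was not granted.
    static void checkPermission(Origin origin, Permission needed) {
        if (!grant(origin).contains(needed)) {
            throw new SecurityException(needed + " denied for code from " + origin);
        }
    }

    public static void main(String[] args) {
        checkPermission(Origin.GLOBAL_ASSEMBLY_CACHE, Permission.WRITE_FILES); // allowed
        try {
            checkPermission(Origin.DOWNLOADED_ASSEMBLY_CACHE, Permission.WRITE_FILES);
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
            // WRITE_FILES denied for code from DOWNLOADED_ASSEMBLY_CACHE
        }
    }
}
```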
Execution
The Common Language Runtime provides the infrastructure that enables execution
to take place as well as a variety of services that can be used during
execution. Before a method can be executed, it must be compiled to processor
specific code. Each method for which MSIL has been generated is JIT compiled
when it is called for the first time, then executed. The next time the
method is executed, the existing JIT compiled native code is executed.
The process of JIT compiling and then executing the code is repeated until
execution is complete.
As mentioned earlier, the recompilation can be avoided by compiling
the code during installation into native executable code.
During execution, managed code receives services such as automatic memory
management, security, interoperability with unmanaged code, cross language
debugging support, and enhanced deployment and versioning support.
JIT Compilation
Before Intermediate Language (IL) can be executed, it must be converted
by a .NET Framework Just In Time (JIT) compiler to native code, which is
CPU specific code that runs on the same computer architecture that the
JIT compiler is running on.
Microsoft's designers insist that the runtime never interprets any
language; it always executes native code, and only the conversion to
native form may be deferred. Even scripting languages like VBScript are
now compiled and executed!
The idea behind JIT compilation recognizes the fact that some code may
never get called during execution. Therefore, rather than using time and
memory to convert all of the MSIL in a PE (portable executable) file to
native code, the runtime converts the Intermediate Language as it is needed
during execution and stores the resulting native code so that it is
accessible for subsequent calls.
The loader creates and attaches a stub to each of the type's methods
when the type is loaded; on the initial call to the method, the stub passes
control to the JIT compiler, which converts the MSIL for that method into
native code and modifies the stub to direct execution to the location of
the native code. Subsequent calls of the JIT compiled method proceed directly
to the native code that was previously generated, reducing the time it
takes to JIT compile and execute the code.
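The stub-patching idea can be simulated in a few lines of Java. In this hypothetical sketch a call site first points at a stub. The stub "compiles" the method (here it just counts the compilation and reuses the Java lambda), patches the call site and forwards the call, so later calls bypass the stub entirely. No native code is generated.

```java
import java.util.function.IntUnaryOperator;

final class JitStubDemo {
    // A call site that initially points at a stub. On the first call the stub
    // "compiles" the method body, patches the call site and forwards the call.
    static final class CallSite {
        private IntUnaryOperator target;
        private int compilations = 0;

        CallSite(IntUnaryOperator body) {
            this.target = arg -> {
                compilations++;                   // stand-in for the JIT compiler running
                IntUnaryOperator compiled = body; // a real JIT would emit native code here
                target = compiled;                // patch: later calls skip the stub
                return compiled.applyAsInt(arg);
            };
        }

        int invoke(int arg) {
            return target.applyAsInt(arg);
        }

        int compilations() {
            return compilations;
        }
    }

    public static void main(String[] args) {
        CallSite square = new CallSite(x -> x * x);
        System.out.println(square.invoke(3));      // 9  (stub runs and patches the site)
        System.out.println(square.invoke(4));      // 16 (goes straight to the compiled body)
        System.out.println(square.compilations()); // 1
    }
}
```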
The compilation process (JIT or at installation time) converts the
Intermediate Language (IL) to native code. The code, however, must pass
a verification process. Verification examines the IL and metadata to see
whether the code is type safe, that is, whether it accesses only authorized
memory locations, whether identities are what they claim to be, and whether
references to a type are compatible with the type referenced. These
features protect the application from bugs and viruses.
During the verification process, Intermediate Language (IL) code is
examined in an attempt to confirm that the code can access memory locations
and call methods only through properly defined types.
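A verifier of this kind works by simulating the evaluation stack with types instead of values. The Java sketch below does that for a hypothetical four-instruction, stack-based language loosely modelled on IL. Real IL verification also handles branches, locals, method signatures and object references, none of which are modelled here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

final class ToyVerifier {
    enum Type { INT32, STRING }

    // Hypothetical instruction set: what each op pops and pushes, by type.
    enum Op {
        LDC_INT(List.of(), Type.INT32),                   // push an int constant
        LDSTR(List.of(), Type.STRING),                    // push a string reference
        ADD(List.of(Type.INT32, Type.INT32), Type.INT32), // pop two ints, push an int
        PRINT(List.of(Type.STRING), null);                // pop a string, push nothing

        final List<Type> pops; // operand types; the last element is the top of the stack
        final Type pushes;     // result type, or null if nothing is pushed

        Op(List<Type> pops, Type pushes) {
            this.pops = pops;
            this.pushes = pushes;
        }
    }

    // Returns an error description, or empty if the sequence is type safe.
    static Optional<String> verify(List<Op> code) {
        Deque<Type> stack = new ArrayDeque<>();
        for (int pc = 0; pc < code.size(); pc++) {
            Op op = code.get(pc);
            // Pop operands from the top of the stack, checking each type.
            for (int i = op.pops.size() - 1; i >= 0; i--) {
                if (stack.isEmpty()) {
                    return Optional.of("stack underflow at " + pc + " (" + op + ")");
                }
                Type actual = stack.pop();
                Type expected = op.pops.get(i);
                if (actual != expected) {
                    return Optional.of(op + " at " + pc + " expected " + expected + " but found " + actual);
                }
            }
            if (op.pushes != null) {
                stack.push(op.pushes);
            }
        }
        return Optional.empty(); // every instruction saw the operand types it needs
    }

    public static void main(String[] args) {
        List<Op> ok = List.of(Op.LDC_INT, Op.LDC_INT, Op.ADD);
        List<Op> bad = List.of(Op.LDSTR, Op.LDC_INT, Op.ADD);
        System.out.println(verify(ok).isPresent()); // false: type safe
        System.out.println(verify(bad).get());      // ADD at 2 expected INT32 but found STRING
    }
}
```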
Due to design limitations of some programming languages, like C, their
compilers may not be able to produce verifiably type-safe code; such code
can only be executed from a trusted area.
Runtime Hosts
The runtime is typically started and managed by environments like ASP.NET,
IE or the Windows Shell. These hosting environments run managed code on
behalf of the user and take advantage of the application isolation features
provided by application domains. In fact, it is the host that determines
where the application domain boundaries lie and in which application domain
user code runs. The Common Language Runtime provides a set of classes
and interfaces used by hosts to create and manage application domains.
There are five Common Language Runtime hosts:
ASP.NET - ASP.NET creates application domains to run user code. Application
domains are created per application as defined by the web server.
Internet Explorer - IE creates an application domain per site.
Windows Shell EXE - Each application that is launched from the command
line runs in a separate application domain.
VBA - VBA runs the script code contained in an Office document in an
application domain.
Windows Forms Designer - The Windows Forms Designer places each form
the user is building in a separate application domain. When the user edits
the form and rebuilds, Windows Forms shuts down the old application domain,
recompiles the code and runs it in a new application domain.
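The per-site policy of the Internet Explorer host can be sketched as a map from site to domain: pages from the same host share a domain, and a new host gets a new one. In this hypothetical Java sketch, Domain is a stand-in object rather than a real application domain, and the page URLs are illustrative.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class PerSiteHost {
    // Hypothetical stand-in for an application domain: just an id and a site name.
    static final class Domain {
        final int id;
        final String site;

        Domain(int id, String site) {
            this.id = id;
            this.site = site;
        }

        @Override
        public String toString() {
            return "domain#" + id + "(" + site + ")";
        }
    }

    private final Map<String, Domain> domainsBySite = new HashMap<>();
    private int nextId = 1;

    // Every page from the same site shares one domain; a new site gets a new domain.
    Domain domainFor(String pageUrl) {
        String site = URI.create(pageUrl).getHost();
        return domainsBySite.computeIfAbsent(site, s -> new Domain(nextId++, s));
    }

    int domainCount() {
        return domainsBySite.size();
    }

    public static void main(String[] args) {
        PerSiteHost browser = new PerSiteHost();
        System.out.println(browser.domainFor("http://www.ecma.ch/index.html"));    // domain#1(www.ecma.ch)
        System.out.println(browser.domainFor("http://www.ecma.ch/standards.htm")); // domain#1(www.ecma.ch)
        System.out.println(browser.domainFor("http://www.w3c.org/"));              // domain#2(www.w3c.org)
    }
}
```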
Conclusion
.NET is definitely an improvement over the Java framework, but it is NOT
going to displace Java any time soon, though in the coming years Java and
.NET will converge.
It currently lacks support for other platforms. Since .NET has been
architected by Microsoft, it is less likely to attract the open source
support base of free-thinking programmers, which was one of the main
reasons for Java's popularity.
Java has been around for more than five years now, and Java programmers
have already survived two downturns: first in 1998, when most web sites
weeded out applets, and second in late 2000, when all the VC-fueled
DOTCOM hot-air balloons came down. Scott Adams' Dilbert strips at
http://www.dilbert.com have plenty of VC and DOTCOM cartoons.
All Java programmers who are still employed must get a good handle on the
.NET architecture to remain employable.
The party is over for DOTCOM, so let's party with DOTNET!!!