C# Collections – What is a Collection?
Admittedly, we blew it in the first version of the framework with System.Collections.ICollection, which is next to useless. But we fixed it up pretty well when generics came along in .NET Framework 2.0: System.Collections.Generic.ICollection<T> lets you Add and Remove elements, enumerate them, Count them and check for membership.
Obviously from then on, everyone would implement ICollection<T> every time they make a collection, right? Not so. Here is how we used LINQ to learn about what collections really are, and how that made us change our language design in C# 3.0.
Collection initializers
With LINQ, the Language INtegrated Query framework that we're shipping in Orcas, we're enabling a more expression-oriented style of programming. For instance, it should be possible to create and initialize an object within one expression. For collections, initialization typically amounts to adding an initial set of elements. Hence collection initializers in C# 3.0 look like this:
new MyNames { "Luke Hoban", "Karen Liu", "Charlie Calvert" }
The meaning of this new syntax is simply to create an instance of MyNames using its no-arg constructor (constructor arguments can be supplied if necessary) and call its Add method with each of the strings.
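To make that expansion concrete, here is a minimal sketch. MyNames is not defined in this post, so List<string> stands in for it as an assumption, and the InitializerDemo class and its method names are made up for illustration. The second method shows roughly what the initializer in the first one amounts to, not the compiler's literal output.

```csharp
using System;
using System.Collections.Generic;

public static class InitializerDemo
{
    // Built with a collection initializer (List<string> stands in for MyNames).
    public static List<string> WithInitializer()
    {
        return new List<string> { "Luke Hoban", "Karen Liu", "Charlie Calvert" };
    }

    // Roughly the equivalent long form: construct, then call Add once per element.
    public static List<string> Expanded()
    {
        List<string> temp = new List<string>();
        temp.Add("Luke Hoban");
        temp.Add("Karen Liu");
        temp.Add("Charlie Calvert");
        return temp;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithInitializer().ToArray()));
        Console.WriteLine(string.Join(", ", Expanded().ToArray()));
    }
}
```

Both methods should produce lists with the same three names in the same order.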
So what types do we allow collection initializers on? Easy: collection types. What are those? Obvious: types that implement ICollection<T>. This is a nice and easy design – ICollection<T> ensures that you have an Add method so obviously that is the one that gets called for each element in the collection initializer. It is strongly typed, too – the initializer can contain only elements of the appropriate element type. In the above new expression, MyNames would be a class that implements ICollection<string> and everything works smoothly from there.
There's just one problem: Nobody implements ICollection<T>!
LINQ to LINQ
Well, nobody is a strong word. But we did an extensive study of our own framework classes, and found only a few that did. How? Using LINQ of course. The following query does the trick:
from name in assemblyNames
select Assembly.LoadWithPartialName(name) into a
from c in a.GetTypes()
where c.IsPublic &&
c.GetConstructors().Any(m => m.IsPublic) &&
GetInterfaceTypes(c).Contains(typeof(ICollection<>))
select c.FullName;
Let's go through this query a little bit and see what it does. For each name in a list of assemblyNames that we pre-baked for the purpose, load up the corresponding assembly:
from name in assemblyNames
select Assembly.LoadWithPartialName(name)
One at a time, put the reflection objects representing these assemblies into a, and for each assembly a run through the types c defined in there:
from c in a.GetTypes()
Filter through, keeping each type only if it
a) IsPublic
b) has Any constructor that IsPublic
c) implements ICollection<T> for some T:
where c.IsPublic &&
c.GetConstructors().Any(m => m.IsPublic) &&
GetInterfaceTypes(c).Contains(typeof(ICollection<>))
For those that pass this test, select out their full name:
select c.FullName;
Nothing to it, really.
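If you want to try something like this yourself, here is a self-contained sketch. The post does not show the GetInterfaceTypes helper, so the version below is a guess at what it might do (return the open generic definitions of the interfaces a type implements). The CollectionSurvey class and FindGenericCollections are hypothetical names, and the sketch takes already-loaded Assembly objects rather than loading them by name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class CollectionSurvey
{
    // Assumed stand-in for the post's GetInterfaceTypes helper: the open generic
    // definition (e.g. ICollection<>) of every generic interface that c implements.
    public static IEnumerable<Type> GetInterfaceTypes(Type c)
    {
        return c.GetInterfaces()
                .Where(i => i.IsGenericType)
                .Select(i => i.GetGenericTypeDefinition());
    }

    // Same filter as the query in the post, run over assemblies that are already loaded.
    public static IEnumerable<string> FindGenericCollections(IEnumerable<Assembly> assemblies)
    {
        return from a in assemblies
               from c in a.GetTypes()
               where c.IsPublic &&
                     c.GetConstructors().Any(m => m.IsPublic) &&
                     GetInterfaceTypes(c).Contains(typeof(ICollection<>))
               select c.FullName;
    }

    public static void Main()
    {
        // Survey only the assembly that defines List<T>; add more assemblies as needed.
        Assembly[] assemblies = { typeof(List<>).Assembly };
        foreach (string name in FindGenericCollections(assemblies))
            Console.WriteLine(name);
    }
}
```

The counts you get will depend on which framework version and which assemblies you point it at, so don't expect them to match the numbers quoted in this post.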
What is a collection?
What did we find then? Only 14 of our own (public) classes (with public constructors) implement ICollection<T>! Obviously there are a lot more collections in the framework, so it was clear that we needed some other way of telling whether something is a collection class. LINQ to the rescue once more: With modified versions of the query it was easy to establish that among our public classes with public constructors there are:
- 189 that have a public Add method and implement System.Collections.IEnumerable
- 42 that have a public Add method but do not implement System.Collections.IEnumerable
If you look at the classes returned by these two queries, you realize that there are essentially two fundamentally different meanings of the name "Add":
a) Insert the argument into a collection, or
b) Return the arithmetic sum of the argument and the receiver.
People are actually very good at (directly or indirectly) implementing the nongeneric IEnumerable interface when writing collection classes, so that turns out to be a pretty reliable indicator of whether an Add method is the first or the second kind. Thus for our purposes the operational answer to the headline question becomes:
A collection is a type that implements IEnumerable and has a public Add method
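Here is a small sketch of that operational definition. NameBag is a made-up class that implements only the nongeneric IEnumerable and has a public Add, and IsCollectionLike is a hypothetical reflection check written for this illustration; it is not the test the compiler itself performs.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection: no ICollection<T>, just IEnumerable plus a public Add.
public class NameBag : IEnumerable
{
    private readonly List<string> names = new List<string>();

    public void Add(string name) { names.Add(name); }

    public int Count { get { return names.Count; } }

    public IEnumerator GetEnumerator() { return names.GetEnumerator(); }
}

public static class NameBagDemo
{
    // Reflection version of the rule: implements IEnumerable and has a public Add method.
    public static bool IsCollectionLike(Type t)
    {
        return typeof(IEnumerable).IsAssignableFrom(t) &&
               t.GetMethods().Any(m => m.Name == "Add");
    }

    // The collection initializer works because NameBag fits the pattern.
    public static NameBag Build()
    {
        return new NameBag { "Luke Hoban", "Karen Liu" };
    }

    public static void Main()
    {
        foreach (object name in Build())
            Console.WriteLine(name);

        Console.WriteLine(IsCollectionLike(typeof(NameBag)));  // collection-style Add
        Console.WriteLine(IsCollectionLike(typeof(decimal)));  // arithmetic Add, not enumerable
    }
}
```

System.Decimal is a handy example of the second kind of Add: it has a public Add method that returns a sum, but it does not implement IEnumerable, so the check rejects it.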
Which Add to call?
We ain't done yet, though. Further LINQ queries over the 189 collection types identified above show:
- 28 collection types have more than one Add method
- 30 collection types have no Add method with just one argument
So, given that our collection initializers are supposed to call "the" Add method, which one should they call? It seems that there will be some value in collection initializers allowing you to:
a) choose which overload to call
b) call Add methods with more than one argument
Our resolution to this is to refine our understanding of collection initializers a little bit. The list you provide is not a "list of elements to add", but a "list of sets of arguments to Add methods". If an entry in the list consists of multiple arguments to an Add method, these are enclosed in { curly braces }. This is actually immensely useful. For example, it allows you to Add key/value pairs to a dictionary, something we have had a number of requests for as a separate feature.
The initializer list does not have to be homogeneous; we do separate overload resolution against Add methods for each entry in the list.
So given a collection class
public class Plurals : IDictionary<string,string> {
    public void Add(string singular, string plural); // implements IDictionary<string,string>.Add
    public void Add(string singular); // appends an "s" to the singular form
    public void Add(KeyValuePair<string,string> pair); // implements ICollection<KeyValuePair<string,string>>.Add
    ...
}
We can write the following collection initializer:
Plurals myPlurals = new Plurals{ "collection", { "query", "queries" }, new KeyValuePair<string,string>("child", "children") };
which would make use of all the different Add methods on our collection class.
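For a version you can compile, here is a simplified sketch. To keep it short, this Plurals implements only IEnumerable<KeyValuePair<string,string>> over a private Dictionary instead of the full IDictionary<string,string> from the outline above. That simplification, the indexer, and the PluralsDemo class are assumptions made for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Simplified stand-in for the Plurals class: enumerable, with three Add overloads.
public class Plurals : IEnumerable<KeyValuePair<string, string>>
{
    private readonly Dictionary<string, string> map = new Dictionary<string, string>();

    public void Add(string singular, string plural) { map.Add(singular, plural); }

    // Appends an "s" to the singular form.
    public void Add(string singular) { map.Add(singular, singular + "s"); }

    public void Add(KeyValuePair<string, string> pair) { map.Add(pair.Key, pair.Value); }

    public string this[string singular] { get { return map[singular]; } }

    public IEnumerator<KeyValuePair<string, string>> GetEnumerator() { return map.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class PluralsDemo
{
    // Each entry is resolved separately against the Add overloads.
    public static Plurals Build()
    {
        return new Plurals
        {
            "collection",                                          // Add(string)
            { "query", "queries" },                                // Add(string, string)
            new KeyValuePair<string, string>("child", "children")  // Add(KeyValuePair<string,string>)
        };
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, string> p in Build())
            Console.WriteLine(p.Key + " -> " + p.Value);
    }
}
```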
Is this right?
The resulting language design is a "pattern based" approach. We rely on users using a particular name for their methods in a way that is not checked by the compiler when they write it. If they go and change the name of Add to AddPair in one assembly, the compiler won't complain about that, but instead about a collection initializer sitting somewhere else suddenly missing an overload to call.
Here I think it is instructive to look at our history. We already have pattern-based syntax in C# – the foreach pattern. Though not everybody realizes it, you can actually write a class that does not implement IEnumerable and have foreach work over it, as long as it contains a GetEnumerator method. What happens though is that people overwhelmingly choose to have the compiler help them keep it right by implementing the IEnumerable interface. In the same way we fully expect people to recognize the additional benefit of implementing ICollection<T> in the future: not only can your collection be initialized, but the compiler checks it too. So while we are currently in a situation where very few classes implement ICollection<T>, this is likely to change over time, and with the new tweaks to our collection initializer design we hope to have ensured that the feature adds value both now and in that future.
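In case the foreach pattern is news to you, here is a minimal sketch. Countdown and its nested Enumerator are made-up types that implement no interfaces at all; foreach works because the compiler finds a public GetEnumerator method whose return type has MoveNext and Current.

```csharp
using System;

// Hypothetical type: does not implement IEnumerable, yet foreach accepts it.
public class Countdown
{
    private readonly int start;

    public Countdown(int start) { this.start = start; }

    // foreach looks for a public GetEnumerator method by name.
    public Enumerator GetEnumerator() { return new Enumerator(start); }

    public class Enumerator
    {
        private int next;
        private int current;

        public Enumerator(int start) { next = start; }

        public int Current { get { return current; } }

        // Yields start, start - 1, ..., 1 and then stops.
        public bool MoveNext()
        {
            if (next <= 0) return false;
            current = next;
            next--;
            return true;
        }
    }
}

public static class CountdownDemo
{
    public static int Sum(int start)
    {
        int total = 0;
        foreach (int n in new Countdown(start))
            total += n;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(3)); // 3 + 2 + 1 = 6
    }
}
```

Rename GetEnumerator here and the compiler complains at the foreach, not at the class, which is the same trade-off described above for Add.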




04. Sep, 2007 by 






