
String manipulation, string constructors, string assignment, and the StringBuilder class
By Bryan Miller

Why is string manipulation so important? Because, as programmers, we do so much of it. In fact, on some days, it seems that everything is string manipulation. If you know how to program with strings, you're a great way along the journey to becoming a powerful programmer. Fortunately, C# provides facilities for working with and manipulating strings. In this tutorial we will explain the types and methods related to strings, so that you can make short work of most common string-manipulation tasks in your projects.

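This tutorial covers the following topics: understanding C# strings, working with the char type, control characters, string assignment, the String constructors, verbatim strings, the String methods, and the StringBuilder class.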

Understanding C# Strings

In C#, a text string is stored in a data type named string, which is an alias for the System.String type. In other words, when you create a string, you instantiate a System.String object. In addition to its instance members, the System.String class has quite a few important static methods. These methods don't require a string instance to work.

It's important to understand that the string type is immutable. This means that once it has been created, a string cannot be changed. No characters can be added or removed from it, nor can its length be changed. In those situations in which it appears that a string's contents have been changed, what has really happened is that a new string instance has been created.

But wait, you say, my strings change all the time when I do things like:


string str = "I am having a great deal of fun";
str += " on the DonationCoder website... ";
str += " The End"; 

Well, my friend, what this set of statements actually does is create a new string instance each time a value is assigned to str. The variable str stays the same variable, but after each += it refers to a different, newly created string object, and the old one is left behind for the garbage collector. In other words, in this example str has referred to three different strings in turn. The statement that the type is immutable means that an instance cannot be edited, but a string variable can always be assigned a new instance.
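
To see this for yourself, here is a minimal sketch (the class and method names are my own, chosen just for illustration) that keeps a second reference to the original string and then checks what happened after the += statement:

```csharp
using System;

static class ImmutabilityDemo {
    // Returns true when appending to a string variable leaves the
    // originally referenced string object untouched.
    public static bool AppendLeavesOriginalUntouched() {
        string str = "I am having a great deal of fun";
        string before = str;   // second reference to the same object
        str += " The End";     // str now refers to a brand-new string
        return before == "I am having a great deal of fun"
            && !Object.ReferenceEquals(before, str);
    }

    static void Main() {
        Console.WriteLine(AppendLeavesOriginalUntouched()); // prints True
    }
}
```

The variable named before still sees the old text, which shows that the += did not edit the original instance.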

Using an instance method without an assignment is syntactically legal, but it doesn't do anything useful: the method returns a new string, and that return value is simply discarded. As a demonstration of my claim that an instance cannot be edited, examine the following code:


using System;

namespace ConsoleApplication1 {
    class Program {
        static void Main(string[] args) {
            string str = "Mouser";
            str.Replace("u", "U");
            Console.WriteLine(str);
            Console.ReadLine();
        }
    }
}

The above code does not change the contents of the variable str, because an instance cannot be edited. Run the above code in your IDE, and you'll see that it still prints Mouser.

However, combining it with an assignment would work:


using System;

namespace ConsoleApplication1 {
    class Program {
        static void Main(string[] args) {
            string str = "Mouser";
            str = str.Replace("u", "U");
            Console.WriteLine(str);
            Console.ReadLine();
        }
    }
}

The value of the newly assigned str instance is, of course, MoUser. If you carefully compare the two source code listings above, you'll see that the only difference is in the line that calls Replace(). The second listing prefaces that line with str =, which assigns the string returned by Replace() to the variable str. You may remember, from a previous tutorial in this series, that a type is defined by a class. There is, indeed, a String class. So, when I write string str = "Bryan";, what I've really done is declare a variable called str of type String and make it refer to a String instance holding the text Bryan.

Is there any real problem with this use of assignment in conjunction with strings? Well, no, it's easy enough to use assignments when you need to change the value of a string variable. But instantiating and destroying strings every time you need to change or edit one has negative performance consequences. If performance is an issue, you should consider working with instances of the System.Text.StringBuilder class, which are mutable (can be changed). The StringBuilder class is discussed later in this tutorial in a little more detail.

Once created, a string has a specific length, which, of course, is fixed (since the string type is immutable). Unlike in the C language, in which a string is simply an array of characters terminated with a zero byte, a C# string is not a terminated array of characters. So, how does one determine the length of a C# string? As you probably know, you can query the read-only Length property of the string instance to find its length. Let's write a short console program demonstrating the use of this property:


using System;

namespace ConsoleApplication1 {
    class Program {
        static void Main(string[] args) {
            string name = string.Empty;
            do {
                Console.Write("Please enter a name (enter \"quit\" to exit program): ");
                name = Console.ReadLine();
                if (name.Trim().ToLower() != "quit") {
                    Console.WriteLine("\"{0}\", is {1} characters in length.\n", name, name.Length);
                }
            } while (name.Trim().ToLower() != "quit");
        }
    }
}


Working with the Char type

Char is a value type that holds a single 16-bit Unicode character (a UTF-16 code unit). Chars have a numeric value from hexadecimal 0x0000 through 0xFFFF. A char converts implicitly to int and the other wider numeric types, but converting a number back to a char requires an explicit cast.

Literal values are assigned to char variables using single quotes. The literal can be a simple, single letter, or an escape sequence (for more on escape sequences, see the next section). Here are two examples:


char chr = 'P'; // contains capital P
char pi = '\u03A0'; // contains capital Pi 
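
To make the relationship between a char and its numeric value concrete, here is a small sketch. The helper names CodeOf and CharOf are hypothetical, made up for this example:

```csharp
using System;

static class CharConversionDemo {
    // char -> int is an implicit conversion.
    public static int CodeOf(char c) {
        return c;
    }

    // int -> char needs an explicit cast.
    public static char CharOf(int code) {
        return (char)code;
    }

    static void Main() {
        Console.WriteLine(CodeOf('P'));      // prints 80
        Console.WriteLine(CharOf(0x50));     // prints P
        Console.WriteLine((int)'\u03A0');    // prints 928
    }
}
```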

The char data type is related to the string data type: A string can be constructed from (or converted into) an array of characters but string and char are two distinct types.

Suppose you have a string defined like this:

string str = "rotfl";

You can retrieve a character of the string using the string's indexer. For example, the following statement

char chr = str[1];

stores the character 'o' in the variable chr. But you cannot set a character in the string with a statement like str[1] = 'a'; because the indexer property of the String class is read-only. (Actually, one would expect this in any case, because the string instance is immutable.)

If you use the Object Browser to have a look at the String class, you'll see that its indexer property, declared as this[int], has a get accessor, but no set accessor.
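
Since the indexer cannot be assigned to, one common workaround is to copy the characters into an array, change the array, and build a new string from it. The following is only a sketch of that idea, and the helper name WithCharAt is my own invention:

```csharp
using System;

static class ReplaceCharDemo {
    // Returns a new string equal to source, with the character at index replaced.
    public static string WithCharAt(string source, int index, char replacement) {
        char[] chars = source.ToCharArray(); // mutable copy of the characters
        chars[index] = replacement;
        return new string(chars);
    }

    static void Main() {
        string str = "rotfl";
        Console.WriteLine(str[1]);                   // prints o
        Console.WriteLine(WithCharAt(str, 1, 'a'));  // prints ratfl
        Console.WriteLine(str);                      // prints rotfl (the original is unchanged)
    }
}
```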




Control Characters

The backslash (\) is a special control character, also called the escape character. The character after the backslash has a special significance. See the following table for the meaning of each escape sequence:

Character    Meaning
\0           Null
\b           Backspace
\t           Tab
\r           Carriage return
\n           New line
\v           Vertical tab
\f           Form feed
\u, \x       A single Unicode character when followed by a hexadecimal number (exactly four digits for \u, one to four digits for \x)
\"           Double quote
\'           Single quote
\\           Backslash

Here is a very brief program that demonstrates use of the backspace escape character:


using System;

class Program{

    static void Main(){
        
        Console.Write("This is line #1\b\b");
        Console.Write("number 2");
        Console.ReadLine();
        
    }//end Main

}//end class Program

In a typical console window, the program's output is: This is line number 2

As you can see, there are two backspace escape characters at the end of the string passed to the first Console.Write() call. These have the effect of backing the cursor up onto the # symbol. The subsequent Console.Write() invocation then prints "number 2", starting at the position occupied by the # symbol, and thus effectively overwriting #1.

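The escape sequences \u and \x are both used to specify a single Unicode character. \u must be followed by exactly four hexadecimal digits, while \x accepts one to four, so the two are interchangeable only when you write out all four digits. For example, \x03A9 and \u03A9 both mean the Greek letter capital omega. The escape sequences \x03A3 and \u03A3 will appear as the Greek letter capital sigma: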


using System;

class Program{

    static void Main(){

        Console.Write("This is Omega: {0}.  And this is Sigma: {1} ",
            "\u03A9", "\u03A3");
        
        Console.ReadLine();

    }//end Main

}//end class Program

The above program prints the omega and sigma characters, provided the console's font and output encoding can display them.

A complete set of Unicode character code charts can be found at http://www.unicode.org/charts/.
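
The difference between \x and \u matters when the text that follows the escape happens to look like hexadecimal digits. Here is a small sketch of that pitfall (the class and method names are just for illustration):

```csharp
using System;

static class HexEscapeDemo {
    // \x consumes up to four hex digits, so "9Bad" is read as one character, U+9BAD.
    public static string GreedyX() { return "\x9Bad"; }

    // \u always takes exactly four digits: a tab character followed by "Bad".
    public static string FixedU() { return "\u0009Bad"; }

    static void Main() {
        Console.WriteLine(GreedyX().Length);       // prints 1
        Console.WriteLine(FixedU().Length);        // prints 4
        Console.WriteLine("\x03A9" == "\u03A9");   // prints True
    }
}
```

For this reason, \u is usually the safer choice inside longer string literals.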




String Assignment

There are different ways to create strings in C#. You can, for instance, declare a string variable, without initializing it, such as in this example:

string mystring;

While this is legal, we've also discussed the fact that attempting to use an uninitialized local string variable causes a compiler error. For this reason, it's generally a good idea to initialize a string variable when you declare it:

string mystring = "Hello, world!";

Because a string variable is a reference type, it can be initialized to null:

string mystring = null;

And initializing a string to null is not the same thing as initializing it to an empty string. The above is not equivalent to this:

string mystring = string.Empty;

In the case of the null assignment, no memory has been allocated for the string. An attempt to determine the length of a null string by using its Length property causes a NullReferenceException to be thrown. For an example of this, see the following program listing. The program will compile okay, because there is no syntax error. However, attempting to execute the line that reads mystring.Length causes a runtime error:


using System;

class Program{

    static void Main(){

        string mystring = null;
        Console.WriteLine("Length of mystring is {0}", mystring.Length);
        Console.ReadLine();
        
    }//end Main

}//end class Program


The Length property of an empty string is 0. The preferred way to initialize an empty string is to use the static Empty field of the String class: string str = string.Empty;

However, you will sometimes see it done thusly: string str = "";
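
If a string variable might be null, you can guard against the exception before touching its members. The sketch below uses a hypothetical helper, SafeLength, together with the static String.IsNullOrEmpty() method, which treats null and empty strings alike:

```csharp
using System;

static class NullVersusEmptyDemo {
    // Treats a null reference as length zero instead of throwing.
    public static int SafeLength(string s) {
        return s == null ? 0 : s.Length;
    }

    static void Main() {
        string nothing = null;
        string empty = string.Empty;

        Console.WriteLine(SafeLength(nothing));            // prints 0
        Console.WriteLine(SafeLength(empty));              // prints 0
        Console.WriteLine(string.IsNullOrEmpty(nothing));  // prints True
        Console.WriteLine(string.IsNullOrEmpty(empty));    // prints True
        Console.WriteLine(empty == "");                    // prints True
    }
}
```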




The String Constructor

The String class itself has eight constructors (later versions of .NET add more). If you look in the Object Browser, you can see that five of these use pointers and are not compliant with the Common Language Specification (CLS) that .NET languages must follow. We will only discuss the other three constructors.

You may recall from a previous tutorial that when a method has more than one form, via differing method signatures, it is said to be overloaded. You also learned in a previous tutorial that a constructor is a special class method that, if present, is invoked whenever a new instance of that class is created. Well, the String class has several overloaded constructors.

The first such overloaded constructor that we'll discuss takes two parameters -- a character and an integer. It then produces a string of that particular character that is x characters in length, where x is the value of the integer passed as the constructor's second argument. This constructor takes the following form:

String(char, int)

For example, the following lines of code, placed inside the Main method of a Console C# application...


string s = new string('a',5);
Console.WriteLine(s);
Console.ReadLine();

...would produce the output aaaaa.

The next overloaded string constructor we'll consider takes the following form:

String(char[])

This constructor takes a character array as a parameter, and converts the character array into a string. For example, the following code...


using System;

class Program{

    static void Main(){

        char[] mychararray = new char[5]{'B','r','y','a','n'};
        string mystring = new string(mychararray);
        Console.WriteLine("The 5-element character array has been converted into this string: \"{0}\".", mystring);
        Console.ReadLine();
        
    }//end Main

}//end class Program

...produces this output: The 5-element character array has been converted into this string: "Bryan".

The final overloaded string constructor we'll consider is simply an extension of the constructor we just examined, and takes this form:

String(char[],int,int)

Like the previous constructor, it converts a character array into a string, but it uses only part of the array: the second argument is the index of the first character to take, and the third argument is the number of characters to take (a length, not a stopping position). For example, if these lines were put into a console program...


char[] mychararray = new char[11]{'N','e','w','Y','o','r','k','C','i','t','y'};
string mystring = new string(mychararray,3,4);   // start at index 3, take 4 characters
Console.WriteLine(mystring);

...the output would be York.




Verbatim Strings

You could use the @ symbol to use a keyword as an identifier if you were so inclined. You could create variables named, for example, @if, @string, and @true; but, just because one can do something, doesn't mean it is a good idea.

In the context of strings, the @ symbol is used to create verbatim string literals. The @ symbol tells the compiler to use the string literal that follows it literally, even if it includes backslashes or spans multiple lines.

This comes in handy when working with directory paths (without the @, you would have to double each backslash). For example, the following two strings are equivalent:


string desktop1 = "C:\\Documents and Settings\\Owner\\Desktop\\";    //uses eight backslashes
string desktop2 = @"C:\Documents and Settings\Owner\Desktop\";       //uses four backslashes

You can also use the @ symbol to cause a single string literal to span more than one line. For example, consider the following program listing:


using System;
    
class Program {

    static void Main() {

        string str =
            @"I'm so happy to be a string
            that is split across
            a number of different
            lines.";

        Console.WriteLine(str);
        Console.ReadLine();
    }
}

In console mode, line breaks are preserved, and so is the indentation at the start of each continued line, because everything between the quotes is part of the string. In a WinForms application (we'll learn how to code those in later tutorials), you could assign str to a textbox control that has its multiline property set to true, and all line breaks and whitespace would be faithfully preserved.
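
As a quick check of the claim that the escaped and verbatim forms of a path are the same string, here is a small sketch (the class and method names are illustrative only):

```csharp
using System;

static class VerbatimDemo {
    public static string Escaped() { return "C:\\Documents and Settings\\Owner\\Desktop\\"; }
    public static string Verbatim() { return @"C:\Documents and Settings\Owner\Desktop\"; }

    static void Main() {
        Console.WriteLine(Escaped() == Verbatim());  // prints True
        Console.WriteLine(@"a\tb".Length);           // prints 4 (a, backslash, t, b)
        Console.WriteLine("a\tb".Length);            // prints 3 (a, tab, b)
    }
}
```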




Using String Methods

The String class provides many powerful instance and static methods. These methods are described in this section. Most of these methods are overloaded, so there are multiple ways that each can be used. The table below describes many of the instance methods of the String class.

Method          What It Does
Clone           Returns a reference to the instance of the string.
CompareTo       Compares this string with another.
CopyTo          Copies the specified number of characters from the string instance to a char array.
EndsWith        Returns true if the specified string matches the end of the instance string.
Equals          Determines whether the instance string and a specified string have the same value.
GetEnumerator   Method required to support the IEnumerable interface.
IndexOf         Reports the index of the first occurrence of a specified character or string within the instance.
Insert          Returns a new string with the specified string inserted at the specified position in the current string.
LastIndexOf     Reports the index of the last occurrence of a specified character or string within the instance.
PadLeft         Returns a new string with the characters in this instance right-aligned by padding on the left with spaces or a specified character for a specified total length.
PadRight        Returns a new string with the characters in this instance left-aligned by padding on the right with spaces or a specified character for a specified total length.
Remove          Returns a new string that deletes a specified number of characters from the current instance beginning at a specified position.
Replace         Returns a new string that replaces all occurrences of a specified character or string in the current instance, with another character or string.
Split           Identifies the substrings in this instance that are delimited by one or more characters specified in an array, then places the substrings into a string array.
StartsWith      Returns true if the specified string matches the beginning of the instance string.
Substring       Returns a substring from the instance.
ToCharArray     Copies the characters in the instance to a character array.
ToLower         Returns a copy of the instance in lowercase.
ToUpper         Returns a copy of the instance in uppercase.
Trim            Returns a copy of the instance with all occurrences of a set of specified characters from the beginning and end removed.
TrimEnd         Returns a copy of the instance with all occurrences of a set of specified characters at the end removed.
TrimStart       Returns a copy of the instance with all occurrences of a set of specified characters from the beginning removed.

Whereas the above table shows many of the String class instance methods, the following table shows some of the static methods of the String class.

Method          What It Does
Compare         Compares two string objects.
CompareOrdinal  Compares two string objects without considering the local national language or culture.
Concat          Creates a new string by concatenating one or more strings.
Copy            Creates a new instance of a string by copying an existing instance.
Format          Formats a string using a format specification. See Formatting Overview in online help for more information about format specifications.
Join            Concatenates a specified string between each element in a string array to yield a single concatenated string.

Here is a program that demonstrates the use of String.Clone(). What this program shows is that cloning a string returns a reference to that string instance, which can be assigned to another string variable, giving that other string variable a reference to the exact same location in memory. In other words, both strings will reference the same instance, after cloning has occurred.


using System;
    
class Program {

    static void Main() {

        bool test = false;

        string s1 = "William";
        string s2 = "Bryan";
        string s3 = "Miller";

        Console.WriteLine("s1 = {0}, s2 = {1}, s3 = {2}\n", s1, s2, s3);

        s2 = (string)s1.Clone(); //now s2 references the same string instance that s1 does
        Console.WriteLine("s2 = (string)s1.Clone();  //now s2 references the same string instance that s1 does\n");

        test = SameValue(s1, s2);

        if (test) {
            Console.WriteLine("s2 now holds the same value that s1 holds, i.e., \"{0}\"\n", s1);
        }

        test = ReferencesSameObject(s1, s2);

        if (test) {
            Console.WriteLine("s1 and s2 reference the same instance object.\n");
        }

        s2 = s3;

        Console.WriteLine("s2 = s3;    // now s2 = {0} and s3 = {1}\n", s2, s3);

        test = SameValue(s2, s3);

        if (test) {
            Console.WriteLine("s2 and s3 now hold the same value\n");
        }

        test = ReferencesSameObject(s2, s3);

        if (test) {
            Console.WriteLine("s2 and s3 now reference the very same instance.");
        } else {
            Console.WriteLine("s2 and s3 do NOT reference the  same instance.");
        }

        Console.ReadLine();

    }//end Main

    // True when the two objects compare as equal in value.
    private static bool SameValue(object o1, object o2){
        return o1.Equals(o2);
    }

    // True when both references point at the very same object.
    private static bool ReferencesSameObject(object o1, object o2) {
        return Object.ReferenceEquals(o1, o2);
    }
}

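When run, the above program reports that, after cloning, s1 and s2 hold the same value and reference the same instance object. It also reports that, after the plain assignment s2 = s3, s2 and s3 hold the same value and reference the very same instance.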

Whereas the Clone() method of the String object returns a reference to the string instance, the static Copy() method creates a new instance of a string by copying an existing instance, as the following program demonstrates:


// Sample for String.Copy()
using System;

class Sample {
    public static void Main() {
        string str1 = "abc";
        string str2 = "xyz";
        Console.WriteLine("1) str1 = '{0}'", str1);
        Console.WriteLine("2) str2 = '{0}'", str2);
        Console.WriteLine("Copy...");
        str2 = String.Copy(str1);
        Console.WriteLine("3) str1 = '{0}'", str1);
        Console.WriteLine("4) str2 = '{0}'", str2);
    }
}
/*
This example produces the following results:
1) str1 = 'abc'
2) str2 = 'xyz'
Copy...
3) str1 = 'abc'
4) str2 = 'abc'
*/

Next we have the CompareTo() method. This method returns an integer whose value signals how the two strings compare. First, I'll show you code that will throw an exception. The comment in the following program listing explains what the problem is that leads to an exception being thrown.


using System;

public class Program {
    public static void Main() {

        MyClass my = new MyClass();
        
        string s = "sometext";
        
        try {
            int i = s.CompareTo(my); //comparing string s to class instance my (which doesn't evaluate to a string)
        } catch (Exception e) {
            Console.WriteLine("Error: {0}", e.ToString());
        }
        
        Console.ReadLine();
    }
}

public class MyClass { }

Now, unlike the above program, which demonstrates the throwing of an exception due to an invalid parameter, there are three other cases we might encounter when dealing with the CompareTo() instance method: it returns a negative integer when the instance sorts before the string passed in, zero when the two strings are equal, and a positive integer when the instance sorts after it.
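
The three ordinary outcomes of CompareTo() can be sketched as follows. This is my own minimal example with made-up sample words, and it uses Math.Sign() because only the sign of the result is meaningful, not the exact number:

```csharp
using System;

static class CompareToDemo {
    // Collapses the CompareTo result to -1, 0, or 1.
    public static int Order(string a, string b) {
        return Math.Sign(a.CompareTo(b));
    }

    static void Main() {
        Console.WriteLine(Order("apple", "banana"));  // prints -1 (apple sorts first)
        Console.WriteLine(Order("apple", "apple"));   // prints 0  (equal)
        Console.WriteLine(Order("banana", "apple"));  // prints 1  (banana sorts after)
    }
}
```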

A fairly straightforward instance method of the String class is the EndsWith() method. It is used to determine if a specified string matches the end of an instance string. The example program for this method counts how many people in an array of names have the surname Miller, and how many have the surname Pike.

Now, that program uses a couple of things we have not covered yet in any of our tutorials -- the foreach loop, and arrays. For now, don't worry about those. The lines of code that are important in this example are these lines, which allow the program to determine how many Millers and Pikes are in the names array:


if(name.EndsWith("Miller")){
    m++;
}

if(name.EndsWith("Pike")){
    p++;
}

The integer variables m and p permit us to track how many Millers and Pikes we come across as we traverse the entire string array of names. By incrementing one or the other of these values when we encounter the corresponding surname in our array, we end up, after finishing stepping through each member of the array, having an accurate count of the number of incidents of each surname.
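
For readers who would like to run the counting logic described above, here is a self-contained sketch. The names in the array are hypothetical sample data of my own, and the helper CountSurname is an assumption made for this example rather than part of the original program:

```csharp
using System;

static class EndsWithDemo {
    // Counts the entries in names that end with the given surname.
    public static int CountSurname(string[] names, string surname) {
        int count = 0;
        foreach (string name in names) {
            if (name.EndsWith(surname)) {
                count++;
            }
        }
        return count;
    }

    static void Main() {
        // Hypothetical sample data.
        string[] names = { "Bryan Miller", "Zebulon Pike", "Glenn Miller", "Albert Pike", "Ann Miller" };
        Console.WriteLine("Millers: {0}", CountSurname(names, "Miller")); // prints Millers: 3
        Console.WriteLine("Pikes: {0}", CountSurname(names, "Pike"));     // prints Pikes: 2
    }
}
```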

The next String instance method is the String.Equals() method, which simply determines if the instance string and another string have the same value. I'll not dwell on this particular method, because its usage is fairly obvious.

Now we come to the String.GetEnumerator() method, which is included so that programming languages can iterate through the characters of a string just as they would the members of a collection. For example, the foreach statement in Microsoft Visual Basic and C# can use this method to obtain a CharEnumerator object that provides read-only access to the characters in a given string instance. This is not a method you'll need in your day to day programming tasks.

Moving on, we come to the String.IndexOf() method, which reports the index of the first occurrence of a specified character or string within the instance. Remembering that the first position in a string is position 0, what value do you think i holds below?


string s = "abcde";
int i = s.IndexOf("b");
If you said "i = 1", you are correct! How about character e? Position 4, right!

It is worth noting that this method can be useful whenever you want to determine if a string contains a particular character or substring. The integer value returned by this method, if the specified character or string is not found, is negative one, i.e., -1. This can come in handy in any number of common programming tasks.
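
Here is a brief sketch of using the -1 return value to test whether a string contains something. The helper name ContainsFragment is hypothetical, and I have passed StringComparison.Ordinal so the search compares characters exactly:

```csharp
using System;

static class IndexOfDemo {
    // True when text contains fragment, judged by IndexOf not returning -1.
    public static bool ContainsFragment(string text, string fragment) {
        return text.IndexOf(fragment, StringComparison.Ordinal) != -1;
    }

    static void Main() {
        string s = "abcde";
        Console.WriteLine(s.IndexOf('b'));             // prints 1
        Console.WriteLine(s.IndexOf('e'));             // prints 4
        Console.WriteLine(s.IndexOf('z'));             // prints -1
        Console.WriteLine(ContainsFragment(s, "cd"));  // prints True
        Console.WriteLine(ContainsFragment(s, "dc"));  // prints False
    }
}
```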

Another very handy instance method of the String object is the String.Substring() method, which returns a substring (i.e., a string that lies within the string) from the instance string, based on two integer arguments you pass to it. The form that this method takes is String.Substring(start, length), where start is the position within the string where the substring starts, and length is the number of characters, beginning with the character found at position start, that the substring holds.

Perhaps an example will help. Let's take the word internet. To extract the string net from it, we'd have something like this:

 
string s = "internet";
string t =  s.Substring(5,3);

The String.Insert() method returns a new string with a specified string of characters inserted into the instance string at the specified position. For example, let's say we have a string holding a first name and a last name, and we want to insert a nickname in between the two:


            string s = "Josh Reichler";
            Console.WriteLine("Example #1:\n\n" + s);
            Console.WriteLine();
            Console.WriteLine("after insertion...\n");
            string t = "Mouser ";
            s = s.Insert(5, t);
            Console.WriteLine(s);
            Console.WriteLine("\nExample #2:\n");
            s = "abcefg";
            Console.WriteLine(s + "   //missing letter d\n");
            Console.WriteLine("after insertion...\n");
            t = "d";
            s = s.Insert(3, t);
            Console.WriteLine(s);
            Console.ReadLine();

The code shown above turns Josh Reichler into Josh Mouser Reichler in the first example, and abcefg into abcdefg in the second.

The String.LastIndexOf() method is rather intuitive: it works like IndexOf(), but it reports the position of the last occurrence of the character or string rather than the first, and returns -1 if there is none.
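
A brief sketch of LastIndexOf() follows. The sample word and the helper FileNameOf are my own choices for illustration:

```csharp
using System;

static class LastIndexOfDemo {
    // Returns everything after the last backslash in path
    // (or the whole string when there is no backslash).
    public static string FileNameOf(string path) {
        int lastSlash = path.LastIndexOf('\\');
        return path.Substring(lastSlash + 1);
    }

    static void Main() {
        string s = "banana";
        Console.WriteLine(s.IndexOf('a'));        // prints 1
        Console.WriteLine(s.LastIndexOf('a'));    // prints 5
        Console.WriteLine(s.LastIndexOf('z'));    // prints -1
        Console.WriteLine(FileNameOf(@"C:\temp\notes.txt"));  // prints notes.txt
    }
}
```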

The String.PadLeft() method returns a new string with the characters in this instance right-aligned by padding on the left with spaces or a specified character for a specified total length. A typical use is to align the decimal points in a column of numbers before adding them together.
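
Here is one way the decimal-alignment idea might look. This is a sketch under my own assumptions: the amounts are made up, the field width of 10 is arbitrary, and the helper RightAlign is hypothetical:

```csharp
using System;
using System.Globalization;

static class PadLeftDemo {
    // Formats value with two decimals and right-aligns it in a field of width characters.
    public static string RightAlign(decimal value, int width) {
        return value.ToString("F2", CultureInfo.InvariantCulture).PadLeft(width);
    }

    static void Main() {
        decimal[] amounts = { 3.5m, 120.25m, 17m };  // hypothetical figures
        decimal total = 0m;
        foreach (decimal amount in amounts) {
            Console.WriteLine(RightAlign(amount, 10));
            total += amount;
        }
        Console.WriteLine(new string('-', 10));
        Console.WriteLine(RightAlign(total, 10));
        Console.WriteLine("7".PadLeft(3, '0'));      // prints 007
    }
}
```

Because every number is formatted with the same number of decimals and padded to the same width, the decimal points line up in a column.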

I think that by now you should be becoming familiar with the concept of a string's indexer, and the many ways these string methods take advantage of character positions to give us useful functionality. PadRight is simply the mirror image of PadLeft. If you understand EndsWith, you can certainly comprehend StartsWith. And we've been using Trim() frequently in our example programs, so you should be able to understand TrimEnd and TrimStart with ease.

Let's examine String.Split(). This is a very useful method that programmers find applicable to many situations. As an example, we can take a sentence, and use this method to split it up into its constituent words.

Whereas the String.Split() method splits a string into an array of strings, based upon a specified character delimiter, the String.Join() static method takes a string array, and joins its members together into one string, using a specified delimiter.
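
The following sketch shows Split() and Join() working as a pair. The sentence is my own sample, and the helper names Words and Rejoin are hypothetical:

```csharp
using System;

static class SplitJoinDemo {
    // Splits a sentence into words on single spaces.
    public static string[] Words(string sentence) {
        return sentence.Split(' ');
    }

    // Joins the words back together with the given delimiter between them.
    public static string Rejoin(string[] words, string delimiter) {
        return string.Join(delimiter, words);
    }

    static void Main() {
        string sentence = "The quick brown fox";
        string[] words = Words(sentence);

        Console.WriteLine(words.Length);         // prints 4
        foreach (string word in words) {
            Console.WriteLine(word);             // prints each word on its own line
        }
        Console.WriteLine(Rejoin(words, "-"));   // prints The-quick-brown-fox
    }
}
```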




Understanding the StringBuilder class

As I mentioned earlier in this tutorial, instances of the StringBuilder class (as opposed to the String class) are mutable, or dynamically changeable. StringBuilder is located in the System.Text namespace. It does not have nearly as many members as the String class, but it does have enough to get most jobs done. If you have a situation in which you need to perform many string operations (for example, within a large loop), from a performance viewpoint it probably makes sense to use StringBuilder instances instead of String instances. In many ways the StringBuilder behaves like a String. But for string-processing intensive applications, StringBuilder is the better choice because its mutability means not nearly so many instances need to be created.

I leave it to you to explore the StringBuilder further, on your own.
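
As a starting point for that exploration, here is a minimal sketch of StringBuilder in use. The helper CommaSeparated is my own illustrative example. Notice that, unlike with String, calling Replace() or Insert() on a StringBuilder changes the instance itself, so no assignment is needed:

```csharp
using System;
using System.Text;

static class StringBuilderDemo {
    // Builds "0,1,2,...,count-1" by appending to one mutable buffer.
    public static string CommaSeparated(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.Append(',');
            }
            sb.Append(i);
        }
        return sb.ToString();
    }

    static void Main() {
        Console.WriteLine(CommaSeparated(5));  // prints 0,1,2,3,4

        StringBuilder name = new StringBuilder("Mouser");
        name.Replace("u", "U");                // edits the buffer in place
        name.Insert(0, "Hello, ");
        Console.WriteLine(name.ToString());    // prints Hello, MoUser
        Console.WriteLine(name.Length);        // prints 13
    }
}
```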





Okay, that's enough teaching for this tutorial. Now let's finish up with a programming exercise:


Programming Assignment: create a C# console application that prompts the user repeatedly for a character. The user can enter whole words, but the program will pull the first character from the entry and will concatenate it onto string mystring. Whenever the user enters the word "quit", the program will stop prompting for characters, and will display the string that has been concatenated from the individual characters which the user has entered.





Programming Assignment: create a C# console application that finds the very first occurrence of a vowel in a word entered by the user. The program should only allow the user to enter three to fifteen characters when prompted for a word. If the user enters a string outside this range, the program should object testily, remind the user of the appropriate range for word length, then again prompt for a word. When the user has entered a word whose length falls in the acceptable range, the program should locate the very first occurrence of a vowel in that word, counting left to right, and should tell the user what position (start counting at zero) the character occupies in the word. (The very first letter in a word is in position zero, and is the 1st character. The second letter is in position one, and is the 2nd character, and so on.) Continue to prompt the user for words until the user enters the word "quit", which signals the program that the user wishes to exit.


Well, I hope that you've benefited, at least a little, from this tutorial. As ever, I welcome any feedback you may wish to impart.