SMTP Internationalization
By Andriy Zolotoiy
You can find many articles dedicated to C# SMTP implementation on this and
other sites. I'm not going to dwell on protocol implementation details, but rather
on the issue of sending e-mail in languages other than English (I use Russian
in our scenario). English-only e-mail messaging systems use the 7-bit
System.Text.Encoding.ASCII encoding when text has to be converted to a
sequence of bytes for network transmission. Such applications convert every
non-ASCII character (any code above 0x7F) into '?', meaning that the character
has no representation in the target encoding.
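The effect is easy to reproduce. Below is a minimal sketch; the class name AsciiLossDemo, the helper RoundTripAscii and the sample string are illustrative and not part of the SMTP library. It encodes a Russian greeting the way an English-only mailer would and decodes it back:

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Encodes text as 7-bit ASCII and decodes it back, which is
    // effectively what an English-only mailer does to the message.
    public static string RoundTripAscii(string text)
    {
        byte[] data = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(data);
    }

    public static void Main()
    {
        // The Russian word for "Hello", written with Unicode escapes so
        // that the encoding of the source file itself does not matter.
        string hello = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // Each Cyrillic character is replaced by '?'; ASCII text survives.
        Console.WriteLine(RoundTripAscii(hello + " world")); // prints "?????? world"
    }
}
```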
A simple solution to this problem is to use the System.Text.Encoding
instance that corresponds to the encoding of the source text. The source character
set usually corresponds to the one selected in Control Panel/Regional Settings.

I use Russian as my default language, so all Cyrillic characters appear
properly inside text areas and on title bars. As it turns out, there is an easy way
to find out which default encoding Windows uses:
System.Text.Encoding sourceEncoding = System.Text.Encoding.Default;
A little test
Console.WriteLine( "Windows charset: " + sourceEncoding.HeaderName );
Console.WriteLine( "Windows code page: " + sourceEncoding.CodePage );
reveals that we are on the right track:
> Windows charset: windows-1251
> Windows code page: 1251
Now the e-mail can be properly encoded for transmission. We just need to add
the character set identifier to the message header:
text.AppendFormat( "Content-Type: text/plain;\r\n\tcharset=\"{0:G}\"\r\n", sourceEncoding.HeaderName );
where text is a StringBuilder variable containing the resulting
text. The message body is then transmitted like this:
byte[] data = sourceEncoding.GetBytes( text.ToString() );
smtpStream.Write( data, 0, data.Length );
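The two steps above can be put together in a small self-contained sketch. Several things here are assumptions made for illustration: a MemoryStream stands in for smtpStream, code page 1251 is requested explicitly instead of relying on Encoding.Default (so the result does not depend on the machine's regional settings), and the names CharsetHeaderDemo, BuildContentType and EncodeMessage are hypothetical. The RegisterProvider call is only needed on .NET Core and .NET 5 or later, where legacy code pages are opt-in; on the classic .NET Framework that line can be dropped.

```csharp
using System;
using System.IO;
using System.Text;

public static class CharsetHeaderDemo
{
    // Builds the Content-Type header line for the given encoding.
    public static string BuildContentType(Encoding encoding)
    {
        StringBuilder text = new StringBuilder();
        text.AppendFormat("Content-Type: text/plain;\r\n\tcharset=\"{0}\"\r\n",
                          encoding.HeaderName);
        return text.ToString();
    }

    // Writes the header, an empty line and the encoded body to a stream
    // and returns the raw bytes that would go over the wire.
    public static byte[] EncodeMessage(string body, Encoding encoding)
    {
        using (MemoryStream stream = new MemoryStream()) // stand-in for smtpStream
        {
            string head = BuildContentType(encoding) + "\r\n";
            byte[] data = encoding.GetBytes(head + body);
            stream.Write(data, 0, data.Length);
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 1251
        // is only available after registering the code pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1251 = Encoding.GetEncoding(1251);

        string body = "\u041F\u0440\u0438\u0432\u0435\u0442"; // Russian "Hello"
        byte[] wire = EncodeMessage(body, cp1251);

        Console.Write(BuildContentType(cp1251));
        Console.WriteLine("Total bytes: " + wire.Length);
    }
}
```

In a single-byte code page such as 1251 every character of the body takes exactly one byte, so the byte count equals the character count of header plus body.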
That would be all, but in the real world not everything is that simple. For
historical reasons, Russian-speaking countries use the KOI8-R encoding as the de facto
e-mail standard (not everyone uses Windows, and code page 1251
might not be supported on some DOS or UNIX systems). That's why I set my default
e-mail encoding in Outlook Express to KOI8-R (Options/Send), so that I can
chat with my 'non-Windows' buddies.

Some investigation reveals that this value is also exposed by the
default encoding object:
Console.WriteLine( "E-Mail charset: " + sourceEncoding.BodyName );
> E-Mail charset: koi8-r
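On a machine whose regional settings are not Russian, Encoding.Default reports a different code page (and on .NET Core and .NET 5 or later it is always UTF-8), so the output above will not reproduce as-is. A hedged way to inspect the same properties is to request code page 1251 explicitly. The class and method names below are hypothetical, and the RegisterProvider call assumes a modern .NET runtime:

```csharp
using System;
using System.Text;

public static class EncodingNamesDemo
{
    // Returns the encoding object for Windows code page 1251 (Cyrillic).
    public static Encoding GetCyrillicEncoding()
    {
        // Assumption: .NET Core / .NET 5+, where legacy code pages must be
        // registered first. Not needed on the classic .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1251);
    }

    public static void Main()
    {
        Encoding enc = GetCyrillicEncoding();

        // HeaderName is the charset name for mail headers,
        // BodyName the one suggested for mail bodies.
        Console.WriteLine("Code page:  " + enc.CodePage);
        Console.WriteLine("HeaderName: " + enc.HeaderName);
        Console.WriteLine("BodyName:   " + enc.BodyName);
    }
}
```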
Luckily, there is a static method, System.Text.Encoding.Convert(),
that converts text from one encoding to another. Here is a snippet of
code that must run before the message is sent. Don't forget that
the resulting code page is different now, so the 'Content-Type' charset
header must refer to sourceEncoding.BodyName.
using System.Text;
// ............
Encoding srcEnc = Encoding.Default;
Encoding dstEnc;
// src and dst refer to the same object if no intermediate conversion is required
if( srcEnc.HeaderName.Equals( srcEnc.BodyName ) )
    dstEnc = srcEnc;
else
    dstEnc = Encoding.GetEncoding( srcEnc.BodyName );
// ............
byte[] srcData = srcEnc.GetBytes( messageString );
byte[] dstData;
// convert only when the two encodings differ
if( dstEnc != srcEnc )
    dstData = Encoding.Convert( srcEnc, dstEnc, srcData );
else
    dstData = srcData;
// write the encoded data to the SMTP stream
smtpStream.Write( dstData, 0, dstData.Length );
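To see what the conversion does to the bytes, here is a minimal standalone sketch. It is not taken from the library: the class Koi8ConvertDemo and the helper Recode are hypothetical, both encodings are requested by name rather than through Encoding.Default, and the RegisterProvider call assumes .NET Core or .NET 5 and later (omit it on the classic .NET Framework).

```csharp
using System;
using System.Text;

public static class Koi8ConvertDemo
{
    // Re-encodes a byte array from one code page to another.
    public static byte[] Recode(byte[] srcData, Encoding srcEnc, Encoding dstEnc)
    {
        return Encoding.Convert(srcEnc, dstEnc, srcData);
    }

    public static void Main()
    {
        // Assumption: modern .NET, where legacy code pages are opt-in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding srcEnc = Encoding.GetEncoding("windows-1251");
        Encoding dstEnc = Encoding.GetEncoding("koi8-r");

        string message = "\u041F\u0440\u0438\u0432\u0435\u0442"; // Russian "Hello"
        byte[] srcData = srcEnc.GetBytes(message);
        byte[] dstData = Recode(srcData, srcEnc, dstEnc);

        // Same text, different byte values in each code page.
        Console.WriteLine(BitConverter.ToString(srcData)); // CF-F0-E8-E2-E5-F2
        Console.WriteLine(BitConverter.ToString(dstData)); // F0-D2-C9-D7-C5-D4

        // Decoding with the destination encoding restores the original text.
        Console.WriteLine(dstEnc.GetString(dstData) == message); // True
    }
}
```

Since .NET strings are Unicode internally, calling dstEnc.GetBytes( messageString ) directly should yield the same bytes for text that both code pages can represent. Encoding.Convert is the natural choice when the data already exists as bytes in the source encoding.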
That's all, folks. Full SMTP library source code and help file can be found at
http://www16.brinkster.com/zolotoiy/