
Weather Screen Scraping with C#
By Waheed Khan

Introduction:

Many weather sites provide a custom weather sticker, but it carries a link back to their site for advertising purposes. I did not want to put that kind of sticker on my portal site, so I decided to write my own custom weather solution. In this project I parse the CNN weather site and build a custom solution from the result. The easy approach is to fetch the weather page and grab its HTML table, but I did not want to do that either. Instead I wanted the actual data, so that I could generate a weather sticker with my own look.

About the "Weather.dll" component and the content parsing technique:

The Weather.dll component is the engine that produces the weather data. To parse the weather data, I first connect to the site "http://weather.cnn.com/weather/search?wsearch=" and pass the city code. For example, the code for Santa Monica, CA is SMO. Once the response comes back, I look for the data between two tags. To find the city and state, for instance, I search for a beginning tag and an ending tag and grab the text between them. In the line "<title>cnn.com - weather- santa monica, ca </title>" I grab the data between the "<title>cnn.com - weather- " and "</title>" tags. I repeat this process until I have matched all the data between tags, and then expose the data through the component's public properties.
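The connection step described above could look roughly like the following. This is only a sketch, not the component's actual code: the class name WeatherFetch and the helpers BuildUrl and Download are hypothetical, WebClient is just one of several ways to issue the request, and the CNN address quoted in the article may no longer respond.

```csharp
using System;
using System.Net;

static class WeatherFetch
{
    // Base search URL quoted in the article; it may no longer be served.
    const string BaseUrl = "http://weather.cnn.com/weather/search?wsearch=";

    // Hypothetical helper: appends the (URL-escaped) city or zip code to the base URL.
    public static string BuildUrl(string cityCode)
    {
        return BaseUrl + Uri.EscapeDataString(cityCode.Trim());
    }

    // Hypothetical helper: downloads the raw HTML for a code.
    // Returns null and fills 'error' if the request fails.
    public static string Download(string cityCode, out string error)
    {
        error = null;
        try
        {
            using (WebClient client = new WebClient())
            {
                return client.DownloadString(BuildUrl(cityCode));
            }
        }
        catch (WebException ex)
        {
            error = ex.Message;
            return null;
        }
    }

    static void Main()
    {
        string error;
        string html = Download("SMO", out error);
        Console.WriteLine(html == null
            ? "Site connection failed: " + error
            : "Downloaded " + html.Length + " characters");
    }
}
```

Catching WebException here mirrors the idea of reporting a site connection error instead of letting the caller crash.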

Here is the generic parsing method. We pass it the beginning and ending tags, and it returns the data between the two tags.

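    // Requires "using System;" at the top of the file.
    private void GenericParser(string startString, string endString, int nIndex, out int nIndexLast, out string rtnOutput)
    {
      // Locate the start tag at or after nIndex, then step past it.
      int nIndexStart = Find(startString, nIndex);
      if (nIndexStart < 0)
        throw new ArgumentOutOfRangeException("startString", "Start tag not found.");
      nIndexStart = nIndexStart + startString.Length;

      // Look for the end tag only after the start tag, so the length below is never negative.
      int nIndexEnd = Find(endString, nIndexStart);
      if (nIndexEnd < 0)
        throw new ArgumentOutOfRangeException("endString", "End tag not found.");

      // Return the trimmed text between the tags and the position to resume from.
      rtnOutput = rawHtml.Substring(nIndexStart, nIndexEnd - nIndexStart).Trim();
      nIndexLast = nIndexEnd;
    }

    // Returns the index of strSearch in rawHtml at or after nStart, or -1 if it is not found.
    private int Find(string strSearch, int nStart)
    {
      return rawHtml.IndexOf(strSearch, nStart);
    }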
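To see the technique in isolation, here is a small standalone sketch that applies the same start-tag and end-tag search to the title line quoted earlier. The class TagParserDemo and the method Between are hypothetical names used for illustration; unlike the component's method, this version takes the HTML as a parameter rather than reading a rawHtml field.

```csharp
using System;

static class TagParserDemo
{
    // Returns the trimmed text between startTag and endTag, searching from 'from'.
    // 'next' receives the index of the end tag so a later call can continue from there.
    public static string Between(string html, string startTag, string endTag, int from, out int next)
    {
        int start = html.IndexOf(startTag, from, StringComparison.Ordinal);
        if (start < 0) throw new FormatException("Start tag not found: " + startTag);
        start += startTag.Length;

        int end = html.IndexOf(endTag, start, StringComparison.Ordinal);
        if (end < 0) throw new FormatException("End tag not found: " + endTag);

        next = end;
        return html.Substring(start, end - start).Trim();
    }

    static void Main()
    {
        string html = "<html><head><title>cnn.com - weather- santa monica, ca </title></head></html>";
        int next;
        string cityState = Between(html, "<title>cnn.com - weather- ", "</title>", 0, out next);

        // Split "santa monica, ca" into city and state at the comma.
        string[] parts = cityState.Split(',');
        Console.WriteLine("City : " + parts[0].Trim());   // prints "City : santa monica"
        Console.WriteLine("State: " + parts[1].Trim());   // prints "State: ca"
    }
}
```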

This component exposes public properties (declared as public fields in the code below) for the current day forecast, the five-day forecast, and error reporting.

Public properties for the current day forecast:

    public string CurrentDayCity;
    public string CurrentDayState;
    public string CurrentDayCountry;
    public string CurrentDayFahrenheit;
    public string CurrentDayCelsius;
    public string CurrentDayPicture;
    public string CurrentDayCondition;
    public string CurrentDayHumidity;
    public string CurrentDayWind;
    public string CurrentDaySunrise;
    public string CurrentDaySunset;

Public properties for the five-day forecast. Each one is a 5-by-1 two-dimensional array, with one row per day:

    public string[,] FiveDayDay             = new string[5,1];   // row col
    public string[,] FiveDayHiFahrenheit    = new string[5,1];
    public string[,] FiveDayHiCelsius       = new string[5,1];
    public string[,] FiveDayLowFahrenheit   = new string[5,1];
    public string[,] FiveDayLowCelsius      = new string[5,1];
    public string[,] FiveDayPicture         = new string[5,1];
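As an illustration of how one of these 5-by-1 arrays might be filled, the sketch below repeats the start-tag and end-tag search five times, resuming each search where the previous one ended. The markup in the sample string is invented for the example (the real CNN page layout is not shown in this article), and FiveDayDemo and Fill are hypothetical names.

```csharp
using System;

static class FiveDayDemo
{
    // Fills a 5-by-1 array with the text of the first five startTag...endTag pairs.
    // Rows stay null if fewer than five pairs are found.
    public static string[,] Fill(string html, string startTag, string endTag)
    {
        string[,] result = new string[5, 1];
        int index = 0;
        for (int row = 0; row < 5; row++)
        {
            int start = html.IndexOf(startTag, index, StringComparison.Ordinal);
            if (start < 0) break;
            start += startTag.Length;

            int end = html.IndexOf(endTag, start, StringComparison.Ordinal);
            if (end < 0) break;

            result[row, 0] = html.Substring(start, end - start).Trim();
            index = end + endTag.Length;   // resume after this match
        }
        return result;
    }

    static void Main()
    {
        // Invented sample markup, for illustration only.
        string html = "<b>Friday</b><b>Saturday</b><b>Sunday</b><b>Monday</b><b>Tuesday</b>";
        string[,] fiveDayDay = Fill(html, "<b>", "</b>");
        for (int row = 0; row < 5; row++)
        {
            Console.WriteLine("Day : {0}", fiveDayDay[row, 0]);
        }
    }
}
```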

And finally, public properties that report any errors that occur while connecting to the site or parsing the data:

    public string ErrorWeatherEngine;
    public string ErrorSiteConnection;
    public string ErrorWeatherCode;

The WeatherTest.cs file is a complete example of using the Weather.dll component in your code.

using System;
using Weather;

class WeatherTest
{
    static void Main()
    {
        
        WeatherData obj = new WeatherData();

        // To pass a CNN Weather Code
        // e.g.; Santa Monica, CA code is SMO
        // e.g.; London, England code is EGLL
        obj.WeatherCode("SMO");

        // To pass a City zip code US only e.g.; 90295
        //obj.WeatherCode(90295);

        // Exceptions
        Console.WriteLine("Site Connection       : {0}", obj.ErrorSiteConnection);
        Console.WriteLine("Weather Engine Error  : {0}", obj.ErrorWeatherEngine);
        Console.WriteLine("Weather Code          : {0}", obj.ErrorWeatherCode);

        // Current Day Weather
        Console.WriteLine("City         : {0}", obj.CurrentDayCity);
        Console.WriteLine("State        : {0}", obj.CurrentDayState);
        Console.WriteLine("Country      : {0}", obj.CurrentDayCountry);
        Console.WriteLine("Fahrenheit   : {0}", obj.CurrentDayFahrenheit);
        Console.WriteLine("Celsius      : {0}", obj.CurrentDayCelsius);
        Console.WriteLine("Picture      : {0}", obj.CurrentDayPicture);
        Console.WriteLine("Condition    : {0}", obj.CurrentDayCondition);
        Console.WriteLine("Humidity     : {0}", obj.CurrentDayHumidity);
        Console.WriteLine("Wind         : {0}", obj.CurrentDayWind);
        Console.WriteLine("Sunrise      : {0}", obj.CurrentDaySunrise);
        Console.WriteLine("Sunset       : {0}", obj.CurrentDaySunset);

        // Get the Five Day Weather
        for (int row = 0; row < 5; row++)
        {
            Console.WriteLine("Day           : {0}", obj.FiveDayDay[row,0]);
            Console.WriteLine("HiFahrenheit  : {0}", obj.FiveDayHiFahrenheit[row,0]);
            Console.WriteLine("HiCelsius     : {0}", obj.FiveDayHiCelsius[row,0]);
            Console.WriteLine("LowFahrenheit : {0}", obj.FiveDayLowFahrenheit[row,0]);
            Console.WriteLine("LowCelsius    : {0}", obj.FiveDayLowCelsius[row,0]);
            Console.WriteLine("Picture       : {0}", obj.FiveDayPicture[row,0]);
        }


    }
}

When you run WeatherTest.exe from a DOS prompt window, the output looks like this.

Site Connection       : Successful!!!
Weather Engine Error  : None
Weather Code          : Found!!!
City         : Santa Monica
State        : CA
Country      : CA
Fahrenheit   : 62 F
Celsius      : 17 C
Picture      : partly.cloudy.gif
Condition    : Partly Cloudy
Humidity     : 80%
Wind         : WSW at 7 mph (11 km/h)
Sunrise      : 6:58 AM
Sunset       : 5:10 PM
Day           : Friday
HiFahrenheit  : 81
HiCelsius     : 27
LowFahrenheit : 55
LowCelsius    : 13
Picture       : sunny.gif
Day           : Saturday
HiFahrenheit  : 73
HiCelsius     : 23
LowFahrenheit : 50
LowCelsius    : 10
Picture       : sunny.gif
Day           : Sunday
HiFahrenheit  : 69
HiCelsius     : 21
LowFahrenheit : 48
LowCelsius    : 9
Picture       : sunny.gif
Day           : Monday
HiFahrenheit  : 66
HiCelsius     : 19
LowFahrenheit : 49
LowCelsius    : 9
Picture       : sunny.gif
Day           : Tuesday
HiFahrenheit  : 66
HiCelsius     : 19
LowFahrenheit : 48
LowCelsius    : 9
Picture       : partly.cloudy.gif

Compiling and building your projects:

The zip file contains three projects:

1. Weather project (Weather.dll)

  • This project contains the main weather parsing logic. When you compile it, it creates the "Weather.dll" component under the bin folder.

2. Weather Test project (WeatherTest.exe)

  • To test the Weather.dll component, copy the DLL file to the bin folder, add a reference to the Weather.dll component, and compile your WeatherTest project.

3. Weather Web (ASP.NET project)

  • This project is an ASP.NET project.
  • I'm assuming you have some basic knowledge of how to create a virtual folder under the IIS Web Server and how to work with ASP.NET pages.
  • Copy the Weather.dll component file to the virtual bin folder and add a reference to it.
  • I have created two user controls, CurrentDay and FiveDay, each with a public property named WeatherCode.
  • The WeatherCode public property is used to pass the city zip code or the CNN weather code to the user controls.
  • If you want to know how to pass data to a user control, refer to the article "Setting user control property in code behind page" at http://www.dotnetjunkies.com/howto/default.aspx?id=7
  • In the default ASP.NET page, register these controls and get a reference to the user control like this:
              Control myCDUC = Page.FindControl("CurrentDay1"); 
              ((CurrentDay)myCDUC).WeatherCodeStr = TxtBoxCode.Text.ToUpper(); 
  • Compile your ASP.NET project; the output should look like the following.

 

ASP.NET Output:

Finally, it would be nicer to do this parsing with regular expressions instead of the IndexOf and Substring methods. A regular expression pattern is more powerful than IndexOf and Substring; I used those two methods only for simplicity, to get the data between the start and end tags. This kind of parsing also has a drawback: if the site owner changes the content of the site, the logic will fail and you will be forced to rewrite the parsing logic.
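For comparison, a regular expression version of the same between-two-tags extraction might look like the sketch below. It is not part of the downloadable component; RegexParserDemo and Between are hypothetical names, and the choice of case-insensitive, single-line matching is an assumption about what suits scraped HTML.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexParserDemo
{
    // Returns the trimmed text between the first startTag...endTag pair, or null if there is none.
    public static string Between(string html, string startTag, string endTag)
    {
        // Escape the tags so they match literally; (.*?) lazily captures the text in between.
        string pattern = Regex.Escape(startTag) + "(.*?)" + Regex.Escape(endTag);

        // Singleline lets '.' match newlines; IgnoreCase tolerates <TITLE> versus <title>.
        Match m = Regex.Match(html, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    static void Main()
    {
        string html = "<html><head><title>cnn.com - weather- santa monica, ca </title></head></html>";
        string cityState = Between(html, "<title>cnn.com - weather- ", "</title>");
        Console.WriteLine(cityState);   // prints "santa monica, ca"
    }
}
```

The regular expression does not remove the main drawback: if the site changes its markup, the tags passed to the method still have to be updated.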

Source code:

Download the source code

 


About the Author:

Waheed Khan is currently working with ASP.NET, ADO.NET, XML, and C#.