By Klaus Salchner
Introduction
When we hear Google, the first thing that comes to mind is the search engine.
Google has turned the search business upside down within the last five years.
The founders of Google started with an idea in 1995, which became widely used
and known in 1998/99. Today Google is the number one search engine.
You can find out more about Google’s history here. Like other
organizations, Google is trying to establish itself as a platform rather than a
solution. This means it provides the necessary tools and infrastructure so other
people can build their own solutions on top of it. Google provides a web service
interface which allows you to integrate Google searches right into your
application. You can find out more about the Google web service API at http://www.google.ca/apis .
How to get started with the Google API
From the URL above you can download the developer’s kit, which comes with a
number of sample applications for different languages like .NET or Java. You
also need a valid license key, which you pass along with every web
service call. To obtain a Google license key, visit http://www.google.ca/apis and select
“Create Account” in the left side navigation bar. You create an account
by entering your email address and a password. Google then sends an email to
that address to verify that it exists. The email contains a link which
completes the account creation by activating the account. When done, click on
the continue link, which brings you back to the account creation page. At the
bottom of the page you see a link “sign in here”. Follow the link and sign into
your account with your email address and password. A page then confirms that a
license key has been generated and sent to your email address. Should you lose
your license key, sign in again and Google will resend the license key to your
email address. The license key is free but limits you to 1,000 calls per
day. This will be more than enough to get started. If you need to make more than
1,000 calls per day, contact Google.
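Because the key is limited to 1,000 calls per day, an application may want to
count its own calls before they reach Google. The following is only a minimal
sketch and not part of the developer’s kit: the class name DailyQuota and its
members are hypothetical, and it assumes that the limit resets at midnight of
whatever clock the caller passes in, which may differ from how Google counts.

```csharp
using System;

// Hypothetical helper, not part of the Google developer's kit.
// Counts calls per calendar day so the application can stop before
// it exceeds the daily limit of the license key.
public class DailyQuota
{
    private readonly int limit;
    private DateTime day;
    private int used;

    public DailyQuota(int limit)
    {
        this.limit = limit;
        this.day = DateTime.MinValue;
        this.used = 0;
    }

    // Returns true and records the call if one more call is allowed.
    // The caller passes the current time, which keeps the class testable.
    public bool TryConsume(DateTime now)
    {
        if (now.Date != day)
        {
            // a new day started, so reset the counter
            day = now.Date;
            used = 0;
        }
        if (used >= limit)
            return false;
        used++;
        return true;
    }

    // Calls still available for the day that was seen last.
    public int Remaining
    {
        get { return limit - used; }
    }
}
```

You would call TryConsume(DateTime.Now) right before each web method call and
skip the call when it returns false. The sketch is not thread safe; a
multi-threaded application would need to add locking.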
How to reference the Google web service API in your project
Create your project in Visual Studio .NET and right click on the project in the
“Solution Explorer” pane. In the popup menu select “Add Web Reference” and
enter the following WSDL URL: http://api.google.com/GoogleSearch.wsdl
. Visual Studio checks that the WSDL exists, downloads it and shows you in the
dialog the web methods this web service makes available. Under “web reference
name” enter the name of the web service reference, for example Google, which is
the name the code snippets in this article use. When done
click “Add Reference” and you are ready to use the Google web service API. The
reference is shown in the “Solution Explorer” under “Web References”. You can
right click on the web service reference and update it through the “Update Web
Reference” menu item or view it in the Object Browser through the “View in
Object Browser” popup menu. The Object Browser shows you that there are four
different types available. The type GoogleSearchService exposes the actual web
service calls you can make. It has three different web methods (plus the usual
Begin/End methods if you want to call a web method asynchronously).
GoogleSearchService.doSpellingSuggestion()
When you open up Google in your browser and search for a word or phrase, you
sometimes see the phrase “Did you mean: [suggested search term]” at the top of
the search results page. Google performs a spell check of the search term you
entered and then shows you an alternative spelling of your search term. This
helps the user to search for properly spelled words and phrases, and the user
can simply click on the suggestion to search for the corrected search term. The
Google web service also provides a web method to check for alternate spellings
of a search term. Here is a code snippet:
public static string SpellingSuggestion(string Phrase)
{
    // create an instance of the Google web service
    Google.GoogleSearchService GoogleService = new Google.GoogleSearchService();

    // get the new spelling suggestion
    string SpellingSuggestion = GoogleService.doSpellingSuggestion(Key, Phrase);

    // null means we have no spelling suggestion
    if (SpellingSuggestion == null)
        SpellingSuggestion = Phrase;

    // release the web service object
    GoogleService.Dispose();

    return SpellingSuggestion;
}
First we create an instance of the GoogleSearchService class and then we
call the web method doSpellingSuggestion(). The first argument is the Google
license key and the second one is the search term. The web method
returns the alternate spelling of the search term or null if there is no
alternate spelling. The code snippet above returns the alternate spelling or,
if there is none, the original one. At the end it calls Dispose() to free up
the underlying unmanaged resources.
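If the web method throws an exception, a Dispose() call placed at the end of
the method is never reached. One way around this is a using statement. The
sketch below illustrates the idea without a network connection or license key:
ISpellingService, SpellingHelper and FakeSpellingService are hypothetical names
standing in for the generated proxy, they are not part of the Google API.

```csharp
using System;

// Hypothetical stand-in for the generated proxy class.
public interface ISpellingService : IDisposable
{
    string doSpellingSuggestion(string key, string phrase);
}

public static class SpellingHelper
{
    // Returns the suggested spelling, or the original phrase when the
    // service returns null. The using statement calls Dispose() on the
    // service even if the web method throws an exception.
    public static string SuggestOrOriginal(ISpellingService service, string key, string phrase)
    {
        using (service)
        {
            string suggestion = service.doSpellingSuggestion(key, phrase);
            return suggestion == null ? phrase : suggestion;
        }
    }
}

// Test double used only to demonstrate the helper.
public class FakeSpellingService : ISpellingService
{
    private readonly string answer;
    public bool Disposed;

    public FakeSpellingService(string answer)
    {
        this.answer = answer;
    }

    public string doSpellingSuggestion(string key, string phrase)
    {
        return answer;
    }

    public void Dispose()
    {
        Disposed = true;
    }
}
```

With the real proxy the same pattern would wrap the creation of the
GoogleSearchService instance in the using statement.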
GoogleSearchService.doGetCachedPage()
Google is constantly crawling the Internet to keep its search index and
directory up to date. Google’s crawler also caches the content locally on its
servers and allows you to obtain the cached page, which is the content as of
the last time the crawler visited that resource. URLs can point to many
different resources, most typically to HTML pages. But these can also be Word
documents, PDF files, PowerPoint slides, etc. The cached page is always in HTML
format. So for any resource other than HTML, Google also converts the format to
HTML. Here is a code snippet:
public static void GetCachedPageAndSaveToFile(string PageUrl, string FileName)
{
    // create an instance of the Google web service
    Google.GoogleSearchService GoogleService = new Google.GoogleSearchService();

    // get the cached page content
    byte[] CachedPage = GoogleService.doGetCachedPage(Key, PageUrl);

    // file writer to write a stream to the file & a binary writer to write
    // data to
    FileStream FileWriter = new FileStream(FileName, FileMode.Create);
    BinaryWriter Writer = new BinaryWriter(FileWriter);

    // write the page content to the file and close the streams
    Writer.Write(CachedPage);
    Writer.Close();
    FileWriter.Close();

    // release the web service object
    GoogleService.Dispose();
}
First we again create an instance of the GoogleSearchService class and then
we call the web method doGetCachedPage(). We pass along the Google license key
plus the URL of the page we are looking for. The page content travels base64
encoded inside the SOAP response; the generated proxy decodes it and returns a
byte array which contains the HTML content of the cached page. Next we
create a FileStream which we use to write the obtained page to a local file.
With FileMode.Create we tell it to create the file, which overwrites any
existing file. Then we create a BinaryWriter which uses the FileStream as its
output. We write the returned byte array to the BinaryWriter, which in
turn writes it to the FileStream, which in turn writes it to the local file.
Then we close the BinaryWriter and the FileStream. At the end we again call
Dispose() to free up underlying unmanaged resources.
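If all you need is the file on disk, the FileStream and BinaryWriter pair can
be replaced by a single call. This is a sketch under the assumption that you
target .NET Framework 2.0 or later, where File.WriteAllBytes() is available.
The class and method names are hypothetical, and the byte array is passed in
so the example runs without calling doGetCachedPage().

```csharp
using System;
using System.IO;

public static class CachedPageFile
{
    // Writes the page bytes to fileName. Like FileMode.Create, this
    // creates the file or overwrites an existing one, and it closes
    // the file before returning.
    public static void SaveCachedPage(byte[] cachedPage, string fileName)
    {
        File.WriteAllBytes(fileName, cachedPage);
    }
}
```

In the snippet from the article you would pass the array returned by
doGetCachedPage() as the first argument.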
GoogleSearchService.doGoogleSearch()
The web method doGoogleSearch() allows you to perform searches. You pass
along the search term and then certain filter criteria to restrict the content,
for example to a specific country, language, topic, etc. Here are the arguments
you pass along to the web method:
- Key – The Google license key.
- QueryTerm – The actual search term. This can be a simple word, a
phrase (to search for the phrase you need to put it in double quotes,
otherwise it searches for the occurrence of all individual words), a list of
words (you can use the AND or OR operator; when no operator is used between
the words AND is assumed), etc. You can also exclude a word or phrase by
putting a minus sign in front of it. The Google reference at http://www.google.ca/apis/reference.html
explains all query term capabilities.
- Start – A zero based index of the first result to be returned. This
allows you to page through the result set. A single call to this
web method cannot return more than MaxResults results, therefore you need to
make multiple calls and advance Start each time to get the next results. If you
provide a user interface which allows the user to page through
the complete result set, then you would set Start to the index of the first
result of each page. For example the first call would set it to 0, the next
to 10, followed by 20, etc. (assuming MaxResults is set to 10).
- MaxResults – The maximum number of results to be returned by the
query. This can be a value between one and ten.
- Filter – When set to true it filters out duplicate or
near-duplicate search results. Near-duplicate results are results with the
same title and snippet (the snippet is the summary text shown for each search
result). This also limits the number of search results coming from the same
host. So if a web site has ten pages matching the search term, then
only the first two are returned (called host crowding).
- Restricts – Allows you to restrict the search to results from one or
more countries or one or more topics. For example you can restrict the search
to content within the US by setting this value to "countryUS". You can
restrict the search to content centered around Linux by setting this value to
"linux". The Google reference at http://www.google.ca/apis/reference.html
lists all the possible values.
- SafeSearch – Filters out adult content when set to true.
- LanguageRestrict – This allows you to restrict the search within
one or more languages. The Google reference at http://www.google.ca/apis/reference.html
lists all the possible values.
- InputEncoding – This value is ignored. All requests should be
encoded using UTF-8.
- OutputEncoding – This value is ignored. All returned results are
encoded using UTF-8.
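The Start and QueryTerm rules described in the list can be sketched as a few
small helpers. This is only an illustration, the class SearchArguments and its
methods are hypothetical and not part of the Google API. With MaxResults set
to 10, the page indexes 0, 1 and 2 map to the Start values 0, 10 and 20.

```csharp
using System;

public static class SearchArguments
{
    // Largest value the web method accepts for MaxResults.
    public const int MaxResultsLimit = 10;

    // Zero based Start value for a zero based page index.
    public static int StartForPage(int pageIndex, int maxResults)
    {
        if (pageIndex < 0)
            throw new ArgumentOutOfRangeException("pageIndex");
        if (maxResults < 1 || maxResults > MaxResultsLimit)
            throw new ArgumentOutOfRangeException("maxResults");
        return pageIndex * maxResults;
    }

    // Wraps words in double quotes so they are searched as one phrase.
    public static string Phrase(string words)
    {
        return "\"" + words + "\"";
    }

    // Puts a minus sign in front of a word or quoted phrase to exclude it.
    public static string Exclude(string term)
    {
        return "-" + term;
    }
}
```

For example, SearchArguments.Phrase("web service") + " " +
SearchArguments.Exclude("java") builds a query term that searches for the
phrase and excludes results containing the word java.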
This web method allows you to perform simple or complex search queries
against Google. It also allows you to filter the search result as well as page
through the search result. Here is a code snippet:
public static XmlNode Search(string QueryTerm, int Start, int MaxResults,
    bool Filter, string Restricts, bool SafeSearch, string LanguageRestrict,
    string InputEncoding, string OutputEncoding)
{
    // create an instance of the Google web service
    Google.GoogleSearchService GoogleService = new Google.GoogleSearchService();

    // perform search
    Google.GoogleSearchResult SearchResult = GoogleService.doGoogleSearch(Key,
        QueryTerm, Start, MaxResults, Filter, Restricts, SafeSearch,
        LanguageRestrict, InputEncoding, OutputEncoding);

    // we return the result back as a XML document
    XmlDocument ResultXml = CreateXmlDocument(SearchResultXmlNode);

    // add the search result
    StringValueOfObject(ResultXml.DocumentElement, SearchResult);

    // add the result elements and directory categories root node
    XmlElement ResultElementsParentNode =
        AddChildElement(ResultXml.DocumentElement, "ResultElements");
    XmlElement CategoriesParentNode =
        AddChildElement(ResultXml.DocumentElement, "DirectoryCategories");

    // now add all result elements
    foreach (Google.ResultElement ResultElement in SearchResult.resultElements)
        StringValueOfObject(ResultElementsParentNode, ResultElement);

    // now add all directory categories
    foreach (Google.DirectoryCategory DirectoryCategory in
        SearchResult.directoryCategories)
        StringValueOfObject(CategoriesParentNode, DirectoryCategory);

    // release the web service object
    GoogleService.Dispose();

    return ResultXml;
}
First we create an instance of the GoogleSearchService class and then we call
the web method doGoogleSearch(). We pass along all the arguments as described
above. This performs the search and returns its result as an instance of the
GoogleSearchResult class. The code snippet then takes all values of the
GoogleSearchResult object and puts them into an XML document. Please refer to
the attached sample application for the complete code. First it creates an XML
document with the method CreateXmlDocument(). It then calls the method
StringValueOfObject() which creates an XML element for the object in the XML
document using the name of the object as the name of the XML element. The method
then uses reflection to walk the returned GoogleSearchResult object, and for
each field it finds in the object it adds an attribute to the created XML
element, set to the value of that field.
The returned GoogleSearchResult object has two fields which hold an array
of ResultElement and DirectoryCategory objects. The method StringValueOfObject()
is not able to walk each object in those arrays. Therefore we create two parent
XML elements in the XML document using the method AddChildElement(). We then
loop through both arrays and call StringValueOfObject() for each object, so we
can convert each object to an XML element adding all its fields as attributes.
Finally we again call Dispose() to free up the underlying unmanaged resources
and then return the XML document which contains all search information of the
GoogleSearchResult object. This enables you to run XPath queries against the
search result XML document to find the required search result information.
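The real StringValueOfObject() lives in the attached sample application and is
not reproduced here. To illustrate the idea of walking an object with
reflection and turning its fields into attributes, here is a minimal sketch
under my own assumptions: the names XmlMapping, AppendObject and SampleResult
are hypothetical, only public instance fields are copied, and array fields are
not expanded.

```csharp
using System;
using System.Reflection;
using System.Xml;

// Hypothetical stand-in for a type generated from the WSDL.
public class SampleResult
{
    public string URL;
    public string title;
}

public static class XmlMapping
{
    // Adds a child element named after the object's type and copies
    // every public instance field into an attribute of that element.
    public static XmlElement AppendObject(XmlElement parent, object value)
    {
        XmlDocument document = parent.OwnerDocument;
        Type type = value.GetType();
        XmlElement element = document.CreateElement(type.Name);

        foreach (FieldInfo field in type.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            // null fields become empty attributes
            object fieldValue = field.GetValue(value);
            element.SetAttribute(field.Name, fieldValue == null ? "" : fieldValue.ToString());
        }

        parent.AppendChild(element);
        return element;
    }
}
```

Once the objects are in the document, an XPath query such as
/SearchResult/ResultElements/SampleResult[@title='Example'] passed to
SelectSingleNode() or SelectNodes() finds the matching elements. The element
names in that path are assumptions for this sketch; the names in the sample
application may differ.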
The attached sample application
The attached sample application provides a wrapper class for all Google web
methods. It also provides a simple user interface demonstrating the use of each
web method. You can enter a search term and get alternate spelling suggestions,
you can download the cached HTML page of a URL and display it and you can
perform a search entering all the search arguments. Please make sure to obtain
your own Google license key and enter it in the app.config file.
Download Source
Summary
The Google web service API is very easy to use. It enables you to search the
Internet from within your application. Complex query terms and filtering
capabilities help keep the search results relevant to your application’s needs.
The Google web service is one of many emerging web services, like Amazon’s or
eBay’s. By introducing a web service interface these
companies have moved to a platform, enabling third parties to build solutions
on top of them. For these companies an ever increasing number of requests and
business transactions are coming through these web service interfaces. If you
have comments or questions on this article or this topic, please contact me at
klaus_salchner@hotmail.com . I want to hear if you learned something new.
About the author
Klaus Salchner has worked for 14 years in the industry, nine years in Europe
and another five years in North America. As a Senior Enterprise Architect with
solid experience in enterprise software development, Klaus spends considerable
time on performance, scalability, availability, maintainability,
globalization/localization and security. The projects he has been involved in
are used by more than a million users in 50 countries on three continents.
Klaus calls Vancouver, British Columbia his home at the moment. His next big
goal is doing the New York marathon in 2005. Klaus is interested in guest
speaking opportunities and in writing for .NET magazines or Web sites. He can
be contacted at klaus_salchner@hotmail.com or http://www.enterprise-minds.com/ .
Enterprise application architecture and design consulting services are
available. If you want to hear more about it contact me! Involve me in your
projects and I will make a difference for you. Contact me if you have an idea
for an article or research project. Also contact me if you want to co-author an
article or join future research projects!