By Kamran Shakil
"This time, I would take a quick but a thought provoking aspects of what happens with XML knits with a database in Microsoft World."
Extensible Stylesheet Language Shape ups (XSLT)
When companies exchange data, their schemas will more than likely be inconsistent. One company might use CustomerID as an element name while another might simply use ID. Extensible Stylesheet Language Shape up (XSLT) represents an excellent mechanism for shaping up XML into a variety of structures, including other forms of XML. Using the .NET platform, it's fairly straightforward to shape up an XML document into a structure that can then be loaded into a DataSet object that contains the database schema. Using one of the .NET-managed provider DataAdapters, the XML can then be updated into the appropriate database tables.
Plus:
Suppose we're comfortable with XSLT, this might be the quickest way to achieve the goal of moving XML into a database, especially given the strengths of the DataSet class. Once loaded with the correct XML data, we can easily add the DataSet's contents to a database using a DataAdapter. If the source XML document changes, we can just edit the XSLT stylesheet to reflect the changes without having to recompile code.
Minus:
XSLT loads XML documents (and the corresponding XSLT stylesheets) into memory during the shape up process. Though we can minimize some of the memory impact by using the appropriate classes (the XPathDocument class instead of the XmlDocument class, for instance), we might run into scalability problems when large XML documents are involved.
DataSet/DataAdapter Mapping
The DataSet is an extremely powerful object that is part of the ADO.NET classes found in the .NET platform. Unlike classic ADO Recordsets, DataSets hold multiple tables and their associated rows and columns. We can even create relationships between the tables. DataSets do not interact directly with the database. Instead, they go through DataAdapters.
The DataSet object can be used with DataAdapter, DataTableMapping, and DataColumnMapping classes to map the source XML data to the proper database tables and fields. Once the mapping is complete, we can load the XML into a DataSet and update the database using the DataAdapter.
Plus:
We get a lot of the behind-the-scenes plumbing done for we with this technique, which cuts development time and makes the application more maintainable. Visual Studio .NET provides a GUI interface to make generating the mapping code easier.
Minus:
When working with complex hierarchical relationships in XML documents, the mapping process can become quite complex unless we're using a visual tool such as Visual Studio .NET. Changes to the source XML document will more than likely result in code needing to be recompiled and distributed to accommodate the changes.
The DOM
The Document Object Model (DOM) can be a choice to move the XML into a database. By Using the XmlDocument classes along with XPath, specific nodes/node values can be found in the XML document and SQL statements can be dynamically instantiated to be executed against the database.
Plus:
The DOM is easy to access and manipulate once we understand the different DOM objects and their associated properties and methods. Finding any node within an XML document is easy using the XPath language along with the XmlDocument's SelectSingleNode() and SelectNodes() methods.
Minus:
This is not a scalable solution because every XML document should be loaded into memory. For small applications this solution might seem viable. However, what happens if the "small" XML documents stay small but the number of documents to be parsed increases over time? Time to buy more RAM! If we absolutely have to use the DOM for some reason and need to use XPath, I'd recommend looking at the XPathNavigator class rather than the XmlDocument class because of differences in how the DOM structure is loaded into memory.
The XmlTextReader Class
The XmlTextReader class is a forward-only stream having XML tokens that can be parsed quickly and efficiently. It's a highly scalable solution capable of working with large XML documents without tying up huge amounts of server memory (as with the DOM).
Plus:
The XmlTextReader is a stream, so it uses a minimum amount of memory while parsing the XML document. Different XML "tokens" found in the stream can be pulled as necessary to access data or check node types. The ability to "pull" nodes increases this class's ability to scale well.
Minus:
Though simple to use, the XmlTextReader might take more time to get comfortable with compared to other classes such as those described in the DOM section. The XmlTextReader represents a forward-only stream, so random access to nodes in the XML document is not an option as with the DOM or XSLT. Also, if the XML document changes, we might have to recompile the code depending on how the application is designed.
Microsoft SQLServer 2000
The Microsoft SQLServer 2000 development team provides "Web releases" of different XML-related tools used with this very XML-enabled database. The latest Web release has specific classes we can use with .NET applics to move XML into database tables. Even without these new classes, however, Microsoft SQLServer 2000 comes with several XML features that can aid this process including technologies such as DataGrams and OPENXML.
Plus:
Microsoft SQLServer 2000 is arguably the most XML-enabled database available today. We can retrieve data as XML through SQL queries, XML templates, and stored procedures or move XML into a database using OPENXML "document shredding" techniques, DataGrams, and other technologies. With the release of the latest Microsoft SQLServer 2000 Web release tools, the options for integrating XML into database tables increase even more because new .NET classes are available to aid in this process.
Minus:
Well, we have to have Microsoft SQLServer 2000 to use it. To use the new Microsoft SQLServer 2000 .NET classes, we'll have to install the Web release, which may or may not be an option for us.
Happy .NETing !