Article source code: delimit_xml.zip
This article will demonstrate the conversion of a delimited file into a DataSet or an XML file.
For the purposes of demonstration, I've created a tab-delimited file for the Roach Motel's room service department.
Why make the migration?
Tab delimited and Comma delimited files once were the standard in representing data outside of a relational database. With the inception of XML, developers can now represent their data in a much richer and more descriptive environment without the overhead or cost of relational databases. Converting legacy delimited files into an XML format allows developers to take advantage of many of the new technologies that have been created around XML.
Why convert to a DataSet first?
The nice thing about .NET's DataSet object is the ability to save the internal data as XML. In the past, Microsoft developers would have to use a Split()
function to parse out their delimited files and then piece together the rows in the file into an XML format. With large files, this became a particularly nasty exercise. .NET's DataSet object does most of the work for us, cleanly and quickly. We'll be using the DataSet object's GetXml()
method to convert the DataSet representation of the delimited file into an XML representation.
Walk Through
We're going to create a DataSet dynamically with the help of the StreamReader object, the DataTable object and the DataRows object.
Let's take a quick look at a snippet of XML produced from our tab delimited file:
<Kitchen xmlns="The_Roach_Motel">
<Room_Service>
<Room_Number>290</Room_Number>
<Breakfast>Bagel</Breakfast>
<Lunch>Pizza</Lunch>
<Dinner>Salmon</Dinner>
</Room_Service>
</Kitchen>
One of the really sweet things about the XML/DataSet connection is how integrated ADO and XML have become. I can define the namespace of my XML representation though a property in the DataSet object:
oDS.DataSetName = "Kitchen"
oDS.Namespace = "The_Roach_Motel"
Notice in the XML snippet above, the DataSet name becomes our root element name and the Namespace property translates into XML nicely.
Next, we'll add a table called 'Room_Service'. I'd avoid using spaces when you are naming DataSet tables and XML elements.
oDS.Tables.Add("Room_Service")
We're using the StreamReader object to open up the delimited file.
Dim oSR As New StreamReader(strFilePath)
'Go to the top of the file
oSR.BaseStream.Seek(0, SeekOrigin.Begin)
Most delimited files have a header, or a group of descriptive names that identify a column of data. We're going to snatch out the first line of our delimited file and use the column header names as the columns in our DataSet. After we add a new table to our DataSet, we're going to add the column names to our new table.
Notice that we're using the Split
function on our StreamReader object to break apart the delimited column names.
'Add in the Header Columns
For Each strFields In oSR.ReadLine().Split(strDelimiter)
oDS.Tables(0).Columns.Add(strFields)
Next
After we've added our columns, we can start to populate the rows with data. We'll need to instantiate our DataTable object by setting a reference to our DataSet's table. After that is done, we'll need to create a DataRow object. The DataRow object holds all of the data for that row and makes sure that the data is mapped to the correct column.
oTable = oDS.Tables(0)
While (oSR.Peek() > -1)
oRows = oTable.NewRow()
Notice that we're iterating through our StreamReader object again. There's no need to initialize it once more or start from the beginning - we've already gotten our columns and now are just concerned with migrating the data from the delimited file. I've added an intCounter
to mark the ordinal location of the columns since we don't really know their names at this point.
For Each strFields In oSR.ReadLine().Split(strDelimiter)
oRows(intCounter) = strFields
intCounter = intCounter + 1
Next
intCounter = 0
Once we've filled up our DataRows object with data, we're ready to add that row to the DataTable Object.
oTable.Rows.Add(oRows)
End While
All that's left to do is to is convert the Dataset to XML using the GetXML()
function and return the XML as a string. You could save this string to an XML file using the WriteXML()
function.
In conclusion, XML and ADO.NET allow developers a wide berth of freedom when converting legacy data formats into usable, query-able documents. I hope that you've found this article helpful and useful.
Full Code
Imports System
Imports System.IO
Imports System.Collections
Imports System.Data
Imports System.Text
Module modXML
Sub Main()
Dim strXML As String
strXML = delimitedDataSet(vbTab, "c:/food.tab")
End Sub
Function delimitedDataSet(ByVal strDelimiter As String, _
ByVal strFilePath As String) As String
Dim oDS As New DataSet()
Dim strFields As String
Dim oTable As New DataTable()
Dim oRows As DataRow
Dim intCounter As Int32 = 0
Dim oRow As DataRow()
oDS.DataSetName = "Kitchen"
oDS.Namespace = "The_Roach_Motel"
oDS.Tables.Add("Room_Service")
Dim oSR As New StreamReader(strFilePath)
'Go to the top of the file
oSR.BaseStream.Seek(0, SeekOrigin.Begin)
'Add in the Header Columns
For Each strFields In oSR.ReadLine().Split(strDelimiter)
oDS.Tables(0).Columns.Add(strFields)
Next
'Now add in the Rows
oTable = oDS.Tables(0)
While (oSR.Peek() > -1)
oRows = oTable.NewRow()
For Each strFields In oSR.ReadLine().Split(strDelimiter)
oRows(intCounter) = strFields
intCounter = intCounter + 1
Next
intCounter = 0
oTable.Rows.Add(oRows)
End While
Return oDS.GetXml()
'oDS.WriteXml("c:/food.xml")
End Function
Tab File
Here's the tab file I created to represent the room service at an illustrious hotel.
Room_Number Breakfast Lunch Dinner
290 Bagel Pizza Salmon
301 Orange Pizza Chicken ala King
349 Sweet Roll Salad Tofu and Vegetables
500 Omelet Sausage Veal
702 Eggs Tuna fish Cheese Sandwich
XML Output
After we call GetXml()
on our DataSet, we can save out our string as an XML file.
<Kitchen xmlns="The_Roach_Motel">
<Room_Service>
<Room_Number>290</Room_Number>
<Breakfast>Bagel</Breakfast>
<Lunch>Pizza</Lunch>
<Dinner>Salmon</Dinner>
</Room_Service>
<Room_Service>
<Room_Number>301</Room_Number>
<Breakfast>Orange</Breakfast>
<Lunch>Pizza</Lunch>
<Dinner>Chicken ala King</Dinner>
</Room_Service>
<Room_Service>
<Room_Number>349</Room_Number>
<Breakfast>Sweet Roll</Breakfast>
<Lunch>Salad</Lunch>
<Dinner>Tofu and Vegetables</Dinner>
</Room_Service>
<Room_Service>
<Room_Number>500</Room_Number>
<Breakfast>Omelet</Breakfast>
<Lunch>Sausage</Lunch>
<Dinner>Veal</Dinner>
</Room_Service>
<Room_Service>
<Room_Number>702</Room_Number>
<Breakfast>Eggs</Breakfast>
<Lunch>Tuna fish</Lunch>
<Dinner>Cheese Sandwich</Dinner>
</Room_Service>
</Kitchen>
Related devCity.NET articles: