Wednesday, May 6, 2009

Joining DataTables

Recently, I needed to perform several joins in a SQL query and return that data to a DataGrid. Normally this works great because I only have 2-3 tables I am performing a join on and the number of records being returned is relatively low. In this latest case I had many tables to join together and some of them had millions of rows in them.

The query to pull all the data I needed would run for several minutes before returning anything to me. I decided that to speed this up I would handle the joins in the application and just pull the rows I needed in the queries themselves. This took a query that would return in 15+ minutes down to less than 1 minute.

Microsoft has some code here: on how to do this. The code they had was over-engineered for what I needed to do so I decided to write my own.

The main method takes 2 tables and returns a new table. It is quick and dirty and can still be fleshed out if needed.

public static class MergeData
{
public static DataTable Merge(DataTable tableOne, DataTable tableTwo)
{
DataTable output;

output = new DataTable();

CopyColumns(tableOne, ref output);
CopyColumns(tableTwo, ref output);
CopyData(tableOne, tableTwo, ref output);

return output;
}

public static void CopyData(DataTable tableOne, DataTable tableTwo, ref DataTable destinationTable)
{
foreach (DataRow row in tableOne.Rows)
{
DataRow nr;
DataRow[] joinedRows;

nr = destinationTable.NewRow();

foreach (DataColumn dc in tableOne.Columns)
{
nr[dc.Table.TableName + "." + dc.ColumnName] = row[dc.ColumnName];
}

string selectStatement = tableTwo.Columns[1].ColumnName + " = '" + row[tableOne.Columns[1].ColumnName] + "'";
joinedRows = tableTwo.Select(selectStatement);

foreach (DataColumn dc in tableTwo.Columns)
{
nr[dc.Table.TableName + "." + dc.ColumnName] = joinedRows[0][dc.ColumnName];
}

destinationTable.Rows.Add(nr);
}
}

public static void CopyColumns(DataTable table, ref DataTable output)
{
foreach (DataColumn dc in table.Columns)
{
DataColumn nc;

nc = new DataColumn(table.TableName + "." + dc.ColumnName, dc.DataType);
AddColumn(nc, ref output);
}
}

public static void AddColumn(DataColumn dc, ref DataTable dt)
{
if (!dt.Columns.Contains(dc.ColumnName))
{
dt.Columns.Add(dc);
}
}
}

Friday, April 10, 2009

Back to C#

After about 6 months of being away from C# I am finally able to get back to working with the language again. My first project went very smooth. It consisted of creating a class to read and write INI files. The most difficult part was the regex needed to determine what sections, settings and values. I need to perform some more research on regex, tokens and how C# uses them.

My next project is a bit more interesting and complicated. The basics of it are I need to parse a csv file and import it into a database. It isn't just 1 csv file but literally 10s of thousands of csv files and they get updated twice a day or more. The app will have to update the database at least once a day possibly more.

Parsing the csv will be interesting since some fields have quotes and just about any character can appear inside those quotes. I also need to look for certain strings and set them to DBNull since most fields from the csv file will be Ints or Floats in the database.

Some new tools in my arsenal since I last posted here.
XtUnit - Extension framework provides better support for unit tests with a database.
Log4Net - If you aren't using it for logging you don't know what you are missing.

I am glad to be back and look forward to updating this blog again.