Wednesday, May 6, 2009

Joining DataTables

Recently, I needed to perform several joins in a SQL query and return that data to a DataGrid. Normally this works great because I only have 2-3 tables I am performing a join on and the number of records being returned is relatively low. In this latest case I had many tables to join together and some of them had millions of rows in them.

The query to pull all the data I needed would run for several minutes before returning anything to me. I decided that to speed this up I would handle the joins in the application and just pull the rows I needed in the queries themselves. This took a query that would return in 15+ minutes down to less than 1 minute.

Microsoft has some code here: on how to do this. The code they had was over-engineered for what I needed to do so I decided to write my own.

The main method takes 2 tables and returns a new table. It is quick and dirty and can still be fleshed out if needed.

public static class MergeData
{
public static DataTable Merge(DataTable tableOne, DataTable tableTwo)
{
DataTable output;

output = new DataTable();

CopyColumns(tableOne, ref output);
CopyColumns(tableTwo, ref output);
CopyData(tableOne, tableTwo, ref output);

return output;
}

public static void CopyData(DataTable tableOne, DataTable tableTwo, ref DataTable destinationTable)
{
foreach (DataRow row in tableOne.Rows)
{
DataRow nr;
DataRow[] joinedRows;

nr = destinationTable.NewRow();

foreach (DataColumn dc in tableOne.Columns)
{
nr[dc.Table.TableName + "." + dc.ColumnName] = row[dc.ColumnName];
}

string selectStatement = tableTwo.Columns[1].ColumnName + " = '" + row[tableOne.Columns[1].ColumnName] + "'";
joinedRows = tableTwo.Select(selectStatement);

foreach (DataColumn dc in tableTwo.Columns)
{
nr[dc.Table.TableName + "." + dc.ColumnName] = joinedRows[0][dc.ColumnName];
}

destinationTable.Rows.Add(nr);
}
}

public static void CopyColumns(DataTable table, ref DataTable output)
{
foreach (DataColumn dc in table.Columns)
{
DataColumn nc;

nc = new DataColumn(table.TableName + "." + dc.ColumnName, dc.DataType);
AddColumn(nc, ref output);
}
}

public static void AddColumn(DataColumn dc, ref DataTable dt)
{
if (!dt.Columns.Contains(dc.ColumnName))
{
dt.Columns.Add(dc);
}
}
}

Friday, April 10, 2009

Back to C#

After about 6 months of being away from C# I am finally able to get back to working with the language again. My first project went very smooth. It consisted of creating a class to read and write INI files. The most difficult part was the regex needed to determine what sections, settings and values. I need to perform some more research on regex, tokens and how C# uses them.

My next project is a bit more interesting and complicated. The basics of it are I need to parse a csv file and import it into a database. It isn't just 1 csv file but literally 10s of thousands of csv files and they get updated twice a day or more. The app will have to update the database at least once a day possibly more.

Parsing the csv will be interesting since some fields have quotes and just about any character can appear inside those quotes. I also need to look for certain strings and set them to DBNull since most fields from the csv file will be Ints or Floats in the database.

Some new tools in my arsenal since I last posted here.
XtUnit - Extension framework provides better support for unit tests with a database.
Log4Net - If you aren't using it for logging you don't know what you are missing.

I am glad to be back and look forward to updating this blog again.

Thursday, October 2, 2008

Does C# need the Template pattern?

My understanding of the Template pattern is that it allows a programmer to provide a hook for outside use into a classes methods. With the advent of C# a new type has come into use, the Delegate. From the MSDN site
A delegate is a new type that references a method. Once a delegate is assigned a method, it behaves exactly like that method. The delegate method can be used like any other method, with parameters and a return value...


This sounds to me like a perfect candidate to hook into a method. Now maybe the Template pattern is still useful in other ways, although I haven't come up with one, but maybe Delegates can make it largely obsolete.

Some quick sample code I did to just test it out.

namespace DelegateTest
{
public delegate int Test(int MyInt);
class Program
{
public static Test MyTest;
static void Main(string[] args)
{
int SomeNumber = 5;
MyTest += new Test(MyInt);

if (MyTest != null)
{
SomeNumber = MyTest(SomeNumber);
}
Console.WriteLine(SomeNumber);
Console.Read();
}

public static int MyInt(int TestInt)
{
return 6;
}
}
}

Friday, September 19, 2008

Strategy Pattern

I found a very useful purpose for the strategy pattern. I am writing an application that will move all or part of a database depending on the users selection. Since 90% of the code was the same I decided to use an abstract class with concrete methods for the business logic that was the same and abstract classes for where it differed. The strategies I have to choose from are:

1. Move entire DB
2. Move all rows from X number of tables.
3. Move X number of rows from X tables.

Doing a quick mock I came up with

public abstract class DBMoveStrategy
{
//properties...

//events...

//concrete methods

public abstract void GetSource();
public abstract void AddConstraints(DataSetFacade Data);

}

Then I sub-class for the different branches of logic.

public class DBMoveAll : DBMoveStrategy
{
public override void GetSource()
{
//some code
}

public override void AddConstraints(DataSetFacade Data)
{
//some code
}
}

public class DBTables : DBMoveStrategy
{
public override void GetSource()
{
//some code
}

public override void AddConstraints(DataSetFacade Data)
{
//some code
}
}

public class DBMoveRows : DBMoveStrategy
{
public override void GetSource()
{
//some code
}

public override void AddConstraints(DataSetFacade Data)
{
//some code
}
}

To use this inside another class I just do the following:

public class MyControllerClass
{
private DBMoveStrategy movedb;

public MyControllerClass()
{
if (IsMoveAll)
movedb = new DBMoveAll();
}

public void Connect(string Server, string Database, string UserName, string Password)
{
movedb.connect(string Server, string Database, string UserName, string Password)
}

public void GetData()
{
movedb.GetSource();
movedb.AddConstraints();
}
}


There is more logic involved but this will give a flavor for how I am using the pattern. There are probably some better ways I could use it or other patterns that might be better but I am still learning the language and better ways to write code.

Friday, September 5, 2008

Event Driven Error Handling

I have been reading a new titled "Design Patterns in C#". It is a great book, tons of information and it is quite understandable. While reading through the observer pattern it struck me. Why do we need to use so may try/catch statements in our code when we could fire off the exceptions from an event handler.

So I wrote up some quick code to test it out.

public delegate void TestErrorHandler(string message);

public class TestClass
{
public event TestErrorHandler test;

public void TriggerHandler(object blah)
{
if (blah == null)
{
test("Null argument in TriggerHandler");
}
}
}

static void Main(string[] args)
{
TestClass MyTest = new TestClass();
MyTest.test += new TestErrorHandler(OhCrap);
MyTest.TriggerHandler(null);
}

public static void OhCrap(string e)
{
Console.WriteLine(String.Format("Oh CRAP! {0}", e));
Console.Read();
}

I can see some pitfalls to this solution like:
1. You fire off the same Exception Event from multiple methods, i.e. ArgumentNullException.
2. How do you tell what observing method called the method that threw the Exception Event? For instance what if you had a private property that had the same methods being called multiple times?
3. What about if I call the same method multiple times from the same piece of code. a for loop for instance?

These are definitely challenging issues to solve but I feel it may be possible. I already have some ideas but right now don't have the time to fully test these out. If it is possible then it could take decoupling code to a whole new level.

Monday, August 11, 2008

Updating identity columns in a Dataset

One of the things I have noticed about working with DataSets is that there is not a lot of information out there on how to work with Foreign Key constraints and merging 2 databases together. Let me rephrase that, there is a but it feels scattered about and I felt it didn't teach me what I needed to know. After spending some time researching the subject here is the best way I have found to deal with the issue.

First you need to setup your connection.

string ConnectionString = "server=" + _DatabaseServer + ";user id=" + UserName + ";password=" + Password + ";Trusted_Connection=no;connection timeout=30";

Establish your connection

Connection = new SqlConnection(ConnectionString);
Connection.Open();

Now we setup our adapter
string ParentQuery = "SELECT * FROM ParentTable";
string ChildQuery = "SELECT * FROM ChildTable";
SqlDataAdapter ParentAdapter = new SqlDataAdapter(ParentQuery, Connection);
SqlDataAdapter ChildAdapter = new SqlDataAdapter(ChildQuery, Connection);

At this point we can fill our DataSet. Before we do we need to setup an event handler for updating the Identity column in each row when it gets written back to the database. The reason for this is depending on how you you merge DataSets or add rows to a DataSet the Identity column will have a place holder. This place holder will not necessarily be the same value as what the database will contain. This wouldn't be a problem if you were only updating 1 table but when you have child tables that need the same value you will run into issues.

Here is where you add the event handler

ParentAdapter.RowUpdated += new SqlRowUpdatedEventHandler(AdapterRowUpdated);

And here is the event handler

private void AdapterRowUpdated(object sender, SqlRowUpdatedEventArgs e)
{
SqlCommand IdentityCommand = new SqlCommand("SELECT @@IDENTITY", e.Command.Connection);

e.Row["ID"] = IdentityCommand.ExecuteScalar();
e.Row.AcceptChanges();
}

Now when you Update the database all your rows will have the same IDs.

Go ahead and fill our DataSet.

ParentAdapter.FillSchema(MyDataSet, SchemaType.Source, "ParentTable");
ProfileAdapter.Fill(MyDataSet, "ParentTable");
ChildAdapter.FillSchema(MyDataSet, SchemaType.Source, "ChildTable");
ChildAdapter.Fill(MyDataSet, "ChildTable");

We have 1 DataSet with 2 DataTables in it, ParentTable and ChildTable. The database has an Identity field and a Foregn Key constraint. Those are not replicated down to the DataSets so we have to do this manually.

The Unique constraint
UniqueConstraint ParentConstraints = new UniqueConstraint(new DataColumn[] { MyDataSet.Tables["ParentTable"].Columns["ID"] });
MyDataSet.Tables["ID"].Constraints.Add(ParentConstraints);

Pay attention to the UpdateRule setting for the Foreign Key constraint. We set it as Rule.Cascade. This is done so that when we update the ID column for a row that ID gets propagated to any row within the child tables that contain the same Identity.

ForeignKeyConstraint ParentFK = new ForeignKeyConstraint("IDFK",
MyDataSet.Tables["ParentTable"].Columns["ID"],
MyDataSet.Tables["ChildTable"].Columns["ID"]);
ProfileFK.UpdateRule = Rule.Cascade;
MyDataSet.Tables["ChildTable"].Constraints.Add(ParentFK);

If you are merging 2 databases together I would recommend following the same steps above for the second database. That way you have 2 identical DataSets when you go to merge them together and you can reuse all the same code again. You don't have to do this if you don't want to as there are ways around it by using MissingSchemaAction but I prefer to have identical DataSets to work from.

We have 2 DataSets and we need to merge them together. The problem is that when you Fill your DataSet by default it sets all Rows to a RowState of NoChange. This means when you merge your 2 DataSets nothing will get written to your database because nothing has changed. Any new data should be marked as Added so the Adapter will Insert it into the database when you perform the Update. This is not very efficient if you have multiple tables that need to be updated. In that case wrap the below foreach loop in another foreach loop to iterate through each table.

foreach (DataRow row in NewData.Tables["ParentTable"].Rows)
{
row.BeginEdit();
row.SetAdded();
row.EndEdit();
}


Merge the 2 DataSets. This is very easy using the Merge method in your Destination DataSet, NewData being the DataSet I want to merge into my existing data. You have to set preserveChanges to true or the Merge will throw out all your changes and nothing will make it into the new database.

MyDataSet.Merge(NewData, true, MissingSchemaAction.Add);

Our DataSets are now merged and ready to be Updated in the new location. This is pretty trivial. The only thing to remember is that the Identity column is marked as ReadOnly, therefore you can't update it without first changing it.

SqlCommandBuilder ParentCommandBuilder = new SqlCommandBuilder(ParentAdapter);
SqlCommandBuilder ChildCommandBuilder = new SqlCommandBuilder(ChildAdapter);
MyDataSet.Tables["ParentTable"].Columns["ID"].ReadOnly = false;
ParentAdapter.Update(MyDataSet.Tables["ParentTable"]);
MyDataSet.Tables["ParentTable"].Columns["ID"].ReadOnly = true;
ChildAdapter.Update(MyDataSet.Tables["ChildTable"]);

Now just check the destination database and everything should be there. Just repeat the steps above for multiple tables and multiple databases and you can merge as many databases and tables as you need to.

Friday, August 1, 2008

Mocks and stubs

I am getting to a point where I am designing more and more complex applications. Since I am doing this alone I sometimes find it over whelming when I realize everything I am going to need to do as I progress though an application. Combine this with performing TDD at the same time and you begin to realize just how much work you have ahead of yourself.

Some of the things that I have begun to use in order to make things easier are mocks and stubs. I first heard about mocks and stubs while researching TDD and am beginning to use them for more than just testing. There are some subtle differences between mocks and stubs but they are similar.

Stubs are methods or functions that are only meant as place holders. I have yet to see a stub actually do anything. An example of a stub would be:

public static void SomeMethod(string MyString)
{

}

It doesn't do anything but it will allow code to compile that will need to use this method. Stubs are one of the first steps towards getting your unit tests to pass, if you are following a test driven approach.

Mocks can be anywhere from a little more complex to a lot more complex than a stub. Mocks allow you to create methods/functions that return expected data. They are usually the next logical step in TDD after a stub. Building on the above stub a mock could be:

public static string SomeMethod(string MyString)
{
return "This is my new modified string";
}

This will get a test to pass and it will allow you to test other portions of your code that depend on this method without fully implementing it before you are ready.

Now where mocks really shine is when you start using interfaces. The way I use this is as follows.

public interface IStringManipulator
{
public static string SomeMethod(string MyString)
public static bool IsNull(string MyString)
}

Now that the interface is designed you can start with your mock.

public class mockStringManipulator : IStringManipulator
{
public static string SomeMethod(string MyString)
{
return "This is my new modified string";
}
public static bool IsNull(string MyString)
{
return true;
}
}

You have your mock, now you can use the mock as a placeholder.

public class WebConnector
{
public void Connection(string Username)
{
String NewUsername;
if(!mockStringManipulator.IsNull(Username))
{
NewUsername = mockStringManipulator.SomeMethod(Username)
return NewUsername;
}
else
{
throw NullReferenceException;
}

}
}

You will of course be writing unit tests on the WebConnector class and when you get around to fully implementing the StringManipulator class you will be using the IStringManipulator interface. This ensures that all the only difference between your mock class and your production class is in the implementation. The signatures and everything else matches exactly and since you will be performing TDD on the production class you will have a high degree of confidence that it will work as expected before you change from the mock to the fully developed class.