Wednesday, March 4, 2009

Refactoring SQL code with file includes and variables

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/refactoring_sql_code_with_file_includes_and_variables.htm]

Everyone who maintains code knows that duplicated code is bad. While OOP languages like C# (and even xml "languages" like MSBuild) provide ways to refactor, SQL seems lacking in such features. If you run a similarity-analyzer (like Simian), you can probably see large amounts of duplicated code. Two common refactoring techniques to help with this would be:

  • Dynamic include – Include one snippet within another. For example, we could have a chunk of code that is duplicated in a stored proc, and a table function.
  • Variables – We can abstract any SQL code (table names, sql commands) to a variable.

Note that mere stored procs or user-defined-functions are insufficient, as they can’t handle all snippets (like declaring variables which are used in the calling function), or they have awful performance in the where clause.

We can use a technology “SqlCmds” to accomplish this. (http://msdn.microsoft.com/en-us/library/aa833281.aspx).

How to enable Sql Commands in SqlStudio:

http://msdn.microsoft.com/en-us/library/ms174187.aspx

  • Single query window – “On the Query menu, click SQLCMD Mode.”
  • For all windows – “To turn SQLCMD scripting on by default, on the Tools menu select Options, expand Query Execution, and SQL Server, click the General page, and then check the By default open new queries in SQLCMD Mode box.”

How to enable Sql Commands in Visual Studio

This requires the database edition of Visual Studio. Click the allow "SqlCmd" button on the tool bar.

Basic variable test

--set variable, and then use it - use the ":setvar" command
:setvar SomeTable TestCodeGen
select * from $(SomeTable)

 

-- environmental variables too!
select '$(COMPUTERNAME)' --returns my comp name (like 'TimStall2')

This means we could have an external script set the environmental variables (like the PrimaryDataBase), and then easily re-run those in the SQL Editor. Note that you can use the free tool SetX.exe to set environmental variables.


File Include – Basic (use the “:r” command)

--File 1 (var_def.inc):
:setvar PrimaryDB MyDatabase70


--File 2:
:r C:\Development\var_def.inc
select * from $(PrimaryDB).MySchema.TestCodeGen


For example, we could have a “header” file that includes a bunch of variable definitions (like all the PrimaryDB, ReportDB, etc…), and then include it wherever needed. Or, we could include any other SQL snippet. For example, we could use this to effectively make private functions (instead of only have global functions) that are encapsulated to a single stored proc.

File Include – avoid function in a where clause

--File 1 (myProc_func1.sql):
--some reusable snippet (note how is uses the variable '@myNum")
co = '1234' or NumberInteger > @myNum

--File 2:
declare @myNum integer
select @myNum = 10

select * from TestCodeGen
where
:r C:\Development\myProc_func1.sql
and LastChangedBy < GetDate()



Summary
 

One catch to all of this is that if you have your own database installation code via ADO.Net, you need to manually parse it yourself. However, that should be easy enough to do given the strict syntax of the SqlCmds.


Note that this is substituted “at compile time”. If you run the SQL Profiler, you won’t see the “:setvar” or “:r”, but rather the content already parsed. These techniques could be used to help refactor SQL code, just like similar techniques help refactor the code in other languages.

 

Tuesday, March 3, 2009

Getting source code from places other than source control

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/getting_source_code_from_places_other_than_source_control.htm]

Of course all official code should ultimately be stored in source control (real source control, not VSS). However, when creating an automated build on a build server, getting the source code may not be as easy as just doing a single SVN checkout. There may be other steps to effectively get the latest source code:

  • Copy in other reusable blocks - Where feasible, you want avoid checking in binaries into source control when those binaries will be constantly updated. Unlike plain-text files, you can't do an effective diff on binaries - it will just look like a mess. So instead of storing just the change set, it will probably need to store the entire physical assembly - which will bloat your source control. So, say you've got a team that is actively working on a set of reusable class libraries, to be shared across multiple departments and product groups. It may make sense to have your product's build copy in those latest reusable blocks (from their build's published output) to some external folder where you store your third-party assemblies, as opposed to constantly storing the latest version in source control. (So, ultimately the reusable blocks are still stored in source control, it's just a different repository).
  • Code Generate from input files - You can use code-generation for lots of things, such as SQL base data or your data-access layer. If these files are completely code-generated (i.e. no merge regions), then you may not want to check them into source control, as you'll just face synchronization errors. For example, if you generate your data-access layer, and it's 100% determined from some set of xml files and database schema, then your build server could simply re-generate that code. If you check it into source control, then that version may be out-of-sync with what gets regenerated, and your build fails. In other words, as long as the server can already obtain the code by regenerating it, there's no reason to check it in - and risk checking in something that doesn't match what will be re-generated. (So, the generated files are effectively stored in source control via the inputs necessary to recreate them are being stored in source control).

Sunday, March 1, 2009

Things to CodeGenerate besides the Data Access Layer

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/things_to_codegenerate_besides_the_data_access_layer.htm]

CodeSmith is a powerful code generator (worth its weight in gold). And while one of the most popular things to automatically generate is the data access layer, there's a lot more to CodeSmith than just wrapping databases:

  • System data - Given an xml file, you can generate all your interrelated system data. For example, say your application has a data-driven relationship of groups, roles, menu items, tabs, and such (i.e. security and navigation) - you could write tons of tedious and brittle SQL scripts, or you could abstract it to an xml file and generate the SQL scripts from that.
  • Data Structures - Especially before Generics in .Net 2.0, CodeSmith was popular for its strongly-typed collections (much faster performance than boxing and unboxing an ArrayList). You could make other data structures as well, depending on your application's need.
  • Documentation - While CodeSmith's default DataDictionary template is popular for documenting your database schema, you could use CodeSmith as a Super-XSLT to transform any arbitrary xml list (like a file containing business rules, config, or test cases), into human-friendly HTML reports.
  • Domain-Specific-Language - It's often more efficient to work at a higher level of abstraction. So, you could write an xml script, and use CodeSmith to translate that ("compile?") into useful actions.
    • Say you were trying to write automated UI tests, but the UI technologies keep changing, so you write a simple abstract xml script for the basic actions you care about (Load page, click button, etc...), and CodeSmith transforms that into the UI testing code for the relevant testing framework.
    • You could write abstract tests in xml (i.e. the data for pairs of input and output), and then use CodeSmith to dynamically generate all the unit tests from that.
    • You could read your file system to create an MSI installer using something like Wix.
  • Starting Templates - I favor active re-generation when possible, and there's a balance between what to code-generate vs. what to refactor, however, sometimes it's useful to passively generate a starting template - just to give you a head start. For example, say your UI is too complicated to actively re-generate, but you could take an xml file of input and generate a stating template, from which you could then modify.

Basically, CodeSmith lets you take any input (a database, xml file, your file system, etc...) and generate any text output (sql, xml files, C#, aspx, html, js, etc...), and then also call C# to do anything on those files (install them in the database, commit it to source control, execute the resulting C#, etc...). It's a beautiful thing.

Monday, February 23, 2009

What makes a framework easy to experiment with?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/what_makes_a_framework_easy_to_experiment_with.htm]

I've been reading the excellent Framework Design Guidelines. One of the sections that stood out is the tips they offer to make your framework easy for other developers to experiment with. As most developers are experimental in nature, any framework must be "cooperative" in this manner in order to be popular.

  • "Allow a developer to use it immediately, whether or not it does what the developer ultimately wants it to do or not" (23).
  • "Types used in advanced scenarios should be places in subnamespaces" (23).
  • "Provide simple overloads of constructors and methods" (24). A type that is hard to instantiate is hard to experiment with.
  • Give common scenarios recognizable names. For example, "MyClass.CreateFile" sounds more friendly than "MyClass.OpenInputOutputSteam", so more developers would naturally experiment with the former.
  • "...a default should never result in a security hole or horribly performing code." (26)
  • "Do ensure that APIs are intuitive and can be successfully used in basic scenarios without reference documentation (27). I notice their emphasis - instead of writing tons of tutorials (which most devs won't even read), make the framework itself easy to use.
  • Make things strongly typed.
  • And the classic - A good framework makes it easy to do the right things, and hard to do the wrong things.

They offer ADO.Net as an example of what is confusing (juggling all the different types just to hit the database).

These are good points, and easy to overlook, but make sense when someone points them out to you.

Sunday, February 22, 2009

Killing the file handles, but not the process (from the command line)

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/killing_the_file_handles_but_not_the_process_from_the_comm.htm]

Most developers have come across that annoying error where you try to delete or rename an innocent file, only to be  rebuffed with "Access denied, another process is accessing this file." For example, you'll get this type of error if you create a text file, open it in Microsoft Word, and then try to delete the file. This happens for all sorts of things - whether it's a process that didn't clean up after itself (perhaps from an unexpected crash), or something like IIS that locks certain website directories. While PSTool's ProcessExplorer GUI is great for finding what process is locking the file, what if you just want to find the handle, and kill it - from the command line?

Note that there's s a difference between the process, and the handle that that process has on the file. For example, if you want to delete a website that IIS is locking, you could shutdown or kill IIS (i.e. the process itself), but say you want to leave IIS running? Killing just the handle would leave the process intact, while allowing you control of the file again.

A useful tool for this is the (free) PsTools Handle.exe. From the command line, you can query all handles to a file, and then you can delete those handles. For example, this will find all handles on the given directory:

handle.exe C:\MyFolder

And returns something like so:

inetinfo.exe pid: 1696 3DC: C:\MyFolder\MySubfolder
inetinfo.exe pid: 1696 3E0: C:\MyFolder

You can then use this info to close the handles (one-by-one), without killing the process by passing in the "-c" switch, along with the handle id (in hex), and the process id, and the "-y" switch to confirm the delete.

handle.exe -c 3DC -p 1696 -y
handle.exe -c 3E0 -p 1696 -y

You could glue these two steps together into one by calling System.Diagnostics.Process to run the first command, parse its output, and then use that to create the parameters for the second command.

I'm sure there are more elegant ways (perhaps direct C++ calls, or even some special method in the .Net Framework), and I'm all ears to any, but this approach will get the job done.

Tuesday, February 17, 2009

The different between Count, Length, and Index

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/the_different_between_count_length_and_index.htm]

When dealing with arrays and collections (like List), there are three integer "things" that can mess people up: Count, Length, and Index.

  • Count - refers to collections. This simply gets the number of items in the collection.
  • Length - refers to arrays. to quote: "Gets a 32-bit integer that represents the total number of elements in all the dimensions of the Array." (emphasis added). For a 1-d array, Count and Length seem similar. But for multi-dimensional arrays, the difference becomes apparent. A 3x2 array will have a length of 6. Because an array is allocate up front (as opposed to a collection that can grow or shrink), this conceptually makes sense. Length for an array doesn't change after declaration; Count for a List does (as you add or remove items).
  • Index - used by arrays, and some collections (like List), to indicate a specific item in the array or collection. Whereas Count and Length are 1-based properties, Index is a 0-based indexer.

This code snippet shows these in action:

    [TestMethod]
    public void TestMethod1()
    {
      //Length --> total number of items in the array
      //  acts like "Count" for a 1-d array
      string[] astr = new string[]{"a","b","c"};
      Assert.AreEqual(3, astr.Length);

      //  but very different for a 2-d array
      string[,] astr2 = new string[2, 3];
      Assert.AreEqual(6, astr2.Length); //Length = 2 * 3

      //Count --> 1-based
      List<string> lstr = new List<string>();
      lstr.Add("a");
      lstr.Add("b");
      Assert.AreEqual(2, lstr.Count);

      //Index --> 0-based
      Assert.AreEqual("a", astr[0]);
      Assert.AreEqual("b", lstr[1]);
     
    }