timstall

Monday, August 4, 2008

An analogy between game play and manageability

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/an_analogy_between_game_play_and_manageability.htm]

In the demanding field of software engineering, everyone wants to have technical ability. That's great, but it isn't enough. Most problems are big enough to require multiple people, and someone needs to manage those people. So a good employee should be not just technically able, but also manageable. It's like playing Age Of Empires (or any real-time strategy game): imagine you have the "super unit" (i.e. a technical star), but it won't obey orders. You want it to move right, but it moves left. It would drive the player insane (you could simulate this by using an older ball mouse whose ball has gotten dirty, so the cursor no longer responds properly). That's what it's like to have an unmanageable employee. In fact, when judging most games, "game play" (in this case, the ability to manage your units) is a major factor. After all, what good is the "super unit" if you cannot control it?

Thursday, July 31, 2008

Why would my program suddenly stop working?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/why_would_my_program_suddenly_stop_working.htm]

Deterministic bugs are easy. When you write "ConvertCelsiusToFahrenheit", debugging is simple: when things break, they break repeatably, and it's easy to step through the debugger and see why. However, production code doesn't always work this way. Sometimes your enterprise application will temporarily stop working, only to resume working correctly a little later. Why? Here are a few ideas:

Caching - something was cached, and the cache expired.
Session - the session expired
External dependencies - a dependent web service or database could be down
Rare boundary condition - perhaps your code doesn't account for certain rare input (like nulls, or special characters that need escaping).
Concurrency - perhaps the code works great in a single thread (which is how most code is tested), but doesn't handle being run concurrently; for example, two threads deadlock, or another process locks a resource.
Too much load - perhaps too much load temporarily crashed something, for example by throwing an OutOfMemoryException.
Randomness - maybe your code uses random numbers, and most of them work, but some of them don't, i.e. the code crashes when the random number is divisible by 111, or something really weird like that.
Incremental buildup with rounding error - perhaps every time the code runs, it produces an incremental buildup somewhere, like inserting a row in a database table. As long as there are no more than X rows, it "rounds down" and works. However, once the table has X+1 rows, it "rounds up" and something fails. This is abnormal, but certainly possible.
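To make the "Concurrency" item concrete, here is a minimal sketch (the CounterDemo class is hypothetical, and Task.Run assumes .NET 4.5 or later) of a counter that behaves perfectly in a single-threaded test but can silently lose updates under concurrent load, next to a version that uses Interlocked.Increment:

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo of the "Concurrency" cause: the same increment done two ways.
public static class CounterDemo
{
    private static int unsafeCount;
    private static int safeCount;

    // Starts `threads` tasks that each increment both counters `perThread` times.
    public static void Run(int threads, int perThread, out int unsafeTotal, out int safeTotal)
    {
        unsafeCount = 0;
        safeCount = 0;
        var tasks = new Task[threads];
        for (int t = 0; t < threads; t++)
        {
            tasks[t] = Task.Run(() =>
            {
                for (int i = 0; i < perThread; i++)
                {
                    unsafeCount++;                        // read-modify-write, not atomic: updates can be lost
                    Interlocked.Increment(ref safeCount); // atomic: never loses an update
                }
            });
        }
        Task.WaitAll(tasks);
        unsafeTotal = unsafeCount;
        safeTotal = safeCount;
    }
}
```

Calling CounterDemo.Run(8, 100000, out u, out s) should always leave s at 800000, while u will often (but not always) come out lower. That "sometimes it works" behavior is exactly what makes these bugs hard to reproduce.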

There is almost always some sufficient cause that makes the code act abnormally. It helps for your app to have a good logger, so that you have clues to track down that cause. It also helps to have a QA environment that matches production, so that you can try to reproduce the steps yourself. Knowing that there will inevitably be production errors should encourage us to write good code upfront, so that we take care of all the easy errors and these preventable bugs don't distract us from fixing the non-trivial ones.

Tuesday, July 29, 2008

Getting file and line numbers without deploying the PDB files

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/getting_file_and_line_numbers_without_deploying_the_pdb_file.htm]

Outline:

Problem
Inadequate Solutions
A Better Way
Step 1: Create the pdb2xml Database
Step 2: Query the pdb2xml Database Using the IL Offset
Step 3: Have Your Application Log the IL Offset
Download the source code and demo
Conclusion

----

Problem

Enterprise applications will inevitably throw exceptions. The ideal thing to do with these exceptions is to log them, send the results back to the appropriate developer, and then have that developer fix the code. Part of the problem is that the production (i.e. release-mode) logs are often missing helpful information that developers take for granted in debug mode, like the source file and line number.

Inadequate Solutions

One approach is to just settle for less and not get the extra info. For easy bugs it is sufficient to have just the call stack (from calling the exception’s ToString method) and some extra arbitrary parameters from a custom logger. But what about the bugs that aren't easy?

Another approach is to just dump the PDB files into your production environment. If you put the PDB (program database) files right next to the corresponding DLLs, then .Net will automatically include the file and line numbers in every exception call stack. Recall that the PDB files contain information to reverse-engineer your code, such that a debugger could step through it. So you almost never want to give those files out to the public. That means this approach only works for applications you host, such as an ASP.Net app. Even then, it could be a security risk, or your IT department or deployment team may have a policy against it.

A Better Way

Looking at these inadequate solutions makes us appreciate the ideal solution, which would fulfill two criteria:

Provide you the PDB info like file name and line number
Not require you to ship or deploy the PDB files.

.Net allows you to do this. The “trick” is to get the “IL Offset” and use it to look up the exact info you need in the PDB files. Recall that all .Net code gets compiled to Intermediate Language (IL). The IL Offset is the position (in bytes) of an instruction within a method's compiled IL, which is roughly the IL equivalent of a line number. The PDB maps those IL offsets back to your source code. So, this approach has three main steps:

Create the pdb2xml database, which maps your IL to your source code.
Query the pdb2xml database using the IL Offset.
Have your application log the IL Offset.

This approach fulfills our criteria, so let’s explore it in more detail.

Step 1: Create the pdb2xml Database

The PDB files are not plain text, so they’re hard to work with for most people. However, the MSDN debugging team wrote a free tool that lets you convert the PDB files to XML (special thanks to Mike Stall for helping me better understand Pdb2Xml), such that you can easily look up info in them. You can download the “pdb2xml” converter tool from MSDN: http://www.microsoft.com/downloads/details.aspx?familyid=38449a42-6b7a-4e28-80ce-c55645ab1310&displaylang=en

When running the pdb2xml tool, it creates an xml file like so:

<symbols file="TestLoggerApp.Core.dll">
  <files>
    <file id="1" name="C:\Temp\MyApp.Core\Class1.cs" ... />
    <file id="2" name="C:\Temp\MyApp.Core\Class2.cs" ... />
  </files>
  <methods>
    <method name="MyApp.Core.Class3.Start3" token="0x6000002">
      <sequencepoints total="1">
        <entry il_offset="0x4" start_row="17" start_column="7"
          end_row="17" end_column="54" file_ref="1" />
      </sequencepoints>
      <locals />
    </method>
    …

This lists all the files, classes, and methods involved. Each method can be looked up via its unique token. The <entry> node provides what we ultimately want: the coveted file and line number. We get the file by using entry.file_ref to look it up in the <files> section, and we get the line number from the entry.start_row attribute.

In order to get the exact node, we will need to know the specific:

XML file to look in, where each XML file maps to a .Net assembly.
Method, which we can obtain from the token. The method name is just extra info for convenience.
IL Offset, which is stored as a hex value.

Because real applications usually have many assemblies, ideally we could just point to a bin directory full of PDB files and have a tool (like an automated build) dynamically generate all the corresponding XML files. We can write a wrapper for pdb2xml to do this.
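A minimal sketch of the planning half of such a wrapper, assuming the one-XML-file-per-assembly layout described above. The PdbBatch class and the "assembly name plus .xml" naming convention are hypothetical, and the actual pdb2xml invocation is deliberately left out rather than guessing at its command-line syntax:

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: pairs each PDB in a bin folder with its assembly and
// decides where the generated XML should go. Running pdb2xml itself is omitted;
// check the tool's own usage text for its arguments.
public static class PdbBatch
{
    // Returns (assembly path, target XML path) pairs.
    public static List<KeyValuePair<string, string>> PlanConversions(string binDir, string xmlDir)
    {
        var plan = new List<KeyValuePair<string, string>>();
        foreach (string pdb in Directory.GetFiles(binDir, "*.pdb"))
        {
            string stem = Path.GetFileNameWithoutExtension(pdb);
            // A PDB normally sits next to a .dll or .exe with the same base name.
            foreach (string ext in new[] { ".dll", ".exe" })
            {
                string assembly = Path.Combine(binDir, stem + ext);
                if (File.Exists(assembly))
                {
                    // One XML file per module, e.g. MyApp.Core.dll -> MyApp.Core.dll.xml (assumed naming).
                    string xml = Path.Combine(xmlDir, stem + ext + ".xml");
                    plan.Add(new KeyValuePair<string, string>(assembly, xml));
                    break;
                }
            }
        }
        return plan;
    }
}
```

PDBs without a matching assembly are simply skipped, so a stale PDB left in the bin folder won't break the batch.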

The biggest issue when writing such a wrapper tool is that pdb2xml, which uses Reflection to dynamically load the assemblies, will choke when loading an assembly that contains a class inheriting from a class in a different assembly. The easiest way to solve this is to just copy all the targeted assemblies (the ones you want to generate XML files for) to the bin of pdb2xml. You could use the ReflectionOnlyAssemblyResolve event to handle resolution errors, but that introduces other problems: you need a physical file, but the event properties only give you the assembly name. While most of the time they’re the same, it will be one more problem to solve when they’re not.

Pdb2xml should handle a variety of cases: assemblies with strong names, third-party references, assemblies compiled in release mode, or even files that are several MB in size.

ASP.Net applications are a little trickier. Starting with .Net 2.0, ASP.Net allows another compilation model, where every page can get compiled to its own DLL. The easiest way to collect all these DLLs is to run the aspnet_compiler.exe tool, which outputs all the assemblies (and PDBs) to the web’s bin directory. You can read about the aspnet_compiler here: http://msdn.microsoft.com/en-us/library/ms229863(VS.80).aspx, or its MSBuild task equivalent: http://msdn.microsoft.com/en-us/library/ms164291.aspx.

Note that when using the aspnet_compiler, you need to include the debug ‘-d’ switch in order to generate the PDB files. A sample call (ignoring the line breaks) could look like:

aspnet_compiler.exe
-v /my_virtualDir
-p C:\Projects\MyWeb\
-f
-d
-fixednames C:\Projects\precompiledweb\MyWeb\

For convenience, I’ve attached a sample tool - PdbHelper.Cmd.Test (from the download) which will mass-generate these xml files for you. This solves the first step – converting the pdb files to an xml format that we can then query. You can now put those xml files anywhere you want, such as on a shared developer machine.

Step 2: Query the pdb2xml Database Using the IL Offset

Given an XML data island, we can easily query that data. The only thing we need is a key. In this case, we can have the application’s logger generate an XML snippet, which some tool or process can then scan for and use to look up in the pdb2xml database. Let’s say that our logger gave us the following XML snippet (we’ll discuss how in the next step):

<ILExceptionData>
  <Point module='TestLoggerApp.Core.dll' classFull='TestLoggerApp.Core.Class2'
    methodName='Start2' methodSignature='Int32 Start2(System.String, Boolean)'
    methodToken='0x6000005' ILOffset='11' />
  ...
</ILExceptionData>

For each line in the stack trace, this XML snippet contains a <Point> node, which has the attributes needed to look up in the pdb2xml database. These are the three values that we need:

module – the .Net module, which directly maps to an xml file.
methodToken – the token, which uniquely identifies the method
ILOffset – the offset, within the method's IL, of the instruction that threw the exception. Our logger wrote this as decimal, but we can easily convert it to hex.

These values are just included for convenience:

classFull – the full, namespace-qualified, name of the class
methodName – the actual method’s name
methodSignature – the signature, to help troubleshoot overloaded methods

Given this XML snippet, we can have any tool or process consume it. In this case, I wrote a sample WinForm app (PdbHelper.Gui, from the download) that takes the directory of the pdb2xml database, as well as the XML snippet, and performs the lookup. The logic is straightforward; perhaps the only catch is that the IL Offset is not always exact, so if there is no exact match in the pdb2xml file, round down, i.e. find the previous entry node.

So, the developer could run this app, and it returns the file, line, and column.

While this is a manual GUI app, the logic could be automated in a console app or other process.
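As an illustration of that lookup logic, here is a minimal sketch using LINQ to XML. It assumes the element and attribute names shown in the sample pdb2xml output above (method/token, entry/il_offset/start_row/start_column/file_ref, file/id/name), and PdbXmlLookup is a hypothetical name, not the class from the download. Rather than converting the logged decimal offset to hex, it parses each entry's hex il_offset into an integer, which turns "round down" into a simple numeric comparison:

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch of the Step 2 lookup against one pdb2xml output file.
public static class PdbXmlLookup
{
    // Returns "file:line:column", or null if no matching method/entry is found.
    public static string Find(XDocument pdbXml, string methodToken, int ilOffsetDecimal)
    {
        XElement method = pdbXml.Descendants("method")
            .FirstOrDefault(m => string.Equals((string)m.Attribute("token"), methodToken,
                                               StringComparison.OrdinalIgnoreCase));
        if (method == null) return null;

        // The logged offset may fall between sequence points, so "round down":
        // take the entry with the largest il_offset that is <= the logged offset.
        XElement entry = method.Descendants("entry")
            .Select(e => new { Element = e, Offset = ParseHex((string)e.Attribute("il_offset")) })
            .Where(x => x.Offset <= ilOffsetDecimal)
            .OrderByDescending(x => x.Offset)
            .Select(x => x.Element)
            .FirstOrDefault();
        if (entry == null) return null;

        // Resolve entry.file_ref against the <files> section.
        string fileRef = (string)entry.Attribute("file_ref");
        string fileName = pdbXml.Descendants("file")
            .Where(f => (string)f.Attribute("id") == fileRef)
            .Select(f => (string)f.Attribute("name"))
            .FirstOrDefault();

        return string.Format("{0}:{1}:{2}", fileName,
            (string)entry.Attribute("start_row"), (string)entry.Attribute("start_column"));
    }

    // Converts "0x4" (or "4") to 4.
    private static int ParseHex(string hex)
    {
        if (hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase)) hex = hex.Substring(2);
        return int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }
}
```

For the Step 2 snippet, you would load the XML generated for TestLoggerApp.Core.dll and call Find(doc, "0x6000005", 11).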

Step 3: Have Your Application Log the IL Offset

The last step is to generate the XML snippet. Given any Exception, you can use the System.Diagnostics.StackTrace object to determine the IL Offset, method, and module. You first need to create a new StackTrace object from the caught Exception. You can then cycle through each StackFrame, getting the relevant data. This logic could be abstracted into its own assembly so that you could easily reuse it across all your applications.

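using System;
using System.Diagnostics;
using System.Security;
using System.Text;

// Class name is illustrative; the download packages this logic in PdbHelper.Logger.
public static class ILExceptionLogger
{
    public static string CreateXmlLog(Exception ex)
    {
      try
      {
        // Get the IL offset, method, and module for each frame (no PDB needed):
        StackTrace st = new StackTrace(ex, true);
        StackFrame[] asf = st.GetFrames() ?? new StackFrame[0];

        StringBuilder sb = new StringBuilder();
        sb.Append("<ILExceptionData>\r\n");

        foreach (StackFrame sf in asf)
        {
          var method = sf.GetMethod();
          if (method == null) continue;
          string classFull = method.ReflectedType == null ? "" : method.ReflectedType.FullName;

          // Escape values, since compiler-generated names such as "<Main>b__0" contain '<' and '>'.
          sb.AppendFormat(
            "<Point module='{0}' classFull='{1}' methodName='{2}' methodSignature='{3}' methodToken='{4}' ILOffset='{5}' />\r\n",
            SecurityElement.Escape(method.Module.ToString()),
            SecurityElement.Escape(classFull),
            SecurityElement.Escape(method.Name),
            SecurityElement.Escape(method.ToString()),
            GetILHexLookup(method.MetadataToken),
            sf.GetILOffset());
        }

        sb.Append("</ILExceptionData>\r\n");
        return sb.ToString();
      }
      catch (Exception ex2)
      {
        return "Error in creating ILExceptionData: " + ex2.ToString();
      }
    }

    // Formats the method's metadata token (not the IL offset) as lowercase hex, e.g. 0x6000005.
    private static string GetILHexLookup(int metadataToken)
    {
      return "0x" + metadataToken.ToString("X").ToLower();
    }
}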

Download the source code and demo

You can download the complete source code, and an automated demo here.

The package has the following folders:

BuildScripts - automated scripts to run everything. This is useful if you want to integrate the PdbHelper into your own processes.
mdbg - the pdb2xml application, with the compiled binaries. This was downloaded from MSDN.
PdbHelper.Cmd.Test - the command line tool to create all the xml files from pdb (this wraps the MSDN pdb2xml code)
PdbHelper.Core - reusable logic that the command line and GUI both use.
PdbHelper.Gui - the windows GUI to easily look up debugging info in the pdb-generated xml files.
PdbHelper.Logger - a reusable logger component that takes in an Exception and returns an xml snippet containing the IL offset.
TestLoggerApp - a test application to demonstrate all this.

There's not much code to all of this, so you could just reverse engineer it. But to make it easy, go to the BuildScripts folder and you'll see four bat files, numbered in order:

0_DeleteBins.bat - cleans up things to "reset" everything (delete bin and obj folders). This is optional, but useful when developing.
1_CompileFramework.bat - compile the PdbHelper framework (you could just open the solution in VS)
2_RunTestApp.bat - runs the test console app, whose whole purpose is to throw an exception, display the IL offset xml snippet, and then write it out to a file for easy use.
3_LookupException.bat - Run the windows GUI app. This passes in command line arguments to automatically populate the xml directory and the IL offset snippet generated in the previous step. You just need to click the "Lookup" button, and it should show you the debug info.

Several of these scripts call MSBuild to run certain tasks. Also, by default, this dumps the pdb2xml files in C:\Temp\pdb2xml.

Conclusion

Using these three steps allows an application to log additional info, from which we can then query the pdb files to find the source file and line number. This extra info can be very useful when debugging, helping to reduce the total cost of ownership.

Monday, July 28, 2008

Deploying PDBs to get the exact line number of an exception

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/deploying_pdbs_to_get_the_exact_line_number_of_an_exception.htm]

(Also check out Getting file and line numbers without deploying the PDB files).

Ideally an application in production will log all its unhandled exceptions. It would also be ideal for those error logs to have as much information as possible to help reproduce the bug. One very handy piece of info is which line of source code ultimately triggered the exception. For example, if you have a 50-line method that throws a null reference exception, you'll want to know exactly which line was the culprit. The raw assemblies (only the DLL or EXE) alone don't tell you the source-code line number, and for good reason: you lose all that line-number info when compiling your friendly source code (complete with comments and white space) into an assembly. That compiled assembly is essentially just a collection of bytes; it is not plain text or human-friendly.

That's why PDB ("program database") files are so great. The PDB file maps the original source code to the compiled assembly; it is what lets you step through your actual code in the debugger. So, without your PDB file, the Exception's stack trace just shows method names. But with the PDB file, it also includes "debugging" information like file name and line number.

For example, say you have the following trivial program, whose sole point is to throw an exception.

using System;

namespace ExceptionDemo
{
  class Program
  {
    static void Main(string[] args)
    {
      try
      {
        Console.WriteLine("Started");
        DoStuff();
      }
      catch (Exception ex)
      {
        Console.WriteLine("Error: " + ex);
      }
      Console.WriteLine("Done");
    }

    public static void DoStuff()
    {
      throw new ApplicationException("test1");
    }
  }
}

If you compile this in release mode, it creates the exe and pdb. If you run this with the pdb existing, it will output:

Started
Error: System.ApplicationException: test1
at ExceptionDemo.Program.DoStuff() in C:\Temp\Program.cs:line 26
at ExceptionDemo.Program.Main(String[] args) in C:\Temp\Program.cs:line 15
Done

If you then delete the pdb, it outputs:

Started
Error: System.ApplicationException: test1
at ExceptionDemo.Program.DoStuff()
at ExceptionDemo.Program.Main(String[] args)
Done

Moral of the story - including your PDB files with the released code lets your exceptions automatically pick up extra helpful info like line numbers and file names. Of course, there's always a catch: PDB files may expose your intellectual property, so you probably don't want to ship them. So, the ideal situation would be to keep the PDB files from your release builds without shipping them, and have a way to look up the line and file info whenever the application logs an error. Can we do this? Yes, using several other techniques that we'll discuss tomorrow.

Note - including the PDB file is NOT the same as shipping a debug build. A debug build can compile fundamentally different code, for example via #if DEBUG directives. Merely adding the PDB file does not suddenly turn the release build into the debug build.
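To check at runtime whether line info was actually resolved, a minimal sketch (FrameDescriber is a hypothetical name) can walk the same StackFrame objects the runtime uses for ToString. StackFrame.GetFileName returns null and GetFileLineNumber returns 0 when no PDB information is available, so the sketch falls back to the IL offset, the value the "Getting file and line numbers without deploying the PDB files" approach relies on:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: one description per stack frame, with file/line when a
// PDB was found and the IL offset otherwise.
public static class FrameDescriber
{
    public static List<string> Describe(Exception ex)
    {
        var lines = new List<string>();
        // 'true' asks the runtime to read file/line info from PDBs if they are present.
        StackFrame[] frames = new StackTrace(ex, true).GetFrames() ?? new StackFrame[0];
        foreach (StackFrame frame in frames)
        {
            var method = frame.GetMethod();
            string name = method == null ? "(unknown)" : method.Name;
            string file = frame.GetFileName();    // null when no PDB info is available
            int line = frame.GetFileLineNumber(); // 0 when no PDB info is available
            lines.Add(file != null && line > 0
                ? string.Format("{0} in {1}:line {2}", name, file, line)
                : string.Format("{0} at IL offset {1}", name, frame.GetILOffset()));
        }
        return lines;
    }
}
```

Running this against the sample program with and without its PDB should show the same switch between "in ...:line N" and a bare method name that the two outputs above illustrate.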

Sunday, July 27, 2008

Code Reviews - objections and counter-objections

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/code_reviews__objections_and_counterobjections.htm]

It's a good thing to have another person review the production code that a developer writes. Two heads are better than one. Code Reviews offer many benefits, especially catching bugs when they're cheaper to fix and sharing knowledge across the team. However, some people still have a lot of resistance to code reviews. Here are some common objections to code reviews, and problems with those objections.

Reason to not do a code review	Problem with that reason
It's my code, I don't want anyone messing with my code.	Technically, it's not your code - it's your company's code. They're paying for it.
I don't have time.	Code reviews aren't about wasting time discussing pointless trivia, they're about saving time by double-checking the code upfront, where bugs are much cheaper to fix than once those bugs have propagated all the way to production.
My code isn't ready yet to be reviewed.	Some devs want to first write a 2-month feature, have it work perfectly, and then essentially have a quick code-review meeting that rubber-stamps their amazing feature. But what good is a 15-minute review after two months of work? How often should you do code reviews? Often enough that there's still time to act on the feedback.
I'm a senior dev, I don't need to have some junior dev telling me what to do.	There are good reasons for a junior dev to review a senior dev's code, such as helping that junior dev to learn, which in turn benefits the whole team.
My code will already work (I tested it myself) - it doesn't need a code review.	This just isn't likely. We humans are fallible creatures, and even the best of us make mistakes. Even if a developer's code is functionally perfect, maybe it can still be improved by refactoring, or by using better coding tips or team-built components. And if the code truly cannot be improved, it would still be great for other developers to review it so that they can learn from it.
My code is too complicated to explain in a code review.	If the code is truly too complicated, that's exactly why it should be reviewed: so that other team members can see how to make it simpler, or at least start understanding it so they're prepared to maintain it when you cannot.

Thursday, July 24, 2008

Bugs - kill them when it's cheapest

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/bugs__kill_them_when_its_cheapest.htm]

The sooner in the development life cycle that you catch a bug, the cheaper it is to fix. A simple design flaw consumes hours of coding time and more hours of testing time, potentially gets passed into training and written into documentation, has other components built on top of it, and gets deployed into production. Sometimes the original developer is gone, leaving the team precious little knowledge of how to change the erroneous code. I've seen many projects where a production error, something as simple as an ill-placed null reference exception, pulls the entire team into fixing it. Usually there are people shaking their heads: "if we had just spent 60 seconds to write that null-check when we first developed the code." It's sad, but true.

As time increases, bugs become more expensive to fix because:

The bug propagates itself - it gets copied around, or other code gets built on top of it.
The code becomes less flexible - the project hits a code-freeze date, or the code gets deployed (where it's much harder to change than when you're first developing it)
People lose knowledge of the erroneous code - they forget about that "legacy" code, or even leave the team.

This is why many of the popular industry best practices are weighted toward catching bugs early: code reviews up front, time for proper design, unit tests, code generation, setting up process the right way, etc. But a lot of development shops, perhaps because they're so eager to get code out now, often punt ("we'll fix it later") and end up fixing the bugs when they're most expensive. That may be what's necessary when a project first starts ("There won't be a tomorrow if we don't get this code out now"), but eventually the team has to shift gears and kill the bugs when they're cheapest.

Tuesday, July 22, 2008

Screen scraping the easy way with .Net

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/screen_scraping_the_easy_way_with_net.htm]

Sometimes you may want to collect mass amounts of data from many web pages, and the easiest way is to just screen-scrape it. For example, perhaps a site doesn't provide any other data export mechanism, or it only lets you look up one item at a time, but you really want to look up 1000 items. That's where you have an application request the html page, then parse through the response to get the data you want. This is becoming rarer and rarer as RSS feeds and data exporting become more popular. However, when you need to screen scrape, you really need to screen scrape. .Net makes it very easy:

// Requires: using System.Net;  (strUrl holds the address of the page to scrape)
WebClient w = new WebClient();
string strHtml = w.DownloadString(strUrl);

Using the WebClient class (in the System.Net namespace), you can simply call the DownloadString method, pass in the url, and it returns a string of html. From there, you can parse through with Regular Expressions, or perhaps an open-source html parser. It's almost too easy. Note that you don't need to call this from an ASP.Net web app - you could call it from any .Net app (console, service, windows forms, etc...). Scott Mitchell wrote a very good article about screen-scraping back in .Net 1.0, but I think new features since then have made it easier.
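As a small illustration of the "parse through with Regular Expressions" step, here is a minimal sketch (LinkScraper is a hypothetical name) that pulls every href value out of an HTML string such as the one DownloadString returns. A regex like this is brittle against real-world HTML (comments, unquoted attributes, scripts), so treat it as a starting point; an HTML parser is more robust for anything non-trivial:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts quoted href values from <a ...> tags.
public static class LinkScraper
{
    // Matches <a, then any attributes, then href="..." or href='...'; group 1 is the URL.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}
```

Combined with the snippet above, LinkScraper.ExtractLinks(strHtml) would give you the list of links on the downloaded page, which is also a starting point for the broken-link check mentioned below.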

You could also use this for a crude form of web functional testing (if you didn't use MVC, and you didn't have VS Testers edition with MSTest function tests), or to write other web analysis tools (is the rendered html valid, are there any broken links, etc...)