Sunday, October 12, 2008

Real life: the leaking window

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/real_life_the_leaking_window.htm]

House repairs provide a lot of good software analogies. Once, during a big rainstorm, our window started leaking. It was a newly installed window, and it had never leaked before. Obviously, a leaky window can become a huge problem if left unfixed. So I went out sometime later and sprayed the window with the hose to try to get an idea of where the leak was (I was not, and still am not, a house repair expert), and to my great frustration - the window did not leak. Of course I wanted to reproduce the problem, narrow it down to the exact cause, and then make a quick fix - just like I would in a software project. I didn't want to rebuild the whole thing.

So, here is a "critical feature bug" (the leaky window), which occurred in "production" (during the actual rainstorm), but which I cannot reproduce in my "development environment" (a sunny day, with me spraying the window with a hose). It was a non-reproducible bug. However, I couldn't just ignore it or look the other way; I needed to ensure that it didn't happen again (given that it's my house, I need to take "ownership" of the "project"). It's the kind of thing that drives a software team mad.

In this case, I just "blindly" resealed things - checked the siding, exterior frame, interior, etc... - and the window hasn't leaked since. If it does end up leaking again, then I'll probably need to call in an expert, much like how some doomed software projects call in a star consultant to troubleshoot their obscure bugs.

Thursday, October 9, 2008

Xml Design Patterns

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/xml_design_patterns.htm]

Xml is the de facto way to store data in files - whether it's build instructions in an MSBuild script, a Domain Specific Language that code-generates from an xml file, a config file, or something else.

 

So, your xml files are storing lots of mission-critical information. They're getting huge, being modified by tons of developers, and getting hard to maintain. It's time to refactor them. Here are several ideas to refactor your xml files (much of this is demonstrated in MSBuild):

  • Create a variables section. For example, MSBuild has a <PropertyGroup> section, where each inner node can define a new variable. You can then reference those variables elsewhere throughout your script like so: $(myVar). (See the sketch after this list.)

  • Allow importing of files. Often you'll want to either split a big file into smaller ones, or reuse a single file many times (for example, put all your global variables into a separate file, and then re-import it everywhere).

  • Related to this, allow chunks of xml to be automatically included in other chunks with some sort of "include" syntax. In other words, ensure that the xml file itself is refactored. MSBuild allows you to call other targets with a <CallTarget> task. Or, say you have a test script that's generated from xml, and every test needs to repeat your "login routine". Define the login routine in one xml snippet, and allow it to be re-included in all the other snippets.

  • Have the xml structured to automatically determine relationships. For example, if you have a parent-child relationship in the data (here is the menu, here are all the pages accessible from that one menu), then consider using Xml's natural hierarchy (a parent node with a bunch of child nodes). You could also use the natural order of the xml nodes to determine the sequence in which the data appears (like the order of the pages).

  • Provide some sort of extension point or hook. In MSBuild, you can define your own custom tasks (via <UsingTask>) and dynamically load them.

  • Create special syntax for domain-specific concepts. For example, the MassDataHandler takes xml snippets and converts them to SQL insert statements to assist with database unit testing. A common problem when inserting SQL statements is handling identity columns. In this case, if you prefix the value with an '@', the MassDataHandler automatically knows to look up its identity value.

  • Create templates for default values, which other xml nodes can override. For example, the MassDataHandler allows you to run SQL inserts, one xml node per database row. If five rows all share the same values (say they're all children of the same parent), it provides the ability to define a "default row" with the default values, which the other xml nodes can then override.
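To make a few of these concrete (the variables section, file imports, and target "includes"), here is a minimal MSBuild-style sketch - the property names, file name, and targets are illustrative, not from a real project:

    <Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

      <!-- "Variables section": each inner node defines a variable -->
      <PropertyGroup>
        <BuildDir>C:\Builds\MyProject</BuildDir>
        <Configuration>Release</Configuration>
      </PropertyGroup>

      <!-- Re-import a shared file of global variables -->
      <Import Project="GlobalVariables.targets" />

      <Target Name="Build">
        <!-- Reference variables with the $(...) syntax -->
        <Message Text="Building $(Configuration) to $(BuildDir)" />
        <!-- "Include" another chunk of xml by calling its target -->
        <CallTarget Targets="RunTests" />
      </Target>

      <Target Name="RunTests">
        <Message Text="Running tests..." />
      </Target>

    </Project>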

I guess you could say there are design patterns for good xml. As Xml becomes more and more common, I expect "Xml Design Patterns" to become more prevalent too.

Wednesday, October 8, 2008

Having Console apps write out Xml so you can parse their output

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/having_console_apps_write_out_xml_so_you_can_parse_their_out.htm]

Say you have an automatic process calling some executable, and you want to get information back from that executable to use in your own code. Ideally you'd be able to call a class library that you could just reference, but what if no such class library is available (it's a different development platform, like Java vs. C#, or it's a black-box third-party component)? Is there any way to salvage the situation?

 

Some developers would capture the console output as a string, then write a bunch of ugly parsing code (even with regular expressions), and get their results that way. For example, let's say you use Subversion for your source control, and you want to search the history log. For the sake of argument, assume there's no friendly C# API or web service call (it's a local desktop app you're calling), and you need to parse command output. (If anyone knows of such an API, please feel free to suggest it in the comments.) You can run a command like "svn log C:\Projects\MyProject", and it will give you back console output like so:

------------------------------------------------------------------------
r3004 | username1 | 2008-09-26 10:47:19 -0500 (Fri, 26 Sep 2008) | 2 lines

Solved world hunger
------------------------------------------------------------------------
r3000 | username1 | 2008-09-06 14:10:56 -0500 (Sat, 06 Sep 2008) | 2 lines

Invented nuclear fusion
 

Ok, you can parse all the information there, but it's very error-prone. A much better way is if the console app provides an option to write out its output as a single XML string. For example, you can specify the "--xml" parameter in SVN to do just that:


       revision="3004">
    username1
    2008-09-26 10:47:19 -0500
    
      Solved world hunger
    

  
       revision="3000 ">
    username1
    2008-09-06 14:10:56 -0500
    
      Invented nuclear fusion
    

  

Obviously, that's much easier for a calling app to parse. If you need to bridge a cross-platform divide such that you can't just provide the backend class libraries, outputting to xml can be an easy fix.
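Here's a minimal sketch of the consuming side in C# (hedged: it assumes svn.exe is on the path, and that the element names match the output above):

    using System;
    using System.Diagnostics;
    using System.Xml;

    public class SvnLogReader
    {
      public static void Main()
      {
        // Run "svn log --xml" and capture the console output as one string.
        ProcessStartInfo psi = new ProcessStartInfo("svn", "log --xml C:\\Projects\\MyProject");
        psi.UseShellExecute = false;
        psi.RedirectStandardOutput = true;

        string xml;
        using (Process p = Process.Start(psi))
        {
          xml = p.StandardOutput.ReadToEnd();
          p.WaitForExit();
        }

        // Parse the xml instead of screen-scraping the plain-text format.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        foreach (XmlNode entry in doc.SelectNodes("/log/logentry"))
        {
          Console.WriteLine("r{0} | {1} | {2}",
            entry.Attributes["revision"].Value,
            entry.SelectSingleNode("author").InnerText,
            entry.SelectSingleNode("msg").InnerText.Trim());
        }
      }
    }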

Monday, September 29, 2008

Death by a thousand cuts

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/death_by_a_thousand_cuts.htm]

A single paper cut won't kill you, but a thousand of them probably will. That's what people mean by "death by a thousand cuts". This is how many software projects die - it's usually a hundred little things that pile up, and whose combined pain becomes unbearable. It's the runaway bug list, the brittle test script that gets abandoned because it's too hard to maintain, the endless tedious validation on a details page, the component with the fuzzy interface, the buggy deployment, all the little features that just don't work right, etc...

 

This is why continuous integration, unit tests, automation, proactive team communication, code generation, good architecture, etc... are so important. These are the techniques and tools to prevent all those little cuts. I think that many of the current hot methodologies and tools are designed to add value not by doing the impossible, but by making the routine common tasks "just work", so that they no longer cause developers pain or stress.

 

Sure, you can write a single list-detail page, any developer can "get it done". But what about writing 100 of them, and then continually changing them with client demands, and doing it in half the scheduled time? Sure, you can technically write a web real-time strategy game in JavaScript, but it's infinitely easier to write it with Silverlight. Sure, you can manually regression test the entire app, but it's so much quicker to have sufficient unit tests to first check all the error-prone backend logic.

 

The whole argument shifts from "is it possible" to "is it practical?" Then, hopefully your project won't suffer death from a thousand cuts.

Sunday, September 28, 2008

Real life: How do you test a sump pump?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/real_life_how_do_you_test_a_sump_pump.htm]

Back when we moved into our home (a while ago), this was our sump pump.

As home-owners know, sump pumps are very important because without one, your basement gets flooded. That causes at least tens of thousands of dollars of damage, as well as the potential loss of any personal items in your basement - for example, all your electronic equipment or books could get ruined by water damage.

 

In other words, it's a big deal, i.e. an important "feature" of your house. So the natural question as a new home-owner is: "How do I ensure that this mission-critical feature actually works?" Obviously I didn't want to wait for a real thunderstorm and power outage to find out if everything would be ok. While I had heard the buzzwords ("make sure your sump pump works and you have a battery backup in case of a power outage"), and I had been on previous "teams" (i.e. my parents' house, growing up as a kid) that had successfully solved this problem, when push came to shove, I was certainly no expert. For all I knew, this could be the best sump pump in the world, or a worthless piece of junk. However, I knew the previous owners of the house, and they were great, so I assumed that the sump pump was great too and everything would be okay.

 

Eventually, I figured out how to test it by contacting some "domain experts" (the previous house owners), who explained the different components to me. I then "mocked out" a power outage by simply unplugging the power plug (as opposed to waiting for a real power outage, or even turning off my house power). I then lifted the float to simulate water rising, and listened for a running motor. I checked a couple of other things, and became confident that the feature did indeed "work as designed". I was told that it was actually a very good setup because it had a separate battery charger and two separate pipes out, so there were two completely parallel systems (kudos to the previous owners).

 

The number of analogies between me needing to test my sump pump and a user needing to test a critical feature of a software project is staggering. It's a reminder of how real-life experiences help one understand software engineering.

 

Thursday, September 25, 2008

Writing non-thread-safe code

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/writing_nonthreadsafe_code.htm]

Multithreading is hard, and I'm certainly no expert at it. I like this post about Measuring how difficult a Threading bug is. Most application developers hover around phase 0 (totally reproducible) to phase 2 (solvable with locks). Application developers constantly hear that some piece of code isn't "thread safe". What exactly would that look like? How could you deterministically demonstrate non-thread-safe code - say, with a unit test that fails against your non-thread-safe object?

 

The trick is to have a unit test that opens multiple threads, and then runs them in a loop tens of thousands of times to force the issue. The following code does exactly that. Say we have a simple "Account" object with Credit and Debit instance methods, both sharing the same state (the _intBalance field). If you run the unit test "Threads_2_5", it opens two threads, calling Credit in one and Debit in the other. Because Credit and Debit should cancel each other out, and they're called the same number of times, the final result should remain zero. But it doesn't - the test fails (if it doesn't fail, increase the number of iterations to force more contention).

 

So, we have a reproducible multi-threading failure in our unit test. The Account object is not thread safe. However, we can apply the C# lock keyword to the methods in question, which, at least for this simple case, fixes the problem. I've shown how to apply the lock keyword in the commented-out lines of the Account object. Two observations:

  1. Without the C# lock keyword, you get essentially random errors as contention (thread count or loop count) increases.

  2. Adding the lock keyword prevents the errors, but the code runs much slower. This means it's also possible to abuse the lock keyword - adding it in places you don't need it, such that you get no benefit, but now your code runs slower. Tricky tricky.

Here's the code:

 

    using System;
    using System.Threading;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    public class Account
    {
      // Uncomment thisLock (and the lock statements below) to make the class thread safe.
      //private Object thisLock = new Object();

      private int _intBalance = 0;

      public void Credit()
      {
        //lock (thisLock)
          _intBalance++;
      }

      public void Debit()
      {
        //lock (thisLock)
          _intBalance--;
      }

      public int CurrentValue
      {
        get
        {
          return _intBalance;
        }
      }
    }

    [TestClass]
    public class AccountThreadTests
    {
      private Account _account = null;
      private int _intMaxLoops = -1;

      private void RunThreads2(int intLoopMagnitude)
      {
        _intMaxLoops = Convert.ToInt32(Math.Pow(10, intLoopMagnitude));
        _account = new Account();

        Assert.AreEqual(0, _account.CurrentValue);

        // One thread credits, the other debits, each the same number of times.
        Thread t1 = new Thread(new ThreadStart(Run_Credit));
        Thread t2 = new Thread(new ThreadStart(Run_Debit));

        t1.Start();
        t2.Start();

        t1.Join();
        t2.Join();

        //We did an equal number of credits and debits --> should still be zero.
        Assert.AreEqual(0, _account.CurrentValue);
      }

      private void Run_Credit()
      {
        for (int i = 0; i < _intMaxLoops; i++)
        {
          _account.Credit();
        }
      }

      private void Run_Debit()
      {
        for (int i = 0; i < _intMaxLoops; i++)
        {
          _account.Debit();
        }
      }

      [TestMethod]
      public void Threads_2_5()
      {
        RunThreads2(5);
      }
    }

 

Someone who knows multi-threading much better than I do (and who hasn't yet started their own blog, despite all my pleading) explained it well here:

With multithreading, it’s important to understand what’s happening at the assembly level.
For example, a statement like “count++” is actually “count = count + 1”. In assembly, this becomes something like:

1: mov eax, [esp + 10] ; Load ‘count’ into a register
2: add eax, 1 ; Increment the register
3: mov [esp + 10], eax ; Store the register in ‘count’

With multithreading, both threads could run through this code at different rates. For example, start with ‘count = 7’. Thread ‘A’ could have executed (1), loading ‘7’ into ‘eax’. Thread ‘B’ then executes (1), (2), (3), also loading ‘7’ into ‘eax’, incrementing, and storing ‘count = 8’. It then executes 3 more times, setting ‘count = 11’. Thread ‘A’ finally starts running again, but it is out of sync! It then stores ‘count = 8’ because it didn’t get updated.


Using locks would prevent multiple threads from executing this code at the same time. So thread ‘A’ would make ‘count = 8’. Thread ‘B’ would make ‘count = 9’, etc.
 

The problem with locks is what happens if you ever need multiple locks. If thread ‘A’ grabs lock (1) and thread ‘B’ grabs lock (2), then neither of them would ever be able to grab both locks. In SQL, this throws an exception on one of the threads, forcing it to release its lock and try over. SQL can do this because the database uses transactions to modify the data. If anything goes wrong, the transaction rolls the database back to the previous values. Normal C# code can’t do this because there are no transactions. Modifying the data is immediate, and there is no undo. ;-)
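To make the multiple-lock problem concrete, here's a minimal C# sketch of the classic two-lock deadlock (the names are illustrative). Each thread grabs its first lock, then blocks forever waiting for the lock the other thread holds:

    using System;
    using System.Threading;

    public class DeadlockDemo
    {
      private static readonly object _lock1 = new object();
      private static readonly object _lock2 = new object();

      public static void Main()
      {
        // Thread A takes lock1 then lock2; thread B takes them in the opposite order.
        Thread a = new Thread(delegate()
        {
          lock (_lock1) { Thread.Sleep(100); lock (_lock2) { } }
        });
        Thread b = new Thread(delegate()
        {
          lock (_lock2) { Thread.Sleep(100); lock (_lock1) { } }
        });
        a.Start();
        b.Start();
        a.Join(); // hangs here if the deadlock occurs
        b.Join();
        Console.WriteLine("Only reached if the deadlock didn't occur.");
      }
    }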

Obviously there's a world more to multi-threading, but everyone has got to start somewhere.

Sunday, September 21, 2008

Getting Categories for MSTest (just like NUnit)

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/getting_categories_for_mstest_just_like_nunit.htm]

Short answer: check out: http://www.codeplex.com/testlistgenerator.

 

Long answer: I previously discussed Automating Code Coverage from the Command Line. Another way to buff up your unit tests is to add categories to them. For example, suppose you had several long-running tests (perhaps unit testing database procedures), and you wanted the ability to run them separately, by category, from the command line.
 

The free, six-year-old, open-source NUnit has category attributes, which you can easily filter tests by (see the snippet below). For a variety of reasons, MSTest - the core testing component of Microsoft's flagship development product - still does not have them after two releases (2005 and 2008). I've had the chance to ask people at MSDN events about this, and I've heard a lot of "reasons":

 

Below, each "reason" (a proposal for why MSTest doesn't need categories) is followed by the problems with it:

"Just put all those tests in a separate DLL."
  • This doesn't handle putting a single test into two categories. It's also often not feasible - the tests we want to categorize usually cannot be so easily isolated.

"Just add the Description attribute to the tests, and then sort by description in the VS GUI."
  • You still cannot filter by description from the command line - it requires running in a GUI.
  • It fails because the filter does an exact string comparison instead of a substring search. If you have Description("cat1, cat2") and Description("cat1"), you cannot filter by "cat1" and get both tests; it would exclude the first because "cat1" does not match "cat1, cat2".
  • Conceptually, you shouldn't need to put "codes" in a "description" field. Descriptions imply verbose alpha-numeric text, not a lookup code that you can filter by.

"Just filter by any of MSTest's advanced filter GUI features, like class or namespace."
  • This requires opening the GUI, but we need to run from the command line.
  • Sometimes the tests we need to filter span across the standard MSTest groups.

"Just use MSTest test lists."
  • This is only available in the more advanced versions of VS.
  • It's hard to maintain. It requires you to continually update a global test file (which always becomes a maintenance problem), as opposed to applying an attribute to each test individually.

"Well, you shouldn't really need categories; just run all your unit tests every time."
  • Of course your automated builds should eventually run all the unit tests, but it's a big time-saver for a developer to occasionally run just a small subset of them.
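For comparison, here's roughly what this looks like with NUnit's category attributes (a sketch - the fixture, test, and category names are made up):

    using NUnit.Framework;

    [TestFixture]
    public class DatabaseProcTests
    {
      [Test]
      [Category("Database")]  // the category you can filter by
      public void InsertOrder_WritesOneRow()
      {
        // ... long-running database test here ...
      }
    }

You can then include or exclude that category from the command line with the console runner, e.g. "nunit-console MyTests.dll /include:Database".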

 

So, if you want to easily get the equivalent of NUnit categories for your MSTest suite, check out http://www.codeplex.com/testlistgenerator.