Tuesday, June 28, 2011

Query files in the TFS VersionControl database

TFS provides an API that C# could programmatically query source control. However, even with Linq, that may become tedious coding. TFS also provides a TfsVersionControl database that you can query directly with SQL. This has power.
Why use the undocumented TfsVersionControl database when you're "encouraged" to use TfsWareHouse?
  1. The Transaction databases (TfsBuild, TfsVersionControl, TfsIntegration) are realtime, so you don't need to wait 30 minutes - 2 hours for it to refresh.
  2. Not all the info is migrated to the TfsWareHouse (or at least, I can't find it in any documentation). For example, the warehouse has a File table, but it doesn't contain all versioned files (such as binaries, images, etc...)
  3. The TFS warehouse may be corrupted (the process to sync it may be down)
Here's a simple (TFS 2008) query to get you started. It contains version, the full path, file name, and the CreateDate (when it was checked in). It's based on versioned items, so you can query history (you may also get duplicated, so you'd need to query that).
select
V.VersionFrom, V.FullPath, L.CreationDate,
Replace(V.ChildItem, '\', '') as [FileName],
V.*, L.*
from tbl_Version V (nolock)
inner join tbl_File L (nolock) on V.FileId = L.FileId
where V.ParentPath = '$\MyTeamProject\Folder\SubFolder\'
order by V.VersionFrom desc
Note that TFS by default stores paths in a different format, so you may need to convert:
·         '/' becomes \
·          '_' becomes >
·         '-' becomes " (double quote)

Monday, June 27, 2011

Linq: Creating new objects from selects and joins

I like Linq more every time I use is. I've posted about XLinq and using linq to sort and filter lists. You can also use Linq to join two objects and select properties from each to create a new collection of objects (somewhat like SQL), run ForEach clauses, and do simple functions like Distinct, Sum, and Count.
Here's a code sample (I prefer to do minimalist samples with a unit test syntax for easy demos):

using System;
using System.Text;
using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace LinqDemo
{
      /// <summary>
      /// Summary description for UnitTest1
      /// </summary>
      [TestClass]
      public class UnitTest1
      {
            private Employee[] GetEmployees()
            {
                  return new Employee[]
                  {
                    new Employee(){ FirstName = "Homer", LastName ="Simpson", FavoriteNumber=7, DeptId=1},
                    new Employee(){ FirstName = "Marge", LastName ="Simpson", FavoriteNumber=18, DeptId=0},
                    new Employee(){ FirstName = "Bart", LastName ="Simpson", FavoriteNumber=99, DeptId=0},
                    new Employee(){ FirstName = "Monty", LastName ="Burns", FavoriteNumber=23, DeptId=9},
                    new Employee(){ FirstName = "Ned", LastName ="Flanders", FavoriteNumber=5, DeptId=0}
                  };
            }

            private Department[] GetDepartments()
            {
                  return new Department[]
                  {
                        new Department(){ DeptId=1, DeptName="Safety Operator"},
                        new Department(){ DeptId=2, DeptName="Customer Service"},
                        new Department(){ DeptId=9, DeptName="Executive"},
                  };
            }



            [TestMethod]
            public void SelectProperties()
            {
                  //create data
                  Employee[] emps = GetEmployees();

                  //Use linq to get a distinct list from some property
                  List<string> lastNames = emps
                        .Where(n => n.FavoriteNumber > 10) //Some filter
                        .OrderBy(n => n.LastName)
                        .Select(n => n.LastName) //Select specific fields
                        .Distinct() //Get only distict elements
                        .ToList();

                  Assert.AreEqual("Burns", lastNames[0]);
                  Assert.AreEqual("Simpson", lastNames[1]);
            }

            [TestMethod]
            public void JoinAndCreateAnotherObject()
            {
                  //Join "Employee" and Department to create "Worker"
                  Employee[] emps = GetEmployees();
                  Department[] depts = GetDepartments();

                  //This could be useful is emps and depts came from
                  //    different sources, or depts was cached
                  Worker[] workers =
                        (
                              from emp in emps
                              from dept in depts
                              where emp.DeptId == dept.DeptId
                                    && emp.DeptId > 0
                              select new Worker()
                              {
                                    FirstName = emp.FirstName,
                                    DeptName = dept.DeptName
                              }
                        ).ToArray();

                  Assert.AreEqual("Homer", workers[0].FirstName);
                  Assert.AreEqual("Safety Operator", workers[0].DeptName);
            }

            [TestMethod]
            public void ForEach()
            {
                  //Use a single line to update a property
                  List<Employee> emps = GetEmployees().ToList();
                  Assert.AreEqual(7, emps[0].FavoriteNumber);

                  //Double everyone's favorte number
                  //Easier than writing a for-each loop
                  emps.ForEach(n => n.FavoriteNumber = n.FavoriteNumber * 2);

                  Assert.AreEqual(14, emps[0].FavoriteNumber);
            }

            [TestMethod]
            public void Do_Aggregates()
            {
                  //Get the sum of all numbers where the number is already > 10.
                  List<Employee> emps = GetEmployees().ToList();
                  int intSum = emps
                        .Where(n => n.FavoriteNumber > 10)
                        .Sum(n => n.FavoriteNumber);
                  Assert.AreEqual(140, intSum);
            }


      }

      public class Employee
      {
            public int FavoriteNumber { get; set; }
            public int DeptId { get; set; }
            public string FirstName { get; set; }
            public string LastName { get; set; }

            public override string ToString()
            {
                  return string.Format("{0} {1} - {2}", this.FirstName, this.LastName, this.FavoriteNumber);
            }
      }

      public class Department
      {
            public int DeptId { get; set; }
            public string DeptName { get; set; }
      }

      public class Worker
      {
            public string FirstName { get; set; }
            public string DeptName { get; set; }
      }


}

Monday, May 30, 2011

Tool - WinSplit Revolution - move windows between monitors

A coworker showed me a useful (and free!) tool that conveniently positions open windows on your screen.

The tool is Winsplit Revolution, and it let you use hotkeys to instantly move your windows, such as flipping a window between different monitors, or having two windows split on the same monitor. It also provides hotkeys for many other features, including maximizing and minimizing (you can use default OS keys, but I find the winsplit ones more convenient).

As a free tool that just works, I'm glad to have it.

Sunday, May 15, 2011

Chicago Code Camp 2011 success

I was thrilled to participate in the 2011 Chicago Code Camp. We had about 300 people show up on Saturday to CLC, with 35 speakers, many volunteers, and a full day of learning and networking. Thanks to a dozen sponsors, the event was free to the public. This is the third year we've had a CCC, and it keeps getting better.

It really is a privilege to be part of such a community.

Monday, February 28, 2011

4 Types of automated tests - unit, integration, UI, and performance

A software engineer could spend their life continually improving test automation - it's a big field. While the sky is the limit, there are at least 4 types of automated tests: unit, integration, UI, and performance.
Some key ideas:
  • These build off of each other -
    • Unit --> Integration: Don't bother with complex integration tests if you can't even have a simple unit test.
    • Integration --> UI: It's going to be near impossible to do a UI test (which usually has poor APIs) if you can't at least integrate the backend (with at least has APIs - like web services, SQL, or C# calls).
    • UI --> Performance: If you can't at least functionally run the code from end-to-end, then you can't expect reliable performance measures on it. Yes, there are always exceptions, and semantics (one may consider "UI" to be fronted integration, or one may test performance on just the backend APIs and bypass the UI). But in general, these 4 tests are a very natural path to follow.
  • The higher you go, the more expensive: Unit tests (low-level) are cheapest, performance tests (high-level) are most expensive. So it's bad business to pay for an integration test to do the work of a unit test. It's like using an armani suit as a dish rag.
  • These 4 types of tests should be separated. You can call any code from a unit test (depending on security, you could even call APIs to shutdown the server), so you could mix all your tests into one test harness. But don't do this - it will burn you. For example, unit tests are generally fast (they're all in-memory), whereas UI and integration are much slower (databases, web services, IIS hosts, etc...) So you don't want them coupled together because the slow integration tests will bog down your fast unit tests, and then developers never run unit tests before check-in because "it takes too long".
  • Unit testing is but one tool. There is different types of code (algorithms, data containers, plumbing, installation scripts, UI, persistence plumbing, etc...). This requires an arsenal of developer skills, of which unit testing is one tool. With respect to unit testing and code coverage, the goal isn't N% coverage of all code, but rather N% coverage of unit-testable code. (You can get better coverage tools, like NCover, which can provide coverage when running integration, UI, and even manual tests run by QA, but that's a different story).
And it wouldn't be complete unless I had a pro/con table of each option:

Test TypeGood ForBad For
Unit
  • Units of in-memory code, like parsing, calculations, validations,
    algorithms, formatting, etc...
  • Because unit tests are in-memory, they usually run very fast, and
    hence can be run upon check-in and with each build. Therefore they provide
    the "first level of defense" to ensure code continues to functionally
    work.

  • Integration, like anything that hits networks - the tests will be too
    brittle, something will break
  • Plumbing or generated code (like designer.cs files or database
    mapping) - you'll just now have both tedious plumbing code and
    tedious unit tests.
Integration ("backend")Ensuring high-level flows work, such as you can
call a web service that loads or saves data to a database and writes
something to a file.
Anything that can be handled with a unit test
instead. For example, you likely wouldn't use an integration test to verify
every input combination for a text field.
UI ("frontend integration")Very-high level, functional tests.Anything that can be handled via backend
integration or unit tests.
PerformanceIdentifying performance problems that could be
costly to the business.
Any functional testing

Sunday, February 27, 2011

Chicago Code Camp 2011 is coming in May!

The Chicago Code Camp (CCC) returns on Saturday, May 14, 2011. This has been very successful in the past (2010, 2009).
I think the 10 reasons to attend the 2009 CCC still apply, including the great content, and it's free.
Check out the CCC site: http://www.chicagocodecamp.com/

Monday, January 17, 2011

Gaming Unit Test Metrics

Unit Testing is a popular buzzword - most developer jobs request it, teams want to say they have it, and most coding leaders actively endorse it. However, you won't be able to hire a team of devs who don't want to write unit tests, and then "force" them via measuring certain metrics. Developers can game metrics.

1. Metric: Code Coverage
The #1 unit testing metric is code coverage, and this is a good metric, but it's insufficient. For example, a single regular expression to validate an email could require many different tests - but a single test will give it 100% code coverage. Likewise, you could leverage a mocking framework to artificially get high code coverage by mocking out all the "real" code, and essentially having a bunch of "x = x".

2. Metric: Number of unit tests
Sure, everything being equal, 10 tests probably does more work than just 1 test, but everything is not equal. Developer style and the code-being-tested both vary. One dev may write a single test with 10 asserts, another dev may put each of those asserts in its own test. Also, you could have many tests that are useless because they're checking for the "wrong" thing or are essentially duplicates of each other.

3. Metric: % of tests passing
If you have a million LOC with merely 5 unit tests, having 100% tests passing because all 5 pass is meaningless.

4. Metric: Having X tests for every Y lines of code
A general rule of thumb is to have X unit tests (or X lines of unit testing code) for every Y lines of code. However, LOC does not indicate good code. Someone could write bloated unit test code ("copy and paste"), so this could be very misleading.

5. Metric: How long it takes for unit tests to run
"We have 5 hours of tests running, so it must be doing something!". Ignore for the moment that such long-running tests are no longer really "unit" tests, but rather integration tests. These could be taking a long time because they're redundant (loop through every of a 1 GB file), or extensively hitting external machines such that it's really testing network access to a database rather than business logic.

6. Metric: Having unit testing on the project plan
"We have unit testing as an explicit project task, so we must be getting good unit tests out of it!" Ignore for the moment that unit tests should be done hand-in-hand with development as opposed to a separate task - merely having tests as an explicit task doesn't mean it's actually going to be used for that.

7. Metric: Having high test counts on methods with high cyclometric complexity
This is useful, but it boils down to code coverage (see above) - i.e. find the method with high complexity, and then track that method's code coverage.

Conclusion
Obviously a combo of these metrics would drastically help track unit test progress. If you have at least N% code coverage, with X tests for every Y lines of code, and they all pass - it's better than nothing. But fundamentally the best way to get good tests is by having developers who intrinsically value unit testing, and would write the tests not because management is "forcing" them with metrics, but because unit tests are intrinsically valuable.