Sunday, February 26, 2012

Bluffing Line Count and why it's still useful

I'll take it as a given that every developer knows that line count is a problematic metric. You know you got problems when someone with power thinks that "Homer did twice as much work because he wrote twice as many lines". Why is line count so unreliable? Here are some reasons to get started:
1.       Significant lines vs. raw, i.e. whitespace, comments, brackets on new-line
2.       New code (creates lots of it) vs. production bug update (5 hours to fix 1 line)
3.       Generated files (proxies, designer) or massive code generation
4.       Copying 1000 lines from open-source online (algorithm works, so low risk, but you've added a lot of code)
5.       Complexity of code, i.e. a 20-line algorithm may be more work than 1000 line of casual UI and data hookup.
However, if you're trying to coordinate more than 10 developers, and you have no other metric, line count still has some value because it quickly tells you something is going on. (i.e. the "something is better than nothing" philosophy.) You've got to look at the trends, not the absolute values.
·         It's useful to know if your team's average developer produces 500 lines of code (LOC) per week (of course this varies from team to team), then seeing someone produce either 50 or 5000 should catch your attention. Sure, there may be a good reason, but you at least want to be aware of what that reason is. Is the guy generating 5000 massively copying and pasting code, re-inventing the wheel for quick-to-write utility code, or using a passive code-generator instead of your team's ORM framework? Is the guy only doing 50 not checking anything in, and waiting to surprise the team with 4 weeks of work the day before code freeze for one "glorious" check-in?
·         Line count is ubiquitous and everyone can understand it.
·         Line count is very cheap to calculate; many tools can provide this.
·         Line count is the basis for two more relevant metrics: code churn, which tells you how many lines per file is changing per changeset (and hence per developer), and code duplication (I personally love Simian for this).
·         You can also write reports splitting line count by file name to see the ratio of business, entity, data-access, unit test, UI, etc… For example, is someone checking in 1000 lines of business logic, but with zero unit tests? It's something worth investigating.
You cannot reduce an art like code craftsmanship to auto-generated metrics. But the metrics do offer clues to what is going on.  It's good to be aware, but never judge a developer on metrics alone.

Wednesday, January 18, 2012

The problem with "It's not what you know, it's who you know."

I wasn't the most popular kid growing up. Even in college as I lived up to the analytical stereotype and stayed home studying (a better word would be "experimenting" or "training"), my party-going acquaintances would assure me that I was investing in the wrong thing. "It's not what you know, it's who you know. So don't spend so much  effort with the books when it's the relationships that matter." And there certainly is some truth to this. We've all seen the stranger's perfect resume get passed over for the friend's average resume (the stranger is by definition unknown, and therefore risky, so there is business rational to pick the safe candidate over the risky one). People ultimately make the decisions, so people are important. It's one reason I so actively endorse the community user groups.
However, there must be balance. There are three caveats that this cliché misses:
1.       If what you know is valuable, then people will want to know you. Even a hermit who cures cancer will begrudgingly become famous. Recruiters in every major city are scouring over LinkedIn, user groups, monster, dice, and every online job board trying to find good candidates, offering bounties, and poaching top talent from competitor's. In other words, "what you know" will quickly open doors to "who you know" (and "who knows you").
2.       Really, it's not "who you know," but "who knows you." Sharing an elevator, or even a lunch, doesn't mean that they'll risk their reputation giving you a referral, or that you can "phone them for a favor".
3.       There are talkers and doers. Talkers can drop a name for every occasion, have 500+ social-networking friends, and can truthfully say things like "Oh, I know Acme's Chicago director, Bill, we met at last Autumn's pumpkin-throwing contest…" They could get the interview with their connections, but they could never pass the interview itself.
Of course, with "what you know" vs. "who you know", like most two-way debates in life, you'd prefer both. But in the field of software engineering, you can never sell-short the "what you know".

Monday, January 16, 2012

Command-line Cyclomatic Complexity in VS2008 with VS2010 free Metrics.exe

Visual  Studio had code complexity metrics, but they were only available in the GUI. (At least for code coverage you could call the private assemblies and roll your own command-line tool.) However, VS 2010 offers  a free power-tool that lets you run complexity metrics from the command line! The result is an xml file, so you can leverage that for anything you need.
These blogs tell more:
Part of the cool thing is even if you're still on VS2008 (!), and you can't buy a 3rd party tool (NDepend!), you can still use the 2010 power tools to call .Net 3.5 assemblies. So, you could install VS2010 on your build server and use the power tools on 2008 builds.

Thursday, January 5, 2012

Detecting if a file is a merge in TFS VersionControl database

I was trying to run some metric calculations on files within a changeset, but I only wanted new files – i.e. I wanted to filter out merged, branched, or renamed files. For example, if someone created a branch, that shouldn’t count as adding 1000 new files.
One solution I found was to check the Command column of the TfsVersionControl.dbo.tbl_Version table. I realize the TfsVersionControl is a transactional database, and reports are encouraged to go off of TfsWareHouse, but that didn’t seem to contain this field.
Here’s the relevant SQL (NOTE: this is for VS2008, I haven’t tested it on VS2010).
select 
      CS.ChangeSetId, FullPath, Command, CreationDate,
      case
            when Command in (2,5,6,7,10,34) then cast(1 as bit)
        else cast(0 as bit)
      end as IsNew
from TfsVersionControl..tbl_Version V with (nolock)
      inner join TfsVersionControl..tbl_ChangeSet CS with (nolock)
      on V.VersionFrom = CS.ChangeSetId
where CS.ChangeSetId = 20123

The question becomes, what does the “tbl_Version .Command” column mean, and why those specific values? I couldn’t find official documentation (probably because it’s discouraged to run queries on it), so I did a distinct search on 50,000 different changesets to find all values being used, and I worked backwards comparing it against the Team Explorer UI to conclude it appears to be a bitflag for actions:

Command
Bit value
add
1
edit
2
branch
4
rename
8
delete
16
undelete
32
branch
64
merge
128


Recall there can be multiple actions (hence the bit field), such as merge and edit. So, if you want to find new code – i.e. adds or edits, then we’d take the following bit flags: 2, 5, 6, 7, 10, and 34.

Is New?
Bit value
Actions
Yes
2
edit
Yes
5
add (folder)
Yes
6
type/edit
Yes
7
add (add file)
No
8
rename
Yes
10
rename/edit
No
16
delete
No
24
delete,rename
No
32
undelete
Yes
34
undelete, edit
No
68
branch
No
70
branch, edit
No
84
branch,delete
No
128
merge
No
130
merge, edit
No
136
merge,rename
No
138
merge,rename,edit
No
144
merge,delete
No
152
merge, delete, rename
No
160
merge, undelete
No
162
merge, undelete, edit
No
196
merge, branch
No
198
merge, branch, edit
No
212
merge, branch, delete


Of course, this is induction, and it’s possible I may have missed something, but given a large sampling and lots of spot-checking, it appears to be reliable.

Wednesday, January 4, 2012

It’s not your code, but it is your opportunity

I occasionally hear the developer say “my code”, as in “I’ll check in my code at the end of next week”, or “My code doesn’t need unit tests”.
In one sense, I want developers to think “this is my code” so they take pride in doing the best job possible. But really, it’s not your code, it’s the company’s code – they’re paying for it, and often legally they own it (i.e. it would be illegal to take chunks of code you wrote at one company and either privately sell it, or check it into another company’s source code repository).
This perspective really changes the discussion, i.e. “The company would like their code to be checked-in on a regular basis”, or “The company would like their code to be properly tested”.
However, it is the developer's "opportunity to learn" – i.e. the company keeps the code, but the developer keeps the improved skill from writing that code.

Tuesday, January 3, 2012

31 User Groups in the Midwest

Clark Sell did a great series on the various user groups in the Midwest. He provided a helpful recap here:
Here's the link for our Lake County .Net Users Group (LCNUG).
There's over a dozen groups in Illinois alone.  There's something for everyone. Given the benefits of such groups (meeting dedicated peers face to face, hearing from expert presenters, etc…), check it out.

Friday, December 30, 2011

The benefits to “check in early and often”

I am a huge advocate of checking in early and often. I’ve seen many a project get burnt by the developer who saves 3 weeks of work for a single “glorious” check-in.
I favor frequent check-ins because it’s:
  1. Cheaper integration. Someone once said “Integration is pay me now or pay me later”, and I find it much easier to pay now. Especially with automated builds and continuous integration, it’s much easier to check in 10 little changes than 1 big change (Sometimes I think of it like being easier to hold my breath for 30 seconds, ten times, as opposed to holding it for 5 minutes straight). Why? Because with bigger changes, you inevitably get farther out of sync – especially on critical shared files – and there’s more to forget.
  2. More objective measure of what you really have: Code that isn’t checked in, that just works on a developer’s machine, doesn’t really exist. They might as well say “it works in my head”. Once you actually get the code past a build server’s policy, then we can see what’s really there.
  3. Earlier Detection: We all know it’s cheaper to fix a bug or redesign the sooner you catch it. I’d rather developers check in code early so we can quick detect things (“why is there 5000 lines but no tests?”)
  4. More Modular: Checking in 10 chunks of code, where each one works, implies more granular and modular code. I.e. code that can at least be split into multiple check-ins is more modular than code that can’t be split at all.
Of course there’s always exceptions (you do a massive refactoring, etc…), but those should be the exception, not the rule.
Most of the time, in my experience, large check-ins by developers means something bad – spaghetti code, tightly-coupled code, code that was trying to hide under the radar until right before the deadline and then the developer says “oops, I just don’t have time to change it”, or something like that. Think of it like this: there is zero benefit to you to have to wait one month before seeing what a developer is doing, but there is benefit to early detection of code, so risk-reward wise it’s better to check-in early.
Note that for these purposes, a shelve set is not the same as a check-in. Shelvesets are private, and hence deliberately avoid the benefits listed above (which some say is a feature). For example, you mostly likely don’t have builds on a private shelfset. For a developer to say “I put my 20,000 lines in a shelveset” is misguided– use a branch instead if you need to.
So how to encourage check-in early and often?
You could write a whole chapter on this, but here's a short answer: You can explain the benefits so some developers are internally motivated, or you can make it official policy so that other developers are externally “motivated”. You can leverage the TFS Code Churn tables to automatically monitor activity, or even just view check-ins in Team Explorer, to see how often a developer checks in and how much code has changed. If a developer or contractor insists that they need to wait 1 month to check-in their code when “it’s ready”, you’ve got problems, much like if a developer insisted they didn’t need to follow any other policy or good practice.