Wednesday, March 21, 2012

Presenting at SDC on Build Servers and Metrics

UPDATE: Download PowerPoint Here

----
I'll be presenting at the Software Development Community (SDC) on Sunday, April 1st on Build Servers and Metrics. The SDC meets in Oakbrook.

http://www.meetup.com/SoftDev/events/43412322/

ALM tooling: Empowering teams with build servers and metrics
Everyone knows that automated builds are a good thing, but many teams don't leverage them fully because it's hard to get started. Tim will go over practical techniques and concepts for automating builds with TFS and MSBuild. Once you have an automated build, there are dozens of steps you can hook into it, such as metrics. Tim will walk through several core metrics, including line count, code churn, duplication, complexity, and test code coverage, as well as the concepts and pitfalls for adopting these within a team.

Chicago Code Camp is coming May 19

The fourth Chicago Code Camp is coming to CLC on May 19:

http://www.chicagocodecamp.com/

You can register here:

http://chicagocodecamp2012.eventbrite.com

Chicago Code Camp is a free, one-day conference on Saturday, May 19th, for developers of all skill levels and interests, with multiple sessions running side-by-side throughout the day.

Sunday, February 26, 2012

Bluffing Line Count and why it's still useful

I'll take it as a given that every developer knows that line count is a problematic metric. You know you got problems when someone with power thinks that "Homer did twice as much work because he wrote twice as many lines". Why is line count so unreliable? Here are some reasons to get started:
1. Significant lines vs. raw, i.e. whitespace, comments, brackets on new-line
2. New code (creates lots of it) vs. production bug update (5 hours to fix 1 line)
3. Generated files (proxies, designer) or massive code generation
4. Copying 1000 lines from open-source online (algorithm works, so low risk, but you've added a lot of code)
5. Complexity of code, i.e. a 20-line algorithm may be more work than 1000 lines of casual UI and data hookup.
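To make the first point concrete, here is a minimal Python sketch of "significant vs. raw" counting. The rules are my own assumptions for illustration (skip blank lines, skip lines starting with `//`, skip lines holding only a brace); a real tool would also handle block comments and other languages. The function name `count_lines` and the sample snippet are hypothetical.

```python
def count_lines(source, comment_prefix="//"):
    """Return (raw, significant) line counts for a source string.

    Assumed rules: blank lines, single-line comments, and lines that
    contain only a brace are not significant. Block comments are ignored
    by this sketch and would be counted as significant.
    """
    raw = 0
    significant = 0
    for line in source.splitlines():
        raw += 1
        stripped = line.strip()
        if not stripped:
            continue  # whitespace only
        if stripped.startswith(comment_prefix):
            continue  # single-line comment
        if stripped in ("{", "}"):
            continue  # bracket on its own line
        significant += 1
    return raw, significant


# Hypothetical sample: 6 raw lines, only 2 of which do any work.
SAMPLE = """// Adds two numbers
int Add(int a, int b)
{

    return a + b;
}
"""

raw, significant = count_lines(SAMPLE)
print(raw, significant)
```

Even on this tiny sample the two numbers differ by a factor of three, which is why comparing raw counts between developers with different formatting habits tells you very little.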
However, if you're trying to coordinate more than 10 developers, and you have no other metric, line count still has some value because it quickly tells you something is going on. (i.e. the "something is better than nothing" philosophy.) You've got to look at the trends, not the absolute values.
· If you know that your team's average developer produces about 500 lines of code (LOC) per week (of course this varies from team to team), then seeing someone produce either 50 or 5000 should catch your attention. Sure, there may be a good reason, but you at least want to be aware of what that reason is. Is the guy generating 5000 massively copying and pasting code, re-inventing the wheel for quick-to-write utility code, or using a passive code-generator instead of your team's ORM framework? Is the guy only doing 50 not checking anything in, and waiting to surprise the team with 4 weeks of work the day before code freeze for one "glorious" check-in?
· Line count is ubiquitous and everyone can understand it.
· Line count is very cheap to calculate; many tools can provide this.
· Line count is the basis for two more relevant metrics: code churn, which tells you how many lines per file are changing per changeset (and hence per developer), and code duplication (I personally love Simian for this).
· You can also write reports splitting line count by file name to see the ratio of business, entity, data-access, unit test, UI, etc… For example, is someone checking in 1000 lines of business logic, but with zero unit tests? It's something worth investigating.
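The "look at the trend, not the absolute value" idea from the first bullet can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function name `flag_outliers`, the factor-of-5 threshold, and the sample numbers are made up, and the output is only a prompt to go ask a question, not a verdict.

```python
from statistics import median


def flag_outliers(loc_by_dev, ratio=5.0):
    """Flag developers whose weekly LOC is far from the team median.

    `ratio` is an assumed threshold: anything more than `ratio` times
    the median is "high", anything less than median / ratio is "low".
    """
    typical = median(loc_by_dev.values())
    flagged = {}
    for dev, loc in loc_by_dev.items():
        if loc > typical * ratio:
            flagged[dev] = "high"
        elif loc < typical / ratio:
            flagged[dev] = "low"
    return flagged


# Hypothetical week of check-ins (lines of code per developer).
WEEK = {"homer": 5000, "lenny": 450, "carl": 500, "marge": 550, "bart": 50}

print(flag_outliers(WEEK))
```

Using the median rather than the mean keeps one huge check-in from dragging the "typical" value up and hiding itself.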
You cannot reduce an art like code craftsmanship to auto-generated metrics. But the metrics do offer clues to what is going on.  It's good to be aware, but never judge a developer on metrics alone.

Wednesday, January 18, 2012

The problem with "It's not what you know, it's who you know."

I wasn't the most popular kid growing up. Even in college as I lived up to the analytical stereotype and stayed home studying (a better word would be "experimenting" or "training"), my party-going acquaintances would assure me that I was investing in the wrong thing. "It's not what you know, it's who you know. So don't spend so much effort with the books when it's the relationships that matter." And there certainly is some truth to this. We've all seen the stranger's perfect resume get passed over for the friend's average resume (the stranger is by definition unknown, and therefore risky, so there is a business rationale for picking the safe candidate over the risky one). People ultimately make the decisions, so people are important. It's one reason I so actively endorse the community user groups.
However, there must be balance. There are three caveats that this cliché misses:
1. If what you know is valuable, then people will want to know you. Even a hermit who cures cancer will begrudgingly become famous. Recruiters in every major city are scouring LinkedIn, user groups, Monster, Dice, and every online job board trying to find good candidates, offering bounties, and poaching top talent from competitors. In other words, "what you know" will quickly open doors to "who you know" (and "who knows you").
2. Really, it's not "who you know," but "who knows you." Sharing an elevator, or even a lunch, doesn't mean that they'll risk their reputation giving you a referral, or that you can "phone them for a favor".
3. There are talkers and doers. Talkers can drop a name for every occasion, have 500+ social-networking friends, and can truthfully say things like "Oh, I know Acme's Chicago director, Bill, we met at last Autumn's pumpkin-throwing contest…" They could get the interview with their connections, but they could never pass the interview itself.
Of course, with "what you know" vs. "who you know", like most two-way debates in life, you'd prefer both. But in the field of software engineering, you can never sell short the "what you know".

Monday, January 16, 2012

Command-line Cyclomatic Complexity in VS2008 with VS2010 free Metrics.exe

Visual Studio had code complexity metrics, but they were only available in the GUI. (At least for code coverage you could call the private assemblies and roll your own command-line tool.) However, VS 2010 offers a free power tool that lets you run complexity metrics from the command line! The result is an XML file, so you can leverage that for anything you need.
These blogs tell more:
Part of the cool thing is that even if you're still on VS2008 (!) and can't buy a 3rd party tool (NDepend!), you can still use the 2010 power tools to analyze .Net 3.5 assemblies. So, you could install VS2010 on your build server and use the power tools on 2008 builds.
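Since the result is just XML, a build step can pick out the complex methods with very little code. Here is a rough Python sketch. Note the big assumption: the sample report below is a simplified, hypothetical shape (a `Member` element holding `Metric` elements with `Name` and `Value` attributes); check the element and attribute names against the file your tool actually produces before relying on this. The function name `complex_members` and the threshold of 10 are also my own choices.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified report. The real file's element and
# attribute names and nesting may differ; adjust the lookups to match.
SAMPLE_REPORT = """<CodeMetricsReport>
  <Member Name="Parse(string)">
    <Metric Name="CyclomaticComplexity" Value="12" />
    <Metric Name="LinesOfCode" Value="40" />
  </Member>
  <Member Name="ToString()">
    <Metric Name="CyclomaticComplexity" Value="1" />
  </Member>
</CodeMetricsReport>"""


def complex_members(xml_text, threshold=10):
    """Return (member name, complexity) pairs above the threshold."""
    root = ET.fromstring(xml_text)
    results = []
    for member in root.iter("Member"):
        for metric in member.findall("Metric"):
            if metric.get("Name") != "CyclomaticComplexity":
                continue
            value = int(metric.get("Value"))
            if value > threshold:
                results.append((member.get("Name"), value))
    return results


print(complex_members(SAMPLE_REPORT))
```

A build could fail, or just log a warning, whenever this list is non-empty.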

Thursday, January 5, 2012

Detecting if a file is a merge in TFS VersionControl database

I was trying to run some metric calculations on files within a changeset, but I only wanted new files – i.e. I wanted to filter out merged, branched, or renamed files. For example, if someone created a branch, that shouldn’t count as adding 1000 new files.
One solution I found was to check the Command column of the TfsVersionControl.dbo.tbl_Version table. I realize the TfsVersionControl is a transactional database, and reports are encouraged to go off of TfsWareHouse, but that didn’t seem to contain this field.
Here’s the relevant SQL (NOTE: this is for VS2008, I haven’t tested it on VS2010).
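-- List every item in one changeset and flag whether it counts as new code
select
      CS.ChangeSetId, FullPath, Command, CreationDate,
      -- Command values that represent adds or edits (not merges, branches, or pure renames)
      case
            when Command in (2,5,6,7,10,34) then cast(1 as bit)
            else cast(0 as bit)
      end as IsNew
from TfsVersionControl..tbl_Version V with (nolock)
      inner join TfsVersionControl..tbl_ChangeSet CS with (nolock)
      on V.VersionFrom = CS.ChangeSetId
where CS.ChangeSetId = 20123   -- example changeset id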

The question becomes, what does the “tbl_Version.Command” column mean, and why those specific values? I couldn’t find official documentation (probably because it’s discouraged to run queries on it), so I did a distinct search on 50,000 different changesets to find all values being used, and I worked backwards comparing it against the Team Explorer UI to conclude it appears to be a bit flag for actions:

Command     Bit value
add         1
edit        2
type        4
rename      8
delete      16
undelete    32
branch      64
merge       128


Recall there can be multiple actions (hence the bit field), such as merge and edit. So, if you want to find new code (i.e. adds or edits), take the following combined values: 2, 5, 6, 7, 10, and 34.

Is New?   Bit value   Actions
Yes       2           edit
Yes       5           add (folder)
Yes       6           type/edit
Yes       7           add (add file)
No        8           rename
Yes       10          rename/edit
No        16          delete
No        24          delete, rename
No        32          undelete
Yes       34          undelete, edit
No        68          branch
No        70          branch, edit
No        84          branch, delete
No        128         merge
No        130         merge, edit
No        136         merge, rename
No        138         merge, rename, edit
No        144         merge, delete
No        152         merge, delete, rename
No        160         merge, undelete
No        162         merge, undelete, edit
No        196         merge, branch
No        198         merge, branch, edit
No        212         merge, branch, delete


Of course, this is induction, and it’s possible I may have missed something, but given a large sampling and lots of spot-checking, it appears to be reliable.
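If you want to sanity-check the bit arithmetic outside of SQL, here is a small Python sketch that decodes a Command value and applies an "is new" rule. This is my own restatement, not anything official: the bit names are the inferred ones from this post, bit 4 is read as "type" (based on the "type/edit" row for value 6), and the rule "add or edit, with no branch or merge bit" is simply one that happens to reproduce the Yes/No column for the values I observed.

```python
# Inferred, undocumented bit meanings; treat every entry as an assumption.
# Bit 4 is read as "type", following the "type/edit" row for value 6.
FLAGS = {
    1: "add",
    2: "edit",
    4: "type",
    8: "rename",
    16: "delete",
    32: "undelete",
    64: "branch",
    128: "merge",
}

# Values marked "Yes" in the table above.
OBSERVED_NEW = {2, 5, 6, 7, 10, 34}

# Every value listed in the table above.
OBSERVED_ALL = [2, 5, 6, 7, 8, 10, 16, 24, 32, 34, 68, 70, 84, 128, 130,
                136, 138, 144, 152, 160, 162, 196, 198, 212]


def decode_command(command):
    """Return the action names whose bits are set, lowest bit first."""
    return [name for bit, name in FLAGS.items() if command & bit]


def is_new_code(command):
    """True when add or edit is set and neither branch nor merge is."""
    has_add_or_edit = bool(command & (1 | 2))
    is_branch_or_merge = bool(command & (64 | 128))
    return has_add_or_edit and not is_branch_or_merge


for value in OBSERVED_ALL:
    print(value, decode_command(value), is_new_code(value))
```

The rule only claims to match the values in the table; a combination I never saw (say, delete plus edit) would need checking against Team Explorer before trusting the result.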

Wednesday, January 4, 2012

It’s not your code, but it is your opportunity

I occasionally hear the developer say “my code”, as in “I’ll check in my code at the end of next week”, or “My code doesn’t need unit tests”.
In one sense, I want developers to think “this is my code” so they take pride in doing the best job possible. But really, it’s not your code, it’s the company’s code – they’re paying for it, and often legally they own it (i.e. it would be illegal to take chunks of code you wrote at one company and either privately sell it, or check it into another company’s source code repository).
This perspective really changes the discussion, i.e. “The company would like their code to be checked-in on a regular basis”, or “The company would like their code to be properly tested”.
However, it is the developer's "opportunity to learn" – i.e. the company keeps the code, but the developer keeps the improved skill from writing that code.