Showing posts with label TFS. Show all posts

Thursday, January 5, 2012

Detecting if a file is a merge in TFS VersionControl database

I was trying to run some metric calculations on files within a changeset, but I only wanted new files – i.e. I wanted to filter out merged, branched, or renamed files. For example, if someone created a branch, that shouldn’t count as adding 1000 new files.
One solution I found was to check the Command column of the TfsVersionControl.dbo.tbl_Version table. I realize TfsVersionControl is a transactional database and that reports are supposed to run against TfsWareHouse, but the warehouse didn’t seem to contain this field.
Here’s the relevant SQL (NOTE: this is for VS2008; I haven’t tested it on VS2010).
-- List every item in one changeset and flag the ones that count as new code.
select
      CS.ChangeSetId, FullPath, Command, CreationDate,
      case
            -- Command values that represent adds or edits (explained below)
            when Command in (2,5,6,7,10,34) then cast(1 as bit)
            else cast(0 as bit)
      end as IsNew
from TfsVersionControl..tbl_Version V with (nolock)
      inner join TfsVersionControl..tbl_ChangeSet CS with (nolock)
      on V.VersionFrom = CS.ChangeSetId
where CS.ChangeSetId = 20123

The question becomes: what does the tbl_Version.Command column mean, and why those specific values? I couldn’t find official documentation (probably because querying this database directly is discouraged), so I selected the distinct values across 50,000 changesets to find every value in use, then worked backwards by comparing them against the Team Explorer UI. My conclusion is that the column appears to be a bit field of action flags:

Command     Bit value
add         1
edit        2
type        4
rename      8
delete      16
undelete    32
branch      64
merge       128


Recall that a single change can carry multiple actions (hence the bit field), such as merge and edit. So, if you want to find new code, i.e. adds or edits, take the following combined values: 2, 5, 6, 7, 10, and 34.

Is New?   Bit value   Actions
Yes       2           edit
Yes       5           add (folder)
Yes       6           type, edit
Yes       7           add (add file)
No        8           rename
Yes       10          rename, edit
No        16          delete
No        24          delete, rename
No        32          undelete
Yes       34          undelete, edit
No        68          branch
No        70          branch, edit
No        84          branch, delete
No        128         merge
No        130         merge, edit
No        136         merge, rename
No        138         merge, rename, edit
No        144         merge, delete
No        152         merge, delete, rename
No        160         merge, undelete
No        162         merge, undelete, edit
No        196         merge, branch
No        198         merge, branch, edit
No        212         merge, branch, delete


Of course, this is induction, and it’s possible I may have missed something, but given a large sampling and lots of spot-checking, it appears to be reliable.
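
The bit arithmetic above can be sketched in a few lines of Python. This is an illustration only: the flag names come from the tables in this post (with 4 read as “type”, since the second table lists 6 as type/edit), the function names are made up for the example, and none of it comes from official TFS documentation.

```python
# Flag values as inferred in this post, not taken from official documentation.
COMMAND_FLAGS = {
    1: "add",
    2: "edit",
    4: "type",
    8: "rename",
    16: "delete",
    32: "undelete",
    64: "branch",
    128: "merge",
}

# Combined Command values that the post treats as new code.
NEW_CODE_VALUES = {2, 5, 6, 7, 10, 34}


def decode_command(command):
    """Return the names of all flags set in a Command value, lowest bit first."""
    return [name for bit, name in COMMAND_FLAGS.items() if command & bit]


def is_new(command):
    """Mirror the SQL CASE expression: true only for the listed values."""
    return command in NEW_CODE_VALUES
```

For example, decode_command(138) returns ['edit', 'rename', 'merge'], which matches the row for 138 in the table, and is_new(138) is False.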

Friday, December 30, 2011

The benefits to “check in early and often”

I am a huge advocate of checking in early and often. I’ve seen many a project get burnt by the developer who saves 3 weeks of work for a single “glorious” check-in.
I favor frequent check-ins for several reasons:
  1. Cheaper integration. Someone once said “Integration is pay me now or pay me later”, and I find it much easier to pay now. Especially with automated builds and continuous integration, it’s much easier to check in 10 little changes than 1 big change (Sometimes I think of it like being easier to hold my breath for 30 seconds, ten times, as opposed to holding it for 5 minutes straight). Why? Because with bigger changes, you inevitably get farther out of sync – especially on critical shared files – and there’s more to forget.
  2. More objective measure of what you really have: Code that isn’t checked in, that just works on a developer’s machine, doesn’t really exist. They might as well say “it works in my head”. Once you actually get the code past a build server’s policy, then we can see what’s really there.
  3. Earlier Detection: We all know it’s cheaper to fix a bug or redesign the sooner you catch it. I’d rather developers check in code early so we can quickly detect problems (“why are there 5000 lines but no tests?”).
  4. More Modular: Checking in 10 chunks of code, where each one works, implies more granular and modular code. I.e. code that can at least be split into multiple check-ins is more modular than code that can’t be split at all.
Of course there are always exceptions (you do a massive refactoring, etc.), but those should be the exception, not the rule.
Most of the time, in my experience, a large check-in means something bad: spaghetti code, tightly-coupled code, or code that was trying to hide under the radar until right before the deadline, when the developer says “oops, I just don’t have time to change it”. Think of it like this: there is zero benefit to waiting one month before seeing what a developer is doing, but there is a benefit to detecting problems early, so risk-reward wise it’s better to check in early.
Note that for these purposes, a shelveset is not the same as a check-in. Shelvesets are private, and hence deliberately avoid the benefits listed above (which some say is a feature). For example, you most likely don’t have builds running on a private shelveset. For a developer to say “I put my 20,000 lines in a shelveset” is misguided; use a branch instead if you need to.
So how do you encourage checking in early and often?
You could write a whole chapter on this, but here's a short answer: You can explain the benefits so some developers are internally motivated, or you can make it official policy so that other developers are externally “motivated”. You can leverage the TFS Code Churn tables to automatically monitor activity, or even just view check-ins in Team Explorer, to see how often a developer checks in and how much code has changed. If a developer or contractor insists that they need to wait 1 month to check in their code when “it’s ready”, you’ve got problems, much like if a developer insisted they didn’t need to follow any other policy or good practice.
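
As a rough sketch of the monitoring idea, the Python below takes a list of (developer, check-in date) pairs and reports the longest gap between consecutive check-ins for each developer. The input format and the function name are assumptions for illustration; how you export the dates (from the Code Churn tables or from changeset history) is up to you.

```python
from datetime import date


def longest_gaps(checkins):
    """Map each developer to the longest gap, in days, between consecutive check-ins.

    checkins: iterable of (developer, date) pairs (a hypothetical export format).
    A developer with a single check-in gets a gap of 0.
    """
    by_dev = {}
    for dev, day in checkins:
        by_dev.setdefault(dev, []).append(day)

    gaps = {}
    for dev, days in by_dev.items():
        days.sort()
        # Compare each check-in date with the next one for the same developer.
        gaps[dev] = max(((b - a).days for a, b in zip(days, days[1:])), default=0)
    return gaps
```

A developer whose longest gap is measured in weeks rather than days is the one worth a conversation.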

Friday, December 2, 2011

10 Reasons why the build works locally but fails on the build server

This is a braindump:
1. Developer did not check all the files in, or developer doesn't have the latest files (sometimes TFS hiccups when getting the latest DLL files).
2. Different modes (release vs. debug). Either #if DEBUG, or the project is unmarked in Configuration Manager.
3. Different bin structure: each project gets its own (default for Visual Studio) vs. a single shared bin for all (default for TFS). This is especially common when different versions of the same assembly are referenced in multiple projects in the same solution.
4. Different platform/configuration.
5. The build is running other steps (perhaps packaging or command-line unit tests).
6. Different bitness, say the developer workstation is 64-bit but the build server is 32-bit, and some extra step breaks because of this.
7. Rebuild vs. build. The developer is not running a rebuild, so there's an error in creating a DLL, but the DLL already exists on the dev machine due to some other process, while the build server fails.
8. Workspace mapping is incorrect, so TFS is not getting all the files it needs.
9. Unit test code coverage: Visual Studio (at least 2008) can be very brittle when running command-line unit tests and code coverage.
10. Treat warnings as compile errors: depending on your process, the build server may fail on these, but Visual Studio may only flag a warning (which the dev ignores).
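
For item 3, one way to spot the problem before the build server does is to scan the project files for the same assembly referenced at different versions. The Python below is a minimal sketch, not a full MSBuild parser: it assumes references are written as `<Reference Include="Name, Version=x.y.z.w, ...">`, takes the project file contents as plain text, and uses a made-up function name.

```python
import re
from collections import defaultdict

# Matches e.g. <Reference Include="Acme.Logging, Version=1.0.0.0, Culture=neutral">
_REFERENCE = re.compile(r'<Reference\s+Include="([^",]+),\s*Version=([\d.]+)')


def conflicting_references(projects):
    """Find assemblies referenced at more than one version.

    projects: dict mapping project name -> .csproj file contents (text).
    Returns {assembly: {version: [project, ...]}} for conflicting assemblies only.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for project, text in projects.items():
        for assembly, version in _REFERENCE.findall(text):
            seen[assembly][version].append(project)
    # Keep only assemblies that appear with two or more distinct versions.
    return {name: dict(versions) for name, versions in seen.items() if len(versions) > 1}
```

References without a Version attribute are ignored by this sketch, so an empty result does not prove the solution is clean.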

Tuesday, June 28, 2011

Query files in the TFS VersionControl database

TFS provides an API that lets C# code query source control programmatically. However, even with LINQ, that can become tedious to code. TFS also has a TfsVersionControl database that you can query directly with SQL, which is powerful.
Why use the undocumented TfsVersionControl database when you're "encouraged" to use TfsWareHouse?
  1. The transactional databases (TfsBuild, TfsVersionControl, TfsIntegration) are real-time, so you don't need to wait 30 minutes to 2 hours for the warehouse to refresh.
  2. Not all the info is migrated to the TfsWareHouse (or at least, I can't find it in any documentation). For example, the warehouse has a File table, but it doesn't contain all versioned files (such as binaries, images, etc...)
  3. The TFS warehouse may be corrupted (the process to sync it may be down)
Here's a simple (TFS 2008) query to get you started. It returns the version, the full path, the file name, and the CreationDate (when it was checked in). It's based on versioned items, so you can query history (you may also get duplicates, so you'd need to account for that in your query).
select
      V.VersionFrom, V.FullPath, L.CreationDate,
      -- strip the backslash from ChildItem to get a plain file name
      Replace(V.ChildItem, '\', '') as [FileName],
      V.*, L.*
from tbl_Version V with (nolock)
      inner join tbl_File L with (nolock) on V.FileId = L.FileId
where V.ParentPath = '$\MyTeamProject\Folder\SubFolder\'
order by V.VersionFrom desc
Note that TFS by default stores paths in a different format, so you may need to convert:
· '/' becomes '\'
· '_' becomes '>'
· '-' becomes '"' (double quote)
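
The three substitutions above are easy to wrap in a helper if you build queries from code. Here is a minimal Python sketch; the function names are made up for illustration, and the reverse mapping assumes a stored path never contains a literal '>' or '"' of its own.

```python
# Character substitutions listed in this post (server path -> stored form).
_TO_DB = str.maketrans({"/": "\\", "_": ">", "-": '"'})
_FROM_DB = str.maketrans({"\\": "/", ">": "_", '"': "-"})


def to_db_path(server_path):
    """Convert a path like $/Project/Folder/ into the form used in tbl_Version."""
    return server_path.translate(_TO_DB)


def from_db_path(db_path):
    """Convert a stored path back into the usual server path form."""
    return db_path.translate(_FROM_DB)
```

For example, to_db_path("$/MyTeamProject/Folder/SubFolder/") gives the value used in the ParentPath filter of the query above.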