timstall

Tuesday, September 22, 2009

ConnectionTimeout vs. CommandTimeout

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/connectiontimeout_vs_commandtimeout.htm]

SQL timeouts can be very annoying, especially for internal development tools where performance isn't critical.

A lot of developers will try fixing this by modifying the connection string by appending "Connect Timeout=300". Normally this is easy because the connection string is stored in some config file.

However, it still usually fails. This is because there's a big difference between SqlConnection.ConnectionTimeout and SqlCommand.CommandTimeout.

If you're running a command, like a snippet of SQL or a stored proc, then your code needs to set the CommandTimeout. Something like so:

SqlConnection con = new SqlConnection(strDbCon);
SqlCommand cmd = con.CreateCommand();
cmd.CommandType = CommandType.Text;
cmd.CommandText = strText;
cmd.CommandTimeout = 10000; //no relation to con.ConnectionTimeout

Obviously, that's very minimalist code, but that's the general idea. For a robust data access layer, you'd make the timeout configurable.

Sunday, September 20, 2009

I don't have time to put on the parachute

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/i_dont_have_time_to_put_on_the_parachute.htm]

If you're jumping out of a plane at 5000 feet - you take the time to put on the parachute. Sure, you could save yourself a few seconds, and for the first 1000 feet having that parachute doesn't matter yet when you're still free-falling ("I saved schedule time by skipping the 'put-on-parachute' task!"). However, by the time you hit the ground, that parachute is life-saving.

Same thing with software projects and best practices. The project is like jumping out of a plane, and the best practices are like the parachute. Sure, some "best practices" are just useless marketing buzzwords. However, others - like unit testing - are the real deal. And a developer or manager who rejects unit testing (for new .Net code in the middle tier) because they "don't have time" is like jumping out of the plane without putting on your parachute. You save a bit of time upfront, but when the project goes to production and "crashes" into reality, the maintenance costs and untested boundary cases will kill it.

"I don't have time" sounds much more noble and business-like than "I don't understand that idea", or "I just don't feel like doing something new." But within the (very common) context of new .Net class-library development - given that testing frameworks are free (NUnit, MSTest), and that any developer can at least write their tests locally - regardless of management support, and that in certain scenarios it's actually faster to develop code by writing unit tests (because it stubs out the context), the "I don't have time" reply to unit testing certain scenarios is misguided. Sure there are always exceptions, and reasons that good devs may not write unit tests. Testing databases, or legacy code, or the UI, or integration is a different beast. However, in general, testing public methods in a C# class library, without external data dependencies, should be as reasonable as putting a parachute on while jumping out of a plane.

Wednesday, September 16, 2009

LCNUG - Sept 24 - C# 4.0

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/lcnug__sept_24__c_40.htm]

Mike Stall (MSFT) will present on the new features in C# 4.0, including named and optional parameters, dynamic support, scripting, office interop and No-PIA (Primary-Interop-Assemblies) support.

This is a great event for those interested in C# 4.0.

Tuesday, August 25, 2009

What is a number?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/what_is_a_number.htm]

Having an application collect a number from the user seems simple. Just whip out a textbox, maybe add an "integer validator", and ta da! Sounds great, but endless things can go wrong (things can always go wrong, like with states, zip codes, or even labels). I'm certainly not saying that every textbox in every line of business app should handle all of this, but here are some things to be aware of:

Do you allow for special characters, like dollar signs "$", percents "%", or financial negative numbers "( )"? [Seriously, try formatting a negative dollar amount in Excel]
Do you allow commas?
Do you allow decimals?
Do you handle globalization? Some countries use the "," for a decimal point, and the "." for a comma (the inverse of what America does).
Do you allow both "-" (for negative) and "+" for positive, or is the "+" just implied by default?
Do you allow pre and post zeros: "001.00". Keep in mind that the zeros on the left are mathematically redundant, but the zeros on the right may count as "significant digits", and denote a degree of precision. The extra zeros in "1.25000" implies a much more precise number than just "1.25". However, most numeric types in .programming languages will truncate the trailing zeros.
Do you allow exponents for large numbers, like "1E+12" (that's 1 with 12 zeros after it: "1,000,000,000,000")?
Do you allow abbreviations, like "10M" for "10,000,000"? (FWIW, I'm not sure of any app that actually does this.)
Do you have a clearly defined range? For example, a smallint (Int16) stores much smaller numbers than a long (Int64). Most business values can be stored with a double. However, if you're extending a legacy system that only provides a byte or smallint, but it expects occasionally huge values, then you could get into trouble.
Is your validation lenient? For example, if the user enters multiple commas "23,5,6" - do you simply remove all the commas and have "2346", or is that an error?
How do you handle nulls? Do you use a sentinel value (like Int32.MinValue), or use the new Nullable data types, or something else? If using a nullable value, does it "convert up" - i.e. Int32.MinValue is null if the type if Int32, but it would be a valid non-null value if the type was Int64. If you have multiple systems reading the same value, and one system uses Int32, and the other uses Int64, will your system still work (this can happen when you start doing things like letting the user dynamically create their own pages or reports).
Not to be smart-alecky about it, but I'd just assume that unless otherwise told, this accepts base-10 inputs. I.e. "A5" and "FF" are invalid values - although there may be apps where that is legit (like specify a color in HTML for some blog or content-management software).

Also, does your system handle transformations? That's where you collect the input, store the raw data as something else, and display a newly formatted value. Perhaps the most common example is with percents. A user may type "50" into a textbox (with the "%" in the label next to it), you may save it in the database as "0.5", and you may format it back as "50%" on a report. Or a user may enter "+0" in a textbox, and you simple store and display it as "0".

So, some numerical inputs to test with (besides the obvious invalid numeric inputs, like "xyz") :

-23,456.000
-23.456,00 //Globalization
$45,345.00
$(4.00)
The max/min values for each of the numeric types.
Fill the entire textbox with 9's.
000000000
+23
23 //try the simple case
-0
+0
1.25E+12
1.25 E + 12 //note the spaces

The things to look for are (1) will these prompt validation messages, (2) how will each value be stored in the database, and (3) how will the value be re-formatted when it's displayed.

Perhaps the best standard is to see how Excel handles it, as Excel is probably the most famous application in the world that handles numbers.

Thursday, August 20, 2009

10 tips to integrate CodeSmith into your processes

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/10_tips_to_integrate_codesmith_into_your_processes.htm]

Say you've theoretically seen why code generation is so profitable, so you've downloaded a free trial of CodeSmith, and banged out a few templates. In other words, you've got code generation working on a single developer machine. That's great, but it's even better to have it adopted by the entire department. Here are some practical tips on how to integrate CodeSmith into your processes.

Aim for active regeneration - There are two kinds of generation, Active and Passive. Active means that the code is actively regenerated on a regular basis. Passive means it was generated just once, and then modified manually thereafter. The problem with Passive generation is that it lets developers create tons of code upfront, but then people get trigger-happy and use the generator to produce even more code, and you're stuck now maintaining it all. It's like a trip with no return ticket. It also misses out on many of the other benefits of codeGen - like mass updating code with some new change.
Always have a batch script - Yes, people can integrate into VS, or use the CodeSmith IDE. But to enforce uniformity, ensure that the right properties are passed in, and hook into your Continuous Integration (CI) build, you'll need a batch.
Run the codeGen from your CI build - This enforces active regeneration.
Consider not checking generated scripts into source control - This prevents synchronization errors between local developers and the build server. Yes, all code should ultimately be checked into source control, which is why we still check the templates themselves in, from which you can deterministically recreate all the target code. Your automated checkout script, which gets the latest from source control, can then run the templates and recreate the target code. NOTE - this only works if you're not using merge regions (which mix generated and custom code in the same file). If you use merge regions, then you need to check in the generated files.
Avoid merge reasons where possible - CodeSmith has this powerful feature called "merge regions" which lets you mix both generated and custom code. Sometimes you need this, but if you have a choice - always opt to put generated code in its own, dedicated file. This prevents synchronization issues, is less likely to break, is easier to handle overwriting files in active generation, and is easier for most developers to understand and maintain.
Ensure that code generation can be run on every developer's machine - Because you'll want to actively re-generate the code, you'll need each developer to be able to run those generation batch scripts locally. That means each developer will need a license for CodeSmith. This is absolutely not the place to be stingy. If developers cannot simply make a change and have the code re-generated, they will revolt against using code generation.
Clearly identify the generated files - Make sure an average developer can quickly identify that a given file is code-generated. You could name the file with a "*.CodeGen.cs" extension, put a comment disclaimer at the top ("//This file is code-generated. Any changes will be overwritten"), and not check the target code into source control so that it doesn't have any overlay icons (like what SVN offers).
Know your overwrite strategy - If the target file already exists (because you're actively re-generating), make sure you know the expected behavior. If you don't use merge regions, you can simply overwrite the file. Source control should be smart enough to see that the file has the exact same content, and hence it shouldn't be a burden. Worst case, you can have your generator, before it writes out the generated context, detect if the target is the same or not, and handle appropriately (not write anything, have your build server throw a synchronization error if they're different, etc...)
Don't output the DateTime or user info into the code - When someone first uses a code generator, it can be tempting to add as much "free" details into the target file - like displaying "//This code was generated by Homer, on August 20,2009 at 11:34 pm". You'd never maintain that by hand, so it initially looks cool to see all that crisp information in your file. However, the problem is that every time the code is regenerated, that kind of information changes, and the code continually appears to be updated. Furthermore, such comments don't give you anything that you couldn't already get from source control.
Have a backup developer - Make sure that at least one other developer on the team can use the generating tool.

Monday, August 17, 2009

When should you use Code Generation?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/when_should_you_use_code_generation.htm]

Like everything, using code generation is a tradeoff. Basically, you want codeGen when the cost of writing the templates is less than the costs of writing what the template generates. For example, codeGen rocks at creating and maintaining tedious plumbing code - data access plumbing is the canonical example.

You should probably consider code generation when the target code:

Has similar patterns and deterministic rules, and little deviation from those rules. If there's an MS word doc or wiki page providing detailed instructions on how to write certain code, you may be able to send those instructions to the code generator instead.
Is large and brittle, and it can be described easily (i.e. 1 line in an xml config fiile to describe = 20 lines of C# and SQL coding).
Requires syncing with external data sources (like ADO.Net plumbing based on the database schema, such as var type, size, parameter order, count).
Requires multiple files to be kept in sync (like a SQL stored proc being in sync with C# ADO.Net wrapper).
Requires in-depth knowledge of a problem domain (like ADO.Net for data-access plumbing).
Is continually updated (like adding new classes to your DAL).
May need to be expanded in the future (like adding a whole new layer of webservices, or Audit triggers, to your DAL. Or, perhaps even migrating to a new language).

Code Generation is not a "Golden Hammer" - while it's great, it's not the perfect solution for everything. codeGen may not be the best solution if the target code:

Is very custom with no general pattern. If you can't abstract a pattern out of the target code, then you won't be able to write a generation template.
Is too small and trivial - In general, codeGen should decrease your total lines of code. So if you're writing a 50 line template to produce a single 30 line C# file, it's probably a bad ROI.
Can be refactored away, and doesn't need to exist in the first place.

Like everything else, in some contexts, it just isn't profitable - but in other contexts it's awesome. Some problems are cheaper to solve using other techniques, like unit testing, automation, open-source, DSLs, or other techniques; but every advanced developer should have code generation in their tool belt.

Monday, August 10, 2009

How Zip Codes can get complicated

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/how_zip_codes_can_get_complicated.htm]

I mentioned in a previous post that while states (in an address) seem simple - indeed most developers have made a dropdown to "pick your state" - in legacy apps they can quickly get complicated. Same thing applies to zip codes. It sounds like a secondary afterthought - "Oh, just add a field to the application so we can store numbers like 20500." However, it can quickly snowball:

Do you store the 4 digits at the end, like "20500-0003".
If you remove the spaces and dashes, you're left with just numbers - which seems easier to store and search on. So do you store it as an integer (205000003)? This might work if you're only looking at cities in the midwest or west coast, but some east coast states use zip codes that start with a "0", which would get truncated if stored as a number. Personally, I prefer to store them as a varchar, and then have a UI validation (for new) and backend scrubbing process (for existing) to standardize the format in the database.
Do you enforce valid zip codes only? Not every 9-digit combination of numbers is a valid zip code - i.e. there are not 1 trillion distinct codes that actually are used.
Many applications assume that one zip code belongs to one state - but there are scenarios where a single zip code is shared by multiple states (seriously).
Do you allow your zip code field to store international postal codes? Many US applications start off small, and only worry about the US market. Then some business sponsor says "we're missing out on the Country XYZ market, quick, update the app to handle foreign cities". This often causes changes to an address screen, and the quickest way to change it may be providing an "Out-Of-Country" option for the state dropdown, and allow the zip code to store international postal codes.
And do you handle all this in the UI with a rich control, or just use a "flexible" 10-character textbox?