timstall

Tuesday, August 25, 2009

What is a number?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/what_is_a_number.htm]

Having an application collect a number from the user seems simple. Just whip out a textbox, maybe add an "integer validator", and ta da! Sounds great, but endless things can go wrong (things can always go wrong, like with states, zip codes, or even labels). I'm certainly not saying that every textbox in every line of business app should handle all of this, but here are some things to be aware of:

Do you allow for special characters, like dollar signs "$", percents "%", or financial negative numbers "( )"? [Seriously, try formatting a negative dollar amount in Excel]
Do you allow commas?
Do you allow decimals?
Do you handle globalization? Some countries use the "," for a decimal point, and the "." for a comma (the inverse of what America does).
Do you allow both "-" (for negative) and "+" for positive, or is the "+" just implied by default?
Do you allow pre and post zeros: "001.00". Keep in mind that the zeros on the left are mathematically redundant, but the zeros on the right may count as "significant digits", and denote a degree of precision. The extra zeros in "1.25000" implies a much more precise number than just "1.25". However, most numeric types in .programming languages will truncate the trailing zeros.
Do you allow exponents for large numbers, like "1E+12" (that's 1 with 12 zeros after it: "1,000,000,000,000")?
Do you allow abbreviations, like "10M" for "10,000,000"? (FWIW, I'm not sure of any app that actually does this.)
Do you have a clearly defined range? For example, a smallint (Int16) stores much smaller numbers than a long (Int64). Most business values can be stored with a double. However, if you're extending a legacy system that only provides a byte or smallint, but it expects occasionally huge values, then you could get into trouble.
Is your validation lenient? For example, if the user enters multiple commas "23,5,6" - do you simply remove all the commas and have "2346", or is that an error?
How do you handle nulls? Do you use a sentinel value (like Int32.MinValue), or use the new Nullable data types, or something else? If using a nullable value, does it "convert up" - i.e. Int32.MinValue is null if the type if Int32, but it would be a valid non-null value if the type was Int64. If you have multiple systems reading the same value, and one system uses Int32, and the other uses Int64, will your system still work (this can happen when you start doing things like letting the user dynamically create their own pages or reports).
Not to be smart-alecky about it, but I'd just assume that unless otherwise told, this accepts base-10 inputs. I.e. "A5" and "FF" are invalid values - although there may be apps where that is legit (like specify a color in HTML for some blog or content-management software).

Also, does your system handle transformations? That's where you collect the input, store the raw data as something else, and display a newly formatted value. Perhaps the most common example is with percents. A user may type "50" into a textbox (with the "%" in the label next to it), you may save it in the database as "0.5", and you may format it back as "50%" on a report. Or a user may enter "+0" in a textbox, and you simple store and display it as "0".

So, some numerical inputs to test with (besides the obvious invalid numeric inputs, like "xyz") :

-23,456.000
-23.456,00 //Globalization
$45,345.00
$(4.00)
The max/min values for each of the numeric types.
Fill the entire textbox with 9's.
000000000
+23
23 //try the simple case
-0
+0
1.25E+12
1.25 E + 12 //note the spaces

The things to look for are (1) will these prompt validation messages, (2) how will each value be stored in the database, and (3) how will the value be re-formatted when it's displayed.

Perhaps the best standard is to see how Excel handles it, as Excel is probably the most famous application in the world that handles numbers.

Thursday, August 20, 2009

10 tips to integrate CodeSmith into your processes

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/10_tips_to_integrate_codesmith_into_your_processes.htm]

Say you've theoretically seen why code generation is so profitable, so you've downloaded a free trial of CodeSmith, and banged out a few templates. In other words, you've got code generation working on a single developer machine. That's great, but it's even better to have it adopted by the entire department. Here are some practical tips on how to integrate CodeSmith into your processes.

Aim for active regeneration - There are two kinds of generation, Active and Passive. Active means that the code is actively regenerated on a regular basis. Passive means it was generated just once, and then modified manually thereafter. The problem with Passive generation is that it lets developers create tons of code upfront, but then people get trigger-happy and use the generator to produce even more code, and you're stuck now maintaining it all. It's like a trip with no return ticket. It also misses out on many of the other benefits of codeGen - like mass updating code with some new change.
Always have a batch script - Yes, people can integrate into VS, or use the CodeSmith IDE. But to enforce uniformity, ensure that the right properties are passed in, and hook into your Continuous Integration (CI) build, you'll need a batch.
Run the codeGen from your CI build - This enforces active regeneration.
Consider not checking generated scripts into source control - This prevents synchronization errors between local developers and the build server. Yes, all code should ultimately be checked into source control, which is why we still check the templates themselves in, from which you can deterministically recreate all the target code. Your automated checkout script, which gets the latest from source control, can then run the templates and recreate the target code. NOTE - this only works if you're not using merge regions (which mix generated and custom code in the same file). If you use merge regions, then you need to check in the generated files.
Avoid merge reasons where possible - CodeSmith has this powerful feature called "merge regions" which lets you mix both generated and custom code. Sometimes you need this, but if you have a choice - always opt to put generated code in its own, dedicated file. This prevents synchronization issues, is less likely to break, is easier to handle overwriting files in active generation, and is easier for most developers to understand and maintain.
Ensure that code generation can be run on every developer's machine - Because you'll want to actively re-generate the code, you'll need each developer to be able to run those generation batch scripts locally. That means each developer will need a license for CodeSmith. This is absolutely not the place to be stingy. If developers cannot simply make a change and have the code re-generated, they will revolt against using code generation.
Clearly identify the generated files - Make sure an average developer can quickly identify that a given file is code-generated. You could name the file with a "*.CodeGen.cs" extension, put a comment disclaimer at the top ("//This file is code-generated. Any changes will be overwritten"), and not check the target code into source control so that it doesn't have any overlay icons (like what SVN offers).
Know your overwrite strategy - If the target file already exists (because you're actively re-generating), make sure you know the expected behavior. If you don't use merge regions, you can simply overwrite the file. Source control should be smart enough to see that the file has the exact same content, and hence it shouldn't be a burden. Worst case, you can have your generator, before it writes out the generated context, detect if the target is the same or not, and handle appropriately (not write anything, have your build server throw a synchronization error if they're different, etc...)
Don't output the DateTime or user info into the code - When someone first uses a code generator, it can be tempting to add as much "free" details into the target file - like displaying "//This code was generated by Homer, on August 20,2009 at 11:34 pm". You'd never maintain that by hand, so it initially looks cool to see all that crisp information in your file. However, the problem is that every time the code is regenerated, that kind of information changes, and the code continually appears to be updated. Furthermore, such comments don't give you anything that you couldn't already get from source control.
Have a backup developer - Make sure that at least one other developer on the team can use the generating tool.

Monday, August 17, 2009

When should you use Code Generation?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/when_should_you_use_code_generation.htm]

Like everything, using code generation is a tradeoff. Basically, you want codeGen when the cost of writing the templates is less than the costs of writing what the template generates. For example, codeGen rocks at creating and maintaining tedious plumbing code - data access plumbing is the canonical example.

You should probably consider code generation when the target code:

Has similar patterns and deterministic rules, and little deviation from those rules. If there's an MS word doc or wiki page providing detailed instructions on how to write certain code, you may be able to send those instructions to the code generator instead.
Is large and brittle, and it can be described easily (i.e. 1 line in an xml config fiile to describe = 20 lines of C# and SQL coding).
Requires syncing with external data sources (like ADO.Net plumbing based on the database schema, such as var type, size, parameter order, count).
Requires multiple files to be kept in sync (like a SQL stored proc being in sync with C# ADO.Net wrapper).
Requires in-depth knowledge of a problem domain (like ADO.Net for data-access plumbing).
Is continually updated (like adding new classes to your DAL).
May need to be expanded in the future (like adding a whole new layer of webservices, or Audit triggers, to your DAL. Or, perhaps even migrating to a new language).

Code Generation is not a "Golden Hammer" - while it's great, it's not the perfect solution for everything. codeGen may not be the best solution if the target code:

Is very custom with no general pattern. If you can't abstract a pattern out of the target code, then you won't be able to write a generation template.
Is too small and trivial - In general, codeGen should decrease your total lines of code. So if you're writing a 50 line template to produce a single 30 line C# file, it's probably a bad ROI.
Can be refactored away, and doesn't need to exist in the first place.

Like everything else, in some contexts, it just isn't profitable - but in other contexts it's awesome. Some problems are cheaper to solve using other techniques, like unit testing, automation, open-source, DSLs, or other techniques; but every advanced developer should have code generation in their tool belt.

Monday, August 10, 2009

How Zip Codes can get complicated

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/how_zip_codes_can_get_complicated.htm]

I mentioned in a previous post that while states (in an address) seem simple - indeed most developers have made a dropdown to "pick your state" - in legacy apps they can quickly get complicated. Same thing applies to zip codes. It sounds like a secondary afterthought - "Oh, just add a field to the application so we can store numbers like 20500." However, it can quickly snowball:

Do you store the 4 digits at the end, like "20500-0003".
If you remove the spaces and dashes, you're left with just numbers - which seems easier to store and search on. So do you store it as an integer (205000003)? This might work if you're only looking at cities in the midwest or west coast, but some east coast states use zip codes that start with a "0", which would get truncated if stored as a number. Personally, I prefer to store them as a varchar, and then have a UI validation (for new) and backend scrubbing process (for existing) to standardize the format in the database.
Do you enforce valid zip codes only? Not every 9-digit combination of numbers is a valid zip code - i.e. there are not 1 trillion distinct codes that actually are used.
Many applications assume that one zip code belongs to one state - but there are scenarios where a single zip code is shared by multiple states (seriously).
Do you allow your zip code field to store international postal codes? Many US applications start off small, and only worry about the US market. Then some business sponsor says "we're missing out on the Country XYZ market, quick, update the app to handle foreign cities". This often causes changes to an address screen, and the quickest way to change it may be providing an "Out-Of-Country" option for the state dropdown, and allow the zip code to store international postal codes.
And do you handle all this in the UI with a rich control, or just use a "flexible" 10-character textbox?

Thursday, August 6, 2009

What can go wrong with changing a text label?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/what_can_go_wrong_with_changing_a_text_label.htm]

What could go wrong if the business sponsor just wants to change some little text on a label? It sounds like the simplest thing. And while it should be a simple change, it can sometimes get complicated (i.e. expensive).

Globalization - If the app needs to support multiple languages, then the new text needs to be translated into all those target languages.
Special Characters - The new text could support special characters that need to be encoded (like & or < or >).
Non-supported characters - The new label introduces a special character not in the original character set (such as some foreign language), and the application only supports basic ASCII.
Changes flow layout - The new label is longer or shorter, which changes the flow layout. For example, the new text is long enough to force wrapping (which pushes a row to high) or it doesn't wrap so it pushes a column to wide.
It's an image - Perhaps the label is actually an image (for fancier looking designs), not just text.
Text is not determined in the presentation layer - Perhaps the label is set dynamically through code or configs, such as pulling from a meta-data dictionary.
Text was dynamically built - Perhaps the label was dynamically concatenated via some existing logic (for example, it pluralizes the text if some count > 1), so you're not just setting a literal string anymore.
You don't own the text - Perhaps the label comes from an embedded, third-party component that you don't own and cannot easily change.
The label appears in multiple places - Perhaps the label appears in multiple places, and so all places need to be updated. For example, it also appears in a UI Report writer where you select the UI-friendly name instead of the database schema column/table.
(Bad design) - other code depends on the label text - Perhaps the application has bad design, and there's actually code that reads the value of the label to determine if it should do some other action. For example, if the label contains some word X, then hide section Y (as opposed to whatever method sets label to word X also hides section Y).
New font - Perhaps the new text actually introduces a new font that isn't available on the client (this isn't merely changing text, but it's associated with that kind of request).

Wednesday, July 22, 2009

Thinking that you can learn it all

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/thinking_that_you_can_learn_it_all.htm]

I think .Net is too huge for one person to learn it all, and it's just getting bigger - like a galaxy. However, sometimes an optimistic developer may get the temporary delusion that they can learn it all, or at least the parts that matter. How could someone become so optimistic?

Unexpected free time.
Something came easier than expected.
You're on a prestigious fast track project.
A real good teacher explained something very well (and quickly).
You're kidding yourself - you're just skimming, or only looking at buzzwords, not really digging into the tech.

Basically, if things are temporarily going well (i.e. you're absorbing new concepts really fast), it may be tempting to think that "ah ha, this learning thing has finally clicked, and it will always go fast from now on!" Oh, how I wish...

Sunday, July 19, 2009

Would you still write unit tests even if you couldn?t automatically re-run them tomorrow?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/would_you_still_write_unit_tests_even_if_you_couldnt_automa.htm]

I am constantly amazed at how difficult it is to encourage software engineering teams to adopt unit testing. Everyone knows (wink wink) that you should test your own code, and we all love automation, and all the experts are pushing for it, and we all know how expensive bug fixes are, etc... Yet, there are still many experienced and good-hearted developers who simply don't write unit tests.

I think a critical question may be "Would you still write unit tests even if you couldn’t automatically re-run them tomorrow?"

Here's why - most managers who push unit tests do so saying something like "Yeah, it's a lot of extra work to write all that testing code right now, but you'll sure be glad in a month when you can automatically re-run them. Oh, and by the way, you can't go home today until you fix these three production issues."

The problem is this demotes unit testing to yet another "invest now; reward later" methodology. This is a crowded field, so it's easy to ignore a new-comer like "unit testing". Obviously, most devs live in the here and now, and they'll just trying to survive today, so they care much more about "invest now, reward now".

The "trick" with unit testing - at least with basic unit testing to at least get your foot in the door - is that it adds immediate value today. Even if you can't automate those tests tomorrow, it can often still help get the current code done faster and better. How is this possible?

Faster to develop - Unit testing is faster to developer because it stubs out the context. Say you have some static method buried deep within your web application. If it takes you 5 minutes to set up the data, recompile the host app, navigate to the page, and do whatever action triggers your method being called - that's a huge lag time. If you could write a unit test that directly calls that method, such that you can bypass all that rigmarole and run the static method in 5 seconds - and now you need to test 10 different boundary cases - you've just saved yourself a good chunk of time.
Think through your own code - Unit testing forces you to dog food your own code (especially for class-library APIs). It also force you to think out boundary conditions - per the previous question, if it takes several minutes to test one usage of a function, and that function has many different boundary cases, a time-pressed developer simply won't test all the cases.
Better design - Testable code encourages a more modular design that is more flexible to change, and easier to debug. Think of it like this: in order to write the unit test, you've got to be able to call the code from a context-free, class library; i.e. if a unit test can call it, then so could a windows service, web service, console app, windows app, or anything else. Every external dependency (i.e. the things that usually break in production due to bad configuration) has been accounted for.

Even if you could never re-run those unit tests after the code was written, they are still a good ROI. The fact that you can automatically re-run them, and get all the additional benefits, is what makes unit testing such a winner for most application development.