Monday, August 17, 2009

When should you use Code Generation?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/when_should_you_use_code_generation.htm]

Like everything, using code generation is a tradeoff. Basically, you want codeGen when the cost of writing the templates is less than the costs of writing what the template generates. For example, codeGen rocks at creating and maintaining tedious plumbing code - data access plumbing is the canonical example.

You should probably consider code generation when the target code:

  • Has similar patterns and deterministic rules, and little deviation from those rules. If there's an MS word doc or wiki page providing detailed instructions on how to write certain code, you may be able to send those instructions to the code generator instead.
  • Is large and brittle, and it can be described easily (i.e. 1 line in an xml config fiile to describe = 20 lines of C# and SQL coding).
  • Requires syncing with external data sources (like ADO.Net plumbing based on the database schema, such as var type, size, parameter order, count).
  • Requires multiple files to be kept in sync (like a SQL stored proc being in sync with C# ADO.Net wrapper).
  • Requires in-depth knowledge of a problem domain (like ADO.Net for data-access plumbing).
  • Is continually updated (like adding new classes to your DAL).
  • May need to be expanded in the future (like adding a whole new layer of webservices, or Audit triggers, to your DAL. Or, perhaps even migrating to a new language).

Code Generation is not a "Golden Hammer" - while it's great, it's not the perfect solution for everything. codeGen may not be the best solution if the target code:

  • Is very custom with no general pattern. If you can't abstract a pattern out of the target code, then you won't be able to write a generation template.
  • Is too small and trivial - In general, codeGen should decrease your total lines of code. So if you're writing a 50 line template to produce a single 30 line C# file, it's probably a bad ROI.
  • Can be refactored away, and doesn't need to exist in the first place.

Like everything else, in some contexts, it just isn't profitable - but in other contexts it's awesome. Some problems are cheaper to solve using other techniques, like unit testing, automation, open-source, DSLs, or other techniques; but every advanced developer should have code generation in their tool belt.

Monday, August 10, 2009

How Zip Codes can get complicated

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/how_zip_codes_can_get_complicated.htm]

I mentioned in a previous post that while states (in an address) seem simple - indeed most developers have made a dropdown to "pick your state" - in legacy apps they can quickly get complicated. Same thing applies to zip codes. It sounds like a secondary afterthought - "Oh, just add a field to the application so we can store numbers like 20500." However, it can quickly snowball:

  • Do you store the 4 digits at the end, like "20500-0003".
  • If you remove the spaces and dashes, you're left with just numbers - which seems easier to store and search on. So do you store it as an integer (205000003)? This might work if you're only looking at cities in the midwest or west coast, but some east coast states use zip codes that start with a "0", which would get truncated if stored as a number. Personally, I prefer to store them as a varchar, and then have a UI validation (for new) and backend scrubbing process (for existing) to standardize the format in the database.
  • Do you enforce valid zip codes only? Not every 9-digit combination of numbers is a valid zip code - i.e. there are not 1 trillion distinct codes that actually are used.
  • Many applications assume that one zip code belongs to one state - but there are scenarios where a single zip code is shared by multiple states (seriously).
  • Do you allow your zip code field to store international postal codes? Many US applications start off small, and only worry about the US market. Then some business sponsor says "we're missing out on the Country XYZ market, quick, update the app to handle foreign cities". This often causes changes to an address screen, and the quickest way to change it may be providing an "Out-Of-Country" option for the state dropdown, and allow the zip code to store international postal codes.
  • And do you handle all this in the UI with a rich control, or just use a "flexible" 10-character textbox?

Thursday, August 6, 2009

What can go wrong with changing a text label?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/what_can_go_wrong_with_changing_a_text_label.htm]

What could go wrong if the business sponsor just wants to change some little text on a label? It sounds like the simplest thing. And while it should be a simple change, it can sometimes get complicated (i.e. expensive).
  • Globalization - If the app needs to support multiple languages, then the new text needs to be translated into all those target languages.
  • Special Characters - The new text could support special characters that need to be encoded (like & or < or >).
  • Non-supported characters - The new label introduces a special character not in the original character set (such as some foreign language), and the application only supports basic ASCII.
  • Changes flow layout - The new label is longer or shorter, which changes the flow layout. For example, the new text is long enough to force wrapping (which pushes a row to high) or it doesn't wrap so it pushes a column to wide.
  • It's an image - Perhaps the label is actually an image (for fancier looking designs), not just text.
  • Text is not determined in the presentation layer - Perhaps the label is set dynamically through code or configs, such as pulling from a meta-data dictionary.
  • Text was dynamically built - Perhaps the label was dynamically concatenated via some existing logic (for example, it pluralizes the text if some count > 1), so you're not just setting a literal string anymore.
  • You don't own the text -  Perhaps the label comes from an embedded, third-party component that you don't own and cannot easily change.
  • The label appears in multiple places - Perhaps the label appears in multiple places, and so all places need to be updated. For example, it also appears in a UI Report writer where you select the UI-friendly name instead of the database schema column/table.
  • (Bad design) - other code depends on the label text - Perhaps the application has bad design, and there's actually code that reads the value of the label to determine if it should do some other action. For example, if the label contains some word X, then hide section Y (as opposed to whatever method sets label to word X also hides section Y).
  • New font -  Perhaps the new text actually introduces a new font that isn't available on the client (this isn't merely changing text, but it's associated with that kind of request).

Wednesday, July 22, 2009

Thinking that you can learn it all

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/thinking_that_you_can_learn_it_all.htm]

I think .Net is too huge for one person to learn it all, and it's just getting bigger - like a galaxy. However, sometimes an optimistic developer may get the temporary delusion that they can learn it all, or at least the parts that matter. How could someone become so optimistic?
  • Unexpected free time.
  • Something came easier than expected.
  • You're on a prestigious fast track project.
  • A real good teacher explained something very well (and quickly).
  • You're kidding yourself - you're just skimming, or only looking at buzzwords, not really digging into the tech.

Basically, if things are temporarily going well (i.e. you're absorbing new concepts really fast), it may be tempting to think that "ah ha, this learning thing has finally clicked, and it will always go fast from now on!" Oh, how I wish...

Sunday, July 19, 2009

Would you still write unit tests even if you couldn?t automatically re-run them tomorrow?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/would_you_still_write_unit_tests_even_if_you_couldnt_automa.htm]

I am constantly amazed at how difficult it is to encourage software engineering teams to adopt unit testing. Everyone knows (wink wink) that you should test your own code, and we all love automation, and all the experts are pushing for it, and we all know how expensive bug fixes are, etc... Yet, there are still many experienced and good-hearted developers who simply don't write unit tests.

I think a critical question may be "Would you still write unit tests even if you couldn’t automatically re-run them tomorrow?"

Here's why - most managers who push unit tests do so saying something like "Yeah, it's a lot of extra work to write all that testing code right now, but you'll sure be glad in a month when you can automatically re-run them. Oh, and by the way, you can't go home today until you fix these three production issues."

The problem is this demotes unit testing to yet another "invest now; reward later" methodology. This is a crowded field, so it's easy to ignore a new-comer like "unit testing". Obviously, most devs live in the here and now, and they'll just trying to survive today, so they care much more about "invest now, reward now".

The "trick" with unit testing - at least with basic unit testing to at least get your foot in the door - is that it adds immediate value today. Even if you can't automate those tests tomorrow, it can often still help get the current code done faster and better. How is this possible?

  • Faster to develop - Unit testing is faster to developer because it stubs out the context. Say you have some static method buried deep within your web application. If it takes you 5 minutes to set up the data, recompile the host app, navigate to the page, and do whatever action triggers your method being called - that's a huge lag time. If you could write a unit test that directly calls that method, such that you can bypass all that rigmarole and run the static method in 5 seconds - and now you need to test 10 different boundary cases - you've just saved yourself a good chunk of time.
  • Think through your own code - Unit testing forces you to dog food your own code (especially for class-library APIs). It also force you to think out boundary conditions - per the previous question, if it takes several minutes to test one usage of a function, and that function has many different boundary cases, a time-pressed developer simply won't test all the cases.
  • Better design - Testable code encourages a more modular design that is more flexible to change, and easier to debug. Think of it like this: in order to write the unit test, you've got to be able to call the code from a context-free, class library; i.e. if a unit test can call it, then so could a windows service, web service, console app, windows app, or anything else. Every external dependency (i.e. the things that usually break in production due to bad configuration) has been accounted for.

Even if you could never re-run those unit tests after the code was written, they are still a good ROI. The fact that you can automatically re-run them, and get all the additional benefits, is what makes unit testing such a winner for most application development.

RELATED: Is unit testing a second class citizen?, How many unit tests are sufficient?, Backwards: "I wanted to do Unit Tests, but my manager wouldn't let me"

 

Sunday, July 12, 2009

The address's State field may contain more than just the 50 states

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/the_addresss_state_field_may_contain_more_than_just_the_50_.htm]

Most business applications eventually ask the user to enter an address. There's the user's address, shipping address, their company's address, maybe an emergency contact's address, address history, travel-related addresses, financial addresses, etc... Most addresses have a city, state, and zip. While city and zip seem simple (more on that later), many devs initially expect the "State" field to be simple - perhaps a two-character column that can store the 50 US states. However, it can quickly balloon to something much more complicated (especially if you're troubleshooting some legacy app). Besides the standard states, it could contain:

Military codes (reference)

Armed Forces AfricaAE
Armed Forces Americas (except Canada)AA
Armed Forces CanadaAE
Armed Forces EuropeAE
Armed Forces Middle EastAE
Armed Forces PacificAP

US Possessions (reference)

AMERICAN SAMOAAS
DISTRICT OF COLUMBIADC
FEDERATED STATES OF MICRONESIAFM
GUAMGU
MARSHALL ISLANDSMH
NORTHERN MARIANA ISLANDSMP
PALAUPW
PUERTO RICOPR
VIRGIN ISLANDSVI

Perhaps Canadian provinces? (reference)

AlbertaAB 
British ColumbiaBC 
ManitobaMB 
New BrunswickNB 
Newfoundland and LabradorNL 
Northwest TerritoriesNT 
Nova Scotia NS
NunavutNU 
OntarioON 
Prince Edward IslandPE 
QuebecQC 
SaskatchewanSK 
Yukon YT

Generic codes to indicate international use?

Foreign CountryFC
Out of CountryOC
Not ApplicableNA

Or, specific applications may try their own proprietary international mapping, like "RS" = Russia. This might work if you're only doing business with a handful of countries, but it doesn't scale well to the 200+ (?) existing other countries (i.e. I would not recommend this. Use a "Country" field instead is feasible).

Special codes to indicate an unknown, or empty state?

XX
  
*
..

Perhaps, for some reason, the application developer isn't storing just 2-char codes, but rather integer ids that map to another "States" table, so you see numbers like "32" instead of "NY" ("New York")?

Or, even worse, they're shoving non-state related information into the state column as a hack that "made something else easier".

How many distinct entries could you have?

  • With 26 letters, you've only got 26 ^ 2 = 676 options.
  • If you use numbers too, you've got (26 + 10) ^ 2 = 1296 options.
  • If you start using lower case letters (SQL is case-insensitive, but maybe this impacts managed code), then you've got (2 * 26 + 10) ^ 2 = 3844 options.
  • Add in some special characters (such as spaces, periods, asterisks, hyphen, underscores, etc...), maybe 10 of then (if the column isn't validated on strictly alpha-numeric), and you've got (2 * 26 + 10 + 10) ^2 = 5184 options.

That's potentially 100 time more than just the 50 US states. Of course, for new development, we'd all prefer some clearly-defined schema with referential integrity and a business-sensible range of values. However the real world of enterprise applications is messy, and you have to be prepared to see messy things.

It sounds simple, but that innocent "state" field can quickly get very complex.

Thursday, July 2, 2009

Why a manager may not want you to learn

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/why_a_manager_may_not_want_you_to_learn.htm]

I'm a huge advocate of learning. And it's natural for devs to want to pick up new stuff. However, many devs don't realize that they may report to a manager that actually wants to prevent them from learning new things - even on their own personal time. I think this type of manager is rare. However, it's good to be aware in case a manager is (perhaps unintentionally) "sabotaging" your learning.

I hesitated about writing this post lest it seem to cynical or jaded, but it's worth discussing as developers should be aware of such things. Note that there is not one specific person/event/incident that I have in mind, but rather glimpses of things over the last 10 years.

  • Their control - They may want to be in control, and you learning new things that they don't know takes away from their control.
    • They may want to understand the entire architecture themselves. It's sort of a "not built here" applied to a personal level - "If I don't already know it, it must not be necessary."
    • They may not want to learn the new stuff themselves. If you're a tech manager, and all your devs learn the next wave of technologies, it pressures you to learn the new wave as well, else you look obsolete.
    • They don't want you exposing their mistakes. Say a senior developer wrote a bad messaging framework. As long as no other employee has a clue about messaging, no one knows that they made a bad mistake.
    • They want you to "suffer" just like they did. Often new techs make it easier to do something, and rather than have the easy way out, you should do it the original way so you "understand what's really going on". Think using assembly language or C++ instead of a higher-level language like C#.
  • It doesn't support the immediate work
    • They may think it's a waste of time - "We've already invested in this architecture, we don't need anything else." Even though it's your own time, they'd rather you spend overtime on "useful features", like copying and pasting tedious code.
    • It competes with your day job - If you're researching some cool XNA technology, which is a lot more fun than the drudgery of some bad architecture, it may compete. Suppose you work at home, it might "distract" you.
    • It may be misapplied. New stuff is risky, and could be buggy or applied incorrectly - which would hurt the project.
    • Their afraid that "smart" developers are hard to manage. Smart developers can sometimes be total egomaniacs to work with (because they think they're so smart), and management may not want to even think about dealing with that.
  • You may leave
    • You may outgrow your company and leave-  If your company is stuck in the dark ages, they may want to keep everyone's technical skills "in the dark" as well, lest an employee "see the light" and leave.
    • It makes your more marketable, and you may leave. If you're stuck with some niche technology on an obsolete framework, you aren't very marketable and hence can't get another job, and hence your boss has tremendous control over you.

Examples of how a manager might unintentionally discourage a developer from learning:

  • Financially reject anything (like buying new books or tools or paying for a class)
  • Undermine your confidence ("Why would you need that") or question your motives.
  • Deny you resources, such as preventing you from installing anything on your machine (open source code, tools, etc...)
  • Never affirm new learning or innovation. They tell you "good job" for getting that feature done, but won't affirm picking up new technologies.
  • Never provide their software engineers with a continuing-education plan. Ask yourself, how do developers go "to the next level" in your team? Does management help them?

It's sad, but some companies are structured where it's not in the manager's best interests for the employees to "wise up". The managers want hard-working, honest people who are easy to manage, but they don't want to deal with innovation or smart developers.

LINK: Does your Project encourage learning