Wednesday, July 22, 2009

Thinking that you can learn it all

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/thinking_that_you_can_learn_it_all.htm]

I think .Net is too huge for one person to learn it all, and it's just getting bigger - like a galaxy. However, sometimes an optimistic developer may get the temporary delusion that they can learn it all, or at least the parts that matter. How could someone become so optimistic?
  • Unexpected free time.
  • Something came easier than expected.
  • You're on a prestigious fast track project.
  • A really good teacher explained something very well (and quickly).
  • You're kidding yourself - you're just skimming, or only looking at buzzwords, not really digging into the tech.

Basically, if things are temporarily going well (i.e. you're absorbing new concepts really fast), it may be tempting to think that "ah ha, this learning thing has finally clicked, and it will always go fast from now on!" Oh, how I wish...

Sunday, July 19, 2009

Would you still write unit tests even if you couldn't automatically re-run them tomorrow?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/would_you_still_write_unit_tests_even_if_you_couldnt_automa.htm]

I am constantly amazed at how difficult it is to encourage software engineering teams to adopt unit testing. Everyone knows (wink wink) that you should test your own code, and we all love automation, and all the experts are pushing for it, and we all know how expensive bug fixes are, etc... Yet, there are still many experienced and good-hearted developers who simply don't write unit tests.

I think a critical question may be "Would you still write unit tests even if you couldn’t automatically re-run them tomorrow?"

Here's why - most managers who push unit tests do so saying something like "Yeah, it's a lot of extra work to write all that testing code right now, but you'll sure be glad in a month when you can automatically re-run them. Oh, and by the way, you can't go home today until you fix these three production issues."

The problem is this demotes unit testing to yet another "invest now; reward later" methodology. That is a crowded field, so it's easy to ignore a newcomer like "unit testing". Obviously, most devs live in the here and now, and they're just trying to survive today, so they care much more about "invest now, reward now".

The "trick" with unit testing - at least with basic unit testing, just to get your foot in the door - is that it adds immediate value today. Even if you can't automate those tests tomorrow, it can often still help get the current code done faster and better. How is this possible?

  • Faster to develop - Unit testing makes development faster because it stubs out the context. Say you have some static method buried deep within your web application. If it takes you 5 minutes to set up the data, recompile the host app, navigate to the page, and do whatever action triggers your method being called - that's a huge lag time. If you can write a unit test that directly calls that method, bypassing all that rigmarole and running the static method in 5 seconds - and now you need to test 10 different boundary cases - you've just saved yourself a good chunk of time (see the test sketch after this list).
  • Think through your own code - Unit testing forces you to dog-food your own code (especially for class-library APIs). It also forces you to think through boundary conditions - per the previous point, if it takes several minutes to test one usage of a function, and that function has many different boundary cases, a time-pressed developer simply won't test all the cases.
  • Better design - Testable code encourages a more modular design that is more flexible to change, and easier to debug. Think of it like this: in order to write the unit test, you've got to be able to call the code from a context-free class library; i.e. if a unit test can call it, then so could a windows service, web service, console app, windows app, or anything else. Every external dependency (i.e. the things that usually break in production due to bad configuration) has been accounted for.
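
Here's a minimal sketch of that idea (NUnit syntax; ShippingCalculator is a hypothetical stand-in for whatever static method is buried in your app):

```csharp
using System;
using NUnit.Framework;

// Hypothetical method under test - stands in for a static method buried deep in a web app.
public static class ShippingCalculator
{
    public static decimal GetShippingCost(decimal orderTotal)
    {
        if (orderTotal < 0m)
            throw new ArgumentException("Order total cannot be negative.");
        return orderTotal >= 100m ? 0m : 9.95m; // free shipping at $100 and above
    }
}

[TestFixture]
public class ShippingCalculatorTests
{
    [Test]
    public void BoundaryCases_RunInSecondsWithNoPageNavigationOrDataSetup()
    {
        Assert.AreEqual(9.95m, ShippingCalculator.GetShippingCost(0m));
        Assert.AreEqual(9.95m, ShippingCalculator.GetShippingCost(99.99m));
        Assert.AreEqual(0m, ShippingCalculator.GetShippingCost(100m));
        Assert.AreEqual(0m, ShippingCalculator.GetShippingCost(250m));
    }

    [Test]
    public void NegativeTotal_Throws()
    {
        Assert.Throws<ArgumentException>(() => ShippingCalculator.GetShippingCost(-1m));
    }
}
```

Each of those boundary cases runs in seconds, with no data setup, recompile, or page navigation - which is the "reward now" part, even if the tests were never re-run.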

Even if you could never re-run those unit tests after the code was written, they are still a good ROI. The fact that you can automatically re-run them, and get all the additional benefits, is what makes unit testing such a winner for most application development.

RELATED: Is unit testing a second class citizen?, How many unit tests are sufficient?, Backwards: "I wanted to do Unit Tests, but my manager wouldn't let me"

 

Sunday, July 12, 2009

The address's State field may contain more than just the 50 states

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/the_addresss_state_field_may_contain_more_than_just_the_50_.htm]

Most business applications eventually ask the user to enter an address. There's the user's address, shipping address, their company's address, maybe an emergency contact's address, address history, travel-related addresses, financial addresses, etc... Most addresses have a city, state, and zip. While city and zip seem simple (more on that later), many devs initially expect the "State" field to be simple - perhaps a two-character column that can store the 50 US states. However, it can quickly balloon to something much more complicated (especially if you're troubleshooting some legacy app). Besides the standard states, it could contain:

Military codes (reference)

Armed Forces Africa: AE
Armed Forces Americas (except Canada): AA
Armed Forces Canada: AE
Armed Forces Europe: AE
Armed Forces Middle East: AE
Armed Forces Pacific: AP

US Possessions (reference)

American Samoa: AS
District of Columbia: DC
Federated States of Micronesia: FM
Guam: GU
Marshall Islands: MH
Northern Mariana Islands: MP
Palau: PW
Puerto Rico: PR
Virgin Islands: VI

Perhaps Canadian provinces? (reference)

Alberta: AB
British Columbia: BC
Manitoba: MB
New Brunswick: NB
Newfoundland and Labrador: NL
Northwest Territories: NT
Nova Scotia: NS
Nunavut: NU
Ontario: ON
Prince Edward Island: PE
Quebec: QC
Saskatchewan: SK
Yukon: YT

Generic codes to indicate international use?

Foreign Country: FC
Out of Country: OC
Not Applicable: NA

Or, specific applications may try their own proprietary international mapping, like "RS" = Russia. This might work if you're only doing business with a handful of countries, but it doesn't scale well to the 200+ other countries (i.e. I would not recommend this; use a separate "Country" field instead if feasible).

Special codes to indicate an unknown, or empty state?

"XX"
" " (blank or whitespace)
"*"
".."

Perhaps, for some reason, the application developer isn't storing just 2-char codes, but rather integer IDs that map to another "States" table, so you see numbers like "32" instead of "NY" ("New York")?

Or, even worse, they're shoving non-state related information into the state column as a hack that "made something else easier".

How many distinct entries could you have?

  • With 26 letters, you've only got 26 ^ 2 = 676 options.
  • If you use numbers too, you've got (26 + 10) ^ 2 = 1296 options.
  • If you start using lower case letters (SQL is case-insensitive, but maybe this impacts managed code), then you've got (2 * 26 + 10) ^ 2 = 3844 options.
  • Add in some special characters (such as spaces, periods, asterisks, hyphens, underscores, etc...), maybe 10 of them (if the column isn't validated as strictly alphanumeric), and you've got (2 * 26 + 10 + 10) ^ 2 = 5184 options.

That's potentially over 100 times more than just the 50 US states. Of course, for new development, we'd all prefer some clearly-defined schema with referential integrity and a business-sensible range of values. However, the real world of enterprise applications is messy, and you have to be prepared to see messy things.
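
As an illustration, a defensive classifier for whatever shows up in that column might look something like this (a rough sketch; the category names and the truncated code lists are mine, purely for illustration):

```csharp
using System;
using System.Collections.Generic;

// Rough sketch of defensively classifying a legacy "State" value rather than
// assuming it is one of the 50 states. Lists here are illustrative, not exhaustive.
public enum StateCodeCategory { UsState, Military, UsPossession, CanadianProvince, Placeholder, Unknown }

public static class StateCodeClassifier
{
    private static readonly HashSet<string> UsStates = new HashSet<string> { "NY", "IL", "CA" /* ...the other 47 */ };
    private static readonly HashSet<string> Military = new HashSet<string> { "AA", "AE", "AP" };
    private static readonly HashSet<string> Possessions = new HashSet<string> { "AS", "DC", "FM", "GU", "MH", "MP", "PW", "PR", "VI" };
    private static readonly HashSet<string> Provinces = new HashSet<string> { "AB", "BC", "MB", "NB", "NL", "NT", "NS", "NU", "ON", "PE", "QC", "SK", "YT" };
    private static readonly HashSet<string> Placeholders = new HashSet<string> { "XX", "*", "..", "" };

    public static StateCodeCategory Classify(string rawValue)
    {
        string code = (rawValue ?? string.Empty).Trim().ToUpperInvariant();
        if (Placeholders.Contains(code)) return StateCodeCategory.Placeholder;
        if (UsStates.Contains(code)) return StateCodeCategory.UsState;
        if (Military.Contains(code)) return StateCodeCategory.Military;
        if (Possessions.Contains(code)) return StateCodeCategory.UsPossession;
        if (Provinces.Contains(code)) return StateCodeCategory.CanadianProvince;
        return StateCodeCategory.Unknown; // integer ids, proprietary country codes, hacks, etc.
    }
}
```

Something this simple at least surfaces how many "Unknown" values a legacy table actually contains before you write code that assumes two-letter US states.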

It sounds simple, but that innocent "state" field can quickly get very complex.

Thursday, July 2, 2009

Why a manager may not want you to learn

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/why_a_manager_may_not_want_you_to_learn.htm]

I'm a huge advocate of learning. And it's natural for devs to want to pick up new stuff. However, many devs don't realize that they may report to a manager who actually wants to prevent them from learning new things - even on their own personal time. I think this type of manager is rare. However, it's good to be aware in case a manager is (perhaps unintentionally) "sabotaging" your learning.

I hesitated about writing this post lest it seem too cynical or jaded, but it's worth discussing, as developers should be aware of such things. Note that there is not one specific person/event/incident that I have in mind, but rather glimpses of things over the last 10 years.

  • Their control - They may want to be in control, and you learning new things that they don't know takes away from their control.
    • They may want to understand the entire architecture themselves. It's sort of "not invented here" applied at a personal level - "If I don't already know it, it must not be necessary."
    • They may not want to learn the new stuff themselves. If you're a tech manager, and all your devs learn the next wave of technologies, it pressures you to learn the new wave as well, else you look obsolete.
    • They don't want you exposing their mistakes. Say a senior developer wrote a bad messaging framework. As long as no other employee has a clue about messaging, no one knows that they made a bad mistake.
    • They want you to "suffer" just like they did. Often new techs make it easier to do something, and rather than have the easy way out, you should do it the original way so you "understand what's really going on". Think using assembly language or C++ instead of a higher-level language like C#.
  • It doesn't support the immediate work
    • They may think it's a waste of time - "We've already invested in this architecture, we don't need anything else." Even though it's your own time, they'd rather you spend overtime on "useful features", like copying and pasting tedious code.
    • It competes with your day job - If you're researching some cool XNA technology, which is a lot more fun than the drudgery of some bad architecture, it may compete with your actual work. If you work from home, it might "distract" you.
    • It may be misapplied. New stuff is risky, and could be buggy or applied incorrectly - which would hurt the project.
    • They're afraid that "smart" developers are hard to manage. Smart developers can sometimes be total egomaniacs to work with (because they think they're so smart), and management may not want to even think about dealing with that.
  • You may leave
    • You may outgrow your company and leave - If your company is stuck in the dark ages, they may want to keep everyone's technical skills "in the dark" as well, lest an employee "see the light" and leave.
    • It makes you more marketable, and you may leave. If you're stuck with some niche technology on an obsolete framework, you aren't very marketable and hence can't get another job, and so your boss has tremendous control over you.

Examples of how a manager might unintentionally discourage a developer from learning:

  • Reject any learning-related expense (like buying new books or tools, or paying for a class)
  • Undermine your confidence ("Why would you need that?") or question your motives.
  • Deny you resources, such as preventing you from installing anything on your machine (open source code, tools, etc...)
  • Never affirm new learning or innovation. They tell you "good job" for getting that feature done, but won't affirm picking up new technologies.
  • Never provide their software engineers with a continuing-education plan. Ask yourself, how do developers go "to the next level" in your team? Does management help them?

It's sad, but some companies are structured where it's not in the manager's best interests for the employees to "wise up". The managers want hard-working, honest people who are easy to manage, but they don't want to deal with innovation or smart developers.

LINK: Does your Project encourage learning

Sunday, June 28, 2009

23 features of an enterprise data access layer

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/23_features_of_an_enterprise_data_access_layer_1.htm]

Most line-of-business applications will die unless they have a strong data access strategy. Enterprise apps simply cannot afford to hard-code thousands of inline SQL calls in aspx code-behinds; the maintenance burden and lack of reuse and testability will kill you. I realize entire books are written on data-access strategy (Fowler, Dino/Andrea), and by much smarter men than I, so I only offer this blog post as a summary and braindump. I'm sure I've inevitably missed several important aspects. I also realize that developers take their Data Access Layers (DAL) very seriously and personally, and may consider some features more or less important than others.

Must-have features - These will get you started (a minimal interface sketch follows this list).

  1. CRUD - Give you at least the basic CRUD (Create, Read, Update, Delete) functionality
  2. Sorted paging and filtering - Provide a simple way to handle sorted-paging and filtering
  3. Automatically generated - For the love of all that is good, do NOT write tons of manual data-access plumbing code by hand. Either code generate it, or use a dynamic ORM (like NHibernate)
  4. Serializable objects - Domain objects should be serializable so you can persist them across the wire (such as store them in a cache). Sometimes this is solved as easily as slapping on attributes to your objects.
  5. Handles concurrency - Even a where-clause check that simply compares a version (or datetime) stamp.
  6. Transactions - Support transactions across multiple tables, such as by using the SQL transaction keywords or an ADO.Net transaction (or something else?)
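
To make the CRUD, paging, and concurrency items concrete, here's a minimal repository-style contract (just a sketch under my own naming assumptions, not a prescribed API; a code generator or ORM would supply the implementation):

```csharp
using System.Collections.Generic;

// Minimal sketch of a generated or ORM-backed data access contract.
// TEntity is assumed to be a serializable domain object; TKey is its primary key type.
public interface IRepository<TEntity, TKey>
{
    TEntity GetById(TKey id);                                 // Read
    IList<TEntity> GetPage(int pageIndex, int pageSize,
                           string sortColumn, string filter); // Sorted paging and filtering
    TKey Insert(TEntity entity);                              // Create
    void Update(TEntity entity);                              // Update - the implementation should compare a
                                                              // version/timestamp column in its WHERE clause (concurrency)
    void Delete(TKey id);                                     // Delete
}
```

For item 6, calls against several repositories could be wrapped in something like System.Transactions.TransactionScope, or the DAL could expose its own unit-of-work object - either way, the point is that callers never hand-roll connection and transaction plumbing.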

Good-to-have features - When you start scaling up, you're really going to want these (a caching-decorator sketch follows this list).

  1. FK and unique-index lookups - Provide those extra automatically generated FK and unique-index lookups on your tables.
  2. Meta-data driven - Perhaps you define your entity model in xml files, and your process generates the rest from that (tables, SP, DAL, entity C# classes, etc...)
  3. Mocked / Isolation-Framework-friendly - It could provide support for a mock database, or at least create interfaces for all the appropriate classes so you program against the interface instead of concrete classes.
  4. Batching - (Includes transaction management). If you don't have the ability to batch two DAL calls together, because remote calls are relatively expensive, you'll inevitably start squishing unrelated calls into single spaghetti-blobs for performance reasons.
  5. Insert an entire grid at once - This could be done via batching, or perhaps SQL Server 2008's new table-valued parameters.
  6. Handle database validation errors - Ability to capture database validation errors and return them to the business tier. For example, checking that a code must be unique. (See: Why put logic in SP?)
  7. Caching - for performance reasons, you'll eventually want to cache certain types of data. Ideally your DAL reads some cache-object config file and abstracts this all away, so you don't litter your codeBehinds with hard-coded cache calls. [LINK: thoughts on caching]
  8. Multiple types of databases - Access multiple different types of databases, such as main, historical, reporting, etc...
  9. Scales out to multiple, partitioned, databases - For example, your main application data store may be partitioned by user SSN, and hence you can spread out the load across multiple databases instead of having one, giant bottleneck.
  10. Integrate with a validation framework - Perhaps by applying attributes to the entity objects (like the Enterprise Library Validation Block does), you may want your generator to be able to read both database schema info and external override values from an xml file. For example, say you have an Employee object with a "FirstName" property that maps to the EmpInfo table's FirstName column; the generator could pull the varchar length and required attributes from the database column, and then possibly pull a required expression pattern from the override xml file.
  11. Audit trail for changes made - The business sponsors are going to want to see change history of certain fields, especially security and financial related ones.
  12. Create UI admin pages - Provide the ability to create the admin UI pages for easy maintenance of each table. Even if you don't actually deploy these, they're a great developer aid.
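
Items 3 and 7 tend to fall out of the same design decision: program against an interface (like the IRepository sketch above), and you can layer in a mock for tests or a cache for performance without touching calling code. Here's a rough sketch of the caching side (ICache is an assumed abstraction, not a specific framework's API):

```csharp
using System;
using System.Collections.Generic;

// Sketch of layering caching behind the same interface the rest of the app already uses,
// so codebehinds never make hard-coded cache calls. Builds on the IRepository sketch above.
public interface ICache
{
    bool TryGet<T>(string key, out T value);
    void Set<T>(string key, T value, TimeSpan lifetime);
}

public class CachingRepository<TEntity, TKey> : IRepository<TEntity, TKey>
{
    private readonly IRepository<TEntity, TKey> _inner;
    private readonly ICache _cache;
    private readonly TimeSpan _lifetime;

    public CachingRepository(IRepository<TEntity, TKey> inner, ICache cache, TimeSpan lifetime)
    {
        _inner = inner;
        _cache = cache;
        _lifetime = lifetime;
    }

    public TEntity GetById(TKey id)
    {
        string key = typeof(TEntity).Name + ":" + id;
        TEntity cached;
        if (_cache.TryGet(key, out cached))
            return cached;

        TEntity entity = _inner.GetById(id);
        _cache.Set(key, entity, _lifetime);
        return entity;
    }

    // Writes pass straight through (a real version would also invalidate cache entries).
    public IList<TEntity> GetPage(int pageIndex, int pageSize, string sortColumn, string filter)
        { return _inner.GetPage(pageIndex, pageSize, sortColumn, filter); }
    public TKey Insert(TEntity entity) { return _inner.Insert(entity); }
    public void Update(TEntity entity) { _inner.Update(entity); }
    public void Delete(TKey id) { _inner.Delete(id); }
}
```

A unit test would pass an in-memory fake for both interfaces; production code would wire in the real database-backed repository and, say, a wrapper over the ASP.Net cache.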

Wow - These are more advanced

  1. Partial update of an object - Say you have a reusable Employee object with 30 fields, but you only need half of those fields in some specific context; it can be beneficial to have a DAL that is "smart" enough to update only the fields you actually set in that context. Perhaps you could add a CSV list to the base domain object (that Employee inherits from), and every time a property in Employee is set, it adds that field to the CSV list. Then it passes the CSV list to the data access strategy, which only updates the fields in that list (see the sketch after this list).
  2. Provide a data dictionary so it integrates into other processes. Building off the meta-data approach (where you can automatically generate lots of extra plumbing to assist with integration and abstraction layers), you can start doing some really fancy things:
    1. See every instance in the UI where a DB field was ultimately used
    2. Provide clients a managed abstraction layer that lets them write their own reports given the UI views - not the backend tables.
    3. Provide clients a managed abstraction layer that lets them do mass updates of their own data (this is a validation and security nightmare).
  3. N-level undo - I've never personally implemented or needed this, but I hear CSLA.Net has it.
  4. Return deep object graphs - Having a domain model is great, but there's the classic object-relational data mismatch. ORM mappers explicitly help solve this. Without some sort of ORM mapper, most applications inevitably "settle" (?) for a transaction script or table module/active record approach. A deep object graph also requires lazy loading.
  5. Database independence - Configure your database access for an easy switch from SQL Server to Oracle. You could do this at compile time by re-writing your code-generator templates. I've heard some architects insist that you should be able to do this at run time as well, via a provider model and updating some information in the config file (I've never personally done this).
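
For the partial-update idea in item 1, the "CSV list of touched fields" could be as simple as a base class that records property sets (a sketch only; the names are illustrative):

```csharp
using System.Collections.Generic;

// Sketch of "only update the fields you touched": the base class records which
// properties were set, and the DAL builds its UPDATE from just that list.
public abstract class TrackedDomainObject
{
    private readonly HashSet<string> _dirtyFields = new HashSet<string>();

    protected void MarkDirty(string fieldName) { _dirtyFields.Add(fieldName); }

    // e.g. "FirstName,LastName" - handed to the data access strategy
    public string GetDirtyFieldCsv() { return string.Join(",", new List<string>(_dirtyFields).ToArray()); }
}

public class Employee : TrackedDomainObject
{
    private string _firstName;
    public string FirstName
    {
        get { return _firstName; }
        set { _firstName = value; MarkDirty("FirstName"); }
    }

    private string _lastName;
    public string LastName
    {
        get { return _lastName; }
        set { _lastName = value; MarkDirty("LastName"); }
    }
}
```

The data access strategy then builds its UPDATE statement (or parameter list) from GetDirtyFieldCsv() rather than blindly writing all 30 columns.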

Data access is a recurring problem, so the community has evolved a lot of different solutions. Consider some of these:

  • ORM mappers
  • CodeSmith-related (generates code at compile time)
    • CSLA.Net - Rockford Lhotka's super enterprise framework, which has CodeSmith support.
    • .NetTiers - A set of CodeSmith templates to create the entire DAL and UI admin pages.
    • Your own custom-built thing (via CodeSmith)
  • Microsoft solutions
  • I've heard about, but never personally used these:
  • In-line SQL from your Aspx codebehind - ha ha, just kidding. Don't even think about it. Seriously... don't.

Thursday, June 18, 2009

BOOK: Microsoft .NET: Architecting Applications for the Enterprise

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/book_microsoft_net_architecting_applications_for_the_ente.htm]

As the .Net platform matures (almost version 4.0!), I'm seeing more and more good .Net architecture books coming out. One such book is Microsoft .NET: Architecting Applications for the Enterprise, by Dino Esposito and Andrea Saltarello.

 

The first section focused heavily on architectural principles. The book was worth getting just for Chapter 3 alone (Design Principles and Patterns), which provided a survey of the various concepts required for high-level architecture, such as OOP, Design Patterns, Structured Design, Separation of Concerns, Dependency Injection, Testability, Security, and AOP.

 

I also liked their chapter on data access. They made a well-reasoned plug for NHibernate and the maintenance benefits of auto-generated dynamic SQL for the data access layer. I admit that I personally have "grown up" with a bias for code-generated stored procedures, but I can see the changing winds.

 

Their book is very focused on the standard N-tier layers: DataAccess, BusinessFacade, Service, and Presentation. Here's the table of contents:

  • Chapter 1: Architects and Architecture Today

  • Chapter 2: UML Essentials

  • Chapter 3: Design Principles and Patterns

  • Chapter 4: The Business Layer

  • Chapter 5: The Service Layer

  • Chapter 6: The Data Access Layer

  • Chapter 7: The Presentation Layer

  • Final Thoughts

  • Appendix: The Northwind Starter Kit

The book didn't discuss much on messaging, caching, validation, logging, system integrations, configuration, or other architectural components. However, most applications make or break on the data access strategy, so I can see the focus there. And, you could have an encyclopedia if you wanted to cover every aspect of enterprise architecture.

 

I found it interesting comparing the book to Fowler's landmark Patterns of Enterprise Application Architecture. Indeed, Dino and Andrea continually refer back to patterns in Fowler. The Dino/Andrea book almost seems intended as a sequel to Fowler's - it adds value by specializing in .Net, having the benefit of almost 6 years of hindsight, and providing constant web references and practical tools (many of which didn't exist when Fowler wrote his book). Overall, it's a good read for any .Net architect or aspiring developer. It's an especially good read for those who grew up as architects in a single company, and therefore may only have exposure to one way of doing architecture.

Tuesday, June 16, 2009

Why share knowledge with others?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/why_share_knowledge_with_others.htm]

I'm a big advocate of knowledge sharing. However, I understand why some developers may be hesitant to do so. We live in an era of unprecedented competition. People (and therefore companies) compete against one another for employment opportunities, promotions, recognition, mindshare, and plain old power. In today's cut-throat world, it almost seems counter-intuitive to "undermine" yourself by sharing your knowledge (read: "competitive advantage") with others (read: "competition").

While obviously you shouldn't share trade secrets and proprietary knowledge with the competition, I still think there are a lot of good reasons to share general knowledge with the community, and especially your coworkers.

  • It teaches you
    • Explaining something to others forces you to better understand it yourself.
    • It is self-correcting - By sharing your knowledge, the other dev may be able to help you improve it by offering tweaks or pointing out gaps that you need to fill.
    • It provides a chance to practice your communication. To communicate, you need a "receiver", someone who wants to catch (i.e. listen to) whatever message you're "throwing". This means that the message needs to be relevant. What's more relevant than a message that is solving the problem that the "receiver" actively wants solved?
  • Social benefits
    • To have a friend, you need to be a friend. If you never share your stuff (knowledge, tricks, tools), people may be hesitant to share their stuff with you.
    • Some people just enjoy helping others.
  • It's part of your job
    • It frees up co-workers to do something useful (which could in turn benefit you, and more importantly, the company who is paying you), as opposed to them re-inventing what you've already done.
    • It's your job as a good developer - Good developers share knowledge. Also, when a company is paying you to code, it's no longer "your code" but the company's code, and the company has a right to expect that tricks and knowledge be shared amongst the team.
  • Demonstrate leadership
    • Knowledge-sharing increases your credibility. By sharing objective things that other devs can verify to be correct, these devs are more likely to trust you on subjective opinions or predictions that are much harder to verify.
    • Become a thought-leader - Often consulting firms encourage their star consultants to demonstrate thought leadership by blogging, writing articles, or contributing to open source projects. Sometimes this is great marketing for that company, and even leads to sales. For example, thousands (millions?) of people use CruiseControl from ThoughtWorks, which in turn gives ThoughtWorks name recognition and marketing.
    • It is necessary in order to be promoted. Leaders communicate, and usually the higher up you go, the wider the audience you need to share knowledge with. If you never practice knowledge sharing until you absolutely have to (via a job demand), then you will probably not be very good at it.
    • It helps make your approach the official standard (which may be good or bad). If you hide your tricks and code, only you will use them. If someone publicizes and shares their code - even if it's worse than yours - their code will be used by more people and hence adopted as the "official" team approach.

There are many ways to share your knowledge, such as: