Friday, November 27, 2009

BOOK: 97 things every software architect should know

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/book_97_things_every_software_architect_should_know.htm]

A few months back I finished the book 97 Things Every Software Architect Should Know. I've been slow on blogging, so I hadn't gotten around to my standard follow-up post for books I read.

The books consists of 97 short essays, by various accomplished architects, each with a quick insight. It was a casual and fun read, the kind of book that's easy to sneak in a few pages between changing the kids diapers and fixing the house. It's got a lot of good points. I especially recall an essay about how "the database is your fortress" - GUI and Middle Tier apps change, but you will always have your database.

For hard-core architecture, I thought that Microsoft .NET: Architecting Applications for the Enterprise and Patterns of Enterprise Application Architecture were much more systematic and thorough. But overall, it was still a fun read.

Thursday, October 8, 2009

Can you still be technical if you don't code?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/can_you_still_be_technical_if_you_dont_code.htm]

Can you still be technical if you don't code? A lot of developers have a passion for the technology, do a great job in their current role of implementing solutions (which requires coding), and then get "promoted" into some "big picture" role that no longer does implementation - ironically the thing they did so well at. These higher-level roles often do lots of non-trivial things, but not actual coding. For example:

  • Infrastructure (Servers, SAN space, database access, network access)
  • Design decisions
  • Dealing with legacy code
  • Handling outsourcing, insourcing, consultants
  • Build-vs-buy
  • Vendor evaluations/score card; integrate the vendor's product into your own
  • Coordinate large-scale integration of many apps from different environments
  • Coordination among multiple product life cycles
  • Writing guidance docs
  • Code reviews
  • Occasional prototypes
  • Configuration

On one hand, these types of tasks require technical knowledge in that you wouldn't expect a non-technical person to perform them. On the other hand, they don't seem in the same category as hands-on coding.

What do you think - can you be technical (or remain technical) without actually writing code?

Monday, October 5, 2009

Reasons to NOT put version history in the comments

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/reasons_to_not_put_version_history_in_the_comments.htm]

Some coding standards ask that developers add revision history to the top of the method. Not just the normal "Summary" and "Parameter" tags that can be used to automatically create documentation, but rather a full blown revision log with developer name, date, and comment. This was bigger in the 80's and 90's, when source control and refactoring weren't as common. However, in these days, putting revision history in your comments has some really big problems:
  • Extra effort - It requires extra effort from developers, and usually the tedious kind of effort. It requires manual developer discipline, so architects don't have a good way to enforce this.
  • Bad for refactoring - It discourages refactoring of methods. Say you split a method in two - how do you split up the commented version history?
  • Source control already provides this - It doesn't tell you anything that source control history won't already give you. But even worse, it could be misleading - source control is the true authority, a developer could accidentally type (or forget) the wrong comments. Also, by documenting at the top of the method, it is hard to indicate what changed in the middle (whereas source control diff would instantly tell you).

Perhaps for SQL, I can see the benefit, so that when the DBA runs sp_helptext on a stored proc, they get a quick history (and databases aren't usually refactored like C# code). However, for middle tier code, putting revision history in comments seems like an unwise use of time.

LINK: Comments as version control
 

Thursday, October 1, 2009

What makes something Enterprise?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/what_makes_something_enterprise.htm]

There's a world of difference between a prototype hammered out over a weekend, and an enterprise app ready for the harsh world of production. Here's a somewhat random brainstorm. In general (there's always an exception), Enterprise apps:

  • Are scalable - they handle large loads and can be called many times.
  • Have a retry strategy - for example, it tries pinging the external service three times before "failing".
  • Have a failover strategy, like an active-passive machine cluster for maximum uptime, and a disaster recovery site.
  • Send notifications.
  • Handles invalid data (like states, zip codes, and numbers).
  • Can integrate with other systems (perhaps providing web service wrappers, or command line APIs, or publicly accessible data repositories that other apps can modify) .
  • Are deployable - "it works on my machine" absolutely does not cut it.
  • Have Logging - this is especially useful for debugging in production, or measuring how many errors (and which types) are thrown.
  • Have long-running process (hours, days, or even weeks) - not just a single thread in memory.
  • Have async processes, which usually means concurrency and threading problems.
  • Support multiple instances of the app running. You can open two copies of word, or run two MSBuild scripts at the same time.
  • Handle product versioning.
  • Care about the hardware it's running on (enough CPU and memory).
  • Have a pluggable architecture - You may need to switch data providers (Oracle/SQL/Xml).
  • Have external data sources (web services, ftp file dumps, external databases).
  • Can scale out, such as adding more web servers, or splitting the database across multiple servers.
  • Have security constraints (both hacking, and functional).
  • Have process that are documented (not just for training, but also for legal auditing and compliance issues).

Much of this code isn't the fun, "glamorous" stuff. However, it's this kind of robustness that separates the "toys" from the enterprise workhorses.

See also: Enterprise Data Access, Enterprise Caching

Tuesday, September 29, 2009

A quick overview of enterprise object caching

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/a_quick_overview_of_enterprise_object_caching.htm]

Caching is a performance mechanism where you store an expensive-to-create value for future consumption. For example, you may cache dropdown values to spare yourself expensive database hits. Caching is one of those buzzwords that everyone knows your application should have, but surprisingly few really do.

Books have been written on caching, so this is just a quick brain dump based on my personal experiences. Also note that I refer to "the database" a lot because it's the main data dependency that most developers can relate to, but really, it could be anything (web service, external file, etc...).

Frontend and Backend Caching

 Frontend UIBackend
Where is it located?Local (in process)Remote (external machines)
ProFaster because it's local - no remote hitHandles updating stale data in distributed systems

Handles any CLR serializable object, independent of the UI layer.

ConDoes not handle updating values - data may be stale

Limited to just HttpContext.

Slower because it's remote - you still need to pay for the remote hit.
ExampleAsp.Net HttpContext.CacheMemcache, Velocity, others...

Obvious follow-up question: "Would you double cache something, taking data from the backend cache and saving it to the fronend cache?" Sure, if it benefited your specific scenario. Ideally the backend cache is totally encapsulated anyway, so your frontend UI developer wouldn't even know if the data they're working with came from a cache or not.

What is a good candidate for caching?

Any data that:

  • Takes a lot of time to create, either due to a remote hit (to a database or web service), or a large calculation time (like querying a million rows).
  • Has a small final result - you query a million rows just to return a single value.

  • Does not change - the problem with caching is stale data.

  • Has minimal dependencies. If your object touches 10 tables for creation, then there's a much greater chance of it becoming stale. This is one benefit of loosely coupled (and then batched) objects, instead of spaghetti code.

  • Has many reads, but very few writes. For example, system-level data (that everyone constantly requests) is good for caching, but employee-level data may not be.

  • Is retrieved externally and requires high uptime - for example you make a web service call to get some value, you call the service again 5 minutes later, the service is down, and you really wish you had even a "stale" copy of that data.

The canonical example would be something like city-state dropdowns. Say it's initially a remote database hit to some "City/State" tables, it returns a small amount of data, it changes infrequently (The US has had 50 states for the last half-century), so it's read many times but not prone to stale data.

What is a bad candidate for caching? Pretty much, the opposite of the good criteria.

Pitfalls with caching

Merely creating the dictionary isn't the problem. The problem will be integrating it (seamlessly) into your data persistence layer, and then ensuring the cache doesn't become stale - especially across a distributed environment - and then making sure it actually improves your performance instead of degrading it.

Stale data is the bane of caching. There are at least a few ways to deal with stale data:

  • Apply a time-out policy such that all data expires after N minutes, but that won't be acceptable for most scenarios.
  • If your app is changing the data (such as updating a details page), then have the DAL method (that calls the update SQL) also ping the cache to clear the stale object. This requires that your application somehow keeps track of which objects depend on which piece of data. It works great with a domain model, which requires more design upfront, but can have huge payoffs.
  • If someone else is directly changing the database, such as a DBA running ad-hoc SQL in production, then consider providing some admin console that lets them clear segments of the cache. For example, if the DBA did a mass-update of all salaries, then have the admin page allow you to flush all salary-related objects out of the cache. This requires some infrastructure for tagging each cached object (perhaps a master config file), and discipline from the DBA to check that admin page.
  • Beware of "database cache dependencies" (ASP.Net 2.0?), that claim to let you apply triggers to the database table, and then automatically clear the appropriate cache items when a specific row/column is updated. I've personally never gotten this to successfully work, have heard lots of horror stories (especially when deploying it across a DMZ), and it forces the domain design into the database instead of the middle tier. Although I'm all ears to anyone who's had a success story here.

Some other things to keep in mind:

  • Where should my cache tie in? Ideally, you'd want the backend cache abstracted via your dataAccess layer. This becomes very feasible with CodeGeneration. Whether data is pulled from cache or the live database is just a tuning option. Just like you don't want to put in-line SQL throughout your codebehind pages, you probably don't want to tightly-couple all your UI code to your cache provider. The frontend cache can be referenced in your UI, but again, be aware of too much plumbing code that "designs you into a corner".

  • Only temporary - A cache is not a persistent data store, it is not merely a "mirror" to scale out your database, as you must account for the cache being cleared. A data access method should always have a means to recreate the cached data.

  • Why use a remote cache? If both the cache and database servers both have remote hits, what's the difference? In its simplest form, this helps scale out the database (which is usually a bottleneck). Every hit on the cache is one less hit on the already-overloaded database. Even after the remote hit, the cache can have a much faster lookup time because while the cache stores a created object, the database may still need to query 1 million rows to collect the data.
  • Control Panel - You'll want to provide an easy way to flush the entire cache, or even segments of the cache, without restarting any servers. It's also great moral support to have a statistics page showing how many thousand (million) database calls have been spared.

  • Configuration - You'll want to provide a way to configure almost everything: the cache durations, what category of duration (short, medium, long), which objects get cached, and perhaps even turn off the entire cache for emergency troubleshooting. Caching is something that you want to tune based on actual production results. Ideally control of what gets cached is all abstracted to a few easy-to-manage config files. You do not want these config values hard-coded throughout your app.

  • Beware of over-caching - If done the wrong way, caching can actually screw your performance. Say you cache something that is continually obsolete, so instead of just doing the normal database hit, you continually also need to do the extra cache query.

Good things to read

There's an endless list of info on caching. Here are some that could be useful.

Yes, there's ton more that can be said about caching. Again, this is just a quick brain dump.

Tuesday, September 22, 2009

ConnectionTimeout vs. CommandTimeout

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/connectiontimeout_vs_commandtimeout.htm]

SQL timeouts can be very annoying, especially for internal development tools where performance isn't critical.

A lot of developers will try fixing this by modifying the connection string by appending "Connect Timeout=300". Normally this is easy because the connection string is stored in some config file.

However, it still usually fails. This is because there's a big difference between SqlConnection.ConnectionTimeout and SqlCommand.CommandTimeout.

If you're running a command, like a snippet of SQL or a stored proc, then your code needs to set the CommandTimeout. Something like so:

SqlConnection con = new SqlConnection(strDbCon);
SqlCommand cmd = con.CreateCommand();
cmd.CommandType = CommandType.Text;
cmd.CommandText = strText;
cmd.CommandTimeout = 10000; //no relation to con.ConnectionTimeout

Obviously, that's very minimalist code, but that's the general idea. For a robust data access layer, you'd make the timeout configurable.

Sunday, September 20, 2009

I don't have time to put on the parachute

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/i_dont_have_time_to_put_on_the_parachute.htm]

If you're jumping out of a plane at 5000 feet - you take the time to put on the parachute. Sure, you could save yourself a few seconds, and for the first 1000 feet having that parachute doesn't matter yet when you're still free-falling ("I saved schedule time by skipping the 'put-on-parachute' task!"). However, by the time you hit the ground, that parachute is life-saving.

Same thing with software projects and best practices. The project is like jumping out of a plane, and the best practices are like the parachute.  Sure, some "best practices" are just useless marketing buzzwords. However, others - like unit testing - are the real deal. And a developer or manager who rejects unit testing (for new .Net code in the middle tier) because they "don't have time" is like jumping out of the plane without putting on your parachute. You save a bit of time upfront, but when the project goes to production and "crashes" into reality, the maintenance costs and untested boundary cases will kill it.

"I don't have time" sounds much more noble and business-like than "I don't understand that idea", or "I just don't feel like doing something new." But within the (very common) context of new .Net class-library development - given that testing frameworks are free (NUnit, MSTest), and that any developer can at least write their tests locally - regardless of management support, and that in certain scenarios it's actually faster to develop code by writing unit tests (because it stubs out the context), the "I don't have time" reply to unit testing certain scenarios is misguided. Sure there are always exceptions, and reasons that good devs may not write unit tests. Testing databases, or legacy code, or the UI, or integration is a different beast. However, in general, testing public methods in a C# class library, without external data dependencies, should be as reasonable as putting a parachute on while jumping out of a plane.