Most line-of-business applications will die unless they have a strong data access strategy. Enterprise apps simply cannot afford to hard-code thousands of in-line SQL calls into an aspx code-behind; the maintenance burden, lack of reuse, and lack of testability will kill you. I realize entire books have been written on data-access strategy (Fowler, Dino/Andrea), by much smarter men than I, so I only offer this blog post as a summary and braindump. I'm sure I've inevitably missed several important aspects. I also realize that developers take their Data Access Layers (DAL) very seriously and personally, and may consider some features more or less important than others.
Must-have features - These will get you started.
- CRUD - Provide at least the basic CRUD (Create, Read, Update, Delete) functionality
- Sorted paging and filtering - Provide a simple way to handle sorted paging and filtering
- Automatically generated - For the love of all that is good, do NOT write tons of data-access plumbing code by hand. Either code-generate it, or use a dynamic ORM (like NHibernate)
- Serializable objects - Domain objects should be serializable so you can send them across the wire or persist them (such as storing them in a cache). Sometimes this is solved as easily as slapping a [Serializable] attribute on your classes.
- Handles concurrency - At minimum, a where-clause check that compares a version (or datetime) stamp.
- Transactions - Support transactions across multiple tables, whether via the T-SQL transaction keywords, an ADO.Net transaction, or something else.
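The version-stamp concurrency check above is worth spelling out. Here's a minimal sketch of the SQL a DAL might generate for it; the table and column names (EmpInfo, Version, EmployeeId) are illustrative, not from any real schema:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OptimisticSql
{
    // Builds an UPDATE that only succeeds if the row's version still matches
    // the version the caller originally read. Zero rows affected = someone
    // else saved first, i.e. a concurrency conflict.
    public static string BuildUpdate(string table, string keyColumn,
                                     IEnumerable<string> setColumns)
    {
        string sets = string.Join(", ", setColumns.Select(c => c + " = @" + c));
        return "UPDATE " + table +
               " SET " + sets + ", Version = Version + 1" +
               " WHERE " + keyColumn + " = @" + keyColumn +
               " AND Version = @OriginalVersion";
    }
}
```

The DAL would run this via ExecuteNonQuery and treat zero affected rows as a concurrency violation to surface to the business tier, rather than silently overwriting the other user's change.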
Good-to-have features - When you start scaling up, you'll really want these.
- FK and unique-index lookups - Provide extra automatically generated lookup methods for the foreign keys and unique indexes on your tables.
- Meta-data driven - Perhaps you define your entity model in xml files, and your process generates the rest from that (tables, stored procedures, DAL, C# entity classes, etc...)
- Mocked / Isolation-Framework-friendly - Provide support for a mock database, or at least generate interfaces for all the appropriate classes so you program against interfaces instead of concrete classes.
- Batching - (Includes transaction management.) If you can't batch two DAL calls together, then because remote calls are relatively expensive, you'll inevitably start squishing unrelated calls into single spaghetti-blobs for performance reasons.
- Insert an entire grid at once - This could be done via batching, or perhaps SQL 2008's new table-valued parameters.
- Handle database validation errors - Ability to capture database validation errors and return them to the business tier. For example, checking that a code must be unique. (See: Why put logic in SP?)
- Caching - For performance reasons, you'll eventually want to cache certain types of data. Ideally your DAL reads some cache-object config file and abstracts this all away, so you don't litter your code-behinds with hard-coded cache calls. [LINK: thoughts on caching]
- Multiple types of databases - Access multiple different databases, such as main, historical, reporting, etc...
- Scales out to multiple, partitioned, databases - For example, your main application data store may be partitioned by user SSN, so you can spread the load across multiple databases instead of having one giant bottleneck.
- Integrate with a validation framework - Perhaps by applying attributes to the entity objects (like the Enterprise Library Validation Block does). You may want your generator to read both database schema info and external override values from an xml file. For example, say you have an Employee object with a "FirstName" property that maps to the EmpInfo table's FirstName column: the generator could pull the varchar length and required attributes from the database column, and then possibly pull a required expression pattern from the override xml file.
- Audit trail for changes made - The business sponsors are going to want to see change history of certain fields, especially security and financial related ones.
- Create UI admin pages - Provide the ability to create the admin UI pages for easy maintenance of each table. Even if you don't actually deploy these, they're a great developer aid.
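To make the caching bullet above concrete, here's a minimal cache-aside sketch: the DAL consults a cache before hitting the database, so code-behinds never make hard-coded cache calls themselves. The ICache interface and loader delegate are illustrative stand-ins, not any real caching API:

```csharp
using System;
using System.Collections.Generic;

public interface ICache
{
    bool TryGet(string key, out object value);
    void Set(string key, object value, TimeSpan ttl);
}

// In-memory stand-in; a real DAL might plug in ASP.NET cache, memcached, etc.
public class SimpleCache : ICache
{
    private readonly Dictionary<string, object> _store = new Dictionary<string, object>();
    public bool TryGet(string key, out object value) { return _store.TryGetValue(key, out value); }
    public void Set(string key, object value, TimeSpan ttl) { _store[key] = value; } // TTL ignored in this sketch
}

public static class CachingDal
{
    // Read-through: return the cached copy if present; otherwise load from
    // the database (the loader delegate) and remember the result.
    public static T GetOrLoad<T>(ICache cache, string key, Func<T> loadFromDb, TimeSpan ttl)
    {
        object cached;
        if (cache.TryGet(key, out cached))
            return (T)cached;
        T fresh = loadFromDb();
        cache.Set(key, fresh, ttl);
        return fresh;
    }
}
```

The point of the abstraction is that which entities get cached, and for how long, lives in the DAL's config rather than being sprinkled across the UI layer.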
Wow features - These are more advanced.
- Partial update of an object - Say you have a reusable Employee object with 30 fields, but a given context only needs half of them. It can be beneficial to have a DAL that is "smart" enough to update only the fields you actually used in that context. Perhaps you could add a csv list to the base domain object (that Employee inherits from), and every time a property on Employee is set, add that field to the csv list. Then pass the csv list to the data access strategy, which updates only the fields in that list.
- Provide a data dictionary so it integrates with other processes - Building off the meta-data approach (where you automatically generate lots of extra plumbing to assist with integration and abstraction layers), you can start doing some really fancy things:
- See every instance in the UI where a DB field was ultimately used
- Provide clients a managed abstraction layer that lets them write their own reports given the UI views - not the backend tables.
- Provide clients a managed abstraction layer that lets them do mass updates of their own data (this is a validation and security nightmare).
- N-level undo - I've never personally implemented or needed this, but I hear CSLA.Net has it.
- Return deep object graphs - Having a domain model is great, but there's the classic object-relational impedance mismatch, which ORM mappers explicitly help solve. Without some sort of ORM mapper, most applications inevitably settle for a transaction-script or table-module/active-record approach. A deep object graph also requires lazy loading.
- Database independence - Configure your database layer for an easy switch from SQL Server to Oracle. You could do this at compile time by rewriting your code-generator templates. I've heard some architects insist that you should be able to do this at run time as well, via a provider model and updating some information in the config file (I've never personally done this).
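The partial-update idea above can be sketched in a few lines. This is a hypothetical illustration of that dirty-field-tracking approach, using a set of field names instead of a literal CSV string; all class, table, and column names here are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Base domain object tracks which fields have been touched.
public abstract class DomainObject
{
    private readonly HashSet<string> _dirty = new HashSet<string>();
    protected void MarkDirty(string field) { _dirty.Add(field); }
    public IEnumerable<string> DirtyFields { get { return _dirty; } }
}

public class Employee : DomainObject
{
    private string _firstName;
    public string FirstName
    {
        get { return _firstName; }
        set { _firstName = value; MarkDirty("FirstName"); } // setter records the touched field
    }

    private string _lastName;
    public string LastName
    {
        get { return _lastName; }
        set { _lastName = value; MarkDirty("LastName"); }
    }
}

public static class PartialUpdate
{
    // The data access strategy updates only the fields that were actually set.
    public static string BuildSql(string table, string keyColumn, DomainObject obj)
    {
        string sets = string.Join(", ", obj.DirtyFields.Select(f => f + " = @" + f));
        return "UPDATE " + table + " SET " + sets +
               " WHERE " + keyColumn + " = @" + keyColumn;
    }
}
```

Setting only FirstName on an Employee then yields an UPDATE that touches only that one column, leaving the other 29 fields alone.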
Data access is a recurring problem, so the community has evolved a lot of different solutions. Consider some of these:
- ORM mappers
- NHibernate - A popular ORM mapper (and check out Hudson Akridge's blog for NHibernate expertise)
- CodeSmith-related (generates code at compile time)
- Microsoft solutions
- Strongly-typed .Net DataSets - This is what .Net offered first.
- Linq-to-Sql - Great for RAD apps, but I'm personally not sure how well it scales. Be sure to run SQL Profiler on its generated SQL.
- Microsoft Entity Framework - Microsoft's offering for a domain-model approach (as opposed to their out-of-the-box typed datasets).
- I've heard about, but never personally used these:
- Castle ActiveRecord
- (Probably several others too...)
- In-line SQL from your Aspx codebehind - ha ha, just kidding. Don't even think about it. Seriously... don't.