timstall

Sunday, June 28, 2009

23 features of an enterprise data access layer

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/23_features_of_an_enterprise_data_access_layer_1.htm]

Most line of business applications will die unless they have a strong data access strategy. Enterprise apps simply cannot afford to hard-code thousands of in-line SQL calls to an aspx code behind; the maintenance and lack of reuse and testability will kill you. I realize entire books are written on data-access strategy (Fowler, Dino/Andrea), and by much smarter men than I, so I only offer this blog post as a summary and braindump. I'm sure I've inevitably missed several important aspects. I also realize that developers take their Data Access Layers (DAL) very seriously and personally, and may consider some features more or less important than others.

Must-have features - This will get you started.

CRUD - Give you at least the basic CRUD (Create, Read, Update, Delete) functionality
Sorted paging and filtering - Provide a simple way to handle sorted-paging and filtering
Automatically generated - For the love of all that is good, do NOT write tons of manual data-access plumbing code by hand. Either code generate it, or use a dynamic ORM (like NHibernate)
Serializable objects - Domain objects should be serializable so you can persist them across the wire (such as store them in a cache). Sometimes this is solved as easily as slapping on attributes to your objects.
Handles concurrency - Even a where-clause check that simply compares a version (or datetime) stamp.
Transactions - Support transactions across multiple tables, such as either using the SQL transaction keyword, or the ADO.Net transaction (or something else?)

Good-to-have features - When you start scaling up, you're really going to want these.

FK and unique-index lookups - Provide those extra automatically generated FK and unique-index lookups on your tables.
Meta-data driven - Perhaps you define your entity model in xml files, and your process generates the rest from that (tables, SP, DAL, entity C# classes, etc...)
Mocked / Isolation-Framework-friendly - It could provide support for a mock database, or at least create interfaces for all the appropriate classes so you program against the interface instead of concrete classes.
Batching - (Includes transaction management). If you don't have the ability to batch two DAL calls together, because remote calls are relatively expensive, you'll inevitably start squishing unrelated calls into single spaghetti-blobs for performance reasons.
Insert an entire grid at once - This could be done via batching, or perhaps SQL 2008's new table value parameter.
Handle database validation errors - Ability to capture database validation errors and return them to the business tier. For example, checking that a code must be unique. (See: Why put logic in SP?)
Caching - for performance reasons, you'll eventually want to cache certain types of data. Ideally your DAL reads some cache-object config file and abstracts this all away, so you don't litter your codeBehinds with hard-coded cache calls. [LINK: thoughts on caching]
Multiple types of databases - Access multiple different types of databases, such as main, historical, reporting, etc...
Scales out to multiple, partitioned, databases - For example, your main application data store may be partitioned by user SSN, and hence you can spread out the load across multiple databases instead of having one, giant bottleneck.
Integrate with a validation framework - Perhaps by applying attributes to the entity objects (like what Enterprise Library Validation Block does), you may want your generator to be able to read both database schema info and external override values from an xml file. For example, say you have an Employee object with a "FirstName" property which maps to the EmpInfo table's FirstName column, the generator could pull the varchar length and required attributes from the database column, and then possibly pull a required expression pattern from the override xml file.
Audit trail for changes made - The business sponsors are going to want to see change history of certain fields, especially security and financial related ones.
Create UI admin pages - Provide the ability to create the admin UI pages for easy maintenance of each table. Even if you don't actually deploy these, they're a great developer aid.

Wow - These are more advanced

Partial update of an object - say you have a reusable Employee object with 30 fields, but you only need half those fields in some specific context, it can be beneficial to have a DAL that can be "smart" and updates only the fields that you used in a given context. Perhaps you could add a csv list to the base domain object (that Employee inherits from), and every time a property in Employee is set, it adds the field to that CSV list. Then, it passes that CSV list to the data access strategy, which only updates the fields in that list.
Provide a data dictionary so it integrates into other processes. Building off the meta-data approach (where you can automatically generate lots of extra plumbing to assist with integration and abstraction layers), you can start doing some really fancy things:
1. See every instance in the UI where a DB field was ultimately used
2. Provide clients a managed abstraction layer that lets them write their own reports given the UI views - not the backend tables.
3. Provide clients a managed abstraction layer that lets them do mass updates of their own data (this is a validation and security nightmare).
N-level undo - I've never personally implemented or needed this, but I hear CLSA.Net has it.
Return deep object graphs - Having a domain model is great, but there's the classic OOP - relational data mismatch. ORM mappers explicitly help solve this. Without some sort of ORM mapper, most application inevitably "settle" (?) for a transaction script or table module/active record approach. A deep object graph also requires lazy loading.
Database independence - Configure your database for easy switch from SQL Server to Oracle. You could do this at compile time by re-writing your code-generator templates. I've heard some architects insist that you should be able to do this at run time as well via a provider model, and updating some information in the config file (I've never personally done this).

Data access is a re-occurring problem, so the community has evolved a lot of different solutions. Consider some of these:

ORM mappers
- NHibernate - A popular ORM mapper (and check out Hudson Akridge's blog for NHibernate expertise)
CodeSmith-related (generates code at compile time)
- CLSA.Net - Rockford Lhotka's super enterprise framework, which has CodeSmith support.
- .NetTiers - A set of CodeSmith templates to create the entire DAL and UI admin pages.
- Your own custom-built thing (via CodeSmith)
Microsoft solutions
- Strongly-typed .Net DataSets - This is what .Net first had.
- Linq-to-Sql - Great for RAD apps, but I'm personally not sure how well it scales. Be sure to run SQL-Profiler on its generated sql.
- Microsoft Entity Framework - Microsoft's offering for a domain-model approach (as opposed to their out-of-the-box typed datasets).
I've heard about, but never personally used these:
- Castle ActiveRecord
- (Probably several others too...)
In-line SQL from your Aspx codebehind - ha ha, just kidding. Don't even think about it. Seriously... don't.

Thursday, June 18, 2009

BOOK: Microsoft .NET: Architecting Applications for the Enterprise

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/book_microsoft_net_architecting_applications_for_the_ente.htm]

As the .Net platform matures (almost version 4.0!), I'm seeing more and more good .Net architecture books coming out. One such book is Microsoft .NET: Architecting Applications for the Enterprise, by Dino Esposito and Andrea Saltarello.

The first section focused heavily on architectural principles. The book was worth getting just for Chapter 3 alone (Design Principles and Patterns), which provided a survey of the various concepts required for high-level architecture, such as OOP, Design Patterns, Structured Design, Separation of Concerns, Dependency Injection, Testability, Security, and AOP.

I also liked their chapter on DataAccess. They made a well-reasoned plug for NHiberante and the maintenance benefits of auto-generated dynamic SQL for the data access layer. I admit that I personally have "grown up" with a bias for code-generated stored procedures, but I can see the changing winds.

Their book is very focused on the standard N-tier layers: DataAccess, BusinessFacade, Service, and Presentation. Here's the table of contents:

Chapter 1: Architects and Architecture Today
Chapter 2: UML Essentials
Chapter 3: Design Principles and Patterns
Chapter 4: The Business Layer
Chapter 5: The Service Layer
Chapter 6: The Data Access Layer
Chapter 7: The Presentation Layer
Final Thoughts
Appendix: The Northwind Starter Kit

The book didn't discuss much on messaging, caching, validation, logging, system integrations, configuration, or other architectural components. However, most applications make or break on the data access strategy, so I can see the focus there. And, you could have an encyclopedia if you wanted to cover every aspect of enterprise architecture.

I found it interesting comparing the book to Fowler's landmark Patterns of Enterprise Application Architecture. Indeed, Dino and Andrea continually refer back to patterns in Fowler. The Dino/Andrea book almost seems intended as a sequel to Fowler's - it adds value by specializing in .Net, having the benefit of almost 6 years of hindsight, and providing constant web references and practical tools (many which didn't exist when Fowler wrote his book). Overall, it's a good read for any .Net Architect or aspiring developer. It's an especially good read for those who grew up as architects in a single company, and therefore may only have exposure to one way of doing architecture.

Tuesday, June 16, 2009

Why share knowledge with others?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/why_share_knowledge_with_others.htm]

I'm a big advocate of knowledge sharing. However, I understand why some developers may be hesitant to do so. We live in an era of unprecedented competition. People (and therefore Companies) compete against one another for employment opportunities, promotions, recognition, mindshare, and plain-old-power. In today's cut-throat world, it almost seems counter-intuitive to "undermine" yourself by sharing your knowledge (read: "competitive advantage") with others (read: "competition").

While obviously you shouldn't share trade secrets and proprietary knowledge with the competition, I still think there are a lot of good reasons to share general knowledge with the community, and especially your coworkers.

It teaches you
- Explaining something to others forces you to better understand it yourself.
- It is self correcting - By sharing your knowledge, the other dev may be able to help you improve it by offering tweaks or suggesting gaps that you need to fill.
- It provides a chance to practice your communication. To communicate, you need a "receiver", someone who wants to catch (i.e. listen to) whatever message you're "throwing". This means that the message needs to be relevant. What's more relevant than a message that is solving the problem that the "receiver" actively wants solved?
Social benefits
- To have a friend, you need to be a friend. If you never share your stuff (knowledge, tricks, tools), people may be hesitant to share their stuff with you.
- Some people just enjoy helping others.
It's part of your job
- It frees up co-workers to do something useful (which could in turn benefit you, and more importantly, the company who is paying you), as opposed to them re-inventing what you've already done.
- It's your job as a good developer - Good developers share knowledge. Also, when a company is paying you to code, it's no longer "your code", but the company's code, and therefore the company has a right that tricks and knowledge be shared amongst the team.
Demonstrate leadership
- Knowledge-sharing increases your credibility. By sharing objective things that other devs can verify to be correct, these devs are more likely to trust you on subjective opinions or predictions that are much harder to verify.
- Become a thought-leader - Often consulting firms encourage their star consultants to demonstrate thought leadership by blogging, writing articles, or contributing to open source projects. Sometimes this is great marketing for that company, and even leads to sales. For example, thousands (millions?) of people use CruiseControl from ThoughtWorks, which in turn gives ThoughtWorks name recognition and marketing.
- It is necessary in order to be promoted. Leaders communicate, and usually the higher up you go, the wider the audience you need to share knowledge with. If you never practice knowledge sharing until you absolutely have to (via a job demand), then you will probably not be very good at it.
- It helps make your approach the official standard (which may be good or bad). If you hide your tricks and code, only you will use them. If someone publicizes and shares their code - even if it's worse than yours - their code will be used by more people and hence adopted as the "official" team approach.

There are many ways to share your knowledge, such as:

Wikis
Email
Automated governance
Sample applications
Team meetings
1-on-1 meetings (like a Code Review)
Blogging

Sunday, June 14, 2009

The benefits of technical blogging

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/the_benefits_of_technical_blogging.htm]

By some estimates, there are over 50 million blogs. Granted, many of these are barely maintained, and they cover all topics (not just software engineering), but clearly there's something to this whole blogging thing. Keep in mind that the vast majority of these blogs are not profitable - so the authors are writing as hobbyists and not paid professionals. Millions of people - and thousands of software engineers - blog because there are a lot of benefits to it:

Record smaller thoughts to build up for a larger project like an article or even a book.
Instant gratification - Blogs let you publish your writing instantly, which is gratifying (as opposed to the lag time for an article or book)
Ability to share your thoughts with the world - and likewise get feedback on them.
Practice at writing, beyond just the quick email or IM context.
Provides an outlet for all the miscellaneous thoughts and code a developer comes across
Gets you to think deeper about ideas, knowing that you need to at least polish it to the level of a few paragraphs for public consumption. This helps you learn the topic yourself, as one of the best ways to learn is to teach the content to others.
Thought Leadership - some people (especially consultants where this is a resume credential) really like this. Good blog posts also build credibility.
Confidence booster - putting your thoughts out there to potentially millions of people requires courage, and repeatedly doing that (and growing from the feedback) builds confidence.
Write anything on any topic, regardless of what you're currently working on.
It helps others. For example, sometimes I post the most obscure syntax errors I find - and other people were troubled with the exact same issue, and actually find the post useful.
Blogging acts like a personal journal. I've gotten a kick out of seeing how some of my thoughts have evolved over the last 5 years.
Low barrier to entry - because blogs are (effectively) free, you can write small chunks, you can write anything you'd like, and you can automatically publish it, there's a very low barrier to get started.

Should you blog even if you don't get traffic? I think yes - if you enjoy it (such as for any of the reasons mentioned above). Traffic comes and goes, and the weirdest entries get tons of traffic while the other entries (that I expect to be popular) are almost ignored.

Related: Do you have time to blog?

Wednesday, June 10, 2009

Real life: Avoiding customization to build a Sandbox

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/real_life_avoiding_customization_to_build_a_sandbox.htm]

I have three kids - two sons and a daughter. Cute little buggers, so I wanted to make them a sandbox to play in. I figure doing something physical and outdoors, as opposed to watching TV, would be good for them. Plus, I'm all for anything that even remotely encourages them to do engineering. However, I have very, very, minimal wood-working skills. When it comes to woodworking, I am (at best) a hobbyist - by no means an expert. This means I was just trying to build a simple sandbox that works - no fancy wood cutting, things that take big vocabulary to describe, or expensive tools required. I made a crude box-like design, drove to the local home depot, and got help picking out the wood (12x1x6 inch cedar). The only cuts I made where ones that the home depot guy could do in the store - so no triangle, notched, or diagonal cuts. I hauled my precious wood back to my garage (read: not professional tool workshop), applied one coat of polyurthethane something (read: I hope that helps protect against weathering), and hammered the boards together. After digging the box into the ground and filling it with sand, it was good enough and I was done.

Why dump such a story on my technical blog? Because my hobbyist mentality towards a wood sandbox is essentially the same as many "programmers" hobbyist mentality towards the craft of software engineering. We both just want to get it done, make the end users happy, and maybe enjoy it along the way. If we miss out on an optimal technique, that's okay. Working with other people, it can be useful to understand that mindset.

And yes, the kids loved it.

Tuesday, June 9, 2009

An easy way to hack unvalidated web input

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/an_easy_way_to_hack_unvalidated_web_input.htm]

Many security bugs get overlooked because hackers use "special" tools that give them options that the original developers anticipate. For example, most web developers only test their application through a browser. While this is sometimes sufficient to check different querystring inputs, it gives one an enormous false sense of security. Consider two cases:

A button is disabled, therefore the developer assumes you can't click it (Example: A Save button is disabled because some data isn't valid)
Data is stored in a hidden field, therefore the developer assumes that you can't change it. (Example: A record's ID is stored in a hidden field)

To some people, this appears safe because IE doesn't let you click disabled buttons or change the value of hidden fields. However, IE is only used to collect the data from the page and aggregate it into a post that it sends to the webserver. What if the hacker is using a different tool besides IE that gives them more control on how to assemble and send this post (or an extension to IE that lets them modify fields)?

How to hack it

You could crack both the above cases using Visual Studio Team Test (or I bet Firebug, or the IE8 dev tools). Here's how:

Open up VS Team Test
Create a Test Project, and add a new web test. This opens up a recorder in IE.
Navigate to the page you want to crack, such as something that stores data in a hidden field.
Perform a normal save action, such as saving the page (which uses data from the hidden field)
Stop the recording. Play back the test just to make sure it works in the normal case. Note how for each request, VS provides you a new row and lets you see the exact request and response, as well as what visually appears in the Web Browser.
Now start the hacking. Go to the request where you saved the page (that collected data from the hidden field), and unbind that field and change its value to whatever you want. In this case, I simply had a field called "Hidden1", and I changed it's value to "abc". Note that just like IE, the webtest sends a post to the server. But unlike IE, you can easily change the contents of that post.
Rerun the webtest. The new, hacked, value is sent to the server.
Note that you can also script VS WebTests with C#. So, you could have a for-loop that re-submits a 1000 requests, varying the post values each time.

So, what can we do?

As a developer, we always need to validate and do security checks on the server - merely JS side validation is not enough. For example, knowing that that hidden field could be hacked, always validate that data as if it were a free-form textbox.

The problem is that this degree of extra validation (1) costs much more time, and (2) could be a big performance hit - which is especially ironic because it's for something that 99% of users won't even know about.

Allotting time requires managerial approval. However, most managers see security as one of those "nice-to-have" buzzwords, that they'll have implemented later "if time permits".

I think a good solution to this is to give management a live demo of your site being hacked (perhaps try it on the QA machine, not necessarily production). To really make a point, see if you can hack into their profile. Hopefully that will convince management that it's worth the time.

About the performance hit - I don't see a silver bullet. Most application performance can be improved by adding more resources somehow - more hardware, better design (caching, batching), coding optimizations, etc... Because applications vary so much, there's not a one-size-fits-all solution.

Monday, June 8, 2009

BOOK: Pragmatic Thinking and Learning: Refactor Your Wetware

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/book_pragmatic_thinking_and_learning_refactor_your_wetware.htm]

I'm fascinated with learning how to learn, so I was excited to finally read Andy Hunt's Pragmatic Thinking and Learning: Refactor Your Wetware. Recall that Andy is one of the co-stars of the hit The Pragmatic Programmers.

This is a good example of a higher-level, "non-syntax" book, something that transcends the "How to program XYZ" genre. (Shameless plug: I had written my own book: A Crash Course in Reasoning, but I can see why Andy's is in the top 3000 Amazon sales rank, and mine is barely in the top 3 million).

My favorite chapter was "Journey from Novice to Expert", as there is such a huge productivity gap here. He also continually emphasized the differences between the two parts of the brain, comparing it to a dual CPU, single master bus design.

It was an enjoyable read, similar to picking desserts out of a buffet. He had a lot of good quotes throughout the book:

"... software development must be the most difficult endeavor ever envisioned and practiced by humans." (pg. 1)
"... it's not the teacher teaches; it's that the student learns." (pg. 3)
"Don't succumb to the false authority of a tool or a model." (pg. 41)
"If you don't keep track of great ideas, you will stop noticing that you have them." (pg. 53). This is huge. The "slow times" during the day (driving, waiting in line, burping a sleeping baby) are great for mulling over random ideas. It's almost like collecting raindrops. I used to do this, but got lazy the last few years. Andy's chapter inspired me to go out, get some pocket-sized notebooks, and start jotting down random thoughts again (read: future blog entries).
"Try creating your next software design away from your keyboard and monitor..." (pg. 72). It's ironic, but often sitting in front of the computer, with all the internet distractions, can kill one's creativity.
"So if you aren't pair programming, you definitely need to stop every so often and step away from the keyboard." (pg. 85). I've seen many shops that effectively forbid pair programming, so this at least is a way to partially salvage a bad situation.
"... until recently, one could provide for one's family with minimal formal education or training." (pg. 146)
"... relegating learning activities to your 'free time' is a recipe for failure." (pg. 154)
"... documenting is more important than documentation." (pg. 179). The act of documenting forces you to think through things, where design costs upfront are much cheaper than implementation costs later.
"... we learn better by discovery, not instruction." (pg. 194).
"It's not that we're out of time; we're out of attention." (pg. 211)

Perhaps the best effect from reading this kind of book is that it makes you more aware, such that your subconscious mind is constantly thinking about learning.