Thursday, January 7, 2010

Coding is just the tip of the iceberg

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/coding_is_just_the_tip_of_the_iceberg.htm]

I love coding. The more I do software engineering, the more I realize that coding is just the tip of the iceberg. Consider tasks besides coding that are required for a successful project:

  • Identifying a business problem such that business sponsors are willing to pay for the product
  • Recruiting the team to build the project
  • Providing the team the tools to develop the app (hardware & software)
  • Collecting business requirements
  • Coordinating with business partners, such as those providing data that the product will use
  • Designing a functional spec
  • Creating the architectural and technical designs
  • Deciding on build vs. acquire (buy, open-source)
  • Outsourcing parts of the project
  • Managing the project
  • Procuring the physical infrastructure that the app is deployed on
  • QA testing the app (functional, integration, user-acceptance, performance, etc...)
  • Deploying the app
  • Writing training manuals for the app
  • Training support staff and users
  • Marketing the app such that people actually use it
  • Supporting the app

From start to finish, actually coding for a project may be only 5% - 10% of the total effort. That means there's a huge non-coding portion of the project, and that portion can often be used to work around difficult coding tasks.

For example, say there is a component that is just difficult to program (it's complex, it's big, it's outside your expertise, etc.). You might get around coding it yourself by:

  • Buying it or using open source (example: use an open-source tool or class library from CodePlex instead of writing it yourself)
  • Training the internal end users to work around the issue ("we know the website has a bug, but just don't click the browser back button")
  • Using project management to get it punted or moved out of scope
  • Convincing the business sponsor that the feature is not needed ("we don't need to invest all that time making a dancing paper clip assistant")
  • Throwing hardware at it (for example, upgrading the hardware for better performance)

The stars who keep delivering successful projects are familiar with this, and they constantly mitigate challenges in one task by giving up something that doesn't matter in another.

Sometimes you can solve hard coding problems by just sheer skill and coding right through it. But it's good to be aware of other techniques to work around the problem altogether.

Sunday, December 27, 2009

Estimating database table sizes using SP_SpaceUsed

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/estimating_database_table_sizes_using_sp_spaceused.htm]

One of Steve McConnell's tips from his great book on estimating (Software Estimation: Demystifying the Black Art) is that you should not estimate that which you can easily count. Estimating database table sizes is a great example of this. Sure, on one hand disk space is relatively cheap; on the other hand, you want at least a ballpark estimate of how much space your app will need - will the database size explode and no longer fit on your existing SAN?

Here's a general strategy to estimate table size:

1. Determine the general schema for the table

Note the column datatypes that could be huge (like varchar(2000) for notes, or xml, or blob)

2. Find out how many rows you expect the table to contain

Is the table extending an existing table, and therefore proportional to it? For example, do you have an existing "Employee" table with 100,000 records, and you're creating a new "Employee_Reviews" table where each employee has 2-3 reviews (and hence you're expecting 200,000 - 300,000 records)? If the table is completely new, then perhaps you can guess the rowcount based on expectations from the business sponsors.

If the table has only a few rows (perhaps less than 10,000 - but this depends), the size is probably negligible, and you don't need to worry about it.

3. Write a SQL script that creates and populates the table.

You can easily write a SQL script to create a new table (and add its appropriate indexes), and then use a WHILE loop to insert 100,000 rows. This can be done on a local instance of SQL Server. Note that you're not inserting the total number of rows you estimated - i.e. if you estimated that the table will contain 10M rows, you don't need to insert 10M rows - rather, you want a "unit size", which you can then multiply by however many rows you expect. (Indeed, you don't want to wait for 10M rows to be inserted, and your test machine may not even have enough space for that much test data.)

For variable data (like strings), use average-sized data. For nullable columns, populate them based on how likely you think they'll be used, but err on the side of more space.
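
For instance, here's a sketch of an average-sized insert, assuming the table had a hypothetical Notes varchar(2000) column that averages around 500 characters (both the column and the average are assumptions for illustration):

--Hypothetical insert: the Notes column and the 500-character
--average are assumptions for illustration
insert into TableTest1 (phone, Notes, SomeDate, LastModDate)
select 6301112222, REPLICATE('x', 500), getDate(), getDate()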

Obviously, save your script for later.

4. Run SP_SPACEUSED

SP_SpaceUsed displays how much space a table is using. It shows results for both the data and the indexes (never forget the index space).

You can run it as simply as:

exec SP_SPACEUSED 'TableTest1'

Now you can get a unit size per row. For example, if the table has 3000KB of data and 1500KB of indexes, and you inserted 100K rows, then the average size per row is: (3000KB + 1500KB) / 100,000. Then, multiply that by however many rows you expect.
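
To make the arithmetic concrete, here's a sketch using those sample numbers (all values are illustrative) - 4500KB over 100,000 rows is 0.045KB per row, so 10M rows comes to roughly 440MB:

--Worked example with the sample numbers above (all values illustrative)
declare @data_kb float
declare @index_kb float
declare @rows_inserted int
declare @expected_rows int

select @data_kb = 3000, @index_kb = 1500
select @rows_inserted = 100000, @expected_rows = 10000000

--(3000 + 1500) / 100,000 = 0.045 KB per row
declare @kb_per_row float
select @kb_per_row = (@data_kb + @index_kb) / @rows_inserted

--0.045 KB * 10M rows = 450,000 KB, or roughly 440 MB
select @kb_per_row * @expected_rows / 1024 as estimated_size_mb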

This may seem like a lot of work, and there are certainly ways to predict the size theoretically by plugging numbers into a formula. My concern is that it's too easy for devs to miscalculate the formula (like forgetting the indexes, not accounting for the initial table schema itself, or just missing all the extra steps).

5. Estimate the expected growth

Knowing the initial size is great, but you also must be prepared for growth. We can make educated guesses based on the driving factors of the table size (maybe new customers, a vendor data feed, or user activity), and we can then estimate the size based on historical data or the business's expectations. For example, if the table is based on new customers, and the sales team expects 10% growth, then prepare for 10% growth. Or if the table is based on a vendor data feed, and historically the feed has 13% new records every year, then prepare for 13% growth.

Depending on your company's SAN and DBA strategy, be prepared to have your initial estimate at least include enough space for the first year of growth.
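
As a sketch of the growth math (using the 13% vendor-feed figure above and the rough 440MB starting size from step 4 - both illustrative):

--Hypothetical growth projection: 13% annual growth over 3 years
declare @current_size_mb float
select @current_size_mb = 440

--440 MB * 1.13^3 = roughly 635 MB
select @current_size_mb * POWER(1.13, 3) as size_mb_in_3_years

--With an assumed 1.5x safety factor (see step 6), roughly 950 MB
select @current_size_mb * POWER(1.13, 3) * 1.5 as padded_size_mb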

6. Add a safety factor

There will be new columns, new lookup and helper tables, a burst of additional rows, maybe an extra index - something that increases the size. So, always add a safety factor.

7. Prepare for an archival strategy

Some data sources (such as verbose log records) are prone to become huge. Therefore, always have a plan for archival - even if the plan is that you can't archive (for example, it's a transactional table and the business requires regular transactions on historical data). However, sometimes you get lucky; perhaps the business requirements say that, based on the type of data, you only legally need to carry four years' worth of data. Or perhaps after the first two years, the data can be archived in a data warehouse, and then you don't worry about it anymore (this just passes the problem to someone else).

Summary

Here's a sample T-SQL script to create the table and index, insert data, and then call SP_SpaceUsed:

USE [MyTest]
GO

if exists (select 1 from sys.indexes where [name] = 'IX_TableTest1' and [object_id] = object_id('TableTest1'))
    drop index IX_TableTest1 on TableTest1

if exists (select 1 from sys.tables where [name] = 'TableTest1')
    drop table TableTest1

--=========================================
--Custom SQL table
CREATE TABLE [dbo].[TableTest1](
    [SomeId] [int] IDENTITY(100000,1) NOT NULL,
    [phone] [bigint] NOT NULL,
    [SomeDate] [datetime] NOT NULL,
    [LastModDate] [datetime] NOT NULL
) ON [PRIMARY]

--Index
CREATE UNIQUE NONCLUSTERED INDEX [IX_TableTest1] ON [TableTest1]
(
    [SomeId] ASC,
    [phone] ASC
) ON [PRIMARY]
--=========================================


--do inserts

declare @max_rows int
select @max_rows = 100000

declare @i as int
select @i = 1

WHILE (@i <= @max_rows)
BEGIN
    --=============
    --Custom SQL Insert (note: use identity value for uniqueness)
    insert into TableTest1 (phone, SomeDate, LastModDate)
    select 6301112222, getDate(), getDate()
    --=============

    select @i = @i + 1

END

--Get sizes
exec SP_SPACEUSED 'TableTest1'

 

Sunday, December 13, 2009

How not to estimate

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/how_not_to_estimate.htm]

Being responsible for the end-to-end solution really makes me think about how to estimate. The flipside is that it also makes me think about how not to estimate:
  1. Wild guess - sometimes this seems like your only option, but most of the time there are ways to improve on the guess (base it on similar projects, or split it into components and estimate those individually).
  2. What you think the boss wants to hear - This may seem like the easy way to initially win favor with the boss, but it will come back with a vengeance when the estimate drastically deviates from reality. Also, because the boss always wants to hear lower schedule times, this has a huge bias that will send you off course.
  3. Base it on unrelated projects - Historical data is great, but don't compare apples to oranges. That a WinForm app took 3 months tells you almost nothing about how long an ASP.Net app will take.
  4. Pick an arbitrary big number - During crunch time, it's easy to think that everything will magically be better "next week" or "next month" ("that will give us enough time to fix everything"), but then that time rolls around and the project is still behind schedule.

All of these are bad estimation methods because they miss the fundamental point - how long will the project really take to build in the real world? Wild guesses or the boss's wishes are not necessarily grounded in reality, so basing estimates on them is barking up the wrong tree.

I realize it's easy to say "how not to do something". I'd recommend Steve McConnell's book, Software Estimation: Demystifying the Black Art, for how to do a great job of estimating.

 

Friday, November 27, 2009

BOOK: 97 things every software architect should know

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/book_97_things_every_software_architect_should_know.htm]

A few months back I finished the book 97 Things Every Software Architect Should Know. I've been slow on blogging, so I'm only now getting around to my standard follow-up post for books I read.

The book consists of 97 short essays, by various accomplished architects, each with a quick insight. It was a casual and fun read, the kind of book that's easy to sneak in a few pages of between changing the kids' diapers and fixing the house. It's got a lot of good points. I especially recall an essay about how "the database is your fortress" - GUI and middle-tier apps change, but you will always have your database.

For hard-core architecture, I thought that Microsoft .NET: Architecting Applications for the Enterprise and Patterns of Enterprise Application Architecture were much more systematic and thorough. But overall, it was still a fun read.

Thursday, October 8, 2009

Can you still be technical if you don't code?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/can_you_still_be_technical_if_you_dont_code.htm]

Can you still be technical if you don't code? A lot of developers have a passion for the technology, do a great job in their current role of implementing solutions (which requires coding), and then get "promoted" into some "big picture" role that no longer does implementation - ironically the thing they did so well at. These higher-level roles often do lots of non-trivial things, but not actual coding. For example:

  • Infrastructure (Servers, SAN space, database access, network access)
  • Design decisions
  • Dealing with legacy code
  • Handling outsourcing, insourcing, consultants
  • Build-vs-buy
  • Vendor evaluations/scorecards; integrating the vendor's product into your own
  • Coordinating large-scale integration of many apps from different environments
  • Coordination among multiple product life cycles
  • Writing guidance docs
  • Code reviews
  • Occasional prototypes
  • Configuration

On one hand, these types of tasks require technical knowledge in that you wouldn't expect a non-technical person to perform them. On the other hand, they don't seem in the same category as hands-on coding.

What do you think - can you be technical (or remain technical) without actually writing code?

Monday, October 5, 2009

Reasons to NOT put version history in the comments

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/reasons_to_not_put_version_history_in_the_comments.htm]

Some coding standards ask that developers add revision history to the top of the method. Not just the normal "Summary" and "Parameter" tags that can be used to automatically create documentation, but rather a full-blown revision log with developer name, date, and comment. This was bigger in the 80's and 90's, when source control and refactoring weren't as common. These days, however, putting revision history in your comments has some really big problems:
  • Extra effort - It requires extra effort from developers, and usually the tedious kind of effort. It requires manual developer discipline, so architects don't have a good way to enforce this.
  • Bad for refactoring - It discourages refactoring of methods. Say you split a method in two - how do you split up the commented version history?
  • Source control already provides this - It doesn't tell you anything that source control history won't already give you. But even worse, it could be misleading - source control is the true authority, and a developer could accidentally type the wrong comments (or forget them entirely). Also, by documenting at the top of the method, it is hard to indicate what changed in the middle (whereas a source control diff would instantly tell you).

Perhaps for SQL, I can see the benefit, so that when the DBA runs sp_helptext on a stored proc, they get a quick history (and databases aren't usually refactored like C# code). However, for middle tier code, putting revision history in comments seems like an unwise use of time.
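
For illustration, here's the kind of revision-history header in question, on a hypothetical stored proc (all names, dates, and comments are invented):

create procedure dbo.GetEmployeeReviews
as
/**************************************************
** Revision History:
** 2009-08-14  JSmith  Created.
** 2009-09-02  ADoe    Added status filter.
** 2009-11-20  JSmith  Fixed date-range bug.
**************************************************/
begin
    select EmployeeId, ReviewDate, Score
    from dbo.Employee_Reviews
end

When the DBA runs sp_helptext on the proc, this header comes back with the definition - which is exactly the quick-history benefit for SQL mentioned above.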

LINK: Comments as version control
 

Thursday, October 1, 2009

What makes something Enterprise?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/what_makes_something_enterprise.htm]

There's a world of difference between a prototype hammered out over a weekend, and an enterprise app ready for the harsh world of production. Here's a somewhat random brainstorm. In general (there's always an exception), Enterprise apps:

  • Are scalable - they handle large loads and can be called many times.
  • Have a retry strategy - for example, trying to ping an external service three times before "failing" (see the sketch after this list).
  • Have a failover strategy, like an active-passive machine cluster for maximum uptime, and a disaster recovery site.
  • Send notifications.
  • Handle invalid data (like states, zip codes, and numbers).
  • Can integrate with other systems (perhaps providing web service wrappers, or command line APIs, or publicly accessible data repositories that other apps can modify).
  • Are deployable - "it works on my machine" absolutely does not cut it.
  • Have logging - this is especially useful for debugging in production, or for measuring how many errors (and which types) are thrown.
  • Have long-running processes (hours, days, or even weeks) - not just a single thread in memory.
  • Have async processes, which usually means concurrency and threading problems.
  • Support multiple instances of the app running - you can open two copies of Word, or run two MSBuild scripts at the same time.
  • Handle product versioning.
  • Care about the hardware it's running on (enough CPU and memory).
  • Have a pluggable architecture - you may need to switch data providers (Oracle/SQL/Xml).
  • Have external data sources (web services, ftp file dumps, external databases).
  • Can scale out, such as adding more web servers, or splitting the database across multiple servers.
  • Have security constraints (both against hacking, and functional).
  • Have processes that are documented (not just for training, but also for legal auditing and compliance issues).
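
As a minimal sketch of the retry idea in T-SQL (the table name and the choice of three attempts are just for illustration), here's a loop that retries a failure-prone update up to three times:

--Minimal retry sketch: attempt a failure-prone update up to 3 times
--(dbo.SomeTable and the row filter are hypothetical)
declare @attempts_left int
select @attempts_left = 3

while (@attempts_left > 0)
begin
    begin try
        update dbo.SomeTable set LastModDate = getDate() where SomeId = 100000
        break   --success; stop retrying
    end try
    begin catch
        select @attempts_left = @attempts_left - 1
        if (@attempts_left = 0)
            raiserror('Update failed after 3 attempts', 16, 1)
    end catch
end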

Much of this code isn't the fun, "glamorous" stuff. However, it's this kind of robustness that separates the "toys" from the enterprise workhorses.

See also: Enterprise Data Access, Enterprise Caching