timstall: coding


Wednesday, July 20, 2011

Ghosts and Time Bombs

Having bugs that are reproducible on your local machine is a luxury. Enterprise production apps are often void of such luxuries. Indeed, often the reason the bug gets past developers, code reviews, QA, UAT, regression, and every other quality control measure is because it is not acting in an obviously deterministic way. Two common types of such bugs are "Time Bombs" and "Ghosts".

A time bomb works perfectly, only to explode at some point in the future because it relies on an external dependency that eventually changes. These are usually deterministic, and can be reproduced if you know what you're looking for, but it's very hard to trace the exact cause. The temptation with the time bomb is that it's working perfectly right now, and everyone is always so busy, so they move on.

Examples of time bombs are:

· Dependency on the clock – Y2K was the most famous case. Other examples include code that doesn't account for the new year (say it sorts by month alone, and doesn't realize that January 2012 comes after December 2011), or storing total milliseconds in an Int32 (which overflows after about 25 days).

· Growing data – Say your system logs to a database table, and it works perfectly on local and even QA tests where the database is constantly refreshed. But then in 6 months (after the developers have rolled off the project and no one even knows about the log) the log table becomes so bloated that performance slows to a crawl and all the connections time out.

· Memory leak – Similar to the growing data.

· Service contract changes or expires – In today's interconnected systems, it is common to have external data dependencies from third parties. What if a business owner or manager forgets to renew these contracts, or the schema of the contract changes, and hence the service constantly fails? Even worse – say you shell out to a third-party tool (with System.Diagnostics, and hide the window so there's no popup) that shows a visual EULA after such an expiration, and all you see is that the process appears frozen because it's waiting for the (hidden) EULA?

· Expiring Cache – What if you store critical data in the cache on startup, but that data eventually expires without any renewal policy and the app crashes without it?

· Rare events with big impact – What if there's an annual refresh of an external data table? I've seen apps that work perfectly in prod for 8 months, processing some external file, and then unexpectedly explode because they're given an "annual refresh" file that is either too big, or has a different schema.
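To make the clock example concrete, here is a minimal sketch of the Int32 milliseconds overflow. The class and method names are hypothetical, invented only for this illustration. It shows that an Int32 can hold a little under 25 days of milliseconds, and that a checked conversion fails loudly where an unchecked one would quietly produce garbage.

```csharp
using System;

public static class MillisecondOverflowDemo
{
    // Longest duration, in days, that fits in an Int32 count of milliseconds.
    public static double MaxDaysInInt32Milliseconds()
    {
        return TimeSpan.FromMilliseconds((double)int.MaxValue).TotalDays;
    }

    // Hypothetical helper: converts a duration to whole milliseconds and
    // throws OverflowException instead of silently returning a bad value.
    public static int ToInt32Milliseconds(TimeSpan duration)
    {
        return checked((int)duration.TotalMilliseconds);
    }

    public static void Main()
    {
        // A little under 25 days.
        Console.WriteLine(MaxDaysInInt32Milliseconds());

        // 20 days still fits.
        Console.WriteLine(ToInt32Milliseconds(TimeSpan.FromDays(20)));

        try
        {
            ToInt32Milliseconds(TimeSpan.FromDays(30));
        }
        catch (OverflowException)
        {
            Console.WriteLine("30 days of milliseconds does not fit in an Int32");
        }
    }
}
```

A test that only runs for a few minutes would never hit this, which is what makes it a time bomb.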

General ways to test for time bombs:

· Leave the app running for days on end.

· Forcibly kill the cache in the middle of running – will it recover?

· Do load testing on the database tables.

· Make sure you have archive and cleanup routines.

· Set the system clock to various values.

· Test for annual events.

Ghosts are bugs that work perfectly in every environment that you can control, but randomly show up in environments you can't. You can't reproduce them, and you don't have access to the environment, so it's ugly. Ghosts are tempting to ignore because they seem to go away. The problem is that if you can't control the ghost, then you can't control your own application, and that looks really bad to senior management. Examples of ghosts include:

· Concurrency, threading, and deadlocks – Because 99% of devs test their code as a single user stepping through the debugger, they'll almost never see concurrency issues.

· Environmental issues – For unknown reasons, the network has hiccups (limited bandwidth, so sometimes your app gets kicked out), or the database occasionally runs significantly slower, causing your performance-tuned application to time out.

· Another process overwrites your data – Enterprise apps are not a closed system – there could be other services, database triggers, or batch jobs randomly interfering with your data.

· Hardware failures – What if the network is temporarily down, or the load balancer has a routing error (it was manually configured wrong during the last deploy?), or a disk is corrupt?

· Different OS or Windows updates – Sometimes devs create (and debug) on one OS version, but the app actually runs on another. This is especially common with client apps, where you could create the app on Windows 7 Professional, but it runs on Windows Vista. Throw in service packs and even Windows updates, and there can be a lot of subtle differences.

· Load balancing – What if you have a web farm with 10 servers, and 9 work perfectly, but the last one is broken (or deployed to incorrectly)? The app appears to work perfectly 90% of the time. Realistically, say it's a compound issue where that one server only fails 10% of the time; then your app appears to work 99% of the time.

· Tedious logic with too many inputs – A complex HR application could have hundreds of test cases that work perfectly, but say everyone missed the obscure case that only occurs when a twice-terminated user logs in and tries to change their email address.

General ways to test for ghosts:

· Increase load and force concurrency (You can easily use a homegrown threading tool to make many web service or database calls at once, forcing a concurrency test).

· Simulate hardware failures – unplug a network cable or temporarily turn off IIS in your QA environment. Does the app recover?

· Allow developers some means for QA and Prod debug access – if you can finally reproduce that bug in prod (and nowhere else), the cheapest solution is to allow devs some means to troubleshoot it there. Perhaps they need to sit with a support specialist to use their security access, but find a way.

· Have tracers and profilers on all servers, especially web and database servers.

· Have a diagnostic check for your own app. How do you know your app is healthy? Perhaps a tool that pings every webservice (on each machine in the webfarm), or ensures each stored proc is correctly installed?
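As a rough idea of what a "homegrown threading tool" for forcing concurrency could look like, here is a minimal sketch. ConcurrencyHammer and its Run method are hypothetical names, not an existing library. It releases all workers at the same moment and counts how many calls threw. In a real test the action would make the web service or database call; here it increments two counters to show lost updates.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencyHammer
{
    // Runs 'action' on 'workers' threads at roughly the same moment and
    // returns how many invocations threw an exception.
    public static int Run(int workers, Action<int> action)
    {
        int failures = 0;
        using (var startGate = new ManualResetEventSlim(false))
        {
            var tasks = new Task[workers];
            for (int i = 0; i < workers; i++)
            {
                int id = i; // copy so each lambda captures its own value
                tasks[i] = Task.Factory.StartNew(() =>
                {
                    startGate.Wait(); // hold every worker until all are created
                    try { action(id); }
                    catch (Exception) { Interlocked.Increment(ref failures); }
                }, TaskCreationOptions.LongRunning);
            }
            startGate.Set();     // release all workers together
            Task.WaitAll(tasks); // wait before the gate is disposed
        }
        return failures;
    }

    public static void Main()
    {
        int unsafeCounter = 0;
        int safeCounter = 0;
        Run(8, id =>
        {
            for (int n = 0; n < 100000; n++)
            {
                unsafeCounter++;                        // not atomic: updates can be lost
                Interlocked.Increment(ref safeCounter); // atomic
            }
        });
        // safeCounter is always 800000; unsafeCounter is often lower.
        Console.WriteLine("safe: {0}, unsafe: {1}", safeCounter, unsafeCounter);
    }
}
```

The start gate matters: without it, the first workers may finish before the last ones start, and the test ends up running mostly sequentially.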

Monday, June 27, 2011

Linq: Creating new objects from selects and joins

I like Linq more every time I use it. I've posted about XLinq and using Linq to sort and filter lists. You can also use Linq to join two collections and select properties from each to create a new collection of objects (somewhat like SQL), run ForEach clauses, and do simple functions like Distinct, Sum, and Count.

Here's a code sample (I prefer to do minimalist samples with a unit test syntax for easy demos):

    using System;
    using System.Text;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    namespace LinqDemo
    {
      /// <summary>
      /// Summary description for UnitTest1
      /// </summary>
      [TestClass]
      public class UnitTest1
      {
        private Employee[] GetEmployees()
        {
          return new Employee[]
          {
            new Employee(){ FirstName = "Homer", LastName = "Simpson", FavoriteNumber = 7, DeptId = 1},
            new Employee(){ FirstName = "Marge", LastName = "Simpson", FavoriteNumber = 18, DeptId = 0},
            new Employee(){ FirstName = "Bart", LastName = "Simpson", FavoriteNumber = 99, DeptId = 0},
            new Employee(){ FirstName = "Monty", LastName = "Burns", FavoriteNumber = 23, DeptId = 9},
            new Employee(){ FirstName = "Ned", LastName = "Flanders", FavoriteNumber = 5, DeptId = 0}
          };
        }

        private Department[] GetDepartments()
        {
          return new Department[]
          {
            new Department(){ DeptId = 1, DeptName = "Safety Operator"},
            new Department(){ DeptId = 2, DeptName = "Customer Service"},
            new Department(){ DeptId = 9, DeptName = "Executive"},
          };
        }

        [TestMethod]
        public void SelectProperties()
        {
          //create data
          Employee[] emps = GetEmployees();

          //Use linq to get a distinct list from some property
          List<string> lastNames = emps
            .Where(n => n.FavoriteNumber > 10) //Some filter
            .OrderBy(n => n.LastName)
            .Select(n => n.LastName) //Select specific fields
            .Distinct() //Get only distinct elements
            .ToList();

          Assert.AreEqual("Burns", lastNames[0]);
          Assert.AreEqual("Simpson", lastNames[1]);
        }

        [TestMethod]
        public void JoinAndCreateAnotherObject()
        {
          //Join "Employee" and "Department" to create "Worker"
          Employee[] emps = GetEmployees();
          Department[] depts = GetDepartments();

          //This could be useful if emps and depts came from
          // different sources, or depts was cached
          Worker[] workers =
            (
              from emp in emps
              from dept in depts
              where emp.DeptId == dept.DeptId
                && emp.DeptId > 0
              select new Worker()
              {
                FirstName = emp.FirstName,
                DeptName = dept.DeptName
              }
            ).ToArray();

          Assert.AreEqual("Homer", workers[0].FirstName);
          Assert.AreEqual("Safety Operator", workers[0].DeptName);
        }

        [TestMethod]
        public void ForEach()
        {
          //Use a single line to update a property
          List<Employee> emps = GetEmployees().ToList();
          Assert.AreEqual(7, emps[0].FavoriteNumber);

          //Double everyone's favorite number
          //Easier than writing a for-each loop
          emps.ForEach(n => n.FavoriteNumber = n.FavoriteNumber * 2);

          Assert.AreEqual(14, emps[0].FavoriteNumber);
        }

        [TestMethod]
        public void Do_Aggregates()
        {
          //Get the sum of all numbers where the number is already > 10.
          List<Employee> emps = GetEmployees().ToList();

          int intSum = emps
            .Where(n => n.FavoriteNumber > 10)
            .Sum(n => n.FavoriteNumber);

          Assert.AreEqual(140, intSum); //18 + 99 + 23
        }

        public class Employee
        {
          public int FavoriteNumber { get; set; }
          public int DeptId { get; set; }
          public string FirstName { get; set; }
          public string LastName { get; set; }

          public override string ToString()
          {
            return string.Format("{0} {1} - {2}", this.FirstName, this.LastName, this.FavoriteNumber);
          }
        }

        public class Department
        {
          public int DeptId { get; set; }
          public string DeptName { get; set; }
        }

        public class Worker
        {
          public string FirstName { get; set; }
          public string DeptName { get; set; }
        }
      }
    }

Thursday, March 25, 2010

LCNUG - Visual Programming Language with Robotics Studio

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/lcnug__visual_programming_language_with_robotics_studio.htm]

Last night, Lance Larson, president of the Madison .NET User Group, presented at LCNUG on Visual Programming Language (VPL) with Robotics Studio. It was an active presentation - he even had robots moving around the room!

Two things that really got my attention:

VPL applies to more than just robots. Many businesses continually hit the problem "how can I have a non-technical person still get technical things done?" For example, they'd like a business analyst to program a workflow or rules engine without needing to actually code. Many workflow-related products provide some kind of drag & drop interface (like making a flowchart in Visio) to effectively write a program, but such an interface is difficult to build. It gets especially complicated when you have variables, conditions, looping, etc... Microsoft's VPL is powerful, and I wonder if it will be reused for their workflow products too.
Being surrounded by software, it's refreshing to see the hardware part of engineering - like physical robots that follow programming instructions.

Thanks Lance!

Tuesday, March 2, 2010

Visual Studio 2008 hanging

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/visual_studio_2008_hanging.htm]

There are a lot of reasons that Visual Studio hangs. It was hanging for me the other day when I tried to open a solution, even though I had a clean checkout. One solution that solved my current problem (thanks to a co-worker):

Close VS, delete the *.suo file, and try to re-open. I'm sure there's a reason why.

2 seconds later, VS was up and running.

Sunday, January 24, 2010

Is this code broken?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/is_this_code_broken.htm]

What constitutes broken code? Everyone agrees that code that crashes in production and threatens to have developers fired is indeed broken. But where's the line? Is the following code broken:

The code logs incorrect data in an error log file?
The code displays incorrect values in a label (like a mis-formatted date)?
If a certain rare case occurs (like a button is pressed at exactly 12:00AM, or an uploaded file size is exactly 1.00 MB), then the code crashes?
The code has misleading names for variables and methods. For example, it has a method "IsNumber" that checks only for integers, or "IsLetter" that allows special characters. Say the current program only calls these methods with correct data, so that the app never crashes?

In all these cases, say the application essentially "works" and handles the main use cases.

The problem is maintenance. Maintaining and extending code is an expensive part of the total cost of ownership. You could spend hours tracking down a single erroneous line of code. Code that is low quality (tons of copy & paste, misnamed methods, misleading logic, etc...) is going to cost a fortune to maintain. On the other hand, certain functional errors that have zero impact on the business can perhaps be documented as "known issues" (e.g. the month is formatted in a label with a preceding zero, like "03" instead of just "3").

I'd say it comes down to the business, and the code is broken if it costs the business more than it should - whether it be via maintenance costs or functional errors that hinder the end users. Perhaps the question isn't as much "is this code broken", but "how can we maximize the business-value of this code?"

Thursday, October 8, 2009

Can you still be technical if you don't code?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/can_you_still_be_technical_if_you_dont_code.htm]

Can you still be technical if you don't code? A lot of developers have a passion for the technology, do a great job in their current role of implementing solutions (which requires coding), and then get "promoted" into some "big picture" role that no longer does implementation - ironically the thing they did so well at. These higher-level roles often do lots of non-trivial things, but not actual coding. For example:

Infrastructure (Servers, SAN space, database access, network access)
Design decisions
Dealing with legacy code
Handling outsourcing, insourcing, consultants
Build-vs-buy
Vendor evaluations/score card; integrate the vendor's product into your own
Coordinate large-scale integration of many apps from different environments
Coordination among multiple product life cycles
Writing guidance docs
Code reviews
Occasional prototypes
Configuration

On one hand, these types of tasks require technical knowledge in that you wouldn't expect a non-technical person to perform them. On the other hand, they don't seem in the same category as hands-on coding.

What do you think - can you be technical (or remain technical) without actually writing code?

Tuesday, September 22, 2009

ConnectionTimeout vs. CommandTimeout

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/connectiontimeout_vs_commandtimeout.htm]

SQL timeouts can be very annoying, especially for internal development tools where performance isn't critical.

A lot of developers will try fixing this by modifying the connection string by appending "Connect Timeout=300". Normally this is easy because the connection string is stored in some config file.

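However, it still usually fails. This is because there's a big difference between SqlConnection.ConnectionTimeout and SqlCommand.CommandTimeout. ConnectionTimeout limits how long to wait while opening the connection (15 seconds by default), whereas CommandTimeout limits how long a command may run (30 seconds by default). Changing the connection string only affects the first.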

If you're running a command, like a snippet of SQL or a stored proc, then your code needs to set the CommandTimeout. Something like so:

//needs: using System.Data; using System.Data.SqlClient;
SqlConnection con = new SqlConnection(strDbCon);
SqlCommand cmd = con.CreateCommand();
cmd.CommandType = CommandType.Text;
cmd.CommandText = strText;
cmd.CommandTimeout = 10000; //in seconds (0 = no limit); no relation to con.ConnectionTimeout

Obviously, that's very minimalist code, but that's the general idea. For a robust data access layer, you'd make the timeout configurable.
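One way to make the timeout configurable is to read it from a setting and fall back to a default when the setting is missing or invalid. This is only a sketch: TimeoutSettings and ParseCommandTimeout are hypothetical names, and the raw string is assumed to come from wherever your app keeps its settings (for example, an appSettings entry under a key of your choosing).

```csharp
using System;

public static class TimeoutSettings
{
    // 30 seconds is the default for SqlCommand.CommandTimeout.
    public const int DefaultCommandTimeoutSeconds = 30;

    // Returns the configured timeout in seconds, or the default when the
    // setting is missing, not a number, or negative. Zero is allowed and
    // means "no limit" for SqlCommand.CommandTimeout.
    public static int ParseCommandTimeout(string rawSetting)
    {
        int seconds;
        if (int.TryParse(rawSetting, out seconds) && seconds >= 0)
        {
            return seconds;
        }
        return DefaultCommandTimeoutSeconds;
    }

    public static void Main()
    {
        Console.WriteLine(ParseCommandTimeout("300")); // 300
        Console.WriteLine(ParseCommandTimeout(null));  // 30 (setting missing)
        Console.WriteLine(ParseCommandTimeout("abc")); // 30 (not a number)
        Console.WriteLine(ParseCommandTimeout("-5"));  // 30 (negative)
    }
}
```

The data access layer would then assign the result to cmd.CommandTimeout instead of a hard-coded value.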

Monday, April 27, 2009

BOOK: Patterns of Enterprise Application Architecture

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/book_patterns_of_enterprise_application_architecture.htm]

I remember when Martin Fowler's Patterns of Enterprise Application Architecture came out back in 2002. I constantly heard good things, but never got around to reading it. Finally, I buckled down and went through it, and I'm glad I did.

Perhaps the biggest thing I liked about the book was the common vocabulary. Whenever I look at popular community projects (such as Enterprise Library, CSLA, or .NetTiers), or just read star bloggers, I keep hearing about all these pattern names (ActiveRecord, Gateway, Lazy Load, Repository, Registry, Service Layer, etc...). While you gradually pick them up, it's convenient to just inject them into your brain all at once.

I also thought his chapters on concurrency were excellent, especially how he explains the difference between an optimistic lock and pessimistic lock. (My simplified interpretation is that an optimistic lock is "optimistic" in that it assumes conflicts are either rare, or easy to resolve, and hence checks for conflicts at the end of a transaction. On the other hand, a pessimistic lock is "pessimistic" in that it assumes conflicts are either common, or very expensive to resolve, and hence prevents the conflict from ever occurring by locking at the beginning of a transaction).
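To illustrate my simplified interpretation (this is my own sketch, not code from the book), here is an optimistic check using a version number. VersionedRecord and TryUpdate are hypothetical names. Each reader remembers the version it saw, and an update only succeeds if the version is still the same at write time.

```csharp
using System;

public class VersionedRecord
{
    private readonly object _sync = new object();

    public string Value { get; private set; }
    public int Version { get; private set; }

    public VersionedRecord(string value)
    {
        Value = value;
        Version = 1;
    }

    // Optimistic update: succeeds only if nobody has changed the record
    // since the caller read 'expectedVersion'.
    public bool TryUpdate(string newValue, int expectedVersion)
    {
        lock (_sync) // held only briefly, to make check-and-write atomic
        {
            if (Version != expectedVersion)
            {
                return false; // conflict detected at the end, not prevented up front
            }
            Value = newValue;
            Version++;
            return true;
        }
    }

    public static void Main()
    {
        var record = new VersionedRecord("homer@example.com");

        // Two users read the record at the same version.
        int versionSeenByA = record.Version;
        int versionSeenByB = record.Version;

        Console.WriteLine(record.TryUpdate("a@example.com", versionSeenByA)); // True
        Console.WriteLine(record.TryUpdate("b@example.com", versionSeenByB)); // False: B's read is stale
    }
}
```

A pessimistic version would instead make user B wait (or fail) at read time, before any editing started.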

He's also very "no-holds-barred" about doing things the best way. For example, in the Metadata Mapping pattern he emphasizes using code generation or reflection - two concepts that for some reason many developers seem reluctant to use.

Lastly, reading a solid book like this just helps you think more about enterprise patterns as you go through your daily coding, and that's a valuable thing.

Sunday, April 26, 2009

Structs vs. Classes - seeing the functional difference

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/structs_vs_classes__seeing_the_functional_difference.htm]

There's a lot written about the difference between classes and structs, and comparing the two.

One thing I find helpful to really "get it" is to whip up a quick unit test, and functionally see the differences. While classes (being reference types) and structs (being value types) will have different memory and performance implications, it seems most devs are initially concerned with how they're functionally different, and they worry about performance "later".

Here's an example showing both a struct and a class object being passed into some method and having their property updated. The struct, being a value type, is copied when sent into the method, and hence the property change doesn't "persist" outside of the method (the copy is discarded, the original left untouched). However, the class sends in a reference, and therefore the method is pointing to the same instance as the caller, and hence the update "persists" for the class.

    [TestMethod]
    public void TestMethod1()
    {
      MyStruct s = new MyStruct();
      s.Name = "Struct1";

      MyClass c = new MyClass();
      c.Name = "Class1";

      UpdateStruct(s);
      UpdateClass(c);

      Assert.AreEqual("Struct1", s.Name);
      Assert.AreEqual("newClass", c.Name);
    }

    private void UpdateStruct(MyStruct s)
    {
      s.Name = "newStruct";
    }
    private void UpdateClass(MyClass c)
    {
      c.Name = "newClass";
    }

    public struct MyStruct
    {
      public string Name { get; set; }
    }
    public class MyClass
    {
      public string Name { get; set; }
    }

You could write tests for similar things - like showing how structs don't allow inheritance (which actually wouldn't even compile), but do allow interfaces. You can drill down even further by stepping through in the debugger, or checking Visual Studio's performance wizard.
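As one such test, here is a small sketch of a struct implementing an interface. INamed, NamedStruct, and Rename are hypothetical names for the demo. It also shows a subtlety: passing a struct where an interface is expected boxes a copy, so the original is again left untouched.

```csharp
using System;

public interface INamed
{
    string Name { get; set; }
}

// Structs can implement interfaces...
public struct NamedStruct : INamed
{
    public string Name { get; set; }
}

// ...but cannot inherit from another struct or class. This would not compile:
// public struct DerivedStruct : NamedStruct { }

public static class StructInterfaceDemo
{
    public static void Rename(INamed target)
    {
        target.Name = "renamed";
    }

    public static void Main()
    {
        NamedStruct s = new NamedStruct();
        s.Name = "original";

        Rename(s); // s is boxed here, so the method renames a boxed copy
        Console.WriteLine(s.Name); // prints "original"

        INamed boxed = s; // box once and keep the reference
        Rename(boxed);
        Console.WriteLine(boxed.Name); // prints "renamed"
    }
}
```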

Wednesday, March 4, 2009

Refactoring SQL code with file includes and variables

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/refactoring_sql_code_with_file_includes_and_variables.htm]

Everyone who maintains code knows that duplicated code is bad. While OOP languages like C# (and even xml "languages" like MSBuild) provide ways to refactor, SQL seems lacking in such features. If you run a similarity-analyzer (like Simian), you can probably see large amounts of duplicated code. Two common refactoring techniques to help with this would be:

Dynamic include – Include one snippet within another. For example, we could have a chunk of code that is duplicated in a stored proc, and a table function.
Variables – We can abstract any SQL code (table names, sql commands) to a variable.

Note that mere stored procs or user-defined-functions are insufficient, as they can’t handle all snippets (like declaring variables which are used in the calling function), or they have awful performance in the where clause.

We can use SQLCMD mode commands (“SqlCmds”) to accomplish this. (http://msdn.microsoft.com/en-us/library/aa833281.aspx).

How to enable SqlCmds in SQL Server Management Studio:

http://msdn.microsoft.com/en-us/library/ms174187.aspx

Single query window – “On the Query menu, click SQLCMD Mode.”
For all windows – “To turn SQLCMD scripting on by default, on the Tools menu select Options, expand Query Execution, and SQL Server, click the General page, and then check the By default open new queries in SQLCMD Mode box.”

How to enable Sql Commands in Visual Studio

This requires the database edition of Visual Studio. Click the allow "SqlCmd" button on the tool bar.

Basic variable test

--set variable, and then use it - use the ":setvar" command
:setvar SomeTable TestCodeGen
select * from $(SomeTable)

-- environmental variables too!
select '$(COMPUTERNAME)' --returns my comp name (like 'TimStall2')

This means we could have an external script set the environment variables (like the PrimaryDataBase), and then easily re-use those in the SQL Editor. Note that you can use the free tool SetX.exe to set environment variables.

File Include – Basic (use the “:r” command)

--File 1 (var_def.inc):
:setvar PrimaryDB MyDatabase70

--File 2:
:r C:\Development\var_def.inc
select * from $(PrimaryDB).MySchema.TestCodeGen

For example, we could have a “header” file that includes a bunch of variable definitions (like all the PrimaryDB, ReportDB, etc…), and then include it wherever needed. Or, we could include any other SQL snippet. For example, we could use this to effectively make private functions (instead of only having global functions) that are encapsulated to a single stored proc.

File Include – avoid function in a where clause

--File 1 (myProc_func1.sql):
--some reusable snippet (note how it uses the variable '@myNum')
--wrapped in parentheses so the "or" doesn't mix with the caller's "and"
(co = '1234' or NumberInteger > @myNum)

--File 2:
declare @myNum integer
select @myNum = 10

select * from TestCodeGen
where
:r C:\Development\myProc_func1.sql
and LastChangedBy < GetDate()

Summary

One catch to all of this is that ADO.Net does not understand the SqlCmds, so if you have your own database installation code via ADO.Net, you need to parse and expand them yourself before executing the script. However, that should be easy enough to do given the strict syntax of the SqlCmds.
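As a rough idea of how that parsing could look, here is a minimal C# sketch. SqlCmdPreprocessor is a hypothetical helper, not part of ADO.Net. It handles only ":setvar Name Value", ":r path", and "$(Name)" references, and it assumes each command sits on its own line. It does not handle environment variables, "GO" batch separators, or the other SqlCmds.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class SqlCmdPreprocessor
{
    private static readonly Regex VariableReference = new Regex(@"\$\((\w+)\)");

    // Expands ":setvar" definitions, ":r" includes, and "$(Name)" references.
    // 'readFile' returns the text of an included file; in real use it could
    // wrap System.IO.File.ReadAllText.
    public static string Expand(string script, Func<string, string> readFile,
                                IDictionary<string, string> variables)
    {
        var output = new StringBuilder();
        foreach (string rawLine in script.Split('\n'))
        {
            string line = rawLine.TrimEnd('\r');
            string trimmed = line.Trim();

            if (trimmed.StartsWith(":setvar ", StringComparison.OrdinalIgnoreCase))
            {
                // ":setvar Name Value" -> remember the value, emit nothing
                string[] parts = trimmed.Split(new[] { ' ' }, 3,
                    StringSplitOptions.RemoveEmptyEntries);
                variables[parts[1]] = parts.Length > 2 ? parts[2].Trim('"') : "";
            }
            else if (trimmed.StartsWith(":r ", StringComparison.OrdinalIgnoreCase))
            {
                // ":r path" -> splice in the (recursively expanded) file
                string path = trimmed.Substring(3).Trim();
                output.Append(Expand(readFile(path), readFile, variables));
            }
            else
            {
                // Replace known "$(Name)" references; leave unknown ones as-is
                output.AppendLine(VariableReference.Replace(line, m =>
                {
                    string value;
                    return variables.TryGetValue(m.Groups[1].Value, out value)
                        ? value
                        : m.Value;
                }));
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Stand-in for the file system, so the demo runs anywhere.
        var files = new Dictionary<string, string>
        {
            { @"C:\Development\var_def.inc", ":setvar PrimaryDB MyDatabase70" }
        };
        string script = ":r C:\\Development\\var_def.inc\n"
                      + "select * from $(PrimaryDB).MySchema.TestCodeGen";

        string sql = Expand(script, path => files[path],
                            new Dictionary<string, string>());
        Console.Write(sql); // select * from MyDatabase70.MySchema.TestCodeGen
    }
}
```

The expanded text is what you would then hand to a SqlCommand.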

Note that this is substituted “at compile time”. If you run the SQL Profiler, you won’t see the “:setvar” or “:r”, but rather the content already parsed. These techniques could be used to help refactor SQL code, just like similar techniques help refactor the code in other languages.

Tuesday, February 17, 2009

The difference between Count, Length, and Index

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/the_different_between_count_length_and_index.htm]

When dealing with arrays and collections (like List), there are three integer "things" that can mess people up: Count, Length, and Index.

Count - refers to collections. This simply gets the number of items in the collection.
Length - refers to arrays. To quote the documentation: "Gets a 32-bit integer that represents the total number of elements in all the dimensions of the Array." The key phrase is "all the dimensions". For a 1-d array, Count and Length seem similar. But for multi-dimensional arrays, the difference becomes apparent. A 3x2 array will have a length of 6. Because an array is allocated up front (as opposed to a collection that can grow or shrink), this conceptually makes sense. Length for an array doesn't change after creation; Count for a List does (as you add or remove items).
Index - used by arrays, and some collections (like List), to indicate a specific item in the array or collection. Whereas Count and Length are totals that start counting at 1, an index is 0-based, so the last valid index is Count - 1 (or Length - 1 for a 1-d array).

This code snippet shows these in action:

    [TestMethod]
    public void TestMethod1()
    {
      //Length --> total number of items in the array
      // acts like "Count" for a 1-d array
      string[] astr = new string[]{"a","b","c"};
      Assert.AreEqual(3, astr.Length);

      // but very different for a 2-d array
      string[,] astr2 = new string[2, 3];
      Assert.AreEqual(6, astr2.Length); //Length = 2 * 3

      //Count --> number of items currently in the collection
      List<string> lstr = new List<string>();
      lstr.Add("a");
      lstr.Add("b");
      Assert.AreEqual(2, lstr.Count);

      //Index --> 0-based
      Assert.AreEqual("a", astr[0]);
      Assert.AreEqual("b", lstr[1]);

    }
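
For multi-dimensional arrays, a few related members are worth seeing next to Length. This is a small sketch using the standard Array members Rank, GetLength, and GetUpperBound, plus Linq's Count() extension. The DimensionsDemo class name is just for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DimensionsDemo
{
    public static void Main()
    {
        string[,] grid = new string[2, 3];
        Console.WriteLine(grid.Length);           // 6: elements across all dimensions
        Console.WriteLine(grid.Rank);             // 2: number of dimensions
        Console.WriteLine(grid.GetLength(0));     // 2: size of the first dimension
        Console.WriteLine(grid.GetLength(1));     // 3: size of the second dimension
        Console.WriteLine(grid.GetUpperBound(1)); // 2: last valid index in the second dimension

        string[] letters = { "a", "b", "c" };
        List<string> list = letters.ToList();
        Console.WriteLine(letters.Count());       // 3: Linq's Count() works on arrays too
        Console.WriteLine(list[list.Count - 1]);  // c: last item sits at index Count - 1
    }
}
```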