Sunday, June 14, 2009

The benefits of technical blogging

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/the_benefits_of_technical_blogging.htm]

By some estimates, there are over 50 million blogs. Granted, many of these are barely maintained, and they cover all topics (not just software engineering), but clearly there's something to this whole blogging thing. Keep in mind that the vast majority of these blogs are not profitable - so the authors are writing as hobbyists and not paid professionals. Millions of people - and thousands of software engineers - blog because there are a lot of benefits to it:
  1. Record smaller thoughts to build up for a larger project like an article or even a book.
  2. Instant gratification - Blogs let you publish your writing instantly, which is gratifying (as opposed to the lag time for an article or book)
  3. Ability to share your thoughts with the world - and likewise get feedback on them.
  4. Practice at writing, beyond just the quick email or IM context.
  5. Provides an outlet for all the miscellaneous thoughts and code a developer comes across
  6. Gets you to think deeper about ideas, knowing that you need to at least polish it to the level of a few paragraphs for public consumption. This helps you learn the topic yourself, as one of the best ways to learn is to teach the content to others.
  7. Thought Leadership - some people (especially consultants where this is a resume credential) really like this. Good blog posts also build credibility.
  8. Confidence booster - putting your thoughts out there to potentially millions of people requires courage, and repeatedly doing that (and growing from the feedback) builds confidence.
  9. Write anything on any topic, regardless of what you're currently working on.
  10. It helps others. For example, sometimes I post the most obscure syntax errors I find - and other people were troubled with the exact same issue, and actually find the post useful.
  11. Blogging acts like a personal journal. I've gotten a kick out of seeing how some of my thoughts have evolved over the last 5 years.
  12. Low barrier to entry - because blogs are (effectively) free, you can write small chunks, you can write anything you'd like, and you can automatically publish it, there's a very low barrier to get started.

Should you blog even if you don't get traffic? I think yes - if you enjoy it (such as for any of the reasons mentioned above). Traffic comes and goes, and the weirdest entries get tons of traffic while the other entries (that I expect to be popular) are almost ignored.

Related: Do you have time to blog?

Wednesday, June 10, 2009

Real life: Avoiding customization to build a Sandbox

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/real_life_avoiding_customization_to_build_a_sandbox.htm]

I have three kids - two sons and a daughter. Cute little buggers, so I wanted to make them a sandbox to play in. I figure doing something physical and outdoors, as opposed to watching TV, would be good for them. Plus, I'm all for anything that even remotely encourages them to do engineering. However, I have very, very, minimal wood-working skills. When it comes to woodworking, I am (at best) a hobbyist - by no means an expert. This means I was just trying to build a simple sandbox that works - no fancy wood cutting, things that take big vocabulary to describe, or expensive tools required. I made a crude box-like design, drove to the local home depot, and got help picking out the wood (12x1x6 inch cedar). The only cuts I made where ones that the home depot guy could do in the store - so no triangle, notched, or diagonal cuts. I hauled my precious wood back to my garage (read: not professional tool workshop), applied one coat of polyurthethane something (read: I hope that helps protect against weathering), and hammered the boards together. After digging the box into the ground and filling it with sand, it was good enough and I was done.

Why dump such a story on my technical blog? Because my hobbyist mentality towards a wood sandbox is essentially the same as many "programmers" hobbyist mentality towards the craft of software engineering. We both just want to get it done, make the end users happy, and maybe enjoy it along the way. If we miss out on an optimal technique, that's okay. Working with other people, it can be useful to understand that mindset.

And yes, the kids loved it.

 

Tuesday, June 9, 2009

An easy way to hack unvalidated web input

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/an_easy_way_to_hack_unvalidated_web_input.htm]

Many security bugs get overlooked because hackers use "special" tools that give them options that the original developers anticipate. For example, most web developers only test their application through a browser. While this is sometimes sufficient to check different querystring inputs, it gives one an enormous false sense of security. Consider two cases:

  • A button is disabled, therefore the developer assumes you can't click it (Example: A Save button is disabled because some data isn't valid)

  • Data is stored in a hidden field, therefore the developer assumes that you can't change it. (Example: A record's ID is stored in a hidden field)

To some people, this appears safe because IE doesn't let you click disabled buttons or change the value of hidden fields. However, IE is only used to collect the data from the page and aggregate it into a post that it sends to the webserver. What if the hacker is using a different tool besides IE that gives them more control on how to assemble and send this post (or an extension to IE that lets them modify fields)?

 

How to hack it

 

You could crack both the above cases using Visual Studio Team Test (or I bet Firebug, or the IE8 dev tools). Here's how:

  1. Open up VS Team Test

  2. Create a Test Project, and add a new web test. This opens up a recorder in IE.

  3. Navigate to the page you want to crack, such as something that stores data in a hidden field.

  4. Perform a normal save action, such as saving the page (which uses data from the hidden field)

  5. Stop the recording. Play back the test just to make sure it works in the normal case. Note how for each request, VS provides you a new row and lets you see the exact request and response, as well as what visually appears in the Web Browser.

    1.  

  6. Now start the hacking. Go to the request where you saved the page (that collected data from the hidden field), and unbind that field and change its value to whatever you want. In this case, I simply had a field called "Hidden1", and I changed it's value to "abc". Note that just like IE, the webtest sends a post to the server. But unlike IE, you can easily change the contents of that post.

    1.  

  7. Rerun the webtest. The new, hacked, value is sent to the server.

  8. Note that you can also script VS WebTests with C#. So, you could have a for-loop that re-submits a 1000 requests, varying the post values each time.

So, what can we do?

 

As a developer, we always need to validate and do security checks on the server - merely JS side validation is not enough. For example, knowing that that hidden field could be hacked, always validate that data as if it were a free-form textbox.

 

The problem is that this degree of extra validation (1) costs much more time, and (2) could be a big performance hit - which is especially ironic because it's for something that 99% of users won't even know about.

 

Allotting time requires managerial approval. However, most managers see security as one of those "nice-to-have" buzzwords, that they'll have implemented later "if time permits".

I think a good solution to this is to give management a live demo of your site being hacked (perhaps try it on the QA machine, not necessarily production). To really make a point, see if you can hack into their profile. Hopefully that will convince management that it's worth the time.

 

About the performance hit - I don't see a silver bullet. Most application performance can be improved by adding more resources somehow - more hardware, better design (caching, batching), coding optimizations, etc... Because applications vary so much, there's not a one-size-fits-all solution.

Monday, June 8, 2009

BOOK: Pragmatic Thinking and Learning: Refactor Your Wetware

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/book_pragmatic_thinking_and_learning_refactor_your_wetware.htm]

I'm fascinated with learning how to learn, so I was excited to finally read Andy Hunt's Pragmatic Thinking and Learning: Refactor Your Wetware. Recall that Andy is one of the co-stars of the hit The Pragmatic Programmers.

This is a good example of a higher-level, "non-syntax" book, something that transcends the "How to program XYZ" genre. (Shameless plug: I had written my own book: A Crash Course in Reasoning, but I can see why Andy's is in the top 3000 Amazon sales rank, and mine is barely in the top 3 million).

My favorite chapter was "Journey from Novice to Expert", as there is such a huge productivity gap here. He also continually emphasized the differences between the two parts of the brain, comparing it to a dual CPU, single master bus design.

It was an enjoyable read, similar to picking desserts out of a buffet. He had a lot of good quotes throughout the book:

  • "... software development must be the most difficult endeavor ever envisioned and practiced by humans." (pg. 1)
  • "... it's not the teacher teaches; it's that the student learns." (pg. 3)
  • "Don't succumb to the false authority of a tool or a model." (pg. 41)
  • "If you don't keep track of great ideas, you will stop noticing that you have them." (pg. 53). This is huge. The "slow times" during the day (driving, waiting in line, burping a sleeping baby) are great for mulling over random ideas. It's almost like collecting raindrops. I used to do this, but got lazy the last few years. Andy's chapter inspired me to go out, get some pocket-sized notebooks, and start jotting down random thoughts again (read: future blog entries).
  • "Try creating your next software design away from your keyboard and monitor..." (pg. 72). It's ironic, but often sitting in front of the computer, with all the internet distractions, can kill one's creativity.
  • "So if you aren't pair programming, you definitely need to stop every so often and step away from the keyboard." (pg. 85). I've seen many shops that effectively forbid pair programming, so this at least is a way to partially salvage a bad situation.
  • "... until recently, one could provide for one's family with minimal formal education or training." (pg. 146)
  • "... relegating learning activities to your 'free time' is a recipe for failure." (pg. 154)
  • "... documenting is more important than documentation." (pg. 179). The act of documenting forces you to think through things, where design costs upfront are much cheaper than implementation costs later.
  • "... we learn better by discovery, not instruction." (pg. 194).
  • "It's not that we're out of time; we're out of attention." (pg. 211)

Perhaps the best effect from reading this kind of book is that it makes you more aware, such that your subconscious mind is constantly thinking about learning.

Sunday, June 7, 2009

Farewell Paylocity, Hello Career Education Corporation

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/farewell_paylocity_hello_career_education_corporation.htm]

I've had the good fortune to serve at Paylocity the last four years. Inheriting a legacy system, I got the opportunity to develope, contribute to process, grow as an architect, and work with a very classy team.

For a variety of personal and professional reasons, I'll be switching to a new company soon - Career Education Corporation. CEC has a noble mission - to help people with continuing education (something I have a passion for). They have both an online and brick & mortar presence. It's an international company with several thousand employees, and a lot of opportunity. I'm looking forward to this new chapter in life.

Thursday, June 4, 2009

Why XLinq is awesome - the benefits

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/why_xlinq_is_awesome__the_benefits.htm]

I'm a big fan of Xml, especially for internal tools and processes. In order to juggle xml, I'd usually resort to XPath queries, the XmlDocument, and Xml Serializers. However, the new XLinq that came out with .Net 3.0, seems to completely overpower XPath for C# applications. It rocks.

You can see a general overview here. I also found the Linq and Xlinq chapters from C# 3.0 in a Nutshell to be wonderful. Linq alone is powerful, and XLinq applies linq to xml. There are several specific benefits of XLinq that I want to highlight:

Easy round-tripping from files or strings:

XLinq makes it very easy to load or save files from either a string or uri.

[TestMethod]
public void Create_1()
{
  //Load from string
  XElement x1 = XElement.Parse(
    @"
        
          30
        

      
");

  //Load from uri
  XElement x2 = XElement.Load(@"C:\temp\x2.xml");

  //save to string
  string s = x1.ToString();

  //save to file
  x2.Save(@"C:\temp\x2a.xml");
}

Easy DOM access and modification

I recall with XmlDocument (at least how I understood it), nodes were tightly coupled to their original XmlDocument, so it could be a pain to pull out an xml snippet from one doc and insert it into another. With XLinq, you can pretty much juggle it any way - insert/update/remove either attributes, nodes, or inner text.

    [TestMethod]
    public void ModifyDom_1()
    {
      //Load from string
      XElement x1 = XElement.Parse(
        @"
           
              30
           

         
"
);

      //Get a reference to a node:
      XElement xClient = x1.Element("client");

      //Modify DOM
      //  Attribute - insert new
      xClient.SetAttributeValue("new1", "val1");

      //  Attribute - update
      xClient.SetAttributeValue("enabled", "false");

      //  Attribute - remove
      xClient.SetAttributeValue("attr2", null);

      //  Node - insert new
      XElement xNew1 = XElement.Parse("bbb");
      XElement xNew2 = XElement.Parse("ccc");
      xClient.AddAfterSelf(xNew1);
      xClient.AddAfterSelf(xNew2);

      //  Node - remove
      xNew2.Remove();

      //  InnerText - update
      xClient.Element("timeout").Value = "60";

      //save to string
      string s = x1.ToString();
    }

Populating objects from an XLinq Query:

You can see a ton of gems in this snippet from the Chicago Code Camp website:

Say you have a class "Abstract" with properties AbstractCode, SpeakerCode, CoSpeakerCode, Author, Description, and Title.

There are two xml files - one for "Abstracts" and the other for "Speakers"


  
    Homer Simpson
    homer@email.com
    Nuclear engineer...
  

  ...




  
    How to be a good employee
    
      Eat donuts and sleep...
    

  

  ...

The following XLinq snippet shows how to:

  1. Instantiate an array of Abstract objects from xml by mapping the node, attribute, and inner text values.
  2. Filter the xml file by multiple expressions and complex logic
  3. Join multiple xml files together, such as the Abstract and Speaker xml files via a common attribute (SpeakerCode)
  4. Use external functions (LinqHelper.GetNonNull) in your XLinq query.
  5. Apply transformations (.Replace() ) to the xml data you're reading
  6. Order it all.

    XElement xeAbstracts = XElement.Load("Abstracts.xml");
    XElement xeSpeakers = XElement.Load("Speakers.xml");

    Abstract[] abs =
    (
      from a in xeAbstracts.Elements("Abstract")
      from s in xeSpeakers.Elements("Speaker")
      where a.Attribute("SpeakerCode").Value
        == s.Attribute("SpeakerCode").Value

      select new Abstract()
      {
        AbstractCode = a.Attribute("AbstractCode").Value,
        SpeakerCode = a.Attribute("SpeakerCode").Value,
        CoSpeakerCode = LinqHelper.GetNonNull(a.Attribute("CoSpeakerCode")),
        Author = s.Element("Name").Value,
        Description = a.Element("Description").Value.Replace("\r\n","
"
)
,
        Title = a.Element("Title").Value
       
      }
    ).OrderBy(n => n.Title).ToArray();

Many of these things would have been either difficult or impossible to do in XPath (unless I'm missing some major trick).

This alone makes a very powerful case for XLinq.

Wednesday, June 3, 2009

Why would someone put business logic in a stored procedure?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/why_would_someone_put_business_logic_in_a_stored_procedure.htm]

I have a strong bias against injecting business logic into the stored procedures. It is not scalable, is hard to reuse, is a pain to test, has limited access to your class libraries, etc... However, I think there are some legit cases to put advanced T-SQL in a SP, but you're playing with fire.

Here are some reasons - basically either performance, functionality, or legacy constraints force you to.

  1. Aggregations, Joins, and Filters - If you need to sum up 1000 child rows, it's not feasible to return all that data to the app server and sum it in C#. Obviously using the SQL sum() function would be the better choice. Likewise, if your logic is part of a complex filter (say for a search page), it performs much faster to filter at the database and only pull over the data you need.
  2. Batch calls for performance - SQL Server is optimized for batch calls. A single update with a complex where clause will likely run much faster than 1000 individual updates with simple where clauses. For performance-critical reasons, this may force logic into the stored procedure. However, in this case, perhaps you can code-generate the SQL scripts from some business-rules input file, so you're not writing tons of brittle SQL logic.
  3. Database integration validation - Say you need to ensure that code is unique across all rows in the table (or for a given filter criteria). This, by definition, must be done on the database.
  4. Make a change to legacy systems where you're forced into using the SP - Much of software engineering is working with legacy code. Sometimes this forces you into no-win situations, like fixing some giant black box stored procedure. You don't have time to rewrite it, the proc requires a 1 line change to work how the client wants it, and making that change in the stored proc is the least of the evils.
  5. The application has no business tier - Perhaps this procedure is for a not for an N-tier app. For example, maybe it's for a custom report, and the reporting framework can only call stored procs directly, without any middle-tier manipulation.
  6. Performance critical code - Perhaps "special" code must be optimized for performance, as opposed to maintainability or development schedule. For example, you may have some rules engine that must perform, and being closer to the core data allows that. Of course, sometimes there may be ways to avoid this, such as caching the results, scaling out the database, refactoring the rules engine, or splitting it into CRUD methods that could be batched with an ORM mapping layer.
  7. Easy transactions - It can be far easier for a simple architecture to rollback a transaction in SQL than in managed code. This may press developers into dumping more logic into their procs.

Note that for any of these reasons - consider at least still testing your database logic, and refactoring your SQL scripts.

These "reasons" are probably a bad idea:

  1. Easy deployment - You can update a SQL script in production real quick! Just hope that the update doesn't itself have an error which accidentally screws all your production data. Also consider, why does the procedure need to be updated out-of-cycle in the first place? Was it something that would ideally have been abstracted out to a proper config file (whose whole point is to provide easy changes post-deployment)? Was it an error in the original logic, which could have been more easily prevented if it had been coded in a testable-friendly language like C#? Also, keep in mind that you can re-deploy a .Net assembly if you match its credentials (strong name, version, etc...), which is very doable given an automated build process.
  2. It's so much quicker to develop this way! Initially it may be faster to type the keystrokes, but the lack of testability and reusability will make the schedule get clobbered during maintenance.

In my personal experience, I see more data in the SP for the bad reasons ("it's real quick!")- the majority of the time it can be refactored out, to the benefit of the application and its development schedule.