timstall

Wednesday, June 10, 2009

Real life: Avoiding customization to build a Sandbox

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/real_life_avoiding_customization_to_build_a_sandbox.htm]

I have three kids - two sons and a daughter. Cute little buggers, so I wanted to make them a sandbox to play in. I figure doing something physical and outdoors, as opposed to watching TV, would be good for them. Plus, I'm all for anything that even remotely encourages them to do engineering. However, I have very, very minimal woodworking skills. When it comes to woodworking, I am (at best) a hobbyist - by no means an expert. This means I was just trying to build a simple sandbox that works - no fancy wood cutting, no things that take a big vocabulary to describe, and no expensive tools required. I made a crude box-like design, drove to the local Home Depot, and got help picking out the wood (12x1x6 inch cedar). The only cuts I made were ones that the Home Depot guy could do in the store - so no triangle, notched, or diagonal cuts. I hauled my precious wood back to my garage (read: not a professional tool workshop), applied one coat of polyurethane something (read: I hope that helps protect against weathering), and hammered the boards together. After digging the box into the ground and filling it with sand, it was good enough and I was done.

Why dump such a story on my technical blog? Because my hobbyist mentality towards a wood sandbox is essentially the same as the hobbyist mentality that many "programmers" have towards the craft of software engineering. We both just want to get it done, make the end users happy, and maybe enjoy it along the way. If we miss out on an optimal technique, that's okay. When working with other people, it can be useful to understand that mindset.

And yes, the kids loved it.

Tuesday, June 9, 2009

An easy way to hack unvalidated web input

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/an_easy_way_to_hack_unvalidated_web_input.htm]

Many security bugs get overlooked because hackers use "special" tools that give them options that the original developers never anticipated. For example, most web developers only test their application through a browser. While this is sometimes sufficient to check different querystring inputs, it gives one an enormous false sense of security. Consider two cases:

A button is disabled, therefore the developer assumes you can't click it (Example: A Save button is disabled because some data isn't valid)
Data is stored in a hidden field, therefore the developer assumes that you can't change it. (Example: A record's ID is stored in a hidden field)

To some people, this appears safe because IE doesn't let you click disabled buttons or change the value of hidden fields. However, IE is only used to collect the data from the page and aggregate it into a post that it sends to the webserver. What if the hacker is using a tool other than IE that gives them more control over how to assemble and send this post (or an extension to IE that lets them modify fields)?

How to hack it

You could crack both the above cases using Visual Studio Team Test (or I bet Firebug, or the IE8 dev tools). Here's how:

Open up VS Team Test
Create a Test Project, and add a new web test. This opens up a recorder in IE.
Navigate to the page you want to crack, such as something that stores data in a hidden field.
Perform a normal save action, such as saving the page (which uses data from the hidden field)
Stop the recording. Play back the test just to make sure it works in the normal case. Note how for each request, VS provides you a new row and lets you see the exact request and response, as well as what visually appears in the Web Browser.
Now start the hacking. Go to the request where you saved the page (that collected data from the hidden field), and unbind that field and change its value to whatever you want. In this case, I simply had a field called "Hidden1", and I changed its value to "abc". Note that just like IE, the webtest sends a post to the server. But unlike IE, you can easily change the contents of that post.
Rerun the webtest. The new, hacked, value is sent to the server.
Note that you can also script VS WebTests with C#. So, you could have a for-loop that re-submits 1000 requests, varying the post values each time.
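To make the for-loop idea concrete without depending on the Team Test API, here is a minimal sketch that only builds the form bodies a tampering client could send. The field name Hidden1 comes from the example above; TextBox1, the class name, and the candidate values are hypothetical. Nothing is sent over the network.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TamperedPostSketch
{
    // Builds a form body: percent-encoded name=value pairs joined by '&'.
    public static string BuildBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Yields one body per candidate value for the hidden field.
    public static IEnumerable<string> BuildBodies(IEnumerable<string> hiddenValues)
    {
        foreach (string value in hiddenValues)
        {
            yield return BuildBody(new[]
            {
                // Hypothetical visible field, left at a normal value.
                new KeyValuePair<string, string>("TextBox1", "normal input"),
                // The "hidden" field is just another name=value pair.
                new KeyValuePair<string, string>("Hidden1", value)
            });
        }
    }
}
```

The point is that, on the wire, a hidden field looks exactly like any other field, so the server cannot tell whether the browser or some other tool filled it in.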

So, what can we do?

As developers, we always need to validate and do security checks on the server - client-side JavaScript validation alone is not enough. For example, knowing that a hidden field could be hacked, always validate its data as if it came from a free-form textbox.
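As a rough illustration of treating a hidden field like free-form input, the sketch below checks a posted record ID on the server. The class name, method name, and the idea of passing in the set of IDs the current user may edit are all assumptions made for this example, not something from the original application.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class HiddenFieldValidator
{
    // Returns true only if the posted value is a plain positive integer
    // AND the current user is allowed to edit that record.
    public static bool TryGetEditableRecordId(
        string postedValue, ICollection<int> idsUserMayEdit, out int recordId)
    {
        recordId = 0;
        int parsed;

        // NumberStyles.None accepts digits only: no sign, no whitespace.
        // null, "", "abc", "-1" and " 5" are all rejected here.
        if (!int.TryParse(postedValue, NumberStyles.None,
                          CultureInfo.InvariantCulture, out parsed))
        {
            return false;
        }

        // A well-formed ID is still not trusted until it is authorized.
        if (parsed <= 0 || !idsUserMayEdit.Contains(parsed))
        {
            return false;
        }

        recordId = parsed;
        return true;
    }
}
```

The second check matters as much as the first: a hacker can post a perfectly valid ID that simply belongs to someone else's record.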

The problem is that this degree of extra validation (1) costs much more time, and (2) could be a big performance hit - which is especially ironic because it's for something that 99% of users won't even know about.

Allotting time requires managerial approval. However, most managers see security as one of those "nice-to-have" buzzwords, that they'll have implemented later "if time permits".

I think a good solution to this is to give management a live demo of your site being hacked (perhaps try it on the QA machine, not necessarily production). To really make a point, see if you can hack into their profile. Hopefully that will convince management that it's worth the time.

About the performance hit - I don't see a silver bullet. Most application performance can be improved by adding more resources somehow - more hardware, better design (caching, batching), coding optimizations, etc... Because applications vary so much, there's not a one-size-fits-all solution.

Monday, June 8, 2009

BOOK: Pragmatic Thinking and Learning: Refactor Your Wetware

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/book_pragmatic_thinking_and_learning_refactor_your_wetware.htm]

I'm fascinated with learning how to learn, so I was excited to finally read Andy Hunt's Pragmatic Thinking and Learning: Refactor Your Wetware. Recall that Andy is one of the co-authors of the hit The Pragmatic Programmer.

This is a good example of a higher-level, "non-syntax" book, something that transcends the "How to program XYZ" genre. (Shameless plug: I had written my own book: A Crash Course in Reasoning, but I can see why Andy's is in the top 3000 Amazon sales rank, and mine is barely in the top 3 million).

My favorite chapter was "Journey from Novice to Expert", as there is such a huge productivity gap here. He also continually emphasized the differences between the two parts of the brain, comparing it to a dual CPU, single master bus design.

It was an enjoyable read, similar to picking desserts out of a buffet. He had a lot of good quotes throughout the book:

"... software development must be the most difficult endeavor ever envisioned and practiced by humans." (pg. 1)
"... it's not that the teacher teaches; it's that the student learns." (pg. 3)
"Don't succumb to the false authority of a tool or a model." (pg. 41)
"If you don't keep track of great ideas, you will stop noticing that you have them." (pg. 53). This is huge. The "slow times" during the day (driving, waiting in line, burping a sleeping baby) are great for mulling over random ideas. It's almost like collecting raindrops. I used to do this, but got lazy the last few years. Andy's chapter inspired me to go out, get some pocket-sized notebooks, and start jotting down random thoughts again (read: future blog entries).
"Try creating your next software design away from your keyboard and monitor..." (pg. 72). It's ironic, but often sitting in front of the computer, with all the internet distractions, can kill one's creativity.
"So if you aren't pair programming, you definitely need to stop every so often and step away from the keyboard." (pg. 85). I've seen many shops that effectively forbid pair programming, so this at least is a way to partially salvage a bad situation.
"... until recently, one could provide for one's family with minimal formal education or training." (pg. 146)
"... relegating learning activities to your 'free time' is a recipe for failure." (pg. 154)
"... documenting is more important than documentation." (pg. 179). The act of documenting forces you to think through things, where design costs upfront are much cheaper than implementation costs later.
"... we learn better by discovery, not instruction." (pg. 194).
"It's not that we're out of time; we're out of attention." (pg. 211)

Perhaps the best effect from reading this kind of book is that it makes you more aware, such that your subconscious mind is constantly thinking about learning.

Sunday, June 7, 2009

Farewell Paylocity, Hello Career Education Corporation

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/farewell_paylocity_hello_career_education_corporation.htm]

I've had the good fortune to serve at Paylocity the last four years. Inheriting a legacy system, I got the opportunity to develop, contribute to process, grow as an architect, and work with a very classy team.

For a variety of personal and professional reasons, I'll be switching to a new company soon - Career Education Corporation. CEC has a noble mission - to help people with continuing education (something I have a passion for). They have both an online and brick & mortar presence. It's an international company with several thousand employees, and a lot of opportunity. I'm looking forward to this new chapter in life.

Thursday, June 4, 2009

Why XLinq is awesome - the benefits

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/why_xlinq_is_awesome__the_benefits.htm]

I'm a big fan of Xml, especially for internal tools and processes. In order to juggle xml, I'd usually resort to XPath queries, the XmlDocument, and Xml Serializers. However, the new XLinq that came out with .Net 3.5 seems to completely overpower XPath for C# applications. It rocks.

You can see a general overview here. I also found the Linq and Xlinq chapters from C# 3.0 in a Nutshell to be wonderful. Linq alone is powerful, and XLinq applies linq to xml. There are several specific benefits of XLinq that I want to highlight:

Easy round-tripping from files or strings:

XLinq makes it very easy to load xml from, or save it to, either a string or a uri.

[TestMethod]
public void Create_1()
{
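  //Load from string
  //(sample markup: the root name and attribute values are assumed)
  XElement x1 = XElement.Parse(
    @"<root>
        <client enabled='true' attr2='abc'>
          <timeout>30</timeout>
        </client>
      </root>");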

  //Load from uri
  XElement x2 = XElement.Load(@"C:\temp\x2.xml");

  //save to string
  string s = x1.ToString();

  //save to file
  x2.Save(@"C:\temp\x2a.xml");
}

Easy DOM access and modification

I recall with XmlDocument (at least how I understood it), nodes were tightly coupled to their original XmlDocument, so it could be a pain to pull out an xml snippet from one doc and insert it into another. With XLinq, you can pretty much juggle it any way - insert/update/remove either attributes, nodes, or inner text.

    [TestMethod]
    public void ModifyDom_1()
    {
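      //Load from string
      //(sample markup: the root name and attribute values are assumed;
      // client, enabled, attr2, and timeout are the names used below)
      XElement x1 = XElement.Parse(
        @"<root>
            <client enabled='true' attr2='abc'>
              <timeout>30</timeout>
            </client>
          </root>");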

      //Get a reference to a node:
      XElement xClient = x1.Element("client");

      //Modify DOM
      // Attribute - insert new
      xClient.SetAttributeValue("new1", "val1");

      // Attribute - update
      xClient.SetAttributeValue("enabled", "false");

      // Attribute - remove
      xClient.SetAttributeValue("attr2", null);

      // Node - insert new (element names are assumed)
      XElement xNew1 = XElement.Parse("<new1>bbb</new1>");
      XElement xNew2 = XElement.Parse("<new2>ccc</new2>");
      xClient.AddAfterSelf(xNew1);
      xClient.AddAfterSelf(xNew2);

      // Node - remove
      xNew2.Remove();

      // InnerText - update
      xClient.Element("timeout").Value = "60";

      //save to string
      string s = x1.ToString();
    }
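To illustrate the coupling mentioned above, here is a small side-by-side sketch of copying the first child of one xml snippet into another. The class and method names are made up for this example. With XmlDocument the node has to be imported into the target document first; with XLinq, Add() takes the element directly and, because it already has a parent, adds a copy.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class MoveNodeSketch
{
    // XmlDocument: a node belongs to its owner document, so it must be
    // imported (deep-copied) into the target before it can be appended.
    public static string WithXmlDocument(string sourceXml, string targetXml)
    {
        XmlDocument source = new XmlDocument();
        source.LoadXml(sourceXml);
        XmlDocument target = new XmlDocument();
        target.LoadXml(targetXml);

        XmlNode imported = target.ImportNode(source.DocumentElement.FirstChild, true);
        target.DocumentElement.AppendChild(imported);
        return target.OuterXml;
    }

    // XLinq: no owner document to worry about. The element already has a
    // parent, so Add() inserts a copy and leaves the source tree intact.
    public static string WithXLinq(string sourceXml, string targetXml)
    {
        XElement source = XElement.Parse(sourceXml);
        XElement target = XElement.Parse(targetXml);

        target.Add(source.Elements().First());
        return target.ToString(SaveOptions.DisableFormatting);
    }
}
```

Appending the node without the ImportNode call is what throws the "node is from a different document context" error that makes XmlDocument feel tightly coupled.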

Populating objects from an XLinq Query:

You can see a ton of gems in this snippet from the Chicago Code Camp website:

Say you have a class "Abstract" with properties AbstractCode, SpeakerCode, CoSpeakerCode, Author, Description, and Title.

There are two xml files - one for "Abstracts" and the other for "Speakers"

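<Speakers>
  <Speaker SpeakerCode="...">
    <Name>Homer Simpson</Name>
    <Email>homer@email.com</Email>
    <Bio>Nuclear engineer...</Bio>
  </Speaker>
  ...
</Speakers>

<Abstracts>
  <Abstract AbstractCode="..." SpeakerCode="..." CoSpeakerCode="...">
    <Title>How to be a good employee</Title>
    <Description>
      Eat donuts and sleep...
    </Description>
  </Abstract>
  ...
</Abstracts>

(Only the names used by the query below are certain; the root elements, Email, and Bio are assumed.)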

The following XLinq snippet shows how to:

Instantiate an array of Abstract objects from xml by mapping the node, attribute, and inner text values.
Filter the xml file by multiple expressions and complex logic
Join multiple xml files together, such as the Abstract and Speaker xml files via a common attribute (SpeakerCode)
Use external functions (LinqHelper.GetNonNull) in your XLinq query.
Apply transformations (.Replace() ) to the xml data you're reading
Order it all.

    XElement xeAbstracts = XElement.Load("Abstracts.xml");
    XElement xeSpeakers = XElement.Load("Speakers.xml");

    Abstract[] abs =
    (
      from a in xeAbstracts.Elements("Abstract")
      from s in xeSpeakers.Elements("Speaker")
      where a.Attribute("SpeakerCode").Value
        == s.Attribute("SpeakerCode").Value
      select new Abstract()
      {
        AbstractCode = a.Attribute("AbstractCode").Value,
        SpeakerCode = a.Attribute("SpeakerCode").Value,
        CoSpeakerCode = LinqHelper.GetNonNull(a.Attribute("CoSpeakerCode")),
        Author = s.Element("Name").Value,
        Description = a.Element("Description").Value.Replace("\r\n", "<br />"),
        Title = a.Element("Title").Value

      }
    ).OrderBy(n => n.Title).ToArray();

Many of these things would have been either difficult or impossible to do in XPath (unless I'm missing some major trick).

This alone makes a very powerful case for XLinq.
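The query above calls LinqHelper.GetNonNull without showing it. A minimal sketch of what such a helper might look like follows, assuming it returns an empty string when the optional attribute is missing (the real implementation may differ). The second method is a hypothetical variation that expresses the same Abstract-to-Speaker pairing with a join clause instead of two from clauses and a where.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class LinqHelperSketch
{
    // Assumed behavior: the attribute's value, or "" if the attribute is absent.
    public static string GetNonNull(XAttribute attribute)
    {
        return attribute == null ? string.Empty : attribute.Value;
    }

    // Pairs each Abstract with its Speaker via SpeakerCode, ordered by title.
    // Returns "Title|Author|CoSpeakerCode" strings to keep the sketch short.
    public static string[] TitlesWithAuthors(XElement abstracts, XElement speakers)
    {
        return (
            from a in abstracts.Elements("Abstract")
            join s in speakers.Elements("Speaker")
              on (string)a.Attribute("SpeakerCode")
              equals (string)s.Attribute("SpeakerCode")
            orderby (string)a.Element("Title")
            select (string)a.Element("Title") + "|"
                 + (string)s.Element("Name") + "|"
                 + GetNonNull(a.Attribute("CoSpeakerCode"))
        ).ToArray();
    }
}
```

The explicit (string) casts return null for a missing attribute or element instead of throwing, which is a bit more forgiving than .Value when the xml is hand-edited.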

Wednesday, June 3, 2009

Why would someone put business logic in a stored procedure?

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/why_would_someone_put_business_logic_in_a_stored_procedure.htm]

I have a strong bias against injecting business logic into the stored procedures. It is not scalable, is hard to reuse, is a pain to test, has limited access to your class libraries, etc... However, I think there are some legit cases to put advanced T-SQL in a SP, but you're playing with fire.

Here are some reasons - basically either performance, functionality, or legacy constraints force you to.

Aggregations, Joins, and Filters - If you need to sum up 1000 child rows, it's not feasible to return all that data to the app server and sum it in C#. Obviously using the SQL sum() function would be the better choice. Likewise, if your logic is part of a complex filter (say for a search page), it performs much faster to filter at the database and only pull over the data you need.
Batch calls for performance - SQL Server is optimized for batch calls. A single update with a complex where clause will likely run much faster than 1000 individual updates with simple where clauses. For performance-critical reasons, this may force logic into the stored procedure. However, in this case, perhaps you can code-generate the SQL scripts from some business-rules input file, so you're not writing tons of brittle SQL logic.
Database integration validation - Say you need to ensure that a code is unique across all rows in the table (or for a given filter criteria). This, by definition, must be done on the database.
Make a change to legacy systems where you're forced into using the SP - Much of software engineering is working with legacy code. Sometimes this forces you into no-win situations, like fixing some giant black box stored procedure. You don't have time to rewrite it, the proc requires a 1 line change to work how the client wants it, and making that change in the stored proc is the least of the evils.
The application has no business tier - Perhaps this procedure is not for an N-tier app. For example, maybe it's for a custom report, and the reporting framework can only call stored procs directly, without any middle-tier manipulation.
Performance critical code - Perhaps "special" code must be optimized for performance, as opposed to maintainability or development schedule. For example, you may have some rules engine that must perform, and being closer to the core data allows that. Of course, sometimes there may be ways to avoid this, such as caching the results, scaling out the database, refactoring the rules engine, or splitting it into CRUD methods that could be batched with an ORM mapping layer.
Easy transactions - In a simple architecture, it can be far easier to roll back a transaction in SQL than in managed code. This may press developers into dumping more logic into their procs.
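For the batch-call case above, here is one way the "code-generate the SQL from a business-rules input" idea could be sketched. Everything specific here is hypothetical: the DiscountRule class, the Orders table, and its CustomerType and DiscountPercent columns. It assumes the rules come from a trusted file at build time; values from end users should go through parameterized queries instead.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical business rule: one discount per customer type.
public sealed class DiscountRule
{
    public string CustomerType { get; set; }
    public decimal DiscountPercent { get; set; }
}

public static class BatchSqlGenerator
{
    // Emits one set-based UPDATE (a CASE branch per rule)
    // instead of one UPDATE statement per row.
    public static string BuildUpdate(IList<DiscountRule> rules)
    {
        if (rules == null || rules.Count == 0)
            throw new ArgumentException("At least one rule is required.", "rules");

        var lines = new List<string>();
        lines.Add("UPDATE Orders");
        lines.Add("SET DiscountPercent = CASE CustomerType");
        foreach (DiscountRule rule in rules)
        {
            lines.Add(string.Format(CultureInfo.InvariantCulture,
                "    WHEN '{0}' THEN {1}",
                Escape(rule.CustomerType), rule.DiscountPercent));
        }
        lines.Add("    ELSE DiscountPercent END");
        lines.Add("WHERE CustomerType IN (" +
            string.Join(", ", rules.Select(r => "'" + Escape(r.CustomerType) + "'")) +
            ");");
        return string.Join("\n", lines);
    }

    // Doubles single quotes so a rule value cannot end the SQL literal early.
    private static string Escape(string value)
    {
        return value.Replace("'", "''");
    }
}
```

The rules stay in a small, testable C# structure, and the brittle SQL becomes generated output rather than something maintained by hand.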

Note that whichever of these reasons applies, you should at least still test your database logic and refactor your SQL scripts.

These "reasons" are probably a bad idea:

Easy deployment - You can update a SQL script in production real quick! Just hope that the update doesn't itself have an error which accidentally screws all your production data. Also consider, why does the procedure need to be updated out-of-cycle in the first place? Was it something that would ideally have been abstracted out to a proper config file (whose whole point is to provide easy changes post-deployment)? Was it an error in the original logic, which could have been more easily prevented if it had been coded in a testable-friendly language like C#? Also, keep in mind that you can re-deploy a .Net assembly if you match its credentials (strong name, version, etc...), which is very doable given an automated build process.
It's so much quicker to develop this way! Initially it may be faster to type the keystrokes, but the lack of testability and reusability will make the schedule get clobbered during maintenance.

In my personal experience, I see more logic put in the SP for the bad reasons ("it's real quick!") - the majority of the time it can be refactored out, to the benefit of the application and its development schedule.

Monday, June 1, 2009

Chicago Code Camp - reflections

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/chicago_code_camp__reflections.htm]

This weekend the LCNUG and Alt.Net groups hosted a very successful Chicago Code Camp at the College of Lake County in IL. We had about 150 people, 30 speakers, and a lot of knowledge transfer. I was privileged to be part of the group that helped facilitate it. We're in the process of trying to put speaker ppts and code samples up on the website.

Some misc reflections:

How did we make that massively awesome code camp website? We maintained it in ASP.Net, used xml files for the abstracts and speaker bios, and then had an automatic tool take screen scrapes of all the generated aspx pages. This saved the results as html files, which could then easily be xcopy-deployed to any FTP server without worrying about a backend data store or server-side support.

Did you have enough volunteers? I think in the end yes. People stepped up at the spur of the moment. I especially was impressed with how easy it was to move tables when you have 150 people. For lunch and cleanup, we needed to transport a lot of tables and boxes, and random people kept jumping in to help. What a great community!

Sounds great, but too bad I missed it. While we hope to get the ppt slides up soon, also consider checking out the blogs of the speakers. Even if the blog isn't explicitly listed, you can probably find it by googling the speaker name and adding the ".net" or "developer" keyword to the search.

What's next? Stay in touch with the Chicagoland community. Perhaps subscribe to some of the speaker blogs, or visit the LCNUG, ALT.Net, or any of the other user groups.