Monday, September 5, 2005

Excel Migration Part II: Calling a .Net Process

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/excel_migration_part_ii_calling_a_net_process.htm]

In my previous blog I discussed extending Excel with VBA vs. re-writing the app. I said I wanted to create a hybrid solution. Here I'll propose a solution for using .Net to improve how you view and share the final data with clients.

Essentially we want to create a "thin" report of our final data. This report is thin in the sense that it only contains the data and the means to present it. It does not contain calculation logic, validation, user interactions, security, etc... The context is that you've used your Excel sheet to calculate the data needed for a widget, and now you want to send that data to a client. You don't want to send them the entire workbook, as this is:

  • Platform Dependent - it requires that they have not just Excel, but the correct version of Excel.
  • Bulky - It physically takes up more space (you store all the rows, and all the calculations for each row)
  • Messy - while you can format things nicely in Excel (such that it prints okay), this is extra work to do. It is also much harder when you only want to display one row.
  • Excess - Suppose you're using the same sheet to manage widgets for 10 different clients, you only want to send one (or several) rows of output - not every row in your sheet.
  • Unsecure - Say you only want to send your output, not all the calculations you used. Your calculations may even be intellectual property that can't be distributed. Sure you could put a password on the Excel sheet, but that is an extra step and still gives away your calculations - albeit in a locked format. Wouldn't it be ideal to not even send the calculations at all?
  • Changeable - suppose you're sending this to a manager - do you really want to risk that manager "accidentally" fiddling with your calculations or breaking the Excel sheet? Ideally you could give them an idiot-proof document with no "movable parts" that cannot be broken.

One solution that solves all of this is to have a tool generate an Html report from only the Excel output that you want. Html would have the benefits:

  • Platform Independent - it has no need for Excel. Anyone in the world can read HTML
  • Light weight - a small, text-based file.
  • Easily formatted
  • Minimal - only send the output you want
  • Secure - it only includes the results, none of the proprietary calculations used to get those.
  • Unchangeable - a pure Html doc has no "movable parts". It is read only and impossible even for the most clumsy user to mess up.

How we would implement this:

  1. Create a .Net Console App that takes in the necessary info to read the Excel book and create the desired Html Report
  2. In Excel, have a processes to call this app via the command line.

Say this .Net console app, called CreateHtmlReportConsole,  would take four parameters: the physical path to the Excel book, the physical path to an html file that servers as a template for the report, the Excel worksheet, and the Excel row to pull data from. A sample command line may look like so (I've included the quotes in case your paths have spaces in them):

"C:\myProjects\CreateHtmlReportConsole.exe" "C:\ExcelBooks\Book1.xls" "C:\myProjects\Template.htm" 1 14

I'll show step #2 here, and discuss step#1 more in another blog. You could have a method in VBA (from a button or context menu) use the Shell command to call an external process like our .Net console app. For example:

Dim strCmd As String
Dim strCurrentDirectory As String
Dim strThisFileName As String

'Get current directory and filename:
strCurrentDirectory = ThisWorkbook.Path & "\"
strThisFileName = ThisWorkbook.Name

'Create a command line like:
' "C:\myProjects\CreateHtmlReportConsole.exe" "C:\ExcelBooks\Book1.xls" "C:\myProjects\Template.htm" 1 14
strCmd = strCurrentDirectory + "CreateHtmlReportConsole.exe """ + _
    strCurrentDirectory + strThisFileName + """ """ + _
    strCurrentDirectory + "Template.htm"" 1 " & intRow
Shell (strCmd)

Next step: how add dynamic context menus in Excel, from which you could call this code.

Thursday, September 1, 2005

Migrating an app from Excel to .Net, Part I

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/migrating_an_app_from_excel_to_net_part_i.htm]

I recently had the challenge of improving upon an Excel workbook. This was a complicated sheet, with 200 columns of ugly calculations that solved a difficult engineering problem. There were many rows, one for each widget being engineered. I'm going to blog about several issues I encountered for this.

The first issue I needed to deal with: should I extend Excel (with VBA) or rewrite the application in .Net. I see the following pros and cons:

 Extend Excel with VBARewrite in .Net
Pro
  • Potentially much quicker - not rewriting all the complex calculations
  • Excel is already designed for this kind of thing. It's easy to copy new rows, and it has a built in tools like GoalSeek (a numeric solver) of which there is no clear counterpart available in .Net (perhaps I could implement Newton's Method with numeric derivatives, but that takes some work would be very error prone for me to mess up).
  • Excel is easy to maintain for the end user, especially if they need to make a change.
  • Powerful platfom
  • Easier to test code with Test-Driven-Development (what if you need to modify a formula in your Excel book, and you want to make sure everything still works)
  • Splits out input, logic, and view into separate components. For example you could have a WinForm take (and validate) the input, a backend DLL could do all the logic, and then you could generate an Xml file or Html document with the final output results.
Con
  • Limited power, VBA is messy to deal with compared to .Net with C#, Visual Studio, and the .Net Framework
  • Excel combines the input, logic, and view all into one sheet. For example is you wanted to show the client only the calculated output, you'd either send them the entire sheet (which lets them see your calculations unless you do some fancy cell-hiding), or copy those values to a new document.
  • Lot of extra work
  • Hard for the end user to change or maintain

The final solution I'm looking at: make a hybrid. Continue using Excel for user input and calculations, extending it with VBA. But use .Net to read the Excel document and generate output views. I'll discuss this more in my next blog.

Monday, August 29, 2005

10 Tips on CodeSmith SubTemplates, Batch Mode, and XmlProperties

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/10_tips_on_subtemplates_batch_mode_and_xmlproperties.htm]

I've finished using CodeSmith to help with generating some of our tiers (or should I say "just started"?). So I posted some things I've learned from it: 10 Tips on SubTemplates, Batch Mode, and XmlProperties on the CodeSmith Peer Support Forum. This covers solutions I've found for:

  1. Passing values between subtemplates
  2. Handling that Sub template does not inherit parent's merge strategy
  3. Handling that target of RenderToFile must exist
  4. Handling Multi-level subtemplates
  5. Throwing and using errors in batch mode
  6. Embed XmlProperty filename instead of data
  7. Dynamically make your own batch file
  8. Only write new content
  9. Reusable code for XmlProperties
  10. Instantiate an XmlProperty at runtime

Details on the forum.

Wednesday, August 24, 2005

Using Sysinternal PsTools to find what has locked that process

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/using_sysinternal_pstools_to_find_what_has_locked_that_proce.htm]

Every get one of those frustrating errors "Cannot access file because it is being used by another process"? For example, you may be trying to delete or rename a file, but this error message won't let you. Sometimes it is as simple as closing an application that obviously has locked that file, but sometimes it's just more complicated - perhaps some service (which has no GUI, and therefore may be less intuitive) has locked the file. Ideally you could just click the file in Windows Explorer and it would tell you what locked the file and what to do.

There is a solution - Sysinternals provides a free tool, Process Explorer, that lets you find all the processes accessing a file. The tools requires no MSI; you can directly copy it to your machine and run it. It's like Task Manager on steroids (in a good sense).

Initially one may make sure that they don't have the file open in some sort of editor, but the problem may persist even if everything is closed. Go to Find > Find Handle, and type in the name or directory of the file. Click search, and it will show you all processes accessing that file. You can now kill those processes such that you can delete the file.

This is actually very educational - you'll see that IIS, other instances of Windows Explorer, certain background services, or apps that you wouldn't think of, all can lock files.

Monday, August 22, 2005

CodeSmith: Beyond Just Generation - Doing Simple Data Analysis

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/codesmith_beyond_just_generation__doing_simple_data_analysis.htm]

I've made several CodeSmith related posts on this blog:

There's another topic I'd like to explore - using CodeSmith for analyzing  Xml data and Database schemas. For example, suppose you wanted to compare if two tables had similar schemas. For example I wanted to see if one database had different column lengths than another for all tables in the database. There's several ways you can do this:

  • Manual --> bad (too slow, error prone, tedious, not-reproducible)
  • Build your own custom console app, use ADO.Net --> very slow to create, to easy to have errors
  • Find a custom tool specifically for DB comparisons --> yet another tool to learn, may provide far more functionality than needed
  • Use CodeSmith's Schema Explorer --> has potential!

CodeSmith's Schema explorer is extensive - it lets you start at the database and drill down to its collections of tables, columns, and individual properties. So it has the info needed for most tasks. It's also easy to use, so it saves you time in writing your own equivalent object. It's also a reusable tool. I.e. if you need to learn a technology, learn one that is helps out with more than just the current problem.  You could create two SchemaExplorer.DatabaseSchema objects, and then cycle through their tables and columns, comparing them.

CodeSmith also has the XmlProperty, which lets you analyze an Xml document. You could program compare the Xml doc to the database schema.

The "generated code" could be the results of your analysis. For example, wherever the condition you're searching for is met, you could Response.WriteLine the necessary info.

While CodeSmith is designed for CodeGeneration, it also can be used for simple analysis of data sources that are commonly used as inputs for that generate - such as Xml docs and Database schemas.

Sunday, August 14, 2005

Codesmith: Code Generation v. Refactoring

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/codesmith_code_generation_v_refactoring.htm]

Looking into CodeSmith, a good code generation tool, I started analyzing the differences between code generation and refactoring. Both help reduce redundancy. They are complimentary, with each concept having a different purpose:

  • Code Generation - Given input values and a template, automatically generate redundant code that varies only by the input values. This deals with how you create code.
  • Refactoring - Keep the functionality of your code the same, but improve the code itself via eliminating redundancy, improving clarity, etc... This deals with the final code, regardless of how that code was created.

Whenever you find yourself making repetitive code, you should at least check if these techniques can bail you out. Common examples may be (depending on your project) the data-access layer, the structure of business entities, documentation, unit tests, etc... Note that code generation and refactoring aren't opposites - but rather complementary. You can create a refactored class that still has some repetitiveness, and then code-generate it. For example, a fully refactored business entity may still have many properties that all follow the same format. For example, say your class has 10 such properties below. Given only the DataType and PropertyName, a code-generation template could crank this out:

public Double MyValue
{
    get
    {
        return _dblMyValue;
    }
    set
    {
        _dblMyValue = value;
    }
}

There are certainly places where repetitive code can be refactored such that there is no longer any redundancy, and therefore nothing to code-generate. A good code-generation tool is no excuse for not refactoring properly. However there are many places where refactoring alone is insufficient:

  1. Design time: Code generation is done at design-time, and thus offers a performance benefit over refactoring (executed at run time). Say you needed a class to access the properties of your business entities. You could use reflection at design time, or (assuming you know the entities) you could use code-generation to create the necessary code before hand.
  2. More functionality: Certain things can't effectively be refactored. If you needed to refactor a class that could handle many different Data Types, you'd need to use boxing./unboxing to handle the type-conversion. For example, pre-.Net 2.0 (which has generics) most collections, like HashTable or ArrayList, are not strongly typed. This requires boxing and unboxing which has obvious performance problems. .Net 1.1 does not provide a way to refactor the handling of various data types without the performance loss of boxing/unboxing. However you could use CodeSmith's strongly typed collection templates to automatically generate your own collections that don't require boxing. This is essentially still refactored because it provides additional functionality that the "refactored" version didn't meet - type safety.
  3. Beyond source code: A good code generation tool, like CodeSmith, can create any text file, not just object-oriented source code. For example, suppose you wanted to document your database schema. CodeSmith provides pre-packaged templates that do this. I am not a aware of a way that "refactoring" would solve this problem.
  4. Always automatic: Refactoring is great for object-oriented code with supporting unit tests. But it is difficult for many things outside of source code like documents or html pages. Short of other tools, these require a more manual approach. Code generation remains automatic

In conclusion, both refactoring and code generation are good things, but they are different and complimentary things.

Thursday, August 11, 2005

Things that fundamentally change how you program

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/things_that_fundamentally_change_how_you_program.htm]

There are a lot of things that fundamentally change how you program - you see something and the light bulb goes off. For me, the following was such a checklist (not in any order)
  • Object Oriented Code --> Many VB6 developers haven't needed to deal with OOP yet. However VB.Net is now object-oriented.
  • Unit Tests --> automated tests for specific units of code to assist with development and regression (usually run with NUnit)
  • Regex Expressions --> Manipulating text
  • Xml --> Consistent and standard data storage
  • Code Generation --> eliminate redundant code that can't just be refactored.
  • Consciously seeking Tools --> tools help you avoid or simplify tedious tasks.

Many people do some of these, but I've seen many developers reluctant to invest in the other areas. They'll write lines of text-parsing code before spending two hours investing in Regular Expressions.

I'm currently looking more into code generation with CodeSmith. It is a great tool, and I plan to post some cool findings on it.