Perhaps the most popular way to pass data between web pages is via querystrings. They are used both to pass data to a new pop-up window and to navigate between pages. (Side note: A querystring is the part of the URL that occurs after the ?. So in http://localhost/myWeb?id=3&name=Tim, id=3&name=Tim is the querystring. The querystring provides name-value pairs in the form of ?name1=value1&name2=value2...)
While this works great for simple alpha-numerics, it can be a problem to pass special characters in the URL, especially in different browsers.
- An ampersand would split the name-value pairs. If you want to pass the value "A&B", but the & indicates a new name-value pair, then the value will be truncated to just "A". For example, in "id=A&B", getting the querystring "id" will return just "A", and B will be interpreted as its own key.
- Apostrophes and greater-than or less-than signs may be interpreted as a cross-site scripting attack by some security plug-ins. As a result, these plug-ins may block the entire page.
- Other special characters (like slash or space) may be lost or distorted when sent in a URL.
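The ampersand case is easy to reproduce. The following is a minimal sketch, assuming a console program with access to System.Web.HttpUtility; the class name AmpersandDemo and the method GetId are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

// Hypothetical demo class; the names are illustrative only.
public static class AmpersandDemo
{
    // Parses a raw querystring and returns the value stored under "id".
    public static string GetId(string queryString)
    {
        NameValueCollection pairs = HttpUtility.ParseQueryString(queryString);
        return pairs["id"];
    }

    public static void Main()
    {
        // The unencoded ampersand ends the "id" value early, so this prints "A".
        Console.WriteLine(GetId("id=A&B"));
    }
}
```

The intended value "A&B" never reaches the caller, because the parser treats everything after the & as a separate pair.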
While some may argue that querystring values should only contain simple IDs, there are legitimate benefits to being able to pass special characters. For example:
- Legacy Systems - The client's legacy system could include & or ' in the primary key.
- Performance - You could be returning a value (such as a name like "O'Reilly" or "Johnson & Sons") from a pop-up control. Passing just the id would require re-hitting the database, so you could pass the name as well to help performance.
Fortunately there is a solution for handling special characters. .NET lets us encode and decode URL values using System.Web.HttpUtility.UrlEncode and HttpUtility.UrlDecode. (Note that this is not HtmlEncode, which escapes HTML markup: it turns & into &amp;, which still contains an ampersand and so still splits the querystring. We want the Url methods.) UrlEncode replaces problematic characters with URL-friendly equivalents, and UrlDecode restores them.
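As a rough illustration of the round trip, here is a minimal sketch, again assuming a console program that can reference System.Web.HttpUtility. The class EncodeDemo and the method BuildQuery are hypothetical names, not part of the framework.

```csharp
using System;
using System.Web;

// Hypothetical demo class; the names are illustrative only.
public static class EncodeDemo
{
    // Builds "name=<encoded value>" so the value cannot break the querystring.
    public static string BuildQuery(string name)
    {
        return "name=" + HttpUtility.UrlEncode(name);
    }

    public static void Main()
    {
        string query = BuildQuery("Johnson & Sons");
        Console.WriteLine(query);      // name=Johnson+%26+Sons

        // ParseQueryString decodes each value while reading it back.
        string roundTrip = HttpUtility.ParseQueryString(query)["name"];
        Console.WriteLine(roundTrip);  // Johnson & Sons

        // A single value can also be decoded directly.
        Console.WriteLine(HttpUtility.UrlDecode("Johnson+%26+Sons"));
    }
}
```

The space becomes + and the ampersand becomes %26, so the whole name travels as one value.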
The following table shows what UrlEncode translates:
Dec | Hex | Character | UrlEncode
32 | 20 | (space) | +
34 | 22 | " | %22
35 | 23 | # | %23
36 | 24 | $ | %24
37 | 25 | % | %25
38 | 26 | & | %26
43 | 2B | + | %2b
44 | 2C | , | %2c
47 | 2F | / | %2f
58 | 3A | : | %3a
59 | 3B | ; | %3b
60 | 3C | < | %3c
61 | 3D | = | %3d
62 | 3E | > | %3e
63 | 3F | ? | %3f
64 | 40 | @ | %40
91 | 5B | [ | %5b
92 | 5C | \ | %5c
93 | 5D | ] | %5d
94 | 5E | ^ | %5e
96 | 60 | ` | %60
123 | 7B | { | %7b
124 | 7C | | | %7c
125 | 7D | } | %7d
126 | 7E | ~ | %7e
While alpha-numerics aren't affected, these special characters aren't encoded either:
Dec | Hex | Character
95 | 5F | _
45 | 2D | -
46 | 2E | .
39 | 27 | '
40 | 28 | (
41 | 29 | )
42 | 2A | *
33 | 21 | !
Side note 1: You can see the full ASCII tables online.
Side note 2: You can generate these tables with a simple loop over the ASCII codes, calling UrlEncode on each character.
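A minimal sketch of such a loop, assuming the printable ASCII range 32 to 126 and a console program with access to System.Web.HttpUtility. The class EncodeTable and the method Row are hypothetical names chosen for this example.

```csharp
using System;
using System.Web;

// Hypothetical demo class; the names are illustrative only.
public static class EncodeTable
{
    // Formats one row: decimal code, hex code, the character, and its encoded form.
    public static string Row(int code)
    {
        string ch = ((char)code).ToString();
        return string.Format("{0} | {1:X2} | {2} | {3}",
            code, code, ch, HttpUtility.UrlEncode(ch));
    }

    public static void Main()
    {
        // Printable ASCII runs from 32 (space) to 126 (~).
        for (int code = 32; code <= 126; code++)
        {
            Console.WriteLine(Row(code));
        }
    }
}
```

Rows where the last two columns match are the characters that UrlEncode leaves alone. The exact set may differ between framework versions, so it is worth running the loop against the version you deploy on.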
Essentially UrlEncode replaces many problematic characters with "%" + their ASCII Hex equivalents.
Most of these remaining special characters don't pose a problem, except for the apostrophe, which can cause cross-site scripting warnings. One solution is to replace it with a unique token value, such as "%27". This is a reasonable token: "27" is the ASCII Hex for the apostrophe, and it follows the pattern of the other encodings. We could then write our own Encode and Decode methods that first apply UrlEncode and then replace the apostrophe with the token value. These methods could be abstracted into their own utility class.
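One possible shape for such a utility class is sketched below. The class name QueryStringCodec and its members are hypothetical, and the sketch assumes the token is "%27" as described above. Because that token is a standard percent escape, the Decode side needs no extra step: UrlDecode already turns %27 back into an apostrophe.

```csharp
using System;
using System.Web;

// Hypothetical helper class; the names are illustrative only.
public static class QueryStringCodec
{
    // Token that stands in for the apostrophe (27 is its ASCII hex code).
    private const string ApostropheToken = "%27";

    public static string Encode(string value)
    {
        if (value == null) return null;
        // Apply UrlEncode first, then swap any apostrophe it left untouched.
        // If the runtime's UrlEncode already encodes the apostrophe,
        // the Replace simply finds nothing to change.
        return HttpUtility.UrlEncode(value).Replace("'", ApostropheToken);
    }

    public static string Decode(string value)
    {
        if (value == null) return null;
        // UrlDecode restores %27 along with every other escape.
        return HttpUtility.UrlDecode(value);
    }
}
```

With this in place, QueryStringCodec.Encode("O'Reilly") yields "O%27Reilly", and QueryStringCodec.Decode("O%27Reilly") returns the original name. Keeping both methods in one class means pages that build URLs and pages that read them share a single definition of the encoding.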