Saturday, January 13, 2007

Make Every Web Page Printer-Friendly

You found an article online and want to print it, neatly punch three holes in the side, and store it with other useful stuff. Almost every site nowadays provides a printer-friendly version of its pages, but some sites don't, and you're stuck staring at their printed navigation bar, footer, ads, etc., for ever and ever. All this online decor means nothing on paper. How do you strip the page parts that are irrelevant for print?

What we've acquired with the electronic age is an unquenchable thirst for printing. The bigger the online archives the more we print. I think no PDAs or TabletPCs will change this trend.

The Print Media

The CSS standard defines a number of media types. Among them are screen and print. The screen type is intended for computer screens; content styled for this media type is what you see in your browser window. The print type applies to print previews on screen and to actual printing.

The trick here is to define two stylesheets for your site:

  • one for screen
  • one for print

Trimming The Template

It's easier to design a stylesheet for printing if your site follows some templated layout. There are different ways to apply page templates in ASP.NET and there's enough information about it on the web. To start off, you need to hide whatever loses sense on paper, namely: navigation, menus, footers (unless you absolutely want to retain copyrights and disclaimers), etc. You can hide them with a simple CSS declaration.

For example, you have a <div> with navigation at the top of each page. Provided its ID is nav you write the following CSS rule:

#nav {
display: none;
}

This rule takes the navigation bar out of the page flow and hides it. Along the same lines you remove other page elements.

As a personal suggestion: hide banners and ads. For example, if you have several <div> tags with class="adv", remove them the same way:

.adv {
display: none;
}

Most likely you get paid per click, not per print-and-annoy. I can't click your banner or ad. Even if I see a Google Ad hyperlink I still can't click it. Besides, the idea of looking at advertisements forever doesn't excite me.

By now you should have hidden everything you don't want to appear on a printed page.

Change Colors

I suggest you give the content a white background and make the text black. There's no point in keeping color (unless you target color printers for some reason). Besides, most browsers don't print backgrounds by default anyway. Remember to also remove background images and watermarks the same way.

body {
background: white;
color: black;
}

A few words about hyperlinks. When printed their styles won't matter. What difference does it make if you assigned a different color to hovered links? Therefore you can safely drop hovers altogether. There's a neat trick to style hyperlinks even further. We'll get to it in a minute.

Change Units

It's a widely held belief that using points with fonts is a very bad idea when designing for the web. Pixels, ems, percentages beat points on the screen. However, when you design a stylesheet for print pixels become a lot less useful and points make perfect sense. Therefore we express font sizes in points here:

body {
background: white;
color: black;
font-size: 11pt;
}

Still not convinced about the usefulness of points? Eric Meyer provides an excellent explanation of why points fit the bill for print:

Points are real-world measures, like inches or meters. There are 72 points to an inch, which makes 12 points one-sixth of an inch. It's a fairly standard text measure in print, and we're working in print now. Print, being a physical medium, is an excellent place to use physical measures like centimeters, picas, or points. That's how points suddenly become useful and, by implication, how pixels suddenly become a lot less useful.

That's because there's no clearly defined mapping between pixels and the physical world. How many pixels should there be per inch? Some claim it should be 72ppi, but others hold out for 90ppi, 75ppi, or some other number. So when we go to print, which is a physical medium, pixels become a lot less useful than they are on screen.

Source: Eric Meyer on CSS, Project 6 "Styling For Print".

Exercise sound judgment when it comes to font size. I find some sites overzealous: they choose a font size that is too large, so you print a small article and it spreads out over a dozen pages. Try a couple of different sizes before you settle on one.

Adjust Width, Height And Margins

It's good style not to stretch your text across the entire screen because long lines are daunting to read. Most designs go for 760px, which looks just right on 800x600 monitors. On paper a pixel width makes little sense, so I prefer to clear the width on the content and let the default printer margins kick in. If the text is still too wide, add some left and right margins. Using padding is risky since some browsers don't always play by the rules when it comes to padding.

Linking Print Stylesheet

Remember to link the screen stylesheet with the media="screen" attribute:

<link rel="stylesheet" type="text/css" href="screen.css"
media="screen" />

What happens if you omit this attribute? The stylesheet will apply to all (!) media, including print. This may lead to conflicts in rule definitions in the screen and print stylesheets with one taking over the other. Ideally you should have this:

<link rel="stylesheet" type="text/css" href="screen.css"
media="screen"/>
<link rel="stylesheet" type="text/css" href="print.css"
media="print"/>

Styling Hyperlinks

A bit of underlined text signifying a hyperlink is of little use in print. One neat trick CSS-compliant browsers can do is insert generated content. We can insert the URL of each link right after it in square brackets:

a {
color: inherit;
text-decoration: none;
}

a:link, a:visited {
text-decoration: underline;
}

#main a:link:after,
#main a:visited:after {
font-size: 85%;
content: " [" attr(href) "]";
text-decoration: none;
}

#main a[href^="/"]:after {
content: " [http://www.yoursite.com" attr(href) "]";
}

#main a[href^="javascript:"]:after {
content: "";
}

This approach is used extensively at Netscape Devedge. If a hyperlink starts with a forward slash (/), the URL of your site (http://www.yoursite.com) is prepended. If a hyperlink invokes JavaScript code, nothing is appended.

As of the time of this writing, and for the foreseeable future, Internet Explorer 6.0 is not fully CSS compliant and misses some really important bits and pieces of the CSS specification. Generated content is one of them. Therefore IE will ignore all these hyperlink decorations and your users will have to put up with printed hyperlinks that don't even show where they lead. Shame. Try printing in Opera or Mozilla Firefox and links will have their URLs appended.

Check Your Progress

Printing a page at a time to see how your stylesheet is coming together is a major hassle. To debug your print stylesheet you need to switch the two temporarily. You can change the stylesheet intended for screen to something that is not supported, such as tty. At the same time you edit the print stylesheet link as follows:

<link rel="stylesheet" type="text/css" href="screen.css"
media="tty" />

<link rel="stylesheet" type="text/css" href="print.css"
media="screen" />

With this change your print stylesheet will be used to style content on the screen. Instead of printing to verify your progress all you have to do is refresh the page! Remember to change media types back when you're done.
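
If you'd rather not edit the markup each time, the same swap can be done temporarily from script. The following is only a sketch: the previewPrintStyles name is made up for illustration, and it assumes your stylesheets are linked with explicit media="screen" and media="print" attributes as shown earlier.

```javascript
// Hypothetical helper: swaps media types on linked stylesheets so the
// print stylesheet is applied on screen. Not part of any library.
function previewPrintStyles(doc) {
  doc = doc || document; // default to the current page in a browser
  var links = doc.getElementsByTagName("link");
  for (var i = 0; i < links.length; i++) {
    var link = links[i];
    if (!/stylesheet/i.test(link.rel)) {
      continue; // skip icons, feeds and other non-stylesheet links
    }
    if (link.media === "print") {
      link.media = "screen"; // show the print stylesheet on screen
    } else if (link.media === "screen") {
      link.media = "tty"; // park the screen stylesheet on an unused type
    }
  }
}
```

Call previewPrintStyles() from a script console or a bookmarklet. Reloading the page restores the original media types, so there is nothing to change back.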

Are We There Yet?

Did we get it perfect? No. Feel free to add more stylistic tweaks to your print stylesheets. What I described here should suffice for most sites, though. For more ideas see CSS Design: Going to Print at A List Apart.


Also, Derek Featherstone outlines an interesting technique to style any web site out there to your liking. See his article Print It Your Way.

The End Result

In the end, every web page should be printer-friendly. You don't have to devise any special "printer-friendly" pages. The benefits should be clear: (1) simpler code maintenance, since you do not need to maintain separate "printer-friendly" pages, and (2) happier users, since they don't have to wade through meaningless content.

Session_Start or Session_OnStart?

Ever since I started to develop with ASP.NET, I’ve been wondering why global.asa from the ASP days quietly moved over to ASP.NET as global.asax. When you look at it, it just feels so outdated, so VBScript-ish, so loosely typed. And what is the right way to name event handlers?

Even the MSDN documentation is contradictory on this. For example, which is the correct handler for the Session.Start event? Session_Start or Session_OnStart? The documentation lists both in different places, and when you look it up in newsgroups, you’ll read some pretty opinionated arguments for each.

For example, MSDN states:

You can use the Global.asax file to synchronize any event that is exposed by the HttpApplication base class. To do this, you must use the following naming pattern to author methods:

Application_EventName(AppropriateEventArgumentSignature)

According to Framework Design Guidelines, section 5.4.1, Custom event handler design, an “appropriate event argument signature” has a return type of void, a first parameter of type object named sender, and a second parameter of type System.EventArgs (or a subclass) named e.

Nevertheless, you see the following code snippet time and time again:

void Session_OnStart() {
// Session start-up code goes here.
}

void Session_OnEnd() {
// Session clean-up code goes here.
}

First, what happened to an appropriate signature? Second, the Session_End event is so flaky that you shouldn’t bank on it.

Here’s another quote from Handling Public Events:

In any of these classes, look for the public events they define. You can hook them in Global.asax using the syntax ModuleName_OnEventName. For example, the Start event of the SessionStateModule module is declared protected void Session_Start(Object sender, EventArgs e) in Global.asax. You can use the syntax Session_OnStart in a <script runat="server"></script> block in an ASP.NET application file to define an event handler for the Start event of the SessionStateModule.

So do you need that On prefix or not? And should handlers be private, protected or public? No wonder there’s so much confusion over this.

Quiz

How about a quiz? What if you define the following methods in global.asax (yes, all of them):

  • void Session_Start(object sender, EventArgs e)
  • void Session_Start()
  • void Session_OnStart(object sender, EventArgs e)
  • void Session_OnStart()
  • void Session_Start(object sender)

Having these five event handlers, which one(s) will be called? Place your bets.

And the Winner Is…

It turns out all of them, except the last one, will be called, and in the same order as listed! Not that anyone would want to declare more than one, but it demonstrates that the On prefix does not matter. Neither does the access modifier: private, protected or public all work. As long as the handler has no arguments, or has two arguments that resemble a correct signature, it will be called.

Digging Deeper

I got curious why this was taking place and started digging with Reflector. Eventually, I found a method, HookupEventHandlersForAppplicationAndModules, inside the HttpApplication class. I believe this is where event handlers of modules are magically wired.

This method goes through a list of methods that look like event handlers, extracts the part before the underscore (e.g. “Session” in Session_OnStart), and looks up an HttpModule with this name. It then extracts the part after the underscore (taking into account the optional On prefix), and then creates a delegate with the extracted name. Next, the method tries to add this delegate to the right event in the identified module. If an event handler had no parameters, a special ArglessEventHandlerProxy comes to the rescue.

Which Methods Are Picked Up?

The only missing bit is which methods in global.asax look like event handlers. You can see the algorithm in the ReflectOnMethodInfoIfItLooksLikeEventHandler method of HttpApplicationFactory. In a nutshell, this method checks that the return type is void, that the first parameter is of type object, and that the second is of type EventArgs or a derivative thereof. If the signature is satisfactory, the method is passed on as a possible event handler. A method without parameters is a special case: it’s treated as a potential event handler as well.
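
To make the rules concrete, here is a small JavaScript model of the name parsing and the signature check described above. It is a sketch only, not the ASP.NET code: parseHandlerName and looksLikeEventHandler are made-up names, a void return type is taken for granted, parameter types are modelled as plain strings, and treating any type name that ends in EventArgs as a derivative of EventArgs is a simplifying assumption.

```javascript
// Illustrative model only; the real logic lives in HttpApplicationFactory
// and HttpApplication and works on reflected .NET methods.

// Splits "Session_OnStart" into { module: "Session", event: "Start" }.
function parseHandlerName(name) {
  var underscore = name.indexOf("_");
  if (underscore <= 0) {
    return null; // no module part before the underscore
  }
  var eventName = name.substring(underscore + 1);
  if (eventName.indexOf("On") === 0) {
    eventName = eventName.substring(2); // the On prefix is optional
  }
  if (eventName === "") {
    return null;
  }
  return { module: name.substring(0, underscore), event: eventName };
}

// A handler is modelled as { name: "...", params: ["object", "EventArgs"] }.
function looksLikeEventHandler(handler) {
  var p = handler.params;
  if (p.length === 0) {
    return true; // argless handlers are the special case
  }
  return p.length === 2 && p[0] === "object" && /EventArgs$/.test(p[1]);
}
```

Run against the five quiz declarations, looksLikeEventHandler accepts the first four and rejects Session_Start(object sender), and both Session_Start and Session_OnStart parse to the Start event of the Session module.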

Conclusion

I hope at this point it is clear why all variations of Session.Start event handlers in global.asax, except the last one listed, were called. I still don’t understand the reason for all this late binding and the need to drag this file around. At least this article should settle disputes about the “proper” naming of event handlers within global.asax.

The Dark Side of File Uploads

I saw a December MSDN article, entitled Uploading Files in ASP.NET 2.0, and wanted to offer my comments on some gotchas with uploading files. I’ve spent countless hours and tried numerous hacks to tame file uploading and have enough bruises from hitting my head against the wall (figuratively speaking).

ASP.NET 1.x shipped with the HtmlInputFile control, while 2.0 has a brand-new FileUpload control, although its HTML counterpart is still there. As a quick recap, you declare an upload control as follows:

[ASP.NET 1.x]
Select File To Upload to Server:
<input id="MyFile" type="file" runat="server" />

[ASP.NET 2.0]
Select File To Upload to Server:
<asp:FileUpload id="FileUpload1" runat="server" />


Uploading files in ASP.NET is very inefficient. To be fair, IIS is a bigger offender than ASP.NET itself. When you pick a file and submit your form, IIS needs to suck it all in, and only then do you have access to the properties of the uploaded file(s). IIS 5 does it this way. IIS 6 does it this way. IIS 7 promises to be more like Apache in this respect. Until then, there’s not much you can do about the fact that you have to sit through a long upload and wait. Neither can you display a meaningful progress bar because there’s no way to know how much has been transmitted at any given time.

Once IIS buffers your upload, ASP.NET takes it from there. By default, you can upload no more than 4096 KB (4 MB). To raise this limit, you need to adjust maxRequestLength in the <httpRuntime> config section.

The larger the file, the longer it takes to upload. ASP.NET kills requests that take too long; consequently you also need to increase executionTimeout.

In 1.0 and 1.1, the default is 90 seconds; in 2.0 it is 110 seconds.

There’s also a new shutdownTimeout attribute, but I don’t understand its purpose yet.

Files That Are Too Large

It gets really interesting if someone uploads a file that is too large. Regardless of what your maxRequestLength setting mandates, IIS has to guzzle it (remember?), and then ASP.NET checks its size against your size limit. At this point it throws an exception. Peek inside the GetEntireRawContent() method of HttpRequest and you see this:

HttpRuntimeConfig config1 =
(HttpRuntimeConfig) this._context.GetConfig("system.web/httpRuntime");

int num1 = (config1 != null) ? config1.MaxRequestLength : 0x400000;
if (this.ContentLength > num1)
{
this.Response.CloseConnectionAfterError();
throw new HttpException (400,
HttpRuntime.FormatResourceString("Max_request_length_exceeded"));
}

The rest of this method assembles the file piece by piece in case the file was preloaded only in part, and then checks if its size exceeds the imposed limit. In either case, if an end-user uploads an oversized file, he/she will see a timeout “white page of death”. I put together a sample project with a custom error page, but I always get a white page instead.

Since it is theoretically possible that the file isn’t loaded in full, I’d like to know if one can configure IIS to read it in chunks. I haven’t seen any guidance on this, and I’ve never seen articles that explain it. If somebody out there knows, please share.

How Do I Save Face?

It’s difficult to explain to an end-user that it’s not their fault that the file happened to be too large or the page took too long to upload a file and timed out. Is there a way to tap into this process early and save face by failing gracefully? I have a couple of ideas, none of them perfect.


You may override Page.OnError and inspect the HTTP code, which should be 400, if the exception happens to be of type HttpException. This is kludgy.

You may also implement an HttpModule, set up its BeginRequest handler and compare Request.ContentLength with the size limit (which you can read straight from web.config). If ContentLength is too high, redirect to a page with a meaningful error message. I believe ContentLength may or may not reflect the total size of the uploaded file, so this approach is not 100% accurate.

Custom HttpModule to Track Progress

I think the best and most accurate solution would be to implement an HttpModule whose sole purpose would be to read a file in chunks and keep the page alive. This way it won’t time out, and you’ll be able to track progress and cancel an upload. Telerik has such a server control + HttpModule combo for sure. Other vendors should have similar offerings.

Uploading Multiple Files

The samples I’ve seen demonstrate 3–5 file field controls, all statically declared. Why 3? Why 5? What’s the magic number? There’s none. Since none of them show how to add file fields on the fly, I decided to write a sample that does.

When you upload several files from the same page, you can access them all via the Request.Files collection:

HttpFileCollection uploads = HttpContext.Current.Request.Files;

Let’s declare one file field which you can treat as an instance of HtmlInputFile thanks to the runat="server" attribute.

<p id="upload-area">
<input type="file" runat="server" size="60" />
</p>
<p>
<a href="#" onclick="addFileUploadBox(); return false;">Add file</a>
</p>
<p>
<asp:Button ID="btnSubmit" runat="server"
Text="Upload" OnClick="btnSubmit_Click" />
</p>

This snippet has a link which adds a file field on the fly when clicked. addFileUploadBox is a JavaScript function that performs some DOM manipulation. I noticed that as long as you have at least one HtmlInputFile or FileUpload control on the page, you can add as many other file fields as you want, and ASP.NET will nicely package them into the Request.Files collection. Go figure.

By clicking the Add file link you add multiple file fields, assign their id and name attributes (otherwise the corresponding files won’t be thrown into Request.Files), and add them to the upload area.
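
The addFileUploadBox function itself is not listed here, so the following is only a sketch of what it might look like. It assumes the paragraph with id="upload-area" from the markup above; the extraFile naming scheme and the optional doc parameter are inventions for illustration.

```javascript
// Hypothetical sketch of addFileUploadBox; not the original implementation.
var uploadBoxCount = 0; // used to build unique id and name values

function addFileUploadBox(doc) {
  doc = doc || document; // default to the current page in a browser
  var area = doc.getElementById("upload-area");
  if (!area) {
    return null; // nothing to attach the new field to
  }

  uploadBoxCount += 1;
  var input = doc.createElement("input");
  input.type = "file";
  input.size = 60;
  // A field without a name is not submitted with the form,
  // so its file would never reach Request.Files.
  input.id = "extraFile" + uploadBoxCount;
  input.name = "extraFile" + uploadBoxCount;

  area.appendChild(doc.createElement("br")); // put each field on its own line
  area.appendChild(input);
  return input;
}
```

The Add file link calls addFileUploadBox() with no arguments, so the function works against the current document; the parameter only exists to make the function easy to exercise outside a browser.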

All this with only a single server-side upload control! The server-side code on the bottom shows how to process all uploaded files. Remember to check for zero file size in case someone added an upload box but didn’t pick a file. Feel free to copy and paste sample code and play with it.

Conclusion

File uploading in ASP.NET is a very imprecise and imperfect science. This is one area I’d love to see improved in the future. If you find yourself struggling with it, don’t worry—you’re in good company. Stick with stock server controls for rudimentary uploads. Otherwise look around for third-party products.

Friday, January 12, 2007

Anti-Cross Site Scripting Library

Cross-site scripting (XSS) attacks exploit vulnerabilities in Web-based applications that fail to properly validate and/or encode input that is embedded in response data. Malicious users can then inject client-side script into response data, causing the unsuspecting user's browser to execute the script code. The script code will appear to have originated from a trusted site and may be able to bypass browser protection mechanisms such as security zones.

These attacks are platform and browser independent, and can allow malicious users to perform malicious actions such as gaining unauthorized access to client data like cookies or hijacking sessions entirely.

Simple steps that developers can take to prevent XSS attacks in their ASP.NET applications include:

  • Validating and constraining input
  • Encoding output

For defence in depth, developers may wish to use the Microsoft Anti-Cross Site Scripting Library to encode output. This library differs from most encoding libraries in that it uses the "principle of inclusions" technique to provide protection against XSS attacks. This approach works by first defining a valid or allowable set of characters, and then encoding anything outside this set (invalid characters or potential attacks). The principle of inclusions approach provides a high degree of protection against XSS attacks and is suitable for Web applications with high security requirements.
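
The principle of inclusions can be illustrated with a few lines of JavaScript. This is a sketch of the idea only, not the library's implementation or API: the encodeByInclusion name, the particular set of allowed characters, and the numeric entity output format are all assumptions made for the example.

```javascript
// Illustrative allow-list encoder; the safe set below is an assumption
// for this sketch, not the set used by the Microsoft library.
var SAFE_CHARS = /^[A-Za-z0-9 .,\-_]$/;

function encodeByInclusion(input) {
  var out = "";
  // Array.from iterates by code point, so surrogate pairs stay together.
  Array.from(String(input)).forEach(function (ch) {
    // Anything not explicitly allowed becomes a numeric character reference.
    out += SAFE_CHARS.test(ch) ? ch : "&#" + ch.codePointAt(0) + ";";
  });
  return out;
}
```

For example, encodeByInclusion("<b>") returns "&#60;b&#62;", while "Hello, world." passes through unchanged because every character is in the allowed set. The point of the design is that a character nobody thought about is encoded by default instead of slipping through.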


Frequently Asked Questions

Q. I am currently using the .NET Framework System.Web.HttpUtility.HtmlEncode and other encoding methods in this class to encode output. Does the Microsoft Anti-Cross Site Scripting Library address a vulnerability in these methods? Are the encoding methods provided in the .NET Framework safe to use?

A. The encoding methods native to the .NET Framework are safe to use and no vulnerability is being addressed by this release of the Microsoft Anti-Cross Site Scripting Library. The Microsoft Anti-Cross Site Scripting Library differs from these methods in that it uses the principle of inclusions technique, which first defines a set of valid characters so that anything outside that set is automatically encoded.

Q. If the encoding methods in the .NET Framework are safe to use, why would I use the methods in the Anti-Cross Site Scripting Library instead?

A. The Anti-Cross Site Scripting Library uses the principle of inclusions technique to provide protection against XSS attacks that some regard as industry best practice. Both this library and the .NET Framework encoding methods are safe to use and provide good protection against XSS attacks. The Anti-Cross Site Scripting Library now provides you with the option to use an encoding library that follows the principle of inclusions school of thought.

Q. The ASP.NET server controls (like TextBox, BulletedList, and so on) use the existing encoding methods in the .NET Framework. Why should I use the methods in the Anti-Cross Site Scripting Library when my server controls use the methods from the .NET Framework? Is there any way to force the server controls to use the methods from the Anti-Cross Site Scripting Library?

A. There currently is no way to force existing server controls to use the Anti-Cross Site Scripting Library. ASP.NET server controls that encode using methods from the Anti-Cross Site Scripting Library will be provided in future releases of this library.

Q. Are there any additional resources I can read to learn how to protect my Web applications against XSS attacks?

A. Yes, please refer to the following resources from the patterns & practices teams:


Download
Microsoft Anti-Cross Site Scripting Library V1.5 Download
Tutorial
How to Use the Microsoft Anti-Cross Site Scripting Library V1.5 to Protect the Contoso Bookmark Page