Friday 30 September 2011

HTML5 - Offline Web Applications

As I'm sure some of you are aware, one of the more highly anticipated features of the HTML5 spec is the ability to make websites available offline. This is becoming more and more useful with the explosion of the mobile/tablet market where internet connectivity may just not be available.

I've now got a bit of experience in dealing with this part of the spec so, I thought I'd share a few things with you. For the most part, making your site available offline is pretty simple, but before we start, let me make one thing clear, mainly because this caught me out a bit...

HTML5 offline web applications only truly work with static content.

When you think about it, this makes perfect sense. Usually dynamic content will require some sort of connection to a server and if you're offline then this isn't possible but, what caught me out is that even if there is a connection to a server (i.e. you do have your internet connection), then it's still not possible to update your content, well, not easily anyway.

So, why is this? Essentially, offline support works by the developer specifying what files should be loaded into a cache (the application cache, more about this later). Your users will then hit the site the first time, download all the files asked of them and the files specified by the web developer will be put into the browsers application cache. From this point on, every time the user visits that website, their web browser will check it's application cache for each and every file required by the website, if it finds that file within the application cache then it'll load it from there, if not, it'll go fetch it from the web server. So, if you're offline and the files required are in the browsers application cache then they'll be loaded from there, the web server will never be hit and there you have it, your website is available offline. However, this process happens regardless of whether you're offline or not. This causes problems for dynamic content, take this situation for example:
  1. User A goes to a website, and the file that contains the latest news story is put into the users application cache. 
  2. User A re-visits that site a few minutes later, the latest news story is loaded from the application cache but as the latest news story hasn't changed, everything looks fine.
  3. User A visits the site a week later. The latest news story is loaded from the application cache, the web server still isn't hit. Now, the latest news story is thoroughly out of date, your user is effectively looking at a snapshot of your website which was taken the first time they visited. This obviously isn't what you wanted.
There are ways to force the application cache to refresh (again, more about that in a bit) but, it's not straightforward and requires the user to visit the website twice so is less than ideal so, only use this for content that will very rarely change.

Ok, now I've got that warning out of the way, let's go into detail about how to actually implement this.

The whole HTML5 offline support revolves around getting files into the browsers application cache. To do this, you need to create a manifest file. What's a manifest file? Essentially, it's just a normal text file that has a specific format which will define which files to go in the application cache and which should be fetched from the web server (if available). A few details about the manifest file:
  • This file is defined within the <html> tag of your web page, so, for example:


<html manifest="/cache.manifest">
<head>
...
</head>
<body>
...
</body>
</html>


  • The file must be served with a content type of text/cache-manifest. How you do this depends on what web server you're running. Personally, when using ASP.NET, I set up a new HTTP Handler to handle .manifest files and set the ContentType on the Response object to be text/cache-manifest.
  • The first line of a manifest file must be CACHE MANIFEST
  • There are three different sections to manifest file:
    • CACHE - This section defines files that will be added to the browsers application cache and therefore, will be available offline.
    • NETWORK - This section defines files that will ALWAYS be loaded from the web server. If no network connection is available then these will error.
    • FALLBACK - If a resource can't be cached for whatever reason then this specifies the resource to use instead.
Let's see an example of a valid manifest file now:


CACHE MANIFEST
CACHE:
/picture.jpg
/mystyle.css

NETWORK:
*


So, what's going on here? Well, the files, picture.jpg and mystyle.css are both added to the application cache (note, that the HTML page you're currently viewing is by default, added to the cache). Under the network section there's a * symbol. This is a special wildcard symbol which effectively says "whatever isn't cached, go and fetch from the web server".
And that's it, you've now got an offline web application.

But.... when are things ever that simple to develop? There's a few more things you should know about developing offline web applications. I'm going to put to you a couple of scenarios and offer a solution to each:

Scenario 1: You've added a new file to your website and need it to be added to the application cache. How do you go about doing this?

Well, logic suggests you'd update your manifest file to include your new file and hey presto, it should be added. Well, you're half right. The problem is, with all HTTP requests, browsers will try and cache the files they retrieve, this is no different for manifest files. So, you'll update your manifest file but, the user won't ever retrieve the new manifest file due to the fact that the browser has cached the old version.

To solve this, I made sure that the manifest file is never cached by the browser and as I use an HTTP Handler to deliver the manifest file, that's easily accomplished by using something like this:

context.Response.Cache.SetCacheability(HttpCacheability.Public); 
context.Response.Cache.SetExpires(DateTime.MinValue);

Scenario 2: The content of one of the cached files has changed. How do I force the user to re-download the new file?

A web browser will only re-fetch cached files when it detects a change with the manifest file. In this particular case, there is no change with the manifest file so how do you get around this? I simply use comments within the manifest file. So, taking our previous example:


CACHE MANIFEST
#Version 1
CACHE:
/picture.jpg
/mystyle.css

NETWORK:
*


You'll see I've added a version comment. Now, when the content of one of the cached files changes, I increment the version comment and hey presto. The browser will detect the change and will re-fetch all the files to be cached. Be warned, you'll still have the problem of scenario 1 though!

And finally...
Just a few more things to bare in mind while you're developing:
  1. If for some reason, one of the files you wish to cache cannot be downloaded then the whole caching process fails. This can be a bit of a pain when you're trying to track down problems.
  2. There are JavaScript events you can hook in to, to see what's going on. There's an actual applicationCache object on the window object that exposes useful methods and events. (see here for more details and examples).
  3. To maximize the benefits of offline support, you could use local data storage to store data that could then be used offline and/or uploaded to a server when an internet connection is available. See the following for more information: Dive Into HTML5 - Storage for more information.
  4. While developing, I suggest you use Google Chrome as your browser. It provides some very useful tools that a developer can utilize for offline web application development, here's a couple I found particularly useful:
    1. If you hit F12 to bring up the developer tools then, go to the Resources tab, at the bottom there's an Application Cache option. This will list all the files currently stored in the application cache for the site you're currently viewing. It should help you track down problems when downloading particular files for the application cache. (If they're not listed then something's gone wrong!).
    2. Within the address bar, if you type: chrome://appcache-internals then Chrome will list all the applications it has stored within it's application cache. It then gives you the very handy option of deleting it meaning you can be assured that the next time you visit the site, new content will be fetched from the web server.
I've covered a fair amount here, but, if you want further resources, I've found that the Dive Into HTML5 website to be a great resource for all things HTML5-esque. For their article on Offline Web Applications, try here.

And that's it from me for the time being.
Good luck!