Wednesday 2 May 2012

HTML5 - Web Workers

It's been a while since my last blog post, but here's the next chapter of my HTML5 overview: Web Workers.

So, what are Web Workers? Let's start off with some background information about JavaScript. JavaScript was originally developed by Netscape back in 1995. Its primary use was to allow developers to manipulate web pages, which, as you can imagine, were very basic back in 1995. With that job in mind, JavaScript was designed as a single-threaded language. Unlike its namesake Java (which is a completely unrelated language, by the way) and many other languages, JavaScript does not support threads. The reason for this, I imagine, was very simple. How would you go about designing a multi-threaded language whose primary aim was to modify something (the Document Object Model (DOM)) that was shared between threads, without incurring race conditions and deadlock problems? It's a hard problem, and browsers still don't let multiple threads touch the DOM. And so JavaScript runs on one single thread.

One Single Thread - A one-trick pony?

Is it really a bad thing? I suppose it can be argued that in itself it's not. The design decision not to support threading in JavaScript was a good one. It makes the language simpler to learn, it avoids some potentially horrific problems and, as the web has thrived over the past two decades, so has JavaScript. It can't be that bad, right? Well, yes and no. When everything runs on one thread, it can lead to a very poor user experience. The User Interface (UI) can become non-responsive if not programmed correctly. To work around this, developers lean on two functions that browsers provide to JavaScript: setTimeout and setInterval. These allow a piece of code to run after a pre-defined amount of time (or, with setInterval, repeatedly at a fixed interval). The idea is that you can schedule long-running code to run when the UI isn't busy and the thread is free, essentially "hiding" the fact that JavaScript all runs on a single thread. These little hacks have allowed developers to get pretty inventive and have allowed JavaScript to flourish.
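
To make the setTimeout trick a bit more concrete, here's a minimal sketch of splitting a long job into small chunks so the thread is handed back between them. The names processInChunks, chunkSize, handleItem and onDone are made up for illustration; they aren't part of any browser API.

```javascript
// Process `items` one chunk at a time, giving the thread back between chunks.
function processInChunks(items, chunkSize, handleItem, onDone) {
  var index = 0;

  function runChunk() {
    var end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index]);
    }
    if (index < items.length) {
      // Queue the next chunk; pending UI events get a chance to run first.
      setTimeout(runChunk, 0);
    } else {
      onDone();
    }
  }

  // The first chunk runs straight away, the rest are scheduled.
  runChunk();
}
```

The smaller the chunk, the more often the UI gets a look in, but the job as a whole still runs on the one thread.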

Ok, all is good then. What's the problem?

As I said, these are basically "hacks". What happens when the user starts clicking but you've already started to execute a long-running piece of code? You have a problem! The system will not be able to respond to the user's action until the code has completed its execution. And after all, some code, especially data-centric code, just takes a long time to run. When this occurs, the browser will eventually pop up an "unresponsive script" warning.


There's not a whole lot you can do about that. If you do have code that'll take a long time to run then you're a little stuck.

So, where do Web Workers come in?

Simple. Web workers bring multi-threading to JavaScript. They come with a few restrictions though, and one is quite a biggy: web workers cannot access the DOM. Allowing multiple threads access to a non-thread-safe resource (the DOM) would cause all sorts of problems, so the same design decision was made as in 1995. What they do allow you to do is process data in a separate thread from the UI and return the result, so the days of seeing those pesky "unresponsive script" errors should be behind you!

Multi-threading eh? Woo! Where do I start?

Well, first you need to make sure you're using a web browser that actually supports web workers. To find that out you can simply visit caniuse.com and look it up. I should mention here that if you're using Chrome and the JavaScript file you're testing is stored locally and isn't running on a web server such as IIS, then you need to enable a flag on Chrome for everything to work. Simply start up Chrome with this command: chrome.exe --allow-file-access-from-files. This problem does not exist with Firefox. For more information, check out this Stack Overflow post.
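
If you'd rather check for support in code, a simple feature test is one option. This is only a sketch: supportsWebWorkers is a made-up helper name, and it takes the global object as a parameter so it can be tried out anywhere.

```javascript
// Returns true if the given global scope exposes a Worker constructor.
function supportsWebWorkers(scope) {
  return typeof scope.Worker !== 'undefined';
}

// In a browser you would pass `window`:
// if (supportsWebWorkers(window)) {
//   // safe to call new Worker(...)
// } else {
//   // fall back to running the code on the UI thread
// }
```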

Now that you are using a web worker enabled browser, you need to define your web worker. As the worker is in an entirely different thread, it has no access to loaded scripts, so you need to tell the worker which script to load. To do this we can use the following line:

var workerOne = new Worker('worker.js');


where worker.js is the name of your script.

Web workers communicate with the main UI thread in the form of messages. When a message is sent to a web worker, it causes the message event to fire within the thread. To hook in to this, your worker.js file needs to have the following content:


self.addEventListener('message', function(e) {
  var message = e.data;
  // Do something with the message
  self.postMessage(message.sort());
}, false);


To give you a quick overview of what's happening here, when the web worker is sent a message, the message event will be fired and the inner function defined above will run. It'll get the sent message by fetching it from the event object. Then, in a useful scenario some action would be performed based on that message. You'd then post a message back to the caller (usually the main UI thread). This could just be to notify the thread that it's completed or if you've done some data manipulation, you could post back the modified data. In the above example, the message is sorted and sent straight back.
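
One thing worth knowing if the data you send is numeric: calling sort() with no arguments compares the elements as strings, which only happens to give numeric order when every number has the same number of digits. The sketch below shows the difference; byNumber is just an illustrative name for a comparator you could pass to sort() inside the worker.

```javascript
// With no comparator, elements are compared as strings.
var asStrings = [10, 9, 1, 100].sort();
// asStrings is [1, 10, 100, 9]

// A comparator that subtracts gives numeric order.
function byNumber(a, b) {
  return a - b;
}
var asNumbers = [10, 9, 1, 100].sort(byNumber);
// asNumbers is [1, 9, 10, 100]
```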

So, that's the web worker defined. How do you now post messages to that worker thread so you can use it effectively? Well, you defined your worker object earlier, so you just need to:
a) Define what happens when the UI thread receives a message from the web worker; and
b) Send a message to the web worker, which will start the whole process.

In much the same way that you need to hook into the message event within the web worker, you also need to hook into the message event on the web worker object itself, within the UI thread. Something like the following should do the job:


workerOne.addEventListener('message', function(e) {
  var numbersOne = e.data;
  // Do something with this data
}, false);


This will fire when a message is posted from the web worker to the UI thread. In the previous example, e.data will now contain your sorted data!

Ok, now all you need to do is send your original data to the web worker for processing. You use the same method as when you posted the message from the web worker to the UI thread but this time you perform it on the worker object within the UI thread, so you'll have something like this:


workerOne.postMessage([1,4,2,7,9,2,4,7,6,9,4]);


Now you have a working demo. The array of integers (1,4,2,7,9,2,4,7,6,9,4) is sent to the web worker. The web worker starts up in its own thread, picks that message up, sorts it and then sends the data back to the UI thread. The UI thread now has a sorted array of data, but it hasn't actually done any processing to get that information. The UI thread has been left free, so to the user the system seems responsive. Ok, in this particular example with 11 integers there isn't going to be much of a difference, but when you're playing with millions of objects, this can have a significant impact.
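
If you find yourself repeating the listen-then-post dance, it can be wrapped up in a small helper. This is just a sketch of one way to do it; sortInWorker is a made-up name, and it assumes the worker replies with exactly one message per request.

```javascript
// Sends `data` to `worker` and calls `callback` once with the reply.
function sortInWorker(worker, data, callback) {
  function onMessage(e) {
    // Stop listening so a later reply doesn't trigger this callback again.
    worker.removeEventListener('message', onMessage, false);
    callback(e.data);
  }
  worker.addEventListener('message', onMessage, false);
  worker.postMessage(data);
}

// Usage in a browser, assuming worker.js sorts whatever it receives:
// var workerOne = new Worker('worker.js');
// sortInWorker(workerOne, [1,4,2,7,9,2,4,7,6,9,4], function (sorted) {
//   console.log(sorted);
// });
```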

Performance

While I was looking at this, I wondered if I could use Web Workers to get some significant performance gains, especially in terms of data processing. If web workers work like standard threads then this should be fairly straightforward to test.

Here's my very simple test case:
How quickly can I sort three arrays containing two million integers each?

I'm going to test in three ways:
  1. Use standard JavaScript. Sort each array, one after the other, and time how long it takes.
  2. Use a single web worker. The sorting of all of the arrays will occur in one web worker.
  3. Use a web worker for each array sort.
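
To give a feel for the first test case, here's a rough sketch of how a sequential baseline could be timed. This is not the code from the downloadable files: the helper names, the random data and the numeric comparator are all assumptions made for illustration.

```javascript
// Builds an array of `length` random integers.
function makeRandomArray(length) {
  var values = [];
  for (var i = 0; i < length; i++) {
    values.push(Math.floor(Math.random() * length));
  }
  return values;
}

// Sorts each array in turn and returns the elapsed time in milliseconds.
function timeSequentialSort(arrays) {
  var start = Date.now();
  for (var i = 0; i < arrays.length; i++) {
    arrays[i].sort(function (a, b) { return a - b; });
  }
  return Date.now() - start;
}

// Small arrays keep this quick to try; the test above used two million integers each.
var arrays = [makeRandomArray(1000), makeRandomArray(1000), makeRandomArray(1000)];
var elapsedMs = timeSequentialSort(arrays);
```

Because everything here runs on the UI thread, a page running it with millions of integers would be frozen until elapsedMs is known.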

With what I knew about threads and web workers, I thought I'd find the following...
- The first and second test case would be comparatively similar in terms of time taken.
- The first test case would freeze the web browser until all data had been sorted. The other methods would not.
- The third test case would be the fastest, with all three sorts running in parallel. In theory, the third test case should take roughly 66% less time than the first test case.
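
For the third test case, the fiddly part is knowing when every worker has finished. Here's a minimal sketch of one way to handle that, assuming each worker replies exactly once; sortOnWorkers is a made-up name and this isn't the code from the downloadable files.

```javascript
// Posts arrays[i] to workers[i] and calls done(results) once every worker has replied.
function sortOnWorkers(workers, arrays, done) {
  var results = [];
  var remaining = workers.length;

  workers.forEach(function (worker, i) {
    worker.addEventListener('message', function (e) {
      results[i] = e.data; // keep replies in the same order as the inputs
      remaining--;
      if (remaining === 0) {
        done(results);
      }
    }, false);
    worker.postMessage(arrays[i]);
  });
}

// Usage in a browser, assuming worker.js sorts whatever it receives:
// var workers = [new Worker('worker.js'), new Worker('worker.js'), new Worker('worker.js')];
// sortOnWorkers(workers, [arrayOne, arrayTwo, arrayThree], function (sorted) {
//   // sorted[0], sorted[1] and sorted[2] are the sorted arrays
// });
```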

Each test case was repeated 10 times and an average time was taken. Here are the results:

Test Case One: 11.24 seconds
Test Case Two: 13.75 seconds
Test Case Three: 7.21 seconds
(If you wish to actually repeat the demo yourself, you can pick up the files from here)

Interesting! Ok, I wasn't quite right about Test Case Three being 66% quicker, but it is around 36% quicker, which isn't too bad. What is interesting is that test case two is about 2.5 seconds slower than test case one. Just opening a new web worker and sending/receiving the massive arrays adds an extra 2.5 seconds to the processing time; that's roughly a 22% increase. That seems rather high to me, but it's good to know at least.
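
For anyone who wants to check the percentages, here's the arithmetic worked through from the averages above. percentChange is just an illustrative helper name.

```javascript
// Percentage change from `before` to `after`, rounded to one decimal place.
function percentChange(before, after) {
  return Math.round(((after - before) / before) * 1000) / 10;
}

var caseThreeVsOne = percentChange(11.24, 7.21);  // -35.9, i.e. about 36% less time
var caseTwoVsOne = percentChange(11.24, 13.75);   // 22.3, i.e. about 22% more time
```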

It's around about this time that I should mention just how the UI thread and worker threads post messages to each other, as it can have an impact upon performance. You're transferring data across threads, so you can't just pass a variable by reference. Instead, a full copy of the variable is made. How this occurs depends on the browser and on what you're passing. Some early implementations did it by serializing the data into JSON, sending it to the worker thread and de-serializing it at the other end. Browsers that follow the current specification copy the value using an algorithm called structured cloning, which copes with plain objects and arrays as well as more complex types such as File or Blob. Either way the contents of the variable are copied, which, for a variable containing megabytes' worth of data, can be slow. There is, however, another way! Chrome now supports a concept called "transferable objects". This allows you to transfer ownership of an object (an ArrayBuffer, for example) from one thread to another without copying it, which is significantly faster. There is one downside to this: once you've transferred the object, you can't then use it in the thread you transferred it from. It can only be accessed by the thread that has ownership. For more information on this, check out this page on HTML5 Rocks.
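
Here's a small sketch of the difference between copying and transferring. To keep it runnable without a worker, it uses the global structuredClone function as a stand-in for postMessage; that function applies the same structured clone algorithm, but it's only available in newer browsers and in Node.js 17 and later, so treat its presence as an assumption.

```javascript
// A four-byte buffer to send.
var original = new Uint8Array([1, 2, 3, 4]).buffer;

// Structured clone: the receiver gets an independent copy.
var copy = structuredClone(original);
var sizeAfterCopy = original.byteLength;      // still 4, the sender keeps its data

// Transfer: ownership moves and the sender's buffer is emptied.
var moved = structuredClone(original, { transfer: [original] });
var sizeAfterTransfer = original.byteLength;  // now 0, the sender can no longer use it

// With a real worker, the transfer would look like this:
// workerOne.postMessage(original, [original]);
```

The second argument to postMessage is the list of objects to hand over rather than copy.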

Ok, now I've got that covered, just out of interest, I thought I'd run the same tests as before, but this time on already sorted data instead of unsorted data, making the sort function significantly faster (as it won't do anything meaningful). I was expecting to find the same sort of patterns as above, just with smaller numbers. Here are the actual results:

Test Case One: 1.91 seconds
Test Case Two: 4.10 seconds
Test Case Three: 3.22 seconds

Two interesting things are highlighted here:
  1. Test Case Two is slower than Test Case Three. Why? I haven't managed to find an answer to that yet. I can only assume that the overhead of sending all three arrays at once, which I wrap up into one object, performs badly when using the structured cloning algorithm to post messages to the worker thread.
  2. Test Case One is the fastest. This case doesn't use any fancy web workers; it's just plain old JavaScript executing each sort function one after another. So, by adding web workers, we've actually slowed down the data processing, which is the exact opposite of what we were trying to achieve. The reason for this is that the overhead of creating a web worker and communicating with it outweighs the benefit we get from running the data processing in parallel.

Eh? This makes things slower, not faster! What a waste of time!

Well, no. First, slower or not, the UI thread stays responsive when using web workers so, to your user, the system will seem faster than with the traditional method. Secondly, although using web workers performed worse than the traditional approach in the last test, that won't be the case in all scenarios, as shown by the first experiment. If the overhead of creating a web worker and passing messages to and from it outweighs the time saved by performing calculations in parallel on different web workers, then, yes, the overall performance will be worse. But if you're performing a lot of data manipulation on a great many records, then you should see a big performance gain. As always though, it's best to see how it would perform with your actual data (or something similar). Only then will you be able to gauge just how much quicker Web Workers will make your web application. They are, however, a tool that you should definitely be aware of as we approach the oncoming HTML5 world!

Finally, if you want to follow this blog post up with further reading about HTML5 Web Workers, the best tutorial I found was posted on the Mozilla website, here.

Enjoy!
