Clements Code: May 2012

Tuesday, 15 May 2012

IE, JavaScript and the Story of the Weeping Angels

I came across a very odd problem the other day in the way in which Internet Explorer handles DOM items with an ID.

Take the following piece of HTML as an example:

<div id="testElement"></div>

You can't get much simpler than that. Now say you want to access testElement and change its width. You'd probably do that with the following piece of JavaScript:

document.getElementById('testElement').style.width = '200px';

All very straightforward so far. There is another way of doing this though, one which isn't recommended but is supported by all the major browsers. You can simply write:

testElement.style.width = '200px';

If an element in your HTML has an ID, the browser will automatically put it in the window scope so you can access it directly. No need for document.getElementById. Cool eh?

Well, it turns out Internet Explorer supports this little feature in a bit of an odd way. Take the following HTML page:

<html>
<head><title></title>
<script language="JavaScript" type="text/javascript">
function TestObject() { this.id = 'TestObject'; }
</script>
</head>
<body>
<div id="testElement"></div>
<script language="JavaScript" type="text/javascript">
// alert(testElement.id); // We'll uncomment this line a bit later.
window.testElement = new TestObject();
alert(testElement.id);
alert(window.testElement.id);
</script>
</body>
</html>

What you've done here is create a DOM element with an id of testElement, so the browser should have created a window.testElement variable that gives you that DOM element when accessed. You've then explicitly set the testElement variable to a new TestObject. So, in theory, when the first and second alerts are shown, the testElement variable should be pointing at our TestObject, whose id is 'TestObject'. Both alert boxes should display 'TestObject'.

When you run the above, that's exactly what happens. No big surprise there.

Ok, now uncomment the commented line. What I'd expect here is that the first alert box displays "testElement", as that's the id of the DOM element. You then assign the TestObject to testElement so, when the second and third alert boxes are shown, you'd expect to see "TestObject".

When you run the above, the first alert box displays 'testElement'. Good so far. The second alert box displays 'testElement'. Eh? That's surely wrong. The third alert box displays 'TestObject'. What? How can window.testElement and testElement be pointing at different things? They're the same variable! Comment the line again and everything goes back to normal. How can this be?!

Weeping Angels!

The Doctor Who fans among you will know what I mean by Weeping Angels. For everyone else, a Weeping Angel is a creature that turns to stone whenever it's looked at. When nobody is watching, it goes about its usual business. It's a good analogy for this behaviour because, after a bit of experimenting, I found that the moment you look at (read) the testElement variable, the browser points the variable at the DOM element and makes it read-only. That means that simply referencing the variable anywhere changes what your code actually does. Even placing a watch on the variable while debugging has the same effect. In my book, these kinds of variables are about as ugly as a Weeping Angel.

I should say, only Internet Explorer (I tested on IE9) seems to handle DOM variables like this. The above code behaves exactly as you'd expect in both Chrome and Firefox.

So, how do you avoid this? As most JavaScript programmers know, programming in the global (window) scope is bad practice for a variety of reasons. The main one is that it can easily lead to naming conflicts, especially if you're using third-party libraries. This problem reaffirms that. It is a naming conflict, just not in the traditional sense, as the browser is doing some of the work for you. Anyway, if you avoid programming in the global scope, you won't come across this problem. Unfortunately, it's sometimes unavoidable, especially if the problem is actually caused by a third-party library, as it was in my case. In those cases, as you saw above, referencing the variable as window.variableName seems to always give you your object rather than the DOM element, which should hopefully give you the behaviour you want.
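To illustrate the "avoid the global scope" advice, here's a minimal sketch of the common module pattern. Everything lives inside a function, and only one name is added to the global scope. MyApp and getTestElementId are hypothetical names for this example. Pick a namespace name that no element id on the page uses. This is a general technique, not the fix for the third-party library I ran into.

```javascript
// Only one global name, MyApp, is created. MyApp and getTestElementId are
// hypothetical names used for illustration.
var MyApp = (function () {
  // Local to this function: no global called testElement is created, so it
  // can't collide with an element whose id is "testElement".
  var testElement = { id: 'TestObject' };

  function getTestElementId() {
    return testElement.id;
  }

  // Expose only what other code needs.
  return {
    getTestElementId: getTestElementId
  };
})();

// MyApp.getTestElementId() returns 'TestObject'.
```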

Enjoy!

Tuesday, 8 May 2012

C# - Method Overload Resolution

Even the most basic concepts catch you out from time to time. I was working the other day and came across a problem where what was happening didn't immediately make sense. So, I thought I'd post it here as a reminder that even the basics of programming can leave you a little confused.

So, here's an overview of the problem. I had a piece of code that let me run queries against a database. To add parameters to a query I had to write something like:

cursor.AddParameter("@ParameterName", "value");

All very straightforward so far. The problem arose because that code can be run against either an Oracle database or a SQL Server database, and these databases handle empty strings differently. In Oracle an empty string is treated exactly the same as a null value. In SQL Server an empty string is an empty string.

Anyway, I found a piece of code that read:

cursor.AddParameter("@ParameterName", string.Empty);

In this instance, the developer had tested this on Oracle and didn't actually mean string.Empty; they meant null. On Oracle the code ran perfectly. The application was then hooked up to a SQL Server database and the query brought back the wrong records (actually, it didn't bring back any records at all). This was due to the difference between Oracle and SQL Server described above.

So, an easy fix then. Change string.Empty to null and we're done. After all, that's what the original developer meant in the first place. Both Oracle and SQL Server will handle it in the same manner and we're good to go. Or so I thought.

Here's the interface definition of ICursor (a stripped-down version, at least), which is the type our cursor variable in the above example is declared as:

public interface ICursor
{
void AddParameter(string name, object value);

void AddParameter(string name, Type type);
}

Who can see the problem?

The rules for resolving method overloads state that the overload with the most specific parameter types should be used. This makes perfect sense. Suppose you had two methods defined, one that accepts an object and another that accepts a string. If you passed in a string, you'd expect the string version to be used.

However, null is a little special. It can match any reference type. Is the problem becoming more apparent now?

When we put string.Empty in as the second parameter, the string matches the first method, where the second parameter is an object. A string isn't a Type, so the second method never comes into it. However, when we change the call to this:

cursor.AddParameter("@ParameterName", null);

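the second method is now chosen. The call does match the first method, because null is a valid value for an object parameter. It also matches the second method, because Type is a reference type. Type derives from object, so there's an implicit conversion from Type to object but not the other way round. That makes the Type overload the more specific match, and so that method is invoked.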

Unfortunately, that second method does something entirely different and so my parameters weren't being mapped in my SQL query correctly and the application was falling over.

If I change the call so that I explicitly cast null to the less specific type, object, as below, the first overload is chosen and we have our solution.

cursor.AddParameter("@ParameterName", (object)null);

So, just a friendly reminder that even the fundamentals can catch you out from time to time!

As always, if you want to actually see this in action, I've knocked up a little demo solution which can be found here.

Have fun and happy coding!

Wednesday, 2 May 2012

HTML5 - Web Workers

It's been a while since my last blog post, but here's the next chapter of my HTML5 overview: Web Workers.

So, what are Web Workers? Let's start off with some background information about JavaScript. JavaScript was originally developed by Netscape back in 1995. Its primary use was to allow developers to manipulate web pages, which, as you can imagine, were very basic back in 1995. In order to do this JavaScript was designed as a single-threaded language. Unlike its namesake Java (which is a completely unrelated language by the way) and many other languages, JavaScript does not support threads. The reason for this, I imagine, was very simple. How would you go about designing a multi-threaded language whose primary aim was to modify something (the Document Object Model (DOM)) that was shared between threads, without incurring deadlock problems? This problem remains unsolved. And so JavaScript runs on one single thread.

One Single Thread - A one-trick pony?

Is it really a bad thing? I suppose it can be argued that in itself it's not. The design decision not to support threading in JavaScript was a good one. It makes the language simpler to learn and it avoids some potentially horrific problems. And as the web has thrived over the past two decades, so has JavaScript. It can't be that bad, right? Well, yes and no. When everything runs on one thread it can lead to a very poor user experience, because the User Interface (UI) can become unresponsive if not programmed carefully. To address this, two functions are available to JavaScript in the browser. setTimeout runs a piece of code once after a given delay, and setInterval runs it repeatedly at a given interval. The idea is that you can schedule long-running code to run when the UI isn't busy and the thread is free, essentially "hiding" the fact that all JavaScript runs on a single thread. These little hacks have allowed developers to get pretty inventive and have helped JavaScript flourish.
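As a rough illustration of that technique, here's a minimal sketch that splits a long loop into chunks. It uses setTimeout to hand control back to the browser between chunks. processInChunks and its parameters are hypothetical names, not a built-in API. It doesn't make the work any faster. It just gives the UI a chance to respond in between.

```javascript
// Process a large array a chunk at a time, yielding to the event loop between
// chunks with setTimeout so the browser can repaint and handle user input.
// processInChunks is a hypothetical helper; chunkSize must be at least 1.
function processInChunks(items, chunkSize, processItem, onDone) {
  var index = 0;
  var results = [];

  function doChunk() {
    var end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      results.push(processItem(items[index]));
    }
    if (index < items.length) {
      // More to do: schedule the next chunk instead of looping straight on.
      setTimeout(doChunk, 0);
    } else {
      onDone(results);
    }
  }

  setTimeout(doChunk, 0);
}

// Usage: square 100,000 numbers, 1,000 at a time.
var numbers = [];
for (var i = 0; i < 100000; i++) {
  numbers.push(i);
}
processInChunks(numbers, 1000, function (n) { return n * n; }, function (squares) {
  console.log(squares.length); // 100000
});
```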

Ok, all is good then. What's the problem?

As I said, these are basically "hacks". What happens when the user starts clicking but you've already started executing a long-running piece of code? You have a problem! The browser can't respond to the user's action until the code has finished executing. And after all, some code, especially data-centric code, just takes a long time to run. When this happens, the browser will eventually show an "unresponsive script" warning.

There's not a whole lot you can do about that. If you do have code that'll take a long time to run then you're a little stuck.

So, where do Web Workers come in?

Simple. Web workers bring multi-threading to JavaScript. They come with a few restrictions though, and one is quite a biggy: web workers cannot access the DOM. Allowing multiple threads to access a non-thread-safe resource (the DOM) would cause all sorts of problems, so the same design decision was made as in 1995. What they do let you do is process and return data on a thread separate from the UI. The days of those pesky "unresponsive script" errors should therefore be gone forever!

Multi-threading eh? Woo! Where do I start?

Well, first you need to make sure you're using a web browser that actually supports web workers. To find out, you can simply visit caniuse.com and look it up. I should mention a Chrome quirk here. If your files are stored locally rather than served from a web server such as IIS, you need to enable a flag in Chrome for everything to work. Simply start Chrome with this command: chrome.exe --allow-file-access-from-files. This problem does not exist with Firefox. For more information, check out this Stack Overflow post.
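You can also check for support at runtime. Here's a minimal sketch, assuming that checking for a Worker constructor on the global object is good enough. That's the usual feature test, but supportsWebWorkers is just an illustrative name. A check like this only tells you that the browser has web workers. It doesn't tell you whether Chrome will let you load worker.js from a local file, so the flag above is still needed for that.

```javascript
// Returns true if the given global object (window in a page) exposes the
// Worker constructor used to create web workers.
// supportsWebWorkers is a hypothetical helper name.
function supportsWebWorkers(globalObj) {
  return typeof globalObj.Worker !== 'undefined';
}

// In a page you would call it with window, e.g.:
// if (supportsWebWorkers(window)) {
//   var workerOne = new Worker('worker.js');
// } else {
//   // Fall back to doing the work on the UI thread.
// }
```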

Now that you're using a browser that supports web workers, you need to define your web worker. The worker runs in an entirely separate thread, so it has no access to scripts already loaded by the page. You need to tell the worker which script to load. To do this we can use the following line:

var workerOne = new Worker('worker.js');

where worker.js is the name of your script.

Web workers communicate with the main UI thread using messages. When a message is sent to a web worker, it causes the message event to fire within the worker's thread. To hook into this, your worker.js file needs to have the following content:

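self.addEventListener('message', function(e) {
  var message = e.data; // the array posted from the UI thread
  // Do something with the message: here, sort it numerically.
  // sort() on its own compares values as strings, so pass a comparator.
  self.postMessage(message.sort(function(a, b) { return a - b; }));
}, false);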

Here's a quick overview of what's happening. When the web worker is sent a message, the message event fires and the inner function defined above runs. It gets the sent message from the event object's data property. In a real scenario, some action would then be performed based on that message. You'd then post a message back to the caller (usually the main UI thread). This could simply notify the caller that the work is complete. If you've done some data manipulation, you could send back the modified data instead. In the above example, the message is sorted and sent straight back.
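One thing worth knowing about that sort call: Array.prototype.sort() with no arguments converts the elements to strings and compares them as text. That happens to work for the single-digit demo array used below, but not for larger integers like the ones in the performance tests later on. A short illustration (numericAscending is just an illustrative name):

```javascript
// With no comparator, sort() compares elements as strings, so multi-digit
// numbers end up in lexicographic order.
var defaultSorted = [10, 9, 1, 200, 25].sort();
console.log(defaultSorted); // [1, 10, 200, 25, 9]

// A numeric comparator gives ascending numeric order instead.
function numericAscending(a, b) {
  return a - b;
}
var numericSorted = [10, 9, 1, 200, 25].sort(numericAscending);
console.log(numericSorted); // [1, 9, 10, 25, 200]
```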

So, that's the web worker defined. How do you now post messages to that worker thread so you can use it effectively? You defined your worker object earlier; now you just need to:
a) Define what happens when the UI thread receives a message from the web worker; and
b) Send a message to the web worker, which will start the whole process.

In much the same way that you need to hook into the message event within the web worker, you also need to hook into the message event on the web worker object itself, within the UI thread. Something like the following should do the job:

workerOne.addEventListener('message', function(e) {
var numbersOne = e.data; // Do something with this data
}, false);

This will fire when a message is posted from the web worker to the UI thread. In the previous example, e.data will now contain your sorted data!

Ok, now all you need to do is send your original data to the web worker for processing. You use the same method as when you posted the message from the web worker to the UI thread but this time you perform it on the worker object within the UI thread, so you'll have something like this:

workerOne.postMessage([1,4,2,7,9,2,4,7,6,9,4]);

Now you have a working demo. The array of integers (1,4,2,7,9,2,4,7,6,9,4) is sent to the web worker. The web worker starts up in its own thread, picks that message up, sorts it and then sends the data back to the UI thread. The UI thread now has a sorted array of data but hasn't done any of the processing to get it. That leaves the UI thread free, so to the user the system seems responsive. Ok, in this particular example with 11 integers there isn't going to be much of a difference, but when you're playing with millions of objects, this can have a significant impact.
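The same message pattern extends to several workers at once, which is the idea behind test case three in the next section. Here's a minimal sketch, not the code from my demo files. sortInParallel and createWorker are hypothetical names. createWorker is passed in so that in a browser you can hand it function () { return new Worker('worker.js'); }, or pass a stand-in object when testing outside a browser. Each worker is terminated once it has replied.

```javascript
// Sort several arrays in parallel, one worker per array, and call onAllDone
// once every worker has replied. sortInParallel and createWorker are
// hypothetical names; createWorker must return a Worker-like object.
function sortInParallel(arrays, createWorker, onAllDone) {
  var results = new Array(arrays.length);
  var remaining = arrays.length;

  if (remaining === 0) {
    onAllDone(results);
    return;
  }

  arrays.forEach(function (array, index) {
    var worker = createWorker();
    worker.addEventListener('message', function (e) {
      results[index] = e.data; // keep results in the same order as the input
      worker.terminate();      // this worker has done its one job
      remaining--;
      if (remaining === 0) {
        onAllDone(results);
      }
    }, false);
    worker.postMessage(array);
  });
}

// In a browser, with worker.js from earlier in the post:
// sortInParallel(
//   [arrayOne, arrayTwo, arrayThree],
//   function () { return new Worker('worker.js'); },
//   function (sortedArrays) { /* use the sorted data */ }
// );
```

Results are stored by index rather than pushed onto the end, because the workers can finish in any order.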

Performance

While I was looking at this, I wondered if I could make use of Web Workers so that it would give some significant performance gains, especially in terms of data processing. If web workers work like standard threads then this should be fairly straightforward to test.

Here's my very simple test case:
How quickly can I sort three arrays containing two million integers each?

I'm going to test in three ways:
1. Use standard javascript. Sort each array, one after the other and time how long it takes.
2. Use a single web worker. The sorting of all of the arrays will occur in one web worker.
3. Use a web worker for each array sort.
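To make test case one concrete, here's a rough sketch of how a sequential timing run could look. It isn't the code from my demo files. makeRandomArray, numericAscending and timeSequentialSort are names made up for this illustration. It times with Date.now(), which only has millisecond resolution, and it times only the sorting, not the array creation.

```javascript
// Build an array of random integers between 0 and 999,999.
function makeRandomArray(length) {
  var arr = new Array(length);
  for (var i = 0; i < length; i++) {
    arr[i] = Math.floor(Math.random() * 1000000);
  }
  return arr;
}

function numericAscending(a, b) {
  return a - b;
}

// Test case one: sort each array one after another on the current thread,
// repeat `runs` times and return the average time in milliseconds.
function timeSequentialSort(arrayCount, arrayLength, runs) {
  var total = 0;
  for (var run = 0; run < runs; run++) {
    var arrays = [];
    for (var i = 0; i < arrayCount; i++) {
      arrays.push(makeRandomArray(arrayLength));
    }
    var start = Date.now(); // only the sorting is timed
    for (var j = 0; j < arrays.length; j++) {
      arrays[j].sort(numericAscending);
    }
    total += Date.now() - start;
  }
  return total / runs;
}

// The scenario from this post: three arrays of two million integers, 10 runs.
// var averageMs = timeSequentialSort(3, 2000000, 10);
```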

With what I knew about threads and web workers, I thought I'd find the following...
- The first and second test case would be comparatively similar in terms of time taken.
- The first test case would freeze the web browser until all data had been sorted. The other methods would not.
- The third test case would be the fastest, with all three sorts running in parallel. In theory, it should take roughly a third of the time of the first test case (about 66% quicker).

Each test case was repeated 10 times and the average time was taken. Here are the results:

Test Case One: 11.24 seconds
Test Case Two: 13.75 seconds
Test Case Three: 7.21 seconds
(If you wish to actually repeat the demo yourself, you can pick up the files from here)

Interesting! Ok, I wasn't quite right about test case three being 66% quicker, but it is around 36% quicker, which isn't too bad. What is interesting is that test case two is just over 2.5 seconds slower than test case one. Just opening a new web worker and sending/receiving the massive arrays adds an extra 2.5 seconds to the processing time, an increase of about 22%. That seems rather high to me, but it's good to know at least.
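For reference, here's the arithmetic behind those comparisons, taking test case one's 11.24 seconds as the baseline. The "ideal" line assumes the three sorts split perfectly across three threads with no overhead at all:

$$
\begin{aligned}
\text{ideal for test case three: } & \tfrac{11.24}{3} \approx 3.75\ \text{s} \\
\text{test case three vs. one: } & \tfrac{11.24 - 7.21}{11.24} = \tfrac{4.03}{11.24} \approx 0.359\ \text{(about 36\% less time)} \\
\text{test case two vs. one: } & \tfrac{13.75 - 11.24}{11.24} = \tfrac{2.51}{11.24} \approx 0.223\ \text{(about 22\% more time)}
\end{aligned}
$$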

It's around about this time that I should mention how the UI thread and worker threads actually post messages to each other, as it can affect performance. You're transferring data across threads, so you can't just pass a variable by reference. Instead, a full copy of the data is made. The HTML spec says this copy is made using an algorithm called structured cloning, which can also handle complex data types such as File or Blob. Some early browser implementations serialized messages to JSON instead and de-serialized them at the other end. Either way, the contents of the variable are copied, which can be slow for a variable containing megabytes' worth of data. There is, however, another way! Google came up with the concept of "transferable objects". These let you transfer ownership of an object from one thread to another without copying it (a zero-copy transfer), which is significantly faster. There is one downside. Once you've transferred the object, you can't use it in the thread you transferred it from, because only the thread that has ownership can access it. For more information on this, check out this page on HTML5 Rocks.
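Here's a small sketch of the copy-versus-transfer difference. It uses structuredClone, a much newer global (modern browsers, Node.js 17 and later) that runs the same structured clone algorithm postMessage uses, so you can try it without a worker. In a page talking to a worker, the equivalent transfer is worker.postMessage(buffer, [buffer]). Transfer works on ArrayBuffers (for example, the buffer behind an Int32Array), not on plain JavaScript arrays like the ones sorted in my tests. copyVersusTransfer is a hypothetical helper name.

```javascript
// Compare a structured-clone copy with a transfer of the same ArrayBuffer.
function copyVersusTransfer(byteCount) {
  var original = new ArrayBuffer(byteCount);

  // Plain structured clone: the bytes are copied and both buffers stay usable.
  var copied = structuredClone(original);

  // Transfer: ownership moves to the result and the source buffer is detached,
  // so its byteLength drops to 0.
  var transferred = structuredClone(original, { transfer: [original] });

  return {
    copiedBytes: copied.byteLength,
    transferredBytes: transferred.byteLength,
    originalBytesAfterTransfer: original.byteLength
  };
}

console.log(copyVersusTransfer(1024 * 1024));
// copiedBytes 1048576, transferredBytes 1048576, originalBytesAfterTransfer 0
```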

Ok, with that covered, and just out of interest, I thought I'd run the same tests as before, but this time on data that was already sorted. That makes the sort function significantly faster, as it has very little work to do. I was expecting to find the same sort of pattern as above, just with smaller numbers. Here are the actual results:

Test Case One: 1.91 seconds
Test Case Two: 4.10 seconds
Test Case Three: 3.22 seconds

Two interesting things are highlighted here:

1. Test case two is slower than test case three. Why? I haven't managed to find an answer to that yet. I can only assume that sending all three arrays at once, wrapped up in one object, performs badly when the structured cloning algorithm copies it across to the worker thread.
2. Test case one is the fastest. This case doesn't use any fancy web workers; it's just plain old JavaScript running each sort one after another. So, by adding web workers, we've actually slowed down the data processing, which is the exact opposite of what we were trying to achieve. The reason is that the overhead of creating a web worker and communicating with it outweighs the benefit of running the data processing in parallel.

Eh? This makes things slower, not faster! What a waste of time!

Well, no. First, slower or not, the UI thread stays responsive when using web workers, so to your user the system will seem faster than with the traditional approach. Secondly, web workers performed worse than the traditional approach in the last test, but as the first experiment showed, that won't be the case in every scenario. If the overhead of creating web workers and passing messages to and from them outweighs the time saved by running calculations in parallel, then yes, overall performance will be worse. But if you're performing a lot of data manipulation on a great many records, you should see a real performance gain. As always, though, it's best to test with your actual data (or something similar). Only then will you be able to gauge how much quicker web workers will make your web application. They are, however, a tool you should definitely be aware of as we head into the HTML5 world!

Finally, if you want to follow this blog post up with further reading about HTML5 Web Workers, the best tutorial I found was posted on the Mozilla website, here.

Enjoy!