Sunday 9 December 2012

Web Apps and Pagination Queries

Paging controls are found on many popular web apps.
Amazon is a great example.
In a fair few web applications out there, you'll see the concept of pagination. That is you have a list of results and they'll be separated into pages which show a limited number of results (say 25 records per page). You can then usually go forward a page, go back a page or the first or last page. This is a common UI practice and I'm sure you've seen it before.

But, how is this implemented and what is the best way of doing so?

Well, the results shown on the page are usually held against a database. So the question quickly becomes:

"How can I query the database to bring back a given page of results?"

Now before I carry on lets first state that this blog post consists of database queries that have been run against an Oracle database (both the 10 and 11g versions). There's a strong possibility that they will not run against SQL Server, or any other database for that matter, but I suspect the general concepts remain the same. I should also state that I am by no means an Oracle expert. Yes, I know SQL reasonably well, but an expert? Far from it. So, if you find a better way of implementing the following features then it'd be great to here from you.

Now, back to the task in hand. We need to write a SQL command to run on an Oracle database that will bring back a "page" of data of a particular table and as an extra condition, the data returned must be in a given order. Let's also say that a page consists of 25 records.

You'll be surprised at how many ways there are to achieve this. In the 4.5 years that Qube have been using this functionality, our pagination queries have changed no less than four times. Some times we've made the odd small change to optimize the query, in other times we've re-written the whole thing and used a different technique. In each case, we've improved performance over the last iteration of changes. This is done with the goal of making our web application as fast as possible. Finally, we've now got to the stage where using one technique is faster on one table but slower than a different technique on another. This means there is no technique that's best for all tables. It'll depend entirely on the amount of data you have in each table, the indexes you've set up and the way in which your database is optimized.

I'm now going to take you through our little journey of finding the best possible pagination query. In all of the example queries I've highlighted the base query (the query which we want to paginate) in blue. Generally, the following methods wrap that base query up in order to implement paging. You'll need to change the base query to be the actual query you want to use pagination on.

Revision One - Retrieve All

So, our first option was to retrieve all records and let the application handle the pagination. So, if we have a table that consists of 100,000 records, we retrieve all records and then the application will display 25 at a time. The bonus of this method is it's extremely easy to implement. Also, moving from one page to the next is extremely quick, all the records are held in memory so navigating between them is simple and fast. There are however, some major downsides. For starters, retrieving all records is slow so the initial load time for the user is severely affected. If that's not bad enough (and it should be), the memory usage of such a technique is outrageously large, especially when you consider the fact that it's highly unlikely a user will want to navigate through the entire 100,000 records anyway, so you're using valuable memory on records that are never going to be shown. Not cool! Just with these two things in mind, this technique is going to scale very badly.

Example Query
SELECT COLUMN_ONE, COLUMN_TWO, COLUMN_THREE 
FROM TABLE_NAME 
ORDER BY PRIMARY_KEY

Revision Two - Nested Queries

Bringing back everything clearly isn't the way forward. So, with a little bit of use of ROWNUM (an Oracle feature which brings back the number of the row in a dataset, e.g. the first row has a number 1, the second has a number 2, etc.) we can bring back 25 rows at a time which will give us the paging functionality that we need.

Using this method the application doesn't need to store all the possible rows which lowers the memory consumption on the application significantly. It also improves the performance of the first page load. Instead of bringing back thousands of rows, the database only returns 25 rows and there's obvious performance implications for that. The downside however is that the performance of the next/last/first/last page buttons will be affected as the query will need to be re-run but for the next page of data. If all the rows were stored on the application from the original query then this additional database query wouldn't need to be run.

Example Query
SELECT * FROM (SELECT rownum as f2n_rownum, f2n_table.* 
               FROM (SELECT COLUMN_ONE,
                            COLUMN_TWO,
                            COLUMN_THREE 
                     FROM TABLE_NAME 
                     ORDER BY PRIMARY_KEY) f2n_table 
               WHERE rownum <= 25) 
WHERE f2n_rownum >= 1

Revision Three - Nested Queries Using WITH

As always, speed is key!
We've now got an implementation of our pagination query but is it the best implementation? Did you know that your average user expects your web application to load in two seconds or less and up to 40% of your users will leave your site if it hasn't responded after three seconds.1 That means these queries need to be as quick as possible, every second counts. When you're dealing with possibly thousands of records, it can be difficult to bring back results in that time frame.

So, we looked to see if we could improve the performance of our pagination query. If we can then that's a performance improvement across near all of our pages. As it turns out... there is a better way! Kind of.

We could improve the performance in two different ways. Firstly, we can use the WITH clause. This is known as subquery factoring.2 The second improvement is that we can tell the oracle optimizer how many rows we intend on using, this allows the optimizer to use this information to choose a faster explain plan for what we want. We do this by using optimizer hints in the query and in this hint we tell the optimizer that we want the first 25 rows brought back first (or however many rows are contained in your "page" of data). For more information on this try here: www.orafaq.com.

Example Query
SELECT /*+ FIRST_ROWS(25) */
       pageouter.* 
FROM (WITH page_query AS (SELECT COLUMN_ONE,
                                 COLUMN_TWO,
                                 COLUMN_THREE
                          FROM TABLE_NAME
                          ORDER BY PRIMARY_KEY
      SELECT page_query.*, 
             ROWNUM AS innerrownum 
      FROM page_query 
      WHERE rownum <= 25) pageouter 
WHERE pageouter.innerrownum >= 1

Revision Four - Nested Queries With ROW_NUMBER

Ok, so we're now using optimizer hints and we're using subquery factoring. All good stuff. But can we do more?

Well, we can. Kind of.

There is a SQL function called ROW_NUMBER(). It serves very much the same purpose as ROWNUM in Oracle but it works in both Oracle and SQL Server and oddly, when used can perform better than our previous queries but only in certain scenarios.

The problem here is I can't tell you why it performs better in certain scenarios, I can't even tell you in which scenarios it performs better but here is what I have found:

  • It seems to perform better than our previous methods if the query is modified to have a complex 'where' clause.
  • It seems to perform better than our previous methods if the data is ordered by a row that is not uniquely indexed.
  • The performance gains can be dramatic. In the previous examples, changing from one method to another may have seen an improvement ranging from nothing to a second or two. I've seen this method improve some queries by up to 5-8 seconds, especially on queries that order data by columns that aren't indexed.
Now I suspect all of this is very much dependent on the indexes you have set up on your tables, the amount of data in your tables, how you've got your database optimized and probably a  fair few other factors that I have no idea about so, the best way to know how this will perform for your queries is to test it.

Example Query
SELECT /*+ FIRST_ROWS(25) */
       *
FROM ( SELECT ROW_ONE,
              ROW_TWO,
              ROW_THREE,
              row_number() OVER(ORDER BY PRIMARY_KEY) innerrownum
       FROM   TABLE_NAME
     )
WHERE innerrownum BETWEEN 1 AND 25

Conclusion

I've shown you three different ways of implementing pagination within the database query. There are other ways which I haven't discussed. For example, you could follow this process:
  1. Run the query for ALL records (no paging) but insert the results of that query into a temporary table.
  2. Query that temporary table for the "page" of data that you want, using one of the methods above.
  3. When implementing the next/previous page function, you can then query the temporary table directly. 
Assuming that your original query isn't bringing back the entire table, you'll be selecting from a subset of the original data which should make the next/previous functionality faster. However, your original page load time will be slower as you'll need to insert the records into the temp table so, there's a trade off. 

I would imagine there's loads of other ways of doing this, if you find any that perform better than the above then let me know, it'd be great to hear from you!

And finally.... SQL Server

I couldn't end without mentioning the latest version of SQL Server and the good work Microsoft have been doing in this area. Microsoft have cottoned on to the fact that this paging functionality is now widely used and, as you can tell by this article, it isn't straight forward. They've gone out of their way to simplify this and built this functionality straight into the language making it very simple and, I would hope, a whole lot quicker than anything we can write in standard SQL.

I can't say I've had the pleasure of testing this but, according to the documentation, the feature is implemented by the introduction of two new keywords, OFFSET and FETCH NEXT and they're used in the following way:

SELECT COLUMN_ONE,
       COLUMN_TWO,
       COLUMN_THREE
FROM   TABLE_NAME
ORDER BY PRIMARY_KEY
OFFSET 0 ROWS
FETCH NEXT 25 ROWS ONLY

This tells the database to bring back the first 25 rows. To bring back the next page, you'd increase the offset by your page size (in our example, 25). For more info, check out raresql.com.

And it's that simple.

The sooner Oracle implement this functionality the better!


1 - Forrester Consulting, “eCommerce Web Site Performance Today: An Updated Look At Consumer Reaction To A Poor Online Shopping Experience” A commissioned study conducted on behalf of Akamai Technologies, Inc., August 17, 2009
2 - For more information on subquery factoring, see www.dba-oracle.com