New Top-N Queries and Pagination Syntax

At the RMOUG Training Days, a couple of weeks, ago @GeraldVenzl and I delivered a session demonstrating what it would take to get a REST enabled, web-based application up and running with Oracle Database 12c. During the session Gerald asked me to write a query to calculate the top 10 customers we had in terms of their total spend. Below is the query I came up with.

SELECT c.c_name, 
       c.c_custid loyaltyCardNo, 
       t.city, 
       t.state,
       SUM(salesAmount) total
FROM   customers c,
       t
WHERE  c.c_custid = t.loyaltyCardNo
GROUP BY c.c_name,
         c.c_custid,
         t.city,
         t.state
ORDER BY total DESC
FETCH FIRST 10 ROWS ONLY;

After the session I got a number of questions regarding the “FETCH FIRST 10 ROWS ONLY” syntax that I used, so I thought it would be worth explaining what it is and what happens under the covers when you execute it.

What is FETCH FIRST X ROWS ONLY?

FETCH FIRST is actually part of the new row limiting clause that was added to Oracle Database 12c to help simplify the syntax necessary for Top-N queries and queries where you need to page through a larger result set.

But how does it work?

Let’s look at the execution plan to see what actually happens during execution.

--------------------------------------------------------------------------------
 
| Id | Operation                 | Name     | ROWS | Bytes |TempSpc| Cost (%CPU)|
---------------------------------------------------------------------------------
 
|  0 | SELECT STATEMENT         |           |      |       |       | 29338 (100)|
|  1 | SORT ORDER BY            |           |   10 | 40700 |       | 29338   (1)|
|* 2 |   VIEW                   |           |   10 | 40700 |       | 29337   (1)|
|* 3 |   WINDOW SORT PUSHED RANK|           |  795K|    31M|    91M| 29337   (1)| 
|  4 |     HASH GROUP BY        |           |  795K|    31M|    91M| 29337   (1)| 
|* 5 |     HASH JOIN            |           | 1690K|    67M|       |  5337   (1)| 
|  6 |       TABLE ACCESS FULL  | CUSTOMERS |  341 |  6479 |       |     4   (0)| 
|  7 |       TABLE ACCESS FULL  | T         | 5045K|   110M|       |  5317   (1)| 
----------------------------------------------------------------------------------
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
2 - FILTER("from$_subquery$_003"."rowlimit_$$_rownumber"<=10)
3 - FILTER(ROW_NUMBER() OVER ( ORDER BY SUM("SALESAMOUNT") DESC )<=10)
5 - access("C"."C_CUSTID"="T"."LOYALTYCARDNO")

On line 3 of the plan we see a window function or analytic function of some sort is being used. And if we looks the predicate information under the plan we say  the FETCH FIRST ROWS is actually rewritten  to utilize the ROW_NUMBER analytic function (line 3 of plan), which was introduced in Oracle Database 8.1.6.

ROW_NUMBER assigns a unique number (starting with 1) to each row it’s applied to. By ordering the rows passed to the ROW_NUMBER function, it makes it easy to limit the number of rows returned by the query by putting a filter predicate on the unique number.

So our query was actually rewritten as follows:

SELECT s.*
FROM
  (SELECT c.c_name name,
          c.c_custid loyaltyCardNo,
          t.city,
          t.state,
          SUM(salesAmount) total ,
          ROW_NUMBER() OVER ( ORDER BY SUM("SALESAMOUNT") DESC ) rk
   FROM   customers c,
          t
   WHERE  c.c_custid = t.loyaltyCardNo
   GROUP BY c.c_name,
            c.c_custid,
            t.city,
            t.state 
  ) s
WHERE rk <=10;

So, other than some nice syntactic sugar, is there any benefit to using FETCH FIRST rather than the original approach to Top-N queries where we would have used an inline view?

After all, I could have written the query above using an inline view as follows:

SELECT * FROM
 (SELECT c.c_name,
         c.c_custid  loyaltyCardNo,
         t.city,
         t.state,
         SUM(salesAmount) total
  FROM   customers c,
         t
  WHERE  c.c_custid = t.loyaltyCardNo
  GROUP  BY c.c_name,
            c.c_custid,
            t.city,
            t.state
  ORDER BY total DESC
  )
WHERE rownum <=10;

Which would have resulted in a the following execution plan:

-------------------------------------------------------------------------
| Id | Operation                | Name     | ROWS  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |           |      |       | 5477 (100)| 
|* 1 | COUNT STOPKEY            |           |      |       |           |
|  2 |   VIEW                   |           | 3410 |    13M| 5477   (4)|
|* 3 |   SORT ORDER BY STOPKEY  |           | 3410 |   173K| 5477   (4)|
|  4 |     HASH GROUP BY        |           | 3410 |   173K| 5477   (4)|
|* 5 |     HASH JOIN            |           | 3410 |   173K| 5475   (4)|
|  6 |       TABLE ACCESS FULL  | CUSTOMERS |  341 |  6479 |    4   (0)|
|  7 |       VIEW               | VW_GBC_6  |10180 |   328K| 5471   (4)|
|  8 |       HASH GROUP BY      |           |10180 |   228K| 5471   (4)|
|  9 |         TABLE ACCESS FULL| T         | 5045K|   110M| 5317   (1)|
------------------------------------------------------------------------
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
1 - FILTER(ROWNUM<=10)
3 - FILTER(ROWNUM<=10)
5 - access("C"."C_CUSTID"="ITEM_1")

In my tests the elapse time for both queries was the same, so there was no obvious performance benefit. But the new syntax does improve the readability of the query and provides us with an easier way to do pagination queries.

For example, if we need to show rows 11-20 of our result set with the new syntax the query simply becomes:

SELECT c.c_name, 
       c.c_custid loyaltyCardNo, 
       t.city, 
       t.state,
       SUM(salesAmount) total
FROM   customers c,
       t
WHERE  c.c_custid = t.loyaltyCardNo
GROUP BY c.c_name,
         c.c_custid,
         t.city,
         t.state
ORDER BY total DESC
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;

While the query using the inline view gets a lot more complicated.

SELECT * 
FROM
  (SELECT rownum rnum, a.* 
   FROM
     (SELECT c.c_name,
             c.c_custid loyaltyCardNo,
             t.city,
             t.state,
             SUM(salesAmount) total
      FROM   customers c,
             t
      WHERE  c.c_custid = t.loyaltyCardNo
      GROUP BY c.c_name,
               c.c_custid,
               t.city,
               t.state
      ORDER BY total DESC
      ) a
   WHERE rownum <=20 ) 
WHERE rnum >= 11;

I also found a 10% performance improvement on the pagination query performance with the new syntax, so if you do a lot of Top-N or pagination queries it might be worth checking out the new syntax!

One thought on “New Top-N Queries and Pagination Syntax”

  1. Hi Maria,

    I enjoy using this new functionality. It just saves me lots of time in everyday DBA work.
    I hope my Java developers, writing their Hibernate stuff, will embrace it too one day..
    I think you have made a typo mentioning a version number:
    ROW_NUMBER analytic function (line 3 of plan), which was introduced in Oracle Database 10g.

    It has been available since 8.1.6:
    http://docs.oracle.com/cd/A81042_01/DOC/server.816/a76962/816.htm#44129
    http://docs.oracle.com/cd/A81042_01/DOC/server.816/a76989/functi79.htm#86312

Leave a Reply

Your email address will not be published. Required fields are marked *

%d bloggers like this: