I was recently asked if I would ever recommend locking table statistics as a way of preventing execution plans from changing. As with a lot of questions regarding the Optimizer and statistics, the answer was “it depends”.
There are definitely some edge cases where locking the statistics is the best approach, but I wouldn’t generally recommend it to achieve plan stability. Before I share the edge cases where I believe locking statistics would help, let’s first look at exactly what locking table statistics does and why it’s not a good approach to plan stability.
From Oracle Database 10g onwards you have had the ability to lock the optimizer statistics for a table or schema, via the DBMS_STATS package.
BEGIN
  DBMS_STATS.LOCK_TABLE_STATS('SH','SALES');
END;
/
The LOCK_TABLE_STATS procedure locks all of the statistics related to a table, including the table statistics (number of rows, number of blocks, etc.), column statistics (minimum and maximum values, number of distinct values, etc.), histograms, and the statistics on all dependent indexes. Once statistics are locked, no modifications can be made to them until the statistics have been unlocked, or unless the FORCE option is used in the GATHER_*_STATS procedures.
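For example, the following is a minimal sketch, reusing the SH.SALES table from above, of the two ways to get past the lock: override it for a single gather with the FORCE parameter, or remove it altogether with UNLOCK_TABLE_STATS.

BEGIN
  -- Override the lock for this one gather only
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SH', tabname => 'SALES', force => TRUE);

  -- Or remove the lock and gather as normal:
  -- DBMS_STATS.UNLOCK_TABLE_STATS('SH','SALES');
  -- DBMS_STATS.GATHER_TABLE_STATS('SH','SALES');
END;
/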
So, locking the statistics ensures the optimizer will use the exact same set of statistics, for that table, every time. Assuming those statistics are representative and nothing else changes in the environment (e.g. parameter settings etc.), you should get the same execution plan every time.
The key word here is representative. If, over time, the statistics are no longer representative, then there is no guarantee that you will get the same plan.
Why wouldn’t the statistics be representative over time?
If the table contains a date column, a sequence number, or any other column whose values grow over time, and those columns are used in where clause predicates, then the statistics won’t remain representative.
If one of these constantly growing columns is used in a where clause predicate and the value supplied is outside the domain of values represented by the [minimum, maximum] column statistics, the Optimizer prorates the cardinality estimate for that where clause predicate.
The Optimizer prorates the estimate based on the distance between the predicate value and the maximum value (assuming the specified value is higher than the maximum). The farther the value is from the maximum or minimum value, the lower the selectivity will be and the greater the chance that a plan change will occur.
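A quick way to see how far the data has drifted from what the locked statistics describe is to compare the live maximum of such a column with what is recorded in the dictionary. The query below is only a sketch, again using SH.SALES and its TIME_ID column as an example; HIGH_VALUE is stored in an internal RAW format, but LAST_ANALYZED alone is often enough to show how stale the locked statistics have become.

SELECT c.column_name,
       c.high_value,                            -- internal RAW representation of the recorded maximum
       c.last_analyzed,
       (SELECT MAX(s.time_id) FROM sh.sales s) AS live_max
FROM   dba_tab_col_statistics c
WHERE  c.owner       = 'SH'
AND    c.table_name  = 'SALES'
AND    c.column_name = 'TIME_ID';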
Not being able to guarantee that the locked statistics will remain representative is the main reason I wouldn’t recommend this approach for maintaining plan stability.
Instead I would strongly suggest you take a look at SQL Plan Management (SPM), which is Oracle’s preferred mechanism to guarantee plan stability. Nigel Bayliss (Product Manager for the Optimizer and Statistics) has written a great blog series on how to use SPM that is definitely worth a read if plan stability is what you are after.
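As a very rough sketch of what getting started with SPM looks like, an existing plan can be captured from the cursor cache into a SQL plan baseline; the SQL_ID below is just a placeholder for the statement whose plan you want to preserve.

DECLARE
  l_plans PLS_INTEGER;
BEGIN
  -- Load the current plan for the given statement into a SQL plan baseline
  l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => '1234abcd5678e');  -- placeholder SQL_ID
END;
/

From then on the optimizer will only use accepted plans in the baseline for that statement; Nigel’s series covers automatic capture and plan evolution in detail.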
So when would you lock table statistics?
There are a number of cases where locking the table statistics is a good idea. For example, if you have a highly volatile table or an intermediate work table, where the volume of data changes dramatically over a relatively short amount of time.
Imagine we were to keep track of orders via an orders queue table, which at the start of every day is empty. As the day progresses and orders are placed, the table begins to fill up. Once each order is processed it is deleted from the table, so by the end of the day the table is empty again. If we were to rely on the nightly automatic statistics-gathering task, it would always gather statistics on the table when it was empty, which would not be a representative set of statistics for the table by the middle of the next day. In this case, I would recommend manually gathering statistics on the volatile table in the middle of the day, when it does have a representative set of data in it, and then locking those statistics to prevent the automatic statistics-gathering task from overwriting them.
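Something along these lines would do it; the owner and table name here are just placeholders for the volatile table.

BEGIN
  -- Gather statistics mid-day, while the table holds a representative volume of data
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'APP', tabname => 'ORDERS_QUEUE');
  -- Lock them so the nightly automatic statistics-gathering task cannot overwrite them
  DBMS_STATS.LOCK_TABLE_STATS('APP','ORDERS_QUEUE');
END;
/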
Locking partition-level statistics in conjunction with copying statistics can also be a solution for a certain type of ‘out of range’ error that can occur on large partitioned tables, which I previously documented on the Optimizer blog.
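The copy part of that approach looks roughly like this; the partition names below are made up for the sake of the example. DBMS_STATS.COPY_TABLE_STATS copies the statistics from an existing partition to a newly added, empty one, adjusting the partitioning column statistics for the destination partition.

BEGIN
  DBMS_STATS.COPY_TABLE_STATS(
    ownname     => 'SH',
    tabname     => 'SALES',
    srcpartname => 'SALES_Q1_2024',   -- recently loaded partition (example name)
    dstpartname => 'SALES_Q2_2024');  -- new, empty partition (example name)
END;
/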
Finding the right solution to guarantee plan stability will depend very much on your particular workload characteristics. This is definitely a case where “one size” won’t fit all. You may end up using a combination of the above methods, but whatever approach you take, make sure the statistics you use are representative!
I agree with the principle.
I usually advocate coding for a few special cases, and any such case should do:
“unlock stats / fix stats / lock stats”.
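In DBMS_STATS terms that sequence might look something like the following; the table name and the figures passed to SET_TABLE_STATS are just placeholders, and the “fix stats” step could equally be a targeted gather rather than manually set values.

BEGIN
  DBMS_STATS.UNLOCK_TABLE_STATS('APP','ORDERS_QUEUE');
  -- "fix" the stats: here by setting them explicitly, but a gather would also do
  DBMS_STATS.SET_TABLE_STATS(ownname => 'APP', tabname => 'ORDERS_QUEUE',
                             numrows => 100000, numblks => 2000);
  DBMS_STATS.LOCK_TABLE_STATS('APP','ORDERS_QUEUE');
END;
/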
The other particularly special cases (to my mind) are:
a) the batch data load – new stats needed as part of the on-going batch process. I often get asked (almost apologetically, occasionally) if this is okay.
b) “difficult” histograms. Until 12c that was ALL histograms but now it’s just tables where I think a hybrid histogram will be important but won’t be good enough if left to the default sample size – which is often too small to be stable (though a lot better than the old height-balanced).
Regards
Jonathan Lewis
This excellent website definitely has all of the information I needed about this subject, and I didn’t know who to ask.
Thanks for the insights.
Foued