Since it’s introduction in Oracle Database 8i, the DBMS_STATS package is Oracle’s preferred method for gathering statistics. With each new database release the DBMS_STATS package is extended to accommodate new approaches to gather statistics and new types of statistics.
Over the years, application developers and DBAs have written hundreds of scripts using the DBMS_STATS package to help gather and manage optimizer statistics effectively. However, once written these scripts are rarely modified to take advantage of the improvements in the DBMS_STATS package, which can result in suboptimal statistics.
Oracle Database 12 Release 2 makes it a lot easier to be able to manage this vast collection of scripts by includes a new DBMS_STATS preference called PREFERENCE_OVERRIDES_PARAMETER. When this preference is set to TRUE, it allows preference settings to override the parameter values specifically set in a DBMS_STATS command.
For example, if the global preference ESTIMATE_PERCENT is set to the default DBMS_STATS.AUTO_SAMPLE_SIZE, but a 10% sample size was specified in the in a DBMS_STATS.GATHER_TABLE_STATS command, setting PREFERENCE_OVERRIDES_PARAMETER to TRUE would automatically replace the 10% sample size with the default preference DBMS_STATS.AUTO_SAMPLE_SIZE.
Let’s take a look at this in action. Imagine we have a SALES table with 1.1 million rows in it.
SELECT COUNT(*) FROM sales; COUNT(*) ------------ 1164295
The following command is used to gather statistics on the SALES table.
BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SH', tabname=>'SALES', estimate_percent=>10); END; / PL/SQL PROCEDURE successfully completed. Elapsed: 00:00:09.08
If we check the sample size used for gathering statistics, we see that only 10% of the rows were used as requested and that a FREQUENCY histogram has been created on the CUST_ID column of the SALES table.
SELECT TABLE_NAME, sample_size FROM user_tables WHERE TABLE_NAME= 'SALES'; TABLE_NAME SAMPLE_SIZE ------------------------------ ----------- SALES 116665 SELECT column_name, num_distinct, num_nulls, histogram FROM user_tab_col_statistics WHERE TABLE_NAME='SALES'; COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM -------------- --------------- ------------ ---------- PROD_ID 137 0 NONE CUST_ID 161 0 FREQUENCY TIME_ID 135 0 NONE CHANNEL_ID 5 0 NONE PROMO_ID 22 0 NONE QUANTITY_SOLD 39 0 NONE AMOUNT_SOLD 152 0 NONE
Unfortunately, these statistics are not that accurate as there are over 600 distinct PROD_IDs and CUST_IDs in the SALES table but the statistics gather missed this information because it looked at such a small sample.
SELECT COUNT(DISTINCT prod_id), COUNT(DISTINCT cust_id) FROM sales; COUNT(DISTINCTPROD_ID) COUNT(DISTINCTCUST_ID) ---------------------- ---------------------- 651 630
In order to improve the accuracy of the statistics, we should use the recommended best practice value for the ESTIMATE_PERECENT parameter, AUTO_SAMPLE_SIZE. Let’s check the preference value for the ESTIMATE_PERECENT parameter on the SALES table.
SELECT DBMS_STATS.GET_PREFS(pname=>'ESTIMATE_PRECENT' , ownname=>'SH', tabname=>'SALES') FROM Dual; DBMS_STATS.GET_PREFS(PNAME=>'ESTIMATE_PERCENT',OWNNAME=>'SH',TABNAME=>'SALES') -------------------------------------------------------------------------------- DBMS_STATS.AUTO_SAMPLE_SIZE
We see it’s actually set to DBMS_STATS.AUTO_SAMPLE_SIZE, which is good news for us.
Rather than having to find and edit all of the existing statistics gathering script that refer to the SALES table, we can set the DBMS_STATS preference PREFERENCE_OVERRIDES_PARAMETER to TRUE on the SALES table, which will automatically override all of the none default parameter values used in the DBMS_STATS command with the corresponding preferences values.
BEGIN DBMS_STATS.SET_TABLE_PREFS(ownname=>'SH', tabname=>'SALES', pname=>'PREFERENCE_OVERRIDES_PARAMETER', pvalue=>'TRUE'); END; / PL/SQL PROCEDURE successfully completed.
So, in our case the ESTIMATE_PERCENT of 10% will actually be overridden and DBMS_STATS.AUTO_SAMPLE_SIZE will be used. Let’s see what impact that has on both the statistics gathering performance and the quality of the statistics gathered.
BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SH', tabname=>'SALES', estimate_percent=>10); END; / PL/SQL PROCEDURE successfully completed. Elapsed: 00:00:06.59
SELECT TABLE_NAME, SAMPLE_SIZE FROM user_tables WHERE TABLE_NAME= 'SALES'; TABLE_NAME SAMPLE_SIZE ------------------------------ ----------- SALES 1164295
So, you can see that the time it took to gather statistics was actually less than the time for the 10% but the sample size was the total number of rows in the table. But what about the quality of the statistics?
SELECT column_name, num_distinct, num_nulls, histogram FROM user_tab_col_statistics WHERE TABLE_NAME='SALES'; COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM -------------- --------------- ------------ ---------- PROD_ID 651 0 NONE CUST_ID 630 0 TOP-FREQUENCY TIME_ID 620 0 NONE CHANNEL_ID 5 0 NONE PROMO_ID 116 0 NONE QUANTITY_SOLD 44 0 NONE AMOUNT_SOLD 583 0 NONE
As you can see, we have a much more accurate set of statistics this time with each of the number of distinct values being correct. You will also notice we got a new type of histogram a TOP-FREQUENCY histogram this time. That’s because this new type of histogram is only gathered if the ESTIMATE_PERCENT is set to the default.
Remember setting PREFERENCE_OVERRIDES_PARAMETER to TRUE will override all none default parameters set in the DBMS_STATS.GATHER_*_STATS command, so you may need to adjust the default preferences for some of the parameters, for example METHOD_OPT if you don’t want to revert to the default.
You can set a table preference using the DBMS_STATS.SET_TABLE_PREFS procedure.
BEGIN DBMS_STATS.SET_TABLE_PREFS(ownname=>'SH', tabname=>'SALES', pname=>METHOD_OPT, pvalue=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS CUST_ID SIZE 254'); END;