How to use DBMS_STATS DIFF_TABLE_STATS functions

In 11g, Oracle introduced the DBMS_STAT.DIFF_TABLE_STATS functions to help you compare two sets of statistics for a table along with all its dependent objects (indexes, columns, partitions).

There are three versions of this function depending on where the statistics being compared are located:

  • DBMS_STAT.DIFF_TABLE_STATS_IN_HISTORY (compares statistics for a table from two timestamps in the past)
  • DBMS_STAT.DIFF_TABLE_STATS_IN_PENDING (compares pending statistics and statistics as of a timestamp or statistics from the data dictionary)
  • DBMS_STAT.DIFF_TABLE_STATS_IN_STATTAB (compares statistics from a user statistics table and the data dictionary, from two different user statistics tables, or a single user statistics table using two different STATSIDs)

The functions return a report that has three sections:

  1. Basic table statistics
    The report compares the basic table statistics (number of rows, blocks, etc.).
  2. Column statistics
    The second section of the report examines column statistics, including histograms.
  3. Index Statistics
    The final section of the report covers differences in index statistics.

Statistics will only be displayed in the report if the difference in the statistics exceeds a certain threshold (%). The threshold can be specified as an argument to the functions (PCTTHRESHOLD); the default value is 10%. The statistics corresponding to the first source, typically the current table stats in the data dictionary, will be used to compute the differential percentage.

The functions also return the MAXDIFFPCT (a number) along with the report. This is the maximum percentage difference between the statistics. These differences can come from the table, column, or index statistics.

Let’s look at an example.

Imagine you are upgrading from Oracle Database 11g to 19c. In 11g, you didn’t gather histograms, but in 19c, you want to see if the new histogram types will benefit your workload.

To ensure we don’t impact the application’s performance, we will gather the new set of statistics as pending stats, compare them to the current stats, and publish the new stats only if they improve our app’s performance. Our example will focus on the SALES table in MARIA’s schema.

-- Let's confirm how the system is currently setup
 
SELECT dbms_stats.get_prefs('PUBLISH','MARIA','SALES') FROM dual; 
 
DBMS_STATS.GET_PREFS('PUBLISH') 
------------------------------------------- 
TRUE
 
-- So statistics are automatically published to the data dictionary by default
 
--Lets check the current column stats for the SALES table 
 
SELECT column_name, histogram FROM user_tab_col_statistics WHERE TABLE_NAME='SALES';
 
COLUMN_NAME    HISTOGRAM    
______________ ____________ 
ORDER_ID       NONE         
PRODUCT_ID     NONE         
DATE_ID        NONE         
CUST_ID        NONE         
SUPPLIER_ID    NONE         
PRICE          NONE         
AMOUNT_SOLD    NONE         
REVENUE        NONE

Let’s enable pending stats and gather a fresh set of statistics for the SALES table with new types of histograms.

-- Enabling pending stats on the SALES table
BEGIN
  dbms_stats.set_table_prefs('MARIA', 'SALES', 'PUBLISH', 'FALSE');
END;
/
 
PL/SQL PROCEDURE successfully completed.
 
SELECT dbms_stats.get_prefs('PUBLISH', 'MARIA', 'SALES') FROM dual; 
 
DBMS_STATS.GET_PREFS('PUBLISH') 
------------------------------------------- 
FALSE
 
-- Gather a new set of statistics for MARIA.SALES that included new histograms types introduced in 12c
BEGIN
  dbms_stats.gather_table_stats('MARIA', 'SALES', method_opt=>'FOR ALL COLUMNS SIZE 254');
END;
/
 
PL/SQL PROCEDURE successfully completed.
SELECT TABLE_NAME, sample_size, last_analyzed 
FROM   all_tab_pending_stats 
WHERE  owner='MARIA' AND TABLE_NAME='SALES';
 
TABLE_NAME       SAMPLE_SIZE LAST_ANALYZED    
_____________ ______________ ________________ 
MY_SALES               10000 27-FEB-23

Now that we have a set of pending statistics let’s compare them to the current statistics using the DIFF_TABLE_STATS_IN_PENDING function.

The DBMS_STAT.DIFF_TABLE_STATS functions are table functions, so you must use the keyword TABLE when selecting from them. Otherwise, you will receive an OCI-21700 error saying the object does not exist.

Remember that the report is returned as a clob. To display the report correctly in SQLcl or SQL*Plus, you must use the set long command to define the width of a long.

-- set width of a long
SET long 99999
SET longchunksize 99999
 
-- Generate the report
SELECT report FROM TABLE(dbms_stats.diff_table_stats_in_pending('ADMIN','MY_SALES', systimestamp,0));
 
REPORT
________________________________________________________________________________________________________
###############################################################################
STATISTICS DIFFERENCE REPORT FOR:
.................................
TABLE         : SALES
OWNER         : MARIA
SOURCE A      : Statistics AS OF 24-FEB-23 11.30.34.322736 PM +00:00
SOURCE B      : Pending Statistics
PCTTHRESHOLD  : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
NO DIFFERENCE IN TABLE / (SUB)PARTITION STATISTICS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
COLUMN STATISTICS DIFFERENCE:
.............................
COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................
AMOUNT_SOLD     A   25      .04        NO   0       3    C106  C139  10000  
                B   25      .00005     YES  0       3    C106  C139  10000  
CUST_ID         A   21      .047619047 NO   0       6    C25F3 C5035 10000  
                B   21      .00005     YES  0       6    C25F3 C5035 10000  
DATE_ID         A   14      .071428571 NO   0       8    78680 78680 10000  
                B   14      .00005     YES  0       8    78680 78680 10000  
ORDER_ID        A   20      .05        NO   0       7    C5033 C5320 10000  
                B   20      .00005     YES  0       7    C5033 C5320 10000  
PRICE           A   17      .058823529 NO   0       6    C11A  C41E5 10000  
                B   17      .00005     YES  0       6    C11A  C41E5 10000  
PRODUCT_ID      A   30      .033333333 NO   0       6    C3052 C43F3 10000  
                B   30      .00005     YES  0       6    C3052 C43F3 10000  
REVENUE         A   29      .034482758 NO   0       6    C3355 C4092 10000  
                B   30      .00005     YES  0       6    C3355 C4092 10000  
SUPPLIER_ID     A   30      .033333333 NO   0       6    C164  C4025 10000  
                B   30      .00005     YES  0       6    C164  C4025 10000  
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
NO DIFFERENCE IN INDEX / (SUB)PARTITION STATISTICS
###############################################################################

From the report above, we can see no differences in the table stats (number of rows, blocks, etc.) and no differences in the index statistics. However, we do see substantial differences in the column stats.

Each column where the statistics vary is listed along with a copy of the statistics values from each source. Source A is the data dictionary, while Source B is the pending statistics, gathered with AUTO_SAMPLE_SIZE set to the default. You won’t notice a significant difference in the base column statistics (NDV, Min, Max, etc.), but we now have a histogram on each column in the pending stats (source B).

We must test these additional histograms using our application workload to confirm they are worth gathering. The easiest way is to set the parameter OPTIMIZER_USE_PENDING_STATISTICS to TRUE. This will allow the optimizer to use the pending statistics we gather just for this session. It won’t impact the existing application session, which will continue to use the current statistics in the data dictionary.

ALTER SESSION SET optimizer_use_pending_statistics=TRUE;

Once we confirm that the new histograms are beneficial and don’t cause any performance regressions, the pending statistics can be published, and the PUBLISH preference for the SALES table can be set back to TRUE.

-- Publish pending stats
BEGIN
  dbms_stats.publish_pending_stats('MARIA','SALES');
 END;
/
 
PL/SQL PROCEDURE successfully completed.
 
-- Set the PUBLISH preference back to the default on the SALES table
BEGIN
  dbms_stats.set_table_prefs('MARIA', 'SALES', 'PUBLISH', 'TRUE');
END;
/
 
PL/SQL PROCEDURE successfully completed.
 
-- Confirm the histograms that we gather as pending statistics are now visible
SELECT column_name, histogram FROM user_tab_col_statistics WHERE TABLE_NAME ='SALES';
 
COLUMN_NAME    HISTOGRAM
------------ ---------------
ORDER_ID       FREQUENCY    
PRODUCT_ID     HYBRID    
DATE_ID        FREQUENCY    
CUST_ID        HYBRID   
SUPPLIER_ID    HYBRID    
PRICE          FREQUENCY    
AMOUNT_SOLD    FREQUENCY    
REVENUE        FREQUENCY

If you are looking for an example of using the DBMS_STAT.DIFF_TABLE_STATS_IN_STATTAB function, you can check out my blog post on the Optimizer blog.

Leave a Reply

Your email address will not be published. Required fields are marked *