In 11g, Oracle introduced the DBMS_STAT.DIFF_TABLE_STATS functions to help you compare two sets of statistics for a table along with all its dependent objects (indexes, columns, partitions).
There are three versions of this function depending on where the statistics being compared are located:
- DBMS_STAT.DIFF_TABLE_STATS_IN_HISTORY (compares statistics for a table from two timestamps in the past)
- DBMS_STAT.DIFF_TABLE_STATS_IN_PENDING (compares pending statistics and statistics as of a timestamp or statistics from the data dictionary)
- DBMS_STAT.DIFF_TABLE_STATS_IN_STATTAB (compares statistics from a user statistics table and the data dictionary, from two different user statistics tables, or a single user statistics table using two different STATSIDs)
The functions return a report that has three sections:
- Basic table statistics
The report compares the basic table statistics (number of rows, blocks, etc.). - Column statistics
The second section of the report examines column statistics, including histograms. - Index Statistics
The final section of the report covers differences in index statistics.
Statistics will only be displayed in the report if the difference in the statistics exceeds a certain threshold (%). The threshold can be specified as an argument to the functions (PCTTHRESHOLD); the default value is 10%. The statistics corresponding to the first source, typically the current table stats in the data dictionary, will be used to compute the differential percentage.
The functions also return the MAXDIFFPCT (a number) along with the report. This is the maximum percentage difference between the statistics. These differences can come from the table, column, or index statistics.
Let’s look at an example.
Imagine you are upgrading from Oracle Database 11g to 19c. In 11g, you didn’t gather histograms, but in 19c, you want to see if the new histogram types will benefit your workload.
To ensure we don’t impact the application’s performance, we will gather the new set of statistics as pending stats, compare them to the current stats, and publish the new stats only if they improve our app’s performance. Our example will focus on the SALES table in MARIA’s schema.
-- Let's confirm how the system is currently setup SELECT dbms_stats.get_prefs('PUBLISH','MARIA','SALES') FROM dual; DBMS_STATS.GET_PREFS('PUBLISH') ------------------------------------------- TRUE -- So statistics are automatically published to the data dictionary by default --Lets check the current column stats for the SALES table SELECT column_name, histogram FROM user_tab_col_statistics WHERE TABLE_NAME='SALES'; COLUMN_NAME HISTOGRAM ______________ ____________ ORDER_ID NONE PRODUCT_ID NONE DATE_ID NONE CUST_ID NONE SUPPLIER_ID NONE PRICE NONE AMOUNT_SOLD NONE REVENUE NONE
Let’s enable pending stats and gather a fresh set of statistics for the SALES table with new types of histograms.
-- Enabling pending stats on the SALES table BEGIN dbms_stats.set_table_prefs('MARIA', 'SALES', 'PUBLISH', 'FALSE'); END; / PL/SQL PROCEDURE successfully completed. SELECT dbms_stats.get_prefs('PUBLISH', 'MARIA', 'SALES') FROM dual; DBMS_STATS.GET_PREFS('PUBLISH') ------------------------------------------- FALSE -- Gather a new set of statistics for MARIA.SALES that included new histograms types introduced in 12c BEGIN dbms_stats.gather_table_stats('MARIA', 'SALES', method_opt=>'FOR ALL COLUMNS SIZE 254'); END; / PL/SQL PROCEDURE successfully completed. SELECT TABLE_NAME, sample_size, last_analyzed FROM all_tab_pending_stats WHERE owner='MARIA' AND TABLE_NAME='SALES'; TABLE_NAME SAMPLE_SIZE LAST_ANALYZED _____________ ______________ ________________ MY_SALES 10000 27-FEB-23
Now that we have a set of pending statistics let’s compare them to the current statistics using the DIFF_TABLE_STATS_IN_PENDING function.
The DBMS_STAT.DIFF_TABLE_STATS functions are table functions, so you must use the keyword TABLE when selecting from them. Otherwise, you will receive an OCI-21700 error saying the object does not exist.
Remember that the report is returned as a clob. To display the report correctly in SQLcl or SQL*Plus, you must use the set long command to define the width of a long.
-- set width of a long SET long 99999 SET longchunksize 99999 -- Generate the report SELECT report FROM TABLE(dbms_stats.diff_table_stats_in_pending('ADMIN','MY_SALES', systimestamp,0)); REPORT ________________________________________________________________________________________________________ ############################################################################### STATISTICS DIFFERENCE REPORT FOR: ................................. TABLE : SALES OWNER : MARIA SOURCE A : Statistics AS OF 24-FEB-23 11.30.34.322736 PM +00:00 SOURCE B : Pending Statistics PCTTHRESHOLD : 0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NO DIFFERENCE IN TABLE / (SUB)PARTITION STATISTICS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ COLUMN STATISTICS DIFFERENCE: ............................. COLUMN_NAME SRC NDV DENSITY HIST NULLS LEN MIN MAX SAMPSIZ ............................................................................... AMOUNT_SOLD A 25 .04 NO 0 3 C106 C139 10000 B 25 .00005 YES 0 3 C106 C139 10000 CUST_ID A 21 .047619047 NO 0 6 C25F3 C5035 10000 B 21 .00005 YES 0 6 C25F3 C5035 10000 DATE_ID A 14 .071428571 NO 0 8 78680 78680 10000 B 14 .00005 YES 0 8 78680 78680 10000 ORDER_ID A 20 .05 NO 0 7 C5033 C5320 10000 B 20 .00005 YES 0 7 C5033 C5320 10000 PRICE A 17 .058823529 NO 0 6 C11A C41E5 10000 B 17 .00005 YES 0 6 C11A C41E5 10000 PRODUCT_ID A 30 .033333333 NO 0 6 C3052 C43F3 10000 B 30 .00005 YES 0 6 C3052 C43F3 10000 REVENUE A 29 .034482758 NO 0 6 C3355 C4092 10000 B 30 .00005 YES 0 6 C3355 C4092 10000 SUPPLIER_ID A 30 .033333333 NO 0 6 C164 C4025 10000 B 30 .00005 YES 0 6 C164 C4025 10000 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NO DIFFERENCE IN INDEX / (SUB)PARTITION STATISTICS ###############################################################################
From the report above, we can see no differences in the table stats (number of rows, blocks, etc.) and no differences in the index statistics. However, we do see substantial differences in the column stats.
Each column where the statistics vary is listed along with a copy of the statistics values from each source. Source A is the data dictionary, while Source B is the pending statistics, gathered with AUTO_SAMPLE_SIZE set to the default. You won’t notice a significant difference in the base column statistics (NDV, Min, Max, etc.), but we now have a histogram on each column in the pending stats (source B).
We must test these additional histograms using our application workload to confirm they are worth gathering. The easiest way is to set the parameter OPTIMIZER_USE_PENDING_STATISTICS to TRUE. This will allow the optimizer to use the pending statistics we gather just for this session. It won’t impact the existing application session, which will continue to use the current statistics in the data dictionary.
ALTER SESSION SET optimizer_use_pending_statistics=TRUE;
Once we confirm that the new histograms are beneficial and don’t cause any performance regressions, the pending statistics can be published, and the PUBLISH preference for the SALES table can be set back to TRUE.
-- Publish pending stats BEGIN dbms_stats.publish_pending_stats('MARIA','SALES'); END; / PL/SQL PROCEDURE successfully completed. -- Set the PUBLISH preference back to the default on the SALES table BEGIN dbms_stats.set_table_prefs('MARIA', 'SALES', 'PUBLISH', 'TRUE'); END; / PL/SQL PROCEDURE successfully completed. -- Confirm the histograms that we gather as pending statistics are now visible SELECT column_name, histogram FROM user_tab_col_statistics WHERE TABLE_NAME ='SALES'; COLUMN_NAME HISTOGRAM ------------ --------------- ORDER_ID FREQUENCY PRODUCT_ID HYBRID DATE_ID FREQUENCY CUST_ID HYBRID SUPPLIER_ID HYBRID PRICE FREQUENCY AMOUNT_SOLD FREQUENCY REVENUE FREQUENCY
If you are looking for an example of using the DBMS_STAT.DIFF_TABLE_STATS_IN_STATTAB function, you can check out my blog post on the Optimizer blog.