I got a great follow up question to my earlier blog on Online Statistics Gathering; that I thought might be of interest to other and worthy of a short post of its own.
The question related to sample size used to gather histograms on a table that originally had its statistics gathered via a direct path load operation.
Let’s look at any example:
CREATE TABLE t AS SELECT * FROM dba_objects; TABLE created. SELECT COUNT(*) FROM t; COUNT(*) ---------- 67289 SELECT num_distinct, num_nulls, histogram, num_buckets, sample_size, notes FROM user_tab_col_statistics WHERE TABLE_NAME='T' AND column_name='OWNER'; NUM_DISTINCT NUM_NULLS HISTOGRAM NUM_BUCKETS SAMPLE_SIZE NOTES ------------ ---------- --------------- ----------- ----------- -------------------------------------------------------------------------------- 28 0 NONE 1 67289 STATS_ON_LOAD
So, as you can see in the above example, the table T is benefiting from an online statistics gathering as the note column says “STATS_ON_LOAD“.
Since there is no column usage information available during the initial table creation, no histograms are created. So let’s run a query to generate some.
SELECT COUNT(*), SUM(object_id) FROM t WHERE owner='SYS'; COUNT(*) SUM(OBJECT_ID) ---------- -------------- 51070 1665378066
Now that we have some column usage information, let’s gather statistics using the “GATHER AUTO” option and review the statistics we get.
EXEC dbms_stats.gather_table_stats(USER,'T', options=>'GATHER AUTO'); PL/SQL PROCEDURE successfully completed. SELECT num_distinct, num_nulls, histogram, num_buckets, sample_size, notes FROM user_tab_col_statistics WHERE TABLE_NAME='T' AND column_name='OWNER'; NUM_DISTINCT NUM_NULLS HISTOGRAM NUM_BUCKETS SAMPLE_SIZE NOTES ------------ ---------- --------------- ----------- ----------- -------------------------------------------------------------------------------- 28 0 FREQUENCY 22 5338 HISTOGRAM_ONLY
You will notice we did get a frequency histogram on the OWNER column and the sample size used was 5,338.
But why was the sample size so small when we know there are over 67,000 rows in our table T?
The culprit here is the options setting “GATHER AUTO“.
If you gather statistics with the option “GATHER AUTO“, on a table that previous had it’s statistics gathered on load, what you are asking Oracle to do is gather only the missing statistics for your table.
In our case the only missing statistics are column level histogram.
Since Oracle only has to gather histograms, it picks the fastest method possible, which is to sample. The default sampling size for histogram creation is approximately 5,500 non-null values for the column. So in our case the sample turn out to be 5,338.
Now let’s see what happens when we gather statistics without the option “GATHER AUTO“.
EXEC dbms_stats.gather_table_stats(USER,'T'); PL/SQL PROCEDURE successfully completed. SELECT num_distinct, num_nulls, histogram, num_buckets, sample_size, notes FROM user_tab_col_statistics WHERE TABLE_NAME='T' AND column_name='OWNER'; NUM_DISTINCT NUM_NULLS HISTOGRAM NUM_BUCKETS SAMPLE_SIZE NOTES ------------ ---------- --------------- ----------- ----------- -------------------------------------------------------------------------------- 28 0 FREQUENCY 22 67289
In this case what we actually asked Oracle to do is re-gather statistics on table T from scratch. Therefore Oracle scanned the entire table to gather the table and column level statistics including the histograms. Hence a sample size of 67,289.
You should also notice that the NOTES column is now empty indicating that we are no longer using any statistics that were gathered as part of the initial load.
Wow, this is my question from AskTom
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:5141284300346101910#9533044300346296322
Thanks for taking this up to blog post. you rock !