# Histogram sample size and Online Statistics Gathering

I got a great follow up question to my earlier blog on Online Statistics Gathering; that I thought might be of interest to other and worthy of a short post of its own.

The question related to sample size used to gather histograms on a table that originally had its statistics gathered via a direct path load operation.

Let’s look at any example:

```CREATE TABLE t AS SELECT * FROM dba_objects;

TABLE created.

SELECT COUNT(*) FROM t;

COUNT(*)
----------
67289

SELECT num_distinct, num_nulls, histogram, num_buckets, sample_size, notes
FROM user_tab_col_statistics
WHERE TABLE_NAME='T'
AND column_name='OWNER';

NUM_DISTINCT NUM_NULLS     HISTOGRAM     NUM_BUCKETS SAMPLE_SIZE NOTES
------------ ---------- --------------- ----------- ----------- --------------------------------------------------------------------------------

28               0         NONE            1          67289       STATS_ON_LOAD```

So, as you can see in the above example, the table T is benefiting from an online statistics gathering as the note column says “STATS_ON_LOAD“.

Since there is no column usage information available during the initial table creation, no histograms are created. So let’s run a query to generate some.

```SELECT COUNT(*), SUM(object_id) FROM t WHERE owner='SYS';

COUNT(*) SUM(OBJECT_ID)
---------- --------------
51070     1665378066```

Now that we have some column usage information, let’s gather statistics using the “GATHER AUTO” option and review the statistics we get.

``` EXEC dbms_stats.gather_table_stats(USER,'T', options=&gt;'GATHER AUTO');

PL/SQL PROCEDURE successfully completed.

SELECT num_distinct, num_nulls, histogram, num_buckets, sample_size, notes
FROM user_tab_col_statistics
WHERE TABLE_NAME='T'
AND column_name='OWNER';

NUM_DISTINCT NUM_NULLS    HISTOGRAM   NUM_BUCKETS    SAMPLE_SIZE NOTES
------------ ---------- --------------- ----------- ----------- --------------------------------------------------------------------------------
28               0        FREQUENCY         22         5338       HISTOGRAM_ONLY```

You will notice we did get a frequency histogram on the OWNER column and the sample size used was 5,338.

But why was the sample size so small when we know there are over 67,000 rows in our table T?

The culprit here is the options setting “GATHER AUTO“.

If you gather statistics with the option “GATHER AUTO“, on a table that previous had it’s statistics gathered on load, what you are asking Oracle to do is gather only the missing statistics for your table.

In our case the only missing statistics are column level histogram.

Since Oracle only has to gather histograms, it picks the fastest method possible, which is to sample. The default sampling size for histogram creation is approximately 5,500 non-null values for the column. So in our case the sample turn out to be 5,338.

Now let’s see what happens when we gather statistics without the option “GATHER AUTO“.

``` EXEC dbms_stats.gather_table_stats(USER,'T');

PL/SQL PROCEDURE successfully completed.

SELECT num_distinct, num_nulls, histogram, num_buckets, sample_size, notes
FROM user_tab_col_statistics
WHERE TABLE_NAME='T'
AND column_name='OWNER';

NUM_DISTINCT NUM_NULLS    HISTOGRAM   NUM_BUCKETS    SAMPLE_SIZE NOTES
------------ ---------- --------------- ----------- ----------- --------------------------------------------------------------------------------
28               0        FREQUENCY         22         67289```

In this case what we actually asked Oracle to do is re-gather statistics on table T from scratch. Therefore Oracle scanned the entire table to gather the table and column level statistics including the histograms. Hence a sample size of 67,289.

You should also notice that the NOTES column is now empty indicating that we are no longer using any statistics that were gathered as part of the initial load.

This entry was posted in AskTom, Optimizer, Oracle Database 12c New Features and tagged , , . Bookmark the permalink.

### 1 Response to Histogram sample size and Online Statistics Gathering

1. Rajeshwaran, Jeyabal says:

Wow, this is my question from AskTom