Online Statistics Gathering

Although there have been a number of significant changes to the Oracle Database in 12c, some of my favorite new features are actually the small enhancements, designed to make your life a little easier. So, I decided to spend some time this year sharing the details on these small but extremely useful features.

One such enhancement is Online Statistics Gathering.

Whenever an index is created, Oracle automatically gathers optimizer statistics for that index. The database piggybacks the statistics gather on the full data scan and sort operation necessary for the index creation. This approach has worked so well since it was introduced in 9i, few people even realize it’s happening.

Although there have been a number of significant changes to the Oracle Database in 12c, some of my favorite new features are actually the small enhancements, designed to make your life a little easier. So, I decided to spend some time this year sharing the details on these small but extremely useful features.

One such enhancement is Online Statistics Gathering.

Whenever an index is created, Oracle automatically gathers optimizer statistics for that index. The database piggybacks the statistics gather on the full data scan and sort operation necessary for the index creation. This approach has worked so well since it was introduced in 9i, few people even realize it’s happening.

SELECT TO_CHAR(SYSDATE,'DD-MON-YYYY HH24:MI:SS') WALL_CLOCK_TIME 
FROM dual;
 
WALL_CLOCK_TIME
-----------------------------
03-JAN-2017 21:29:44 
 
SELECT index_name, 
       TO_CHAR(last_analyzed,'DD-MON-YYYY HH24:MI:SS') 
       last_analyzed,
       leaf_blocks,
       distinct_keys,
       clustering_factor
FROM   user_indexes; 
 
no ROWS selected
 
CREATE INDEX t_s_idx ON items (s);
 
INDEX created.
 
SELECT index_name, 
       TO_CHAR(last_analyzed,'DD-MON-YYYY HH24:MI:SS') 
       last_analyzed,
       leaf_blocks,
       distinct_keys,
       clustering_factor
FROM   user_indexes; 
 
INDEX_NAME LAST_ANALYZED	 LEAF_BLOCKS DISTINCT_KEYS  CLUS_FACT
---------- -------------------- ----------- ------------- ----------
T_S_IDX    03-JAN-2017 21:29:45    6135	      1000000        999839

In Oracle Database 12c, the same technique is now applied for direct path operations such as, Create Table As Select (CTAS) and Insert /+APPEND */ As Select (IAS) operations into an empty table or partition.
Piggybacking the statistics gather as part of the data loading operation, means no additional full data scan is required to have statistics available immediately after the data is loaded. The additional time spent on gathering statistics is small compared to a separate statistics collection process, and it guarantees accurate statistics are readily available from the get-go.

SELECT TO_CHAR(SYSDATE,'DD-MON-YYYY HH24:MI:SS') WALL_CLOCK_TIME 
FROM   dual;
 
WALL_CLOCK_TIME
-----------------------------
03-JAN-2017 21:40:07
 
CREATE TABLE items2 AS SELECT * FROM items; 
 
TABLE created.
 
SELECT TABLE_NAME, 
       num_rows,
       TO_CHAR(last_analyzed,'DD-MON-YYYY HH24:MI:SS') last_analyzed
FROM   user_tables;
 
TABLE_NAME   NUM_ROWS  LAST_ANALYZED
---------- ----------  ---------------------
ITEMS	      1000000  20-OCT-2016 05:58:26
ITEMS2	      1000000  03-JAN-2017 21:40:09
 
SELECT column_name, 
       num_distinct, 
       num_nulls, 
       histogram, 
       notes
FROM   user_tab_col_statistics
WHERE  TABLE_NAME='ITEMS2';
 
COLUMN_NAME NUM_DISTINCT  NUM_NULLS HISTOGRAM	     NOTES
----------- ------------ ---------- --------------- --------------
N		  624640	  0 NONE	    STATS_ON_LOAD
S		  988416	  0 NONE	    STATS_ON_LOAD

Be aware that online statistics gathering will not occur if any data exists in the table or partition, as the overhead of reading the existing data could greatly increase the time it takes to complete the data load. Only base table and column statistics are gathered. Histograms are not gathered automatically, as they require a second pass of the data, which again could adversely affect the elapse time for the database load.

Histograms can be gathered manually without re-gathering base column statistics by using new GATHER AUTO option in the DBMS_STATS package.

BEGIN
dbms_stats.Gather_table_stats('SH','ITEMS2',options=>'GATHER AUTO');
END;
/ 

If for some reason you wish to turn off this new automatic online statistics gathering, you can do so by setting the underscore parameter _optimizer_gather_stats_on_load to FALSE.

8 thoughts on “Online Statistics Gathering”

  1. hi Maria,

    Excuse me if my question is too basic, but to my knowledge, no question is stupid until I am not clear on it.

    Though I understood the concept from this, Am still trying to understand from this post with what is the benefit that we are gaining from 12c? i.e., what was it before 12c and what is it with 12c. Please help me in understanding.

    Many Thanks,

    1. Hi Sunil,

      Prior to 12c, if you created a table via a CTAS command or insert a set of data into an empty table or partition via an IAS command, there would be no statistics generated. All subsequent queries would have to use dynamic sampling during parse to generate some basic statistics about the table until either the DBA manually gathered statistics or the nightly statistics gathering task kicks in.

      Having the statistics automatically gathered as part of the data load, means all subsequent queries will have accurate statistics immediately after the load completes. This reduces the parse time for the subsequent queries, as no dynamic sampling is required and ensures the optimizer has the information it needs to generate a optimal execution plan.

      Thanks,
      Maria

      1. Thanks, Maria, Got it. So, like an Index, now from 12c if _optimizer_gather_stats_on_load parameter is set to true stats on the table will be available (histograms excepted) to get accurate run times.

        Thank you got it.

        1. This property is stored in which table? i.e., I want to set this property true at system level,

          I can run a command

          alter system set “_optimizer_gather_stats_on_load”=true;

          But I want to see what is the current value for this property, where can I check the same.

          Many Thanks,

  2. Thanks for this post, as of 12c(12.1.0.2) this online stats gathering would work only for Heap table and not for IOT’s.

    Ofcourse it is documented here
    http://docs.oracle.com/database/121/TGSQL/tgsql_statscon.htm#GUID-CDDB5A54-0991-4E68-A9D7-2305777B608B

    demo@ORA12C> select * from v$version;

    BANNER
    ——————————————————————————– —
    Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit Production
    PL/SQL Release 12.1.0.2.0 – Production
    CORE 12.1.0.2.0 Production
    TNS for 64-bit Windows: Version 12.1.0.2.0 – Production
    NLSRTL Version 12.1.0.2.0 – Production

    5 rows selected.

    demo@ORA12C> create table t(
    2 object_id,
    3 owner,
    4 object_name,
    5 object_type,
    6 constraint t_pk primary key(object_id) )
    7 organization index
    8 as
    9 select object_id,owner,object_name,object_type
    10 from all_objects;

    Table created.

    demo@ORA12C> select num_rows,blocks,last_analyzed,avg_row_len
    2 from user_tables
    3 where table_name =’T’;

    NUM_ROWS BLOCKS LAST_ANALYZ AVG_ROW_LEN
    ———- ———- ———– ———–

    1 row selected.

    demo@ORA12C> select column_name,num_distinct,num_nulls,last_analyzed
    2 from user_tab_col_statistics
    3 where table_name =’T’
    4 order by column_name;

    no rows selected

    demo@ORA12C> select index_name, leaf_blocks,distinct_keys,
    2 clustering_factor,num_rows,last_analyzed
    3 from user_ind_statistics
    4 where table_name =’T’;

    INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS LAST_ANALYZ
    ———- ———– ————- —————– ———- ———–
    T_PK

    1 row selected.

    demo@ORA12C> exec dbms_stats.gather_table_stats(user,’T’);

    PL/SQL procedure successfully completed.

    demo@ORA12C> select num_rows,blocks,last_analyzed,avg_row_len
    2 from user_tables
    3 where table_name =’T’;

    NUM_ROWS BLOCKS LAST_ANALYZ AVG_ROW_LEN
    ———- ———- ———– ———–
    77423 09-JAN-2017 46

    1 row selected.

    demo@ORA12C>
    demo@ORA12C> select column_name,num_distinct,num_nulls,last_analyzed
    2 from user_tab_col_statistics
    3 where table_name =’T’
    4 order by column_name;

    COLUMN_NAM NUM_DISTINCT NUM_NULLS LAST_ANALYZ
    ———- ———— ———- ———–
    OBJECT_ID 77423 0 09-JAN-2017
    OBJECT_NAM 42128 0 09-JAN-2017
    E

    OBJECT_TYP 24 0 09-JAN-2017
    E

    OWNER 19 0 09-JAN-2017

    4 rows selected.

    demo@ORA12C> select index_name, leaf_blocks,distinct_keys,
    2 clustering_factor,num_rows,last_analyzed
    3 from user_ind_statistics
    4 where table_name =’T’;

    INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS LAST_ANALYZ
    ———- ———– ————- —————– ———- ———–
    T_PK 564 77423 0 77423 09-JAN-2017

    1 row selected.

    demo@ORA12C>

  3. Hi Maria, does it make sense to disable online statistics gathering when you are importing via data pump and the next step is to import the schema stats from source database with DBMS_STATS.IMPORT_SCHEMA_STATS ?

    I am trying to reduce migration time as much as possible…

    Best regards,
    Xabi.

Leave a Reply to Maria Colgan Cancel reply

Your email address will not be published. Required fields are marked *

%d bloggers like this: