How to identify which indexes can be safely dropped in Oracle

The longer an application has been running, the more likely it is to have accumulated indexes that are no longer used or beneficial. Removing these indexes not only saves space but can also improve the performance of DML operations.

But knowing which indexes can be safely dropped can be tricky.

In Oracle Database 12c, things got a little easier with the introduction of a new view called DBA_INDEX_USAGE.

The DBA_INDEX_USAGE view displays cumulative index usage statistics, but unlike previous releases, it’s not just a binary value (YES or NO). This new view shows how often an index was used, who accessed it, and how effective it was, via a histogram of the number of rows returned. The index access information is captured in memory and periodically flushed to disk every 15 minutes. You can query the LAST_FLUSH_TIME column in V$INDEX_USAGE_INFO to determine when it was last updated.
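
For example, to check when the usage statistics were last flushed:

-- When were the in-memory index usage statistics last flushed to disk?
SELECT last_flush_time FROM v$index_usage_info;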

You should also be aware that index usage is sampled by default rather than recorded on every execution. The sampling approach will likely catch all index usage on a typical running system, where indexes are accessed repeatedly, but it can easily miss indexes used for one-off tests. You can change the tracking approach so that every execution is tracked, but there is a significant performance overhead for this, as shown in the sketch below.
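
On a test system where you want exhaustive tracking, my understanding is that the collection type is controlled by the hidden parameter _IUT_STAT_COLLECTION_TYPE. A sketch only; the parameter is undocumented, so check with Oracle Support before touching it in production:

-- Track every execution instead of sampling (significant overhead; test systems only)
ALTER SYSTEM SET "_iut_stat_collection_type"=ALL;

-- Revert to the default sampled collection
ALTER SYSTEM SET "_iut_stat_collection_type"=SAMPLED;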

As always, it is easier to understand the benefits of a dictionary view by looking at an example. In my example below, I join the DBA_INDEX_USAGE view to DBA_INDEXES via an outer join to ensure I capture information on indexes that have never been used, because these indexes won’t appear in DBA_INDEX_USAGE at all.

SELECT i.index_name, u.total_access_count tot_access, u.total_exec_count exec_cnt,
       u.bucket_0_access_count B0, u.bucket_1_access_count B1, u.bucket_2_10_access_count B2_10,
       u.bucket_11_100_access_count B11_100, u.bucket_101_1000_access_count B101_1K,
       u.bucket_1000_plus_access_count B1K, u.last_used
FROM   DBA_INDEX_USAGE u
RIGHT JOIN DBA_INDEXES i
ON     i.index_name = u.name
AND    i.owner = u.owner   -- join on owner too, so same-named indexes in other schemas don't match
WHERE  i.owner = 'MARIA'
ORDER BY u.total_access_count;

The output of this query should look something like this.

INDEX_NAME       TOT_ACCESS   EXEC_CNT      B0     B1  B2_10  B11_100  B101_1K    B1K  LAST_USED
---------------- ---------- ---------- ------- ------ ------ -------- -------- ------ ----------
PROD_CUST_SALES           1          1       0      0      0        0        0      1  06-APR-23
INDX_LOC                  2          2       0      1      1        0        0      0  12-APR-23
INDX_DEPTNO              19         19      18      1      0        0        0      0  26-APR-23
PROD_SUP_INDX            27         27      25      0      0        0        2      0  26-APR-23
EMPNO_PK_IND             82         82      48     32      0        2        0      0  26-APR-23
CHAN_SOLD
PROD_SUB_IDX

As you can see in the output above, for each index, we get the total number of accesses (TOT_ACCESS), the execution count (EXEC_CNT), the last date the index was used (LAST_USED), and a histogram of how many rows were returned across several buckets.

Let’s look at the EMPNO_PK_IND index (the third row from the bottom). This index returned 0 rows 48 times out of 82 total accesses and a single row 32 times. So, it looks like this index is commonly used and valuable.

However, if we look at PROD_CUST_SALES (the first row), it was accessed only once and returned over 1,000 rows. Is that a helpful index?

Maybe, maybe not. It may be beneficial if this index access replaces a full table scan of an extremely wide table. If, on the other hand, it’s only marginally cheaper than a full table scan, it may not be worth the overhead of maintaining the index.

But perhaps the most exciting part of the query output above is that we have two indexes (CHAN_SOLD, PROD_SUB_IDX) that have never been accessed. These are good candidates to be dropped.

But dropping an index can be risky. What can you do to ensure you don’t get fired for dropping the wrong index?

Rather than dropping the index immediately, it is far safer to mark the index invisible first. From then on, the optimizer will ignore the index and won’t select it as part of any execution plan. However, the index will continue to be maintained.

ALTER INDEX prod_sub_idx INVISIBLE;

If no one complains about their query performance regressing after some time, you can safely drop the index. If, on the other hand, someone does complain, you can alter the index visible again in a matter of seconds.

ALTER INDEX prod_sub_idx VISIBLE;

You can also create a new index as invisible, so the optimizer is unaware of it until you have verified that it improves performance. Setting the parameter OPTIMIZER_USE_INVISIBLE_INDEXES within a single session allows the optimizer to consider the new index only for SQL statements issued within your session. All other sessions will continue to use the existing execution plans.

-- New indexes can be marked invisible until you have an opportunity to prove they improve performance
CREATE INDEX my_idx ON t(x, object_id) INVISIBLE; 
 
-- Test newly created invisible indexes by setting OPTIMIZER_USE_INVISIBLE_INDEXES to TRUE
ALTER SESSION SET optimizer_use_invisible_indexes = TRUE;

Update on March 15th, 2024
There have been several comments on this post about what each of the columns in the DBA_INDEX_USAGE view actually records, and Jonathan Lewis has written a great explanation of each column in the view in the first part of his blog series on Index Usage.

Better diagnostics for SQL regressions in 19c and beyond #JoelKallmanDay

When diagnosing and correcting a performance regression for a SQL statement, it can often be challenging to find out what is happening during the execution and why your corrective measures are not working.

In today’s blog, I want to share several enhancements introduced in recent Oracle Database releases that make it easier to see what is impacting a SQL statement’s performance and why your corrective measures aren’t working as you hoped.

Enhanced Execution Plan

One of the most valuable tools at our disposal is the execution plan. Over the last several years, the Oracle Optimizer team has made a considerable effort to share even more insights and guidance on what might be impacting the performance of a SQL statement via the execution plan.

The Note section under the plan

The note section under the execution plan contains valuable information on what has happened during parse and execution. Over the last several releases, you will find a lot more information appearing in the note section, including details on the following features:

  • Dynamic Sampling – indicates a non-default value for the parameter OPTIMIZER_DYNAMIC_SAMPLING or that one or more objects referenced in the query are missing statistics.
  • Plan Stability Features – indicates that a stored outline, SQL profile, SQL patch, SQL plan directive, or SQL plan baseline influenced the plan chosen.
  • Adaptive plan – indicates that the plan shown by the Explain Plan command may differ from the plan used during execution, as the Optimizer uses run-time statistics to make the final decision on which join types and aggregation methods to use. It is best to view the actual plan used via V$SQL_PLAN after the initial execution, as shown in the sketch after this list.
  • Statistics Feedback – indicates the plan was re-evaluated based on previous execution statistics, and a new plan was found.
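
One way to do that is with DBMS_XPLAN.DISPLAY_CURSOR, which pulls the final plan from the cursor cache. A sketch, using a placeholder SQL_ID; the '+ADAPTIVE' format modifier also marks which parts of an adaptive plan were switched in or out:

-- Display the run-time plan from the cursor cache ('an4zdhb4st0u7' is a placeholder SQL_ID)
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR('an4zdhb4st0u7', 0, 'TYPICAL +ADAPTIVE'));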

Hint Usage Reporting in Oracle Database 19c

Starting in 19c, you will see a new hint usage report under the execution plan. The new report gives you a better understanding of what happened to each hint in a SQL statement.

Remember, the database doesn’t issue an error if a hint is invalid or not applicable to a SQL statement. It simply treats the hint as a comment, effectively ignoring it. Previously, the only way to find this information was via a 10053 trace file, and even then, it wasn’t very detailed.

The new hint report shows which hints were used and which were ignored, and it explains why a hint wasn’t used. The most common reasons for a hint not being used are:

  • Syntax errors – the hint contains a typo or an invalid argument.
  • Unresolved hints – the hint contains an invalid argument or is not applicable to the SQL statement. For example, you request an index be used, but no index exists on the table.
  • Unused hints – the hint can’t be used in this specific scenario. For example, you requested a Hash Join for a non-equality join condition.
  • Conflicting hints – multiple hints provide conflicting directives.
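
By default, DBMS_XPLAN only reports on hints that had problems. My understanding is that, from 19c on, you can request the full report, including the hints that were used, with the HINT_REPORT format modifier:

-- Show the full hint report for the last statement executed in this session
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT => '+HINT_REPORT'));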

In the example below, the hint NO_QURY_TRANSFORMATION was reported to contain a syntax error: the word QUERY is misspelled, so the hint can’t be used.

--------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |       |       |    47 (100)|
|   1 |  HASH GROUP BY               |          |   269 | 37929 |    47   (7)|
|   2 |   HASH JOIN                  |          |   269 | 37929 |    46   (5)|
|   3 |    TABLE ACCESS STORAGE FULL | SALES    | 10000 | 90000 |     5   (0)|
|*  4 |    TABLE ACCESS STORAGE FULL | PRODUCTS | 43108 |  4841K|    40   (3)|
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("ITEM_1"="P"."PROD_ID")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------

   0 -  SEL$1
           E -  NO_QURY_TRANSFORMATION

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

In this second example, I provided two hints on how to access the employees table. One hint requested that the primary key index be used, and the other requested that the access leverage parallel execution.

SELECT /*+ index(e empno_pk_ind) parallel(e 8) */ e.empno, ename
FROM   employees e
WHERE  e.empno < 7700;

----------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows | Bytes |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |      |       |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED | EMPLOYEES    |    8 |    80 |
|*  2 |   INDEX RANGE SCAN                   | EMPNO_PK_IND |    8 |      |
----------------------------------------------------------------------------

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------

         U -  parallel(e 8)

The parallel hint was not used because I supplied an invalid combination of hints. An index range scan can’t be parallelized unless the index is partitioned, which isn’t the case here. Therefore, the Optimizer cannot honor both hints.

Note: Nigel Bayliss, the Optimizer product manager, has also blogged about Optimizer hint reporting and has shared details on some limitations regarding execution hints such as GATHER_OPTIMIZER_STATISTICS, APPEND, etc.

SQL Analysis Report in Oracle Database 23c

In Oracle Database 23c, the execution plan got another new section: a SQL Analysis Report. This handy addition helps you diagnose common problems that can cause suboptimal execution plans. For example, the new report will point out situations where you:

  • Are missing join conditions
  • Have a WHERE clause predicate that prevents an index from being used
  • Have a datatype mismatch in a WHERE clause predicate
  • Use a UNION instead of a UNION ALL

In the example below, the report flags a WHERE clause predicate that applies a function to the STATE column, preventing an index from being used:

EXPLAIN PLAN FOR
SELECT * FROM addresses
 WHERE UPPER(state) = 'CA';
 
SELECT * FROM dbms_xplan.display();
 
Plan hash value: 3184888728
 
--------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   239 | 13384 |    61   (0)| 
|*  1 |  TABLE ACCESS FULL| ADDRESSES |   239 | 13384 |    61   (0)| 
--------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(UPPER("state")='CA')
 
SQL Analysis Report (identified by operation id/Query Block Name/Object Alias):
-----------------------------------------------------------------
 
   1 -  SEL$1 / "ADDRESSES"@"SEL$1"
           -  The following columns have predicates which preclude their use
              as keys in index range scan. Consider rewriting the predicates.
              "STATE"

Again, Nigel Bayliss has blogged about this in more detail on the official Optimizer blog.

Note that SQL Monitor active reports also contain SQL Analysis reports. You can find them under the new SQL Analysis tab.

New Optimizer Dictionary Views

Along with enhancements to the execution plan in 23c, we added new data dictionary views to help identify what happens when we parse and execute a SQL statement.

Apps and users frequently change the value of one or more parameters that impact the Optimizer at the session level. As a DBA or performance engineer, you are often unaware of these logon triggers or ALTER SESSION commands. The first hint that the environment may have changed is when you see a plan change.

To check if a change in the optimizer environment caused the plan change, you can check the column OPTIMIZER_ENV_HASH_VALUE in V$SQL, V$SQLAREA, and DBA_HIST_SQLSTAT. However, this column doesn’t tell you what has changed.

Starting in 23c, you can query a new dictionary view, DBA_HIST_OPTIMIZER_ENV_DETAILS, to find out exactly what has changed in the optimizer environment when we parsed and executed a SQL statement.

Using the OPTIMIZER_ENV_HASH_VALUE for the original and new plans, you can query DBA_HIST_OPTIMIZER_ENV_DETAILS to get a list of the parameter settings for each scenario, which you can then compare to find the difference.

SELECT DISTINCT optimizer_env_hash_value
FROM  dba_hist_sqlstat
WHERE sql_id = 'bsvavk15n7cra'
ORDER BY 1;

OPTIMIZER_ENV_HASH_VALUE                                                        
------------------------                                                        
              1309615723                                                        
              2369923737 

SELECT name, value
FROM dba_hist_optimizer_env_details
WHERE optimizer_env_hash_value = 1309615723
ORDER BY 1;

NAME                                               VALUE                        
-------------------------------------------------- -------------------------    
BlockChain_ledger_infrastructure                   0                            
CLI_internal_cursor                                0                            
PMO_altidx_rebuild                                 0                            
_adaptive_window_consolidator_enabled              true                         
_add_stale_mv_to_dependency_list                   true  
   .
   .

You can get more information on how to leverage the DBA_HIST_OPTIMIZER_ENV_DETAILS view in MOS Note 2953121.1, "Examining the Optimizer Environment within Which a SQL Statement was Parsed in AWR".

How to use DBMS_STATS DIFF_TABLE_STATS functions

In 11g, Oracle introduced the DBMS_STATS DIFF_TABLE_STATS functions to help you compare two sets of statistics for a table, along with all its dependent objects (indexes, columns, partitions).

There are three versions of this function depending on where the statistics being compared are located:

  • DBMS_STATS.DIFF_TABLE_STATS_IN_HISTORY (compares statistics for a table from two timestamps in the past)
  • DBMS_STATS.DIFF_TABLE_STATS_IN_PENDING (compares pending statistics with statistics as of a timestamp or statistics in the data dictionary)
  • DBMS_STATS.DIFF_TABLE_STATS_IN_STATTAB (compares statistics from a user statistics table and the data dictionary, from two different user statistics tables, or from a single user statistics table using two different STATSIDs)

The functions return a report that has three sections:

  1. Basic table statistics
    The report compares the basic table statistics (number of rows, blocks, etc.).
  2. Column statistics
    The second section of the report examines column statistics, including histograms.
  3. Index Statistics
    The final section of the report covers differences in index statistics.

Statistics will only be displayed in the report if the difference between the two sets exceeds a certain threshold, expressed as a percentage. The threshold can be specified as an argument to the functions (PCTTHRESHOLD); the default value is 10%. The statistics corresponding to the first source, typically the current table stats in the data dictionary, are used to compute the differential percentage.

The functions also return the MAXDIFFPCT (a number) along with the report. This is the maximum percentage difference between the statistics. These differences can come from the table, column, or index statistics.

Let’s look at an example.
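
Here is a minimal sketch using the _IN_HISTORY variant, assuming statistics for a SCOTT.EMP table were gathered at two points within the retained statistics history (the schema, table, and timestamps are placeholders):

-- Compare EMP's statistics from 24 hours ago with the current statistics
SELECT report, maxdiffpct
FROM   TABLE(DBMS_STATS.DIFF_TABLE_STATS_IN_HISTORY(
         ownname      => 'SCOTT',
         tabname      => 'EMP',
         time1        => SYSTIMESTAMP - INTERVAL '1' DAY,
         time2        => SYSTIMESTAMP,
         pctthreshold => 10));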

How to use Oracle Database In-Memory for free!

Oracle recently announced a new free base level for the Oracle Database In-Memory option, its real-time analytics capability. With the base level, you can allocate up to 16GB of memory to the In-Memory column store per instance without needing an additional Database In-Memory license. Larger column stores still require the license.
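
Enabling the base level is straightforward. A minimal sketch, assuming Oracle Database 19c (19.8 or later) and a hypothetical SALES table; both parameters take effect after a restart:

-- Enable the free base level and size the column store (capped at 16GB per instance)
ALTER SYSTEM SET inmemory_force=BASE_LEVEL SCOPE=SPFILE;
ALTER SYSTEM SET inmemory_size=16G SCOPE=SPFILE;

-- After the restart, mark a table for population in the In-Memory column store
ALTER TABLE sales INMEMORY;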

How to implement Data-Driven Apps – Using many Single-Purpose Databases or a single Converged Database?

There is an ongoing debate in our community about the best approach for developing cloud-native or data-driven apps. On one side, you have folks who say to use a single-purpose “best-of-breed” database for each data type or workload you have, while the other side says you should use a single converged database. So, which approach is right for you and your projects?

Let’s examine some of the pros and cons of each approach.

Single-purpose Databases

Single-purpose databases, or purpose-built databases as they are often known, are engineered to solve a single problem or a small number of them. Given their narrow focus, they can ignore the tradeoffs usually required when accommodating multiple data types or workloads. It also allows them to use a convenient data model that fits the purpose and to adopt APIs that seem natural for that data model. They offer less functionality than converged databases and, therefore, fewer APIs, making it easier to start developing against them. Their simplicity means they do a few things very well, but other things not at all. For example, a lot of single-purpose databases scale well because they offer no strong consistency guarantees.

At first glance, single-purpose databases appear to be a good option. Developers are happy because they get exactly what they need to begin a project. However, when you look at the bigger picture, single-purpose databases can cause a lot of pain and end up costing more in the long run.


What is a Converged Database?

At the recent OOW European conference, there was a lot of talk about Converged Databases and how they can greatly simplify data-driven app development.

But if you missed the conference, you might find yourself wondering what exactly a Converged Database is and how it differs from an Autonomous Database.

So, I thought it would be a good idea to write a short blog post explaining what a Converged Database is and how it relates to the Oracle Autonomous Database.

What is a Converged Database?

A Converged Database is a database that has native support for all modern data types (JSON, Spatial, Graph, etc. as well as relational), multiple workloads (IoT, Blockchain, Machine Learning, etc.) and the latest development paradigms (Microservice, Events, REST, SaaS, CI/CD, etc.) built into one product.

By having support for each of these datatypes, workloads, and paradigms as features within a converged database, you can support mixed workloads and data types in a much simpler way. You don’t need to manage and maintain multiple systems or worry about having to provide unified security across them.

You also get synergy across these capabilities. For example, by having support for Machine Learning algorithms and Spatial data in the same database, you can easily do predictive analytics on Spatial data.  The Oracle Database is a great example of a Converged Database, as it provides support for Machine Learning, Blockchain, Graph, Spatial, JSON, REST, Events, Editions, and IoT Streaming as part of the core database at no additional cost.

A good analogy for a Converged Database is a smartphone. In the past, if you wanted to take a picture or video you would need a camera. If you wanted to navigate somewhere you would need a map or a navigation system. If you wanted to listen to music, you needed an iPod and if you wanted to make phone calls, you would also need a phone.

But with a smartphone, all of these products have been converged into one. Each of the original products is now a feature of the smartphone. Having all of these features converged into a single product inherently makes your life easier, as you can stream music over the phone’s data plan or upload pictures or videos directly to social media sites.

Can I use an Autonomous Database to develop new applications?

Yes, Oracle Autonomous Database (ADB) is the ideal platform for new application development.

With this family of cloud services, developers no longer have to wait on others to provision hardware, install software, and create a database for them. With ADB, developers can easily and instantly deploy an Oracle database without having to manually tune it or do capacity planning. This allows developers to start developing in minutes and concentrate on solving business problems without all of the usual distractions.

ADB has the most advanced SQL and PL/SQL support, accelerating developer productivity by minimizing the amount of application code required to implement complex business logic. It also has a complete set of integrated Machine Learning algorithms, simplifying the development of applications that perform real-time predictions such as personalized shopping recommendations, customer churn, and fraud detection.


Oracle Database 19c is now available!

Today, April 25th, Oracle Database 19c became available to download from Oracle.com.

Oracle Database 19c is the final member of the 12.2 family (a.k.a. 12.2.0.3) and is, therefore, the ‘long term support’ release. This means it will come with 4 years of premium support and 3 years of extended support, making this release the version of the database that most folks are going to upgrade to next.

So, what can you expect?

There are hundreds of useful enhancements in Oracle Database 19c as well as several new features. Dom Giles's latest post on the Oracle Database Insider blog has all the details on the new release, while I’ve listed just a couple of my personal favourites below.
