How to identify which indexes can be safely dropped in Oracle

The longer an application has been successfully run, the more likely you are to have indexes that are no longer used or beneficial. Removing these indexes not only saves space but can also improve the performance of any DML operations.

But knowing which indexes can be safely dropped can be tricky.

In Oracle Database 12c, things got a little easier with the introduction of a new view called DBA_INDEX_USAGE.

The DBA_INDEX_USAGE view displays cumulative index usage statistics, but unlike previous releases, it’s not just a binary value (YES or NO). This new view shows how often an index was used, who accessed it, and how effective it was via histogram based on the number of rows returned. The index access information is captured in memory and periodically flushed to disk every 15 minutes. You can query the last_flush_Time in v$INDEX_USAGE_INFO to determine when it was updated.

You should also be aware that the Index usage is sampled by default rather than tracking all index usage. The sampling approach will likely notice all index usage in a standard running system, where indexes are repeatedly accessed, but it can easily miss indexes used for one-off tests. You can change the tracking approach so that we track every execution, but there is a significant performance overhead for this.

As always, it is easier to understand the benefits of a dictionary view by looking at an example. In my example below, I join the DBA_INDEX_USAGE view to USER_INDEXES via an outer join to ensure I capture information on indexes that are never used because these indexes won’t appear in DBA_INDEX_USAGE at all.

SELECT i.index_name, u.total_access_count tot_access, u.total_exec_count exec_cnt,
       u.bucket_0_access_count B0, u.bucket_1_access_count B1, u.bucket_2_10_access_count B2_10,
       u.bucket_11_100_access_count B11_100, u.bucket_101_1000_access_count B101_1K,
       u.bucket_1000_plus_access_count B1K, u.last_used
FROM    DBA_INDEX_USAGE u
RIGHT JOIN DBA_INDEXES i
ON     i.index_name = u.name 
WHERE  i.owner='MARIA'
ORDER BY u.total_access_count;

The output of this query should look something like this.

INDEX_NAME     	TOT_ACCESS    EXEC_CNT   B0      B1      B2_10  B11_100   B101_1K     B1K     LAST_USED
---------------- ---------- ---------- ------- ------- ------- -------- ---------- ---------- -----------
PROD_CUST_SALES       1  	1       0       0       0       0        	0          1 	06-APR-23
INDX_LOC       	      2  	2       0       1       1       0		0          0 	12-APR-23
INDX_DEPTNO    	     19        19      18       1       0       0   	        0          0 	26-APR-23
PROD_SUP_INDX        27        27      25       0       0       0        	2          0 	26-APR-23
EMPNO_PK_IND         82        82      48       32      0       2        	0          0 	26-APR-23
CHAN_SOLD
PROD_SUB_IDX

As you can see in the output above, for each index, we get the total number of accesses (TOT_ACCESS), the execution count (EXEC_CNT), the last date the index was used (LAST_USED), and a histogram of how many rows were returned across several buckets.

Let’s look at the  EMPNO_PK INDEX  index (the third row from the bottom). This index returned 0 rows 48 times out of  82 total accesses and a single row 32 times. So, it looks like this index is commonly used and valuable.

However, if we look at PROD_CUST_SALES (the first row), it was accessed only once and returned over 1,000 rows. Is that a helpful index?

Maybe/maybe not. It may be beneficial if this index access replaces a full table scan of an extremely wide table. If, on the other hand, it’s only marginally cheaper than a full table scan, it may not be worth the overhead of maintaining the index.

But perhaps the most exciting part of the query output above is that we have two indexes (CHAN_SOLD, PROD_SUB_IDX) that have never been accessed. These are good candidates to be dropped.

But dropping an index can be risky. What can you do to ensure you don’t get fired for dropping the wrong index?

Rather than dropping the index immediately, marking the index invisible is far easier. The optimizer will be unaware of the index in the future and won’t select it as part of any execution plan. However, the index will continue to be maintained.

ALTER INDEX prod_sub_idx INVISIBLE;

If no one complains about their query performance digressing after some time, you can safely drop the index. If, on the other hand, someone does complain, you can alter the index visible again in a matter of seconds.

ALTER INDEX prod_sub_idx VISIBLE;

You can also create a new index as invisible. The optimizer is unaware of the new index until you can verify it improves performance by setting the parameter OPTIMIZER_USE_INVISBLE_INDEXES within a single session. This will allow the optimizer to consider the new index only for SQL statements issued within your session. All other sessions will continue to use the existing execution plans.

-- New indexes can be marked invisible until you have an opportunity to prove they improve performance
CREATE INDEX my_idx ON t(x, object_id) INVISIBLE; 
 
-- Test newly created invisible indexes by setting OPTIMIZER_USE_INVISBLE_INDEXES to TRUE
ALTER SESSION SET optimizer_use_invisible_indexes  = TRUE;

Better diagnostics for SQL regressions in 19c and beyond #JoelKallmanDay

When diagnosing and correcting a performance regression for a SQL statement, it can often be challenging to find out what is happening during the execution and why your corrective measures are not working.

In today’s blog, I want to share several enhancements introduced in recent Oracle Database releases to help you better understand how to improve a SQL statement and understand why your corrective measures aren’t working as you hoped.

Enhanced Execution Plan

One of the most valuable tools at our disposal is the execution plan. Over the last several years, the Oracle Optimizer team has made a considerable effort to share even more insights and guidance on what might be impacting the performance of a SQL statement via the execution plan.

The Note section under the plan

The note section under the execution plan contains valuable information on what has happened during parse and execution. Over the last several releases, you will find a lot more information appearing in the note section, including details on the following features:

  • Dynamic Sampling – indicates a non-default value for the parameter OPTIMIZER_DYNAMIC_SAMPLING or that one or more objects referenced in the query are missing statistics.
  • Plan Stability Features – indicates that a store outline, SQL profile, SQL patch SQL directives, or SQL plan baseline influenced the plan chosen.
  • Adaptive plan – indicates that the plan shown through the Explain Plan command may be different from the plan used during execution as the Optimizer will use run-time statistics to make the final decision on what join types and aggregation methods to use. It is best to view the actual plan used via v$SQL_PLAN after the initial execution.
  • Statistics Feedback – indicates the plan was re-evaluated based on previous execution statistics, and a new plan was found.

Hint Usage Reporting in Oracle Database 19c

Starting in 19c, you will see a new hint usage report under the execution plan. The new report gives you a better understanding of what happened to each hint in a SQL statement.

Remember, the database doesn’t issue an error if a hint is invalid or not applicable for a SQL statement. The database will treat the hint as a comment, effectively ignoring it. Previously, the only way to find this information was via a 10053 trace file, and even then, it wasn’t very detailed.

The new hint report shows which hints were used and which were ignored and explains why hints weren’t used. The most common reasons for a hint not to be used are:

  • Syntax errors – the hint contains a typo or an invalid argument.
  • Unresolved hints – the hint contains an invalid argument or is not applicable to the SQL statement. For example, you request an index be used, but no index exists on the table.
  • Unused hints – the hint can’t be used in this specific scenario. For example, you requested a Hash Join for a non-equality join condition.
  • Conflicting hints – multiple hints provide conflicting directives.

In the example below, the hint NO_QURY_TRANSFORMATION was reported to contain a syntax error. The word query is misspelled in the hint, so the hint can’t be used.

---------------------------------------------------------------------------   
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)|   
--------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT               |          |       |       |    47 (100)|    
|   1 |  HASH GROUP BY                 |          |   269 | 37929 |    47   (7)|    
|   2 |   HASH JOIN                    |          |   269 | 37929 |    46   (5)|    
|   3 |     TABLE ACCESS STORAGE FULL  | SALES    | 10000 | 90000 |     5   (0)|   
|*  4 |     TABLE ACCESS STORAGE FULL  | PRODUCTS | 43108 |  4841K|    40   (3)|    
--------------------------------------------------------------------------------    
Predicate Information (identified by operation id):                                            
---------------------------------------------------                                            
   4 - access("ITEM_1"="P"."PROD_ID")                                                          
   
Hint Report (identified by operation id / Query Block Name / Object Alias):                    
Total hints for statement: 1 (E - Syntax error (1))
--------------------------------------------------------------------------
 0 -  SEL$1                                                                                  
         E -  NO_QURY_TRANSFORMATION 
Note                                                                                           
-----
- dynamic statistics used: dynamic sampling (level=2)                                       
                                                              

In this second example, I provided two hints on how to access the employees table. One hint requested that the primary key index be used, and the other requested that the access leverage parallel execution.

SELECT /*+ index(e empno_pk_ind) parallel(e 8) */ e.empno, ename 
FROM    employees eWHERE   e.empno < 7700;

----------------------------------------------------------------------------
|Id  | Operation			     | Name	    | Rows | Bytes |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT		     |	            |	   |	   |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED | EMPLOYEES    |	 8 |	80 |
|*  2 |   INDEX RANGE SCAN		     | EMPNO_PK_IND |	 8 |	   |
----------------------------------------------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
	 U -  parallel(e 8)

The parallel hint is not used, as I’ve supplied an invalid combination of hints. An index range scan can’t be parallelized unless the index is partitioned, which is not true in this example. Therefore, the Optimizer can not honor both hints.

Note Nigel Bayliss, the Optimizer product manager, has also blogged about Optimizer hint reporting and has shared details on some limitations regarding execution hints such as GATHER_OPTIMIZER_STATISTICS, APPEND, etc.

SQL Analysis Report in Oracle Database 23c

In Oracle Database 23c, the execution plan got another new section: a SQL Analysis Report. This handy addition helps you diagnose common problems that can cause suboptimal execution plans. For example, the new report will point out situations where you are:

  • Missing join conditions
  • Have a WHERE clause predicate that prevents an index from being used
  • Have a datatype mismatch in a WHERE clause predicate
  • Using a UNION instead of a UNION ALL
EXPLAIN PLAN FOR
SELECT * FROM addresses
 WHERE UPPER(state) = 'CA';
 
SELECT * FROM dbms_xplan.display();
 
Plan hash value: 3184888728
 
--------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   239 | 13384 |    61   (0)| 
|*  1 |  TABLE ACCESS FULL| ADDRESSES |   239 | 13384 |    61   (0)| 
--------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(UPPER("state")='CA')
 
SQL Analysis Report (identified by operation id/Query Block Name/Object Alias):
-----------------------------------------------------------------
 
   1 -  SEL$1 / "ADDRESSES"@"SEL$1" 
-  The following columns have predicates which preclude their use as keys in index range scan. Consider rewriting the predicates. "STATE"

Again, Nigel Bayliss has blogged about this in more detail on the official Optimizer blog.

Note that SQL Monitor active reports also contain SQL Analysis reports. You can find them under the new  SQL Analysis tab.

New Optimizer Dictionary Views

Along with enhancements to the execution plan in 23c, we added new data dictionary views to help identify what happens when we parse and execute a SQL statement.

Apps and users frequently change the value for one or more parameters that impact the Optimizer at a session level. As a DBA or performance engineer, you are often unaware of these logon triggers or ALTER SESSION commands. The first hint that the environment may have changed is when you see a plan change.

To check if a change in the optimizer environment caused the plan change, you can check the column OPTIMIZER_ENV_HASH_VALUE in V$SQL, V$SQLAREA, and DBA_HIST_SQLSTAT. However, this column doesn’t tell you what has changed.

Starting in 23c, you can query a new dictionary view, DBA_HIST_OPTIMIZER_ENV_DETAILS, to find out exactly what has changed in the optimizer environment when we parsed and executed a SQL statement.

Using the optimizer_env_hash_value for the original and new plan, you can query DBA_HIST_OPTIMIZER_ENV_DETAILS and get a list of the parameters settings for each scenario, which you can then compare to find the difference.

SELECT DISTINCT optimizer_env_hash_value
FROM  dba_hist_sqlstat
WHERE sql_id = 'bsvavk15n7cra'
ORDER BY 1;

OPTIMIZER_ENV_HASH_VALUE                                                        
------------------------                                                        
              1309615723                                                        
              2369923737 

SELECT name, value
FROM dba_hist_optimizer_env_details
WHERE optimizer_env_hash_value = 1309615723
ORDER BY 1;

NAME                                               VALUE                        
-------------------------------------------------- -------------------------    
BlockChain_ledger_infrastructure                   0                            
CLI_internal_cursor                                0                            
PMO_altidx_rebuild                                 0                            
_adaptive_window_consolidator_enabled              true                         
_add_stale_mv_to_dependency_list                   true  
   .
   .

You can get more information on how to leverage the DBA_HIST_OPTIMIZER_ENV_DETAILS view in MOS NOTE:2953121.1Examining the Optimizer Environment within Which a SQL Statement was Parsed in AWR.

How to find differences between Oracle production and test envs

Determining why things behave differently in production versus a test environment can be time-consuming and painful.

That is until you realize there is a convenient package that allows you to compare the two environments quickly.

Introduced in Oracle Database 11g, the DBMS_COMPARISON package allows you to compare objects, schemas, or data between two databases or schemas (within the same database) to find where they differ.

Let’s take a closer look at how this works.

Create the Comparison

First, you must create a comparison using the DBMS_COMPARISON.CREATE_COMPARISON procedure. You will need to supply a unique name for the Comparison, the schema name, and, optionally, the object name in the current database you wish to compare. You must also supply the location of the other schema you want to compare. The comparison schema can be in the same database (specify NULL for DBLINK_NAME). Still, in my case, below, I’m comparing our product system to our test system, which is reachable from production via a database link called ‘orcl2_test’.

BEGIN
     dbms_comparison.create_comparison(
      					  comparison_name => 'COMP_SALES',
      					  schema_name     => 'SH',
      					  object_name     => 'SALES',
      					  dblink_name     => 'orcl2_test'
      					  );
END;
/  
PL/SQL PROCEDURE successfully completed.

Note: You need a unique index on both tables to compare tables, as shown above.

Run the Comparison

Now we have created the Comparison; we can execute it using the DBMS_COMPARISON.COMPARE function. The COMPARE function returns a BOOLEAN, indicating whether the objects are consistent. It returns TRUE when no differences are found and FALSE when discrepancies are found.

However, the details of the differences are not returned by the function. Instead, the differences are stored in dictionary tables; user_comparison_columns, user_comparison_row_dif, and user_comparison_scan. If a discrepancy is found, you can find details on the differences by querying these tables using the scan ID, which is returned in the variable scan_info.

How you call the COMPARE function is a little more complex than a standard PL/SQL function call. I’ve wrapped the function call in an IF NOT statement in the example below. If the DBMS_COMPARISON.COMPARE function returns FALSE (IF NOT FALSE = TRUE), the function will print out the SCAN ID, so we can use it to query the dictionary table. If, on the other hand, the function returns TRUE (IF NOT TRUE = FALSE), it will return nothing.

DECLARE
        scan_info DBMS_COMPARISON.COMPARISON_TYPE;
BEGIN
         IF NOT DBMS_COMPARISON.COMPARE
       		     ( comparison_name => 'COMP_SALES'
       		     , scan_info       => scan_info
       		     , perform_row_dif => TRUE
       		     ) THEN
           DBMS_OUTPUT.PUT_LINE('Scan ID:'||scan_info.scan_id);
        END IF;
END;
 /
 
Scan ID:3

Find the details on the Comparison

As you can see, there must be a discrepancy between our production and test environments, as the COMPARE function returned a SCAN ID of 3. We need to query the dictionary tables using the comparison name and the scan id to determine the difference. Below is the query we need to use. You will notice I’ve included two case statements to indicate if the difference was found on the local database (in this case, production) or on the Remote database (in this case, our test environment).

SELECT c.column_name, r.index_value, 
       CASE WHEN r.local_rowid IS NULL  THEN 'No' ELSE 'Yes' END  LOC, 
       CASE WHEN r.remote_rowid IS NULL THEN 'No' ELSE 'Yes' END REM 
  2  FROM   user_comparison_columns c, user_comparison_row_dif r, user_comparison_scan s
  3  WHERE  c.comparison_name = 'COMP_SALES'
  4  AND    r.scan_id         = s.scan_id
  5  AND    r.status	      = 'DIF'
  6  AND    c.index_column    = 'Y'
  7  AND    c.comparison_name = r.comparison_name
  8  AND    s.scan_id         = 3  
  9  ORDER BY r.index_value;
 
COLUMN_NAME    INDEX_VALUE    LOC    REM    
______________ ______________ ______ ______ 
ID             5000           Yes    Yes

The result of our query shows a difference in the data that occurs in the row with ID 5000. However, it doesn’t show us which column has the problem. I must extract the row with ID=5000 from both systems and manually compare them.

SELECT * FROM sales WHERE id=5000;
 
ORDER_ID        CUST_ID    PRODUCT_ID    SUPPLIER_ID     DATE_ID     AMOUNT_SOLD     PRICE     TAX_CODE         ID                                                                                     
____________ ___________ _____________ ______________ ____________ ______________ ___________ __________ ___________  
   248173057    25162649        610090        1229054  18-JAN-23                10    140          .5          5000 
 
SELECT * FROM  sales@orcl2_test WHERE id=5000;
 
ORDER_ID        CUST_ID    PRODUCT_ID    SUPPLIER_ID     DATE_ID     AMOUNT_SOLD     PRICE     TAX_CODE         ID                                                                                       
____________ ___________ _____________ ______________ ____________ ______________ ___________ __________ ___________  
   248173057    25162649        610090        1229054  18-JAN-23                10    140        xx.xx        5000

As you can see from the output above, the difference is in the TAX_CODE column. The TAX_CODE has been masked in the test environment for this particular row.

You may be wondering why Oracle couldn’t show me the exact difference between the two tables. If I had a unique index on the TAX_CODE column, Oracle could have told me the value that differed. But since this is not a unique column, Oracle can only pinpoint the row for me using the unique index on the ID column.

This blog post is part of a series on useful PL/SQL packages. Other blogs in this series include:

How to determine which view is the right view to use in Oracle?

Database views are a handy tool to help obfuscate complex joins from developers and analysts. However, knowing which view to use can be tricky, especially when faced with multiple views with similar-sounding names. After all, you don’t want to join various tables only to discover that all the information you needed was in just a single table. But how do you know which view has the data you are after and will allow you to get it with the minimum number of joins?

The DBMS_UTILITY.EXPAND_SQL_TEXT procedure expands any references to a view within a query, turning it into a subquery in the original statement that displays the full query text behind that view. This trick lets you see where and how the needed data will be retrieved.

Let’s look at an elementary example of this procedure in action, using a SELECT * statement on a view called SALES_V.

SET serveroutput ON 
DECLARE 
    l_clob CLOBBEGIN 
    dbms_utility.Expand_sql_text(input_sql_text => 'SELECT * FROM sales_v', output_sql_text => l_clob); 
    dbms_output.Put_line(l_clob)END/

The result of this procedure call is the following output.

SELECT "A1"."order_id" "ORDER_ID", 
       "A1"."time_id"  "TIME_ID", 
       "A1"."cust_id"  "CUST_ID", 
       "A1"."prod_id"  "PROD_ID" 
FROM   (SELECT "A3"."order_id" "ORDER_ID", 
               "A3"."time_id"  "TIME_ID", 
               "A3"."cust_id"  "CUST_ID", 
               "A3"."prod_id"  "PROD_ID" 
        FROM   "SH"."sales" "A3", 
               "SH"."products" "A2" 
        WHERE  "A3"."prod_id" = "A2"."prod_id") "A1"

The subquery with the alias A1 above is the view definition for SALES_V.

It’s a simple two-table join between SALES (alias A3) and PRODUCTS (alias A2). Although the view only returns columns from the SALES table (A3), it does come with the overhead of a join. The execution plan for our simple SELECT * query below shows that.

PLAN_TABLE_OUTPUT                                                                          
__________________________________________________________________________________________ 
Plan hash VALUE: 2857462611                                                                
 
---------------------------------------------------------------------------------------    
| Id  | Operation                  | Name     | ROWS  | Bytes | Cost (%CPU)| TIME     |    
---------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT           |          |  9773 |   314K|     7   (0)| 00:00:01 |    
|*  1 |  HASH JOIN                 |          |  9773 |   314K|     7   (0)| 00:00:01 |    
|   2 |   TABLE ACCESS STORAGE FULL| PRODUCTS |  4999 | 29994 |     2   (0)| 00:00:01 |    
|   3 |   TABLE ACCESS STORAGE FULL| SALES    | 10000 |   263K|     5   (0)| 00:00:01 |    
---------------------------------------------------------------------------------------    
 
Predicate Information (IDENTIFIED BY operation id):                                        
---------------------------------------------------                                        
 
   1 - access("S"."PROD_ID"="P"."PROD_ID")

In this case, it would be best to find an alternative view that only accesses the SALES table or access the SALES table directly. Below is the plan for a direct select statement from the SALES table, and as you can see, the cost of this plan is lower.

EXPLAIN PLAN FOR
SELECT s.order_id, s.date_id, s.cust_id, s.product_id 
FROM sales s;
 
Explained.
 
SELECT * FROM TABLE(dbms_xplan.display());
 
PLAN_TABLE_OUTPUT                                                                      
______________________________________________________________________________________ 
Plan hash VALUE: 781590677                                                             
 
-----------------------------------------------------------------------------------    
| Id  | Operation                 | Name  | ROWS  | Bytes | Cost (%CPU)| TIME     |    
-----------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT          |       | 10000 |   263K|     5   (0)| 00:00:01 |    
|   1 |  TABLE ACCESS STORAGE FULL| SALES | 10000 |   263K|     5   (0)| 00:00:01 |    
-----------------------------------------------------------------------------------

Over the last couple of releases, Oracle has added several handy PL/SQL packages and procedures you might not know about. So, I put together a short blog series highlighting some of my favorites. This blog post is part of that series. Other blogs in this series include How to add a SLEEP COMMAND to your Oracle PL/SQL code and How diff optimizer statistics.

How do I add a SLEEP to a PL/SQL Stored Procedure?

Over the last couple of releases, Oracle has added several handy PL/SQL packages and procedures you might not know about. So, I put together a short blog series highlighting some of my favorites. First up, DBMS_SESSION.SLEEP().

Oracle has always enabled you to add a sleep command to your stored procedures to suspend a session for a specified number of seconds, as shown in the code below.

    DECLARE
       v_start TIMESTAMP;
       v_end   TIMESTAMP; 
    BEGIN
       v_start := SYSTIMESTAMP;
       -- Sleep for 10 seconds
       DBMS_LOCK.SLEEP(10);
       v_end   := SYSTIMESTAMP;
       DBMS_OUTPUT.PUT_LINE('This procedure started at ' ||v_start);
       DBMS_OUTPUT.PUT_LINE('This procedure ended   at ' ||v_end);
    END;
   /
 
This PROCEDURE started AT 10-SEP-22 12.39.40.587041 AM
This PROCEDURE ended   AT 10-SEP-22 12.39.50.637738 AM
 
PL/SQL PROCEDURE successfully completed.
Elapsed: 00:00:10.02

However, the sleep function was part of the DBMS_LOCK package, which is not granted to PUBLIC, by default, due to the other more powerful functions inside that package. That means you had to beg the DBA or the security team to give you access to this package just to put your session to sleep for a few minutes.

DBMS_SESSION.SLEEP()

Things got a lot easier starting in Oracle Database 18c, as the sleep function is now available in the DBMS_SESSION package, which is granted to PUBLIC by default. That means you can call the function without any additional privileges. Plus, the function code in DBMS_SESSION.SLEEP is identical to DBMS_LOCK.SLEEP, so you can do a simple find and replace in your code!

DECLARE
       v_start TIMESTAMP;
       v_end   TIMESTAMP; 
    BEGIN
       v_start := SYSTIMESTAMP;
       -- Sleep for 10 seconds
       DBMS_SESSION.SLEEP(10);
       v_end   := SYSTIMESTAMP;
       DBMS_OUTPUT.PUT_LINE('This procedure started at ' ||v_start);
       DBMS_OUTPUT.PUT_LINE('This procedure ended   at ' ||v_end);
   END;
   /
This PROCEDURE started AT 10-SEP-22 12.39.40.587041 AM
This PROCEDURE ended   AT 10-SEP-22 12.39.50.637738 AM
PL/SQL PROCEDURE successfully completed.
Elapsed: 00:00:10.02

Jonathan Lewis made a great point in the comments below about the granularity of the DBMS_SESSION.SLEEP is 1/100th of a second. If you want to introduce a sleep in milliseconds (or less), you can call Java from PL/SQL, as he demonstrated in his blog posts, little-sleeps.

How to create a parameterized view in Oracle

Database views have been used for decades to help simplify both ad-hoc queries and reporting for developers, analysts, and end-users. But the problem with defining views is they tend to be either too specific (not easily reused) or not specific enough (too generic to be performant).

Imagine we need a view to help developers quickly find the details about orders waiting to ship, which have a total value of $100 or more. How would you create such a view?

Would you create a particular view that returns only the columns needed for this scenario with restrictive where clause predicates that limit the orders to only those that have not shipped and have a value greater than $100?

 CREATE OR REPLACE VIEW orders_waiting 
AS 
SELECT i.order_id, i.product_id, i.price, i.description 
FROM   orders o, order_items i
WHERE  o.order_status &gt; 6   -- orders waiting to ship
AND    o.order_total &gt;= 100
AND    o.order_id = i.order_id;

Or do you create a more generic view that could be used for other reports and queries, even though it won’t be as performant as the dedicated view above?

 CREATE OR REPLACE VIEW orders_waiting 
AS 
SELECT o.order_status, o.order_total,
       o.cust_id, o.order_date, o.rep,       
       o.order_mode, o.promotion_id, 
       i.* 
FROM   orders o, order_items i
WHERE  o.order_id = i.order_id;

Regardless of your approach, you will compromise either reusability or performance.

Starting in Oracle Database 19c (19.7), you no longer have to compromise, as Oracle introduced parameterized views or SQL Table Macros.

SQL table macros are expressions found in a FROM clause that acts as a polymorphic or parameterized view. Let’s look at how this can help us with the view we need to create to find the details about orders waiting to ship, which have a total value of $100 or more.

Creating a SQL Macro is similar to creating a PL/SQL function with an additional SQL_MACRO clause. The SQL MACRO clause can take an argument SCALAR (available from 21c onwards) or TABLE (19c onwards), but if you leave it blank, it defaults to a TABLE macro.

CREATE OR REPLACE FUNCTION orders_waiting_to_ship
RETURN CLOB sql_macro AS
 stmt CLOB;
BEGIN
 ...
 RETURN stmt;
END orders_waiting_to_ship;
/

A SQL Macro always returns the view you define as text (VARCHAR, CHAR, or CLOB). The database will resolve that view definition and makes it part of the SQL statement that calls the SQL Macro. I’m capturing the view definition in a CLOB using the variable stmt above.

Let’s add the text for the view and the parameters we want to pass.

CREATE OR REPLACE FUNCTION orders_waiting_to_ship(order_value INTEGER)
RETURN CLOB sql_macro AS
 stmt CLOB;
BEGIN
 stmt := '
  SELECT i.*
  FROM   orders o, order_items i
  WHERE  o.order_status &gt; 6
  AND    o.order_total &gt;= order_value
  AND    o.order_id = i.order_id';
 
 RETURN stmt;
END orders_waiting_to_ship;
/
 
FUNCTION ORDERS_WAITING_TO_SHIP compiled

In the code example above, I’m using the more selectivity view definition, but instead of specifying the order_total to be $100 or higher, I’m going to parameterize the value of the order. That way, we can reuse this SQL Table Macro regardless of what order_value is needed. You will notice I have also added a parameter to the function called order_value, which the user will pass in when they call the function.

You will also notice that my function ORDERS_WAITING_TO_SHIP compiled without errors. This doesn’t mean my view definition or syntax is correct. You will only see an error at runtime. Remember, a SQL TABLE Macro produces your view definition as text that the optimizer will insert into the SQL statement that calls it.

It’s straightforward to call a SQL Table Macro. The Table Macro goes in the FROM clause of the SQL statement. In the example below, I’m calling our orders_waiting_to_Ship SQL Macro and passing it the value 100.

 SELECT *
FROM orders_waiting_to_ship(100) 
ORDER BY order_id;
 
   ORDER_ID    PRODUCT_ID    PRICE DESCRIPTION    
___________ _____________ ________ ______________ 
      10110            10       23 lipstick       
      10110             7       17 Lip gloss      
      10110             8       30 Mascarra
      10110             1       35 Foundation
      20209             9       15 Blusher        
      20209            10       23 lipstick  
      20209             1       35 Foundation
      20209             2       32 Coverup
      30307             7       17 Lip gloss      
      30307             8       30 Mascarra       
      30307             1       35 Foundation       
      30307             2       32 Coverup

We could take it a step further and generalize our SQL Table Macro a little more by parameterizing the order_status and the order_value; that way, our developers can use it to check the status of shipped orders and orders waiting to ship.

 CREATE OR REPLACE FUNCTION orders_chk(order_value INTEGER, stat INTEGER)
RETURN CLOB sql_macro AS
 stmt CLOB;
BEGIN
 stmt := '
 SELECT i.*
 FROM   orders o, order_items i
 WHERE  o.order_status = orders_chk.stat
 AND    o.order_total &gt;= orders_chk.order_value
 AND    o.order_id = i.order_id';
 
 RETURN stmt;
END orders_chk;
/
 
FUNCTION ORDERS_CHK compiled
 
SELECT *
FROM orders_chk(10,6)
ORDER BY order_id;
 
   ORDER_ID    PRODUCT_ID    PRICE DESCRIPTION     ......
___________ _____________ ________ ______________ __________ 
      10110            10       23 lipstick        ......
      10110             7       17 Lip gloss       ......
      10110             8       30 Mascarra        ......
      10110             1       35 Foundation      ......
      10111             9       15 Blusher         ......         
      10112             1       35 Foundation      ......
      10113             1       17 Lip gloss       ......

There are several restrictions on SQL TABLE Macros you should be aware of:

    • When used directly in a SQL statement, the SQL_MACRO annotation is disallowed with RESULT_CACHE, PARALLEL_ENABLE, and PIPELINE.
    • SQL statements with WITH clauses are not supported in SQL macros.
    • A SQL macro always runs with invoker rights.
    • SQL macros in views are always executed with the privileges of the view owner.
    • SQL macros can’t be used in virtual column expression, function-based indexes, editioning views, or materialized views.
    • SQL macros can’t be used in type methods.

How to use DBMS_STATS DIFF_TABLE_STATS functions

In 11g, Oracle introduced the DBMS_STAT.DIFF_TABLE_STATS functions to help you compare two sets of statistics for a table along with all its dependent objects (indexes, columns, partitions).

There are three versions of this function depending on where the statistics being compared are located:

  • DBMS_STAT.DIFF_TABLE_STATS_IN_HISTORY (compares statistics for a table from two timestamps in the past)
  • DBMS_STAT.DIFF_TABLE_STATS_IN_PENDING (compares pending statistics and statistics as of a timestamp or statistics from the data dictionary)
  • DBMS_STAT.DIFF_TABLE_STATS_IN_STATTAB (compares statistics from a user statistics table and the data dictionary, from two different user statistics tables, or a single user statistics table using two different STATSIDs)

The functions return a report that has three sections:

  1. Basic table statistics
    The report compares the basic table statistics (number of rows, blocks, etc.).
  2. Column statistics
    The second section of the report examines column statistics, including histograms.
  3. Index Statistics
    The final section of the report covers differences in index statistics.

Statistics will only be displayed in the report if the difference in the statistics exceeds a certain threshold (%). The threshold can be specified as an argument to the functions (PCTTHRESHOLD); the default value is 10%. The statistics corresponding to the first source, typically the current table stats in the data dictionary, will be used to compute the differential percentage.

The functions also return the MAXDIFFPCT (a number) along with the report. This is the maximum percentage difference between the statistics. These differences can come from the table, column, or index statistics.

Let’s look at an example.
Continue reading “How to use DBMS_STATS DIFF_TABLE_STATS functions”

How to do a Fuzzy Text Search on JSON #JoelKallmanDay

More and more apps I deal with store data as JSON documents in the Oracle Database. It is exceptionally convenient for the developers but doesn’t always make it easy to know exactly what data we have stored.

The good news is that Oracle offers multiple ways to help you understand precisely what data you have stored in those JSON documents. For example, you can use the built-in JSON Data Guide, which will trawl through all your documents and return a list of all the attributes you have stored.

But suppose you are interested in searching through your documents and only returning those that contain a particular word, value, or a variation thereof. In that case, you will want to take advantage of Oracle’s fuzzy text search or approximate string matching.

Imagine we have a table that stores movie reviews as JSON documents. I’m doing this demo in 19c, so I’m using a VARCHAR2 column, but from 21c onwards, you can use a JSON column.

CREATE TABLE movie_reviews( 
                     title varchar2(200),
                     cust_id NUMBER(26), 
                     cust_reviews varchar2(32000)
                     CONSTRAINT cr_is_json CHECK (cust_reviews IS JSON));

If you really want a JSON column in 19c, you can take advantage of this trick I learned from @Connor_mc_d.

 CREATE TABLE movie_rewiews( title varchar2(200), 
                             cust_id NUMBER(26), 
                             cust_reviews BLOB,
                             CHECK (cust_reviews IS json format oson));

Each review document contains details on the movie id, the star rating, and the review.

{“movie_id”: 5641,
“star_rating”: 1,
"Feedback":" Loved the tv show, but hated the movie. I am so disappointed."
}

We have been asked to find all the reviews that contain the word disappoint or variations of it. To facilitate a fuzzy text search, we need to create a text-based index on the feedback column inside the review documents.

CREATE SEARCH INDEX review_search_ind ON movie_reviews(cust_reviews) FOR JSON;

Once the index is created, we can run a fuzzy text search using the following query:

SELECT m.title, m.cust_reviews.feedback AS customer_review
FROM   movie_reviews m
WHERE  JSON_TEXTCONTAINS(m.cust_reviews, '$.feedback', 'fuzzy((disappoint))');

This results in the following entries being returned.

TITLE                    COMMENTS
-------------------- ----------------------------------------------------------------
Can You Ever Forgive Me? This movie was so disappointing
Top Gun                  Tom Cruise never disappoints. Definitely worth a watch.
Vice                     Perry’s performance in this movie is just so disappointing
Baywatch                 Loved the tv show but hated the movie. I’m so disappointed.
La La Land               Rent this movie you won’t be disappointed!
Batman                   Complete Disappointment

Alternatively, you can use the abbreviated syntax, which will return the same results as above:

SELECT m.cust_reviews.feedback AS comments
FROM movie_reviews m
WHERE JSON_TEXTCONTAINS(cust_reviews, '$.feedback', '?disapoint');

You can also use the stem search operator $. That will match verb forms sharing the same stem, so $disappoint will match “disappointing,” “disappointed,” and “disappoints,” but not “disappointment.”

SELECT m.cust_reviews.feedback AS comments
FROM movie_reviews m
WHERE JSON_TEXTCONTAINS(cust_reviews, '$.feedback', '$disapoint');

This blog was made possible by the lovely Roger Ford, the product manager for Oracle Text and JSON, who has taught me everything I know about text searches in the Oracle Database.

SQL Tuning: How to fix a SQL statement that picks a Nested Loop instead of a Hash Join

At one point or another during your career, you will face a situation where the optimizer picks the wrong join method. As tempting as it is to brute-force the plan you want via a hint, it’s always better to understand why the Optimizer made a decision and provide the necessary information to allow the Optimizer to select the plan you want by default.

In this short video below, I explain how the Optimizer determines the join method based on the cardinality of the table on the left-hand side of the join. I also provide a simple set of steps to help you identify the common problems that can cause an incorrect join method to be used and guidance on how to supply the necessary information, so the Optimizer will automatically select the appropriate join method.

Adaptive joins, introduced in 12c, can help address some cases where the Optimizer selects the wrong join method but not in all cases. It’s always better to know how to save yourself rather than relying on the built-in safety net.

Happy Tuning!

Why Oracle Implement Blockchain in the Database

The primary focus of conventional data security technologies like passwords, firewalls, and data encryption is to keep criminals out of your company and your data stores.

But what protects your data, especially your essential asset (contracts, property titles, account statements, etc.), from being modified or even deleted by folks who gain access to your systems legitimately or illegitimately (hackers)?

Crypto-secure Data Management

This is where Blockchain can help. Layering Blockchain technologies on top of conventional data security features provide an extra level of protection that prevents illicit modifications or deletes of data.

What is Blockchain?

When we think of Blockchain, many of us instantly think of decentralized peer to peer apps that only permit consensus-based data changes. However, adopting these apps requires new development methodologies, speciality data stores and potentially new business practices, which is complicated and expensive!

But if we take a closer look at Blockchain technologies, we see four critical components; immutability, cryptographic digests, cryptographic signatures, and distributed systems. Each part works to protect against a different aspect of illicit data changes performed using legitimate user credentials or by hackers.

Integrating these Blockchain technologies into the Oracle Database brings the critical security benefits of Blockchain to mainstream applications with minimal or no changes required. Providing the full functionality of the world’s leading database on crypto-protected data.

In the video below, Juan Loaiza explains how Oracle implemented Blockchain technologies in the Oracle Database and how they can be used to protect your essential business data. I’ve also included a brief description of these features under the video.

How do Blockchain technologies work in the Oracle Database?

To protect against illicit data changes made by rogue insiders or malicious actors using insiders’ credentials, Oracle has introduced Immutable tables (insert-only tables) in Oracle Database 21c (21.3).

Immutable Tables

With an Immutable table, it is possible to insert new data, but existing data cannot be changed or deleted by anyone using the database, even the database administrators (SYSDBA). It is also impossible to change an immutable table’s definition or convert it to an updatable table. However, an Immutable table appears like any other table in the database from an application’s point of view. It can store both relational data and JSON documents, and it can be indexed and partitioned or used as the basis of a view.

Blockchain Tables

To protect against illicit changes made by hackers, Oracle has introduced Blockchain tables. Blockchain tables are immutable tables that organize rows into several chains. Each row, except the first row in the chain, is chained to the previous row via a cryptographic digest or hash. The hash is automatically calculated on insert based on that row’s data and the hash value of the previous row in the chain. Timestamps are also recorded for each row on insertion.

Any modification to data in a Blockchain table breaks the cryptographic chain because the hash value of the row will change. You can verify the contents of a blockchain table have not been modified since they were inserted using the DBMS_BLOCKCHAIN_TABLE.VERIFY_ROWS procedure.

DECLARE
actual_rows NUMBER;
verified_rows NUMBER;
 
BEGIN
 
SELECT COUNT(*)
INTO actual_rows
FROM admin.my_bc_tab;
 
dbms_blockchain_table.verify_rows(
schema_name =&gt; 'admin',
table_name =&gt; 'MY_BC_TAB',
number_of_rows_verified =&gt; verified_rows);
 
DBMS_OUTPUT.put_line('Actual_rows='||actual_rows|| ' Verified Rows=' || verified_rows);
END;
/

End-User Data Signing

Even with Immutable or Blockchain tables, data can be falsely inserted in an end user’s name by someone using stolen credentials. To address this vulnerability, Oracle allows end-users to cryptographically sign the data they insert using their private key that is never passed to the database.

Each end-user registers a digital certificate containing their public key with this database. This digital certificate allows the database to validate the end-users signature when new data is inserted. Even if a hacker manages to steal a valid set of credentials without the private key, the data insert signature won’t match and will therefore not be accepted.

It’s also possible for end-users to ensure the database has received their changes by requesting Oracle countersign the newly inserted data. Oracle returns a crypto-receipt to the user, ensuring nothing on the mid-tier can filter specific data to prevent it from being recorded.

Distributing Cryptographic Digest

Even with cryptographically chained rows, sophisticated cyber-criminals or authorities could illicitly change data via a large-scale cover-up, where the entire database is replaced. To detect such a cover-up, Oracle enables schema owners to sign and distribute the cryptographic digest for a blockchain table periodically. Remember, the digest can’t be used to infer the data in the table, but authorized users can use it to validate the chain and confirm their newly inserted data is present. The crypto-digest can be posted to an independent public store or blockchain, like Ethereum or sent out by email or made available via a REST API.

A cover-up can easily be detected by comparing the previously published digests to the current table content. Also, distributing the publicly across multiple independent services prevents an authority or cyber-attacker from deleting all the separate copies.

Getting Started With Blockchain

Both Immutable and Blockchain tables are free features of the Oracle  Database. No additional licenses or software is needed to take advantage of these new table types, which are completely transparent to all new and existing applications.

Also, note Oracle has backported Immutable tables and Blockchain tables to Oracle Database 19c (19.11 and 19.10, respectively). Please check My Oracle Support for more details before attempting to use Blockchain tables in 19.10.

For more information on Blockchain check out the Oracle Blockchain blog, Oracle Blockchain LiveLabs or the Oracle Blockchain documentation.