How to do a Fuzzy Text Search on JSON #JoelKallmanDay

More and more apps I deal with store data as JSON documents in the Oracle Database. It is exceptionally convenient for the developers but doesn’t always make it easy to know exactly what data we have stored.

The good news is that Oracle offers multiple ways to help you understand precisely what data you have stored in those JSON documents. For example, you can use the built-in JSON Data Guide, which will trawl through all your documents and return a list of all the attributes you have stored.
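
As a rough illustration of the Data Guide, the query below uses the JSON_DATAGUIDE aggregate function to summarize the attributes found in a JSON column. It assumes the movie_reviews table and cust_reviews column that are created later in this post; the data_guide alias is just a name picked for this sketch.

```sql
-- Aggregate every document in the column into a single data guide.
-- The result is a JSON document listing each attribute path found,
-- along with its type and length.
SELECT JSON_DATAGUIDE(cust_reviews) AS data_guide
FROM   movie_reviews;
```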

But suppose you are interested in searching through your documents and only returning those that contain a particular word, value, or a variation thereof. In that case, you will want to take advantage of Oracle’s fuzzy text search or approximate string matching.

Imagine we have a table that stores movie reviews as JSON documents. I’m doing this demo in 19c, so I’m using a VARCHAR2 column, but from 21c onwards, you can use a JSON column.

CREATE TABLE movie_reviews( 
                     title varchar2(200),
                     cust_id NUMBER(26), 
                     cust_reviews varchar2(32000)
                     CONSTRAINT cr_is_json CHECK (cust_reviews IS JSON));

If you really want a JSON column in 19c, you can take advantage of this trick I learned from @Connor_mc_d.

 CREATE TABLE movie_reviews( title varchar2(200), 
                             cust_id NUMBER(26), 
                             cust_reviews BLOB,
                             CHECK (cust_reviews IS json format oson));

Each review document contains details on the movie id, the star rating, and the review.

{"movie_id": 5641,
"star_rating": 1,
"feedback": "Loved the tv show, but hated the movie. I am so disappointed."
}
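
For anyone following along, a review like this one could be loaded with an ordinary INSERT. This is only a sketch against the VARCHAR2 version of the table: the title is borrowed from the result listing further down, and the cust_id value is made up. JSON keys are case-sensitive, so the key is written in lowercase to match the '$.feedback' path used in the queries.

```sql
-- Insert one review document into the VARCHAR2-based table.
-- The IS JSON check constraint rejects the row if the text is not valid JSON.
INSERT INTO movie_reviews (title, cust_id, cust_reviews)
VALUES ('Baywatch',
        1001,   -- hypothetical customer id
        '{"movie_id": 5641,
          "star_rating": 1,
          "feedback": "Loved the tv show, but hated the movie. I am so disappointed."}');

COMMIT;
```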

We have been asked to find all the reviews that contain the word disappoint or variations of it. To facilitate a fuzzy text search, we need to create a JSON search index on the cust_reviews column, which holds the review documents and therefore the feedback attribute we want to search.

CREATE SEARCH INDEX review_search_ind ON movie_reviews(cust_reviews) FOR JSON;

Once the index is created, we can run a fuzzy text search using the following query:

SELECT m.title, m.cust_reviews.feedback AS comments
FROM   movie_reviews m
WHERE  JSON_TEXTCONTAINS(m.cust_reviews, '$.feedback', 'fuzzy((disappoint))');

This results in the following entries being returned.

TITLE                    COMMENTS
------------------------ ----------------------------------------------------------------
Can You Ever Forgive Me? This movie was so disappointing
Top Gun                  Tom Cruise never disappoints. Definitely worth a watch.
Vice                     Perry’s performance in this movie is just so disappointing
Baywatch                 Loved the tv show but hated the movie. I’m so disappointed.
La La Land               Rent this movie you won’t be disappointed!
Batman                   Complete Disappointment
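
Notice that not every match is a negative review ("Tom Cruise never disappoints"). One way to narrow things down, sketched here under the assumption that every document carries a numeric star_rating attribute, is to combine the fuzzy search with a regular JSON predicate. The cut-off of 2 stars is an arbitrary value chosen for illustration.

```sql
-- Fuzzy text match on the feedback attribute, restricted to low ratings.
SELECT m.title,
       m.cust_reviews.feedback AS comments
FROM   movie_reviews m
WHERE  JSON_TEXTCONTAINS(m.cust_reviews, '$.feedback', 'fuzzy((disappoint))')
AND    JSON_VALUE(m.cust_reviews, '$.star_rating' RETURNING NUMBER) <= 2;
```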

Alternatively, you can use the abbreviated syntax, which will return the same results as above:

SELECT m.cust_reviews.feedback AS comments
FROM movie_reviews m
WHERE JSON_TEXTCONTAINS(cust_reviews, '$.feedback', '?disappoint');

You can also use the stem search operator $. That will match verb forms sharing the same stem, so $disappoint will match “disappointing,” “disappointed,” and “disappoints,” but not “disappointment.”

SELECT m.cust_reviews.feedback AS comments
FROM movie_reviews m
WHERE JSON_TEXTCONTAINS(cust_reviews, '$.feedback', '$disappoint');

This blog was made possible by the lovely Roger Ford, the product manager for Oracle Text and JSON, who has taught me everything I know about text searches in the Oracle Database.

Why Oracle Implemented Blockchain in the Database

The primary focus of conventional data security technologies like passwords, firewalls, and data encryption is to keep criminals out of your company and your data stores.

But what protects your data, especially your essential assets (contracts, property titles, account statements, etc.), from being modified or even deleted by people who gain access to your systems, whether legitimately or illegitimately (hackers)?

Crypto-secure Data Management

This is where Blockchain can help. Layering Blockchain technologies on top of conventional data security features provides an extra level of protection that prevents illicit modification or deletion of data.

What is Blockchain?

When we think of Blockchain, many of us instantly think of decentralized peer-to-peer apps that only permit consensus-based data changes. However, adopting these apps requires new development methodologies, specialty data stores, and potentially new business practices, which is complicated and expensive!

But if we take a closer look at Blockchain technologies, we see four critical components: immutability, cryptographic digests, cryptographic signatures, and distributed systems. Each part works to protect against a different aspect of illicit data changes performed using legitimate user credentials or by hackers.

Integrating these Blockchain technologies into the Oracle Database brings the critical security benefits of Blockchain to mainstream applications with minimal or no changes required, and provides the full functionality of the world’s leading database on crypto-protected data.

In the video below, Juan Loaiza explains how Oracle implemented Blockchain technologies in the Oracle Database and how they can be used to protect your essential business data. I’ve also included a brief description of these features under the video.

How do Blockchain technologies work in the Oracle Database?

To protect against illicit data changes made by rogue insiders or malicious actors using insiders’ credentials, Oracle has introduced Immutable tables (insert-only tables) in Oracle Database 21c (21.3).

Immutable Tables

With an Immutable table, it is possible to insert new data, but existing data cannot be changed or deleted by anyone using the database, even the database administrators (SYSDBA). It is also impossible to change an immutable table’s definition or convert it to an updatable table. However, an Immutable table appears like any other table in the database from an application’s point of view. It can store both relational data and JSON documents, and it can be indexed and partitioned or used as the basis of a view.
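
To make this concrete, here is a minimal sketch of creating and using an Immutable table. The audit_log table and its columns are hypothetical names invented for this example, and the retention clauses use short values that I believe are the smallest allowed, which is handy when experimenting.

```sql
-- Create an insert-only table. The two clauses control when the table
-- itself can be dropped and when old rows become eligible for deletion.
CREATE IMMUTABLE TABLE audit_log (
  id      NUMBER,
  action  VARCHAR2(100),
  logged  DATE
)
NO DROP UNTIL 0 DAYS IDLE
NO DELETE UNTIL 16 DAYS AFTER INSERT;

-- Inserts work like they do on any other table.
INSERT INTO audit_log (id, action, logged) VALUES (1, 'login', SYSDATE);
COMMIT;

-- This statement is expected to be rejected with an error,
-- because existing rows cannot be changed.
UPDATE audit_log SET action = 'logout' WHERE id = 1;
```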

Blockchain Tables

To protect against illicit changes made by hackers, Oracle has introduced Blockchain tables. Blockchain tables are immutable tables that organize rows into several chains. Each row, except the first row in the chain, is chained to the previous row via a cryptographic digest or hash. The hash is automatically calculated on insert based on that row’s data and the hash value of the previous row in the chain. Timestamps are also recorded for each row on insertion.
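
As a sketch, a Blockchain table can be created much like an ordinary table, with extra clauses for retention and hashing. The column list below is hypothetical; the table name matches the my_bc_tab table used in the verification example that follows. The chain position, insert timestamp, and hash are kept in hidden columns that can be queried by naming them explicitly.

```sql
-- Create a blockchain table with short retention periods for testing.
CREATE BLOCKCHAIN TABLE my_bc_tab (
  id          NUMBER,
  description VARCHAR2(100)
)
NO DROP UNTIL 0 DAYS IDLE
NO DELETE UNTIL 16 DAYS AFTER INSERT
HASHING USING "SHA2_512" VERSION "v1";

INSERT INTO my_bc_tab (id, description) VALUES (1, 'first row');
COMMIT;

-- Hidden columns are not returned by SELECT *, so list them by name.
SELECT id,
       ORABCTAB_CHAIN_ID$      AS chain_id,
       ORABCTAB_SEQ_NUM$       AS seq_num,
       ORABCTAB_CREATION_TIME$ AS created,
       ORABCTAB_HASH$          AS row_hash
FROM   my_bc_tab
ORDER  BY chain_id, seq_num;
```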

Any modification to data in a Blockchain table breaks the cryptographic chain because the hash value of the row will change. You can verify the contents of a blockchain table have not been modified since they were inserted using the DBMS_BLOCKCHAIN_TABLE.VERIFY_ROWS procedure.

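DECLARE
  actual_rows   NUMBER;
  verified_rows NUMBER;
BEGIN
  -- Count the rows currently stored in the blockchain table
  SELECT COUNT(*)
  INTO   actual_rows
  FROM   admin.my_bc_tab;

  -- Check the hash chain and return the number of rows verified
  dbms_blockchain_table.verify_rows(
    schema_name             => 'ADMIN',
    table_name              => 'MY_BC_TAB',
    number_of_rows_verified => verified_rows);

  -- Run SET SERVEROUTPUT ON first to see this message
  DBMS_OUTPUT.put_line('Actual_rows=' || actual_rows || ' Verified Rows=' || verified_rows);
END;
/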

End-User Data Signing

Even with Immutable or Blockchain tables, data can be falsely inserted in an end user’s name by someone using stolen credentials. To address this vulnerability, Oracle allows end-users to cryptographically sign the data they insert using their private key that is never passed to the database.

Each end-user registers a digital certificate containing their public key with the database. This digital certificate allows the database to validate the end-user’s signature when new data is inserted. Even if a hacker manages to steal a valid set of credentials, without the private key the signature on the inserted data won’t match, and the insert will therefore not be accepted.

It’s also possible for end-users to ensure the database has received their changes by requesting Oracle countersign the newly inserted data. Oracle returns a crypto-receipt to the user, ensuring nothing on the mid-tier can filter specific data to prevent it from being recorded.

Distributing Cryptographic Digests

Even with cryptographically chained rows, sophisticated cyber-criminals or authorities could illicitly change data via a large-scale cover-up, where the entire database is replaced. To detect such a cover-up, Oracle enables schema owners to sign and distribute the cryptographic digest for a blockchain table periodically. Remember, the digest can’t be used to infer the data in the table, but authorized users can use it to validate the chain and confirm their newly inserted data is present. The crypto-digest can be posted to an independent public store or blockchain such as Ethereum, sent out by email, or made available via a REST API.

A cover-up can easily be detected by comparing the previously published digests to the current table content. Also, distributing the digest publicly across multiple independent services prevents an authority or cyber-attacker from deleting all the separate copies.

Getting Started With Blockchain

Both Immutable and Blockchain tables are free features of the Oracle Database. No additional licenses or software are needed to take advantage of these new table types, which are completely transparent to all new and existing applications.

Also, note Oracle has backported Immutable tables and Blockchain tables to Oracle Database 19c (19.11 and 19.10, respectively). Please check My Oracle Support for more details before attempting to use Blockchain tables in 19.10.

For more information on Blockchain, check out the Oracle Blockchain blog, Oracle Blockchain LiveLabs, or the Oracle Blockchain documentation.

Oracle Database 21c is now available on the Oracle Cloud

It looks like the holidays have come early this year for those of you with an Oracle Cloud account because, starting today, you can create an Oracle 21c database!

That’s right, Oracle Database 21c is now in production in the Oracle Cloud on the Oracle Cloud Database Service and the Autonomous Database Free Tier Service in the Ashburn (IAD), Phoenix (PHX), Frankfurt (FRA) and London (LHR) regions. General availability of Oracle Database 21c for on-prem platforms (including Exadata, Linux and Windows) will be in 2021.

Creating a 21c Oracle Autonomous Database on the Always Free Tier

New features in Oracle Database 21c include Blockchain tables, SQL Macros (check out the LiveSQL lab), a native JSON datatype, In-Memory Hybrid Scans (using the in-memory column store like an index) and the ability to execute JavaScript inside the Oracle Database!

More details on what to expect from Oracle Database 21c can be found on the main Oracle Database Blog or in the Oracle Database Documentation.

If you don’t have an Oracle Cloud account yet, you can always sign up for an Oracle Always Free Tier account at https://www.oracle.com/cloud/free/
