How to do a Fuzzy Text Search on JSON #JoelKallmanDay

More and more apps I deal with store data as JSON documents in the Oracle Database. It is exceptionally convenient for the developers but doesn’t always make it easy to know exactly what data we have stored.

The good news is that Oracle offers multiple ways to help you understand precisely what data you have stored in those JSON documents. For example, you can use the built-in JSON Data Guide, which will trawl through all your documents and return a list of all the attributes you have stored.

But suppose you are interested in searching through your documents and only returning those that contain a particular word, value, or a variation thereof. In that case, you will want to take advantage of Oracle’s fuzzy text search or approximate string matching.

Imagine we have a table that stores movie reviews as JSON documents. I’m doing this demo in 19c, so I’m using a VARCHAR2 column, but from 21c onwards, you can use a JSON column.

CREATE TABLE movie_reviews( 
                     title varchar2(200),
                     cust_id NUMBER(26), 
                     cust_reviews varchar2(32000)
                     CONSTRAINT cr_is_json CHECK (cust_reviews IS JSON));

If you really want a JSON column in 19c, you can take advantage of this trick I learned from @Connor_mc_d.

 CREATE TABLE movie_rewiews( title varchar2(200), 
                             cust_id NUMBER(26), 
                             cust_reviews BLOB,
                             CHECK (cust_reviews IS json format oson));

Each review document contains details on the movie id, the star rating, and the review.

{“movie_id”: 5641,
“star_rating”: 1,
"Feedback":" Loved the tv show, but hated the movie. I am so disappointed."
}

We have been asked to find all the reviews that contain the word disappoint or variations of it. To facilitate a fuzzy text search, we need to create a text-based index on the feedback column inside the review documents.

CREATE SEARCH INDEX review_search_ind ON movie_reviews(cust_reviews) FOR JSON;

Once the index is created, we can run a fuzzy text search using the following query:

SELECT m.title, m.cust_reviews.feedback AS customer_review
FROM   movie_reviews m
WHERE  JSON_TEXTCONTAINS(m.cust_reviews, '$.feedback', 'fuzzy((disappoint))');

This results in the following entries being returned.

TITLE                    COMMENTS
-------------------- ----------------------------------------------------------------
Can You Ever Forgive Me? This movie was so disappointing
Top Gun                  Tom Cruise never disappoints. Definitely worth a watch.
Vice                     Perry’s performance in this movie is just so disappointing
Baywatch                 Loved the tv show but hated the movie. I’m so disappointed.
La La Land               Rent this movie you won’t be disappointed!
Batman                   Complete Disappointment

Alternatively, you can use the abbreviated syntax, which will return the same results as above:

SELECT m.cust_reviews.feedback AS comments
FROM movie_reviews m
WHERE JSON_TEXTCONTAINS(cust_reviews, '$.feedback', '?disapoint');

You can also use the stem search operator $. That will match verb forms sharing the same stem, so $disappoint will match “disappointing,” “disappointed,” and “disappoints,” but not “disappointment.”

SELECT m.cust_reviews.feedback AS comments
FROM movie_reviews m
WHERE JSON_TEXTCONTAINS(cust_reviews, '$.feedback', '$disapoint');

This blog was made possible by the lovely Roger Ford, the product manager for Oracle Text and JSON, who has taught me everything I know about text searches in the Oracle Database.

2 thoughts on “How to do a Fuzzy Text Search on JSON #JoelKallmanDay”

Leave a Reply

Your email address will not be published. Required fields are marked *