How to do a Fuzzy Text Search on JSON #JoelKallmanDay

More and more apps I deal with store data as JSON documents in the Oracle Database. It is exceptionally convenient for the developers but doesn’t always make it easy to know exactly what data we have stored.

The good news is that Oracle offers multiple ways to help you understand precisely what data you have stored in those JSON documents. For example, you can use the built-in JSON Data Guide, which will trawl through all your documents and return a list of all the attributes you have stored.
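As a rough sketch of what that looks like, the query below runs the JSON_DATAGUIDE aggregate function over the cust_reviews column of the movie_reviews table created further down in this post. The column alias review_dataguide is just an illustrative name.

```sql
-- Minimal sketch: summarise the structure of every document in the column.
-- JSON_DATAGUIDE is an aggregate function; it returns a single data guide
-- (itself a JSON document) listing each path found and its data type.
SELECT JSON_DATAGUIDE(cust_reviews) AS review_dataguide
FROM   movie_reviews;
```

The result is one document describing all the reviews, which makes it a quick way to check which attributes exist before you start searching on them.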

But suppose you are interested in searching through your documents and only returning those that contain a particular word, value, or a variation thereof. In that case, you will want to take advantage of Oracle’s fuzzy text search or approximate string matching.

Imagine we have a table that stores movie reviews as JSON documents. I’m doing this demo in 19c, so I’m using a VARCHAR2 column, but from 21c onwards, you can use a JSON column.

CREATE TABLE movie_reviews( 
                     title varchar2(200),
                     cust_id NUMBER(26), 
                     cust_reviews varchar2(32000)
                     CONSTRAINT cr_is_json CHECK (cust_reviews IS JSON));

If you really want a JSON column in 19c, you can take advantage of this trick I learned from @Connor_mc_d.

CREATE TABLE movie_reviews(
                     title varchar2(200),
                     cust_id NUMBER(26),
                     cust_reviews BLOB,
                     CHECK (cust_reviews IS JSON FORMAT OSON));

Each review document contains details on the movie id, the star rating, and the review.

{"movie_id": 5641,
 "star_rating": 1,
 "feedback": "Loved the tv show, but hated the movie. I am so disappointed."
}
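To make the demo reproducible, a review like this one could be loaded with a plain INSERT. This is only a sketch: the cust_id value 1001 is made up for illustration, and the key is written as lowercase feedback so that it matches the case-sensitive '$.feedback' path used in the queries later on.

```sql
-- Minimal sketch: load one review document into the VARCHAR2 version of the table.
-- cust_id 1001 is a hypothetical value; the IS JSON check constraint
-- rejects the row if the string is not well-formed JSON.
INSERT INTO movie_reviews (title, cust_id, cust_reviews)
VALUES ('Baywatch',
        1001,
        '{"movie_id": 5641,
          "star_rating": 1,
          "feedback": "Loved the tv show, but hated the movie. I am so disappointed."}');

COMMIT;
```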

We have been asked to find all the reviews that contain the word disappoint or variations of it. To facilitate a fuzzy text search, we need to create a JSON search index on the cust_reviews column, which indexes the text of the fields inside the review documents, including feedback.

CREATE SEARCH INDEX review_search_ind ON movie_reviews(cust_reviews) FOR JSON;

Once the index is created, we can run a fuzzy text search using the following query:

SELECT m.title, m.cust_reviews.feedback AS customer_review
FROM   movie_reviews m
WHERE  JSON_TEXTCONTAINS(m.cust_reviews, '$.feedback', 'fuzzy((disappoint))');

This results in the following entries being returned.

TITLE                    CUSTOMER_REVIEW
------------------------ ----------------------------------------------------------------
Can You Ever Forgive Me? This movie was so disappointing
Top Gun                  Tom Cruise never disappoints. Definitely worth a watch.
Vice                     Perry’s performance in this movie is just so disappointing
Baywatch                 Loved the tv show but hated the movie. I’m so disappointed.
La La Land               Rent this movie you won’t be disappointed!
Batman                   Complete Disappointment

Alternatively, you can use the abbreviated syntax, which will return the same results as above:

SELECT m.cust_reviews.feedback AS comments
FROM movie_reviews m
WHERE JSON_TEXTCONTAINS(cust_reviews, '$.feedback', '?disappoint');

You can also use the stem search operator $. That will match verb forms sharing the same stem, so $disappoint will match “disappointing,” “disappointed,” and “disappoints,” but not “disappointment.”

SELECT m.cust_reviews.feedback AS comments
FROM movie_reviews m
WHERE JSON_TEXTCONTAINS(cust_reviews, '$.feedback', '$disappoint');

This blog was made possible by the lovely Roger Ford, the product manager for Oracle Text and JSON, who has taught me everything I know about text searches in the Oracle Database.

Why Oracle Implemented Blockchain in the Database

The primary focus of conventional data security technologies like passwords, firewalls, and data encryption is to keep criminals out of your company and your data stores.

But what protects your data, especially your essential assets (contracts, property titles, account statements, etc.), from being modified or even deleted by folks who gain access to your systems, legitimately or illegitimately (hackers)?

Crypto-secure Data Management

This is where Blockchain can help. Layering Blockchain technologies on top of conventional data security features provides an extra level of protection that prevents illicit modification or deletion of data.

What is Blockchain?

When we think of Blockchain, many of us instantly think of decentralized peer-to-peer apps that only permit consensus-based data changes. However, adopting these apps requires new development methodologies, specialty data stores, and potentially new business practices, which is complicated and expensive!

But if we take a closer look at Blockchain technologies, we see four critical components: immutability, cryptographic digests, cryptographic signatures, and distributed systems. Each part works to protect against a different aspect of illicit data changes performed using legitimate user credentials or by hackers.

Integrating these Blockchain technologies into the Oracle Database brings the critical security benefits of Blockchain to mainstream applications with minimal or no changes required, while providing the full functionality of the world’s leading database on crypto-protected data.

In the video below, Juan Loaiza explains how Oracle implemented Blockchain technologies in the Oracle Database and how they can be used to protect your essential business data. I’ve also included a brief description of these features under the video.

How do Blockchain technologies work in the Oracle Database?

To protect against illicit data changes made by rogue insiders or malicious actors using insiders’ credentials, Oracle has introduced Immutable tables (insert-only tables) in Oracle Database 21c (21.3).

Immutable Tables

With an Immutable table, it is possible to insert new data, but existing data cannot be changed or deleted by anyone using the database, even the database administrators (SYSDBA). It is also impossible to change an immutable table’s definition or convert it to an updatable table. However, an Immutable table appears like any other table in the database from an application’s point of view. It can store both relational data and JSON documents, and it can be indexed and partitioned or used as the basis of a view.
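A minimal sketch of creating an Immutable table is shown below. The table name review_audit_log and its columns are hypothetical, and the retention values are only examples; the NO DROP and NO DELETE clauses control when an idle table may be dropped and when old rows may be removed.

```sql
-- Minimal sketch: an insert-only (immutable) table.
-- review_audit_log and its columns are illustrative names, not from the post.
CREATE IMMUTABLE TABLE review_audit_log (
  log_id     NUMBER,
  cust_id    NUMBER,
  action     VARCHAR2(100),
  logged_at  TIMESTAMP DEFAULT SYSTIMESTAMP
)
NO DROP UNTIL 0 DAYS IDLE               -- table may be dropped as soon as it is idle (handy for testing)
NO DELETE UNTIL 16 DAYS AFTER INSERT;   -- rows cannot be deleted until they are at least 16 days old

-- Inserts work as they would on any other table.
INSERT INTO review_audit_log (log_id, cust_id, action)
VALUES (1, 1001, 'Review submitted');

-- An UPDATE such as the following is expected to be rejected with an error.
-- UPDATE review_audit_log SET action = 'Review edited' WHERE log_id = 1;
```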

Blockchain Tables

To protect against illicit changes made by hackers, Oracle has introduced Blockchain tables. Blockchain tables are immutable tables that organize rows into several chains. Each row, except the first row in the chain, is chained to the previous row via a cryptographic digest or hash. The hash is automatically calculated on insert based on that row’s data and the hash value of the previous row in the chain. Timestamps are also recorded for each row on insertion.
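As a hedged sketch, a Blockchain table can be created much like an Immutable table, with an extra HASHING clause. The table name my_bc_tab matches the verification example in this post, but the columns and retention values here are assumptions made for illustration.

```sql
-- Minimal sketch: a blockchain table whose rows are chained by SHA2-512 hashes.
-- The columns are illustrative; only the table name comes from the post.
CREATE BLOCKCHAIN TABLE my_bc_tab (
  contract_id  NUMBER,
  details      VARCHAR2(4000),
  created_on   DATE
)
NO DROP UNTIL 0 DAYS IDLE
NO DELETE UNTIL 16 DAYS AFTER INSERT
HASHING USING "SHA2_512" VERSION "v1";

-- The chain metadata is kept in hidden columns, which can be selected by name.
SELECT ORABCTAB_CHAIN_ID$,
       ORABCTAB_SEQ_NUM$,
       ORABCTAB_HASH$
FROM   my_bc_tab;
```

The hidden columns show which chain each row belongs to, its position in that chain, and the hash computed when it was inserted.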

Any modification to data in a Blockchain table breaks the cryptographic chain because the hash value of the row will change. You can verify the contents of a blockchain table have not been modified since they were inserted using the DBMS_BLOCKCHAIN_TABLE.VERIFY_ROWS procedure.

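DECLARE
  actual_rows   NUMBER;
  verified_rows NUMBER;
BEGIN
  -- Count the rows currently stored in the blockchain table
  SELECT COUNT(*)
  INTO   actual_rows
  FROM   admin.my_bc_tab;

  -- Verify the hash chain; the number of rows verified comes back
  -- in the OUT parameter
  dbms_blockchain_table.verify_rows(
    schema_name             => 'admin',
    table_name              => 'MY_BC_TAB',
    number_of_rows_verified => verified_rows);

  -- Run SET SERVEROUTPUT ON first to see this message
  DBMS_OUTPUT.put_line('Actual_rows=' || actual_rows || ' Verified Rows=' || verified_rows);
END;
/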

End-User Data Signing

Even with Immutable or Blockchain tables, data can be falsely inserted in an end user’s name by someone using stolen credentials. To address this vulnerability, Oracle allows end users to cryptographically sign the data they insert using their private key, which is never passed to the database.

Each end user registers a digital certificate containing their public key with the database. This digital certificate allows the database to validate the end user’s signature when new data is inserted. Even if a hacker manages to steal a valid set of credentials, without the private key the signature on the inserted data won’t match, and the insert will therefore not be accepted.

It’s also possible for end users to ensure the database has received their changes by requesting that Oracle countersign the newly inserted data. Oracle returns a crypto-receipt to the user, ensuring nothing on the mid-tier can filter specific data to prevent it from being recorded.

Distributing Cryptographic Digest

Even with cryptographically chained rows, sophisticated cyber-criminals or authorities could illicitly change data via a large-scale cover-up, where the entire database is replaced. To detect such a cover-up, Oracle enables schema owners to sign and distribute the cryptographic digest for a blockchain table periodically. Remember, the digest can’t be used to infer the data in the table, but authorized users can use it to validate the chain and confirm their newly inserted data is present. The crypto-digest can be posted to an independent public store or blockchain such as Ethereum, sent out by email, or made available via a REST API.

A cover-up can easily be detected by comparing the previously published digests to the current table content. Also, distributing the digest publicly across multiple independent services prevents an authority or cyber-attacker from deleting all the separate copies.

Getting Started With Blockchain

Both Immutable and Blockchain tables are free features of the Oracle Database. No additional licenses or software are needed to take advantage of these new table types, which are completely transparent to all new and existing applications.

Also, note Oracle has backported Immutable tables and Blockchain tables to Oracle Database 19c (19.11 and 19.10, respectively). Please check My Oracle Support for more details before attempting to use Blockchain tables in 19.10.

For more information on Blockchain, check out the Oracle Blockchain blog, Oracle Blockchain LiveLabs, or the Oracle Blockchain documentation.

Data-Driven Apps – What are they and the easiest way to develop them

At the moment, we hear a lot about how businesses need to become data-driven to remain competitive, and how they need to understand their customers’ needs and quickly deliver value to those customers.

But how do you do that?

You take advantage of data-driven apps that allow users to create value or insights from data in real-time. 

What are Data-Driven Apps?

Data-driven apps operate on a diverse set of data (spatial, documents, sensor, transactional, etc.) pulled from multiple different sources, often in real-time, and create value from that data in very different ways from traditional applications. For example, they may use Machine Learning to make real-time recommendations to customers or detect fraudulent transactions, use Graph analytics to identify influencers in a community and target them with specific promotions, or use spatial data to keep track of deliveries.

These apps are also frequently deployed on multiple platforms, including mobile devices as well as standard web browsers, which means they need a flexible, scalable, and reliable deployment platform. Given the demands on these apps, they need to be continuously developed to adapt to new use cases or user needs, and all updates must happen online as they have to be available 24×7.

When building data-driven apps, developers need to leverage an ever-increasing set of data processing and machine learning algorithms to meet these requirements.

So how should you go about developing and deploying data-driven apps quickly, efficiently, and more importantly, in a maintainable way?

Data-Driven Techniques and Technology

You take a data-first approach, or as A. Neil Pappalardo put it, a ‘Minimize Code, Maximize Data’ approach. In other words, you bring the algorithms to the data, not the data to the algorithms.

In the video below, I explain how to take advantage of the built-in features and functionality of the Oracle Database to develop and deploy data-driven apps efficiently. I also share some easy-to-follow code examples to demonstrate how much simpler your application code can be if you use this approach!

How to implement Data-Driven Apps – Using many Single-Purpose Databases or a single Converged Database?

There is an ongoing debate in our community about the best approach for developing cloud-native or data-driven apps. On one side, you have folks who say you should use a single-purpose “best-of-breed” database for each data type or workload you have. On the other side, you have folks who say you should use a single converged database. So, which approach is right for you and your projects?

Let’s examine some of the pros and cons of each approach.

Single-purpose Databases

Single-purpose databases, or purpose-built databases as they are often known, are engineered to solve a single problem or a small number of problems. Given their narrow focus, they can ignore the tradeoffs usually required when trying to accommodate multiple data types or workloads. It also allows them to use a convenient data model that fits the purpose and to adopt APIs that seem natural for that data model. They offer less functionality than converged databases, and therefore, fewer APIs, making it easier to start developing against them. Their simplicity means they do a few things very well, but other things not at all. For example, a lot of single-purpose databases scale well because they offer no strong consistency guarantees.

At first glance, single-purpose databases appear to be a good option. Developers are happy because they get exactly what they need to begin a project. However, when you look at the bigger picture, single-purpose databases can cause a lot of pain and end up costing more in the long run.


What is a Converged Database?

At the recent OOW European conference, there was a lot of talk about Converged Databases and how they can greatly simplify data-driven app development.

But if you missed the conference, you might find yourself wondering what exactly is a Converged Database and what is the difference between a Converged Database and an Autonomous Database?

So, I thought it would be a good idea to write a short blog post explaining what a Converged Database is and how it relates to the Oracle Autonomous Database.

What is a Converged Database?

A Converged Database is a database that has native support for all modern data types (JSON, Spatial, Graph, etc. as well as relational), multiple workloads (IoT, Blockchain, Machine Learning, etc.) and the latest development paradigms (Microservice, Events, REST, SaaS, CI/CD, etc.) built into one product.

By having support for each of these data types, workloads, and paradigms as features within a converged database, you can support mixed workloads and data types in a much simpler way. You don’t need to manage and maintain multiple systems or worry about having to provide unified security across them.

You also get synergy across these capabilities. For example, by having support for Machine Learning algorithms and Spatial data in the same database, you can easily do predictive analytics on Spatial data. The Oracle Database is a great example of a Converged Database, as it provides support for Machine Learning, Blockchain, Graph, Spatial, JSON, REST, Events, Editions, and IoT Streaming as part of the core database at no additional cost.

A good analogy for a Converged Database is a smartphone. In the past, if you wanted to take a picture or video you would need a camera. If you wanted to navigate somewhere you would need a map or a navigation system. If you wanted to listen to music, you needed an iPod and if you wanted to make phone calls, you would also need a phone.

But with a smartphone, all of these products have been converged into one. Each of the original products is now a feature of the smartphone. Having all of these features converged into a single product inherently makes your life easier, as you can stream music over the phone’s data plan or upload pictures or videos directly to social media sites.