Can GPUs speed up Database workloads?

Recently there has been a lot of interest in, and hope for, using graphics processing units (GPUs) to transparently accelerate database workloads. So I thought it was worth investigating what Oracle is doing to get transparent performance gains from both CPUs and GPUs, since Oracle Database has a long history of adopting new technologies as they become available.

Let’s start with GPUs.

It is important to understand the basic architectural benefits and tradeoffs of GPUs in order to determine whether they will provide value for database workloads.

GPUs are dedicated, highly parallel hardware accelerators that sit on the PCI bus. The huge number of parallel computation engines these devices provide accelerates tasks that require large numbers of computations on small amounts of data. For example, GPUs are extremely effective for blockchain applications because these require billions of computations on a few megabytes of data. GPUs are also good for deep learning algorithms, since these perform repeated computational loops on megabytes to gigabytes of data. And of course GPUs are great for graphics, because three-dimensional imaging requires millions of computations on every image. The workload pattern is the same in each case: lots of computation on modest amounts of data.

So, can GPUs improve database workloads?

Based on the description above, it’s possible that GPUs could be used to accelerate analytic workloads. However, GPUs will have little or no benefit for OLTP-style workloads.

GPUs offer the potential to accelerate analytic processing through two mechanisms:

  1. Adding a lot more parallel processing
  2. Using higher-bandwidth, but much smaller, specialized memory called High Bandwidth Memory (HBM).

However, database analytics don’t completely fit the GPU mold or sweet spot.

Analytics typically perform a small number of simple calculations on large amounts of data, often hundreds of gigabytes to petabytes. For example, a typical analytic query will apply a simple predicate (e.g. filter on sales region or date) and then perform a simple aggregation function (e.g. sum or average).

SELECT   s.customer_name, SUM(s.amount_sold)
FROM     sales s
WHERE    s.sales_region = 'CA'
GROUP BY s.customer_name;
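
To make the shape of this workload concrete, here is a minimal Python sketch of the work the query asks for on each row: one comparison and, for matching rows, one addition. The row layout and sample values are invented for illustration, and this is not how Oracle Database actually executes the query.

```python
from collections import defaultdict

# Hypothetical sample rows: (customer_name, sales_region, amount_sold).
SALES = [
    ("Acme", "CA", 100.0),
    ("Acme", "NY", 50.0),
    ("Bolt", "CA", 25.0),
    ("Acme", "CA", 10.0),
]


def sum_sold_by_customer(rows, region):
    """Filter rows on sales_region, then sum amount_sold per customer."""
    totals = defaultdict(float)
    for customer_name, sales_region, amount_sold in rows:
        if sales_region == region:                 # the simple predicate
            totals[customer_name] += amount_sold   # the simple aggregation
    return dict(totals)


result = sum_sold_by_customer(SALES, "CA")
print(result)
```

Each row is read once and contributes at most one comparison and one addition, so the cost of such a query is dominated by moving data rather than by arithmetic.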

It’s unlikely that the volume of data processed by an analytic query will fit in the local GPU memory, so data will have to be moved back and forth across the PCI bus. This limits total throughput to the PCI bus bandwidth, which is dramatically lower than the local memory bandwidth. This doesn’t mean that GPUs won’t provide any benefit for analytics, but users should not expect the dramatic gains seen in other applications. It is just not architecturally possible.
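
The bandwidth argument can be checked with back-of-the-envelope arithmetic. The sketch below is only a rough model: the bandwidth figures are round numbers I am assuming for illustration (roughly a PCIe 3.0 x16 link and roughly one HBM2-equipped GPU), not measurements, and the data size is just one point in the "hundreds of gigabytes" range. The GPU-memory figure is hypothetical in any case, since that much data would not fit in GPU memory.

```python
# Illustrative, assumed figures (not measurements and not from the article).
PCIE_GB_PER_S = 16.0   # roughly a PCIe 3.0 x16 link, one direction
HBM_GB_PER_S = 900.0   # roughly the local memory bandwidth of one HBM2 GPU
DATA_GB = 500.0        # a table in the "hundreds of gigabytes" range


def scan_seconds(data_gb, bandwidth_gb_per_s):
    """Lower bound on scan time when bandwidth is the only limit."""
    return data_gb / bandwidth_gb_per_s


over_bus = scan_seconds(DATA_GB, PCIE_GB_PER_S)   # data streamed across the bus
from_hbm = scan_seconds(DATA_GB, HBM_GB_PER_S)    # if it all fit in GPU memory
slowdown = over_bus / from_hbm

print(f"over the bus: {over_bus:.1f} s")
print(f"from GPU memory: {from_hbm:.2f} s")
print(f"ratio: {slowdown:.0f}x")
```

Under these assumptions the bus, not the GPU's compute engines, sets the floor on query time, which is why the speedups seen in compute-heavy applications do not carry over.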

All that said, Oracle and other vendors have found that some database analytics algorithms can in fact run faster on GPUs than with conventional processing methods. However, care should be taken when reading performance comparisons that show huge advantages for GPUs. Typically, these comparisons pit traditional database algorithms against new, highly optimized GPU algorithms. Furthermore, they often use easily available but unoptimized and unparallelized open-source databases that are orders of magnitude slower than commercial databases for analytics.

But it’s not all doom and gloom.  Changes in hardware are coming that will see PCI buses get faster, and future GPUs will reduce their PCI bus communication disadvantages by adding direct high bandwidth communication with the main CPUs.

So, what about today? Is there any hope of getting transparent improvements in database performance from CPUs?

The answer is yes!

Oracle Database 12c introduced a new columnar in-memory format to greatly accelerate analytics. The columnar in-memory algorithms make extensive use of SIMD vector instructions that are already present in standard CPUs today.

SIMD vector instructions accelerate analytics by processing many data elements in a single instruction. They benefit from having full access to the very large caches and memory bandwidth that exist in current CPU sockets. A further advantage is that they are already present in the CPUs that databases run on today, so they add no further cost, complexity, or power usage to existing hardware.
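
As a rough mental model of "many data elements in a single instruction", the Python sketch below counts the steps needed to filter and sum a column one value at a time versus eight values at a time. It only simulates the idea: Python does not issue SIMD instructions here, the lane count of 8 is an assumption (eight 32-bit values in a 256-bit register), and the function names are hypothetical rather than anything from Oracle Database.

```python
LANES = 8  # assumed vector width: eight 32-bit values per 256-bit register


def scalar_filter_sum(values, threshold):
    """One compare (and maybe one add) per element; returns (sum, steps)."""
    total, steps = 0, 0
    for v in values:
        steps += 1
        if v > threshold:
            total += v
    return total, steps


def vector_filter_sum(values, threshold, lanes=LANES):
    """Model of SIMD: each step handles `lanes` adjacent column values."""
    total, steps = 0, 0
    for i in range(0, len(values), lanes):
        chunk = values[i:i + lanes]
        steps += 1                                   # one "vector instruction"
        mask = [v > threshold for v in chunk]        # lane-wise compare
        total += sum(v for v, keep in zip(chunk, mask) if keep)  # masked add
    return total, steps


column = list(range(100))  # a contiguous column of 100 values
scalar = scalar_filter_sum(column, 49)
vector = vector_filter_sum(column, 49)
print(scalar, vector)
```

Both versions compute the same answer, but the vector model needs about one eighth as many steps. Storing each column contiguously is what allows adjacent values to be loaded into the lanes together.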

Oracle continues to rapidly add new SIMD vector algorithms to the database to take further advantage of these specialized instructions. Oracle is also enhancing the parallel algorithms that execute SQL to make better use of SIMD instructions. What’s really great about this is that all of the performance gains are completely transparent to applications and require no effort from the customer other than installing the software.

Oracle has also been working with Intel and other chip vendors for many years to add SIMD vector instructions to CPUs for the specific purpose of accelerating Oracle Database algorithms. Some of these instructions are now becoming available, and more will follow as new CPU chips are released in the next few years.

In summary, Oracle is actively improving its analytic algorithms by further leveraging SIMD vector instructions and improving parallelism. Oracle is working with both conventional CPU vendors and GPU vendors to add new hardware capabilities that specifically optimize database processing. Current GPUs can be shown to run some analytic algorithms faster, but achieving these advantages in a non-benchmark environment is challenging because these algorithms only work for a subset of analytic functions, and data needs to be moved back and forth across the PCI bus.

One thought on “Can GPUs speed up Database workloads?”

  1. Hi,
    I understand that a GPU would not be useful for small queries, but what about complex queries, where the calculation of the execution plan is often suboptimal? Would a GPU not be useful in that case?
