Over the last few years there has been a rapid surge in the adoption of smart devices. Everything from phones and tablets, to smart meters and fitness devices, can connect to the Internet and share data. You only have to follow @MarkRittman and his experiences with getting his kettle to boil remotely to see just how many devices within your own home can connect to the internet.
And I've finally found a use for my Google Home – if I say "time for a cuppa" it tells the Amazon Echo to put the kettle on pic.twitter.com/l87eyxOweT
— Mark Rittman (@markrittman) April 16, 2017
With all of these smart devices, comes a huge increase in the frequency and volume of data being ingested into and processed by databases. This scenario is commonly referred to as the Internet of Things or IoT.
Some people assume that a NoSQL database is required for an IoT workload because the ingest rate required exceeds the capabilities of a traditional relational database. This is simply not true.
If we look at some more industrial examples, telecoms process tens of millions of Call Detail Records (CDRs) per second or connected manufacturing equipment, where every machine in a production line sends information about which component is currently being worked on, you quickly realize that these types of workloads have actually been around a really long time. And if we look at how the industry leading company in each of these fields manages their workloads, we find they have all been using the Oracle Database.
With that in mind, I began working with Cecilia Gervasio Grant, Weifeng Shi and my boss Juan Loaiza to define a set of best practices or management techniques not just to facilitate IoT workloads but to excel at them. We drew on our years of experiences with leading telecoms, financial institutions, retailers etc. to make realistic recommendation and to share real-world code examples, as well as performance numbers.
Throughout the paper we use a simple analogy of someone going grocery shopping to help explain the reasoning behind each of the tuning techniques recommended. For example, if you know you are going shopping for a lot of stuff, you get a big cart right away rather than selecting a small hand basket. The same is true in the database. You want to allocate a BIGFILE tablespace with large initial extents, so that you can ingest and store large volumes of data efficiently.
You would also never select one item at a time and pay for it before you select the next item on your list, which is akin to a single row insert followed by a commit within the Oracle Database. Instead, you would walk through the store, collecting all of the items on your list and then go to the check out once, which is why we recommend using array insert followed by a commit within the database.
We hope you find the recommendations outlined in the paper useful and would love to hear if you have used these or other techniques in your IoT or high-volume ingest workloads.
2 thoughts on “Best Practices For Large Volume or IoT Workloads”
Thanks for the great article, very useful in non-IOT scenarios as well. We had one requirement to ingest lot of audit logging data for every screen by every user and in the solution, we used partitions (local indexes) – the only issue that we faced was during the maintenance (dropping old partitions), we’re invalidating lot of cursors and there is no way to avoid this. This is 24×7 application with a high degree of concurrency.
Oracle should find a way to avoid reparsing during partition drop, I know it is a fundamental which is hard to change but maybe I put to product managers if they can find a way. 🙂
Unfortunately there is on way to avoid the cursor invalidation in 11.2 and 12.
However, in 12.2 we have done some work to implement fine-grained cursor invalidation so TRUNCATE PARTITION does not invalidate cursors, except for cursors that use global index access in certain cases.