In-Memory Relational Database
In-memory database(IMDB) solutions offer breakthrough performance for systems which use a relational database on the back end. In comparison to conventional RDBMS systems which employ a disk storage mechanism, IMDBs rely on a data storage that fully resides in RAM. This way, the IMDB systems try to maximally leverage the performance difference between RAM and hard disk drives. The difference is shown on the diagram below:
Figure 1. RAM is 300,000 times faster than traditional hard drives and 2000 times faster than SSD drives.
IMDBs have been on the market since early 2000’s. These days there is a great variety of vendors offering IMDB including big players like Microsoft, Oracle, IBM. This is a sign of both maturity and success of this technology.
In this blog I’ll explore one scenario where IMDB can be extremely helpful: real-time reporting solutions.
The requirements for a real-time reporting system are usually the following:
Velocity. The questions asked to the system should be answered immediately. The reports should be running almost instantly.
Low latency. The data must be as current as possible. No more than 5 second lag in data freshness is acceptable.
Flexibility. The system should be capable for custom ad-hoc querying rather than only provide a predefined set of reports.
It is easy to support all three requirements when the data volumes to report against are small (< 100Gb). Powerful reporting tools like Tableau, Microstrategy offer this functionality out of the box.
If the data volumes are moderate (100 GB and more), one can easily implement 2 of the 3 real-time reporting requirements:
Low latency and flexibility. The reports that hit the database directly, but Velocity is bad with reports running for minutes or even hours.
Velocity and flexibility. Data can be cached in the in-memory cubes (most of the reporting tools provide this capability), but Latency is poor because the freshness of data is compromised, as it usually takes some time to refresh the cube.
Velocity and low Latency. Reporting data can be pre-aggregated to reduce the volumes thus restoring Velocity. But in this case the variety of the queries that can be built is drastically reduced meaning Flexibility is compromised.
IMDB solutions can solve this problem. There is no need for in-memory cache if your reporting data already resides in RAM. And the reports are still rapid, flexible and current.
Real Life Example
As technical architect at Coherent Solutions, I have recently lead architecture, design and implementation of real-time reporting solution for one of our clients, a large online retailer that wanted to convert their hourly Sales dashboard into a real-time dashboard. The dashboard represented sales KPI’s for current day/month/year as well as comparison to last year numbers.
The solution required a complex integration of data from variety of source systems: Order Processing, Merchandising, Credit, Fulfillment.
The legacy implementation is depicted in the following diagram:
As you can see from the diagram, the legacy solution could guarantee reporting data that is at most 2 hours old. That came at expense of a complex staging environment used to provide a two-step data load process, as well as timing cube builds to not overlap related table loads.
The Coherent team worked with the company’s engineers on the real-time solution and proposed the following modifications:
The new ODS (Operational Data Storage) was an IMDB implemented using SQL Server 2014 In-Memory OLTP technology. When the system was launched in Production, any transactions that occurred in the source systems were reflected in the dashboard in less than 5 seconds. Furthermore, the IMDB allowed just-in-time data integration and reporting, eliminating staging environment and cached cubes.
A fly in the ointment
Of course, there are some concerns related to usage of IMDB systems:
Hardware limitations. The size of the in-memory database is limited by the amount of RAM available on a physical server. These days most of the server hardware supports up to 2TB of RAM.
Hardware costs. Running a large amount of RAM is expensive in comparison to disk drive space. Also, RAM is not able to preserve the data after machine reboot, so RAM storage solutions are usually supplemented with data-preserving hardware, which makes it even more expensive.
Despite these limitations, IMDB solutions could be very helpful in performance optimization of midsize databases bringing them up to the next level of user experience.