
SAP HANA operating concepts - technology easily explained!

Estimated reading time: 8 minutes

Today our colleague Jürgen Meynert explains the SAP HANA operating concepts in detail. This article is therefore a bit more technical and longer than usual. But school books were like that too, and we all learned a lot from them. We hope you enjoy discovering and understanding!

With HANA, SAP has been developing a new technical basis for its applications for several years. The motivation for this may well lie in the fact that a technological paradigm shift is imminent with non-volatile main memory (NVRAM, non-volatile RAM).

In-memory technology fundamentally requires new programming models that cannot be realized by adapting existing software; it calls for radically new approaches. A paradigm shift is therefore imminent not only in hardware, but also in software technology. Over the course of technical progress, the access speed of storage systems has not kept pace with the growth in processor speed. At CPU clock rates of 3 GHz, corresponding to cycle times of roughly 0.3 nanoseconds, processing steps in the processor take on the order of nanoseconds (ns), while accesses to external storage are in the range of milliseconds (ms). That is a disproportion of 1 to 1,000,000!

As a consequence, in data-processing applications CPUs spend most of their time waiting for IO. It is not enough simply to make the storage faster, for example with ultra-fast flash devices, because even light, and with it the data, can cover only a very limited distance in the nanosecond range (less than 30 cm in 1 ns). In the end, faster data access can only be achieved by keeping the data close to the processor: in RAM or, better still, in the cache.
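To put the gap into numbers, here is a small back-of-the-envelope calculation in Python. The figures are the rough orders of magnitude from the paragraphs above, not measured values:

```python
# Rough orders of magnitude from the text above; illustrative, not measured values.
cycle_time_s   = 1 / 3e9      # one CPU cycle at 3 GHz -> ~0.33 ns
cpu_step_s     = 1e-9         # a processing step: on the order of nanoseconds
storage_step_s = 1e-3         # access to external storage: on the order of milliseconds

print(f"cycle time       : {cycle_time_s * 1e9:.2f} ns")
print(f"latency ratio    : 1 : {storage_step_s / cpu_step_s:,.0f}")   # 1 : 1,000,000

# Even at the speed of light, a signal covers only ~30 cm in one nanosecond,
# so "just make the storage faster" cannot close this gap across any distance.
c_m_per_s = 299_792_458
print(f"distance in 1 ns : {c_m_per_s * 1e-9 * 100:.1f} cm")          # ~30 cm
```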

Code to data

Processing can be accelerated further by executing application code directly in the database, which avoids the comparatively high latencies of communication between application and database. Where data was previously channeled out of the database to the application, application code will in future be brought to the data. This is the best way to describe the paradigm shift: instead of “Data to Code”, the motto becomes “Code to Data”.
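The following deliberately simplified sketch contrasts the two patterns. It is not SAP code; the `sales` table and the `conn` database handle are hypothetical stand-ins, and the point is only where the per-row work happens:

```python
# Simplified illustration of "Data to Code" vs "Code to Data".
# Hypothetical table and DB-API-style connection; not SAP code.

# Data to Code: pull every row across the network, aggregate in the application.
def revenue_per_customer_app_side(conn):
    rows = conn.execute("SELECT customer_id, amount FROM sales").fetchall()  # large transfer
    totals = {}
    for customer_id, amount in rows:           # application CPU does the work
        totals[customer_id] = totals.get(customer_id, 0) + amount
    return totals

# Code to Data: push the aggregation into the database; only the small result travels.
def revenue_per_customer_db_side(conn):
    return dict(conn.execute(
        "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
    ).fetchall())
```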

However, RAM is currently still volatile, so write operations in main memory must be secured by a persistence layer, i.e. ultimately by storage again. For read access, RAM is already well equipped, even for very large amounts of data: with the ever increasing packing density of memory elements and simultaneously falling prices, computers with high RAM capacities (up to several TB) are available at reasonable cost. Since reading from RAM is now efficient and affordable, the focus of SAP Hana and other in-memory technologies is primarily on read-intensive applications such as reporting and business intelligence (OnLine Analytical Processing, OLAP). For transactional systems (OnLine Transaction Processing, OLTP) there are advantages as well: on the one hand, online reporting on the transactional data becomes possible without any loss of performance in transaction processing; on the other hand, code sections with a high communication volume between database and application already benefit from being relocated into the database. Regardless of whether it is OLAP or OLTP, the in-memory DB (IMDB) needs persistence, because at the latest when the computer is switched off, the data is gone from RAM.

Persistence Layer and Performance

Since data accesses in an IMDB predominantly take place in RAM, one might expect the storage, as the persistence layer, to play a minor role in terms of performance and to serve primarily as a safeguard against data loss. However, SAP's requirements for persistence performance were, and in some cases still are, higher than for classic databases. In general, two write mechanisms can be identified in databases: the logwriter and the datawriter. The logwriter documents every single change (insert, update, delete) carried out on the database promptly (synchronously) in a separate area. The datawriter updates the changes to the tables in storage from time to time (asynchronously) and thus maintains a consistent, but usually not up-to-date (because asynchronous) image of the database. The logwriter is critical for transaction processing and for database recovery, should it ever be necessary. A transaction is only considered complete when the logwriter has reported it as documented; only then can processing continue. This ensures that after an unplanned termination of the database, the last valid state can be restored by applying to the last consistent data image the log entries not yet contained in it (roll forward).
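The interplay of logwriter, datawriter and roll forward can be illustrated with a tiny toy model. This is a teaching sketch under strong simplifications, not Hana's actual implementation:

```python
# Highly simplified model of logwriter / datawriter / roll forward.
# A teaching sketch only; not how Hana is actually implemented.

class TinyDb:
    def __init__(self):
        self.tables = {}        # in-memory state
        self.log = []           # "logwriter": every change, recorded synchronously
        self.data_image = {}    # "datawriter": consistent but stale copy
        self.log_offset = 0     # log entries already contained in the data image

    def commit(self, key, value):
        self.log.append((key, value))   # transaction counts as done only after this
        self.tables[key] = value

    def savepoint(self):
        self.data_image = dict(self.tables)   # asynchronous, consistent snapshot
        self.log_offset = len(self.log)

    def recover(self):
        # Roll forward: start from the last consistent image,
        # then re-apply the log entries not yet contained in it.
        self.tables = dict(self.data_image)
        for key, value in self.log[self.log_offset:]:
            self.tables[key] = value
```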

Logwriter & datawriter

In the early revisions of Hana, the logwriter was designed to write all changes to the log area in small block sizes. In the case of extensive changes in the database, this led to a considerable number of IO operations. That is why SAP's requirement at the time was that the persistence had to be able to write at least 100,000 IOps (IO operations per second). This can only be achieved with reasonable effort using local flash devices (PCIe-based), which is why most Hana single-node installations had, and still have, PCIe-based flash devices. Hana was later extended by a ScaleOut architecture for the case that the maximum possible main memory within a single computer is no longer sufficient to hold a larger database completely. With this option, Hana can be distributed over several computer nodes.

The computers can be designed so that not all of them are active; one or more nodes can also be configured as failover nodes in case an active node fails. However, this requires an (external) persistence that can be read by all computers, because otherwise a failover node cannot read the data of a failed computer. With that, the concept of writing log data very quickly to a local device was no longer tenable. Accordingly, the logwriter was optimized so that it could write variable block sizes, which meant that the high IO rates were no longer necessary: in a scale-out scenario, just under 20,000 IOps per computer node proved sufficient. Nevertheless, SAP maintained the 100,000 IOps requirement for single nodes until very recently.
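A quick calculation shows why variable (larger) block sizes reduce the required IO rate so drastically. The throughput figure and block sizes below are assumptions chosen for illustration, not SAP specifications:

```python
# Why larger log blocks reduce the required IO rate.
# Illustrative assumptions only, not SAP sizing figures.
log_throughput_mb_s = 400                      # assumed change rate to be persisted

for block_kb in (4, 16, 64):
    iops = log_throughput_mb_s * 1024 / block_kb
    print(f"{block_kb:>2} KB blocks -> {iops:>9,.0f} IOps")

# 4 KB blocks  -> ~102,400 IOps  (the old small-block regime)
# 64 KB blocks -> ~  6,400 IOps  (same throughput, far fewer IO operations)
```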

In addition to the logwriter there is, as already mentioned, the datawriter. At first glance one might think it is not performance-critical, since it writes asynchronously. In fact, Hana writes so-called savepoints at configurable intervals; the default is five minutes. The storage performance must be designed so that, at least in terms of throughput, the volume changed between two savepoints can be written within the available time interval. Since the datawriter works according to the copy-on-write principle, the write load tends to be sequential: changed blocks are not overwritten, rather the changes are stored in newly allocated blocks. This simplifies the requirements on the persistence, because sequential IO can be implemented much more efficiently than random IO.
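Sizing the datawriter therefore boils down to a simple throughput calculation. The changed volume in the sketch below is an assumption chosen only to illustrate the arithmetic:

```python
# Sizing the datawriter: the volume changed between two savepoints must be
# written within the savepoint interval. Figures are illustrative assumptions.
savepoint_interval_s = 5 * 60        # default: five minutes
changed_volume_gb    = 60            # assumed data changed between two savepoints

required_throughput = changed_volume_gb * 1024 / savepoint_interval_s   # MB/s
print(f"required sequential write throughput: ~{required_throughput:.0f} MB/s")
# ~205 MB/s - sequential thanks to copy-on-write, so easier to provide than random IO
```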

Since Hana’s column-based internal architecture is comparable to data that is one hundred percent indexed, Hana performs more internal reorganizations than other databases, and these are also reflected in the persistence. This increases the datawriter's requirement for write throughput. On the other hand, one might expect the requirements for IO throughput when reading data to be rather low, since Hana should normally read data from RAM. That may be true for normal operation, but not for the case that Hana has to be booted up. Assuming that 1 TB of data has to be read into main memory, this still takes around 20 minutes at a throughput of 1 GB/s. That would not be a problem if database restarts were the exception. But since Hana is being continuously developed, with the aim of one day making optimal use of NVRAM, updates must be installed at regular intervals, and these are often accompanied by a restart of the database. This explains SAP's requirement to equip the persistence with high read throughput in the data area.
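The reload time is again simple arithmetic; the sketch below uses the 1 TB and 1 GB/s figures from the text:

```python
# Restart cost: time to reload the column data from persistence into RAM.
db_size_tb     = 1.0
read_rate_gb_s = 1.0

reload_minutes = db_size_tb * 1024 / read_rate_gb_s / 60
print(f"reload time: ~{reload_minutes:.0f} minutes")
# ~17 minutes for the raw read alone; with overhead this lands in the
# roughly 20-minute range cited above, and doubles at half the read rate.
```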

OLAP versus OLTP

Even if, as mentioned above, the main area of application of IMDBs tends to be OLAP, SAP is already propagating OLTP applications on Hana (Suite on Hana). Technically, both single-node and scale-out architectures can be used for OLTP systems. From a performance point of view, however, there is a significant difference. As already explained, OLTP applications can gain a performance advantage on Hana if code segments are relocated into the database in order to avoid time-consuming communication between application and database. However, if Hana is distributed over several computer nodes in a ScaleOut landscape, it becomes very difficult to distribute code and data tables over the nodes in such a way that the code segments find their tables on the same computer on which they are currently running. If the code has to fetch data from a neighboring node, communication between the nodes is needed again, and this happens with comparatively high latency, just as if the code had remained on the application server. For this reason, a single-node implementation of Hana is clearly preferable to a ScaleOut architecture for OLTP.

At the same time, SAP has so far insisted on fast (internal) log devices for Hana as a single node. Internal log devices, however, are not acceptable for business-critical OLTP applications, since a loss of the computer or the log device is also accompanied by a loss of data. Business-critical data, especially the log data, should always be written (mirrored) to a second location so that in an emergency the database can be recovered from a second source up to the last completed transaction.

Fujitsu integrated the Hana single-node architecture into the FlexFrame operating concept at an early stage and placed the log data on external, mirrorable storage units. The previously required 100,000 IOps are not available there, but from a technical point of view they are no longer necessary. In return, Hana gains the secure and flexible operation known from FlexFrame for business-critical applications with their typically high SLAs. In the meantime, SAP too is moving away from the high IO requirements for the logwriter in order to prepare Hana for flexible integration into data center operations.

Efficient operating concept and shadow databases

The requirement for secure data storage and an efficient operating concept has been met by integrating Hana into FlexFrame. With mirrored shared storage, high availability is guaranteed both locally and across data centers. One open point remains the problem of recovery times: depending on the size of the database, a complete restart can take excessively long even with high-performance IO channels. As part of the further development of Hana, SAP is working on the concept of a shadow database, which would ideally minimize switchover times, since shadow databases usually run almost synchronously with the primary database. If the primary database fails, activating and fully recovering the shadow database would take only a few minutes before operations can resume.

Shadow databases are not yet available in Hana today, but as a preliminary step Hana offers the option of system replication, which ensures that the log data is replicated synchronously to a second instance and that Hana's columnstore (the column structure) is preloaded into main memory there and kept up to date. In the event of a failover, the entire columnstore does not have to be reloaded, as most of it has already been preloaded. This allows the recovery times in critical environments to be reduced to a reasonable level.

For applications that allow only minimal downtime, the recommendation would be to operate a second instance with system replication local to the productive Hana instance and, for the disaster case, to mirror the productive persistence into a second data center. Since the instance receiving the system replication uses the computer's resources only to a small extent, other, non-productive systems could run in parallel on that computer node.

ScaleOut

It remains to be discussed how a ScaleOut architecture compares to a single node. Basically, for both OLTP and OLAP, given the same database size, the single node is the preferred alternative, provided the RAM capacities allow it. There are two main reasons for this. The first was already covered in the discussion of OLTP: communication between the database nodes takes comparatively long and has a negative impact on performance. With OLAP applications, the problem of skillfully assigning code segments to the data is admittedly not as relevant as with OLTP, since queries can usually be processed in a well-distributed manner due to their mathematical structure. Nevertheless, the latency problem remains, because the partial results of a query must finally be brought together on one node and consolidated into a final result.

A second problem arises, for example, with joins over tables that are distributed across several nodes. Before the join can be carried out, the data of the tables involved must be transferred to, and temporarily stored on, the node on which the join is executed. This costs time on the one hand and additional main memory on the other. With a single node there is no data transfer and no intermediate storage, since all data is local. The resulting recommendation is to serve applications with a single-node instance for as long as possible.
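A toy model makes the extra cost of a cross-node join tangible. The tables and values are made up, and the model ignores everything except data movement:

```python
# Why cross-node joins cost extra time and memory: a toy model (not Hana internals).
# Each "node" holds one partition; to join, one side must be shipped and buffered.

node1_orders    = [(1, "A"), (2, "B")]            # (order_id, customer_id) on node 1
node2_customers = [("A", "ACME"), ("B", "Duck")]  # (customer_id, name)     on node 2

def distributed_join():
    shipped = list(node2_customers)               # network transfer to the joining node
    lookup  = dict(shipped)                       # temporary copy costs extra RAM there
    return [(oid, lookup[cid]) for oid, cid in node1_orders]

# On a single node both tables are already local: no transfer, no temporary copy.
print(distributed_join())    # [(1, 'ACME'), (2, 'Duck')]
```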

Current developments in hardware technology support this approach. With the hardware officially available from February 2014, it will be possible to install up to 12 TB of RAM in one machine. SAP has meanwhile announced that, with the new hardware, it will support up to 6 TB on a single computer for productive OLTP systems and, for OLAP, up to 2 TB per node with eight populated sockets, compared with 1 TB in the past. That sounds plausible, because the CPU performance of the new processor generation has roughly doubled. However, the performance of the Hana technology itself has also improved steadily and significantly in recent years, so that from a technical point of view one can imagine even larger RAM configurations than 2 TB per node in a scale-out architecture in the future.

Keywords: Code to Data, Data Center, In-Memory Computing, NVRAM - non volatile RAM, SAP