SAP HANA update

I have, in the past, written somewhat critically about SAP HANA. Part of the problem is that the product is something of a moveable feast, with new versions appearing on a pretty regular basis (perhaps not surprising for a technology that is still relatively new), which makes it difficult to keep up with. Also, historically at least, there has been a perception that SAP has been over-hyping SAP HANA. For example, Larry Dignan, reporting for ZDNet, recently wrote that “Vishal Sikka, SAP’s technology lead and member of the company’s executive board, stopped short of saying HANA would cure cancer, but not by much.” While Larry no doubt wrote this with his tongue at least partly in his cheek, it is worth noting, in SAP’s defence, that the company (using SAP HANA) has been working with the Stanford University Medical Center and the National Center for Tumor Diseases in Heidelberg, Germany (NCT), to help the Human Genome Research Institute put the therapeutic promise of the human genome within reach. Indeed, in November 2013 the White House honoured these three organisations for the genomics advances they have achieved.

All that being said, I have had a recent update from SAP and a number of salient points are worth reporting. The first is that SAP HANA is intended to support both OLTP and analytic processing in the same instance. According to SAP’s surveys, around a quarter to a third of HANA customers use it this way, with another quarter using it purely in warehousing environments and the remainder being specialised developments created by partners. In addition, in data warehousing environments, SAP does not see HANA as the sum total of its solution. In particular, it sees SAP HANA integrating into existing environments with Oracle, Teradata, SAP IQ (formerly Sybase IQ), Hadoop, streaming analytics systems such as SAP ESP, and other data sources. In the case of SAP IQ, SAP HANA might manage perhaps the last six months’ or year’s worth of data, while older data used for seasonal or trend analysis is stored in IQ.
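To make that tiering arrangement concrete, the sketch below shows one way an application might route data by age between a hot in-memory store and a cold columnar archive. It is purely illustrative: the `TieredStore` class, the list-backed stand-in back ends and the one-year threshold are my own assumptions, not SAP’s API or its actual policy.

```python
from datetime import date, timedelta

HOT_WINDOW = timedelta(days=365)  # assumed policy: roughly a year stays "hot"

class ListBackend:
    """Stand-in for a real database connection (HANA or IQ)."""
    def __init__(self, name):
        self.name, self.rows = name, []

    def insert(self, row):
        self.rows.append(row)

class TieredStore:
    """Route rows by age: recent to the hot tier, older to the cold tier."""
    def __init__(self, hot, cold):
        self.hot, self.cold = hot, cold

    def insert(self, row_date: date, row: dict):
        tier = self.hot if date.today() - row_date <= HOT_WINDOW else self.cold
        tier.insert(row)

store = TieredStore(ListBackend("hana"), ListBackend("iq"))
store.insert(date.today(), {"order": 1})                        # hot tier
store.insert(date.today() - timedelta(days=700), {"order": 2})  # cold tier
print(len(store.hot.rows), len(store.cold.rows))                # 1 1
```

In SAP HANA itself this kind of routing is handled by the data virtualisation layer discussed below, rather than in application code.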

Speaking of SAP IQ, HANA primarily stores data in columns (row storage is an option) and SAP is now using the same compression techniques as those employed in SAP IQ, except that in SAP HANA the compression is bit-level rather than byte-level, so it should be even more efficient. As an aside, bit-level compression has already been added to SAP IQ 16, released in 2013. Also on the issue of compression, SAP has assured me that during query processing the data is never decompressed except for result sets. Of course, that exception will also apply to intermediate result sets, which is likely to explain what I have criticised in the past as spikes in memory usage. But the truth is that all database products are subject to the materialisation of intermediate results, and there is no reason to suppose that SAP HANA is any better or any worse in this respect than any other vendor’s product. Indeed, intermediate results are dropped as soon as the data is joined, while the final results are cached. What will be important is that queries are written in such a way that the materialisation of intermediate results is minimised. Of course, this also applies to business intelligence and analytics tools that may be used in conjunction with SAP HANA (or any other database).
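To illustrate why bit-level packing should beat byte-level packing, here is a minimal sketch of dictionary encoding followed by bit-packing, a common columnar compression technique. It is a toy example of the general approach, built on my own assumptions, not SAP’s actual implementation.

```python
import math

def dictionary_encode(column):
    """Replace each distinct value with a small integer code."""
    dictionary = {v: i for i, v in enumerate(sorted(set(column)))}
    return dictionary, [dictionary[v] for v in column]

def bit_pack(codes, n_distinct):
    """Pack codes using only as many bits as the dictionary requires."""
    bits = max(1, math.ceil(math.log2(n_distinct)))
    buf, acc, used = bytearray(), 0, 0
    for code in codes:
        acc |= code << used
        used += bits
        while used >= 8:          # flush completed bytes
            buf.append(acc & 0xFF)
            acc >>= 8
            used -= 8
    if used:
        buf.append(acc & 0xFF)
    return bits, bytes(buf)

# Five distinct country codes need only 3 bits per value; byte-level
# packing would spend 8 bits on each, so the column shrinks ~2.7x.
column = ["DE", "US", "US", "FR", "DE", "JP", "US", "CN"] * 1000
d, codes = dictionary_encode(column)
bits, packed = bit_pack(codes, len(d))
print(bits, len(packed), len(codes))  # 3 3000 8000
```

Note that comparisons can be evaluated directly against the integer codes, which is what allows scans to run on the compressed representation, with only the final result sets being decoded.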

While on the subject of memory, the standard maximum memory for SAP HANA is 1TB. However, the company tells me that this is not the technical limit (it is only what is certified as standard) and that you can actually run up to 4TB per server. For example, Intel, in its recent Ivy Bridge launch, demonstrated a 4TB E7 v2 (Ivy Bridge) system using SAP HANA and SAS together to analyse a real-time oil and gas pipeline pump malfunction. The SAS/SAP HANA analysis came back in 5 seconds, 129x faster than an E7 v1 system using SAS and a disk-based database.

In addition, you can scale out using multiple servers. Moreover, a feature I particularly like is that SAP HANA has data virtualisation built in, so that queries can be optimised across distributed SAP HANA servers (the database optimiser is aware of the data virtualisation) and also when SAP HANA is used in conjunction with SAP IQ. This will also be relevant when you want to combine structured and unstructured elements in a single query (SAP HANA has a text pre-processor and also supports machine-generated data). Thus, for most environments, the availability of sufficient memory should not be an issue.
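To show what an optimiser-aware virtualisation layer buys you, the sketch below runs one logical query across two sources and pushes the filter down to each of them, so that only matching rows are shipped back for merging. The `Source` abstraction and the pushdown rule are illustrative assumptions on my part, not SAP’s implementation.

```python
class Source:
    """Toy 'remote' source holding rows as dicts (stand-in for HANA or IQ)."""
    def __init__(self, name, rows):
        self.name, self.rows = name, rows

    def scan(self, predicate):
        # Predicate pushdown: filter at the source so only matching
        # rows cross the (imaginary) network to the coordinator.
        return [row for row in self.rows if predicate(row)]

def federated_query(sources, predicate):
    """Apply the same pushed-down filter to every source and merge."""
    results = []
    for source in sources:
        results.extend(source.scan(predicate))
    return results

hot = Source("hana", [{"id": 1, "amount": 120}, {"id": 2, "amount": 40}])
cold = Source("iq", [{"id": 3, "amount": 300}, {"id": 4, "amount": 15}])
print(federated_query([hot, cold], lambda row: row["amount"] > 100))
# [{'id': 1, 'amount': 120}, {'id': 3, 'amount': 300}]
```

The point of the optimiser being virtualisation-aware is precisely this kind of decision: filtering, and where possible aggregating, at each source rather than dragging raw data into memory first.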

Some other notable features include cache affinity, support for time series (currently limited to time ranges, though this will be extended in the next release), the ability to implement indexes and create aggregates if you want to (you usually won’t), and all the ACID, CRUD, high availability and other capabilities you would expect from an OLTP environment.

To summarise: I still think that SAP over-hypes SAP HANA, and the fact is that it is not the answer to a maiden’s prayer. That said, for the right application and analytic environments it should certainly represent a credible solution.