In-memory? That’s so yesterday!

Not for the first time, IBM has been underselling itself. I refer specifically to DB2 with BLU Acceleration. Given this second sentence you might be surprised at the first: IBM has been heavily marketing BLU. However, it’s true.

Let me go back. One of the slides that IBM prepared for its BLU launch showed how a query processing 10TB of data could actually fit into 8MB of memory. This is how it works: assuming a 10x compression ratio, that 10TB of raw data is only 1TB on disk. Now suppose that the table you are processing has 200 columns and you only want to read 2 of them: that’s a reduction of two orders of magnitude, to 10GB. Use the newly introduced data skipping technology and that’s a further reduction to a tenth of that, 1GB. Next, bear in mind that you have parallel processing across the cores, so each core handles only its share of the data: the slide gives 32MB, which implies 32 cores. Finally, you have vector processing, which means a further reduction by a factor of four, to 8MB. There the slide stops.
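For anyone who wants to check the arithmetic, here is a minimal sketch of the slide’s chain of reductions in Python. The function and parameter names are mine, for illustration only; the core count of 32 is inferred from the 1GB to 32MB step and is not stated on the slide; and I have used decimal units (1TB = 1,000,000MB), so the last two steps come out a shade under the slide’s rounded 32MB and 8MB.

```python
# Rough reconstruction of the slide's arithmetic. All figures are the
# slide's assumptions as described above; cores=32 is inferred, not stated.
MB_PER_TB = 1_000_000  # decimal units, an assumption made for simplicity


def blu_reduction_steps(raw_tb=10, compression=10, total_columns=200,
                        columns_read=2, skipping_factor=10, cores=32,
                        vector_factor=4):
    """Return a list of (step, size in MB) pairs for each reduction."""
    size = raw_tb * MB_PER_TB
    steps = [("raw data", size)]

    size = size / compression                    # 10x compression
    steps.append(("after compression", size))

    size = size * columns_read / total_columns   # read 2 of 200 columns
    steps.append(("after column selection", size))

    size = size / skipping_factor                # data skipping
    steps.append(("after data skipping", size))

    size = size / cores                          # share handled per core
    steps.append(("per core", size))

    size = size / vector_factor                  # vector processing
    steps.append(("after vector processing", size))
    return steps


if __name__ == "__main__":
    for step, size_mb in blu_reduction_steps():
        print(f"{step:<26}{size_mb:>16,.2f} MB")
```

Change `columns_read` to 20 and the final figure rises tenfold, which is why the 8MB should be read as a best case.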

Of course, this is all highly theoretical: you might want 20 columns and not 2 but, on the other hand, there are not many queries that access 10TB of data in the first place.

But the interesting thing, and where IBM has been under-selling itself, is that it doesn’t tell you any more about that 8MB. A rarely mentioned feature of BLU is that it uses processor cache wherever possible, and L3 cache in particular. As it happens, L3 cache typically starts at 8MB (up to about 24MB currently) so, in this particular example, and if it were the only query running (which it won’t be), the whole query could fit in L3 cache. What’s important about that is that L3 cache is around an order of magnitude faster than RAM.

And that’s why this article has the heading it does. Consider that IBM is a major manufacturer of processing chips with its Power series. Given that DB2 with BLU Acceleration can exploit processor cache, wouldn’t you expect IBM to be developing its next generation of chips with a great deal more L3 cache? I am speculating here and I haven’t discussed this with IBM at all, but it makes sense to me, though there may be hardware constraints that I am not aware of.

But, if I’m right, where does that leave HANA and other such (purely) in-memory technologies? If processor cache can be significantly expanded then the drive must be towards exploiting this as much as possible, and in-memory processing will be limited to non-urgent, less important queries. And if they are neither urgent nor important, then why not leave them on disk? Will in-memory processing soon become passé?