Abstractions

Content Copyright © 2010 Bloor. All Rights Reserved.
Also posted on: The Norfolk Punt

I’m an old database hand, and I think that using such technologies effectively is still an aspect of “good IT governance”. I can remember the times before RDBMSs (Relational Database Management Systems); I can remember deciding, some 20 or 30 years ago, that the speed of IBM’s IMS Fast Path database outweighed its lack of flexibility, and that the new Oracle thing had problems with its optimiser. How things have changed: RDBMSs are now almost ubiquitous (in the public consciousness, anyway; IMS is still a current product). But you can still make me lose my rag by saying things like “the RDBMS is all very well, and very flexible, but it is a bit slow”. Speed has simply nothing to do with the RDBMS specification, which is a logical implementation of relational set theory. If an RDBMS is slow, it is because it has been implemented badly; usually because someone assumes that tables have to exist in physical storage, as well as in the logical view (you can make such implementations fast, but it’s a struggle).

There are a few counterexamples to that prejudice: fast RDBMS implementations with complete abstraction between the logical view and the physical storage. InterSystems Caché is a fine example: on disk, the data is a sparse array of bits; logically, it can present itself as a fast RDBMS, or an OODBMS, or a MUMPS database, and so on. Similarly, Sybase IQ looks like an RDBMS accessed with SQL, but underneath it is entirely different (and column-oriented, not row-oriented).

Now, I’ve just been talking to Emma McGrattan (SVP Engineering at Ingres) about VectorWise, which brings this sort of abstraction to the Ingres RDBMS.

Put simply, VectorWise provides a SQL (logical RDBMS) front end to a physically column-oriented data store, with code that takes full advantage of modern x86 chip architectures. The key issue it addresses is that access to main memory, not disk, is the performance bottleneck today, so it is essential to use on-chip cache effectively. What VectorWise means is that BI-type queries run many times faster than they would in a traditional RDBMS implementation optimised for transaction processing. What this means in turn is that cheap commodity multicore hardware can run your BI applications as fast as you hoped when you bought the stuff, without the applications having to be rewritten for the new architectures. This will also go some way towards addressing multicore issues.
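
To make the cache argument concrete, here is a minimal sketch (my own illustration in C; it is not Ingres or VectorWise code, and the table shape, row padding and batch size are all invented) of the difference between dragging whole rows through the cache one tuple at a time and scanning a contiguous column in vector-sized batches that a compiler can also turn into SIMD instructions:

    /* Row-at-a-time versus vectorised column scan: a toy illustration,
     * not VectorWise code. Compile with something like: cc -O2 sketch.c */
    #include <stdio.h>
    #include <stdlib.h>

    #define ROWS   (1 << 20)
    #define VECTOR 1024            /* process values in cache-friendly batches */

    /* Row store: every row drags its unused columns through the cache. */
    struct row { int id; double price; double qty; char pad[104]; };

    int main(void)
    {
        struct row *rows  = malloc(sizeof(struct row) * ROWS);
        double     *price = malloc(sizeof(double) * ROWS);   /* column store */
        for (long i = 0; i < ROWS; i++)
            rows[i].price = price[i] = i % 100;

        /* Tuple-at-a-time: one useful value per ~128-byte row touched. */
        double total_rows = 0;
        for (long i = 0; i < ROWS; i++)
            total_rows += rows[i].price;

        /* Vectorised column scan: contiguous doubles, batch by batch. */
        double total_cols = 0;
        for (long i = 0; i < ROWS; i += VECTOR)
            for (long j = i; j < i + VECTOR; j++)
                total_cols += price[j];

        printf("row-store sum %.0f, column-store sum %.0f\n",
               total_rows, total_cols);
        free(rows);
        free(price);
        return 0;
    }

The two loops compute the same answer; the difference is purely in how much of each cache line fetched from main memory is actually useful, which is the point VectorWise exploits.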

That’s good, obviously, especially as Ingres comes with an enterprise culture and attitude to support (probably from its time as part of CA), as well as with an Open Source culture. There are lots of good things about Open Source software, but if the support isn’t comparable to what you’d get from, say, IBM for DB2, few enterprises will be interested.

What is really interesting to me, however, are the possibilities going forward (and here I must stress that I’m not revealing an Ingres roadmap; these are just my speculations).

For a start, Ingres currently addresses two kinds of processing with a row-oriented store and a column-oriented store. Both look like an RDBMS and are accessed with SQL but, underneath, you replicate row-oriented transactional data into the column-oriented store for BI queries. That’s sensible (and pretty common). But there’s no fundamental reason why you shouldn’t have one store, with the optimiser choosing to hold data in row-oriented or column-oriented form, whichever best suits the current workload and chip architecture. The advantage of this would be that near-real-time decision-support applications could access transactional data directly, without the latency associated with moving data into a different kind of store.
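
Purely to make that speculation concrete, here is a toy sketch in C of a cost rule that picks a physical layout per table from simple workload counters. None of this is Ingres code or an Ingres design; the structure names and the threshold are invented:

    /* A speculative sketch of "one store, two layouts": choose row- or
     * column-oriented storage per table based on observed workload. */
    #include <stdio.h>

    enum layout { ROW_ORIENTED, COLUMN_ORIENTED };

    struct workload_stats {
        long point_lookups;   /* OLTP-style: fetch whole rows by key */
        long column_scans;    /* BI-style: aggregate a few columns   */
    };

    /* Favour rows when whole-row access dominates, columns when scans do. */
    static enum layout choose_layout(struct workload_stats s)
    {
        return (s.column_scans > 4 * s.point_lookups)
                   ? COLUMN_ORIENTED : ROW_ORIENTED;
    }

    int main(void)
    {
        struct workload_stats orders  = { 90000, 500 };     /* lookup-heavy */
        struct workload_stats history = { 200,   70000 };   /* scan-heavy   */

        printf("orders  -> %s\n", choose_layout(orders)  == ROW_ORIENTED
                                      ? "row-oriented" : "column-oriented");
        printf("history -> %s\n", choose_layout(history) == ROW_ORIENTED
                                      ? "row-oriented" : "column-oriented");
        return 0;
    }

A real optimiser would use much richer statistics than this, but the decision itself can live entirely below the SQL abstraction, which is the point.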

Then again, this technology could revolutionise programming tools. At the moment, most tools produce horrid SQL queries, written by programmers who don’t understand query design anyway, and the Ingres optimiser refactors these queries into sensible ones that run efficiently on modern chips. That’s good. But suppose (we are in the Open Source world) someone took the Ingres optimiser and embedded it in a programmer’s IDE, so that queries were optimised at the start, and feedback from the optimiser could educate programmers in query design? Might that be more efficient overall?
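
By way of illustration (the tables, columns and queries here are my own invention, not Ingres output or any particular tool’s), this is the sort of row-by-row query a code generator tends to emit, next to the set-oriented rewrite an optimiser, whether in the engine or in the IDE, would rather run; the sketch simply holds them as strings, the way generated application code would:

    /* A "horrid" generated query versus its set-oriented rewrite; both
     * are invented examples, shown as the string literals a code
     * generator would embed in application code. */
    #include <stdio.h>

    /* As generated: a correlated subquery conceptually run once per order row. */
    static const char *generated_sql =
        "SELECT o.order_id,\n"
        "       (SELECT c.name FROM customers c WHERE c.id = o.customer_id)\n"
        "  FROM orders o\n"
        " WHERE o.order_date >= '2010-01-01';";

    /* As refactored: one outer join the engine can scan and vectorise. */
    static const char *refactored_sql =
        "SELECT o.order_id, c.name\n"
        "  FROM orders o LEFT JOIN customers c ON c.id = o.customer_id\n"
        " WHERE o.order_date >= '2010-01-01';";

    int main(void)
    {
        printf("-- as the tool wrote it --\n%s\n\n", generated_sql);
        printf("-- as it would be better expressed --\n%s\n", refactored_sql);
        return 0;
    }

An optimiser embedded in the IDE could show exactly this kind of before-and-after, which is where the educational value would come from.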

And, of course, VectorWise isn’t limited to the Intel x86 architecture (Ingres is already looking at AMD and reports similar results). If Ingres ever gets around to implementing VectorWise for other platforms (mainframes and so on), you could simply take an application optimised for x86 and recompile it with VectorWise for any other platform. And then we get back to the ideal of database portability that SQL promised, before Oracle got its hands on it in the interest of “efficiency”. The difference with the way VectorWise optimises for particular hardware platforms is that, unlike proprietary SQL extensions, the hardware optimisation is entirely transparent to programmers. Abstraction is a wonderful thing, if it is done completely and absolutely, because it isolates business automation choices from technology changes and vendor marketing decisions.