NoSQL and NewSQL

NoSQL started as an abbreviation for Not SQL and has since morphed into Not Only SQL because vendors have realised that, actually, people rather like SQL and that there are a lot of people with SQL experience who are less costly to employ, and more easily recruited, than Java or other programmers.

Actually, the focus on not having SQL was a mistake. It was not that SQL didn’t perform well, it was that the architecture of relational products is not well suited to certain types of task. But it is all too easy to fall into the trap of equating SQL with relational technology when that doesn’t have to be the case. The whole point of a relational architecture is that the physical instantiation of the database is separate from its logical model, even though the vast majority of relational products have relational storage as well as a relational access layer. That this doesn’t have to be the case is illustrated by DB2, which has different storage engines for relational and XML-based data.
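To make the separation of access layer and storage concrete, here is a minimal Python sketch. It is purely illustrative and not how DB2 or any other product works: the names RowStore, KeyValueStore and select are hypothetical, and the "query" is a simple projection and filter rather than real SQL. The point is only that the same logical query can run unchanged over two different physical layouts.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol

Row = Dict[str, object]


class StorageEngine(Protocol):
    """Anything that can hand back rows, however it stores them."""

    def scan(self) -> Iterator[Row]: ...


class RowStore:
    """Hypothetical engine: rows held in a plain list."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)

    def scan(self) -> Iterator[Row]:
        return iter(self._rows)


class KeyValueStore:
    """Hypothetical engine: rows held in a dict keyed on one column."""

    def __init__(self, rows: Iterable[Row], key: str) -> None:
        self._data = {row[key]: row for row in rows}

    def scan(self) -> Iterator[Row]:
        return iter(self._data.values())


def select(
    engine: StorageEngine,
    columns: List[str],
    where: Callable[[Row], bool] = lambda row: True,
) -> List[Row]:
    """Logical access layer: project and filter, ignorant of the storage."""
    return [{c: row[c] for c in columns} for row in engine.scan() if where(row)]


rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
from_rows = select(RowStore(rows), ["name"], lambda r: r["id"] == 2)
from_kv = select(KeyValueStore(rows, "id"), ["name"], lambda r: r["id"] == 2)
```

Both calls return the same result even though the data is laid out differently underneath, which is the sense in which the logical and physical layers are independent.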

Anyway, the NoSQL vendors have realised their mistake and added SQL capability. However, what I really want to discuss is so-called NewSQL databases. These are databases that have been designed from the outset to have a SQL interface (rather than having one added on later) but which do not necessarily have a relational storage engine underneath.

The first of these NewSQL databases to appear was VoltDB, yet another brainchild of Michael Stonebraker. This is essentially a standard relational database with all the unnecessary gubbins that has accumulated over 40 years of relational development thrown out, so that it is a much leaner and meaner machine than its traditional counterparts. As a result it performs better and has a much smaller footprint than the merchant databases. However, this is true for all the NewSQL databases.

Another NewSQL database is Xeround. This is a cloud-based offering with elastic scaling and NoSQL roots. Like all the other NewSQL databases it is focused on transaction processing. Apart from its inherent capabilities, the other major differentiator for Xeround is that it looks like MySQL, which makes it easy for existing MySQL users to port into the cloud.

There are two more NewSQL databases that are particularly interesting: NuoDB (previously NimbusDB) and JustOneDB. Neither has yet reached general release: NuoDB is just starting its beta program and JustOneDB will do so shortly.

NuoDB uses a distributed object architecture (as do many NoSQL databases) in a peer-to-peer environment that the company likens to BitTorrent. When you update a record you append the changes to the existing data rather than replacing it, so you always have a historic view of your database. The architecture involves the use of transaction nodes and archive nodes, where the former are held in memory and the latter use key/value (NoSQL) storage for holding the data. With multiple archive nodes able to hold the same data, there is no requirement to take back-ups, no need to replicate data for high availability purposes and no need for partitioning. This will sound very like a standard NoSQL database. The big difference is that it has been designed specifically to support SQL and is fully ACID compliant.
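The append-rather-than-replace idea can be shown with a small Python sketch. This is an illustration of the general technique only, not NuoDB’s implementation: the AppendOnlyStore class, its put and get methods, the as_of parameter and the simple integer transaction counter are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    txn_id: int  # hypothetical, monotonically increasing transaction number
    value: Any


class AppendOnlyStore:
    """Toy store that never overwrites: each update appends a new version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}
        self._next_txn = 1

    def put(self, key: str, value: Any) -> int:
        """Append a new version for key and return its transaction number."""
        txn_id = self._next_txn
        self._next_txn += 1
        self._versions.setdefault(key, []).append(Version(txn_id, value))
        return txn_id

    def get(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the latest value, or the value as of an earlier transaction."""
        for version in reversed(self._versions.get(key, [])):
            if as_of is None or version.txn_id <= as_of:
                return version.value
        return None


store = AppendOnlyStore()
first = store.put("balance", 100)
second = store.put("balance", 250)
current = store.get("balance")
historic = store.get("balance", as_of=first)
```

Because the earlier version is still there, a read as of the first transaction sees the old value while an ordinary read sees the new one. A real system would also need durability, concurrency control and some way of reclaiming space, none of which is attempted here.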

JustOneDB is something else again. It too is fully ACID compliant (as is the Oracle NoSQL Database) and is designed to support SQL. In its case, the environment looks just like PostgreSQL and it runs in the cloud on Heroku. However, it uses a completely different storage architecture that the company refers to as tunnel storage. I am not at liberty to explain what tunnel storage means (I can say that it is neither column-based nor key/value based) but I have gone down into the weeds of the technology and it does make sense. Among its features, the architecture is implicitly aware of joins and requires neither indexes nor partitioning. One interesting point is that it could potentially support query processing (for example, you never need to do full table scans) as well as OLTP, though it is on the latter that the company is focusing. Like NuoDB, JustOneDB always appends data and never deletes what was there originally. However, it differs from its competitors by focusing on scaling up rather than scaling out, at least for the present: that is, on how you scale on a single server rather than across servers. The company’s view (a not unreasonable one) is that ideally you want to scale out as late as possible, and that’s why it is focusing on scaling up. Internal runs of the TPC-H benchmark suggest performance improvements over standard PostgreSQL of orders of magnitude (100X+).

NoSQL is interesting if you are keen on MongoDB, Cassandra or Hadoop but these are outside the realms of most people’s experience. NewSQL databases, on the other hand, live right where most people have spent most of their time and, to that extent, they are a lot more interesting and are certainly worth checking out.