dashDB

Content Copyright © 2015 Bloor. All Rights Reserved.
Also posted on: The IM Blog

IBM’s dashDB was released a little over a year ago. At that time it had a capacity of only 20GB: something to play with, something to try out, but not really very useful. At the back end of last year capacity was increased to 1TB: rather more useful but still not exactly exciting in the context of big data. Now, however, IBM has released an MPP (massively parallel processing) version of dashDB that supports up to 20TB, and the company expects to extend this capacity further by the end of the year. This starts to get interesting.

So, what is dashDB? From a business perspective it is a managed data warehouse running in the cloud, as a service. It is particularly notable that it integrates with IBM Cloudant, which is IBM’s cloud-based JSON database. If you want to be able to analyse JSON documents from Cloudant you simply click the “create warehouse” button and the software will automatically shred the relevant documents, determine an appropriate schema and create a dashDB instance. Neat. Once created, the emphasis is on analysing the data using R (an R IDE is included), although there are other options available. Integration with IBM DataWorks is planned to support data preparation.
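To make the idea of "shredding" documents and determining a schema more concrete, here is a minimal sketch in Python. It is not IBM's algorithm, which is not described here; it simply assumes that nested objects are flattened into underscore-joined column names and that each column takes the narrowest SQL type that fits every value seen. The function names (`flatten`, `infer_schema`) and the type names chosen are hypothetical, and arrays are treated as plain text, whereas a real implementation would more likely split them out into child tables.

```python
def flatten(doc, prefix=""):
    """Flatten nested objects into one level, joining keys with underscores."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}_"))
        else:
            row[name] = value
    return row


def sql_type(value):
    """Map a Python value to an illustrative SQL type name (assumed mapping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR"  # strings, arrays and anything else fall back to text


def widen(current, new):
    """Pick a type that can hold values of both types."""
    if current == new:
        return current
    if {current, new} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    return "VARCHAR"


def infer_schema(docs):
    """Return {column_name: sql_type} covering every document in docs."""
    schema = {}
    for doc in docs:
        for column, value in flatten(doc).items():
            if value is None:
                continue  # a null tells us nothing about the column type
            found = sql_type(value)
            schema[column] = widen(schema[column], found) if column in schema else found
    return schema
```

Given two documents in which `customer.age` is an integer in one and a decimal in the other, this sketch would produce a single `customer_age` column of type `DOUBLE`. The point is only that a schema can be derived mechanically from the documents themselves, which is what makes a one-click "create warehouse" feasible.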

From a technical point of view dashDB is a combination of DB2 with BLU acceleration plus the in-database analytics of IBM PureData System for Analytics (Netezza). Both IBM Guardium (for database activity monitoring) and IBM Optim (for data masking) are, or shortly will be, available as part of dashDB to protect sensitive information. The product uses on-disk encryption technology from DB2, and IBM plans to use the capabilities of the recently acquired Aspera for high-speed transfer of data.

What is especially interesting about this is that the SQL used by DB2 and the SQL used by Netezza are not the same, but the team behind dashDB is driving compatibility between the two, using the same techniques that the company has previously used for Oracle compatibility (and dashDB includes Oracle compatibility). Currently, compatibility stands at 84% and the company expects it to be over 90% shortly. If this follows the same path as the Oracle compatibility work then we would expect it to approach 99% in due course.

This has broader implications. The dashDB team is not tied to the release schedules of DB2 or Netezza and, in particular, it is driving BLU code going forward. At some point in the future we can therefore expect that the SQL incompatibility between DB2 and Netezza will more or less disappear. But there are other implications too: for example, DB2 BLU does not currently run on MPP implementations (it is SMP-based only), but the fact that dashDB is now running on MPP systems implies that DB2 cannot be far behind.

Finally, dashDB is a part of IBM’s Cloud Data Services group, along with Cloudant. Also in the group’s portfolio is a recently launched graph database, running on IBM BlueMix (as do both dashDB and Cloudant). At present this graph database is experimental only, so you can try it for free. Once it gets formally released as a product – assuming it will be – I will review this in more detail.