Mixed query workloads – making it work


In the first two articles in this series I highlighted the importance of being able to manage mixed query workloads and why it is an issue. In this article I will focus on the techniques that you might use to support this sort of functionality.

The first, and perhaps most obvious, technique is to provide support for prioritisation. At the lowest level this may simply involve prioritisation of short-running queries (as provided by Netezza, DATAllegro et al), though there will have to be some way of specifying what a “short-running query” actually is. Ideally, you would like to be able to say that, for example, a short-running query is any query that runs, or potentially runs, in less than 3 seconds, and the database should be able to recognise any queries that potentially fall into that category and then treat them appropriately.
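To make this concrete, here is a minimal Python sketch of short-query boosting of this kind. The three-second threshold, the WorkloadQueue class and the idea that an estimated runtime is handed in alongside the query are all assumptions made for the example; in a real product the optimiser's cost model would supply that estimate and the engine itself would act on it.

```python
import heapq
import itertools

SHORT_QUERY_THRESHOLD_SECS = 3.0  # the "short-running" cut-off discussed above

class WorkloadQueue:
    """Toy dispatcher that promotes queries whose estimated runtime
    falls under the short-query threshold."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving FIFO order

    def submit(self, sql: str, estimated_secs: float) -> None:
        # Priority 0 = short-running, 1 = everything else.
        priority = 0 if estimated_secs < SHORT_QUERY_THRESHOLD_SECS else 1
        heapq.heappush(self._heap, (priority, next(self._counter), sql))

    def next_query(self) -> str:
        _, _, sql = heapq.heappop(self._heap)
        return sql

queue = WorkloadQueue()
queue.submit("SELECT region, SUM(amount) FROM sales GROUP BY region", 240.0)
queue.submit("SELECT balance FROM accounts WHERE id = 42", 0.05)
print(queue.next_query())  # the look-up jumps ahead of the long report
```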

More sophisticated tools take prioritisation further and allow a variety of different priorities to be allocated to different queries. However, this has the disadvantage that it requires administrator input, monitoring and tuning to ensure that priorities are continuously optimised. Unless a substantial amount of automation can be applied to this process, this may preclude appliance (and other) vendors, which are trying to keep things as simple (and inexpensive) as possible, from using this approach.
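For what it is worth, the sort of automation I have in mind might look something like the following sketch, in which observed runtimes re-band a workload class automatically so that priorities stay optimised without constant administrator attention. The class names, bands and thresholds are invented for illustration.

```python
from collections import defaultdict

# Hypothetical bands an administrator might configure initially.
PRIORITY_BANDS = {"tactical": 0, "reporting": 1, "batch": 2}

class AdaptivePrioritiser:
    """Sketch of priority automation: observed runtimes feed back into
    each workload class's band, reducing the need for manual re-tuning."""

    def __init__(self):
        self._runtimes = defaultdict(list)

    def record(self, workload_class: str, runtime_secs: float) -> None:
        self._runtimes[workload_class].append(runtime_secs)

    def priority_for(self, workload_class: str) -> int:
        history = self._runtimes[workload_class]
        if not history:
            # No observations yet: fall back to the configured band.
            return PRIORITY_BANDS.get(workload_class, 1)
        average = sum(history) / len(history)
        # Re-band on observed behaviour rather than static configuration.
        if average < 3.0:
            return 0
        if average < 60.0:
            return 1
        return 2
```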

A second approach adopted by some vendors is to control the allocation of resources that any query (or query type) may use. Here it is possible either to impose a ceiling or a floor, so that a particular query cannot use more than a certain allocation (or proportion) of the available resources or, conversely, so that it is guaranteed appropriate resources. Again, however, you really want to automate this as far as possible and, further, you need some sort of correlation between the resource allocation and the prioritisation of queries (whether that is just on a short-running query basis or something more complex) so that the administrative overhead does not become too great. As an example, HP Neoview includes a capability called Adaptive Segmentation that allocates only a limited number of processing nodes to short-running queries, because such queries simply cannot take advantage of massive parallelism: it would be wasted on a look-up query, for instance.
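A rough sketch of how a ceiling-and-floor scheme might be enforced over a pool of execution slots is shown below. The slot counts, class names and the ResourceGovernor itself are invented for the example and do not describe any particular vendor's implementation.

```python
import threading

class ResourceGovernor:
    """Illustrative ceiling/floor governor over a fixed pool of execution
    slots. Ceilings cap what a workload class may take; floors reserve
    slots so that a class is never starved."""

    def __init__(self, total_slots: int, floors: dict, ceilings: dict):
        assert sum(floors.values()) <= total_slots
        self._lock = threading.Lock()
        self._total = total_slots
        self._floors = floors
        self._ceilings = ceilings
        self._in_use = {cls: 0 for cls in floors}

    def try_acquire(self, cls: str) -> bool:
        with self._lock:
            if self._in_use[cls] >= self._ceilings[cls]:
                return False  # this class has hit its ceiling
            used = sum(self._in_use.values())
            # Slots still owed to *other* classes' floors.
            reserved = sum(max(self._floors[c] - self._in_use[c], 0)
                           for c in self._floors if c != cls)
            if used + reserved >= self._total:
                return False  # granting a slot would invade another floor
            self._in_use[cls] += 1
            return True

    def release(self, cls: str) -> None:
        with self._lock:
            self._in_use[cls] -= 1

gov = ResourceGovernor(total_slots=16,
                       floors={"tactical": 4, "batch": 2},
                       ceilings={"tactical": 8, "batch": 12})
if gov.try_acquire("tactical"):
    try:
        pass  # run the query
    finally:
        gov.release("tactical")
```

Note that with these (made-up) numbers the batch class can never occupy more than 12 of the 16 slots, both because of its ceiling and because four slots are always held back for the tactical floor.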

Another possibility, which can be used either separately or together with the previous techniques, is simply to provide capabilities that speed up the processing of short-running (and other) queries, in order to ameliorate the mixed query workload issue in general. In other words, try to take the management aspect of workload management out of the equation, thereby reducing administration overheads. Support for look-up indexes is one example of this. Another would be to offer a range of facilities, as Dataupia does, for example, with multi-dimensional aggregates (with pre-calculation and refresh capabilities), simple look-ups (large tables requiring few joins or large set operations) and complex queries (fully normalised data models), all using different data access methods to support simultaneous queries across a wide range of query types.
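The routing idea can be sketched as follows. The QueryShape attributes and the three access paths are simplifications invented for the example; they are not Dataupia's actual interface.

```python
from dataclasses import dataclass
from enum import Enum, auto

class AccessPath(Enum):
    AGGREGATE = auto()   # pre-calculated multi-dimensional aggregate
    LOOKUP = auto()      # look-up index against a large table
    GENERAL = auto()     # fully normalised model, general-purpose plan

@dataclass
class QueryShape:
    is_aggregate: bool
    matches_precomputed_cube: bool
    joins: int
    has_indexed_predicate: bool

def choose_access_path(query: QueryShape) -> AccessPath:
    """Toy router in the spirit of the Dataupia example: send each query
    down the access method suited to its shape."""
    if query.is_aggregate and query.matches_precomputed_cube:
        return AccessPath.AGGREGATE  # answer from the refreshed aggregate
    if query.joins <= 1 and query.has_indexed_predicate:
        return AccessPath.LOOKUP     # narrow retrieval via look-up index
    return AccessPath.GENERAL        # complex query against base tables

print(choose_access_path(QueryShape(False, False, 0, True)))  # AccessPath.LOOKUP
```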

The bottom line is that suppliers have a careful balancing act to perform between providing necessary facilities on the one hand and imposing an additional administrative burden on the other. My guess would be that the appliance vendors will typically lean towards minimising the latter and will therefore focus on those facilities that are most amenable to automation. The likes of IBM and Teradata, on the other hand, will perhaps offer additional capabilities, but at the expense of a greater administrative burden. Whether you see that as a good thing or a bad thing will depend on your perspective.