Mixed query workloads – why it is a problem

In the previous article in this series I discussed why support for mixed query workloads is an important element within data warehousing. In particular, you need to be able to ensure that short running queries that support real-time operations really do run in a short period of time and that their performance is not impeded by longer running analytics and mining operations. In this article I will focus more on the issues that make it a problem, while in the final article I will consider the sorts of facilities that one might expect a vendor to offer.

The problem with many short running queries is that they are not, in fact, queries at all. They are often simply look-ups. For example, a call centre operative wants to know the total lifetime value for a customer that he or she is talking to (or, actually, the script that is based on that total lifetime value). Now, this is not something that you would normally calculate on the fly, as it is quite a complex calculation: so you would typically work this out for all customers (or for customers whose data has changed) in a weekly (say) batch run and then store this information so that it is available as required. So, what the call centre application has to do is simply to reference this data as it is needed. In other words, it is what, in a transaction processing environment, you would call a read. And this is why general-purpose databases like Oracle, DB2, and SQL Server (along with the vendors that run alongside these, such as Dataupia) have an in-built advantage here, precisely because these have these sorts of transaction processing capabilities intrinsically as a part of their database offering, whilst specialist data warehousing suppliers have had to add facilities to optimise this sort of data access, especially where they do not support indexes.

Let me explain this last point. It is not trivial to find a single record if you do not have an index or its equivalent (such as a column-based database). Thus DATAllegro, for example, specifically supports look-up indexes even though it otherwise eschews indexes. Of course, there is an alternative, which is to store the customer lifetime value in your CRM application and not in the warehouse at all. However, this pre-supposes that that application has that ability.

The other problem with short running queries is simply the fact that other queries may take up too much of the data warehouse resource, resulting in those short running queries not being short at all. There are, in fact, two problems here: the first is that heavy-duty queries of one sort or another might, without appropriate workload management, take over the resources needed for the short running queries. Secondly, the sheer volume of short running queries may create a problem in and of itself. In other words there are simply too many users (who may be actual users or might also be applications or business processes) for the system to handle, in which case you have a scalability issue rather than a mixed query workload issue, which is what applies when it is heavy-duty queries impacting on performance, which will be the subject of the last article in this series.