Readers’ assistance required - I need help with use cases for query approximation

Content Copyright © 2013 Bloor. All Rights Reserved.

I am currently researching query approximation. One of the interesting points about business intelligence and analytics (though not discovery or machine learning algorithms) is that they are designed to provide you with accurate answers. Which is fine. But it isn’t always necessary. For example, you may need to know that this particular store sold 5,891 of this particular product last month for sales reporting reasons, but if you are doing trend analysis you don’t really need to know that it was 5,453 the month before and 4,918 the month before that: 4,900, 5,500 and 5,900 will be sufficiently accurate for most purposes.

Now, getting these approximations could save you a little in computing power, given the right functionality, but not much. However, there are a variety of environments where you need to query the data iteratively. For example, suppose that you are the Chancellor of the Exchequer: you want to know the effects of a possible package of tax cuts, new taxes, incentives and so on, and you will continue to iterate through a range of possible scenarios to arrive at a short list of possible candidates to implement. Of course, by their nature, the outcomes you calculate will be approximate, but whatever data warehouse you have won’t know that, so it will run a comprehensive query for each possible combination, each of which will take a prolonged period of time to calculate. So, this becomes a lengthy process.

Unfortunately, businesses don’t have the months that chancellors have to prepare a budget. So, what usually happens is that people resort to manual processes, perhaps assisted by spreadsheets, to run through all the options. However, this isn’t a short process either.

The idea behind query approximation is that you can get a rough answer, together with an appropriate confidence level as to its accuracy, in just a few minutes, rather than waiting for the hours or days that fully accurate queries might take to run. As a result, you can iterate your queries very much faster. In effect, it brings “agile” to query processing.
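
To make the idea of "a rough answer plus a confidence level" concrete, here is a minimal sketch in Python. It assumes the simplest possible technique, a simple random sample with a normal-approximation confidence interval, and is not a description of how any particular product works. The function name `approximate_sum`, the sample sizes and the synthetic sales figures are all made up for illustration.

```python
import math
import random
import statistics


def approximate_sum(values, sample_size, z=1.96, seed=0):
    """Estimate sum(values) from a simple random sample.

    Returns (estimate, margin). With z=1.96, estimate +/- margin is an
    approximate 95% confidence interval, assuming the sample is large
    enough for the normal approximation to hold (an assumption of this
    sketch, not a guarantee).
    """
    n = len(values)
    rng = random.Random(seed)
    # Sample without replacement, so no row is counted twice.
    sample = rng.sample(values, sample_size)
    mean = statistics.fmean(sample)
    stdev = statistics.stdev(sample)
    # Finite population correction: the error shrinks to zero as the
    # sample approaches the whole data set.
    fpc = math.sqrt((n - sample_size) / (n - 1))
    std_error = stdev / math.sqrt(sample_size) * fpc
    # Scale the per-row mean and its error up to the full table.
    return n * mean, z * n * std_error


# Hypothetical data: 200,000 synthetic per-transaction sales quantities.
data_rng = random.Random(1)
sales = [data_rng.randint(0, 200) for _ in range(200_000)]

estimate, margin = approximate_sum(sales, sample_size=5_000)
print(f"approximate total: {estimate:,.0f} +/- {margin:,.0f}")
print(f"exact total:       {sum(sales):,}")
```

The sketch reads only 5,000 of the 200,000 rows, and the margin tells you how far to trust the result. If the interval is too wide for the decision at hand, you can rerun with a larger sample. That trade-off between speed and precision is what makes rapid iteration possible.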

Now, here is where I need help from readers. I would like suggestions for use cases where this sort of approach would be useful. I have one or two already, for example in AdTech, but I would really like some more, so please post any suggestions here. Thanks!