Data warehouses and databases, together with business intelligence tools, are designed to produce exact answers to whatever questions you care to ask. This applies even to big data environments such as Hadoop. But do you always need the exact answer to a particular question? For example, if you sell thousands of units of a particular product every day, do you really need to know exactly how many you sold last month? Surely the answer is no: you can quite happily round to the nearest hundred, thousand or ten thousand.
That being the case, why do you have a data warehousing or analytic environment that insists on calculating exact answers for you all the time? The short answer is that this is what vendors offer you. What if (and it is an important if) an approximate answer could be provided in significantly less time, with a lower resource requirement, and still deliver the answers you need? Moreover, what if the resources saved on these types of queries could be freed up for those analytic processes that do require precise answers, thereby improving their performance as well?
This paper examines approximate query processing (AQP) as an approach to particular types of query where approximations are appropriate. AQP has been the subject of considerable research for about the last 15 years, but it has largely been confined to academia, with relatively little appearing in commercial products to date. This paper argues in favour of more AQP capabilities, at least as an option. We will discuss both AQP as a generic set of capabilities and the use cases where it may be especially useful.