
Data Warehousing


A data warehouse is a database implementation that supports the storage and analysis of historic data, either for the purpose of exploring what has happened in the past in order to understand that past, or as the basis for predicting what may happen in the future.

Whereas a transaction processing database is essentially a write-many, read-many environment, a data warehouse is a write-once, read-many environment. There is therefore a particular emphasis on read performance. Data warehouses also differ from transactional databases in that they typically store much larger amounts of data, so scalability is a much bigger issue.

In practice, there are different types of data warehouse implementation. An Enterprise Data Warehouse has historically been seen as the central repository of all relevant data required for analysis purposes; a data warehouse (with or without the tag “enterprise”) supports operational queries (short look-up queries from such things as call centres) as well as business intelligence and analytics; and a Data Mart is used for analysis within a particular domain or department. The widespread deployment of data marts has historically meant that the control implied by the use of an Enterprise Data Warehouse has been difficult to attain.

Data warehouses and marts support the use of business intelligence, analytic, statistical and reporting tools that are used either to examine what has happened in the past or (increasingly) to predict what is going to happen in the future. Often, a data warehouse will be designed to do both of these. In addition, an Enterprise Data Warehouse (but not a Data Mart) may support call centre and similar operatives who need to look up customer information on a regular basis. This range of requirements puts an onus on the database to include features that will allow all of these different functions to operate in an efficient manner.
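To make the contrast between these workloads concrete, the following is a minimal sketch, assuming a single hypothetical `orders` table held in an in-memory SQLite database (SQLite is used purely as a stand-in; it is not a data warehousing product). The table, column and function names are illustrative assumptions. It shows a short, selective look-up of the kind a call centre operative might issue alongside an aggregate query of the kind a reporting tool might issue against the same data.

```python
import sqlite3


def build_warehouse():
    """Create a tiny in-memory stand-in for a warehouse table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "region TEXT, order_year INTEGER, amount REAL)"
    )
    # An index on customer_id keeps the operational look-up selective.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    rows = [
        (1, 101, "North", 2011, 250.0),
        (2, 102, "South", 2011, 100.0),
        (3, 101, "North", 2012, 300.0),
        (4, 103, "South", 2012, 150.0),
        (5, 102, "South", 2012, 50.0),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def customer_lookup(conn, customer_id):
    """Operational query: fetch the order history of one customer."""
    cur = conn.execute(
        "SELECT order_id, order_year, amount FROM orders "
        "WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),
    )
    return cur.fetchall()


def revenue_by_region_and_year(conn):
    """Analytic query: scan and aggregate the whole table."""
    cur = conn.execute(
        "SELECT region, order_year, SUM(amount) FROM orders "
        "GROUP BY region, order_year ORDER BY region, order_year"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_warehouse()
    print(customer_lookup(conn, 101))
    print(revenue_by_region_and_year(conn))
```

The look-up touches a handful of rows through an index, whereas the aggregate reads every row; a database serving both has to handle each access pattern efficiently.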

Because of the complexity involved, some suppliers in this space do not target the whole range of data warehousing requirements but specialise in particular subsets thereof. Notably, some vendors focus on data marts only and, more specifically, on supporting complex analytics and statistics that go beyond normal run-of-the-mill business intelligence environments.

Data warehousing should interest anyone who wants to understand what has been happening in order to inform future strategy or to predict future trends or actions, as well as those interested in detecting and/or preventing nefarious activity of various kinds. Relevant managers and C-level executives in any of the following cross-industry areas (amongst many others) should be interested:

  • Customer acquisition and retention
  • Customer up-sell and cross-sell
  • Supply chain optimisation
  • Fraud detection and prevention
  • Telco network analysis
  • Marketing optimisation

Apart from Big Data, the major trend in the marketplace is away from the concept of a single Enterprise Data Warehouse that stores a ‘golden copy of the truth’ and is surrounded by data marts that are linked back to it. While this is fine in theory, it is now increasingly recognised as impractical, at least in organisations of any size. The concept of the “logical data warehouse” has therefore arisen. This is essentially an Enterprise Data Warehouse where the data is not in a single place but is distributed across multiple (heterogeneous) systems, including both data warehouses and data marts.

Enabling the logical data warehouse requires an understanding of where data is located throughout the environment and the ability to query across data repositories. Data virtualisation provides this capability. In addition, some data may need to be replicated across databases, and where that is the case there will also be a requirement to synchronise it. Ideally, one would also like a tool that identifies the best place to store particular data elements. There is currently no single suite of tools that will enable a logical data warehouse. It is likely that we will see support for homogeneous logical data warehouses before heterogeneous support.
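The idea of querying across repositories can be sketched in miniature. The example below is an illustration under loud assumptions, not how any data virtualisation product works: two separate in-memory SQLite databases stand in for a warehouse and a data mart, and a hypothetical `CATALOGUE` dictionary records which repository holds which table. A real logical data warehouse would span heterogeneous systems, which SQLite's `ATTACH DATABASE` does not address.

```python
import sqlite3

# Hypothetical catalogue recording where each table lives.
# "main" plays the warehouse; "mart" plays a separate data mart.
CATALOGUE = {"customers": "main", "sales": "mart"}


def build_environment():
    """Create two separate in-memory databases reachable from one connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS mart")
    conn.execute(
        "CREATE TABLE main.customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute("CREATE TABLE mart.sales (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    conn.executemany(
        "INSERT INTO mart.sales VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)]
    )
    conn.commit()
    return conn


def qualified(table):
    """Resolve a logical table name to its physical location via the catalogue."""
    return f"{CATALOGUE[table]}.{table}"


def sales_per_customer(conn):
    """Join data held in the warehouse with data held in the mart."""
    sql = (
        f"SELECT c.name, SUM(s.amount) FROM {qualified('customers')} AS c "
        f"JOIN {qualified('sales')} AS s ON s.customer_id = c.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(sales_per_customer(build_environment()))
```

The caller refers only to logical table names; the catalogue supplies the location, which is the "understanding of where data is located" that the text describes.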

The second significant trend within the data warehousing market is the use of in-memory and flash or solid state disks to support the functioning of the underlying database. While such facilities will improve performance they do not generally impact on the capabilities of the solution per se.

Over the last couple of years there has been a significant consolidation of the market: IBM acquired Netezza, Microsoft acquired DATAllegro, HP dropped NeoView and bought Vertica, Teradata is now the owner of Aster Data, and EMC acquired Greenplum (now Pivotal). It remains to be seen how successful HP and Pivotal are in this space, given that neither of them has a proven track record of selling software.

IBM is distinguishing between its traditional offerings and Netezza by marketing the former for operational analytics (that is, as a traditional data warehouse) and the latter for analytics (that is, as a data mart). Meanwhile, Oracle has released Exadata X3 and Oracle 12c is now available.

Almost all vendors now offer massively parallel solutions and those that don’t (notably Infobright and Actian) are on the verge of announcing relevant products. One vendor, illuminate, appears to have gone out of business. Amazon has recently announced its cloud-based warehousing offering, which is based on technology from ParAccel (now acquired by Actian). This acquisition must raise major doubts over ParAccel's future, given that Vectorwise (the Actian data warehousing platform) and ParAccel were previously competitors.

Needless to say, all the vendors have jumped onto the Big Data bandwagon in one way or another and while big data solutions may be separate from data warehousing they will also often be complementary. Thus the ability to integrate closely with relevant big data solutions will be important.


  • Attunity
  • biGENiUS
  • EXASOL
  • IBM
  • Infoworks
  • Kx Systems
  • Magnitude Software
  • TimeXtender
  • WhereScape

These organisations are also known to offer solutions:

  • 1010Data
  • Amazon
  • BIReady
  • Calpont
  • Dataupia
  • Infobright
  • Kalido
  • Kognitio
  • Microsoft
  • Oracle
  • Pivotal
  • Sand Technologies
  • Starburst
  • Teradata
  • Tokutek
  • VectorNova
  • XtremeData

Trivadis biGENiUS

Trivadis started to develop what is now biGENiUS in 2005, consolidating its efforts in the data warehousing automation space, which resulted in the launch of biGENiUS in 2018.

TimeXtender Discovery Hub

This paper discusses TimeXtender Discovery Hub, an automated, centralised data management platform for Microsoft environments.

What’s Hot in Data

In this paper, we have identified the potential significance of a wide range of data-based technologies that impact on the move to a data-driven environment.

SQL Engines on Hadoop

There are many SQL on Hadoop engines, but they are suited to different use cases: this report considers which engines are best for which sets of requirements.

Managing Data Lakes

This paper discusses why data lakes need to be managed and the sorts of capabilities that are required to manage them.

TCO for Business Intelligence

This paper initially started as an investigation into comparative pricing for BI solutions and then evolved into a consideration of TCO for such environments.

IBM DB2 with BLU Acceleration on Power Systems: how it compares

We consider the relative merits of IBM DB2 with BLU Acceleration running on Power Systems as compared to HANA, Exadata and SQL Server on x86-based platforms.

SAP HANA update

An update on SAP's in-memory database

In-memory? That’s so yesterday!

DB2 BLU doesn't just use in-memory techniques but also L3 cache

Kognitio: clarifying misunderstandings

There are aspects of Kognitio and its offering that are sometimes misunderstood, so I thought I should clear some things up.

IBM PureData System for Operational Analytics vs Oracle Exadata X3

The basic theme of this paper is to provide a comprehensive comparison of IBM's and Oracle's offerings for large-scale traditional data warehousing environments.

Analytics in Telecommunications

In this paper we will describe some of the key analytic areas that are important to telecommunications companies and discuss the argument in favour of using third party analytic application providers.