Database Management Systems
Analyst Coverage: Philip Howard
A database management system (DBMS) stores data and provides facilities for managing that data. Modern (relational and later) database technologies are based on a separation between the logical representation of the data and its physical instantiation so that one can be changed without affecting the other. It is often (not always) the case, however, that the physical storage model mirrors the logical representation.
A major feature of the management aspect of many databases is that they support declarative access to the data: that is, they allow programs to state what data they want to access without defining how that is to be achieved. This is primarily supported through the use of SQL (structured query language) although there are other declarative languages. In order to support best performance when accessing data via SQL, many database management systems include an optimiser, which can re-write poorly written or generated code and can determine how best to execute any query.
There are a number of different types of DBMS that have evolved over many years (and some comparatively recently) to handle different types of data or to provide alternative ways of handling the same data. A DBMS differs from a file system mainly in that it offers data-analysis-based structures that aid data usage and facilities for replicating, reorganising and recovering data; as well as facilities for tuning data access efficiency.
Large organisations typically have many databases installed, of various types, which may be of greater or lesser complexity and sophistication. For most of the past 30 years the relational database has been the standard database model adopted by most enterprises, although a rump of vendors offering alternative approaches have clung on to some level of market share. The last decade, however, has seen the emergence of new approaches, first in data warehousing and, more recently, for transaction processing.
Historically, SQL has been the de facto standard for use in conjunction with database technology even where the technology involved was not relational (for which SQL was designed). However, a challenge to this approach has now appeared and many programmers, who dislike the constraints of SQL, have embraced this development with glee. The introduction of these new types of “NoSQL” database represents both opportunities and challenges for organisations adopting them. Used judiciously they can support new types of application as well as types of data that are not well-suited to a relational model. There are also significant potential cost savings and efficiency benefits. However, there are a plethora of options and selecting appropriate solutions, both for today and tomorrow, poses a significant challenge.
In addition to the direct implications for enterprises there are also important considerations for software houses and ISVs (independent software vendors) since these new database models allow the development of new solutions. As far as end users are concerned there is typically little concern about the underlying database for an application provided that it requires virtually no administration – that it is hidden from the user – so ISVs have less concern about needing to base their solutions around any of the leading merchant databases. A particular area of interest for ISVs is for what are known as “breakthrough” applications that are hybrid operational/analytic applications that are typically based on machine-generated, instrumented or sensor-based data.
DBMS is central to most business operations
- Programmers should care – because the DBMS delivers useful (reusable) management and integrity functionality they don’t have to write. Having said this, it is commonly the case that developers do not make use of these capabilities as much, perhaps, as they should. it is also worth noting that the availability of a database optimiser is not an excuse for writing poorly coded SQL.
- Business people should care – firstly because it is the database that stores the information that enables the operation of the business on an on-going basis and, secondly, because the DBMS delivers integrity and reliability – and managed rollback after errors. Indeed, all leading database vendors now offer high availability (protection aginst unplanned outages) and many offer continuous availability (protection against both planned and unplanned outages).
- The CEO should care because managed data is a core asset of the organisation and for the same reasons that business people in general should care.
The database management market is a particularly exciting place at present and it can be broken into three broad categories: transaction processing, data warehousing, and big data. Each of these is discussed in more detail on the relevent pages.
The major vendors are divided as to the best approach to database software. Oracle, for example, promulgates a single shared disk solution (currently Exadata X3) for both transaction processing and data warehousing environments; while IBM, by way of contrast, offers different versions of DB2 for these environments, using a shared disk architecture for transaction processing and a shared nothing architecture for data warehousing.
In addition to this debate, rifts have appeared within both transaction processing and data warehousing environments as to which is the best approach. For example, within transaction processing there are a number of so-called NewSQL vendors that now compete with traditional technologies and there are even NoSQL databases that support transactional processing without using SQL at all; plus (of course) NotOnly SQL databases support a mix of query languages, including SQL.
Within the data warehousing community there are similar questions about columnar versus row-based relational tables; the rise of in-memory databases, the use of flash or solid-state disks (which also applies within transaction processing), clustered versus no-clustered solutions and so on.
And then, of course, there is “big data”. To be clear, big data does not necessarily mean lots of data. What it really refers to is the ability to process any type of data: what is typically (but erroneously) referred to as semi-structured and unstructured data as well as structured data. Current thinking is that these will typically live alongside conventional solutions as separate technologies, at least in large organisations, but this will not always be the case.
There is considerable acquisition activity in the DBMS space, after years of comparative inactivity, as major vendors such as IBM and Oracle buy up innovative technologies (and their customers) to add to their current stable and effective, but somewhat old fashioned, offerings. Most of these acquisitions have been limited to the data warehousing space and it has seen new entrants into the market, notably EMC (acquiring Greenplum: now Pivotal). Most recently Actian (previously Ingres) has acquired Versant, Pervasive Software and ParAccel.
In terms of existing vendors the most important recent changes have been the introduction by Oracle of Exadata X3, the release by IBM of its PureData family of products and the ongoing development of SAP HANA. Also important is the support for a graph store in IBM DB2, which means that the product now supports three different storage engines: relational, XML, and the graph store. Teradata has done something similar with its polymorphic storage for Aster Data. We believe that these are important developments that represent the future for database management systems: historically it was one storage model and one management layer, increasingly it will be multiple storage models but still with one management layer.