Transaction Processing DBMSs
Analyst Coverage: Philip Howard
Transaction processing means dividing information processing up into individual, indivisible operations, called transactions, that complete or fail as a whole; a transaction can’t remain in an intermediate, incomplete, state (so other processes can’t access the transaction’s data until either the transaction has completed or it has been “rolled back” after failure). Transaction processing is designed to maintain database integrity (the consistency of related data items) in a known, consistent state.
A transaction, a typical example of which would be a customer order, consists of a series of events (accepting the order, allocating stock and so forth) that are treated as a whole. While theoretically transaction processing is a subset of event processing the integrity considerations described above mean that it is treated as a separate function.
Transaction processing databases are databases that have been designed specifically to optimise the performance of transaction processing, which is often referred to as OLTP (online transaction processing).
Relational database management systems (DBMSs) were originally designed to support transaction processing. An example of a transaction would be processing a customer order and all the ancillary requirements that go along with that, such as decrementing inventory, arranging shipping, placing manufacturing orders and so on. Transactions may also occur in other environments, for example between an organisation and its staff (HR applications), in marketing, production control and elsewhere. Typically, but not always, these types of applications have been implemented on top of relational databases.
However, relational DBMSs are not the only vehicles for transaction processing for historical reasons as well as (sometimes) for valid technical reasons. In addition to transactional applications based on purely relational data, relational databases have subsumed XML and object-oriented capabilities over the years so that hybrid transactional environments that require the use of these technologies alongside relational data also fall into this category though pure-play XML and object-oriented databases remain part of the market for specialist applications.
In addition, there are still many companies that rely on older, hierarchical DBMSs for OLTP (on-line transaction processing) partly because of the difficulties involved with migrating away from those platforms and also partly because these types of databases are very efficient for this type of workload. There are also large numbers of companies that rely on multi-valued (sometimes called NF2) databases for applications that have typically been provided as a package by a third party.
Apart from performance and other generalised considerations the major requirements for transaction processing databases are so-called ACID properties and, typically, support for XA-compliant two-phase commit (where an XA compliant driver is a driver that can participate in an XA compliant transaction as defined by the X/Open specification). The former are properties that ensure that transactions are processed reliably and the latter specifically supports atomicity (the A in ACID), ensuring that all transactions are either completed in full or rolled back. Note that there are a variety of XML, NoSQL and Graph databases that are both ACID and XA compliant.
The short answer is everybody. Every organisation deals with at least some of customers (or clients, patients and so on), partners, suppliers, staff, stock and so forth and transactional databases are the bedrock upon which applications supporting these (often commercial) relationships are based.
However, in recent years, some low-quality web commerce applications have emerged which do not employ transaction processing – which means that if anything goes wrong, some past copy of the database has to be restored, commercial transactions may be lost, payments may be processed twice and remediation often relies on the good-will of customers contacted manually. Obviously, this is not really “fit for purpose” and to avoid this you really need, at least, a database that supports ACID and 2-phase commit (plus, of course, good application design).
In essence, transaction processing is about not having to say “sorry”….
In addition to the historical alternatives to the traditional relational model, this approach is now being challenged by so-called NewSQL database vendors, selling products that use the latest technologies to streamline their databases by removing support for legacy, and now unneeded, processing requirements. The argument these companies make, and it is a valid one, is that merchant relational databases have been evolving for over 30 years and they have acquired a lot of baggage over that period, primarily because of the need to keep compatibility with previous versions of that database. This means that the database code has (probably) millions of lines of code that are no longer necessary for today’s processing environments. For example, the provision of row-level locking was a major requirement for relational databases back in the 90s because of the contention that table locking caused but with modern technology that is no longer such an issue and the code needed to support row-level locking is nowadays redundant. In other words, long-established databases are bigger, less wieldy, more complex, more likely to break and more expensive than they need to be. NewSQL databases, which all support SQL as a processing language but which do not necessarily store data in a relational manner (though it will look as if they do to the user), are now being actively developed and marketed as an answer to these issues.
A further major trend is towards the use of in-memory databases and solid-state devices. Caching has been with us as a technique for many years but greater memory capacities are making this more important. While in-memory databases are more widely applicable in data warehousing environments the use of distributed in-memory data grids is a technique that may be used to speed up transaction processing. A further consideration is the use of flash or solid state disks (SSDs). However, these remain relatively expensive and are subject to write wear (they slow down over time as you write to them). If used, SSDs should be attached via a PCI port as otherwise they will be slowed down by the disk controller.
Apart from the introduction of NewSQL databases the most notable recent change has been the introduction by IBM of its pureScale and now PureData solution for transaction processing. Historically, DB2 (upon which these are based) has had a shared nothing architecture on Linux, UNIX and Windows platforms. This is great for data warehousing but less than ideal for transaction processing, for which a clustered shared disk architecture is preferable, and that is what IBM introduced with pureScale.
A major change has been the acquisition of Sybase by SAP. SAP has stated that Sybase ASE will, in due course, run SAP applications. Once that is the case it is likely that there will be significant marketing put behind ASE which should reverse the product’s previous decline in popularity. This may also adversely impact on IBM, Oracle and Microsoft.
Another significant change is the acquisition of everyone’s favourite Open Source database, MySQL, by Oracle; without, apparently, compromising its Open Source credentials. A word of warning here: MySQL has been used for non-transactional commercial systems (characterised by making people retype completed interactions after a crash; duplicate delivery and so on). MySQL has a version that does support transaction processing effectively but it also sells versions of the DBMS (effectively, just “smart filesysytems”) that don’t.
Finally, while NoSQL and OLTP have not traditonally been associated with one another (except for some older products that pre-date the NoSQL buzz) that has recently changed with the intorduction of FoundationDB, a NoSQL database with both ACID and XA compliance. It will be interesting to see of others follow its lead into this market.
The following companies offer solutions:
Further resources to broaden your knowledge:
What’s Hot in Data
In this paper, we have identified the potential significance of a wide range of data-based technologies that impact on the move to a data-driven environment.
Database performance management - the importance of time-based analysis
This paper discusses the various issues that surround database performance monitoring and how we believe that they should be addressed