Teradata Partners Oct 2007 – Part 2.

The 3875 attendees at the Teradata Partners User Group conference in Las Vegas this year were expecting more than just product announcements—and there was some serious DBA-level education going on, especially around the Active Data Warehouse (ADW) concept, supported by the Teradata 12 Enterprise Data Warehouse (EDW) database.

Steve Brobst, Teradata’s CTO, did a good job of demonstrating that Teradata has the technology to cope with a mixed-workload ADW, without resorting to marketing speak and product pitches. Well, not directly: Teradata was the assumed platform, of course, and, as expected, everything was in the context of a centralised data warehouse, fed from virtualised operational data stores and, possibly, accessed through virtualised data marts for specific applications. This layered approach, with the EDW in the middle, should minimise data management and movement overheads by minimising (non-automated, unmanaged) duplication.

The Teradata technologies Brobst covered in a long session on the first day of the conference included:

Dynamic resource allocation, backed up by last-resort “governors” to prevent rogue queries hogging resources if resource allocation goes wrong. Governors are not a good approach by themselves, as they rather wastefully kill queries only after they have consumed excessive resources. Partitioning of resources assigns non-overlapping sets of resources to different classes of workload, each with its own “Service Level Goals”, without replicating data. This depends on the sophistication of TASM (Teradata Active System Management) and on Teradata’s ability to share strategic results with tactical decision-making queries, and it is the sort of thing where the devil is in the detail. No doubt competitors can claim something similar at a high level, but Teradata’s resource management looks particularly effective when you dig down to the details.
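The contrast between a last-resort governor and up-front resource partitioning can be sketched in a few lines of Python. This is a toy model, not Teradata code: the function names, the workload classes and the weights are all assumptions made for illustration.

```python
def run_with_governor(query_cpu_seconds, cpu_limit_seconds):
    """Model a last-resort governor.

    Returns (completed, cpu_consumed). A query that exceeds the limit is
    aborted, but only after it has already burned the full limit.
    """
    if query_cpu_seconds > cpu_limit_seconds:
        return False, cpu_limit_seconds  # killed: this CPU time is wasted
    return True, query_cpu_seconds


def partition_capacity(total_cpu, weights):
    """Split capacity into non-overlapping slices, one per workload class."""
    total_weight = sum(weights.values())
    return {name: total_cpu * w / total_weight for name, w in weights.items()}


# Hypothetical workload classes and weights.
slices = partition_capacity(100, {"tactical": 3, "strategic": 1})
print(slices)                       # {'tactical': 75.0, 'strategic': 25.0}
print(run_with_governor(500, 60))   # (False, 60)
print(run_with_governor(20, 60))    # (True, 20)
```

The aborted query returns nothing yet still costs 60 CPU seconds, which is why a governor works better as a backstop than as the main control.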

Teradata’s query optimiser. In a normalised relational data store (which Teradata is, more or less), performance should come from automatic query optimisation, an area where many relational databases actually perform rather badly (Oracle before version 7/8 is a famous example). Often this doesn’t matter much, because most customers don’t really run particularly demanding queries or transactions, but some do (which is probably why mainframe DB2 still has a market). Teradata has a particularly good optimiser, which can rewrite queries for efficiency and has excellent EXPLAIN facilities to tell you what a query actually does; it is also smart enough to select the appropriate index when it has a choice of local (within partition) and global indexes. Its Dynamic Query Manager can assess all incoming queries against business rules (concerning time of day, expected size of result set, concurrent usage, user id, etc.) and reschedule difficult queries for minimal resource impact. The net result is that in a Teradata installation the hardware can be near 100% utilised with minimal impact on service levels: as with traditional mainframe job schedulers, you can maximise throughput (rather than just performance) for a complex and changing workload. Teradata even has a Workload Analyser tool to help make sure you set this all up effectively.
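As a rough illustration of rule-based query scheduling of the kind described here, the Python sketch below checks an incoming query against rules on user id, concurrency, expected result size and time of day. It is not the Dynamic Query Manager's actual rule language; every name and threshold is a hypothetical stand-in.

```python
from dataclasses import dataclass

# All rule values below are hypothetical illustrations.
TACTICAL_USERS = {"call_centre"}   # user ids that must never wait
BUSINESS_HOURS = range(8, 18)      # 08:00 to 17:59
LARGE_RESULT_ROWS = 1_000_000      # "expensive" result-set estimate
MAX_CONCURRENT = 10                # concurrent-query ceiling


@dataclass
class Query:
    user_id: str
    estimated_rows: int
    submitted_hour: int  # 0-23


def decide(query, running_queries):
    """Return 'run' or 'delay' for an incoming query."""
    if query.user_id in TACTICAL_USERS:
        return "run"                      # user-id rule
    if running_queries >= MAX_CONCURRENT:
        return "delay"                    # concurrency rule
    if (query.estimated_rows > LARGE_RESULT_ROWS
            and query.submitted_hour in BUSINESS_HOURS):
        return "delay"                    # size and time-of-day rule
    return "run"


print(decide(Query("analyst", 5_000_000, 14), running_queries=3))   # delay
print(decide(Query("analyst", 5_000_000, 23), running_queries=3))   # run
print(decide(Query("call_centre", 10, 14), running_queries=12))     # run
```

The decision is made before the query starts, so a delayed query costs nothing until it is released, unlike one killed part-way through.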

Teradata has been able to partition its data by a particular attribute, such as date by week, for some time (this is its PPI, Partitioned Primary Index, feature). This means that if you want to summarise data for a particular week, you only have to read the data for that week, not the other 51 weeks of the year, which significantly increases retrieval efficiency. With this latest release, however, multilevel partitioning is possible (by date and country, say) for even greater retrieval efficiency.
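To make the partition-elimination idea concrete, here is a minimal Python sketch. It is not Teradata code: the two-level key, the `partition_key` helper and the sample rows are illustrative assumptions. Rows are bucketed by ISO week and then by country, and a query that constrains those attributes reads only the matching buckets.

```python
from collections import defaultdict
from datetime import date


def partition_key(sale_date, country):
    """Hypothetical two-level key: (ISO year, ISO week), then country."""
    iso_year, iso_week, _ = sale_date.isocalendar()
    return (iso_year, iso_week), country


def load(rows):
    """Bucket rows by the two-level key, as a partitioned table would."""
    partitions = defaultdict(list)
    for sale_date, country, amount in rows:
        partitions[partition_key(sale_date, country)].append(amount)
    return partitions


def weekly_total(partitions, week, country=None):
    """Sum one week's sales; returns (total, rows_read).

    Only the partition keys are inspected for non-matching partitions;
    their rows are never touched.
    """
    total, rows_read = 0, 0
    for (part_week, part_country), amounts in partitions.items():
        if part_week != week:
            continue  # eliminated at the first (date) level
        if country is not None and part_country != country:
            continue  # eliminated at the second (country) level
        total += sum(amounts)
        rows_read += len(amounts)
    return total, rows_read


rows = [
    (date(2007, 10, 1), "UK", 100),
    (date(2007, 10, 2), "US", 250),
    (date(2007, 10, 3), "UK", 50),
    (date(2007, 10, 9), "UK", 75),   # falls in the following ISO week
]
partitions = load(rows)
print(weekly_total(partitions, (2007, 40)))        # (400, 3)
print(weekly_total(partitions, (2007, 40), "UK"))  # (150, 2)
```

Adding the country constraint cuts the rows read from three to two in this toy data set; on a real table the saving scales with the number of partitions skipped.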

“Cylinder reads” are a particularly interesting innovation. ADW queries tend to retrieve specific data from a few entities, making small blocksizes efficient; whereas traditional data warehouse applications scan huge volumes of data and perhaps read entire tables, which works best with large blocksizes. When appropriate, Teradata can now read an entire cylinder of data at a time from a small blocksize data set, for just about the best of both worlds (although there’s a small amount of system data overhead in the cylinder read).
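The trade-off is easy to see with some back-of-envelope arithmetic, sketched here in Python. The block and cylinder counts are invented for illustration and do not reflect Teradata's actual disk geometry.

```python
import math


def read_requests(total_blocks, blocks_per_read):
    """Number of read requests needed to fetch total_blocks."""
    return math.ceil(total_blocks / blocks_per_read)


# Illustrative figures only.
TABLE_BLOCKS = 10_000       # small blocks holding the whole table
BLOCKS_PER_CYLINDER = 50    # small blocks that fit in one cylinder

# Full-table scan, one small block per request.
print(read_requests(TABLE_BLOCKS, 1))                    # 10000
# Full-table scan, one cylinder per request.
print(read_requests(TABLE_BLOCKS, BLOCKS_PER_CYLINDER))  # 200
# A tactical single-row lookup still needs just one small block.
print(read_requests(1, 1))                               # 1
```

Under these assumed figures the scan issues 50 times fewer requests, while the single-row lookup keeps the benefit of the small block size.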

All that does help convince us that there is serious technological capability behind Teradata’s claims. However, there is usually more than one way to skin a cat, and Teradata may be facing real competition from other technologies that take different approaches to the mixed-workload and near-real-time EDW issues (see Part 1 of this conference report). Even so, there is more to choosing a mission-critical EDW than ticking technical capability checkboxes. As the publicity around ITIL v3 is pointing out to businesses, what the IT department should increasingly be expected to deliver (or, at least, contribute to) is a “successful business outcome”, and here provenance may be important.

Teradata has demonstrated its capability to deliver and support mission-critical EDW applications that are successful at the business level, albeit not particularly cheaply (purely considering cost of technology acquisition), and that may count for a lot with many customers. For instance, the “logical data models” (such as the Healthcare LDM) it sells for particular industry domains (which are customised by its customers and then translated into a physical Teradata implementation) may go a long way towards reducing the gap between what the technologists build and what the business really wants (or, rather, needs). That gap may be where the major technology risk resides and may be what really costs the business money.