Latency: an IT Governance story?

Content Copyright © 2010 Bloor. All Rights Reserved.
Also posted on: The Norfolk Punt

To my mind, IT governance is ultimately about having fact-based confidence in achieving desired business outcomes, and these often depend on addressing technical issues. If the desired outcome, for example, is an effective and profitable single dealer platform, which allows a financial institution’s customers to deal with it for a range of trading assets (FX, credit, equities and so on) through a single internet application, one of the issues that must be addressed is latency.

So, what is latency? Well, it's not the same as response time, with which it is sometimes confused—you can have sub-second response times for, say, pricing information and still see a price on your screen that is out of date, because of latency issues.

Broadly speaking, latency is the time taken for changes in data to reach you from wherever the change is made (whereas response time is usually the time taken to get data out of a local store or cache). Response time is to do with the system supplying something (even a “please wait” message); latency is about it supplying the correct information at a point in time. And latency is also about consistency: a consistent 250 msecs delay can be managed; an average 250 msecs delay, with delays occasionally stretching to seconds, is a much bigger problem (and, life being what it is, you can bet that the delayed transaction is an important one).
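To make the point about consistency concrete, here is a tiny illustrative sketch in Python, with entirely made-up numbers: two hypothetical channels with roughly the same average delay of 250 msecs, one consistent and one with occasional multi-second spikes, look very different once you examine the tail of the distribution.

```python
import random
import statistics

random.seed(42)

# Two hypothetical channels with roughly the same average delay (~250 msecs):
# one consistent, one with occasional multi-second spikes.
consistent = [random.gauss(250, 10) for _ in range(10_000)]
spiky = [
    random.gauss(200, 10) if random.random() > 0.02 else random.gauss(2700, 300)
    for _ in range(10_000)
]

for name, samples in (("consistent", consistent), ("spiky", spiky)):
    samples = sorted(samples)
    mean = statistics.mean(samples)
    p99 = samples[int(0.99 * len(samples)) - 1]  # 99th-percentile delay
    print(f"{name:10s} mean = {mean:6.0f} msecs   p99 = {p99:6.0f} msecs")
```

The averages are almost identical; it is the 99th percentile that tells you which channel will produce the occasional badly delayed (and, inevitably, important) transaction.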

There will always be some latency—no message can travel faster than the speed of light (until quantum entanglement can be exploited, that is), which imposes a measurable delay, and every network node a message passes through imposes a rather more significant delay on top of this. However, in a system dealing with human traders, the time taken for a human to make a decision and press a key imposes a sensible limit on how low latency needs to be. This isn’t necessarily the case for “high-speed trading” between computers, but the issues there are similar, just orders of magnitude more difficult to deal with, and I won’t talk about them here.

Imagine a market moves (perhaps someone dumps a large quantity of stock in Hong Kong) at 12:00:00 GMT. Immediately, the price of the security changes, as a result of the deal, in Hong Kong; and financial databases around the world then start to synchronise on the new price. However, this will take some time.

Suppose the news of the new price takes a second and a quarter to reach London—probably unacceptable, but not unthinkable. Now, what happens if a financial institution’s client in London purchases the security at 12:00:01 GMT, on the basis of the price on his/her screen at that moment, before the 12:00:00 change arrives? What price does he/she pay—the price actually in effect at 12:00:01, or the 11:59:59 price, from before the Hong Kong deal, which is what he/she is seeing in London?
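For clarity, here is the timeline arithmetic from that example, sketched in Python; the 1.25-second propagation delay is simply the hypothetical figure used above.

```python
from datetime import datetime, timedelta

# Illustrative timeline only; all times are GMT, and the 1.25-second
# propagation delay is the hypothetical figure from the example above.
hk_price_change = datetime(2010, 1, 1, 12, 0, 0)
propagation_delay = timedelta(seconds=1.25)
new_price_visible_in_london = hk_price_change + propagation_delay  # 12:00:01.25

client_clicks = datetime(2010, 1, 1, 12, 0, 1)

# The client clicks before the new price has arrived, so the screen still
# shows the pre-deal (11:59:59) price -- a stale quote.
trading_on_stale_price = client_clicks < new_price_visible_in_london
print("Client traded on a stale price:", trading_on_stale_price)  # True
```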

Well, it depends. If the transaction is “request for quote”, the financial institution will recheck the price before making the sale and will usually accept the trade if the price has not moved or has moved in the financial institution’s favour, and reject it if it has moved in the client’s favour. If latency is a problem, the client will be making buying decisions based on out-of-date information and will probably, as a result, experience a lot of rejected trades—resulting in an unhappy customer (even more so if the customer finds out that he/she lost out significantly on accepted trades).

However, many institutions would like to move to a “click to trade” system, where the trade is made immediately, without further checks, and the financial institution honours the price its client saw on his/her screen, because this places fewer barriers in the way of trading and should increase business. Now, the financial institution is much more exposed to latency, as it has to complete the deal even if the price has moved against it and, with a big securities deal, the difference in price might be many thousands of dollars. So, the financial institution will monitor latency and will probably grey out the “click to trade” option if latency increases beyond, say, about 400 msecs, which is of the order of magnitude of the brain’s response time. Now we have unhappy customers again and, if the “click to trade” option is greyed out often enough, the financial institution also, in effect, loses a potential sales channel.
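To spell out those decision rules, here is a small, purely illustrative sketch in Python; the function names and the 400 msecs threshold handling are my own shorthand for the behaviour described above, not any institution’s (or vendor’s) actual logic.

```python
LATENCY_THRESHOLD_MS = 400  # roughly the brain's response time, as above


def rfq_decision(quoted_price: float, current_price: float, client_buying: bool) -> str:
    """Request-for-quote: the institution rechecks the price before dealing."""
    if current_price == quoted_price:
        return "accept"
    price_moved_in_clients_favour = (
        current_price > quoted_price if client_buying else current_price < quoted_price
    )
    # Accept if the move favours the institution, reject if it favours the client.
    return "reject" if price_moved_in_clients_favour else "accept"


def click_to_trade_enabled(measured_latency_ms: float) -> bool:
    """Click-to-trade honours the on-screen price, so it is greyed out
    when measured latency rises beyond the threshold."""
    return measured_latency_ms <= LATENCY_THRESHOLD_MS


print(rfq_decision(quoted_price=100.0, current_price=99.5, client_buying=True))   # accept
print(rfq_decision(quoted_price=100.0, current_price=100.5, client_buying=True))  # reject
print(click_to_trade_enabled(250))  # True  -- trading allowed
print(click_to_trade_enabled(750))  # False -- button greyed out
```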

So, why is latency a problem now, when it should be rather smaller than it used to be when communication was by telegraph and telephone? Well, Internet communications have increased customer expectations (although Internet latency is far from deterministic) and the importance of electronic channels; and strengthening regulation (MiFID, in retail equities trading, is an example of the way regulators are thinking) is increasing transparency. Badly managed latency does not increase customer confidence, is becoming harder to hide and, in extreme cases, could attract the attention of the regulators.

As I said, some latency is unavoidable but it is easiest to manage if it is consistent, across channels, across time and as workload increases. And, the smaller the absolute latency is, the less any remaining inconsistencies will matter. The worst case is a channel with significantly worse latency than alternative channels carrying the same information (leading to possible arbitrage opportunities), where the actual latency experienced by any particular transaction varies widely and increases unpredictably as the channel approaches capacity.

Managing response time is relatively straightforward (chiefly, avoiding design bottlenecks and providing plenty of cache). Latency is much harder to manage, quite apart from the speed of light limitation. Latency can be introduced by network nodes (switches, routers and so on), Internet routing problems, overloaded databases, security technology (firewalls) and even manual authentication processes. As long as it is deterministic and known, it can be allowed for, but if random messages experience very high latency the business service becomes unmanageable.

What this all means is that latency must be explicitly addressed in the design of any computerised system where it might be a factor. It can’t simply be left to chance—buying fast computers and fast networks is no guarantee that latency issues won’t arise when the system is overloaded or hardware fails—or even with certain combinations of transactions.

One useful approach is to adopt an inherently low-latency framework, which manages communications latency for you and provides an API on each side of its channel. So all your programmers have to worry about is the value-add your processes deliver to your customers (and latency outside the communications channel), not about the skilled and specialist job of designing low-latency infrastructure.
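The division of labour is roughly as follows; this is a hypothetical interface sketch in Python, not any real product’s API. Application code sees only publish and subscribe calls, while the framework owns the transport and its latency characteristics.

```python
# Hypothetical interface sketch only: it illustrates the division of labour
# described above, not any real framework's API.
from typing import Callable, Protocol


class LowLatencyChannel(Protocol):
    def publish(self, subject: str, fields: dict) -> None:
        """Push an update into the channel; the framework handles streaming
        transport, batching and reconnection, so the publisher doesn't."""
        ...

    def subscribe(self, subject: str, on_update: Callable[[dict], None]) -> None:
        """Register a callback for updates; delivery latency is the
        framework's problem, not the application's."""
        ...


def price_tick_handler(fields: dict) -> None:
    # Application-level value-add lives here -- e.g. updating a price grid.
    print("new price:", fields)


def wire_up(channel: LowLatencyChannel) -> None:
    channel.subscribe("/FX/GBPUSD", price_tick_handler)
    channel.publish("/FX/GBPUSD", {"bid": 1.5701, "ask": 1.5703})
```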

An example of such a framework comes with Caplin’s Xaqua, an Internet-based financial hub for single dealer platforms, used for exchanging trade messages and market data with subscribers; it is network agnostic and tunnels through proxy servers and firewalls automatically. Caplin Liberator is the component of Caplin Xaqua that provides two-way, low-latency web streaming of financial data.

Caplin supplies benchmarking tools suited to streaming applications—the platform’s inherently low and consistent latency comes from its streaming design, although its absolute performance depends, of course, on the power of the hardware it’s running on. Caplin Benchtools can create multiple concurrent client sessions subscribed to real-time updating objects—the same load conditions that real end-users experience when connecting to Liberator over the Internet—and can test the sort of persistent HTTP streaming connections that most HTTP load-testing tools have difficulty with.

Depending on the workload characteristics, end-to-end latency ranges from about 50 msecs to about 250 msecs on sensible hardware. However, what is interesting is that Caplin claims to show that latency on its platform remains just about constant as client numbers increase, up to the point where capacity is reached (at which point latency goes through the roof), and that the actual latency experienced clusters around a single value. The distribution of latency, and the way it changes with load, is more important (within reason) than the actual values, and achieving these characteristics is non-trivial. Hence the advisability of using a tried-and-tested framework, and the need for tools that can simulate the workloads you are likely to encounter.
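As an illustration of why the distribution matters, here is a toy simulation in Python—not Caplin Benchtools, and not a model of any real product: latency stays clustered around a single value until the client count approaches capacity, at which point queueing delay takes over.

```python
import random
import statistics

random.seed(1)


def simulated_latency_ms(clients: int, capacity: int = 1000) -> float:
    """Toy model: latency stays roughly flat until the client count nears
    capacity, then queueing delay blows up. Purely illustrative."""
    base = random.gauss(120, 15)                     # flat region of the curve
    utilisation = min(clients / capacity, 0.999)
    queueing = 20 * utilisation / (1 - utilisation)  # grows sharply near capacity
    return max(base + queueing, 1.0)


for clients in (100, 500, 900, 990):
    samples = sorted(simulated_latency_ms(clients) for _ in range(5_000))
    median = statistics.median(samples)
    p99 = samples[int(0.99 * len(samples)) - 1]
    print(f"{clients:4d} clients: median = {median:7.0f} msecs   p99 = {p99:7.0f} msecs")
```

The interesting output is not the absolute numbers (which are invented) but the shape: a tight cluster around the median at moderate load, and an explosion in both median and tail as the simulated system approaches capacity.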

So, what we have here is a situation where a technology solution to managing latency can be seen as part of “good IT governance”, and has to deliver compliance with internal policies for “acceptable latency”. However, failing to implement good governance in this area isn’t about failing to check a box, or to meet some industry good-practice standard, or even about the possibility of annoying some regulators. It’s about having increasingly unhappy customers and being unable to implement an innovative channel to market that could deliver more customers and higher profits.

IT governance can be seen, in part, as giving the business confidence, at the business level, that its technology can support its strategies and vision; in this case, for Internet trading.