An emerging network management headache: analysing ever-faster networks

Written By: Peter Williams
Content Copyright © 2008 Bloor. All Rights Reserved.

Talking about
protocol analysers is not really my beat—except that they do provide useful information
for troubleshooting faults to support network management (which is my beat). However,
a major issue concerning network speeds and real-time analysers should be receiving
more management attention.

In
mission-critical environments, the ideal analysers are those which can capture
all the data all the time by working in real-time; then, if a problem occurs,
the network specialists can trawl for exactly what passed along the line at the
point of problem. Otherwise, they could be faced with trying to figure out,
then recreate, the problem in order to analyse what happens, a very hit-and-miss
affair.

However (in case
anyone hadn’t noticed), the more data that has to travel over networks the
faster the networks have had to become to cope. So, in the case of Ethernet, we
have gone from 10Mb/sec to 100Mb/sec to 1Gb/sec to 10Gb/sec (1,000 times as fast
as 10Mb/sec) in a few short years.

The problem for
these ‘real-time’ protocol analysers, of which there are only a few, is partly that they have had to keep up.
This has, for instance, meant upgrading from software-only to purpose-built
plug-in hardware appliances. Yet, even if they do keep up, the amount of
information they collect in a very short time is multiplied, leaving the
network specialist with a tougher task trying to see the wood for the trees to
pinpoint the problem.
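
To put rough numbers on that multiplication, here is a minimal Python sketch of how much data a capture-everything analyser could accumulate in an hour at each Ethernet speed mentioned above. It is an upper bound only: it assumes one direction of traffic at full utilisation and ignores framing overhead, and the function names (`capture_bytes`, `human`) are my own, not taken from any product.

```python
# Line rates for the Ethernet generations named in the article, in bits/sec.
LINE_RATES_BPS = {
    "10Mb": 10_000_000,
    "100Mb": 100_000_000,
    "1Gb": 1_000_000_000,
    "10Gb": 10_000_000_000,
}


def capture_bytes(rate_bps: int, seconds: int, utilisation: float = 1.0) -> float:
    """Upper bound on bytes captured in one direction over `seconds`.

    Assumption: the link runs at `utilisation` (0.0 to 1.0) of its nominal
    rate and every bit on the wire is stored.
    """
    return rate_bps / 8 * seconds * utilisation


def human(n_bytes: float) -> str:
    """Format a byte count using decimal (power-of-1000) units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1000:
            return f"{n_bytes:.2f} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:.2f} PB"


if __name__ == "__main__":
    for name, rate in LINE_RATES_BPS.items():
        volume = human(capture_bytes(rate, 3600))
        print(f"{name:>5}: up to {volume} per hour")
```

Real links rarely run flat out, so the `utilisation` argument can be lowered to model a more typical load; the thousandfold ratio between 10Mb and 10Gb stays the same either way.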

The market-leading
protocol analyser is called Sniffer (nowadays owned by Netscout after its
recent purchase of Network General). Yet, even in its hardware-software
appliance format, Sniffer cannot yet cope with 10Gb Ethernet in real-time. The
product nearest to achieving this at present is from Network Instruments. To do
this, its Observer software is supported by a dedicated capture card designed
from scratch for throughput, and by its GigaStor disk technology, which writes
the data in parallel to daisy-chained SATA RAID arrays so that storage keeps
pace with the speed at which the data is received.
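
The general idea of writing in parallel can be illustrated with a short Python sketch. This is not how GigaStor works internally, which I have not seen documented; it simply shows why several arrays are needed once the line rate exceeds what one array can absorb. The per-array write rate of 200 MB/sec used in the example is an arbitrary illustrative figure, and `arrays_needed` and `stripe` are hypothetical names.

```python
import math


def arrays_needed(line_rate_bps: int, per_array_write_bytes_per_sec: float) -> int:
    """Minimum number of arrays needed to absorb traffic at line rate.

    Assumption: each array sustains the given write rate and the load is
    spread perfectly evenly across them.
    """
    ingest_bytes_per_sec = line_rate_bps / 8
    return math.ceil(ingest_bytes_per_sec / per_array_write_bytes_per_sec)


def stripe(chunks, n_arrays: int):
    """Assign captured chunks to arrays round-robin, one lane per array."""
    lanes = [[] for _ in range(n_arrays)]
    for index, chunk in enumerate(chunks):
        lanes[index % n_arrays].append(chunk)
    return lanes


if __name__ == "__main__":
    assumed_write_rate = 200e6  # bytes/sec per array: illustrative only
    for name, rate in (("1Gb", 1_000_000_000), ("10Gb", 10_000_000_000)):
        print(name, "needs", arrays_needed(rate, assumed_write_rate), "array(s)")
    print(stripe(["c0", "c1", "c2", "c3", "c4"], 2))
```

Round-robin striping is the simplest possible scheme; a real appliance would also have to preserve ordering and timestamps so that the capture can be reassembled for analysis.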

Of this
combination, Ian Cummins, Network Instruments’ VP of EMEA, told me: “It is
completely happy with 100Mb and 1Gb Ethernet, but 10Gb is a challenge.”

However, the
challenge he is referring to is not that it cannot keep up with the speed of
data flow; it is probably the only real-time product on the market which can, Sniffer
notwithstanding. Rather, the challenge is that, as the speed of the network has
multiplied, so the analysis has become more complex.

All such analysers
use retrospective network analysis (RNA) software to trawl through the captured
information to identify possible problem points. But suppose, at 100Mb, the
analysis finds two potential causes of a glitch in a given time-span; at 1Gb
this may multiply to 10 and at 10Gb perhaps 30, all needing deeper
investigation. In other words, the faster the network the more difficult it is
to pinpoint the problem when a fault occurs—and the fastest networks are typically
those that run the most mission-critical tasks.

The longer-term
management issue is that networks will inevitably get even faster and, even if
the appliances can be upgraded to keep up, the complexity in pinpointing the
problem will only get worse and resolving problems will tend to take longer,
when they need to take less time!

Nor is this the
end of the story. This difficulty is multiplied when Voice over IP (VoIP) and
data traffic are mixed, since the software has to be able to separate out the
two different streams and, even after that is done, a fault caused by one may
manifest as a problem in the other. Network Instruments’ own annual user
survey on networking found that the proportion of respondents using VoIP had
grown by five percentage points in a year (from 61% to 66%).

A further factor
is that VoIP traffic is not verified to the same degree as other data, so
adding VoIP tends to introduce more rogue packets, potentially multiplying the
error count and making pinpointing problems even tougher.

So what’s to be
done? Cummins explained that Network Instruments is now working hard on the
analysis for 10Gb. The main approach Network Instruments is taking is to
provide an overview of the potential problem sources in less technical form,
then provide a drill-down capability to get into the fine detail for each.
Parameters can also be set that will screen out acceptable “errors” to assist
in seeing the wood for the trees.
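
As a rough illustration of that overview-then-drill-down pattern, here is a minimal Python sketch. It is my own simplification, not Network Instruments' implementation: the `Event` record, the category names and the per-category tolerance settings are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since the start of the capture
    category: str     # hypothetical label, e.g. "retransmission"
    detail: str


def screen(events, acceptable_per_category):
    """Group events by category and drop categories within tolerance.

    A category is kept only if its count exceeds the configured
    acceptable count (default 0, i.e. nothing is tolerated).
    """
    by_category = defaultdict(list)
    for event in events:
        by_category[event.category].append(event)
    return {
        category: found
        for category, found in by_category.items()
        if len(found) > acceptable_per_category.get(category, 0)
    }


def overview(screened):
    """Summary view: (category, count) pairs, most frequent first."""
    pairs = ((category, len(found)) for category, found in screened.items())
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


def drill_down(screened, category):
    """Detail view: every event in one category, in time order."""
    return sorted(screened.get(category, []), key=lambda e: e.timestamp)


if __name__ == "__main__":
    events = [
        Event(2.0, "retransmission", "flow A"),
        Event(1.0, "retransmission", "flow B"),
        Event(3.0, "retransmission", "flow A"),
        Event(1.5, "crc_error", "port 7"),
        Event(0.5, "late_voip_packet", "call 12"),
    ]
    kept = screen(events, {"late_voip_packet": 1})
    print(overview(kept))
    print(drill_down(kept, "retransmission"))
```

The point of the tolerance settings is that the specialist starts from a short ranked list rather than the raw capture, and only opens up the categories that look suspicious.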

This is a sound
approach which other protocol analyser providers would do well to follow. Yet I
doubt that, come the next hike in network speeds, this will be enough.