
Approximately 80 minutes of extreme latency [ServePath Data Centre]

by John Keagy, President ServePath, 6 February 2003
ServePath experienced approximately 80 minutes of extreme latency early
this morning as a result of a hardware failure in our border router. The
GigabitEthernet adapter on the router failed, resulting in the loss of
one of our connections. Traffic was temporarily re-routed until this equipment
was replaced, causing further intermittent latency that is now resolved.
ServePath is growing quickly and I'd like to fill you in on our current
efforts in building out our network infrastructure to improve reliability
and scalability. We have had a series of network incidents recently and
we have taken action to defend against them in the future. We are fortunate
to be located in Silicon Valley where some of the world's most elite networking
talents and Internet pioneers can provide on-site management of our network.
The incidents began on January 24 with the Slammer/Sapphire worm. Our
staff contained the initial outbreak of this SQL worm quickly, especially
when compared with many larger networks that experienced significant
downtime.
Unfortunately, the Sapphire worm is still alive and well on the Internet,
and we've had recurring cases of customers improperly deploying Microsoft
SQL Server on their equipment and creating a massive packet storm within
minutes. We've also had inbound and outbound SYN floods and distributed
denial-of-service attacks. Here is our five-step plan to improve our
network and ensure reliability in the future:
- New "sniffer" tools have been deployed to identify compromised
  machines and disable them. These tools also help us to identify incoming
  attacks and block them at our upstream IP transit providers and
  peers before they even get to our border router.
- We are increasing the granularity of our internal network to the
  maximum extent possible to isolate LAN broadcast domains so that packet
  storms from one domain do not interfere with others. This project
  will be complete by February 14.
- We are installing additional routers and switches to improve redundancy
  and further mitigate the risk of hardware failures like the one that
  we experienced this morning. This hardware implementation will be
  complete by February 19. We do maintain spares on-site for all equipment
  in addition to a 4-hour on-site replacement contract with Cisco.
- We are wiring all of our switches to a second internal network so
  that we can still manage them if a packet storm has saturated the
  switch uplinks. Sapphire set new records for propagation speed, so our
  previous MRTG-based toolsets are being replaced. This will be complete
  February 12.
- The Cisco Internetwork Operating System (IOS) on our router is
  being upgraded tonight. Although we have found our current version
  to be perfectly stable, we want to implement several security and
  performance features that are only available in the new version.
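
To illustrate the first step of the plan, here is a minimal sketch of how a monitoring tool might flag machines that look compromised by Sapphire. It is not ServePath's actual tooling. The flow record format, the packet threshold, and every name in it are hypothetical; the only external fact it relies on is that Sapphire spreads by sending UDP packets to port 1434.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple

# UDP port of the SQL Server Resolution Service, which Slammer/Sapphire targets.
SLAMMER_UDP_PORT = 1434


class Flow(NamedTuple):
    """One observed flow record (hypothetical format, not a real tool's output)."""
    src_ip: str
    protocol: str  # "udp" or "tcp"
    dst_port: int
    packets: int


def suspected_hosts(flows: Iterable[Flow], threshold: int = 1000) -> List[str]:
    """Return source IPs whose total UDP/1434 packet count reaches the threshold."""
    counts = Counter()
    for flow in flows:
        if flow.protocol == "udp" and flow.dst_port == SLAMMER_UDP_PORT:
            counts[flow.src_ip] += flow.packets
    return sorted(ip for ip, total in counts.items() if total >= threshold)


# Made-up sample data for illustration only.
sample = [
    Flow("10.0.0.5", "udp", 1434, 900),
    Flow("10.0.0.5", "udp", 1434, 400),
    Flow("10.0.0.7", "udp", 53, 5000),
    Flow("10.0.0.9", "udp", 1434, 12),
]
print(suspected_hosts(sample))
```

A per-source packet count is the simplest possible signal. The threshold of 1000 is an arbitrary assumption and would need tuning against normal traffic on a real network.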
Please note that this upgrade of the Cisco IOS will result in 5 minutes
of downtime at 6 AM GMT on February 7, 2003 (10 PM PST on February 6).
Thank you for your patience during this process. Over the next 8 days
there will be a few scheduled network outages, each lasting no more than
a few minutes. We are working hard to prevent any further unscheduled
outages. Deploying and testing new systems may cause isolated issues,
although roll-back provisions will be in place and will be implemented
immediately if problems arise.
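
As a quick check on the maintenance window, the sketch below converts the announced GMT start time to Pacific Standard Time. It assumes a fixed UTC-8 offset, which holds in February because daylight saving is not in effect; the function name is illustrative only. Note that 6 AM GMT on February 7 falls on the evening of February 6 in PST.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset for Pacific Standard Time (no daylight saving in February).
PST = timezone(timedelta(hours=-8), name="PST")


def to_pst(start_utc: datetime) -> datetime:
    """Convert a timezone-aware UTC (GMT) time to Pacific Standard Time."""
    return start_utc.astimezone(PST)


# Announced start of the IOS upgrade: 6 AM GMT on February 7, 2003.
start_gmt = datetime(2003, 2, 7, 6, 0, tzinfo=timezone.utc)
start_pst = to_pst(start_gmt)
print(start_pst.isoformat())  # 2003-02-06T22:00:00-08:00
```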
We appreciate your business and thank you for your participation in our
successful growth.