Hello You Web Services


Approximately 80 minutes of extreme latency
[ServePath Data Centre]

by John Keagy, President ServePath, 6 February 2003

ServePath experienced approximately 80 minutes of extreme latency early this morning as a result of a hardware failure in our border router. The GigabitEthernet adapter on the router failed, resulting in the loss of one of our connections. Traffic was temporarily re-routed until this equipment was replaced, causing further intermittent latency that is now resolved.

ServePath is growing quickly, and I'd like to fill you in on our current efforts to build out our network infrastructure for better reliability and scalability. We have had a series of network incidents recently, and we have taken action to defend against them in the future. We are fortunate to be located in Silicon Valley, where some of the world's most elite networking talents and Internet pioneers can provide on-site management of our network.

The incidents began January 24 with the Slammer/Sapphire virus. The initial outbreak of this SQL virus was contained quickly by our staff, especially when compared to many larger networks, which experienced significant downtime.

Unfortunately, the Sapphire virus is still alive and well on the Internet, and we've had recurring cases of customers improperly deploying Microsoft SQL Server on their equipment, becoming infected, and creating a massive packet storm within minutes. We've also had inbound and outbound SYN floods and distributed denial-of-service attacks.

Here is our five-step plan to improve our network and ensure reliability in the future:

  1. New "sniffer" tools have been deployed to identify compromised machines and disable them. These tools also help us to identify incoming attacks and block them from our upstream IP transit providers and peers before they even get to our border router.

  2. We are increasing the granularity of our internal network to the maximum extent possible to isolate LAN broadcast domains so that packet storms from one domain do not interfere with others. This project will be complete by February 14.

  3. We are installing additional routers and switches to improve redundancy and further mitigate the risk of hardware failures like the one that we experienced this morning. This hardware implementation will be complete by February 19. We do maintain spares on-site for all equipment in addition to a 4-hour on-site replacement contract with Cisco.

  4. We are wiring all of our switches to a second internal network so that we can still manage them if a packet storm has saturated the switch uplinks. Sapphire has set new records for speed, so our previous MRTG-based toolsets are being replaced. This will be complete by February 12.

  5. The Cisco Internetwork Operating System (IOS) on our router is being upgraded tonight. Although we have found our current version to be perfectly stable, we want to implement several security and performance features that are only available in the new version.
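
As a rough illustration of the kind of check described in step 1, the sketch below flags hosts that send an unusually high rate of UDP packets to port 1434, the SQL Server Resolution Service port that Slammer/Sapphire targets. It is not ServePath's actual tooling: the `PacketSample` record, the `find_storm_sources` function, and the threshold value are all hypothetical, and a real sniffer would read from live traffic or flow exports rather than a Python list.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class PacketSample(NamedTuple):
    """Hypothetical summary record: packets seen from one source in the window."""
    src_ip: str
    protocol: str  # "udp" or "tcp"
    dst_port: int
    packets: int


# Slammer/Sapphire spreads via UDP packets to the SQL Server Resolution Service.
SLAMMER_PORT = 1434


def find_storm_sources(
    samples: Iterable[PacketSample],
    window_seconds: float,
    max_pps: float,
) -> List[str]:
    """Return source IPs whose UDP/1434 packet rate exceeds max_pps.

    The threshold is an assumption to be tuned per network; it is not a
    value taken from the announcement above.
    """
    totals: Counter = Counter()
    for sample in samples:
        if sample.protocol == "udp" and sample.dst_port == SLAMMER_PORT:
            totals[sample.src_ip] += sample.packets
    return sorted(
        ip for ip, count in totals.items() if count / window_seconds > max_pps
    )


if __name__ == "__main__":
    window = [
        PacketSample("10.0.0.5", "udp", 1434, 600_000),  # 10,000 pps over 60 s
        PacketSample("10.0.0.9", "udp", 1434, 120),      # 2 pps, likely benign
        PacketSample("10.0.0.7", "tcp", 80, 900_000),    # busy web server, ignored
    ]
    for ip in find_storm_sources(window, window_seconds=60, max_pps=1000):
        print(f"candidate for isolation: {ip}")
```

A flagged address would still need human review before a machine is disabled, since a rate threshold alone cannot tell an infection from a misconfigured but legitimate application.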

Please note that this upgrade of the Cisco IOS will result in 5 minutes of downtime at 6 AM GMT on February 7, 2003 (10 PM PST on February 6). There will also be a few scheduled network outages, each lasting no more than a few minutes, over the next 8 days. We are working hard to prevent any further unplanned outages. Deploying and testing new systems may cause isolated issues, but roll-back provisions will be in place and will be implemented immediately if needed. Thank you for your patience during this process.
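
For readers converting the maintenance window to their own time zone, here is a small sketch of the GMT-to-PST arithmetic. It assumes PST is a fixed UTC-8 offset, which holds in February because daylight saving time is not in effect; the helper name `to_pst` is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Standard Time as a fixed offset of UTC-8 (no DST in February).
PST = timezone(timedelta(hours=-8), "PST")


def to_pst(moment: datetime) -> datetime:
    """Convert a timezone-aware datetime to Pacific Standard Time."""
    return moment.astimezone(PST)


# Start of the announced IOS upgrade window: 6 AM GMT on February 7, 2003.
start_gmt = datetime(2003, 2, 7, 6, 0, tzinfo=timezone.utc)
start_pst = to_pst(start_gmt)

print(start_pst.strftime("%Y-%m-%d %H:%M %Z"))  # 2003-02-06 22:00 PST
```

The conversion crosses midnight, so the window falls on the evening of February 6 for customers on Pacific time.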

We appreciate your business and thank you for your participation in our successful growth.

helloyou web services by Clearing Systems Inc.