High Availability Overview
Introduction
High availability (HA) means running a network with 99.999% uptime (about 5 minutes of downtime per year) or 99.9999% uptime (about 30 seconds of downtime per year). A number of methods provide HA, including NSF and NSR.
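The downtime figures above follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name and constant are illustrative, not taken from any vendor tool):

```python
# Assumption: a 365-day year; leap years shift the result slightly.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the downtime per year, in seconds, allowed by an availability target."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return SECONDS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for target in (99.999, 99.9999):
        seconds = downtime_seconds_per_year(target)
        print(f"{target}% uptime allows {seconds:.1f} s ({seconds / 60:.2f} min) of downtime per year")
```

Under that assumption, "five nines" works out to roughly 5.26 minutes per year and "six nines" to roughly 31.5 seconds per year, which is where the rounded 5 minute and 30 second figures come from.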
HA
- HA is recovering from a failure by continuing to forward traffic through a device experiencing a fault (for example, an RP failure), instead of around it.
- HA does not change the topology during fault recovery.
- HA increases the redundancy of a single device.
- There are two main components of HA:
  - Non-Stop Forwarding / Graceful Restart (NSF)
  - Non-Stop Routing (NSR)
- HA maintains data plane forwarding while the control plane fails over to a standby.
- HA requires a redundant RP or a redundant chassis (VSS).
- Without HA, the following occurs:
  - The RP fails; the data plane still works.
  - The control plane detects the failure and resets both the control plane and the data plane, which causes an outage.
  - The control plane has to reconverge to route around the failure.
NSF
- NSF requires protocol extensions.
- Neighboring routers need to be NSF-aware to understand the NSF messages.
- With NSF, the following occurs:
  - The RP fails; the data plane still works.
  - The control plane detects the failure but keeps the data plane forwarding, marking the forwarding entries as stale.
  - The NSF-aware neighbor counts down its hold timer until the failed device's RP comes back up and signals with a graceful restart message.
  - The NSF-aware neighbor then updates the recovering device with its routing information.
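The NSF sequence above can be sketched as a toy model. This is only an illustration under stated assumptions, not how any router implements it: the class and function names are hypothetical, lookups are exact-match on the prefix string rather than longest-prefix match, and the final flush of entries the neighbor did not re-advertise is an assumption added to complete the picture.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Route:
    next_hop: str
    stale: bool = False  # set while the control plane is restarting


@dataclass
class ForwardingTable:
    """Hypothetical stand-in for the data plane's forwarding table."""
    routes: Dict[str, Route] = field(default_factory=dict)

    def install(self, prefix: str, next_hop: str) -> None:
        # A freshly installed route is never stale.
        self.routes[prefix] = Route(next_hop)

    def mark_all_stale(self) -> None:
        for route in self.routes.values():
            route.stale = True

    def lookup(self, prefix: str) -> Optional[str]:
        # Stale entries are still used for forwarding.
        route = self.routes.get(prefix)
        return route.next_hop if route else None

    def flush_stale(self) -> None:
        self.routes = {p: r for p, r in self.routes.items() if not r.stale}


def simulate_nsf_restart(
    fib: ForwardingTable,
    neighbor_routes: Dict[str, str],
    traffic: List[str],
) -> List[Optional[str]]:
    """Walk through one RP failure and return the next hops used during the restart."""
    # 1. RP fails: keep the data plane, but mark every entry as stale.
    fib.mark_all_stale()
    # 2. Traffic arriving during the restart is still forwarded on stale entries.
    forwarded = [fib.lookup(prefix) for prefix in traffic]
    # 3. After the graceful restart signal, the NSF-aware neighbor resends its routes.
    for prefix, next_hop in neighbor_routes.items():
        fib.install(prefix, next_hop)
    # 4. Assumption: entries that were not refreshed are removed.
    fib.flush_stale()
    return forwarded
```

For example, a table holding two prefixes keeps forwarding both during the restart; if the neighbor then re-advertises only one of them, the other is removed in step 4.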
NSR
- NSR is an improvement on NSF that works by syncing control plane information between the two RPs. If the active RP fails, the standby can pick up the control plane sessions.
- NSR does not require protocol extensions, only synchronization within the chassis.
- Input packets to the routing process are sent to both RPs, but only one responds.
- Protocols for which Cisco supports NSF include: