High Avail Non Stop Forwarding BGP

Introduction

Non-Stop Forwarding (NSF) is a High Availability technology that makes a device more resilient to failures. It requires redundant Route Processors (RPs) or Supervisors.

NSF

  • NSF maintains the data plane forwarding during a control plane failure.
  • The control plane failure in question is typically a Route Processor (RP) failure.
  • A main goal of NSF is to keep an RP restart from affecting every BGP router. Only BGP peers that have the Graceful Restart capability react to the restart.
  • For BGP, this mechanism is also called Graceful Restart.
  • The default behavior without NSF:
    • The active RP crashes. Once the neighbor detects the control plane failure, it stops forwarding data toward the failed router.
    • The standby RP takes over the sessions and the BGP adjacency is restarted. Data plane forwarding is reset.
    • During this convergence, packets are lost.
  • Behavior with NSF:
    • When a router restarts, it opens a new TCP session using the same router ID (RID).
    • The peer interprets this as a restart, closes the original session, and marks all paths from that peer as stale.
    • The restarting router sends its BGP table. When it is done, it signals End-of-RIB with an Update message that carries an empty withdrawn-routes field.
    • The peer exits read-only mode, performs the best path calculation, and updates CEF (the data plane).
  • Two new features introduced with BGP NSF:
    • End-of-RIB Marker - after a restart, the End-of-RIB marker signals that initial convergence is done, so BGP can exit read-only mode and run the best path algorithm. Cisco also uses a BGP Keepalive to signal the same thing, but not all vendors implement that; the End-of-RIB marker provides better interoperability between vendors.
    • Graceful Restart capability - a capability exchanged during initial BGP session establishment. It indicates that the peer intends to use End-of-RIB along with other NSF functionality. It includes:
      • Restart State - indicates that a router is restarting.
      • Restart Timer - determines how long peer routers wait for a BGP Open message before deleting stale routes. The default value is 120 seconds. This timer should be less than the BGP Holdtime.
      • Forwarding State - indicates whether a BGP peer can maintain data forwarding in the event of a failure. Some platforms can act only as NSF clients: they keep forwarding data during a neighbor's failure, but cannot preserve forwarding across their own restart.
  • During a restart, the FIB information should be marked as stale.
  • Enabling NSF requires a BGP session restart, because the capability is exchanged only during session establishment.
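The capability fields and the End-of-RIB marker described above can be sketched at the byte level. The Python below is a minimal illustration based on the encoding in RFC 4724 (capability code 64, a 4-bit flags field whose top bit is Restart State, a 12-bit restart time, and one Forwarding State bit per address family) and on the minimum-length BGP Update from RFC 4271. It does not reflect IOS internals, and the function names are hypothetical, chosen only for this example.

```python
import struct

GR_CAPABILITY_CODE = 64       # Graceful Restart capability code (RFC 4724)
BGP_UPDATE = 2                # BGP message type for Update (RFC 4271)
RESTART_STATE_BIT = 0x8       # "R" bit: top bit of the 4-bit Restart Flags
FORWARDING_STATE_BIT = 0x80   # "F" bit: top bit of the per-family flags octet


def encode_gr_capability(restart_time, restarting, families):
    """Build the capability TLV.

    families is a list of (afi, safi, forwarding_preserved) tuples.
    """
    if not 0 <= restart_time <= 0x0FFF:
        raise ValueError("restart time must fit in 12 bits (0-4095 seconds)")
    flags = RESTART_STATE_BIT if restarting else 0
    value = struct.pack("!H", (flags << 12) | restart_time)
    for afi, safi, preserved in families:
        family_flags = FORWARDING_STATE_BIT if preserved else 0
        value += struct.pack("!HBB", afi, safi, family_flags)
    return struct.pack("!BB", GR_CAPABILITY_CODE, len(value)) + value


def decode_gr_capability(data):
    """Parse a capability TLV produced by encode_gr_capability."""
    code, length = struct.unpack("!BB", data[:2])
    if code != GR_CAPABILITY_CODE:
        raise ValueError("not a Graceful Restart capability")
    value = data[2:2 + length]
    (word,) = struct.unpack("!H", value[:2])
    families = []
    for offset in range(2, len(value), 4):
        afi, safi, family_flags = struct.unpack("!HBB", value[offset:offset + 4])
        families.append((afi, safi, bool(family_flags & FORWARDING_STATE_BIT)))
    return {
        "restarting": bool((word >> 12) & RESTART_STATE_BIT),
        "restart_time": word & 0x0FFF,
        "families": families,
    }


def end_of_rib_ipv4_unicast():
    """Update with no withdrawn routes, no path attributes, and no NLRI."""
    body = struct.pack("!HH", 0, 0)  # withdrawn length = 0, attribute length = 0
    header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), BGP_UPDATE)
    return header + body


def is_end_of_rib(message):
    """True if the message is the IPv4 unicast End-of-RIB marker."""
    return message == end_of_rib_ipv4_unicast()


if __name__ == "__main__":
    # Restarting router, 120 s restart time, IPv4 unicast (AFI 1, SAFI 1),
    # forwarding state preserved across the restart.
    capability = encode_gr_capability(120, True, [(1, 1, True)])
    print(capability.hex())
    print(decode_gr_capability(capability))
    print(len(end_of_rib_ipv4_unicast()))
```

The 12-bit restart time field is why the advertised value cannot exceed 4095 seconds; the range a given platform accepts on the command line may be narrower.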

Relevant IOS Commands

bgp graceful-restart

Enables NSF (Graceful Restart) in BGP router configuration mode.

R1(config-router)# bgp graceful-restart

bgp graceful-restart restart-time

The optional restart-time keyword and seconds argument determine how long peer routers wait for a BGP Open message before deleting stale routes. The default value is 120 seconds. The timer covers the time needed for the TCP and BGP sessions to be reestablished.

R1(config-router)# bgp graceful-restart restart-time (sec)

bgp graceful-restart stalepath-time

The optional stalepath-time keyword and seconds argument determine how long a router waits before deleting stale routes after an End-of-RIB (EOR) message is received from the restarting router. The default value is 360 seconds. The timer runs per prefix: it starts once the new BGP session is reestablished and applies to stale prefixes that have not yet been refreshed by an update.

R1(config-router)# bgp graceful-restart stalepath-time (sec)
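To illustrate how the two timers interact on the peer (helper) side, here is a minimal Python sketch. It is a toy model loosely following the receiving-speaker procedure in RFC 4724, and it assumes that stale paths are flushed when End-of-RIB arrives or when a timer expires, whichever comes first. The exact timing in IOS may differ. The class and method names are hypothetical, and time is passed in as plain seconds so the logic is easy to follow.

```python
class HelperPeer:
    """Toy model of a Graceful Restart helper tracking one neighbor's paths."""

    def __init__(self, restart_time=120, stalepath_time=360):
        self.restart_time = restart_time      # seconds to wait for a new Open
        self.stalepath_time = stalepath_time  # upper bound on holding stale paths
        self.paths = {}                       # prefix -> True if currently stale
        self.restart_deadline = None          # running while the session is down
        self.stale_deadline = None            # running once the session is back

    def learn(self, prefix):
        """A fresh Update for this prefix arrived, so it is not stale."""
        self.paths[prefix] = False

    def session_lost(self, now):
        """Keep the paths for forwarding, but mark all of them stale."""
        self.paths = dict.fromkeys(self.paths, True)
        self.restart_deadline = now + self.restart_time
        self.stale_deadline = None

    def session_reestablished(self, now):
        """The new Open arrived in time: swap the restart timer for the stale timer."""
        self.restart_deadline = None
        self.stale_deadline = now + self.stalepath_time

    def end_of_rib(self):
        """The neighbor finished resending its table: drop unrefreshed paths."""
        self._flush_stale()

    def tick(self, now):
        """Expire whichever timer is currently running."""
        for deadline in (self.restart_deadline, self.stale_deadline):
            if deadline is not None and now >= deadline:
                self._flush_stale()

    def _flush_stale(self):
        self.paths = {p: stale for p, stale in self.paths.items() if not stale}
        self.restart_deadline = None
        self.stale_deadline = None


if __name__ == "__main__":
    peer = HelperPeer()
    peer.learn("10.0.0.0/8")
    peer.learn("192.0.2.0/24")
    peer.session_lost(now=0)            # both paths become stale
    peer.session_reestablished(now=30)  # within the 120 s restart time
    peer.learn("10.0.0.0/8")            # refreshed by the restarting router
    peer.end_of_rib()                   # the unrefreshed path is removed
    print(sorted(peer.paths))
```

In this model, a neighbor that never comes back loses all of its paths once the restart time runs out, while a neighbor that returns but omits a prefix loses only that prefix.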

Additional Resources

BRKIPM-2001 - Routing High Availability – NSF & NSR (Cisco Live).

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License