BGP multi-site

Once an organisation grows beyond a single site, routing complexity increases fast. Two sites can survive on static routes. Three might limp along. Beyond that, static routing becomes brittle, error-prone, and operationally risky.

I’ve seen environments where a single missed static route caused:

  • Partial outages between sites
  • Traffic hairpinning across continents
  • Failover links that never actually failed over

This is where BGP earns its place. While often associated with the internet, BGP is extremely effective for enterprise multi-site routing when you need dynamic path selection, policy control, and predictable failover behaviour.

Used properly, BGP lets your network adapt to failures instead of waiting for humans to fix them.


What BGP Is (And Why Enterprises Use It Between Sites)

BGP (Border Gateway Protocol) is a path-vector routing protocol. Unlike IGPs such as OSPF or EIGRP, BGP is built around policy, not speed.

That’s exactly why it works so well between sites.

In enterprise multi-site designs, BGP enables:

  • Dynamic route advertisement
  • Multiple WAN paths and ISPs
  • Controlled failover behaviour
  • Load-sharing across links
  • Fine-grained routing policy

Most importantly, it replaces fragile static routing with intent-based routing decisions.


Core Design Decisions Before You Touch a Router

This is where experienced engineers slow down—and juniors rush ahead.

ASN Strategy: Single ASN vs Multiple ASNs

You have two main models:

Single ASN (iBGP everywhere)

  • Easier internal policy
  • Requires route reflectors or full mesh
  • Common in tightly controlled enterprises

Multiple ASNs (eBGP between sites)

  • Stronger isolation between sites
  • Clearer policy boundaries
  • Easier troubleshooting at scale

In practice, I often favour multiple ASNs for large or geographically distributed environments. It adds clarity and limits blast radius when mistakes happen.
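
If you go the multiple-ASN route, it helps to sanity-check the numbering plan before anything is configured. The sketch below assumes each site gets its own private-use ASN (the ranges come from RFC 6996). The site names and ASN values are purely illustrative, not a recommendation.

```python
# Private-use ASN ranges reserved by RFC 6996.
PRIVATE_ASN_RANGES = (
    (64512, 65534),            # 16-bit private range
    (4200000000, 4294967294),  # 32-bit private range
)


def is_private_asn(asn: int) -> bool:
    """Return True if the ASN falls inside a private-use range."""
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)


def validate_asn_plan(plan: dict) -> list:
    """Return a list of problems: non-private ASNs or ASNs reused across sites."""
    problems = []
    seen = {}
    for site, asn in plan.items():
        if not is_private_asn(asn):
            problems.append(f"{site}: ASN {asn} is not in a private range")
        if asn in seen:
            problems.append(f"{site}: ASN {asn} already used by {seen[asn]}")
        seen.setdefault(asn, site)
    return problems


# Hypothetical plan: one private ASN per site.
site_asns = {"site-a": 65001, "site-b": 65002, "site-c": 65003}
print(validate_asn_plan(site_asns))
```

An empty list means the plan is at least internally consistent. It says nothing about whether the ASNs clash with a provider or partner network, which still needs a human check.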


Peering Topology Choices

Your topology directly impacts stability and scalability:

  • Full mesh: simple conceptually, poor scalability
  • Hub-and-spoke: easy to manage, central dependency
  • Partial mesh: balanced but requires planning

There’s no “best” topology—only what fits your business and operational maturity.
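
To put rough numbers on the scalability point, here is a small sketch that counts BGP sessions for a full mesh versus a hub-and-spoke layout. It assumes one router per site, and that multiple hubs also peer with each other. Both are simplifications.

```python
def full_mesh_sessions(sites: int) -> int:
    """Every site peers with every other site: n * (n - 1) / 2 sessions."""
    return sites * (sites - 1) // 2


def hub_and_spoke_sessions(sites: int, hubs: int = 1) -> int:
    """Every spoke peers with every hub, and the hubs peer with each other."""
    spokes = sites - hubs
    return spokes * hubs + hubs * (hubs - 1) // 2


for n in (3, 10, 50):
    print(
        f"{n} sites: full mesh={full_mesh_sessions(n)}, "
        f"one hub={hub_and_spoke_sessions(n)}, "
        f"two hubs={hub_and_spoke_sessions(n, hubs=2)}"
    )
```

At three sites the difference is negligible. At fifty, a full mesh needs 1225 sessions against 49 for a single hub, which is why full mesh rarely survives growth.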


Prefix Strategy and Aggregation

Advertising hundreds of small prefixes across all sites is a fast way to:

  • Increase convergence times
  • Stress router memory
  • Complicate troubleshooting

Summarise wherever topology allows. But don’t over-aggregate to the point where routing becomes inefficient or misleading.
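
As a rough illustration of the trade-off, the sketch below uses Python's standard ipaddress module to collapse contiguous site prefixes into the smallest exact summaries, then measures how much unused space a coarser summary would claim. The 10.20.x.x prefixes are made up for the example.

```python
import ipaddress

# Hypothetical prefixes in use at one site.
site_prefixes = [
    ipaddress.ip_network(p)
    for p in ("10.20.0.0/24", "10.20.1.0/24", "10.20.2.0/24",
              "10.20.3.0/24", "10.20.8.0/24")
]

# Exact aggregation: only merges prefixes that are truly contiguous.
summaries = list(ipaddress.collapse_addresses(site_prefixes))
print([str(s) for s in summaries])  # ['10.20.0.0/22', '10.20.8.0/24']


def unused_addresses(summary: str, prefixes) -> int:
    """Count addresses a summary covers that no real prefix uses."""
    covered = ipaddress.ip_network(summary)
    used = sum(p.num_addresses for p in prefixes if p.subnet_of(covered))
    return covered.num_addresses - used


# A single /20 is tidier to advertise, but attracts traffic for space that does not exist.
print(unused_addresses("10.20.0.0/20", site_prefixes))  # 2816
```

Five advertisements become two with no loss of accuracy. Squeezing them into one /20 saves a single route and pulls in 2816 addresses the site cannot actually deliver to.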


Step-by-Step: Building Multi-Site BGP Properly

Step 1: Choose iBGP or eBGP Between Sites

  • Use iBGP when sites share the same ASN
  • Use eBGP when sites use separate ASNs

Key point many miss:

A router does not re-advertise a route learned from one iBGP peer to another iBGP peer unless it is acting as a route reflector (or you have split the ASN into confederations).

This single rule causes more “why doesn’t my route propagate?” tickets than almost anything else in BGP.
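
The rule is easier to see in a toy model. The sketch below is a deliberately simplified simulation, not how any real BGP implementation is structured: it treats every peer of a reflector as a client and ignores cluster IDs and originator IDs. Router names are hypothetical.

```python
def ibgp_propagate(sessions, origin, reflectors=frozenset()):
    """Return the set of routers that learn a route injected at `origin`.

    sessions:   iterable of (a, b) iBGP peerings.
    reflectors: routers acting as route reflectors (all peers treated as clients).
    """
    neighbors = {}
    for a, b in sessions:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    learned = {origin}
    frontier = []
    # The originating router advertises its own route to every iBGP peer.
    for peer in neighbors.get(origin, ()):
        learned.add(peer)
        frontier.append(peer)

    # From here on, only reflectors pass an iBGP-learned route to other iBGP peers.
    while frontier:
        router = frontier.pop()
        if router not in reflectors:
            continue
        for peer in neighbors[router]:
            if peer not in learned:
                learned.add(peer)
                frontier.append(peer)
    return learned


chain = [("A", "B"), ("B", "C")]  # A and C both peer with B, but not with each other
print(sorted(ibgp_propagate(chain, "A")))                    # ['A', 'B']
print(sorted(ibgp_propagate(chain, "A", reflectors={"B"})))  # ['A', 'B', 'C']
```

Without reflection, C never hears about A's route even though both sessions are up and healthy. That is exactly the symptom behind those tickets.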


Step 2: Establish Stable Peering Sessions

Best practice:

  • Peer using loopback interfaces
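  • Ensure IGP or static routes exist to reach loopbacks
  • For eBGP, enable multihop (or your platform's equivalent), since loopback peers are not directly connected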
  • Set explicit router IDs
  • Protect sessions from firewall interference

Using loopbacks dramatically improves resilience during interface failures or routing changes.


Step 3: Configure Core BGP Settings

Every BGP deployment should explicitly set:

  • Local ASN
  • Remote ASN
  • Router ID (never leave this to automatic selection)
  • Keepalive and hold timers
  • Address families (IPv4/IPv6)

Defaults work in labs. Production deserves intention.
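
One way to enforce that intention is to describe each peering as data and lint it before it reaches a router. The sketch below is a hypothetical structure, not any vendor's schema. The timer defaults (30 second keepalive, 90 second hold) are illustrative, and the "keepalive at most one third of hold time" check is a common convention rather than a hard protocol rule.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class BgpNeighbor:
    """Hypothetical record of the settings one peering should pin down."""
    local_asn: int
    remote_asn: int
    router_id: str
    keepalive: int = 30   # seconds, illustrative value
    hold_time: int = 90   # seconds, illustrative value
    address_families: tuple = ("ipv4-unicast",)

    @property
    def session_type(self) -> str:
        return "iBGP" if self.local_asn == self.remote_asn else "eBGP"

    def problems(self) -> list:
        issues = []
        for label, asn in (("local", self.local_asn), ("remote", self.remote_asn)):
            if not 1 <= asn <= 4294967294:
                issues.append(f"{label} ASN {asn} is out of range")
        try:
            if ipaddress.IPv4Address(self.router_id).is_unspecified:
                issues.append("router ID must not be 0.0.0.0")
        except ValueError:
            issues.append(f"router ID {self.router_id!r} is not a dotted-quad value")
        # RFC 4271: the hold time is either 0 (disabled) or at least 3 seconds.
        if self.hold_time != 0 and self.hold_time < 3:
            issues.append("hold time must be 0 or at least 3 seconds")
        if self.hold_time and self.keepalive * 3 > self.hold_time:
            issues.append("keepalive should be at most one third of the hold time")
        if not self.address_families:
            issues.append("no address family enabled")
        return issues


peer = BgpNeighbor(local_asn=65001, remote_asn=65002, router_id="10.255.0.1")
print(peer.session_type, peer.problems())
```

The value is less in the checks themselves than in forcing every field to be written down somewhere reviewable.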


Routing Policy: Where BGP Actually Becomes Powerful

Without policy, BGP is just noisy dynamic routing.

Import and Export Filtering

Always define:

  • What prefixes you accept
  • What prefixes you advertise

Never assume “we’ll just trust the peer”. Even internal peers make mistakes.
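
The logic of a sensible import filter is small enough to sketch. The example below assumes a hypothetical policy: accept only prefixes inside the peer site's assigned aggregate, and nothing longer than a /24. Real filters live in prefix lists or route policies on the router. This only shows the decision being made.

```python
import ipaddress


def permitted(prefix: str, allowed_aggregates, max_len: int = 24) -> bool:
    """Accept a prefix only if it sits inside an allowed aggregate and is not too specific."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > max_len:
        return False
    return any(
        net.version == agg.version and net.subnet_of(agg)
        for agg in allowed_aggregates
    )


# Hypothetical: the peer site has been assigned 10.20.0.0/16.
site_b_allowed = [ipaddress.ip_network("10.20.0.0/16")]

for candidate in ("10.20.4.0/22", "10.20.4.1/32", "10.30.0.0/24", "0.0.0.0/0"):
    verdict = "accept" if permitted(candidate, site_b_allowed) else "reject"
    print(f"{candidate:16} {verdict}")
```

Note that the default route and the neighbouring site's 10.30.0.0/24 are both rejected without being listed explicitly. Allow-listing what you expect is far safer than trying to enumerate what you don't.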


Path Selection Controls

Key tools you’ll use daily:

  • Local Preference – primary tool for outbound traffic control
  • AS Path Prepending – de-prioritise certain paths externally
  • MED – influence inbound traffic when peers respect it
  • Communities – scalable, readable policy control

Real-world truth:
In my experience, Local Preference alone settles the large majority of enterprise routing decisions.
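
To show why Local Preference carries so much weight, here is a heavily simplified model of the first few best-path comparisons: highest local preference, then shortest AS path, then origin, then lowest MED. It is a sketch, not a full decision process. It leaves out vendor weight, the eBGP-over-iBGP preference, IGP metric to the next hop and router ID tie-breakers, and it compares MED across all paths, whereas routers by default only compare it between paths from the same neighbouring AS. Path names and values are hypothetical.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Path:
    via: str
    local_pref: int = 100
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0


def preference_key(path: Path):
    """Lower tuple wins: local preference is negated so that higher is better."""
    return (-path.local_pref, len(path.as_path), ORIGIN_RANK[path.origin], path.med)


def best_path(paths):
    return min(paths, key=preference_key)


paths = [
    # Longer AS path, but policy has raised its local preference.
    Path(via="direct-wan", local_pref=200, as_path=(65002, 65002, 65002)),
    Path(via="backup-wan", local_pref=100, as_path=(65002,)),
]
print(best_path(paths).via)  # direct-wan
```

Because local preference is evaluated before AS path length, the prepended path still wins. Inside your own network, that ordering is why it is the first tool to reach for.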


Redundancy and Failover (Where Theory Meets Reality)

BGP only reacts as fast as the signals it receives.

Link Failure Detection

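Out of the box, BGP relies on its hold timer to notice a dead peer whenever the interface itself stays up. Default hold times are commonly 90 or 180 seconds depending on the vendor, so detection can take minutes rather than seconds.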

For modern networks:

  • Enable BFD (Bidirectional Forwarding Detection) where supported
  • Tune timers carefully
  • Fix flapping links instead of masking them

Fast failover is meaningless if the underlying link is unstable.
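
A quick back-of-the-envelope comparison shows the gap BFD closes. The figures below are assumptions chosen for illustration (a 90 second hold timer, BFD at 300 ms with a multiplier of 3). Check what your platform and links can actually sustain.

```python
def bfd_detection_ms(interval_ms: int, multiplier: int) -> int:
    """BFD declares the peer down after `multiplier` consecutive missed packets."""
    return interval_ms * multiplier


def improvement_factor(hold_time_s: int, interval_ms: int, multiplier: int) -> float:
    """How many times faster BFD detects a failure than the BGP hold timer alone."""
    return hold_time_s * 1000 / bfd_detection_ms(interval_ms, multiplier)


print(bfd_detection_ms(300, 3))        # 900 (milliseconds)
print(improvement_factor(90, 300, 3))  # 100.0
```

Sub-second detection is attractive, but the same arithmetic means a link that drops three packets in a row will now tear down the session. That is the reason to fix flapping links first.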


Multiple Routers Per Site

Single routers are single points of failure—no matter how many links you add.

At scale:

  • Use at least two routers per site
  • Use route reflectors or confederations
  • Keep designs consistent across sites

Consistency reduces outages more than complexity ever will.


Security: BGP Is Trust-Based—So Add Guardrails

BGP assumes peers behave correctly. That’s optimistic.

Essential Security Controls

  • BGP session authentication
  • Prefix filtering (strict)
  • AS-path filtering
  • Maximum prefix limits
  • Route validation (RPKI where possible, on internet-facing sessions)

I’ve seen internal route leaks take down entire WANs—not because of attackers, but because of typos.
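
Maximum prefix limits are the cheapest guardrail against exactly that kind of typo-driven leak. The sketch below only illustrates the decision logic. On a real router this is a per-neighbour setting, and the 75% warning ratio and the limit of 50 are assumptions for the example.

```python
def max_prefix_action(received: int, limit: int, warn_ratio: float = 0.75) -> str:
    """Decide what to do with a session based on how many prefixes the peer sent."""
    if received > limit:
        return "shutdown"  # peer is sending far more than agreed: drop the session
    if received >= limit * warn_ratio:
        return "warn"      # approaching the limit: log it and investigate
    return "ok"


# Hypothetical branch site expected to send about 20 prefixes, capped at 50.
for count in (20, 40, 5000):
    print(count, max_prefix_action(count, limit=50))
```

A branch that suddenly offers 5000 routes is almost certainly redistributing something it shouldn't. Losing that one session is far cheaper than accepting the routes.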


Hidden Settings That Separate Stable Networks From Fragile Ones

These rarely show up in basic guides:

Next-Hop Self in iBGP

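Without it, iBGP peers receive routes whose next hop is still the external peer's address. If that address is not reachable through the IGP, the routes are treated as unusable and quietly left out of the routing table, so traffic fails with no obvious error.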
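
The check a router performs here is essentially "can I resolve this next hop?". The sketch below mimics that with the standard ipaddress module. All addresses are hypothetical: the IGP is assumed to carry only the loopback range 10.255.0.0/24, and 192.0.2.1 stands in for an external peer's link address.

```python
import ipaddress


def next_hop_resolves(next_hop: str, igp_routes) -> bool:
    """Return True if some IGP route covers the BGP next hop."""
    addr = ipaddress.ip_address(next_hop)
    return any(addr in ipaddress.ip_network(route) for route in igp_routes)


# Hypothetical: the IGP only knows the internal loopback range.
igp_routes = ["10.255.0.0/24"]

# Border router passes the route on unchanged: next hop is the external peer.
print(next_hop_resolves("192.0.2.1", igp_routes))   # False, route is unusable
# Border router sets next-hop-self: next hop is its own loopback.
print(next_hop_resolves("10.255.0.1", igp_routes))  # True, route is usable
```

The alternative to next-hop-self is carrying the external link subnets in the IGP. Either works. What doesn't work is doing neither.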


Route Reflector Design

Bad reflector design causes:

  • Suboptimal routing
  • Hidden loops
  • Inconsistent convergence

Always define cluster IDs intentionally.


BGP Dampening

Useful for unstable links—but dangerous if misused. Dampening the wrong prefixes can make outages worse.


IGP and BGP Metric Alignment

If your IGP prefers a path that BGP considers secondary, traffic will behave unpredictably.

Align them.


Testing and Monitoring (Where Good Designs Stay Good)

Before production:

  • Simulate link failures
  • Verify convergence time
  • Confirm traffic paths

In production:

  • Monitor session state
  • Track prefix counts
  • Watch CPU and memory
  • Alert on unexpected changes

If you don’t monitor BGP, it will fail silently.
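
The two checks that catch most problems are session state and prefix-count drift. The sketch below assumes you already collect those values somehow (SNMP, streaming telemetry, or scraping CLI output). The session names, baselines and 20% tolerance are all hypothetical.

```python
def check_session(name: str, state: str, prefixes: int,
                  baseline: int, tolerance: float = 0.2) -> list:
    """Return alert strings for one BGP session, or an empty list if it looks healthy."""
    alerts = []
    if state != "Established":
        alerts.append(f"{name}: session is {state}")
    elif baseline and abs(prefixes - baseline) / baseline > tolerance:
        alerts.append(
            f"{name}: received {prefixes} prefixes, expected about {baseline}"
        )
    return alerts


# Hypothetical snapshot: (session name, state, prefixes received, expected baseline).
snapshot = [
    ("site-a->site-b", "Established", 42, 40),
    ("site-a->site-c", "Established", 3, 12),
    ("site-b->site-c", "Idle", 0, 12),
]

for row in snapshot:
    for alert in check_session(*row):
        print(alert)
```

The middle case is the one that matters most. The session is up and every dashboard is green, yet three quarters of the routes have vanished.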


Example: Practical Multi-Site Enterprise Design

Three sites:

  • Site A (primary data centre)
  • Site B (DR site)
  • Site C (branch)

Each site:

  • Uses its own ASN
  • Runs eBGP between sites
  • Has local internet breakout

Policies:

  • Internal traffic prefers local site
  • Internet traffic exits locally unless policy dictates otherwise
  • DR paths are pre-configured, not improvised

This design scales cleanly and survives failures predictably.


Common Pitfalls I See Repeatedly

  • No prefix filtering
  • Inconsistent ASN usage
  • Over-advertising routes
  • No failure testing
  • Blind trust in defaults

Every one of these has caused real outages.


Conclusion: BGP Rewards Discipline

BGP is not hard—but it is unforgiving.

When designed intentionally, it delivers:

  • Scalable routing
  • Predictable failover
  • Strong policy control
  • Long-term operational sanity

When rushed or under-planned, it becomes opaque and dangerous.

My rule of thumb after years of operating BGP networks:

If you can’t explain your BGP policy on a whiteboard, it’s probably too complex.

Design carefully, filter aggressively, test often—and BGP will serve you extremely well across your sites.
