Skip to content

[bfd] Add SBFD hardware offload HLD document#2329

Open
yuezhoujk wants to merge 1 commit into
sonic-net:masterfrom
yuezhoujk:sbfd-hld
Open

[bfd] Add SBFD hardware offload HLD document#2329
yuezhoujk wants to merge 1 commit into
sonic-net:masterfrom
yuezhoujk:sbfd-hld

Conversation

@yuezhoujk
Copy link
Copy Markdown
Collaborator

Add High Level Design document for Seamless BFD (SBFD) hardware offload as an extension to BFD-Syncd HLD. Covers SBFD Initiator, Echo, and Reflector session types with SRv6 encapsulation, SR-TE Policy integration, SAI-level BFD-coupled NHG fast switchover, and CONFIG_DB/APPL_DB/STATE_DB schema extensions.

Add High Level Design document for Seamless BFD (SBFD) hardware
offload as an extension to BFD-Syncd HLD. Covers SBFD Initiator,
Echo, and Reflector session types with SRv6 encapsulation, SR-TE
Policy integration, SAI-level BFD-coupled NHG fast switchover,
and CONFIG_DB/APPL_DB/STATE_DB schema extensions.

Signed-off-by: yuezhou.jk <yuezhou.jk@alibaba-inc.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@yuezhoujk yuezhoujk marked this pull request as draft May 12, 2026 11:47
@azure-pipelines
Copy link
Copy Markdown

No pipelines are associated with this pull request.

@yuezhoujk yuezhoujk marked this pull request as ready for review May 13, 2026 06:36
segment-routing
traffic-eng
policy color 100 endpoint 200::D
sbfd echo 3 source-address 200::A
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config model shown enables SBFD only at the policy level. In production, operators commonly want one of three scopes: (a) global — enable SBFD for all SR-TE policies by default; (b) per-policy — enable for one policy (shown here); (c) per-candidate-path — enable for a specific candidate-path while leaving sibling CPs unmonitored. The current schema only supports (b). Could you extend the model to express all three, with a precedence rule (CP overrides policy overrides global) so operators don't have to repeat the same sbfd block under every policy?

segment-routing
traffic-eng
policy color 100 endpoint 200::D
sbfd echo 3 source-address 200::A
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both example forms inline the detect-multiplier (3) and source-address directly in the sbfd command. SONiC already has a first-class BFD_PROFILE table (sonic-bfd-profile.yang, landing via PR sonic-net/sonic-buildimage#27229) carrying detect_multiplier, receive_interval, transmit_interval, echo_interval, echo_mode, passive_mode, minimum_ttl. Recommend the SBFD CLI accept a profile <name> reference instead of (or in addition to) inline parameters, so SBFD timers can reuse the same profile abstraction as classic BFD and so the global/per-policy/per-CP scopes from comment #1 can share a profile by name.

Note: to serve SBFD specifically, BFD_PROFILE needs two additional optional leaves — mode (bfd / sbfd_init / sbfd_echo, default bfd) and local_address — so that mode and a shared inner-source address can be templated. See comment #16 for the full CONFIG_DB schema, including the profile leafref on BFD_PEER.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,We were not using BFD_PROFILE before, so we directly modified BFD_SESSION_TABLE. Going forward, I will build on your approach and extend the BFD_PROFILE table as well.

```
CONFIG_DB SRTE_POLICY → frrcfgd → pathd (vtysh)
→ pathd internal sr_config_sbfd_apply()
→ ZAPI ZEBRA_SBFD_DEST_REGISTER
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design introduces a new ZAPI opcode ZEBRA_SBFD_DEST_REGISTER distinct from the existing ZEBRA_BFD_DEST_REGISTER (bfdd/ptm_adapter.c:476). Could you justify the new opcode rather than extending the existing ZEBRA_BFD_DEST_REGISTER payload with SBFD-specific fields (session type, segment list, encap source IPv6, bfd-name)? A second parallel opcode doubles the registration paths zebra/bfdd have to keep in sync (DEREGISTER, peer-down, peer-replay), and FRR upstream review tends to push back on duplicate opcodes when an extension is sufficient.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing that out. We introduced ZEBRA_SBFD_DEST_REGISTER because the implementation of SBFD for SRv6 TE differs from classic BFD in several aspects, for example, it needs to carry the segment list. That said, it should also be possible to extend the existing ZEBRA_BFD_DEST_REGISTER to support this use case. I agree with your suggestion, and I will review the implementation accordingly.

bfdd->>bfdd: Hardware offload confirmed<br/>After BFD_XMTDEL_DELAY_TIMER delay<br/>Stop software TX
```

**Software-to-hardware switchover:** bfdd initially sends SBFD packets (with SRv6 encapsulation) via raw socket. Upon receiving `BFD_STATE_CHANGE(up)` from bfdsyncd, bfdd continues software TX for a `BFD_XMTDEL_DELAY_TIMER` delay period to ensure overlap, then stops software TX. During the overlap window, the remote end may receive duplicate packets, which is harmless per RFC 7880.
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is software to hardware switchover is vendor ASIC specific requirement for S-BFD sessions?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because on some ASICs, a BFD session transitioning from down to up does not trigger an up-state notification.

+---------------------------+
```

The SRv6 sidlist is created in BFDOrch via `sai_srv6_api->create_srv6_sidlist()` with type `SAI_SRV6_SIDLIST_TYPE_ENCAPS`, and associated with the BFD session via `SAI_BFD_SESSION_ATTR_SRV6_SIDLIST_ID`. Multiple SBFD sessions using the same segment list share a single SAI sidlist object (reference counting management, see §5.3.2).
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BFDOrch directly calls sai_srv6_api->create_srv6_sidlist() / remove_srv6_sidlist() and maintains its own m_bfdSidlist reference-count table. Srv6Orch::createUpdateSidList() and Srv6Orch::deleteSidList() already exist in sonic-buildimage/src/sonic-swss/orchagent/srv6orch.h:182-183 and own SRv6 sidlist lifecycle for the data path. Recommend BFDOrch delegate to SRv6Orch (or call a shared helper) rather than maintaining a parallel sidlist table. Otherwise a segment-list used by both an SR-TE policy data path and an SBFD session has two orchagents racing on the same SAI object's lifetime.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the cpath scenario configured with bfd-name, the segment lists managed by BfdOrch and Srv6Orch are independent at the upper layer, so there is no conflict here. However, for the scenario where SBFD is enabled under a policy, the race condition you pointed out does indeed exist.


### 5.3.2 SRv6 Sidlist Management

BFDOrch creates SRv6 sidlist objects for SBFD Initiator and Echo sessions. Multiple sessions using the same segment list share a single SAI sidlist object.
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restating the same concern as comment #6 at the section that designs it: §5.3.2 makes BFDOrch the owner of sidlist create/delete and the reference-count table. The grounded recommendation is to factor createBfdSidList() / deleteSidList() out into Srv6Orch and have BFDOrch call into it by segment string, so SAI SRv6 object lifecycle stays in one orchagent.


1. Look up APPL_DB key from `m_bfdSAIIdToDB` mapping via SAI OID
2. If state is **DOWN**:
- Call `removeBfdPeer()` to delete SAI session
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

§5.3.4 calls removeBfdPeer() (delete SAI session), deleteSidList() (decrement sidlist ref), and clears m_bfdPeers/m_bfdSAIIdToDB on every ASIC bfd_session_state_change event that reports DOWN. A transient flap (segment-list briefly unreachable, then recovers) would then require a full SAI session re-create plus possible sidlist re-create on every recovery. Is teardown on every DOWN required for correctness, or could BFDOrch leave the SAI session in place (state-only update) and only tear down on explicit user removal? The latter matches how classic BFD handles peer-down notifications and avoids SAI churn under flap.

A second concrete failure mode: bfdd restart while a session is Down loses recovery entirely. If bfdd crashes / warm-restarts during a Down state, the SAI session is already gone (BFDOrch deleted it), bfdd has no record to re-issue, and the ASIC retains no state. The operator must recreate the session. This is worse than classical BFD HW offload's behavior, where the SAI session survives both Down transitions and bfdd restarts.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first point, our initial implementation actually worked as you described: the SAI session was not torn down when BFD went down. The issue is that on some ASICs, when the BFD session transitions from up to down, the SAI session reports the down event, but when the BFD session transitions from down to up, no up event is reported. As a result, if we keep the SAI session in place, the BFD session state in software can become inconsistent with the hardware state. Because of this limitation, we were forced to adopt the current approach. That said, I agree with your point that we may not need to accommodate the limitations of certain ASICs and could instead implement a more generic solution.

Our current design does not have the issue you described. If the session goes down, the hardware session is deleted. If bfdd restarts at that point, it will reload the BFD configuration after coming back up and re-establish the session. Once the BFD session comes back up, it will program the hardware again.

- Clear records from `m_bfdPeers` and `m_bfdSAIIdToDB`
3. Update `bfd_hw_state` in STATE_DB (updated for both UP and DOWN)

The design of deleting the SAI session on DOWN allows bfdd to rebuild the hardware session by resending `DP_ADD_SESSION` when needed.
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence makes the split-authority explicit: BFDOrch deletes the SAI session autonomously on ASIC DOWN, and bfdd reconstructs it by resending DP_ADD_SESSION. Two issues with that contract: (1) bfdd's view of "session exists" temporarily diverges from BFDOrch's, leading to race windows where a delete+recreate from bfdd's side could collide with BFDOrch's auto-delete; (2) it's harder to reason about who owns the SAI object's lifetime. Recommend a cleaner contract: BFDOrch only reports state to STATE_DB on DOWN (does not delete), bfdd is the sole issuer of both DP_DELETE_SESSION and DP_ADD_SESSION. That keeps SAI lifecycle authority in one place.

This also diverges from the classical BFD HW Offload HLD, which keeps the SAI session in place through Down/Up transitions and only deletes on explicit operator removal. SBFD shouldn't introduce a different lifecycle pattern without explicit justification — operators, observability tools, and downstream consumers all rely on classical BFD's "session persists, only state field changes" model.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fundamentally, this still comes back to that limitation. I agree with your point that the SAI session should be kept on a Down event. That said, I would like to clarify one aspect of the current implementation: after BFDOrch autonomously deletes the SAI session when the ASIC reports DOWN, the session still exists from bfdd’s point of view. The difference is that, at that point, BFD protocol packet processing is handled by the bfdd process in software, i.e., packets are sent from software rather than offloaded to hardware. In essence, BFDOrch is not competing with bfdd; BFDOrch is managed by bfdd. Even after BFDOrch removes the hardware session due to the session going down, whether and when the session is reprogrammed into hardware is still controlled by bfdd.

style CP2 fill:#fff9c4,stroke:#f9a825
```

pathd creates SBFD sessions via `sr_config_sbfd_apply()`, setting BFD timers, segment list, echo mode, and SRv6 encapsulation source address. The SBFD session lifecycle is fully managed by pathd (create/modify/delete SBFD sessions correspondingly when segment-lists are added/modified/deleted).
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

§3.3.1 (line 341) makes the pathd→bfdd transport explicit (ZAPI ZEBRA_SBFD_DEST_REGISTER via zebra). §6.2.1 just says "pathd creates SBFD sessions via sr_config_sbfd_apply() … fully managed by pathd," which reads as if sr_config_sbfd_apply() is a direct internal API. Please restate here that create/modify/delete from pathd go over ZAPI (whichever opcode is settled on after comment #4), so the IPC isn't ambiguous in this section.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thank you for the careful review.

| FR-4 | Offload all three SBFD session types to ASIC hardware for fast failure detection |
| FR-5 | Support SR-TE Policy (pathd) triggered SBFD sessions for automatic segment-list liveness detection |
| FR-6 | Support bfdd CLI direct configuration of SBFD sessions |
| FR-7 | Support IPv4 and IPv6 inner-layer addresses for SBFD Initiator and Echo sessions |
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FR-7 (line 234): "Support IPv4 and IPv6 inner-layer addresses for SBFD Initiator and Echo sessions." §11 Limitations: "IPv4-only SBFD not yet implemented: Currently only IPv6 scenarios are supported. Pure IPv4 packets (without SRv6) SBFD is not yet implemented." These two statements are inconsistent: the requirements list claims IPv4 inner-layer support as in-scope, the limitations table says it isn't implemented. Please reconcile — either downgrade FR-7 to IPv6-only (matching the v1 implementation reality) or move it out of the v1 requirements list with a forward reference, so reviewers can tell what this PR is actually committing to deliver.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the description here is inaccurate. What I intended to say is that this HLD only supports SBFD over SRv6, and does not support classical SBFD without an outer SRv6 header. I’ll update it.


**Key format:**
```
BFD_SESSION_TABLE:{{dst_ip}}|{{local_discr}}|{{bfd_name}}|{{sequence_id}}
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The APPL_DB BFD_SESSION_TABLE key includes sequence_id as a fourth tuple field (line 657, defined at line 663 as "monotonically increasing sequence number for deduplication and ordering"). In the full hardware-offload model this PR describes, sequence_id is an ASIC-vendor-specific deduplication workaround and not a property of the SONiC SBFD design — the design's authoritative session identity is (dst_ip, local_discr, bfd_name), which is exactly what the STATE_DB key uses (line 1181) and what bfdsyncd parses on the reverse path (line 680). Recommend removing sequence_id from the documented APPL_DB key here so the HLD reflects the architecture, not a transient vendor mitigation. If a specific ASIC needs sequence-tagging, that belongs in vendor-specific notes, not in the canonical schema.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I agree.

participant SAI as SAI / ASIC

CONFIG->>bfdd: Configure SBFD Initiator<br/>(bfd-mode sbfd-init, remote-discr, srv6-encap-data)
bfdd->>bfdd: Create SBFD Initiator session<br/>Allocate local discriminator<br/>Start software TX
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

§5.4 fast-switchover depends on the SAI_NEXT_HOP_ATTR_BFD_DISCRIMINATOR value embedded by pathd in the route encap (§5.4.3) matching the local_discr of an SBFD session on this device. The doc currently implies dynamic allocation by bfdd (line 405: "Allocate local discriminator") but exposes no configuration surface — FRR CLI (§9.1), pathd CLI, and the proposed CONFIG_DB BFD_SESSION_TABLE (§7.1) all lack a local-discr field for Initiator/Echo, and no bfdd→pathd return path for the allocated value is described. Recommend adopting the static-configuration model already in use for the Reflector (line 569: "administratively allocated by the operator, globally unique on the device"): expose local-discr <value> on the sbfd-init/sbfd-echo FRR CLI and as a local_discr field in CONFIG_DB BFD_PEER (per comment #16, which recommends reusing BFD_PEER rather than introducing a new BFD_SESSION_TABLE), drop the "Allocate local discriminator" wording on line 405, and document that pathd's embedded discriminator must match the bfdd-configured local_discr. This avoids introducing a new ZAPI return path for allocated IDs and keeps the design uniform across all three SBFD session types.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this was not described clearly enough in the HLD. The local discriminator is a randomly generated value created by FRR when the BFD session is instantiated. When SRv6 pathd binds the BFD session, it records that local discriminator and then programs it down to hardware through the pathd → zebra (ZAPI) → fpmsyncd → APP_DB → nhgorch/srv6orch → SAI pipeline. The purpose is to allow the hardware nexthop to associate with the corresponding BFD session, so that hardware-assisted BFD-triggered fast switchover can be achieved.

I will refine this part of the HLD to make the flow and ownership clearer.


## 7.1 CONFIG_DB

### BFD_SESSION_TABLE Table (extended for SBFD)
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CONFIG_DB schema in §7.1 introduces a new BFD_SESSION_TABLE keyed by {name} with mode, remote_discr, segment_list, source_ipv6, etc. Recommend dropping this table and instead reusing the existing BFD_PEER model already handled by frrcfgd.bfd_handler (src/sonic-frr-mgmt-framework/frrcfgd/frrcfgd.py:2826), extended along these lines:

BFD_PEER (existing, extend):

key = BFD_PEER|<bfd_name>
  enabled           : true|false
  mode              : bfd|sbfd_init|sbfd_echo      ; default "bfd"
  peer              : ipv4/v6
  local_address     : ipv4/v6                       ; per-session override of profile
  vrf               : <vrf_name>
  multihop          : true|false
  remote_discr      : uint32                        ; sbfd_init only
  local_discr       : uint32                        ; static, see comment #15
  srv6_encap_data   : <ipv6-csv>                    ; sbfd_init / sbfd_echo
  profile           : leafref -> BFD_PROFILE        ; optional

BFD_PROFILE (existing, extend per PR sonic-net/sonic-buildimage#27229): add two optional leaves so fields shared across a fleet of S-BFD sessions are declared once and inherited:

  mode              : bfd|sbfd_init|sbfd_echo       ; NEW, default "bfd"
  local_address     : ipv4/v6                        ; NEW, optional

Per-session values on BFD_PEER override profile-supplied values. Timer changes on a bound profile are applied in place by BfdOrch::update_bfd_session via set_bfd_session_attribute; mode/local_address changes require session re-create (CREATE_ONLY SAI attributes).

srv6_source_ipv6 is intentionally not added to either table. The outer-header source IPv6 should be read from FRR's existing global segment-routing srv6 encap-source-address, which already serves every other SRv6 feature on the node. Per-session/per-profile duplication of a node-wide value invites drift.

SBFD_REFLECTOR stays as proposed (its own table keyed by discriminator; Reflector has no peer).

|------|-----|-----------|---------|-------------|--------------|--------------|
| Initiator | RFC 7880 | Initiator → Reflector | Detect reachability of remote Reflector via a specific SRv6 path | Yes | 7784 | Reflector's discriminator (pre-configured) |
| Echo | RFC 7880 §4 | Self → Self (via SRv6 path) | Detect SRv6 path liveness via loopback | Yes | 3785 | = local_discr (self-addressed) |
| Reflector | RFC 7880 §4 | Passive response | Reflect received SBFD probe packets back to Initiator | No | 3784 | 0 (don't care) |
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The §1.2 session-type table lists the SBFD Reflector with UDP Dst Port = 3784, and §5.2.1 fills the same default in bfdsyncd. RFC 7881 §3 mandates UDP port 7784 for SBFD Control packets (3784 is the classical BFD port per RFC 5881). The HLD already uses 7784 correctly in the §1.3 packet walkthroughs (lines 173, 184, 195), so this is an internal contradiction, not a doctrine disagreement. Replace 3784 with 7784 in the §1.2 table and the §5.2.1 reflector defaults;


3. **Reflector/Initiator validation exemption**: For `SAI_BFD_SESSION_TYPE_REFLECTOR` and `SAI_BFD_SESSION_TYPE_INITIATOR`, skip `validate()` check (do not require both src/dst IP pairs to be valid, since Reflector's dst_ip is `::`)

4. **Common attribute**: All SBFD sessions set `SAI_BFD_SESSION_ATTR_OFFLOAD_TYPE = SAI_BFD_SESSION_OFFLOAD_TYPE_SUSTENANCE`
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SBFD has no handshake (RFC 7880 §3), so there's no establishment to do in software. Can you share more details why we need sustenance mode for S-BFD? Is it vendor ASIC specific?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. S-BFD still requires the Reflector end to send reply packets before the session can transition to Up. In that sense, this issue is caused by that limitation—specifically, the lack of notification for the down-to-up state transition.


pathd manages SBFD sessions at **segment-list granularity**. Each segment-list can be bound to one SBFD Initiator or Echo session. When a segment-list is added, modified, or deleted, pathd correspondingly creates, updates, or deletes the associated SBFD session.

### 3.3.2 Path B — bfdd CLI Direct Configuration (color only)
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

§3.3.2 and §6.2.2 use the term "color only" SR Policy to mean "policy without an explicit endpoint." RFC 9256 §2.1 already standardizes this concept as a null-endpoint SR Policy (0.0.0.0 for IPv4, :: for IPv6); the §9.2 CONFIG_DB examples (lines 1347-1369) confirm this — the SRV6_POLICY key 1|:: encodes color + null endpoint. Recommend using "null-endpoint SR Policy (RFC 9256 §2.1)" consistently throughout instead of the local term "color only," and stating that endpoint = :: / 0.0.0.0 in CONFIG_DB / FRR CLI represents the null-endpoint case. The current upstream FRR pathd CLI (pathd/path_cli.c:613) makes endpoint mandatory; if this PR proposes making it optional with null as the implicit default, please call that out explicitly.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, agreed. Thank you.


**Key format:**
```
BFD_SESSION_TABLE:{{dst_ip}}|{{local_discr}}|{{bfd_name}}|{{sequence_id}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

§5.2.2 makes bfd_name part of the APPL_DB key but doesn't define its uniqueness contract: must it be globally unique on the DUT, unique within a (dst_ip, local_discr) tuple, or unspecified? pathd-allocated names (e.g., synthesized from color|endpoint|preference|sl_name) and operator-typed names (free strings) could collide. Need an explicit uniqueness invariant plus the enforcement point (CONFIG_DB YANG must, frrcfgd validation, or bfdsyncd-side reject).

Note: comment #16's BFD_PEER schema (keyed by bfd_name) makes this question even more load-bearing, since the table key is bfd_name — collisions become silent overwrites.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For bfd_name, different BFD sessions must have different names. If you configure the same name twice, FRR will reject it. Going forward, we also plan to enforce this through CONFIG_DB YANG validation.

If the bfd_name values are different, then the local_discr values will also necessarily be different, because local_discr, like bfd_name, is also unique. For BFD sessions generated by pathd, the bfd_name will be automatically set to None, so in practice there will be no conflict.

- SRv6 data plane internal implementation
- SR-TE Policy pathd internal design (candidate path selection algorithm)

# Definitions/Abbreviation
Copy link
Copy Markdown
Contributor

@rajshekhar-nexthop rajshekhar-nexthop May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Terms used in the body but absent from the definitions table: NHG, NHGM (next-hop group member), RIB, FIB, SAI OID, OCP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants