Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
210 changes: 210 additions & 0 deletions doc/layer1/cmis-firmware-upgrade/cmis_max_duration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,210 @@
# CDB Commands with Advertised MaxDuration

## Table of Contents
- [List of Tables](#list-of-tables)
- [1. Revision](#1-revision)
- [2. Scope](#2-scope)
- [3. Definitions/Abbreviations](#3-definitionsabbreviations)
- [4. Overview](#4-overview)
- [5. Requirements](#5-requirements)
- [6. CDB Timeouts](#6-cdb-timeouts)
- [7. High-Level Design](#7-high-level-design)
- [7.1 Timeout Resolution Flow](#71-timeout-resolution-flow)
- [7.2 Commands Using Module Advertised MaxDuration (CMIS Compliant)](#72-commands-using-module-advertised-maxduration-cmis-compliant)
- [7.3 CDB Commands in the Firmware Upgrade Workflow](#73-cdb-commands-in-the-firmware-upgrade-workflow)
- [7.4 Repository to Change](#74-repository-to-change)
- [8. SAI API](#8-sai-api)
- [9. Configuration and Management](#9-configuration-and-management)
- [10. Warmboot and Fastboot Design Impact](#10-warmboot-and-fastboot-design-impact)
- [11. Restrictions/Limitations](#11-restrictionslimitations)
- [12. Testing Requirements](#12-testing-requirements)
- [References](#references)

## List of Tables
* [Table 1: Definitions](#table-1-definitions)
* [Table 2: Commands Using Advertised MaxDuration](#table-2-commands-using-advertised-maxduration)
* [Table 3: Upgrade Workflow CDB Commands](#table-3-upgrade-workflow-cdb-commands)

## 1. Revision
| Rev | Date | Author | Change Description |
|-----|------------|--------|--------------------|
| 0.1 | 2026-05-19 | Pavan Kalyan Nakka | Initial version |
| 0.2 | 2026-05-20 | Pavan Kalyan Nakka | Addressed review comments |

## 2. Scope

This document describes changes to use module advertised MaxDuration timeouts for CDB firmware commands, as defined in [CMIS 5.3](https://www.oiforum.com/wp-content/uploads/OIF-CMIS-05.3.pdf). It proposes which commands should use the module advertised MaxDuration values and which will retain fixed defaults for command execution time.

## 3. Definitions/Abbreviations

#### Table 1: Definitions
| **Term** | **Definition/Abbreviations** |
|-----------------|-----------------------------------------------------------------------|
| CDB | Command Data Block |
| CMIS | Common Management Interface Specification |
| LPL | Local Payload |
| EPL | Extended Payload |
| FW | Firmware |
| xcvrd | Transceiver Daemon |

## 4. Overview

The [CMIS 5.3](https://www.oiforum.com/wp-content/uploads/OIF-CMIS-05.3.pdf) defines per-command **MaxDuration** fields on Page 9Fh (bytes 144–153) that advertise the maximum time a module requires to execute specific CDB commands. These are U16 values in milliseconds, scaled by a multiplier **M** defined by bit 137.3:

- `0b` -> M = 1 (timeout in 1 ms units)
- `1b` -> M = 10 (timeout in 10 ms units)

The advertised MaxDuration values (bytes 144–153) are multiplied by M to get the effective timeout in milliseconds. The module populates these fields in the CDB 0041h (Firmware Management Features) reply.

**For example**: If MaxDurationStart is 100 and M = 10, the timeout for the Start (0101h) command is 100 × 10 = 1000 ms.

The current implementation applies a fixed timeout to most CDB commands. This can cause:

- **Premature timeouts** — Some commands may require more time to complete depending on the module.
- **Non-compliance** — the host must respect the module advertised MaxDuration when present.

This change uses the module advertised MaxDuration as the per-command timeout where defined. Commands without a MaxDuration field retain fixed defaults.

## 5. Requirements

### Functional Requirements

1. For CDB commands where the CMIS defines a MaxDuration field, use the module advertised value (plus a safety margin) as the command timeout.
2. For CDB commands where the CMIS does not define a MaxDuration field, use the fixed default timeout of 9960 ms (defined in [sonic-platform-common](https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_platform_base/sonic_xcvr/cdb/cdb.py#L65-L66)).
3. Fall back to the default timeout when CDB 0041h is not supported by the module or returns invalid data.
4. Continue to use CDB status polling for early completion detection regardless of timeout value.
5. Add a safety margin on top of the advertised MaxDuration to account for latency.

### Non-Functional Requirements

1. No increase in firmware upgrade time for modules that already complete within the current timeout window.
2. Backward compatibility with modules that do not support CDB 0041h.

## 6. CDB Timeouts

The current default firmware CDB commands timeout in SONiC is **9960 ms**, defined in [sonic-platform-common](https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_platform_base/sonic_xcvr/cdb/cdb.py#L65-L66). This fixed value is used for all firmware related CDB commands today.

This change modifies the timeout in the CDB command execution path (`CdbFwHandler` in `sonic-platform-common/sonic_platform_base/sonic_xcvr/cdb/cdb_fw.py`). The firmware upgrade flow queries MaxDuration values via CDB 0041h at the start, then passes the appropriate timeout (with a safety margin) to each subsequent CDB command:

```
┌──────────────────────────────────────────────────────────────────────┐
│ Firmware Upgrade │
├──────────────────────────────────────────────────────────────────────┤
│ CdbFwHandler │
│ │ │
│ ├── Query module for MaxDuration values (CDB 0041h) │
│ │ │ │
│ │ └── Parse per-command MaxDuration from response │
│ │ │
│ ├── Abort FW Download (0102h) ── use advertised timeout │
│ ├── Start FW Download (0101h) ── use advertised timeout │
│ ├── Write FW Block (0103h/0104h) ── use advertised timeout │
│ ├── Complete FW Download (0107h) ── use advertised timeout │
│ ├── Run FW Image (0109h) ── use default timeout │
│ └── Commit FW Image (010Ah) ── use default timeout │
└──────────────────────────────────────────────────────────────────────┘
```
**Note:** The effective timeout for advertised commands is `MaxDuration × M + SAFETY_MARGIN`. This ensures the module is not timed out prematurely due to communication overhead outside of the module's control.

## 7. High-Level Design

### 7.1 Timeout Resolution Flow

When a CDB command is issued during firmware upgrade, the timeout is determined as follows:

```
CDB command issued
Does the CMIS spec define a MaxDuration for this command (via CDB 0041h)?
├── Yes ──► Use the module advertised MaxDuration + SAFETY_MARGIN as the timeout ─┐
│ │
└── No ──► Use the fixed default timeout ─────────────────────────────────────────┤
Poll CDB status register for early completion regardless of timeout value
```

### 7.2 Commands Using Module Advertised MaxDuration (CMIS Compliant)

The following commands have their MaxDuration defined on Page 9Fh (bytes 144–153) and returned in the CDB 0041h response:

#### Table 2: Commands Using Advertised MaxDuration
| Command | CMD ID | MaxDuration Field | Byte Offset (Page 9Fh) |
|---------|--------|-------------------|------------------------|
| Start FW Download | 0101h | MaxDurationStart | 144–145 |
| Abort FW Download | 0102h | MaxDurationAbort | 146–147 |
| Write FW Block (LPL) | 0103h | MaxDurationWrite | 148–149 |
| Write FW Block (EPL) | 0104h | MaxDurationWrite | 148–149 |
| Complete FW Download | 0107h | MaxDurationComplete | 150–151 |

If CDB 0041h is not supported, these commands fall back to the default timeout.

### 7.3 CDB Commands in the Firmware Upgrade Workflow

The following table lists all CDB commands involved in the firmware upgrade process along with proposed timeout behavior. Commands marked "Default" use the fixed default timeout of **9960 ms** (as defined in [sonic-platform-common](https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_platform_base/sonic_xcvr/cdb/cdb.py#L65-L66)) except the Run FW Image command that has its own fixed timeout value (as defined in [sonic-platform-common](https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_platform_base/sonic_xcvr/fields/cdb_consts.py#L61)):

#### Table 3: Upgrade Workflow CDB Commands
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnakka28 please highlight what is the default timeout value used in sonic

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor addressed this to highlight default timeout

| # | Command | CMD ID | MaxDuration |
|---|---------|--------|-------------|
| 1 | Firmware Management Features | 0041h | Default |
| 2 | Get FW Info | 0100h | Default |
| 3 | Start FW Download | 0101h | Advertised |
| 4 | Abort FW Download | 0102h | Advertised |
| 5 | Write FW Block (LPL) | 0103h | Advertised |
| 6 | Write FW Block (EPL) | 0104h | Advertised |
| 7 | Complete FW Download | 0107h | Advertised |
| 8 | Run FW Image | 0109h | Default |
| 9 | Commit FW Image | 010Ah | Default |
| 10 | Enter Password | 0001h | Default |

### 7.4 Repository to Change

| Repository | Files to Modify |
|-----------------------|-----------------|
| sonic-platform-common | `sonic_platform_base/sonic_xcvr/cdb/cdb_fw.py`, `sonic_platform_base/sonic_xcvr/cdb/cdb.py`, `sonic_platform_base/sonic_xcvr/mem_maps/public/cdb.py`, `sonic_platform_base/sonic_xcvr/fields/cdb_consts.py` |

**Changes:**

1. **Parse MaxDuration in `initFwHandler()`** — Extend the existing CDB 0041h response parsing to extract MaxDuration fields (bytes 144–153), read the M multiplier from bit 137.3, multiply the U16 values by M, and store the resulting ms values as instance attributes.

2. **Pass timeouts to `send_cmd()`** — Update the following methods to pass their respective MaxDuration value + SAFETY_MARGIN:
- `start_fw_download()` -> `timeout_start + SAFETY_MARGIN`
- `abort_fw_download()` -> `timeout_abort + SAFETY_MARGIN`
- `write_lpl_block()` / `write_epl_block()` -> `timeout_write + SAFETY_MARGIN`
- `complete_fw_download()` -> `timeout_complete + SAFETY_MARGIN`

3. **Fallback** — If the 0041h reply does not contain valid MaxDuration values (e.g., zero or module doesn't support it), retain `None` so `wait_for_cdb_status()` falls back to its existing default.

No changes to API signatures, xcvrd, or any higher-level callers.

## 8. SAI API

N/A

## 9. Configuration and Management

N/A

## 10. Warmboot and Fastboot Design Impact

N/A

## 11. Restrictions/Limitations

- **Modules not supporting CDB 0041h**: Fall back to the default timeout for all commands.
- **Zero-value advertisements**: A module advertising 0 ms is treated as "use default timeout" to avoid immediate timeout failures.
- **Non-advertised commands**: Commands that do not advertise MaxDuration will rely solely on default or fixed timeouts.

## 12. Testing Requirements

1. Verify advertised MaxDuration is used as timeout for supported CDB commands.
2. Verify fallback to default timeout when CDB 0041h is not supported.
3. Verify that a 0 ms advertised value falls back to the default timeout.
4. Perform firmware upgrade end-to-end and verify no timeout failures on supported modules.

## References

- [OIF CMIS 5.3](https://www.oiforum.com/wp-content/uploads/OIF-CMIS-05.3.pdf) — Section 9.4.2, Page 9Fh MaxDuration fields (bytes 144–153)
- [sonic-platform-common](https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_platform_base/sonic_xcvr/cdb/cdb_fw.py) — CDB firmware commands implementation