diff --git a/doc/hotpatch/hotpatch_hld.md b/doc/hotpatch/hotpatch_hld.md
new file mode 100644
index 00000000000..d77a3a74968
--- /dev/null
+++ b/doc/hotpatch/hotpatch_hld.md
@@ -0,0 +1,649 @@
+# SONiC Hotpatch HLD
+
+## Table of Content
+
+- [1. Revision](#1-revision)
+- [2. Scope](#2-scope)
+- [3. Definitions/Abbreviations](#3-definitionsabbreviations)
+- [4. Overview](#4-overview)
+- [5. Requirements](#5-requirements)
+- [6. Architecture Design](#6-architecture-design)
+- [7. High-Level Design](#7-high-level-design)
+  - [7.1 Patch Packaging Model](#71-patch-packaging-model)
+  - [7.2 Patch Metadata Format](#72-patch-metadata-format)
+  - [7.3 Patch Type Details](#73-patch-type-details)
+  - [7.4 Installation Flow](#74-installation-flow)
+  - [7.5 Rollback Mechanism](#75-rollback-mechanism)
+  - [7.6 Build Pipeline](#76-build-pipeline)
+  - [7.7 Repository Changes](#77-repository-changes)
+  - [7.8 Serviceability and Debug](#78-serviceability-and-debug)
+- [8. SAI API](#8-sai-api)
+- [9. Configuration and Management](#9-configuration-and-management)
+  - [9.1 CLI Enhancements](#91-cli-enhancements)
+  - [9.2 CLI/YANG Model Enhancements](#92-cliyang-model-enhancements)
+  - [9.3 Config DB Enhancements](#93-config-db-enhancements)
+- [10. Warmboot and Fastboot Design Impact](#10-warmboot-and-fastboot-design-impact)
+- [11. Memory Consumption](#11-memory-consumption)
+- [12. Restrictions/Limitations](#12-restrictionslimitations)
+- [13. Testing Requirements/Design](#13-testing-requirementsdesign)
+  - [13.1 Unit Test Cases](#131-unit-test-cases)
+  - [13.2 System Test Cases](#132-system-test-cases)
+
+
+### 1. Revision
+
+
+| Rev |    Date    |   Author   | Change Description |
+| :-: | :--------: | :--------: | :----------------- |
+| 0.1 | 2026-05-06 | Kang Jiang | Initial version    |
+
+### 2. Scope
+
+This document covers the high-level design of the SONiC Hotpatch feature, which enables applying software patches to a running SONiC system without requiring a full image upgrade or reboot. The scope includes:
+
+- Six patch types: docker image replacement, Debian package upgrade, script deployment, hook script execution, function-level live patching, and Dockerfile-based image building
+- Hierarchical patch packaging model (Hotfix Package -> Sub-patches -> Packages)
+- Integration with `sonic_installer patch_install` CLI
+- Automatic backup and ordered rollback mechanism
+- Patch build pipeline tooling
+
+### 3. Definitions/Abbreviations
+
+
+| Term           | Definition |
+| :------------- | :------------------------------------------------------------------------------------ |
+| Hotpatch       | A software patch applied to a running system without a full reboot |
+| Hotfix Package | The top-level distributable archive containing one or more sub-patches |
+| Sub-patch      | An individual patch unit within a Hotfix Package, containing packages and metadata |
+| func_hotpatch  | Function-level binary patching applied to a running process in-memory |
+| libcare-ctl    | Userspace live-patching tool for applying binary patches to running processes |
+| kpatch         | Binary patch file format (`.kpatch`) used for function-level hotpatching |
+| dpkg-repack    | Tool to recreate a `.deb` package from an installed Debian package (used for rollback) |
+| patch_info.yml | YAML manifest describing patch metadata, package list, checksums, and types |
+| summary.yml    | YAML file tracking the ordered list of sub-patches within a Hotfix Package |
+
+### 4. Overview
+
+Production SONiC switches frequently require urgent bug fixes and security patches. Full image upgrades cause significant downtime, and even warm-reboot introduces control-plane disruption. The Hotpatch feature provides a targeted, surgical patching mechanism with minimal or zero service impact.
+
+The hotpatch system provides a hierarchical packaging and installation framework that supports six distinct patch types, each targeting a different layer of the SONiC software stack:
+
+
+| Patch Type    | Target Layer                   | Service Disruption      |
+| :------------ | :----------------------------- | :---------------------- |
+| docker        | Docker container image         | Brief (service restart) |
+| debian        | Debian package                 | None to minimal         |
+| script        | File on host filesystem        | None                    |
+| hook_script   | Custom logic with run/rollback | Depends on script       |
+| func_hotpatch | Running process (in-memory)    | None (zero-downtime)    |
+| dockerfile    | Docker image (cold build)      | None until reboot       |
+
+The feature operates as a plugin to the existing `sonic_installer` framework. Patches are distributed as self-contained tar.gz archives with embedded integrity verification and automatic rollback on failure.
+
+### 5. Requirements
+
+#### Functional Requirements
+
+1. Support six patch types: `docker`, `debian`, `script`, `hook_script`, `func_hotpatch`, `dockerfile`
+2. Hierarchical packaging: Hotfix Packages contain ordered sub-patches; sub-patches contain ordered packages
+3. Cumulative patching: A new Hotfix Package aggregates all prior sub-patches (e.g., Hotfix3 includes sub-Hotfix1, sub-Hotfix2, sub-Hotfix3)
+4. OS version compatibility checking before installation, supporting operators: `=`, `>=`, `>`, `<=`, `<`
+5. MD5 checksum verification for every package within a sub-patch
+6. Automatic backup before each package installation (docker image IDs, dpkg-repack, file copies)
+7. Ordered rollback on failure: if package N fails, packages 0..N-1 are rolled back in reverse order
+8. Uninstall capability: full reverse-order rollback of all packages
+9. `func_hotpatch` duplicate detection: skip if same PatchId is already applied to a process
+10. Integration with `sonic_installer patch_install ` CLI
+11. Syslog logging of all installation steps
+12. Support for `RestartForActive` flag (cold-upgrade patches that activate on next reboot)
+13. Background progress prompting for long-running patch operations
+
+#### Non-Functional Requirements
+
+1. Zero data-plane disruption for `func_hotpatch`, `script`, `hook_script`, and `debian` types
+2. Minimal data-plane disruption for `docker` type (only during service restart)
+3. Atomic per-package semantics: each package either fully installs or fully rolls back
+4. Compatible with both Python 2 and Python 3 runtimes
+
+### 6. Architecture Design
+
+The hotpatch feature does not change the existing SONiC architecture. It operates as a plugin to the `sonic_installer` framework.
+
+When `sonic_installer patch_install ` is invoked, the system extracts the patch archive and dynamically imports `install.py` from within it. The plugin exposes a well-defined interface contract:
+
+```python
+# Return values
+SUCCESS = 0
+FAIL = 1
+
+# Mandatory exports
+get_patch_name()      # Returns patch name string
+get_patch_desc()      # Returns patch description string
+do_patch_install()    # Executes patch installation, returns (status, output)
+do_patch_uninstall()  # Executes patch uninstallation, returns (status, output)
+```
+
+The following diagram shows how the hotpatch system fits within the SONiC architecture:
+
+![Hotpatch Architecture](images/hotpatch_architecture.png "Figure 1: Hotpatch Architecture")
+
+### 7. High-Level Design
+
+#### 7.1 Patch Packaging Model
+
+The hotpatch system uses a two-tier hierarchical packaging model:
+
+**Tier 1: Hotfix Package** (e.g., `Hotfix3-SONiC-rel-x.y.z.tar.gz`)
+
+- Top-level distributable archive
+- Contains a `summary.yml` tracking all sub-patches with MD5 checksums
+- Contains one or more sub-patch archives
+- Cumulative: each new Hotfix includes all prior sub-patches
+- Accompanied by `.tar.gz.md5` file for integrity verification
+
+**Tier 2: Sub-patch** (e.g., `sub-Hotfix1.tar.gz`)
+
+- Self-contained patch unit
+- Contains `patch_info.yml` manifest
+- Contains `install.py` (the installation plugin)
+- Contains actual package files (docker images, .deb files, scripts, .kpatch files, Dockerfiles)
+
+```
+Hotfix3-SONiC-rel-x.y.z.tar.gz
+|-- summary.yml
+|-- sub-Hotfix1.tar.gz
+|   |-- patch_info.yml
+|   |-- install.py
+|   +-- hook_script_disable_queue_watermark.sh
+|-- sub-Hotfix2.tar.gz
+|   |-- patch_info.yml
+|   |-- install.py
+|   +-- docker-swss.tar.gz
++-- sub-Hotfix3.tar.gz
+    |-- patch_info.yml
+    |-- install.py
+    |-- orchagent_xxx.kpatch
+    +-- zebra_yyy.kpatch
+```
+
+**summary.yml format:**
+
+```yaml
+os_version: SONiC-rel-x.y.z
+patches:
+  - name: sub-Hotfix1.tar.gz
+    md5sum:
+  - name: sub-Hotfix2.tar.gz
+    md5sum:
+  - name: sub-Hotfix3.tar.gz
+    md5sum:
+```
+
+#### 7.2 Patch Metadata Format
+
+Each sub-patch contains a `patch_info.yml` manifest with the following schema:
+
+```yaml
+Patch name: .tar.gz
+Patch extern name: # Optional, shown in `show version`
+Patch description:
+Patch type:
+SONiC version: # Supports operators: =/>=/>/<=/< (default =)
+Previous patch: # Optional, for dependency tracking
+Packages:
+  - Package:
+    Type:
+    Md5sum:
+    # Type-specific fields (see below)
+```
+
+**Patch type semantics:**
+
+- **`normal`**: Standard patch. The patch archive is not persisted after installation; it will not be re-applied automatically after a system reboot.
+- **`hotpatch`**: Hotpatch patch. The patch archive is copied to `/usr/share/sonic/hotpatches/` during installation. After a system reboot, the `hotpatches-auto-install` service automatically re-applies all hotpatches in order.
+
+**Type-specific fields:**
+
+
+| Type          | Required Fields                                     |
+| :------------ | :-------------------------------------------------- |
+| docker        | Name, Version, Service, RestartForActive (optional) |
+| debian        | Name                                                |
+| script        | Target directory                                    |
+| hook_script   | (none)                                              |
+| func_hotpatch | ProcessName, PatchId                                |
+| dockerfile    | Name, Version, Service                              |
+
+**Optional flags (all types):**
+
+- `Background prompt`: yes/no - enables progress prompts for long-running operations
+
+#### 7.3 Patch Type Details
+
+##### 7.3.1 Docker Image Replacement (`docker`)
+
+**Install flow:**
+
+1. Save current docker image ID to `backup/_orig_image_id`
+2. Remove current `:latest` tag: `docker rmi :latest`
+3. Load new image: `docker load < `
+4. Restart service: `systemctl stop/start `
+5. Tag with version: `docker tag :latest :`
+
+**Cold-upgrade mode** (when `RestartForActive=True`):
+
+1. Load new image and tag (no service restart)
+2. Create flag file: `/etc/sonic/boot_from_docker_image_`
+3. Patch becomes active after next reboot
+
+**Rollback:** Re-tag original image ID as `:latest`, restart service.
+
+**Special handling:** SNMP service uses timer-based startup pattern requiring different restart logic.
+
+##### 7.3.2 Debian Package Upgrade (`debian`)
+
+**Install flow:**
+
+1. Backup current package: `dpkg-repack ` to `backup/`
+2. Remove old package: `dpkg -r `
+3. Install new package: `dpkg -i `
+
+**Rollback:** Remove new package, install backed-up .deb from `backup/`.
+
+##### 7.3.3 Script Deployment (`script`)
+
+**Install flow:**
+
+1. Backup existing target file to `backup/` (preserving path structure)
+2. Copy permissions and ownership from target to new file
+3. Copy new file to target directory: `cp -p /`
+
+**Rollback:** Restore original file from backup; if no backup exists, remove the deployed file.
+
+##### 7.3.4 Hook Script Execution (`hook_script`)
+
+**Install flow:**
+
+1. Set executable permission on script
+2. Execute: `./