resctrl-mon: add NRI plugin for per-pod resctrl monitoring groups #666

Open
cmcantalupo wants to merge 2 commits into containers:main from cmcantalupo:resctrl-mon

@cmcantalupo

Description

Add nri-resctrl-mon, a standalone NRI plugin that creates per-pod resctrl monitoring groups (mon_groups) so that consumers (e.g. Kepler) can passively monitor Application Energy Telemetry (AET).

Motivation

Userspace daemon approaches to resctrl mon_group management suffer from a fork race: a container's first threads can execute before the daemon writes their PIDs into the mon_group's tasks file, causing energy attribution gaps. By using the NRI PostCreateContainer hook, this plugin assigns container PIDs to mon_groups before exec/fork, eliminating the race entirely.

What's included

Commit 1: Core plugin

  • Plugin source (main.go, plugin.go, resctrl.go, state.go)
  • Unit tests (plugin_test.go, resctrl_test.go)
  • Dockerfile following the nri-memory-qos pattern
  • Helm chart (Chart.yaml, values.yaml, templates, JSON schema)
  • Documentation (new "monitoring" category, plugin docs, Helm docs)
  • Sample configuration

Commit 2: Counter snapshots

  • Snapshot store (snapshot.go, snapshot_test.go) that mirrors the resctrl mon_data layout at /run/nri-resctrl-mon/<pod-uid>/
  • On first PID write: records .begin counter files and creates symlinks to live kernel event files
  • On last container stop: reads final counters as .end files, removes symlinks, writes completed_at
  • Consumers compute lifetime energy as end − begin with no new parsing logic
  • Completed snapshots are pruned after a configurable TTL (default 5 min)
  • New config keys: snapshotDir, snapshotTTL

Snapshot directory layout

/run/nri-resctrl-mon/<pod-uid>/
    created_at
    completed_at
    mon_data/
        mon_PERF_PKG_00/
            core_energy          → symlink to live resctrl counter (removed on stop)
            core_energy.begin    # counter value at mon_group creation
            core_energy.end      # counter value at mon_group removal
        mon_L3_00/
            llc_occupancy
            llc_occupancy.begin
            llc_occupancy.end
            ...

Design decisions

  • Hook selection: PostCreateContainer ensures PIDs are assigned before any user code runs.
  • Kernel RMID management: mkdir/rmdir on resctrl delegates RMID lifecycle to the kernel, avoiding userspace exhaustion bugs.
  • Symlink-based live access: While a pod is running, consumers can read live counters via symlinks without knowing the resctrl path. Symlinks are removed atomically when .end files are written (before the mon_group directory is removed), so dangling symlinks never exist.
  • Crash recovery: Synchronize re-creates mon_groups for running pods on plugin restart.
  • Minimal privileges: SYS_ADMIN + DAC_OVERRIDE only (no privileged: true).

Testing

  • Unit tests cover plugin lifecycle, resctrl operations, snapshot store, TTL pruning, and path traversal rejection.
  • Manually validated full pod lifecycle (create → run → delete → prune) on a Clearwater Forest system with k3s, confirming correct .begin/.end values and symlink cleanup.

Stats

24 files changed, 2861 insertions(+), 1 deletion(-)

Add nri-resctrl-mon, a standalone NRI plugin that creates per-pod resctrl
monitoring groups (mon_groups) to support passive monitoring of
Application Energy Telemetry (AET).

The plugin uses the PostCreateContainer hook to assign container PIDs to
mon_groups before exec/fork, eliminating the fork race that plagues
userspace daemon approaches. RMID allocation is delegated to the kernel
via mkdir/rmdir on the resctrl filesystem.

Includes:
- Plugin source (main.go, plugin.go, resctrl.go, state.go)
- Unit tests (plugin_test.go, resctrl_test.go)
- Dockerfile following nri-memory-qos pattern
- Helm chart (Chart.yaml, values.yaml, templates/, schema)
- Documentation (monitoring category, plugin docs, Helm docs)
- Sample configuration

Signed-off-by: Christopher M. Cantalupo <christopher.m.cantalupo@intel.com>
Signed-off-by: Jedrzej Wasiukiewicz <jedrzej.wasiukiewicz@intel.com>

Record initial and final resctrl counter values so that consumers
(e.g. Kepler) can account for energy consumed between the last scrape
and mon_group removal, and between mon_group creation and the first
scrape.

The snapshot directory mirrors the resctrl mon_data layout:

    /run/nri-resctrl-mon/<pod-uid>/
        created_at
        completed_at
        mon_data/mon_PERF_PKG_00/
            core_energy         → /sys/fs/resctrl/.../core_energy
            core_energy.begin
            core_energy.end

On first PID write (StartContainer / PostStartContainer / Synchronize),
the plugin reads all mon_data event files and writes .begin counter
files alongside symlinks to the live resctrl event files.  On the last
container stop (StopContainer), the counters are read again as .end
files.  Consumers compute lifetime totals as: end - begin.

This format lets existing resctrl consumers discover and use the
snapshot data with no new parsing logic — the same cat/read operations
work on .begin/.end files as on kernel event files.  If the snapshot
directory is absent, consumers fall back to direct resctrl scraping.

Completed snapshot directories are pruned after a configurable TTL
(default 5m).

New configuration keys: snapshotDir, snapshotTTL.

Signed-off-by: Christopher M. Cantalupo <christopher.m.cantalupo@intel.com>
Signed-off-by: Jedrzej Wasiukiewicz <jedrzej.wasiukiewicz@intel.com>