ci: migrate Linux workflow to RunsOn self-hosted runners by mrpollo · Pull Request #14367 · mavlink/qgroundcontrol

mrpollo · 2026-05-12T21:02:21Z

This PR migrates the Linux workflow (.github/workflows/linux.yml) from GitHub-hosted runners to RunsOn self-hosted ephemeral EC2 runners. Runners are backed by the new qgc-ci-runs-on CloudFormation stack in the Dronecode AWS account (us-west-2, RunsOn v3.0.6). x86_64 builds run on c8i.2xlarge, ARM64 on c8g.2xlarge, the Debug+coverage test job on m8i.2xlarge — all 8 vCPU, On-Demand, ubuntu24-full-* images.

Wall-clock impact

Measured against the most recent successful GitHub-hosted run on master vs. the equivalent run on this branch:

Job	GitHub-hosted	RunsOn (this PR)	Δ
Release linux_gcc_64	36m 40s	15m 10s	-21m 30s (-59%)
Release linux_gcc_arm64	21m 26s	13m 04s	-8m 22s (-39%)
Test + Coverage linux_gcc_64	30m 43s	20m 21s	-10m 22s (-34%)
Wall-clock total (parallel)	36m 40s	20m 21s	-16m 19s (-44%)
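The Δ column can be re-derived from the two duration columns. The sketch below does that arithmetic; the helper names are illustrative and not part of the workflow, and the durations are copied from the table above.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert an 'Xm Ys' duration to seconds."""
    return minutes * 60 + seconds


def fmt(total_seconds: int) -> str:
    """Format a signed number of seconds as '-Xm YYs' or '+Xm YYs'."""
    m, s = divmod(abs(total_seconds), 60)
    sign = "-" if total_seconds < 0 else "+"
    return f"{sign}{m}m {s:02d}s"


def delta(before: int, after: int) -> tuple[int, float]:
    """Return (change in seconds, change in percent of the 'before' value)."""
    diff = after - before
    return diff, 100.0 * diff / before


# (GitHub-hosted, RunsOn) durations per job, in seconds.
RUNS = {
    "Release linux_gcc_64": (to_seconds(36, 40), to_seconds(15, 10)),
    "Release linux_gcc_arm64": (to_seconds(21, 26), to_seconds(13, 4)),
    "Test + Coverage linux_gcc_64": (to_seconds(30, 43), to_seconds(20, 21)),
}

for job, (before, after) in RUNS.items():
    diff, pct = delta(before, after)
    print(f"{job}: {fmt(diff)} ({pct:.0f}%)")

# The jobs run in parallel, so the wall-clock total is the slowest job per side.
wall_before = max(before for before, _ in RUNS.values())
wall_after = max(after for _, after in RUNS.values())
print(f"Wall-clock total: {fmt(wall_after - wall_before)}")
```

With the durations above this yields -21m 30s, -8m 22s, and -10m 22s for the three jobs, and -16m 19s for the parallel wall-clock total.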

Caches are still cold on this run — no master push has populated the Ubuntu 24.04 warm caches yet. Once this merges, the next push to master populates ccache-linux-…-shared-, cpm-modules-shared-, pipx-linux-…, and qt-linux-desktop-… for everyone, and PR runs should drop another 5-8 min off both Release builds.

Context

PX4-Autopilot has been running on RunsOn for over a year in the same account; this migration brings QGC onto the same infrastructure. Once this merges, the same pattern (inline runs-on= labels plus the runs-on/action@v2 step) can be applied to analysis.yml, pr-checks.yml, and the Linux portions of pre-commit.yml / ci-scripts.yml with very small diffs.

Runner choices

Job	Instance	OnD $/h	Why
Release linux_gcc_64	c8i.2xlarge	$0.375	Intel Granite Rapids, 16 GB, full-fat CPU
Release linux_gcc_arm64	c8g.2xlarge	$0.319	Graviton4, 16 GB, cheapest in the table
Test+Coverage linux_gcc_64 Debug	m8i.2xlarge (`volume=60gb`)	$0.423	32 GB headroom for tests + .gcda

The runners are deliberately On-Demand instances of the latest generation: earlier iterations on Spot c7i/m7i hit reclaims mid-build. The headline cost is ~$0.30-0.40 per Linux PR run across all three jobs, so the trade-off is dominated by engineer time rather than compute spend.
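As a rough cross-check of the per-PR figure, the sketch below multiplies each job's On-Demand rate by its cold-cache duration from the wall-clock table. It assumes billing proportional to job duration only; instance boot time, the EBS volume, S3 cache storage, and data transfer are ignored, so treat the result as a lower bound rather than an invoice.

```python
# (On-Demand $/h, cold-cache job duration in seconds), both taken from the
# tables in this description.
JOBS = {
    "Release linux_gcc_64": (0.375, 15 * 60 + 10),
    "Release linux_gcc_arm64": (0.319, 13 * 60 + 4),
    "Test+Coverage linux_gcc_64 Debug": (0.423, 20 * 60 + 21),
}


def job_cost(rate_per_hour: float, seconds: int) -> float:
    """Compute cost of one job, assuming cost scales linearly with duration."""
    return rate_per_hour * seconds / 3600.0


def pr_cost(jobs: dict[str, tuple[float, int]]) -> float:
    """Sum the cost of all jobs triggered by one PR run."""
    return sum(job_cost(rate, seconds) for rate, seconds in jobs.values())


for name, (rate, seconds) in JOBS.items():
    print(f"{name}: ${job_cost(rate, seconds):.3f}")
print(f"Total per run: ${pr_cost(JOBS):.2f}")
```

Under those assumptions the total comes out at about $0.31 per run, at the low end of the quoted range.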

Caching

extras=s3-cache on each runner label plus runs-on/action@v2 as the first step bootstraps RunsOn's "magic cache" — a sidecar that transparently intercepts every actions/cache@v5 call and redirects it to the S3 bucket provisioned by the stack. Existing cache calls for ccache, Qt SDK, GStreamer, pipx, apt packages, and CPM modules all work unchanged. runs-on/action@v2 is a no-op on GitHub-hosted runners so the workflow stays portable if anyone needs to revert.

Matrix cleanup

The build job's matrix used os: [ubuntu-24.04-arm, ubuntu-22.04] both as the runner label and as a discriminator for two size-analysis steps (if: matrix.os == 'ubuntu-22.04'). With RunsOn we don't need os to pick the runner. The dual-purpose field is gone and matrix.arch is the single discriminator. The two if: conditions changed from matrix.os == 'ubuntu-22.04' to matrix.arch == 'linux_gcc_64' which is what they always meant ("x86_64 only").

OS bump

The x86_64 Release build's host OS moves from Ubuntu 22.04 → Ubuntu 24.04. That bumps the AppImage's glibc baseline (22.04 ships glibc 2.35 vs. 24.04's 2.39), so AppImages produced here won't run on hosts with older glibc (RHEL 8, Ubuntu 20.04, Debian 11). If supporting older distros matters for QGC's release builds, the image= parameter in the runner label can be switched back to ubuntu22-full-x64 with no other workflow changes.
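For anyone who wants to check a target machine against the new baseline, here is a minimal sketch. It assumes the conservative rule that a host needs at least the glibc version of the build image; the function names are hypothetical, and a binary that happens not to use any newer glibc symbols could still run on an older host.

```python
import platform

# glibc version shipped by Ubuntu 24.04, per the paragraph above.
APPIMAGE_GLIBC_BASELINE = (2, 39)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '2.35' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def host_can_run(host_glibc: str, baseline: tuple[int, ...] = APPIMAGE_GLIBC_BASELINE) -> bool:
    """Conservative check: the host glibc must be at least the build baseline."""
    return parse_version(host_glibc) >= baseline


if __name__ == "__main__":
    # platform.libc_ver() returns e.g. ('glibc', '2.39') on glibc-based Linux
    # and empty strings where it cannot determine the C library.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(f"glibc {version}: compatible = {host_can_run(version)}")
    else:
        print("Could not determine a glibc version on this host")
```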

Test execution restructured

cmake/QGCTest.cmake:164 auto-attaches RESOURCE_LOCK "MockLink" to every Integration test because MockLink shares a LinkManager singleton and a static _nextVehicleSystemId counter. A single CTest invocation over both labels with --parallel auto silently serialized everything on that lock. Split into two passes:

Run Unit Tests (parallel): -L Unit, --parallel auto. 151 Unit tests with no shared state.
Run Integration Tests (serial): -L Integration, --parallel 1. 37 Integration tests serialize on shared MockLink state.

Each pass writes its own junit + ctest output; downstream Analyze / Report / Upload steps run once per pass. The coverage path picks up .gcda from both passes via the existing find . -name '*.gcda'. The split is responsible for most of the test job's wall-clock improvement (process-startup overhead across 151 Unit invocations was the hidden bottleneck, not test execution itself).
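The two passes can be sketched as a pair of CTest command lines. This is an illustration only, not the workflow's actual steps: the build directory and JUnit file names are hypothetical, and the `auto` value for `--parallel` is copied from the description as-is (it may be interpreted by a wrapper script rather than by ctest itself). The commands are only built here, not executed.

```python
def ctest_pass(label: str, parallel: str, build_dir: str = "build") -> list[str]:
    """Build the argv for one CTest pass restricted to a single label."""
    return [
        "ctest",
        "--test-dir", build_dir,        # hypothetical build directory
        "-L", label,                    # select tests by label
        "--parallel", parallel,         # "auto" or an explicit job count
        "--output-junit", f"junit-{label.lower()}.xml",  # hypothetical file name
        "--output-on-failure",
    ]


# Pass 1: Unit tests share no state, so they can run in parallel.
# Pass 2: Integration tests hold the MockLink lock, so run them one at a time.
PASSES = [
    ctest_pass("Unit", "auto"),
    ctest_pass("Integration", "1"),
]

for argv in PASSES:
    print(" ".join(argv))
```

Keeping one JUnit file per pass is what lets the downstream Analyze / Report / Upload steps run once per pass.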

The tester runner uses volume=60gb. The 40 GB default left only ~1-2 GB headroom at peak (Debug build + Qt SDK + caches + .gcda + scratch), which silently killed the runner agent before any diagnostic step could write to its log — caught after three identical failures by SSM'ing into the live runner.
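A cheap guard against a repeat of this failure mode would be to log free disk space early in the job. The sketch below is not part of this PR; the 5 GB threshold and the function names are assumptions chosen for illustration.

```python
import shutil


def free_gb(path: str = "/") -> float:
    """Free space on the filesystem containing `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def is_low(free: float, threshold_gb: float = 5.0) -> bool:
    """True when free space falls below the (hypothetical) warning threshold."""
    return free < threshold_gb


if __name__ == "__main__":
    free = free_gb(".")
    if is_low(free):
        # '::warning::' is the GitHub Actions workflow command for annotations.
        print(f"::warning::Only {free:.1f} GB free on the runner volume")
    else:
        print(f"{free:.1f} GB free on the runner volume")
```

Running a check like this before and after the build step would leave a trace in the job log even if the agent dies later.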

Companion fix: `QGCKeychain` headless D-Bus fallback

The first PR run failed SigningTest and QGCKeychainTest because the Ubuntu 24.04 image doesn't run a Secret Service daemon by default, and libsecret reports "Cannot autolaunch D-Bus without X11 $DISPLAY" instead of the patterns the existing isMissingSecretService() knew about. The fix recognizes the autolaunch / missing-session-bus error strings as the same "no backend" condition so the QSettings fallback kicks in, matching the behavior on macOS/Windows when no keychain is configured.

This is its own commit so it can be backed out independently if the keychain fix is contentious.
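The fallback logic can be illustrated in a few lines. This is a Python sketch of the idea, not the C++ in QGCKeychain.cc: the pattern list is assembled from the error strings quoted in this PR, the case-insensitive match is an assumption, and `keychain_write` / `fallback_store` are hypothetical stand-ins for QKeychain and QSettings.

```python
# Substrings that indicate "no Secret Service backend is reachable".
NO_BACKEND_PATTERNS = (
    "org.freedesktop.secrets",   # daemon not registered on the bus
    "ServiceUnknown",
    "Cannot autolaunch D-Bus",   # no session bus at all (headless host)
)


def is_missing_secret_service(error_text: str) -> bool:
    """True if the error text matches any known 'no backend' pattern."""
    lowered = error_text.lower()
    return any(pattern.lower() in lowered for pattern in NO_BACKEND_PATTERNS)


def write_secret(key, value, keychain_write, fallback_store) -> bool:
    """Try the keychain first; fall back to a plain store if no backend exists.

    keychain_write(key, value) is expected to return (ok, error_text).
    """
    ok, error_text = keychain_write(key, value)
    if ok:
        return True
    if is_missing_secret_service(error_text):
        fallback_store[key] = value  # stands in for the QSettings fallback
        return True
    return False  # a real keychain error is still reported as a failure
```

The important property is the last branch: only the "no backend" errors are rerouted, so genuine keychain failures still surface to the caller.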

Testing

The PR's own CI run validates:

Both Release builds complete on RunsOn (~13-15 min cold-cache, expect ~7-8 min warm after merge).
The QGCKeychain fallback lets SigningTest and QGCKeychainTest pass on headless Ubuntu 24.04.
Coverage report still produces valid coverage.xml, picking up .gcda from both Unit and Integration passes.

Related infra

RunsOn stack: qgc-ci-runs-on (CloudFormation, us-west-2)
GitHub App: Dronecode Infra, installed on mavlink/qgroundcontrol
Same Dronecode AWS account also runs px4-ci (RunsOn v2.12.6) for PX4-Autopilot

github-actions · 2026-05-12T21:04:34Z

Build Results

Platform Status

Platform	Status	Details
Linux	Passed	View
Windows	Passed	View
MacOS	Passed	View
Android	Passed	View

All builds passed.

Pre-commit

Check	Status	Details
pre-commit	Failed (non-blocking)	View

Pre-commit hooks: 4 passed, 45 failed, 7 skipped.

Test Results

linux-coverage-integration: 16 passed, 0 skipped
linux-coverage-unit: 74 passed, 0 skipped
Total: 90 passed, 0 skipped

Code Coverage

Coverage: 59.2%

No baseline available for comparison

Artifact Sizes

Artifact	Size
QGroundControl	216.86 MB
QGroundControl-aarch64	176.58 MB
QGroundControl-installer-AMD64	134.71 MB
QGroundControl-installer-AMD64-ARM64	77.33 MB
QGroundControl-installer-ARM64	106.06 MB
QGroundControl-linux	335.18 MB
QGroundControl-mac	187.14 MB
QGroundControl-windows	187.15 MB
QGroundControl-x86_64	188.59 MB
No baseline available for comparison

Updated: 2026-05-13 19:35:11 UTC • Triggered by: Android

isMissingSecretService() routes "no Secret Service reachable" errors through the QSettings fallback so reads and writes still succeed. It only matched libsecret's org.freedesktop.secrets / ServiceUnknown patterns, which covers a Secret Service daemon not being registered on the bus, but not the case where there is no session bus at all.

On a headless host (CI without dbus-launch / gnome-keyring, Docker, embedded test rigs) libsecret instead reports:

"Cannot autolaunch D-Bus without X11 $DISPLAY"

That is a QKeychain::OtherError that did not match the existing patterns and dropped through to the terminal error branch: QGCKeychain::write returned false and QGCKeychainTest / SigningTest failed.

Recognize the autolaunch and missing-session-bus messages as the same "no backend" condition so the fallback kicks in and callers see the behavior they already see on macOS/Windows when no keychain is configured.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>

Move the Linux build and debug-validation jobs from GitHub-hosted runners to RunsOn ephemeral EC2 runners (qgc-ci-runs-on stack in us-west-2, RunsOn v3.0.6). x86_64 builds now run on c8i.2xlarge, ARM64 on c8g.2xlarge, the Debug+coverage test job on m8i.2xlarge: all 8 vCPU, On-Demand, ubuntu24-full-* images.

Runner labels are inline in the workflow so this PR is self-contained (named runner profiles in .github/runs-on.yml require the config on the default branch first); the named profiles are kept in the tree for the next workflow migration to reuse.

Caching: extras=s3-cache + runs-on/action@v2 transparently redirect all existing actions/cache@v5 calls (ccache, Qt SDK, GStreamer, pipx, apt, CPM) to the S3 bucket provisioned by the stack. runs-on/action is a no-op on GitHub-hosted runners so the workflow stays portable.

Matrix cleanup: dropped the dual-purpose `matrix.os` field on the build job (it was both a runner selector and a discriminator for two size-analysis steps). matrix.arch is now the single discriminator; the previous `matrix.os == 'ubuntu-22.04'` conditions on lines 128 and 135 now correctly read `matrix.arch == 'linux_gcc_64'`.

Both architectures build on Ubuntu 24.04 (was 22.04 for x64, 24.04 for ARM). This bumps the AppImage glibc baseline from 2.35 to 2.39; older distros (RHEL 8, Ubuntu 20.04, Debian 11) won't run binaries produced here.

Test execution restructured by label. cmake/QGCTest.cmake:164 auto-attaches RESOURCE_LOCK "MockLink" to every Integration test because MockLink shares a LinkManager singleton and static _nextVehicleSystemId counter; a single CTest invocation over both labels with --parallel auto silently serialized everything on that lock. Split into two passes:

- Run Unit Tests (parallel): -L Unit, --parallel auto. 151 Unit tests with no shared state.
- Run Integration Tests (serial): -L Integration, --parallel 1. 37 Integration tests serialize on shared MockLink state.

Each pass writes its own junit + ctest output; downstream Analyze / Report / Upload steps run once per pass. Coverage path picks up .gcda from both passes via the existing find . -name '*.gcda'.

Tester runner uses volume=60gb (40GB default left ~1-2GB headroom at peak with the Debug build + Qt SDK + caches + .gcda + scratch, which silently killed the agent before any diagnostic could run).

Signed-off-by: Ramon Roche <mrpollo@gmail.com>

mrpollo · 2026-05-13T19:15:56Z

This is ready for review, @HTRamsey. Once we get this one in, I can help you migrate the rest, starting with Windows.

Copilot

Pull request overview

This PR migrates QGroundControl’s Linux CI workflow to RunsOn self-hosted ephemeral runners (EC2) to reduce build/test wall-clock time, while also adjusting test execution to avoid unintended serialization and improving QGCKeychain behavior on headless Linux images.

Changes:

Migrate .github/workflows/linux.yml from GitHub-hosted runners to RunsOn labels (with runs-on/action@v2) and clean up the build matrix to use arch as the discriminator.
Restructure the Debug test job to run Unit tests in a parallel pass and Integration tests in a serial pass, producing separate artifacts/reports per pass.
Expand Linux libsecret “no backend/no session bus” detection in QGCKeychain so headless CI falls back to QSettings instead of failing.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
`src/Utilities/Platform/QGCKeychain.cc`	Broadens detection of headless/missing Secret Service conditions to trigger the existing QSettings fallback.
`.github/workflows/linux.yml`	Switches Linux jobs to RunsOn runners + magic cache and splits Unit vs Integration test execution/reporting.
`.github/runs-on.yml`	Adds RunsOn runner definitions (family/image/extras/volume) for the repository.

github-actions Bot added the github_actions, RN: IMPROVEMENT, and size/XS labels May 12, 2026

github-actions Bot added CMake Tools size/S size/XS and removed size/XS CMake Tools size/S labels May 13, 2026

mrpollo added 2 commits May 13, 2026 11:44

mrpollo force-pushed the mrpollo/runs-on-linux-ci branch from 5ac0ef2 to c5de6f3 Compare May 13, 2026 18:45

mrpollo marked this pull request as ready for review May 13, 2026 19:15

mrpollo requested a review from HTRamsey as a code owner May 13, 2026 19:15

Copilot AI review requested due to automatic review settings May 13, 2026 19:15

Copilot started reviewing on behalf of mrpollo May 13, 2026 19:16 View session

Copilot AI reviewed May 13, 2026

mrpollo wants to merge 2 commits into master from mrpollo/runs-on-linux-ci
