ci: migrate Linux workflow to RunsOn self-hosted runners #14367
Build Results
Platform Status
All builds passed.
Pre-commit
Pre-commit hooks: 4 passed, 45 failed, 7 skipped.
Test Results
linux-coverage-integration: 16 passed, 0 skipped
Code Coverage
Coverage: 59.2%
No baseline available for comparison
Artifact Sizes
Updated: 2026-05-13 19:35:11 UTC • Triggered by: Android
isMissingSecretService() routes "no Secret Service reachable" errors
through the QSettings fallback so reads and writes still succeed. It
only matched libsecret's org.freedesktop.secrets / ServiceUnknown
patterns, which covers a Secret Service daemon not being registered on
the bus, but not the case where there is no session bus at all.
On a headless host (CI without dbus-launch / gnome-keyring, Docker,
embedded test rigs) libsecret instead reports:
"Cannot autolaunch D-Bus without X11 \$DISPLAY"
That's a QKeychain::OtherError that did not match the existing
patterns, so it dropped through to the terminal error branch:
QGCKeychain::write returned false and QGCKeychainTest / SigningTest
failed.
Recognize the autolaunch and missing-session-bus messages as the same
"no backend" condition so the fallback kicks in and callers see the
behavior they already see on macOS/Windows when no keychain is
configured.
Signed-off-by: Ramon Roche <mrpollo@gmail.com>
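The classification this commit describes can be modeled roughly as follows. This is an illustrative Python sketch, not the C++ in `src/Utilities/Platform/QGCKeychain.cc`: the function name mirrors the commit text, while `write_secret`, `keychain_write`, the dict standing in for QSettings, and the second session-bus pattern are assumptions made for the example.

```python
# Illustrative model only; the real check is C++ in QGCKeychain.cc.

# Patterns the commit says were already recognized.
SECRET_SERVICE_PATTERNS = ("org.freedesktop.secrets", "ServiceUnknown")

# Patterns for "no session bus at all". The first is quoted in the commit
# message; the second is an assumed example, not taken from the PR.
SESSION_BUS_PATTERNS = (
    "Cannot autolaunch D-Bus without X11",
    "Unable to autolaunch a dbus-daemon",  # assumption
)


def is_missing_secret_service(error_text):
    """Return True when the error means no keychain backend is reachable."""
    lowered = error_text.lower()
    patterns = SECRET_SERVICE_PATTERNS + SESSION_BUS_PATTERNS
    return any(pattern.lower() in lowered for pattern in patterns)


def write_secret(key, value, keychain_write, fallback_store):
    """Try the keychain first; fall back to a plain store if no backend exists.

    keychain_write is a hypothetical callable returning (ok, error_text).
    fallback_store is a dict standing in for QSettings.
    """
    ok, error_text = keychain_write(key, value)
    if ok:
        return True
    if is_missing_secret_service(error_text):
        fallback_store[key] = value  # "no backend" condition: use the fallback
        return True
    return False  # any other error stays a terminal failure
```

Matching on substrings keeps the check tolerant of surrounding text in the libsecret message, at the cost of depending on wording that could change between library versions.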
Move the Linux build and debug-validation jobs from GitHub-hosted
runners to RunsOn ephemeral EC2 runners (qgc-ci-runs-on stack in
us-west-2, RunsOn v3.0.6). x86_64 builds now run on c8i.2xlarge,
ARM64 on c8g.2xlarge, the Debug+coverage test job on m8i.2xlarge —
all 8 vCPU, On-Demand, ubuntu24-full-* images.
Runner labels are inline in the workflow so this PR is self-contained
(named runner profiles in .github/runs-on.yml require the config on
the default branch first); the named profiles are kept in the tree
for the next workflow migration to reuse.
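As a rough illustration of what an inline runner label looks like, the sketch below assembles one from key/value pairs. The `runs-on=<run id>/key=value/...` shape and the `family`, `image`, `extras`, and `volume` keys come from this PR's text; the helper name `runs_on_label` and the exact image name `ubuntu24-full-x64` are assumptions, and the actual strings in `.github/workflows/linux.yml` may differ.

```python
def runs_on_label(run_id, **options):
    """Join a run id and key=value options into a single RunsOn-style label."""
    parts = ["runs-on=" + run_id]
    parts += ["{}={}".format(key, value) for key, value in options.items()]
    return "/".join(parts)


# GitHub expands the expression at workflow evaluation time; here it is
# just an opaque string.
RUN_ID = "${{ github.run_id }}"

# x86_64 Release build runner (image name is an assumption).
x64_build = runs_on_label(
    RUN_ID,
    family="c8i.2xlarge",
    image="ubuntu24-full-x64",
    extras="s3-cache",
)

# Debug+coverage tester with the larger root volume described in this commit.
tester = runs_on_label(
    RUN_ID,
    family="m8i.2xlarge",
    image="ubuntu24-full-x64",
    extras="s3-cache",
    volume="60gb",
)

print(x64_build)
print(tester)
```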
Caching: extras=s3-cache + runs-on/action@v2 transparently redirect
all existing actions/cache@v5 calls (ccache, Qt SDK, GStreamer, pipx,
apt, CPM) to the S3 bucket provisioned by the stack. runs-on/action
is a no-op on GitHub-hosted runners so the workflow stays portable.
Matrix cleanup: dropped the dual-purpose `matrix.os` field on the
build job (it was both a runner selector and a discriminator for two
size-analysis steps). matrix.arch is now the single discriminator;
the previous `matrix.os == 'ubuntu-22.04'` conditions on lines 128
and 135 now correctly read `matrix.arch == 'linux_gcc_64'`.
Both architectures build on Ubuntu 24.04 (was 22.04 for x64, 24.04
for ARM). This bumps the AppImage glibc baseline from 2.35 to 2.39;
older distros (RHEL 8, Ubuntu 20.04, Debian 11) won't run binaries
produced here.
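The compatibility consequence can be checked with a small version comparison. This is a simplified sketch: it treats the build host's glibc as the minimum the binary needs, whereas a real binary only requires the newest glibc symbol version it actually references, so the result is conservative. The glibc versions listed for Ubuntu 20.04, Debian 11, and RHEL 8 are the stock distribution values and are not stated in the PR.

```python
def parse_version(text):
    """Turn '2.39' into (2, 39) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def can_run(host_glibc, baseline="2.39"):
    """True if the host glibc is at least the build baseline."""
    return parse_version(host_glibc) >= parse_version(baseline)


# glibc shipped by each distribution (22.04 and 24.04 values are from the PR).
HOSTS = {
    "Ubuntu 24.04": "2.39",
    "Ubuntu 22.04": "2.35",
    "Ubuntu 20.04": "2.31",
    "Debian 11": "2.31",
    "RHEL 8": "2.28",
}

for name, version in HOSTS.items():
    print(name, "ok" if can_run(version) else "too old")
```

Comparing tuples rather than strings matters here: as strings, "2.4" would sort after "2.39". On a Linux host, `platform.libc_ver()` from the standard library reports the local glibc version that could be fed into `can_run`.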
Test execution restructured by label. cmake/QGCTest.cmake:164
auto-attaches RESOURCE_LOCK "MockLink" to every Integration test
because MockLink shares a LinkManager singleton and static
_nextVehicleSystemId counter; a single CTest invocation over both
labels with --parallel auto silently serialized everything on that
lock. Split into two passes:
- Run Unit Tests (parallel): -L Unit, --parallel auto. 151 Unit
  tests with no shared state.
- Run Integration Tests (serial): -L Integration, --parallel 1.
  37 Integration tests serialize on shared MockLink state.
Each pass writes its own junit + ctest output; downstream Analyze /
Report / Upload steps run once per pass. Coverage path picks up
.gcda from both passes via the existing find . -name '*.gcda'.
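The two passes described above might be expressed as two argument lists, one per label. This is a sketch under assumptions: the `-L` and `--parallel` values mirror the commit text as written (the workflow may pass them through a wrapper), `--output-junit` is CTest's JUnit option in CMake 3.21 and later, and the file names are hypothetical.

```python
def ctest_pass(label, parallel, junit_path):
    """Build the argument list for one CTest pass restricted to a label."""
    return [
        "ctest",
        "-L", label,                   # select tests by label
        "--parallel", parallel,        # job count for this pass
        "--output-junit", junit_path,  # one report per pass
    ]


PASSES = [
    # Unit tests share no state, so they can run concurrently.
    ctest_pass("Unit", "auto", "junit-unit.xml"),
    # Integration tests hold the MockLink resource lock, so run them serially.
    ctest_pass("Integration", "1", "junit-integration.xml"),
]

for command in PASSES:
    print(" ".join(command))
```

Keeping a separate report path per pass is what lets the downstream Analyze / Report / Upload steps run once per pass without overwriting each other's output.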
Tester runner uses volume=60gb (40GB default left ~1-2GB headroom
at peak with the Debug build + Qt SDK + caches + .gcda + scratch,
which silently killed the agent before any diagnostic could run).
Signed-off-by: Ramon Roche <mrpollo@gmail.com>
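Because the disk-exhaustion failure described in this commit killed the runner agent before any diagnostic could run, an early headroom check is one way to surface the problem while logs can still be written. The sketch below is not part of the PR; the 5 GiB threshold and the function names are assumptions chosen for illustration.

```python
import shutil

GIB = 1024 ** 3


def free_gib(path="."):
    """Free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / GIB


def has_headroom(free, minimum_gib=5.0):
    """True if the free space meets the (hypothetical) minimum threshold."""
    return free >= minimum_gib


if __name__ == "__main__":
    available = free_gib(".")
    print("free: {:.1f} GiB".format(available))
    if not has_headroom(available):
        raise SystemExit("low disk headroom; consider a larger volume= value")
```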
5ac0ef2 to c5de6f3
This is ready for review, @HTRamsey. Once we get this one in, I can help you migrate the rest, starting with Windows.
Pull request overview
This PR migrates QGroundControl’s Linux CI workflow to RunsOn self-hosted ephemeral runners (EC2) to reduce build/test wall-clock time, while also adjusting test execution to avoid unintended serialization and improving QGCKeychain behavior on headless Linux images.
Changes:
- Migrate `.github/workflows/linux.yml` from GitHub-hosted runners to RunsOn labels (with `runs-on/action@v2`) and clean up the build matrix to use `arch` as the discriminator.
- Restructure the Debug test job to run Unit tests in a parallel pass and Integration tests in a serial pass, producing separate artifacts/reports per pass.
- Expand Linux libsecret “no backend/no session bus” detection in `QGCKeychain` so headless CI falls back to QSettings instead of failing.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/Utilities/Platform/QGCKeychain.cc` | Broadens detection of headless/missing Secret Service conditions to trigger the existing QSettings fallback. |
| `.github/workflows/linux.yml` | Switches Linux jobs to RunsOn runners + magic cache and splits Unit vs Integration test execution/reporting. |
| `.github/runs-on.yml` | Adds RunsOn runner definitions (family/image/extras/volume) for the repository. |
This PR migrates the Linux workflow (`.github/workflows/linux.yml`) from GitHub-hosted runners to RunsOn self-hosted ephemeral EC2 runners. Runners are backed by the new `qgc-ci-runs-on` CloudFormation stack in the Dronecode AWS account (us-west-2, RunsOn v3.0.6). x86_64 builds run on c8i.2xlarge, ARM64 on c8g.2xlarge, and the Debug+coverage test job on m8i.2xlarge; all are 8 vCPU, On-Demand, `ubuntu24-full-*` images.
Wall-clock impact
Measured against the most recent successful GitHub-hosted run on master vs. the equivalent run on this branch:
Caches are still cold on this run: no master push has populated the Ubuntu 24.04 warm caches yet. Once this merges, the next push to master populates `ccache-linux-…-shared-`, `cpm-modules-shared-`, `pipx-linux-…`, and `qt-linux-desktop-…` for everyone, and PR runs should drop another 5-8 min off both Release builds.
Context
PX4-Autopilot has been running on RunsOn for over a year in the same account; this migration brings QGC onto the same infrastructure. Once this merges, the same pattern (inline `runs-on=` labels plus the `runs-on/action@v2` step) can be applied to `analysis.yml`, `pr-checks.yml`, and the Linux portions of `pre-commit.yml` / `ci-scripts.yml` with very small diffs.
Runner choices
volume=60gb)
On-Demand on the latest generation, deliberately. Earlier iterations on Spot c7i/m7i hit reclaims mid-build. The headline cost is ~$0.30-0.40 per Linux PR across all three jobs; the choice is driven by engineer time rather than compute spend.
Caching
`extras=s3-cache` on each runner label plus `runs-on/action@v2` as the first step bootstraps RunsOn's "magic cache", a sidecar that transparently intercepts every `actions/cache@v5` call and redirects it to the S3 bucket provisioned by the stack. Existing cache calls for ccache, Qt SDK, GStreamer, pipx, apt packages, and CPM modules all work unchanged. `runs-on/action@v2` is a no-op on GitHub-hosted runners so the workflow stays portable if anyone needs to revert.
Matrix cleanup
The build job's matrix used `os: [ubuntu-24.04-arm, ubuntu-22.04]` both as the runner label and as a discriminator for two size-analysis steps (`if: matrix.os == 'ubuntu-22.04'`). With RunsOn we don't need `os` to pick the runner. The dual-purpose field is gone and `matrix.arch` is the single discriminator. The two `if:` conditions changed from `matrix.os == 'ubuntu-22.04'` to `matrix.arch == 'linux_gcc_64'`, which is what they always meant ("x86_64 only").
OS bump
The x86_64 Release build's host OS moves from Ubuntu 22.04 → Ubuntu 24.04. That bumps the AppImage's glibc baseline (22.04 ships glibc 2.35 vs. 24.04's 2.39), so AppImages produced here won't run on hosts with older glibc (RHEL 8, Ubuntu 20.04, Debian 11). If supporting older distros matters for QGC's release builds, the `image=` parameter in the runner label can be switched back to `ubuntu22-full-x64` with no other workflow changes.
Test execution restructured
`cmake/QGCTest.cmake:164` auto-attaches `RESOURCE_LOCK "MockLink"` to every Integration test because MockLink shares a `LinkManager` singleton and a static `_nextVehicleSystemId` counter. A single CTest invocation over both labels with `--parallel auto` silently serialized everything on that lock. Split into two passes:
- Run Unit Tests (parallel): `-L Unit`, `--parallel auto`. 151 Unit tests with no shared state.
- Run Integration Tests (serial): `-L Integration`, `--parallel 1`. 37 Integration tests serialize on shared MockLink state.
Each pass writes its own junit + ctest output; downstream Analyze / Report / Upload steps run once per pass. The coverage path picks up `.gcda` from both passes via the existing `find . -name '*.gcda'`. The split is responsible for most of the test job's wall-clock improvement (process-startup overhead across 151 Unit invocations was the hidden bottleneck, not test execution itself).
The tester runner uses `volume=60gb`. The 40 GB default left only ~1-2 GB headroom at peak (Debug build + Qt SDK + caches + .gcda + scratch), which silently killed the runner agent before any diagnostic step could write to its log; this was caught after three identical failures by SSM'ing into the live runner.
Companion fix: `QGCKeychain` headless D-Bus fallback
The first PR run failed `SigningTest` and `QGCKeychainTest` because the Ubuntu 24.04 image doesn't run a Secret Service daemon by default, and libsecret reports `"Cannot autolaunch D-Bus without X11 \$DISPLAY"` instead of the patterns the existing `isMissingSecretService()` knew about. Recognize the autolaunch / missing-session-bus error strings as the same "no backend" condition so the QSettings fallback kicks in, matching the behavior on macOS/Windows when no keychain is configured.
This is its own commit so it can be backed out independently if the keychain fix is contentious.
Testing
The PR's own CI run validates:
- `SigningTest` and `QGCKeychainTest` pass on headless Ubuntu 24.04.
- `coverage.xml`, picking up `.gcda` from both Unit and Integration passes.
Related infra
- `qgc-ci-runs-on` (CloudFormation, us-west-2)
- Dronecode Infra, installed on `mavlink/qgroundcontrol`
- `px4-ci` (RunsOn v2.12.6) for PX4-Autopilot