Add performance testing infrastructure by chall37 · Pull Request #569 · gnachman/iTerm2

chall37 · 2026-01-27T18:25:06Z

Summary

Adds a tools/perf/ directory with scripts for stress testing and profiling iTerm2.

What's Included

run_stress_test.sh - Main test harness that opens iTerm2, creates multiple tabs with stress loads, runs the macOS sample profiler, and analyzes results
stress_load.py - Unified load generator with terminal output stress modes and dashboard/UI modes
analyze_profile.py - Parses sample profiler output to identify hotspots
analyze_self_time.py - Parses DTrace self-time output for actionable hotspots
DTrace scripts - For collecting frame rate, UX metrics, and self-time profiling (requires root)

Suite Isolation

The harness uses --suite= to isolate test preferences from your normal iTerm2 settings:

--suite=com.iterm2.defaults (default) - Clean empty suite for baseline testing
--suite=com.iterm2.fairness - Enables the fairness scheduler for testing PR #568 (Add FairnessScheduler and TokenExecutor integration)
--suite=user - Uses your normal preferences
--suite=none - Disables suite isolation

Suite presets are stored in tools/perf/suites/.

Modes

Terminal output stress (unthrottled):

normal - Mixed patterns (Unicode, ANSI, bidi, emoji)
buffer - Long lines (~600 chars)
clearcodes - Includes screen clear/erase sequences
flood - Maximum throughput via yes

Dashboard/UI stress (throttled by --fps):

htop - CPU meters + scrolling process list
watch - Full-screen clear + redraw
progress - 20 progress bars updating in place
table - Fixed header + scroll region body
status - Grid of color-coded service status cells
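
The dashboard modes above are described as throttled by --fps. A minimal sketch of one way such frame pacing could work is shown here; the function names and the injectable clock/sleep parameters are hypothetical and are not taken from stress_load.py.

```python
import time
from typing import Callable, Iterator


def frame_deadlines(fps: float, frames: int, start: float) -> Iterator[float]:
    """Yield the target time of each frame for a fixed frame rate."""
    interval = 1.0 / fps
    for i in range(frames):
        # Deriving each deadline from the start time avoids accumulated drift.
        yield start + i * interval


def run_throttled(draw: Callable[[int], None], fps: float, frames: int,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Call draw(i) for each frame, never faster than fps frames per second."""
    start = clock()
    for i, deadline in enumerate(frame_deadlines(fps, frames, start)):
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)  # wait until this frame is due
        draw(i)
```

If a frame takes longer than its interval to draw, this sketch does not sleep and simply draws the next frame late; whether the real generator skips or delays frames is not stated in the description.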

Special:

all - Runs all 8 modes sequentially in one test

Modes can be comma-separated: --mode=htop,watch,progress
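
As a rough illustration of the comma-separated --mode= syntax, the sketch below validates a mode string against the mode names listed above. The function name parse_modes is hypothetical, and two behaviors are assumptions: "all" is kept as a sentinel for the harness to expand, and combining "all" with other names is rejected.

```python
from typing import List

TERMINAL_MODES = ["normal", "buffer", "clearcodes", "flood"]
DASHBOARD_MODES = ["htop", "watch", "progress", "table", "status"]
KNOWN_MODES = TERMINAL_MODES + DASHBOARD_MODES


def parse_modes(value: str) -> List[str]:
    """Split a --mode= value into validated mode names, preserving order."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    if not names:
        raise ValueError("no mode given")
    if "all" in names:
        # Assumption: "all" must stand alone; its expansion is left to the harness.
        if names != ["all"]:
            raise ValueError("'all' cannot be combined with other modes")
        return ["all"]
    unknown = [name for name in names if name not in KNOWN_MODES]
    if unknown:
        raise ValueError("unknown mode(s): " + ", ".join(unknown))
    return names
```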

Usage Examples

# Basic test with 10 tabs, 20 seconds
./tools/perf/run_stress_test.sh /path/to/iTerm2.app

# Compare performance across tab counts
./tools/perf/run_stress_test.sh --tabs=1,5,10 /path/to/iTerm2.app

# Dashboard modes at 120fps
./tools/perf/run_stress_test.sh --mode=htop,watch,progress --fps=120 -t 30 /path/to/iTerm2.app

# Self-time profiling (requires sudo)
sudo ./tools/perf/run_stress_test.sh --self-time /path/to/iTerm2.app

# All modes in one run
./tools/perf/run_stress_test.sh --mode=all -t 80 /path/to/iTerm2.app

Notes

All perf scripts are self-contained in tools/perf/
DTrace features require root privileges
Power/energy warnings are displayed when not running under optimal conditions

Multi-tab stress test harness with profiling and analysis:
- run_multi_tab_stress_test.sh: Main harness with tab count sweeps, synchronized startup, DTrace metrics, and interaction injection
- stress_load.py: Terminal output generator with multiple modes (normal, buffer, clearcodes, flood) for exercising text processing
- load_dashboard.py: Dashboard-style load generator (htop, watch, progress, table, status modes) for cursor positioning stress
- analyze_profile.py: Profiler output analysis for hotspot detection
- tmux_wrapper.sh: Auto-cleanup tmux session management
- DTrace scripts for frame rate and UX metrics collection

- Add tick-based self-termination to dtrace script using $1 positional parameter for duration, ensuring clean exit with END block output
- Fix dtrace invocation order: -p before -s so duration arg is not parsed as a flag
- Add dtrace attach detection by polling for "Tracing" in output
- Signal dtrace with SIGINT before quitting iTerm2 to flush output
- Remove window close from AppleScript; main script controls shutdown
- Use /tmp for sync dir with chmod 777 for root dtrace access
- Capture build metadata (version, commit, branch, date) from app plist
- Add sample test output demonstrating working dtrace metrics
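
The invocation order and attach detection described in this commit can be sketched roughly as follows. This is an illustration in Python rather than the harness's shell code; dtrace_argv, wait_for_attach, and the read_output callback are hypothetical names, and the timeout and poll interval are arbitrary.

```python
import time
from typing import Callable, List


def dtrace_argv(pid: int, script: str, duration_s: int) -> List[str]:
    """Build a dtrace command line with -p before -s.

    The trailing value is a positional argument, available as $1 in the script.
    """
    return ["dtrace", "-p", str(pid), "-s", script, str(duration_s)]


def wait_for_attach(read_output: Callable[[], str],
                    timeout_s: float = 10.0,
                    poll_s: float = 0.1,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll captured dtrace output until "Tracing" appears or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if "Tracing" in read_output():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Here read_output is expected to return everything dtrace has printed so far, for example by re-reading a log file that the harness redirects dtrace output into.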

gnachman · 2026-01-30T06:09:42Z

This looks amazing!

One recommendation is for the automated tests to revert to a default configuration. Doing defaults delete com.googlecode.iterm2 prior to app launch will take care of it. I can't tell you how many hours I've wasted chasing down performance issues that turned out to be a config change.

chall37 · 2026-01-30T08:37:30Z

That's very wise. I spent the better part of a day last week chasing down seemingly random performance degradation/regression. "How is this possibly slower now?!? Am I testing the right build? Did it actually rebuild? Did I even re-run the build?!?" Eventually I noticed that I was comparing tests while on battery to tests while plugged in. (And this despite using the "High Power" energy mode, so maybe that's just a software-based Turbo button after all.)

Actually, I should probably capture the power state info as well, if that's possible, and at least display a warning when unplugged.

…warnings to stress test

Debug build command line flags (ITERM_DEBUG only):
- Add --config=<path> to load preferences from custom path or URL
- Add --use-default-config to skip custom preferences and use defaults
- Flags are mutually exclusive; help text documents both options

Stress test improvements:
- Check iTerm2 not running at script start (fail fast)
- Detect power source (AC/Battery/UPS) using pmset
- Detect energy mode per-power-source (High Power/Automatic/Low Power)
- Warn if not on AC Power, not in High Power mode, or using custom config
- Display warnings in both header and footer for visibility
- Show specific issues detected with tailored recommendations
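
For readers unfamiliar with pmset, the power-source check might look something like the sketch below. It assumes that the first line of `pmset -g batt` has the form "Now drawing from 'AC Power'"; the function names and warning text are hypothetical, and energy-mode detection is omitted because the commit message does not say how the harness reads it.

```python
import re
import subprocess
from typing import Optional

_SOURCE_RE = re.compile(r"Now drawing from '([^']+)'")


def parse_power_source(pmset_output: str) -> Optional[str]:
    """Extract the power source name (e.g. 'AC Power') from pmset -g batt output."""
    match = _SOURCE_RE.search(pmset_output)
    return match.group(1) if match else None


def current_power_source() -> Optional[str]:
    """Run pmset (macOS only) and return the current power source, if found."""
    result = subprocess.run(["pmset", "-g", "batt"],
                            capture_output=True, text=True, check=False)
    return parse_power_source(result.stdout)


def power_warning(source: Optional[str]) -> Optional[str]:
    """Return a warning string when conditions are not ideal for benchmarking."""
    if source is None:
        return "Could not determine power source"
    if source != "AC Power":
        return "Running on %s; results may not be comparable to AC runs" % source
    return None
```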

chall37 · 2026-01-30T21:51:24Z

I don't use a VM or sandbox for my dev env, so to avoid blowing away my own config (and potentially other devs'), I made the following changes:

iTerm app:

Added --use-default-config command line arg
Added --config= as well, mutually exclusive with the above, while I was at it. Could be useful for automated integration tests in the future.

These are both gated behind ITERM_DEBUG, because it creates a rather opaque vulnerability otherwise... Can always remove the gate for one or both down the road, if there's a use case.

Harness (run_multi_tab_stress_test.sh)

Now passes the new --use-default-config to iTerm by default (harmlessly ignored by builds that don't support it). Can explicitly set --use-default-config=false if desired.
Now passes through --config=[path/to/config] to iTerm2 when present
Belt and suspenders session cleanup attempts for tmux when --tmux is used (uses -d to invoke stress_load.py, appends 'exit' after the invocation, harness checks for any dangling sessions at cleanup).

gnachman · 2026-02-04T23:14:08Z

I've noticed that the most useful output in Instruments time profiler is not the hottest symbols (which will be objc_msgSend or something high up in the call stack that we don't control) but the symbols that spend the most time not in descendants (i.e., the ones that actually burn CPU themselves). I don't know how hard it would be to do that with DTrace but it'd make the output much more actionable.

gnachman · 2026-02-04T23:12:38Z

+    }
+
+    // Apply --use-default-config
+    if (useDefaultConfig) {


This isn't going to work right - all it does is disable loading prefs from custom folder. Commit 25a1933 adds --suite=<reverse domain name>, which safely uses a different User Defaults suite. I'd get rid of --use-default-config and make sure your testing script uses something like --suite=com.iterm2.perf

Wow, 25a1933 is quite a refactor! Very helpful though. I'll undo my --config and --use-default-config changes, and pass through --suite instead.

…ler probes

Address PR gnachman#569 review comments:
- Remove --use-default-config/--config= (replaced by gnachman's --suite=)
- Add --self-time profiling with analyze_self_time.py for actionable hotspots
- Add suite presets (com.iterm2.defaults, com.iterm2.fairness)
- Add FairnessScheduler probes to DTrace scripts

…fy headers

- Use current_frames[0] for leaf (DTrace prints leaf first)
- Add is_iterm2_module() to normalize module names (handles paths, arch suffixes)
- Fix regex to handle decimal offsets (+123) not just hex (+0x123)
- Rename headers to be explicit about what each section shows:
  - 'iTerm2 CALLERS' for attributed stacks (which iTerm2 code drives cost)
  - 'SYSTEM FUNCTIONS (exclusive leaf samples)' for raw leaf time
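
To make the frame-parsing fixes concrete, here is a minimal sketch of parsing a DTrace user stack frame of the form module`symbol+offset, accepting both hex and decimal offsets. It is not the code from analyze_self_time.py: the regex, the list of arch suffixes, and the "iTerm2" module name are assumptions, and only is_iterm2_module() is a name taken from the commit message.

```python
import os
import re
from typing import Optional, Tuple

# module`symbol, optionally followed by +0x1c (hex) or +28 (decimal)
_FRAME_RE = re.compile(
    r"^\s*(?P<module>[^`]+)`(?P<symbol>.+?)"
    r"(?:\+(?P<offset>0x[0-9a-fA-F]+|\d+))?\s*$"
)
_ARCH_SUFFIXES = (".arm64e", ".arm64", ".x86_64")  # assumed suffix set


def normalize_module(name: str) -> str:
    """Strip any directory path and a trailing arch suffix from a module name."""
    base = os.path.basename(name.strip())
    for suffix in _ARCH_SUFFIXES:
        if base.endswith(suffix):
            return base[: -len(suffix)]
    return base


def is_iterm2_module(name: str) -> bool:
    """True when the normalized module name is the app binary (assumed 'iTerm2')."""
    return normalize_module(name) == "iTerm2"


def parse_frame(line: str) -> Optional[Tuple[str, str, Optional[int]]]:
    """Return (module, symbol, offset) for a frame line, or None if it does not match."""
    match = _FRAME_RE.match(line)
    if match is None:
        return None
    raw = match.group("offset")
    offset = None
    if raw is not None:
        offset = int(raw, 16) if raw.startswith("0x") else int(raw)
    return normalize_module(match.group("module")), match.group("symbol"), offset
```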

Iterative improvements based on code review:

DTrace output (iterm_self_time.d):
- Use explicit section markers (===SELF_TIME===, ===STACKS===) for robust parsing
- Add header section with duration and sample rate metadata
- Increase limits: 50 self-time entries, 20 stacks

Analysis heuristics (analyze_self_time.py):
- Normalize module names at ingest to prevent bucket splitting
- Add is_system_module() for module-based classification (more reliable)
- Remove NS/CF prefix heuristics (redundant, high false-positive risk)
- Track unattributed stacks and report coverage honestly
- Remove misleading cross-aggregation percentage column

The self-time vs attributed-caller distinction is now clearer:
- Raw self-time shows where CPU actually executes (leaf functions)
- Attributed callers show which iTerm2 code drives that work
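
Explicit section markers make the DTrace output easy to split before any further parsing. The sketch below shows one way to do that; the function name split_sections is hypothetical, and collecting any lines before the first marker under "HEADER" is an assumption about how the header section is laid out.

```python
import re
from typing import Dict, List

_MARKER_RE = re.compile(r"^===([A-Z_]+)===$")


def split_sections(text: str) -> Dict[str, List[str]]:
    """Group output lines under the most recent ===NAME=== marker."""
    sections: Dict[str, List[str]] = {"HEADER": []}
    current = "HEADER"  # assumption: lines before the first marker are header lines
    for line in text.splitlines():
        match = _MARKER_RE.match(line.strip())
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections
```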

chall37 · 2026-02-05T08:13:32Z

> I've noticed that the most useful output in Instruments time profiler is not the hottest symbols (which will be objc_msgSend or something high up in the call stack that we don't control) but the symbols that spend the most time not in descendants (i.e., the ones that actually burn CPU themselves). I don't know how hard it would be to do that with DTrace but it'd make the output much more actionable.

I added a --self-time flag to approximate Instruments behavior using dtrace. It's not perfect, but should provide a degree of attribution to iTerm leaves on the call stack. Can refine over time.
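
The distinction between raw self-time and attributed callers can be illustrated with a small sketch. It assumes samples arrive as (stack, count) pairs with the leaf frame first, as DTrace prints them, and that the app module is named "iTerm2"; the function name and data shapes are hypothetical rather than what analyze_self_time.py actually uses.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Frame = Tuple[str, str]  # (module, symbol)


def summarize(samples: Iterable[Tuple[List[Frame], int]],
              app_module: str = "iTerm2") -> Tuple[Counter, Counter, int]:
    """Return (self_time, attributed_callers, unattributed_count)."""
    self_time: Counter = Counter()
    callers: Counter = Counter()
    unattributed = 0
    for stack, count in samples:
        if not stack:
            continue
        # Self time: the leaf frame is where the CPU was actually executing.
        self_time[stack[0]] += count
        # Attribution: the app frame nearest the leaf is charged for the work.
        owner = next((f for f in stack if f[0] == app_module), None)
        if owner is None:
            unattributed += count
        else:
            callers[owner] += count
    return self_time, callers, unattributed
```

With this shape, a sample whose leaf is objc_msgSend still counts toward objc_msgSend in the self-time table, while its cost is charged to the nearest iTerm2 frame in the callers table; stacks with no iTerm2 frame are counted separately so that coverage can be reported.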

gnachman · 2026-02-16T03:10:14Z

Merged as 60185ea with improvements in d205d5d. And it found a horrific bug, which was fixed in cac8c18 so already #winning

chall37 added 2 commits January 27, 2026 10:22

AI, unslop theyself

45bb195

chall37 marked this pull request as ready for review January 27, 2026 22:01

chall37 added 2 commits January 30, 2026 11:47

fixed tmux wrapping

14c21e2

chall37 mentioned this pull request Feb 3, 2026

Add FairnessScheduler and TokenExecutor integration #568

Closed

gnachman reviewed Feb 4, 2026

chall37 added 6 commits February 4, 2026 16:04

Merge gnachman/master: Add --suite= for UserDefaults isolation

ddc6dff

Rename run_multi_tab_stress_test.sh to run_stress_test.sh

61e321b

Remove interpretation guide section from self-time analysis

5f6ad20

gnachman closed this Feb 16, 2026

chall37 wants to merge 11 commits into gnachman:master from chall37:feat/perf-tests