Add performance testing infrastructure#569

Closed
chall37 wants to merge 11 commits into gnachman:master from chall37:feat/perf-tests

chall37 (Contributor) commented Jan 27, 2026

Summary

Adds a tools/perf/ directory with scripts for stress testing and profiling iTerm2.

What's Included

  • run_stress_test.sh - Main test harness that opens iTerm2, creates multiple tabs with stress loads, runs the macOS sample profiler, and analyzes results
  • stress_load.py - Unified load generator with terminal output stress modes and dashboard/UI modes
  • analyze_profile.py - Parses sample profiler output to identify hotspots
  • analyze_self_time.py - Parses DTrace self-time output for actionable hotspots
  • DTrace scripts - For collecting frame rate, UX metrics, and self-time profiling (requires root)

Suite Isolation

The harness uses --suite= to isolate test preferences from your normal iTerm2 settings:

  • --suite=com.iterm2.defaults (default) - Clean empty suite for baseline testing
  • --suite=com.iterm2.fairness - Enables the fairness scheduler, for testing PR #568 (Add FairnessScheduler and TokenExecutor integration)
  • --suite=user - Uses your normal preferences
  • --suite=none - Disables suite isolation

Suite presets are stored in tools/perf/suites/.
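As a rough illustration of how a harness might turn the --suite values above into an iTerm2 launch, here is a minimal Python sketch. It is not the harness's actual logic (the harness is a shell script): the helper names `suite_args` and `launch_command` are hypothetical, and it assumes that both `user` and `none` simply omit the flag so iTerm2 reads its regular preferences. It uses the standard macOS `open -n -a <app> --args ...` form to start a fresh app instance.

```python
from typing import List, Optional

DEFAULT_SUITE = "com.iterm2.defaults"  # the harness default described above


def suite_args(suite: Optional[str] = None) -> List[str]:
    """Map a harness --suite value to iTerm2 arguments (hypothetical helper)."""
    suite = suite or DEFAULT_SUITE
    if suite in ("user", "none"):
        # Assumption: neither value passes --suite, so iTerm2 falls back to its
        # normal com.googlecode.iterm2 preferences.
        return []
    # Any reverse-DNS suite name (e.g. a preset from tools/perf/suites/) is forwarded.
    return [f"--suite={suite}"]


def launch_command(app_path: str, suite: Optional[str] = None) -> List[str]:
    """Build an `open` invocation that starts a new app instance with the suite flag."""
    cmd = ["open", "-n", "-a", app_path]
    extra = suite_args(suite)
    if extra:
        cmd += ["--args", *extra]  # everything after --args is handed to the app
    return cmd
```

Keeping the mapping in one small function makes it easy to add another preset later without touching the launch code.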

Modes

Terminal output stress (unthrottled):

  • normal - Mixed patterns (Unicode, ANSI, bidi, emoji)
  • buffer - Long lines (~600 chars)
  • clearcodes - Includes screen clear/erase sequences
  • flood - Maximum throughput via the yes command
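A minimal sketch of what unthrottled `buffer` and `clearcodes` style generators could look like, assuming the output is simply written to stdout. The fragment list and the helper names `buffer_line` and `clearcodes_lines` are illustrative only; the real stress_load.py patterns are not reproduced here.

```python
import itertools
from typing import Iterator

ESC = "\x1b"

# Sample fragments: ASCII, accented Latin, Hebrew (bidi), emoji, and an SGR color run.
FRAGMENTS = [
    "plain ascii ",
    "\u00e9\u00e8 ",
    "\u05e9\u05dc\u05d5\u05dd ",
    "\U0001F600 ",
    f"{ESC}[31mred{ESC}[0m ",
]


def buffer_line(width: int = 600) -> str:
    """Build one long line of at least `width` code points from the sample fragments."""
    parts = []
    length = 0
    for frag in itertools.cycle(FRAGMENTS):
        if length >= width:
            break
        parts.append(frag)
        length += len(frag)
    return "".join(parts)


def clearcodes_lines(n: int) -> Iterator[str]:
    """Yield n lines, periodically interleaving erase-line and clear-screen sequences."""
    for i in range(n):
        if i % 50 == 49:
            yield f"{ESC}[2J{ESC}[H"     # clear the whole screen, cursor to home
        elif i % 10 == 9:
            yield f"{ESC}[2K\rline {i}"  # erase the entire line, return to column 0
        else:
            yield f"line {i}"
```

A driver would loop over these and write each line plus a newline with no sleeping, since these modes measure raw throughput.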

Dashboard/UI stress (throttled by --fps):

  • htop - CPU meters + scrolling process list
  • watch - Full-screen clear + redraw
  • progress - 20 progress bars updating in place
  • table - Fixed header + scroll region body
  • status - Grid of color-coded service status cells
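The dashboard modes differ from the output modes in two ways: they redraw fixed screen positions with cursor-addressing escape sequences instead of scrolling, and they are paced by --fps. Here is a minimal Python sketch of the `progress` idea, assuming standard ANSI/VT100 CSI sequences. The names `progress_frame` and `run_progress` are hypothetical and are not stress_load.py's API.

```python
import sys
import time
from typing import Callable, List

ESC = "\x1b"


def _stdout_write(data: str) -> None:
    sys.stdout.write(data)
    sys.stdout.flush()


def progress_frame(percents: List[int], width: int = 40) -> str:
    """Render one frame; each bar is rewritten in place using absolute cursor moves."""
    parts = []
    for row, pct in enumerate(percents, start=1):
        filled = width * pct // 100
        bar = "#" * filled + "-" * (width - filled)
        # CSI <row>;<col>H positions the cursor; CSI K erases to the end of the line.
        parts.append(f"{ESC}[{row};1H[{bar}] {pct:3d}%{ESC}[K")
    return "".join(parts)


def run_progress(fps: float, frames: int, bars: int = 20,
                 write: Callable[[str], None] = _stdout_write) -> None:
    """Emit `frames` frames, sleeping between them so output stays at or below `fps`."""
    interval = 1.0 / fps
    deadline = time.monotonic()
    for n in range(frames):
        # Each bar advances at a different rate and wraps after reaching 100%.
        percents = [(n * (i + 1)) % 101 for i in range(bars)]
        write(progress_frame(percents))
        deadline += interval
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
```

Pacing against an accumulating deadline, rather than sleeping a fixed 1/fps after each write, keeps the average frame rate close to the target even when a single frame is slow to write.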

Special:

  • all - Runs every mode above sequentially in one test

Modes can be comma-separated: --mode=htop,watch,progress
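One plausible way to parse a comma-separated --mode value, sketched in Python. It assumes that `all` expands to every mode listed above and that duplicates are dropped while order is preserved; `parse_modes` and `is_throttled` are hypothetical helpers.

```python
from typing import List

OUTPUT_MODES = ["normal", "buffer", "clearcodes", "flood"]         # unthrottled
DASHBOARD_MODES = ["htop", "watch", "progress", "table", "status"]  # paced by --fps
ALL_MODES = OUTPUT_MODES + DASHBOARD_MODES


def parse_modes(arg: str) -> List[str]:
    """Parse a --mode value such as "htop,watch,progress" or "all"."""
    modes: List[str] = []
    for name in (part.strip() for part in arg.split(",")):
        if not name:
            continue
        expanded = ALL_MODES if name == "all" else [name]
        for mode in expanded:
            if mode not in ALL_MODES:
                raise ValueError(f"unknown mode: {mode}")
            if mode not in modes:
                modes.append(mode)  # keep the first occurrence only
    return modes


def is_throttled(mode: str) -> bool:
    # Dashboard modes honor --fps; terminal output modes run flat out.
    return mode in DASHBOARD_MODES
```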

Usage Examples

# Basic test with 10 tabs, 20 seconds
./tools/perf/run_stress_test.sh /path/to/iTerm2.app

# Compare performance across tab counts
./tools/perf/run_stress_test.sh --tabs=1,5,10 /path/to/iTerm2.app

# Dashboard modes at 120fps
./tools/perf/run_stress_test.sh --mode=htop,watch,progress --fps=120 -t 30 /path/to/iTerm2.app

# Self-time profiling (requires sudo)
sudo ./tools/perf/run_stress_test.sh --self-time /path/to/iTerm2.app

# All modes in one run
./tools/perf/run_stress_test.sh --mode=all -t 80 /path/to/iTerm2.app

Notes

  • All perf scripts are self-contained in tools/perf/
  • DTrace features require root privileges
  • A warning is displayed when the machine is not on AC power or not in High Power energy mode, since either can skew results

Multi-tab stress test harness with profiling and analysis:
- run_multi_tab_stress_test.sh: Main harness with tab count sweeps,
  synchronized startup, DTrace metrics, and interaction injection
- stress_load.py: Terminal output generator with multiple modes
  (normal, buffer, clearcodes, flood) for exercising text processing
- load_dashboard.py: Dashboard-style load generator (htop, watch,
  progress, table, status modes) for cursor positioning stress
- analyze_profile.py: Profiler output analysis for hotspot detection
- tmux_wrapper.sh: Auto-cleanup tmux session management
- DTrace scripts for frame rate and UX metrics collection
chall37 marked this pull request as ready for review January 27, 2026 22:01
- Add tick-based self-termination to dtrace script using $1 positional
  parameter for duration, ensuring clean exit with END block output
- Fix dtrace invocation order: -p before -s so duration arg is not
  parsed as a flag
- Add dtrace attach detection by polling for "Tracing" in output
- Signal dtrace with SIGINT before quitting iTerm2 to flush output
- Remove window close from AppleScript; main script controls shutdown
- Use /tmp for sync dir with chmod 777 for root dtrace access
- Capture build metadata (version, commit, branch, date) from app plist
- Add sample test output demonstrating working dtrace metrics
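The attach-and-shutdown sequence described in this commit can be sketched in Python roughly as follows. This assumes dtrace's output is redirected to a log file and that the D script prints a line containing "Tracing" once it is running (for example from its BEGIN clause); `wait_for_marker` and `stop_tracer` are hypothetical names. Per the commit, -p must come before -s so that the trailing duration argument reaches the script instead of being parsed as a flag.

```python
import signal
import subprocess
import time
from pathlib import Path


def wait_for_marker(log: Path, marker: str = "Tracing", timeout: float = 10.0,
                    poll: float = 0.1) -> bool:
    """Poll dtrace's captured output until `marker` appears; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if marker in log.read_text(errors="replace"):
                return True
        except FileNotFoundError:
            pass  # output file has not been created yet
        time.sleep(poll)
    return False


def stop_tracer(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Send SIGINT so dtrace runs its END clause and prints aggregations, then reap it."""
    proc.send_signal(signal.SIGINT)
    return proc.wait(timeout=timeout)
```

Signalling with SIGINT rather than SIGKILL matters because dtrace only runs its END clause and prints its aggregations when it exits cleanly, which is also why it should be stopped before iTerm2 quits.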
gnachman (Owner) commented:

This looks amazing!

One recommendation is for the automated tests to revert to a default configuration. Doing defaults delete com.googlecode.iterm2 prior to app launch will take care of it. I can't tell you how many hours I've wasted chasing down performance issues that turned out to be a config change.

chall37 (Contributor, PR author) commented Jan 30, 2026

That's very wise. I spent the better part of a day last week chasing down seemingly random performance degradation/regression. "How is this possibly slower now?!? Am I testing the right build? Did it actually rebuild? Did I even re-run the build?!?" Eventually I noticed that I was comparing tests while on battery to tests while plugged in. (And this despite using the "High Power" energy mode, so maybe that's just a software-based Turbo button after all.)

Actually, I should probably capture the power state info as well, if that's possible, and at least display a warning when unplugged.

…warnings to stress test

Debug build command line flags (ITERM_DEBUG only):
- Add --config=<path> to load preferences from custom path or URL
- Add --use-default-config to skip custom preferences and use defaults
- Flags are mutually exclusive; help text documents both options

Stress test improvements:
- Check iTerm2 not running at script start (fail fast)
- Detect power source (AC/Battery/UPS) using pmset
- Detect energy mode per-power-source (High Power/Automatic/Low Power)
- Warn if not on AC Power, not in High Power mode, or using custom config
- Display warnings in both header and footer for visibility
- Show specific issues detected with tailored recommendations
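Power-source detection can be sketched by parsing `pmset -g batt`, whose first line reads like `Now drawing from 'AC Power'` (or `'Battery Power'` / `'UPS Power'`). This Python sketch covers only the power source; energy-mode detection is left out because the relevant pmset settings vary across Mac models. The helper names are hypothetical.

```python
import re
import subprocess
from typing import List, Optional

_SOURCE_RE = re.compile(r"Now drawing from '([^']+)'")


def power_source(pmset_batt_output: str) -> Optional[str]:
    """Extract e.g. 'AC Power' or 'Battery Power' from `pmset -g batt` output."""
    match = _SOURCE_RE.search(pmset_batt_output)
    return match.group(1) if match else None


def power_warnings(source: Optional[str]) -> List[str]:
    """Return human-readable warnings for a non-AC (or unknown) power source."""
    if source is None:
        return ["Could not determine the power source"]
    if source != "AC Power":
        return [f"Running on {source}; results may not be comparable to AC runs"]
    return []


def current_power_source() -> Optional[str]:
    # macOS only: pmset ships with the OS.
    out = subprocess.run(["pmset", "-g", "batt"], capture_output=True, text=True).stdout
    return power_source(out)
```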
chall37 (Contributor, PR author) commented Jan 30, 2026

I don't use a VM or sandbox for my dev env, so to avoid blowing away my own config (and potentially other devs'), I made the following changes:

iTerm app:

  • Added --use-default-config command line arg
  • Added --config= as well, mutually exclusive with the above, while I was at it. Could be useful for automated integration tests in the future.

These are both gated behind ITERM_DEBUG, because they would otherwise create a rather opaque vulnerability. The gate can always be removed for one or both down the road, if there's a use case.

Harness (run_multi_tab_stress_test.sh):

  • Now passes the new --use-default-config to iTerm by default (harmlessly ignored by builds that don't support it). Can explicitly set --use-default-config=false if desired.
  • Now passes through --config=[path/to/config] to iTerm2 when present
  • Belt-and-suspenders tmux session cleanup when --tmux is used: stress_load.py is invoked with -d, 'exit' is appended after the invocation, and the harness checks for any dangling sessions at cleanup.
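The final dangling-session check could look something like this Python sketch, which lists sessions with `tmux list-sessions -F '#{session_name}'` and kills leftovers by exact name. It assumes the harness gives its sessions a common name prefix; the prefix `perf-stress-` and the helper names are hypothetical.

```python
import subprocess
from typing import List

SESSION_PREFIX = "perf-stress-"  # hypothetical naming convention for harness sessions


def dangling_sessions(list_output: str, prefix: str = SESSION_PREFIX) -> List[str]:
    """Pick harness-owned names out of `tmux list-sessions -F '#{session_name}'` output."""
    names = (line.strip() for line in list_output.splitlines())
    return [name for name in names if name.startswith(prefix)]


def cleanup_sessions(prefix: str = SESSION_PREFIX) -> List[str]:
    """Kill any leftover harness sessions; return the names that were targeted."""
    try:
        result = subprocess.run(["tmux", "list-sessions", "-F", "#{session_name}"],
                                capture_output=True, text=True)
    except FileNotFoundError:
        return []  # tmux is not installed
    if result.returncode != 0:
        return []  # typically means no tmux server is running
    leftovers = dangling_sessions(result.stdout, prefix)
    for name in leftovers:
        # A leading "=" asks tmux for an exact session-name match.
        subprocess.run(["tmux", "kill-session", "-t", f"={name}"], check=False)
    return leftovers
```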

gnachman (Owner) commented Feb 4, 2026

I've noticed that the most useful output in Instruments time profiler is not the hottest symbols (which will be objc_msgsend or something high up in the call stack that we don't control) but the symbols that spend the most time not in descendants (i.e., the ones that actually burn CPU themselves). I don't know how hard it would be to do that with DTrace but it'd make the output much more actionable.

Review comment on sources/iTermApplicationDelegate.m (outdated):
}

// Apply --use-default-config
if (useDefaultConfig) {
gnachman (Owner):

This isn't going to work right: all it does is disable loading prefs from the custom folder. Commit 25a1933 adds --suite=<reverse domain name>, which safely uses a different User Defaults suite. I'd get rid of --use-default-config and make sure your testing script uses something like --suite=com.iterm2.perf

chall37 (Contributor, PR author):

Wow, 25a1933 is quite a refactor! Very helpful though. I'll undo my --config and --use-default-config changes, and pass through --suite instead.

…ler probes

Address PR gnachman#569 review comments:
- Remove --use-default-config/--config= (replaced by gnachman's --suite=)
- Add --self-time profiling with analyze_self_time.py for actionable hotspots
- Add suite presets (com.iterm2.defaults, com.iterm2.fairness)
- Add FairnessScheduler probes to DTrace scripts
…fy headers

- Use current_frames[0] for leaf (DTrace prints leaf first)
- Add is_iterm2_module() to normalize module names (handles paths, arch suffixes)
- Fix regex to handle decimal offsets (+123) not just hex (+0x123)
- Rename headers to be explicit about what each section shows:
  - 'iTerm2 CALLERS' for attributed stacks (which iTerm2 code drives cost)
  - 'SYSTEM FUNCTIONS (exclusive leaf samples)' for raw leaf time
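A minimal Python sketch of the frame parsing and module normalization this commit describes, assuming DTrace ustack() lines give the module name and symbol separated by a backtick, followed by +offset. The regex accepts both hex (+0x1c4) and decimal (+452) offsets. The path and architecture-suffix handling in `normalize_module` is an illustrative guess at what "handles paths, arch suffixes" covers, not the actual analyze_self_time.py code.

```python
import re
from typing import Optional, Tuple

# Leading whitespace, module, backtick, symbol, then an optional hex or decimal offset.
FRAME_RE = re.compile(
    r"^\s*(?P<module>[^`]+)`(?P<symbol>.+?)(?:\+(?:0x[0-9a-fA-F]+|\d+))?\s*$"
)


def normalize_module(module: str) -> str:
    """Strip directory components and an architecture suffix (assumed forms)."""
    name = module.rsplit("/", 1)[-1]
    for arch in (".arm64", ".x86_64", "_arm64", "_x86_64"):
        if name.endswith(arch):
            name = name[: -len(arch)]
    return name


def parse_frame(line: str) -> Optional[Tuple[str, str]]:
    """Return (normalized module, symbol), or None for unsymbolicated frames."""
    match = FRAME_RE.match(line)
    if not match:
        return None  # e.g. a bare address such as 0x1000abcd
    return normalize_module(match.group("module")), match.group("symbol")


def is_iterm2_module(module: str) -> bool:
    return normalize_module(module) == "iTerm2"
```

Normalizing at parse time means a frame reported under a full path and one reported under a bare module name land in the same bucket.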
Iterative improvements based on code review:

DTrace output (iterm_self_time.d):
- Use explicit section markers (===SELF_TIME===, ===STACKS===) for robust parsing
- Add header section with duration and sample rate metadata
- Increase limits: 50 self-time entries, 20 stacks
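With explicit markers, parsing reduces to a line-oriented split. Here is a minimal Python sketch, assuming each marker sits on its own line and that the metadata header comes before the first marker; `split_sections` is a hypothetical helper.

```python
from typing import Dict, List


def split_sections(text: str) -> Dict[str, List[str]]:
    """Group lines under the most recent ===NAME=== marker.

    Lines that appear before any marker are collected under "HEADER".
    """
    sections: Dict[str, List[str]] = {"HEADER": []}
    current = "HEADER"
    for line in text.splitlines():
        stripped = line.strip()
        if len(stripped) > 6 and stripped.startswith("===") and stripped.endswith("==="):
            current = stripped[3:-3]
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections
```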

Analysis heuristics (analyze_self_time.py):
- Normalize module names at ingest to prevent bucket splitting
- Add is_system_module() for module-based classification (more reliable)
- Remove NS/CF prefix heuristics (redundant, high false-positive risk)
- Track unattributed stacks and report coverage honestly
- Remove misleading cross-aggregation percentage column

The self-time vs attributed-caller distinction is now clearer:
- Raw self-time shows where CPU actually executes (leaf functions)
- Attributed callers show which iTerm2 code drives that work
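To make the attribution side concrete, here is a Python sketch: each stack's samples are credited to the innermost iTerm2 frame (walking from the leaf, which DTrace prints first), and stacks with no iTerm2 frame are left unattributed so coverage can be reported honestly. The tuple layout and the name `attribute` are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

Frame = Tuple[str, str]  # (module, symbol); frames are ordered leaf first


def attribute(stacks: Sequence[Tuple[Sequence[Frame], int]],
              is_iterm2: Callable[[str], bool]) -> Tuple[Counter, int, int]:
    """Credit each stack's sample count to its innermost iTerm2 frame.

    Returns (per-caller counts, attributed samples, total samples).
    """
    callers: Counter = Counter()
    attributed = total = 0
    for frames, count in stacks:
        total += count
        for module, symbol in frames:  # walk from the leaf toward the root
            if is_iterm2(module):
                callers[f"{module}`{symbol}"] += count
                attributed += count
                break
    return callers, attributed, total
```

Coverage is then attributed / total. For example, if 15 of 20 samples pass through some iTerm2 frame, the report would show 75% coverage and leave the other 25% visibly unexplained instead of silently dropping it.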
chall37 (Contributor, PR author) commented Feb 5, 2026

> I've noticed that the most useful output in Instruments time profiler is not the hottest symbols (which will be objc_msgsend or something high up in the call stack that we don't control) but the symbols that spend the most time not in descendants (i.e., the ones that actually burn CPU themselves). I don't know how hard it would be to do that with DTrace but it'd make the output much more actionable.

I added a --self-time flag to approximate Instruments behavior using dtrace. It's not perfect, but should provide a degree of attribution to iTerm leaves on the call stack. Can refine over time.
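For reference, the distinction discussed above, self (exclusive) time versus total (inclusive) time, can be computed from leaf-first stack samples as in this minimal Python sketch. It is a generic illustration of the technique, not the --self-time implementation; `self_and_total` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def self_and_total(samples: Iterable[Tuple[Sequence[str], int]]) -> Tuple[Counter, Counter]:
    """samples: (function names leaf first, sample count).

    Returns (self counts, inclusive counts).
    """
    self_counts: Counter = Counter()
    total_counts: Counter = Counter()
    for frames, count in samples:
        if not frames:
            continue
        self_counts[frames[0]] += count  # only the leaf is executing on the CPU
        for fn in set(frames):           # count each function once per stack (recursion-safe)
            total_counts[fn] += count
    return self_counts, total_counts
```

Inclusive counts put callers such as main at the top of every profile, while self counts surface the functions that actually execute. A leaf like objc_msgSend can still rank high by self time, which is why crediting such leaves to their nearest iTerm2 caller is useful alongside it.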

gnachman (Owner) commented:

Merged as 60185ea with improvements in d205d5d. And it found a horrific bug, which was fixed in cac8c18, so already #winning

gnachman closed this Feb 16, 2026