Add performance testing infrastructure #569
Multi-tab stress test harness with profiling and analysis:
- run_multi_tab_stress_test.sh: Main harness with tab count sweeps, synchronized startup, DTrace metrics, and interaction injection
- stress_load.py: Terminal output generator with multiple modes (normal, buffer, clearcodes, flood) for exercising text processing
- load_dashboard.py: Dashboard-style load generator (htop, watch, progress, table, status modes) for cursor positioning stress
- analyze_profile.py: Profiler output analysis for hotspot detection
- tmux_wrapper.sh: Auto-cleanup tmux session management
- DTrace scripts for frame rate and UX metrics collection

- Add tick-based self-termination to dtrace script using $1 positional parameter for duration, ensuring clean exit with END block output
- Fix dtrace invocation order: -p before -s so duration arg is not parsed as a flag
- Add dtrace attach detection by polling for "Tracing" in output
- Signal dtrace with SIGINT before quitting iTerm2 to flush output
- Remove window close from AppleScript; main script controls shutdown
- Use /tmp for sync dir with chmod 777 for root dtrace access
- Capture build metadata (version, commit, branch, date) from app plist
- Add sample test output demonstrating working dtrace metrics
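The attach-detection step (polling for "Tracing" in the dtrace output) could look roughly like the sketch below. The harness itself is a shell script, so this is only a Python rendering of the idea; `wait_for_marker`, its parameters, and the default timeout are hypothetical names and values, not taken from the PR.

```python
import time
from pathlib import Path


def wait_for_marker(log_path, marker="Tracing", timeout=10.0, interval=0.1):
    """Poll a log file until `marker` appears or `timeout` seconds pass.

    Hypothetical helper: returns True if the marker was seen, False on timeout.
    """
    path = Path(log_path)
    deadline = time.monotonic() + timeout
    while True:
        # The file may not exist yet if dtrace has not started writing.
        if path.exists() and marker in path.read_text(errors="replace"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Checking the file before checking the deadline means the marker is always looked for at least once, even with a very short timeout.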
This looks amazing! One recommendation is for the automated tests to revert to a default configuration. Doing
That's very wise. I spent the better part of a day last week chasing down seemingly random performance degradation/regression. "How is this possibly slower now?!? Am I testing the right build? Did it actually rebuild? Did I even re-run the build?!?" Eventually I noticed that I was comparing tests while on battery to tests while plugged in. (And this despite using the "High Power" energy mode, so maybe that's just a software-based Turbo button after all.) Actually, I should probably capture the power state info as well, if that's possible, and at least display a warning when unplugged.
…warnings to stress test

Debug build command line flags (ITERM_DEBUG only):
- Add --config=<path> to load preferences from custom path or URL
- Add --use-default-config to skip custom preferences and use defaults
- Flags are mutually exclusive; help text documents both options

Stress test improvements:
- Check iTerm2 not running at script start (fail fast)
- Detect power source (AC/Battery/UPS) using pmset
- Detect energy mode per-power-source (High Power/Automatic/Low Power)
- Warn if not on AC Power, not in High Power mode, or using custom config
- Display warnings in both header and footer for visibility
- Show specific issues detected with tailored recommendations
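For illustration, the power-source check might be sketched as follows. It relies on `pmset -g batt` printing a first line of the form `Now drawing from 'AC Power'`; the function names and the warning wording are hypothetical, and the harness's actual shell implementation may differ. Energy-mode detection is left out here.

```python
import re
import subprocess

# Matches the first line of `pmset -g batt`, e.g. "Now drawing from 'AC Power'".
_SOURCE_RE = re.compile(r"Now drawing from '([^']+)'")


def parse_power_source(pmset_output):
    """Return the power source named in pmset output, or None if absent."""
    match = _SOURCE_RE.search(pmset_output)
    return match.group(1) if match else None


def power_warning(source):
    """Return a warning string for non-AC power, or None when on AC.

    The message text is an assumption, not the harness's actual wording.
    """
    if source is None:
        return "WARNING: could not determine power source"
    if source != "AC Power":
        return f"WARNING: running on {source}; results may not match AC runs"
    return None


def current_power_source():
    """Query pmset on macOS; returns None if the output cannot be parsed."""
    result = subprocess.run(
        ["pmset", "-g", "batt"], capture_output=True, text=True, check=False
    )
    return parse_power_source(result.stdout)
```

Keeping the parsing separate from the `subprocess` call makes the logic testable on machines without `pmset`.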
I don't use a VM or sandbox for my dev env, so to avoid blowing away my own config (and potentially other devs'), I made the following changes: iTerm app:
These are both gated behind ITERM_DEBUG, because it creates a rather opaque vulnerability otherwise... Can always remove the gate for one or both down the road, if there's a use case.
Harness (run_multi_tab_stress_test.sh)
I've noticed that the most useful output in Instruments time profiler is not the hottest symbols (which will be objc_msgSend or something high up in the call stack that we don't control) but the symbols that spend the most time not in descendants (i.e., the ones that actually burn CPU themselves). I don't know how hard it would be to do that with DTrace but it'd make the output much more actionable.
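To make the distinction concrete: given sampled call stacks, self time counts only the leaf frame of each sample, while total (inclusive) time counts every function present in the stack. A minimal sketch, assuming each sample is a list of function names with the leaf first; the sample data and function names below are made up for illustration.

```python
from collections import Counter


def self_and_total(samples):
    """Count self (leaf-only) and total (inclusive) samples per function.

    `samples` is an iterable of stacks, each a list of names, leaf first.
    """
    self_time = Counter()
    total_time = Counter()
    for stack in samples:
        if not stack:
            continue
        # Only the leaf frame was actually executing when the sample fired.
        self_time[stack[0]] += 1
        # Count each function once per sample, even if it recurses.
        for name in set(stack):
            total_time[name] += 1
    return self_time, total_time


# Made-up stacks: "main" has the highest total time but zero self time.
samples = [
    ["objc_msgSend", "draw", "main"],
    ["draw", "main"],
    ["draw", "main"],
    ["parse", "main"],
]
self_time, total_time = self_and_total(samples)
```

Sorting by `self_time` instead of `total_time` surfaces the functions that burn CPU themselves rather than the ones that merely sit high in the stack.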
}

// Apply --use-default-config
if (useDefaultConfig) {
This isn't going to work right - all it does is disable loading prefs from custom folder. Commit 25a1933 adds --suite=<reverse domain name>, which safely uses a different User Defaults suite. I'd get rid of --use-default-config and make sure your testing script uses something like --suite=com.iterm2.perf
Wow, 25a1933 is quite a refactor! Very helpful though. I'll undo my --config and --use-default-config changes, and pass through --suite instead.
…ler probes

Address PR gnachman#569 review comments:
- Remove --use-default-config/--config= (replaced by gnachman's --suite=)
- Add --self-time profiling with analyze_self_time.py for actionable hotspots
- Add suite presets (com.iterm2.defaults, com.iterm2.fairness)
- Add FairnessScheduler probes to DTrace scripts
…fy headers

- Use current_frames[0] for leaf (DTrace prints leaf first)
- Add is_iterm2_module() to normalize module names (handles paths, arch suffixes)
- Fix regex to handle decimal offsets (+123) not just hex (+0x123)
- Rename headers to be explicit about what each section shows:
  - 'iTerm2 CALLERS' for attributed stacks (which iTerm2 code drives cost)
  - 'SYSTEM FUNCTIONS (exclusive leaf samples)' for raw leaf time
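A rough sketch of the frame parsing these fixes describe, assuming DTrace's usual ustack() frame format of module`symbol+offset. The names `parse_frame` and `normalize_module` are hypothetical and are not the functions in analyze_self_time.py. Arch-suffix handling is omitted because the suffix format is not shown in this thread.

```python
import posixpath
import re

# module`symbol with an optional +offset that may be hex (+0x1a4) or decimal (+123).
_FRAME_RE = re.compile(
    r"^\s*(?P<module>.+?)`(?P<symbol>.+?)"
    r"(?:\+(?P<offset>0x[0-9a-fA-F]+|\d+))?\s*$"
)


def normalize_module(module):
    """Reduce a module path to its basename so paths do not split buckets."""
    return posixpath.basename(module.strip())


def parse_frame(line):
    """Return (module, symbol, offset) for one frame line, or None if unparsable."""
    match = _FRAME_RE.match(line)
    if not match:
        return None
    raw = match.group("offset")
    if raw is None:
        offset = None
    elif raw.lower().startswith("0x"):
        offset = int(raw, 16)
    else:
        offset = int(raw, 10)
    return normalize_module(match.group("module")), match.group("symbol"), offset


def leaf_frame(frames):
    """DTrace prints the leaf first, so the leaf is the first parsed frame."""
    return frames[0] if frames else None
```

Because the symbol group is non-greedy and the offset group is anchored to the end of the line, Objective-C class methods such as `+[NSObject alloc]+12` keep their leading `+` in the symbol.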
Iterative improvements based on code review:

DTrace output (iterm_self_time.d):
- Use explicit section markers (===SELF_TIME===, ===STACKS===) for robust parsing
- Add header section with duration and sample rate metadata
- Increase limits: 50 self-time entries, 20 stacks

Analysis heuristics (analyze_self_time.py):
- Normalize module names at ingest to prevent bucket splitting
- Add is_system_module() for module-based classification (more reliable)
- Remove NS/CF prefix heuristics (redundant, high false-positive risk)
- Track unattributed stacks and report coverage honestly
- Remove misleading cross-aggregation percentage column

The self-time vs attributed-caller distinction is now clearer:
- Raw self-time shows where CPU actually executes (leaf functions)
- Attributed callers show which iTerm2 code drives that work
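Splitting the output on explicit markers could be sketched like this. Only the two marker names come from the commit message; `split_sections`, the `PREAMBLE` key for lines before the first marker, and the sample lines in the usage are assumptions made for illustration.

```python
import re

# A marker is a line consisting only of ===NAME===, e.g. ===SELF_TIME===.
_MARKER_RE = re.compile(r"^===([A-Z_]+)===$")


def split_sections(text):
    """Group non-blank lines under the most recent ===NAME=== marker.

    Lines before the first marker land under the hypothetical key "PREAMBLE".
    """
    sections = {"PREAMBLE": []}
    current = "PREAMBLE"
    for line in text.splitlines():
        stripped = line.strip()
        match = _MARKER_RE.match(stripped)
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        elif stripped:
            sections[current].append(line.rstrip())
    return sections


# Made-up output, only to show the shape of the result.
example = """\
duration: 10
===SELF_TIME===
  iTerm2`foo 42
===STACKS===
  iTerm2`foo
  iTerm2`bar
"""
sections = split_sections(example)
```

Explicit markers let the parser ignore how many entries each aggregation prints, so changing the limits does not require changing the parser.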
I added a --self-time flag to approximate Instruments behavior using dtrace. It's not perfect, but should provide a degree of attribution to iTerm leaves on the call stack. Can refine over time.
Summary
Adds a tools/perf/ directory with scripts for stress testing and profiling iTerm2.

What's Included
- … sample profiler, and analyzes results
- … sample profiler output to identify hotspots

Suite Isolation
The harness uses --suite= to isolate test preferences from your normal iTerm2 settings:
- --suite=com.iterm2.defaults (default) - Clean empty suite for baseline testing
- --suite=com.iterm2.fairness - Enables fairness scheduler for testing PR #568 (Add FairnessScheduler and TokenExecutor integration)
- --suite=user - Uses your normal preferences
- --suite=none - Disables suite isolation

Suite presets are stored in tools/perf/suites/.

Modes
Terminal output stress (unthrottled):
- normal - Mixed patterns (Unicode, ANSI, bidi, emoji)
- buffer - Long lines (~600 chars)
- clearcodes - Includes screen clear/erase sequences
- flood - Maximum throughput via yes

Dashboard/UI stress (throttled by --fps):
- htop - CPU meters + scrolling process list
- watch - Full-screen clear + redraw
- progress - 20 progress bars updating in place
- table - Fixed header + scroll region body
- status - Grid of color-coded service status cells

Special:
- all - Runs all 8 modes sequentially in one test

Modes can be comma-separated: --mode=htop,watch,progress

Usage Examples
Notes
tools/perf/