Add performance testing infrastructure #569
Closed
Changes from 5 commits
11 commits
6d90e2d Add performance testing infrastructure (chall37)
45bb195 AI, unslop theyself (chall37)
4e4f7a0 Fix dtrace sequencing for reliable END block output (chall37)
e1becbc Add --config and --use-default-config flags; add power/energy/config … (chall37)
14c21e2 fixed tmux wrapping (chall37)
ddc6dff Merge gnachman/master: Add --suite= for UserDefaults isolation (chall37)
9203b61 Add --suite= and --self-time to perf test harness; add FairnessSchedu… (chall37)
61e321b Rename run_multi_tab_stress_test.sh to run_stress_test.sh (chall37)
5f6ad20 Remove interpretation guide section from self-time analysis (chall37)
8e1cefa Fix self-time analysis: correct frame order, normalize modules, clari… (chall37)
e92d8c5 Refine self-time profiling: improve heuristics and output format (chall37)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,126 @@ | ||
| # iTerm2 Performance Testing Tools | ||
| | ||
| Scripts for stress testing and profiling iTerm2 builds with latency instrumentation. | ||
| | ||
| ## Quick Start | ||
| | ||
| ```bash | ||
| # Basic test (10 tabs, 20 seconds, normal mode) | ||
| ./run_multi_tab_stress_test.sh /path/to/iTerm2.app | ||
| | ||
| # Compare behavior across tab counts | ||
| ./run_multi_tab_stress_test.sh --tabs=1,3,10 /path/to/iTerm2.app | ||
| | ||
| # With title injection (exercises OSC 0 handling) | ||
| ./run_multi_tab_stress_test.sh --title /path/to/iTerm2.app | ||
| | ||
| # With DTrace metrics (requires sudo) | ||
| ./run_multi_tab_stress_test.sh --dtrace /path/to/iTerm2.app | ||
| | ||
| # With tmux wrapping (crash-safe cleanup) | ||
| ./run_multi_tab_stress_test.sh --tmux /path/to/iTerm2.app | ||
| | ||
| # Htop-style dashboard load | ||
| ./run_multi_tab_stress_test.sh --mode=htop /path/to/iTerm2.app | ||
| | ||
| # Progress bars stress test | ||
| ./run_multi_tab_stress_test.sh --mode=progress /path/to/iTerm2.app | ||
| | ||
| # Status grid with tmux wrapping | ||
| ./run_multi_tab_stress_test.sh --tmux --mode=status /path/to/iTerm2.app | ||
| ``` | ||
| | ||
| ## Scripts | ||
| | ||
| | Script | Purpose | | ||
| |--------|---------| | ||
| | `run_multi_tab_stress_test.sh` | Main test harness - opens iTerm2, creates tabs, runs stress load, profiles | | ||
| | `stress_load.py` | Unified load generator - terminal output stress and dashboard modes | | ||
| | `analyze_profile.py` | Analyzes `sample` profiler output for hotspots | | ||
| | `iterm_ux_metrics_v2.d` | DTrace script for frame rate and latency metrics | | ||
| | ||
| ## Options | ||
| | ||
| ``` | ||
| -t, --time=SEC      Duration in seconds (default: 20) | ||
| --tabs=N,M,...      Tab counts to test (runs separate test for each) | ||
| --title[=MS]        Inject OSC 0 title changes (default: every 2000ms) | ||
| --fps=N             Target frame rate for dashboard modes (default: 30, 0 = unthrottled) | ||
|                     Accepts decimals (e.g., 0.5). Ignored for stress modes. | ||
| --dtrace            Enable DTrace UX metrics (requires sudo) | ||
| --inject            Enable interaction injection (tab switches, keyboard input) | ||
| --mode=MODES        Stress mode(s), comma-separated (see Modes below) | ||
| --speed=SPEED       Output speed: normal or slow | ||
| --tmux              Wrap test in auto-cleanup tmux session | ||
| --load-script=PATH  Use custom load generator (for non-built-in scripts) | ||
| --forever           Run indefinitely without profiling | ||
| ``` | ||
| | ||
| ## Modes | ||
| | ||
| The `--mode` flag selects the stress pattern. Multiple modes can be comma-separated | ||
| and will run sequentially, time-sliced within a single test. | ||
| | ||
| ### Terminal Output Stress (unthrottled) | ||
| | ||
| | Mode | Description | | ||
| |------|-------------| | ||
| | `normal` | Mixed output patterns (ASCII, CJK, emoji, bidi), no screen clears (default) | | ||
| | `buffer` | Long lines (~600 chars), stresses line buffer handling | | ||
| | `clearcodes` | All patterns including screen clear/erase sequences | | ||
| | `flood` | Maximum throughput using `yes` command | | ||
| | ||
| ### Dashboard/UI Stress (throttled by --fps, default 30) | ||
| | ||
| | Mode | Description | Code Paths Stressed | | ||
| |------|-------------|---------------------| | ||
| | `htop` | CPU meters + scrolling process list | Scroll regions, partial updates, color bars | | ||
| | `watch` | Full-screen clear + redraw | Burst rendering, screen clear, cursor home | | ||
| | `progress` | 20 progress bars updating in place | Cursor positioning, same-line overwrites | | ||
| | `table` | Fixed header + scroll region body | Scroll regions, selective scroll | | ||
| | `status` | Grid of color-coded service status cells | Frequent SGR changes, partial cell updates | | ||
| | ||
| ### Special | ||
| | ||
| | Mode | Description | | ||
| |------|-------------| | ||
| | `all` | Runs all 8 modes sequentially within a single test | | ||
| | ||
| ### Examples | ||
| | ||
| ```bash | ||
| # All dashboard modes at 120fps | ||
| ./run_multi_tab_stress_test.sh --mode=htop,watch,progress,table,status --fps=120 -t 50 /path/to/iTerm2.app | ||
| | ||
| # Mix stress and dashboard modes | ||
| ./run_multi_tab_stress_test.sh --mode=normal,htop,buffer -t 30 /path/to/iTerm2.app | ||
| | ||
| # Dashboard unthrottled (as fast as possible) | ||
| ./run_multi_tab_stress_test.sh --mode=htop --fps=0 /path/to/iTerm2.app | ||
| ``` | ||
| | ||
| ## Tmux Wrapping | ||
| | ||
| The `--tmux` option wraps the entire test in a tmux session that auto-cleans on exit: | ||
| - Session is killed on normal exit, Ctrl-C, or crash | ||
| - Prevents orphaned stress processes if the harness is killed | ||
| - Session name: `iterm2-perf-<pid>-<timestamp>` | ||
| - Interactive: attaches to session for live viewing | ||
| - Non-interactive: waits for session to complete | ||
| | ||
| ## Output | ||
| | ||
| The test produces: | ||
| - **Profile analysis** - CPU hotspots from `sample` profiler | ||
| - **Latency metrics** - KeyboardInput, TitleUpdate timings (from instrumented builds) | ||
| - **Timer analysis** - GCD/NSTimer efficiency, cadence stability | ||
| - **DTrace metrics** - Frame rates, adaptive mode, lock contention (if --dtrace) | ||
| - **Summary table** - Cross-run comparison when testing multiple tab counts | ||
| | ||
| ## Requirements | ||
| | ||
| - macOS with `sample` profiler | ||
| - Python 3 | ||
| - For --tmux: tmux installed | ||
| - For --dtrace: sudo access | ||
| - Instrumented iTerm2 build (for latency metrics) |
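The README says comma-separated modes are time-sliced within one test and that `--fps` throttles dashboard modes, with 0 meaning unthrottled. The sketch below shows one way that arithmetic could work. It is not taken from `stress_load.py`: the helper names `slice_schedule` and `frame_interval` are hypothetical, and the even split across modes is an assumption.

```python
def slice_schedule(modes, total_seconds):
    """Split total_seconds evenly across modes, in order (assumed policy)."""
    if not modes:
        raise ValueError("at least one mode is required")
    per_mode = total_seconds / len(modes)
    # Each entry is (mode, start offset, end offset) in seconds.
    return [(mode, i * per_mode, (i + 1) * per_mode) for i, mode in enumerate(modes)]


def frame_interval(fps):
    """Seconds to wait between frames; 0 fps means no throttling."""
    if fps < 0:
        raise ValueError("fps must be >= 0")
    return 0.0 if fps == 0 else 1.0 / fps


# Mirrors the README example: --mode=normal,htop,buffer -t 30
schedule = slice_schedule("normal,htop,buffer".split(","), 30)
```

Under these assumptions each of the three modes gets a 10-second window, and `--fps=0.5` would mean one frame every two seconds.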
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,114 @@ | ||
| #!/usr/bin/env python3 | ||
| """ | ||
| Analyze macOS sample profiler output for iTerm2. | ||
| | ||
| Usage: | ||
|     python3 analyze_profile.py <profile_file> | ||
| | ||
| Reads a profile file generated by the macOS `sample` command and | ||
| prints a summary of iTerm2-relevant hotspots and patterns. | ||
| """ | ||
| | ||
| import os | ||
| import re | ||
| import sys | ||
| | ||
| | ||
| def analyze_profile(output_file): | ||
|     """Analyze the profile output for hotspots and inefficiencies.""" | ||
|     print(f"\nAnalyzing profile: {output_file}") | ||
| | ||
|     if not os.path.exists(output_file): | ||
|         print("Error: Profile output file not found") | ||
|         return False | ||
| | ||
|     with open(output_file, 'r') as f: | ||
|         content = f.read() | ||
| | ||
|     # Count occurrences of key patterns. | ||
|     patterns = { | ||
|         "boolForKey": r'\[iTermPreferences boolForKey:\]', | ||
|         "intForKey": r'\[iTermPreferences intForKey:\]', | ||
|         "objectForKey": r'\[iTermPreferences objectForKey:\]', | ||
|         "updateConfigurationFields": r'updateConfigurationFields', | ||
|         "NSUserDefaults": r'NSUserDefaults', | ||
|         "@synchronized": r'@synchronized', | ||
|         "os_unfair_lock": r'os_unfair_lock', | ||
|         # StringToScreenChars and text processing | ||
|         "StringToScreenChars": r'StringToScreenChars', | ||
|         "ScreenCharArray": r'ScreenCharArray', | ||
|         "bidi/Bidi": r'[Bb]idi', | ||
|         "VT100Terminal": r'VT100Terminal', | ||
|         "executeToken": r'executeToken', | ||
|         # Metal rendering | ||
|         "Metal": r'Metal|metal|MTL|CAMetalLayer', | ||
|         "iTermTextRenderer": r'iTermTextRenderer', | ||
|     } | ||
| | ||
|     # Broader categories for spotting redundant work or churn. | ||
|     categories = { | ||
|         "Allocations": r'\b(malloc|calloc|realloc|free|operator new|operator delete)\b', | ||
|         "ObjC retain/release": r'objc_(retain|release|autoreleaseReturnValue|retainAutoreleasedReturnValue)', | ||
|         "Autorelease pools": r'NSAutoreleasePool|autoreleasepool', | ||
|         "Strings/Unicode": r'NSString|CFString|StringToScreenChars|ScreenCharArray', | ||
|         "CoreText": r'CTLine|CTRun|CTFont|CoreText', | ||
|         "CoreGraphics": r'CGContext|CGColor|CGPath|CGImage|CoreGraphics', | ||
|         "AppKit geometry": r'NSRect|NSMakeRect|convertRect|bounds|frame', | ||
|         "Locks/dispatch": r'os_unfair_lock|pthread_mutex|dispatch_semaphore|@synchronized', | ||
|         "Terminal parsing": r'VT100Parser|VT100Terminal|VT100Screen|executeToken', | ||
|         "Rendering": r'iTermTextRenderer|Metal|metal|MTL|CAMetalLayer', | ||
|         "Process/cache": r'iTermProcessCache|TaskNotifier|deepestForegroundJob', | ||
|     } | ||
| | ||
|     print("\n" + "=" * 60) | ||
|     print("Profile Summary") | ||
|     print("=" * 60) | ||
| | ||
|     for name, pattern in patterns.items(): | ||
|         count = len(re.findall(pattern, content)) | ||
|         print(f" {name}: {count} occurrences") | ||
| | ||
|     print("=" * 60) | ||
|     print("\n" + "=" * 60) | ||
|     print("Category Summary") | ||
|     print("=" * 60) | ||
|     for name, pattern in categories.items(): | ||
|         count = len(re.findall(pattern, content)) | ||
|         print(f" {name}: {count} occurrences") | ||
|     print("=" * 60) | ||
| | ||
|     # Extract top iTerm2 symbols from the call graph. Nested frames are prefixed | ||
|     # with tree markers (+ ! : |) that may be separated by spaces, so the prefix | ||
|     # class accepts whitespace between markers. | ||
|     symbol_pattern = re.compile(r'^[\s+!:|]*(\d+)\s+(.+?)\s+\(in iTerm2(?:\.debug\.dylib)?\)') | ||
|     counts = {} | ||
|     for line in content.splitlines(): | ||
|         match = symbol_pattern.match(line) | ||
|         if not match: | ||
|             continue | ||
|         count = int(match.group(1)) | ||
|         symbol = match.group(2).strip() | ||
|         # Keep the largest sample count seen for each symbol. | ||
|         if count > counts.get(symbol, 0): | ||
|             counts[symbol] = count | ||
| | ||
|     if counts: | ||
|         print("\n" + "=" * 60) | ||
|         print("Top iTerm2 Symbols (by sample count)") | ||
|         print("=" * 60) | ||
|         for symbol, count in sorted(counts.items(), key=lambda item: item[1], reverse=True)[:15]: | ||
|             print(f" {count} {symbol}") | ||
|         print("=" * 60) | ||
| | ||
|     print(f"\nFull profile saved to: {output_file}") | ||
|     return True | ||
| | ||
| | ||
| def main(): | ||
|     if len(sys.argv) != 2: | ||
|         print("Usage: python3 analyze_profile.py <profile_file>") | ||
|         sys.exit(1) | ||
| | ||
|     if not analyze_profile(sys.argv[1]): | ||
|         sys.exit(1) | ||
| | ||
| | ||
| if __name__ == "__main__": | ||
|     main() |
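To make the symbol-extraction step easier to follow, here is a small standalone sketch of the same idea: match call-graph lines attributed to iTerm2 and keep the largest sample count per symbol. The `SAMPLE` text is made up for illustration and is not real profiler output. The function name `top_symbols` is hypothetical, and the pattern is a variant that assumes tree markers may be separated by spaces.

```python
import re

# Assumed layout: optional tree markers (+ ! : |) and spaces, a sample count,
# the symbol, then "(in iTerm2)" or "(in iTerm2.debug.dylib)".
SYMBOL_RE = re.compile(r'^[\s+!:|]*(\d+)\s+(.+?)\s+\(in iTerm2(?:\.debug\.dylib)?\)')


def top_symbols(text, limit=15):
    """Return (symbol, count) pairs sorted by count, keeping the max per symbol."""
    counts = {}
    for line in text.splitlines():
        m = SYMBOL_RE.match(line)
        if not m:
            continue  # not an iTerm2 frame
        count, symbol = int(m.group(1)), m.group(2).strip()
        counts[symbol] = max(count, counts.get(symbol, 0))
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Synthetic excerpt, invented for this example.
SAMPLE = """\
    + 900 -[PTYSession refresh]  (in iTerm2) + 52
    +   ! 400 -[PTYTextView refresh]  (in iTerm2) + 120
    +   ! : 120 StringToScreenChars  (in iTerm2) + 88
    +   ! 35 -[PTYTextView refresh]  (in iTerm2) + 300
    +   ! 10 objc_msgSend  (in libobjc.A.dylib) + 32
"""

result = top_symbols(SAMPLE)
```

Because a symbol can appear at several places in the call tree, taking the maximum rather than the sum avoids double counting nested occurrences, at the cost of under-reporting symbols reached through unrelated paths.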
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| #!/usr/sbin/dtrace -s | ||
| | ||
| #pragma D option quiet | ||
| | ||
| dtrace:::BEGIN | ||
| { | ||
|     printf("Tracing iTerm2 performance... Ctrl-C to stop.\n"); | ||
|     start = timestamp; | ||
| } | ||
| | ||
| objc$target:PTYSession:-updateDisplayBecause*:entry | ||
| { | ||
|     @updates = count(); | ||
| } | ||
| | ||
| objc$target:PTYSession:-refresh:entry | ||
| { | ||
|     @refreshes = count(); | ||
| } | ||
| | ||
| objc$target:PTYTextView:-refresh:entry | ||
| { | ||
|     @textview_refreshes = count(); | ||
| } | ||
| | ||
| objc$target:VT100Screen*:-sync*:entry | ||
| { | ||
|     @syncs = count(); | ||
| } | ||
| | ||
| objc$target:VT100ScreenMutableState:-performBlockWithJoinedThreads*:entry | ||
| { | ||
|     @joined_blocks = count(); | ||
| } | ||
| | ||
| dtrace:::END | ||
| { | ||
|     duration_sec = (timestamp - start) / 1000000000; | ||
|     /* Guard: a run shorter than one second would otherwise normalize by zero. */ | ||
|     duration_sec = duration_sec > 0 ? duration_sec : 1; | ||
|     printf("\n============================================================\n"); | ||
|     printf("DTrace Performance Summary (duration: %d sec)\n", duration_sec); | ||
|     printf("============================================================\n"); | ||
| | ||
|     printa(" updateDisplayBecause: %@d calls\n", @updates); | ||
|     printa(" PTYSession refresh: %@d calls\n", @refreshes); | ||
|     printa(" PTYTextView refresh: %@d calls\n", @textview_refreshes); | ||
|     printa(" VT100Screen sync: %@d calls\n", @syncs); | ||
|     printa(" joinedThreads blocks: %@d calls\n", @joined_blocks); | ||
| | ||
|     normalize(@updates, duration_sec); | ||
|     normalize(@refreshes, duration_sec); | ||
|     normalize(@textview_refreshes, duration_sec); | ||
|     normalize(@syncs, duration_sec); | ||
| | ||
|     printf("\nRates:\n"); | ||
|     printa(" updateDisplay/sec: %@d\n", @updates); | ||
|     printa(" PTYSession refresh/sec: %@d\n", @refreshes); | ||
|     printa(" PTYTextView refresh/sec:%@d\n", @textview_refreshes); | ||
|     printa(" sync/sec: %@d\n", @syncs); | ||
| } |
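The rate arithmetic in the `END` block can be mirrored in a few lines of Python, which makes the effect of integer division visible. This is only an illustration of the calculation, not part of the PR. The function name `rates_per_second` and the sample numbers are invented, and clamping the divisor to at least one second is a choice made here so that a sub-second run does not divide by zero.

```python
NANOS_PER_SEC = 1_000_000_000


def rates_per_second(counts, start_ns, end_ns):
    """Divide each total by the whole-second duration, using integer division."""
    duration_sec = (end_ns - start_ns) // NANOS_PER_SEC  # truncates, like the D script
    divisor = max(duration_sec, 1)  # avoid dividing by zero on very short runs
    return {name: total // divisor for name, total in counts.items()}


# Invented totals for a run of 20.9 seconds.
totals = {"updates": 1250, "refreshes": 610}
rates = rates_per_second(totals, 0, 20_900_000_000)
short_run = rates_per_second(totals, 0, 400_000_000)
```

Note that a 20.9-second run is treated as 20 seconds, so the reported rates are slightly high. For short runs, a finer-grained duration (for example milliseconds) would reduce that truncation error.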
This isn't going to work right: all it does is disable loading prefs from a custom folder. Commit 25a1933 adds
`--suite=<reverse domain name>`, which safely uses a different User Defaults suite. I'd get rid of `--use-default-config` and make sure your testing script uses something like `--suite=com.iterm2.perf`.
Wow, 25a1933 is quite a refactor! Very helpful though. I'll undo my --config and --use-default-config changes, and pass through --suite instead.
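As a rough illustration of passing `--suite` through, the sketch below builds a launch command that starts a fresh app instance with an isolated defaults suite. It assumes the harness launches the app through macOS `open`, which the thread does not confirm. The helper name `launch_command` is hypothetical, and `com.iterm2.perf` is the example suite name from the review comment.

```python
def launch_command(app_path, suite="com.iterm2.perf"):
    """Build an argv list that launches the app with a separate User Defaults suite."""
    if not suite or " " in suite:
        raise ValueError("suite should be a reverse-domain name without spaces")
    # `open -n` starts a new instance; arguments after --args go to the app itself.
    return ["open", "-n", "-a", app_path, "--args", f"--suite={suite}"]


cmd = launch_command("/path/to/iTerm2.app")
```

On macOS the resulting list could be handed to `subprocess.run(cmd, check=True)`. Keeping it as a list rather than a shell string avoids quoting problems when the app path contains spaces.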