The System Resource Tracker is a Level Zero validation layer checker that monitors both Level Zero API resources and system resources in real-time. It tracks resource allocation and deallocation across all Level Zero API calls that create or destroy resources, providing detailed insights into memory usage, resource lifecycles, and system-level metrics.
Platform Support: Linux only. This checker uses `/proc/self/status` for system metrics and is not available on Windows or macOS.
- Level Zero Resource Tracking: Monitors all L0 resources including contexts, command queues, modules, kernels, event pools, command lists, events, fences, images, samplers, and memory allocations
- Memory Leak Detection: Automatically detects memory leaks by tracking handle-to-metrics mappings for all create/destroy operations:
  - Compares memory usage at resource creation vs destruction
  - Logs warnings when memory increases during destroy operations
  - Reports cumulative leaks per resource type at program exit
  - Provides detailed per-handle leak information
- System Resource Monitoring: Tracks real system metrics via `/proc/self/status`, including:
  - Virtual memory size (VmSize)
  - Resident set size (VmRSS)
  - Data segment size (VmData)
  - Peak virtual memory (VmPeak)
  - Thread count
  - File descriptor count
- Signed Delta Tracking: Calculates both positive and negative resource changes (deltas) for each API call with proper signed arithmetic
- Cumulative Summaries: Maintains running totals of all resource types and leak totals
- CSV Export: Optionally exports timestamped data for graphing and analysis
- Per-Process Isolation: Each process creates a unique output file with PID appended
- Thread-Safe: Protects shared state with a mutex and uses the Meyers' singleton pattern for safe initialization
- Low Overhead: Minimal performance impact (< 1ms per API call)
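The signed-delta idea can be illustrated with a small sketch (Python here for brevity; the actual checker is C++, and `signed_diff` below mirrors the role of the tracker's `signedDiff()` method with a reduced field set):

```python
from dataclasses import dataclass

@dataclass
class SystemResourceMetrics:
    vm_size_kb: int = 0
    vm_rss_kb: int = 0
    vm_data_kb: int = 0

    def signed_diff(self, earlier: "SystemResourceMetrics") -> "SystemResourceMetrics":
        # Plain subtraction on signed integers: freed memory produces
        # negative deltas instead of wrapping around as unsigned math would.
        return SystemResourceMetrics(
            vm_size_kb=self.vm_size_kb - earlier.vm_size_kb,
            vm_rss_kb=self.vm_rss_kb - earlier.vm_rss_kb,
            vm_data_kb=self.vm_data_kb - earlier.vm_data_kb,
        )

before = SystemResourceMetrics(vm_size_kb=427744, vm_rss_kb=158784, vm_data_kb=167456)
after = SystemResourceMetrics(vm_size_kb=427744, vm_rss_kb=158400, vm_data_kb=167456)
delta = after.signed_diff(before)
print(delta.vm_rss_kb)  # negative: memory was freed between the two samples
```

Unsigned subtraction here would turn a 384KB decrease into a huge bogus positive delta, which is why the signed arithmetic matters for the CSV's `Delta_*` columns.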
Enable the checker to log resource usage to the Level Zero debug log:

```shell
export ZE_ENABLE_VALIDATION_LAYER=1
export ZEL_ENABLE_SYSTEM_RESOURCE_TRACKER_CHECKER=1
export ZEL_ENABLE_LOADER_LOGGING=1
export ZEL_LOADER_LOGGING_LEVEL=debug

# Run your Level Zero application
./my_level_zero_app
```

Set the `ZEL_SYSTEM_RESOURCE_TRACKER_CSV` environment variable to specify the output CSV file path. This path is resolved relative to the application's current working directory:
```shell
export ZE_ENABLE_VALIDATION_LAYER=1
export ZEL_ENABLE_SYSTEM_RESOURCE_TRACKER_CHECKER=1
export ZEL_SYSTEM_RESOURCE_TRACKER_CSV=tracker_output.csv

# Run your Level Zero application
./my_level_zero_app
```

Note: The actual output file will include the process ID (e.g., `tracker_output_pid12345.csv`) so that each process creates a unique file. This prevents conflicts when multiple processes use the tracker simultaneously.
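The per-process naming scheme simply inserts the PID before the file extension. A minimal sketch of that rule (the helper name is illustrative, not the checker's actual function):

```python
import os

def with_pid_suffix(path: str, pid: int) -> str:
    # "tracker_output.csv" + pid 12345 -> "tracker_output_pid12345.csv"
    base, ext = os.path.splitext(path)
    return f"{base}_pid{pid}{ext}"

print(with_pid_suffix("tracker_output.csv", os.getpid()))
```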
The System Resource Tracker monitors the following Level Zero API functions:
- `zeContextCreate` - Creates L0 context, increments context count
- `zeContextDestroy` - Destroys L0 context, decrements context count
- `zeCommandQueueCreate` - Creates command queue, increments queue count
- `zeCommandQueueDestroy` - Destroys command queue, decrements queue count
- `zeModuleCreate` - Creates module from binary, increments module count
- `zeModuleDestroy` - Destroys module, decrements module count
- `zeKernelCreate` - Creates kernel from module, increments kernel count
- `zeKernelDestroy` - Destroys kernel, decrements kernel count
- `zeEventPoolCreate` - Creates event pool, increments pool count
- `zeEventPoolDestroy` - Destroys event pool, decrements pool count
- `zeEventCreate` - Creates event, increments event count
- `zeEventDestroy` - Destroys event, decrements event count
- `zeCommandListCreate` - Creates regular command list, increments list count
- `zeCommandListCreateImmediate` - Creates immediate command list, increments list count
- `zeCommandListDestroy` - Destroys command list, decrements list count
- `zeCommandListReset` - Resets command list state
- `zeFenceCreate` - Creates fence, increments fence count
- `zeFenceDestroy` - Destroys fence, decrements fence count
- `zeImageCreate` - Creates image, increments image count
- `zeImageDestroy` - Destroys image, decrements image count
- `zeSamplerCreate` - Creates sampler, increments sampler count
- `zeSamplerDestroy` - Destroys sampler, decrements sampler count
- `zeMemAllocDevice` - Allocates device memory, tracks allocation size and count
- `zeMemAllocHost` - Allocates host memory, tracks allocation size and count
- `zeMemAllocShared` - Allocates shared memory, tracks allocation size and count
- `zeMemFree` - Frees memory, tracks deallocation
The tracker also monitors all command list append operations to measure memory deltas after each call:
Memory Operations:
- `zeCommandListAppendMemoryCopy` - Appends memory copy command
- `zeCommandListAppendMemoryFill` - Appends memory fill command
- `zeCommandListAppendMemoryCopyRegion` - Appends region-based memory copy
- `zeCommandListAppendMemoryCopyFromContext` - Appends cross-context memory copy
- `zeCommandListAppendMemoryPrefetch` - Appends memory prefetch hint
- `zeCommandListAppendMemAdvise` - Appends memory advice hint
Image Operations:
- `zeCommandListAppendImageCopy` - Appends image copy command
- `zeCommandListAppendImageCopyRegion` - Appends region-based image copy
- `zeCommandListAppendImageCopyToMemory` - Appends image-to-memory copy
- `zeCommandListAppendImageCopyFromMemory` - Appends memory-to-image copy
Synchronization:
- `zeCommandListAppendBarrier` - Appends barrier command
- `zeCommandListAppendMemoryRangesBarrier` - Appends memory ranges barrier
- `zeCommandListAppendSignalEvent` - Appends event signal command
- `zeCommandListAppendWaitOnEvents` - Appends wait on events command
- `zeCommandListAppendEventReset` - Appends event reset command
Kernel Execution:
- `zeCommandListAppendLaunchKernel` - Appends kernel launch command
- `zeCommandListAppendLaunchCooperativeKernel` - Appends cooperative kernel launch
- `zeCommandListAppendLaunchKernelIndirect` - Appends indirect kernel launch
Query Operations:
- `zeCommandListAppendQueryKernelTimestamps` - Appends kernel timestamp query
- `zeCommandListAppendWriteGlobalTimestamp` - Appends global timestamp write
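For append operations, the prologue snapshots the current metrics into thread-local storage and the epilogue computes the delta after the call returns. A schematic Python version of that pairing (the names and the metric reader are placeholders, not the checker's real functions):

```python
import threading

_tls = threading.local()  # per-thread pre-call snapshot, as in the tracker

def read_metrics() -> dict:
    # Stand-in for getSystemResourceMetrics(); returns current VmRSS etc.
    return {"VmRSS_KB": read_metrics.current}
read_metrics.current = 0

def prologue() -> None:
    # Runs before the tracked zeCommandListAppend* call.
    _tls.pre_call = read_metrics()

def epilogue() -> dict:
    # Runs after the call completes; delta is taken against this thread's snapshot.
    after = read_metrics()
    return {k: after[k] - _tls.pre_call[k] for k in after}

# Simulate an append call that grows RSS by 64 KB:
read_metrics.current = 1000
prologue()
read_metrics.current = 1064
print(epilogue())  # {'VmRSS_KB': 64}
```

Storing the snapshot per-thread is what keeps concurrent append calls on different threads from clobbering each other's baselines.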
When logging is enabled, each tracked API call produces output like:
```
[debug] zeContextCreate: Context Count=1, handle=0x5567c00e2f98, Delta: VmRSS=+192KB, VmSize=+0KB, VmData=+0KB
[debug]
System Use After Call:
  Total System: VmRSS=158784KB, VmSize=427744KB, VmData=167456KB, Threads=2
  L0 Contexts (1): VmRSS=0KB, VmSize=0KB, VmData=0KB
```
When memory leaks are detected on destroy operations:
```
[warning] MEMORY LEAK DETECTED: Context handle=0x5567c00e2f98 - After destroy: VmRSS=+384KB, VmSize=+1796KB (Total Context leaks: 384KB)
```
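The check itself reduces to comparing the metrics captured at create time against those at destroy time. Sketched in Python (the real `checkForLeak()` is C++ and logs through the loader; the message format here is simplified):

```python
def check_for_leak(resource_type: str, handle: str,
                   at_create: dict, at_destroy: dict):
    # A destroy that leaves VmRSS higher than it was at creation is
    # reported as a suspected leak; the caller accumulates deltas per type.
    delta_rss = at_destroy["VmRSS_KB"] - at_create["VmRSS_KB"]
    if delta_rss > 0:
        return (f"MEMORY LEAK DETECTED: {resource_type} handle={handle} - "
                f"After destroy: VmRSS=+{delta_rss}KB")
    return None

msg = check_for_leak("Context", "0x5567c00e2f98",
                     {"VmRSS_KB": 158400}, {"VmRSS_KB": 158784})
print(msg)
```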
At program exit, a comprehensive leak summary is logged:
```
[error]
=== MEMORY LEAK SUMMARY ===
Context leaks: 384 KB
EventPool leaks: 384 KB
CommandList leaks: 384 KB
Event leaks: 192 KB
TOTAL LEAKS: 1344 KB
===========================
```
The CSV file contains the following columns:
| Column | Description |
|---|---|
| CallNumber | Sequential call number |
| TimeMs | Elapsed time in milliseconds since tracker started |
| APICall | Level Zero API function name |
| VmSize_KB | Virtual memory size (KB) |
| VmRSS_KB | Resident set size (KB) |
| VmData_KB | Data segment size (KB) |
| VmPeak_KB | Peak virtual memory size (KB) |
| Threads | Number of threads |
| Delta_VmSize_KB | Signed change in VmSize since previous call (KB, can be negative) |
| Delta_VmRSS_KB | Signed change in VmRSS since previous call (KB, can be negative) |
| Delta_VmData_KB | Signed change in VmData since previous call (KB, can be negative) |
| Contexts | Current number of L0 contexts |
| CommandQueues | Current number of L0 command queues |
| Modules | Current number of L0 modules |
| Kernels | Current number of L0 kernels |
| EventPools | Current number of L0 event pools |
| CommandLists | Current number of L0 command lists |
| Events | Current number of L0 events |
| Fences | Current number of L0 fences |
| Images | Current number of L0 images |
| Samplers | Current number of L0 samplers |
| TotalMemory_Bytes | Total L0 memory allocated (bytes) |
Note on Delta Values: Delta columns use signed arithmetic to show both increases (positive) and decreases (negative) in memory usage. This enables accurate tracking of memory being freed or reduced.
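Because the delta columns are signed, they telescope: summing `Delta_VmRSS_KB` over all rows recovers the net RSS change across the run. A quick stdlib check on a synthetic CSV (column names are the ones from the table above; the values are invented):

```python
import csv
import io

sample = """CallNumber,APICall,VmRSS_KB,Delta_VmRSS_KB
1,zeContextCreate,158784,192
2,zeMemAllocDevice,160832,2048
3,zeMemFree,158784,-2048
"""

rows = list(csv.DictReader(io.StringIO(sample)))
net = sum(int(r["Delta_VmRSS_KB"]) for r in rows)
print(net)  # 192: the alloc/free pair cancels, leaving only the context's cost
```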
A plotting script is provided to automatically generate visualizations:
```shell
# Install required Python packages
pip install pandas matplotlib

# Generate plots
python3 scripts/plot_resource_tracker.py tracker_output.csv
```

This will:

- Display interactive plots showing memory usage, resource counts, API call distribution, and memory leak detection
- Save plots to `tracker_output_plot.png`
- Print summary statistics including detected memory leaks
You can also import the CSV into any data analysis tool:
Python/Pandas:

```python
import pandas as pd

df = pd.read_csv('tracker_output.csv')
print(df.describe())
```

Excel/LibreOffice Calc:

- Open the CSV file directly
- Create charts using the time series data

R:

```r
data <- read.csv('tracker_output.csv')
plot(data$TimeMs, data$VmRSS_KB, type='l')
```

The plotting script generates 12 comprehensive visualizations:
- Memory Leak Detection Over Time - Red cumulative leak line with scatter markers for individual leak events; light red background if leaks detected
- System Memory Usage Over Time - VmRSS, VmSize, VmData trends
- Memory Leaks by API Call - Horizontal bar chart showing total leaked memory per API type, or success message if no leaks
- Memory Deltas Per API Call - Shows which calls increase/decrease memory in MB with signed deltas
- L0 Resource Counts - Tracks contexts, queues, modules, kernels over time
- Command Lists and Events - Monitors command list and event pool usage
- Total Memory Allocations - Shows cumulative L0 memory allocations in MB
- Top 10 Most Frequent API Calls - Bar chart sorted by call frequency
- Top 10 API Calls by Memory Impact - Total memory delta in MB per API type
- Top 10 API Calls by Avg Memory per Call - Average memory delta in MB per call
- Cumulative Memory Impact by Top 5 APIs - Memory trends over time for highest-impact APIs
- Individual Leak Events Timeline - Bar chart showing leak magnitude at each time point with total annotation
The plotting script identifies potential memory leaks by:
- Matching create/destroy API pairs (e.g., `zeContextCreate` ↔ `zeContextDestroy`)
- Flagging destroy operations where memory increased by more than 100KB
- Calculating cumulative leaked memory across the program's execution
- Highlighting leak events with red visual indicators
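The post-hoc heuristic can be approximated in a few lines: look at rows whose `APICall` ends in `Destroy` and flag those whose `Delta_VmRSS_KB` exceeds the 100KB threshold. This is a simplified stand-in for the plotting script's logic, not its actual code:

```python
LEAK_THRESHOLD_KB = 100  # matches the plotting script's threshold

def find_leak_events(rows):
    # rows: dicts with 'APICall' and 'Delta_VmRSS_KB' keys, as in the CSV output
    return [r for r in rows
            if r["APICall"].endswith("Destroy")
            and int(r["Delta_VmRSS_KB"]) > LEAK_THRESHOLD_KB]

rows = [
    {"APICall": "zeEventPoolCreate", "Delta_VmRSS_KB": "256"},
    {"APICall": "zeEventPoolDestroy", "Delta_VmRSS_KB": "192"},  # flagged
    {"APICall": "zeContextDestroy", "Delta_VmRSS_KB": "-64"},    # memory freed, OK
]
print([r["APICall"] for r in find_leak_events(rows)])  # ['zeEventPoolDestroy']
```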
Example leak detection output:
```
=== MEMORY LEAK DETECTION ===
⚠️ LEAKS DETECTED!
Total leaked memory: 0.19 MB (192.00 KB)
Number of leak events: 1

Leak events by API:
  zeEventPoolDestroy: 1 events, 0.19 MB total, 0.19 MB avg
```
The example above shows a typical visualization with all 12 plots in a 4x3 grid displaying:
- Row 1: Memory leak detection, system memory trends, and leak summary by API
- Row 2: Memory deltas, resource counts, and command list/event tracking
- Row 3: Total memory allocations, API call frequency, and memory impact rankings
- Row 4: Average memory per call, cumulative API impact, and individual leak events timeline
Command list append operations that have high memory deltas will automatically appear in the memory impact and cumulative trend plots. Memory leaks are prominently displayed with red highlighting and detailed annotations.
- Memory Leak Detection: Automatically identify resources that aren't properly cleaned up
  - Detects leaks at both the validation layer level (real-time warnings) and plotting level (post-analysis)
  - Pinpoints exact handles and API calls that leaked memory
  - Quantifies leak severity with KB/MB measurements
- Performance Analysis: Identify excessive allocations and memory usage patterns
- Resource Optimization: Track resource lifecycle and identify unused resources
- Debugging: Correlate application behavior with resource usage patterns and leak events
- Benchmarking: Compare resource usage and leak counts across different runs or code versions
- CI/CD Integration: Automated tracking of resource usage and leak detection in test pipelines
- Capacity Planning: Understand peak resource requirements for production workloads
- Regression Testing: Detect unexpected increases in resource consumption or new memory leaks
The System Resource Tracker is implemented as a validation layer checker that uses both prologue and epilogue intercepts. When enabled, the validation layer injects calls to the tracker's functions before and after each tracked API call completes.
Key Components:
- `SystemResourceMetrics`: Structure holding system-level metrics (VmSize, VmRSS, etc.) with a `signedDiff()` method for proper signed delta calculation
- `ResourceTracker`: Meyers' singleton structure maintaining:
  - Cumulative resource counts and metrics
  - Handle-to-metrics maps for all 11 resource types (contexts, queues, modules, kernels, event pools, command lists, events, fences, images, samplers, memory allocations)
  - Per-resource-type leak counters
  - Thread-local pre-call metrics storage for append operations
- `getResourceTracker()`: Function-local static singleton accessor ensuring proper initialization order
- `getSystemResourceMetrics()`: Parses `/proc/self/status` to read current system metrics
- `checkForLeak()`: Compares creation metrics to destruction metrics and logs warnings if memory increased
- `writeCsvData()`: Atomic CSV line writer using `ostringstream` with signed delta support
- `logResourceSummary()`: Formats and logs cumulative resource usage
- Prologue/Epilogue function pairs: 44 functions (22 pairs) for create/destroy operations plus 28 functions (14 pairs) for append operations
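The parsing done by `getSystemResourceMetrics()` boils down to scanning `/proc/self/status` for a handful of `Key: value kB` lines. An equivalent Python sketch against a sample of that file format (the function name and field selection here are illustrative):

```python
WANTED = {"VmSize", "VmRSS", "VmData", "VmPeak", "Threads"}

def parse_status(text: str) -> dict:
    # /proc/self/status lines look like "VmRSS:   158784 kB" or "Threads: 2";
    # take the first numeric token after the colon for the keys we track.
    metrics = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in WANTED:
            metrics[key] = int(rest.split()[0])
    return metrics

sample = """Name:\tmy_level_zero_app
VmPeak:\t  430000 kB
VmSize:\t  427744 kB
VmRSS:\t  158784 kB
VmData:\t  167456 kB
Threads:\t2
"""
print(parse_status(sample))
```

Reading a live process is the same call on the real file, e.g. `parse_status(open("/proc/self/status").read())` on Linux.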
The tracker uses multiple mechanisms to ensure thread safety:
- Meyers' Singleton Pattern: A function-local static `ResourceTracker` ensures initialization on first use, avoiding the static initialization order fiasco
- Mutex Protection: All prologue and epilogue functions acquire the tracker's mutex before accessing shared state
- Thread-Local Storage: Pre-call metrics for append operations are stored per-thread to avoid conflicts
- Atomic CSV Writes: CSV lines are built using ostringstream and written atomically
- Handle-to-Metrics Maps: Each resource type has its own map protected by the shared mutex
- Tracking overhead is less than 1ms per API call
- System metrics are read by parsing a small text file (`/proc/self/status` on Linux)
- The tracker only runs when explicitly enabled via environment variable
The System Resource Tracker is Linux-only and relies on `/proc/self/status` for system resource metrics. The checker is automatically excluded from builds on Windows and macOS.
If the CSV file contains only the header row with no data, you are likely running an older build with a static initialization order bug that has since been fixed. The current version uses a Meyers' singleton (`getResourceTracker()`) to ensure proper initialization.
Solution: Rebuild with the latest version and regenerate the CSV file.
If you encounter `pandas.errors.ParserError` with messages like "Expected X fields, saw Y", this indicates corrupted CSV data. This can happen if:
- The CSV file was generated by an older version with a data race bug (fixed in current version)
- The application crashed during a write operation
Solution: Generate a new CSV file with the current version of the tracker, which uses:
- Atomic line writes to prevent data corruption
- Unique filenames per process (with PID appended) to prevent conflicts
- Meyers' singleton pattern for proper initialization
The plotting script uses a 100KB threshold for leak detection. Small memory increases during destroy operations may be normal due to:
- Internal driver caching
- Deferred cleanup in the driver
- Memory allocation granularity
If you see small leaks (< 100KB), they may not represent actual application leaks. The validation layer's real-time leak detection provides more accurate results as it compares metrics at the exact moment of creation vs destruction.
- Each process creates a unique CSV file with the process ID appended (e.g., `tracker_pid12345.csv`)
- CSV output is flushed after each API call to ensure data is captured even if the application crashes
- The tracker uses atomic line writes to prevent data corruption in multi-threaded applications
- The tracker adds minimal overhead (< 1ms per API call)
- CSV files can grow large for long-running applications; consider log rotation for production use
- Both CSV output and debug logging can be enabled simultaneously