This directory documents issues encountered during AI training and inference on a specific Intel platform (Arrow Lake + Arc iGPU + NPU), along with the mitigation strategies explored.
⚠️ Disclaimer: All observations here are based on specific hardware/software/model combinations and may not apply to other environments. See detailed disclaimers in each document.
| Component | Model |
|---|---|
| CPU | Intel Core Ultra 9 285H (Arrow Lake, 6P+8E+2LP-E, 16 cores) |
| iGPU | Intel Arc Graphics (8 Xe-core, 128 GB shared memory) |
| NPU | Intel AI Boost (/dev/accel/accel0) |
| RAM | 128 GB DDR5 (shared with iGPU) |
| Software | Version |
|---|---|
| OS | Ubuntu 26.04 LTS (Linux Kernel 7.0) |
| PyTorch | 2.12.0+xpu |
| OpenVINO | 2026.1 |
| Python | 3.14 |
| oneAPI | 2026.0 (IntelLLVM, MKL, DNNL, TBB) |
| GPU driver | libze-intel-gpu1 26.14.37833.4 |
Use case: OpenVINO + Qwen3-series model inference on Intel Arc integrated GPU.
Key findings (on this platform):
- Sustained GPU utilization above 90% can end in a kernel panic or a segfault
- NaN output is a precursor to a driver crash (not a quantization precision issue)
- Mitigation: small batch + frequent cooldown intervals + INT8 quantization
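
The small-batch and cooldown mitigation, together with the NaN warning sign, can be sketched in plain Python. This is a minimal illustration rather than the code used in the analysis: `run_with_cooldown`, `chunked`, the `infer` callable, and all default values (`batch_size`, `cooldown_every`, `cooldown_s`) are hypothetical, and `infer` is assumed to return plain floats so the example stays self-contained.

```python
import math
import time
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_with_cooldown(
    prompts: Sequence[str],
    infer: Callable[[Sequence[str]], List[float]],  # hypothetical inference call
    batch_size: int = 2,        # assumed value, tune per platform
    cooldown_every: int = 4,    # pause after this many batches (assumed)
    cooldown_s: float = 5.0,    # pause length in seconds (assumed)
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Run inference in small batches, pausing periodically to let the GPU idle."""
    results: List[float] = []
    for i, batch in enumerate(chunked(prompts, batch_size), start=1):
        scores = infer(batch)
        # Treat NaN as a warning sign and stop before submitting more GPU work.
        if any(math.isnan(s) for s in scores):
            raise RuntimeError(f"NaN in batch {i}; stopping before further GPU work")
        results.extend(scores)
        if i % cooldown_every == 0:
            sleep(cooldown_s)
    return results
```

Injecting `sleep` keeps the pause logic testable without waiting. With a real model, the NaN check would run on the output tensors instead of a list of floats.
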
| File | Description |
|---|---|
| intel_gpu_aitrain001.md | Full analysis (bilingual) |
Use case: PyTorch XPU backend Transformer training on Arc iGPU.
Key findings (on this platform):
- `nn.TransformerEncoderLayer` / `F.scaled_dot_product_attention` backward pass may crash
- Error types: `RuntimeError` (negative dimension / integer overflow) or `IndexError` (index out of range)
- Even tiny 22M-parameter models (hidden=512, heads=8, batch=4) may crash
- AMP BF16 may escalate to full system freeze (driver crash)
- Attention-free architectures (Gated CNN, Conv2d) run stably
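
Because the reported failures surface as `RuntimeError` or `IndexError`, one way to keep a run alive is to retry the failing step on the CPU. The sketch below is an assumption-laden illustration, not the approach documented in the analysis: `step_with_fallback` and the `step` callable are hypothetical, and `step` is assumed to build or move the model to the named device, run one forward and backward pass, and return the loss. A full system freeze cannot be caught this way.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Exception types reported for the XPU backward pass on this platform.
RECOVERABLE = (RuntimeError, IndexError)


def step_with_fallback(
    step: Callable[[str], T],                  # hypothetical: runs one training step
    devices: Sequence[str] = ("xpu", "cpu"),   # tried in order
) -> Tuple[str, T]:
    """Run step on the first device that completes without a recoverable error."""
    last_error: Optional[BaseException] = None
    for device in devices:
        try:
            return device, step(device)
        except RECOVERABLE as err:
            print(f"step failed on {device}: {type(err).__name__}: {err}")
            last_error = err
    raise RuntimeError("training step failed on every device") from last_error
```

The returned device name lets the caller log where the step actually ran, which helps when judging how often the fallback is hit.
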
| File | Description |
|---|---|
| xpu_backward_issue.md | Full analysis (bilingual) |
| Scenario | GPU | Framework | Status | Recommended Alternative |
|---|---|---|---|---|
| Transformer training | iGPU | PyTorch XPU | ❌ Backward crash | Gated CNN / CPU training |
| CNN training | iGPU | PyTorch XPU | ✅ Stable | — |
| LLM inference (Qwen3) | iGPU | OpenVINO | ⚠️ Crashes under sustained load | INT8 + small batch + cooldown |
| LLM inference | NPU | OpenVINO | ✅ Tested | Requires static shapes |
| LLM inference | CPU | OpenVINO | ✅ Stable | Slowest performance |
- Intel GPU Stability Guide (GitHub) — External mirror of this repo
- PyTorch XPU Documentation
- OpenVINO Documentation
These documents record observations from a specific test environment for reference only. Different hardware, driver, or framework versions may yield different results.