OpenCL-Benchmark v2.0

ProjectPhysX released this 19 Mar 17:42
  • use min(1GB, max_global_buffer) as the memory allocation size (thanks @jerryrt); older GPUs with <1GB of memory now work too
  • more reliable PCIe Gen estimate
  • more robust Intel GPU core/CU detection via CL_DEVICE_IP_VERSION_INTEL
  • set nvidia_compute_capability only for Nvidia GPUs, not Nvidia CPUs
  • fixed TFLOPs/s estimate for AMD CDNA3/4 GPUs
  • fixed Device Name and CU reporting for AMD GPUs with rusticl
  • disabled zero-copy on ARM iGPUs as CL_MEM_USE_HOST_PTR is broken there
  • updated driver download links
  • cosmetics
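A minimal sketch of the allocation-size rule from the first bullet. It assumes `max_global_buffer` is the device's largest allowed single buffer in bytes (the "Buffer Limits ... global" value in the example, which in OpenCL would typically come from CL_DEVICE_MAX_MEM_ALLOC_SIZE; that mapping is an assumption), and that 1GB and 1MB are binary units. The helper name `benchmark_buffer_bytes` is hypothetical, not the benchmark's own code.

```python
# Hypothetical sketch of the rule "use min(1GB, max_global_buffer)".
# Assumptions: max_global_buffer is the device's maximum single-buffer size
# in bytes, and 1GB / 1MB are binary units (2**30 / 2**20 bytes).
ONE_GB = 2**30
MB = 2**20


def benchmark_buffer_bytes(max_global_buffer: int) -> int:
    """Allocate 1 GB, or the device's buffer limit if that is smaller."""
    return min(ONE_GB, max_global_buffer)


print(benchmark_buffer_bytes(68528 * MB) // MB)  # 1024: capped at 1 GB
print(benchmark_buffer_bytes(128 * MB) // MB)    # 128: limited by the device
```

With the example device's 68528 MB buffer limit the 1 GB cap applies; a device limited to 128 MB buffers gets a 128 MB allocation instead of a 1 GB request it cannot satisfy.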

Example 🖖😏

|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA B300 SXM6 AC                                        |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 580.126.09 (Linux)                                         |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 148 at 2032 MHz (18944 cores, 76.988 TFLOPs/s)             |
| Memory, Cache  | 274113 MB VRAM, 4736 KB global / 48 KB local               |
| Buffer Limits  | 68528 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64   Compute   (double, fma  )                      1.184 TFLOPs/s (1/64) |
| FP32   Compute   (float , fma  )                     71.452 TFLOPs/s ( 1x ) |
| FP16   Compute   (half2 , fma  )                     75.201 TFLOPs/s ( 1x ) |
| INT64  Compute   (long  , a*b+c)                      3.714  TIOPs/s (1/24) |
| INT32  Compute   (int   , a*b+c)                     37.736  TIOPs/s (1/2 ) |
| INT16  Compute   (short2, a*b+c)                     34.592  TIOPs/s (1/2 ) |
| INT8   Compute   (char4 , dp4a )                    118.743  TIOPs/s ( 2x ) |
| Memory Bandwidth ( coalesced read      )                       6543.01 GB/s |
| Memory Bandwidth ( coalesced      write)                       6887.38 GB/s |
| Memory Bandwidth (misaligned read      )                       2355.50 GB/s |
| Memory Bandwidth (misaligned      write)                        969.95 GB/s |
| PCIe   Bandwidth (send                 )                          9.86 GB/s |
| PCIe   Bandwidth (   receive           )                          9.70 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen3 x16)    8.93 GB/s |
|-----------------------------------------------------------------------------|
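The Compute Units row of the example is consistent with a simple peak estimate of cores × clock × 2, counting one fused multiply-add (2 FLOPs) per core per cycle: 148 CUs × 128 cores/CU = 18944 cores, and 18944 × 2032 MHz × 2 ≈ 76.988 TFLOPs/s. The sketch below reproduces that arithmetic. The 128 cores/CU figure is inferred from the example, the function name is hypothetical, and the architecture-specific factors behind the AMD CDNA3/4 fix are not modeled.

```python
# Sketch of a peak-throughput estimate: cores x clock x 2 FLOPs (one FMA per
# core per cycle). cores_per_cu=128 is inferred from the example
# (18944 cores / 148 CUs), not read from the device.
def estimate_peak_tflops(compute_units: int, cores_per_cu: int, clock_mhz: float) -> float:
    cores = compute_units * cores_per_cu
    flops_per_core_per_cycle = 2  # fused multiply-add = multiply + add
    return cores * flops_per_core_per_cycle * clock_mhz * 1e6 / 1e12


peak = estimate_peak_tflops(148, 128, 2032)
print(f"{peak:.3f} TFLOPs/s")  # 76.988 TFLOPs/s
print(f"{71.452 / peak:.0%} of peak reached by the FP32 fma test")  # 93%
```

The measured FP32 fma result (71.452 TFLOPs/s) therefore lands at about 93% of this theoretical figure.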
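One simple way to turn a measured transfer rate into a PCIe generation guess is to pick the lowest generation whose nominal x16 bandwidth exceeds the measurement. This is a hypothetical illustration, not the benchmark's actual estimator. It assumes an x16 link and uses the nominal per-direction x16 rates after line encoding (about 4, 8, 15.75, 31.5 and 63 GB/s for Gen1 through Gen5); `guess_pcie_gen` is an illustrative name.

```python
from typing import Optional

# Nominal per-direction bandwidth of an x16 link in GB/s, after line encoding
# (8b/10b for Gen1/2, 128b/130b for Gen3-5). Assumes the link is x16.
NOMINAL_X16_GBPS = {1: 4.0, 2: 8.0, 3: 15.75, 4: 31.5, 5: 63.0}


def guess_pcie_gen(measured_gbps: float) -> Optional[int]:
    """Hypothetical helper: lowest generation whose nominal x16 rate exceeds the measurement."""
    for gen in sorted(NOMINAL_X16_GBPS):
        if measured_gbps < NOMINAL_X16_GBPS[gen]:
            return gen
    return None  # measurement exceeds every listed generation


print(guess_pcie_gen(8.93))  # 3
print(guess_pcie_gen(14.0))  # 3, even on a Gen4 link: the method's blind spot
```

For the example's 8.93 GB/s bidirectional figure this returns 3, matching the reported Gen3 x16. Its weakness is that a link delivering less than the previous generation's nominal rate, such as a Gen4 x16 link measuring 14 GB/s, is reported one generation too low.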