⚠️ The SYCL back-end is currently in an experimental state. Bugs are to be expected. Please report any issue you encounter.
At the moment, alpaka's SYCL back-end can only be used together with Intel oneAPI (latest stable version) and supports x86 CPUs, Intel GPUs and Intel FPGAs.
CMake identifies icpx as IntelLLVM. If a compiler that is not based on icpx is detected, CMake prints an error message.
- alpaka_ACC_SYCL_ENABLE: set to ON to enable the SYCL back-end. Requires the activation of at least one oneAPI hardware target (see below).
- oneDPL_DIR: always required. Set to the CMake path of your oneDPL installation. Example: /opt/intel/oneapi/dpl/2022.1.0/lib/cmake/oneDPL.
The following CMake flags can be set for CPUs:
- alpaka_SYCL_ONEAPI_CPU: set to ON to enable compilation for CPUs. Relies on the Intel OpenCL CPU runtime and ahead-of-time compiler, which support Intel and AMD CPUs.
- alpaka_SYCL_ONEAPI_CPU_ISA: the Intel ISA to compile for. Look at the possible --march options listed in the output of opencl-aot --help. Default: avx2.
Note: Intel FPGAs cannot be targeted together with other Intel hardware. This is because of the different compilation trajectories, as explained in the Intel oneAPI documentation.
The following CMake flags can be set for Intel FPGAs:
- alpaka_SYCL_ONEAPI_FPGA: set to ON to enable compilation for Intel FPGAs.
- alpaka_SYCL_ONEAPI_FPGA_MODE: the Intel FPGA compilation mode. Valid values are emulation, simulation and hardware, which correspond to the Intel high-level synthesis targets of the same name. Default: emulation.
- alpaka_SYCL_ONEAPI_FPGA_BOARD: the Intel FPGA board to compile for. Ignored in emulation mode but important for simulation and hardware modes. Valid values are pac_a10 (Arria 10 GX), pac_s10 (Stratix 10 SX / D5005) and pac_s10_usm (Stratix 10 SX / D5005 with restricted USM). Default: pac_a10.
- alpaka_SYCL_ONEAPI_FPGA_BSP: the Intel FPGA board support package (BSP). Valid values are intel_a10gx_pac (Arria 10 GX) and intel_s10sx_pac (Stratix 10 SX / D5005). It is also possible to supply the full path to the BSP here if aoc is unable to look it up by itself. Note that the BSP must be chosen according to the selected board (see previous bullet point).
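The pairing between alpaka_SYCL_ONEAPI_FPGA_BOARD and alpaka_SYCL_ONEAPI_FPGA_BSP can be read as a small lookup table. The standalone C++17 sketch below encodes the board/BSP pairs listed above; bspForBoard and boardIsUsed are hypothetical helpers for illustration, not part of alpaka or its CMake scripts.

```cpp
#include <string_view>

// Hypothetical helper: returns the BSP that matches an FPGA board name,
// following the board/BSP pairs listed above. Unknown boards yield an empty view.
constexpr std::string_view bspForBoard(std::string_view board)
{
    if(board == "pac_a10")
        return "intel_a10gx_pac"; // Arria 10 GX
    if(board == "pac_s10" || board == "pac_s10_usm")
        return "intel_s10sx_pac"; // Stratix 10 SX / D5005 (with or without restricted USM)
    return {};
}

// Hypothetical helper: the board setting is ignored in emulation mode
// and only matters for the simulation and hardware modes.
constexpr bool boardIsUsed(std::string_view mode)
{
    return mode == "simulation" || mode == "hardware";
}

// The default board (pac_a10) pairs with the intel_a10gx_pac BSP.
static_assert(bspForBoard("pac_a10") == "intel_a10gx_pac");
```

Both stratix boards share one BSP, so only the board name distinguishes the restricted-USM variant.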
The following CMake flags can be set for Intel GPUs:
- alpaka_SYCL_ONEAPI_GPU: set to ON to enable compilation for Intel GPUs.
- alpaka_SYCL_ONEAPI_GPU_DEVICES: semicolon-separated list of one or more Intel GPUs to compile for. The possible values for the devices are listed in the UsersManual under the -fsycl-targets flag. Default: intel_gpu_pvc. Note: currently only one target at a time can be specified (a limitation of the Intel compiler).
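Because alpaka_SYCL_ONEAPI_GPU_DEVICES is a CMake list (entries separated by semicolons) while the Intel compiler currently accepts only one GPU target, checking a value amounts to counting its entries. The standalone C++ sketch below models that check; splitCMakeList and gpuDeviceListIsUsable are hypothetical names, not part of alpaka's CMake scripts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits a semicolon-separated CMake list into its
// entries. Empty entries (e.g. from a trailing ';') are skipped.
std::vector<std::string> splitCMakeList(std::string_view list)
{
    std::vector<std::string> entries;
    std::size_t start = 0;
    while(start <= list.size())
    {
        std::size_t const end = list.find(';', start);
        std::size_t const count = (end == std::string_view::npos) ? std::string_view::npos : end - start;
        std::string_view const item = list.substr(start, count);
        if(!item.empty())
            entries.emplace_back(item);
        if(end == std::string_view::npos)
            break;
        start = end + 1;
    }
    return entries;
}

// With the current Intel compiler limitation, exactly one GPU target is usable.
bool gpuDeviceListIsUsable(std::string_view list)
{
    return splitCMakeList(list).size() == 1;
}
```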
Using the SYCL back-end always requires the following flags:
- -fsycl (compiler and linker)
- -fsycl-standard=2020 (compiler)
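Code that also has to build without -fsycl can detect SYCL mode at compile time. SYCL compilers define the SYCL_LANGUAGE_VERSION macro; the threshold 202001 used below for SYCL 2020 mode is an assumption based on the value commonly reported by Intel's compiler, so check it against your compiler. The helper names are hypothetical.

```cpp
// Returns the SYCL language version, or 0 when the translation unit
// is not compiled in SYCL mode (i.e. -fsycl was not passed).
constexpr long syclLanguageVersion()
{
#if defined(SYCL_LANGUAGE_VERSION)
    return SYCL_LANGUAGE_VERSION;
#else
    return 0;
#endif
}

// Assumption: SYCL 2020 mode reports a version of at least 202001.
constexpr bool isSycl2020Compilation()
{
    return syclLanguageVersion() >= 202001;
}
```

With a non-SYCL compiler, isSycl2020Compilation() is simply false, so it can guard SYCL-only code paths with if constexpr.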
Device-side printing is possible with printf, which calls sycl::ext::oneapi::experimental::printf, an emulation of the standard function. This is an experimental extension of the SYCL standard and therefore may not always work correctly.
For CPUs:
- #include <alpaka/standalone/CpuSycl.hpp> in your C++ code.
- Add the following flags:
  - -fsycl-targets=spir64_x86_64 (compiler and linker): to enable CPU compilation. Note: if you are using multiple SYCL hardware targets (like CPU and GPU), separate them with commas here.
  - -Xsycl-target-backend=spir64_x86_64 "-march=<ISA>" (linker): to choose the Intel ISA to compile for. Check the output of opencl-aot --help and look for the possible values of the --march flag.
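The two CPU flags can be assembled mechanically, including the comma-separated form needed when several SYCL hardware targets are combined. This standalone C++ sketch only builds the flag strings; syclTargetsFlag and cpuIsaFlag are hypothetical helpers, not alpaka functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins one or more SYCL targets into a single
// -fsycl-targets flag, separating multiple targets with commas.
std::string syclTargetsFlag(std::vector<std::string> const& targets)
{
    std::string flag = "-fsycl-targets=";
    for(std::size_t i = 0; i < targets.size(); ++i)
    {
        if(i != 0)
            flag += ',';
        flag += targets[i];
    }
    return flag;
}

// Hypothetical helper: the linker flag selecting the Intel ISA for CPU compilation.
std::string cpuIsaFlag(std::string const& isa)
{
    return "-Xsycl-target-backend=spir64_x86_64 \"-march=" + isa + "\"";
}
```

For a build targeting both CPUs and Intel GPUs, the joined flag would be -fsycl-targets=spir64_x86_64,intel_gpu_pvc.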
For Intel FPGAs:
- #include <alpaka/standalone/FpgaSyclIntel.hpp> in your C++ code.
- Add the following flags:
  - -fintelfpga (compiler and linker): to enable FPGA compilation. Note: this flag is not compatible with the -fsycl-targets flag required for the other possible targets; Intel FPGAs thus cannot be used together with other hardware targets.
  - -DALPAKA_FPGA_EMULATION (compiler): to notify alpaka about compiling for the Intel FPGA emulation target. Required for -Xsemulator and forbidden for -Xssimulation and -Xshardware.
  - -Xsemulator (compiler and linker): to compile for Intel's emulation high-level synthesis target. Mutually exclusive with -Xssimulation and -Xshardware.
  - -Xssimulation (compiler and linker): to compile for Intel's simulation high-level synthesis target. Mutually exclusive with -Xsemulator and -Xshardware.
  - -Xshardware (compiler and linker): to compile for Intel's hardware high-level synthesis target. Mutually exclusive with -Xsemulator and -Xssimulation.
  - -Xsboard=<BSP>:<BOARD> (compiler and linker): to compile for a specific FPGA board. Required when either -Xssimulation or -Xshardware has been passed (no effect for -Xsemulator). Possible combinations for <BSP>:<BOARD> are intel_a10gx_pac:pac_a10 (Arria 10 GX), intel_s10sx_pac:pac_s10 (Stratix 10 SX / D5005) and intel_s10sx_pac:pac_s10_usm (Stratix 10 SX / D5005 with restricted USM).
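The mutual-exclusion rules for the FPGA flags reduce to one flag set per high-level synthesis target. The standalone C++ sketch below returns the combined list for a given mode; FpgaMode and fpgaFlags are hypothetical names, and -DALPAKA_FPGA_EMULATION belongs on the compiler command line only, as noted in the comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Intel high-level synthesis targets.
enum class FpgaMode { Emulation, Simulation, Hardware };

// Hypothetical helper: flags for compiling alpaka code for Intel FPGAs.
// boardSpec has the form "<BSP>:<BOARD>", e.g. "intel_a10gx_pac:pac_a10",
// and is ignored in emulation mode.
std::vector<std::string> fpgaFlags(FpgaMode mode, std::string const& boardSpec)
{
    std::vector<std::string> flags{"-fintelfpga"};
    switch(mode)
    {
    case FpgaMode::Emulation:
        flags.push_back("-DALPAKA_FPGA_EMULATION"); // compiler only, required for -Xsemulator
        flags.push_back("-Xsemulator");
        break;
    case FpgaMode::Simulation:
        flags.push_back("-Xssimulation");
        flags.push_back("-Xsboard=" + boardSpec); // required outside emulation
        break;
    case FpgaMode::Hardware:
        flags.push_back("-Xshardware");
        flags.push_back("-Xsboard=" + boardSpec); // required outside emulation
        break;
    }
    return flags;
}
```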
For Intel GPUs:
- #include <alpaka/standalone/GpuSyclIntel.hpp> in your C++ code.
- Add the following flags:
  - -fsycl-targets=intel_gpu_pvc (compiler and linker): to enable GPU compilation. Note: if you are using multiple SYCL hardware targets (like CPU and GPU), separate them with commas here.
Unlike the other back-ends, the SYCL back-end comes with several accelerators, which should be chosen according to your target hardware:
- alpaka::AccCpuSycl for targeting Intel and AMD CPUs. In contrast to the other CPU back-ends, this uses Intel's OpenCL implementation for CPUs under the hood.
- alpaka::AccFpgaSyclIntel for targeting Intel FPGAs.
- alpaka::AccGpuSyclIntel for targeting Intel GPUs.
These can be used interchangeably with the non-experimental alpaka accelerators (some restrictions apply, see below) to compile existing alpaka code for SYCL-capable hardware.
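Since the accelerators are interchangeable, a program can select its accelerator type in a single place. The sketch below shows one way to do that with a type trait. AccCpuSycl, AccGpuSyclIntel and AccFpgaSyclIntel here are empty stand-ins so the example compiles without alpaka; the real alpaka accelerators are templates over a dimension type and an index type, which this sketch simplifies to a std::size_t dimension. SyclTarget and AccFor are hypothetical names.

```cpp
#include <cstddef>
#include <type_traits>

// Empty stand-ins for the alpaka SYCL accelerators (simplified parameters).
template<std::size_t TDim, typename TIdx> struct AccCpuSycl {};
template<std::size_t TDim, typename TIdx> struct AccGpuSyclIntel {};
template<std::size_t TDim, typename TIdx> struct AccFpgaSyclIntel {};

enum class SyclTarget { Cpu, Gpu, Fpga };

// Maps a hardware target to the matching accelerator type.
template<SyclTarget TTarget, std::size_t TDim, typename TIdx>
struct AccFor;

template<std::size_t TDim, typename TIdx>
struct AccFor<SyclTarget::Cpu, TDim, TIdx> { using type = AccCpuSycl<TDim, TIdx>; };

template<std::size_t TDim, typename TIdx>
struct AccFor<SyclTarget::Gpu, TDim, TIdx> { using type = AccGpuSyclIntel<TDim, TIdx>; };

template<std::size_t TDim, typename TIdx>
struct AccFor<SyclTarget::Fpga, TDim, TIdx> { using type = AccFpgaSyclIntel<TDim, TIdx>; };

template<SyclTarget TTarget, std::size_t TDim, typename TIdx>
using AccFor_t = typename AccFor<TTarget, TDim, TIdx>::type;

// The rest of the program only refers to Acc; switching hardware is a one-line change.
using Acc = AccFor_t<SyclTarget::Gpu, 1, std::size_t>;
```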
- The Intel FPGA back-end cannot be used together with the Intel CPU / GPU back-ends. This is because of the different compilation trajectory required for FPGAs and is unlikely to be fixed anytime soon. See the Intel oneAPI documentation for an explanation.
- Like the CUDA and HIP back-ends, the SYCL back-end only supports up to three kernel dimensions.
- Some Intel GPUs do not support the double type for device code. alpaka will not check this. You can enable software emulation for double precision types with
  export IGC_EnableDPEmulation=1
  export OverrideDefaultFP64Settings=1
  See Intel's FAQ for more information.
- The FPGA back-end does not support atomics. alpaka will not check this.
- Shared memory works, but it is very slow on the GPU.
- The latest Intel OpenCL CPU runtime does not work properly. Some tests (atomicTest, blockSharedTest, blockSharedSharingTest and warpTest) fail with a PI_ERROR_OUT_OF_RESOURCES error. The only runtime version that seems to work is 2022.14.8.0.04 (can be downloaded here), apart from a bug with all_of_group / any_of_group, which can be worked around by making the warp size equal to the block size.
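Two of the restrictions above lend themselves to compile-time checks in user code: FPGAs cannot share a build with CPU or GPU targets, and kernels are limited to three dimensions. The standalone sketch below expresses both as constexpr predicates; the names are hypothetical, alpaka does not provide them, and the "at least one target" condition mirrors the requirement stated for alpaka_ACC_SYCL_ENABLE.

```cpp
#include <cstddef>

// Hypothetical check: at least one target must be enabled, and Intel FPGAs
// cannot be combined with Intel CPU / GPU targets in the same build.
constexpr bool validSyclTargetCombination(bool cpu, bool gpu, bool fpga)
{
    bool const anyTarget = cpu || gpu || fpga;
    return anyTarget && !(fpga && (cpu || gpu));
}

// Hypothetical check: like the CUDA and HIP back-ends, at most three kernel dimensions.
template<std::size_t TDim>
constexpr bool kernelDimSupported()
{
    return TDim <= 3;
}

static_assert(validSyclTargetCombination(true, true, false), "CPU + GPU is allowed");
static_assert(!validSyclTargetCombination(true, false, true), "FPGA + CPU is not");
```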
Most SYCL targets support multiple sub-group sizes. A trait lets you specify at compile time which sub-group size to use for a kernel. For example, if MyKernel requires a sub-group size of 32, this can be declared by specialising alpaka::trait::WarpSize:
#include <alpaka/alpaka.hpp>

#include <cstdint>
#include <type_traits>

struct MyKernel { ... };

// Request a sub-group (warp) size of 32 for MyKernel on every accelerator.
template<typename TAcc>
struct alpaka::trait::WarpSize<MyKernel, TAcc>
    : std::integral_constant<std::uint32_t, 32>
{
};

This can be extended to kernels that support multiple sub-group sizes at compile time:

template<std::uint32_t TWarpSize>
struct MyKernel { ... };

// Forward the kernel's template parameter as its required sub-group size.
template<std::uint32_t TWarpSize, typename TAcc>
struct alpaka::trait::WarpSize<MyKernel<TWarpSize>, TAcc>
    : std::integral_constant<std::uint32_t, TWarpSize>
{
};

The default behaviour, when no sub-group size is specified, is to let the back-end compiler pick the preferred size.
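To see how such a trait is consumed, the following standalone C++17 model mimics the pattern without alpaka. The namespace demo, the choice of 0 in the primary template to mean "no sub-group size requested, let the compiler pick", and the variable template requiredWarpSize are assumptions made for illustration; they are not alpaka's own definitions.

```cpp
#include <cstdint>
#include <type_traits>

namespace demo
{
    struct SomeAcc {}; // stand-in accelerator type

    // Primary template (assumption): 0 means "no preference".
    template<typename TKernel, typename TAcc>
    struct WarpSize : std::integral_constant<std::uint32_t, 0>
    {
    };

    // What a launcher would read to decide whether to request a sub-group size.
    template<typename TKernel, typename TAcc>
    constexpr std::uint32_t requiredWarpSize = WarpSize<TKernel, TAcc>::value;

    struct DefaultKernel {}; // no specialisation, compiler picks the size

    template<std::uint32_t TWarpSize>
    struct SizedKernel {}; // kernel templated on its sub-group size

    // Same partial specialisation pattern as above.
    template<std::uint32_t TWarpSize, typename TAcc>
    struct WarpSize<SizedKernel<TWarpSize>, TAcc> : std::integral_constant<std::uint32_t, TWarpSize>
    {
    };
} // namespace demo
```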
Before launching a kernel with a compile-time sub-group size, the user should query the sizes supported by the device and choose accordingly. If the device does not support the requested size, the SYCL runtime will throw a synchronous exception.
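The query-then-choose step can be sketched without a device. Given the list of sub-group sizes reported by the device (SYCL 2020 exposes it through the sycl::info::device::sub_group_sizes descriptor as a std::vector<std::size_t>; how you obtain it through alpaka depends on your version), the host picks the largest size for which a kernel variant was compiled. The compiled sizes {8, 16, 32} and the name chooseWarpSize are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Sizes for which kernel variants (e.g. MyKernel<8>, MyKernel<16>, MyKernel<32>)
// were instantiated. Hypothetical choice for this sketch.
constexpr std::array<std::uint32_t, 3> compiledWarpSizes{8, 16, 32};

// Returns the largest compiled size the device also supports, or std::nullopt
// if there is none (launching anyway would make the SYCL runtime throw).
std::optional<std::uint32_t> chooseWarpSize(std::vector<std::size_t> const& deviceSizes)
{
    std::optional<std::uint32_t> best;
    for(std::uint32_t const size : compiledWarpSizes)
    {
        bool const supported
            = std::find(deviceSizes.begin(), deviceSizes.end(), static_cast<std::size_t>(size)) != deviceSizes.end();
        if(supported && (!best || size > *best))
            best = size;
    }
    return best;
}
```

The returned value then selects which kernel variant to launch, for example MyKernel<32>.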
During just-in-time (JIT) compilation, querying the device first guarantees that a kernel is compiled only for the sizes the device supports. During ahead-of-time (AOT) compilation this is not enough, because the device is not known at compile time. The SYCL specification mandates that back-end compilers should not fail if a kernel uses unsupported features, such as unsupported sub-group sizes. Unfortunately, the Intel OpenCL CPU and GPU compilers currently fail with a hard error. To work around this limitation, use the preprocessor macros defined when compiling AOT for the new SYCL targets to enable compilation only for the sub-group sizes supported by each device.
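One possible shape for this workaround is a constexpr predicate driven by the target macro, used with if constexpr to drop the kernel body for unsupported sizes. In the sketch below, EXAMPLE_AOT_TARGET_WITH_16_AND_32 is a hypothetical placeholder for the macro your compiler defines for a given AOT target (look up the real names in the Intel compiler documentation), the sizes 16 and 32 are assumed values for that target, and whether an empty body is sufficient for your compiler is an assumption to verify. Without the macro (for example during JIT compilation) every size is enabled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: should code for sub-group size TWarpSize be emitted?
template<std::uint32_t TWarpSize>
constexpr bool warpSizeEnabled()
{
#if defined(EXAMPLE_AOT_TARGET_WITH_16_AND_32)
    // Placeholder AOT target that only supports sub-group sizes 16 and 32.
    return TWarpSize == 16 || TWarpSize == 32;
#else
    // JIT (or no AOT target macro): the runtime only compiles supported sizes.
    return true;
#endif
}

// Simplified kernel-like functor: the body is only emitted for enabled sizes.
template<std::uint32_t TWarpSize>
struct MyKernel
{
    template<typename TAcc>
    void operator()(TAcc const& /* acc */, int* out) const
    {
        if constexpr(warpSizeEnabled<TWarpSize>())
        {
            *out = static_cast<int>(TWarpSize); // real work would go here
        }
    }
};
```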
Note: while the CPU OpenCL back-end supports a sub-group size of 64, Intel's SYCL implementation currently does not. To avoid issues with the sub-group primitives, alpaka always treats a sub-group size of 64 as unsupported by the device.
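The rule for size 64 amounts to filtering the device-reported list before choosing a sub-group size. The standalone sketch below applies that filter; usableSubGroupSizes is a hypothetical name used for illustration, not an alpaka function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Removes 64 from the sizes reported by the device, mirroring the rule that
// a sub-group size of 64 is always treated as unsupported.
std::vector<std::size_t> usableSubGroupSizes(std::vector<std::size_t> deviceSizes)
{
    deviceSizes.erase(
        std::remove(deviceSizes.begin(), deviceSizes.end(), std::size_t{64}),
        deviceSizes.end());
    return deviceSizes;
}
```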