Releases: linebender/fearless_simd
v0.5.0
This release has an MSRV of 1.88.
Added
- The `kernel!` macro, which creates safe wrappers around SIMD-level-specific kernels so platform intrinsics from `core::arch` or `std::arch` can be used safely when a token proves the required target features. (#214 by @Shnatsel)
- The `approximate_recip` method on floating-point SIMD vector types. It uses fast hardware reciprocal estimates where available and exact division otherwise. (#204 by @tomcur)
- `SimdMask::from_bitmask`, `SimdMask::to_bitmask`, `SimdMask::test`, and `SimdMask::set`, mirroring the `std::simd` mask API. (#226 by @Shnatsel)
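To make the idea of a mask/bitmask round trip concrete, here is a minimal scalar sketch. It is not the `fearless_simd` API: the free functions `to_bitmask` and `from_bitmask` below are hypothetical stand-ins that operate on a plain `[bool; 4]`, and the bit order (lane 0 in the least significant bit) is an assumption chosen for illustration.

```rust
// Scalar model of a 4-lane mask. These are hypothetical helpers, NOT the
// fearless_simd methods of the same name.
// Assumption: lane i maps to bit i (lane 0 is the least significant bit).

/// Pack one bit per lane into an integer.
fn to_bitmask(lanes: [bool; 4]) -> u64 {
    lanes
        .iter()
        .enumerate()
        .fold(0u64, |bits, (i, &lane)| bits | ((lane as u64) << i))
}

/// Rebuild the lanes from the packed bits; bits above lane 3 are ignored.
fn from_bitmask(bits: u64) -> [bool; 4] {
    std::array::from_fn(|i| (bits >> i) & 1 == 1)
}

fn main() {
    let mask = [true, false, true, true];
    let bits = to_bitmask(mask);
    // Lanes 0, 2 and 3 are set, so bits 0, 2 and 3 are set.
    assert_eq!(bits, 0b1101);
    // Converting back recovers the original lanes.
    assert_eq!(from_bitmask(bits), mask);
}
```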
Changed
- Breaking change: the crate's SIMD extension traits are now sealed, so external crates can no longer implement them for their own types. (#211 by @LaurenzV)
- Breaking change: mask types now have opaque storage and use the new `SimdMask` trait instead of `SimdBase`. Masks no longer expose integer-vector APIs such as `Deref`, indexing, `Bytes`, public `SimdSplit`/`SimdCombine`, `slide`, `slide_within_blocks`, byte conversions, or scalar bit-operator overloads. (#218 by @Shnatsel)
- Generated SIMD loads, stores, reference casts, transmute-like conversions, helpers, const-generic functions, and intrinsic calls now use checked wrappers or `kernel!`, removing most `unsafe` from generated code. (#232, #233, #234, #235, #236, #237, #238, #239, #244, #245 by @Shnatsel)
- Documentation and examples have been expanded and cleaned up for SIMD level tokens, mask types, platform-specific intrinsics, custom transmute wrappers, README consistency, and docs.rs visibility for NEON and WebAssembly APIs. (#213, #221, #222, #230, #240, #243 by @Shnatsel, #224, #225 by @DJMcNab)
Removed
- Breaking change: the `core_arch` wrapper module and the `safe_wrappers` feature have been removed. Use `kernel!` with `core::arch` or `std::arch` intrinsics instead. (#216 by @Shnatsel)
Full Changelog: v0.4.1...v0.5.0
v0.4.1
This release has an MSRV of 1.88.
Added
- The `interleave` and `deinterleave` methods on integer and floating-point SIMD vector types. (#206 by @Shnatsel)
Fixed
- `Sse4_2` and `Avx2` now consistently use the x86-64-v2 and x86-64-v3 feature sets for detection, dispatch, and generated `target_feature` attributes. (#208 by @Shnatsel)
Full Changelog: v0.4.0...v0.4.1
v0.4.0
This release has an MSRV of 1.88.
Added
- All vector types now implement `Index` and `IndexMut`. (#112 by @Ralith)
- 256-bit vector types now use native AVX2 intrinsics on supported platforms. (#115 by @valadaptive)
- 8-bit integer multiplication is now implemented on x86. (#115 by @valadaptive)
- New native-width associated types: `f64s` and `mask64s`. (#125 by @valadaptive)
- The bitwise "not" operation on integer vector types. (#130 by @valadaptive)
- The `from_fn` method on vector types. (#137 by @valadaptive)
- The `load_interleaved` and `store_interleaved` operations now use native intrinsics on x86, instead of using the fallback implementations. (#140 by @valadaptive)
- Add support for `relaxed_simd` operations in WebAssembly. (#143 by @valadaptive)
- The `ceil` and `round_ties_even` operations on floating-point vector types. (Rust's `round` operation rounds away from zero in the case of ties. Many architectures do not natively implement that behavior, so it's omitted.) (#145 by @valadaptive)
- A `prelude` module, which exports all the traits in the library but not the types. (#149 by @valadaptive)
- The `any_true`, `all_true`, `any_false`, and `all_false` methods on mask types. (#141 by @valadaptive)
- Documentation for most traits, vector types, and operations. (#154 by @valadaptive)
- A "shift left by vector" operation, to go with the existing "shift right by vector". (#155 by @valadaptive)
- "Precise" float-to-integer conversions, which saturate out-of-bounds results and convert NaN to 0 across all platforms. (#167 by @valadaptive)
- Add the `slide` and `slide_within_blocks` methods for shifting elements within a vector. (#164 by @valadaptive)
- The `Level::is_fallback` method, which lets you check if the current SIMD level is the scalar fallback. This works even if `Level::Fallback` is not compiled in, always returning false in that case. (#168 by @valadaptive)
- Added `store_array` methods to store SIMD vectors back to memory explicitly using intrinsics. (#181 by @LaurenzV)
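Two entries in this list describe numeric semantics that are easy to check on scalars: ties-to-even rounding versus Rust's `round`, and float-to-integer conversion that saturates and maps NaN to 0. The sketch below uses only the standard library on `f32` scalars. The helper names `round_pair` and `to_i32_saturating` are made up for illustration and are not part of `fearless_simd`; the sketch only shows the per-lane behavior the notes describe, which happens to match Rust's scalar `as` cast.

```rust
/// Returns (`round`, `round_ties_even`) for the same input so the two
/// tie-breaking rules can be compared side by side.
fn round_pair(x: f32) -> (f32, f32) {
    (x.round(), x.round_ties_even())
}

/// Scalar float-to-int conversion. Rust's `as` cast saturates
/// out-of-range values and converts NaN to 0.
fn to_i32_saturating(x: f32) -> i32 {
    x as i32
}

fn main() {
    // 2.5 is a tie: `round` goes away from zero, ties-even picks the even neighbor.
    assert_eq!(round_pair(2.5), (3.0, 2.0));
    assert_eq!(round_pair(3.5), (4.0, 4.0));

    // Out-of-range inputs clamp to the integer type's limits; NaN becomes 0.
    assert_eq!(to_i32_saturating(1e10), i32::MAX);
    assert_eq!(to_i32_saturating(-1e10), i32::MIN);
    assert_eq!(to_i32_saturating(f32::NAN), 0);
}
```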
Fixed
- Improved the performance for load/store operations of vectors. (#185 by @valadaptive)
- Integer equality comparisons now function properly on x86. Previously, they performed "greater than" comparisons. (#115 by @valadaptive)
- All float-to-integer and integer-to-float conversions are implemented properly on x86, including the precise versions. (#134 by @valadaptive)
- The floating-point `min_precise` and `max_precise` operations now behave the same way on x86 and WebAssembly as they do on AArch64, returning the non-NaN operand if one operand is NaN and the other is not. Previously, they returned the second operand if either was NaN. (#136 by @valadaptive)
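The NaN handling described for `min_precise` can be modeled on scalars. In the sketch below, `precise_min` simply delegates to `f32::min`, which returns the non-NaN operand when exactly one input is NaN. `second_if_nan_min` is a hypothetical model of the previous behavior as the note words it (return the second operand if either input is NaN). Neither function is the crate's implementation.

```rust
/// Model of the fixed behavior: if exactly one operand is NaN, return the other.
/// `f32::min` already has this property.
fn precise_min(a: f32, b: f32) -> f32 {
    a.min(b)
}

/// Hypothetical model of the previous behavior described in the note:
/// the second operand is returned whenever either operand is NaN.
fn second_if_nan_min(a: f32, b: f32) -> f32 {
    if a.is_nan() || b.is_nan() {
        b
    } else if a < b {
        a
    } else {
        b
    }
}

fn main() {
    // Fixed behavior: operand order no longer matters when one side is NaN.
    assert_eq!(precise_min(f32::NAN, 1.0), 1.0);
    assert_eq!(precise_min(1.0, f32::NAN), 1.0);

    // Previous behavior: the result depended on which side held the NaN.
    assert_eq!(second_if_nan_min(f32::NAN, 1.0), 1.0);
    assert!(second_if_nan_min(1.0, f32::NAN).is_nan());
}
```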
Changed
- Breaking change: The AVX2 level now requires all features from the x86-64-v3 baseline. (#188 by @Shnatsel)
- Breaking change: `Level::fallback` has been removed, replaced with `Level::baseline`. (#105 by @DJMcNab)
  This corresponds with a change to avoid compiling in support for the fallback level on compilation targets which don't require it; this is most impactful for binary size on WASM, Apple Silicon Macs or Android.
  A consequence of this is that the available variants on `Level` are now dependent on the target features you are compiling with.
  The fallback level can be restored with the `force_support_fallback` cargo feature. We don't expect this to be necessary outside of tests.
- Code generation for `select` and `unzip` operations on x86 has been improved. (#115 by @valadaptive)
- Breaking change: The native-width associated types (`f32s`, `u8s`, etc.) for the `Avx2` struct have been widened from 128-bit types (like `f32x4`) to 256-bit types (like `f32x8`). (#123 by @valadaptive)
- Breaking change: All the vector types' inherent methods have been removed. Any remaining functionality has been moved to trait methods. (#149 by @valadaptive)
  Some functionality is exposed under different names:
  - Instead of the `reinterpret` methods, use the `bitcast` method on the `Bytes` trait. (e.g. `foo.reinterpret_i32()` -> `foo.bitcast::<i32x4<_>>()`)
  - Instead of the `cvt` methods, use the `to_int` or `to_float` convenience methods on the `SimdFloat` and `SimdInt` traits (e.g. `foo.cvt_u32()` -> `foo.to_int::<u32x4<_>>()`)

  Some functionality (such as `split` or `combine`) is exposed under new traits. You may use the new `prelude` module to conveniently import all of the traits.
- Breaking change: The `madd` and `msub` methods have been renamed to `mul_add` and `mul_sub`, matching Rust's naming conventions. (#158 by @Shnatsel)
- Breaking change: the `val` field on SIMD vector types is now private, and vector types are no longer represented as arrays internally. To access a vector type's elements, you can use the `Into` or `Deref` traits to obtain an array, or the `as_slice`/`as_mut_slice` methods to obtain a slice. (#159 by @valadaptive)
- Breaking change: the `Element` type on the `SimdBase` trait is now an associated type instead of a type parameter. This should make it more pleasant to write code that's generic over different vector types. (#170 by @valadaptive)
- The `WasmSimd128` token type now wraps the new `crate::core_arch::wasm32::WasmSimd128` type. This doesn't expose any new functionality as WASM SIMD128 can only be enabled statically, but matches all the other backend tokens. (#176 by @valadaptive)
- Breaking change: the `SimdFrom::simd_from` method now takes the SIMD token as the first argument instead of the second. This matches the argument order of the `from_slice`, `splat`, and `from_fn` methods on `SimdBase`. (#180 by @valadaptive)
Removed
Full Changelog: v0.3.0...v0.4.0
v0.3.0
This release has an MSRV of 1.86.
Added
- `SimdBase::witness` to fetch the `Simd` implementation associated with a generic vector. (#76 by @Ralith)
- `Select` is now available on native-width masks. (#77, #83 by @Ralith)
- `Simd::shrv_*` performs a right shift with shift amount specified per-lane. (#79 by @Ralith)
- The `>>` operator is implemented for SIMD vectors. (#79 by @Ralith)
- Assignment operator implementations. (#80 by @Ralith)
- `SimdFrom` splatting is available on native-width vectors. (#84 by @Ralith)
- Left shift by u32. (#86 by @Ralith)
- Unary negation of signed integers. (#91 by @Ralith)
- A simpler `dispatch` macro to replace `simd_dispatch`. (#96, #99 by @Ralith, @DJMcNab)
Fixed
- `Simd` now requires consistent mask types for native-width vectors. (#75 by @Ralith)
- `Simd` now requires consistent `Bytes` types for native-width vectors, enabling `Bytes::bitcast` in generic code. (#81 by @Ralith)
- Scalar fallback now uses wrapping integer addition. (#85 by @Ralith)
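As a rough illustration of wrapping integer addition in a scalar fallback, the sketch below adds two four-lane `u8` arrays with `wrapping_add`. The function `add_wrapping` and the fixed lane count are assumptions made for this example, not code from the crate. One plausible reason for the fix is that plain `+` on Rust integers panics on overflow in debug builds, whereas hardware SIMD integer addition wraps.

```rust
/// Lane-wise addition that wraps on overflow instead of panicking.
/// Hypothetical scalar stand-in for a 4-lane `u8` vector add.
fn add_wrapping(a: [u8; 4], b: [u8; 4]) -> [u8; 4] {
    std::array::from_fn(|i| a[i].wrapping_add(b[i]))
}

fn main() {
    let a = [250, 1, 255, 0];
    let b = [10, 2, 1, 0];
    // 250 + 10 = 260 wraps to 4, and 255 + 1 = 256 wraps to 0.
    assert_eq!(add_wrapping(a, b), [4, 3, 0, 0]);
}
```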
Changed
- Breaking: `a.madd(b, c)` and `a.msub(b, c)` now correspond to `a * b + c` and `a * b - c` for consistency with `mul_add` in std. (#88 by @Ralith)
  Previously, `madd` was `a + b * c`, and `msub` was `a - b * c`.
  Therefore, if you previously had `a.madd(b, c)`, that's now written as `b.madd(c, a)`.
  And if you had `a.msub(b, c)`, that's now written `b.madd(-c, a)`.
- Constructors for static SIMD levels are now `const`. (#93 by @Ralith)
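The `madd`/`msub` migration rule above can be checked with scalar arithmetic. The four free functions below are hypothetical scalar models of the old and new operand orders, not the crate's vector methods, and they use separate multiply and add rather than a fused operation.

```rust
// Scalar models of the operand orders described in the note.
// These are illustrative free functions, not fearless_simd methods.

fn old_madd(a: f32, b: f32, c: f32) -> f32 { a + b * c }
fn old_msub(a: f32, b: f32, c: f32) -> f32 { a - b * c }
fn new_madd(a: f32, b: f32, c: f32) -> f32 { a * b + c }
fn new_msub(a: f32, b: f32, c: f32) -> f32 { a * b - c }

fn main() {
    let (a, b, c) = (2.0, 3.0, 4.0);

    // Old `a.madd(b, c)` equals new `b.madd(c, a)`: 2 + 3 * 4 = 3 * 4 + 2.
    assert_eq!(old_madd(a, b, c), new_madd(b, c, a));

    // Old `a.msub(b, c)` equals new `b.madd(-c, a)`: 2 - 3 * 4 = 3 * (-4) + 2.
    assert_eq!(old_msub(a, b, c), new_madd(b, -c, a));

    // Calling the new methods with unchanged argument order gives different results.
    assert_eq!(new_madd(a, b, c), 10.0);
    assert_eq!(new_msub(a, b, c), 2.0);
}
```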
Full Changelog: v0.2.0...v0.3.0
v0.2.0
This release has an MSRV of 1.86.
There has been a complete rewrite of Fearless SIMD.
For some details of the ideas used, see our blog post Towards fearless SIMD, 7 years later.
The repository has also been moved into the Linebender organisation.
New Contributors
- @LaurenzV made their first contribution in #6
- @ajakubowicz-canva made their first contribution in #8
- @no-materials made their first contribution in #21
- @Ralith made their first contribution in #32
- @sagudev made their first contribution in #43
- @DJMcNab made their first contribution in #57
Full Changelog: https://github.com/linebender/fearless_simd/commits/v0.2.0