Add env-based decomposer overloads for DeviceRadixSort descending variants#8099
Add env-based decomposer overloads for DeviceRadixSort descending variants#8099gonidelis wants to merge 17 commits intoNVIDIA:mainfrom
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
This comment has been minimized.
This comment has been minimized.
36988ec to
c4ea000
Compare
This comment has been minimized.
This comment has been minimized.
d359c22 to
7ef72e8
Compare
7ef72e8 to
e2c26b0
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
5c3320f to
c19c993
Compare
|
I rebased the PR to resolve conflicts. |
cub/cub/device/device_radix_sort.cuh
Outdated
| "DecomposerT must be a callable object returning a tuple of references to " | ||
| "arithmetic types"); | ||
|
|
||
| if constexpr (decomposer_check_t::value) |
There was a problem hiding this comment.
why is the static_assert(check); if constexpr(check){} practice good? nvhpc hates the unreachable path after if constexpr pushing a nit-fix but why not drop the if constexpr through and through since static_assert guard it anyways?
| if constexpr (decomposer_check_t::value) | |
| static_assert(decomposer_check_t::value, | |
| "DecomposerT must be a callable object returning a tuple of references to " | |
| "arithmetic types"); | |
| return detail::radix_sort::dispatch<Order>( |
is that wrong?
There was a problem hiding this comment.
why is the static_assert(check); if constexpr(check){} practice good?
It isn't IMO.
why not drop the if constexpr through
The answer to that questions is 2m in front of you once you are back in HQ ;)
That person explained to me that by branching on the static assert condition, we can prevent the compiler from emitting follow-up errors, in case the assertion fails. This is supposed to reduce the output in case of compilation errors and should thus be user friendly. I don't buy this argument, because so far it has confused every developer and the error message in case of a compilation failure is long regardless.
I think the best course of action is to place a requires decomposer_check_t::value at the public API. But that's for a separate PR.
Feel free to remove the if constexpr or leave it. I have learned to ignore it.
This comment has been minimized.
This comment has been minimized.
c439d4b to
4980cc5
Compare
|
I rebased to resolve the conflict. |
| auto error = cub::DeviceRadixSort::SortKeysDescending( | ||
| keys_in.data().get(), | ||
| keys_out.data().get(), | ||
| static_cast<int>(keys_in.size()), | ||
| keys_decomposer_t{}, | ||
| 0, | ||
| sizeof(int) * 8, | ||
| stream_ref); | ||
|
|
||
| // example-end radix-sort-keys-descending-decomposer-bits-env | ||
| stream.sync(); | ||
|
|
||
| REQUIRE(error == cudaSuccess); | ||
| thrust::device_vector<custom_key_t> expected{{9}, {8}, {7}, {6}, {5}, {3}, {0}}; |
There was a problem hiding this comment.
Important: I think we should show the expected outcome in the documentation example:
| auto error = cub::DeviceRadixSort::SortKeysDescending( | |
| keys_in.data().get(), | |
| keys_out.data().get(), | |
| static_cast<int>(keys_in.size()), | |
| keys_decomposer_t{}, | |
| 0, | |
| sizeof(int) * 8, | |
| stream_ref); | |
| // example-end radix-sort-keys-descending-decomposer-bits-env | |
| stream.sync(); | |
| REQUIRE(error == cudaSuccess); | |
| thrust::device_vector<custom_key_t> expected{{9}, {8}, {7}, {6}, {5}, {3}, {0}}; | |
| auto error = cub::DeviceRadixSort::SortKeysDescending( | |
| keys_in.data().get(), | |
| keys_out.data().get(), | |
| static_cast<int>(keys_in.size()), | |
| keys_decomposer_t{}, | |
| 0, | |
| sizeof(int) * 8, | |
| stream_ref); | |
| thrust::device_vector<custom_key_t> expected{{9}, {8}, {7}, {6}, {5}, {3}, {0}}; | |
| // example-end radix-sort-keys-descending-decomposer-bits-env | |
| stream.sync(); | |
| REQUIRE(error == cudaSuccess); |
🥳 CI Workflow Results🟩 Finished in 1h 31m: Pass: 100%/269 | Total: 8d 11h | Max: 1h 23m | Hits: 74%/177147See results here. |
closes #7998