Design API for hierarchical parallelism #8362

@NaderAlAwar

Description

Based on the algorithms identified in #8361, we should design an API that can express the use cases mentioned in #8358 as a single kernel.

It would also be good to study the existing solutions in the other frameworks mentioned in #6410.

This issue can be closed once we have a PoC implementation that shows a performance improvement over the existing custom CUDA C++ kernels mentioned in #8358.
