Modern graphics cards are just as rough on their memory chips as they are on the GPU that orchestrates them, especially in the age of LLM inference and training. It would be useful to have a live readout of memory temperatures during such operations, where supported.
Opening this issue to understand the limitations. I believe it is possible to at least read temperatures over the PCIe bus for NVIDIA cards, if you know where to look; see for example the https://github.com/ThomasBaruzier/gddr6-core-junction-vram-temps project, which I've adapted into a command-line tool / kernel module in https://github.com/jamthief/jeepers.