
fix: pandas 3.0 compatibility and ADOPT optimizer health-check (#374)#403

Merged
Demirrr merged 3 commits into develop from feature/374-pandas3-compat
Apr 29, 2026

Conversation


@Demirrr Demirrr commented Apr 29, 2026

  • Fix entity/relation CSV loading: use a per-column dtype dict instead of dtype=str, so that pandas>=3.0 does not coerce the integer row index to strings (dicee/static_funcs.py)
  • Lift the pandas version cap from <=2.3.3 to >=2.1.0 in setup.py
  • Fix the ADOPT optimizer: guard the _cuda_graph_capture_health_check() call with a hasattr fallback for cross-version PyTorch compatibility
  • Add regression tests (tests/test_regression_pandas3.py):
    • unit tests for integer-keyed entity/relation vocab loading
    • a test documenting the root cause of the dtype=str index coercion
    • an end-to-end training + KGE load cycle asserting integer indices

Demirrr added 3 commits April 29, 2026 08:23
- Fix entity/relation CSV loading: use per-column dtype dict instead of
  dtype=str to prevent pandas>=3.0 from coercing the integer row-index
  to strings (dicee/static_funcs.py)
- Lift pandas version cap from <=2.3.3 to >=2.1.0 in setup.py
- Fix ADOPT optimizer: call _cuda_graph_capture_health_check() with
  hasattr fallback for cross-version PyTorch compatibility
- Add regression tests (tests/test_regression_pandas3.py):
  - unit tests for integer-keyed entity/relation vocab loading
  - a test documenting the root cause of the dtype=str index coercion
  - an end-to-end training + KGE load cycle asserting integer indices
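The hasattr guard for the optimizer health check mentioned above can be sketched as below. The class is a minimal stand-in, not the actual ADOPT implementation; only the guarded call pattern is the point:

```python
# Sketch of the cross-version guard (illustrative optimizer, not dicee's ADOPT).
# Optimizer._cuda_graph_capture_health_check() is a private PyTorch method
# that is absent in older releases, so the call is made only if it exists.
import torch


class AdoptLike(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3):
        super().__init__(params, defaults={"lr": lr})

    @torch.no_grad()
    def step(self, closure=None):
        # Guarded call: a no-op on PyTorch versions that lack the method.
        if hasattr(self, "_cuda_graph_capture_health_check"):
            self._cuda_graph_capture_health_check()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    # Plain gradient step as a placeholder for ADOPT's update.
                    p.add_(p.grad, alpha=-group["lr"])
        return None
```

The hasattr check keeps the optimizer importable and runnable on both old and new PyTorch without pinning a version.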
pandas 3.0.2 uses StringDtype(na_value=nan) instead of object when
dtype=str is passed to read_csv. The test now checks that the index
is not an integer dtype rather than asserting a specific string dtype.
When a prior test causes a CUDA kernel crash, torch.cuda.is_available()
continues to return True but the runtime is corrupt. Lightning's
isolate_rng -> _collect_rng_states calls torch.cuda.get_rng_state_all()
whenever is_available() is True, even with accelerator='cpu', which
re-triggers the broken init and raises RuntimeError before training starts.

Changes in dicee/trainer/dice_trainer.py:
- Add _cuda_is_usable(): probes actual CUDA init via get_device_name(0)
  to detect a broken runtime without relying on is_available()
- Add _disable_cuda_in_process(): patches torch.cuda.is_available to
  return False so Lightning's RNG isolation skips CUDA state collection
- PL Trainer init: call _cuda_is_usable() before constructing the
  Trainer; if CUDA is broken, call _disable_cuda_in_process() and force
  accelerator='cpu'
- Guard get_device_name() diagnostic print in DICE_Trainer.__init__
  with try/except to prevent crash on startup

Fixes test_swa.py::TestSWA::test_k_vs_all_ema (torch.AcceleratorError:
CUDA error: unspecified launch failure)
@Demirrr Demirrr merged commit ac64984 into develop Apr 29, 2026
3 checks passed
@Demirrr Demirrr deleted the feature/374-pandas3-compat branch April 29, 2026 11:43