fix: pandas 3.0 compatibility and ADOPT optimizer health-check (#374) #403
Merged
Conversation
Demirrr (Member) commented Apr 29, 2026
- Fix entity/relation CSV loading: use a per-column dtype dict instead of dtype=str so that pandas>=3.0 no longer coerces the integer row-index to strings (dicee/static_funcs.py); see the sketch after this list
- Lift the pandas version cap from <=2.3.3 to >=2.1.0 in setup.py
- Fix the ADOPT optimizer: call _cuda_graph_capture_health_check() with a hasattr fallback for cross-version PyTorch compatibility (sketched below)
- Add regression tests (tests/test_regression_pandas3.py):
  - unit tests for integer-keyed entity/relation vocab loading
  - documentation of the root cause of the dtype=str index coercion
  - an end-to-end training + KGE load cycle asserting integer indices
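A minimal sketch of the loading pattern the first bullet describes, assuming the vocab CSV stores the label in a column named "entity" with the integer id as the row index; the actual column names in dicee/static_funcs.py may differ:

```python
import pandas as pd

def load_vocab(path: str) -> dict:
    # dtype=str would cover every column, including the one that becomes
    # the row index, so pandas>=3.0 hands back string ids. A per-column
    # dtype dict pins only the label column to str and lets the index
    # stay int64.
    df = pd.read_csv(path, dtype={"entity": str}, index_col=0)
    return dict(zip(df["entity"], df.index))
```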
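A hedged sketch of the hasattr fallback from the third bullet; the wrapper name here is made up for illustration, and only _cuda_graph_capture_health_check() itself is the private torch.optim helper the PR refers to:

```python
import torch

def run_health_check_if_present(optimizer: torch.optim.Optimizer) -> None:
    # _cuda_graph_capture_health_check() is a private torch.optim helper
    # that exists only in some PyTorch versions, so guard the call with
    # hasattr instead of assuming it is there.
    if hasattr(optimizer, "_cuda_graph_capture_health_check"):
        optimizer._cuda_graph_capture_health_check()
```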
pandas 3.0.2 returns StringDtype(na_value=nan) instead of object when dtype=str is passed to read_csv. The regression test therefore checks that the index is not an integer dtype, rather than asserting one specific string dtype.
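A sketch of that version-agnostic assertion, using an inline CSV whose layout (unnamed integer index column plus an "entity" column) is an assumption for illustration:

```python
import io
import pandas as pd
from pandas.api.types import is_integer_dtype

CSV = ",entity\n0,Berlin\n1,Paris\n"  # layout assumed for illustration

def test_dtype_str_coerces_index():
    # Root cause: dtype=str also applies to the index column, so the ids
    # come back as strings -- object under pandas 2, StringDtype under
    # pandas>=3 -- hence assert "not integer" rather than one string dtype.
    broken = pd.read_csv(io.StringIO(CSV), dtype=str, index_col=0)
    assert not is_integer_dtype(broken.index.dtype)

def test_per_column_dtype_keeps_integer_index():
    fixed = pd.read_csv(io.StringIO(CSV), dtype={"entity": str}, index_col=0)
    assert is_integer_dtype(fixed.index.dtype)
```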
When a prior test causes a CUDA kernel crash, torch.cuda.is_available() continues to return True but the runtime is corrupt. Lightning's isolate_rng -> _collect_rng_states calls torch.cuda.get_rng_state_all() whenever is_available() is True, even with accelerator='cpu', which re-triggers the broken init and raises RuntimeError before training starts.

Changes in dicee/trainer/dice_trainer.py:
- Add _cuda_is_usable(): probes actual CUDA init via get_device_name(0) to detect a broken runtime without relying on is_available()
- Add _disable_cuda_in_process(): patches torch.cuda.is_available to return False so Lightning's RNG isolation skips CUDA state collection
- PL Trainer init: call _cuda_is_usable() before constructing the Trainer; if CUDA is broken, call _disable_cuda_in_process() and force accelerator='cpu'
- Guard the get_device_name() diagnostic print in DICE_Trainer.__init__ with try/except to prevent a crash on startup

Fixes test_swa.py::TestSWA::test_k_vs_all_ema (torch.AcceleratorError: CUDA error: unspecified launch failure)
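A minimal sketch of the two guards, following the names given above; the exception handling is kept broad because the failure surfaces as RuntimeError or torch.AcceleratorError depending on the PyTorch version:

```python
import torch

def _cuda_is_usable() -> bool:
    """True only if CUDA can actually initialize; after a kernel crash
    torch.cuda.is_available() still returns True on a corrupt runtime."""
    if not torch.cuda.is_available():
        return False
    try:
        torch.cuda.get_device_name(0)  # forces a real CUDA init
        return True
    except Exception:  # RuntimeError / torch.AcceleratorError
        return False

def _disable_cuda_in_process() -> None:
    # Patch is_available so Lightning's isolate_rng -> _collect_rng_states
    # skips torch.cuda.get_rng_state_all() entirely.
    torch.cuda.is_available = lambda: False

# Before constructing the PL Trainer:
accelerator = "auto"
if not _cuda_is_usable():
    _disable_cuda_in_process()
    accelerator = "cpu"
```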