Skip to content

Commit d237e98

Browse files
committed
πŸ‘Œ BaseOutput: Add typing + doc to outputs namespace
The `outputs` property previously returned a `SimpleNamespace` β€” no type information, no hover docs, no filtered tab completion, no immutability. This commit replaces it with a typed, documented, frozen namespace. Introduce the `output_mapping` decorator for per-code output schemas. Each output schema is decorated with `@output_mapping`, making it a `@dataclass(frozen=True)` whose fields declare the output name, type, extraction `Spec`, and docstring in one place β€” a single source of truth with elegant syntax: fermi_energy: float = Spec(...) """Fermi energy in eV.""" This is far cleaner than the removed `_output_spec_mapping` approach, which was a simple dictionary. To add this static typing to the `outputs` namespace, we add `Generic[T]` to `BaseOutput`. This means subclasses have to define their own mapping type in the class definition: class PwOutput(BaseOutput[_PwMapping]): This generic type is returned by the `outputs` namespace, meaning static type checkers like Pylance can resolve attribute types and docstrings without any extra stubs. To keep the above syntax as clean as possible, without having to duplicate the mapping information as e.g. a class attribute, we implement the `_get_mapping_class` method on `BaseOutput`. Besides extracting the correct output mapping, it also verifies that the subclass specifies the mapping type at instantiation, which is appropriate for an abstract base class. Moreover, in the constructor of `BaseOutput`, a `TypeError` is raised at instantiation if any field in the mapping subclass does not use a `Spec(...)` default, guarding against silently dropped fields in case a contributor forgets to add `Spec`. The `output_mapping` decorator adapts the `__getattribute__` dunder method to keep the previous (and correct) behavior to raise an `AttributeError` when accessing an output that was not parsed from the outputs, but now with a clearer message. `__dir__` is filtered so only available outputs appear in tab completion at runtime (used in IPython kernels, e.g. Jupyter notebooks). Static type checkers will still show the missing outputs, but by definition cannot use runtime information so that is unavoidable. Complexity is intentionally concentrated in `BaseOutput` and `output_mapping`, which is maintainer territory. Subclass code is clean and easily extendable.
1 parent f2e5c52 commit d237e98

6 files changed

Lines changed: 199 additions & 67 deletions

File tree

β€Ždocs/design/outputs.mdβ€Ž

Lines changed: 37 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -101,28 +101,44 @@ glom(pw_out.raw_outputs, {'fermi_energy': 'xml.output.band_structure.fermi_energ
101101
```
102102

103103
This will return a dictionary: `{'fermi_energy': 0.04425026484437661}`.
104-
The idea is that every base output has one key-value pair in an `_output_spec_mapping` dictionary defined on each output class:
104+
105+
### Defining outputs
106+
107+
Each extracted output is declared as a typed field on a per-code mapping class decorated with `@output_mapping`:
105108

106109
```python
107-
_output_spec_mapping = {
108-
<output_name: str>: <glom_spec>,
109-
...
110-
}
110+
@output_mapping
111+
class _PwMapping:
112+
fermi_energy: float = Spec("xml.output.band_structure.fermi_energy")
113+
"""Fermi energy in eV."""
111114
```
112115

113-
For example:
116+
This is a single source of truth: the field declaration carries the output name, type annotation, extraction `Spec`, and docstring.
117+
Adding a new output means adding one field β€” nothing else.
118+
119+
The mapping class is connected to the output class via the generic typing syntax:
114120

115121
```python
116-
output_mapping = {
117-
'fermi_energy': 'xml.output.band_structure.fermi_energy'
118-
}
122+
class PwOutput(BaseOutput[_PwMapping]):
123+
...
119124
```
120125

121-
!!! warning "Important"
126+
`BaseOutput` extracts the mapping class from this generic parameter at instantiation, then uses `dataclasses.fields()` to build `_output_spec_mapping` β€” a dict mapping each field name to its `Spec`.
127+
The actual extraction runs when the user accesses `outputs`: glom resolves each `Spec` against `raw_outputs`, and the results are used to populate the mapping instance.
128+
Fields whose `Spec` cannot be resolved retain the `Spec` object as their value β€” a placeholder that signals "not extracted".
129+
Accessing such a field on the `outputs` namespace raises `AttributeError` with a clear message.
130+
131+
!!! note
132+
133+
The `Spec` object here acts as a placeholder for an output that was not produced by the calculation.
134+
Other options would be `None`, `dataclasses.MISSING`, or a custom sentinel β€” all carry the same semantic awkwardness: the field is typed as `float`, but its value is not a float.
135+
The initial option we chose was `Annotated[float | None, Spec(...)] = None`, but this has several drawbacks: it implies the field can legitimately be `None` (no more honest), it is considerably more verbose, it requires type-hacking to extract the `Spec` from `Annotated`, and contributors adding new outputs may not be familiar with `Annotated`.
136+
137+
Using `Spec` as the field default has a key advantage: it unambiguously identifies the value as a glom spec, making it clear both that the field is not yet extracted *and* how it will be extracted.
138+
The `@output_mapping` decorator enforces that every field uses a `Spec` as its default, catching mistakes at instantiation.
122139

123-
In our current design, we have a `BaseOutput` class that defines several "data retrieval" methods.
124-
Some of these rely on the fact that the child classes (e.g. `PwOutput`) cannot have state changes after construction.
125-
This _should_ be the case, and no mutating methods should be allowed on `BaseOutput` classes.
140+
The `@output_mapping` decorator applies `@dataclass(frozen=True)` to the mapping class, making the extracted outputs immutable: users cannot overwrite a parsed result.
141+
Beyond this, `BaseOutput` itself is designed to be stateful but immutable β€” it stores `raw_outputs` and derived data at construction, and no mutating methods are allowed after that point.
126142

127143
## Conversion to other libraries
128144

@@ -167,7 +183,7 @@ This points to poor design of this class' constructor, but we can still support
167183

168184
!!! note
169185

170-
This approach requires careful syncing the `_output_spec_mapping` of the output classes to the `conversion_mapping` of the converter classes, and hence the code logic for obtaining is not fully localized.
186+
This approach requires careful syncing the extraction specs of the output classes (defined via `@output_mapping` fields) to the `conversion_mapping` of the converter classes, and hence the code logic for obtaining is not fully localized.
171187
To make things worse, in some cases it also requires understanding the raw outputs (but this can be prevented with clear schemas for the base outputs).
172188
We're not fully converged on the design here, but some considerations below:
173189

@@ -213,12 +229,14 @@ Whose attributes are populated on the fly, based on the **available** outputs.
213229

214230
!!! note
215231

216-
There are currently two "issues" with the `outputs` namespace.
232+
Accessing an output that was not produced by the calculation raises `AttributeError`
233+
with a clear message. The `outputs` namespace only exposes available outputs in tab
234+
completion (`__dir__` is filtered at runtime), though static type checkers like
235+
Pylance will still show all declared fields.
217236

218-
1. Since an attribute cannot be populated in case the corresponding output isn't there, the output `name` will be missing from the namespace.
219-
Hence, the `outputs` namespace only has available outputs.
220-
2. The `outputs` namespace can only output the "base outputs", without conversion to e.g. ASE.
221-
We're exploring ways to change the default output format [in this issue](https://github.com/aiidateam/qe-tools/issues/113)
237+
The `outputs` namespace currently returns base outputs only β€” conversion to e.g. ASE
238+
is not supported via this interface. We're exploring this in
239+
[issue #113](https://github.com/aiidateam/qe-tools/issues/113).
222240

223241
## Custom spec
224242

β€Ždocs/getting_started.mdβ€Ž

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -72,7 +72,8 @@ pw_out.outputs.fermi_energy
7272
!!! warning
7373

7474
The `outputs` namespace is designed for interactive access.
75-
If an output is not available, it will not be in the namespace.
75+
If an output is not available, accessing it raises `AttributeError`.
76+
Tab completion (e.g. in Jupyter) only shows available outputs β€” but static type checkers like Pylance will show all declared outputs regardless.
7677

7778

7879
Finally, you can obtain a dictionary of all available outputs in your preferred library:

β€Žsrc/qe_tools/outputs/base.pyβ€Ž

Lines changed: 67 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,79 @@
11
"""Abstract base class for the outputs of Quantum ESPRESSO."""
22

33
from __future__ import annotations
4-
from functools import cached_property
5-
from types import SimpleNamespace
6-
from glom import glom, GlomError
74

85
import abc
6+
import dataclasses
7+
import typing
8+
from functools import cached_property
9+
10+
from glom import glom, GlomError, Spec
11+
12+
13+
T = typing.TypeVar("T")
14+
15+
16+
def output_mapping(cls):
17+
"""Decorator that defines a typed, frozen output mapping for a Quantum ESPRESSO code.
18+
19+
Applies `@dataclass(frozen=True)` and injects `__getattribute__` and `__dir__` so that:
20+
21+
- Accessing a field whose value is still a `Spec` raises `AttributeError` with a clear
22+
message (i.e. the output was not parsed).
23+
- `dir()` only lists fields that were successfully extracted.
24+
25+
Each field must declare a `Spec(...)` as its default value:
26+
27+
fermi_energy: float = Spec("path.to.fermi_energy")
28+
\"""Fermi energy in eV.\"""
29+
"""
930

31+
def __getattribute__(self, name):
32+
value = object.__getattribute__(self, name)
33+
if isinstance(value, Spec):
34+
raise AttributeError(f"'{name}' is not available in the parsed outputs.")
35+
return value
1036

11-
class BaseOutput(abc.ABC):
37+
def __dir__(self):
38+
return [
39+
name for name, value in self.__dict__.items() if not isinstance(value, Spec)
40+
]
41+
42+
cls.__getattribute__ = __getattribute__
43+
cls.__dir__ = __dir__
44+
return dataclasses.dataclass(frozen=True)(cls)
45+
46+
47+
class BaseOutput(abc.ABC, typing.Generic[T]):
1248
"""Abstract base class for the outputs of Quantum ESPRESSO."""
1349

50+
@classmethod
51+
def _get_mapping_class(cls) -> type:
52+
"""Extract the mapping class from the generic parameter.
53+
54+
Example: PwOutput(BaseOutput[_PwMapping]) β†’ _PwMapping
55+
"""
56+
for base in getattr(cls, "__orig_bases__", []):
57+
if typing.get_origin(base) is BaseOutput and (
58+
args := typing.get_args(base)
59+
):
60+
return args[0]
61+
raise TypeError(
62+
f"{cls.__name__} must subclass BaseOutput[T] with a decorated output mapping, "
63+
"e.g. class PwOutput(BaseOutput[_PwMapping])"
64+
)
65+
1466
def __init__(self, raw_outputs: dict):
1567
self.raw_outputs = raw_outputs
68+
self._output_spec_mapping = {}
69+
70+
for field in dataclasses.fields(self._get_mapping_class()):
71+
if not isinstance(field.default, Spec):
72+
raise TypeError(
73+
f"{type(self).__name__}.{field.name}: expected a Spec(...) default, "
74+
f"got {field.default!r}"
75+
)
76+
self._output_spec_mapping[field.name] = field.default
1677

1778
@classmethod
1879
@abc.abstractmethod
@@ -106,11 +167,6 @@ def list_outputs(self, only_available: bool = True) -> list[str]:
106167
return output_names
107168

108169
@cached_property
109-
def outputs(self) -> SimpleNamespace:
170+
def outputs(self) -> T:
110171
"""Namespace with available outputs."""
111-
namespace = SimpleNamespace()
112-
113-
for name in self.list_outputs(only_available=True):
114-
setattr(namespace, name, self.get_output(name))
115-
116-
return namespace
172+
return self._get_mapping_class()(**self.get_output_dict())

β€Žsrc/qe_tools/outputs/dos.pyβ€Ž

Lines changed: 33 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
11
"""Output of the Quantum ESPRESSO dos.x code."""
22

3-
from __future__ import annotations
4-
53
from pathlib import Path
64
from typing import TextIO
75

8-
from qe_tools.outputs.base import BaseOutput
6+
from glom import Spec
7+
8+
from qe_tools.outputs.base import BaseOutput, output_mapping
99

1010
from .parsers.base import BaseStdoutParser
1111
from .parsers.dos import DosParser
@@ -22,19 +22,37 @@ def _determine_spin_type(spin: dict) -> str:
2222
return "non-spin-polarised"
2323

2424

25-
class DosOutput(BaseOutput):
26-
"""Output of the Quantum ESPRESSO dos.x code."""
25+
@output_mapping
26+
class _DosMapping:
27+
"""Typed outputs of a dos.x calculation."""
28+
29+
energy: list = Spec("dos.energy")
30+
"""Energy grid in eV."""
31+
32+
dos: list = Spec("dos.dos")
33+
"""Total density of states (states/eV). Not available for spin-polarised calculations."""
34+
35+
dos_up: list = Spec("dos.dos_up")
36+
"""Spin-up DOS (states/eV). Not available for non-spin-polarised calculations."""
37+
38+
dos_down: list = Spec("dos.dos_down")
39+
"""Spin-down DOS (states/eV). Not available for non-spin-polarised calculations."""
2740

28-
_output_spec_mapping = {
29-
"energy": "dos.energy",
30-
"dos": "dos.dos",
31-
"dos_up": "dos.dos_up",
32-
"dos_down": "dos.dos_down",
33-
"fermi_energy": "dos.fermi_energy",
34-
"integrated_dos": "dos.integrated_dos",
35-
"full_dos": "dos",
36-
"spin_type": ("xml.input.spin", _determine_spin_type),
37-
}
41+
fermi_energy: float = Spec("dos.fermi_energy")
42+
"""Fermi energy in eV."""
43+
44+
integrated_dos: list = Spec("dos.integrated_dos")
45+
"""Integrated DOS."""
46+
47+
full_dos: dict = Spec("dos")
48+
"""Full parsed DOS dictionary."""
49+
50+
spin_type: str = Spec(("xml.input.spin", _determine_spin_type))
51+
"""Spin type: 'non-spin-polarised', 'spin-polarised', 'non-collinear', or 'spin-orbit'."""
52+
53+
54+
class DosOutput(BaseOutput[_DosMapping]):
55+
"""Output of the Quantum ESPRESSO dos.x code."""
3856

3957
@classmethod
4058
def from_dir(cls, directory: str | Path):

β€Žsrc/qe_tools/outputs/pw.pyβ€Ž

Lines changed: 39 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -1,21 +1,22 @@
11
"""Output of the Quantum ESPRESSO pw.x code."""
22

3-
from __future__ import annotations
4-
53
from pathlib import Path
64
from typing import TextIO
75

8-
from qe_tools.outputs.base import BaseOutput
6+
from glom import Spec
7+
8+
from qe_tools.outputs.base import BaseOutput, output_mapping
99
from qe_tools.outputs.parsers.pw import PwStdoutParser, PwXMLParser
1010

1111
from qe_tools import CONSTANTS
1212

1313

14-
class PwOutput(BaseOutput):
15-
"""Output of the Quantum ESPRESSO pw.x code."""
14+
@output_mapping
15+
class _PwMapping:
16+
"""Typed outputs of a pw.x calculation."""
1617

17-
_output_spec_mapping = {
18-
"structure": {
18+
structure: dict = Spec(
19+
{
1920
"atomic_species": (
2021
"xml.output.atomic_species.species",
2122
[lambda species: species["@name"]],
@@ -40,8 +41,12 @@ class PwOutput(BaseOutput):
4041
]
4142
],
4243
),
43-
},
44-
"forces": (
44+
}
45+
)
46+
"""Crystal structure: cell vectors (Γ…), element symbols, and Cartesian positions (Γ…)."""
47+
48+
forces: list = Spec(
49+
(
4550
"xml.output.forces",
4651
lambda forces: [
4752
[
@@ -50,8 +55,12 @@ class PwOutput(BaseOutput):
5055
]
5156
for atom_index in range(forces["@dims"][1])
5257
],
53-
),
54-
"stress": (
58+
)
59+
)
60+
"""Forces on atoms in eV/Γ…, shape [n_atoms][3]."""
61+
62+
stress: list = Spec(
63+
(
5564
"xml.output.stress",
5665
lambda stress: [
5766
[
@@ -60,16 +69,29 @@ class PwOutput(BaseOutput):
6069
]
6170
for row_number in range(3)
6271
],
63-
),
64-
"fermi_energy": (
72+
)
73+
)
74+
"""Stress tensor in GPa, shape [3][3]."""
75+
76+
fermi_energy: float = Spec(
77+
(
6578
"xml.output.band_structure.fermi_energy",
6679
lambda energy: energy * CONSTANTS.hartree_to_ev,
67-
),
68-
"total_energy": (
80+
)
81+
)
82+
"""Fermi energy in eV."""
83+
84+
total_energy: float = Spec(
85+
(
6986
"xml.output.total_energy.etot",
7087
lambda energy: energy * CONSTANTS.hartree_to_ev,
71-
),
72-
}
88+
)
89+
)
90+
"""Total energy in eV."""
91+
92+
93+
class PwOutput(BaseOutput[_PwMapping]):
94+
"""Output of the Quantum ESPRESSO pw.x code."""
7395

7496
@classmethod
7597
def from_dir(cls, directory: str | Path):

0 commit comments

Comments
Β (0)