BaseOutput: Add typing + doc to outputs namespace
The `outputs` property previously returned a `SimpleNamespace`: no type information, no
hover docs, no filtered tab completion, no immutability. This commit replaces it with a
typed, documented, frozen namespace.
Introduce the `output_mapping` decorator for per-code output schemas. Each output schema
is decorated with `@output_mapping`, making it a `@dataclass(frozen=True)` whose fields
declare the output name, type, extraction `Spec`, and docstring in one place, giving a
single source of truth with elegant syntax:
    fermi_energy: float = Spec(...)
    """Fermi energy in eV."""
This is far cleaner than the removed `_output_spec_mapping` approach, which was a simple
dictionary.
To add this static typing to the `outputs` namespace, we add `Generic[T]` to
`BaseOutput`. This means subclasses have to define their own mapping type in the class
definition:
    class PwOutput(BaseOutput[_PwMapping]):
This generic type is returned by the `outputs` namespace, meaning static type checkers
like Pylance can resolve attribute types and docstrings without any extra stubs.
To keep the above syntax as clean as possible, without having to duplicate the
mapping information as e.g. a class attribute, we implement the `_get_mapping_class`
method on `BaseOutput`. Besides extracting the correct output mapping, it also
verifies at instantiation that the subclass specifies the mapping type, which is
appropriate for an abstract base class. Moreover, the constructor of `BaseOutput`
raises a `TypeError` if any field in the mapping class does not use a `Spec(...)`
default, guarding against silently dropped fields in case a contributor forgets to
add `Spec`.
The `output_mapping` decorator overrides the `__getattribute__` dunder method to keep the
previous (and correct) behavior of raising an `AttributeError` when accessing an output
that was not parsed from the outputs, but now with a clearer message. `__dir__`
is filtered so only available outputs appear in tab completion at runtime (used in
IPython kernels, e.g. Jupyter notebooks). Static type checkers will still show the
missing outputs; by definition they cannot use runtime information, so that is
unavoidable.
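A rough sketch of how such a decorator could behave is shown below. It assumes that a
field still holding its `Spec` default means "not extracted"; the `Spec` class, the error
message, and the field paths are hypothetical, and the real decorator may differ:

```python
from dataclasses import dataclass, fields


class Spec:
    """Hypothetical stand-in for the extraction spec type."""

    def __init__(self, path: str) -> None:
        self.path = path


def output_mapping(cls):
    cls = dataclass(frozen=True)(cls)
    field_names = {f.name for f in fields(cls)}

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        # A field that still holds its Spec placeholder was never extracted.
        if name in field_names and isinstance(value, Spec):
            raise AttributeError(f"Output '{name}' is not available.")
        return value

    def __dir__(self):
        # Hide unextracted fields from runtime tab completion.
        hidden = {
            name
            for name in field_names
            if isinstance(object.__getattribute__(self, name), Spec)
        }
        return [name for name in object.__dir__(self) if name not in hidden]

    cls.__getattribute__ = __getattribute__
    cls.__dir__ = __dir__
    return cls


@output_mapping
class _PwMapping:
    fermi_energy: float = Spec("fermi_energy")  # hypothetical path
    total_energy: float = Spec("total_energy")  # hypothetical field and path


# Only fermi_energy was "extracted"; total_energy keeps its Spec placeholder.
outputs = _PwMapping(fermi_energy=5.2)
```

Here `outputs.fermi_energy` returns the value, while `outputs.total_energy` raises
`AttributeError` and is absent from `dir(outputs)`. One limitation of this sketch: the
dataclass-generated `__repr__` reads every field, so it would hit the same guard.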
Complexity is intentionally concentrated in `BaseOutput` and `output_mapping`, which is
maintainer territory. Subclass code is clean and easily extendable.
+`BaseOutput` extracts the mapping class from this generic parameter at instantiation, then uses `dataclasses.fields()` to build `_output_spec_mapping`, a dict mapping each field name to its `Spec`.
+The actual extraction runs when the user accesses `outputs`: glom resolves each `Spec` against `raw_outputs`, and the results are used to populate the mapping instance.
+Fields whose `Spec` cannot be resolved retain the `Spec` object as their value, a placeholder that signals "not extracted".
+Accessing such a field on the `outputs` namespace raises `AttributeError` with a clear message.
+
+!!! note
+
+The `Spec` object here acts as a placeholder for an output that was not produced by the calculation.
+Other options would be `None`, `dataclasses.MISSING`, or a custom sentinel; all carry the same semantic awkwardness: the field is typed as `float`, but its value is not a float.
+The initial option we chose was `Annotated[float | None, Spec(...)] = None`, but this has several drawbacks: it implies the field can legitimately be `None` (no more honest), it is considerably more verbose, it requires type-hacking to extract the `Spec` from `Annotated`, and contributors adding new outputs may not be familiar with `Annotated`.
+
+Using `Spec` as the field default has a key advantage: it unambiguously identifies the value as a glom spec, making it clear both that the field is not yet extracted *and* how it will be extracted.
+The `@output_mapping` decorator enforces that every field uses a `Spec` as its default, catching mistakes at instantiation.
 
-In our current design, we have a `BaseOutput` class that defines several "data retrieval" methods.
-Some of these rely on the fact that the child classes (e.g. `PwOutput`) cannot have state changes after construction.
-This _should_ be the case, and no mutating methods should be allowed on `BaseOutput` classes.
+The `@output_mapping` decorator applies `@dataclass(frozen=True)` to the mapping class, making the extracted outputs immutable: users cannot overwrite a parsed result.
+Beyond this, `BaseOutput` itself is designed to be stateful but immutable: it stores `raw_outputs` and derived data at construction, and no mutating methods are allowed after that point.
 
 ## Conversion to other libraries
 
@@ -167,7 +183,7 @@ This points to poor design of this class' constructor, but we can still support
 
 !!! note
 
-This approach requires careful syncing the `_output_spec_mapping` of the output classes to the `conversion_mapping` of the converter classes, and hence the code logic for obtaining is not fully localized.
+This approach requires careful syncing the extraction specs of the output classes (defined via `@output_mapping` fields) to the `conversion_mapping` of the converter classes, and hence the code logic for obtaining is not fully localized.
 To make things worse, in some cases it also requires understanding the raw outputs (but this can be prevented with clear schemas for the base outputs).
 We're not fully converged on the design here, but some considerations below:
 
@@ -213,12 +229,14 @@ Whose attributes are populated on the fly, based on the **available** outputs.
 
 !!! note
 
-There are currently two "issues" with the `outputs` namespace.
+Accessing an output that was not produced by the calculation raises `AttributeError`
+with a clear message. The `outputs` namespace only exposes available outputs in tab
+completion (`__dir__` is filtered at runtime), though static type checkers like
+Pylance will still show all declared fields.
 
-1. Since an attribute cannot be populated in case the corresponding output isn't there, the output `name` will be missing from the namespace.
-Hence, the `outputs` namespace only has available outputs.
-2. The `outputs` namespace can only output the "base outputs", without conversion to e.g. ASE.
-We're exploring ways to change the default output format [in this issue](https://github.com/aiidateam/qe-tools/issues/113)
+The `outputs` namespace currently returns base outputs only; conversion to e.g. ASE
+is not supported via this interface. We're exploring this in