👌 BaseOutput: Add typing + doc to outputs namespace

mbercx · mbercx · commit d237e9816b46 · 2026-04-03T14:42:39.000+02:00
The `outputs` property previously returned a `SimpleNamespace` — no type information, no
hover docs, no filtered tab completion, no immutability. This commit replaces it with a
typed, documented, frozen namespace.

Introduce the `output_mapping` decorator for per-code output schemas. Each per-code mapping
class is decorated with `@output_mapping`, which turns it into a `@dataclass(frozen=True)`
whose fields declare the output name, type, extraction `Spec`, and docstring in one place.
This gives a single source of truth with a compact syntax:

    fermi_energy: float = Spec(...)
    """Fermi energy in eV."""

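This is far cleaner than the previous approach, where each output class defined a
class-level `_output_spec_mapping` dictionary mapping output names to glom specs, with no
types or docs. `BaseOutput` now builds that dictionary from the mapping fields instead.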
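A minimal sketch of what the frozen dataclass gives us, assuming a hypothetical stand-in
`Spec` class instead of `glom.Spec` (so it runs without glom) and a simplified
`output_mapping` that only applies `@dataclass(frozen=True)`. The `_DemoMapping` fields are
illustrative:

```python
import dataclasses


class Spec:
    """Hypothetical stand-in for `glom.Spec`; it only stores the spec."""

    def __init__(self, spec):
        self.spec = spec

    def __repr__(self):
        return f"Spec({self.spec!r})"


def output_mapping(cls):
    """Simplified sketch: only the `@dataclass(frozen=True)` part of the decorator."""
    return dataclasses.dataclass(frozen=True)(cls)


@output_mapping
class _DemoMapping:
    fermi_energy: float = Spec("xml.output.band_structure.fermi_energy")
    """Fermi energy in eV."""

    total_energy: float = Spec("xml.output.total_energy.etot")
    """Total energy in eV."""


# Output name -> (type, extraction spec), read straight from the field declarations.
schema = {f.name: (f.type, f.default) for f in dataclasses.fields(_DemoMapping)}

# Fields not passed to the constructor keep their `Spec` default; the instance is frozen.
mapping = _DemoMapping(fermi_energy=5.5)
```

Reading `dataclasses.fields()` is enough to recover every output name, type and spec, which
is what replaces the hand-written dictionary.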

To add this static typing to the `outputs` namespace, we add `Generic[T]` to
`BaseOutput`. This means subclasses have to define their own mapping type in the class
definition:

    class PwOutput(BaseOutput[_PwMapping]):

The `outputs` property is annotated to return `T`, i.e. an instance of the subclass's
mapping class, so static type checkers like Pylance can resolve attribute types and
docstrings without any extra stubs.

To keep this syntax clean, without duplicating the mapping information as e.g. a class
attribute, we implement the `_get_mapping_class` classmethod on `BaseOutput`. It reads the
mapping class from the generic parameter and raises a `TypeError` if the subclass did not
specify one, so a missing mapping type is caught when the output class is instantiated,
which is appropriate for an abstract base class. In addition, the `BaseOutput` constructor
raises a `TypeError` if any field of the mapping class does not use a `Spec(...)` default,
guarding against silently dropped fields in case a contributor forgets to add `Spec`.

The `output_mapping` decorator overrides `__getattribute__` to preserve the previous (and
correct) behavior of raising an `AttributeError` when accessing an output that could not be
extracted from the raw outputs, but now with a clearer message. It also overrides `__dir__`
so that only available outputs appear in tab completion at runtime (e.g. in IPython kernels
such as Jupyter notebooks). Static type checkers will still list the missing outputs; since
they cannot use runtime information, this is unavoidable.
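A runnable sketch of the two overrides, mirroring the decorator in this commit but with a
stand-in `Spec` class and a hypothetical two-field `_DosDemo` mapping that mimics a
spin-polarised dos.x run:

```python
import dataclasses


class Spec:
    """Hypothetical stand-in for `glom.Spec`."""

    def __init__(self, spec):
        self.spec = spec


def output_mapping(cls):
    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        # A field that still holds its Spec default was never extracted.
        if isinstance(value, Spec):
            raise AttributeError(f"'{name}' is not available in the parsed outputs.")
        return value

    def __dir__(self):
        # Only list fields whose value was actually extracted.
        return [name for name, value in self.__dict__.items() if not isinstance(value, Spec)]

    cls.__getattribute__ = __getattribute__
    cls.__dir__ = __dir__
    return dataclasses.dataclass(frozen=True)(cls)


@output_mapping
class _DosDemo:
    dos: list = Spec("dos.dos")
    dos_up: list = Spec("dos.dos_up")


# Spin-polarised run: only `dos_up` was extracted.
outputs = _DosDemo(dos_up=[0.1, 0.2])
```

One side effect visible in this sketch: the dataclass-generated `__repr__` reads every field
through normal attribute access, so `repr()` of a partially extracted mapping raises the same
`AttributeError`.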

Complexity is intentionally concentrated in `BaseOutput` and `output_mapping`, which is
maintainer territory. Subclass code is clean and easily extendable.
diff --git a/docs/design/outputs.md b/docs/design/outputs.md
@@ -101,28 +101,44 @@ glom(pw_out.raw_outputs, {'fermi_energy': 'xml.output.band_structure.fermi_energ
 ```
 
 This will return a dictionary: `{'fermi_energy': 0.04425026484437661}`.
-The idea is that every base output has one key-value pair in an `_output_spec_mapping` dictionary defined on each output class:
+
+### Defining outputs
+
+Each extracted output is declared as a typed field on a per-code mapping class decorated with `@output_mapping`:
 
 ```python
-_output_spec_mapping = {
-    <output_name: str>: <glom_spec>,
-    ...
-}
+@output_mapping
+class _PwMapping:
+    fermi_energy: float = Spec("xml.output.band_structure.fermi_energy")
+    """Fermi energy in eV."""
 ```
 
-For example:
+This is a single source of truth: the field declaration carries the output name, type annotation, extraction `Spec`, and docstring.
+Adding a new output means adding one field — nothing else.
+
+The mapping class is connected to the output class via the generic typing syntax:
 
 ```python
-output_mapping = {
-    'fermi_energy': 'xml.output.band_structure.fermi_energy'
-}
+class PwOutput(BaseOutput[_PwMapping]):
+    ...
 ```
 
-!!! warning "Important"
+`BaseOutput` extracts the mapping class from this generic parameter at instantiation, then uses `dataclasses.fields()` to build `_output_spec_mapping` — a dict mapping each field name to its `Spec`.
+The actual extraction runs when the user accesses `outputs`: glom resolves each `Spec` against `raw_outputs`, and the results are used to populate the mapping instance.
+Fields whose `Spec` cannot be resolved retain the `Spec` object as their value — a placeholder that signals "not extracted".
+Accessing such a field on the `outputs` namespace raises `AttributeError` with a clear message.
+
+!!! note
+
+    The `Spec` object here acts as a placeholder for an output that was not produced by the calculation.
+    Other options would be `None`, `dataclasses.MISSING`, or a custom sentinel — all carry the same semantic awkwardness: the field is typed as `float`, but its value is not a float.
+    The initial option we chose was `Annotated[float | None, Spec(...)] = None`, but this has several drawbacks: it implies the field can legitimately be `None` (which is no more honest), it is considerably more verbose, it requires type-hacking to extract the `Spec` from `Annotated`, and contributors adding new outputs may not be familiar with `Annotated`.
+
+    Using `Spec` as the field default has a key advantage: it unambiguously identifies the value as a glom spec, making it clear both that the field is not yet extracted *and* how it will be extracted.
+    `BaseOutput` enforces that every field of the mapping class uses a `Spec` as its default, raising a `TypeError` at instantiation otherwise.
 
-    In our current design, we have a `BaseOutput` class that defines several "data retrieval" methods.
-    Some of these rely on the fact that the child classes (e.g. `PwOutput`) cannot have state changes after construction.
-    This _should_ be the case, and no mutating methods should be allowed on `BaseOutput` classes.
+The `@output_mapping` decorator applies `@dataclass(frozen=True)` to the mapping class, making the extracted outputs immutable: users cannot overwrite a parsed result.
+Beyond this, `BaseOutput` itself is designed to be stateful but immutable — it stores `raw_outputs` and derived data at construction, and no mutating methods are allowed after that point.
 
 ## Conversion to other libraries
 
@@ -167,7 +183,7 @@ This points to poor design of this class' constructor, but we can still support
 
 !!! note
 
-    This approach requires careful syncing the `_output_spec_mapping` of the output classes to the `conversion_mapping` of the converter classes, and hence the code logic for obtaining is not fully localized.
+    This approach requires carefully syncing the extraction specs of the output classes (defined via `@output_mapping` fields) to the `conversion_mapping` of the converter classes, and hence the code logic for obtaining each output is not fully localized.
     To make things worse, in some cases it also requires understanding the raw outputs (but this can be prevented with clear schemas for the base outputs).
     We're not fully converged on the design here, but some considerations below:
 
@@ -213,12 +229,14 @@ Whose attributes are populated on the fly, based on the **available** outputs.
 
 !!! note
 
-    There are currently two "issues" with the `outputs` namespace.
+    Accessing an output that was not produced by the calculation raises `AttributeError`
+    with a clear message. The `outputs` namespace only exposes available outputs in tab
+    completion (`__dir__` is filtered at runtime), though static type checkers like
+    Pylance will still show all declared fields.
 
-    1. Since an attribute cannot be populated in case the corresponding output isn't there, the output `name` will be missing from the namespace.
-       Hence, the `outputs` namespace only has available outputs.
-    2. The `outputs` namespace can only output the "base outputs", without conversion to e.g. ASE.
-       We're exploring ways to change the default output format [in this issue](https://github.com/aiidateam/qe-tools/issues/113)
+    The `outputs` namespace currently returns base outputs only — conversion to e.g. ASE
+    is not supported via this interface. We're exploring this in
+    [issue #113](https://github.com/aiidateam/qe-tools/issues/113).
 
 ## Custom spec
 
diff --git a/docs/getting_started.md b/docs/getting_started.md
@@ -72,7 +72,8 @@ pw_out.outputs.fermi_energy
 !!! warning
 
     The `outputs` namespace is designed for interactive access.
-    If an output is not available, it will not be in the namespace.
+    If an output is not available, accessing it raises `AttributeError`.
+    Tab completion (e.g. in Jupyter) only shows available outputs — but static type checkers like Pylance will show all declared outputs regardless.
 
 
 Finally, you can obtain a dictionary of all available outputs in your preferred library:
diff --git a/src/qe_tools/outputs/base.py b/src/qe_tools/outputs/base.py
@@ -1,18 +1,79 @@
 """Abstract base class for the outputs of Quantum ESPRESSO."""
 
 from __future__ import annotations
-from functools import cached_property
-from types import SimpleNamespace
-from glom import glom, GlomError
 
 import abc
+import dataclasses
+import typing
+from functools import cached_property
+
+from glom import glom, GlomError, Spec
+
+
+T = typing.TypeVar("T")
+
+
+def output_mapping(cls):
+    """Decorator that defines a typed, frozen output mapping for a Quantum ESPRESSO code.
+
+    Applies `@dataclass(frozen=True)` and injects `__getattribute__` and `__dir__` so that:
+
+    - Accessing a field whose value is still a `Spec` raises `AttributeError` with a clear
+      message (i.e. the output was not parsed).
+    - `dir()` only lists fields that were successfully extracted.
+
+    Each field must declare a `Spec(...)` as its default value:
+
+        fermi_energy: float = Spec("path.to.fermi_energy")
+        \"""Fermi energy in eV.\"""
+    """
 
+    def __getattribute__(self, name):
+        value = object.__getattribute__(self, name)
+        if isinstance(value, Spec):
+            raise AttributeError(f"'{name}' is not available in the parsed outputs.")
+        return value
 
-class BaseOutput(abc.ABC):
+    def __dir__(self):
+        return [
+            name for name, value in self.__dict__.items() if not isinstance(value, Spec)
+        ]
+
+    cls.__getattribute__ = __getattribute__
+    cls.__dir__ = __dir__
+    return dataclasses.dataclass(frozen=True)(cls)
+
+
+class BaseOutput(abc.ABC, typing.Generic[T]):
     """Abstract base class for the outputs of Quantum ESPRESSO."""
 
+    @classmethod
+    def _get_mapping_class(cls) -> type:
+        """Extract the mapping class from the generic parameter.
+
+        Example: PwOutput(BaseOutput[_PwMapping]) → _PwMapping
+        """
+        for base in getattr(cls, "__orig_bases__", []):
+            if typing.get_origin(base) is BaseOutput and (
+                args := typing.get_args(base)
+            ):
+                return args[0]
+        raise TypeError(
+            f"{cls.__name__} must subclass BaseOutput[T] with a decorated output mapping, "
+            "e.g. class PwOutput(BaseOutput[_PwMapping])"
+        )
+
     def __init__(self, raw_outputs: dict):
         self.raw_outputs = raw_outputs
+        self._output_spec_mapping = {}
+
+        for field in dataclasses.fields(self._get_mapping_class()):
+            if not isinstance(field.default, Spec):
+                raise TypeError(
+                    f"{type(self).__name__}.{field.name}: expected a Spec(...) default, "
+                    f"got {field.default!r}"
+                )
+            self._output_spec_mapping[field.name] = field.default
 
     @classmethod
     @abc.abstractmethod
@@ -106,11 +167,6 @@ def list_outputs(self, only_available: bool = True) -> list[str]:
         return output_names
 
     @cached_property
-    def outputs(self) -> SimpleNamespace:
+    def outputs(self) -> T:
         """Namespace with available outputs."""
-        namespace = SimpleNamespace()
-
-        for name in self.list_outputs(only_available=True):
-            setattr(namespace, name, self.get_output(name))
-
-        return namespace
+        return self._get_mapping_class()(**self.get_output_dict())
diff --git a/src/qe_tools/outputs/dos.py b/src/qe_tools/outputs/dos.py
@@ -1,11 +1,11 @@
 """Output of the Quantum ESPRESSO dos.x code."""
 
-from __future__ import annotations
-
 from pathlib import Path
 from typing import TextIO
 
-from qe_tools.outputs.base import BaseOutput
+from glom import Spec
+
+from qe_tools.outputs.base import BaseOutput, output_mapping
 
 from .parsers.base import BaseStdoutParser
 from .parsers.dos import DosParser
@@ -22,19 +22,37 @@ def _determine_spin_type(spin: dict) -> str:
     return "non-spin-polarised"
 
 
-class DosOutput(BaseOutput):
-    """Output of the Quantum ESPRESSO dos.x code."""
+@output_mapping
+class _DosMapping:
+    """Typed outputs of a dos.x calculation."""
+
+    energy: list = Spec("dos.energy")
+    """Energy grid in eV."""
+
+    dos: list = Spec("dos.dos")
+    """Total density of states (states/eV). Not available for spin-polarised calculations."""
+
+    dos_up: list = Spec("dos.dos_up")
+    """Spin-up DOS (states/eV). Not available for non-spin-polarised calculations."""
+
+    dos_down: list = Spec("dos.dos_down")
+    """Spin-down DOS (states/eV). Not available for non-spin-polarised calculations."""
 
-    _output_spec_mapping = {
-        "energy": "dos.energy",
-        "dos": "dos.dos",
-        "dos_up": "dos.dos_up",
-        "dos_down": "dos.dos_down",
-        "fermi_energy": "dos.fermi_energy",
-        "integrated_dos": "dos.integrated_dos",
-        "full_dos": "dos",
-        "spin_type": ("xml.input.spin", _determine_spin_type),
-    }
+    fermi_energy: float = Spec("dos.fermi_energy")
+    """Fermi energy in eV."""
+
+    integrated_dos: list = Spec("dos.integrated_dos")
+    """Integrated DOS."""
+
+    full_dos: dict = Spec("dos")
+    """Full parsed DOS dictionary."""
+
+    spin_type: str = Spec(("xml.input.spin", _determine_spin_type))
+    """Spin type: 'non-spin-polarised', 'spin-polarised', 'non-collinear', or 'spin-orbit'."""
+
+
+class DosOutput(BaseOutput[_DosMapping]):
+    """Output of the Quantum ESPRESSO dos.x code."""
 
     @classmethod
     def from_dir(cls, directory: str | Path):
diff --git a/src/qe_tools/outputs/pw.py b/src/qe_tools/outputs/pw.py
@@ -1,21 +1,22 @@
 """Output of the Quantum ESPRESSO pw.x code."""
 
-from __future__ import annotations
-
 from pathlib import Path
 from typing import TextIO
 
-from qe_tools.outputs.base import BaseOutput
+from glom import Spec
+
+from qe_tools.outputs.base import BaseOutput, output_mapping
 from qe_tools.outputs.parsers.pw import PwStdoutParser, PwXMLParser
 
 from qe_tools import CONSTANTS
 
 
-class PwOutput(BaseOutput):
-    """Output of the Quantum ESPRESSO pw.x code."""
+@output_mapping
+class _PwMapping:
+    """Typed outputs of a pw.x calculation."""
 
-    _output_spec_mapping = {
-        "structure": {
+    structure: dict = Spec(
+        {
             "atomic_species": (
                 "xml.output.atomic_species.species",
                 [lambda species: species["@name"]],
@@ -40,8 +41,12 @@ class PwOutput(BaseOutput):
                     ]
                 ],
             ),
-        },
-        "forces": (
+        }
+    )
+    """Crystal structure: cell vectors (Å), element symbols, and Cartesian positions (Å)."""
+
+    forces: list = Spec(
+        (
             "xml.output.forces",
             lambda forces: [
                 [
@@ -50,8 +55,12 @@ class PwOutput(BaseOutput):
                 ]
                 for atom_index in range(forces["@dims"][1])
             ],
-        ),
-        "stress": (
+        )
+    )
+    """Forces on atoms in eV/Å, shape [n_atoms][3]."""
+
+    stress: list = Spec(
+        (
             "xml.output.stress",
             lambda stress: [
                 [
@@ -60,16 +69,29 @@ class PwOutput(BaseOutput):
                 ]
                 for row_number in range(3)
             ],
-        ),
-        "fermi_energy": (
+        )
+    )
+    """Stress tensor in GPa, shape [3][3]."""
+
+    fermi_energy: float = Spec(
+        (
             "xml.output.band_structure.fermi_energy",
             lambda energy: energy * CONSTANTS.hartree_to_ev,
-        ),
-        "total_energy": (
+        )
+    )
+    """Fermi energy in eV."""
+
+    total_energy: float = Spec(
+        (
             "xml.output.total_energy.etot",
             lambda energy: energy * CONSTANTS.hartree_to_ev,
-        ),
-    }
+        )
+    )
+    """Total energy in eV."""
+
+
+class PwOutput(BaseOutput[_PwMapping]):
+    """Output of the Quantum ESPRESSO pw.x code."""
 
     @classmethod
     def from_dir(cls, directory: str | Path):
diff --git a/tests/outputs/test_base.py b/tests/outputs/test_base.py