XAS specialization using subclasses #7

Merged: mretegan merged 53 commits into main from xas-using-inheritance on Jul 3, 2026

@mretegan

Another possibility for having fields that are acquisition mode dependent is to use subclasses as suggested here: nexusformat#1352 (comment)

This is a reasonable alternative, as it is unlikely that we will be able to define acquisition modes that will be reused by other techniques (the alternative option proposed above and implemented here: #6).

The two options were discussed in the NIAC https://www.nexusformat.org/Telco_20260211.html

@maurov

maurov commented Feb 19, 2026

@mretegan @woutdenolf @newville

My understanding of the NIAC minutes of Telco 20260211 is that both options are OK. As said many times, my position is to go for NeXus base classes representing the experimental data collection modes, as I wrote in the famous shared Google document a long time ago. This solution has the strong advantage that the base classes can then be reused by other application definitions, like XMCD and other techniques. I do not understand why we are still hesitating on this. Please, let's move on.

As a first step, I propose implementing the basic modes that represent most of the XAS and other techniques' data:

  • NXtrans: transmission, valid for any technique and any wavelength measuring sample absorption;
  • NXtfy: total fluorescence yield;
  • NXpfy: partial fluorescence yield;
  • NXherfd: particular case of partial fluorescence yield with a high resolution spectrometer;

As an alternative, for the fluorescence yield (in view of the electron yield or the optical yield), we may adopt the subclass approach:

  • NXfy base class:
    • NXfy_total
    • NXfy_partial
    • NXfy_herfd

But I think this is just a complication and I would just go for the first approach of base classes for each experimental collection mode.

@newville

Member

@maurov @mretegan Thanks (and sorry for the delay). I'm OK with either approach. I see the modes as mostly informative, as they suggest (but do not necessarily require) changes in the processing and analysis. But those steps are not really fixed anyway, so mode is mostly a "type hint".

@mretegan

Author

Thank you both for your input. I think that to have a complete picture, it will make sense to finish both in parallel and to create a few HDF5 examples. I will start working on this tomorrow.

@maurov

maurov commented Feb 19, 2026

> @maurov @mretegan Thanks (and sorry for the delay). I'm OK with either approach. I see the modes as mostly informative, as they suggest (but do not necessarily require) changes in the processing and analysis. But those steps are not really fixed anyway, so mode is mostly a "type hint".

Hi @newville, thanks for your feedback. To me, the modes are more than just informative. They tell exactly what "mu" is and how it was measured. For example, Fe K-edge XAS "mu" of Fe2O3 measured in transmission is a different thing than Fe K-edge XAS "mu" of Fe2O3 measured in HERFD. Furthermore, the "minimum required metadata" for transmission are not the same as for HERFD.
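
To make the point about mode-dependent minimum metadata concrete, here is a minimal Python sketch (an illustration only, not part of the proposed definitions). The mode keys and the field names in `REQUIRED_FIELDS`, such as `ifluo` and `analyzed_energy`, are hypothetical placeholders and are not taken from any NXDL file in this PR.

```python
# Hypothetical per-mode minimum metadata. The field names are illustrative
# assumptions, not names defined by the NXDL files discussed in this thread.
REQUIRED_FIELDS = {
    "transmission": {"energy", "i0", "itrans"},
    "herfd": {"energy", "i0", "ifluo", "analyzed_energy"},
}


def missing_fields(mode: str, present: set[str]) -> set[str]:
    """Return the fields required for `mode` that are absent from `present`."""
    return REQUIRED_FIELDS[mode] - present


# The same recorded fields are complete for transmission
# but incomplete for HERFD.
recorded = {"energy", "i0", "itrans"}
print(missing_fields("transmission", recorded))   # set()
print(sorted(missing_fields("herfd", recorded)))  # ['analyzed_energy', 'ifluo']
```

Returning the missing fields, rather than rejecting the file outright, leaves it to the caller to decide how strict to be.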

@maurov

maurov commented Feb 19, 2026

> Thank you both for your input. I think that to have a complete picture, it will make sense to finish both in parallel and to create a few HDF5 examples. I will start working on this tomorrow.

@mretegan for me it is very difficult to read the .nxdl.xml files directly. Would it be possible to have an automatic build of the documentation on the ESRF GitLab server? Or could you link here two HDF5 files generated following the two approaches? By the way, I do not think that our decision should be based on HDF5 readability, as it is most likely our software tools, not humans, that will read the HDF5 files.

@newville

Member

@maurov I would be cautious about being overly strict here.

Yes, data measured in different modes are different, and processing/analysis may want to do different steps based on the mode. And the mode should be stated.

And yes, HERFD really ought to state the analyzed energy (although that, too, is a trusted value). But is NeXus going to say that a file is invalid if it states mode="HERFD" but does not correctly spell "analyzed energy" in every group?

@mretegan

Author

> > Thank you both for your input. I think that to have a complete picture, it will make sense to finish both in parallel and to create a few HDF5 examples. I will start working on this tomorrow.
>
> @mretegan for me it is very difficult to read the .nxdl.xml files directly. Would it be possible to have an automatic build of the documentation on the ESRF GitLab server? Or could you link here two HDF5 files generated following the two approaches? By the way, I do not think that our decision should be based on HDF5 readability, as it is most likely our software tools, not humans, that will read the HDF5 files.

We have something set up, but it builds the main branch, and every time we want to switch, we need to update the CI file https://gitlab.esrf.fr/hdf5/nexus/nxxas.

You can also build locally. Go to the nexus_definitions folder and run (I use uv, but vanilla pip should work):

uv venv --python 3.12
source .venv/bin/activate
uv pip install -r requirements.txt 
make clean; make local
firefox build/manual/build/html/classes/applications/NXxas_new.html

@newville

Member

@mretegan @maurov Thanks. I don't really disagree with any of the changes here.

But getting bogged down on how to spell different types of emissions seems like both a killer of motivation and a detail that does not dramatically change the downstream use of the normalized mu data. Yes, it is helpful metadata, and can sometimes be important for comparing data. But detection modes can vary and evolve, and you may not know every possible data collection mode, and metadata can be messy, incomplete, or wrong. Still, an nxXAS group with normalized mu(E) is useful.

Please get this merged, and let us start using this. The danger that data collection and analysis applications will define their own HDF5 format and see no need to support this one is very real.

@mretegan mretegan force-pushed the xas-using-inheritance branch from a619317 to 24c6e81 on March 15, 2026 22:27
@newville

Member

@mretegan Why is Iref removed? For many people, the unbreakable coupling of an XAS measurement with a reference channel is vital. This really needs to be supported.

@maurov

maurov commented Mar 18, 2026

> @mretegan Why is Iref removed? For many people, the unbreakable coupling of an XAS measurement with a reference channel is vital. This really needs to be supported.

@newville the idea is to replace Iref with a ref subgroup that will itself be of NXxas type. In fact, the "energy reference spectrum" may not simply be the beam intensity measured after a reference foil (= Iref); it may be measured separately, in another mode or under other conditions. A typical case (even if it is less common) would be measuring a reference foil with an absorption edge close to the element/edge of interest in fluorescence mode and/or in a separate scan (e.g. for laboratory instruments).

@newville

Member

@mretegan Yes, of course, a separate spectrum can act as a reference signal.

Many people and many beamlines also often collect a reference spectrum in the same scan. This is more than an additional reference spectrum - it is unambiguously the same energy scan measuring multiple spectra simultaneously, and cannot be separated from the original spectrum.

And, yes, many beamlines at modern facilities have sufficiently stable energies and do not require this. But there are older spectra and older beamlines that do really need this.

@newville

Member

@mretegan Are NXatom and NXelement really the same thing? I think of Atom as one actual object, whereas Element is a category of Atoms. We do spectroscopies on elements, not really isolated individual atoms.

Maybe my concern is simply that NXatom (https://manual.nexusformat.org/classes/base_classes/NXatom.html) is hopelessly vague: "a set of atoms". That seems circular, possibly to the point of "what?". It appears to allow "ion" and maybe even a molecule. It has a thing called "position". So, it (they?) are somewhere. But it's a set.... hmmm.

OTOH, with X-ray spectroscopic methods, we really, really mean Element to be "all elemental atoms that have the same number of protons". We need to say "titanium" and mean exactly "22 protons", no more, no less.

It seems reasonable to have a class of Elements of the Periodic Table. That could include variables such as "ionization state", "isotope", and so forth. But there is a finite and enumerable list: maybe ~118 elements, each with maybe 4 ionization states, and maybe 10 isotopes.

It doesn't seem like NXatom is exactly that... but maybe I'm missing something ;).
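
The distinction drawn above can be sketched in a few lines of Python. The `Element` and `Atom` classes below are hypothetical illustrations of the two concepts; they are not the NXelement or NXatom definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A category: every atom with the same number of protons."""
    symbol: str
    protons: int


@dataclass(frozen=True)
class Atom:
    """One concrete object: an element plus instance-specific quantities."""
    element: Element
    position: tuple[float, float, float]


TITANIUM = Element("Ti", 22)

# Two distinct atoms at different positions share one and the same element.
atom_a = Atom(TITANIUM, (0.0, 0.0, 0.0))
atom_b = Atom(TITANIUM, (0.5, 0.5, 0.5))
print(atom_a == atom_b)                  # False
print(atom_a.element == atom_b.element)  # True
```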

@maurov

maurov commented Jun 12, 2026

Personally, I would push for keeping NXelement instead of NXatom. In fact, in XAS an absorbing element may represent multiple crystallographic sites or atoms in a cluster. From an XAS point of view, an "absorbing atom" is linked to specific atomic positions, while an element is more generic and does not require atomic positions.

FYI, some references to NIAC discussions on element/atom:

  • NXatom was already there and the NIAC proposed to use it instead of our new NXelement: nexusformat#1619 (comment), https://github.com/nexusformat/wiki/blob/master/source/content/Telco_20260429.md
  • Suggestion to use atom instead of element as field name: nexusformat#1619 (comment), https://github.com/nexusformat/wiki/blob/master/source/content/Telco_20260610.md

As I mentioned, while I am more in favor of keeping element, we should also remember that this definition could be used to store theoretical data, and in that context, atom is more widely used.

@newville

Member

@mretegan

> > What does it mean to make Energy optional? How can one use such a spectrum?
>
> @newville Please tell me where you saw that? It might be an error.

Thanks, and sorry for the confusion. I saw b6fd81f

I wasn't fully sure what that was referring to....

> > Do I understand correctly that Transmission has a reference channel, but other subclasses do not? Why is that?
>
> For transmission, iref is measured at the same time, while it is not necessarily the case for fluorescence, for example. I remember @maurov making the point that in some cases the reference can be another edge altogether. This is why it is not consistently put in all subclasses.

For transmission, a reference spectrum is sometimes measured at the same time. A reference spectrum can also be measured at the same time as other modes, say by scattering some beam before the sample or measuring literally in parallel. Both are done, maybe infrequently, but not never. It seems easy enough to allow "iref" as optional for all subclasses.

> I agree that here the use of NXatom is ideal, but if we keep the field name element (which I am very much in favor of), it should not be that troubling. But that being said, I am fine either way.

I think NXatom is going to be really confusing. An atom has specific quantities (say, "position", "oxidation state", "spin", "orbital configuration", "isotope") that the category "element" does not. We definitely mean that none of those quantities are specified.

> NXatom was already there and the NIAC proposed to use it instead of our new NXelement

Using "atom" in place of "element" seems really odd to me. The entire point of NeXus is to give scientifically meaningful names to hierarchical groups. Communication is the goal. Just as the current "absorbed intensity" is hopelessly odd and confusing, using "atom" in place of "element" is poor communication.

> As I mentioned, while I am more in favor of keeping element, we should also remember that this definition could be used to store theoretical data, and in that context, atom is more widely used.

Sort of. An XAS spectrum, measured or calculated, represents the energy response of a large collection of atoms of the same Z (an "element"). Many photons (billions at least) are absorbed. Each is absorbed by exactly one atom, but there is no way to distinguish which atom in the illuminated volume does the absorbing. Each absorption event is an isolated event (we're ignoring strong-field effects here), lasting femtoseconds.

Anyway, just because a single in-silico atom could be used as a model does not imply that this is how experimental data should be communicated.

@mretegan

mretegan commented Jul 1, 2026

Author

I would like to merge the current branch into our main as soon as possible. This branch existed to explore the different ways to specify the acquisition modes, so even if we still need to change the individual classes, it fulfills its initial purpose.

There are a few changes that you should be aware of, and I would suggest that you look in detail at the base class NXxas and NXxas_trans. The others will be updated after.

  1. The element uses the proposed NXelement class.
  2. Even though there was an iref present in NXxas_trans, there is now also a subentry called reference that allows specifying a reference as an entire spectrum. The documentation should be self-explanatory.
  3. The instrument and the monochromator are now recommended fields.
  4. It is now possible to specify a stack of XAS spectra without losing any convenience/simplicity when specifying a single one. Following the discussion last week, it was clear that the previous definition was not covering many cases where some parameter was varied in the experiment, and it made sense to have all those spectra together. With the updated class, it is possible to have time, position on the sample (think mapping experiments), temperature, magnetic field (XMCD/XMLD), pressure, etc. as a second dimension of the dataset, again, with full backwards compatibility when this is not needed.

The rendered page is here: https://nexus-definitions.readthedocs.io/en/xas-using-inheritance

If this looks okay to you, and considering that the transmission definition is the simplest, I would suggest opening a new MR to the NeXus definitions main repo containing only NXxas and NXxas_trans and having the two approved quickly by the NIAC.

@mretegan mretegan changed the title [WIP] XAS specialization using subclasses XAS specialization using subclasses Jul 1, 2026
@mretegan

mretegan commented Jul 2, 2026

Author

Here are some examples of how this would look for different cases:

  1. dataRank = 1, no nP
entry:NXentry
  definition = "NXxas_trans"
  element:NXelement
    name = "Fe"
  edge:NXabsorption_edge
    name = "K"
  is_experimental = true
  energy:NX_FLOAT[nEnergy]
  intensity:NX_FLOAT[nEnergy]  
  sample:NXsample
    name = "Fe foil"
  instrument:NXinstrument
    i0:NXdetector
      data:NX_NUMBER[nEnergy]
    itrans:NXdetector
      data:NX_NUMBER[nEnergy]
    iref:NXdetector
      data:NX_NUMBER[nEnergy]
  data:NXdata
    @signal = "intensity"
    @axes = "energy"
    energy --> /entry/energy
    intensity --> /entry/intensity
  2. dataRank = 2, an operando time series experiment, for example
entry:NXentry
  definition = "NXxas_trans"
  element:NXelement
    name = "Fe"
  edge:NXabsorption_edge
    name = "K"
  is_experimental = true
  energy:NX_FLOAT[nEnergy]
  intensity:NX_FLOAT[nP, nEnergy]      # dataRank = 2
  sample:NXsample
    temperature:NX_FLOAT[nP]           # varies across the stack
  instrument:NXinstrument
    monochromator:NXmonochromator
      energy:NX_FLOAT[nEnergy]
    i0:NXdetector
      data:NX_NUMBER[nP, nEnergy]
    itrans:NXdetector
      data:NX_NUMBER[nP, nEnergy]
    iref:NXdetector
      data:NX_NUMBER[nP, nEnergy]
  data:NXdata
    @signal = "intensity"
    @axes = ["temperature", "energy"]
    temperature --> /entry/sample/temperature
      @AXISNAME_indices = 0
    energy --> /entry/energy
    intensity --> /entry/intensity
  3. Using reference instead of iref
entry:NXentry
  definition = "NXxas_trans"
  element:NXelement
    name = "Fe"
  edge:NXabsorption_edge
    name = "K"
  energy:NX_FLOAT[nEnergy]
  intensity:NX_FLOAT[nEnergy]
  instrument:NXinstrument
    i0:NXdetector
      data:NX_NUMBER[nEnergy]
    itrans:NXdetector
      data:NX_NUMBER[nEnergy]
    # no iref: reference is not a simultaneous channel on this energy axis
  reference:NXsubentry
    definition = "NXxas_trans"
    element:NXelement
      name = "Cu"                      # different element than the main entry
    edge:NXabsorption_edge
      name = "K"
    energy:NX_FLOAT[nEnergy_ref]
    intensity:NX_FLOAT[nEnergy_ref]
    instrument:NXinstrument
      i0:NXdetector
        data:NX_NUMBER[nEnergy_ref]
      itrans:NXdetector
        data:NX_NUMBER[nEnergy_ref]
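
As a complement to the listings above, here is a minimal h5py sketch of how the dataRank = 2 case could be written to an HDF5 file. It rests on several assumptions: all numerical values are synthetic, the file name and the `add_group` helper are made up for illustration, the `-->` arrows are rendered as HDF5 hard links, `@AXISNAME_indices` is spelled out as `temperature_indices` and `energy_indices` following the NXdata convention, and `intensity` is taken to be ln(i0/itrans). The result has not been validated against the NXDL files in this PR.

```python
import h5py
import numpy as np

n_p, n_energy = 3, 5  # hypothetical stack: 3 spectra, 5 energy points each

energy = np.linspace(7100.0, 7140.0, n_energy)  # eV, synthetic
temperature = np.array([300.0, 350.0, 400.0])   # K, synthetic
i0 = np.full((n_p, n_energy), 1.0e6)            # synthetic counts
# The 1D attenuation profile broadcasts over the n_p rows of i0.
itrans = i0 * np.exp(-np.linspace(0.5, 1.5, n_energy))


def add_group(parent, name, nx_class):
    """Create a group and tag it with its NeXus class (illustrative helper)."""
    group = parent.create_group(name)
    group.attrs["NX_class"] = nx_class
    return group


# In-memory file (core driver, no backing store): nothing is left on disk.
with h5py.File("xas_stack.h5", "w", driver="core", backing_store=False) as f:
    entry = add_group(f, "entry", "NXentry")
    entry["definition"] = "NXxas_trans"
    entry["energy"] = energy
    # Transmission: ln(i0 / itrans), one row per point of the stack.
    entry["intensity"] = np.log(i0 / itrans)

    sample = add_group(entry, "sample", "NXsample")
    sample["temperature"] = temperature

    instrument = add_group(entry, "instrument", "NXinstrument")
    for name, counts in (("i0", i0), ("itrans", itrans)):
        add_group(instrument, name, "NXdetector")["data"] = counts

    data = add_group(entry, "data", "NXdata")
    data.attrs["signal"] = "intensity"
    data.attrs.create("axes", ["temperature", "energy"],
                      dtype=h5py.string_dtype())
    data.attrs["temperature_indices"] = 0
    data.attrs["energy_indices"] = 1
    # Hard links stand in for the "-->" arrows of the listing.
    data["temperature"] = sample["temperature"]
    data["energy"] = entry["energy"]
    data["intensity"] = entry["intensity"]

    stack_shape = data["intensity"].shape

print(stack_shape)  # (3, 5)
```

Setting `n_p` aside and writing one-dimensional arrays gives the dataRank = 1 layout with the same code path, which is the backwards compatibility mentioned earlier.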

@mretegan mretegan requested review from a team, maurov, newville and woutdenolf and removed request for a team July 2, 2026 13:31

@maurov maurov left a comment

@mretegan I agree to merge

@newville newville left a comment

Member

Yes, thanks for all of this work and perseverance! I think this will be very useful for lots of kinds of XAS++ methods, and a good starting point for other spectroscopies.

@mretegan mretegan merged commit 8d2f133 into main Jul 3, 2026
3 checks passed
@emilianofonda

Thanks for introducing rank 2; this will be of great help for standardizing data exchange in the QEXAFS community.
I agree.

7 participants