
doc: Add learned closure demo, add T-torch to landing page + docs #626

Open
dionhaefner wants to merge 7 commits into main from dion/demo-closure


@dionhaefner dionhaefner commented Jun 23, 2026

Contributor

Relevant issue or PR

n/a

Description of changes

This celebrates the release of Tesseract-Torch 🎉

It adds Tesseract-Torch to all places where Tesseract-JAX is mentioned (except a few demos that explicitly build on top of T-JAX). It also features a brand new demo ("learned closure") that uses Tesseract-Torch.

Testing done

Docs builds pass on CI, demo runs end-to-end on my machine and on CI. Docs preview: https://pasteur-labs-docs--626.com.readthedocs.build/projects/tesseract-core/626/

@codecov

codecov Bot commented Jun 23, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.99%. Comparing base (2b7ac30) to head (965ab1f).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #626      +/-   ##
==========================================
+ Coverage   77.95%   77.99%   +0.04%     
==========================================
  Files          39       39              
  Lines        4635     4635              
  Branches      754      754              
==========================================
+ Hits         3613     3615       +2     
+ Misses        716      714       -2     
  Partials      306      306              


@jpbrodrick89

Contributor

I realise this is a draft, just checking in as I was curious (I've been working on a universal differential equation approach which has some similarities). For this case it looks like you have a Tesseract that does a single timestep of a very cheap simulation. Does this not go against our advice on sensible runtimes for a Tesseract, as latency/overhead cost will likely dominate here?

@dionhaefner

Contributor Author

@jpbrodrick89 I agree, see the most recent push for a complete refactor :)

The new version is still bottlenecked by overhead, but at least it fits the narrative now - in the wild you'd use a much more expensive solver where the balance shifts, this is just the cheap demo you can run in a few minutes. Wdyt?

@dionhaefner dionhaefner marked this pull request as ready for review June 24, 2026 14:51
@dionhaefner dionhaefner requested a review from samalipio June 24, 2026 14:58
Comment on lines +8 to +14
These tests load the solver via ``Tesseract.from_tesseract_api`` (in-process, no
Docker) so they run fast as a local smoke check. The demo notebook itself uses
``Tesseract.from_image`` to serve the solver in a container over HTTP — the same
``apply_tesseract`` call path works either way. This is also the same pattern
that would work with a Fortran solver Tesseract backed by Enzyme or a
hand-written adjoint: the solver just needs apply + VJP with the interface
(u, nu_field, dt) -> u_next. The closure stays ordinary PyTorch.

Contributor


I don't mind it (especially if this is just a private test file not part of the demo itself), but just wanted to note there's a strong AI smell about this paragraph.

Contributor Author


I do mind, thanks for catching it.
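For readers unfamiliar with the "apply + VJP" interface the docstring refers to, here is a minimal sketch of a hand-written adjoint in NumPy. It is not the demo's solver: it assumes a diffusion-only step `u_next = u + dt * nu_field * laplacian(u)`, and `apply_step`, `vjp_step`, and the extra `dx` argument are hypothetical names introduced for illustration. The dot-product test at the end is the usual way to check that a hand-written VJP matches the forward map.

```python
import numpy as np

def laplacian(u, dx):
    """Second difference on interior points; boundary rows are zero."""
    out = np.zeros_like(u)
    out[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
    return out

def laplacian_transpose(w, dx):
    """Transpose of `laplacian`, used to pull cotangents back to u."""
    out = np.zeros_like(w)
    inner = w[1:-1] / dx**2
    out[:-2] += inner
    out[1:-1] -= 2.0 * inner
    out[2:] += inner
    return out

def apply_step(u, nu_field, dt, dx):
    """Assumed toy forward map (u, nu_field, dt) -> u_next."""
    return u + dt * nu_field * laplacian(u, dx)

def vjp_step(u, nu_field, dt, dx, cotangent):
    """Hand-written adjoint of `apply_step` with respect to u and nu_field."""
    grad_u = cotangent + dt * laplacian_transpose(nu_field * cotangent, dx)
    grad_nu = dt * laplacian(u, dx) * cotangent
    return grad_u, grad_nu

# Dot-product test: <g, J v> must equal <J^T g, v> for random g and v.
rng = np.random.default_rng(0)
n, dx, dt = 64, 0.1, 1e-3
u = rng.normal(size=n)
nu_field = rng.uniform(0.01, 0.1, size=n)
du, dnu, g = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)

jvp = du + dt * (dnu * laplacian(u, dx) + nu_field * laplacian(du, dx))
grad_u, grad_nu = vjp_step(u, nu_field, dt, dx, g)
lhs = g @ jvp
rhs = grad_u @ du + grad_nu @ dnu
```

A solver in any language that exposes these two functions can sit behind the same interface, while the closure that produces `nu_field` stays in PyTorch.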

"3. **Train a neural network closure end-to-end** through the containerized solver, differentiating through the entire time-stepping loop.\n",
"4. **Compare the learned closure against baselines** (a pure physics model and a pure ML model).\n",
"\n",
"We will replace the viscosity model of a Burgers' equation solver -- normally a hand-tuned constant -- with a small neural network, and train it so that it recovers the true (unknown) viscosity profile from solution data alone.\n",

@jpbrodrick89 jpbrodrick89 Jun 25, 2026

Contributor


nit: just use an em dash here and elsewhere (unless ruff/ipynb's forbid them)

Comment on lines +62 to +64
# Spatial derivatives via central differences
dudx = torch.zeros_like(u)
dudx[1:-1] = (u[2:] - u[:-2]) / (2 * DX)

@jpbrodrick89 jpbrodrick89 Jun 25, 2026

Contributor


you might want a comment here about why you're not upwinding, using conservative form, ETDRK methods or anything else more fancy (e.g. low Reynolds, no shocks, role of nu_max etc.)
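To make the point concrete, here is a minimal NumPy sketch of when plain central differences are acceptable. It assumes the non-conservative form `u_t + u u_x = nu(x) u_xx` with explicit Euler and fixed end values, which may differ from what the demo solver actually does; `burgers_step`, `central_differences_ok`, and all numeric values are hypothetical. The two checks are the textbook conditions: the cell Reynolds number `|u| dx / nu` should stay at or below 2 for the central scheme to remain non-oscillatory, and the diffusion number `nu_max dt / dx^2` should stay at or below 1/2, which is where `nu_max` enters.

```python
import numpy as np

def burgers_step(u, nu_field, dt, dx):
    """One explicit Euler step with central differences; end values are held fixed."""
    dudx = np.zeros_like(u)
    d2udx2 = np.zeros_like(u)
    dudx[1:-1] = (u[2:] - u[:-2]) / (2 * dx)
    d2udx2[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    return u + dt * (-u * dudx + nu_field * d2udx2)

def central_differences_ok(u, nu_field, dt, dx):
    """Check the cell Reynolds number and the explicit diffusion limit."""
    cell_reynolds = np.max(np.abs(u)) * dx / np.min(nu_field)
    diffusion_number = np.max(nu_field) * dt / dx**2
    return bool(cell_reynolds <= 2.0 and diffusion_number <= 0.5)

# Hypothetical setup: smooth initial condition, viscosity between 0.03 and 0.07.
x = np.linspace(0.0, 1.0, 101)
dx = x[1] - x[0]
nu_field = 0.05 + 0.02 * np.sin(2.0 * np.pi * x)
dt = 2e-4
u_init = np.sin(np.pi * x)

scheme_ok = central_differences_ok(u_init, nu_field, dt, dx)
u_final = u_init
for _ in range(500):
    u_final = burgers_step(u_final, nu_field, dt, dx)
```

When both conditions hold, each update is a convex combination of neighbouring values, so the solution cannot overshoot its initial range. Outside that regime (small viscosity, steep fronts), upwinding or a conservative form would be the safer choice.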

@jpbrodrick89

Contributor

Thanks for putting this together @dionhaefner ! I think it reads fairly well but a few things come across as surprising/questionable.

Model behaviour and training

Firstly, the recovered viscosity profile is not very impressive (even though the model appears to be very "predictive"):
[image: recovered viscosity profile]

And the fact that a constant viscosity model does almost as well seems a poor sell for learned closures (and is a big hint your closure would probably train a lot better with a .

Model                         Test MSE
----------------------------------------
Constant viscosity          8.7019e-06
Direct ML                   7.6441e-02
Learned closure             3.7934e-06

Part of this I think comes down to the fact that you allowed your closure to be so open-minded about what viscosity could depend on (it only depends on space but you've allowed it to depend on u and du/dx) and that your training data is fairly limited (N_TRAIN=8 for full run). As such it struggles to "unlearn" any initial dependence encoded in the random weights with an Adam optimiser.

Given that your loss function is a sum of squared residuals and that you can fit your entire batch into memory, I'd be very tempted to attempt a least squares solve on the Jacobian of the point-wise residuals using forward-mode AD (it can work, even for MLPs of this size, and least squares should very efficiently teach it that u and du/dx are not drivers of nu), or at least to use L-BFGS rather than Adam. However, I accept that for Torch users, a recipe for the standard "train an MLP with Adam" workflow using Tesseracts is the story you're committed to telling.
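A minimal sketch of the least squares idea, under heavy assumptions: the "closure" is a toy three-parameter model `nu = exp(theta0 + theta1 * x + theta2 * u)` rather than an MLP, the residuals are taken directly on `nu` rather than on a solver rollout, and the Jacobian is built one column per parameter with finite differences as a stand-in for forward-mode AD. All names and values are hypothetical.

```python
import numpy as np

# Toy setup: the true viscosity depends on x only; the weight on u should vanish.
x = np.linspace(0.0, 1.0, 50)
u = np.sin(2.0 * np.pi * x)
features = np.stack([np.ones_like(x), x, u], axis=1)
theta_true = np.array([np.log(0.05), 1.0, 0.0])
target = np.exp(features @ theta_true)

def residuals(theta):
    """Point-wise residuals; in the demo this would be rollout minus data."""
    return np.exp(features @ theta) - target

def jacobian_by_columns(fun, theta, eps=1e-6):
    """One column per parameter (finite differences standing in for forward-mode AD)."""
    r0 = fun(theta)
    jac = np.empty((r0.size, theta.size))
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps
        jac[:, k] = (fun(theta + step) - r0) / eps
    return jac

def gauss_newton(fun, theta, n_iter=20):
    """Gauss-Newton with step halving whenever the cost would increase."""
    for _ in range(n_iter):
        r = fun(theta)
        jac = jacobian_by_columns(fun, theta)
        delta, *_ = np.linalg.lstsq(jac, -r, rcond=None)
        scale = 1.0
        while np.sum(fun(theta + scale * delta) ** 2) > np.sum(r**2) and scale > 1e-4:
            scale *= 0.5
        theta = theta + scale * delta
    return theta

theta_init = theta_true + np.array([0.2, -0.2, 0.3])  # spurious dependence on u
theta_fit = gauss_newton(residuals, theta_init)
```

The cost of the Jacobian scales with the number of parameters, which is why this is only attractive for small closures where the whole batch fits in memory.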

Single step simulator

I think this is the bit we need to think very, very carefully about. This demo could essentially come to be considered a recipe for how to do neural ODEs with Tesseracts. Do we really want to suggest that a single-timestep Tesseract be called potentially thousands of times to pass the output of the closure through (especially without any hint of how one could plug checkpointing in)? I'm not sure this is the best use case for Tesseracts unless you have simulators in mind that don't need very many timesteps but where each timestep takes a while.

My impression is the problem you're trying to solve is how to do neural ODEs for differentiable solvers not written in an ML-friendly framework (e.g. Enzyme). (Neural ODEs in JAX are essentially a solved problem with diffrax.) Is the huge overhead one will inevitably incur with this approach worth saving the pain of just writing an MLP in Enzyme-compatible Fortran/Julia and wrapping the whole simulator+closure model in a single Tesseract?

If you just want a simple sell for Tesseract-Torch, it might be better to do something more vanilla, where we don't do a true "neural ODE" but instead solve the "inverse problem" of learning a "viscosity" field which we know a priori is only a function of space. In that case you don't need to pass an updated version of the viscosity field at every simulation timestep; you can just pass it through at initialisation, and the Tesseract could run a whole simulation with torchdiffeq instead of just a timestep. If you're really keen on showing that we can call the Tesseract multiple times in a chain, what I would do (and in fact have done; this is a fairly standard neural ODE approach) is chain the simulator to produce snapshots and have training data at those snapshots (e.g. ten per simulation). That way you get extra training data essentially for free and have a good justification for why you are chunking. I think that would be much more compelling and self-explanatory.
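A minimal sketch of the snapshot-chaining idea, assuming a toy diffusion-only solver in NumPy. `run_chunk` is a hypothetical stand-in for one solver call (in the demo, that would be one Tesseract call advancing many timesteps), and all numeric values are made up for illustration.

```python
import numpy as np

def run_chunk(u, nu_field, dt, dx, n_steps):
    """Hypothetical stand-in for one solver call: n_steps of u_t = nu(x) u_xx."""
    for _ in range(n_steps):
        lap = np.zeros_like(u)
        lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
        u = u + dt * nu_field * lap
    return u

def snapshot_loss(u0, nu_field, snapshots, dt, dx, steps_per_chunk):
    """Chain chunked solver calls and average the MSE over all snapshots."""
    u, total = u0, 0.0
    for target in snapshots:
        u = run_chunk(u, nu_field, dt, dx, steps_per_chunk)
        total += np.mean((u - target) ** 2)
    return total / len(snapshots)

# Synthetic data: ten snapshots per simulation, 50 timesteps between snapshots.
x = np.linspace(0.0, 1.0, 101)
dx, dt, steps_per_chunk = x[1] - x[0], 2e-4, 50
nu_true = 0.05 + 0.02 * np.sin(2.0 * np.pi * x)
u0 = np.sin(np.pi * x)

snapshots, u = [], u0
for _ in range(10):
    u = run_chunk(u, nu_true, dt, dx, steps_per_chunk)
    snapshots.append(u)

loss_true = snapshot_loss(u0, nu_true, snapshots, dt, dx, steps_per_chunk)
loss_const = snapshot_loss(u0, np.full_like(x, 0.05), snapshots, dt, dx, steps_per_chunk)
```

In this layout a trajectory costs ten solver calls instead of 500, and each call contributes a supervised target, so the chunking is motivated by the data rather than by the solver interface.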

Batching

Is it possible to have a tesseract-torch vmap batching rule in the future so your Tesseract can run all training simulations at once instead of with Python list comprehensions? Or is this currently a fundamental torchfun vs autograd limitation?

@samalipio samalipio left a comment


Copy LGTM!


3 participants