THOR is a secure Transformer inference framework that uses homomorphic encryption to run a BERT sequence-classification forward pass over encrypted data. It is built on the DESILO FHE library.
The repository exposes three CLI commands:
- encode_weights: generates encoded plaintext weights and masks under light_plaintexts/
- forward: runs one validation example and compares the result with the plain PyTorch model
- forward_batch: runs forward for a range of validation indices across one or more GPUs
- desilofhe 1.13+ (CUDA version 12.1 to 13.0).
- GPU with at least 36 GB of VRAM (the default), or 32 GB when using the --compact flag.
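
The VRAM thresholds above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of THOR: `choose_mode` is a hypothetical helper, and it assumes the memory figure you pass in uses the same GB unit as the requirements list.

```python
from typing import Optional

# Thresholds taken from the requirements list, in GB.
DEFAULT_MIN_VRAM_GB = 36
COMPACT_MIN_VRAM_GB = 32


def choose_mode(vram_gb: float) -> Optional[str]:
    """Return "default", "compact", or None if the GPU is too small.

    Hypothetical helper for illustration; THOR does not ship this function.
    """
    if vram_gb >= DEFAULT_MIN_VRAM_GB:
        return "default"
    if vram_gb >= COMPACT_MIN_VRAM_GB:
        return "compact"  # every command must then be run with --compact
    return None


if __name__ == "__main__":
    for vram in (80, 32, 24):
        print(f"{vram} GB -> {choose_mode(vram)}")
```

A GPU that only qualifies for compact mode needs the --compact flag on both the encoding and the forward steps.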
THOR works with Python 3.14+, and any standard Python package manager can be used; the examples below use Poetry.

    poetry install

Before running the encrypted forward pass, you need to prepare the following files:
2-1. A fine-tuned BERT checkpoint file including model.safetensors
Download finetuned_models.tar (Google Drive link), then extract the archive into the repository root.
The code also loads bert-base-uncased components from Hugging Face at runtime.
2-2. Encoded checkpoint files under light_plaintexts/, generated from model weights
encode_weights writes the weights, biases, and masks used by the encrypted forward pass.
To generate the light plaintexts for the model:

    poetry run encode_weights \
        --model_path ./finetuned_models/mrpc/model.safetensors

This writes default-mode files to ./light_plaintexts/default/. With --compact, it writes compact-mode files to ./light_plaintexts/compact/.
2-3. (Recommended) Cache the encoded files for your selected mode with vmtouch -t light_plaintexts/default/ or vmtouch -t light_plaintexts/compact/. Each directory is around 110 GB, so make sure you have enough RAM for the mode you cache.
You can install vmtouch from your package manager or build it from source.
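
Before caching roughly 110 GB with vmtouch, it can help to confirm that the directory for your mode exists and fits in memory. This is a minimal sketch under stated assumptions: the helper names are hypothetical, `physical_ram_bytes` relies on `os.sysconf` (available on Linux and most Unix systems), and the 0.9 headroom factor is an arbitrary choice rather than a THOR setting.

```python
import os
from pathlib import Path


def plaintext_dir(compact: bool, root: str = "light_plaintexts") -> Path:
    """Directory that encode_weights writes to for the selected mode."""
    return Path(root) / ("compact" if compact else "default")


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def physical_ram_bytes() -> int:
    """Installed RAM in bytes (assumes os.sysconf supports these names)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def can_cache(path: Path, ram_bytes: int, headroom: float = 0.9) -> bool:
    """True if the directory fits within a fraction of the given RAM.

    The headroom factor is an illustrative assumption.
    """
    return dir_size_bytes(path) <= ram_bytes * headroom


if __name__ == "__main__":
    target = plaintext_dir(compact=False)
    if target.is_dir():
        print(target, "fits in RAM:", can_cache(target, physical_ram_bytes()))
    else:
        print(target, "not found; run encode_weights first")
```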
- Single encrypted forward pass:

      poetry run forward

- With the memory-efficient engine:

      poetry run forward --compact
You can also run a batch over a range of validation indices:

    poetry run forward_batch \
        --start-idx 0 \
        --end-idx 10 \
        --devices 0 1 \
        --output-dir ./forward-batch-results

forward_batch creates one subdirectory per target index and skips indices that already have results in the output directory.
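
The skip-and-resume behavior can be pictured with a small sketch. It is not the forward_batch implementation, and it rests on several assumptions: each subdirectory is named after its index, a finished run is marked by a result.json file, --end-idx is exclusive, and work is spread across devices round-robin. Both function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def pending_indices(output_dir: Path, start: int, end: int) -> List[int]:
    """Indices in [start, end) that have no result.json yet.

    Assumed layout: <output_dir>/<index>/result.json marks a finished run.
    """
    return [
        idx
        for idx in range(start, end)
        if not (output_dir / str(idx) / "result.json").exists()
    ]


def assign_round_robin(indices: List[int], devices: List[int]) -> Dict[int, List[int]]:
    """Spread the pending indices across devices in round-robin order."""
    plan: Dict[int, List[int]] = {device: [] for device in devices}
    for position, idx in enumerate(indices):
        plan[devices[position % len(devices)]].append(idx)
    return plan


if __name__ == "__main__":
    todo = pending_indices(Path("./forward-batch-results"), 0, 10)
    print(assign_round_robin(todo, devices=[0, 1]))
```

Because finished indices are filtered out before any work is assigned, rerunning the same command after an interruption only processes what is still missing.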
All scripts (forward, forward_batch, encode_weights) support a --compact flag that uses a more compact encoding for the internal data structures, which can reduce memory usage during the forward pass and enable it to run on GPUs with 32 GB of VRAM.
Note that the compact encoding is not compatible with the non-compact forward pass, so you must use the --compact flag for both encoding and forward steps if you choose to use it.

    # With compact encoding
    poetry run encode_weights --compact
    poetry run forward --compact

Each forward run writes:
- result.json, which includes the dataset type, target index, device, key size, prediction, plain-model prediction, label, HE logits, and plain logits
- Optional per-layer plots such as layer-00.png through layer-11.png
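
The result files from several runs can be combined into an accuracy figure. The sketch below is illustrative only: the key names `prediction`, `plain_prediction`, and `label` are assumptions about the JSON schema, as is the one-subdirectory-per-run layout, so check both against an actual result.json before relying on it.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def load_results(output_dir: Path) -> List[dict]:
    """Load every result.json found one level below output_dir."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(output_dir.glob("*/result.json"))
    ]


def summarize(results: Iterable[dict]) -> Dict[str, float]:
    """Accuracy against the label and agreement with the plain model.

    The dictionary keys read here are assumed names, not a documented schema.
    """
    rows = list(results)
    total = len(rows)
    correct = sum(r["prediction"] == r["label"] for r in rows)
    agree = sum(r["prediction"] == r["plain_prediction"] for r in rows)
    return {
        "total": total,
        "accuracy": correct / total if total else 0.0,
        "agreement_with_plain": agree / total if total else 0.0,
    }


if __name__ == "__main__":
    print(summarize(load_results(Path("./forward-batch-results"))))
```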
During execution, the script also prints per-stage timing information from thor.timer.Timer.
- HE and PT denote logits from the homomorphically encrypted forward pass and the plain PyTorch model, respectively.
- compute time measures the core encrypted inference execution time only, while total time includes end-to-end overhead such as preprocessing, data transfer, and visualization.
Predicted by HE: 1, Ground Truth: 1
HE A [-3.007829226318608] B [5.926385952893445]
PT A [-3.12514591217041] B [6.013195514678955]
now: 2026-05-13 02:24:13.853284
----------------------------------------------------------------------------------
stage time compute time total time stage name
----------------------------------------------------------------------------------
6m 2.666s 11m 26.870s
----------------------------------------------------------------------------------
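
The numbers in the sample output can be checked by hand. The sketch below assumes that A and B are the logits for classes 0 and 1, and that the two durations in the summary row are compute time and total time, in that order. The helper names are hypothetical and are not part of thor.timer.

```python
import re
from typing import Sequence


def predict(logits: Sequence[float]) -> int:
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def max_abs_error(he: Sequence[float], pt: Sequence[float]) -> float:
    """Largest absolute gap between encrypted and plain logits."""
    return max(abs(h - p) for h, p in zip(he, pt))


def parse_duration(text: str) -> float:
    """Convert a string such as "6m 2.666s" or "2.5s" to seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s+)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1) or 0) * 60 + float(match.group(2))


if __name__ == "__main__":
    he = [-3.007829226318608, 5.926385952893445]
    pt = [-3.12514591217041, 6.013195514678955]
    print("HE prediction:", predict(he), "PT prediction:", predict(pt))
    print("max |HE - PT|:", round(max_abs_error(he, pt), 4))

    compute = parse_duration("6m 2.666s")
    total = parse_duration("11m 26.870s")
    print("overhead outside compute (s):", round(total - compute, 3))
```

Under these assumptions, both models predict class 1, the logits differ by at most about 0.12, and roughly 324 seconds of the total time fall outside the encrypted computation.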
Over the full set of 408 MRPC examples, accuracy is 84.07% (343/408) and the average compute time is 590.6 seconds on an NVIDIA A100-SXM4-80GB GPU. For comparison, the original THOR paper reports 84.80% accuracy and a compute time of 602 seconds.
| Mode | CPU | GPU | Compute Time |
|---|---|---|---|
| Default | Intel Xeon Platinum 8462Y+ | NVIDIA A100-SXM4-80GB | 590.6s |
| Compact | Intel Xeon Platinum 8462Y+ | NVIDIA A100-SXM4-80GB | 650.3s |
| Compact | Intel Core i7-10700K @ 3.80GHz | NVIDIA GeForce RTX 5090 | 362.6s |
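
To put the table in relative terms, the short calculation below uses only the compute times listed above. The function name is illustrative.

```python
def percent_change(new: float, base: float) -> float:
    """Relative change of new versus base, in percent."""
    return (new - base) / base * 100.0


# Compute times in seconds, copied from the table.
default_a100 = 590.6
compact_a100 = 650.3
compact_rtx5090 = 362.6

if __name__ == "__main__":
    slowdown = percent_change(compact_a100, default_a100)
    speedup = compact_a100 / compact_rtx5090
    print(f"Compact vs. default on the A100: {slowdown:+.1f}%")
    print(f"RTX 5090 vs. A100, both compact: {speedup:.2f}x faster")
```

On the A100, compact mode costs about 10% in compute time in exchange for the lower VRAM requirement, while the RTX 5090 runs the compact mode roughly 1.8 times faster than the A100 does.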
For optimal performance, cache the encoded files for your selected mode with vmtouch -t light_plaintexts/default/ or vmtouch -t light_plaintexts/compact/ before running the forward pass.