nanoIM

Chat collapses time. nanoIM restores it.
A small from-scratch interaction-model lab for the temporal-aliasing bottleneck.

Controlled symbolic evidence for a representation bottleneck. Trains from scratch, no API keys, fully offline.

What This Is

nanoIM is a compact research artifact for one question:

If two interactions flatten to the same final chat transcript but require different next actions, what can a transcript-only model know?

The answer is structural. A transcript-only model cannot separate the pair because its input is identical. A native micro-turn model can, if the missing variables are preserved: silence, timing, overlap, interruptions, visual cues, policy boundaries, and asynchronous tool events.

The repository is intentionally small enough to read like nanoGPT, but complete enough to generate data, train models from scratch, evaluate baselines, render proof galleries, and audit leakage.

Layer	What is included	Why it matters
Formal spine	`paper/temporal_aliasing.md`, `paper/the_interaction_bottleneck.md`	States the impossibility argument and experimental method.
Data lab	symbolic 200 ms-style micro-turn traces across 10 task families	Tests interaction states that disappear in chat transcripts.
Fair baselines	transcript majority, transcript naive Bayes, transcript oracle upper bound, rule harness, field-lookup table	Separates structural transcript failure from weak modeling.
Trainable models	tiny GRU and tiny Transformer, stream/time embeddings, action head	Proves the effect with from-scratch local models.
Anti-cheat gates	alias equality tests, no transcript leakage, destructive controls	Makes shortcut explanations harder.
Reviewer proof	machine-readable scorecards, timeline gallery, honest limitations	Gives skeptics direct artifacts to inspect.

The Thesis

Turn-based chat is a lossy serialization format. It turns interaction into alternating messages and erases the variables that often decide whether a system should wait, speak, yield, interrupt, request approval, resume after a tool result, or integrate a background event.

nanoIM studies that bottleneck as temporal aliasing:

flatten(trace_a) == flatten(trace_b)
target_action(trace_a) != target_action(trace_b)

Any policy that only receives flatten(trace) must assign the same action distribution to both traces. If the required actions differ, it must fail at least one member of the pair. That is the core claim, and the repo turns it into executable tests.
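The bound can be checked mechanically. Below is a minimal Python sketch. It assumes each example has already been reduced to a hypothetical `(transcript, target_action)` tuple, and the second alias pair is invented for illustration. A transcript-only policy must emit one action per distinct transcript, so the best it can do is pick the majority target within each group of identical transcripts. On balanced alias pairs, that is exactly half.

```python
from collections import Counter, defaultdict

def transcript_oracle_accuracy(examples):
    """Best accuracy reachable by any policy that sees only the transcript.

    `examples` is a list of (transcript, target_action) tuples. A transcript-only
    policy outputs one action per distinct transcript, so the ceiling is the
    majority target count inside each transcript group.
    """
    groups = defaultdict(Counter)
    for transcript, action in examples:
        groups[transcript][action] += 1
    best_correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return best_correct / len(examples)

# Two balanced alias pairs: identical transcripts, different required actions.
alias_examples = [
    ("User: book the 3 pm slot", "WAIT"),
    ("User: book the 3 pm slot", "REQUEST_APPROVAL"),
    ("User: cancel my order", "WAIT"),
    ("User: cancel my order", "SPEAK"),
]
print(transcript_oracle_accuracy(alias_examples))  # 0.5
```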

One-Minute Result

Held-out test split, hard suite:

System	Temporal aliasing accuracy	Paired separation	Notes
Transcript majority	`0.300`	`0.000`	transcript only
Transcript naive Bayes	`0.467`	`0.000`	transcript only
Transcript oracle upper bound	`0.500`	`0.000`	structural ceiling for any transcript-only policy
Rule harness (stream-aware)	`1.000`	`1.000`	hand-written timing/stream rules
Field-lookup table (stream-aware)	`1.000`	`1.000`	memorized dict over stream fields
MicroTurn Tiny GRU	`1.000 +/- 0.0000`	`1.000 +/- 0.0000`	from scratch, seeds 3/7/11
MicroTurn Tiny Transformer	`1.000`	`1.000`	from scratch, seed 7

Held-out test split, noisy counterbalanced suite:

System	Temporal aliasing accuracy	Paired separation	Notes
Transcript majority	`0.400`	`0.000`	transcript only
Transcript naive Bayes	`0.4625`	`0.000`	transcript only
Transcript oracle upper bound	`0.500`	`0.000`	structural ceiling for any transcript-only policy
Rule harness (stream-aware)	`1.000`	`1.000`	hand-written timing/stream rules
Field-lookup table (stream-aware)	`1.000`	`1.000`	memorized dict over stream fields
MicroTurn Tiny GRU	`1.000 +/- 0.0000`	`1.000 +/- 0.0000`	from scratch, seeds 3/7/11
No-timing ablation (GRU)	`0.650 +/- 0.0000`	`0.300 +/- 0.0000`	timing removed from the input

Bootstrap 95% confidence intervals on per-example correctness from the hard-suite test split (10,000 resamples, seed 7):

Method	TAA point estimate	TAA 95% CI
Transcript majority	`0.300`	`[0.217, 0.383]`
Tiny GRU (seed 7)	`1.000`	`[1.000, 1.000]`
Tiny Transformer (seed 7)	`1.000`	`[1.000, 1.000]`
Rule harness	`1.000`	`[1.000, 1.000]`

Paired permutation tests (10,000 resamples, seed 7) for each stream-aware method against transcript majority all return p = 0.0000, consistent with p < 1e-4. Reproduce with uv run python -m nanoim.experiments.statistical_tests; full output in reports/statistical_tests.json.
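Here is a stdlib-only sketch of the two procedures as they are commonly defined. The first is a percentile bootstrap over per-example 0/1 correctness. The second is a two-sided sign-flip permutation test on paired correctness differences. The function names are hypothetical, and it is an assumption that nanoim.experiments.statistical_tests implements exactly these variants. With 10,000 resamples, a reported p = 0.0000 simply means that no resample reached the observed gap.

```python
import random

def bootstrap_ci(correct, n_resamples=10_000, seed=7, alpha=0.05):
    """Percentile bootstrap CI for accuracy from per-example 0/1 correctness."""
    rng = random.Random(seed)
    n = len(correct)
    means = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples))
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

def paired_permutation_p(correct_a, correct_b, n_resamples=10_000, seed=7):
    """Two-sided sign-flip permutation test on paired per-example correctness."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each paired difference is equally likely to have either sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            hits += 1
    return hits / n_resamples

# Hypothetical 120-example split: a saturated model vs. a 0.300 majority baseline.
model = [1] * 120
majority = [1] * 36 + [0] * 84
print(bootstrap_ci(model))                    # (1.0, 1.0)
print(bootstrap_ci(majority))
print(paired_permutation_p(model, majority))  # 0.0
```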

Read this as a statement about the representation, not the model. Transcript-only policies are provably capped at 0.500 on balanced alias pairs: the oracle can assign only one action per identical transcript, so it must miss one member of every pair (paired separation 0.000). Every stream-aware method I tried (rules, lookup table, GRU, Transformer) saturates at 1.000. nanoIM does not claim the neural model beats symbolic baselines; in this controlled lab it ties them, which is the point. The trained model is an existence proof that the signal is learnable from scratch. The contribution is the 0.500 → 1.000 gap from restoring the streams (state and timing) the transcript discards.

What "held-out test split" does and does not mean. Train/val/test use disjoint transcript wordings, so a transcript-only model cannot win by memorizing text templates. But the streams that decide the action are identical across splits; only the filler wording differs. The test numbers therefore measure within-distribution reproduction plus robustness to decoy noise, not generalization to novel interaction structure or unseen task families. nanoIM makes no generalization claim for the learned model beyond this controlled setup.

The field-lookup table is deliberately included as a strong baseline, not a strawman (reports/hard_field_lookup_scorecard.json, reports/noisy_field_lookup_scorecard.json; reproduce with uv run python -m nanoim.eval --baseline field_lookup ...). That a plain dict saturates the task once it sees the streams is the point: the bottleneck is the transcript representation, not model capacity.
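A plain-dict baseline of this kind fits in a few lines. The class name and the stream-field values below are hypothetical. The fallback rule is also an assumption: unseen keys get the global majority action, with ties going to the first action seen. It is modeled on the leave-one-family-out description, not taken from nanoim/baselines.py.

```python
from collections import Counter, defaultdict

class FieldLookup:
    """Memorize the majority action for each tuple of stream-field values."""

    def __init__(self, fields):
        self.fields = tuple(fields)
        self.table = {}
        self.default = None

    def _key(self, example):
        return tuple(example.get(f) for f in self.fields)

    def fit(self, examples, actions):
        per_key = defaultdict(Counter)
        for ex, action in zip(examples, actions):
            per_key[self._key(ex)][action] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in per_key.items()}
        # Unseen keys fall back to the global majority (ties: first action seen).
        self.default = Counter(actions).most_common(1)[0][0]
        return self

    def predict(self, example):
        return self.table.get(self._key(example), self.default)

lookup = FieldLookup(["user_audio_state", "policy_event"]).fit(
    [{"user_audio_state": "hesitating", "policy_event": None},
     {"user_audio_state": "silent", "policy_event": "approval_required"}],
    ["WAIT", "REQUEST_APPROVAL"],
)
print(lookup.predict({"user_audio_state": "silent", "policy_event": "approval_required"}))
print(lookup.predict({"user_audio_state": "speaking"}))  # unseen key -> default
```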

Transcript-only LLM baselines (the structural ceiling holds for modern open-weights models)

The structural argument predicts that any deterministic transcript-only policy is capped at TAA 0.500 on balanced alias pairs with paired separation 0.000 under this construction, regardless of capacity. I tested that against seven modern open-weights LLMs spanning five families and 3B–30B parameters, each given the same flattened transcript and asked for the next action. The parser uses max_tokens=2048 and falls back to the reasoning field for models that emit chain-of-thought (Qwen3 and GLM-4.7 do); every prediction is sourced from a real model output, not from a default. Zero parse failures across 840 LLM calls.

Model	Family	Params	Type	TAA	Paired separation	n
Phi-4 14B	Microsoft	14B	dense	`0.133`	`0.000`	120
Llama 3.2 3B	Meta	3B	dense	`0.208`	`0.000`	120
Qwen3 4B	Alibaba	4B	dense	`0.133`	`0.000`	120
Qwen3 14B MLX	Alibaba	14B	dense	`0.158`	`0.000`	120
Qwen3 30B-A3B MLX	Alibaba	30B (3B active)	MoE	`0.183`	`0.000`	120
Gemma 4 26B	Google	26B	dense	`0.208`	`0.000`	120
GLM-4.7 Flash	Zhipu AI	30B class	reasoning (2026)	`0.117`	`0.000`	120

420 alias pairs evaluated, 420/420 with paired separation = 0.000. Every tested model, across family, parameter count, dense/MoE/reasoning architecture, and chain-of-thought behavior, assigns the same action to both members of every alias pair. The 19K-parameter from-scratch GRU that sees the streams reaches 1.000 / 1.000 on the same data. In this benchmark, the bottleneck is the representation, not the model. Per-model JSON in reports/llm_baseline_*.json; aggregate in reports/llm_baseline_summary.json. Reproduce with uv run python -m nanoim.experiments.llm_baseline --base-url http://127.0.0.1:11434/v1 --model qwen3:4b --out reports/llm_baseline_qwen3_4b.json (Ollama) or substitute the LM Studio endpoint and model name.
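The parsing step can be sketched as follows. This is a hypothetical helper, not the code in nanoim.experiments.llm_baseline. It assumes an OpenAI-compatible chat-completions response dict. It also assumes the chain-of-thought text sits under a `reasoning` or `reasoning_content` message key, since the key name varies by server. It returns `None` rather than a default action, so parse failures stay countable.

```python
import re

# Action names from the task-family table; the full label set may be larger.
ACTIONS = ("WAIT", "SPEAK", "BACKCHANNEL", "STOP_SPEAKING", "ASK_CLARIFICATION",
           "REQUEST_APPROVAL", "CALL_TOOL", "RESUME_WITH_RESULT")
# "_" is a word character, so \bSPEAK\b cannot match inside STOP_SPEAKING.
ACTION_RE = re.compile(r"\b(" + "|".join(ACTIONS) + r")\b")

def parse_action(response):
    """Return the last action named in the answer, falling back to the reasoning text."""
    message = response["choices"][0]["message"]
    for key in ("content", "reasoning", "reasoning_content"):
        matches = ACTION_RE.findall(message.get(key) or "")
        if matches:
            return matches[-1]
    return None  # a visible parse failure, never a silent default

reply = {"choices": [{"message": {"content": "",
                                  "reasoning": "The user is still hesitating, so WAIT."}}]}
print(parse_action(reply))  # WAIT
```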

Input-field importance (occlusion at inference time)

Same trained checkpoints, no retraining; each input field is blanked at inference time. The drop in temporal aliasing accuracy measures how much each architecture actually consults each field at decision time. Both architectures agree on the ranking:

Field blanked	GRU TAA drop	Transformer TAA drop
`user_audio_state`	`-0.350`	`-0.400`
`visual_event`	`-0.183`	`-0.200`
`background_event`	`-0.100`	`-0.100`
`t_ms` / `dt_ms`	`-0.050`	`-0.050`
`model_audio_state`	`-0.050`	`-0.050`
`policy_event`	`-0.050`	`-0.050`

user_audio_state is the single most load-bearing field. That fits the data: most task families turn on whether the user is speaking, silent, hesitating, or interrupting at the critical timestep. This is different from the retraining no_X ablations (where the model can adapt around a missing field): the question here is what the trained model relies on at inference. Reproduce with uv run python -m nanoim.experiments.occlusion --out reports/occlusion_analysis.json. Full per-field, per-architecture detail in reports/occlusion_analysis.json.
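The occlusion procedure works with any frozen predictor, as this sketch shows. It assumes a trace is a list of per-timestep dicts and that blanking replaces a field with a hypothetical `"<blank>"` token at every timestep. The toy `predict` function stands in for a trained checkpoint.

```python
def blank_field(trace, field, blank="<blank>"):
    """Copy a trace (list of per-timestep dicts) with one field masked at every step."""
    return [{**step, field: blank} for step in trace]

def accuracy(predict, traces, targets):
    return sum(predict(t) == y for t, y in zip(traces, targets)) / len(targets)

def occlusion_drops(predict, traces, targets, fields):
    """Accuracy change per blanked field, reusing the same frozen predictor."""
    base = accuracy(predict, traces, targets)
    return {
        field: accuracy(predict, [blank_field(t, field) for t in traces], targets) - base
        for field in fields
    }

# Toy stand-in for a trained checkpoint: waits while the user hesitates.
def predict(trace):
    return "WAIT" if trace[-1]["user_audio_state"] == "hesitating" else "SPEAK"

traces = [[{"user_audio_state": "hesitating", "visual_event": None}],
          [{"user_audio_state": "silent", "visual_event": None}]]
targets = ["WAIT", "SPEAK"]
print(occlusion_drops(predict, traces, targets, ["user_audio_state", "visual_event"]))
# {'user_audio_state': -0.5, 'visual_event': 0.0}
```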

Generalization (leave-one-family-out)

I separately tested whether each method generalizes to interaction structures it did not see during training. For each of the ten task families, I refit every method on the other nine families (a fresh tiny GRU or Transformer per seed) and evaluated it on the held-out family. The results show plainly which methods generalize and which do not:

Hard suite, mean across 10 held-out families × 3 seeds:

Method	Mean LOFO TAA	Mean LOFO paired separation
Rule harness (invariant)	`1.000`	`1.000`
Tiny GRU	`0.344`	`0.156`
Tiny Transformer	`0.256`	`0.011`
Field-lookup table	`0.300`	`0.000`
Transcript majority	`0.300`	n/a

Noisy suite, mean across 10 held-out families × 3 seeds:

Method	Mean LOFO TAA	Mean LOFO paired separation
Rule harness (invariant)	`1.000`	`1.000`
Tiny GRU	`0.429`	`0.075`
Tiny Transformer	`0.317`	`0.033`
Field-lookup table	`0.400`	`0.000`
Transcript majority	`0.400`	n/a

The rule harness is invariant by construction (its fit is a no-op), so it transfers trivially to every held-out family, and the release verifier asserts this. The field-lookup table collapses to its global default action on stream keys it never saw, which matches the transcript-majority score on both suites. The trained architectures do little better. The GRU lands only slightly above transcript majority (0.344 vs 0.300 hard, 0.429 vs 0.400 noisy). The Transformer falls below it on both suites. Both keep paired separation far from the rule harness's 1.000 (at most 0.156). The more expressive Transformer collapses further than the GRU on both suites, consistent with extra capacity buying more family-specific overfitting rather than more generalization. Source: reports/lofo_sweep.json and reports/lofo_sweep_noisy.json.

This is the kind of negative result that should ship with the positive one. The repo's claim is about the representation and the invariant rule harness, not about novel-structure generalization from the learned model. Reproduce with uv run python -m nanoim.experiments.lofo --out reports/lofo_sweep.json. Per-family numbers are in reports/lofo_sweep.json.
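The split logic behind a leave-one-family-out sweep is small. This sketch assumes each example carries a hypothetical `family` key. It also assumes a hypothetical `score_fn(train, test, seed)` that fits a fresh model and returns held-out TAA. The mean mirrors the "10 held-out families × 3 seeds" averaging in the tables above.

```python
def lofo_splits(examples, family_key="family"):
    """Yield (held_out_family, train, test) with no family shared across the split."""
    families = sorted({ex[family_key] for ex in examples})
    for held_out in families:
        train = [ex for ex in examples if ex[family_key] != held_out]
        test = [ex for ex in examples if ex[family_key] == held_out]
        yield held_out, train, test

def mean_lofo(score_fn, examples, seeds=(3, 7, 11)):
    """Average held-out score over every (family, seed) combination."""
    scores = [
        score_fn(train, test, seed)
        for _, train, test in lofo_splits(examples)
        for seed in seeds
    ]
    return sum(scores) / len(scores)

# Usage with toy examples.
examples = [
    {"family": "yield_detection", "id": 0},
    {"family": "yield_detection", "id": 1},
    {"family": "hesitation", "id": 2},
    {"family": "barge_in_recovery", "id": 3},
]
for held_out, train, test in lofo_splits(examples):
    print(held_out, len(train), len(test))
```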

Proof Spine

flowchart LR
    A["Symbolic micro-turn generator"] --> B["Paired alias examples"]
    B --> C["Anti-leakage tests"]
    C --> D["Transcript baselines (oracle bound 0.50)"]
    C --> E["Rule harness / field-lookup"]
    C --> F["MicroTurn Tiny GRU / Transformer"]
    F --> G["Ablations and destructive controls"]
    D --> H["Machine-readable scorecards"]
    E --> H
    G --> H
    H --> I["Timeline gallery"]

Quickstart

Requires Python 3.11 or 3.12. Install uv first, then:

uv sync --dev
uv run python -m nanoim --help
uv run python -m nanoim.data.generate --suite mini --out data/mini.jsonl
uv run python -m nanoim.train --config configs/tiny.yaml
uv run python -m nanoim.eval --checkpoint runs/tiny/best.pt --suite mini --out reports/scorecard.json
uv run python -m nanoim.viz.render --run reports/example_trace.jsonl --out reports/timeline.html
uv run pytest
uv build
uv run python -m nanoim.package_audit --out reports/package_audit.json

After activating .venv, the same commands work without the uv run prefix:

source .venv/bin/activate
python -m nanoim --help
python -m nanoim.data.generate --suite mini --out data/mini.jsonl
python -m nanoim.train --config configs/tiny.yaml
python -m nanoim.eval --checkpoint runs/tiny/best.pt --suite mini --out reports/scorecard.json
python -m nanoim.viz.render --run reports/example_trace.jsonl --out reports/timeline.html
pytest

Start Here For Reviewers

If you have ten minutes and want to attack the claim, start here:

docs/reviewer_guide.md - how to attack the claim using the committed artifacts.
CLAIM_LEDGER.md - public claims tied to exact evidence, reviewer attacks, and non-claims.
SOURCE_MAP.md - where every artifact, command, model, report, and test lives.
reports/failure_report.md - preserved failure analysis and interpretation.
reports/timeline_gallery.html - local visual proof gallery for same-transcript/different-timing pairs.

Many commands in this README write to committed artifact paths and refresh them in place. Use --out /tmp/... for smoke runs that leave the working tree unchanged.

Ten-Minute Command Path

If you are reviewing the claim skeptically, read these in order:

paper/temporal_aliasing.md for the toy impossibility argument.
tests/test_aliasing.py for the anti-leakage contract.
reports/scorecard.md for the human-readable result summary.
reports/noisy_sweep.json and reports/adversarial_controls.json for timing and shortcut checks.
reports/timeline_gallery.html for side-by-side same-transcript/different-timing examples.
reports/failure_report.md for known failures and interpretation.
HOSTILE_REVIEW.md and AUTHOR_RESPONSE.md for the strongest objections and current answers.

Task Families

nanoIM ships ten interaction families, each designed so that a transcript can be identical while the correct micro-turn action differs:

Family	Missing variable	Example action contrast
yield detection	silence duration and user audio state	`WAIT` vs `SPEAK`
hesitation	filled pauses and non-final prosody proxy	`WAIT` vs `BACKCHANNEL`
barge-in recovery	overlap and interruption onset	`STOP_SPEAKING` vs `WAIT`
self-correction	revision timing	`WAIT` vs `SPEAK`
backchannel timing	short acknowledgement placement	`BACKCHANNEL` vs `ASK_CLARIFICATION`
clarification timing	visual/text confidence cue	`WAIT` vs `ASK_CLARIFICATION`
visual cue trigger	non-verbal event stream	`ASK_CLARIFICATION` vs `SPEAK`
approval interruption	policy boundary during speech	`REQUEST_APPROVAL` vs `CALL_TOOL`
async tool weaving	background tool result timing	`CALL_TOOL` vs `WAIT`
background result integration	external result arrives mid-turn	`RESUME_WITH_RESULT` vs `SPEAK`

Example Alias Pair

Both examples flatten to the same transcript:

User: book the 3 pm slot

But they require different actions:

Trace	Preserved micro-turn state	Correct action
A	user is still hesitating, no approval cue	`WAIT`
B	user has yielded, approval cue is active	`REQUEST_APPROVAL`

The transcript-only model sees identical input. The micro-turn model receives the streams that decide the action.
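Here is one way to represent such a pair. The dataclasses, field names, and stream values are hypothetical. `flatten` keeps only the user text, which is all a chat transcript preserves.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MicroTurn:
    t_ms: int
    user_audio_state: str               # e.g. "speaking", "hesitating", "silent"
    text_delta: str = ""                # text that would survive into the transcript
    policy_event: Optional[str] = None  # e.g. an approval cue

@dataclass
class Trace:
    steps: list[MicroTurn]
    target_action: str

def flatten(trace):
    """Chat-style serialization: keep the user text, drop every other stream."""
    return "User: " + "".join(step.text_delta for step in trace.steps).strip()

trace_a = Trace([MicroTurn(0, "speaking", "book the 3 pm slot"),
                 MicroTurn(200, "hesitating")], target_action="WAIT")
trace_b = Trace([MicroTurn(0, "speaking", "book the 3 pm slot"),
                 MicroTurn(200, "silent", policy_event="approval_required")],
                target_action="REQUEST_APPROVAL")

print(flatten(trace_a) == flatten(trace_b))          # True
print(trace_a.target_action, trace_b.target_action)  # WAIT REQUEST_APPROVAL
```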

Architecture

flowchart TB
    subgraph Data["data layer"]
        G["nanoim.data.generate"]
        P["nanoim.data.pilot"]
        HP["nanoim.data.human_protocol"]
    end

    subgraph Features["representation"]
        TF["transcript-only features"]
        MF["micro-turn stream/time features"]
    end

    subgraph Policies["policies"]
        TM["transcript majority / NB / oracle"]
        RH["rule harness"]
        GRU["tiny GRU"]
        TX["tiny Transformer"]
    end

    subgraph Evidence["evidence"]
        EV["nanoim.eval"]
        SW["nanoim.experiments.sweep"]
        CT["nanoim.controls"]
        VZ["nanoim.viz.render"]
        VR["nanoim.verify_release"]
    end

    G --> TF
    G --> MF
    P --> MF
    HP --> MF
    TF --> TM
    MF --> RH
    MF --> GRU
    MF --> TX
    TM --> EV
    RH --> EV
    GRU --> EV
    TX --> EV
    EV --> SW
    EV --> CT
    EV --> VZ
    SW --> VR
    CT --> VR

Models

The default model is deliberately small:

stream embeddings for symbolic audio/text/visual/background/policy state;
time/delta-time embeddings for 200 ms-style micro-turns;
a compact GRU backbone with per-timestep action supervision;
an action head for both critical-decision and trajectory metrics;
optional tiny Transformer comparator with the same feature boundary.
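Before any embedding lookup, each micro-turn has to become integer ids. This stdlib-only sketch assumes hypothetical per-stream vocabularies, with id 0 reserved for missing or blanked values. It also assumes a hypothetical 200 ms bucketing of `t_ms` and `dt_ms`, capped at 15 bins. The real boundary lives in nanoim/features.py and may differ. In a PyTorch model, each id column would index its own `torch.nn.Embedding` table ahead of the GRU.

```python
STREAMS = ("user_audio_state", "model_audio_state", "visual_event",
           "background_event", "policy_event")

def build_vocab(traces):
    """Per-stream value -> id maps; id 0 is reserved for missing or blanked values."""
    vocab = {s: {} for s in STREAMS}
    for trace in traces:
        for step in trace:
            for s in STREAMS:
                value = step.get(s)
                if value is not None and value not in vocab[s]:
                    vocab[s][value] = len(vocab[s]) + 1
    return vocab

def encode_step(step, prev_t_ms, vocab, bin_ms=200, max_bin=15):
    """Turn one micro-turn into stream ids plus absolute-time and delta-time bins."""
    stream_ids = [vocab[s].get(step.get(s), 0) for s in STREAMS]
    t_bin = min(step["t_ms"] // bin_ms, max_bin)
    dt_bin = min((step["t_ms"] - prev_t_ms) // bin_ms, max_bin)
    return stream_ids + [t_bin, dt_bin]

trace = [{"t_ms": 0, "user_audio_state": "speaking"},
         {"t_ms": 200, "user_audio_state": "silent", "policy_event": "approval_required"}]
vocab = build_vocab([trace])
print(encode_step(trace[0], 0, vocab))                 # [1, 0, 0, 0, 0, 0, 0]
print(encode_step(trace[1], trace[0]["t_ms"], vocab))  # [2, 0, 0, 0, 1, 1, 1]
```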

Everything trains locally from scratch. No paid API keys, frontier APIs, hosted models, or hidden services are used.

Metrics

Scorecards are machine-readable JSON. The main release metrics include:

Metric	Meaning
`transcript_upper_bound`	best possible transcript-only ceiling under paired aliases
`temporal_aliasing_accuracy`	critical-action accuracy on alias pairs
`paired_separation_rate`	fraction of pairs where the policy separates both members correctly
`action_accuracy`	ordinary critical-action accuracy
`sequence_action_accuracy`	per-timestep trajectory accuracy
`wait_vs_speak_accuracy`	separation of wait/speak timing decisions
`false_interrupt_rate` / `missed_interrupt_rate`	interruption handling errors
`barge_in_recovery_time`	recovery latency after barge-in
`clarification_timing_score`	timing-sensitive clarification behavior
`tool_weaving_score`	background/tool-event integration
`turn_based_baseline_delta`	micro-turn gain over transcript-only baseline
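The two headline metrics can be computed from predictions, targets, and pair ids. The sketch below uses hypothetical function names and follows the definitions in the table. Its example policy gives both members of each pair the same action, as any transcript-only policy must, and it lands on 0.5 and 0.0.

```python
from collections import defaultdict

def temporal_aliasing_accuracy(preds, targets):
    """Critical-action accuracy over every alias-pair member."""
    return sum(p == y for p, y in zip(preds, targets)) / len(targets)

def paired_separation_rate(preds, targets, pair_ids):
    """Fraction of alias pairs where both members receive their correct action."""
    hits_by_pair = defaultdict(list)
    for p, y, pid in zip(preds, targets, pair_ids):
        hits_by_pair[pid].append(p == y)
    return sum(all(hits) for hits in hits_by_pair.values()) / len(hits_by_pair)

# A transcript-only policy must give both members of a pair the same action.
targets = ["WAIT", "REQUEST_APPROVAL", "WAIT", "SPEAK"]
preds = ["WAIT", "WAIT", "SPEAK", "SPEAK"]
pair_ids = [0, 0, 1, 1]
print(temporal_aliasing_accuracy(preds, targets))        # 0.5
print(paired_separation_rate(preds, targets, pair_ids))  # 0.0
```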

Anti-Cheat Contract

nanoIM is only interesting if the transcript baseline is fair. The repo therefore checks:

alias-pair members share identical flattened transcripts;
transcript features exclude pair IDs, task families, labels, event labels, timestamps, split IDs, and timing annotations;
generated hard/noisy artifacts match deterministic generator output;
destructive controls collapse performance when timing/stream/label structure is broken;
public docs distinguish symbolic proof from real audio/video competence.

The relevant tests and checks are tests/test_aliasing.py, nanoim.verify_release (asserts the scientific contract — transcript oracle bound 0.50, paired separation 0.00, and large destructive-control drops), and nanoim.security_audit.
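The first two checks can be expressed as a pytest-style contract. This is a sketch only. The forbidden key names, the example dict layout, and the `flatten` and `transcript_features` callables are hypothetical stand-ins for whatever tests/test_aliasing.py actually uses.

```python
# Hypothetical field names that must never reach the transcript-only baseline.
FORBIDDEN_FEATURE_KEYS = {
    "pair_id", "family", "label", "target_action", "event_labels",
    "t_ms", "dt_ms", "split", "timing_annotations",
}

def check_alias_contract(pairs, flatten, transcript_features):
    """Fail with AssertionError if any alias pair breaks the fairness contract."""
    for a, b in pairs:
        assert flatten(a) == flatten(b), "alias members must share a transcript"
        assert a["target_action"] != b["target_action"], "alias members must differ in action"
        for ex in (a, b):
            leaked = FORBIDDEN_FEATURE_KEYS & set(transcript_features(ex))
            assert not leaked, f"transcript features leak {sorted(leaked)}"

def flatten(ex):
    return ex["transcript"]

def transcript_features(ex):
    # Only the text is exposed to the transcript baseline.
    return {"text": ex["transcript"]}

pair = (
    {"transcript": "User: book the 3 pm slot", "target_action": "WAIT"},
    {"transcript": "User: book the 3 pm slot", "target_action": "REQUEST_APPROVAL"},
)
check_alias_contract([pair], flatten, transcript_features)  # passes silently
```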

Ablations And Controls

Hard-suite multi-seed input ablations:

Variant	Temporal aliasing accuracy	Paired separation	Main observed hit
Full	`1.000 +/- 0.0000`	`1.000 +/- 0.0000`	baseline
Small model	`0.983 +/- 0.0236`	`0.967 +/- 0.0471`	capacity halved; hard suite remains learnable
No audio	`0.847 +/- 0.0039`	`0.694 +/- 0.0079`	turn ownership / hesitation
No visual	`0.942 +/- 0.0068`	`0.883 +/- 0.0136`	clarification timing
No background	`0.894 +/- 0.0039`	`0.789 +/- 0.0079`	tool weaving
No policy	`0.939 +/- 0.0079`	`0.878 +/- 0.0157`	approval/tool boundary
No timing	`0.950 +/- 0.0000`	`0.900 +/- 0.0000`	hard suite mostly uses state streams

Noisy-suite multi-seed ablations:

Variant	Temporal aliasing accuracy	Paired separation	Main observed hit
Full	`1.000 +/- 0.0000`	`1.000 +/- 0.0000`	baseline
No timing	`0.650 +/- 0.0000`	`0.300 +/- 0.0000`	timing-only alias pairs
No background	`0.935 +/- 0.0078`	`0.871 +/- 0.0156`	tool/result state
No policy	`0.950 +/- 0.0000`	`0.900 +/- 0.0000`	approval boundaries
No visual	`0.944 +/- 0.0051`	`0.887 +/- 0.0102`	visual cue trigger

Destructive controls on noisy seed 7:

Control	TAA	Drop from base
Base	`1.0000`	n/a
Scramble `t_ms` within each trace	`0.4000`	`0.6000`
Mismatch critical non-transcript streams across examples	`0.3188`	`0.6813`
Permute held-out target labels	`0.1967`	`0.8033`
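The three controls amount to simple perturbations. This sketch assumes traces are lists of per-timestep dicts and examples are dicts with stream fields. The function names are hypothetical. `mismatch_streams` uses a nonzero rotation so that no example keeps its own streams, which means it needs at least two examples.

```python
import random

def scramble_t_ms(trace, rng):
    """Shuffle t_ms values across the steps of one trace; stream values stay put."""
    times = [step["t_ms"] for step in trace]
    rng.shuffle(times)
    return [{**step, "t_ms": t} for step, t in zip(trace, times)]

def mismatch_streams(examples, fields, rng):
    """Give each example the given stream fields from a different example."""
    n = len(examples)
    offset = rng.randrange(1, n)  # nonzero, so no example keeps its own streams
    return [
        {**ex, **{f: examples[(i + offset) % n][f] for f in fields}}
        for i, ex in enumerate(examples)
    ]

def permute_labels(targets, rng):
    """Shuffle held-out target labels across examples."""
    shuffled = list(targets)
    rng.shuffle(shuffled)
    return shuffled

rng = random.Random(7)
trace = [{"t_ms": 0, "user_audio_state": "speaking"},
         {"t_ms": 200, "user_audio_state": "hesitating"},
         {"t_ms": 400, "user_audio_state": "silent"}]
print(scramble_t_ms(trace, rng))
```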

Visual Proof

The visualizer writes a self-contained HTML gallery:

uv run python -m nanoim.viz.render --run reports/example_trace.jsonl --out reports/timeline_gallery.html

It shows paired examples side by side with:

the identical flattened transcript;
micro-turn rows for user audio, text deltas, model state, visual events, background events, and policy events;
target action vs micro-turn prediction;
transcript-baseline prediction;
highlighted row-level errors;
timing annotations such as 0.2s-0.4s (200ms).

Open reports/timeline_gallery.html after running the command.

Checks

These are correctness checks, not a self-administered quality score. Each recomputes from the data and committed artifacts:

Check	Command	What it asserts
Scientific contract	`nanoim.verify_release`	transcript oracle bound stays `0.50`, paired separation `0.00`, multi-seed/Transformer clear the oracle delta, and the destructive controls cause large drops
Calibration	`nanoim.calibration`	noisy checkpoint accuracy `1.000`, ECE `0.0251`, Brier `0.0074`, NLL `0.0275`
Security hygiene	`nanoim.security_audit`	no secrets, no leaked paths, no oversized tracked files
Package artifacts	`nanoim.package_audit`	wheel/sdist metadata, console script, license, required modules
Hugging Face bundle	`nanoim.hf_export` + `nanoim.hf_validate`	deterministic bundle, card/split/checksum/replay validation with explicit noisy-checkpoint replay thresholds
Reproducibility	`nanoim.repro`	sha256 of every canonical source and evidence artifact
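The calibration figures rest on standard definitions, sketched here under three assumptions. ECE uses 10 equal-width confidence bins. The Brier score is multiclass and summed over classes (some libraries average over classes instead, which changes the scale). NLL is the mean negative log-likelihood. Whether nanoim.calibration uses the same binning and Brier convention is not stated in this README.

```python
import math

def calibration_metrics(probs, targets, n_bins=10):
    """Return (ECE, Brier, NLL) from per-example class-probability rows.

    probs: list of probability lists, one per example; targets: true class indices.
    """
    n = len(targets)
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # [count, sum of confidence, sum of correct]
    brier = nll = 0.0
    for row, y in zip(probs, targets):
        conf = max(row)
        pred = row.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += float(pred == y)
        # Multiclass Brier: squared error against the one-hot target, summed over classes.
        brier += sum((p - (1.0 if k == y else 0.0)) ** 2 for k, p in enumerate(row))
        nll -= math.log(max(row[y], 1e-12))
    # ECE = sum over bins of (bin size / n) * |accuracy - mean confidence|.
    ece = sum(abs(conf_sum - acc_sum) / n for count, conf_sum, acc_sum in bins if count)
    return ece, brier / n, nll / n

ece, brier, nll = calibration_metrics([[0.9, 0.1], [0.2, 0.8]], [0, 1])
print(round(ece, 3), round(brier, 3), round(nll, 3))
```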

What is intentionally not here: there is no consented natural human-event corpus, and no independent external review. Those are real-world steps the repo does not pretend to have completed. Hosted CI runs on every released tag and should be confirmed green before public citation.

Human Event Bridge

The release includes a future-facing protocol for collecting consented human event logs:

uv run python -m nanoim.data.human_protocol \
  --events data/human_events.jsonl \
  --examples data/human_examples.jsonl \
  --out reports/human_event_protocol_audit.json

This protocol validates schema, monotonic timing, consent metadata, redaction status, obvious identifier leakage, and, when candidate examples exist, the same-transcript/different-action alias-pair contract. Passing without a corpus is not real-world validation; it only proves the repo is ready to collect and audit a real corpus without changing the claim boundary.
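A few of those checks can be sketched in plain Python. Three choices here are assumptions for illustration, not the schema enforced by nanoim.data.human_protocol: the required keys, the per-session monotonicity rule (non-decreasing `t_ms`), and the email regex used as an "obvious identifier" probe.

```python
import re

REQUIRED_KEYS = {"session_id", "t_ms", "stream", "value", "consent", "redacted"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # crude "obvious identifier" probe

def audit_events(events):
    """Return a list of human-readable problems; an empty list means the log passed."""
    problems = []
    last_t = {}  # session_id -> last seen t_ms
    for i, ev in enumerate(events):
        missing = REQUIRED_KEYS - set(ev)
        if missing:
            problems.append(f"event {i}: missing {sorted(missing)}")
            continue
        if not ev["consent"]:
            problems.append(f"event {i}: no consent recorded")
        if not ev["redacted"]:
            problems.append(f"event {i}: not redacted")
        if isinstance(ev["value"], str) and EMAIL_RE.search(ev["value"]):
            problems.append(f"event {i}: possible email address in value")
        prev = last_t.get(ev["session_id"])
        if prev is not None and ev["t_ms"] < prev:
            problems.append(f"event {i}: t_ms goes backwards in session {ev['session_id']}")
        last_t[ev["session_id"]] = ev["t_ms"]
    return problems

events = [
    {"session_id": "s1", "t_ms": 0, "stream": "user_audio", "value": "speaking",
     "consent": True, "redacted": True},
    {"session_id": "s1", "t_ms": 200, "stream": "user_audio", "value": "silent",
     "consent": True, "redacted": True},
]
print(audit_events(events))  # []
```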

Hugging Face Release Kit

nanoIM includes local Hub-ready package templates for a dataset repo, a model repo, and a static Space:

uv run python -m nanoim.hf_export --out dist/huggingface --manifest-out reports/huggingface_release_manifest.json
uv run python -m nanoim.hf_validate --bundle dist/huggingface --manifest reports/huggingface_release_manifest.json --out reports/huggingface_offline_validation.json
uv run python -m nanoim.release_bundle --out dist/release/nanoim-0.1.5 --manifest-out reports/release_bundle_manifest.json

dist/huggingface/dataset contains the JSONL data and dataset card. dist/huggingface/model contains checkpoints, configs, scorecards, calibration evidence, and a model card. dist/huggingface/space contains a static proof-gallery front door. The offline validator checks cards, repo IDs, split files, checksums, manifests, alias contracts, model-card coverage for every exported checkpoint, and exported noisy-checkpoint replay before upload. The replay gate is near-perfect rather than exact (TAA >= 0.98, paired separation >= 0.97) and is applied to the committed release checkpoint. Full CI separately retrains models for reproduction coverage, then restores the committed release-packet inputs before building the publication bundle. The export is local-only and does not upload credentials or call the Hugging Face API.

See docs/huggingface_release.md and docs/release_engineering.md for the pre-upload checklist, .hfignore hygiene, SBOM/checksum bundle, and provenance workflow.
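Checksum manifests of the kind described above follow a familiar pattern. The sketch below uses hypothetical function names and glob patterns. It hashes files in chunks with `hashlib.sha256`, keys them by POSIX-style relative path, and reports anything that is missing or has drifted.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root, patterns=("**/*.json", "**/*.jsonl")):
    """Map POSIX-style relative paths to sha256 hex digests."""
    root = Path(root)
    files = sorted({p for pat in patterns for p in root.glob(pat) if p.is_file()})
    return {p.relative_to(root).as_posix(): sha256_file(p) for p in files}

def verify_manifest(root, manifest):
    """List manifest entries that are missing or whose hash has changed."""
    root = Path(root)
    return sorted(
        rel for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256_file(root / rel) != expected
    )

# Usage: hash a throwaway directory, then detect an edited file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "scorecard.json").write_text('{"taa": 1.0}')
    manifest = build_manifest(tmp)
    Path(tmp, "scorecard.json").write_text('{"taa": 0.5}')
    print(verify_manifest(tmp, manifest))  # ['scorecard.json']
```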

Full Reproduction

The full reproduction path is intentionally explicit:

uv run python -m nanoim.data.generate --suite mini --out data/mini.jsonl
uv lock --check
uv build
uv run python -m nanoim.package_audit --out reports/package_audit.json
uv run python -m nanoim.data.generate --suite hard --out data/hard.jsonl
uv run python -m nanoim.data.generate --suite noisy --seed 7 --out data/noisy.jsonl
uv run python -m nanoim.data.pilot --from-events data/pilot_events.jsonl --events data/pilot_events.jsonl --out data/pilot.jsonl
uv run python -m nanoim.train --config configs/tiny.yaml
uv run python -m nanoim.eval --checkpoint runs/tiny/best.pt --suite mini --out reports/scorecard.json
uv run python -m nanoim.eval --baseline transcript --suite mini --out reports/transcript_scorecard.json
uv run python -m nanoim.eval --baseline rule --suite mini --out reports/rule_scorecard.json
uv run python -m nanoim.train --config configs/hard.yaml
uv run python -m nanoim.eval --checkpoint runs/hard/full/seed_7/best.pt --suite hard --data data/hard.jsonl --out reports/hard_scorecard.json
uv run python -m nanoim.eval --baseline transcript --suite hard --data data/hard.jsonl --out reports/hard_transcript_scorecard.json
uv run python -m nanoim.eval --baseline transcript_nb --suite hard --data data/hard.jsonl --out reports/hard_transcript_nb_scorecard.json
uv run python -m nanoim.eval --baseline rule --suite hard --data data/hard.jsonl --out reports/hard_rule_scorecard.json
uv run python -m nanoim.experiments.sweep --suite hard --seeds 3,7,11 --ablations 'full;small_model;no_audio;no_visual;no_background;no_policy;no_timing' --epochs 45 --data-seed 7 --out reports/hard_sweep.json
uv run python -m nanoim.train --config configs/transformer_hard.yaml
uv run python -m nanoim.eval --checkpoint runs/transformer/hard/seed_7/best.pt --suite hard --data data/hard.jsonl --out reports/transformer_hard_scorecard.json
uv run python -m nanoim.controls --data data/hard.jsonl --checkpoint runs/transformer/hard/seed_7/best.pt --suite hard --out reports/transformer_hard_controls.json
uv run python -m nanoim.train --config configs/noisy.yaml
uv run python -m nanoim.eval --checkpoint runs/noisy/full/seed_7/best.pt --suite noisy --data data/noisy.jsonl --out reports/noisy_scorecard.json
uv run python -m nanoim.calibration --checkpoint runs/noisy/full/seed_7/best.pt --suite noisy --data data/noisy.jsonl --out reports/calibration_audit.json
uv run python -m nanoim.eval --baseline transcript --suite noisy --data data/noisy.jsonl --out reports/noisy_transcript_scorecard.json
uv run python -m nanoim.eval --baseline transcript_nb --suite noisy --data data/noisy.jsonl --out reports/noisy_transcript_nb_scorecard.json
uv run python -m nanoim.eval --baseline rule --suite noisy --data data/noisy.jsonl --out reports/noisy_rule_scorecard.json
uv run python -m nanoim.eval --baseline field_lookup --suite hard --data data/hard.jsonl --out reports/hard_field_lookup_scorecard.json
uv run python -m nanoim.eval --baseline field_lookup --suite noisy --data data/noisy.jsonl --out reports/noisy_field_lookup_scorecard.json
uv run python -m nanoim.eval --checkpoint runs/hard/full/seed_7/best.pt --suite pilot --data data/pilot.jsonl --eval-split all --out reports/pilot_scorecard.json
uv run python -m nanoim.eval --baseline rule --suite pilot --data data/pilot.jsonl --eval-split all --out reports/pilot_rule_scorecard.json
uv run python -m nanoim.experiments.sweep --suite noisy --seeds 3,7,11 --ablations 'full;no_timing;no_audio;no_visual;no_background;no_policy' --epochs 120 --data-seed 7 --out reports/noisy_sweep.json
uv run python -m nanoim.controls --data data/noisy.jsonl --checkpoint runs/noisy/full/seed_7/best.pt --suite noisy --out reports/adversarial_controls.json
uv run python -m nanoim.viz.render --run reports/example_trace.jsonl --out reports/timeline.html
uv run python -m nanoim.viz.render --run reports/example_trace.jsonl --out reports/timeline_gallery.html
uv run python -m nanoim.verify_release --out reports/release_verification.json
uv run python -m nanoim.data.human_protocol --events data/human_events.jsonl --examples data/human_examples.jsonl --out reports/human_event_protocol_audit.json
uv run python -m nanoim.hf_export --out dist/huggingface --manifest-out reports/huggingface_release_manifest.json
uv run python -m nanoim.hf_validate --bundle dist/huggingface --manifest reports/huggingface_release_manifest.json --out reports/huggingface_offline_validation.json
uv run python -m nanoim.security_audit --out reports/security_audit.json
uv run python -m nanoim.repro --out reports/reproducibility.json
uv run python -m nanoim.release_bundle --out dist/release/nanoim-0.1.5 --manifest-out reports/release_bundle_manifest.json

Repository Map

Path	Role
`nanoim/data/generate.py`	deterministic symbolic suite generator
`nanoim/data/pilot.py`	raw event-log to micro-turn importer
`nanoim/data/human_protocol.py`	future consented human-event protocol and alias-contract validator
`nanoim/features.py`	transcript and micro-turn feature boundaries
`nanoim/model.py`	tiny GRU and Transformer models
`nanoim/train.py`	from-scratch training loop
`nanoim/eval.py`	scorecard evaluator
`nanoim/calibration.py`	checkpoint confidence-calibration audit
`nanoim/controls.py`	destructive controls
`nanoim/viz/render.py`	timeline HTML renderer
`nanoim/baselines.py`	transcript majority/NB, rule harness, field-lookup table
`nanoim/verify_release.py`	asserts the scientific result contract
`nanoim/security_audit.py`	local secret/path/size hygiene scan
`nanoim/hf_export.py`	Hugging Face dataset/model/Space export
`nanoim/hf_validate.py`	offline Hugging Face bundle validator
`nanoim/package_audit.py`	wheel/sdist artifact auditor
`nanoim/release_bundle.py`	checksummed release bundle and SBOM
`nanoim/repro.py`	sha256 manifest of canonical artifacts
`SOURCE_MAP.md`	reviewer map from claims to files
`CLAIM_LEDGER.md`	public claims, evidence, non-claims

Scope Boundaries

The substrate is symbolic, not real audio/video.
The pilot importer uses scripted local events with measured timing; it proves an ingestion path, not natural audio/video competence.
The result shows representation value in a controlled lab, not broad conversational intelligence.
The hard suite primarily tests state-stream aliasing; the noisy suite adds counterbalanced timing-only cases and is the timing-ablation evidence.
The human-event protocol is ready, but no consented natural human-event corpus is included.
Public claims should say "controlled symbolic evidence for a representation bottleneck," not "solved interaction modeling."
The core release path is offline: no API keys, no hosted-model dependency. The only module that makes outbound HTTP calls is the optional nanoim.experiments.llm_baseline, which talks to a user-supplied OpenAI-compatible endpoint (Ollama at 127.0.0.1:11434 or LM Studio at 127.0.0.1:1234 by default) and is not invoked by any release gate.

Project files

RESULTS.md - one-page summary of every headline number with its source artifact and reproduction command.
FAQ.md - the questions a curious reader actually asks.
CHANGELOG.md - what has shipped in each release.
CONTRIBUTING.md - how to push back on the central claim or extend the evidence.
SECURITY.md - threat model and reporting process.
CODE_OF_CONDUCT.md - Contributor Covenant v2.1.
.github/ISSUE_TEMPLATE/ - bug-report and claim-challenge templates.

Citation

@software{lovell_nanoim,
  title     = {nanoIM: A Tiny Interaction-Model Lab for Temporal Aliasing},
  author    = {Jason Lovell},
  year      = {2026},
  version   = {0.1.6},
  doi       = {10.5281/zenodo.20492362},
  publisher = {Zenodo},
  license   = {MIT},
  url       = {https://github.com/jlov7/nanoIM}
}

See CITATION.cff for machine-readable citation metadata. Zenodo record: https://doi.org/10.5281/zenodo.20492362.

Launch Line

That is the whole artifact: a small, inspectable proof that interaction modeling needs representations that preserve time.

Personal Work Disclaimer

This repository is personal research and engineering work by Jason Lovell. It is not affiliated with, endorsed by, sponsored by, or representative of any current or former employer or organization. All views, code, claims, mistakes, and limitations are the author's own.
