Date: April 3-4, 2026 Author: edvatar (toroleapinc) Repo: https://github.com/toroleapinc/encephagen AI-assisted: Code development assisted by Claude (Anthropic). All experimental design, analysis, and interpretation by the author. All code reviewed, validated, and tested.
Day 3 answered the central question: does human brain structure help learning?
- Deep research into C. elegans (OpenWorm) and Drosophila (FlyWire/Eon) brain simulations
- Implemented e-prop learning rule (Bellec et al. 2020) — replacing the crude Hebbian outer product
- Experiment 22: Connectome vs random with e-prop — structure helps conditioning (p=0.011)
- Experiment 23: Analysis of WHY — discovered channeling vs distributing trade-off
Key finding: The human connectome is a channeling architecture. It concentrates signals through specific pathways (VIS→AMYG, VIS→PFC), making survival-critical computations more efficient. Random wiring distributes signals more uniformly. Neither is universally better — each has advantages for different tasks.
Before building, conducted exhaustive research into the two most successful connectome simulation projects.
- Timeline: 2011-present (13+ years)
- Data: 302 neurons, 7,000 synapses (electron microscopy)
- Key finding: Raw connectome weights do NOT produce locomotion. Required proprioceptive feedback loops and parameter tuning.
- What failed: Direct weight transcription, uniform neuron models, spiking-only approaches
- What worked: Muscle-driven body simulation (Sibernetic), proprioceptive feedback chain, careful E/I balance tuning
- Timeline: 2014-2024
- Data: 139,000 neurons, 54M synapses at 8nm resolution (EM)
- Key finding: Visual motion detection emerged from connectivity alone (LIF model, one free parameter: global gain)
- Embodiment: Pre-trained walking controllers, brain provides high-level commands via descending neurons
- Critical gap: No learning — all weights fixed from connectome
- Our data is 6 orders of magnitude coarser (96 macro-regions vs 139K identified neurons)
- We're working with a "roadmap" not a "circuit diagram"
- Both projects confirm: topology constrains but does not determine behavior
- Neither project attempted learning — we are the first to test whether macro-scale topology provides a learning advantage
Experiments 15-21 used "three-factor STDP" which was actually just Hebbian outer product:
# OLD (Exp 21): Not real STDP — no spike timing, no eligibility traces
cs_active = (cs_activity > 2).float()
us_active = (us_activity > 2).float()
dW = learning_rate * torch.outer(cs_active, us_active)Expert reviewers correctly flagged this: "This is Hebbian weight injection, not STDP. No temporal credit assignment."
Implemented Bellec et al. (2020) "A solution to the learning dilemma for recurrent networks of spiking neurons" (Nature Communications):
Three components:
- Eligibility trace per synapse:
e_ji = ψ_j × z̄_iwhere ψ is the surrogate gradient at the postsynaptic neuron and z̄ is the filtered presynaptic spike train - Surrogate gradient: Piecewise linear approximation of the Heaviside derivative:
ψ = γ × max(0, 1 - |v - v_thr| / v_thr) - Reward-modulated update:
ΔW = lr × reward_signal × e_snapshot
Key design: eligibility snapshot. During CS presentation, eligibility traces accumulate — tracking which synapses causally contributed to the response. At CS offset, we snapshot these traces. When reward arrives (US phase), it modulates the SNAPSHOT, not current eligibility. This implements temporal credit assignment: the reward "reaches back" to strengthen synapses that were active during the stimulus.
Problem 1: Exploding eligibility traces
- First run: e_max = 1,367, e_mean = 121
- Trial 1 modified ALL 1.1M synapses, destroyed the network
- Root cause:
z_bar = alpha * z_bar + zaccumulates without bound when alpha ≈ 0.995 (dt=0.1ms, tau_m=20ms) - Fix: Normalize with
(1-alpha)factor:z_bar = alpha * z_bar + (1-alpha) * z - After fix: e_max = 0.018, e_mean = 0.0017 ✓
Problem 2: Learning signal collapse
- After 5 trials, weight changes dropped to zero
- Root cause: reward baseline tracked reward too quickly (decay=0.95), so
reward - baseline ≈ 0 - Fix: Slower baseline (decay=0.99) + apply reward ONCE per trial using snapshot
Problem 3: Reward timing
- Initially applied reward at each step during US phase
- But by then, eligibility traces had been overwritten by US-phase activity
- Fix: Snapshot eligibility at CS offset, apply reward to snapshot when US arrives
Final working parameters:
- lr=0.1, tau_e=50ms, gamma=0.5, reward_decay=0.99
- ~1.1M synapses modified per trial, dW_max ≈ 0.0013 per trial
- Post-training CS response: +28.5% increase
Protocol: Same as Experiment 21, but with e-prop instead of Hebbian.
- 10 connectome brains vs 10 degree-preserving random brains
- Phase 1: Innate regional differentiation
- Phase 2: Classical conditioning (30 CS-US pairings with e-prop)
- Phase 3: Pattern discrimination (5 classes, after e-prop training)
- Phase 4: Working memory (PFC persistence)
Results:
| Metric | Connectome | Random | p-value | Winner |
|---|---|---|---|---|
| Regional differentiation (CV) | 0.745 | 0.692 | 0.0002* | Connectome |
| Conditioning speed (slope) | -0.000 | -0.000 | 0.473 | — |
| Conditioning strength (final) | 0.00018 | 0.00008 | 0.011* | Connectome |
| Pattern discrimination | 38% | 45% | 0.052 | — |
| Working memory | 129% | 131% | 0.427 | — |
Key finding: E-prop revealed a conditioning advantage (p=0.011, d=1.5) that Hebbian learning couldn't detect. The connectome's VIS→AMYG pathway enables more efficient association learning. This was invisible with the crude Hebbian rule in Exp 21.
Comparison with Experiment 21:
- Exp 21 (Hebbian): 1/5 significant (only regional differentiation)
- Exp 22 (E-prop): 2/5 significant (differentiation + conditioning strength)
- The learning rule matters — a better learning algorithm extracts more from the topology
Exp 22 showed random brains trending better at pattern discrimination (p=0.052). Investigated why.
Metrics measured (8 connectome + 8 random brains):
| Metric | Connectome | Random | p-value | Winner |
|---|---|---|---|---|
| Mean cosine distance | 0.0004 | 0.0004 | 0.083 | — |
| Trial consistency | 0.9995 | 0.9994 | 0.003 | Connectome |
| Fisher discriminability | 0.355 | 0.345 | 0.382 | — |
| Effective dimensionality | 1.75 | 1.74 | 0.879 | — |
| Activation entropy | 4.306 | 4.332 | 0.0002 | Random |
Key finding: The discrimination difference was likely noise. Response diversity (cosine distance, Fisher ratio, dimensionality) is NOT significantly different. What IS different:
- Connectome: higher trial consistency (p=0.003, d=2.2) — more reliable signal
- Random: higher activation entropy (p=0.0002, d=-9.0) — more evenly distributed activity
Interpretation: The connectome concentrates activity in specific regions through structured pathways. Random wiring distributes it more uniformly. Neither strategy is better for discrimination — the differences cancel out.
Experiments 21-23 together reveal a fundamental trade-off:
| Property | Connectome | Random |
|---|---|---|
| Regional organization | High (p=0.0002) | Low |
| Conditioning efficiency | High (p=0.011) | Low |
| Trial-to-trial reliability | High (p=0.003) | Low |
| Activity distribution | Concentrated | Uniform (p=0.0002) |
| Pattern discrimination | ~38% | ~45% (ns) |
| Working memory | ~129% | ~131% (ns) |
The connectome is an optimization, not a universal advantage. It channels signals through specific pathways that evolution selected for survival-critical computations (fear conditioning via VIS→AMYG). Random wiring distributes signals uniformly — no better, no worse for general tasks, but missing the specialized routing.
This parallels the C. elegans/Drosophila lesson: structure constrains but does not determine. The macro-scale connectome is a roadmap that makes certain computations more efficient, not a blueprint that makes all computations possible.
EpropLearner: GPU-native e-prop with eligibility traces (~1.3M traces, ~5MB VRAM)EpropParams: Learning rate, surrogate gradient dampening, eligibility filter tau, reward decaysnapshot_eligibility(): Capture CS-phase eligibility for delayed rewardapply_reward(): Reward-modulated weight update using snapshotapply_supervised(): Supervised variant with random feedback alignment (not yet used)
enable_learning(params): Initialize e-prop learner on the brain's sparse weightsapply_reward(spikes, reward): Apply reward and rebuild sparse weight matrix- Eligibility trace update happens inside
step()at pre-spike voltage (before reset)
- Conditioning slope is not significant — both brains learn at similar speeds, the connectome just reaches a slightly higher final response
- Pattern discrimination shows no connectome advantage — and random actually trends better
- Working memory is topology-independent — NMDA dynamics dominate regardless of wiring
- Macro-scale data (96 regions) — 6 orders of magnitude coarser than FlyWire. Many structural advantages may only appear at cellular resolution.
- Small neurons/region (200) — limited representational capacity per region
- Degree-preserving rewiring — preserves degree distribution, so any advantage from degree heterogeneity is invisible. A more aggressive null model (Erdős-Rényi) might show larger differences.
- E-prop reward signal is global — real dopamine has region-specific effects. A more biologically detailed reward system might reveal larger topology-dependent effects.
- No conduction delays — the connectome has distance-dependent delays (up to 20ms) that we're not modeling. Delays create temporal coding opportunities that topology could exploit.
- "The conditioning result (p=0.011) is interesting but the effect size is small in absolute terms (0.00018 vs 0.00008)"
- "Only 2/5 metrics are significant — this is partial evidence at best"
- "Need to test with multiple null models, not just degree-preserving rewiring"
- "Conduction delays should be added — temporal coding may be where structure really matters"
- 先天 (innate structure): Provides channeling — efficient routing for specific computations
- 后天 (learned calibration): E-prop fills in the synaptic weights that macro-scale dMRI can't resolve
- The interaction: Structure × learning > either alone. The conditioning advantage only appears with a proper learning rule (e-prop), not with crude Hebbian injection.
| Project | Connectome resolution | Learning | Structure advantage |
|---|---|---|---|
| OpenWorm | Synaptic (302 neurons) | None | Locomotion emerges with tuning |
| FlyWire/Eon | Synaptic (139K neurons) | None | Visual motion detection emerges |
| Encephagen | Macro (96 regions) | E-prop | Conditioning advantage (p=0.011) |
We are the first to demonstrate a learning advantage from macro-scale connectome topology.
- Add conduction delays (distance-dependent, from tract lengths in TVB96) — temporal coding may be where topology provides the largest advantage
- Test with Erdős-Rényi null model — more aggressive control
- Scale to 500+ neurons/region for richer within-region dynamics
- Implement attractor dynamics in PFC (true working memory, not just NMDA decay)
- Sequential learning tasks (where temporal credit assignment matters most)
- Multi-sensory integration (auditory + visual → associative learning)
Can a macro-scale connectome provide enough structural advantage to justify building brain-structured AI, vs just using random topology? Current evidence: partial. The conditioning advantage is real but small. Need more tasks and richer dynamics to fully answer this.
| File | Change |
|---|---|
src/encephagen/learning/eprop.py |
NEW — E-prop learning rule |
src/encephagen/learning/__init__.py |
Export EpropLearner, EpropParams |
src/encephagen/gpu/spiking_brain_gpu.py |
Added enable_learning(), apply_reward() |
experiments/22_eprop_connectome_vs_random.py |
NEW — Definitive experiment with e-prop |
experiments/23_discrimination_analysis.py |
NEW — Response diversity analysis |
README.md |
Updated experiments table, learning description |
results/exp22_eprop_connectome_vs_random/results.json |
Raw data |
results/exp23_discrimination_analysis/results.json |
Raw data |
Total runtime: ~3.5 hours (Exp 22: ~2h, Exp 23: ~1h, debugging: ~0.5h) GPU: RTX 5070 (12GB VRAM), peak VRAM usage: ~2GB