Wave 20: Model Composition + Serving Infrastructure
| Module | Operation | Latency (µs) | Notes |
|---|---|---|---|
| ModelMerger | slerp() 256×256 t=0.5 | 92.1 | Great-circle interpolation |
| ModelMerger | merge() SLERP 2 keys | 197.8 | Dict-level merge orchestration |
| LoRAComposer | forward() 3 adapters weighted d=256 | 23.62 | Weighted delta sum |
| LoRAComposer | forward() 3 adapters equal-weight | 23.10 | Default 1/N weights |
| CBScheduler | step_batch() 8 running | 0.43 | Batch promotion check |
| CBScheduler | submit() enqueue one request | 1.08 | Waiting queue insert |
| MatryoshkaEmb | embed() full=512 → 64 | 3.11 | Truncate + L2-norm |
| MatryoshkaEmb | batch_embed() batch=32 → 256 | 18.64 | Batched truncation |
| ANEProfiler | record_op() matmul float16 | 3.13 | Heuristic ANE classify |
| ANEProfiler | summary() over ~40 ops | 255.69 | Aggregate metrics |
| SpecBenchRunner | run_task() 2 prompts gamma=4 | 1.5 | Draft+verify loop |
| PPLTracker | record() seq=16 vocab=1k | 117.7 | NLL + PPL update |
| PPLTracker | rolling_ppl property | 5.90 | Geometric mean |
| GrammarCache | get_mask() cold (compute) | 0.41 | First-time mask build |
| GrammarCache | get_mask() warm (cache hit) | 0.22 | O(1) dict lookup |
| GrammarCache | transition() state → next | 0.56 | FSM edge traversal |
| QuantAwareCal | record() percentile (16, 32) | 5.80 | Stat accumulation |
| QuantAwareCal | compute_scales() 32 channels | 4293.88 | Percentile scale search |
| AdaptiveBudget | step() latency=180ms (over SLO) | 1.84 | PI controller update |
| AdaptiveBudget | step() latency=100ms (under SLO) | 1.75 | Budget relaxation |
| VisionTokenComp | compress() attention n=50 d=768 | 11.3 | Attention-weight pruning |
| VisionTokenComp | compress() clustering n=20 d=768 | 1059.3 | k-means centroid select |
| ToolSchemaCache | register() 1 schema (idempotent) | 1.27 | Hash lookup / no-op |
| ToolSchemaCache | get() by name (cache hit) | 0.17 | O(1) dict lookup |
| ToolSchemaCache | ToolRouter.route() validate+call | 0.56 | Validation + dispatch |
| DistilSpecCal | record_step() vocab=1k (1-D) | 33.8 | KL grad accumulation |
| DistilSpecCal | record_step() seq=8 vocab=1k (2-D) | 126.9 | Sequence-level distil |
| DistilSpecCal | compute_delta() | 2.52 | Mean gradient output |
| BatchEmbedder | pool() mean b=8 seq=32 d=256 | 36.18 | Masked mean pooling |
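
The table does not say how the per-call latencies were collected, and the ModelMerger source is not shown here. As a rough illustration only, the sketch below pairs a pure-Python great-circle interpolation (the technique named in the first row) with a simple mean-latency timer. The names `slerp` and `mean_latency_us`, the `eps` threshold, and the linear-interpolation fallback are all assumptions made for this example, not the project's API.

```python
import math
import time


def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    Hypothetical helper for illustration; not the ModelMerger.slerp() source.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a < eps or norm_b < eps:
        # Degenerate input (zero vector): fall back to linear interpolation.
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    # Cosine of the angle between the two directions, clamped for acos.
    dot = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    dot = max(-1.0, min(1.0, dot))
    omega = math.acos(dot)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly parallel or anti-parallel: slerp weights are ill-conditioned.
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    wa = math.sin((1.0 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(a, b)]


def mean_latency_us(fn, repeats=1000):
    """Mean wall-clock time per call of fn, in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats * 1e6


if __name__ == "__main__":
    a = [1.0, 0.0]
    b = [0.0, 1.0]
    print(slerp(a, b, 0.5))  # halfway along the arc between a and b
    print(f"{mean_latency_us(lambda: slerp(a, b, 0.5)):.2f} us per call")
```

Timings from this sketch are not comparable to the figures above: it runs on plain Python lists, not on 256×256 weight matrices, and results depend on the machine it runs on.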
Reference: Paper-Reported Technique Improvements
Note: These are technique-level estimates derived from published papers.
End-to-end validation on Squish with a loaded model on Apple Silicon
has not yet been run for this wave.
See dev/benchmarks/bench_eoe.py for the real-hardware benchmark harness.