Threvo

The Problem

The cost of training AI
is exploding.
The waste inside every run
is unsustainable.

$500M

per frontier run · 2025

Source: Epoch AI, 2025

$350B

AI compute spend · 2025

Source: IDC, 2025

3×

electricity demand · 2030

Source: IEA, 2026

Frontier model training crossed $500M per run in 2025. By 2027, the projected range is $1B to $3B per run. The largest technology companies spent $350 billion on AI compute infrastructure in 2025 alone.

AI data center electricity consumption surged 50% in 2025 and is projected to triple by 2030, approaching the entire electricity consumption of Japan.

Inside every run, from $1M fine-tunes to $500M pretraining jobs, a significant portion of compute is spent on operations that contribute zero to model intelligence and burn physical cluster power unnecessarily. A large and growing fraction of FFN (feed-forward network) neurons become computationally inactive as models train, yet every optimizer continues computing them. The waste is baked into every training pipeline on earth. With every optimizer. At every scale.

The field has known.
Nobody fixed it during training.

Until now.

The Opportunity

The bigger the model,
the bigger the waste.
The bigger the waste,
the bigger the saving.

7B run · $50K – $500K

70B run · $1.2M – $6M

Frontier 2026 · $200M – $500M

Frontier 2027 · $1B – $3B

Sources: Epoch AI, 2025 · Galileo AI, 2025

Redundant compute is not a fixed cost. It compounds with model size, with cluster scale, with every generation of infrastructure the industry deploys.

The organizations operating at the frontier of AI are not facing an optimization problem. They are facing a structural one. Every training run, regardless of architecture, dataset, or hardware, carries the same inefficiency at its foundation.

At 7B scale, early benchmark data suggests SRED reduces dead-zone compute by [X]%, translating to [$X] in direct energy savings per run. At frontier scale, that number compounds by orders of magnitude.

Full results publishing Q3 2026
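Until the full results are published, the scaling argument above can be sketched as back-of-envelope arithmetic. The sketch below is an illustration, not a measurement: the function names are hypothetical, it assumes run cost is proportional to FLOPs, it counts only weight-matrix FLOPs in a standard two-matrix FFN, and the inactive fraction passed in is a placeholder input, not a SRED benchmark result.

```python
def ffn_flop_share(d_model: int, d_ff: int) -> float:
    """Share of per-layer weight-matmul FLOPs spent in the FFN.

    Counts 4 * d_model^2 for attention (Q, K, V, and output projections)
    and 2 * d_model * d_ff for a two-matrix FFN. Attention-score FLOPs,
    embeddings, and gated (three-matrix) FFNs are ignored.
    """
    attn = 4 * d_model * d_model
    ffn = 2 * d_model * d_ff
    return ffn / (attn + ffn)


def skippable_cost(run_cost_usd: float, ffn_share: float,
                   inactive_fraction: float) -> float:
    """Upper bound on run cost attributable to inactive FFN neurons."""
    if not (0.0 <= ffn_share <= 1.0 and 0.0 <= inactive_fraction <= 1.0):
        raise ValueError("shares must lie in [0, 1]")
    return run_cost_usd * ffn_share * inactive_fraction


# With d_ff = 4 * d_model, the FFN holds two thirds of the matmul FLOPs.
share = ffn_flop_share(d_model=4096, d_ff=4 * 4096)

# Hypothetical input: 10% inactive neurons on a $500K run (placeholder only).
bound = skippable_cost(500_000, share, inactive_fraction=0.10)
print(f"FFN share: {share:.3f}, upper bound on skippable cost: ${bound:,.0f}")
```

Because the bound is a product of the run cost and two fractions, it grows linearly with run cost for a fixed inactive fraction, and faster if the inactive fraction itself rises with scale.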

How It Works

The optimizer built for how models actually train.

AdamW

Standard optimizers compute everything.

Every neuron. Every layer. Every forward pass. As training progresses, a growing fraction of FFN neurons become computationally inactive, reaching the majority of FFN capacity at scale. AdamW computes them every step regardless.

Source: Voita et al., "Neurons in Large Language Models: Dead, N-gram, Positional" — ACL 2024
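What "computationally inactive" means can be made concrete with a small, framework-free sketch. It assumes a ReLU feed-forward layer, where a neuron that outputs zero on every input in a sample contributes nothing downstream. The function names and toy weights are hypothetical, and gated activations such as SwiGLU would need a magnitude threshold instead of an exact-zero test.

```python
from typing import List


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def ffn_hidden(x: List[float], w_in: List[List[float]],
               b_in: List[float]) -> List[float]:
    """Hidden activations of one FFN layer: relu(W_in @ x + b_in)."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w_in, b_in)]


def inactive_fraction(batch: List[List[float]], w_in: List[List[float]],
                      b_in: List[float]) -> float:
    """Fraction of hidden neurons that output zero on every input in the batch."""
    ever_active = [False] * len(w_in)
    for x in batch:
        for j, h in enumerate(ffn_hidden(x, w_in, b_in)):
            if h > 0.0:
                ever_active[j] = True
    return ever_active.count(False) / len(ever_active)


# Toy layer: 2 inputs, 4 hidden neurons. Neurons 2 and 3 never fire here.
w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
b_in = [0.0, 0.0, -1.0, -10.0]
batch = [[1.0, 2.0], [0.5, 0.0], [3.0, 1.0]]

frac = inactive_fraction(batch, w_in, b_in)
print(f"inactive fraction: {frac:.2f}")
```

In a real run the same count would be taken over many batches of training data rather than three toy inputs, since a neuron that is silent on a small sample may still fire elsewhere.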

SRED

Detects computationally inactive FFN neurons. Skips them. Every step.

Each training step, SRED identifies inactive FFN neurons and skips them. The rest of the model trains unaffected.

Inactive neuron count grows throughout training as the model specializes, and compounds further with scale. SRED's energy savings grow with both.

Optimizer level. No pipeline changes. No quality loss.
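The general idea of skipping work at the optimizer level can be illustrated with a masked AdamW step. This is a minimal sketch and not SRED's algorithm, which is not described here: the function name, the per-neuron `active` mask, and the row-per-neuron layout are all assumptions, and the sketch skips only the parameter update for flagged rows, not their forward or backward computation.

```python
import math
from typing import List


def masked_adamw_step(params: List[List[float]], grads: List[List[float]],
                      m: List[List[float]], v: List[List[float]],
                      active: List[bool], step: int,
                      lr: float = 3e-4, beta1: float = 0.9,
                      beta2: float = 0.999, eps: float = 1e-8,
                      weight_decay: float = 0.1) -> int:
    """One in-place AdamW step over per-neuron weight rows.

    Rows whose `active` flag is False are skipped entirely (hypothetical
    masking rule). Returns the number of skipped rows.
    """
    bc1 = 1.0 - beta1 ** step  # bias correction, first moment
    bc2 = 1.0 - beta2 ** step  # bias correction, second moment
    skipped = 0
    for j, is_active in enumerate(active):
        if not is_active:
            skipped += 1
            continue
        for k in range(len(params[j])):
            g = grads[j][k]
            m[j][k] = beta1 * m[j][k] + (1.0 - beta1) * g
            v[j][k] = beta2 * v[j][k] + (1.0 - beta2) * g * g
            m_hat = m[j][k] / bc1
            v_hat = v[j][k] / bc2
            params[j][k] *= 1.0 - lr * weight_decay  # decoupled weight decay
            params[j][k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return skipped


# Toy example: two neurons, the second flagged inactive.
params = [[1.0, -1.0], [0.5, 0.5]]
grads = [[0.1, -0.2], [0.0, 0.0]]
m = [[0.0, 0.0], [0.0, 0.0]]
v = [[0.0, 0.0], [0.0, 0.0]]
skipped = masked_adamw_step(params, grads, m, v, active=[True, False], step=1)
print(skipped, params)
```

One design consequence worth noting: plain AdamW would still apply weight decay to a row with zero gradient, so skipping a row outright is not numerically identical to AdamW on that row. Any real implementation has to decide how to handle that difference.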

LLaMA Mistral GPT Falcon T5 Qwen

Works across every major transformer architecture.

Integration

Drop in. Nothing else changes.

SRED is a drop-in replacement for AdamW. Your training loop, your model, your pipeline — untouched.

train.py

  # Setup - unchanged
  model = FSDP(LlamaForCausalLM(config), sharding_strategy=ShardingStrategy.FULL_SHARD)

- optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
+ optimizer = SREDOptimizer(model.parameters(), lr=3e-4, weight_decay=0.1, model=model)  # ← only change

  scheduler = get_cosine_schedule_with_warmup(
      optimizer,
      num_warmup_steps=int(0.01 * total_steps),
      num_training_steps=total_steps,
  )

  # Training loop - unchanged
  for step, batch in enumerate(train_loader):
      with torch.autocast("cuda", dtype=torch.bfloat16):
          loss = model(**batch).loss / grad_accum_steps
      loss.backward()  # backward runs outside the autocast context
      if (step + 1) % grad_accum_steps == 0:
          model.clip_grad_norm_(1.0)  # FSDP-aware clipping: gradients are sharded across ranks
          optimizer.step()
          scheduler.step()
          optimizer.zero_grad()

Training loop unchanged · No model modifications · No hyperparameter changes · FSDP compatible

If it runs on AdamW today, it runs on SRED tomorrow.

Early Access

Start eliminating wasted compute from day one.

For ML teams running serious training workloads. Limited early access open now.

We review every request personally. Compute grant applications welcome.

About

Dev Shorya

Founder & Director, Threvo Labs

Dev Shorya builds the algebraic-efficiency layer for AI training and inference. At Threvo Labs he leads development of SRED, a training optimizer that eliminates dead-zone computation in feed-forward layers, and DeltaCert, an open-source certification framework for LLM serving changes.

His work is grounded in research on operator algebras. His 2026 paper, "A Commutator Distance on Nets of von Neumann Subalgebras," develops the commutator-distance framework whose theorems underpin both systems.

SRED Training optimizer · threvo.co

DeltaCert Open-source · PyPI

Research · 2026 DOI: 10.5281/zenodo.21279097

Threvo Labs Private Limited CIN: U26103UP2025PTC231403

dev@threvo.co · ORCID · LinkedIn