v1.0.2
Patent-Pending Framework
Research

Every token has a price.
Most of them aren't worth it.

The Agent Efficiency Limit is a formal framework — not a heuristic. This page covers the core formula, provider comparisons, empirical data, and the honest limits of the AEL method.

⚠️ Read this first. Savings figures on this page come from parametric simulations under idealized conditions: linear context growth, perfect δ classification, constant output tokens, no cache TTL events. Real deployments typically see 2–4× improvement. The extreme multipliers represent a theoretical ceiling reached only in lengthy sessions and are not a guaranteed outcome for any session or agent. We're transparent about this because the math is strong enough to stand without exaggeration.

The spawn condition, derived from first principles.

AEL defines a precise, provable threshold for when spawning a fresh agent session costs less than continuing the current one.

Spawn Condition
// Spawn when savings exceed spawn cost
Spawn when: V_shed > S_real

// V_shed: cost of carrying irrelevant context
V_shed = δ=0_tokens × k_cached

// S_real: true spawn overhead
S_real = S_fixed + S_variable
S_fixed = startup_tokens × k_write
S_variable = c_lean_tokens × k_uncached

// Breakeven: carry if referenced >1/10 turns
ratio = k_cached / k_uncached = 0.10

// N_breakeven: turns until spawn pays off
N_breakeven = S_real / V_shed

δ (delta) is the relevance flag for each context component. δ=1 means the block is still needed. δ=0 means it's dead weight — paid for every turn, contributing nothing.

V_shed is the absolute minimum you save per turn by spawning right now. As irrelevant content accumulates, V_shed grows, and eventually it crosses whatever threshold you've set as the cost you're willing to keep paying. Every tool call and API call in a turn re-reads that carried context, so a busy turn compounds the cost of irrelevant context fast.

S_real is what a spawn actually costs — writing the startup files to cache, plus the successor reading the handoff document cold. It's small, but it's real, and it must be overcome before spawning is worth it.
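The spawn condition above can be sketched as a small helper. This is a minimal sketch: the rates, the $3.75/1M cache-write price, and the token counts in the example are illustrative assumptions, not measured values.

```python
def should_spawn(dead_tokens: int, k_cached: float, k_uncached: float,
                 k_write: float, startup_tokens: int, c_lean_tokens: int):
    """Evaluate the AEL spawn condition. All k_* rates are $/token.

    dead_tokens:    δ=0 context tokens carried each turn
    startup_tokens: tokens written to cache for the successor
    c_lean_tokens:  handoff document the successor reads uncached
    """
    v_shed = dead_tokens * k_cached           # per-turn cost of dead weight
    s_fixed = startup_tokens * k_write        # cache-write part of spawn cost
    s_variable = c_lean_tokens * k_uncached   # cold read of the handoff
    s_real = s_fixed + s_variable
    n_breakeven = s_real / v_shed if v_shed else float("inf")
    return v_shed > s_real, v_shed, s_real, n_breakeven

# Illustrative numbers (not real measurements): Sonnet-like read rates, an
# assumed $3.75/1M write rate, 50k dead tokens, 20k startup, 5k handoff.
spawn, v_shed, s_real, n = should_spawn(
    dead_tokens=50_000, k_cached=0.30e-6, k_uncached=3.00e-6,
    k_write=3.75e-6, startup_tokens=20_000, c_lean_tokens=5_000)
```

Here V_shed is $0.015/turn against an S_real of $0.09, so the condition doesn't fire yet; N_breakeven says a spawn would pay for itself after six turns of shedding.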

The spawn threshold is yours to set. Depending on the work being done and how critical continuity is at that moment, users dial the threshold up or down: tighter for long autonomous runs, looser for short interactive tasks where a reset costs nothing. This is how you calibrate your agents to their most efficient limit.

The breakeven ratio across flagship models.

Most major providers price cached reads at 10% of uncached reads. Budget and fast-tier models may differ — but the spawn math still applies.

| Provider | k_cached | k_uncached | Breakeven Ratio | S_real |
|---|---|---|---|---|
| Grok 4 (xAI) | $0.20/1M | $2.00/1M | 0.10 | ~$0.059 |
| Anthropic Claude Sonnet 4.6 | $0.30/1M | $3.00/1M | 0.10 | ~$0.088 |
| OpenAI GPT-5.4 | $0.25/1M | $2.50/1M | 0.10 | ~$0.004 (no write cost) |
| Google Gemini 2.5 Pro * | $0.125/1M | $1.25/1M | 0.10 | ~$0.102 |
| Grok 4.1 Fast (xAI) | $0.05/1M | $0.20/1M | 0.25 | ~$0.006 |

* Gemini support not yet available in AEL products.

Key insight: OpenAI charges zero for cache writes. This means spawn overhead is nearly zero — AEL fires more aggressively and delivers more value per dollar on OpenAI than on any other provider. If you're running GPT-based agents, AEL is the highest-leverage optimization available.
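As a sketch of why this matters, compare S_real under a nonzero versus a zero write price. The startup and handoff sizes below are assumptions for illustration (so the dollar figures will not match the table's S_real column exactly), as is the $3.75/1M Anthropic write rate.

```python
startup_tokens, handoff_tokens = 20_000, 5_000   # assumed sizes, for illustration

def spawn_overhead(k_write: float, k_uncached: float) -> float:
    # S_real = S_fixed + S_variable = startup × k_write + handoff × k_uncached
    return startup_tokens * k_write + handoff_tokens * k_uncached

anthropic = spawn_overhead(k_write=3.75e-6, k_uncached=3.00e-6)  # assumed write rate
openai = spawn_overhead(k_write=0.0, k_uncached=2.50e-6)         # zero write cost
```

With the write term gone, the only spawn cost left is the successor's cold read of the handoff, which is why the spawn condition fires so much earlier on OpenAI pricing.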

How much can AEL save?

Parametric simulation across providers, session lengths, and budgets. All figures assume idealized conditions — see the caveat above.

Peak Savings by Session Length (at a $0.02/turn minimum-savings threshold)

| Provider | 50 turns | 100 turns | 500 turns |
|---|---|---|---|
| Grok 4 (xAI) | 68.9% | 83.5% | 96.6% |
| Anthropic Claude Sonnet 4.6 | 66.3% | 81.9% | 96.2% |
| OpenAI GPT-5.4 | 84.6% | 91.9% | 98.3% |
| Google Gemini 2.5 Pro * | 48.6% | 71.0% | 93.6% |
| Grok 4.1 Fast (xAI) | 81.6% | 90.5% | 98.1% |

Turns Per Dollar — $10 Budget

| Provider | Without AEL | With AEL | Gain |
|---|---|---|---|
| Anthropic Claude Sonnet 4.6 | 79 turns | 346 turns | +338% |
| OpenAI GPT-5.4 | 86 turns | 557 turns | +548% |
| Google Gemini 2.5 Pro | 123 turns | 478 turns | +289% |

⚠️ The extreme multipliers occur because the simulation allows AEL to spawn as aggressively as the math permits. Spawn latency, imperfect δ classification, and C_lean quality all reduce real-world gains. Production deployments typically see 2–4× improvement.
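For intuition, here is a toy re-creation of the turns-per-dollar experiment. It is illustrative only: the 10k tokens/turn, 80% stale fraction, 5k-token lean context, and S_real value are assumptions, not the actual simulator's parameters.

```python
def turns_for_budget(budget: float, k_cached: float, k_uncached: float,
                     s_real: float, tokens_per_turn: int = 10_000,
                     dead_fraction: float = 0.8, lean_tokens: int = 5_000,
                     use_ael: bool = False) -> int:
    """Count how many turns a budget buys; k_* rates are $/token."""
    context, dead, spent, turns = 0, 0, 0.0, 0
    while True:
        # each turn re-reads carried context (cached) plus new input (uncached)
        cost = context * k_cached + tokens_per_turn * k_uncached
        if spent + cost > budget:
            return turns
        spent += cost
        turns += 1
        context += tokens_per_turn
        dead += int(tokens_per_turn * dead_fraction)
        # AEL: spawn a fresh session once shedding beats the spawn overhead
        if use_ael and dead * k_cached > s_real:
            spent += s_real
            context, dead = lean_tokens, 0

# Claude Sonnet 4.6 read rates and S_real from the tables above, $10 budget
base = turns_for_budget(10.0, k_cached=0.30e-6, k_uncached=3.00e-6, s_real=0.088)
ael = turns_for_budget(10.0, k_cached=0.30e-6, k_uncached=3.00e-6, s_real=0.088,
                       use_ael=True)
```

Even this crude version shows the shape of the effect: the AEL variant buys noticeably more turns for the same $10, though nowhere near the idealized multipliers above.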

A real session. Real numbers.

April 3, 2026 — a normal productive session of UI development, code edits, and analysis. Not a stress test.

Total session cost: $5.10
Context overhead: 66% ($3.37)
Actual work: 34% ($1.73)
Turns in session: ~49
Peak turn drag: $0.13 (6 tool calls)
What this means: For every $1 of useful work done, $1.94 was spent re-reading context that had already been processed. One well-timed spawn would have reset the cost curve and recovered a significant portion of that overhead.
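The ratio quoted above is just the session's overhead split divided through by the work figure; a quick arithmetic check:

```python
total, overhead, work = 5.10, 3.37, 1.73        # session figures from above
assert abs((overhead + work) - total) < 0.005   # the split accounts for the total
overhead_per_work_dollar = overhead / work      # roughly $1.94 of re-read context
                                                # per $1 of useful work
```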

What the model doesn't cover.

We built this framework to be used, not just cited. Here's what you need to know before applying it to your workload.

Cache TTL events

Anthropic's 5-minute TTL means idle sessions re-pay write costs. Human-in-the-loop workflows are disproportionately affected. Not modeled in current simulations.

Contextual Handoff Quality

The value of a spawn depends on how well the handoff document is written. A lean, precise C_lean keeps successor startup cost low and accuracy high — a bloated or incomplete one raises S_real and risks losing critical state.

Spawn latency

Seconds of dead time per spawn. Negligible for cost calculations but real for user experience in interactive deployments.

Not All Caches Are Created Equal

Provider caching architectures vary significantly — some are automatic, some require explicit setup, some charge by the hour. AEL's math holds regardless, but the implementation path depends on how your provider of choice handles cache under the hood.

Output token variation

Agents producing long outputs per turn have higher baseline costs, which affects savings percentages. The formula still holds — the threshold just shifts.

Constant context deployments

If context never changes between turns, nothing is ever δ=0 and V_shed never exceeds S_real. AEL delivers zero value in constant-context setups.

Provider Data

Spawn cost and savings by provider.

All charts generated from parametric simulation at 10k tokens/turn. Real deployments vary.

Savings % vs Threshold — All Flagship Providers

[Charts: savings % vs threshold at 50, 100, and 500 turns]

Per-Provider Detail — Flagship Models

[Charts: Grok 4 (xAI) · Claude Sonnet 4.6 (Anthropic) · GPT-5.4 (OpenAI) · Gemini 2.5 Pro (Google)]
Budget / Mini Models

The same math, at a fraction of the cost.

Grok 4.1 Fast is the cheapest model tested and carries a 4:1 cache ratio (0.25 breakeven) instead of the usual 10:1 — meaning the spawn threshold shifts but AEL still applies. Smaller absolute savings, but near-total overhead elimination at longer sessions. Comparable mini models from each provider follow the same curve shapes.
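Concretely, the shifted breakeven means a cached block must be referenced more than once every four turns to be worth carrying, versus once every ten at flagship pricing. Rates copied from the table above, in $/1M tokens:

```python
flagship_ratio = 0.30 / 3.00    # Claude Sonnet 4.6: carry if referenced > 1 in 10 turns
fast_ratio = 0.05 / 0.20        # Grok 4.1 Fast: carry if referenced > 1 in 4 turns
carry_interval = 1 / fast_ratio # turns between references at breakeven
```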

[Charts: Grok 4.1 Fast (most efficient, lowest absolute savings) · Claude Haiku 4.5 (Anthropic) · GPT-5.4 nano (OpenAI) · Gemini 2.5 Flash (Google)]
S_real Sensitivity

How spawn overhead changes the picture.

Three S_real scenarios: minimal ($0.01), our agent ($0.086), and heavy ($0.50). Each chart shows all 9 scenario combinations across session length and tokens/turn.

[Charts: S_real = $0.01 (minimal agent) · S_real = $0.086 (our agent) · S_real = $0.50 (heavy agent)]