v1.2.2
Patent-Pending Framework — U.S. Provisional #64/019,334 · #64/040,347 and others
Research & Methods

Your large tasks don’t have to be one long session.
They’re better as a series of short ones.

Most AI cost waste comes from treating every task as a single continuous session. This page explains why that’s expensive, how to identify break points where resetting makes sense, and the math behind knowing exactly when the cost of continuing exceeds the cost of starting fresh.

You can’t always reset when the math says to. Here’s how to work with that.

SpawnPoint Dashboard tells you the economically optimal reset moment. But most real tasks aren’t infinitely interruptible — and that’s fine. The goal is to reset at the next convenient break point, armed with the knowledge of what waiting is costing you.

What counts as a break point?

Any moment where your agent has finished a coherent unit of work and doesn’t need active context to proceed:

  • A task is completed and the results are saved
  • You’re switching to a different type of work
  • A long tool chain has finished and output is stored
  • You’re about to start a new phase of a project
  • The agent is about to load a large new file or do a fresh search

What to carry forward (the handoff)

A good reset doesn’t lose anything important. It discards the noise and carries the signal:

  • Current task state and what’s been completed
  • Key decisions made and why
  • File paths, credentials, and config that matter
  • What the next agent needs to know to pick up immediately
  • Nothing else: every extra token in the handoff raises reset cost

The compound benefit: When you split a large task across multiple focused sessions with clean handoffs, each session starts fresh with only what it needs. There’s no accumulated tool output from three phases ago. No completed sub-task logs still billing every turn. The context stays relevant throughout, which means the work quality holds up too, not just the cost.

For pipelines and automated workflows

If your agent runs repetitive or independent tasks — batch processing, code review, research sweeps, data extraction — context from one task rarely helps the next. Every task boundary is a natural reset point.

In this case, the AEL/MEL math becomes an automatic trigger: spawn at the mathematically optimal interval, carry a minimal handoff, and eliminate the context overhead buildup entirely. SpawnPoint Dashboard surfaces the N* optimal interval for your specific session parameters and provider pricing.
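As a rough illustration of the pattern above, the sketch below splits a batch of independent tasks into sessions and gives each one a small handoff. The `Handoff` and `Session` structures, the `plan_sessions` function, and the `spawn_interval` parameter are hypothetical names introduced for this example. `spawn_interval` stands in for the N* value the dashboard would surface and is supplied by the caller here, not derived.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Minimal state carried into the next session (hypothetical structure)."""
    completed_summary: str = ""                          # task state so far
    decisions: list[str] = field(default_factory=list)   # key decisions and why
    paths: list[str] = field(default_factory=list)       # file paths and config that matter
    next_steps: str = ""                                 # what the successor does first


@dataclass
class Session:
    handoff: Handoff
    tasks: list[str]


def plan_sessions(tasks: list[str], spawn_interval: int) -> list[Session]:
    """Group tasks into fresh sessions, resetting every `spawn_interval` tasks."""
    if spawn_interval < 1:
        raise ValueError("spawn_interval must be at least 1")
    sessions = []
    for start in range(0, len(tasks), spawn_interval):
        chunk = tasks[start:start + spawn_interval]
        handoff = Handoff(
            completed_summary=f"{start} of {len(tasks)} tasks done",
            next_steps=f"start at {chunk[0]}",
        )
        sessions.append(Session(handoff=handoff, tasks=chunk))
    return sessions


sessions = plan_sessions([f"review-{i}" for i in range(7)], spawn_interval=3)
for s in sessions:
    print(s.handoff.completed_summary, "->", s.tasks)
```

The handoff carries a one-line summary instead of the full log of finished tasks, so its size stays flat as the batch grows.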

⚠️ Simulation note. Savings figures on this page come from parametric simulations under idealized conditions — linear context growth, perfect relevance classification, constant output tokens. Real deployments typically see 2–4× improvement over unmanaged sessions. The higher figures represent long sessions with aggressive optimization. We are transparent about this because the math is strong enough to stand without exaggeration.

Every turn, you pay for everything you’ve ever said.

LLM providers charge for every token in the context window on every request — not just the new ones. As a session grows, you accumulate irrelevant context: closed topics, completed tasks, tool outputs no one will reference again. You pay to re-read all of it, every turn.

What waste looks like

A real productive session — UI development, code edits, analysis. Not a stress test.

66%
Context overhead
34%
Actual work

For every $1 of useful work, $1.94 was spent re-reading context that had already been processed.
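The $1.94 figure follows directly from the 66% / 34% split above. A minimal check, using only the two percentages shown on this page:

```python
overhead_share = 0.66   # share of spend on context overhead (from the figures above)
work_share = 0.34       # share of spend on actual work (from the figures above)

# Dollars of overhead paid for each dollar of useful work.
overhead_per_work_dollar = overhead_share / work_share
print(f"${overhead_per_work_dollar:.2f} of overhead per $1 of useful work")
```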

Why it compounds

Modern agent sessions make multiple API calls per user turn: tool calls, file reads, browser actions, code execution. Each one reprocesses the full context window. A session with 10 API calls per turn accumulates waste 10× faster than a chat session that makes one call per turn. This multiplier is the most important factor most cost models miss entirely.
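To make the compounding concrete, here is a simplified sketch. It assumes linear context growth, every re-read billed at the cached rate, and the Sonnet k_cached price from the provider table further down. The function name and the 2,000 tokens-per-turn figure are illustrative assumptions, not measurements.

```python
def cumulative_reread_cost(turns, tokens_added_per_turn, calls_per_turn, k_cached):
    """Total cost of re-reading context over a session (simple linear-growth model)."""
    total = 0.0
    context_tokens = 0
    for _ in range(turns):
        context_tokens += tokens_added_per_turn          # context grows every turn
        total += context_tokens * calls_per_turn * k_cached  # each call re-reads all of it
    return total


K_CACHED = 0.30 / 1_000_000   # $ per token, Sonnet cached-read rate from the table below

chat_cost = cumulative_reread_cost(100, 2_000, calls_per_turn=1, k_cached=K_CACHED)
agent_cost = cumulative_reread_cost(100, 2_000, calls_per_turn=10, k_cached=K_CACHED)
longer_chat_cost = cumulative_reread_cost(200, 2_000, calls_per_turn=1, k_cached=K_CACHED)

print(f"1 call/turn:   ${chat_cost:.2f}")
print(f"10 calls/turn: ${agent_cost:.2f}")
```

In this model the call multiplier scales cost linearly, while session length scales it roughly quadratically: doubling the turns about quadruples the re-read bill.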

Two thresholds. One curve. One decision.

The AEL/MEL framework answers two distinct questions that previous approaches conflated or ignored entirely.

Threshold 1
Agent Efficiency Limit
AEL: The Break-Even Point

The first moment where resetting a session costs less than continuing it for one more turn. Below AEL, resetting loses money. Above AEL, resetting is profitable, but not necessarily at its most profitable. AEL is a necessary condition, not a sufficient one.

Threshold 2
Maximum Efficiency Limit
MEL: The Optimal Point

The peak of the savings curve: the exact spawn timing that maximizes total savings over the session. Derived in closed form by minimizing total session cost as a function of spawn interval. MEL occurs after AEL. Past MEL, savings are still positive but declining with every additional turn.

The distinction that matters: Every prior approach to context management (compaction, compression, periodic summarization) is triggered by context size. AEL/MEL is triggered by context cost. Size is a proxy for cost. AEL/MEL measures cost directly and fires at the economically optimal moment, which for typical agent sessions is far earlier than any size-based trigger would fire.

Three zones. Every session lives in one of them.

The relationship between spawn timing and total savings is a unimodal curve. The SpawnPoint Dashboard tells you which zone you’re in at every turn.

Zone 1
Before AEL
Resetting costs more than it saves. Overhead exceeds the waste you’d eliminate. Stay in session. Do not spawn.

Zone 2
AEL → MEL
Resetting is profitable. Savings are growing toward the peak. This is the window where spawning makes economic sense. MEL is approaching.

Zone 3
Past MEL
Resetting still saves money vs. never resetting. But savings are declining. Every turn past MEL is suboptimal drag. Your waste tolerance determines how far you drift.

The operator dial: Between MEL and the end of the session, operators set a personal waste tolerance threshold: the maximum waste per turn they will accept before being forced to act. The SpawnPoint Dashboard surfaces all three signals live: AEL entry, MEL peak, and your threshold crossing. You decide how tight or loose to run.
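The three zones and the operator dial can be sketched as a small classifier. This is only an illustration: `ael_turn` and `mel_turn` are hypothetical inputs standing in for the values the dashboard would supply, and treating the MEL turn itself as part of Zone 2 is an assumption of this sketch.

```python
from enum import Enum


class Zone(Enum):
    BEFORE_AEL = 1   # resetting costs more than it saves
    AEL_TO_MEL = 2   # resetting is profitable, savings still growing
    PAST_MEL = 3     # resetting still saves money, but savings are declining


def classify_zone(turn: int, ael_turn: int, mel_turn: int) -> Zone:
    """Place the current turn in one of the three zones."""
    if ael_turn > mel_turn:
        raise ValueError("MEL occurs after AEL, so ael_turn must not exceed mel_turn")
    if turn < ael_turn:
        return Zone.BEFORE_AEL
    if turn <= mel_turn:
        return Zone.AEL_TO_MEL
    return Zone.PAST_MEL


def tolerance_crossed(waste_per_turn: float, tolerance: float) -> bool:
    """True once per-turn waste exceeds the operator's chosen tolerance."""
    return waste_per_turn > tolerance


print(classify_zone(turn=5, ael_turn=8, mel_turn=20))
print(classify_zone(turn=25, ael_turn=8, mel_turn=20))
```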

Derived, not tuned.

AEL/MEL is not a heuristic. Every threshold is derived from first principles — a closed-form result of minimizing total session cost as a function of session parameters and provider pricing. The core signals:

AEL/MEL Core Signals
// The waste cost: what irrelevant context is costing you per turn
V_shed = irrelevant_tokens × k_cached × api_calls_per_turn

// AEL: spawn when this condition is met
AEL condition: probable_savings/turn > S_real / turns_remaining

// MEL: the optimal spawn interval (derived, not configured)
N* = [closed-form derivation — patent pending]
MEL condition: probable_savings/turn ≥ threshold* × api_calls_per_turn

// S_real: what a session reset actually costs
S_real = startup_tokens × k_write + handoff_tokens × k_uncached

// The universal breakeven ratio
carry_threshold = k_cached / k_uncached = 0.10 (most providers)

The N* derivation and MEL threshold formula are covered under patent pending applications. The working paper is available to researchers and attorneys under NDA — see below.
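The published signals (everything except N* and the MEL threshold, which are not public) can be sketched in a few lines. The function names, the `Pricing` container, and the example token counts are assumptions made for this illustration. Using V_shed as the "probable savings per turn" input is also an assumption, since the page does not define that term precisely. Prices are the Sonnet figures from the provider table on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    k_cached: float     # $ per cached input token
    k_uncached: float   # $ per uncached input token
    k_write: float      # $ per token at the k_write rate


SONNET = Pricing(k_cached=0.30e-6, k_uncached=3.00e-6, k_write=3.75e-6)


def v_shed(irrelevant_tokens: int, api_calls_per_turn: float, p: Pricing) -> float:
    """Per-turn cost of carrying irrelevant context."""
    return irrelevant_tokens * p.k_cached * api_calls_per_turn


def s_real(startup_tokens: int, handoff_tokens: int, p: Pricing) -> float:
    """What a session reset actually costs."""
    return startup_tokens * p.k_write + handoff_tokens * p.k_uncached


def ael_met(savings_per_turn: float, reset_cost: float, turns_remaining: int) -> bool:
    """AEL condition: per-turn savings exceed the reset cost spread over remaining turns."""
    if turns_remaining <= 0:
        return False
    return savings_per_turn > reset_cost / turns_remaining


def carry_threshold(p: Pricing) -> float:
    """Breakeven ratio k_cached / k_uncached."""
    return p.k_cached / p.k_uncached


waste = v_shed(irrelevant_tokens=50_000, api_calls_per_turn=6, p=SONNET)
reset = s_real(startup_tokens=20_000, handoff_tokens=2_000, p=SONNET)
print(f"waste/turn=${waste:.3f} reset=${reset:.3f} AEL met: {ael_met(waste, reset, 10)}")
```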

Context compression is not free. And it’s not spawn.

Compression tools — Anthropic native compaction, Morph Compact, LLMLingua, periodic summarization — are widely used as the default approach to context management. The economics don’t favor them.

Compression cost at 200k tokens
~$0.64

Full context read at uncached rate + summary write. Compare to a lean session reset at ~$0.12–$0.25. Compression costs 2–5× more per event at typical context sizes.
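A quick check of the arithmetic above. The page does not say which price list the ~$0.64 figure uses, so this sketch assumes the Sonnet uncached rate of $3.00 per 1M tokens from the provider table. The size of the summary write is not stated, so it is left as the remainder.

```python
K_UNCACHED = 3.00 / 1_000_000    # $ per token (assumed: Sonnet uncached rate)

context_tokens = 200_000
compression_total = 0.64         # figure quoted above
reset_low, reset_high = 0.12, 0.25   # lean session reset range quoted above

# Reading the full context once at the uncached rate.
read_cost = context_tokens * K_UNCACHED
# Whatever is left of the quoted total is attributable to writing the summary.
summary_write_cost = compression_total - read_cost

ratio_vs_high = compression_total / reset_high
ratio_vs_low = compression_total / reset_low

print(f"read ${read_cost:.2f} + summary ${summary_write_cost:.2f}")
print(f"compression costs {ratio_vs_high:.1f}x to {ratio_vs_low:.1f}x a lean reset")
```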

After a compression event

Context = compressed history + system prompt. Even at 70% compression of a 200k session, the residual is ~60k tokens of history. Every turn after compression still pays drag on that residual.

After a session reset: context = baseline startup tokens only. Every turn starts from the clean floor. Spawn always produces a smaller post-event context than compression.
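The residual figure can be reproduced directly, along with the drag it keeps generating. The per-call cost below assumes the Sonnet cached rate from the provider table and the 10 calls per turn of a heavy session. The function name is illustrative.

```python
K_CACHED = 0.30 / 1_000_000   # $ per token (assumed: Sonnet cached-read rate)


def residual_tokens(context_tokens: int, compression_ratio: float) -> int:
    """History tokens left in context after compressing by `compression_ratio`."""
    return round(context_tokens * (1 - compression_ratio))


residual = residual_tokens(200_000, 0.70)
drag_per_call = residual * K_CACHED          # paid on every API call after compression
drag_per_heavy_turn = drag_per_call * 10     # heavy session: 10 API calls per turn

print(f"{residual:,} residual tokens, ${drag_per_call:.3f} per call, "
      f"${drag_per_heavy_turn:.2f} per heavy turn")
```

After a session reset the history residual is zero, so this drag term disappears and only the baseline startup tokens remain.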

What compression cannot do: It cannot target irrelevance — it reduces everything proportionally. It cannot write a structured handoff. It degrades the fidelity of content it does retain. And it is triggered by context size, which means it fires thousands of turns after MEL would have fired for most agent sessions.

MEL vs. All Strategies — Anthropic Sonnet Simulation (500 turns)

Strategy | Light Session | Medium Session | Heavy Session
No optimization (baseline) | 0% | 0% | 0%
Provider default compaction (fires at 950k tokens) | 22.5% | 44.6% | 60.8%
Aftermarket compression at 200k tokens | 75.2% | 84.7% | 89.6%
Compression at MEL cadence (same timing, compress instead of spawn) | 85.2% | 92.6% | 95.0%
MEL Spawn (N* optimal) ★ | 90.3% | 94.6% | 96.4%

MEL wins every scenario. The compress-at-MEL-cadence row isolates the action from the timing — when compression fires at the same interval as MEL spawn, spawn still wins because it resets context to baseline while compression leaves a residual.

Provider-agnostic. 18 scenarios. 0 exceptions.

The AEL/MEL framework requires three pricing parameters from your provider (k_cached, k_uncached, and k_write) and derives correct thresholds for any provider. We validated on Anthropic Claude Sonnet and OpenAI GPT-5.4 across all nine combinations of session type and length.

Provider | k_cached | k_uncached | k_write | Breakeven ratio | MEL wins?
Anthropic Claude Sonnet | $0.30/1M | $3.00/1M | $3.75/1M | 0.10 | 9/9 ★
OpenAI GPT-5.4 | $0.25/1M | $2.50/1M | $15.00/1M | 0.10 | 9/9 ★

GPT-5.4 note: GPT-5.4’s k_write rate is 4× higher than Sonnet’s ($15.00 vs $3.75 per 1M tokens). This makes session resets more expensive to initiate, but the MEL math self-adjusts, spacing spawns further apart at the new optimal interval. MEL still wins all 9 scenarios. The absolute dollar stakes are dramatically higher: an unoptimized heavy 500-turn session costs $1,917 on GPT-5.4 vs $83 on Sonnet. MEL reduces the GPT-5.4 cost to $102, saving $1,815 per session.
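The breakeven ratios and the GPT-5.4 dollar figures can be cross-checked from numbers already on this page. The sketch below uses the prices from the provider table, the $1,917 baseline from the note above, and the 94.7% heavy 500-turn GPT-5.4 savings from the MEL savings table. The variable names are illustrative.

```python
# Prices in $ per 1M tokens, copied from the provider table.
PROVIDERS = {
    "Anthropic Claude Sonnet": {"k_cached": 0.30, "k_uncached": 3.00, "k_write": 3.75},
    "OpenAI GPT-5.4": {"k_cached": 0.25, "k_uncached": 2.50, "k_write": 15.00},
}

# Breakeven ratio = k_cached / k_uncached for each provider.
ratios = {name: round(p["k_cached"] / p["k_uncached"], 2) for name, p in PROVIDERS.items()}

# Ratio of the two k_write rates quoted in the GPT-5.4 note.
k_write_ratio = PROVIDERS["OpenAI GPT-5.4"]["k_write"] / PROVIDERS["Anthropic Claude Sonnet"]["k_write"]

baseline_cost = 1917.0    # unoptimized heavy 500-turn session on GPT-5.4
mel_savings = 0.947       # heavy / 500 turns / GPT-5.4 row of the savings table

optimized_cost = baseline_cost * (1 - mel_savings)
dollars_saved = baseline_cost - optimized_cost

print(ratios)
print(f"k_write ratio: {k_write_ratio:.0f}x")
print(f"optimized: ${optimized_cost:.0f}, saved: ${dollars_saved:.0f}")
```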

MEL Savings vs. No Optimization — Both Providers

Session type | Turns | Sonnet (MEL) | GPT-5.4 (MEL)
Light | 50 | 36.8% | 12.3%
Light | 100 | 60.6% | 40.5%
Light | 500 | 90.3% | 84.8%
Medium | 50 | 58.7% | 39.1%
Medium | 100 | 76.3% | 65.2%
Medium | 500 | 94.6% | 92.0%
Heavy | 50 | 69.9% | 56.4%
Heavy | 100 | 83.2% | 75.9%
Heavy | 500 | 96.4% | 94.7%

Light/Medium/Heavy = avg API calls per turn of 2.0, 6.2, 10.0 respectively. Session length and call frequency are the primary drivers of savings magnitude.

What the model doesn’t cover.

We built this framework to be used, not just cited. Here’s what you need to know before applying it.

Cache TTL events

Anthropic’s 5-minute cache TTL means idle sessions re-pay write costs. Human-in-the-loop workflows are disproportionately affected. The dashboard tracks this, but the simulations assume continuous activity.
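A simplified way to see the effect of an idle gap: once the gap exceeds the TTL, the existing prefix is billed at the write rate again instead of the cached rate. The function below is a sketch under that assumption, with Sonnet prices from the provider table. The function name and the exact expiry boundary are illustrative, not provider-documented behavior.

```python
K_CACHED = 0.30 / 1_000_000   # $ per token, Sonnet cached-read rate
K_WRITE = 3.75 / 1_000_000    # $ per token, Sonnet k_write rate


def prefix_cost(prefix_tokens: int, idle_seconds: float, ttl_seconds: float = 300) -> float:
    """Cost of processing the existing prefix on the next request."""
    # Assumption: a gap longer than the TTL means the cache entry has expired.
    rate = K_WRITE if idle_seconds > ttl_seconds else K_CACHED
    return prefix_tokens * rate


warm = prefix_cost(100_000, idle_seconds=60)    # user replied within a minute
cold = prefix_cost(100_000, idle_seconds=400)   # user stepped away for ~7 minutes

print(f"warm: ${warm:.3f}  cold: ${cold:.3f}")
```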

Handoff quality matters

The value of a session reset depends on how well the handoff document is written. A lean, precise handoff keeps successor startup cost low. A bloated or incomplete one raises reset cost and risks losing critical state.

Relevance classification

The framework assumes you can identify which context blocks are no longer needed. The dashboard provides operator controls for this. Misclassification in either direction reduces realized savings.

Spawn latency

Each reset adds seconds of dead time. This is negligible for cost calculations but real for user experience in interactive deployments. It is not modeled in the savings figures.

Output token variation

Agents producing long outputs per turn have higher baseline costs, which affects savings percentages. The thresholds still hold — the absolute numbers shift.

Constant-context deployments

If context never changes between turns and everything remains relevant indefinitely, waste never accumulates and AEL/MEL delivers zero value. The framework targets sessions with evolving, partially-obsolete context.

Working Paper & Full Derivation

The complete AEL/MEL framework — including closed-form derivations, compression crossover analysis, multi-provider validation, and all simulation methodology — is documented in a working paper. Available to researchers, attorneys, and enterprise partners under NDA.

Request Access