Blog

Numbers, not vibes.

Benchmarks, methodology, and engineering notes — everything we claim on the landing page, with the receipts.

Essays
June 6, 2026 · 4 min read

Your coding agent spends 76% of your money re-reading your codebase

We measured where coding-agent tokens actually go. Three-quarters is reading and navigation — most of it rediscovery of things the agent already knew — and the standard fixes don't touch it. The data, the mechanism, and what would.

Honest answers
June 6, 2026 · 3 min read

When unerr is NOT the right tool

Solo on a small repo, light agent use, prose-heavy work, or a bill that's already fine — the cases where unerr won't pay for itself, stated plainly, with what to do instead.

Glossary
June 6, 2026 · 1 min read

What is token ops?

Token ops is the practice of managing the unit economics of LLM token consumption — measuring where tokens go, attributing them to workloads, and engineering them down without losing output quality.

Glossary
June 6, 2026 · 1 min read

What is memory integration?

Memory integration is wiring persistent memory into an AI agent's actual workflow — so remembered knowledge arrives in context automatically at the moment it's relevant, instead of sitting in a store the agent never consults.

Glossary
June 6, 2026 · 1 min read

What is LLM steering?

LLM steering is shaping model behavior at inference time — through context, constraints and feedback — so output lands inside your team's conventions without retraining anything.

Glossary
June 6, 2026 · 1 min read

What is LLM ops?

LLM ops is the operational discipline of running LLM-powered systems in production — cost, reliability, observability and governance for the model calls your software makes.

Glossary
June 6, 2026 · 1 min read

What is context ops?

Context ops is the discipline of controlling what enters an LLM's context window — because context size is the multiplier on every other cost lever, and agents fill windows with low-value re-reads by default.

Glossary
June 6, 2026 · 1 min read

What is memory ops?

Memory ops is the practice of persisting what AI agents learn across sessions — so a stateless model doesn't re-pay to rediscover the same codebase, conventions and decisions every morning.

Glossary
June 6, 2026 · 1 min read

What is coding ops?

Coding ops is the operational discipline for AI-assisted software development — governing the cost, consistency and visibility of the coding agents a team runs, the way DevOps governed how teams ship.

Glossary
June 6, 2026 · 1 min read

What is agent ops?

Agent ops is the operational layer for running AI agents in production and on developer machines — spend control, consistency enforcement, visibility into what agents actually did, and the shared context they work from.

Guides
June 6, 2026 · 5 min read

Token optimization for AI coding agents: the complete guide

76% of agent tokens go to reading code, costs grow quadratically with session length, and prompt caching breaks exactly when agents work hardest. The mechanics of coding-agent bills — and the levers ranked by what they actually save.
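
To make the "quadratic" claim concrete, here is a minimal sketch under simplifying assumptions that are ours, not the guide's: every turn re-sends the full conversation history, each turn adds a fixed number of tokens, and nothing is cached or compacted. The function name and the 1,000-token figure are hypothetical.

```python
def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every turn re-sends the full history.

    Assumes a fixed number of new tokens per turn and no caching or
    compaction (illustrative assumptions, not measured behavior).
    """
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # history grows by a fixed amount each turn
        total += context            # the whole context is sent again
    return total


if __name__ == "__main__":
    for turns in (10, 20, 40):
        print(turns, total_input_tokens(turns, 1_000))
```

Under these assumptions the total is tokens_per_turn × n(n+1)/2, so doubling a session from 10 to 20 turns raises billed input from 55,000 to 210,000 tokens, nearly four times as much.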

Guides
June 6, 2026 · 3 min read

How to reduce LLM API costs: the four meters that price every call

Every LLM API bill is roundtrips × model rate × cache state × context size. How each meter works, what the standard levers actually save, and why optimizing one meter at a time leaves most of the money on the table.
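
The four-meter product can be sketched as a small calculator. This is an illustration only: the class name, field names, rate, and cache multiplier below are hypothetical placeholders, not figures from the guide or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass
class CallProfile:
    roundtrips: int            # number of model calls in the workload
    context_tokens: int        # input tokens sent per call
    rate_per_million: float    # USD per million input tokens (hypothetical)
    cache_multiplier: float    # 1.0 = no cache discount; lower = discounted


def estimated_cost(p: CallProfile) -> float:
    """Multiply the four meters to estimate input-side spend in USD."""
    per_token = p.rate_per_million / 1_000_000
    return p.roundtrips * p.context_tokens * per_token * p.cache_multiplier


if __name__ == "__main__":
    baseline = CallProfile(roundtrips=50, context_tokens=40_000,
                           rate_per_million=3.0, cache_multiplier=1.0)
    # Halve two meters at once: smaller context and a 50% cache discount.
    tuned = CallProfile(roundtrips=50, context_tokens=20_000,
                        rate_per_million=3.0, cache_multiplier=0.5)
    print(f"baseline: ${estimated_cost(baseline):.2f}")
    print(f"tuned:    ${estimated_cost(tuned):.2f}")
```

Because the meters multiply, halving two of them cuts the estimate to a quarter, which is the arithmetic behind the warning about optimizing one meter at a time.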

Guides
June 6, 2026 · 5 min read

How to reduce Claude Code costs: where the tokens actually go

Claude Code averages ~$6/dev/day — but heavy users burn 10× that, and most of it is the agent re-reading your codebase. The pricing model, the usage limits, and the fixes, ranked by impact.

Comparisons
June 6, 2026 · 3 min read

RAG vs code graph: how should a coding agent know your codebase?

Vector retrieval answers 'what looks similar'; a code graph answers 'what calls this.' Why embedding search is losing ground for code navigation, where it still wins, and what the token bill says about each.

Comparisons
June 6, 2026 · 3 min read

Prompt caching vs context trimming: which one actually cuts your agent bill?

Caching discounts what you re-send; trimming stops sending it. They optimize different parts of the prompt, both break in characteristic ways on coding agents, and neither fixes the read problem underneath.

Guides
June 6, 2026 · 3 min read

GitHub Copilot pricing explained: premium requests, AI Credits, and the agentic bill shock

Copilot moved to usage-based AI Credits on June 1, 2026 — and power users saw agentic bills jump 10–50×. What changed, which features are still unlimited, and how to keep agent workflows from draining a month of credits in a day.

Guides
June 6, 2026 · 4 min read

Cursor pricing explained: why bills explode and how to cut them

Credit pools, the 20% margin, Max Mode, and the 2025 controversy that started it all — how Cursor billing actually works in 2026, with worked math and the levers that reduce real bills.

Guides
June 6, 2026 · 4 min read

Why your coding agent forgets everything — and what actually gives it memory

LLMs are stateless: every session starts blank, and your agent pays to rediscover your codebase each time. The memory tooling landscape — rules files, auto-memory, memory MCP servers, vector RAG — with honest tradeoffs and the re-read tax in dollars.

Benchmarks
June 5, 2026 · 2 min read

How unerr cuts code-navigation tokens 86–90%

Head-to-head, fidelity-gated, real tokenizer: unerr −86%/−90% vs graphify −43%/−81% vs RTK −30%/−49% on the same repos — and how to reproduce it on yours.