If you upgraded past v2.1.98, you're paying a 40% token tax on every API call — and you can't see it without a proxy.
GitHub issue #46917 documents the regression with network-level evidence: v2.1.100 sends *fewer* bytes than v2.1.98 but consumes ~20,000 more cache_creation_input_tokens. Same conversation, same project. The inflation is server-side and version-specific — likely routed by User-Agent header.
The claude-code-cache-fix project (95 stars) identified three root causes:
- Partial block scatter — Attachment blocks (skills, MCP instructions, deferred tools) drift from
messages[0]to later messages on resume, breaking the byte-identical cache prefix - Fingerprint instability — The
cc_versionhash mixes metadata into its salt, so block shifts bust the cache key even when user content is unchanged - Non-deterministic tool ordering — Tool definitions arrive in different orders between turns, invalidating cached prefixes
For Max plan users, this means hitting the 5-hour usage cap significantly faster. It's not just billing — those 20K extra tokens enter the context window, potentially diluting your CLAUDE.md instructions with hidden system content.
v2.1.104 shipped today (Apr 13) as a CHANGELOG-only update. Whether it carries the same inflation is unconfirmed.
Workarounds:
npx claude-code@2.1.98
# Install the fetch interceptor (requires npm-installed Claude Code)
npm install -g claude-code-cache-fix
# Then launch with preload:
NODE_OPTIONS="--import claude-code-cache-fix" claude
The fix intercepts globalThis.fetch before /v1/messages calls to relocate scattered blocks, sort tool definitions alphabetically, and recompute the fingerprint from user content only. It also adds quota monitoring via ~/.claude/quota-status.json.
Important: The fix only works with npm-installed Claude Code, not the standalone binary.
claude -p is stateless by design — every invocation starts fresh. Claudraband (175 stars, 105 HN points) wraps the Claude Code TUI to give you persistent, resumable sessions with programmatic control.
The key architecture decision: it wraps rather than replaces the TUI. OAuth, auth tokens, and security boundaries pass through to the official Claude Code client unchanged. All sessions run through authenticated TUI instances backed by tmux.
Three deployment models:
- Local tmux sessions — Interactive work with persistence. Resume anytime:
- Daemon mode — HTTP API for remote or headless control. Your scripts drive Claude sessions via raw HTTP endpoints. Launch with
cband daemon
- ACP server — Alternative Client Protocol for editor integration. Connect from Toad, Zed, or other ACP-aware editors
The non-interactive workflow is where it gets interesting for agent builders. Answer pending prompts programmatically with --select flags. Self-interrogate prior sessions — Claude analyzing decisions from a previous run. List and manage live sessions across your machine.
cband "analyze this codebase"
# Later:
cband continue <id> "now refactor the auth module"
Caveat: Bundles Claude Code v2.1.96. Override with CLAUDRABAND_CLAUDE_PATH=/path/to/your/claude to use your installed version. Single contributor, MIT licensed, 30 commits on master — solid for personal/ad-hoc use, not a production replacement for the SDK.
Your session hits 70% context. Claude's responses are getting vague, referencing files it read 40 messages ago as if they're still in front of it. Three options, one right answer per situation.
Starting unrelated work → /clear. No point carrying dead context. A fresh session with a focused prompt beats a bloated one carrying 50K tokens of irrelevant history. If you need info from the previous task, copy the key decisions into your prompt — cheaper than dragging the whole conversation.
Same task, context getting heavy → /compact now. Don't wait. The model summarizes your conversation, preserving key context while dropping intermediate reasoning. A 70,000-token session compresses to ~4,000. Do this proactively at 70% rather than reactively at 95%.
Auto-compaction triggered → you waited too long. At ~95% capacity, Claude auto-compacts. But by then you've spent several turns in degraded territory — vague answers, forgotten instructions, repeated questions about decisions you already made.
The force multiplier: custom compaction instructions in CLAUDE.md. Without them, compaction keeps what the model *thinks* matters. With them, it keeps what *you* know matters:
- The full list of modified files and their purposes
- Test commands run and their pass/fail results
- Architectural decisions with their reasoning
- Current task status and next steps
The difference shows up in the first response after compaction — Claude either picks up where it left off or asks you to repeat everything.
Code review before your agent acts on changes. Revdiff (249 stars) is a TUI diff reviewer that outputs structured annotations to stdout — meaning other tools, including Claude, can consume them programmatically.
The Claude Code plugin is the highlight. It launches Revdiff as a terminal overlay, you annotate the diff inline, and those annotations feed back into Claude's context. Instead of describing problems in chat ("line 42 has a type issue"), you point directly at the code.
Two plugin modes:
/revdiff— Review uncommitted changes, staged changes, or branch diffs. Your annotations return to Claude for action/revdiff-plan— Auto-opens when Claude exits plan mode. Annotate the plan before approving execution — a human gate between planning and implementation
brew install umputun/apps/revdiff
# Linux — grab the binary from GitHub releases
# Available for linux-amd64 and linux-arm64
Works with tmux, Zellij, kitty, wezterm, ghostty, iTerm2, and Emacs vterm — the plugin detects your multiplexer automatically.
Inside the TUI: vim-style / search with n/N navigation, intra-line word-diff highlighting (W toggle), blame gutter (B), hunk jumping, and review history that auto-saves to ~/.config/revdiff/history/. All-files mode (--all-files) lets you browse the entire repo, not just changed files.
Stop staring at your terminal waiting for Claude to finish. claude-notifications-go (529 stars) watches session activity via hooks and fires desktop notifications with click-to-focus — the notification switches you to the exact terminal pane where Claude needs you.
Six notification types:
| Type | Trigger | Hook |
|---|---|---|
| Task Complete ✅ | Main task finished | Stop |
| Review Complete 🔍 | Code review concluded | Stop (tool pattern match) |
| Question ❓ | Claude needs input | PreToolUse (AskUserQuestion) |
| Plan Ready 📋 | Plan awaiting approval | PreToolUse (ExitPlanMode) |
| Session Limit ⏱️ | Token limit hit | Stop |
| API Error 🔴 | Auth/rate/server issue | Stop |
Click-to-focus works with tmux, zellij, WezTerm, and kitty sessions — it navigates to the correct pane, not just the terminal window. On macOS, it works with Ghostty, iTerm2, VS Code, Warp, and others via accessibility APIs.
Webhook support routes notifications to Slack, Discord, Telegram, Microsoft Teams, or ntfy.sh. Zero dependencies — pure Go binary, auto-downloaded and cached on first run.
# Restart Claude Code, then configure sounds/webhooks:
# /claude-notifications-go:settings
Audit your prompt cache performance and identify whether the v2.1.100+ regression is costing you tokens.
Prerequisites: Any Claude Code installation for diagnosis. The fix step requires Claude Code installed via npm (npm install -g @anthropic-ai/claude-code), not the standalone binary.
What you'll do: Measure your cache hit rate, identify token waste, and optionally apply a fix.
Steps:
- Start a Claude session in a project you use regularly. Do some real work — file reads, edits, a few tool calls. After 5+ turns, check your cache health:
Look at cache_creation vs cache_read in the output. Healthy sessions show growing cache_read (tokens reused from prior turns) relative to cache_creation (tokens written fresh). If cache_creation dominates or stays high every turn, your cache is busting repeatedly.
- Check your version:
Versions 2.1.100 through 2.1.104 are known to carry the cache inflation regression. v2.1.98 is the last clean baseline.
- (npm users only) Install the cache fix:
Launch Claude with the fix loaded:
For a permanent setup, add to your shell profile or create an alias:
- Run a comparable session with the fix. After 5+ turns, check
/costagain and compare thecache_creationnumbers between your two sessions.
Expected outcome: With the fix applied, cache_creation_input_tokens drops by ~20,000 per turn. Your effective context window grows by the same amount — meaning Claude has more room for your actual conversation and less wasted on cache misses.
Verify: The /cost output should show lower cache_creation and higher cache_read ratios. If you enabled the quota monitor, check ~/.claude/quota-status.json for burn-rate data and cache hit percentages.