Claude Mastery
#11 · Monday Edition
⌨️ CLI POWER MOVE
⚠️ Breaking🔧 Try It
The Prompt Cache Tax — v2.1.100+ Burns 20K Extra Tokens Per Turn

If you upgraded past v2.1.98, you're paying a 40% token tax on every API call — and you can't see it without a proxy.

GitHub issue #46917 documents the regression with network-level evidence: v2.1.100 sends *fewer* bytes than v2.1.98 but consumes ~20,000 more cache_creation_input_tokens. Same conversation, same project. The inflation is server-side and version-specific — likely routed by User-Agent header.

The claude-code-cache-fix project (95 stars) identified three root causes:

  1. Partial block scatter — Attachment blocks (skills, MCP instructions, deferred tools) drift from messages[0] to later messages on resume, breaking the byte-identical cache prefix
  2. Fingerprint instability — The cc_version hash mixes metadata into its salt, so block shifts bust the cache key even when user content is unchanged
  3. Non-deterministic tool ordering — Tool definitions arrive in different orders between turns, invalidating cached prefixes

For Max plan users, this means hitting the 5-hour usage cap significantly faster. It's not just billing — those 20K extra tokens enter the context window, potentially diluting your CLAUDE.md instructions with hidden system content.

v2.1.104 shipped today (Apr 13) as a CHANGELOG-only update. Whether it carries the same inflation is unconfirmed.

Workarounds:

bash
# Pin to v2.1.98 (simplest — but auto-update may overwrite)
npx claude-code@2.1.98

# Install the fetch interceptor (requires npm-installed Claude Code)
npm install -g claude-code-cache-fix
# Then launch with preload:
NODE_OPTIONS="--import claude-code-cache-fix" claude

The fix intercepts globalThis.fetch before /v1/messages calls to relocate scattered blocks, sort tool definitions alphabetically, and recompute the fingerprint from user content only. It also adds quota monitoring via ~/.claude/quota-status.json.

Important: The fix only works with npm-installed Claude Code, not the standalone binary.

🏗️ AGENT ARCHITECTURE
🔥 New🔧 Try It
Persistent Agent Sessions with Claudraband's Daemon Mode

claude -p is stateless by design — every invocation starts fresh. Claudraband (175 stars, 105 HN points) wraps the Claude Code TUI to give you persistent, resumable sessions with programmatic control.

The key architecture decision: it wraps rather than replaces the TUI. OAuth, auth tokens, and security boundaries pass through to the official Claude Code client unchanged. All sessions run through authenticated TUI instances backed by tmux.

Three deployment models:

  1. Local tmux sessions — Interactive work with persistence. Resume anytime:
bash
cband continue <session-id> 'what did you decide about the auth refactor?'
  1. Daemon mode — HTTP API for remote or headless control. Your scripts drive Claude sessions via raw HTTP endpoints. Launch with cband daemon
  1. ACP server — Alternative Client Protocol for editor integration. Connect from Toad, Zed, or other ACP-aware editors

The non-interactive workflow is where it gets interesting for agent builders. Answer pending prompts programmatically with --select flags. Self-interrogate prior sessions — Claude analyzing decisions from a previous run. List and manage live sessions across your machine.

bash
npm install -g @halfwhey/claudraband
cband "analyze this codebase"
# Later:
cband continue <id> "now refactor the auth module"

Caveat: Bundles Claude Code v2.1.96. Override with CLAUDRABAND_CLAUDE_PATH=/path/to/your/claude to use your installed version. Single contributor, MIT licensed, 30 commits on master — solid for personal/ad-hoc use, not a production replacement for the SDK.

🧭 OPERATOR THINKING
🔧 Try It🌿 Evergreen
The Compaction Decision — /clear, /compact, or Keep Going?

Your session hits 70% context. Claude's responses are getting vague, referencing files it read 40 messages ago as if they're still in front of it. Three options, one right answer per situation.

Starting unrelated work → /clear. No point carrying dead context. A fresh session with a focused prompt beats a bloated one carrying 50K tokens of irrelevant history. If you need info from the previous task, copy the key decisions into your prompt — cheaper than dragging the whole conversation.

Same task, context getting heavy → /compact now. Don't wait. The model summarizes your conversation, preserving key context while dropping intermediate reasoning. A 70,000-token session compresses to ~4,000. Do this proactively at 70% rather than reactively at 95%.

Auto-compaction triggered → you waited too long. At ~95% capacity, Claude auto-compacts. But by then you've spent several turns in degraded territory — vague answers, forgotten instructions, repeated questions about decisions you already made.

The force multiplier: custom compaction instructions in CLAUDE.md. Without them, compaction keeps what the model *thinks* matters. With them, it keeps what *you* know matters:

markdown
When compacting, always preserve:
- The full list of modified files and their purposes
- Test commands run and their pass/fail results
- Architectural decisions with their reasoning
- Current task status and next steps

The difference shows up in the first response after compaction — Claude either picks up where it left off or asks you to repeat everything.

🌐 ECOSYSTEM INTEL
🔥 New🔧 Try It
Revdiff — Structured Diff Review with Claude Code Plugin

Code review before your agent acts on changes. Revdiff (249 stars) is a TUI diff reviewer that outputs structured annotations to stdout — meaning other tools, including Claude, can consume them programmatically.

The Claude Code plugin is the highlight. It launches Revdiff as a terminal overlay, you annotate the diff inline, and those annotations feed back into Claude's context. Instead of describing problems in chat ("line 42 has a type issue"), you point directly at the code.

Two plugin modes:

  • /revdiff — Review uncommitted changes, staged changes, or branch diffs. Your annotations return to Claude for action
  • /revdiff-plan — Auto-opens when Claude exits plan mode. Annotate the plan before approving execution — a human gate between planning and implementation
bash
# macOS
brew install umputun/apps/revdiff

# Linux — grab the binary from GitHub releases
# Available for linux-amd64 and linux-arm64

Works with tmux, Zellij, kitty, wezterm, ghostty, iTerm2, and Emacs vterm — the plugin detects your multiplexer automatically.

Inside the TUI: vim-style / search with n/N navigation, intra-line word-diff highlighting (W toggle), blame gutter (B), hunk jumping, and review history that auto-saves to ~/.config/revdiff/history/. All-files mode (--all-files) lets you browse the entire repo, not just changed files.

🌐 ECOSYSTEM INTEL
🔧 Try It🌿 Evergreen
claude-notifications-go — 529-Star Smart Notification Plugin

Stop staring at your terminal waiting for Claude to finish. claude-notifications-go (529 stars) watches session activity via hooks and fires desktop notifications with click-to-focus — the notification switches you to the exact terminal pane where Claude needs you.

Six notification types:

TypeTriggerHook
Task Complete ✅Main task finishedStop
Review Complete 🔍Code review concludedStop (tool pattern match)
Question ❓Claude needs inputPreToolUse (AskUserQuestion)
Plan Ready 📋Plan awaiting approvalPreToolUse (ExitPlanMode)
Session Limit ⏱️Token limit hitStop
API Error 🔴Auth/rate/server issueStop

Click-to-focus works with tmux, zellij, WezTerm, and kitty sessions — it navigates to the correct pane, not just the terminal window. On macOS, it works with Ghostty, iTerm2, VS Code, Warp, and others via accessibility APIs.

Webhook support routes notifications to Slack, Discord, Telegram, Microsoft Teams, or ntfy.sh. Zero dependencies — pure Go binary, auto-downloaded and cached on first run.

bash
curl -fsSL https://raw.githubusercontent.com/777genius/claude-notifications-go/main/bin/bootstrap.sh | bash
# Restart Claude Code, then configure sounds/webhooks:
# /claude-notifications-go:settings
🔬 PRACTICE LAB
🔧 Try It
Diagnose and Fix Your Prompt Cache Health

Audit your prompt cache performance and identify whether the v2.1.100+ regression is costing you tokens.

Prerequisites: Any Claude Code installation for diagnosis. The fix step requires Claude Code installed via npm (npm install -g @anthropic-ai/claude-code), not the standalone binary.

What you'll do: Measure your cache hit rate, identify token waste, and optionally apply a fix.

Steps:

  1. Start a Claude session in a project you use regularly. Do some real work — file reads, edits, a few tool calls. After 5+ turns, check your cache health:
/cost

Look at cache_creation vs cache_read in the output. Healthy sessions show growing cache_read (tokens reused from prior turns) relative to cache_creation (tokens written fresh). If cache_creation dominates or stays high every turn, your cache is busting repeatedly.

  1. Check your version:
bash
claude --version

Versions 2.1.100 through 2.1.104 are known to carry the cache inflation regression. v2.1.98 is the last clean baseline.

  1. (npm users only) Install the cache fix:
bash
npm install -g claude-code-cache-fix

Launch Claude with the fix loaded:

bash
NODE_OPTIONS="--import claude-code-cache-fix" claude

For a permanent setup, add to your shell profile or create an alias:

bash
alias claude='NODE_OPTIONS="--import claude-code-cache-fix" claude'
  1. Run a comparable session with the fix. After 5+ turns, check /cost again and compare the cache_creation numbers between your two sessions.

Expected outcome: With the fix applied, cache_creation_input_tokens drops by ~20,000 per turn. Your effective context window grows by the same amount — meaning Claude has more room for your actual conversation and less wasted on cache misses.

Verify: The /cost output should show lower cache_creation and higher cache_read ratios. If you enabled the quota monitor, check ~/.claude/quota-status.json for burn-rate data and cache hit percentages.