You start an agent on a complex task. Then you leave. Walk the dog. Cook dinner. The agent keeps working — and when it needs to run `rm -rf build/` or push to main, the approval request shows up on your phone.
That's Farmer. It's a standalone Node.js HTTP server that intercepts Claude Code's tool calls via the hook system and routes them to a web dashboard you can access from any device.
How it works. `farmer connect --global` writes four hooks into your `settings.json`:

- `PreToolUse` → `/hooks/permission` (blocking — waits for your approve/deny)
- `PostToolUse` → `/hooks/activity` (non-blocking — logs results)
- `Notification` → `/hooks/notification` (non-blocking — questions, messages)
- `Stop` → `/hooks/stop` (graceful shutdown signal)
The hooks use a `cat | curl ... || true` pattern — piping the hook JSON payload to curl. The `PreToolUse` call blocks (up to 120 seconds) until you tap approve or deny on the dashboard. If Farmer isn't running, `|| true` ensures Claude Code continues unblocked.
The deny-to-respond pattern. This is the clever part. When Claude calls `AskUserQuestion`, Farmer shows the question on your dashboard. You type an answer. Farmer responds with `permissionDecision: "deny"` and puts your answer in `permissionDecisionReason`. Claude reads the denial reason as your response. It hijacks the permission protocol to create a bidirectional communication channel — from your phone to a running agent session.
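The response envelope follows Claude Code's `PreToolUse` hook output schema; a plausible shape (the exact payload Farmer emits may differ):

```json
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "dark mode with blue accents"
  }
}
```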
Three trust tiers, configurable per session:
| Tier | Behavior |
|---|---|
| `paranoid` | Every tool call requires manual approval — including Read, Grep, Glob |
| `standard` | Auto-approves read-only tools; blocks writes, deletes, and network calls |
| `autonomous` | Auto-approves everything EXCEPT dangerous patterns: `rm -rf`, `git push --force`, `sudo`, piped installs |
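The tier logic reduces to a small decision function. A minimal sketch (illustrative Python, not Farmer's actual code; tool names and patterns are taken from the table above):

```python
# Illustrative sketch of Farmer's three trust tiers.
import re

READ_ONLY_TOOLS = {"Read", "Grep", "Glob"}
DANGEROUS_PATTERNS = [
    r"\brm\s+-rf?\b",
    r"\bgit\s+push\s+.*--force\b",
    r"\bsudo\b",
    r"\bcurl\b.*\|\s*sh\b",  # piped installs
]

def decide(tier: str, tool: str, command: str = "") -> str:
    """Return 'allow' (auto-approve) or 'ask' (block for human approval)."""
    if tier == "paranoid":
        return "ask"  # every tool call needs a human, even Read/Grep/Glob
    if any(re.search(p, command) for p in DANGEROUS_PATTERNS):
        return "ask"  # dangerous patterns block even in autonomous mode
    if tier == "autonomous":
        return "allow"
    # standard: read-only tools auto-approve; everything else blocks
    return "allow" if tool in READ_ONLY_TOOLS else "ask"
```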
Trust tier inheritance prevents the "paranoid reset" problem. When Claude Code spawns a subagent via `claude -p`, the new session inherits the trust level from the most recent active session. Without this, every headless invocation would default to paranoid and block immediately.
Stale server guard. If all dashboard SSE connections drop (you closed the tab, lost signal), Farmer auto-approves everything rather than letting Claude Code block indefinitely. Pragmatic design: agent liveness over security-by-default when the human is unavailable.
The entire server — HTTP, SSE, crypto, rate limiting, CSRF, audit logging — is built on Node.js built-ins. Zero npm dependencies. Single-file dashboard. `npm install -g @grainulation/farmer` and you're running.
What it is not. Not an MCP server. Not a plugin. It's a hook consumer — the same integration point you'd use for any external system that needs to participate in Claude Code's permission lifecycle. 129 commits, MIT licensed, actively developed. Two stars — brand new. The architecture is solid but it hasn't been battle-tested at scale.
You know how to use subagents. You know the frontmatter format, the Agent() tool, the five built-in types. But there's a deeper question: why do fork agents exist at all?
The answer, buried in Claude Code's source, is not architectural. It's economic. Fork agents exist to exploit Anthropic's prompt cache.
The cache exploitation mechanism. When Claude Code forks a subagent, the child inherits the parent's exact rendered system prompt bytes, exact tool array, and exact message history. Every byte is designed for identical prefixes. This is not a side effect — it's the primary design constraint.
Here's the math. A typical parent session has ~48,500 tokens of shared prefix (system prompt + tools + conversation). The first child pays full price for those tokens. But children 2, 3, 4, and 5? They hit a prefix cache. Anthropic charges 10% for cached input tokens. That's a 90% cost reduction on the shared prefix for every child after the first.
```
Child 1: 48,500 tokens × $3.00/1M = $0.146   (full price)
Child 2: 48,500 tokens × $0.30/1M = $0.015
Child 3: 48,500 tokens × $0.30/1M = $0.015
Child 4: 48,500 tokens × $0.30/1M = $0.015
Child 5: 48,500 tokens × $0.30/1M = $0.015

5 children: $0.206 vs $0.730 without caching — 72% savings
```
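The arithmetic above checks out in a few lines (assuming Sonnet-class pricing of $3.00/1M input tokens and $0.30/1M cache reads):

```python
# Back-of-envelope check of the fork-cache math.
PREFIX_TOKENS = 48_500

full_rate = 3.00 / 1_000_000     # $ per uncached input token
cached_rate = 0.30 / 1_000_000   # $ per cached input token (10%)

first_child = PREFIX_TOKENS * full_rate    # pays full price, warms the cache
later_child = PREFIX_TOKENS * cached_rate  # 90% discount on the shared prefix

with_cache = first_child + 4 * later_child  # children 1 through 5
without_cache = 5 * first_child
savings = 1 - with_cache / without_cache    # roughly 0.72
```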
For Max plan users, the economics work differently — you don't pay per-token, but cache hits still reduce latency (cached prefixes process faster) and reduce load on Anthropic's infrastructure, which means fewer rate-limit hits.
The architectural compromises this creates. Every design decision in the fork agent system traces back to preserving byte-identical prefixes:
- The Agent tool stays in the child's tool pool — even though children are forbidden from spawning their own agents. Removing it would change the serialized tool array and bust the cache.
- Placeholder tool results are used instead of real ones. When the parent passes conversation context to a child, certain tool results are replaced with placeholders that are byte-identical across all children.
- The session date is memoized. If midnight passes during a long session, the date doesn't update — because changing the date string in the system prompt would invalidate the entire cached prefix. A stale date is cosmetic; a cache bust reprocesses the entire conversation.
- Sticky latch fields. Five boolean flags that, once set to `true`, never revert to `false` for the lifetime of the session. Same reason: flipping a flag would change the system prompt bytes.
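Both the memoized date and the sticky latch are trivial to sketch (illustrative Python, not Anthropic's code): compute once, then refuse to change, so the rendered prompt bytes stay stable for the whole session.

```python
from datetime import date
from functools import lru_cache

@lru_cache(maxsize=1)
def session_date() -> str:
    # Evaluated once; later calls return the memoized value, even past midnight.
    return date.today().isoformat()

class StickyLatch:
    """A boolean that can go False -> True but never back."""
    def __init__(self) -> None:
        self._value = False

    def set(self) -> None:
        self._value = True

    def unset(self) -> None:
        pass  # deliberately a no-op: reverting would change the prompt bytes

    @property
    def value(self) -> bool:
        return self._value
```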
The 15-step runAgent() lifecycle (from claude-code-from-source) reveals the full sequence: routing → model resolution → context preparation → permission isolation → tool pool assembly → system prompt rendering → hook registration → skill preloading → MCP initialization → execution → cleanup. Steps 4-6 are where cache alignment happens — permission isolation strips sensitive state, tool pool assembly preserves forbidden tools, and system prompt rendering produces byte-identical output.
Recursive fork prevention uses a dual guard: a fast querySource check (is this already a fork?) plus a fallback message history scan for the fork boilerplate tag. The belt-and-suspenders approach exists because a single guard caused infinite fork loops in production — one of five documented "death spiral" guards, each added because someone hit that failure mode.
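The dual guard can be sketched like this (the `FORK_TAG` marker and `query_source` values are illustrative names, not the real identifiers):

```python
FORK_TAG = "<fork-boilerplate>"  # hypothetical marker injected into fork histories

def can_spawn_fork(query_source: str, history: list[str]) -> bool:
    # Fast guard: a fork is never allowed to fork again.
    if query_source == "fork":
        return False
    # Fallback guard: scan the message history for the fork boilerplate tag,
    # catching cases where query_source was lost or mislabeled.
    if any(FORK_TAG in message for message in history):
        return False
    return True
```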
What this means for your agent design. If you're building multi-agent systems with claude -p, the cache exploitation pattern applies to you. Design your orchestrator so that all child invocations share a common prefix: same system prompt, same tool configuration, same conversation preamble. Vary only the final user message. The cache TTL is 5 minutes — launch children within that window to maximize hits.
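For `claude -p` orchestration, the rule reduces to: build every child prompt from one frozen preamble and vary only the tail. A sketch (the preamble content and tasks are hypothetical):

```python
import subprocess

# Frozen preamble: identical bytes for every child maximize prefix-cache hits.
PREAMBLE = (
    "You are a code-review worker. Report findings as a markdown list.\n"
    "Repository conventions: see CONTRIBUTING.md.\n"
)

def child_prompt(task: str) -> str:
    # Only the tail varies; the shared prefix stays byte-identical.
    return PREAMBLE + "\nTask: " + task

def spawn(task: str) -> subprocess.Popen:
    # Launch children within the ~5-minute cache TTL window.
    # (Defined for illustration; not invoked here.)
    return subprocess.Popen(["claude", "-p", child_prompt(task)])

tasks = ["audit error handling", "check test coverage", "review docs"]
prompts = [child_prompt(t) for t in tasks]
assert all(p.startswith(PREAMBLE) for p in prompts)  # byte-identical prefix
```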
Two independent analyses — Anthropic's internal memory system (documented in claude-code-from-source) and the community-built Collabmem — arrived at the same architectural conclusion: flat files on disk, searched by an LLM, beat vector databases and embedding-based retrieval.
This is counterintuitive. RAG is the default architecture for AI memory. Embed everything, store vectors, retrieve by cosine similarity. But both systems chose plain markdown files. Here's why.
The derivability test. Claude Code's memory system uses a four-type taxonomy (user, feedback, project, reference) with a strict filter: if knowledge can be re-derived from the current codebase, it is excluded. Code patterns, architecture, file structure, recent changes — all excluded. The memory stores only what cannot be recovered by reading files or running git log.
This is a profound design choice. Most memory systems try to store everything and retrieve intelligently. Claude Code's approach inverts this: store almost nothing, but store the right things. The result is a memory that never becomes a stale parallel copy of information better sourced from its origin.
LLM-powered recall beats embeddings. Instead of vector similarity search, Claude Code uses a Sonnet side-query to read frontmatter manifests and select up to 5 relevant memories per turn. The recall runs as an async prefetch in parallel with the main model call.
Why not embeddings? One word: negation. The embedding for "do NOT mock databases in tests" is nearly identical to "mock databases in tests." Cosine similarity can't distinguish them. An LLM can. This isn't a theoretical concern — it was validated through evals (3/3 correct recall with LLM, 0/3 with embedding-based retrieval on negated instructions).
Collabmem's awareness-over-retrieval philosophy arrives at the same destination from a different starting point. Instead of treating memory as a search problem (store then retrieve), Collabmem keeps compact index tables always loaded in the context window. The indexes create continuous awareness — the model's attention mechanism naturally matches relevant topics before any explicit retrieval happens.
```mermaid
graph TD
    A[New Information] --> B{Can it be derived<br/>from codebase?}
    B -->|Yes| C[Don't Store]
    B -->|No| D{What type?}
    D --> E[user: role, preferences]
    D --> F[feedback: corrections, confirmations]
    D --> G[project: decisions, deadlines]
    D --> H[reference: external pointers]
    E & F & G & H --> I[Markdown file + frontmatter]
    I --> J[LLM reads manifest at recall time]
    J --> K[Up to 5 memories selected per turn]
```
Two-tier context management. Collabmem divides memory into:
- Tier 1 (always loaded): Compact indexes and current state (~5,000 chars each). These live in the context window permanently via `@import` directives in CLAUDE.md.
- Tier 2 (searched on demand): Full notes, how-tos, domain knowledge. Grow without limit. Accessed via `index → grep → read` when the model's attention identifies a relevant topic from Tier 1.
Episodic memory is append-only. Notes are never rewritten. Old notes about code that was later changed remain as history — the reasoning, the alternatives considered, the decisions made. When knowledge matures, it's extracted upward into the "world model" (current truth). When the world model gets too large, compressed knowledge flows back down as episodic notes. Nothing is ever deleted.
The sentinel token pattern. Collabmem uses three explicit words — `readmem`, `updatemem`, `maintainmem` — that trigger memory operations when they appear in your message. The methodology document (imported into CLAUDE.md) instructs the model to act on these with MUST-level priority. It's a creative alternative to MCP tools: no infrastructure, no API calls, just words that activate behavior through instruction following.
Practical takeaway. If you're designing memory for your own agents, both systems validate the same principles:
- Store only what can't be re-derived. Your codebase is the source of truth for code patterns. Memory is for decisions, preferences, and external references.
- Use LLMs for recall, not embeddings. Especially when your instructions contain negations or nuanced conditions.
- Keep indexes in context, details on disk. Awareness is cheap (compact indexes). Full retrieval is expensive (loading entire files). Let the model decide when to drill down.
- Separate "what happened" from "what's true now." Episodic memory (append-only history) and world model (maintained current state) serve different purposes and have different staleness characteristics.
When Claude Code asks "Allow Bash: rm -rf build/?" — that question didn't come from Claude. The model never decided whether to ask. The permission system is built on purely deterministic code. No probabilistic model judgment anywhere.
This is a deliberate architectural decision documented in a detailed analysis by Raed: Anthropic trusts Claude to write code but does not trust Claude to decide whether it should be allowed to run that code. The permission model is classical RBAC, not AI-driven.
The 5-step permission decision pipeline executes sequentially before any tool runs:
```mermaid
graph TD
    A[Tool Call Requested] --> B[1. Tool-level deny/ask rules<br/>glob pattern matching]
    B --> C[2. Tool checkPermissions<br/>per-tool deterministic code]
    C --> D[3. Bypass-immune safety<br/>hardcoded restrictions]
    D --> E[4. Bypass mode evaluation<br/>only if nothing above triggered]
    E --> F[5. Tool-level allow rules<br/>with default fallback to prompt]
    F --> G{Decision}
    G -->|allow| H[Execute Tool]
    G -->|deny| I[Block Tool]
    G -->|ask| J[Prompt User]
```
The ordering is the design. Denials and safety checks run before any bypass can take effect. The bypass literally cannot fire until all immune checks have had their say.
The Bash tool's 6-stage pipeline is the most complex in the system:
- Compound command splitting — `npm install && rm -rf /` is split into two separate subcommands, each validated independently. Chaining with `&&`, `;`, or `||` doesn't bypass checks.
- Safe wrapper stripping — removes command wrappers before validation.
- Rule matching per subcommand — your configured deny/ask/allow rules checked against each piece.
- 23 independent security validators — covering command substitution patterns, zsh-specific dangerous builtins, IFS injection, brace expansion, Unicode whitespace tricks, and more.
- Path constraint checks — validates target paths against allowed/denied patterns.
- Sed/mode validation — special handling for `sed` commands and file permission changes.
Multi-representation command analysis. The bash tool pre-computes four representations of every command to prevent quote-based evasion:
| Representation | Example |
|---|---|
| Raw unchanged | `bash -c "rm '$target'"` |
| Double-quotes stripped | `bash -c rm '$target'` |
| Fully unquoted | `bash -c rm $target` |
| Quote-chars preserved | `bash -c " ' '"` |
Each representation runs through the full validation pipeline. If any representation triggers a denial, the command is blocked. This is why v2.1.98 patched six bash permission bypasses — earlier versions had gaps in the representation coverage.
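The idea can be sketched in a few lines (the canonicalization rules here are simplified approximations, not the real pipeline's logic):

```python
# Simplified sketch of multi-representation command checking.
import re

def representations(cmd: str) -> list[str]:
    return [
        cmd,                                      # raw, unchanged
        cmd.replace('"', ""),                     # double quotes stripped
        cmd.replace('"', "").replace("'", ""),    # fully unquoted
        re.sub(r'[^"\']+', " ", cmd),             # only quote chars preserved
    ]

def is_blocked(cmd: str, deny_patterns: list[str]) -> bool:
    # Block if ANY representation matches ANY deny rule.
    return any(
        re.search(pattern, rep)
        for rep in representations(cmd)
        for pattern in deny_patterns
    )
```

Quote-splitting evasion like `rm" "-rf /` fails here because the double-quote-stripped representation reads `rm -rf /`, which matches the deny rule even though the raw string does not.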
Bypass-immune paths. These are hardcoded restrictions that cannot be overridden by any configuration, mode, or setting:
- `.git/` directory writes
- `.claude/` directory writes
- `.vscode/` directory writes
- Shell configuration file modifications (`.bashrc`, `.zshrc`, etc.)
- Tools requiring user interaction
No matter how permissive your configuration, these paths always prompt. They are architectural walls, not configurable signs.
The auto mode exception — TRANSCRIPT_CLASSIFIER. There is exactly one place where an LLM enters the permission pipeline:
- Gated behind the `TRANSCRIPT_CLASSIFIER` feature flag
- Activates only as a fallback when deterministic rules reach an "ask" state and the user has opted into auto mode
- The classifier reviews the conversation transcript to decide allow/deny
- Every error path fails closed: API errors → deny. 3 consecutive denials → revert to human prompting. 20 total denials → reset counter, prompt human. Context window overflow → prompt human. The system never defaults to "allow" on error.
What this means for operators. Understanding that permissions are deterministic changes how you configure them. You're not "training" Claude to be more permissive — you're writing rules in a pattern-matching engine. The `permissions.deny` field in `settings.json` overrides everything, including hook decisions (fixed in v2.1.101). The `permissions.allow` field only fires after all safety checks pass. If you want to understand why a specific tool call gets blocked, trace it through the 5-step pipeline — the answer is always deterministic.
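In `settings.json` that precedence looks roughly like this (rule syntax follows Claude Code's documented `Tool(specifier)` pattern; the specific rules are examples, not recommendations):

```json
{
  "permissions": {
    "deny": [
      "Bash(rm -rf:*)",
      "Read(.env)"
    ],
    "allow": [
      "Bash(npm run test:*)"
    ]
  }
}
```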
In March 2026, Anthropic accidentally published Claude Code's source as unstripped source maps in the npm package. 512,000 lines of TypeScript across ~1,900 files became publicly readable. The source maps have since been stripped, but alejandrobalderas/claude-code-from-source captured the architecture before the window closed.
The result: an 18-chapter, book-length clean-room analysis. 1,076 stars. No actual source code reproduced — all examples are original pseudocode with different variable names. What it documents is the architecture, patterns, and design decisions behind Claude Code's internals.
What's in the book. Seven parts covering:
- Foundations (Ch 1-4): The 6-abstraction architecture (Query Loop, Tool System, Tasks, State, Memory, Hooks). Two-tier state (mutable singleton for infrastructure, reactive store for UI). Bootstrap pipeline with 50+ profiling checkpoints.
- The Core Loop (Ch 5-7): The ~1,730-line `async function*` generator that is the single code path for ALL interactions — REPL, SDK, sub-agents, headless. Four-layer context compression. 14-step tool execution pipeline.
- Multi-Agent (Ch 8-10): Fork agent cache exploitation (covered above). 15-step `runAgent()` lifecycle. Six built-in agent types.
- Persistence (Ch 11-12): File-based memory over RAG (covered above). KAIROS mode. Two-phase skill loading.
- Interface (Ch 13-14): Custom terminal UI with packed typed arrays, cell-level diffing at 60fps, React Compiler for auto-memoization.
- Connectivity (Ch 15-16): MCP with 8 transport types. Full OAuth 2.0 + PKCE. Content-signature deduplication.
- Performance (Ch 17-18): The 8K output slot optimization. 26-bit bitmap pre-filter for file search. Sticky latch fields.
Five novel insights you won't find elsewhere:
- The withholding pattern. Recoverable API errors are suppressed from the message stream because SDK consumers (Cowork, desktop app) disconnect on any error message. Recovery happens silently. Errors surface only when all recovery paths fail. This explains why Claude Code sometimes pauses briefly during a session — it's retrying an API call behind the scenes without telling you.
- The 8K output optimization. Production data shows the p99 output length is 4,911 tokens, but the default SDK reservation is 32K-64K. Claude Code caps at 8K and escalates on truncation. This alone recovers 12-28% of the usable context window — free context space that most API users are wasting.
- Speculative tool execution. Tools start executing while the model is still streaming its response. Child abort controllers with sibling error cascades (only Bash errors cascade to siblings). This is why tool results appear almost instantly after the model finishes — the tool was already running.
- Concurrent-safe tool partitioning. `isConcurrencySafe(input)` receives the parsed input, not just the tool type. `Bash("ls -la")` is safe to run concurrently. `Bash("rm -rf build/")` is not. The partition algorithm groups consecutive safe tools into concurrent batches but always yields results in submission order.
- The sed simulation in BashTool. When you approve a `sed` command, the system pre-executes it in a sandbox and caches the result. The actual execution applies the pre-computed edit directly — preventing TOCTOU (time-of-check-time-of-use) issues between the preview you approved and what actually runs.
How to use it. The book is browsable as an Astro website or readable as markdown in the repo. The "Apply This" sections at the end of each chapter distill transferable patterns for your own agent builds. Start with Part III (Multi-Agent) if you're building agent orchestration, Part IV (Persistence) if you're designing memory systems.
Time: 45-60 minutes | What you'll build: A remote approval dashboard for autonomous agents, accessible from your phone, with trust tiers and bidirectional communication.
Why This Matters
Autonomous agents need supervision. But staring at a terminal waiting for permission prompts is a waste of your time. The ideal setup: agents work in the background, and approval requests come to you on whatever device you're carrying. This lab builds exactly that using Farmer and Claude Code's hook system.
Prerequisites
- Node.js >= 20
- Claude Code installed
- A phone on the same network (or Tailscale for remote access)
Part 1: Install and Connect (10 minutes)
```shell
farmer connect --global
farmer start
```
The connect command writes four hooks into `~/.claude/settings.json`. Open that file and verify they're there: you should see `PreToolUse`, `PostToolUse`, `Notification`, and `Stop` hooks pointing to `http://127.0.0.1:9090/hooks/*`.
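The entries follow Claude Code's hook settings schema. A rough sketch of one of them (the exact command string Farmer writes, including its curl flags, may differ; the other three events have the same shape):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "cat | curl -s --max-time 120 -X POST -d @- http://127.0.0.1:9090/hooks/permission || true"
          }
        ]
      }
    ]
  }
}
```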
Open `http://localhost:9090` in your browser. You should see the Farmer dashboard with a session panel. Note the auth token URL displayed in the terminal output — you'll need this for phone access.
Part 2: Test Basic Approval Flow (10 minutes)
Open a new terminal and start a Claude Code session — any prompt that triggers one read-only and one write tool call will do, for example `claude -p "list the files here, then create test-farmer.txt"`.
Watch the Farmer dashboard. You should see tool call cards appearing:
- A `Bash` call for `ls` — in `standard` trust mode, this auto-approves (read-only)
- A `Write` call for `test-farmer.txt` — this blocks, waiting for your approval
Approve the Write call. The agent completes. Now you understand the basic flow.
Part 3: Test the Deny-to-Respond Pattern (10 minutes)
Start an interactive session with plain `claude`.
Tell Claude: "Ask me what color scheme I prefer for the project."
Claude will call `AskUserQuestion`. On the Farmer dashboard, you'll see the question with a text input field. Type your answer (e.g., "dark mode with blue accents").
Watch what happens: Farmer sends back `permissionDecision: "deny"` with your answer as the `permissionDecisionReason`. Claude reads the denial reason as your response and continues working with your preference.
This is bidirectional communication through the permission protocol. You just answered a running agent from a web browser.
Part 4: Configure Trust Tiers (10 minutes)
Farmer supports three trust levels. Experiment with each:
Paranoid mode — open the dashboard settings and switch to `paranoid`. Start a new Claude session. Even `Read` and `Grep` calls will block for approval. This is useful for sensitive environments but impractical for daily work.
Autonomous mode — switch to `autonomous`. Start a new session. Most operations auto-approve. Try asking Claude to run `sudo apt update` or `git push --force` — these trigger the dangerous-pattern regexes and block even in autonomous mode:

```
/\brm\s+-rf?\b/
/\bgit\s+push\s+.*--force\b/
/\bsudo\b/
/\bcurl\b.*\|\s*sh\b/
```
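You can check the patterns directly; Python's `re` module accepts the same expressions minus the surrounding slashes:

```python
# The four dangerous-pattern regexes, exercised against sample commands.
import re

DANGEROUS = [
    r"\brm\s+-rf?\b",
    r"\bgit\s+push\s+.*--force\b",
    r"\bsudo\b",
    r"\bcurl\b.*\|\s*sh\b",
]

def is_dangerous(command: str) -> bool:
    return any(re.search(p, command) for p in DANGEROUS)
```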
Standard mode (default) — the sweet spot for most work. Read-only tools auto-approve; writes require approval.
Part 5: Phone Access Setup (10 minutes)
Same network (LAN/Tailscale):
Find your machine's IP (Tailscale IP if using Tailscale):

```shell
# LAN IP — platform-dependent; e.g. on Linux:
hostname -I
# or for Tailscale:
tailscale ip -4
```
On your phone's browser, navigate to `http://<machine-ip>:9090`. Use the auth token from the Farmer startup output. You should see the same dashboard — now you can approve/deny from your couch.
Security note: Farmer uses token-based authentication with HMAC-signed invite links, CSRF protection, and rate limiting. The admin token is stored with 0600 permissions. For internet access, use a tunnel (Tailscale, Cloudflare Tunnel) rather than exposing port 9090 directly.
Part 6: Subagent Trust Inheritance Test (5 minutes)
This is the critical test for agent fleet operators. Set trust to autonomous, then start a session and give Claude a task that spawns subagents (for example, a multi-file codebase exploration).
Watch the dashboard. The subagents spawned by Claude Code should inherit the autonomous trust level from the parent session — not reset to paranoid. Verify by checking that read-only operations (Grep, Glob) auto-approve without blocking.
Verification
You've successfully built the system if:
- [ ] Farmer dashboard loads on localhost:9090
- [ ] Tool calls appear as approval cards in the dashboard
- [ ] You can approve/deny Write and Bash calls
- [ ] The deny-to-respond pattern works with AskUserQuestion
- [ ] Trust tiers change approval behavior correctly
- [ ] Autonomous mode blocks dangerous patterns but allows safe operations
- [ ] Phone access works via LAN or Tailscale IP
- [ ] Subagents inherit trust levels from parent sessions
Architecture Reflection
What you've built is a human-in-the-loop gate using only Claude Code's public hook API. No MCP server. No plugin. No modifications to Claude Code itself. The key architectural insight: hooks are the right primitive for permission interception, while MCP servers are the right primitive for capability extension. Farmer doesn't add capabilities to Claude — it adds a human gate on existing capabilities.
The deny-to-respond pattern is particularly worth studying. It repurposes a control channel (permission decisions) as a data channel (user answers). This kind of protocol-level creativity is what separates tools that work with a system from tools that fight against it.