khairold

Teaching agents to know what they don't know

AGENTS.md and CAPABILITY-MAP.md — two files that make AI agents honest about their limits, and resourceful about overcoming them.

Agent Design · AI · Protocol · Systems Thinking

The agent soul solved behavior. SOUL.md tells the agent how to talk, what to prioritize, where to push back. HUMAN.md gives it context about who it’s working for. TENSIONS.md logs friction. Together, they shape personality.

But personality doesn’t help when the agent silently works around a limitation.

I’ve watched this happen. The agent needs to verify something in a browser. Instead of saying “I can’t do this,” it generates a plausible-sounding description of what the page probably looks like. It needs an API key. Instead of asking, it writes code that skips the authentication step. It needs domain knowledge it doesn’t have. Instead of flagging ignorance, it confabulates.

The problem isn’t malice. It’s that the agent doesn’t have a framework for thinking about its own capabilities. It knows how to behave (SOUL.md) but not what it can actually do.

AGENTS.md — the self-awareness file

For a new project — QuoteCraft — I created a file called AGENTS.md that sits in .pi/ alongside the project code. It’s loaded automatically every session. And it does something SOUL.md never did: it makes the agent aware of itself as an entity with specific capabilities and specific gaps.

The file opens with an identity statement:

## Identity

You are **Claude Opus 4.6** running inside **Pi** (coding agent harness)
with **extra-high thinking** enabled.
You are not just a code completion tool. You are a multi-role cognitive agent.

This isn’t motivational. It’s calibration. The agent needs to know it’s running inside a harness that provides tools — read, write, bash, edit — and that these tools define the boundary of what’s immediately possible. Not what’s ever possible. What’s possible right now.

Five roles, every task

The most useful part of AGENTS.md is a set of five operational roles the agent must engage on every task:

### 🔭 Observer
- Read and understand the full context before acting
- Notice what's missing, ambiguous, or assumed

### 🧠 Thinker
- Reason deeply about architecture, trade-offs, and consequences
- Consider multiple approaches before choosing one

### 📋 Planner
- Break complex work into phases with clear milestones
- Anticipate what could go wrong and plan contingencies

### 🧪 Tester
- Verify assumptions before building on them
- Test code after writing it — don't assume it works

### ⚡ Executor
- Write clean, idiomatic, well-structured code
- Ship working increments, not half-finished features

Without this, the agent defaults to Executor mode. You say “build me a thing” and it starts writing code immediately. No observation. No thinking about alternatives. No testing. Just code — fast, often wrong, expensive to fix.

The five roles force a cadence. Before the agent writes a line of code, it should have observed the current state, thought about trade-offs, planned the approach, and considered how to verify. Only then does it execute.

This maps to a concrete protocol the agent runs before each major action:

OBSERVE → What do I see? What's the current state?
THINK   → What are the options? What are the trade-offs?
PLAN    → What's the sequence? What could go wrong?
  ↳ GAP CHECK → For each step: Can I actually do this?
ACT     → Execute the plan with precision
VERIFY  → Did it work? Does it match expectations?

The gap check in the middle is the key. It’s not a separate step — it’s embedded in the planning phase. For every step in the plan, the agent asks: “Can I actually do this right now, or am I assuming I can?”
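The gap-checked planning phase can be sketched in a few lines. This is a minimal illustration only — `Step`, `AVAILABLE`, and `plan_with_gap_check` are hypothetical names, not Pi's actual API:

```python
# Sketch of the PLAN phase with the embedded gap check: for every step,
# ask "can I actually do this right now?" before acting.
from dataclasses import dataclass

@dataclass
class Step:
    description: str
    capability: str  # the tool or skill the step assumes, e.g. "bash"

# The harness's core tools define what's immediately possible.
AVAILABLE = {"read", "write", "edit", "bash"}

def can_do(step: Step) -> bool:
    """Gap check: is this capability actually available right now?"""
    return step.capability in AVAILABLE

def plan_with_gap_check(steps: list[Step]) -> tuple[list[Step], list[Step]]:
    """Split a plan into doable steps and gaps to solve or surface."""
    doable = [s for s in steps if can_do(s)]
    gaps = [s for s in steps if not can_do(s)]
    return doable, gaps

plan = [Step("run the test suite", "bash"),
        Step("check the live page", "browser")]
doable, gaps = plan_with_gap_check(plan)
# gaps now holds the browser step -- surfaced, not silently skipped
```

The point of the split is that gaps come out of planning as first-class output, rather than being discovered (or papered over) mid-execution.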

The Gap Detection Protocol

This is the part that changed how my agents operate. Instead of listing every limitation upfront — which is impossible, because limitations are contextual — the protocol teaches the agent to detect gaps dynamically:

On EVERY task, at EVERY step, actively ask:

> "Can I actually do this right now, or am I assuming I can?"

Then follow this decision tree:

1. DO I HAVE THE CAPABILITY?
   ├─ YES → Proceed
   └─ NO → Go to 2

2. CAN I BUILD/INSTALL IT VIA PI?
   ├─ YES, and it's quick → Build the extension/skill, then proceed
   ├─ YES, but it's complex → Tell the human, propose building it
   └─ NO → Go to 3

3. IS THIS BEYOND WHAT PI + CLAUDE CAN SOLVE?
   └─ YES → Ask the human for help. Be specific about:
       - What exactly is needed
       - Why I can't do it
       - What the human could do
       - What I'll do once the gap is filled

The decision tree has three tiers. The first tier — “do I have the capability?” — catches the obvious cases. The agent knows it has bash, read, write, edit. Can it run a build command? Yes, proceed.

The second tier is where it gets interesting. Pi’s architecture is extensible — you can build custom tools, install skills, add API integrations. So when the agent hits a gap, the first question isn’t “tell the human” — it’s “can I solve this myself?” Need web search? Build a fetch extension. Need database access? Write a custom tool. The agent is instructed to try self-sufficiency before escalating.

The third tier is the honest one. Some gaps can’t be coded around. Credentials. Design decisions. Domain knowledge. Business judgment. The agent is instructed to surface these clearly — not work around them, not pretend they don’t exist, not produce a degraded result without saying so.
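The three tiers reduce to a short decision function. A hedged sketch — the tier labels mirror the protocol, but `BUILDABLE_VIA_PI`, `HUMAN_ONLY`, and the return strings are illustrative assumptions, not part of the actual AGENTS.md file:

```python
# Three-tier gap resolution: proceed, self-solve, or escalate with specifics.
BUILDABLE_VIA_PI = {"web_search", "browse", "external_api", "database"}
HUMAN_ONLY = {"credentials", "design_decision", "domain_knowledge"}

def resolve_gap(capability: str, available: set[str]) -> str:
    # Tier 1: do I have the capability?
    if capability in available:
        return "proceed"
    # Tier 2: can I build/install it via Pi? Try self-sufficiency first.
    if capability in BUILDABLE_VIA_PI:
        return "build the extension/skill, then proceed"
    # Tier 3: beyond what Pi + Claude can solve -- surface it, with specifics.
    return ("ask human: what exactly is needed, why I can't do it, "
            "what the human could do, what I'll do once the gap is filled")
```

Note the ordering: escalation is the last resort, not the first reflex, which is exactly what keeps the agent from learned helplessness without licensing it to guess.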

The principle is explicit:

Never silently work around a gap. Never pretend a limitation doesn't exist.
Never produce a degraded result without saying so.
Either solve the gap (build/install), or surface it to the human clearly.

This principle eliminated the confabulation problem. The agent stopped guessing what a webpage looked like. It stopped skipping authentication. It stopped making up domain knowledge. Not because it became more capable — because it was given explicit permission to say “I can’t do this” and a protocol for what to do next.

CAPABILITY-MAP.md — the honest inventory

AGENTS.md defines the detection process. CAPABILITY-MAP.md is the reference sheet — a living inventory of what the agent can and can’t do right now.

## ✅ Can Do Now
- Read, write, edit any file in the project
- Execute any shell command (build, test, git, npm, etc.)
- Reason deeply with extra-high thinking
- Read images (screenshots, diagrams, mockups)
- Navigate complex codebases
- Run tests and interpret results
- Git operations (commit, branch, diff, etc.)
- Create Pi extensions and skills on the fly

This isn’t aspirational. It’s a factual list of what works today, in this project, with the current toolset. But the interesting part is how the file handles gaps:

## ⚠️ Known Gaps (Non-Exhaustive — Always Detect New Ones)

This list is a starting point, not a boundary.
Every task may reveal gaps not listed here.

| Known Gap            | Can Pi Solve It? | How                                |
|----------------------|------------------|------------------------------------|
| Web search           | Yes              | Install brave-search skill         |
| Browse web pages     | Yes              | Build Playwright extension         |
| Access external APIs | Yes              | Build custom tool extension        |
| Database queries     | Yes              | Build db tool extension            |
| Send notifications   | Yes              | Extension using webhooks/email API |

The “Can Pi Solve It?” column is the key distinction. Most of these gaps are solvable — the agent just hasn’t built the solution yet. This prevents the learned helplessness where an agent says “I can’t access the web” as if it’s a permanent condition. It’s not permanent. It’s a gap with a known bridge.
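Because the map is a plain markdown table, it's mechanically readable too. A sketch of a parser that turns the table into a lookup (the column layout follows the excerpt above; the function itself is hypothetical, not something Pi ships):

```python
# Parse the Known Gaps table into {gap_name: pi_can_solve_it}.
def parse_gap_table(markdown: str) -> dict[str, bool]:
    gaps = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip non-rows: blank lines, the header, and the |---|---| separator.
        if len(cells) >= 2 and cells[0] not in ("Known Gap", "") \
                and not set(cells[0]) <= set("- "):
            gaps[cells[0]] = cells[1].lower().startswith("yes")
    return gaps
```

A tool like this could back a session-start check: any gap marked “Yes” is a candidate for self-service before the agent ever escalates.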

Then there’s the hard boundary — gaps that always need a human:

## 🔴 Gaps That Always Need Human Help
- Credentials & secrets — Never guess, always ask
- Design & UX decisions — Propose options, let human choose
- Business logic & domain knowledge — Ask when uncertain
- Ambiguous requirements — Clarify before building
- Anything requiring human judgment — Taste, priority, risk tolerance
- Legal/compliance decisions — Never assume

This list is short and permanent. These aren’t technical limitations — they’re judgment calls that belong to the human. The agent can propose, recommend, and analyze. But it can’t decide what looks good, what the business needs, or what’s legally safe. Putting this in writing prevents the agent from overreaching.

The dynamic rule

The most important line in CAPABILITY-MAP.md is this:

> The lists above will never be complete. The process matters more than the list.
>
> On every task, at every step, run the Gap Detection Protocol from AGENTS.md.
> If you discover a new gap, either solve it or ask the human.
> Never silently degrade.

Static capability lists go stale. You write them once, they get outdated, and the agent stops checking because the list says “I can do X” even when circumstances have changed. The dynamic rule says: the list is a reference, but the protocol is the authority. Run the check every time, even for things the list says you can do.
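“The list is a reference, but the protocol is the authority” has a direct code analogue: consult the static map only as a hint, and re-probe the live environment on every check. A sketch, where `probe_tool` is a hypothetical stand-in for actually testing the capability:

```python
# The cached CAPABILITY-MAP.md entries -- a hint, never the final answer.
STATIC_MAP = {"bash": True, "web_search": False}

def check_capability(name: str, probe_tool) -> bool:
    """Re-verify on every task, even if the static list says yes."""
    live = probe_tool(name)  # the authoritative answer, right now
    if name in STATIC_MAP and STATIC_MAP[name] != live:
        STATIC_MAP[name] = live  # the map went stale; refresh it
    return live
```

The refresh step is what keeps the capability map a living document instead of a one-time audit.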

This is the same philosophy as TENSIONS.md — the system observes itself. But where TENSIONS.md logs friction in the product design, the Gap Detection Protocol catches friction in the agent’s own capabilities. Both are self-observation. Both feed improvement. Both work because they’re continuous processes, not one-time audits.

Project-first, always

AGENTS.md has one more opinion that shapes how agents work: everything lives in the project.

### Extension & Skill Placement — Project-First, Always

All extensions and skills MUST be created in the project directory, never globally.

.pi/extensions/    → Project extensions (auto-loaded)
.pi/skills/        → Project skills (auto-loaded)
.agents/skills/    → Project skills (cross-agent compatible)

This is a portability decision. If the agent builds a custom tool to talk to an API, that tool lives in the repo. Clone the repo, get the agent capabilities. No “works on my machine” surprises. No hidden global dependencies.

The rule is strict:

Never create extensions in ~/.pi/agent/extensions/ — always .pi/extensions/
Never create skills in ~/.agents/skills/ globally — always in the project
All extensions and skills are committed to git alongside the code

This means every project carries its own agent configuration. The QuoteCraft agent has different capabilities than the Second Brain agent, because they have different AGENTS.md files with different capability maps. The Second Brain agent knows about the D1 API. The QuoteCraft agent doesn’t — but it knows it could build a tool for it if needed.

The agent isn’t a general-purpose entity you configure once. It’s a project-specific collaborator that adapts to whatever the project needs.
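The project-first rule is also checkable. A sketch of a pre-commit-style guard that rejects any extension path outside the project tree — the directory names come from the rule above, but the checker itself is hypothetical:

```python
# Enforce project-first placement: extensions and skills must live under
# the repo's own .pi/ or .agents/ trees, never under the home directory.
from pathlib import Path

PROJECT_DIRS = (".pi/extensions", ".pi/skills", ".agents/skills")

def is_project_local(path: str, project_root: str) -> bool:
    """True if the path sits inside an allowed project directory."""
    p = Path(path).resolve()
    root = Path(project_root).resolve()
    return any(p.is_relative_to(root / d) for d in PROJECT_DIRS)
```

Run against staged files, this would catch a tool accidentally dropped into `~/.pi/agent/extensions/` before it ever becomes a hidden global dependency.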

The evolution from SOUL.md

The agent soul article described three files: SOUL.md (behavior), HUMAN.md (user context), TENSIONS.md (friction). Those files answer how and for whom and what’s breaking.

AGENTS.md and CAPABILITY-MAP.md answer a different question: what am I, and what can I do?

| File              | Question                      | Changes                     |
|-------------------|-------------------------------|-----------------------------|
| SOUL.md           | How should I behave?          | Rarely                      |
| HUMAN.md          | Who am I working for?         | Slowly, as the agent learns |
| TENSIONS.md       | Where is the system breaking? | Continuously                |
| AGENTS.md         | What am I and how do I think? | Per-project                 |
| CAPABILITY-MAP.md | What can I do right now?      | As capabilities change      |

The soul files are about personality. The agent files are about cognition. You could have a perfectly well-behaved agent (great SOUL.md) that still silently fails because it doesn’t know its own limits (no AGENTS.md). Behavior without self-awareness produces an agent that’s polite about delivering garbage.

The five files together — SOUL.md, HUMAN.md, TENSIONS.md, AGENTS.md, CAPABILITY-MAP.md — form something close to a complete agent architecture. The agent knows how to behave, who it’s serving, where the product is breaking, what it’s capable of, and what to do when it hits a wall.

What this actually prevents

Without gap detection, I’ve seen agents:

  • Hallucinate API responses instead of admitting they can’t access the API
  • Skip authentication flows and write code that only works locally
  • Generate mock data and present it as real output
  • Describe what a webpage “probably” looks like instead of saying they can’t browse
  • Make business decisions they have no authority to make
  • Produce working but wrong code because they guessed at requirements

Every one of these failures has the same root cause: the agent encountered a gap and worked around it instead of surfacing it. The code compiled. The tests passed. The output looked plausible. But it was wrong in ways that only became visible later.

Gap detection doesn’t make the agent smarter. It makes it honest. An honest agent that says “I need an API key to proceed” is more useful than a clever agent that writes elaborate mock data pretending the API call succeeded.

The cost

Two markdown files. AGENTS.md is about 150 lines. CAPABILITY-MAP.md is about 80 lines. They take maybe 30 minutes to write for a new project — most of that time is thinking about what the project actually needs.

The return: agents that stop pretending and start asking. Agents that build their own tools when they can. Agents that escalate to humans only when they genuinely can’t solve the problem. Agents that log what they discover so the capability map stays current.

It’s the same trade as every other system I’ve described: a small upfront investment in structure that compounds across every session. The .plan/ protocol invests in memory. The agent soul invests in personality. The capability map invests in honesty.


The best agents aren’t the ones that can do everything. They’re the ones that know exactly what they can’t do — and either fix it or tell you.