Open-sourcing my agent skills
I’ve been using Pi as my primary coding agent for months. Along the way I kept solving the same problems — how to break a large project into sessions the agent can execute, how to run the agent unattended without it going off the rails, how to get lint to zero without playing whack-a-mole. Each time I solved one, I extracted it into a reusable skill.
Today I’m open-sourcing all of them: pi-skills.
pi install git:github.com/khairold/pi-skills
11 skills. All battle-tested across real projects — Second Brain, Singapore Legal SEO, this site, and a kiosk sales agent I’ve been writing about. Here’s what’s in the box and why each one exists.
The session problem
AI agents have no memory between sessions. You start a conversation, the agent builds half a feature, you close the terminal, and tomorrow you’re back to “here’s my project, here’s what I need.” Every session starts cold.
The obvious fix is documentation. But documentation alone doesn’t solve sequencing. When a project has 40 tasks across 6 phases, you need more than a README — you need a system that tracks what’s done, what’s next, what decisions were made, and what drifted from the original plan.
That’s the phased-plan → execute-phase → close-plan pipeline.
phased-plan takes a spec and produces a .plan/ directory with four files:
PLAN.md — the phased task list with checkboxes
MEMORY.md — decisions, patterns, gotchas the agent discovered
DRIFT.md — where reality diverged from the original plan
SESSION-LOG.md — what happened each session
When you start a new session, the agent reads these files and knows exactly where it left off. execute-phase handles the attended workflow — it orients, proposes scope for your confirmation, executes, verifies the build, and updates all four files. close-plan runs when the project is done — it extracts everything valuable from .plan/ into the project’s own documentation and cleans up.
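To make that concrete, here's a hypothetical PLAN.md fragment (the skill's actual output format may differ in detail):

## Phase 2: Checkout
- [x] 2.1 Cart state
- [x] 2.2 Payment form validation
- [ ] 2.3 Order confirmation page
- [ ] 2.4 Receipt email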
I’ve used this on every multi-session project since building it. The difference is stark. Without it, I’d spend the first 10 minutes of each session re-explaining context. With it, the agent reads four files and starts working in seconds.
The autopilot loop
Once execute-phase was reliable, the next question was obvious: what if I’m not there?
autopilot-execute is the unattended version. The agent reads the plan, picks the next unchecked item, does the work, runs the build to verify, updates the plan files, and exits. It’s designed to be called in a loop by a shell script.
The autopilot.sh wrapper handles everything around it:
./autopilot.sh # Run until phase boundary
./autopilot.sh --continuous # Run through all phases
./autopilot.sh --phase 3 # Only Phase 3
./autopilot.sh --dry-run # Show what it would do
It does stuck detection — if the same item fails three times in a row, it stops and tells you. It does build verification after every iteration — if the build breaks, it stops with the error log. It does git checkpoints before and after each iteration, so you can always roll back. And it writes a STATUS file that a companion autopilot-dashboard.sh script can tail in a second terminal.
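Stripped of flags and dashboard plumbing, the loop is roughly the sketch below. run_agent is a hypothetical stand-in for however you launch Pi non-interactively, npm run build stands in for whatever your build gate is, and the real script additionally stops at phase boundaries and keeps STATUS fresh every iteration:

#!/usr/bin/env bash
fails=0
while true; do
  git add -A && git commit --allow-empty -m "autopilot: pre-iteration checkpoint"

  # Hypothetical: run the agent once against the plan, then exit.
  if ! run_agent "Do the next unchecked item in .plan/PLAN.md, then stop"; then
    fails=$((fails + 1))
    # Crude stuck detection: bail after three consecutive failures.
    if [ "$fails" -ge 3 ]; then
      echo "stuck: three failures in a row" | tee STATUS
      exit 1
    fi
    continue
  fi
  fails=0

  # Build gate: never checkpoint a broken build.
  if ! npm run build > build.log 2>&1; then
    echo "build failed, see build.log" | tee STATUS
    exit 1
  fi

  git add -A && git commit --allow-empty -m "autopilot: post-iteration checkpoint"
done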
The name comes from Geoffrey Huntley’s “Ralph Wiggum” pattern — the idea that you can get useful work from an AI agent by just running it in a loop with a clear plan and a build gate. My autopilot adds the session management and stuck detection that make it practical for real projects.
I’ve run this overnight on multi-phase builds. Wake up, check the dashboard, review the git log, keep going. It’s not magic — the agent still makes mistakes, still gets stuck on ambiguous tasks. But for the mechanical work of grinding through a well-defined plan, it saves hours.
The everyday skills
Not everything needs a plan. Some skills are for problems you hit every day:
lint-fix drives typecheck and lint to zero errors. The key insight is fixing by pattern, not by file. If 30 files have the same unused-import warning, that’s one fix applied 30 times — not 30 separate problems. It auto-detects your tooling (ESLint, Biome, tsc, framework wrappers) and works in batches.
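You can get that pattern-level view yourself from ESLint's JSON reporter. Assuming jq is installed, this ranks rules by how many errors they account for:

npx eslint . -f json | jq '[.[].messages[].ruleId] | group_by(.) | map({rule: .[0], count: length}) | sort_by(-.count)'

Thirty hits on one rule is one fix; the skill automates working down that list.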
responsive-audit uses a headless browser to screenshot a component at multiple breakpoints, reads the visual output alongside the source code, and produces a concrete fix plan. I built this after the third time I deployed a page that looked great on desktop and broke on mobile. The agent can now catch overflow, collapsed grids, and unreadable text before I push.
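The screenshot half is easy to reproduce by hand with Playwright's CLI (the URL and viewport heights here are illustrative):

for w in 320 768 1024 1440; do
  npx playwright screenshot --viewport-size="$w,900" http://localhost:3000 "shot-$w.png"
done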
autoresearch-create sets up an autonomous experiment loop — try ideas, measure results, keep or discard. It’s for optimization problems where you don’t know the answer upfront, you just have a metric you want to improve. I used it for prompt tuning on the sales agent.
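Conceptually, the loop it sets up is just this, sketched with a hypothetical measure.sh that prints a single score and made-up prompt-variant files:

best=$(./measure.sh)
for variant in prompts/variant-*.txt; do
  cp "$variant" prompts/active.txt
  score=$(./measure.sh)
  if (( $(echo "$score > $best" | bc -l) )); then
    best=$score   # keep: the metric improved
    git add -A && git commit -m "keep $variant (score $score)"
  else
    git checkout -- prompts/active.txt   # discard: no improvement
  fi
done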
The architecture skills
These are less frequent but high-leverage:
deep-modules-audit scans a codebase for “agent debt” — code that’s structured for human IDE navigation (lots of small files, one function per file, deeply nested directories) instead of AI agent navigation (fewer, deeper files that contain everything the agent needs in one read). It produces a refactoring plan with before/after file counts. The concept comes from John Ousterhout’s A Philosophy of Software Design, adapted for the reality that your codebase’s primary reader is now a tool-calling LLM.
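A rough proxy for the first signal it looks for, assuming a TypeScript project laid out under src/:

find src -name '*.ts' -print0 | xargs -0 wc -l | sort -n | head -20

A long run of tiny files at the top of that list is the smell the audit formalizes.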
docs-audit identifies documentation that’s stale, contradictory, duplicated, or simply unnecessary. Coding agents read code — they don’t need an inventory of every file in the project. The audit finds docs that cost more to maintain than they contribute and proposes cuts.
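A quick manual proxy for the staleness half: date every doc by its last commit and eyeball the old ones (the docs/ path is illustrative):

find docs -name '*.md' -print0 | while IFS= read -r -d '' f; do
  echo "$(git log -1 --format=%cs -- "$f")  $f"
done | sort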
living-design-system builds a design system that renders real app components with fixture data, independent of the backend. This came directly from my experience building design systems as static documents: the agent needs to see components actually rendered to understand whether its changes look right.
The discovery skill
find-skills connects to the open agent skills ecosystem via npx skills find. When you ask “is there a skill for X?”, it searches the registry and helps you install what it finds. The ecosystem is young, but the idea — shareable, composable agent capabilities — feels like the right direction.
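Usage is a one-liner; the free-text query argument here is an assumption about the CLI's interface:

npx skills find "responsive audit"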
Why skills, not prompts
Every one of these started as instructions I was copy-pasting into conversations. “Before you start, read the plan files.” “Fix lint by pattern, not by file.” “Screenshot at 320, 768, 1024, 1440.” Copy-paste works until you have ten projects and fifteen workflows. Then you want something that loads automatically when the task matches.
Pi’s skill system does exactly that. Each skill is a markdown file with a description and instructions. When your request matches the description, the skill loads. When it doesn’t, it stays out of the way. No configuration, no routing logic — the agent reads the description and decides.
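A whole skill can be this small. The frontmatter below follows the common SKILL.md shape from the open skills ecosystem; check Pi's docs for the exact fields it reads. The skill itself is a made-up example:

mkdir -p skills/changelog-draft
cat > skills/changelog-draft/SKILL.md <<'EOF'
---
name: changelog-draft
description: Draft a changelog entry from the commits since the last tag.
---

Read the output of git log from the last tag to HEAD.
Group commits by area, then write a short, human-readable changelog entry.
EOF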
The skills are deliberately opinionated. phased-plan always creates four files. autopilot-execute always verifies the build. lint-fix always works by pattern. These aren’t flexible frameworks — they’re specific workflows that I’ve tested enough to trust. If they don’t match how you work, fork them. That’s the point of open source.
Install
The whole collection:
pi install git:github.com/khairold/pi-skills
Or just the ones you want:
{
  "packages": [
    {
      "source": "git:github.com/khairold/pi-skills",
      "skills": [
        "skills/phased-plan",
        "skills/execute-phase",
        "skills/autopilot-execute",
        "skills/lint-fix"
      ]
    }
  ]
}