
Building a desktop GUI for my coding agent

AI · Desktop · Electrobun · Pi · Architecture

I’ve been using Pi as my primary coding agent for months. It’s a CLI tool — you run it in your terminal, it reads files, executes commands, edits code, and streams its thinking back to you. It’s excellent.

But terminals have limits. Long conversations scroll off-screen. Tool output (a 200-line bash result, a file diff, a syntax-highlighted code block) deserves better rendering than monospace text. Switching between sessions means restarting the process. And if you want to glance at what the agent is doing while you’re in another app, you’re alt-tabbing into a terminal window.

So I’m building PiBun — a native desktop GUI for Pi, powered by Electrobun.

Why not just fork T3 Code?

T3 Code is the obvious reference — it’s a desktop GUI for OpenAI’s Codex agent, built with Electron and Effect. I studied it carefully. About 60% of its server code is Codex-specific protocol plumbing: approval flows, collaboration modes, plan mode instructions, thread/turn mapping, model normalization. The orchestration layer (event sourcing with decider/projector) adds complexity that Pi simply doesn’t need.

Pi already manages its own state. It handles sessions, compaction, retries, model switching — all internally. The server doesn’t need to reimplement any of that. It just needs to pipe events through.

Starting fresh with Pi’s clean RPC protocol is faster than ripping out Codex internals. But T3 Code’s patterns — WebSocket transport with reconnect, Zustand store structure, chat rendering approach — those are worth learning from.

The architecture

┌─────────────┐     WebSocket      ┌──────────────┐     stdio/JSONL     ┌─────────┐
│  React UI   │ ◄─────────────────►│  Bun Server  │ ◄──────────────────►│ pi --rpc│
│  (Vite)     │                    │              │                     │         │
│  Chat, Tools│                    │ piRpcManager │                     │ LLM API │
│  Sessions   │                    │ wsServer     │                     │ Tools   │
└─────────────┘                    └──────────────┘                     └─────────┘
       ▲                                  ▲
       └─────── Electrobun webview ───────┘

Three layers, each with a clear boundary:

Server spawns pi --mode rpc as a subprocess. Pi speaks JSONL over stdio — structured events for every text delta, tool execution, thinking block, compaction, retry. The server reads these events and pushes them over WebSocket to connected clients. It also forwards commands (prompt, abort, switch model) from the client back to Pi’s stdin. That’s it. No session state, no event sourcing, no orchestration. Pi handles all of that.
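
To make the bridge concrete, here's a minimal sketch in Bun/TypeScript. It assumes Bun's Bun.spawn and Bun.serve APIs; the port number and message framing are illustrative, and Pi's real JSONL schema is whatever pi --mode rpc actually emits.

```ts
import type { ServerWebSocket } from "bun";

const pi = Bun.spawn(["pi", "--mode", "rpc"], { stdin: "pipe", stdout: "pipe" });
const clients = new Set<ServerWebSocket<unknown>>();

// Read Pi's stdout as JSONL and fan each complete line out to every client.
async function pump() {
  let buf = "";
  for await (const chunk of pi.stdout.pipeThrough(new TextDecoderStream())) {
    buf += chunk;
    const lines = buf.split("\n");
    buf = lines.pop() ?? ""; // keep the trailing partial line for next chunk
    for (const line of lines) {
      if (line.trim()) for (const ws of clients) ws.send(line);
    }
  }
}
pump();

Bun.serve({
  port: 8787,
  fetch(req, server) {
    if (server.upgrade(req)) return; // hand the socket to the handlers below
    return new Response("PiBun bridge");
  },
  websocket: {
    open(ws) { clients.add(ws); },
    close(ws) { clients.delete(ws); },
    message(_ws, msg) {
      // Forward client commands (prompt, abort, model switch) to Pi's stdin.
      const line = typeof msg === "string" ? msg : msg.toString();
      pi.stdin.write(line + "\n");
      pi.stdin.flush();
    },
  },
});
```

Note what's absent: no session table, no event log. The server holds nothing but the socket set and the child process handle, which is the whole point.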

Web is a React + Vite SPA with Zustand for state. It connects to the server via WebSocket, renders streaming conversations, tool call cards, thinking blocks, and provides controls for sessions, models, and compaction. Tailwind v4 for styling, Shiki for syntax highlighting.
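
A rough sketch of the client side, borrowing T3 Code's reconnect-on-close pattern. The store shape and names (useConn, connect) are hypothetical, not PiBun's actual code.

```ts
import { create } from "zustand";

interface ConnState {
  connected: boolean;
  events: unknown[];          // raw Pi events, reduced into UI state elsewhere
  push: (e: unknown) => void;
  setConnected: (c: boolean) => void;
}

export const useConn = create<ConnState>((set) => ({
  connected: false,
  events: [],
  push: (e) => set((s) => ({ events: [...s.events, e] })),
  setConnected: (connected) => set({ connected }),
}));

export function connect(url: string, retryMs = 1000) {
  const ws = new WebSocket(url);
  ws.onopen = () => useConn.getState().setConnected(true);
  ws.onmessage = (ev) => useConn.getState().push(JSON.parse(ev.data));
  ws.onclose = () => {
    useConn.getState().setConnected(false);
    // Naive fixed-interval retry; real code would add backoff and jitter.
    setTimeout(() => connect(url, retryMs), retryMs);
  };
}
```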

Desktop wraps both in an Electrobun native app — starts the server on a random port at launch, opens a native webview pointing at it. Native menus, keyboard shortcuts, window management.
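
The desktop entry point might look roughly like this. The BrowserWindow import follows Electrobun's documented example; treat the exact options as assumptions. The random-port trick is plain Bun: port 0 asks the OS for a free port.

```ts
import { BrowserWindow } from "electrobun/bun";

// port: 0 lets the OS pick a free port; Bun reports it back on server.port.
const server = Bun.serve({
  port: 0,
  fetch: () => new Response("PiBun"), // real app: serve the built web UI here
});

new BrowserWindow({
  title: "PiBun",
  url: `http://localhost:${server.port}`,
});
```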

The core principle: Pi handles state. The server is a thin bridge. Don’t reimplement what Pi already does.

Why Electrobun, not Electron?

This is the most opinionated choice in the stack. Electrobun uses the OS’s native webview instead of bundling Chromium. That means:

  • Tiny binaries — no 150MB Chromium download
  • Bun-native — PiBun already uses Bun as its runtime, so Electrobun fits naturally
  • Modern — designed for the Bun ecosystem from day one

The tradeoff is maturity. Electron has a decade of ecosystem. Electrobun is newer, with fewer escape hatches when things get weird. I'm betting that PiBun's needs — a webview, some menus, window management — are simple enough that the bet pays off.

What the UI will look like

The web app is designed around Pi's event model. Every message from Pi is a stream of typed events (sketched as a TypeScript union after the list):

  • text_delta — assistant text, rendered as Markdown in real time
  • thinking_delta — reasoning blocks, collapsible and streaming
  • tool_execution_start/update/end — tool calls rendered as cards with expandable output
  • auto_compaction — context window management, shown as a status indicator
  • auto_retry — rate limit recovery, shown as a retry banner
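
One natural way to model that stream in TypeScript is a discriminated union. The event names come from the list above; the payload fields are my assumptions, not Pi's actual schema.

```ts
type PiEvent =
  | { type: "text_delta"; text: string }
  | { type: "thinking_delta"; text: string }
  | { type: "tool_execution_start"; id: string; tool: string }
  | { type: "tool_execution_update"; id: string; output: string }
  | { type: "tool_execution_end"; id: string; exitCode?: number }
  | { type: "auto_compaction"; tokensBefore: number; tokensAfter: number }
  | { type: "auto_retry"; attempt: number; delayMs: number };
```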

Tool output gets specialized rendering: bash output as terminal blocks, file reads as syntax-highlighted code with paths, edits as diff views. The composer supports image paste (for screenshot-driven debugging) and message steering (type while the agent is still responding to redirect it).
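
Rendering then becomes a dispatch on tool type. A hypothetical version, where the tool names and CSS hooks are assumptions; the real UI would plug Shiki and a diff renderer into the "read" and "edit" branches:

```tsx
export function ToolOutput({ tool, output }: { tool: string; output: string }) {
  switch (tool) {
    case "bash":
      return <pre className="terminal">{output}</pre>; // terminal block
    case "read":
      return <pre className="code">{output}</pre>;     // Shiki highlighting here
    case "edit":
      return <pre className="diff">{output}</pre>;     // diff view here
    default:
      return <pre>{output}</pre>;
  }
}
```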

Session management is first-class: new sessions, switching between sessions, forking from any message. Model and thinking level selectors let you change Claude’s behavior mid-conversation.

Where I am

Phase 0 — documentation and project setup. All six architecture docs are written. The agent system (.plan/, .agents/, .pi/) is configured for autonomous multi-session execution. The monorepo scaffold is next.

The roadmap is phased to ship incrementally:

  1. Server + Pi RPC bridge — spawn Pi, send prompts, receive streaming events
  2. WebSocket server — browser can connect and interact with Pi
  3. Minimal web UI — working chat in a browser
  4. Full web UI — all Pi features exposed (thinking blocks, tool cards, sessions, extensions)
  5. Electrobun desktop — native app wrapping everything

Each phase produces a usable artifact. The web app will work standalone in a browser before it ever gets wrapped in a desktop shell. That keeps the feedback loop tight.

The build process

Like my case law platform, PiBun will be built primarily through autonomous agent sessions. The plan is written. The conventions are documented. Each session reads the plan, picks up the next item, executes it, verifies the build passes, and leaves notes for the next session.

The irony isn’t lost on me: I’m using an AI coding agent to build a GUI for an AI coding agent. But that’s kind of the point. Pi is good enough at its job that the main thing holding it back is the interface, not the capabilities. A better interface makes a better agent — not because the AI changes, but because the human can see and direct it more effectively.

Code is on GitHub. Watch the repo if you’re interested in either Pi or Electrobun — this project exercises both in non-trivial ways.