How to Build an Education Agent Using Flue

In 1984, Benjamin Bloom published a finding that has haunted education ever since: students tutored one-on-one performed two standard deviations better than students in a conventional classroom. The average tutored kid beat 98% of the untutored ones. The catch—the reason it’s called the 2 Sigma Problem—is that one-on-one tutoring never scaled. We’ve known the best way to teach for forty years and could never afford it.

Then, rather suddenly, we could. An LLM is an infinitely patient tutor: it never sighs when you ask the same question a third time, never makes you feel slow, meets you exactly where you are at 2am for as long as you want. That’s not a replacement for teachers—Bloom’s tutors brought judgment, warmth, and accountability that no model has—but it’s the first time the one-on-one part of his finding has been affordable at scale. Education + AI finally gives the 2 Sigma Problem a real shot at a solve.

But that raises a question I couldn’t stop thinking about: most of the world’s best teaching material is static. It lives in books. How do you take a book’s content—its concepts, its exercises, its carefully sequenced curriculum—and turn it into an AI that can actually teach you that same material, interactively, adaptively, one question at a time?

There’s an obvious place to steal the interaction model from: Claude Code. Claude Code is an agent that lives in your terminal, reads your files, edits them, and talks to you about what it’s doing. So: can we take inspiration from Claude Code, but run it in reverse? Instead of an agent that writes code for you, an agent that opens a file in your editor, asks you to edit it, reads what you wrote, and questions your understanding using the Socratic method. It never gives you the answer. It makes you earn it.

This post walks through building exactly that. The example I picked (and it really is just one example, chosen to illustrate an approach you could apply to any book or curriculum) is the Actions, Calculations, Data distinction from Eric Normand’s Grokking Simplicity: the single highest-leverage idea, I think, for writing testable code. So: an ACD tutor. A CLI you run in your terminal. It writes a lesson file, pops it open in your editor, and starts asking questions. The full code is on GitHub, and every snippet below links to the commit it came from.

But before we write any code, we need to get one thing straight — because the word “agent” has gotten slippery.

An agent used to be a prompt

Two years ago, if you said “I built an AI agent,” what you usually meant was: I wrote a really good system prompt.

// "agent", circa 2023
const response = await llm.complete({
  system: "You are a Socratic tutor. Never give the answer...",
  messages: [{ role: "user", content: question }],
});

And honestly? That worked okay. A well-written prompt can shape a model’s behavior dramatically. The original socratic-tutor skill this project started from is exactly that lineage — a markdown file full of carefully-worded rules like:

NEVER give the answer. Not directly. Not indirectly. Not by heavily implying it.

What’s a “skill”? If that word is new to you, hold the thought — we’ll define skills properly in a moment. For now: a skill is a structured markdown package of expertise that an agent can load.

But a prompt alone has a hard ceiling. It can’t open a file in your editor. It can’t check whether you actually edited lesson-1.ts. It can’t remember that you finished lesson 3 last Tuesday. It produces text, and text is where it ends.

The thing that changed — the thing that makes Claude Code feel categorically different from a chat window — is that an agent today is not a prompt. It’s a harness.

The harness: a loop with hands

Strip away the branding, and every modern agent is the same small machine:

Every agent you’ve used — Claude Code included — is this loop.

The model is still just predicting text. But the harness around it watches for a particular kind of text—a tool call—executes it in the real world, and feeds the result back in. Run that in a loop and the model stops being a text generator and starts being something that does things: reads files, edits them, runs commands, checks its own work.

The harness is everything around the model: the loop itself, the tools it exposes, the instructions and skills it loads, the sandbox it executes in, the transport that connects it to a UI. The model provides the judgment; the harness provides the hands, the memory, and the guardrails.
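To make the loop concrete, here is a minimal sketch of one in TypeScript. It is not how Flue implements it: every name below (runAgent, fakeModel, the message and tool types) is invented for illustration, and a real harness would add streaming, error handling, and a sandbox on top.

```typescript
// A toy harness loop. All names here are illustrative, not Flue's API.
type ToolCall = { tool: string; args: Record<string, string> };
type ModelReply = { text: string } | { toolCall: ToolCall };
type ChatMessage = { role: "user" | "assistant" | "tool"; content: string };

// The "model" is anything that maps the transcript so far to a reply.
type Model = (history: ChatMessage[]) => Promise<ModelReply>;
type AgentTool = (args: Record<string, string>) => Promise<string>;

async function runAgent(
  model: Model,
  tools: Record<string, AgentTool>,
  userMessage: string,
  maxSteps = 10,
): Promise<string> {
  const history: ChatMessage[] = [{ role: "user", content: userMessage }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await model(history);
    if ("text" in reply) return reply.text; // plain text ends the turn
    // Otherwise the model asked for a tool: run it and feed the result back.
    const { tool, args } = reply.toolCall;
    const handler = tools[tool];
    const result = handler ? await handler(args) : `UNKNOWN_TOOL ${tool}`;
    history.push({ role: "assistant", content: JSON.stringify(reply.toolCall) });
    history.push({ role: "tool", content: result });
  }
  return "Stopped: step limit reached";
}

// A scripted stand-in for an LLM: first asks to read a file, then answers.
const fakeModel: Model = async (history) => {
  const last = history[history.length - 1];
  return last?.role === "tool"
    ? { text: `I read: ${last.content}` }
    : { toolCall: { tool: "readFile", args: { filename: "lesson-1.ts" } } };
};
```

Calling runAgent(fakeModel, { readFile: … }, "start") goes around the loop twice: once to execute the tool call, once to let the model turn the tool result into its final text. The step limit is the simplest possible guardrail against a model that never stops calling tools.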

So when we say “build an agent,” we really mean: build a harness. You could build all of it yourself—but you don’t have to, because this is exactly the layer that Flue provides. Flue is a new agent harness from the same folks who built Astro, and it carries the same design sensibility: declarative definitions, file-based conventions, and a build step that turns it all into something deployable. We’ll dig into it properly in a minute.

First, though: a harness has three ingredients worth distinguishing carefully, because they look similar and absolutely aren’t.

Prompts, skills, and tools

Instructions are identity

The system prompt (what Flue calls instructions) is the agent’s standing identity. Always present, always in context. System prompts aren’t necessarily short (production ones can run to pages), but this one is: a few sentences. “You are an ACD tutor. On a fresh start, check where the learner left off.” Think of it as the job description.

Skills are knowledge

A skill is a package of expertise—typically a SKILL.md file plus reference documents. It’s still “just prompting” in the sense that it’s words the model reads, but it’s structured, versioned, and progressively loaded: the model sees the skill’s name and description up front, and pulls in the full text and references only when relevant.

My tutor’s skill looks like this on disk:

The SKILL.md encodes the whole pedagogy: the fixed curriculum (lessons walking through chapters 3–5 of the book), the rule to read the lesson file on every single turn in case the learner edited it, the rule to never classify a function for the learner. The references hold the exercises and the answer key the model consults but never reveals.

Crucially, a skill can describe procedures that use tools (“write the exercise into lesson-1.ts, then open it in their editor”) without being a tool itself. It’s the recipe, not the knife.

Tools are capabilities

A tool is the opposite: not words, but code. A typed function the model is allowed to call. It has a name, a schema for its parameters, and a handler that runs in the real world:

defineTool({
  name: "readFile",
  description: "Read a lesson file from the learner's workspace…",
  parameters: {
    type: "object",
    properties: { filename: { type: "string" } },
    required: ["filename"],
  },
  execute: async (args) =>
    readFile(resolve(args.filename), "utf8").catch(() => "FILE_NOT_FOUND"),
});

The model never sees the implementation—only the name, description, and schema. It decides when to call readFile; your code decides what that actually does. That split is the security model and the design space of the whole field, compressed into one sentence.
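One way to picture that split is a sketch of what a harness might do with a tool definition internally. The ToolDef type, describeForModel, and dispatch are hypothetical names for illustration, and the shapes are assumptions rather than Flue’s actual internals.

```typescript
// Illustrative only: how a harness might separate a tool's public face
// from its implementation. These types are assumptions, not Flue's.
type ToolDef = {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
  execute: (args: Record<string, unknown>) => Promise<string>;
};

// What gets sent to the model: everything except the handler.
function describeForModel(tool: ToolDef) {
  const { execute: _hidden, ...visible } = tool;
  return visible;
}

// What the host does when the model emits a call: look up, validate, run.
async function dispatch(
  tools: ToolDef[],
  name: string,
  args: Record<string, unknown>,
): Promise<string> {
  const tool = tools.find((t) => t.name === name);
  if (!tool) return `UNKNOWN_TOOL ${name}`;
  const missing = tool.parameters.required.filter((key) => !(key in args));
  if (missing.length > 0) return `MISSING_ARGS ${missing.join(",")}`;
  return tool.execute(args);
}
```

Errors come back as strings rather than exceptions, mirroring the FILE_NOT_FOUND convention in the readFile tool above: the model can read a string and recover, whereas a thrown error would end the turn.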

Here’s the cheat sheet:

	What it is	Changes the model’s…	Example here
Instructions	Standing system prompt	identity & defaults	“You are an ACD tutor”
Skill	Markdown expertise package	knowledge & procedure	`acd-tutor/SKILL.md` + exercise bank
Tool	Typed callable function	capabilities	`readFile`, `writeFile`, `openFile`

A useful test: if you deleted it, would the agent become dumber or weaker? Delete the skill and my tutor still has hands but forgets how to teach. Delete the tools and it remembers the whole curriculum but can’t touch a single file. You need both, and you should never confuse them.

Why the distinction matters: when the tutor misbehaves pedagogically (gives a hint too early, skips a lesson), the fix is in SKILL.md. When it misbehaves mechanically (writes to the wrong directory, can’t open the editor), the fix is in tool code. Keeping these in separate files means each bug has exactly one home.

Enter Flue

You could build the harness yourself—the loop is genuinely not that much code. But there’s a lot of supporting machinery around the loop: streaming, conversation state, tool dispatch, sandboxing, serving it over HTTP. That’s what Flue (@flue/runtime + @flue/sdk + @flue/cli) handles.

Flue’s pitch is that an agent is a declaration, not an application. You describe the harness—model, instructions, skills, tools, sandbox—and Flue compiles it into a server. Here is the entire agent definition for the tutor, very close to how it looked in the first working commit:

import { createAgent } from "@flue/runtime";
import { local } from "@flue/runtime/node";
import acdTutor from "../skills/acd-tutor/SKILL.md" with { type: "skill" };

export default createAgent(() => ({
  model: "anthropic/claude-sonnet-4-6",
  instructions: "You are an ACD tutor…",
  skills: [acdTutor],
  sandbox: local(),
  cwd: process.cwd(),
}));

Twelve lines, and several of Flue’s best ideas are visible in them:

Skills are imports. That with { type: "skill" } import attribute is the line to notice. The skill (markdown, references and all) is a first-class module. The bundler resolves it, packages the reference files alongside it, and hands the agent a structured skill object. Your expertise is under version control and participates in the build like any other dependency.
The sandbox is explicit. sandbox: local() gives the agent an execution environment on the host. In this project its main job is to let the skill operate: when SKILL.md says “consult references/exercise-bank.md,” the model needs somewhere to read those reference files from — that’s the sandbox. (Early on it also ran the tutor’s file commands, which turned out to be a mistake; more on that below.) The point is that it’s opt-in: Flue makes you choose host access on purpose rather than handing it out by default.
Convention over wiring. Flue scans src/agents/ for agent definitions, and within each file the contract is a set of named exports, each well-defined and for a specific purpose—the default export is the agent, and exporting a route middleware opts it into HTTP transport at POST /agents/main/:id. If you’ve used Next.js—where exporting GET makes a route handler and exporting metadata sets page metadata—this will feel instantly familiar; Flue is clearly picking up a lot of lessons from it. One export, and your agent is a streaming API.

Then flue build compiles everything—agent, skill, references, runtime—into a single dist/server.mjs. The agent isn’t a script you run; it’s a server you talk to. That decision pays off immediately, as we’re about to see.

How this project actually started

Here’s the part I want to be honest about, because it’s the most replicable lesson in this whole post: I didn’t start from scratch. I started from a skill.

There was already a socratic-tutor skill—a general-purpose “never reveal the answer” tutor for kids, written for Claude’s skill ecosystem, complete with age tiers and Mermaid diagram patterns. It encoded real pedagogical craft: ask what they already know first, diagram the structure of the problem but never the solution path, calibrate question difficulty to the learner.

The very first commit of this repo is just three things:

That existing socratic-tutor skill, dropped into src/skills/,
A 12-line Flue agent definition importing it,
A single-file CLI loop wrapped around the whole thing.

That’s it. That was a working agent—on day one. Because the skill already contained the hard-won knowledge, and Flue already contained the harness machinery, the “build an agent” step collapsed into plugging one into the other.

The CLI loop deserves a look, because it shows what Flue leaves for you to write—pleasantly little. Four steps:

// 1. Build the agent bundle so edits to agents/ and skills/ get picked up.
//    Wait for the build to exit before starting the server.
await Bun.spawn(["bunx", "--bun", "flue", "build"]).exited;

// 2. Spawn the compiled Flue server. Spread process.env so PATH and API keys
//    survive; passing only PORT would replace the whole environment.
const server = Bun.spawn(["bun", "dist/server.mjs"], {
  env: { ...process.env, PORT: "3789" },
});

// 3. Poll /openapi.json until the server is ready

// 4. REPL: each line is a streaming invoke via @flue/sdk
const client = createFlueClient({ baseUrl: "http://localhost:3789" });
const instanceId = `cli_${crypto.randomUUID()}`;

const stream = client.agents.invoke("main", instanceId, {
  mode: "stream",  // SSE under the hood
  /* …message payload… */
});
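Step 3 is only a comment in the snippet above. A minimal sketch of one way to fill it in follows; waitUntilReady and httpOk are hypothetical helpers, not necessarily what the repo does, and the retry count and delay are arbitrary defaults.

```typescript
// Hypothetical helper for step 3: poll a URL until it answers or we give up.
// `probe` defaults to a real HTTP GET but can be swapped out in tests.
async function waitUntilReady(
  url: string,
  { attempts = 50, delayMs = 100, probe = httpOk } = {},
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    if (await probe(url)) return;
    await new Promise((done) => setTimeout(done, delayMs));
  }
  throw new Error(`Server at ${url} not ready after ${attempts} attempts`);
}

// A refused connection just means "not up yet", so treat it as false.
async function httpOk(url: string): Promise<boolean> {
  try {
    return (await fetch(url)).ok;
  } catch {
    return false;
  }
}
```

With the port from the snippet, the call would be await waitUntilReady("http://localhost:3789/openapi.json"). Throwing after a bounded number of attempts matters: if the build produced a broken bundle, the CLI should fail loudly instead of hanging forever.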

Two details here matter more than they look:

The instance id is the conversation. Notice there’s no message history being managed in the CLI. You invoke agent "main" with an instanceId, and Flue keeps the conversation state server-side, keyed to that id. The whole “memory” problem—the thing that makes naive LLM wrappers painful—is one UUID. Reuse the id, resume the conversation.

Streaming is the default posture. mode: "stream" gives you the reply as an async iterable of chunks over SSE. A tutor that thinks for eight seconds and then dumps four paragraphs feels broken; one that starts talking immediately feels alive. Flue makes the right thing the easy thing.

One Flue gotcha: Flue reserves src/app.ts for a user-supplied Hono app. Put arbitrary code there and the build fails with a confusing error. The CLI therefore lives at src/main.ts. You will hit this; now you won’t be confused by it.

From socratic-tutor to acd-tutor

The generic skill proved the plumbing, but it wasn’t my tutor. So the next few commits did the second most instructive thing in the repo: forked the skill.

The new acd-tutor skill kept the Socratic spine—never reveal, always question—and replaced everything generic with something opinionated:

A fixed learner (a novice JS/TS programmer; no profile questions, no age tiers) and a fixed curriculum (lessons walking from “spot the action” to “extract the calculation out of the action”).
An exercise bank of real TypeScript files the tutor writes into your scratch directory.
A two-channel protocol: every turn, the tutor re-reads the current lesson file before reading your chat message. If you edited the file, the edit is your answer; chat is commentary. This is the move that makes it feel like Claude Code rather than a chatbot: the filesystem is part of the conversation.
Your editor, not mine. One thing that has always annoyed me about coding tutors is the artificial dev environment: the embedded web editor with no vim keys, no extensions, none of your muscle memory. Here the lesson file just opens in $EDITOR. Whatever you already use (Zed, VS Code, vim), that’s where you edit. The tutor adapts to your environment instead of imposing one.
Resumability: on a fresh start, the tutor lists the lesson files, reads the highest-numbered one, compares it against the exercise bank’s “done when” criteria, and picks up exactly where you left off. The scratch directory is the progress database.
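In the project, the resume procedure is carried out by the model itself, following SKILL.md and using the file tools. Purely to pin down the logic, here is the “find the highest-numbered lesson” step as a sketch in code. The lesson-N.ts naming comes from the examples in this post; latestLessonFile is a hypothetical helper that does not exist in the repo.

```typescript
// Sketch of the "scratch directory is the progress database" idea.
// Returns the highest-numbered lesson file, or null on a fresh start.
function latestLessonFile(filenames: string[]): string | null {
  let best: { name: string; n: number } | null = null;
  for (const name of filenames) {
    const match = /^lesson-(\d+)\.ts$/.exec(name);
    if (!match) continue; // ignore anything that is not a lesson file
    const n = Number(match[1]);
    // Compare numerically so lesson-10.ts ranks above lesson-9.ts.
    if (best === null || n > best.n) best = { name, n };
  }
  return best ? best.name : null;
}
```

The numeric comparison is the one detail worth noticing: a plain string sort would put lesson-10.ts before lesson-2.ts and resume the learner in the wrong place.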

And notice where all of that lives: in markdown. The agent definition barely changed. The CLI didn’t change at all. When the unit of iteration is a skill file, iterating on behavior stops requiring you to touch infrastructure—which is precisely the property you want when you’re tuning pedagogy, because you’ll be tuning it constantly. It also means you don’t have to understand any other part of the harness to change the lesson plan: a later commit expanded the curriculum to seven lessons by editing exactly two markdown files—no TypeScript touched.

At this point—a handful of commits in—the whole system looks like this:

The whole system after the first few commits.

You run bun start, the tutor writes lesson-1.ts, your editor pops open, and a voice in the terminal asks: “Look at processOrder — if you had to call it an action, a calculation, or data… which is it, and what’s your gut reason?”

It works. It’s also rough in three specific ways, and fixing them is the rest of this post.

Tools and boundaries

The first rough edge showed up immediately, as the best bugs do, in the most mundane way: the tutor wouldn’t open my editor.

In the early version, the skill did its file work through shell commands in the sandbox—${EDITOR:-zed} lesson-1.ts & and the like. But Flue’s local() sandbox only exposes an explicit allowlist of environment variables, and $EDITOR isn’t on it. The skill’s fallback silently kicked in, and the wrong editor opened every time. The first fix was the obvious one—forward the variable:

sandbox: local({ env: { EDITOR: process.env.EDITOR } }),

That worked. But it’s a smell, and it’s worth pausing on why: the pedagogy file now depended on a host environment variable, threaded through a sandbox allowlist, invoked via shell string interpolation written in markdown. Three layers of the system had quietly fused. The skill knew where files lived, which editor command to run, and how to background a process in bash. None of that is teaching knowledge.

So the real fix was to move file operations out of the sandbox entirely and into host-side tools — proper defineTool definitions that run in the server process:

import { spawn } from "node:child_process";
import { basename, join } from "node:path";

// opts (scratchDir, editor) comes from the surrounding setup code, not shown.
// basename() blocks path traversal: "../x" and "/etc/x" both become leaf names.
const resolve = (filename) => join(opts.scratchDir, basename(filename));

defineTool({
  name: "openFile",
  /* …schema… */
  execute: async (args) => {
    const editor = opts.editor ?? process.env.EDITOR ?? "open";
    const path = resolve(args.filename);
    // shell:true handles multi-word commands like "zed -w"; detached +
    // unref so a waiting editor never blocks the server process.
    spawn(`${editor} ${JSON.stringify(path)}`, {
      shell: true, detached: true, stdio: "ignore",
    }).unref();
    return `Opened ${basename(args.filename)} in the learner's editor`;
  },
});

Four tools (listFiles, readFile, writeFile, openFile) and a design rule that I think is the most important idea in this section: the model addresses files by bare filename, and never learns where they live.

The model says readFile("lesson-1.ts"). The host resolves that to a real path inside the scratch directory. This is simultaneously:

A security boundary. The basename() in resolve means ../../etc/passwd and /etc/passwd both collapse to harmless leaf names. The model cannot name a file outside the workspace, no matter what path it generates. One line. (The one input worth rejecting explicitly is a bare “..”, which basename() returns unchanged.)
Better UX for the model. Tool calls get shorter and harder to get wrong. The agent instructions shrink from path-management choreography to one sentence: “Manage lesson files with listFiles/readFile/writeFile/openFile, addressing files by bare filename.”
A cleaner skill. All the bash incantations disappeared from SKILL.md. The pedagogy file went back to being about pedagogy.
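The bare-filename rule is small enough to show in full. The sketch below is a variant of the post’s resolve helper, not the repo’s code: resolveInScratch is a hypothetical name, and the explicit rejection of "", "." and ".." is an added assumption, included because Node’s basename("..") returns ".." rather than a file name.

```typescript
import { basename, join } from "node:path";

// Resolve a model-supplied name to a path inside the scratch directory.
// Anything before the last path separator is discarded by basename().
function resolveInScratch(scratchDir: string, filename: string): string {
  const leaf = basename(filename);
  // Extra guard (an addition to the one-liner): these leaves name a
  // directory, not a file, and ".." would point at the parent.
  if (leaf === "" || leaf === "." || leaf === "..") {
    throw new Error(`Not a usable filename: ${JSON.stringify(filename)}`);
  }
  return join(scratchDir, leaf);
}
```

Both "../../etc/passwd" and "/etc/passwd" resolve to a file named passwd inside the scratch directory, while a bare ".." is rejected outright instead of silently resolving to the parent directory.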

And the $EDITOR hack? Deleted. The openFile tool runs on the host, where process.env.EDITOR just… exists. The sandbox went back to plain local(), doing the one job it’s actually good at here: letting the skill read its reference files.

This is the prompts-vs-tools distinction from earlier, paying rent. Where a capability lives determines what can go wrong with it.

Refactoring into layers

The second rough edge: the entire CLI—server spawning, readiness polling, signal handling, REPL, rendering—lived in one 136-line file. Fine for day one; a liability the moment you want to change anything. One refactor split it into five layers, and the split has survived every change since:

Agent definition — src/agents/main.ts: the Flue declaration. Tutor behavior lives in the skill, not here.
Server runner — src/runner.ts: generic startFlueServer() — builds, spawns dist/server.mjs, polls until ready, owns shutdown. Knows nothing about tutoring.
Agent I/O — src/agent-io.ts: createAgentSession(client, agent) — pins an agent + instance id and exposes send(payload) as an async iterable of chunks. Transport only. No printing.
Console frontend — the REPL loop and rendering. Knows nothing about Flue or the tutor; its reply source is a caller-supplied reply(line): AsyncIterable.
CLI entry — src/main.ts: composition only. Start server, create session, run console with the tutor-specific greeting strings.

Layers 2–4 are deliberately tutor-agnostic. The payoff isn’t aesthetic: it means a future web frontend reuses layers 1–3 unchanged, and (as we’re about to see) replacing the console UI wholesale touched nothing else.
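The seam that makes this work is the reply(line): AsyncIterable contract between the agent I/O layer and the console. Here is a minimal sketch of that seam, with illustrative names (ReplySource, renderTurn, echoTutor) that are assumptions rather than the repo’s actual identifiers.

```typescript
// The contract between layers 3 and 4: a line in, a stream of chunks out.
type ReplySource = (line: string) => AsyncIterable<string>;

// Layer 4 stand-in: consumes chunks and hands each one to a writer.
// It never imports Flue; it only knows the ReplySource shape.
async function renderTurn(
  reply: ReplySource,
  line: string,
  write: (chunk: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of reply(line)) {
    write(chunk); // show each chunk as soon as it arrives
    full += chunk;
  }
  return full;
}

// Any async generator satisfies the contract, so the console can be
// exercised without a server or a model.
async function* echoTutor(line: string): AsyncIterable<string> {
  yield "What makes you say ";
  yield `"${line}"?`;
}
```

In the real CLI the reply source would be backed by the session’s send(payload); in a test it can be a two-line generator like echoTutor. That substitution is what “knows nothing about Flue” buys you.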

A real terminal UI

The third rough edge was the experience itself. Raw readline does no justice to a streaming tutor: markdown arrives as literal asterisks, there’s no visual separation between you and the agent, and while a reply streams, your keystrokes go nowhere.

So the console became an Ink app: yes, React in the terminal, and the same UI library Claude Code itself is built with.

Because of the layering, this rewrite—readline to React—changed one import in src/main.ts. The agent, the tools, the skill, the runner: untouched. Three details from this phase are worth stealing:

Streamed markdown, rendered live. Chunks accumulate into the current turn and re-render through marked-terminal on every update — so headings, bold, and code blocks appear styled as they stream, not after.

A debug mode that shows the loop. A /debug toggle surfaces what the harness is actually doing. The agent-I/O layer formats every Flue event—tool calls, turns, token usage—into one-line diagnostics:

-> readFile({"filename":"lesson-2.ts"})
<- readFile ok in 3ms: // Lesson 2 — annotate each line…
turn anthropic/claude-sonnet-4-6 stop=end_turn tokens=8211/342 cache=7902r/0w in 6120ms

Remember the loop diagram from the top of this post? Debug mode is that diagram, live. Watching your agent read the lesson file before answering — exactly as the skill instructs — is the moment the architecture stops being abstract. It’s also how you catch it when it doesn’t.
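A formatter for those one-line diagnostics can be sketched in a few lines. The event shapes below are hypothetical (Flue’s real event types may well differ), and only the two tool-related lines of the output above are covered.

```typescript
// Hypothetical event shapes; Flue's real event types may differ.
type DebugEvent =
  | { kind: "tool_call"; name: string; args: unknown }
  | { kind: "tool_result"; name: string; ok: boolean; ms: number; output: string };

// Turn one harness event into a single diagnostic line.
function formatDebugEvent(event: DebugEvent, previewChars = 40): string {
  switch (event.kind) {
    case "tool_call":
      return `-> ${event.name}(${JSON.stringify(event.args)})`;
    case "tool_result": {
      // Show only the first line of the output, truncated, to keep one line.
      const firstLine = event.output.split("\n")[0] ?? "";
      const preview =
        firstLine.length > previewChars ? `${firstLine.slice(0, previewChars)}…` : firstLine;
      return `<- ${event.name} ${event.ok ? "ok" : "error"} in ${event.ms}ms: ${preview}`;
    }
  }
}
```

Keeping the formatter a pure function from event to string means it can live in the agent I/O layer without that layer ever printing anything, which is the rule the layering section set for it.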

Queueing instead of locking. The first Ink version disabled input while a reply streamed. The fix: keep input live, queue submissions, and send each one as the previous reply finishes — while slash commands like /debug execute immediately. Claude Code behaves the same way, and once you’ve felt it, a locked input feels broken.
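The queueing behavior is independent of Ink, so it can be sketched as a plain class. SubmissionQueue is a hypothetical name, send stands in for whatever streams one reply to completion, and error handling is left out for brevity (a real UI would catch a failed send).

```typescript
// Sketch of "queue instead of lock": input stays live, sends are serialized.
class SubmissionQueue {
  private pending: string[] = [];
  private draining = false;
  private send: (line: string) => Promise<void>;
  private runCommand: (command: string) => void;

  constructor(
    send: (line: string) => Promise<void>,
    runCommand: (command: string) => void,
  ) {
    this.send = send;
    this.runCommand = runCommand;
  }

  submit(line: string): void {
    // Slash commands bypass the queue and take effect immediately.
    if (line.startsWith("/")) {
      this.runCommand(line);
      return;
    }
    this.pending.push(line);
    void this.drain();
  }

  // Sends queued lines one at a time; a call while already draining is a no-op.
  private async drain(): Promise<void> {
    if (this.draining) return;
    this.draining = true;
    try {
      let next: string | undefined;
      while ((next = this.pending.shift()) !== undefined) {
        await this.send(next);
      }
    } finally {
      this.draining = false;
    }
  }
}
```

Submitting "a", then "/debug", then "b" while "a" is still streaming runs the command at once and holds "b" until the first reply finishes. The UI component only ever calls submit, so it never needs to know whether a reply is in flight.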

Tuning the teacher

With the machinery settled, the work shifted to where it should be: the skill. Expanding the curriculum to seven lessons—including a gentler lesson 0—meant editing the exercise bank and the answer key. Markdown only. The harness didn’t know anything happened.

That’s the end state I’d encourage you to aim for with your own agent: a system where the infrastructure changes rarely and the knowledge changes constantly, and the two never block each other. The commit history of this repo tells that story plainly—TypeScript commits cluster early, markdown commits keep going.

Closing thoughts

Step back and look at what got built, and how little of it is “AI code”:

A skill holds the pedagogy—forked from an existing one, tuned endlessly, all markdown.
Four small tools hold the capabilities—with the host/model boundary doing double duty as a security model.
Flue holds the loop, the streaming, the conversation state, and the build.
A thin, layered CLI holds the experience.

Food for thought: given that layering, how hard would it be to add a web UI to this? The agent is already an HTTP server speaking SSE; the console is just one consumer of an AsyncIterable of chunks. A browser frontend would reuse layers 1–3 untouched and swap only the rendering. I suspect it’s a weekend—try it and tell me.

The recipe generalizes well beyond a programming tutor. Pick a book you love. Distill its curriculum into a skill—exercises, answer key, question patterns. Give the agent just enough tools to put real artifacts in front of the learner in their environment. Wrap a loop around it. The ever-patient teacher, it turns out, is mostly a markdown file with hands.

The full source—every commit referenced in this post—is at github.com/vishnugopal/acd-tutor.