Learning using Agents (a web version this time)

The problem I’ve been interested in is this: if you have content or principles that you’d like to learn from a book, is it possible to build an interactive agent that helps you grok that content more easily?

In Part 1 I built a Socratic programming tutor as a CLI agent on Flue. It ended on a dare. The tutor was a terminal app — an Ink (React-in-the-terminal) UI talking to a Flue agent over HTTP — and I noted that the agent was already an HTTP server speaking SSE, and the console was just one consumer of an AsyncIterable of chunks. So:

Given that layering, how hard would it be to add a web UI to this? A browser frontend would reuse layers 1–3 untouched and swap only the rendering. I suspect it’s a weekend — try it and tell me.

I built the web UI, and the honest answer is in two parts. The rendering swap really was a weekend — the agent, the runner, the transport never moved. But the web UI also found the soft spots in the abstraction and made me firm them up: a chunk of logic that lived inside the console got pulled out into a shared layer, a single agent quietly became three, and the one genuinely hard problem — the filesystem that Part 1 made part of the conversation — had to be rebuilt for a browser that has no filesystem.

The whole arc is real commits, PR #11 through #21, and every snippet below links to its source. The spine of the story is one idea: where exactly is the seam between “the agent” and “a UI,” and is it the same seam for a terminal and a browser?

The seam, named

Part 1’s layering had a transport layer — src/agent-io.ts — whose whole job was to pin an agent and a conversation and hand back chunks. Strip it to its contract and the entire architecture of this post is two type declarations: src/agent-io.ts & src/shared/chunks.ts

// One streamed piece of an agent reply.
export type AgentChunk =
  | { kind: "text"; text: string }   // reply prose, streamed as deltas
  | { kind: "debug"; text: string };  // tool calls, turns, logs, …

export interface AgentSession {
  // Conversation id — memory lives server-side, keyed to this.
  instanceId: string;
  // Invoke the agent once; yield the reply's chunks as they stream in.
  send(payload: DirectAgentPayload): AsyncIterable<AgentChunk>;
}

That’s the seam. Above it: anything that wants to show a conversation to a human. Below it: Flue, the model, the tools, the loop. The console UI consumes send(). The claim of this post is that a browser is another consumer of the exact same method — and that almost everything interesting falls out of taking that claim literally.

Notice what the chunk type already gives us for free. There are two kinds: text (the reply you read) and debug (the tool calls and token counts behind Part 1’s /debug mode). Whatever consumes this stream gets streaming and observability in one vocabulary. The web UI will reuse both.

The same stack from Part 1, with a second head bolted on at the seam.

First, a prototype that lied

The temptation, with a clean seam in hand, is to wire the browser straight to the agent and start debugging two hard things at once: the UI’s interaction design and the agent integration. I didn’t. The first web commit is a throwaway: a standalone React app under prototypes/ with no agent behind it at all. Every tutor reply was scripted dummy data.

The point was to settle questions that have nothing to do with Flue. What does the home screen look like with a course of seven lessons? Where does the chat dock live on desktop versus mobile? How does a code editor feel next to a chat panel? How fast should autosave fire? You can answer all of that against canned data in an afternoon, and you should, because none of it should be entangled with stream parsing.

The prototype’s README is explicit that it’s scaffolding, and — crucially — it names the seam where the real agent will later plug in:

Standalone: not wired to the Flue server yet; all tutor behavior is scripted dummy data so the interactions can be refined before integration. […] Integration seam for later: useScriptedChat.send() → replace with the Flue agent-session send() (streaming chunks land as tutor messages).

Read that last line again. The prototype’s fake chat hook was deliberately shaped like the real seam — a send() that yields chunks. The prototype wasn’t guessing at an interface; it was building against the one that already existed in agent-io.ts, with a stub on the other end. Once the real UI shipped, the prototype had served its purpose and PR #19 deleted the whole directory — 1,951 lines, gone in one commit, no regrets.

Prototype against the seam, not around it
A scripted prototype is only useful if throwing it away is cheap and its lessons transfer. Both held here because the fake send() matched the real send(). The design work (layout, autosave timing, the mascot) moved to the real UI; the fake data didn’t. If your prototype invents its own data shapes, you’ve prototyped a different app.

The backend is an adapter, not a rewrite

The real web UI needed a server. Not an agent server — that already exists — but something to serve the browser bundle and relay the chunk stream over HTTP. Here is the entire web entry point, and the thing to notice is how much it doesn’t do:

// The Flue server inherits this: the openFile tool signals the web
// editor instead of spawning the learner's local $EDITOR. (More below.)
process.env.TUTOR_OPEN_MODE = "web";

// Boot the agent server exactly like the console runner does.
const { client } = await startFlueServer({ port: Number(process.env.FLUE_PORT ?? "3789") });

// Serve the Code Buddy web app on top of it.
const web = startWebServer({
  client,
  agents: AGENT_CHOICES,
  workspaces: WORKSPACES,
  port: Number(process.env.WEB_PORT ?? "3790"),
});

That startFlueServer() is the same runner from Part 1 — builds the bundle, spawns dist/server.mjs, polls until ready, owns shutdown. The console’s main.ts calls it; the web’s main.ts calls it. Neither knows the other exists. This file is pure composition, exactly the role Part 1 assigned to src/main.ts: start the server, then attach a UI.

So what does startWebServer add? A thin HTTP/SSE skin over the same AgentSession. The one function that matters is the bridge from an AsyncIterable<AgentChunk> to a text/event-stream response — and it’s almost insultingly small:

// One `data: <json AgentChunk>` event per chunk, then a terminal `done`.
function sseReply(chunks: AsyncIterable<AgentChunk>): Response {
  const stream = new ReadableStream({
    async start(controller) {
      const encoder = new TextEncoder();  // SSE bodies are UTF-8 bytes
      const emit = (frame: unknown) =>
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(frame)}\n\n`));
      try {
        for await (const chunk of chunks) emit(chunk);  // ← the whole adapter
        emit({ kind: "done" });
      } catch (err) {
        emit({ kind: "error", text: String(err) });
      } finally {
        controller.close();
      }
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}

The route that uses it is one line of real work: return sseReply(session.send({ message })). The server takes the chunk stream the agent layer already produces and copies each chunk onto the wire as JSON. The two extra frame kinds — done and error — are the wire’s punctuation: a stream that ends cleanly emits done, and a stream that throws emits error so a truncated reply can never look complete to the browser. (The real file also sends a comment heartbeat every 15 seconds, so a connection stays open through long tool-heavy stretches where the agent emits nothing.)
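
To make the wire format concrete, here is a minimal sketch of the framing described above. The names encodeFrame, encodeHeartbeat, and encodeReply are illustrative assumptions, not functions from the repository; only the frame kinds come from the post.

```typescript
type AgentChunk =
  | { kind: "text"; text: string }
  | { kind: "debug"; text: string };

// Wire frames: the chunk vocabulary plus the two terminal kinds.
type StreamFrame =
  | AgentChunk
  | { kind: "done" }
  | { kind: "error"; text: string };

// One SSE event: a single `data:` line followed by a blank line.
function encodeFrame(frame: StreamFrame): string {
  return `data: ${JSON.stringify(frame)}\n\n`;
}

// An SSE comment line starts with a colon; clients ignore it,
// which makes it usable as a keep-alive during quiet stretches.
function encodeHeartbeat(): string {
  return ": ping\n\n";
}

// What a reply that ends cleanly looks like on the wire.
function encodeReply(chunks: AgentChunk[]): string {
  const frames: StreamFrame[] = [...chunks, { kind: "done" }];
  return frames.map(encodeFrame).join("");
}
```

Because JSON.stringify escapes newlines inside text, each frame stays on a single data: line, which is what lets a client split the stream on blank lines.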

This is the payoff of Part 1’s insistence that the agent is a server you talk to, not a script you run. Adding HTTP transport for a browser meant writing an adapter, not relocating any logic.

The browser is just another consumer

On the browser side, the mirror image: take SSE off the wire and turn it back into the same AsyncIterable<AgentChunk> the server started with. It’s an async generator over fetch:

export async function* streamReply(
  sessionId: string, message: string, signal?: AbortSignal,
): AsyncGenerator<ReplyChunk> {
  const res = await fetch(`/api/sessions/${sessionId}/messages`, {
    method: "POST", body: JSON.stringify({ message }), signal,
  });
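  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();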
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split("\n\n");   // SSE frames are \n\n-delimited
    buffer = events.pop() ?? "";          // keep the half-frame for next read
    for (const event of events) for (const line of event.split("\n")) {
      if (!line.startsWith("data: ")) continue;
      const frame = parseStreamFrame(line.slice(6));
      if (frame === null) continue;             // malformed: skip, don't crash
      if (frame.kind === "done") return;
      if (frame.kind === "error") throw new Error(frame.text);
      yield frame;                          // ← back to an AgentChunk
    }
  }
  throw new Error("Connection dropped before the reply finished");
}

Different transport, identical shape. The console’s agent-io.ts consumed @flue/sdk’s SSE iterator directly; the browser can’t import that, so it re-implements the parse over fetch, but what comes out the far end is the same stream of { kind, text } chunks. The signal parameter is the one genuinely new concern: a browser tab can navigate away mid-reply, so the consumer needs a way to abort the fetch. A terminal can’t navigate away.
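
streamReply leans on parseStreamFrame, which is called above but not shown. Here is a plausible sketch, assuming the frame shapes described so far. It is not the repository’s implementation, and the StreamFrame type name is an assumption.

```typescript
type StreamFrame =
  | { kind: "text"; text: string }
  | { kind: "debug"; text: string }
  | { kind: "done" }
  | { kind: "error"; text: string };

// Parse the JSON payload of one `data:` line.
// Returns null for anything malformed so the caller can skip it.
function parseStreamFrame(raw: string): StreamFrame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;

  const { kind, text } = value as { kind?: unknown; text?: unknown };
  if (kind === "done") return { kind: "done" };
  if (
    (kind === "text" || kind === "debug" || kind === "error") &&
    typeof text === "string"
  ) {
    return { kind, text };
  }
  return null; // unknown kind, or a missing text field
}
```

Validating at the boundary means everything downstream of the generator can trust the chunk shape without re-checking it.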

The fold that both UIs share

Here is where the abstraction got sharpened, and it’s the heart of this post.

A chunk stream is not a transcript. Streaming raw chunks gives you a problem: when /debug mode is on and a tool call lands in the middle of a reply, you want the transcript to read in order — the prose written so far, then the tool-call line, then the rest of the prose. And when debug mode is off, those debug chunks should vanish entirely. Folding a flat chunk stream into ordered transcript messages is real logic, with state.

In Part 1, that logic lived inside the console’s send loop. When the web UI needed the exact same behavior, the wrong move would have been to copy it. The right move was to recognize it as a pure calculation and lift it into src/shared/, where both UIs — and the test suite — import it:

// In-flight reply state, folded over the chunk stream (data).
export interface StreamState {
  text: string;       // prose accumulated since the last flush
  replied: boolean;   // has any prose been flushed yet?
}

// Fold one chunk into the reply state (calculation). A visible debug line
// flushes the buffered prose first, so the transcript stays chronological;
// hidden debug lines fold away entirely.
export function foldChunk(state: StreamState, chunk: AgentChunk, debugMode: boolean) {
  if (chunk.kind === "debug") {
    if (!debugMode) return { state, append: [] };
    const flushed = state.text === "" ? [] : [{ role: "tutor", text: state.text }];
    return {
      state: { text: "", replied: state.replied || state.text !== "" },
      append: [...flushed, { role: "debug", text: chunk.text }],
    };
  }
  return { state: { ...state, text: state.text + chunk.text }, append: [] };
}

No React. No DOM. No node: imports, no process.env. Given a state, a chunk, and a flag, it returns the next state and any transcript messages to append. That purity is what lets it bundle into the browser, the Node host, and the Flue build without modification — and what makes it trivially unit-testable, which it now is.

And now the two UIs collapse to the same five lines. The console’s hook:

let state = initialStreamState;
for await (const chunk of reply(line)) {
  const folded = foldChunk(state, chunk, debugModeRef.current);
  state = folded.state;
  append(...folded.append);          // push completed messages to the transcript
  setStreamingText(state.text);       // the in-flight line, re-rendered live
}
append(...finishStream(state, emptyReplyMessage));

The web’s hook, useAgentChat, runs the identical loop — same foldChunk, same finishStream — differing only in that reply(line) is streamReply(sessionId, …) and append/setStreamingText write to browser React state instead of Ink React state. The transcript logic is shared to the line; only the rendering target differs.
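
finishStream is used in the loop above but not shown. The sketch below restates foldChunk with explicit types and pairs it with a hypothetical finishStream, then folds a canned chunk list the way a unit test might. The finishStream behavior (flush leftover prose, or fall back to a placeholder if nothing was ever said), the TranscriptMessage shape, and the foldAll helper are all assumptions for illustration.

```typescript
type AgentChunk =
  | { kind: "text"; text: string }
  | { kind: "debug"; text: string };
type TranscriptMessage = { role: "tutor" | "debug"; text: string };
interface StreamState { text: string; replied: boolean }
interface Folded { state: StreamState; append: TranscriptMessage[] }

const initialStreamState: StreamState = { text: "", replied: false };

// Same logic as the post's foldChunk, with an explicit return type.
function foldChunk(state: StreamState, chunk: AgentChunk, debugMode: boolean): Folded {
  if (chunk.kind === "debug") {
    if (!debugMode) return { state, append: [] };
    const flushed: TranscriptMessage[] =
      state.text === "" ? [] : [{ role: "tutor", text: state.text }];
    return {
      state: { text: "", replied: state.replied || state.text !== "" },
      append: [...flushed, { role: "debug", text: chunk.text }],
    };
  }
  return { state: { ...state, text: state.text + chunk.text }, append: [] };
}

// ASSUMPTION: flush whatever prose is still buffered; if the tutor never
// said anything at all, emit a placeholder so the turn is not silent.
function finishStream(state: StreamState, emptyReplyMessage: string): TranscriptMessage[] {
  if (state.text !== "") return [{ role: "tutor", text: state.text }];
  return state.replied ? [] : [{ role: "tutor", text: emptyReplyMessage }];
}

// Hypothetical helper: run the whole fold over an in-memory chunk list.
function foldAll(chunks: AgentChunk[], debugMode: boolean): TranscriptMessage[] {
  let state = initialStreamState;
  const transcript: TranscriptMessage[] = [];
  for (const chunk of chunks) {
    const folded = foldChunk(state, chunk, debugMode);
    state = folded.state;
    transcript.push(...folded.append);
  }
  transcript.push(...finishStream(state, "(no reply)"));
  return transcript;
}
```

With debug mode on, a tool-call line arriving mid-reply splits the prose into two tutor messages around it; with debug mode off, the same chunks fold into a single tutor message.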

The tutor takes its own medicine
The thing this tutor teaches is the Actions–Calculations–Data split from Grokking Simplicity: pull the pure calculation out of the effectful action so you can test it and reuse it. Extracting foldChunk — a calculation — out of the console’s stateful send loop — an action — is that exact lesson, applied to the codebase that teaches it. The comments in the source even call useChatStream “the thin action shell” around the calculations. The web UI is what created the second caller that made the separation pay.

src/shared/ ended up holding everything that’s logic-without-a-UI: the chunk vocabulary and its SSE parser (chunks.ts), the fold (stream-fold.ts), lesson-filename math like “which lesson number is this, and which is newest” (lesson-names.ts), and — the next section — the catalog of agents. The rule for the directory is one sentence: if it touches node: or process.env or a rendering library, it doesn’t belong here.
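
As an illustration of the kind of pure helper that fits that rule, here is a sketch of the lesson-filename math. It assumes files are named lesson-<n>.<ext>, as in the lesson-1.ts examples; the real lesson-names.ts may well differ.

```typescript
// ASSUMPTION: lesson files look like "lesson-3.ts" or "lesson-3.md".
// Returns the lesson number, or null for anything else (including dotfiles).
function lessonNumber(filename: string): number | null {
  const match = /^lesson-(\d+)\.[A-Za-z0-9]+$/.exec(filename);
  return match ? Number(match[1]) : null;
}

// The lesson file with the highest number, or null if there are none.
// Compares numerically, so lesson-10 beats lesson-2.
function latestLessonFile(files: string[]): string | null {
  let best: string | null = null;
  let bestNumber = -1;
  for (const file of files) {
    const n = lessonNumber(file);
    if (n !== null && n > bestNumber) {
      best = file;
      bestNumber = n;
    }
  }
  return best;
}
```

The numeric comparison is the design point: a plain string sort would rank lesson-2.ts above lesson-10.ts.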

One agent quietly becomes three

A web home screen wants something to show: a grid of tutors, each with a name, a blurb, a course outline. The CLI had exactly one agent, hardcoded. So before the web UI could have a meaningful landing page, the single agent had to become a small, declarative roster. PR #13 renamed src/agents/main.ts to src/agents/acd-tutor.ts and introduced a catalog.

There was also a product reason for doing this now. I didn’t want the browser to prove only that the ACD tutor could wear a nicer coat; I wanted to build a new kind of tutor and see whether the same idea held. An English tutor was the useful counterexample: argumentative essays instead of TypeScript, claims and evidence instead of actions and calculations, Markdown instead of code. If the same harness could carry that lesson plan too, the architecture was teaching-application shaped, not accidentally programming-tutor shaped.

The catalog is, again, pure Layer-0 data — the single source of truth for who the agents are, consumed identically by the console’s selection menu and the web’s home screen:

export const AGENTS = {
  "acd-tutor": {
    id: "acd-tutor",
    label: "ACD Tutor",
    description: "Learn to tell apart Actions, Calculations, and Data in real code.",
    greeting: `Hi! Welcome to ACD tutor! …`,
    actions: [CHECK_MY_WORK],
    workspace: { dirEnvVar: "ACD_TUTOR_SCRATCH_DIR", defaultDir: "/tmp/acd-tutor/scratch", editor: "code" },
    course: { title: "Actions · Calculations · Data", steps: [ /* 7 steps */ ] },
  },
  "argumentative-essay-tutor": { /* … editor: "markdown" … */ },
  "socratic-tutor": { /* … chat-only: no workspace … */ },
} satisfies Record<AgentId, AgentDefinition>;

An entry carries everything a UI needs to present a tutor without knowing anything about it: the label and blurb for a card, the greeting for the first transcript line, the action buttons (the ACD tutor’s “Check my work”), an optional course outline for a progress stepper, and an optional workspace — present only if the tutor manages lesson files. That last field is what the home screen reads to decide which screen to open: a tutor with a workspace gets the IDE layout (editor + chat), a chat-only tutor like the kids’ Socratic tutor gets a plain conversation.
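
That decision can be sketched as a small pure function. The names AgentCard, ScreenLayout, and layoutFor are hypothetical, and the types are trimmed to the fields the decision needs; only the workspace fields and editor values come from the catalog excerpt.

```typescript
interface Workspace {
  dirEnvVar: string;
  defaultDir: string;
  editor: "code" | "markdown";
}

// Trimmed-down catalog entry: just what the layout decision reads.
interface AgentCard {
  id: string;
  label: string;
  workspace?: Workspace;
}

type ScreenLayout =
  | { kind: "ide"; editor: Workspace["editor"] } // editor + chat
  | { kind: "chat" };                            // plain conversation

// A tutor with a workspace gets the IDE layout; a chat-only tutor does not.
function layoutFor(agent: AgentCard): ScreenLayout {
  return agent.workspace
    ? { kind: "ide", editor: agent.workspace.editor }
    : { kind: "chat" };
}
```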

The agent’s actual behavior still lives where Part 1 put it — in a skill and a Flue profile. The catalog is deliberately separate from the Flue side, because the catalog has to run in the browser and Flue profiles can’t. A small filesystem-consistency test asserts that every catalog id has a matching agent stub, profile, and skill on disk, so the two halves can’t drift. Here’s a profile, for completeness — note how little is left once the catalog holds the presentation:

export const acdProfile = defineAgentProfile({
  instructions: [
    "You are an Actions, Calculations, and Data tutor…",
    "Manage lesson files exclusively with listFiles/readFile/writeFile/openFile…",
    "On a fresh start, call listFiles first and resume where the learner left off.",
  ].join("\n"),
  skills: [acdTutor],
  tools: createLessonFileTools({ scratchDir: ACD_SCRATCH_DIR, openMode: OPEN_MODE }),
});

The dividend showed up one PR later. Adding a second real tutor — an argumentative-essay tutor that edits Markdown instead of TypeScript — was a catalog entry, a profile, and a skill. The home screen, the chat hook, the SSE plumbing, the fold: untouched. This is Part 1’s thesis — infrastructure changes rarely, knowledge changes constantly — extended from “one tutor’s lessons” to “the set of tutors itself.”

The hard part: a filesystem the browser doesn’t have

Everything so far has been pleasant. This is the part that wasn’t, and it’s the most interesting thing in the whole project.

Part 1’s best idea was that the filesystem is part of the conversation. The tutor writes lesson-1.ts into a scratch directory and opens it in your real $EDITOR; you edit the file; the tutor reads your edits back as your answer. The terminal UI got this almost for free, because a terminal lives on a machine with an editor and a filesystem.

A browser has neither. There is no $EDITOR to spawn, no path to read. So the file half of the conversation had to be rebuilt — and the design rule was to rebuild it without the agent noticing. The model still calls openFile("lesson-2.ts"). Where that lands is the only thing that changes.

It comes down to one branch in the openFile tool, switched by the TUTOR_OPEN_MODE that web/main.ts set at boot:

defineTool({
  name: "openFile",
  /* …schema: a bare filename… */
  execute: async (args) => {
    if ((await store.read(args.filename)) === null) return FILE_NOT_FOUND;
    const name = basename(args.filename);
    if (opts.openMode === "web") {
      // No host editor to spawn: drop a hidden .open-request signal that
      // the browser polls for after the reply, and switches to that tab.
      await store.requestOpen(name);
    } else {
      store.openInEditor(name, opts.editor);  // console: spawn $EDITOR
    }
    return `Opened ${name} in the learner's editor`;
  },
});

This is the prompts-vs-tools boundary from Part 1 paying rent a second time. The skill says “open the file”; the tool decides what “open” means in this environment. In web mode it writes a tiny hidden file, .open-request, into the same scratch directory. The whole contract is four lines:

// The web-mode handshake: openFile drops this; the client consumes it after
// each reply and opens that tab. Dotfiles are hidden from every lesson
// listing, so neither the model nor the student ever sees it.
export const OPEN_REQUEST_FILE = ".open-request";
export interface OpenRequest { filename: string; requestedAt: number; }
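
A minimal sketch of that handshake, using an in-memory stand-in for the store. MemoryStore, requestOpen, and consumeOpenRequest are hypothetical names, and the assumption that consuming the signal also deletes it is mine, not something the post states.

```typescript
const OPEN_REQUEST_FILE = ".open-request";
interface OpenRequest { filename: string; requestedAt: number }

// In-memory stand-in for the lesson-file store (the real one is on disk).
class MemoryStore {
  private files: Record<string, string> = {};
  write(filename: string, content: string): void { this.files[filename] = content; }
  read(filename: string): string | null { return this.files[filename] ?? null; }
  remove(filename: string): void { delete this.files[filename]; }
  // Lesson listing: dotfiles are hidden, so the signal never shows up here.
  list(): string[] {
    return Object.keys(this.files).filter((f) => !f.startsWith("."));
  }
}

// Agent side: what openFile does in web mode.
function requestOpen(store: MemoryStore, filename: string, now: number = Date.now()): void {
  const request: OpenRequest = { filename, requestedAt: now };
  store.write(OPEN_REQUEST_FILE, JSON.stringify(request));
}

// Browser side: read the signal once after a reply, then clear it.
// ASSUMPTION: consuming deletes the file so the same request is not replayed.
function consumeOpenRequest(store: MemoryStore): string | null {
  const raw = store.read(OPEN_REQUEST_FILE);
  if (raw === null) return null;
  store.remove(OPEN_REQUEST_FILE);
  try {
    const parsed = JSON.parse(raw) as Partial<OpenRequest>;
    return typeof parsed.filename === "string" ? parsed.filename : null;
  } catch {
    return null; // a corrupt signal is dropped rather than crashing the UI
  }
}
```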

The other direction — the learner’s edits reaching the agent — is solved by making the browser editor and the agent share one workspace. The agent’s writeFile tool and the browser’s autosave (a debounced PUT /api/agents/:agent/files/:name) write to the same lesson-file store. The agent’s readFile reads what the editor saved. The store doesn’t know or care which side touched a file; it’s the workspace box at the bottom of the stack diagram, now with two writers instead of one.

The agent writes and signals; the browser autosaves and reads back. Same workspace, two writers — exactly the role the learner’s editor played in Part 1.

The piece that ties it together is what the browser does after each reply. The agent may have created a new lesson, rewritten the current one, or asked to open a specific file — and the learner may be mid-edit with unsaved work. Deciding what the editor should do is, once again, a pure calculation, kept out of the React component that runs it:

// open-request wins → first file → newer-and-clean → tutor-rewrote-current
export function planSync({ files, openRequest, active, dirty }: SyncInput): SyncPlan {
  // The tutor's openFile call wins: open exactly that file.
  if (openRequest && files.includes(openRequest) && openRequest !== active)
    return { kind: "open", file: openRequest, toast: `Beep opened ${openRequest}` };

  const newest = latestLessonFile(files);
  if (!newest) return { kind: "none" };
  if (active === null)                                  // first file appeared — open it
    return { kind: "open", file: newest, toast: `${newest} is ready!` };

  const newer = newest !== active &&
    (lessonNumber(newest) ?? 0) > (lessonNumber(active) ?? 0);
  if (newer)                                          // don't yank unsaved work
    return dirty ? { kind: "toast", toast: `New lesson ready` }
                 : { kind: "open", file: newest, toast: `${newest} unlocked!` };

  return dirty ? { kind: "none" } : { kind: "refresh", file: active };
}

The most human line in the function is the dirty guard: if a new lesson unlocks while you’re typing, the editor doesn’t rip your file away — it shows a toast and waits. That kind of rule is exactly what you want isolated in a pure, testable function rather than tangled in effects. In the terminal version, your own editor enforced this for free; on the web, it’s a deliberate calculation. The trade-off is the honest cost of leaving the host: conveniences the OS gave you become code you own.
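
One way to keep the effectful side equally small is to treat the plan as data and apply it with a reducer. The sketch below infers the SyncPlan shapes from planSync’s return statements; EditorState and applyPlan are hypothetical, not taken from the repository.

```typescript
// Plan shapes inferred from the return statements of planSync.
type SyncPlan =
  | { kind: "none" }
  | { kind: "open"; file: string; toast: string }
  | { kind: "toast"; toast: string }
  | { kind: "refresh"; file: string };

// Hypothetical editor state, reduced to what a plan can change.
interface EditorState {
  active: string | null; // file shown in the editor tab
  toasts: string[];      // notifications shown so far
  reloads: number;       // times the editor content was re-read from the store
}

// Apply a plan to the editor state without mutating the input.
function applyPlan(editor: EditorState, plan: SyncPlan): EditorState {
  switch (plan.kind) {
    case "none":
      return editor;
    case "toast": // notify only: the open file is left alone
      return { ...editor, toasts: [...editor.toasts, plan.toast] };
    case "open":
      return {
        active: plan.file,
        toasts: [...editor.toasts, plan.toast],
        reloads: editor.reloads + 1,
      };
    case "refresh":
      return { ...editor, reloads: editor.reloads + 1 };
  }
}
```

In this shape the React component only has to call the planner, apply the result, and render, so the dirty guard can be tested without a browser.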

Conversation state, the second time

Part 1 made a point of how little state the CLI managed: the instance id is the conversation, and Flue keeps the history server-side. The web inherits that — the browser stores only a conversation id in localStorage, never the transcript — but a reload needs to repaint the conversation, and that surfaced a subtle bug worth the telling.

The server snapshots history off Flue’s agent_end event. The naive implementation stored that payload as the conversation. It was wrong: agent_end carries only the messages from that run — the latest prompt and reply — not the whole conversation. Storing it overwrote the entire transcript on every turn. The fix is to accumulate:

// Each agent_end reports only the new messages from that run, so history is
// built by accumulating turns — not overwriting. An empty turn (Flue's
// error/abort path) leaves the transcript alone, so a dropped stream never
// wipes history.
export function appendTurn(existing: TranscriptMessage[], messages: unknown[]) {
  const turn = toTranscript(messages);
  return turn.length === 0 ? existing : [...existing, ...turn];
}
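
To see the difference between overwriting and accumulating, here is a self-contained sketch. toTranscript is not shown in the post, so the version below is an assumption: it expects messages shaped like { role, content } and drops anything else.

```typescript
type TranscriptMessage = { role: "user" | "tutor"; text: string };

// ASSUMPTION: run messages look like { role: "user" | "assistant", content: string }.
// Anything that does not match is skipped.
function toTranscript(messages: unknown[]): TranscriptMessage[] {
  const out: TranscriptMessage[] = [];
  for (const m of messages) {
    if (typeof m !== "object" || m === null) continue;
    const { role, content } = m as { role?: unknown; content?: unknown };
    if (typeof content !== "string") continue;
    if (role === "user") out.push({ role: "user", text: content });
    else if (role === "assistant") out.push({ role: "tutor", text: content });
  }
  return out;
}

// Same logic as the post's appendTurn.
function appendTurn(existing: TranscriptMessage[], messages: unknown[]): TranscriptMessage[] {
  const turn = toTranscript(messages);
  return turn.length === 0 ? existing : [...existing, ...turn];
}

// Replay a list of per-run payloads into one transcript.
function replay(runs: unknown[][]): TranscriptMessage[] {
  return runs.reduce<TranscriptMessage[]>(appendTurn, []);
}
```

Replaying two normal runs with an empty (aborted) run between them yields four messages; the buggy overwrite version would have kept only the last two.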

It’s a small fix, but it’s the kind of bug you only meet once you have a UI that reloads — the terminal never reloads, so the terminal never hit it. New front-end, new failure modes; the seam held, but the consumer above it had its own state to get right.

Testing two UIs without a model

One last dividend of the seam. Because every UI consumes an AsyncIterable<AgentChunk>, you can test the whole stack — fold, file-sync, SSE, the lot — by feeding it canned chunks instead of a live model. PR #12 built a “faux agent”: a scripted Flue agent, isolated in its own Flue project so it never touches the production build, that replays pre-baked events (a greeting, a writeFile, an openFile) on demand.

The e2e version is built with @earendil-works/pi-ai, which matters because the test does not mock the web server or the lesson-file tools. It registers a fake provider/model pair, points a normal Flue agent at that model, and gives the agent the real writeFile/openFile tools. The scripted model then takes three assistant turns: write lesson-1.ts, request that the browser open it, and stream the final tutor message.

const faux = registerFauxProvider({
  api: "acd-faux",
  provider: "acd-faux",
  models: [{ id: "tutor" }],
});

const fauxApi = getApiProvider("acd-faux");
if (fauxApi) registerApiProvider(fauxApi);
registerProvider("acd-faux", { api: "acd-faux", baseUrl: "" });

faux.setResponses([
  fauxAssistantMessage(fauxToolCall("writeFile", { filename: "lesson-1.ts", content: "// Lesson 1..." })),
  fauxAssistantMessage(fauxToolCall("openFile", { filename: "lesson-1.ts" })),
  fauxAssistantMessage(fauxText("Created your first lesson...")),
]);

The odd-looking re-registration is the honest detail. The test bundle and @flue/runtime can resolve pi-ai as separate module instances, so registering the faux provider in the test bundle’s registry is not enough. The helper lifts the handler back out with getApiProvider, re-registers it through Flue’s registerApiProvider, and then calls registerProvider so acd-faux/tutor resolves like any other model. From the rest of the stack’s point of view, this is just an agent streaming from a provider.

At the smaller unit-test level, tiny builders keep the fixtures to just the fields the code reads:

export function textDelta(text: string): FlueEvent { return { type: "text_delta", text }; }

export function toolCall(opts): FlueEvent {
  return { type: "tool_call", toolName: opts.toolName, toolCallId: "tc_1",
           isError: opts.isError ?? false, result: opts.result, durationMs: opts.durationMs ?? 5 };
}

An end-to-end test can now drive a real agent session against the faux agent and assert that the browser renders the right transcript and opens the right tab — deterministically, offline, with no API key and no flakiness. The abstraction that let one backend feed two UIs is the same abstraction that lets a test stand in for the backend entirely. Good seams are good in more than one direction.

So — was it a weekend?

The rendering swap was. Pointing a browser at the agent took an SSE adapter on the server, an async generator on the client, and a React tree — and not one line of the agent, the runner, the tools, or the loop moved to make it happen. Part 1’s bet that “a browser frontend would reuse layers 1–3 untouched” paid off exactly as advertised.

But the more honest accounting is that the web UI was a forcing function. It’s the second caller that turns “code that happens to work” into “an abstraction.” A single consumer can hide a lot: logic tangled into the console’s send loop looked fine until a browser needed the same logic and the only clean answer was to lift foldChunk into a pure, shared, tested calculation. One hardcoded agent looked fine until a home screen needed a roster and the answer was a declarative catalog. The filesystem-as-conversation looked free until a browser had no filesystem and the answer was an explicit signal file and a sync plan.

That’s the real lesson, and it generalizes past tutors and past Flue: you don’t truly know where your seams are until something pulls on them from a new direction. The terminal proved the agent worked. The browser proved the architecture did. Build the second front-end not only because you want it, but because it’s the cheapest way to find out whether the first one was honest.

The full source, including every commit referenced in this post, is at github.com/vishnugopal/acd-tutor. The web UI is one consumer; the terminal from Part 1 is the other; the seam between them is twelve lines of type declaration. There’s room for a third (a mobile client, a voice loop, a Slack bot), and if the abstraction is as honest as it now looks, none of them should have to touch anything below send().

How I’ve been using it

I’ve since made a tutor for Chapters 8-9 of Grokking Simplicity, the chapters that deal with stratified design, and frankly it works. The useful thing about this setup is that I can:

make the lessons using a more capable model like Opus, and
run the tutor using a much less capable OSS model (since a lot of the material is pre-baked).

This is also part of how I’ve been thinking about education and agents in general. A subject matter expert & a sophisticated agent work together to build a guided curriculum, skill mapping, and question bank, but the actual guide can be a less capable model with strict guardrails.

The agent can describe program state with Mermaid diagrams. Cool, isn’t it?

The genuine bonus is that it’s helping me learn new things, and that’s part of the reason why I do side-projects like this 🙂