Build an AI Chatbot

A streaming chatbot powered by Claude. Answers appear token-by-token, and your API key never touches the browser.

A chat box backed by Claude that streams its answer as it is generated, the way polished AI products do. What makes it both real and safe: the model call happens on your server, so your API key stays secret, and the server streams the tokens straight to the browser.

We use Astro for the app and the Claude API (Anthropic) for the brain. By the end you have a working chat you can drop into any product.

What you’ll have at the end

A chat UI: type a message, watch the reply stream in word by word.
A server endpoint that calls Claude and pipes the stream to the client.
Conversation memory (the bot remembers the thread).
An API key that never leaves the server.

Before you start

Node 18+, a terminal.
An Anthropic API key → https://console.anthropic.com (Settings → API Keys). Add a little credit; a chat session costs fractions of a cent.
~1–2 hours.

Step 1 — Create the Astro app

npm create astro@latest chatbot
cd chatbot

Pick Empty, TypeScript: Strict, install deps: yes.

The page is static, but the endpoint that talks to Claude must run on the server. Add an adapter (so the build can serve an on-demand route) and the Anthropic SDK:

npx astro add node
npm install @anthropic-ai/sdk
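
For orientation, the config that astro add node leaves behind usually looks roughly like the sketch below. This assumes a recent Astro release, where a route can opt out of prerendering on its own; the exact file name and formatting the CLI writes may differ on your version, so treat it as a reference and not something to paste over your own file.

```typescript
// astro.config.mjs (sketch; the CLI normally writes this for you)
import { defineConfig } from "astro/config";
import node from "@astrojs/node";

export default defineConfig({
  // "standalone" builds a server you can start directly with Node
  adapter: node({ mode: "standalone" }),
});
```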

Step 2 — Add your API key

Create .env:

ANTHROPIC_API_KEY=sk-ant-your-key

This is a real secret. Unlike the public Supabase anon key from the auth guide, an Anthropic key grants spending on your account. Keep it server-only: no PUBLIC_ prefix, never imported into client code, never logged. Confirm .env is gitignored.
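
One way to enforce those rules in code is a small guard that fails loudly when the key is missing or carries a public prefix. This is a sketch only: requireSecret is a hypothetical helper, not part of Astro or the Anthropic SDK.

```typescript
// Hypothetical helper: not provided by Astro or the Anthropic SDK.
function requireSecret(name: string, value: unknown): string {
  if (name.startsWith("PUBLIC_")) {
    // Astro exposes PUBLIC_-prefixed variables to client code.
    throw new Error(`${name} has a PUBLIC_ prefix and would reach the browser`);
  }
  if (typeof value !== "string" || value.trim() === "") {
    // Name the variable, but never echo its value into logs.
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}
```

In the endpoint you could then write requireSecret("ANTHROPIC_API_KEY", import.meta.env.ANTHROPIC_API_KEY) and pass the result to the client constructor, so a missing key shows up as a clear startup error instead of a 401 later.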

Step 3 — The streaming endpoint

This endpoint receives the conversation, asks Claude, and streams the answer back as plain-text chunks. Create src/pages/api/chat.ts:

import type { APIRoute } from "astro";
import Anthropic from "@anthropic-ai/sdk";

export const prerender = false; // must run on the server

const client = new Anthropic({ apiKey: import.meta.env.ANTHROPIC_API_KEY });

export const POST: APIRoute = async ({ request }) => {
  // Expect { messages: [{ role: "user" | "assistant", content: string }, ...] }
  let messages: { role: "user" | "assistant"; content: string }[];
  try {
    const body = await request.json();
    messages = body.messages;
    if (!Array.isArray(messages) || messages.length === 0) throw new Error("empty");
  } catch {
    return new Response("Bad request", { status: 400 });
  }

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        const ai = client.messages.stream({
          model: "claude-sonnet-4-6", // workhorse default; see note below
          max_tokens: 1024, // caps reply length AND cost
          system: "You are a concise, friendly assistant. Keep answers tight.",
          messages,
        });

        for await (const event of ai) {
          if (
            event.type === "content_block_delta" &&
            event.delta.type === "text_delta"
          ) {
            controller.enqueue(encoder.encode(event.delta.text));
          }
        }
      } catch (e) {
        console.error("claude stream failed:", e);
        controller.enqueue(encoder.encode("\n\n[the assistant is unavailable right now]"));
      } finally {
        controller.close();
      }
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8", "Cache-Control": "no-store" },
  });
};
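
The endpoint above only checks that messages is a non-empty array. Because anyone can POST to it, you may want stricter validation before spending tokens. A minimal sketch, assuming limits (MAX_MESSAGES, MAX_CHARS_PER_MESSAGE) that are illustrative and not taken from any API requirement; parseMessages is a hypothetical helper:

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Illustrative limits; tune them for your product.
const MAX_MESSAGES = 40;
const MAX_CHARS_PER_MESSAGE = 4000;

// Returns a cleaned copy of the messages, or null if anything looks wrong.
function parseMessages(input: unknown): ChatMessage[] | null {
  if (!Array.isArray(input) || input.length === 0 || input.length > MAX_MESSAGES) {
    return null;
  }
  const out: ChatMessage[] = [];
  for (const item of input) {
    if (typeof item !== "object" || item === null) return null;
    const { role, content } = item as Record<string, unknown>;
    if (role !== "user" && role !== "assistant") return null;
    if (typeof content !== "string") return null;
    if (content.length === 0 || content.length > MAX_CHARS_PER_MESSAGE) return null;
    out.push({ role, content }); // copy only the fields we expect
  }
  // In this chat the newest turn always comes from the user.
  if (out[out.length - 1].role !== "user") return null;
  return out;
}
```

Inside the try block you would call parseMessages(body.messages) and return the 400 response when it yields null.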

Which model? claude-sonnet-4-6 is the balanced default: smart, fast enough, and cheap enough for chat. Swap the one string to tune:

claude-haiku-4-5-20251001 — fastest and cheapest, great for high-volume or simple chat.

claude-opus-4-8 — most capable, for hard reasoning. Pricier; reach for it when Sonnet struggles.
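
If you expect to switch models often, one option is to keep the ids in a single lookup and choose by a short tier name. A sketch under assumptions: the tier names and pickModel are hypothetical, and the ids are copied from this guide, so verify them against the current model list before relying on them.

```typescript
// Model ids copied from this guide; confirm they are still current.
const MODELS = {
  fast: "claude-haiku-4-5-20251001",
  balanced: "claude-sonnet-4-6",
  deep: "claude-opus-4-8",
} as const;

type Tier = keyof typeof MODELS;

// Falls back to the balanced default for anything unrecognized.
function pickModel(tier: unknown): string {
  const known =
    typeof tier === "string" &&
    Object.prototype.hasOwnProperty.call(MODELS, tier);
  return known ? MODELS[tier as Tier] : MODELS.balanced;
}
```

The endpoint could then use model: pickModel("fast"), or read the tier from a server-side setting of your choosing.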

Step 4 — The chat UI

Replace src/pages/index.astro:

---
// static page; all the chat logic is the client script below
---

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>Chat</title>
    <style>
      body { font-family: system-ui, sans-serif; max-width: 40rem; margin: 2rem auto; padding: 0 1rem; }
      #log { display: flex; flex-direction: column; gap: .75rem; margin-bottom: 1rem; }
      .msg { padding: .6rem .8rem; border-radius: 10px; white-space: pre-wrap; line-height: 1.5; }
      .user { background: #18181b; color: #fff; align-self: flex-end; max-width: 80%; }
      .assistant { background: #f6f7f9; color: #18181b; align-self: flex-start; max-width: 80%; }
      form { display: flex; gap: .5rem; position: sticky; bottom: 1rem; background: #fff; }
      input { flex: 1; padding: .6rem .75rem; font-size: 1rem; border: 1px solid #e5e7eb; border-radius: 8px; }
      button { padding: .6rem 1rem; border: 0; border-radius: 8px; background: #18181b; color: #fff; }
      button:disabled { opacity: .5; }
    </style>
  </head>
  <body>
    <h1>Ask Claude</h1>
    <div id="log"></div>

    <form id="chat">
      <input id="msg" autocomplete="off" placeholder="Type a message…" required />
      <button type="submit">Send</button>
    </form>

    <script>
      const log = document.getElementById("log") as HTMLDivElement;
      const form = document.getElementById("chat") as HTMLFormElement;
      const input = document.getElementById("msg") as HTMLInputElement;
      const btn = form.querySelector("button") as HTMLButtonElement;

      // The full conversation we send to the server each turn = the bot's "memory".
      const messages: { role: "user" | "assistant"; content: string }[] = [];

      function bubble(role: "user" | "assistant", text: string) {
        const el = document.createElement("div");
        el.className = `msg ${role}`;
        el.textContent = text;
        log.appendChild(el);
        // #log has no scroll area of its own, so scroll the page itself.
        window.scrollTo(0, document.documentElement.scrollHeight);
        return el;
      }

      form.addEventListener("submit", async (e) => {
        e.preventDefault();
        const text = input.value.trim();
        if (!text) return;
        input.value = "";

        messages.push({ role: "user", content: text });
        bubble("user", text);
        const out = bubble("assistant", "…");
        input.disabled = true;
        btn.disabled = true;

        try {
          const res = await fetch("/api/chat", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ messages }),
          });
          if (!res.ok || !res.body) throw new Error("bad response");

          const reader = res.body.getReader();
          const decoder = new TextDecoder();
          let acc = "";
          while (true) {
            const { done, value } = await reader.read();
            if (done) break;
            acc += decoder.decode(value, { stream: true });
            out.textContent = acc; // repaint as tokens arrive
            // #log has no scroll area of its own, so scroll the page itself.
            window.scrollTo(0, document.documentElement.scrollHeight);
          }
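          acc += decoder.decode(); // flush any bytes still buffered in the decoder
          out.textContent = acc || "(no reply)";
          if (acc) messages.push({ role: "assistant", content: acc }); // never store an empty turn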
        } catch {
          messages.pop(); // drop the unanswered user turn so a retry doesn't send it twice
          out.textContent = "Network error. Try again.";
        } finally {
          input.disabled = false;
          btn.disabled = false;
          input.focus();
        }
      });
    </script>
  </body>
</html>
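
The read loop in the script is the part you are most likely to reuse, so here it is as a standalone function. A sketch, assuming a hypothetical helper name (readTextStream); it uses only the standard ReadableStream and TextDecoder APIs that the page script already relies on.

```typescript
// Reads a byte stream as UTF-8 text, reporting the accumulated text per chunk.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onText: (soFar: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let acc = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps a character that is split across chunks intact
      acc += decoder.decode(value, { stream: true });
      onText(acc);
    }
    acc += decoder.decode(); // flush any bytes still buffered in the decoder
  } finally {
    reader.releaseLock();
  }
  return acc;
}
```

In the submit handler this would replace the while loop: await readTextStream(res.body, (soFar) => { out.textContent = soFar; }) and push the returned string onto messages.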

Test it

npm run dev

Open http://localhost:4321 and ask something. The reply should stream in token by token. Ask a follow-up (“explain that simpler”) and it remembers the thread, because we resend the whole messages array each turn.

Step 5 — Deploy

Swap the local Node adapter for your host’s (e.g. Cloudflare):

npx astro add cloudflare

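Then in your host’s dashboard, add the environment variable ANTHROPIC_API_KEY. That’s the whole deploy: the static page ships to the edge and /api/chat runs as an on-demand function. Some adapters expose runtime variables through their own API and not through import.meta.env, so if the key comes back undefined in production, check your adapter’s documentation.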

Gotcha: confirm your host supports streaming responses (Cloudflare, Vercel, and Node all do). If replies arrive all at once instead of streaming, something between the endpoint and the browser is buffering the response; check that you’re not behind a proxy that disables chunked responses.
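
To tell buffering apart from a merely fast reply, you can count how many chunks arrive and how far apart they land. A rough diagnostic sketch; measureStream and StreamTiming are hypothetical names, and chunk counts do not map one-to-one onto tokens.

```typescript
interface StreamTiming {
  chunks: number; // how many separate reads delivered data
  spreadMs: number; // time between the first and the last chunk
}

async function measureStream(
  body: ReadableStream<Uint8Array>,
): Promise<StreamTiming> {
  const reader = body.getReader();
  let chunks = 0;
  let first = 0;
  let last = 0;
  while (true) {
    const { done } = await reader.read();
    if (done) break;
    const now = Date.now();
    if (chunks === 0) first = now;
    last = now;
    chunks += 1;
  }
  return { chunks, spreadMs: last - first };
}
```

Call it on the body of a fetch to /api/chat with a prompt that needs a long answer. A single chunk, or a spread near zero, suggests the response was buffered somewhere along the way.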

You now have

A streaming AI chat with conversation memory and a server-protected key. Drop the /api/chat endpoint and the client loop into any product and you’ve got AI built in.

Make it yours (next ideas)

Persona: edit the system prompt to set the bot’s voice and rules.
Gate it: put the chat behind login (see Add Authentication) so usage ties to a user.
Trim cost: long threads resend a lot of tokens. Cap history length, or summarize old turns, to keep each call cheap.
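
Capping history can be as small as a sliding window over the messages array. A minimal sketch, assuming a hypothetical trimHistory helper and an arbitrary window size; it also makes the window start on a user turn, since the Messages API has, as far as I know, expected the first message to come from the user.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Returns a new array holding at most maxMessages of the newest turns.
function trimHistory(messages: ChatMessage[], maxMessages: number): ChatMessage[] {
  const recent = messages.slice(-Math.max(1, maxMessages));
  // Drop leading assistant turns so the window opens with a user message.
  const firstUser = recent.findIndex((m) => m.role === "user");
  return firstUser <= 0 ? recent : recent.slice(firstUser);
}
```

On the client you would send JSON.stringify({ messages: trimHistory(messages, 20) }) while keeping the full array for display. The trade-off is that the bot forgets anything outside the window.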

Troubleshooting

401 / authentication error → bad or missing ANTHROPIC_API_KEY. Check .env locally and the env var in your host dashboard for production.
Reply appears all at once, not streaming → host or proxy is buffering the response. Verify the endpoint returns the ReadableStream (not awaited to a string) and the host supports streaming.
model not found → the model id is misspelled or has been retired; use one of the current ids in Step 3.
Costs creeping up → lower max_tokens, switch to claude-haiku-4-5-20251001, or trim the conversation history you resend each turn.
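
To make these failures easier to spot in server logs, you could map the HTTP status of a failed call to a hint. A sketch under assumptions: hintForStatus is a hypothetical helper, and the 404 and 429 mappings reflect my understanding of the API's error conventions, so check them against the official error reference.

```typescript
// Maps an HTTP status from a failed API call to a troubleshooting hint.
function hintForStatus(status: number | undefined): string {
  switch (status) {
    case 401:
      return "Authentication failed: check ANTHROPIC_API_KEY locally and on your host.";
    case 404:
      return "Model not found: use one of the current model ids.";
    case 429:
      return "Rate limited: slow down or retry after a short wait.";
    default:
      return "Unexpected failure: see the logged error for details.";
  }
}
```

In the endpoint's catch block, something like console.error(hintForStatus((e as { status?: number }).status)) would log the hint, assuming the thrown error carries a numeric status field.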

Where next

Add Authentication →