Build an AI Chatbot
A streaming chatbot powered by Claude. Answers appear token-by-token, and your API key never touches the browser.
You'll build a chat box, backed by Claude, that streams its answer as it is generated, the way good AI products do. The trick that makes it real (and safe): the model call happens on your server, so your API key stays secret, and the server streams the tokens straight to the browser.
We use Astro for the app and the Claude API (Anthropic) for the brain. By the end you have a working chat you can drop into any product.
What you’ll have at the end
- A chat UI: type a message, watch the reply stream in word by word.
- A server endpoint that calls Claude and pipes the stream to the client.
- Conversation memory (the bot remembers the thread).
- An API key that never leaves the server.
Before you start
- Node 18+, a terminal.
- An Anthropic API key → https://console.anthropic.com (Settings → API Keys). Add a little credit; a short exchange typically costs a fraction of a cent.
- ~1–2 hours.
Step 1 — Create the Astro app
npm create astro@latest chatbot
cd chatbot
Pick Empty, TypeScript: Strict, install deps: yes.
The page is static, but the endpoint that talks to Claude must run on the server. Add an adapter (so the build can serve an on-demand route) and the Anthropic SDK:
npx astro add node
npm install @anthropic-ai/sdk
Step 2 — Add your API key
Create .env:
ANTHROPIC_API_KEY=sk-ant-your-key
This is a real secret. Unlike the public Supabase anon key from the auth guide, an Anthropic key grants spending on your account. Keep it server-only: no PUBLIC_ prefix, never imported into client code, never logged. Confirm .env is gitignored.
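One way to catch a missing or misnamed key early is to read it through a small guard. The sketch below is only an illustration: `requireServerSecret` is a hypothetical helper, not part of Astro or the Anthropic SDK, and it takes the env object as a parameter so it can be tried without a real key.

```typescript
// Hypothetical helper (not an Astro or Anthropic API): read a secret from an
// env-like object and fail loudly if it is missing or marked public.
type Env = Record<string, string | undefined>;

function requireServerSecret(env: Env, name: string): string {
  // Astro exposes PUBLIC_-prefixed variables to client code, so a secret
  // must never carry that prefix.
  if (name.startsWith("PUBLIC_")) {
    throw new Error(`${name} would be exposed to the browser; drop the PUBLIC_ prefix`);
  }
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing ${name}; set it in .env or in your host's dashboard`);
  }
  return value;
}
```

In the endpoint, the key lookup could then go through this guard, so a missing key shows up as a clear startup error rather than a 401 from the API.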
Step 3 — The streaming endpoint
This endpoint receives the conversation, asks Claude, and streams the answer back as plain-text chunks. Create src/pages/api/chat.ts:
import type { APIRoute } from "astro";
import Anthropic from "@anthropic-ai/sdk";

export const prerender = false; // must run on the server

const client = new Anthropic({ apiKey: import.meta.env.ANTHROPIC_API_KEY });

export const POST: APIRoute = async ({ request }) => {
  // Expect { messages: [{ role: "user" | "assistant", content: string }, ...] }
  let messages: { role: "user" | "assistant"; content: string }[];
  try {
    const body = await request.json();
    messages = body.messages;
    if (!Array.isArray(messages) || messages.length === 0) throw new Error("empty");
  } catch {
    return new Response("Bad request", { status: 400 });
  }

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        const ai = client.messages.stream({
          model: "claude-sonnet-4-6", // workhorse default; see note below
          max_tokens: 1024, // caps reply length AND cost
          system: "You are a concise, friendly assistant. Keep answers tight.",
          messages,
        });
        for await (const event of ai) {
          if (
            event.type === "content_block_delta" &&
            event.delta.type === "text_delta"
          ) {
            controller.enqueue(encoder.encode(event.delta.text));
          }
        }
      } catch (e) {
        console.error("claude stream failed:", e);
        controller.enqueue(encoder.encode("\n\n[the assistant is unavailable right now]"));
      } finally {
        controller.close();
      }
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8", "Cache-Control": "no-store" },
  });
};
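The endpoint only checks that `messages` is a non-empty array. A stricter check could look like the sketch below. `parseMessages` and `ChatMessage` are hypothetical names, and the rules (non-empty string content, last turn from the user) are assumptions about what this chat should accept, not requirements taken from the SDK.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Returns the validated messages, or null if the body is not usable.
function parseMessages(body: unknown): ChatMessage[] | null {
  if (typeof body !== "object" || body === null) return null;
  const raw = (body as { messages?: unknown }).messages;
  if (!Array.isArray(raw) || raw.length === 0) return null;

  const messages: ChatMessage[] = [];
  for (const item of raw) {
    if (typeof item !== "object" || item === null) return null;
    const { role, content } = item as { role?: unknown; content?: unknown };
    if (role !== "user" && role !== "assistant") return null;
    if (typeof content !== "string" || content.trim() === "") return null;
    // Copy only the two known fields so extra properties never reach the API.
    messages.push({ role, content });
  }
  // Assumption: a chat turn should always end with the user's message.
  if (messages[messages.length - 1].role !== "user") return null;
  return messages;
}
```

With a helper like this, the handler could return the 400 response whenever the result is null, instead of relying on the throw inside the try block.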
Which model?
claude-sonnet-4-6 is the balanced default: smart, fast enough, and cheap enough for chat. Swap the one string to tune:
- claude-haiku-4-5-20251001: fastest and cheapest, great for high-volume or simple chat.
- claude-opus-4-8: most capable, for hard reasoning. Pricier; reach for it when Sonnet struggles.
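If you expect to switch models often, one option is to pick the id from a short tier name instead of editing the string by hand. This is a sketch only: `MODELS` and `pickModel` are hypothetical names, and the ids are copied from this guide, so check them against the current Anthropic model list.

```typescript
// Model ids as listed in this guide; they may change over time.
const MODELS = {
  fast: "claude-haiku-4-5-20251001",
  balanced: "claude-sonnet-4-6",
  capable: "claude-opus-4-8",
} as const;

type Tier = keyof typeof MODELS;

// Hypothetical helper: map a tier name (for example, read from an env var)
// to a model id, falling back to the balanced default for anything unknown.
function pickModel(tier: string | undefined): string {
  const known = tier !== undefined && Object.prototype.hasOwnProperty.call(MODELS, tier);
  return known ? MODELS[tier as Tier] : MODELS.balanced;
}
```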
Step 4 — The chat UI
Replace src/pages/index.astro:
---
// static page; all the chat logic is the client script below
---
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>Chat</title>
    <style>
      body { font-family: system-ui, sans-serif; max-width: 40rem; margin: 2rem auto; padding: 0 1rem; }
      #log { display: flex; flex-direction: column; gap: .75rem; margin-bottom: 1rem; }
      .msg { padding: .6rem .8rem; border-radius: 10px; white-space: pre-wrap; line-height: 1.5; }
      .user { background: #18181b; color: #fff; align-self: flex-end; max-width: 80%; }
      .assistant { background: #f6f7f9; color: #18181b; align-self: flex-start; max-width: 80%; }
      form { display: flex; gap: .5rem; position: sticky; bottom: 1rem; background: #fff; }
      input { flex: 1; padding: .6rem .75rem; font-size: 1rem; border: 1px solid #e5e7eb; border-radius: 8px; }
      button { padding: .6rem 1rem; border: 0; border-radius: 8px; background: #18181b; color: #fff; }
      button:disabled { opacity: .5; }
    </style>
  </head>
  <body>
    <h1>Ask Claude</h1>
    <div id="log"></div>
    <form id="chat">
      <input id="msg" autocomplete="off" placeholder="Type a message…" required />
      <button type="submit">Send</button>
    </form>
    <script>
      const log = document.getElementById("log") as HTMLDivElement;
      const form = document.getElementById("chat") as HTMLFormElement;
      const input = document.getElementById("msg") as HTMLInputElement;
      const btn = form.querySelector("button") as HTMLButtonElement;

      // The full conversation we send to the server each turn = the bot's "memory".
      const messages: { role: "user" | "assistant"; content: string }[] = [];

      // #log has no fixed height, so the page scrolls rather than the div.
      function scrollToBottom() {
        window.scrollTo(0, document.documentElement.scrollHeight);
      }

      function bubble(role: "user" | "assistant", text: string) {
        const el = document.createElement("div");
        el.className = `msg ${role}`;
        el.textContent = text;
        log.appendChild(el);
        scrollToBottom();
        return el;
      }

      form.addEventListener("submit", async (e) => {
        e.preventDefault();
        const text = input.value.trim();
        if (!text) return;
        input.value = "";
        messages.push({ role: "user", content: text });
        bubble("user", text);
        const out = bubble("assistant", "…");
        input.disabled = true;
        btn.disabled = true;
        try {
          const res = await fetch("/api/chat", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ messages }),
          });
          if (!res.ok || !res.body) throw new Error("bad response");
          const reader = res.body.getReader();
          const decoder = new TextDecoder();
          let acc = "";
          while (true) {
            const { done, value } = await reader.read();
            if (done) break;
            acc += decoder.decode(value, { stream: true });
            out.textContent = acc; // repaint as tokens arrive
            scrollToBottom();
          }
          messages.push({ role: "assistant", content: acc });
        } catch {
          out.textContent = "Network error. Try again.";
        } finally {
          input.disabled = false;
          btn.disabled = false;
          input.focus();
        }
      });
    </script>
  </body>
</html>
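The `{ stream: true }` option in the read loop matters because a network chunk can end in the middle of a multi-byte UTF-8 character. Here is a small self-contained illustration, using a hand-split byte array in place of a real response (`decodeAll` is a hypothetical helper written for this demo):

```typescript
// "é" is two bytes in UTF-8 (0xC3 0xA9). Split them across two chunks,
// as a network boundary might.
const bytes = new TextEncoder().encode("café");
const chunks = [bytes.slice(0, 4), bytes.slice(4)]; // "caf" + half of "é", then the other half

// Decode every chunk with one shared decoder, as the chat page's read loop does.
function decodeAll(parts: Uint8Array[], streaming: boolean): string {
  const decoder = new TextDecoder();
  let text = "";
  for (const part of parts) {
    text += decoder.decode(part, { stream: streaming });
  }
  return text + decoder.decode(); // flush any bytes still buffered
}

const streamed = decodeAll(chunks, true); // "café"
const naive = decodeAll(chunks, false);   // "caf" followed by U+FFFD replacement characters
```

With streaming on, the decoder holds the dangling byte until the next chunk completes the character. Without it, each half is decoded alone and shows up as a replacement character in the chat bubble.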
Test it
npm run dev
Open http://localhost:4321, ask something. The reply should stream in token by token.
Ask a follow-up (“explain that simpler”). It remembers the thread because we resend the whole messages array each turn.
Step 5 — Deploy
Swap the local Node adapter for your host’s (e.g. Cloudflare):
npx astro add cloudflare
Then in your host’s dashboard, add the environment variable ANTHROPIC_API_KEY.
That’s the whole deploy: the static page ships to the edge and /api/chat runs as an on-demand function.
Gotcha: confirm your host supports streaming responses (Cloudflare, Vercel, and Node all do). If replies arrive all at once instead of streaming, something between the endpoint and the browser is buffering the response; check that you’re not behind a proxy that disables chunked responses.
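To check whether a deployment is buffering, one option is to count how many chunks the client receives: a streamed reply usually arrives in many, a buffered one in a single chunk or very few. The sketch below assumes nothing about your host; `countChunks` and `fakeStream` are hypothetical helpers, and the hand-built stream stands in for a real response body.

```typescript
// Hypothetical diagnostic: read a byte stream to the end and report how many
// chunks arrived and how many bytes they carried in total.
async function countChunks(
  body: ReadableStream<Uint8Array>
): Promise<{ chunks: number; bytes: number }> {
  const reader = body.getReader();
  let chunks = 0;
  let bytes = 0;
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    chunks += 1;
    bytes += result.value.byteLength;
  }
  return { chunks, bytes };
}

// Stand-in for a streamed response body: each string is enqueued as its own chunk.
function fakeStream(parts: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const part of parts) controller.enqueue(encoder.encode(part));
      controller.close();
    },
  });
}
```

Against a deployed site, passing the `body` of a fetch response from /api/chat to `countChunks` should give a rough signal. Chunk boundaries are not guaranteed to match tokens, so treat the number as a hint rather than a measurement.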
You now have
A streaming AI chat with conversation memory and a server-protected key. Drop the
/api/chat endpoint and the client loop into any product and you’ve got AI built in.
Make it yours (next ideas)
- Persona: edit the system prompt to set the bot’s voice and rules.
- Gate it: put the chat behind login (see Add Authentication) so usage ties to a user.
- Trim cost: long threads resend a lot of tokens. Cap history length, or summarize old turns, to keep each call cheap.
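Capping history can be as small as the sketch below. `trimHistory` is a hypothetical helper, and the rule that the trimmed thread should open with a user turn is an assumption made here to keep the conversation well-formed.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Hypothetical helper: keep at most `maxMessages` of the most recent turns.
function trimHistory(messages: ChatMessage[], maxMessages: number): ChatMessage[] {
  if (maxMessages <= 0) return [];
  const recent = messages.slice(-maxMessages);
  // Assumption: the thread sent to the model should start with a user turn,
  // so drop any leading assistant messages left behind by the cut.
  const firstUser = recent.findIndex((m) => m.role === "user");
  return firstUser === -1 ? [] : recent.slice(firstUser);
}
```

The client could then send `trimHistory(messages, 20)` in the request body instead of the full `messages` array; the 20 is an arbitrary starting point. The trade-off is that the bot forgets anything older than the cut.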
Troubleshooting
- 401 / authentication error → bad or missing ANTHROPIC_API_KEY. Check .env locally and the env var in your host dashboard for production.
- Reply appears all at once, not streaming → host or proxy is buffering the response. Verify the endpoint returns the ReadableStream (not awaited to a string) and the host supports streaming.
- model not found → the model id changed; use one of the current ids in Step 3.
- Costs creeping up → lower max_tokens, switch to claude-haiku-4-5-20251001, or trim the conversation history you resend each turn.