AI in a React App: Past the Demo, Into the Product

The first AI demo you build feels like cheating. A text input, a fetch to a model, a response on screen — thirty lines, maybe less, and suddenly your app talks. It's genuinely exciting, and that excitement is exactly the trap. Because the demo is the easy 20%. The moment you try to put that chat box in front of real users, the other 80% arrives all at once, and none of it is about the model.

The response doesn't appear all at once — it streams, token by token, and your UI has to handle that gracefully. You don't just want text back, you want structured data you can render into a component, and the model doesn't always cooperate. The request takes seconds, sometimes fails halfway, and your app has to stay honest about loading and error states the whole time. This is the part the demos skip, and it's the entire job.

So this article isn't "how to call an LLM." Calling the model is trivial. This is about the React-side reality of turning that call into something you'd actually ship — the streaming, the structure, the state — and where the current tools genuinely help versus where you're still on your own.

Why AI Broke My Normal Data Layer

Up to this point in the series, every data problem had the same shape. You fire a request, you wait, you get a response, TanStack Query caches it and hands it to your component. Request, response, done. It's a clean model and it covers almost everything.

AI breaks it in one specific way: the response isn't a single event, it's a stream. The model produces its answer incrementally, and the whole point — the thing that makes it feel alive — is showing those tokens as they arrive. A spinner that sits for eight seconds and then dumps a wall of text feels broken, even when it's technically correct. The perceived quality of an AI feature is almost entirely about the streaming.

That's a different primitive than request/response. You're not awaiting a value; you're consuming a sequence over time and rendering each chunk. Trying to force that into a normal useQuery fights the grain of the tool. This is why AI-specific React libraries exist — not because calling a model is hard, but because streaming a model's output into a UI is a genuinely different problem, and doing it by hand means manually managing readers, buffers, partial state, and cancellation.
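To make "readers, buffers, partial state, and cancellation" concrete, here is a minimal sketch of the by-hand version. It assumes a plain text byte stream (for example a fetch response body). The names createChunkDecoder and consumeTextStream are hypothetical helpers written for this illustration, not part of any SDK.

```typescript
// Sketch only: roughly the plumbing you take on without a streaming hook.
// All function names here are hypothetical, not part of any SDK.

/** Returns a decoder that turns byte chunks into text across chunk boundaries. */
function createChunkDecoder(): (chunk: Uint8Array) => string {
  const decoder = new TextDecoder('utf-8');
  // `stream: true` holds back an incomplete multi-byte character until the
  // next chunk arrives, instead of emitting a replacement character.
  return (chunk) => decoder.decode(chunk, { stream: true });
}

/** Reads a byte stream to the end, reporting the accumulated text after each chunk. */
async function consumeTextStream(
  body: ReadableStream<Uint8Array>,
  onText: (textSoFar: string) => void,
  signal?: AbortSignal,
): Promise<string> {
  const reader = body.getReader();
  const decode = createChunkDecoder();
  // Cancelling the reader makes a pending read() resolve with done: true.
  const onAbort = () => {
    reader.cancel().catch(() => {});
  };
  if (signal?.aborted) onAbort();
  signal?.addEventListener('abort', onAbort, { once: true });

  let text = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      text += decode(value);
      onText(text); // partial state: the UI re-renders with the text so far
    }
  } finally {
    signal?.removeEventListener('abort', onAbort);
    reader.releaseLock();
  }
  return text;
}
```

In a component you would pass a state setter as onText and an AbortController's signal so that unmounting can stop the read. Even this small version has to think about split characters, cleanup, and cancellation, which is the point.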

The Streaming Chat, Handled For You

The good news is that this exact problem is now well solved. The Vercel AI SDK (and the TanStack-flavored patterns around it) exists precisely to take streaming chat off your plate on the React side. Its useChat hook is the useQuery of this world: it owns the messy part so you don't have to.

// Import path from earlier AI SDK releases; newer ones ship the hook in '@ai-sdk/react'.
import { useChat } from 'ai/react';

export function SupportChat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat({ api: '/api/chat' });

  return (
    <div className="support-chat">
      {messages.map((m) => (
        <div key={m.id} data-role={m.role}>
          {m.content}
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}

Look at what you didn't write. No stream reader. No manual buffering of partial tokens. No hand-rolled "append this chunk to the last message" logic. messages updates as tokens arrive, so the response types itself onto the screen. isLoading gives you the state to disable the input and show activity. The entire streaming problem — the thing that would take a day to get right by hand and stay subtly buggy — is inside the hook.
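For a sense of scale, here is roughly what just the "append this chunk to the last message" step could look like if you wrote it yourself. This is a hand-rolled sketch, not the hook's actual internals. The ChatMessage type and appendChunk function are assumptions made up for the illustration.

```typescript
// Hypothetical message shape, loosely mirroring what a chat UI renders.
type ChatMessage = { id: string; role: 'user' | 'assistant'; content: string };

/**
 * Returns a new message list with `chunk` appended to the assistant message
 * identified by `assistantId`, creating that message on the first chunk.
 * It never mutates its input, so it is safe inside a React state updater.
 */
function appendChunk(
  messages: ChatMessage[],
  chunk: string,
  assistantId: string,
): ChatMessage[] {
  const last = messages[messages.length - 1];
  if (last && last.id === assistantId) {
    // Replace the last message with a copy that has the longer content.
    return [...messages.slice(0, -1), { ...last, content: last.content + chunk }];
  }
  // First chunk of this response: start a new assistant message.
  return [...messages, { id: assistantId, role: 'assistant', content: chunk }];
}
```

You would call it as setMessages((prev) => appendChunk(prev, chunk, id)) for every chunk. It is short, but it is one of several pieces that all have to agree with each other, and that is the work the hook absorbs.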

This is the same lesson as every other article in this series: the value isn't the raw capability, it's a library that takes the awkward part and gives you a clean React interface for it. useChat is to streaming what TanStack Query is to caching.

The Part That Actually Trips People Up: Structured Output

Streaming text is solved. Here's where it gets real: most of the time you don't want a paragraph of prose, you want data. You want the model to extract three fields from an invoice, or classify a support ticket, or return a list of suggested tags — and then you want to render that into an actual component, not a string.

The naive approach is to ask the model for JSON in the prompt and JSON.parse the result. Do this in a demo and it works. Do it in production and it breaks, because the model is a text generator, not an API. Sometimes it wraps the JSON in ```json fences. Sometimes it adds a friendly "Sure, here's the data:" preamble. In both cases JSON.parse throws, and now your feature is down because the model was chatty. And sometimes it hallucinates a field or a value, which parses without complaint and breaks something further downstream instead.
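These failure modes are easy to reproduce without a model at all. The sketch below feeds a few hand-written strings, standing in for plausible model replies, through a plain JSON.parse. The sample strings and the naiveParse helper are invented for the illustration.

```typescript
type ParseResult = { ok: true; value: unknown } | { ok: false; error: string };

/** The "hopeful" approach: parse whatever text came back and report what happened. */
function naiveParse(raw: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Built with repeat() so this example does not contain a literal code fence.
const fence = '`'.repeat(3);

// Hand-written stand-ins for model replies (assumptions, not real output).
const cleanReply = '{"category":"billing","priority":"high"}';
const fencedReply = `${fence}json\n${cleanReply}\n${fence}`;
const chattyReply = `Sure, here's the data: ${cleanReply}`;
const inventedReply = '{"category":"refunds","priority":"urgent"}';

const results = {
  clean: naiveParse(cleanReply).ok, // true
  fenced: naiveParse(fencedReply).ok, // false: the fence is not JSON
  chatty: naiveParse(chattyReply).ok, // false: the preamble is not JSON
  // true, which is the quiet failure: it parses, but "refunds" and "urgent"
  // are values the rest of the app has never heard of.
  invented: naiveParse(inventedReply).ok,
};
```

The last case is the one worth remembering. A successful parse only tells you the text was JSON, not that it has the shape your component expects.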

The current answer is to stop hoping and start enforcing. You define a Zod schema for the shape you need and let the SDK constrain the model to it:

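import { generateObject, type LanguageModel } from 'ai';
import { z } from 'zod';

// The shape the UI needs; the enums restrict the model to known values.
const ticketSchema = z.object({
  category: z.enum(['billing', 'technical', 'account', 'other']),
  priority: z.enum(['low', 'medium', 'high']),
  summary: z.string(),
});

// `model` is whichever provider model you have configured. It is passed in
// so the snippet does not assume a specific provider.
export async function classifyTicket(model: LanguageModel, ticketText: string) {
  const { object } = await generateObject({
    model,
    schema: ticketSchema,
    prompt: `Classify this support ticket: ${ticketText}`,
  });

  // object is typed as { category: ..., priority: ..., summary: string }
  // and validated, not a hopeful JSON.parse.
  return object;
}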

This is the difference between a demo and a product. object isn't a string you're praying will parse. It's a validated, typed value that the SDK has checked against your schema, and if the check fails, the call fails loudly instead of feeding garbage into your UI. Notice this is the exact same Zod schema pattern from the forms article, pointed at a new problem. You already know this tool. The model becomes just another untrusted input source you validate at the boundary, which, if you've read the authentication piece, is exactly how you should treat everything crossing into your app.

Treat model output like user input, not like an API response. It is text that happens to look structured. Validate it, or it will embarrass you in production.

Where the Tools Stop and You Start

I want to be honest about the edges, because the ecosystem is young and the demos oversell.

The streaming and structured-output tools are solid. What they don't solve is everything around the call. Cost — every token is money, and a chat feature with no limits is a bill waiting to happen; you need throttling and usage caps, and those are your problem, not the SDK's. Latency — responses take seconds, and no amount of UI polish makes a slow model fast; you design around the wait, you don't remove it. Failure — models time out, hit rate limits, and return nonsense, so every AI feature needs the same error handling discipline as any other flaky network dependency, plus a graceful answer for "the model gave me something useless."
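None of this needs to be elaborate to be useful. As one possible starting point, here is a sketch of a per-user token cap over a fixed time window, plus a capped exponential backoff delay for retries. The UsageBudget class, backoffDelayMs function, and all the numbers are hypothetical. A real deployment would keep the counters somewhere shared, not in process memory.

```typescript
/** Tracks token usage per user within a fixed time window (in-memory sketch). */
class UsageBudget {
  private readonly maxTokens: number;
  private readonly windowMs: number;
  private readonly now: () => number;
  private readonly usage = new Map<string, { windowStart: number; tokens: number }>();

  constructor(maxTokens: number, windowMs: number, now: () => number = () => Date.now()) {
    this.maxTokens = maxTokens;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, so the cap can be tested without waiting
  }

  /** Records the usage and returns true, or returns false if it would exceed the cap. */
  tryConsume(userId: string, tokens: number): boolean {
    const time = this.now();
    const entry = this.usage.get(userId);
    // Start a fresh window if there is none or the old one has expired.
    const current =
      entry && time - entry.windowStart < this.windowMs
        ? entry
        : { windowStart: time, tokens: 0 };
    if (current.tokens + tokens > this.maxTokens) {
      this.usage.set(userId, current);
      return false;
    }
    this.usage.set(userId, {
      windowStart: current.windowStart,
      tokens: current.tokens + tokens,
    });
    return true;
  }
}

/** Delay before retry number `attempt` (0-based): doubles each time, up to a cap. */
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

The check belongs on the server, before the model call, so a rejected request costs nothing. The backoff helper covers the rate-limit and timeout cases. What to show the user when the cap is hit is a product decision the code can't make for you.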

And the honest constraint underneath all of it: the model is non-deterministic. The same input can produce different output. That breaks a core assumption every other part of your app relies on. You can't snapshot-test it the normal way, you can't guarantee a given response, and you have to design UI that stays usable when the answer is wrong. That's not a library gap that'll be patched next quarter. That's the nature of the thing, and building AI features well means designing for it rather than pretending it away.
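One way to test around this is to assert on properties that every acceptable answer must have, instead of on one exact answer. The sketch below does that for the ticket classification shape used earlier. The invariantViolations function and the 200-character summary limit are assumptions chosen for the example.

```typescript
const CATEGORIES: readonly string[] = ['billing', 'technical', 'account', 'other'];
const PRIORITIES: readonly string[] = ['low', 'medium', 'high'];
const MAX_SUMMARY_LENGTH = 200; // arbitrary limit for this sketch

type TicketResult = { category: string; priority: string; summary: string };

/** Lists every invariant the result breaks; an empty list means it is acceptable. */
function invariantViolations(result: TicketResult): string[] {
  const problems: string[] = [];
  if (!CATEGORIES.includes(result.category)) {
    problems.push(`unknown category: ${result.category}`);
  }
  if (!PRIORITIES.includes(result.priority)) {
    problems.push(`unknown priority: ${result.priority}`);
  }
  if (result.summary.trim().length === 0) {
    problems.push('summary is empty');
  }
  if (result.summary.length > MAX_SUMMARY_LENGTH) {
    problems.push('summary is too long');
  }
  return problems;
}

// Two different answers to the same ticket. A snapshot test would accept
// only one of them; the invariant check accepts both.
const runA: TicketResult = { category: 'billing', priority: 'high', summary: 'Charged twice for one order.' };
const runB: TicketResult = { category: 'billing', priority: 'medium', summary: 'Customer reports a duplicate charge.' };
const badRun: TicketResult = { category: 'refunds', priority: 'high', summary: '   ' };
```

This doesn't tell you the answer is right, only that it is safe to render. Whether "high" or "medium" was the better call is a judgment you evaluate separately, across many samples, not in a unit test.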

How I'd Actually Approach It

Strip it to decisions:

Streaming is the product, not a detail. The feel of an AI feature is the streaming. Don't fake it with a spinner.
Don't hand-roll the stream. Use useChat / the AI SDK. It's the useQuery of streaming — it owns the messy part.
Never trust free-form JSON. Use generateObject with a Zod schema. Treat model output as untrusted input, validated at the boundary.
Budget for cost, latency, and failure from day one. They're your responsibility, not the SDK's, and they don't go away.
Design for non-determinism. The answer can be wrong. The UI has to survive that.

The demos make AI look like a feature you drop in. It isn't. It's a new kind of data source — streaming, untrusted, non-deterministic, and metered — and the work is treating it with the same engineering seriousness you'd give any other unreliable dependency. The tools have gotten genuinely good at the streaming and the structure. The judgment — where to use it, how to fail, what to guarantee — is still entirely yours, and that's the part worth getting right.

The next article stays in real-time territory but drops the AI: WebSockets and real-time in React — how to keep a UI live when the server has something to say, and how it fits with the TanStack Query cache you already have.

If you're building an AI feature and it works beautifully in the demo but falls apart with real traffic, that gap is almost always one of these — streaming, structure, or the metered/non-deterministic reality underneath. Tell me where it breaks and I'll tell you which one it is.