Vercel AI SDK v5: what moving off raw OpenAI actually bought me

I spent two hours last Thursday rewriting the chat endpoint of a side project because my own code was driving me up a wall. I’d built it six months ago with the raw OpenAI Node SDK, a custom SSE stream, a home-rolled tool-call parser, and a state machine that felt clever at the time and reads like a ransom note now.

Short version: I moved it to Vercel AI SDK v5 and about a third of the file got deleted. Not refactored. Deleted. That’s usually the sign of a good abstraction.

I’m not here to tell you the SDK is magic or that it’s the future of AI apps. I’ll tell you what the migration felt like, which parts saved me real time, which parts I still work around, and the before/after code so you can judge for yourself. If you’re on the fence about adopting it in a Next.js app, this is the honest accounting I wish I’d had two weeks ago.

What actually changed in v5

The headline for v5 isn’t a new feature. It’s a cleaner mental model. Earlier versions had you mix imperative helpers like streamText, a separate tool-invocation hook, and a useChat that quietly did three different jobs depending on how you called it. Fine in small demos. Annoying once you had tool calls, custom state, and a backend that wasn’t a Next.js route.

v5 collapses most of that into two primitives that stay out of your way. On the server, streamText hands you a result whose streams are standard web ReadableStreams and treats tool calls as first-class citizens. On the client, useChat treats tool invocations as proper message parts instead of a side channel.

There are other wins. Better typed messages. A cleaner generateObject with Zod schemas. Simpler MCP support if you’re into that. But the thing that actually made me delete code was the tool-call flow. I’ll get to that in a minute. First, the old code.

The raw OpenAI client way

Here’s roughly what my endpoint looked like before. I’ve stripped it to the bones, but this is the real shape, not a toy:

// /app/api/chat/route.ts (before)
import OpenAI from "openai";

const client = new OpenAI();

export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
    tools: [searchDocsTool],
    stream: true,
  });

  const encoder = new TextEncoder();
  const body = new ReadableStream({
    async start(controller) {
      let toolBuffer = "";
      for await (const chunk of stream) {
        const delta = chunk.choices[0]?.delta;
        if (delta?.content) {
          controller.enqueue(encoder.encode(delta.content));
        }
        if (delta?.tool_calls) {
          // tool arguments arrive as partial JSON fragments; buffer until finish_reason fires
          toolBuffer += delta.tool_calls[0]?.function?.arguments ?? "";
        }
        if (chunk.choices[0]?.finish_reason === "tool_calls") {
          const args = JSON.parse(toolBuffer);
          const result = await runSearchDocs(args);
          // ... re-prompt the model with the result
          // ... more streaming
          // ... 40 more lines I'm not going to paste
        }
      }
      controller.close();
    },
  });

  return new Response(body, { headers: { "Content-Type": "text/plain" } });
}

That “40 more lines I’m not going to paste” is where all the real pain lived. Partial JSON in the tool arguments. Re-calling the model with the tool result. Mapping OpenAI’s delta format into something the client could render. A small ceremony every time I wanted to add a new tool. It worked. I shipped it. I also swore at it about once a week.
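If you’ve never hand-rolled that part, the re-prompt step is roughly this shape. It’s a generic sketch of the pattern, not the forty lines I cut, and toolCallId stands in for the id you’d capture off the same deltas:

// generic sketch of the re-prompt step, reusing the names from the route above:
// echo the assistant's tool call, attach the tool result, then call the model again
const followUp = await client.chat.completions.create({
  model: "gpt-4o-mini",
  stream: true,
  messages: [
    ...messages,
    {
      role: "assistant",
      tool_calls: [
        {
          id: toolCallId, // captured from delta.tool_calls[0].id on the first chunk
          type: "function",
          function: { name: "searchDocs", arguments: toolBuffer },
        },
      ],
    },
    { role: "tool", tool_call_id: toolCallId, content: JSON.stringify(result) },
  ],
});
// ...then stream followUp's text deltas to the client exactly like the first pass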

The AI SDK way, side by side

Here’s the same endpoint with v5:

// /app/api/chat/route.ts (after)
import { streamText, tool, convertToModelMessages } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai("gpt-4o-mini"),
    messages: convertToModelMessages(messages),
    tools: {
      searchDocs: tool({
        description: "Search the internal docs",
        inputSchema: z.object({ query: z.string() }),
        execute: async ({ query }) => runSearchDocs({ query }),
      }),
    },
  });

  return result.toUIMessageStreamResponse();
}

That’s not a selective paste. That’s the whole file. The tool parsing, the re-prompt loop, the streaming format, the client/server message contract. The SDK owns all of it. I describe the tool and its execute function, and the framework handles the “got tool result, now re-call the model” step I used to hand-code.

The reason this is so short isn’t that v5 added a ton of features. It picked sensible defaults. Tools are Zod-typed. Messages are typed. The response format is a known standard that plugs into useChat on the client. You can still eject if you need to: streamText returns a full StreamTextResult with textStream, toolCalls, usage, and everything else you might reach for. But you don’t have to hand-roll the streaming protocol to get started.
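To make “eject” concrete, here’s a minimal sketch of consuming the result yourself instead of returning the SDK’s response helper, reusing the names from the route above:

// sketch: read the stream directly instead of calling toUIMessageStreamResponse()
const result = streamText({
  model: openai("gpt-4o-mini"),
  messages: convertToModelMessages(messages),
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk); // or pipe it into your own Response / logger
}

const usage = await result.usage; // token counts, resolved once the stream finishes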

I wrote recently about how Next.js server actions changed where I put my API boundary, and the AI SDK fits neatly alongside that shift. The chat endpoint stays an explicit route the SDK can stream from, and everything else goes in a server action.

Tool calling stops being a parsing exercise

The part of my old code that I really wanted to delete was the tool-call parser. In the raw OpenAI stream, tool arguments arrive as a sequence of partial JSON fragments across many SSE chunks. You buffer them, wait for finish_reason === "tool_calls", JSON.parse the buffer, catch failures, re-prompt the model with the result, and then continue streaming text back to the user.

With v5 you define the tool once and you’re done:

import { z } from "zod";
import { tool } from "ai";

export const searchDocs = tool({
  description: "Search internal docs by natural-language query",
  inputSchema: z.object({
    query: z.string(),
    limit: z.number().int().min(1).max(20).default(5),
  }),
  execute: async ({ query, limit }) => {
    const hits = await db.search(query, { limit }); // db: the app's own search index, imported elsewhere
    return { hits };
  },
});

No parsing. No re-prompt plumbing. The SDK calls execute, feeds the return value back to the model, and keeps streaming. If execute throws, the error surfaces as a typed tool result instead of a cryptic 500.

One thing I wish the AI SDK docs said louder: the Zod schema you define is the contract. If you’ve been writing vague JSON schemas and hoping the model gets it right, switching to Zod forces you to think about the shape you actually want, and the model responds to tighter schemas much better than to loose ones. That alone was worth the migration.
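To make “tighter” concrete, here’s the kind of change I mean. The field names are invented for illustration; the point is that enums, bounds, and .describe() annotations all end up in the JSON schema the model sees:

import { z } from "zod";

// loose: the model has to guess what "filter" and "sort" are allowed to be
const loose = z.object({
  query: z.string(),
  filter: z.string(),
  sort: z.string(),
});

// tighter: enums, bounds, and descriptions become constraints in the schema
const tight = z.object({
  query: z.string().describe("Natural-language question, not keywords"),
  source: z.enum(["docs", "changelog", "api-reference"]).default("docs"),
  limit: z.number().int().min(1).max(20).default(5),
});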

Streaming into the UI with useChat

The client side is where v5 made my React code smaller too. Here’s the chat page, trimmed:

// /app/chat/page.tsx
"use client";
import { useState } from "react";
import { useChat } from "@ai-sdk/react";

export default function ChatPage() {
  const [input, setInput] = useState("");
  const { messages, sendMessage, status } = useChat();

  return (
    <form
      onSubmit={(e) => {
        e.preventDefault();
        sendMessage({ text: input });
        setInput("");
      }}
    >
      {messages.map((m) => (
        <div key={m.id} className={m.role}>
          {m.parts.map((p, i) =>
            p.type === "text" ? <span key={i}>{p.text}</span>
            : p.type === "tool-searchDocs" ? <ToolCard key={i} part={p} />
            : null
          )}
        </div>
      ))}
      <input value={input} onChange={(e) => setInput(e.target.value)} disabled={status === "streaming"} />
    </form>
  );
}

The m.parts array is the big shift. A message is now a list of parts, not a single string. Text parts. Tool parts, one type per tool. Reasoning parts for models that emit them. Step-start markers between tool rounds. You render what you want and ignore what you don’t.

That sounds fussy until you use it. It’s what lets you show a “Looking up docs…” card while a tool is running, swap it for the result when it returns, and continue streaming the model’s reply below it. No coordination code on my end. I’d been faking that with custom SSE events and ref-based state, and all of that got deleted.
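For completeness, here’s roughly what my ToolCard does. Treat the state and output field names as assumptions about the tool part’s shape and check them against the types your version generates:

// Sketch of the ToolCard rendered above. The `state` / `output` field names are
// assumptions about the tool part's shape; check your generated types.
function ToolCard({ part }: { part: { state: string; output?: unknown } }) {
  if (part.state !== "output-available") {
    return <div className="tool-card">Looking up docs…</div>;
  }
  return (
    <div className="tool-card">
      <pre>{JSON.stringify(part.output, null, 2)}</pre>
    </div>
  );
}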

Where I still reach for the raw SDK

I don’t use the AI SDK for everything. Two cases where I still open openai directly:

Batch jobs that don’t need streaming. If I’m generating 5,000 product descriptions overnight, the raw SDK with a simple concurrency wrapper is cheaper on my attention (there’s a sketch of the wrapper after the second case). No streaming, no tool calls, no UI. The abstractions don’t earn their keep.

Tight token accounting. The SDK gives you usage at the end of a stream. If I need per-message cost accounting and I’m doing odd prompt-caching tricks on the vendor side, reading the raw completion chunks is easier than fighting the wrapper.
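Here’s the kind of wrapper I mean for the batch case. It’s a sketch: products and the prompt are placeholders, and the concurrency limit is whatever your rate limits tolerate:

import OpenAI from "openai";

const client = new OpenAI();

declare const products: { name: string }[]; // placeholder for whatever you're batching over

// run fn over items with at most `limit` requests in flight at once
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const workers = Array.from({ length: limit }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(workers);
  return results;
}

const descriptions = await mapWithConcurrency(products, 8, async (product) => {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: `Write a product description for: ${product.name}` }],
  });
  return completion.choices[0].message.content;
});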

The other thing worth knowing: the SDK is provider-agnostic in name but OpenAI-shaped in practice. It works fine with Anthropic, Google, and others via their adapters, but a tool-heavy app assumes the provider supports OpenAI-style function calling. If you’re targeting a model that doesn’t, read the adapter docs carefully before you bet your roadmap on it.
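The mechanical swap itself is the easy part; a sketch, with the model ID as a placeholder you’d check against the provider’s current list:

import { anthropic } from "@ai-sdk/anthropic"; // npm i @ai-sdk/anthropic

const result = streamText({
  model: anthropic("claude-3-5-sonnet-latest"), // placeholder model ID
  messages: convertToModelMessages(messages),
  tools, // same tool definitions; verify tool calling works for the model you pick
});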

I build a lot of small internal tools around this stack. You can see a few of them in my work. For anything chat-shaped I default to v5 now. For everything else, I still pick the simplest thing that works.

What I’d do this weekend if I were you

If you already have a Next.js chat app on the raw OpenAI SDK, spend Saturday doing this migration on a branch. You don’t have to commit to anything. Install ai, @ai-sdk/openai, and @ai-sdk/react, rewrite your route with streamText, move your tools into tool({ inputSchema: z.object(...) }), and swap your client for useChat. Run it against your existing tests.

If the rewrite feels like it’s saving you code, merge it. If it’s not, you’ve learned something real about where the abstraction earns its keep for your app. I wasn’t sure either until I saw my own diff.

One last thing: don’t port your prompts verbatim. The SDK sends a slightly different system-message shape, and prompts that leaned on quirks in the raw API often need a light rewrite. Budget an hour for that and you’ll save yourself a confused afternoon.