Claude Code with OpenAI SDK: A Practical Setup Guide
Why Claude Code misuses the OpenAI SDK without CLAUDE.md
The OpenAI SDK is the most-used AI client library in 2026. The TypeScript SDK is on v4.x and the Python SDK is on v1.x. Both have well-typed surfaces, and both expose chat completions, responses, tools, structured outputs, batch processing, file uploads, and the Assistants API. The surface is large. Without explicit constraints, Claude Code picks defaults from the deprecated v3.x docs that still rank high in search results, mixes in patterns from older tutorials, and produces code that works in development but fails in subtle ways at scale.
Common failure modes: streaming responses that buffer the full response before returning (defeating the point of streaming), tool calls that ignore the tool_calls array and only check content, retry loops that retry on non-retryable errors (400 bad request, 401 unauthorised), and structured output handlers that use JSON.parse on responses that may legitimately contain markdown fences or partial JSON. None of these surface as TypeScript errors because the SDK types accept all of these patterns.
This guide covers the CLAUDE.md template that locks Claude Code into the OpenAI SDK's correct usage: the client singleton, the streaming async iterator pattern, the tool calling loop, the structured output via response_format with zod schemas, and the retry semantics that distinguish retryable from non-retryable errors. For the broader AI orchestration context, Claude Code with the Vercel AI SDK covers the abstraction layer that sits above provider SDKs, and Claude Code with LangChain covers the alternative pattern for complex multi-step flows.
The OpenAI SDK CLAUDE.md template
The CLAUDE.md at your project root needs to declare: the SDK version, the API key environment variable, the default model, the client singleton pattern, the streaming convention, the tool calling loop, the structured output approach, and the hard rules that block the mistakes Claude makes most often.
# OpenAI SDK rules
## Stack
- openai ^4.x (TypeScript) or openai ^1.x (Python)
- TypeScript 5.x strict
- Node.js 20.x (or Next.js 14.x route handlers)
- OPENAI_API_KEY in .env.local (never hardcode)
- Default model: gpt-4o (override per call site if needed)
## Project structure
- src/lib/openai.ts: OpenAI client singleton
- src/lib/models.ts: model ID constants
- src/lib/tools.ts: tool definitions and handlers
- src/lib/schemas.ts: zod schemas for structured outputs
- src/app/api/chat/route.ts: streaming chat endpoint
- src/app/api/completion/*: one-shot completion endpoints
## Client singleton (ENFORCE)
The only OpenAI client in the project lives at src/lib/openai.ts:
import OpenAI from 'openai';
if (!process.env.OPENAI_API_KEY) throw new Error('OPENAI_API_KEY missing');
export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 2,
  timeout: 30_000,
});
Every file that calls the API imports from this singleton.
NEVER instantiate new OpenAI() inline.
## Streaming pattern (MANDATORY)
For chat responses sent to the client, use the streaming pattern, NOT the await-the-full-response pattern.
const stream = await openai.chat.completions.create({
  model, messages, stream: true,
});
for await (const chunk of stream) {
  const token = chunk.choices[0]?.delta?.content ?? '';
  // write to response stream
}
## Tool calling loop (MANDATORY)
After every chat.completions.create call that includes tools:
1. Check choices[0].finish_reason
2. If 'tool_calls', execute each tool and append the results
3. Re-invoke the model with the updated messages
4. Continue until finish_reason is 'stop' or the loop limit is reached
NEVER assume a single round-trip is enough when tools are present.
## Hard rules
- NEVER hardcode OPENAI_API_KEY in source files
- NEVER use the deprecated openai.createCompletion() or openai.Completion
- NEVER use JSON.parse on a model response without a try/catch
- NEVER retry on 400 (bad request) or 401 (unauthorised); retry only on 408 / 429 / 5xx
- NEVER call openai.chat.completions.create without a timeout in effect (the singleton default or a per-call override)
- NEVER log the full message history at info level (PII risk)
- ALWAYS use response_format with zod for structured outputs
- ALWAYS check choices[0].finish_reason before reading content
The client singleton rule and the streaming rule prevent the largest classes of bug. Without the singleton, every file that calls the API instantiates its own client, which means retry and timeout defaults vary across the codebase. Without the streaming pattern, the user waits for the full response before seeing any output, which makes the application feel slow even when the underlying generation is fast.
Install and client setup
For TypeScript:
npm i openai
For Python:
pip install openai
Add the API key to your environment file:
# .env.local
OPENAI_API_KEY=sk-proj-your-key-here
Create the singleton client:
// src/lib/openai.ts
import OpenAI from 'openai';

if (!process.env.OPENAI_API_KEY) {
  throw new Error('OPENAI_API_KEY is not defined');
}

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 2,
  timeout: 30_000,
});
The constructor options matter and Claude defaults to skipping them. maxRetries: 2 means the SDK retries up to twice on retryable errors (connection errors, 408, 409, 429 rate limits, and 5xx server errors) with exponential backoff. The SDK default is 2 in current versions, but setting it explicitly locks the value against future SDK changes. timeout: 30_000 is the per-request timeout in milliseconds. Without it, the SDK falls back to its default of 10 minutes, which means a stalled OpenAI server can hang your request far longer than any user will wait.
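The route handlers in the sections that follow import MODELS from src/lib/models.ts, a file this guide lists in the project structure but never shows. A minimal sketch might look like the following. The key names (default, fast) and the gpt-4o-mini ID are assumptions made for illustration, so substitute whichever models your project actually uses.

```typescript
// src/lib/models.ts
// Model ID constants so call sites never hardcode a model string.
// NOTE: the key names and the 'gpt-4o-mini' choice are illustrative assumptions.
export const MODELS = {
  default: 'gpt-4o', // matches the default model declared in CLAUDE.md
  fast: 'gpt-4o-mini', // assumed cheaper model for summaries and chat
} as const;

// Union of the model IDs above, handy for typing helper functions.
export type ModelId = (typeof MODELS)[keyof typeof MODELS];
```

Keeping the IDs in one file means a model swap is a one-line change, and the `as const` assertion lets TypeScript reject a typo such as MODELS.fats at compile time.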
The chat completions pattern (non-streaming)
For one-shot completions where you need the full response before processing (function calling, structured outputs, content moderation):
// src/app/api/summarise/route.ts
// Imported only for the OpenAI.APIError class; the client itself comes from the singleton
import OpenAI from 'openai';
import { NextRequest, NextResponse } from 'next/server';
import { openai } from '@/lib/openai';
import { MODELS } from '@/lib/models';

export async function POST(req: NextRequest) {
  const { text } = await req.json();
  if (typeof text !== 'string' || text.length === 0) {
    return NextResponse.json({ error: 'Invalid input' }, { status: 400 });
  }
  try {
    const completion = await openai.chat.completions.create({
      model: MODELS.fast,
      messages: [
        { role: 'system', content: 'Summarise the user input in 2 sentences.' },
        { role: 'user', content: text },
      ],
      temperature: 0.3,
      max_tokens: 200,
    });
    const choice = completion.choices[0];
    if (!choice) {
      return NextResponse.json({ error: 'No completion choice' }, { status: 502 });
    }
    // finish_reason 'length' means the output was cut off at max_tokens
    if (choice.finish_reason === 'length') {
      console.warn('[OpenAI] Truncated at max_tokens');
    }
    return NextResponse.json({
      summary: choice.message.content,
      model: completion.model,
      usage: completion.usage,
    });
  } catch (e) {
    // API errors carry an HTTP status; anything else is rethrown untouched
    if (e instanceof OpenAI.APIError) {
      console.error('[OpenAI] API error:', e.status, e.message);
      return NextResponse.json(
        { error: 'OpenAI request failed', status: e.status },
        { status: e.status ?? 500 },
      );
    }
    throw e;
  }
}
Three details Claude misses without CLAUDE.md instruction.
The finish_reason check matters because OpenAI signals truncation, content filter blocks, and tool calls through this field, not through the content. A response that hit max_tokens will have finish_reason: 'length' and content that is mid-sentence. The user-facing application should either retry with a higher token budget or surface the truncation clearly.
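One way to make that check hard to forget is to route every choice through a small helper that turns finish_reason into an explicit next step. The sketch below is an illustration, not part of the SDK: nextStepFor, NextStep, and ChoiceLike are hypothetical names, and the local types only mirror the fields the helper reads.

```typescript
// Hypothetical helper: maps a finish_reason onto what the caller should do next.
type FinishReason = 'stop' | 'length' | 'tool_calls' | 'content_filter' | 'function_call';

type NextStep =
  | { kind: 'return'; content: string } // normal completion
  | { kind: 'truncated'; partial: string } // hit max_tokens mid-output
  | { kind: 'run_tools' } // the model asked for tool execution
  | { kind: 'blocked' }; // content filter stopped the output

// Minimal structural stand-in for completion.choices[0].
interface ChoiceLike {
  finish_reason: FinishReason | null;
  message: { content: string | null };
}

export function nextStepFor(choice: ChoiceLike): NextStep {
  const content = choice.message.content ?? '';
  switch (choice.finish_reason) {
    case 'length':
      return { kind: 'truncated', partial: content };
    case 'tool_calls':
    case 'function_call':
      return { kind: 'run_tools' };
    case 'content_filter':
      return { kind: 'blocked' };
    default:
      // 'stop' (and a missing reason) is treated as a normal completion here
      return { kind: 'return', content };
  }
}
```

A route handler can then switch on the returned kind, for example retrying with a larger token budget on 'truncated' or returning an explicit error on 'blocked', instead of reading message.content unconditionally.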
The OpenAI.APIError instanceof check is the SDK-provided way to distinguish API errors (with an HTTP status) from generic errors (network issues, JSON parse failures). The SDK throws APIError subclasses with the status code preserved, which lets you map upstream errors to your application's error response cleanly.
The usage field on the completion (prompt_tokens, completion_tokens, total_tokens) is what you use to track cost. Logging this on every completion lets you correlate API spend with feature usage. Claude does not include the usage in responses by default. Add it to the CLAUDE.md response pattern.
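To turn the usage numbers into something you can chart, a small helper can convert token counts into an estimated cost. This is a sketch under stated assumptions: estimateCostUsd and PricePerMillion are hypothetical names, and the rates in the example are placeholders rather than real OpenAI prices.

```typescript
// Shape of completion.usage as described above.
interface UsageLike {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Hypothetical pricing entry, in US dollars per one million tokens.
interface PricePerMillion {
  input: number;
  output: number;
}

// Input and output tokens are usually priced differently, so cost them separately.
export function estimateCostUsd(usage: UsageLike, price: PricePerMillion): number {
  const inputCost = (usage.prompt_tokens / 1_000_000) * price.input;
  const outputCost = (usage.completion_tokens / 1_000_000) * price.output;
  return inputCost + outputCost;
}

// Placeholder rates for illustration only; look up the real prices for your model.
const examplePrice: PricePerMillion = { input: 1, output: 2 };
const exampleCost = estimateCostUsd(
  { prompt_tokens: 1200, completion_tokens: 300, total_tokens: 1500 },
  examplePrice,
);
console.log(`[OpenAI] estimated cost: $${exampleCost.toFixed(6)}`);
```

Logging the estimate next to a feature or route name is what makes it possible to correlate spend with usage later.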
The streaming pattern
For chat-style interfaces where the user benefits from seeing tokens as they generate:
// src/app/api/chat/route.ts
import { NextRequest } from 'next/server';
import { openai } from '@/lib/openai';
import { MODELS } from '@/lib/models';

export async function POST(req: NextRequest) {
  const { messages } = await req.json();
  const stream = await openai.chat.completions.create({
    model: MODELS.fast,
    messages,
    stream: true,
    stream_options: { include_usage: true },
  });
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const token = chunk.choices[0]?.delta?.content ?? '';
          if (token) {
            controller.enqueue(encoder.encode(token));
          }
          if (chunk.usage) {
            // Final usage chunk arrives after the last content chunk
            console.log('[OpenAI] Usage:', chunk.usage);
          }
        }
        // Close only on a clean finish; calling close() after error() would throw
        controller.close();
      } catch (e) {
        console.error('[OpenAI stream] error:', e);
        controller.error(e);
      }
    },
  });
  return new Response(readable, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'Cache-Control': 'no-cache',
      'X-Content-Type-Options': 'nosniff',
    },
  });
}
The stream_options: { include_usage: true } flag is the detail that prevents lost cost tracking. Without it, the streaming response does not include the final usage tallies and you have no way to know the total tokens consumed unless you count them locally. With it set, the SDK emits a final chunk with the usage field populated, which you can log or forward to an analytics pipeline.
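The loop that gathers tokens and captures the final usage chunk can be factored into a helper that is easy to test without calling the API. The sketch below makes a few assumptions: collectStream and the ChunkLike type are hypothetical names, and ChunkLike mirrors only the fields the loop reads. It relies on the usage chunk arriving last with an empty choices array, which is why the token lookup uses optional chaining.

```typescript
interface UsageLike {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Structural stand-in for a streamed chat completion chunk.
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
  usage?: UsageLike | null;
}

// Drains the stream, forwarding each token and keeping the last usage seen.
export async function collectStream(
  stream: AsyncIterable<ChunkLike>,
  onToken: (token: string) => void = () => {},
): Promise<{ text: string; usage: UsageLike | null }> {
  let text = '';
  let usage: UsageLike | null = null;
  for await (const chunk of stream) {
    // The usage-only chunk has no choices, so this resolves to ''
    const token = chunk.choices[0]?.delta?.content ?? '';
    if (token) {
      text += token;
      onToken(token);
    }
    if (chunk.usage) usage = chunk.usage;
  }
  return { text, usage };
}
```

In a route handler, onToken would enqueue the encoded token on the response stream, and the returned usage would go to your logging or analytics pipeline once the stream ends.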
The 'Cache-Control': 'no-cache' header is there to keep intermediate proxies and CDNs from caching or buffering the response. Without it, some edge networks buffer the full stream before forwarding it to the client, which defeats the purpose of streaming. The header is harmless if the response is not actually streamed, so set it on every streaming endpoint by default.
The controller.error(e) in the catch block propagates the stream error to the client. Without it, an error mid-stream silently closes the stream as if the response had completed normally. The client then has no way to distinguish a clean end from an error.
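On the consuming side, the difference shows up as a rejected read rather than a normal end of stream. The following is a minimal sketch of a reader that reports which of the two happened. readTextStream is a hypothetical helper name, and how a server-side stream error reaches the browser depends on the runtime and transport, so treat the 'errored' path as something to verify in your own deployment.

```typescript
// Reads a byte stream as text and reports whether it ended cleanly.
export async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onToken: (token: string) => void = () => {},
): Promise<{ text: string; ok: boolean }> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let text = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) {
        text += decoder.decode(); // flush any buffered bytes
        return { text, ok: true };
      }
      // stream: true keeps multi-byte characters intact across chunk boundaries
      const token = decoder.decode(value, { stream: true });
      text += token;
      onToken(token);
    }
  } catch {
    // A rejected read means the stream errored before finishing
    return { text, ok: false };
  }
}
```

A chat UI could call this with the body of the fetch response and show a "response interrupted" notice when ok is false, keeping whatever partial text already arrived.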
Tool calling
The OpenAI tool calling API lets the model emit structured calls to functions you provide. The pattern is a loop: call the model, check if it returned tool calls, execute the tools, append the results, call the model again, repeat until the model returns content (not tool calls).
// src/lib/tools.ts
import type { ChatCompletionTool } from 'openai/resources/chat/completions';

export const tools: ChatCompletionTool[] = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City name' },
          units: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['location'],
      },
    },
  },
];

export async function executeTool(name: string, args: Record<string, unknown>): Promise<string> {
  switch (name) {
    case 'get_weather':
      return JSON.stringify(await fetchWeather(args.location as string));
    default:
      return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
}

async function fetchWeather(location: string) {
  return { location, temperature: 18, conditions: 'cloudy' };
}
The conversation loop with tool calling:
// src/lib/chat-with-tools.ts
import { openai } from '@/lib/openai';
import { MODELS } from '@/lib/models';
import { tools, executeTool } from '@/lib/tools';
import type { ChatCompletionMessageParam } from 'openai/resources/chat/completions';

const MAX_TOOL_ITERATIONS = 5;

export async function chatWithTools(initialMessages: ChatCompletionMessageParam[]) {
  const messages = [...initialMessages];
  for (let iteration = 0; iteration < MAX_TOOL_ITERATIONS; iteration++) {
    const completion = await openai.chat.completions.create({
      model: MODELS.fast,
      messages,
      tools,
      tool_choice: 'auto',
    });
    const choice = completion.choices[0];
    if (!choice) throw new Error('No completion choice');
    const toolCalls = choice.message.tool_calls;
    // The assistant message must be in the history before any tool results
    messages.push(choice.message);
    if (!toolCalls || toolCalls.length === 0) {
      // Model returned content, not a tool call, exit the loop
      return choice.message.content;
    }
    // Execute each tool call and append the result
    for (const call of toolCalls) {
      let result: string;
      try {
        const args = JSON.parse(call.function.arguments);
        result = await executeTool(call.function.name, args);
      } catch (e) {
        // Malformed arguments or a failing tool: report it so the model can self-correct
        result = JSON.stringify({ error: e instanceof Error ? e.message : String(e) });
      }
      messages.push({
        role: 'tool',
        tool_call_id: call.id,
        content: result,
      });
    }
  }
  throw new Error('Exceeded max tool iterations');
}
Four details matter and Claude consistently misses them.
The MAX_TOOL_ITERATIONS bound prevents an infinite loop. The model can theoretically keep emitting tool calls. Without an iteration limit, a buggy tool that returns errors can cause the model to keep retrying. Five iterations is a reasonable default for most workflows.
The messages.push(choice.message) line is the one Claude most often skips. The assistant's tool-call message must be in the message history before you push the tool-result messages. Without it, the next API call returns a 400 with "messages with role 'tool' must be a response to a preceding message with 'tool_calls'".
The tool_call_id: call.id on the tool result is mandatory. OpenAI's API matches tool results to tool calls via this ID. Without it, the API returns a 400. Claude sometimes generates code that uses the function name or a generated UUID instead, which always fails.
The JSON.parse(call.function.arguments) inside a try/catch matters in production because the model can occasionally produce malformed JSON in the arguments. The current models are very reliable but not perfect. Wrapping the parse in a try/catch and returning a tool error to the model lets it self-correct on the next iteration.
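The two message-history rules (push the assistant message first, and answer with the exact tool_call_id) can be checked locally before a request is sent, which gives a clearer failure than a 400 from the API. The sketch below is a hypothetical pre-flight check, not an SDK feature: findOrphanToolMessages and MessageLike are illustrative names, and the API remains the source of truth for what it accepts.

```typescript
// Structural stand-in for the message shapes the loop pushes.
type MessageLike =
  | { role: 'system' | 'user'; content: string }
  | { role: 'assistant'; content: string | null; tool_calls?: Array<{ id: string }> }
  | { role: 'tool'; tool_call_id: string; content: string };

// Returns the tool_call_id of every tool message that does not answer a
// tool call from the assistant message immediately before its group.
export function findOrphanToolMessages(messages: MessageLike[]): string[] {
  const pending = new Set<string>();
  const orphans: string[] = [];
  for (const message of messages) {
    if (message.role === 'assistant') {
      // A new assistant turn replaces whatever calls were pending
      pending.clear();
      for (const call of message.tool_calls ?? []) pending.add(call.id);
    } else if (message.role === 'tool') {
      if (pending.has(message.tool_call_id)) pending.delete(message.tool_call_id);
      else orphans.push(message.tool_call_id);
    } else {
      // A system or user message ends the window for tool results
      pending.clear();
    }
  }
  return orphans;
}
```

Calling this in a test or a debug build and asserting that it returns an empty array would catch both the skipped assistant message and the case where a function name is used in place of call.id.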
Structured outputs via zod
The response_format parameter with a JSON schema constrains the model to produce output that parses against a schema. Combined with zod for runtime validation, this gives you typed structured outputs:
// src/lib/schemas.ts
import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

export const SummarySchema = z.object({
  title: z.string().describe('A short headline for the summary'),
  bullets: z.array(z.string()).describe('Three to five key points'),
  sentiment: z.enum(['positive', 'neutral', 'negative']),
  topics: z.array(z.string()).describe('High-level topic tags'),
});

export type Summary = z.infer<typeof SummarySchema>;

export const summaryResponseFormat = zodResponseFormat(SummarySchema, 'summary');
// src/app/api/structured-summary/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { openai } from '@/lib/openai';
import { MODELS } from '@/lib/models';
import { SummarySchema, summaryResponseFormat } from '@/lib/schemas';

export async function POST(req: NextRequest) {
  const { text } = await req.json();
  const completion = await openai.chat.completions.parse({
    model: MODELS.fast,
    messages: [
      { role: 'system', content: 'Extract a structured summary from the user input.' },
      { role: 'user', content: text },
    ],
    response_format: summaryResponseFormat,
  });
  const parsed = completion.choices[0]?.message.parsed;
  if (!parsed) {
    return NextResponse.json({ error: 'No parsed output' }, { status: 502 });
  }
  // parsed is typed as Summary
  return NextResponse.json(parsed);
}
The openai.chat.completions.parse() method (not create()) is the structured-output variant that validates the response against the zod schema and exposes a parsed field on the message. Using parse instead of create is the single most important rule for structured outputs. Without it, you get a JSON string in content and have to validate it yourself, which Claude often forgets to do. On some SDK releases this helper is exposed under the beta namespace as openai.beta.chat.completions.parse(), so confirm which path your pinned version provides.
Add a structured outputs section to CLAUDE.md:
## Structured outputs
- ALWAYS define schemas with zod in src/lib/schemas.ts
- ALWAYS use openai.chat.completions.parse(), not .create() for structured outputs
- ALWAYS read message.parsed, not message.content
- NEVER JSON.parse(message.content) when a parsed structured output is expected
- Schema descriptions are visible to the model, write them like instructions
- Use z.enum(['a', 'b']) for closed-set fields, model accuracy is much higher
The .describe() calls on zod fields are visible to the model when the schema is included in the request. They function as field-level instructions. A schema with descriptive .describe() calls tends to produce noticeably better outputs than a schema without them.
Retry semantics
The OpenAI SDK retries automatically on certain error classes. Understanding which errors should be retried and which should not is the difference between resilient and brittle code.
| Status | Meaning | Retry policy |
|---|---|---|
| 400 | Bad request (invalid params) | NEVER retry, fix the request |
| 401 | Unauthorised (bad API key) | NEVER retry, fix the key |
| 403 | Forbidden (org/billing issue) | NEVER retry, surface to user |
| 408 | Request timeout | Retry with backoff |
| 409 | Conflict | NEVER retry, application logic error |
| 429 | Rate limit / quota | Retry with backoff respecting Retry-After |
| 500 | Server error | Retry with backoff |
| 502 | Bad gateway | Retry with backoff |
| 503 | Service unavailable | Retry with backoff |
| 504 | Gateway timeout | Retry with backoff |
The SDK's built-in retry handles connection errors, 408, 409, 429, and 5xx automatically. Custom retry logic for application-specific reasons should follow the stricter policy in the table above: never retry 4xx errors except 408 and 429.
// src/lib/with-retry.ts
import OpenAI from 'openai';

export async function withRetry<T>(
  fn: () => Promise<T>,
  options: { maxAttempts?: number; baseDelay?: number } = {},
): Promise<T> {
  const { maxAttempts = 3, baseDelay = 1000 } = options;
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e;
      if (e instanceof OpenAI.APIError && e.status !== undefined) {
        // Only 408, 429, and 5xx are retryable; every other status is rethrown at once
        const retryable = e.status === 408 || e.status === 429 || e.status >= 500;
        if (!retryable) throw e;
      }
      if (attempt < maxAttempts - 1) {
        // Exponential backoff: baseDelay, then 2x, then 4x, ...
        const delay = baseDelay * Math.pow(2, attempt);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
For most code paths, the SDK's built-in retry is sufficient. The custom wrapper is only needed when you want to retry on application-level conditions (a tool returning an error, a structured output failing schema validation) on top of the SDK's transport retry.
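The retry table asks for backoff that respects Retry-After on 429, which a custom wrapper has to implement itself. The sketch below isolates the two decisions as pure functions so they can be unit-tested. isRetryableStatus and retryDelayMs are hypothetical helper names, only the delta-seconds form of Retry-After is parsed (an HTTP-date value falls back to plain backoff), and treating a missing status as a retryable connection failure is an assumption.

```typescript
// Mirrors the retry table: only 408, 429, and 5xx are worth retrying.
export function isRetryableStatus(status: number | undefined): boolean {
  // Assumption: no HTTP status means a connection-level failure
  if (status === undefined) return true;
  return status === 408 || status === 429 || status >= 500;
}

// Exponential backoff that never waits less than the server asked for.
export function retryDelayMs(
  attempt: number,
  baseDelayMs: number,
  retryAfter?: string | null,
): number {
  const backoff = baseDelayMs * 2 ** attempt;
  if (!retryAfter) return backoff;
  // Retry-After in delta-seconds form, e.g. "5"
  const seconds = Number(retryAfter);
  if (Number.isFinite(seconds) && seconds >= 0) {
    return Math.max(backoff, seconds * 1000);
  }
  // Unparsed value (for example an HTTP date): fall back to backoff
  return backoff;
}
```

Inside a wrapper, the Retry-After value would be read from the headers of the failed response and passed in as a string, and adding random jitter on top of the computed delay is a common refinement when many clients retry at once.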
Permission hooks for OpenAI scripts
An OpenAI project accumulates scripts: prompt testers, batch processors, fine-tuning launchers, file uploaders. Some are read-only or low-cost. Others trigger expensive operations: a fine-tune can cost hundreds of dollars, a batch over a large dataset can consume thousands of credits.
In .claude/settings.local.json:
{
  "permissions": {
    "allow": [
      "Bash(npx tsx scripts/test-prompt.ts*)",
      "Bash(npx tsx scripts/list-models.ts*)",
      "Bash(npx tsx scripts/check-usage.ts*)"
    ],
    "deny": [
      "Bash(npx tsx scripts/finetune.ts*)",
      "Bash(npx tsx scripts/batch-process.ts*)",
      "Bash(npx tsx scripts/upload-file.ts*)"
    ]
  }
}
For more on permission hooks across an AI codebase, Claude Code permissions covers the full configuration model.
Common Claude Code mistakes with the OpenAI SDK
Six patterns Claude generates incorrectly without CLAUDE.md constraints, with the correct replacement for each.
1. Inline OpenAI instantiation
Claude generates: const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); at the top of every file.
Correct pattern: one singleton at src/lib/openai.ts, imported everywhere.
2. Missing finish_reason check
Claude generates: return completion.choices[0].message.content without checking truncation or tool calls.
Correct pattern: check finish_reason for 'stop', 'length', 'tool_calls', 'content_filter' and handle each.
3. Tool call loop without iteration limit
Claude generates: while (true) around the tool call cycle.
Correct pattern: for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) with a bound and an error if exceeded.
4. JSON.parse without try/catch on structured outputs
Claude generates: const data = JSON.parse(message.content) for a model response.
Correct pattern: openai.chat.completions.parse() with a zod schema, read message.parsed.
5. Streaming without include_usage
Claude generates: a streaming completion without stream_options: { include_usage: true }.
Correct pattern: include the option and log the final usage chunk for cost tracking.
6. Retry on 400
Claude generates: a retry wrapper that retries on every error.
Correct pattern: retry only on 408, 429, and 5xx; surface 4xx errors immediately.
Add these six pairs to CLAUDE.md as before/after examples. Claude reproduces concrete patterns faster than abstract rules.
Building OpenAI integrations that survive production
The CLAUDE.md template in this guide produces OpenAI integrations where the client is a singleton, streaming is the default for user-facing endpoints, tool calling is a bounded loop with proper message-history handling, structured outputs flow through openai.chat.completions.parse() with zod schemas, and retries respect the status code semantics.
The underlying principle is shared with every other AI provider integration. The OpenAI SDK is well-typed and the happy path is easy to get right. The edges (truncation, tool errors, malformed JSON, rate limits, timeouts) all have correct patterns documented somewhere in the OpenAI docs, but Claude does not consistently surface those patterns without the CLAUDE.md telling it which ones to use. The template makes the production-correct path the default path.
For applications that need to switch between OpenAI and Anthropic based on the task, Claude Code with the Vercel AI SDK covers the provider-agnostic layer that wraps both. For applications that need multi-step agentic workflows with retrieval, Claude Code with LangChain covers the higher-level abstraction.
Get Claudify. The bundle includes an OpenAI SDK CLAUDE.md template with the singleton client, streaming pattern, tool calling loop, zod-based structured outputs, and all six common-mistake rules pre-configured.