OpenAI Assistants API Shutdown: The 2026 Migration Guide
The OpenAI Assistants API shuts down August 26, 2026. Here's what breaks, how the Responses API differs, real migration costs, and your three strategic options.
The OpenAI Assistants API shuts down on August 26, 2026. After that date, every call to /v1/assistants, /v1/threads, and /v1/threads/runs returns an error — no degraded mode, no grace period, no extension. If your SaaS product, helpdesk integration, or internal tooling was built on the Assistants API, you need a migration plan now.
The official replacement is a combination of the Responses API and the Conversations API. This is not a simple endpoint swap — it is a fundamentally different architecture. The object model, tool handling, state management, and cost model all change.
This guide covers the exact timeline, what breaks, the real technical differences between the old and new APIs, the hidden costs most teams miss, and your three strategic options — including whether it makes more sense to rebuild or to migrate off custom OpenAI integrations entirely.
The August 26, 2026 Hard Deadline
Hard Cutoff: On August 26, 2026, all Assistants API endpoints are permanently removed. Requests to /v1/assistants, /v1/threads, and related endpoints will fail. OpenAI has stated there is no extension option.
Here is the official timeline:
| Date | Event |
|---|---|
| April 2024 | Assistants API v2 beta released; v1 beta announced for deprecation |
| December 18, 2024 | Assistants API v1 beta access discontinued (v2 only) |
| March 2025 | Responses API launched; OpenAI signals Assistants sunset after feature parity |
| August 26, 2025 | OpenAI officially notifies developers of Assistants API deprecation |
| August 26, 2026 | Assistants API fully removed — all requests fail |
OpenAI notified developers on August 26, 2025 — exactly one year ahead of the shutdown — after declaring that the Responses API had reached feature parity with the Assistants API.
This is not a soft deprecation where old endpoints linger for years. The Assistants API was always in beta — it never graduated to GA. OpenAI has already moved all new model releases (GPT-5 family, GPT-5.4) exclusively to the Responses API.
Azure OpenAI Users Are Also Affected
Earlier guidance from Microsoft suggested Azure OpenAI was unaffected. That has changed. The Azure OpenAI Assistants API itself is now deprecated and is scheduled to be fully retired on August 26, 2026. Anyone currently building or running solutions on the Azure OpenAI Assistants API should plan and execute a migration to the Microsoft Foundry Agents service.
If you are on Azure, your migration path is to the Foundry Agent Service, which is built on the Responses API under the hood.
Why OpenAI Is Forcing the Move to the Responses API
The Assistants API had a specific design: persistent Assistant objects bundled model, instructions, and tools into a single server-side entity. Threads stored conversation messages. Runs executed the Assistant against a Thread asynchronously, requiring polling loops or webhooks to track completion.
The Responses API throws most of that out. Here is the concept mapping:
| Assistants API | Responses API Equivalent | Key Difference |
|---|---|---|
| Assistants | Prompts | Prompts can only be created in the dashboard — no programmatic creation |
| Threads | Conversations | Conversations store items (tool calls, tool outputs), not just messages |
| Runs | Responses | Synchronous by default; no polling loops needed |
| Run Steps | Items | Output items include typed data: messages, tool calls, structured outputs |
Assistants were persistent API objects that bundled model choice, instructions, and tool declarations — created and managed entirely through the API. Their replacement, prompts, can only be created in the dashboard, where you can version them.
That mapping sounds tidy on paper. In practice, it moves more orchestration into your application code: prompt version ownership, output parsing, tool loop handling, retry behavior, retention settings, and long-conversation management. Prompt engineering shifts to the dashboard UI, while the codebase handles execution only. If your engineering team managed Assistants through code and treated config promotion and rollback as a deployment concern, that workflow needs to change.
Hidden Reasoning Traces Drive the Architecture
The deeper reason for the shift is about reasoning models.
GPT-5 and later models use internal chain-of-thought (CoT) reasoning. These reasoning tokens are generated before the final answer but are not exposed to the client. Since the API is stateful, "OpenAI can maintain the chain-of-thought in their backend, plug it in to the conversation for you, and then strip it out before sending it back down to the client."
With the old /chat/completions API, you passed the entire conversation history with each request. But hidden reasoning traces cannot be round-tripped by the client, so the model forgets its reasoning between turns. The Responses API keeps that reasoning state server-side, where it survives into the next turn; in Chat Completions, it is dropped between calls.
This is not academic. GPT-5 integrated via Responses scores 5% better on TAUBench compared to Chat Completions, purely by taking advantage of preserved reasoning. For agentic workflows with multiple tool calls, the performance gap widens.
The bottom line: OpenAI designed the Responses API around the needs of reasoning models that keep secrets from the client. If you want the best performance from GPT-5+, you need the stateful API.
What Breaks: Zapier, Custom Integrations, and Legacy Workflows
The blast radius extends well beyond teams that wrote custom code against the Assistants API. Any tool, integration platform, or workflow that touched /v1/assistants or /v1/threads is affected.
Zapier Workflows
Zapier has deprecated all ChatGPT (OpenAI) steps that use the Assistants API. On August 26, 2026, Zaps using these steps will stop working. Affected steps include:
- Conversation With Assistant (Legacy)
- Create Assistant
- Upload File (when used with assistants)
- Create Vector Store
- Assistant lookup actions
These steps are already hidden from the Zap editor for new workflows. Zapier recommends updating your Zaps to use the new action before this date to ensure your workflows continue to run smoothly. This is not an automatic migration — you need to rebuild affected Zaps manually using the new "Conversation (Responses API)" action.
If your support team uses Zapier to route, summarize, or triage helpdesk tickets through these steps, that is part of your migration scope.
Custom SaaS Integrations
If your product connects to OpenAI for AI-powered features — auto-replies, summarization, intelligent routing — and any of that code calls openai.beta.assistants.create() or openai.beta.threads.create(), it breaks on August 26, 2026.
The migration is not a find-and-replace. Key differences that require code rewrites:
- No programmatic Assistant creation. Prompts are dashboard-only. If your system dynamically creates Assistants per tenant or per use case, you need a fundamentally different approach.
- No automated Thread → Conversation migration. OpenAI will not provide an automated tool for migrating Threads to Conversations. They recommend migrating new user threads onto Conversations and backfilling old ones as necessary.
- Different async model. Runs required complex asynchronous polling loops, could enter states like `requires_action` for tool execution, and the Thread was locked while a Run was in progress. Responses are synchronous by default (with an optional `background: true` mode for long-running tasks). If your application architecture is built around background polling for AI completions, you are looking at a significant refactor.
- Different event shapes. Streaming events, structured output format, and function-calling schemas all differ. `response_format` becomes `text.format`, function definitions are strict by default in Responses, and tool calls and tool outputs are separate item types tied together with a `call_id`.
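To make the `response_format` rename concrete, here is a minimal sketch of translating an old structured-output config into the shape Responses expects. The exact nesting shown is an assumption for illustration — verify field names against the current API reference before relying on it.

```typescript
// Old Chat Completions shape: schema details nested under `json_schema`.
type ChatCompletionsFormat = {
  type: 'json_schema';
  json_schema: { name: string; schema: object; strict?: boolean };
};

// Responses moves the format under `text.format` and flattens the schema fields.
function toResponsesTextFormat(old: ChatCompletionsFormat) {
  return {
    format: {
      type: old.type,
      name: old.json_schema.name,
      schema: old.json_schema.schema,
      // Schema/function definitions are strict by default in Responses.
      strict: old.json_schema.strict ?? true,
    },
  };
}

const legacy: ChatCompletionsFormat = {
  type: 'json_schema',
  json_schema: { name: 'ticket_triage', schema: { type: 'object' } },
};

console.log(JSON.stringify(toResponsesTextFormat(legacy)));
```

The payoff of a small adapter like this is that your schema definitions stay in one place while both code paths coexist during cutover.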
One useful release valve: Chat Completions remains supported, and Responses can be adopted incrementally. If you have simple one-shot prompts that do not need agentic tools or persistent state, you do not have to force every flow into Responses on day one. Migrate the flows that depend on agent features first.
What Your Audit Should Cover
Before you plan a migration, audit every system that touches OpenAI:
- Search your codebase for `openai.beta.assistants`, `openai.beta.threads`, `/v1/assistants`, and `/v1/threads`
- Check third-party integrations — Zapier, Make, n8n, Retool, internal tools
- Identify stored state — if you are storing Thread IDs in your database, those references become dead pointers
- Map tool usage — Code Interpreter, File Search, and function calling all have different configuration in Responses
- Catalog dynamic Assistant creation — any code that creates Assistants programmatically needs a fundamentally different pattern
- Snapshot current behavior — export instructions, tool schemas, attached files, and representative conversations so you have a stable baseline for regression testing
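The codebase search can be as simple as a grep, but a small script gives you a machine-readable inventory. A sketch, using only the identifiers listed above as patterns — extend the list with your own wrapper functions:

```typescript
// Flag source lines that reference the deprecated Assistants API surface.
const DEPRECATED_PATTERNS = [
  'openai.beta.assistants',
  'openai.beta.threads',
  '/v1/assistants',
  '/v1/threads',
];

function findDeprecatedCalls(source: string): { line: number; match: string }[] {
  const hits: { line: number; match: string }[] = [];
  source.split('\n').forEach((text, i) => {
    for (const pattern of DEPRECATED_PATTERNS) {
      if (text.includes(pattern)) hits.push({ line: i + 1, match: pattern });
    }
  });
  return hits;
}

// Sample input: line 1 uses the old API, line 2 is already on Responses.
const sample = [
  "const assistant = await openai.beta.assistants.create({ model: 'gpt-4o' });",
  "const reply = await client.responses.create({ input: 'hi' });",
].join('\n');

console.log(findDeprecatedCalls(sample)); // flags line 1 only
```

Run something like this across every repository and CI script before you size the migration; the hit count is a rough proxy for rewrite scope.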
The Hidden Costs and Limitations of the Responses API
The Responses API is genuinely more capable than the Assistants API. But the cost model is different, and if you are not paying attention, your bill will surprise you.
Tool Pricing
The biggest budgeting mistake is assuming Responses is priced like a plain text endpoint. The API call itself is not priced separately, but built-in tools add their own fees:
| Tool | Pricing (Responses API) |
|---|---|
| Web Search | $10/1K calls (reasoning models) or $25/1K calls (preview search, non-reasoning models) + token costs vary by mode |
| File Search | $0.10/GB/day storage (after 1 GB free) + $2.50/1K tool calls |
| Code Interpreter / Hosted Shell | Per 20-minute session per container ($0.03 for 1 GB up to $1.92 for 64 GB) |
| Remote MCP Servers | No additional tool cost — billed for output tokens only |
The web search pricing catches teams off guard. For gpt-4o-mini and gpt-4.1-mini with the non-preview web search tool, search content tokens are billed as a fixed block of 8,000 input tokens per call. That means every web search call on mini models incurs both the per-call fee and a significant fixed token charge for the retrieved content. Community reports confirm that actual bills frequently exceed what teams estimate from the per-call pricing alone.
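A back-of-the-envelope model makes the surprise visible: each mini-model web search call costs the flat per-call fee plus a fixed 8,000-token input block, per the pricing above. The input rate in the example is illustrative, not a quoted price — plug in the current rate for your model.

```typescript
const PER_CALL_FEE_USD = 10 / 1000;   // $10 per 1K web search tool calls
const SEARCH_CONTENT_TOKENS = 8000;   // fixed token block billed per call on mini models

// Per-call cost given an input token rate in USD per million tokens.
function webSearchCallCost(inputRatePerMillionUsd: number): number {
  const tokenCost = (SEARCH_CONTENT_TOKENS / 1_000_000) * inputRatePerMillionUsd;
  return PER_CALL_FEE_USD + tokenCost;
}

// At an assumed $0.40 / 1M input tokens, each call costs roughly
// $0.01 + $0.0032 = $0.0132 -- about a third above the headline per-call fee.
console.log(webSearchCallCost(0.4).toFixed(4));
```

Multiply that by your real monthly call volume before approving a budget; the token block dominates at higher input rates.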
Container Pricing Shift
Code Interpreter container pricing is transitioning: "Now: 1 GB for $0.03 / 64GB for $1.92 per container. Starting March 31, 2026: per 20-minute session per container." Even if your AI agent only runs a 3-second Python script to format a CSV, you are billed for the full 20-minute session.
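The billing unit after the transition is the whole 20-minute session, so the arithmetic is a ceiling function. A sketch, using the 1 GB rate quoted above:

```typescript
const SESSION_SECONDS = 20 * 60; // one billable session = 20 minutes

// Sessions billed for a given runtime; even a trivial run consumes one session.
function billedSessions(runtimeSeconds: number): number {
  return Math.max(1, Math.ceil(runtimeSeconds / SESSION_SECONDS));
}

function containerCost(runtimeSeconds: number, ratePerSessionUsd: number): number {
  return billedSessions(runtimeSeconds) * ratePerSessionUsd;
}

console.log(containerCost(3, 0.03));    // 3-second script on a 1 GB container: one full session
console.log(containerCost(3600, 0.03)); // 1 hour spans 3 sessions
```

If your agent fires many short Code Interpreter runs, batching work into fewer sessions is the obvious lever this pricing creates.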
Conversation State and Retention
Conversation state retention works differently than Threads, and you need to understand two separate policies:
- Response objects are saved for 30 days by default and can be viewed in the dashboard or retrieved via API. You can disable this by setting `store` to `false`.
- Conversation objects and the items in them are not subject to the 30-day TTL. Any response attached to a conversation will have its items persisted with no 30-day TTL.
The retention policy depends on whether you are using Conversations or raw `previous_response_id` chaining. Plan accordingly.
Token Billing on Chained Responses
Even when using `previous_response_id`, all previous input tokens for responses in the chain are billed as input tokens in the API. Long conversation chains get progressively more expensive with each turn. OpenAI offers compaction guidance to mitigate this, but you have to implement compaction logic yourself.
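The growth is quadratic, not linear, because every turn re-bills the whole accumulated history as input. A toy illustration with synthetic turn sizes (no real pricing assumed):

```typescript
// Total input tokens billed across a chain where each turn
// resubmits the entire prior history as input.
function cumulativeInputTokens(turnTokens: number[]): number {
  let history = 0;
  let billed = 0;
  for (const turn of turnTokens) {
    history += turn;   // this turn joins the history
    billed += history; // the whole history is billed as input again
  }
  return billed;
}

// Ten turns of 500 tokens each: 27,500 input tokens billed, not 5,000.
const turns = Array(10).fill(500);
console.log(cumulativeInputTokens(turns));
```

A compaction step that periodically summarizes old turns caps `history` instead of letting it grow with every exchange, which is why OpenAI's compaction guidance matters for long-lived conversations.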
The net effect: do not approve a migration budget until you reprice a real workload end to end.
Migration Options: Rebuild, Switch Models, or Upgrade Your Platform
You have three realistic paths. Each has different engineering cost, risk, and long-term implications.
Option 1: Rebuild Your Integration on the Responses API
Best for: Teams with dedicated engineering capacity who want to stay on OpenAI and maintain full control. Choose this if the agent itself is product IP — custom orchestration, proprietary tools, regulated workflows, or cross-system actions that an off-the-shelf platform will never model cleanly.
What it involves:
- Rewrite all Assistants API calls to use `client.responses.create()`
- Convert Assistants to dashboard Prompts and store Prompt IDs in source control
- Replace Threads with Conversations (or manual `previous_response_id` chaining)
- Rewrite async polling loops to synchronous or streaming patterns
- Rewrite tool-loop handling — output items, tool calls, and tool outputs are separate types with stricter function schemas
- Implement cost monitoring for web search, file search, and Code Interpreter tool calls
- Run shadow traffic comparisons to detect regressions before cutover
A minimal text-only turn in the new model:

```typescript
import OpenAI from 'openai';

const client = new OpenAI();

// Create a server-side conversation to hold state across turns
const conversation = await client.conversations.create({
  metadata: { account_id: accountId },
});

// One synchronous turn: the dashboard Prompt supplies instructions and tools
const response = await client.responses.create({
  prompt: { id: process.env.SUPPORT_PROMPT_ID },
  conversation: conversation.id,
  input: [{ role: 'user', content: userMessage }],
});

console.log(response.output_text);
```

That snippet covers a plain text turn. If your app uses functions, web search, file search, or other tools, you need to inspect output items and handle tool calls explicitly rather than treating `output_text` as the whole story.
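What "handle tool calls explicitly" means in practice: function calls arrive as typed output items, and your results go back as separate `function_call_output` items matched by `call_id`. The item shapes below are simplified from the real API, and the tool is hypothetical — this is a sketch of the pairing logic, not the full loop.

```typescript
type OutputItem =
  | { type: 'message'; content: string }
  | { type: 'function_call'; call_id: string; name: string; arguments: string };

// Execute every function call in a response's output and build the
// result items the model expects on the next turn.
function buildToolOutputs(
  items: OutputItem[],
  tools: Record<string, (args: any) => unknown>,
) {
  return items
    .filter((item): item is Extract<OutputItem, { type: 'function_call' }> =>
      item.type === 'function_call')
    .map((call) => ({
      type: 'function_call_output' as const,
      call_id: call.call_id, // ties the result back to the originating call
      output: JSON.stringify(tools[call.name](JSON.parse(call.arguments))),
    }));
}

// Hypothetical tool and model output for illustration:
const tools = {
  get_order_status: (args: { id: string }) => ({ id: args.id, status: 'shipped' }),
};
const modelOutput: OutputItem[] = [
  { type: 'function_call', call_id: 'call_1', name: 'get_order_status', arguments: '{"id":"A42"}' },
];

console.log(buildToolOutputs(modelOutput, tools));
```

In a real loop, these output items go back in the `input` of the next `responses.create()` call, and you repeat until the response contains no further function calls.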
Risk: This is typically 2–6 weeks of real engineering work for a production integration. The biggest traps are around multi-step tool calling orchestration and the loss of programmatic Assistant creation. If your system dynamically created Assistants per tenant, you need a fundamentally different pattern — likely Prompts dashboard plus configuration-as-code.
Writing the API wrapper is the easy part. The real challenge is migrating historical conversation data without breaking the context window. For teams considering a DIY script approach, understand the failure modes first: why DIY AI migration scripts fail at scale.
Option 2: Use the Deadline to Add Model Portability
Best for: Teams already frustrated with OpenAI's pricing, rate limits, or deprecation cadence.
If you are already rewriting your integration, you could decouple from OpenAI entirely. Anthropic's Claude and Google's Gemini both offer completions-style APIs that do not require this kind of architectural shift. By adopting standards like the Model Context Protocol (MCP) and standardizing your tool-calling architecture, you can route requests across providers.
The trade-off: you lose access to OpenAI's built-in tools (web search, file search, Code Interpreter) and the reasoning token preservation that gives GPT-5 its multi-turn edge. You gain portability and a simpler integration pattern.
Be realistic about the escape hatch, though. Anthropic's OpenAI SDK compatibility layer is explicitly positioned for testing and comparing model capabilities, not as a long-term, production-ready integration for most use cases. Google's Gemini API supports function calling but is stateless for these flows, requiring thought signatures to preserve reasoning context when you manage history manually. Portability can be the right strategic move, but it usually adds another abstraction layer on top of the migration rather than replacing it.
This path makes the most sense if your AI features are relatively simple — summarization, classification, single-turn Q&A — and do not depend on multi-turn reasoning or built-in tool execution.
Option 3: Migrate to a Platform with Native AI
Best for: SaaS companies that bolted OpenAI Assistants onto their helpdesk, CRM, or support system and are now maintaining fragile custom wrappers.
Ask honestly: did you build a custom OpenAI integration because your platform did not have native AI, and now it does?
Modern helpdesks like Intercom, Front, Zendesk, and Gorgias have shipped native AI features — auto-replies, intent detection, summarization, intelligent routing — maintained by the platform vendor. When OpenAI changes their API again (and they will), the vendor absorbs the migration. You don't.
If your custom Assistants API integration was replicating what a modern helpdesk's native AI does, the forced migration is an opportunity to consolidate. You get out of the business of maintaining OpenAI wrappers entirely.
For teams evaluating this path, we have published a deep-dive on Intercom's AI and workflow capabilities.
The critical piece in this scenario is not the API rewrite — it is the data migration. Moving historical tickets, conversations, customer records, automations, and macros cleanly is what determines whether your team has a smooth transition or a multi-week fire drill. We have covered why separating migration from implementation reduces risk.
The Full Migration Checklist
Regardless of which option you choose, execute these steps before August 26:
- Audit all Assistants API usage across your codebase, CI/CD pipelines, and third-party platforms (Zapier, Make, etc.)
- Catalog every Assistant object — export instructions, tool configurations, model settings, and metadata from the OpenAI dashboard
- Snapshot current behavior — export representative conversations and tool schemas for a regression testing baseline
- Map stored Thread IDs in your database to understand the scope of conversation state migration
- Create Prompts in the dashboard and store Prompt IDs in source control; decide who owns prompt versioning
- Prototype a single end-to-end flow in the Responses API before committing to a full rewrite. If you need persistent state, prototype the Conversations API early — it is a different data model than Threads.
- Implement cost monitoring for built-in tool calls (web search and file search have per-call fees that did not exist in the Assistants API)
- Run parallel traffic — send the same requests to both APIs and compare outputs before cutting over production
- Migrate production traffic gradually using feature flags or per-tenant cutover
- Set a hard internal deadline of July 2026 — do not wait until August
No Automated Thread Migration: OpenAI has explicitly stated they will not provide a tool to migrate Threads to Conversations. Plan for manual backfill of critical conversation history.
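For the gradual per-tenant cutover in the checklist, a deterministic hash bucket is a simple mechanism: a tenant always lands in the same bucket, so nobody flip-flops between stacks as you raise the rollout percentage. The hash choice here (FNV-1a) is an illustrative assumption — any stable hash works.

```typescript
// Map a tenant ID to a stable bucket in [0, 100).
function bucketOf(tenantId: string): number {
  let h = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (const ch of tenantId) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return h % 100;
}

// Route this fraction of tenants to the new Responses-based code path.
function useResponsesApi(tenantId: string, rolloutPercent: number): boolean {
  return bucketOf(tenantId) < rolloutPercent;
}

// Same tenant, same rollout level -> same answer, every time.
console.log(useResponsesApi('acme-corp', 25) === useResponsesApi('acme-corp', 25)); // true
```

Raising `rolloutPercent` from 5 to 25 to 100 only ever moves tenants one way, from the legacy stack to the new one, which keeps the parallel-traffic comparison clean.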
When to Bring In Help
If your Assistants API integration is a thin wrapper around a single model — a chatbot that does Q&A on your docs — the rewrite is straightforward. A senior engineer can probably handle it in a week.
But if your integration is deeply embedded in a production helpdesk, CRM, or customer-facing workflow — with multi-tenant Assistant configurations, stored Thread state, complex tool orchestration, and real customer conversation history — the migration has more moving parts than an API rewrite.
Separate two decisions: who owns the agent runtime after migration, and who preserves the data and relationships underneath it. Teams that bundle both into one vague "AI migration" project usually underestimate timeline and QA.
At ClonePartner, we handle the data layer — conversation ID continuity, historical data preservation, attachment handling, workflow rewiring, and cutover sequencing. Whether you are rewiring a custom OpenAI integration or migrating to a native-AI platform entirely, we ensure your data model and relationship structure remains intact during the transition.
We are typically brought in when the AI agent touches a helpdesk or CRM, multiple no-code automations sit around the core workflow, historical conversations must remain searchable, or the business cannot tolerate a messy dual-stack cutover.
Frequently Asked Questions
- When does the OpenAI Assistants API shut down?
- The Assistants API is fully removed on August 26, 2026. After that date, all requests to /v1/assistants, /v1/threads, and related endpoints will fail. OpenAI announced the deprecation on August 26, 2025, giving developers exactly one year.
- What replaces the OpenAI Assistants API?
- OpenAI's official replacement is the Responses API combined with the Conversations API. Assistants become Prompts (dashboard-only), Threads become Conversations, Runs become Responses, and Run Steps become Items. It is a full architectural change, not a simple endpoint swap.
- Does the Assistants API deprecation affect Azure OpenAI?
- Yes. The Azure OpenAI Assistants API is also deprecated with the same August 26, 2026 retirement date. Azure users should migrate to the Microsoft Foundry Agents service, which is built on the Responses API.
- Will Zapier workflows using OpenAI Assistants break?
- Yes. Zapier has deprecated all ChatGPT (OpenAI) steps that use the Assistants API. Zaps using steps like 'Conversation With Assistant' and 'Create Assistant' will stop working on August 26, 2026. You need to rebuild affected Zaps using the new Responses API actions.
- Can I automatically migrate Threads to Conversations?
- No. OpenAI has explicitly stated they will not provide an automated tool for migrating Threads to Conversations. They recommend migrating new user threads onto Conversations and backfilling old conversation history as needed.