OpenAI Assistants API Shutdown: The 2026 Migration Guide
The OpenAI Assistants API shuts down August 26, 2026. Here's what breaks, how the Responses API differs, real migration costs, and your three strategic options.
The OpenAI Assistants API shuts down on August 26, 2026. After that date, every call to /v1/assistants, /v1/threads, and /v1/threads/runs returns an error — no degraded mode, no grace period, no extension. If your SaaS product, helpdesk integration, or internal tooling was built on the Assistants API, you need a migration plan now.
The official replacement is a combination of the Responses API and the Conversations API. This is not a simple endpoint swap — it is a fundamentally different architecture. The object model, tool handling, state management, and cost model all change.
This guide covers the exact timeline, what breaks, the real technical differences between the old and new APIs, the hidden costs most teams miss, and your three strategic options — including whether it makes more sense to rebuild or to migrate off custom OpenAI integrations entirely.
The August 26, 2026 Hard Deadline
Hard Cutoff: On August 26, 2026, all Assistants API endpoints are permanently removed. Requests to /v1/assistants, /v1/threads, and related endpoints will fail. OpenAI has stated there is no extension option.
Here is the official timeline:
| Date | Event |
|---|---|
| April 2024 | Assistants API v2 beta released; v1 beta announced for deprecation |
| December 18, 2024 | Assistants API v1 beta access discontinued (v2 only) |
| March 2025 | Responses API launched; OpenAI signals Assistants sunset after feature parity |
| August 26, 2025 | OpenAI officially notifies developers of Assistants API deprecation |
| August 26, 2026 | Assistants API fully removed — all requests fail |
OpenAI notified developers on August 26, 2025 — exactly one year ahead of the shutdown — after declaring that the Responses API had reached feature parity with the Assistants API.
This is not a soft deprecation where old endpoints linger for years. The Assistants API was always in beta — it never graduated to GA. OpenAI has already moved all new model releases (GPT-5 family, GPT-5.4) exclusively to the Responses API.
Azure OpenAI Users Are Also Affected
Earlier guidance from Microsoft suggested Azure OpenAI was unaffected. That has changed. The Azure OpenAI Assistants API itself is now deprecated and is scheduled to be fully retired on August 26, 2026. Anyone currently building or running solutions on the Azure OpenAI Assistants API should plan and execute a migration to the Microsoft Foundry Agents service.
If you are on Azure, your migration path is to the Foundry Agent Service, which is built on the Responses API under the hood.
Why OpenAI Is Forcing the Move to the Responses API
The Assistants API had a specific design: persistent Assistant objects bundled model, instructions, and tools into a single server-side entity. Threads stored conversation messages. Runs executed the Assistant against a Thread asynchronously, requiring polling loops or webhooks to track completion.
The Responses API throws most of that out. Here is the concept mapping:
| Assistants API | Responses API Equivalent | Key Difference |
|---|---|---|
| Assistants | Prompts | Prompts can only be created in the dashboard — no programmatic creation |
| Threads | Conversations | Conversations store items (tool calls, tool outputs), not just messages |
| Runs | Responses | Synchronous by default; no polling loops needed |
| Run Steps | Items | Output items include typed data: messages, tool calls, structured outputs |
Assistants were persistent API objects that bundled model choice, instructions, and tool declarations — created and managed entirely through the API. Their replacement, prompts, can only be created in the dashboard, where you can version them.
That mapping sounds tidy on paper. In practice, it moves more orchestration into your application code: prompt version ownership, output parsing, tool loop handling, retry behavior, retention settings, and long-conversation management. Prompt engineering shifts to the dashboard UI, while the codebase handles execution only. If your engineering team managed Assistants through code and treated config promotion and rollback as a deployment concern, that workflow needs to change.
Hidden Reasoning Traces Drive the Architecture
The deeper reason for the shift is about reasoning models.
GPT-5 and later models use internal chain-of-thought (CoT) reasoning. These reasoning tokens are generated before the final answer but are not exposed to the client. Since the API is stateful, "OpenAI can maintain the chain-of-thought in their backend, plug it in to the conversation for you, and then strip it out before sending it back down to the client."
With the old /chat/completions API, you passed the entire conversation history with each request. But hidden reasoning traces cannot be round-tripped by the client, so the model forgets its reasoning between turns. The Responses API keeps that reasoning state server-side, where it survives into the next turn; in Chat Completions, it is dropped between calls.
This is not academic. GPT-5 integrated via Responses scores 5% better on TAUBench compared to Chat Completions, purely by taking advantage of preserved reasoning. For agentic workflows with multiple tool calls, the performance gap widens.
The bottom line: OpenAI designed the Responses API around the needs of reasoning models that keep secrets from the client. If you want the best performance from GPT-5+, you need the stateful API.
What Breaks: Zapier, Custom Integrations, and Legacy Workflows
The blast radius extends well beyond teams that wrote custom code against the Assistants API. Any tool, integration platform, or workflow that touched /v1/assistants or /v1/threads is affected.
Zapier Workflows
Zapier has deprecated all ChatGPT (OpenAI) steps that use the Assistants API. On August 26, 2026, Zaps using these steps will stop working. Affected steps include:
- Conversation With Assistant (Legacy)
- Create Assistant
- Upload File (when used with assistants)
- Create Vector Store
- Assistant lookup actions
These steps are already hidden from the Zap editor for new workflows. Zapier recommends updating your Zaps to use the new action before this date to ensure your workflows continue to run smoothly. This is not an automatic migration — you need to rebuild affected Zaps manually using the new "Conversation (Responses API)" action.
If your support team uses Zapier to route, summarize, or triage helpdesk tickets through these steps, that is part of your migration scope.
Custom SaaS Integrations
If your product connects to OpenAI for AI-powered features — auto-replies, summarization, intelligent routing — and any of that code calls openai.beta.assistants.create() or openai.beta.threads.create(), it breaks on August 26, 2026.
The migration is not a find-and-replace. Key differences that require code rewrites:
- No programmatic Assistant creation. Prompts are dashboard-only. If your system dynamically creates Assistants per tenant or per use case, you need a fundamentally different approach.
- No automated Thread → Conversation migration. OpenAI will not provide an automated tool for migrating Threads to Conversations. They recommend migrating new user threads onto Conversations and backfilling old ones as necessary.
- Different async model. Runs required complex asynchronous polling loops, could enter states like `requires_action` for tool execution, and the Thread was locked while a Run was in progress. Responses are synchronous by default (with an optional `background: true` mode for long-running tasks). If your application architecture is built around background polling for AI completions, you are looking at a significant refactor.
- Different event shapes. Streaming events, structured output format, and function-calling schemas all differ. `response_format` becomes `text.format`, function definitions are strict by default in Responses, and tool calls and tool outputs are separate item types tied together with a `call_id`.
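To make the `response_format` rename concrete, here is a minimal sketch of translating an old structured-output config into the shape Responses expects. The exact nesting shown is an assumption for illustration — verify field names against the current API reference before relying on it.

```typescript
// Old Chat Completions shape: schema details nested under `json_schema`.
type ChatCompletionsFormat = {
  type: 'json_schema';
  json_schema: { name: string; schema: object; strict?: boolean };
};

// Responses moves the format under `text.format` and flattens the schema fields.
function toResponsesTextFormat(old: ChatCompletionsFormat) {
  return {
    format: {
      type: old.type,
      name: old.json_schema.name,
      schema: old.json_schema.schema,
      // Schema/function definitions are strict by default in Responses.
      strict: old.json_schema.strict ?? true,
    },
  };
}

const legacy: ChatCompletionsFormat = {
  type: 'json_schema',
  json_schema: { name: 'ticket_triage', schema: { type: 'object' } },
};

console.log(JSON.stringify(toResponsesTextFormat(legacy)));
```

The payoff of a small adapter like this is that your schema definitions stay in one place while both code paths coexist during cutover.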
One useful release valve: Chat Completions remains supported, and Responses can be adopted incrementally. If you have simple one-shot prompts that do not need agentic tools or persistent state, you do not have to force every flow into Responses on day one. Migrate the flows that depend on agent features first.
What Your Audit Should Cover
Before you plan a migration, audit every system that touches OpenAI:
- Search your codebase for `openai.beta.assistants`, `openai.beta.threads`, `/v1/assistants`, and `/v1/threads`
- Check third-party integrations — Zapier, Make, n8n, Retool, internal tools
- Identify stored state — if you are storing Thread IDs in your database, those references become dead pointers
- Map tool usage — Code Interpreter, File Search, and function calling all have different configuration in Responses
- Catalog dynamic Assistant creation — any code that creates Assistants programmatically needs a fundamentally different pattern
- Snapshot current behavior — export instructions, tool schemas, attached files, and representative conversations so you have a stable baseline for regression testing
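The codebase search can be as simple as a grep, but a small script gives you a machine-readable inventory. A sketch, using only the identifiers listed above as patterns — extend the list with your own wrapper functions:

```typescript
// Flag source lines that reference the deprecated Assistants API surface.
const DEPRECATED_PATTERNS = [
  'openai.beta.assistants',
  'openai.beta.threads',
  '/v1/assistants',
  '/v1/threads',
];

function findDeprecatedCalls(source: string): { line: number; match: string }[] {
  const hits: { line: number; match: string }[] = [];
  source.split('\n').forEach((text, i) => {
    for (const pattern of DEPRECATED_PATTERNS) {
      if (text.includes(pattern)) hits.push({ line: i + 1, match: pattern });
    }
  });
  return hits;
}

// Sample input: line 1 uses the old API, line 2 is already on Responses.
const sample = [
  "const assistant = await openai.beta.assistants.create({ model: 'gpt-4o' });",
  "const reply = await client.responses.create({ input: 'hi' });",
].join('\n');

console.log(findDeprecatedCalls(sample)); // flags line 1 only
```

Run something like this across every repository and CI script before you size the migration; the hit count is a rough proxy for rewrite scope.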
The Hidden Costs and Limitations of the Responses API
The Responses API is genuinely more capable than the Assistants API. But the cost model is different, and if you are not paying attention, your bill will surprise you.
Tool Pricing
The biggest budgeting mistake is assuming Responses is priced like a plain text endpoint. The API call itself is not priced separately, but built-in tools add their own fees:
| Tool | Pricing (Responses API) |
|---|---|
| Web Search | $10/1K calls (reasoning models) or $25/1K calls (preview search, non-reasoning models) + token costs vary by mode |
| File Search | $0.10/GB/day storage (after 1 GB free) + $2.50/1K tool calls |
| Code Interpreter / Hosted Shell | Per 20-minute session per container ($0.03 for 1 GB up to $1.92 for 64 GB) |
| Remote MCP Servers | No additional tool cost — billed for output tokens only |
The web search pricing catches teams off guard. For gpt-4o-mini and gpt-4.1-mini with the non-preview web search tool, search content tokens are billed as a fixed block of 8,000 input tokens per call. That means every web search call on mini models incurs both the per-call fee and a significant fixed token charge for the retrieved content. Community reports confirm that actual bills frequently exceed what teams estimate from the per-call pricing alone.
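A back-of-the-envelope model makes the surprise visible: each mini-model web search call costs the flat per-call fee plus a fixed 8,000-token input block, per the pricing above. The input rate in the example is illustrative, not a quoted price — plug in the current rate for your model.

```typescript
const PER_CALL_FEE_USD = 10 / 1000;   // $10 per 1K web search tool calls
const SEARCH_CONTENT_TOKENS = 8000;   // fixed token block billed per call on mini models

// Per-call cost given an input token rate in USD per million tokens.
function webSearchCallCost(inputRatePerMillionUsd: number): number {
  const tokenCost = (SEARCH_CONTENT_TOKENS / 1_000_000) * inputRatePerMillionUsd;
  return PER_CALL_FEE_USD + tokenCost;
}

// At an assumed $0.40 / 1M input tokens, each call costs roughly
// $0.01 + $0.0032 = $0.0132 -- about a third above the headline per-call fee.
console.log(webSearchCallCost(0.4).toFixed(4));
```

Multiply that by your real monthly call volume before approving a budget; the token block dominates at higher input rates.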
Container Pricing Shift
Code Interpreter container pricing is transitioning: "Now: 1 GB for $0.03 / 64GB for $1.92 per container. Starting March 31, 2026: per 20-minute session per container." Even if your AI agent only runs a 3-second Python script to format a CSV, you are billed for the full 20-minute session.
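The billing unit after the transition is the whole 20-minute session, so the arithmetic is a ceiling function. A sketch, using the 1 GB rate quoted above:

```typescript
const SESSION_SECONDS = 20 * 60; // one billable session = 20 minutes

// Sessions billed for a given runtime; even a trivial run consumes one session.
function billedSessions(runtimeSeconds: number): number {
  return Math.max(1, Math.ceil(runtimeSeconds / SESSION_SECONDS));
}

function containerCost(runtimeSeconds: number, ratePerSessionUsd: number): number {
  return billedSessions(runtimeSeconds) * ratePerSessionUsd;
}

console.log(containerCost(3, 0.03));    // 3-second script on a 1 GB container: one full session
console.log(containerCost(3600, 0.03)); // 1 hour spans 3 sessions
```

If your agent fires many short Code Interpreter runs, batching work into fewer sessions is the obvious lever this pricing creates.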
Conversation State and Retention
Conversation state retention works differently than Threads, and you need to understand two separate policies:
- Response objects are saved for 30 days by default and can be viewed in the dashboard or retrieved via API. You can disable this by setting `store` to `false`.
- Conversation objects and the items in them are not subject to the 30-day TTL. Any response attached to a conversation will have its items persisted with no 30-day TTL.
The retention policy depends on whether you are using Conversations or raw `previous_response_id` chaining. Plan accordingly.
Token Billing on Chained Responses
Even when using `previous_response_id`, all previous input tokens for responses in the chain are billed as input tokens in the API. Long conversation chains get progressively more expensive with each turn. OpenAI offers compaction guidance to mitigate this, but you have to implement compaction logic yourself.
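The growth is quadratic, not linear, because every turn re-bills the whole accumulated history as input. A toy illustration with synthetic turn sizes (no real pricing assumed):

```typescript
// Total input tokens billed across a chain where each turn
// resubmits the entire prior history as input.
function cumulativeInputTokens(turnTokens: number[]): number {
  let history = 0;
  let billed = 0;
  for (const turn of turnTokens) {
    history += turn;   // this turn joins the history
    billed += history; // the whole history is billed as input again
  }
  return billed;
}

// Ten turns of 500 tokens each: 27,500 input tokens billed, not 5,000.
const turns = Array(10).fill(500);
console.log(cumulativeInputTokens(turns));
```

A compaction step that periodically summarizes old turns caps `history` instead of letting it grow with every exchange, which is why OpenAI's compaction guidance matters for long-lived conversations.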
The net effect: do not approve a migration budget until you reprice a real workload end to end.
Migration Options: Rebuild, Switch Models, or Upgrade Your Platform
You have three realistic paths. Each has different engineering cost, risk, and long-term implications.
Option 1: Rebuild Your Integration on the Responses API
Best for: Teams with dedicated engineering capacity who want to stay on OpenAI and maintain full control. Choose this if the agent itself is product IP — custom orchestration, proprietary tools, regulated workflows, or cross-system actions that an off-the-shelf platform will never model cleanly.
What it involves:
- Rewrite all Assistants API calls to use `client.responses.create()`
- Convert Assistants to dashboard Prompts and store Prompt IDs in source control
- Replace Threads with Conversations (or manual `previous_response_id` chaining)
- Rewrite async polling loops to synchronous or streaming patterns
- Rewrite tool-loop handling — output items, tool calls, and tool outputs are separate types with stricter function schemas
- Implement cost monitoring for web search, file search, and Code Interpreter tool calls
- Run shadow traffic comparisons to detect regressions before cutover
A minimal text-only turn in the new model:

```typescript
import OpenAI from 'openai';

const client = new OpenAI();

// Create a server-side conversation to hold state across turns
const conversation = await client.conversations.create({
  metadata: { account_id: accountId },
});

// One synchronous turn: the dashboard Prompt supplies instructions and tools
const response = await client.responses.create({
  prompt: { id: process.env.SUPPORT_PROMPT_ID },
  conversation: conversation.id,
  input: [{ role: 'user', content: userMessage }],
});

console.log(response.output_text);
```

That snippet covers a plain text turn. If your app uses functions, web search, file search, or other tools, you need to inspect output items and handle tool calls explicitly rather than treating `output_text` as the whole story.
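What "handle tool calls explicitly" means in practice: function calls arrive as typed output items, and your results go back as separate `function_call_output` items matched by `call_id`. The item shapes below are simplified from the real API, and the tool is hypothetical — this is a sketch of the pairing logic, not the full loop.

```typescript
type OutputItem =
  | { type: 'message'; content: string }
  | { type: 'function_call'; call_id: string; name: string; arguments: string };

// Execute every function call in a response's output and build the
// result items the model expects on the next turn.
function buildToolOutputs(
  items: OutputItem[],
  tools: Record<string, (args: any) => unknown>,
) {
  return items
    .filter((item): item is Extract<OutputItem, { type: 'function_call' }> =>
      item.type === 'function_call')
    .map((call) => ({
      type: 'function_call_output' as const,
      call_id: call.call_id, // ties the result back to the originating call
      output: JSON.stringify(tools[call.name](JSON.parse(call.arguments))),
    }));
}

// Hypothetical tool and model output for illustration:
const tools = {
  get_order_status: (args: { id: string }) => ({ id: args.id, status: 'shipped' }),
};
const modelOutput: OutputItem[] = [
  { type: 'function_call', call_id: 'call_1', name: 'get_order_status', arguments: '{"id":"A42"}' },
];

console.log(buildToolOutputs(modelOutput, tools));
```

In a real loop, these output items go back in the `input` of the next `responses.create()` call, and you repeat until the response contains no further function calls.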
Risk: This is typically 2–6 weeks of real engineering work for a production integration. The biggest traps are around multi-step tool calling orchestration and the loss of programmatic Assistant creation. If your system dynamically created Assistants per tenant, you need a fundamentally different pattern — likely Prompts dashboard plus configuration-as-code.
Writing the API wrapper is the easy part. The real challenge is migrating historical conversation data without breaking the context window. For teams considering a DIY script approach, understand the failure modes first: why DIY AI migration scripts fail at scale.
Option 2: Use the Deadline to Add Model Portability
Best for: Teams already frustrated with OpenAI's pricing, rate limits, or deprecation cadence.
If you are already rewriting your integration, you could decouple from OpenAI entirely. Anthropic's Claude and Google's Gemini both offer completions-style APIs that do not require this kind of architectural shift. By adopting standards like the Model Context Protocol (MCP) and standardizing your tool-calling architecture, you can route requests across providers.
The trade-off: you lose access to OpenAI's built-in tools (web search, file search, Code Interpreter) and the reasoning token preservation that gives GPT-5 its multi-turn edge. You gain portability and a simpler integration pattern.
Be realistic about the escape hatch, though. Anthropic's OpenAI SDK compatibility layer is explicitly positioned for testing and comparing model capabilities, not as a long-term, production-ready integration for most use cases. Google's Gemini API supports function calling but is stateless for these flows, requiring thought signatures to preserve reasoning context when you manage history manually. Portability can be the right strategic move, but it usually adds another abstraction layer on top of the migration rather than replacing it.
This path makes the most sense if your AI features are relatively simple — summarization, classification, single-turn Q&A — and do not depend on multi-turn reasoning or built-in tool execution.
Option 3: Migrate to a Platform with Native AI
Best for: SaaS companies that bolted OpenAI Assistants onto their helpdesk, CRM, or support system and are now maintaining fragile custom wrappers.
Ask honestly: did you build a custom OpenAI integration because your platform did not have native AI, and now it does?
Modern helpdesks like Intercom, Front, Zendesk, and Gorgias have shipped native AI features — auto-replies, intent detection, summarization, intelligent routing — maintained by the platform vendor. When OpenAI changes their API again (and they will), the vendor absorbs the migration. You don't.
If your custom Assistants API integration was replicating what a modern helpdesk's native AI does, the forced migration is an opportunity to consolidate. You get out of the business of maintaining OpenAI wrappers entirely.
For teams evaluating this path, we have published a deep-dive on Intercom's AI and workflow capabilities.
The critical piece in this scenario is not the API rewrite — it is the data migration. Moving historical tickets, conversations, customer records, automations, and macros cleanly is what determines whether your team has a smooth transition or a multi-week fire drill. We have covered why separating migration from implementation reduces risk.
The Full Migration Checklist
Regardless of which option you choose, execute these steps before August 26:
- Audit all Assistants API usage across your codebase, CI/CD pipelines, and third-party platforms (Zapier, Make, etc.)
- Catalog every Assistant object — export instructions, tool configurations, model settings, and metadata from the OpenAI dashboard
- Snapshot current behavior — export representative conversations and tool schemas for a regression testing baseline
- Map stored Thread IDs in your database to understand the scope of conversation state migration
- Create Prompts in the dashboard and store Prompt IDs in source control; decide who owns prompt versioning
- Prototype a single end-to-end flow in the Responses API before committing to a full rewrite. If you need persistent state, prototype the Conversations API early — it is a different data model than Threads.
- Implement cost monitoring for built-in tool calls (web search and file search have per-call fees that did not exist in the Assistants API)
- Run parallel traffic — send the same requests to both APIs and compare outputs before cutting over production
- Migrate production traffic gradually using feature flags or per-tenant cutover
- Set a hard internal deadline of July 2026 — do not wait until August
No Automated Thread Migration: OpenAI has explicitly stated they will not provide a tool to migrate Threads to Conversations. Plan for manual backfill of critical conversation history.
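For the gradual per-tenant cutover in the checklist, a deterministic hash bucket is a simple mechanism: a tenant always lands in the same bucket, so nobody flip-flops between stacks as you raise the rollout percentage. The hash choice here (FNV-1a) is an illustrative assumption — any stable hash works.

```typescript
// Map a tenant ID to a stable bucket in [0, 100).
function bucketOf(tenantId: string): number {
  let h = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (const ch of tenantId) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return h % 100;
}

// Route this fraction of tenants to the new Responses-based code path.
function useResponsesApi(tenantId: string, rolloutPercent: number): boolean {
  return bucketOf(tenantId) < rolloutPercent;
}

// Same tenant, same rollout level -> same answer, every time.
console.log(useResponsesApi('acme-corp', 25) === useResponsesApi('acme-corp', 25)); // true
```

Raising `rolloutPercent` from 5 to 25 to 100 only ever moves tenants one way, from the legacy stack to the new one, which keeps the parallel-traffic comparison clean.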
When to Bring In Help
If your Assistants API integration is a thin wrapper around a single model — a chatbot that does Q&A on your docs — the rewrite is straightforward. A senior engineer can probably handle it in a week.
But if your integration is deeply embedded in a production helpdesk, CRM, or customer-facing workflow — with multi-tenant Assistant configurations, stored Thread state, complex tool orchestration, and real customer conversation history — the migration has more moving parts than an API rewrite.
Separate two decisions: who owns the agent runtime after migration, and who preserves the data and relationships underneath it. Teams that bundle both into one vague "AI migration" project usually underestimate timeline and QA.
At ClonePartner, we handle the data layer — conversation ID continuity, historical data preservation, attachment handling, workflow rewiring, and cutover sequencing. Whether you are rewiring a custom OpenAI integration or migrating to a native-AI platform entirely, we ensure your data model and relationship structure remains intact during the transition.
We are typically brought in when the AI agent touches a helpdesk or CRM, multiple no-code automations sit around the core workflow, historical conversations must remain searchable, or the business cannot tolerate a messy dual-stack cutover.
Frequently Asked Questions
- When does the OpenAI Assistants API shut down?
- The Assistants API is fully removed on August 26, 2026. After that date, all requests to /v1/assistants, /v1/threads, and related endpoints will fail. OpenAI announced the deprecation on August 26, 2025, giving developers exactly one year.
- What replaces the OpenAI Assistants API?
- OpenAI's official replacement is the Responses API combined with the Conversations API. Assistants become Prompts (dashboard-only), Threads become Conversations, Runs become Responses, and Run Steps become Items. It is a full architectural change, not a simple endpoint swap.
- Does the Assistants API deprecation affect Azure OpenAI?
- Yes. The Azure OpenAI Assistants API is also deprecated with the same August 26, 2026 retirement date. Azure users should migrate to the Microsoft Foundry Agents service, which is built on the Responses API.
- Will Zapier workflows using OpenAI Assistants break?
- Yes. Zapier has deprecated all ChatGPT (OpenAI) steps that use the Assistants API. Zaps using steps like 'Conversation With Assistant' and 'Create Assistant' will stop working on August 26, 2026. You need to rebuild affected Zaps using the new Responses API actions.
- Can I automatically migrate Threads to Conversations?
- No. OpenAI has explicitly stated they will not provide an automated tool for migrating Threads to Conversations. They recommend migrating new user threads onto Conversations and backfilling old conversation history as needed.