I've been using Codex for the last 24 hours, and background mode really boosts your output. You can have Codex work on multiple features asynchronously. I had it building a database model alongside frontend authentication, and it did both pretty well.
responses api is a hosted thing and so it made most sense for it to directly connect to other hosted services (like remote mcp servers).
Prior to the release of the Responses API, the Assistants API was the best way (for our use cases, at least) to interact with OpenAI's API, so hopefully some clarity on the plan for it is released soon, now that the Responses API has some of the things it was previously missing.
We're almost ready to share a migration guide. Today, we closed the gap between Assistants and Responses by launching Code Interpreter and support for multiple vector stores in File Search.
We still need to add support for Assistants and Threads objects to Responses before we can give devs a simple migration path. Working on this actively and hope to have all of this out in the coming weeks.
I started my MVP product with assistants and migrated to responses pretty easily. I handle a few more things myself but other than that it's not really been difficult.
Does anybody know how searching multiple vector stores is implemented? The obvious approach would be to allow something like:
"vector_store_ids": ["<vector_store_id1>", "<vector_store_id2>", ...]
We switched over to https://ai.pydantic.dev/, which I really like. It's LLM-agnostic, and the team is very receptive to feedback.
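For anyone curious, the LLM-agnostic part boils down to the model string (a minimal sketch; the model names are just examples, and the result attribute was renamed in newer pydantic-ai versions):

    from pydantic_ai import Agent

    # swapping providers is a one-string change
    agent = Agent("openai:gpt-4o")  # or "anthropic:claude-3-5-sonnet-latest", etc.
    result = agent.run_sync("Say hello.")
    print(result.output)  # older pydantic-ai versions expose this as result.data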
responses api, by contrast, is stateful: you only send the latest message, and openai stores the conversation history, while keeping track of other details on behalf of the calling app, like parallel tool call states.
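a minimal sketch of the stateful flow, assuming the python sdk's previous_response_id chaining (model id is just an example):

    from openai import OpenAI

    client = OpenAI()

    # first turn: openai stores the conversation server-side (store defaults to true)
    first = client.responses.create(
        model="gpt-4.1",
        input="My name is Ada.",
    )

    # later turns: send only the new message plus a pointer to the stored state
    second = client.responses.create(
        model="gpt-4.1",
        previous_response_id=first.id,
        input="What is my name?",
    )
    print(second.output_text)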
but i would say that since chat completions has become an informal industry standard (it is so easy to swap out providers with nothing more than a base url and a model id), the responses api feels like an attempt by openai to break away from that shared interface toward a paradigm that requires data migration as well as replacement infrastructure (containers for code execution, for example).
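for contrast, a provider swap under chat completions really is just the base url and the model id (both placeholders below):

    from openai import OpenAI

    # same client, same call shape; only the endpoint and model name change
    client = OpenAI(
        base_url="https://api.example-provider.com/v1",  # placeholder endpoint
        api_key="...",
    )
    reply = client.chat.completions.create(
        model="some-provider-model",  # placeholder model id
        messages=[{"role": "user", "content": "hello"}],
    )
    print(reply.choices[0].message.content)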
for example, you can give the responses api access to 3 tools: a vector store with some user memories (file_search), the shopify mcp server, and code_interpreter. you can then ask it to look up some user memories, find relevant items in the shopify mcp store, and then download them into a csv file. all of this can be done in a single api call that involves multiple model turns and tool calls.
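a rough sketch of what that single call could look like (the shopify mcp url and the vector store id are placeholders, and the tool schemas are my best reading of the current api shape):

    from openai import OpenAI

    client = OpenAI()

    # one call, three tools; the model loops through turns and tool calls server-side
    response = client.responses.create(
        model="gpt-4.1",
        input="Look up my style preferences, find matching items in the "
              "Shopify store, and save them to a CSV file.",
        tools=[
            {"type": "file_search", "vector_store_ids": ["vs_user_memories"]},  # placeholder id
            {
                "type": "mcp",
                "server_label": "shopify",
                "server_url": "https://example-shop.com/api/mcp",  # placeholder url
                "require_approval": "never",
            },
            {"type": "code_interpreter", "container": {"type": "auto"}},
        ],
    )
    print(response.output_text)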
p.s. - you can also use responses statelessly by setting store=false.
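i.e., something like this, where the calling app keeps the history itself:

    from openai import OpenAI

    client = OpenAI()

    # stateless: nothing is stored server-side, so you resend the full history each turn
    history = [{"role": "user", "content": "My name is Ada."}]
    resp = client.responses.create(model="gpt-4.1", input=history, store=False)
    history += [
        {"role": "assistant", "content": resp.output_text},
        {"role": "user", "content": "What is my name?"},
    ]
    resp = client.responses.create(model="gpt-4.1", input=history, store=False)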
Why would anyone want to use Responses statelessly? Just trying to understand.
So, so weird that they still don't want you to see their models' reasoning process, to the point that even highly trusted organizations with ZDR contracts only get them in a black-box encrypted form. Gemini has no issue showing its work. Why can't OpenAI?
The customer service itself was surreal enough that it was easier just to migrate to Anthropic.
It's not weird at all. R1-distills have shown that you can get pretty close to the real thing with post-training on enough completions. I believe gemini has also stopped showing the thinking steps (apparently the GLM series of open access models were heavily trained on gemini data).
ToS violations can't be enforced in any effective way, and certainly not cross-borders. Their only way to maintain whatever moat thinking models give them is to simply not show the thinking parts.
And yes, the above is true even if you are ULTRA.
You can still view your old thinking traces from prior turns and conversations.
Reasoning summaries also look great. Anything that provides extra explainability is a win in my book.