yatesdr•4h ago
Great for connecting your local LLM coding and vision models to Claude Code and Codex.
General improvements
> Vision pipeline - images described by your vision model, transparent to the client
> Dual OCR pipeline - smart routing for PDFs and tool output (text extraction first, vision fallback for scanned docs). Dedicated OCR models like PaddleOCR-VL are ~17x faster than general vision models on document pages
> Brave & Tavily search integration - native behavior for Claude Code and Codex when configured on the proxy
> Per-model processor routing - override vision, OCR, and search settings per model
> Context window auto-detection from backends
> SSE keepalive improvements during pipeline processing
> Full MCP SSE endpoint for web search on OpenCode, Qwen Code, Claw, and other MCP-compatible agents
> Docker update for easier deployment (limited testing so far)
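The "text extraction first, vision fallback" routing above can be sketched with a simple heuristic. This is illustrative only, not the proxy's actual code; the function name and threshold are assumptions.

```python
def route_document(extracted_text: str, page_count: int,
                   min_chars_per_page: int = 50) -> str:
    """Decide between the fast text path and the vision-OCR fallback.

    Hypothetical heuristic: if plain text extraction yields almost
    nothing, the PDF is probably scanned and goes to the OCR model.
    """
    if len(extracted_text.strip()) >= min_chars_per_page * max(page_count, 1):
        return "text"  # extraction succeeded: skip the vision model entirely
    return "ocr"       # likely a scanned document: fall back to vision/OCR
```

The speedup claim makes sense under this scheme: most PDFs take the cheap text path, and only scanned pages ever hit a model.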
Codex-specific:
> Full Responses API translation - Chat Completions under the hood, your local backend doesn't need to support /v1/responses
> Reasoning token display - reasoning_summary_text.delta events so Codex shows thinking natively
> Native search UI - emits web_search_call output items so Codex renders "Searched N results" in its interface
> Structured tool output - Codex's view_image returns arrays/objects, not just strings; the proxy handles all three formats
> mcp_tool_call_output and mcp_list_tools input types handled (Codex sends these; local backends otherwise choke on them)
> Config generator produces config.toml with provider, reasoning effort, context window, and optional Tavily MCP
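The Responses-to-Chat-Completions translation amounts to reshaping the request body before it hits the local backend. A minimal sketch, assuming a dict-in/dict-out helper (the function name and the fields covered are illustrative; the real translation handles far more of the Responses schema):

```python
def responses_to_chat_completions(req: dict) -> dict:
    """Map a minimal /v1/responses body onto the /v1/chat/completions
    shape that local backends like vLLM and llama-server expect."""
    messages = []
    # Responses puts the system prompt in a top-level "instructions" field.
    if req.get("instructions"):
        messages.append({"role": "system", "content": req["instructions"]})
    inp = req.get("input")
    if isinstance(inp, str):
        # "input" may be a bare user string...
        messages.append({"role": "user", "content": inp})
    else:
        # ...or a list of items whose content is a list of typed parts.
        for item in inp or []:
            parts = item.get("content", [])
            if isinstance(parts, list):
                text = "".join(p.get("text", "") for p in parts)
            else:
                text = str(parts)
            messages.append({"role": item.get("role", "user"), "content": text})
    return {
        "model": req.get("model"),
        "messages": messages,
        "stream": req.get("stream", False),
    }
```

The streaming direction (Chat Completions deltas back into Responses events like reasoning_summary_text.delta) is the harder half and is not shown here.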
Claude Code-specific:
> Full Messages API translation - Anthropic protocol to Chat Completions, so Claude Code works with vLLM/llama-server
> Thinking blocks - backend reasoning tokens wrapped as thinking/signature_delta content blocks so Claude Code renders them
> web_search_20250305 server tool intercepted and executed proxy-side
> PDF type: "document" blocks extracted to text before forwarding
> Streaming search with server_tool_use + web_search_tool_result blocks so Claude Code shows "Did N searches"
> /anthropic/v1/messages explicit route for clients that use the Anthropic base URL convention
> Config generator produces settings.json with Sonnet/Opus/Haiku tier selectors, thinking toggles, and start scripts
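The Messages API side is the same idea in the Anthropic dialect. A minimal sketch of the request-body mapping (names are illustrative; tool use, thinking blocks, and streaming are where the real work is):

```python
def anthropic_to_chat_completions(req: dict) -> dict:
    """Map an Anthropic /v1/messages body onto Chat Completions."""
    messages = []
    # Anthropic keeps the system prompt at the top level, not in messages.
    if req.get("system"):
        messages.append({"role": "system", "content": req["system"]})
    for msg in req.get("messages", []):
        content = msg["content"]
        if isinstance(content, list):
            # Keep only text blocks here; image and document blocks would
            # already have been handled by the vision/PDF pipelines.
            content = "".join(
                b.get("text", "") for b in content if b.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": req.get("model"),
        "messages": messages,
        # max_tokens is required by the Anthropic API, optional in OpenAI's.
        "max_tokens": req.get("max_tokens", 1024),
    }
```

Going back the other way, the backend's reasoning tokens get re-wrapped as thinking/signature_delta content blocks so Claude Code renders them natively, as noted above.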