OpenAI-Compatible API Providers: 2026 Matrix
A June 2026 matrix of AI providers with OpenAI-compatible APIs (endpoints, streaming, tool calling, vision, and gotchas), plus how Speka's gateway normalizes them.
Which AI Providers Have an OpenAI-Compatible API? (2026 Matrix)
Last updated: June 2026
Key takeaways
- OpenAI compatibility means an API exposes a /v1/chat/completions (and usually /v1/embeddings) endpoint that accepts the same request/response JSON shape as OpenAI, so the official OpenAI SDK works by changing only base_url and the API key (plus the provider's model id).
- "Compatible" is not binary. Most providers cover chat completions and streaming, but support for tool calling, JSON mode, vision, and image generation varies per model; verify the specific model, not just the platform.
- Self-hosting stacks (vLLM, SGLang, TGI, Ollama) all ship an OpenAI-compatible server, so the same client code runs against local and hosted models.
- Gateways like Speka, OpenRouter, and Together AI put many model providers behind one OpenAI-shaped API and one key, trading direct per-provider integrations for a consistent interface and unified billing.
- Speka is an OpenAI-compatible gateway serving 16 frontier models from 7 labs at https://speka.me/v1; it's a drop-in for the OpenAI SDK with native tool calling, JSON mode, SSE streaming, embeddings, and image generation.
What does "OpenAI-compatible API" actually mean?
An OpenAI-compatible API is any HTTP API that implements the same endpoints, request body, and response schema as the OpenAI Chat Completions API. In practice that means a provider exposes POST /v1/chat/completions, accepts a messages array plus parameters like model, temperature, tools, and stream, and returns the familiar choices[].message (or streamed choices[].delta) structure. Because the contract matches, you can point the official OpenAI Python SDK — or any OpenAI client — at the provider by setting base_url and api_key, and the rest of your code is unchanged. Authentication is almost always an HTTP Authorization: Bearer <key> header per RFC 6750, and streaming is delivered as Server-Sent Events.
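To make the contract concrete, here is a minimal sketch using only the Python standard library. It builds (but does not send) a POST /v1/chat/completions request with a Bearer header, then reads the answer out of a non-streaming response body. The helper names `build_chat_request` and `first_message_content` are hypothetical, and the sample response is hand-written to show the shape, not captured from a real provider.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style Chat Completions request."""
    # Every compatible provider exposes the same path under its base URL.
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Bearer token authentication (RFC 6750).
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def first_message_content(response_body: str) -> str:
    """Extract choices[0].message.content from a non-streaming response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


# Hand-written sample in the OpenAI response shape (illustrative only).
sample = ('{"choices": [{"index": 0, "message": {"role": "assistant", '
          '"content": "Hi!"}, "finish_reason": "stop"}]}')

req = build_chat_request("https://speka.me/v1", "sk-speka-live-...",
                         "meta/llama-3.3-70b-instruct",
                         [{"role": "user", "content": "Hello"}])
print(req.full_url, req.get_method())
print(first_message_content(sample))
```

Passing the request to `urllib.request.urlopen` would send it; the point here is that nothing in the request or the parsing is provider-specific except the base URL, key, and model id.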
The catch: compatibility is a spectrum. A provider can be compatible for basic chat but diverge on tool calling, structured outputs, the newer Responses API, Assistants/Threads, or batch/files endpoints. Treat the matrix below as a starting map and confirm the exact capability for the model you intend to call.
Which AI providers have an OpenAI-compatible API? (matrix)
The table below summarizes major hosts and self-hosted servers. All rows are as of June 2026 — verify before relying on them, because endpoints and per-model capabilities change frequently.
| Provider / stack | Type | OpenAI-compatible endpoint | Streaming (SSE) | Native tool calling | Vision | Image gen | Notes / gotchas (as of June 2026, verify) |
|---|---|---|---|---|---|---|---|
| Speka | Gateway | https://speka.me/v1 | Yes | Yes | Yes (Llama 3.2 Vision) | Yes (FLUX) | 16 models / 7 labs; drop-in OpenAI SDK; JSON mode, embeddings. |
| OpenRouter | Gateway | https://openrouter.ai/api/v1 (documented drop-in) | Yes | Yes (model-dependent) | Yes (model-dependent) | Yes | 300+ models (published counts range from 300+ to 500+); built-in usage analytics; free-model catalog with rate limits; BYOK. |
| Together AI | Gateway | https://api.together.ai/v1 | Yes | Yes (most chat models) | Yes (some) | Yes | 200+ models; namespaced model ids; Responses API, Assistants/Threads, OpenAI-shaped Batch/Files, and moderation are not supported. |
| vLLM | Self-host | OpenAI-compatible server | Yes | Yes (model-dependent) | Model-dependent | No | You run the server; --api-key is optional; capabilities depend on the served model. |
| SGLang | Self-host | OpenAI-compatible server | Yes | Yes (model-dependent) | Model-dependent | No | High-throughput serving runtime; you operate it. |
| Hugging Face TGI | Self-host | Messages API (/v1/chat/completions) | Yes | Yes (model-dependent) | Model-dependent | No | The "Messages API" is the OpenAI-compatible surface. |
| Ollama | Local / self-host | Partial OpenAI compat | Yes | Yes (model-dependent) | Model-dependent | No | Runs models locally; documented OpenAI-compatible endpoints with partial coverage. |
For self-hosting, see the vLLM OpenAI-compatible server docs, the SGLang repository and SGLang docs, Text Generation Inference (TGI) with its Messages API, and Ollama with its OpenAI compatibility docs. Two caveats on the table: model counts shift constantly, and "Yes (model-dependent)" means the platform supports the feature but a given model may not — always check the model card.
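Because hosted and self-hosted servers share the contract, switching between them is mostly configuration. The sketch below keeps base URLs in one table and lets an environment variable override them. The local ports are assumptions based on common defaults (vLLM 8000, SGLang 30000, Ollama 11434, and 8080 for a TGI container mapped that way); check your own deployment. `LLM_BASE_URL`, `BASE_URLS`, and `resolve_base_url` are hypothetical names.

```python
import os
from typing import Mapping, Optional

# Assumed base URLs; local ports reflect common defaults and may differ in your setup.
BASE_URLS = {
    "speka": "https://speka.me/v1",
    "together": "https://api.together.ai/v1",
    "vllm": "http://localhost:8000/v1",
    "sglang": "http://localhost:30000/v1",
    "tgi": "http://localhost:8080/v1",
    "ollama": "http://localhost:11434/v1",
}


def resolve_base_url(provider: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the base URL for a provider; a (hypothetical) LLM_BASE_URL wins if set."""
    env = os.environ if env is None else env
    override = env.get("LLM_BASE_URL")
    if override:
        return override.rstrip("/")
    try:
        return BASE_URLS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None


def chat_completions_url(provider: str, env: Optional[Mapping[str, str]] = None) -> str:
    # Same path on every OpenAI-compatible server; only the base changes.
    return resolve_base_url(provider, env) + "/chat/completions"


print(chat_completions_url("vllm", env={}))
print(chat_completions_url("speka", env={"LLM_BASE_URL": "http://gpu-box:8000/v1/"}))
```

With the official SDK, the resolved value is what you would pass as `OpenAI(base_url=...)`; the rest of the client code stays the same for local and hosted models.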
How do I call an OpenAI-compatible provider? (curl + Python)
The whole point of compatibility is that the request looks identical across providers; only base_url, key, and model change. Here is a chat completion against Speka with curl:
```bash
curl https://speka.me/v1/chat/completions \
  -H "Authorization: Bearer sk-speka-live-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [
      {"role": "system", "content": "You are a precise assistant."},
      {"role": "user", "content": "Explain SSE streaming in one sentence."}
    ],
    "stream": false
  }'
```
The equivalent call with the OpenAI Python SDK differs from a stock OpenAI setup only in the base_url and api_key arguments (plus the Speka model id):
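```python
from openai import OpenAI

# Stock OpenAI client, pointed at Speka's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://speka.me/v1",
    api_key="sk-speka-live-...",  # placeholder; load real keys from config or env
)

# Model ids are provider-specific (namespaced by lab on Speka).
resp = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[
        {"role": "user", "content": "Give me a 3-item checklist for API key rotation."}
    ],
)
print(resp.choices[0].message.content)
```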
Because Speka speaks the OpenAI wire format, the same base_url swap works in LangChain's ChatOpenAI, LlamaIndex, and workflow tools like n8n wherever they accept a custom OpenAI base URL.
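The Python example above waits for the full answer. With `stream=True` (or `"stream": true` in the raw request), the response arrives as Server-Sent Events: each `data:` line holds a JSON chunk whose `choices[0].delta` carries the next piece of text, and the stream ends with `data: [DONE]`. The official SDK parses this for you when you iterate over the streamed result; the stdlib sketch below shows roughly what that parsing involves, run against hand-written sample lines rather than a live connection. `collect_stream_text` is a hypothetical helper, it assumes a single choice per request, and it deliberately ignores SSE features such as multi-line data fields and `event:`/`id:` lines.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style SSE chat chunks (assumes n=1)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        # Blank separators and SSE comments (lines starting with ":") carry no payload.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel used by OpenAI-style APIs
            break
        chunk = json.loads(payload)
        # Some providers send a final chunk with an empty choices list (e.g. usage only).
        for choice in chunk.get("choices", []):
            # The first chunk often carries only the role; later ones carry content.
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Rotate "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "keys often."}}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample_stream))
```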
What breaks when a provider is only "mostly" compatible?
Compatibility gaps tend to cluster in a few predictable places. Tool calling is the most common: a platform supports the tools parameter, but a specific model returns malformed arguments or ignores them. Structured outputs / JSON mode may be advertised platform-wide yet enforced only on newer models. The Responses API, Assistants/Threads, and OpenAI-shaped Batch/Files endpoints are frequently absent on third-party hosts — Together AI, for example, documents that these are not supported. Model id namespacing also bites: hosted open models are usually namespaced (e.g. meta-llama/Llama-4-Maverick on Together, meta/llama-3.3-70b-instruct on Speka), so a hardcoded OpenAI model name will 404.
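One way to keep namespaced ids out of application code is a small lookup from a logical model name to each provider's id. The sketch below is illustrative: the logical names and the `MODEL_IDS` table are hypothetical, and only the Speka ids and `gpt-4o` come from this article; fill in other providers from their own model lists rather than guessing.

```python
# Hypothetical logical names mapped to provider-specific model ids.
MODEL_IDS = {
    "speka": {
        "chat-large": "meta/llama-3.3-70b-instruct",
        "chat-small": "meta/llama-3.1-8b-instruct",
        "reasoning": "deepseek-ai/deepseek-v4-flash",
        "embeddings": "nvidia/nv-embedqa-e5-v5",
    },
    "openai": {
        "chat-large": "gpt-4o",
    },
}


def model_id(provider: str, logical_name: str) -> str:
    """Resolve a logical model name, failing loudly instead of sending a bad id."""
    ids = MODEL_IDS.get(provider, {})
    if logical_name not in ids:
        # Failing here is cheaper than a "model not found" error at request time.
        raise KeyError(f"no {logical_name!r} model configured for {provider!r}")
    return ids[logical_name]


print(model_id("speka", "chat-large"))
```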
The defensive move is to treat each capability as a per-model fact. Before shipping, send a probe request that exercises tool calling, streaming, and JSON mode against the exact model id you plan to use in production.
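A probe can be as simple as one request per capability plus a check on the response shape. The sketch below builds two request bodies (a `tools` call and a `response_format={"type": "json_object"}` call) and validates responses offline; sending them is left to whichever client you use. The helper names and the `get_time` tool are hypothetical, and the sample responses are hand-written. OpenAI's JSON mode expects the prompt itself to mention JSON, so the probe prompt does.

```python
import json


def tool_probe(model: str) -> dict:
    """Request body that should make a tool-capable model emit a tool call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "What time is it in UTC? Use the tool."}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_time",  # hypothetical tool used only for the probe
                "description": "Return the current time for a timezone.",
                "parameters": {
                    "type": "object",
                    "properties": {"tz": {"type": "string"}},
                    "required": ["tz"],
                },
            },
        }],
    }


def json_mode_probe(model: str) -> dict:
    """Request body for JSON mode; the prompt mentions JSON explicitly."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": 'Reply with JSON like {"ok": true}.'}],
        "response_format": {"type": "json_object"},
    }


def tool_call_ok(response: dict) -> bool:
    """True if the first choice has a tool call whose arguments parse as a JSON object."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    if not calls:
        return False
    try:
        # Arguments arrive as a JSON string; malformed strings are a common failure.
        args = json.loads(calls[0]["function"]["arguments"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return False
    return isinstance(args, dict)


def json_mode_ok(response: dict) -> bool:
    """True if the first choice's content parses as a JSON object."""
    content = response["choices"][0]["message"].get("content") or ""
    try:
        return isinstance(json.loads(content), dict)
    except json.JSONDecodeError:
        return False


good = {"choices": [{"message": {"role": "assistant", "content": None, "tool_calls": [
    {"id": "call_1", "type": "function",
     "function": {"name": "get_time", "arguments": '{"tz": "UTC"}'}}]}}]}
bad = {"choices": [{"message": {"role": "assistant", "content": "It is noon."}}]}
print(tool_call_ok(good), tool_call_ok(bad), json_mode_ok(bad))
```

Running both probes against the exact production model id, and failing the deploy if either check returns False, turns the "per-model fact" into something a CI job can enforce.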
How does a unified gateway normalize different providers?
A gateway sits between your application and many upstream model providers and presents a single OpenAI-compatible surface. It normalizes three things: the wire format (one request/response schema regardless of which lab built the model), authentication (one Bearer key instead of one per provider), and billing/observability (one invoice and one usage view). OpenRouter and Together AI both follow this pattern across hundreds of models; OpenRouter additionally exposes a built-in usage analytics dashboard, and Together offers usage/spend dashboards.
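The billing/observability half of that normalization can be pictured as summing the standard `usage` block (`prompt_tokens`, `completion_tokens`) from every response, whichever lab served it. This is a toy sketch of the idea, not how Speka, OpenRouter, or Together actually implement metering; `UsageLedger` is a hypothetical class, and the sample responses are hand-written.

```python
from collections import defaultdict


class UsageLedger:
    """Toy per-key, per-model token ledger built from OpenAI-style `usage` blocks."""

    def __init__(self) -> None:
        # (api_key, model) -> [prompt_tokens, completion_tokens]
        self._totals = defaultdict(lambda: [0, 0])

    def record(self, api_key: str, response: dict) -> None:
        usage = response.get("usage") or {}
        entry = self._totals[(api_key, response["model"])]
        entry[0] += usage.get("prompt_tokens", 0)
        entry[1] += usage.get("completion_tokens", 0)

    def totals_for_key(self, api_key: str) -> dict:
        """One view across all upstream models for a single key."""
        return {
            model: tuple(counts)
            for (key, model), counts in self._totals.items()
            if key == api_key
        }


ledger = UsageLedger()
ledger.record("key-a", {"model": "meta/llama-3.3-70b-instruct",
                        "usage": {"prompt_tokens": 120, "completion_tokens": 40}})
ledger.record("key-a", {"model": "deepseek-ai/deepseek-v4-flash",
                        "usage": {"prompt_tokens": 300, "completion_tokens": 900}})
ledger.record("key-a", {"model": "meta/llama-3.3-70b-instruct",
                        "usage": {"prompt_tokens": 80, "completion_tokens": 10}})
print(ledger.totals_for_key("key-a"))
```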
Speka takes a deliberately smaller, curated approach: 16 frontier models from 7 labs (DeepSeek, NVIDIA, Meta, Mistral AI, Moonshot AI, OpenAI, Black Forest Labs) behind one https://speka.me/v1 endpoint. Every model is reachable with a single key (prefixed sk-speka-live-) and the same OpenAI client code. The lineup includes reasoning models such as DeepSeek V4 Flash (deepseek-ai/deepseek-v4-flash, 128K context, $0.27/$1.10 per 1M tokens in/out) and NVIDIA's Nemotron Super 49B; chat models such as Llama 3.3 70B Instruct ($0.20/$0.20), Llama 4 Maverick, Mistral Large 3, and Kimi K2.6; GPT-OSS 120B as a code model ($0.15/$0.60); Llama 3.2 90B/11B Vision; NV-EmbedQA E5 v5 for embeddings; and image generation via FLUX.1 [dev] ($0.04/image) and FLUX.1 [schnell] ($0.02/image).
Which models does Speka host, and what do they cost?
Pricing is usage-based (per token for text and embedding models, per image for FLUX). Every plan includes a monthly allowance, and overage is billed at the standard rates with no penalties. A few representative entries:
| Model | Speka model id | Type | Price (USD per 1M tokens, in / out) |
|---|---|---|---|
| DeepSeek V4 Flash | deepseek-ai/deepseek-v4-flash | Reasoning | $0.27 / $1.10 |
| Nemotron Nano 9B | nvidia/llama-3.3-nemotron-super-49b-v1.5 (Nano variant) | Reasoning | $0.10 / $0.10 |
| Llama 3.3 70B Instruct | meta/llama-3.3-70b-instruct | Chat | $0.20 / $0.20 |
| Llama 3.1 8B Instruct | meta/llama-3.1-8b-instruct | Chat | $0.05 / $0.05 |
| Mistral Large 3 | mistralai/mistral-large-3-675b-instruct-2512 | Chat | $0.90 / $2.70 |
| Kimi K2.6 | moonshotai/kimi-k2.6 | Chat | $0.50 / $2.00 |
| GPT-OSS 120B | openai/gpt-oss-120b | Code | $0.15 / $0.60 |
| Llama 3.2 90B Vision | meta/llama-3.2-90b-vision | Vision | $0.35 / $0.40 |
| NV-EmbedQA E5 v5 | nvidia/nv-embedqa-e5-v5 | Embeddings | $0.01 (input only) |
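Since prices are quoted per 1M tokens, estimating a bill is plain arithmetic: (tokens / 1,000,000) × price, computed separately for input and output. The sketch below copies three prices from the table and uses `Decimal` to avoid floating-point rounding in currency. The `overage` helper encodes an assumption about how a plan's included allowance offsets cost (the Free plan's $1 is the only allowance figure stated in this article); check /pricing for how allowances are actually applied.

```python
from decimal import Decimal

# (input, output) USD per 1M tokens, copied from the table above (June 2026).
PRICES = {
    "deepseek-ai/deepseek-v4-flash": (Decimal("0.27"), Decimal("1.10")),
    "meta/llama-3.3-70b-instruct": (Decimal("0.20"), Decimal("0.20")),
    "openai/gpt-oss-120b": (Decimal("0.15"), Decimal("0.60")),
}
PER_MILLION = Decimal(1_000_000)


def token_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD for one model's token usage."""
    price_in, price_out = PRICES[model]
    cost_in = (Decimal(input_tokens) / PER_MILLION) * price_in
    cost_out = (Decimal(output_tokens) / PER_MILLION) * price_out
    return cost_in + cost_out


def overage(total_cost: Decimal, allowance: Decimal) -> Decimal:
    """Assumed model: usage beyond the plan allowance is billed at the same rates."""
    return max(Decimal(0), total_cost - allowance)


# 2M input + 500K output tokens on DeepSeek V4 Flash: 2 * 0.27 + 0.5 * 1.10 = 1.09
cost = token_cost("deepseek-ai/deepseek-v4-flash", 2_000_000, 500_000)
print(cost)
print(overage(cost, Decimal("1.00")))
```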
See the full catalog and per-model specs on /models, and the plan tiers on /pricing. Upstream sources include Meta (Llama 3.3 70B), DeepSeek, OpenAI (GPT-OSS 120B), Black Forest Labs (FLUX.1 [dev]), Mistral AI, and models served via NVIDIA NIM.
Frequently asked questions
What is an OpenAI-compatible API?
An OpenAI-compatible API implements the same endpoints and JSON schema as OpenAI's Chat Completions API, typically POST /v1/chat/completions with a messages array. Because the request and response shapes match, you can use the official OpenAI SDK by changing only the base URL and API key. Coverage of advanced features like tool calling or vision varies by provider and model.
Can I use the OpenAI Python SDK with Speka?
Yes. Speka is a drop-in for the OpenAI SDK. Set base_url="https://speka.me/v1" and api_key="sk-speka-live-...", then call client.chat.completions.create(...) with a Speka model id such as meta/llama-3.3-70b-instruct. Streaming, tool/function calling, JSON mode, and embeddings work through the same client without additional libraries. See /docs for endpoint details.
Is OpenRouter or Together AI more compatible than Speka?
They differ in scope, not in the core contract. OpenRouter (300+ models, sources vary up to 500+) and Together AI (200+ models) expose far larger catalogs and are documented as OpenAI-compatible, though some OpenAI endpoints like Responses or Assistants are not supported on Together. Speka curates 16 frontier models with the same OpenAI-compatible chat, tools, JSON, embeddings, and image generation surface.
Does OpenAI compatibility include tool calling and vision?
Not automatically. A platform may accept the OpenAI tools parameter and image inputs while a specific model on it does not support them. Compatibility guarantees the request/response shape, not per-model capability. Always confirm tool calling, structured outputs, and vision against the exact model id you plan to use, ideally with a probe request before production.
How do I self-host an OpenAI-compatible endpoint?
Use a serving runtime that ships an OpenAI-compatible server: vLLM, SGLang, and Hugging Face TGI (via its Messages API) all expose /v1/chat/completions, and Ollama provides documented OpenAI-compatible endpoints for local models. You start the server, point the OpenAI SDK at its base URL, and capabilities like tool calling and vision depend on the model you load.
Are model names the same across OpenAI-compatible providers?
No. OpenAI uses ids like gpt-4o, while hosted open-model providers namespace by lab, e.g. meta/llama-3.3-70b-instruct on Speka or meta-llama/Llama-4-Maverick on Together AI. A model id valid on one provider will usually 404 on another. Keep model ids configurable rather than hardcoded, and read each provider's model list before switching base URLs.
Try it on Speka
If you already use the OpenAI SDK, you can start calling all 16 frontier models by changing two lines: set base_url to https://speka.me/v1 and drop in your sk-speka-live-... key, then pick a Speka model id. The Free plan needs no credit card and includes $1 of usage at 10 rpm (requests per minute), so you can validate compatibility before committing. Create a key on /signup, browse the catalog on /models, and read the integration guide in the docs.