Connect LangChain, n8n, Dify & LlamaIndex to an OpenAI API

Point LangChain, n8n, Dify, and LlamaIndex at any OpenAI-compatible API. Exact base_url, key, and model config to connect each framework to Speka.

Speka Engineering

Jun 8, 2026 · 8 min read

Connect LangChain, n8n, Dify & LlamaIndex to Any OpenAI-Compatible API

Last updated: June 2026

Key takeaways

Any framework that speaks the OpenAI Chat Completions API can point at Speka by changing two values: the base URL (https://speka.me/v1) and the API key (sk-speka-live-...).
LangChain and LlamaIndex accept a base_url/api_base argument on their OpenAI chat classes — no subclassing, no wrappers.
n8n and Dify wire up via an OpenAI-compatible credential/model provider with a custom base URL field in the UI; no code required.
Model IDs are namespaced and explicit. You pass real catalog IDs like meta/llama-3.3-70b-instruct or deepseek-ai/deepseek-v4-flash, not aliases.
Speka hosts 16 frontier models from 7 labs with native tool calling, JSON mode, streaming, embeddings, and image generation — all behind one OpenAI-shaped endpoint.

What does "OpenAI-compatible API" mean for these frameworks?

An OpenAI-compatible API is any HTTP endpoint that accepts and returns the same JSON shapes as OpenAI's Chat Completions endpoint: a POST /v1/chat/completions request with model, messages, and optional tools, stream, and response_format fields, authenticated with an Authorization: Bearer header per RFC 6750. Because LangChain, LlamaIndex, n8n, and Dify all build on this contract, they treat the provider as a swappable backend. To connect them to Speka, you override the base URL to https://speka.me/v1, supply a sk-speka-live-... key, and reference a model from the Speka catalog. Nothing about your prompt logic, tool definitions, or streaming code changes. The same pattern works against self-hosted servers like vLLM, SGLang, TGI, and Ollama — Speka is simply a hosted endpoint that implements the identical contract.
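As a rough illustration of that contract, the sketch below builds (but does not send) the request using only the Python standard library. The helper name build_chat_request is made up for this example; the URL, header, and field names come from the description above.

```python
import json
import urllib.request

BASE_URL = "https://speka.me/v1"  # base URL given in this article


def build_chat_request(api_key: str, model: str, messages: list, **options) -> urllib.request.Request:
    """Build a POST /v1/chat/completions request without sending it.

    build_chat_request is a hypothetical helper for illustration only.
    """
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # bearer token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    api_key="sk-speka-live-...",
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
    stream=False,
)
print(req.get_method(), req.full_url)
# Sending it would be one more call: urllib.request.urlopen(req)
```

Every framework in this guide ends up producing a request of this shape, which is why swapping the base URL and key is enough.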

How do I connect LangChain to Speka?

LangChain's ChatOpenAI class takes a base_url and api_key directly. Set them and pass a real Speka model ID:

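from langchain_openai import ChatOpenAI

# Only base_url, api_key, and model differ from a stock OpenAI setup.
llm = ChatOpenAI(
    base_url="https://speka.me/v1",       # Speka's OpenAI-compatible root
    api_key="sk-speka-live-...",          # your Speka key
    model="meta/llama-3.3-70b-instruct",  # exact catalog ID, not an alias
    temperature=0.2,
)

# invoke() returns a message object; .content holds the generated text.
print(llm.invoke("Summarize the CAP theorem in two sentences.").content)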

Tool calling, structured output (with_structured_output), and .stream() all work unchanged because they compile down to the standard tools and stream request parameters. For reasoning-heavy chains, swap the model to deepseek-ai/deepseek-v4-flash; for code, use openai/gpt-oss-120b. See the LangChain ChatOpenAI integration docs for the full argument list. Speka's docs list every supported parameter.
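To make "compile down to the standard tools and stream request parameters" concrete, here is a sketch of roughly what such a request body looks like on the wire. The get_weather tool and the tool_spec helper are hypothetical, and the exact fields LangChain emits may differ by version; only the overall OpenAI tools shape is the point.

```python
import json


def tool_spec(name: str, description: str, parameters: dict) -> dict:
    """Wrap a JSON Schema in the OpenAI-style function-tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


# Hypothetical tool, used only to show the shape.
weather_tool = tool_spec(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

request_body = {
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "What is the weather in Oslo?"}],
    "tools": [weather_tool],  # what a bound tool turns into
    "stream": True,           # what a streaming call turns into
}
print(json.dumps(request_body, indent=2))
```

Because the framework only ever emits these standard fields, nothing in the chain needs to know which provider sits behind the base URL.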

How do I connect LlamaIndex to Speka?

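LlamaIndex uses api_base rather than base_url, but the idea is identical. One caveat: its OpenAI LLM class looks up context-window metadata by model name and can reject IDs that are not OpenAI's own. The OpenAILike subclass (installed separately as llama-index-llms-openai-like) takes the same api_base, api_key, and model arguments without that lookup, so it is the safer choice for Speka model IDs:

from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike  # pip install llama-index-llms-openai-like

Settings.llm = OpenAILike(
    api_base="https://speka.me/v1",
    api_key="sk-speka-live-...",
    model="mistralai/mistral-large-3-675b-instruct-2512",
    is_chat_model=True,  # send requests to /chat/completions
)

resp = Settings.llm.complete("Explain vector quantization in one paragraph.")
print(resp.text)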

For retrieval pipelines you also need an embedding model. Speka exposes nvidia/nv-embedqa-e5-v5 at $0.01 per 1M tokens through the standard /v1/embeddings route, so you can set both the LLM and the embedding model to Speka backends. The LlamaIndex framework documentation covers wiring custom embedding providers. Because the embeddings endpoint is OpenAI-shaped, you reuse the same base URL and key.
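For orientation, the sketch below shows the OpenAI-style embeddings request and response shapes that a framework's embedding class handles for you. It assumes Speka follows the standard OpenAI embeddings schema ("input" in the request, "data[].embedding" in the response); the helper names and the sample response are invented for illustration, and nothing is sent over the network.

```python
import json
import math
import urllib.request


def build_embeddings_request(api_key: str, texts: list,
                             model: str = "nvidia/nv-embedqa-e5-v5",
                             base_url: str = "https://speka.me/v1") -> urllib.request.Request:
    """Build a POST /v1/embeddings request without sending it."""
    body = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def extract_vectors(response_json: dict) -> list:
    """Return embeddings ordered by their 'index' field."""
    rows = sorted(response_json["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


embed_req = build_embeddings_request("sk-speka-live-...", ["first passage", "second passage"])

# Hand-written sample in the OpenAI response shape, not real model output.
sample_response = {
    "object": "list",
    "data": [
        {"index": 1, "embedding": [0.0, 1.0]},
        {"index": 0, "embedding": [1.0, 0.0]},
    ],
}
vectors = extract_vectors(sample_response)
print(cosine(vectors[0], vectors[1]))
```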

How do I connect n8n to Speka?

n8n does not require code. Like Dify, it exposes an OpenAI-style credential with an editable base URL, so you authenticate once in the UI and then select your model in each node.

In n8n:

1. Add an OpenAI credential (or any node that exposes a "Base URL" / custom endpoint field).
2. Set the Base URL to https://speka.me/v1.
3. Paste your sk-speka-live-... key as the API key.
4. In the node, set the Model field to a real ID such as meta/llama-4-maverick-17b-128e-instruct. If n8n only offers a dropdown of OpenAI model names, use the "specify by ID" / expression option to type the Speka model string verbatim.

The n8n documentation describes credential management and the OpenAI node. Once the credential points at Speka, every AI step in your workflow routes through the gateway.

How do I connect Dify to Speka?

Dify groups model integrations under Settings → Model Provider. Choose the OpenAI-API-compatible provider (not the first-party "OpenAI" provider, which locks the endpoint):

1. Open Model Provider and add a model under OpenAI-API-compatible.
2. Set API Base / Endpoint URL to https://speka.me/v1.
3. Enter your sk-speka-live-... key.
4. Set Model Name to the exact Speka ID, e.g. deepseek-ai/deepseek-v4-flash, and declare its capabilities (chat, tool calling, vision) and context window (128K for that model).

Dify will then offer the model in any app, agent, or workflow node. Function calling and JSON output flow through as long as you mark the model as supporting them in the provider form.

Config reference for all four frameworks

Framework	Where you set it	Base URL field	Key	Model field
LangChain	`ChatOpenAI(...)`	`base_url=`	`api_key=`	`model=`
LlamaIndex	`OpenAI(...)` LLM	`api_base=`	`api_key=`	`model=`
n8n	OpenAI credential + node	"Base URL"	API key field	Model (by ID/expression)
Dify	Model Provider → OpenAI-API-compatible	"API Base / Endpoint URL"	API key field	"Model Name" (exact ID)

The three constants never change across frameworks:

Setting	Value
Base URL	`https://speka.me/v1`
Auth header	`Authorization: Bearer sk-speka-live-...`
Example model	`meta/llama-3.3-70b-instruct`
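One way to keep those values in a single place is a small config object that renders the right keyword names for each code-based framework. This is only a sketch: SpekaConfig and the SPEKA_API_KEY environment variable are names invented for this example, not part of any library.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpekaConfig:
    """Hypothetical holder for the shared connection values."""
    api_key: str
    base_url: str = "https://speka.me/v1"
    model: str = "meta/llama-3.3-70b-instruct"

    def langchain_kwargs(self) -> dict:
        # LangChain's ChatOpenAI names the endpoint argument base_url.
        return {"base_url": self.base_url, "api_key": self.api_key, "model": self.model}

    def llamaindex_kwargs(self) -> dict:
        # LlamaIndex names the same value api_base.
        return {"api_base": self.base_url, "api_key": self.api_key, "model": self.model}

    def auth_header(self) -> dict:
        return {"Authorization": f"Bearer {self.api_key}"}


# SPEKA_API_KEY is an assumed variable name; fall back to the placeholder key.
config = SpekaConfig(api_key=os.environ.get("SPEKA_API_KEY", "sk-speka-live-..."))
print(sorted(config.langchain_kwargs()))
print(sorted(config.llamaindex_kwargs()))
```

You would then write ChatOpenAI(**config.langchain_kwargs()) in LangChain and pass config.llamaindex_kwargs() to the LlamaIndex LLM class in the same way.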

How do I verify the endpoint with curl first?

Before wiring a framework, confirm the credential and a model ID with a raw request. This is the fastest way to isolate a 401 (bad key) from a model-not-found error:

curl https://speka.me/v1/chat/completions \
  -H "Authorization: Bearer sk-speka-live-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Reply with the single word: ok"}]
  }'
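If you prefer a script to curl, the same smoke test can be sketched with the standard library. The 401 meaning comes from the text above; treating 404 as model-not-found and 429 as rate limiting follows common OpenAI-style conventions and is an assumption about Speka, as are the helper names diagnose and smoke_test.

```python
import json
import urllib.error
import urllib.request


def diagnose(status: int) -> str:
    """Map an HTTP status to a likely cause (404 and 429 meanings are assumed)."""
    if status == 200:
        return "ok"
    if status == 401:
        return "bad or missing API key"
    if status == 404:
        return "model not found: check the exact catalog ID"
    if status == 429:
        return "rate limited: slow down and retry"
    return f"unexpected status {status}"


def smoke_test(api_key: str, model: str = "meta/llama-3.3-70b-instruct") -> str:
    """Send one minimal chat request and report what the status suggests."""
    body = {"model": model,
            "messages": [{"role": "user", "content": "Reply with the single word: ok"}]}
    req = urllib.request.Request(
        "https://speka.me/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return diagnose(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is on the exception.
        return diagnose(err.code)


# Usage (performs a real network call):
# print(smoke_test("sk-speka-live-..."))
```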

To stream, add "stream": true and parse the Server-Sent Events data frames. The same request also works through the OpenAI Python SDK — just set base_url="https://speka.me/v1". This is the drop-in path: change the base URL and key, keep everything else.
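Parsing those data frames by hand can be sketched as follows. The parser assumes the usual OpenAI streaming format (lines prefixed with "data:", JSON chunks carrying choices[].delta.content, and a final "data: [DONE]" sentinel); the sample frames are hand-written, not captured from Speka.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style Server-Sent Events lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written frames in the assumed format.
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "o"}}]}',
    'data: {"choices": [{"delta": {"content": "k"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample_stream)))  # prints "ok"
```

In practice the frameworks and the OpenAI SDK do this parsing for you; the sketch is mainly useful when debugging a raw HTTP stream.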

Which models should I reference per task?

Pass the exact catalog ID; aliases are not accepted. A few common choices:

Task	Model ID	Price (in/out per 1M)
General chat	`meta/llama-3.3-70b-instruct`	$0.20 / $0.20
Reasoning	`deepseek-ai/deepseek-v4-flash`	$0.27 / $1.10
Code	`openai/gpt-oss-120b`	$0.15 / $0.60
Cheap/fast	`meta/llama-3.1-8b-instruct`	$0.05 / $0.05
Embeddings	`nvidia/nv-embedqa-e5-v5`	$0.01 / 1M
Image	`black-forest-labs/flux-1-dev`	$0.04 / image

The full list of 16 models from 7 labs (DeepSeek, NVIDIA, Meta, Mistral AI, Moonshot AI, OpenAI, Black Forest Labs) lives on the models page, with per-token rates on the pricing page.
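To turn the per-1M-token rates in the table into a per-request estimate, the arithmetic is cost = (input_tokens × input_rate + output_tokens × output_rate) / 1,000,000. A minimal sketch, with prices copied from the table and estimate_cost as an illustrative helper name:

```python
from decimal import Decimal

# (input, output) USD per 1M tokens, copied from the table in this section.
PRICES = {
    "meta/llama-3.3-70b-instruct": (Decimal("0.20"), Decimal("0.20")),
    "deepseek-ai/deepseek-v4-flash": (Decimal("0.27"), Decimal("1.10")),
    "openai/gpt-oss-120b": (Decimal("0.15"), Decimal("0.60")),
    "meta/llama-3.1-8b-instruct": (Decimal("0.05"), Decimal("0.05")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate USD cost for one workload at the listed per-token rates."""
    price_in, price_out = PRICES[model]
    million = Decimal(1_000_000)
    return (price_in * input_tokens + price_out * output_tokens) / million


# 2M input tokens and 500K output tokens on the reasoning model:
# (0.27 * 2,000,000 + 1.10 * 500,000) / 1,000,000 = 1.09
cost = estimate_cost("deepseek-ai/deepseek-v4-flash", 2_000_000, 500_000)
print(f"${cost:.2f}")  # prints "$1.09"
```

Decimal is used instead of float so that small currency amounts do not pick up binary rounding noise.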

How does Speka compare to other OpenAI-compatible gateways?

Speka is one of several gateways that implement the OpenAI contract. The trade-off is breadth versus a curated, predictable catalog with transparent per-token pricing and no overage penalties.

Gateway	OpenAI-compatible	Approx. models	Image gen	As of
Speka	Yes (drop-in)	16 curated	Yes (FLUX)	Jun 2026
OpenRouter	Yes (drop-in)	300+ (docs); some sources cite more	Yes	Jun 2026
Together AI	Yes (base URL + key)	200+	Yes (FLUX, etc.)	Jun 2026

OpenRouter and Together AI publish larger catalogs; figures vary by source and are dated June 2026. If your workflow needs a small set of vetted frontier models with stable IDs and clear pricing, the Speka catalog is straightforward to standardize on across LangChain, LlamaIndex, n8n, and Dify.

The same base_url swap also targets self-hosted servers — vLLM, SGLang, Hugging Face TGI's Messages API, and Ollama's OpenAI compatibility layer — so you can develop against a local model and switch to Speka by changing one URL.

Frequently asked questions

Do I need to change my prompts or tool definitions to use Speka?

No. Speka implements the OpenAI Chat Completions contract, so your messages, tools, response_format, and streaming code stay identical. You change only the base URL to https://speka.me/v1 and the API key to sk-speka-live-..., then reference a Speka model ID. Function calling, JSON mode, and SSE streaming behave exactly as they do against OpenAI.

Why does my request return a model-not-found error?

Speka requires exact, namespaced model IDs rather than aliases. Use values like meta/llama-3.3-70b-instruct or deepseek-ai/deepseek-v4-flash, copied verbatim from the models page. In n8n or Dify, type the full ID into the model field instead of selecting an OpenAI default. Aliases such as "gpt-4o" or "deepseek-r1" are not hosted and will fail.

What is the difference between base_url and api_base?

They name the same thing: the root endpoint your client posts to. LangChain's ChatOpenAI uses the argument base_url, while LlamaIndex's OpenAI LLM uses api_base. Both should be set to https://speka.me/v1. The difference is purely naming convention between libraries; the underlying HTTP request to /chat/completions is identical.

Can I use Speka for embeddings and image generation in these frameworks?

Yes. Speka exposes OpenAI-shaped /v1/embeddings and image-generation routes. In LlamaIndex, point the embedding model at nvidia/nv-embedqa-e5-v5 using the same base URL and key. For images, call black-forest-labs/flux-1-dev or flux-1-schnell. Any framework node that supports OpenAI embeddings or image endpoints can target Speka by overriding the base URL.

Is there a free tier to test the integration?

Yes. The Free plan is $0/month with $1 of usage included, no credit card, one API key, and a 10 requests-per-minute limit — enough to validate a LangChain or n8n wiring end to end. Paid plans (Starter $19, Pro $99, Scale $399) raise included usage, rate limits, and key counts. Overage is billed at standard per-token rates with no penalty. See the pricing page for details.
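When testing on the Free plan, a small client-side throttle can help a script stay under the 10 requests-per-minute limit. The class below is a generic sliding-window sketch, not something Speka provides, and it assumes the limit is counted over a rolling 60-second window.

```python
import time
from collections import deque


class MinuteThrottle:
    """Allow at most `limit` calls per `window` seconds (sliding window)."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._stamps = deque()  # start times of recent calls, oldest first

    def wait(self) -> float:
        """Block until another call is allowed; return the seconds slept."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        slept = 0.0
        if len(self._stamps) >= self.limit:
            # Sleep until the oldest call in the window expires.
            slept = self.window - (now - self._stamps[0])
            self._sleep(slept)
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)
        return slept


# Usage: call throttle.wait() immediately before each API request.
throttle = MinuteThrottle(limit=10, window=60.0)
```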

Try it on Speka

Pick a framework, set the base URL to https://speka.me/v1, paste a sk-speka-live-... key, and reference a real model ID. The same two-line change works in LangChain, LlamaIndex, n8n, and Dify. Create a free account to generate a key, then check the docs for the full parameter reference and the models page for IDs and pricing.