Documentation

API reference

Speka speaks the OpenAI API dialect. If you've used OpenAI, you already know how to use us: point your client at our base URL and use any model id from the catalog.

Quickstart

Base URL:

base-url
https://api.speka.online/v1

Install the OpenAI SDK (pip install openai) and make your first call:

quickstart.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.speka.online/v1",
    api_key="sk-speka-live-...",  # your Speka key
)

resp = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)
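for chunk in resp:
    # Some chunks can arrive without choices; skip them to avoid an IndexError.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)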

Authentication

Pass your key in the Authorization header as a bearer token. Create and revoke keys in your dashboard. Keys are shown only once, so store them securely.

auth
Authorization: Bearer sk-speka-live-...
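
For clients that do not use the OpenAI SDK, the header can be built by hand. This is a minimal sketch using only the Python standard library. The auth_headers helper is hypothetical, and reading the key from the SPEKA_API_KEY environment variable is a convention, not a requirement of the API:

```python
import os


def auth_headers(api_key: str) -> dict:
    """Build the bearer-token header described above (hypothetical helper)."""
    if not api_key:
        raise ValueError("API key is empty")
    return {"Authorization": f"Bearer {api_key}"}


# Falls back to the placeholder key when SPEKA_API_KEY is not set.
headers = auth_headers(os.environ.get("SPEKA_API_KEY", "sk-speka-live-..."))
```

Keeping the key in the environment rather than in source code makes it easier to rotate after a revoke.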

Chat completions

POST /v1/chat/completions accepts messages, temperature, max_tokens, tools, response_format, and more.

chat.ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.speka.online/v1",
  apiKey: process.env.SPEKA_API_KEY, // sk-speka-live-...
});

const stream = await client.chat.completions.create({
  model: "deepseek-ai/deepseek-r1",
  messages: [{ role: "user", content: "Solve: 23 * 47" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
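
The other parameters listed above are passed the same way. This is a minimal Python sketch of a non-streaming request body, assuming the chosen model accepts OpenAI-style JSON mode. The build_chat_request helper is hypothetical and the parameter values are illustrative only:

```python
def build_chat_request(prompt: str) -> dict:
    """Assemble keyword arguments for client.chat.completions.create()."""
    return {
        "model": "deepseek-ai/deepseek-r1",
        "messages": [
            {"role": "system", "content": "Reply with a JSON object."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,  # lower values give more deterministic output
        "max_tokens": 256,   # upper bound on generated tokens
        # Assumption: the model supports OpenAI-style JSON mode.
        "response_format": {"type": "json_object"},
    }


request = build_chat_request("Solve: 23 * 47")
# With a configured client: resp = client.chat.completions.create(**request)
```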

Streaming

Set stream: true to receive server-sent events. We proxy the upstream stream directly, so time-to-first-token stays low.
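
When consuming the stream without an SDK, each event arrives as a data: line. This is a minimal sketch of a parser, assuming Speka follows the OpenAI wire format (JSON chunks prefixed with "data: " and a final "data: [DONE]" sentinel). The iter_deltas helper is hypothetical:

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints "Hello"
```

In a real client the lines would come from the HTTP response body, read line by line as it arrives.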

Embeddings

POST /v1/embeddings returns vectors for retrieval and semantic search.

embeddings.sh
curl https://api.speka.online/v1/embeddings \
  -H "Authorization: Bearer sk-speka-live-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input": ["The quick brown fox"]
  }'
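
For retrieval, the returned vectors are typically compared by cosine similarity. This is a minimal sketch in pure Python. The toy 3-dimensional vectors stand in for real embeddings, and cosine_similarity and rank are hypothetical helpers:

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: list, docs: dict) -> list:
    """Return document ids ordered from most to least similar to the query."""
    return sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)


# Toy vectors; real ones would come from the "embedding" field of each item
# in the response's "data" list (assuming the OpenAI response shape).
docs = {"fox": [0.9, 0.1, 0.0], "gpu": [0.0, 0.2, 0.9]}
print(rank([1.0, 0.0, 0.0], docs))  # prints ['fox', 'gpu']
```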

Image generation

POST /v1/images/generations with an image model id such as black-forest-labs/flux.1-dev returns generated images.
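
A request to this endpoint might be prepared as follows. This is a minimal sketch using only the Python standard library, assuming the body takes a prompt field as in the OpenAI images API. The build_image_request helper and the example prompt are hypothetical, and the request is built but not sent:

```python
import json
import urllib.request

BASE_URL = "https://api.speka.online/v1"


def build_image_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /images/generations."""
    body = {
        "model": "black-forest-labs/flux.1-dev",
        "prompt": prompt,  # assumed field name, as in the OpenAI API
    }
    return urllib.request.Request(
        f"{BASE_URL}/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_image_request("sk-speka-live-...", "A GPU rack at sunrise")
# To send: urllib.request.urlopen(req), then parse the JSON response body.
```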

Errors & rate limits

Errors use the OpenAI envelope: { "error": { "message", "type", "code" } }. Common statuses:

  • 401: Missing or invalid key.
  • 402: Usage allowance exhausted; upgrade or add credits.
  • 429: Rate limit exceeded; see the Retry-After header.
  • 5xx: Upstream issue; we auto-retry across capacity.
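
A client can turn these statuses into a retry policy. This is a minimal sketch, assuming Retry-After carries a number of seconds (the header may also be an HTTP date, which this code does not parse). The parse_error and retry_delay helpers and the 30-second cap are hypothetical choices:

```python
import json
from typing import Optional, Tuple


def parse_error(body: str) -> Tuple[str, str]:
    """Extract (type, message) from an OpenAI-style error envelope."""
    err = json.loads(body).get("error", {})
    return err.get("type", "unknown"), err.get("message", "")


def backoff(attempt: int) -> float:
    """Exponential backoff in seconds, capped at 30 (illustrative cap)."""
    return min(2.0 ** attempt, 30.0)


def retry_delay(status: int, headers: dict, attempt: int) -> Optional[float]:
    """Seconds to wait before retrying, or None if a retry will not help."""
    if status == 429:
        try:
            # Honor Retry-After when it is a plain number of seconds.
            return max(0.0, float(headers.get("Retry-After", "")))
        except ValueError:
            return backoff(attempt)
    if 500 <= status < 600:
        return backoff(attempt)
    return None  # 401 and 402 need a valid key or more credits, not a retry
```

Because 5xx responses are already retried on the server side, a client-side retry there is an extra safety net, not a requirement.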