Getting started

X25: Autonomous LLM Routing

X25 routes every LLM call to the cheapest model that meets your quality bar. It learns per project using Thompson Sampling, judges quality automatically, escalates when needed, and trains a custom model on your call history at Stage 4.

✦

79% cheaper than always-frontier, with equal or better quality on most tasks. Three lines of code. No model selection. No fallback logic. No prompt engineering for routing.

5-minute setup

Install the SDK
```
pip install x25-sdk
```
Generate your API key
Create an account and log in to your dashboard. Open the hamburger menu in the top navigation bar and click Generate API Key. Confirm your organisation name, then click Proceed. Copy the generated API key; you will use it to authenticate your SDK calls.

Route your first call

from x25 import X25

agent = X25(
    api_key="sk-x25-a1b2c3d4...",
    org="my-research-lab",
)

# The prompt is passed to complete(), not to the constructor.
result = agent.complete(prompt="your-prompt")

print(result.text)  # the model's answer

Installation

The SDK has two components: the Python client (lightweight, depends only on httpx) and the gateway server (runs locally or on your own infrastructure).

SDK only

pip install x25-sdk

Gateway + full stack

pip install -r requirements.txt
# includes FastAPI, LangGraph, LangChain, OpenAI, aiosqlite, numpy

Environment variables

# .env
OPENROUTER_API_KEY=sk-or-...    # required, routes to 300+ models
OPENAI_API_KEY=sk-...           # required, LLM-as-judge + classifier
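
Before starting the gateway, it can help to confirm that both keys are actually visible to the process. The following is a minimal sketch using only the standard library. `missing_env_vars` is a hypothetical helper, not part of the X25 SDK, and it checks the process environment rather than parsing the `.env` file itself.

```python
import os

# The two variables the .env example above marks as required.
REQUIRED_VARS = ("OPENROUTER_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of the X25 SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```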

Authentication

Every project gets an API key scoped to its `org`. X25 uses this to isolate learning, savings, and audit records per project.

POST /keys/create

Field	Type	Description
org	string	Your project identifier. Used to track learning, savings, and stage.
rate_limit_rpm	int	Requests per minute. Default `0` = unlimited.

Pass the key as a Bearer token on every request:

Authorization: Bearer sk-x25-a1b2c3d4...

⚠

Keys are shown once at creation. Store them in your .env file immediately.
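
The key-creation request and the Bearer header described above could be assembled as follows. This is a sketch under assumptions: the request body is taken to be JSON with the two documented fields, the gateway URL is the SDK default, and both function names are hypothetical. The response format of `POST /keys/create` is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8000"  # default gateway_url from the SDK section


def build_create_key_request(org, rate_limit_rpm=0, gateway_url=GATEWAY_URL):
    """Build (but do not send) a POST /keys/create request.

    Assumes a JSON body carrying the documented fields.
    """
    body = json.dumps({"org": org, "rate_limit_rpm": rate_limit_rpm}).encode("utf-8")
    return urllib.request.Request(
        f"{gateway_url}/keys/create",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_headers(api_key):
    """Headers that carry the project key as a Bearer token."""
    return {"Authorization": f"Bearer {api_key}"}
```

Passing the request to `urllib.request.urlopen` would send it to a running gateway.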

SDK

X25()

Create an agent instance. The agent holds your project config and re-uses it across calls.

X25(
    org: str = "default",
    optimize_for: dict = {"cost": 0.33, "quality": 0.34, "latency": 0.33},
    policy: dict = {},
    gateway_url: str = "http://localhost:8000",
    api_key: str = None,
)

Parameter	Type	Description
org	str	Your project ID. X25 learns per-project. Separate projects get separate Thompson arms, audit logs, and stage progression.
optimize_for	dict	Weights summing to 1.0 across `cost`, `quality`, `latency`. See optimize_for.
policy	dict	Hard constraints. E.g. `{"max_cost_per_call_usd": 0.02}`.
gateway_url	str	Where the X25 gateway is running.
api_key	str	Your project API key from `POST /keys/create`.

agent.complete()

Route a prompt to the best available model. X25 classifies the task, selects a tier, dispatches, judges quality, escalates if needed, and updates its own learning. All in one call.

agent.complete(
    prompt: str,
    hint: str = None,
) -> X25Response

Parameter	Type	Description
prompt	str	The text to route.
hint	str	Optional task type hint. One of: `summary`, `code`, `classification`, `extraction`, `qa`, `reasoning`, `creative`. If omitted, X25 classifies automatically.

X25Response

@dataclass
class X25Response:
    text:             str    # the model's answer
    model_used:       str    # e.g. "deepseek/deepseek-v4-flash"
    provider:         str    # e.g. "deepseek"
    task_type:        str    # classified task: "summary", "code", etc.
    cost_usd:         float  # actual cost of this call
    latency_ms:       float  # end-to-end latency in milliseconds
    quality_score:    float  # 0.0–1.0, LLM-as-judge
    cascade_steps:    int    # 1 = first tier passed, 2+ = escalated
    audit_hash:       str    # SHA-256 hash, tamper-evident
    goal_match:       dict   # per-dimension score vs your optimize_for

optimize_for

Tell X25 what matters most for your project. Weights should sum to 1.0; if they don't, X25 normalises them automatically.

Key	What it controls	Example value
cost	How aggressively to prefer cheaper tiers	`0.6`
quality	Minimum quality bar and reward weight	`0.3`
latency	Preference for faster models	`0.1`

Common profiles

# Cost-first, for high-volume pipelines
optimize_for={"cost": 0.7, "quality": 0.2, "latency": 0.1}

# Quality-first, for research and publication
optimize_for={"cost": 0.1, "quality": 0.8, "latency": 0.1}

# Balanced, default
optimize_for={"cost": 0.33, "quality": 0.34, "latency": 0.33}

# Latency-first, for real-time use
optimize_for={"cost": 0.1, "quality": 0.2, "latency": 0.7}
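
The automatic normalisation of weights that don't sum to 1.0 might look like the sketch below. It assumes simple proportional rescaling, since the exact method is not specified. `normalise_weights` is a hypothetical name, not an SDK function.

```python
def normalise_weights(weights):
    """Scale cost/quality/latency weights so they sum to 1.0.

    Assumption: proportional rescaling; missing keys count as 0.
    """
    keys = ("cost", "quality", "latency")
    values = {k: float(weights.get(k, 0.0)) for k in keys}
    if any(v < 0 for v in values.values()):
        raise ValueError("weights must be non-negative")
    total = sum(values.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {k: v / total for k, v in values.items()}


profile = normalise_weights({"cost": 6, "quality": 3, "latency": 1})
print(profile)  # {'cost': 0.6, 'quality': 0.3, 'latency': 0.1}
```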

API Reference

POST /route

The core routing endpoint. The SDK calls this on every agent.complete().

POST /route

Field	Type	Description
prompt	string	The text to route.
org	string	Project ID. Resolved from API key if provided.
optimize_for	object	Cost / quality / latency weights.
hint	string?	Optional task type hint.
policy	object?	Hard constraints.
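
For callers that do not use the SDK, a request with the fields above could be built as in this sketch. It assumes a JSON body, the default local gateway URL, and Bearer authentication as described under Authentication. `build_route_request` is a hypothetical helper, and the hint list is copied from the `agent.complete()` parameter table.

```python
import json
import urllib.request

VALID_HINTS = {"summary", "code", "classification", "extraction",
               "qa", "reasoning", "creative"}
# Balanced default profile from the optimize_for section.
DEFAULT_WEIGHTS = {"cost": 0.33, "quality": 0.34, "latency": 0.33}


def build_route_request(prompt, api_key, org=None, optimize_for=None, hint=None,
                        policy=None, gateway_url="http://localhost:8000"):
    """Build (but do not send) a POST /route request."""
    if hint is not None and hint not in VALID_HINTS:
        raise ValueError(f"unknown hint: {hint!r}")
    payload = {"prompt": prompt, "optimize_for": optimize_for or DEFAULT_WEIGHTS}
    # Optional fields are included only when the caller provides them.
    for key, value in (("org", org), ("hint", hint), ("policy", policy)):
        if value is not None:
            payload[key] = value
    return urllib.request.Request(
        f"{gateway_url}/route",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Validating the hint on the client side is a design choice of this sketch; the gateway may apply its own checks.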

GET /stats/{project}

Aggregated stats for a project. Use `all` as the project name to aggregate across all projects.

GET /stats/my-research-lab

200 Response

{
  "total_calls": 301,
  "total_cost_usd": 0.055,
  "total_saved_usd": 0.307,
  "avg_quality": 0.957,
  "avg_latency_ms": 10341,
  "model_distribution": { "deepseek-v4-flash": 233, "...": 68 },
  "tier_distribution": { "slm": 269, "mid": 14, "frontier": 18 }
}
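
A response like this can be turned into headline figures with a few lines of Python. The sketch below assumes that `total_saved_usd` is measured against a baseline equal to actual cost plus reported savings; that interpretation is an assumption, not something the response states. Both helper names are hypothetical.

```python
stats = {
    "total_calls": 301,
    "total_cost_usd": 0.055,
    "total_saved_usd": 0.307,
    "tier_distribution": {"slm": 269, "mid": 14, "frontier": 18},
}


def savings_fraction(stats):
    """Share of the baseline spend that was saved.

    Assumption: baseline = actual cost + reported savings.
    """
    baseline = stats["total_cost_usd"] + stats["total_saved_usd"]
    return stats["total_saved_usd"] / baseline if baseline else 0.0


def tier_shares(stats):
    """Fraction of calls handled by each tier."""
    total = sum(stats["tier_distribution"].values())
    return {tier: n / total for tier, n in stats["tier_distribution"].items()}


print(f"saved: {savings_fraction(stats):.1%}")
print({tier: round(share, 3) for tier, share in tier_shares(stats).items()})
```

Under that baseline assumption, the example response works out to roughly 85% saved, with about 89% of calls served by the `slm` tier.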

GET /stage/{project}

Current stage and progression for a project.

GET /stage/my-research-lab

200 Response

{
  "stage": 3,
  "stage_name": "Feedback",
  "total_calls": 235,
  "calls_to_next_stage": 265,
  "progress_in_stage": 0.117,
  "improvement_available": true
}

GET /thompson/{project}

Live Thompson Sampling state. Shows Beta distributions for each tier.

GET /thompson/my-research-lab

200 Response

{
  "org": "my-research-lab",
  "arms": [
    { "tier": "slm", "alpha": 207.4, "beta": 2.0, "mean_reward": 0.99, "confidence": 1.0 },
    { "tier": "mid", "alpha": 3.0,   "beta": 2.6, "mean_reward": 0.54, "confidence": 0.11 },
    { "tier": "frontier", "alpha": 5.0, "beta": 4.2, "mean_reward": 0.54, "confidence": 0.18 }
  ]
}
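
The `mean_reward` values in this example are consistent with the mean of a Beta distribution, α / (α + β). The short check below reproduces them from the `alpha` and `beta` fields. How `confidence` is computed is not documented here, so it is left out.

```python
arms = [
    {"tier": "slm", "alpha": 207.4, "beta": 2.0},
    {"tier": "mid", "alpha": 3.0, "beta": 2.6},
    {"tier": "frontier", "alpha": 5.0, "beta": 4.2},
]


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


for arm in arms:
    # Prints 0.99, 0.54 and 0.54, matching mean_reward in the response above.
    print(arm["tier"], round(beta_mean(arm["alpha"], arm["beta"]), 2))
```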

Concepts

Stages

X25 autonomously advances through four stages as calls accumulate. Each stage unlocks new capabilities.

Stage 1

Explore

0 – 49 calls

X25 tries all three tiers and learns which models handle your task types reliably.

Stage 2

Exploit

50 – 199 calls

Routing converges to the cheapest reliable tier. Drift detection starts weekly.

Stage 3

Feedback

200 – 499 calls

POST /feedback unlocked. Submit labelled examples to improve routing accuracy.

Stage 4

Fine-tune

500+ calls

Audit log extracted to JSONL. Unsloth LoRA training script generated. Custom model enters routing pool at ~$0.01/1M tokens.
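
The thresholds above can be expressed as a small lookup. This sketch assumes that `progress_in_stage` is linear between a stage's first call and the next threshold, which matches the `GET /stage/{project}` example (235 calls gives stage 3, 265 calls to go, progress 0.117). `stage_for` is a hypothetical helper, and returning `None` for the final stage is an assumption.

```python
# (stage number, name, first call count that belongs to the stage)
STAGES = [(1, "Explore", 0), (2, "Exploit", 50),
          (3, "Feedback", 200), (4, "Fine-tune", 500)]


def stage_for(total_calls):
    """Map a call count to stage info using the documented thresholds."""
    if total_calls < 0:
        raise ValueError("total_calls must be non-negative")
    current = max(i for i, s in enumerate(STAGES) if total_calls >= s[2])
    number, name, start = STAGES[current]
    if current == len(STAGES) - 1:
        # Final stage: there is no next threshold (assumed to report None).
        return {"stage": number, "stage_name": name,
                "calls_to_next_stage": None, "progress_in_stage": None}
    next_start = STAGES[current + 1][2]
    return {
        "stage": number,
        "stage_name": name,
        "calls_to_next_stage": next_start - total_calls,
        "progress_in_stage": round((total_calls - start) / (next_start - start), 3),
    }


print(stage_for(235))
```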

Thompson Sampling

X25 models each tier as a Beta distribution Beta(α, β) representing the posterior over "reward probability": how often this tier passes quality on your tasks.

# Each call:
θ_slm      = Beta(α_slm, β_slm).sample()
θ_mid      = Beta(α_mid, β_mid).sample()
θ_frontier = Beta(α_frontier, β_frontier).sample()
tier       = argmax([θ_slm, θ_mid, θ_frontier])

# After observing quality, update the dispatched arm:
α += reward
β += (1 - reward)

Cheaper tiers naturally receive more traffic as their α grows. Frontier is selected only when SLM and mid lack sufficient evidence.
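
The pseudocode above translates into runnable Python with the standard library's `random.betavariate`. This is an illustrative sketch, not the gateway's implementation: the arm values are borrowed from the `GET /thompson/{project}` example, the function names are hypothetical, and the loop only samples without dispatching real calls.

```python
import random


def select_tier(arms, rng):
    """Sample theta from each arm's Beta posterior and pick the largest."""
    samples = {tier: rng.betavariate(a["alpha"], a["beta"])
               for tier, a in arms.items()}
    return max(samples, key=samples.get)


def update_arm(arm, reward):
    """Update the dispatched arm with a reward in [0, 1], as in the pseudocode."""
    arm["alpha"] += reward
    arm["beta"] += 1.0 - reward


rng = random.Random(0)  # seeded so repeated runs give the same counts
arms = {
    "slm": {"alpha": 207.4, "beta": 2.0},
    "mid": {"alpha": 3.0, "beta": 2.6},
    "frontier": {"alpha": 5.0, "beta": 4.2},
}
counts = {tier: 0 for tier in arms}
for _ in range(1000):
    counts[select_tier(arms, rng)] += 1
print(counts)
```

With these posteriors the `slm` arm wins the large majority of draws, because its samples concentrate near 0.99 while the other two arms are both lower on average and far more spread out.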

Audit trail

Every routing decision is written to a hash-chained audit log. Any modification to a historical record breaks the chain.

# Each record contains:
record_hash = SHA256(prev_hash + record_content)

# Verify integrity:
curl http://localhost:8000/verify
# → {"intact": true, "total_records": 301}

ℹ

The audit trail is the foundation of Stage 4. The call history is extracted, formatted as Alpaca JSONL, and used to fine-tune a custom SLM on your exact task patterns.
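
The chaining rule `record_hash = SHA256(prev_hash + record_content)` can be illustrated with `hashlib`. The sketch below makes several assumptions that the description above does not specify: records are serialised as sorted-key JSON, the first record chains from a string of 64 zeros, and the function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the first record


def record_hash(prev_hash, content):
    """SHA256(prev_hash + record_content), content serialised as sorted JSON."""
    serialised = json.dumps(content, sort_keys=True)
    return hashlib.sha256((prev_hash + serialised).encode("utf-8")).hexdigest()


def append_record(log, content):
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"content": content, "hash": record_hash(prev_hash, content)})


def verify_chain(log):
    """Recompute every hash; an edited record fails the comparison."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["hash"] != record_hash(prev_hash, entry["content"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_record(log, {"model_used": "deepseek/deepseek-v4-flash", "quality_score": 0.95})
append_record(log, {"model_used": "deepseek/deepseek-v4-flash", "quality_score": 0.91})
print(verify_chain(log))                   # True
log[0]["content"]["quality_score"] = 0.10  # tamper with a historical record
print(verify_chain(log))                   # False
```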

Fine-tuning (Stage 4)

At 500 calls, X25 extracts your highest-quality call history and generates a LoRA fine-tuning script for Llama 3.2 3B.

# Trigger at Stage 4:
curl -X POST http://localhost:8000/improve/my-research-lab \
  -H "Authorization: Bearer sk-x25-..."

Step	What happens
Extract	Audit log filtered for quality_score > 0.7, formatted as Alpaca JSONL
Train	Unsloth LoRA on Llama 3.2 3B, ~45 min on Colab T4, ~8GB VRAM
Register	`POST /improve/{project}/register`. Custom model enters routing pool at tier=slm
Route	Thompson Sampling explores the custom model. If reward >= 0.6 consistently, selected for ~85-92% of calls at $0.01/1M tokens
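
The Extract step can be pictured as a filter followed by a format conversion. The sketch below is an assumption about how that might be done by hand: the record field names (`prompt`, `text`, `quality_score`) are hypothetical, `to_alpaca_jsonl` is not an X25 function, and the output uses the common Alpaca keys `instruction`, `input`, and `output`.

```python
import json


def to_alpaca_jsonl(records, min_quality=0.7):
    """Keep records with quality_score > min_quality and emit Alpaca-style JSONL.

    Assumed record fields: "prompt", "text", "quality_score".
    """
    lines = []
    for rec in records:
        if rec["quality_score"] > min_quality:
            lines.append(json.dumps({
                "instruction": rec["prompt"],
                "input": "",
                "output": rec["text"],
            }))
    return "\n".join(lines)


records = [
    {"prompt": "Summarise the report.", "text": "A short summary.", "quality_score": 0.93},
    {"prompt": "Classify this ticket.", "text": "billing", "quality_score": 0.55},
]
print(to_alpaca_jsonl(records))
```

Only the first record passes the 0.7 threshold, so the output is a single JSON line.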

Views

Manager terminal

A business-facing view of X25. Shows weekly savings, call volume, model mix, and a chat interface where managers can ask questions about AI spend and routing decisions. No code required.

What managers see

Panel	What it shows
Hero banner	Weekly savings vs naive frontier, total calls, quality delta
Chat	Ask questions about AI spend. X25 answers with routing receipts showing exactly which model ran and why
Observatory	30-day savings curve, model mix by call volume, recent routing decisions table
Ticker	Live model prices, router uptime, queue depth, saved today