Getting started

X25: Autonomous LLM Routing

X25 routes every LLM call to the cheapest model that meets your quality bar. It learns per project using Thompson Sampling, judges quality automatically, escalates when needed, and trains a custom model on your call history at Stage 4.

79% cheaper than always-frontier, with equal or better quality on most tasks. Three lines of code. No model selection. No fallback logic. No prompt engineering for routing.

5-minute setup

  1. Install the SDK
    pip install x25-sdk
  2. Generate your API key

    Create an account and log in to your dashboard. Open the hamburger menu in the top navigation bar and click Generate API Key. Confirm your organisation name, then click Proceed. Copy the API key that is displayed and use it to authenticate your SDK calls.

  3. Route your first call
    from x25 import X25

    # The agent holds project config; the prompt is passed per call.
    agent = X25(
        api_key="sk-x25-a1b2c3d4...",
        org="my-research-lab",
    )

    result = agent.complete("your-prompt")

    print(result.text)

Installation

The SDK has two components: the Python client (lightweight; its only dependency is httpx) and the gateway server (runs locally or on your own infrastructure).

SDK only

pip install x25-sdk

Gateway + full stack

pip install -r requirements.txt
# includes FastAPI, LangGraph, LangChain, OpenAI, aiosqlite, numpy

Environment variables

# .env
OPENROUTER_API_KEY=sk-or-...    # required, routes to 300+ models
OPENAI_API_KEY=sk-...           # required, LLM-as-judge + classifier
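A gateway that starts without either key is likely to fail only at the first routed call, so it can help to check the environment up front. The following is a minimal sketch, not part of the X25 SDK: the helper name `missing_env_vars` is hypothetical, and only the two variable names come from the `.env` listing above.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ("OPENROUTER_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the process environment; a plain dict can be
    passed instead, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before launching the gateway and aborting when the returned list is non-empty gives a clearer error than a failed upstream request.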

Authentication

Every project gets an API key scoped to its org. X25 uses the key to isolate learning, savings, and audit records per project.

POST /keys/create
Field | Type | Description
org | string | Your project identifier. Used to track learning, savings, and stage.
rate_limit_rpm | int | Requests per minute. Default: 0 (unlimited).

Pass the key as a Bearer token on every request:

Authorization: Bearer sk-x25-a1b2c3d4...

Keys are shown once at creation. Store them in your .env file immediately.
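For callers that talk to the gateway without the SDK, the two steps above (creating a key, then sending it as a Bearer token) can be sketched with the standard library. This is an illustration under assumptions: the helper names are hypothetical, the request body is assumed to be JSON with the two documented fields, and the response format of POST /keys/create is not shown here because the documentation does not specify it.

```python
import json
import urllib.request


def build_create_key_request(gateway_url, org, rate_limit_rpm=0):
    """Build (but do not send) a POST /keys/create request.

    Assumes the endpoint accepts a JSON body with the documented
    fields `org` and `rate_limit_rpm`.
    """
    body = json.dumps({"org": org, "rate_limit_rpm": rate_limit_rpm}).encode("utf-8")
    return urllib.request.Request(
        f"{gateway_url}/keys/create",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_headers(api_key):
    """Headers to attach to every authenticated request."""
    return {"Authorization": f"Bearer {api_key}"}
```

Passing the built request to `urllib.request.urlopen` would send it. Reading the key from an environment variable rather than hard-coding it keeps it out of source control.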
SDK

X25()

Create an agent instance. The agent holds your project config and reuses it across calls.

X25(
    org: str = "default",
    optimize_for: dict = {"cost": 0.33, "quality": 0.34, "latency": 0.33},
    policy: dict = {},
    gateway_url: str = "http://localhost:8000",
    api_key: str = None,
)

Parameter | Type | Description
org | str | Your project ID. X25 learns per project. Separate projects get separate Thompson arms, audit logs, and stage progression.
optimize_for | dict | Weights summing to 1.0 across cost, quality, latency. See optimize_for.
policy | dict | Hard constraints, e.g. {"max_cost_per_call_usd": 0.02}.
gateway_url | str | Where the X25 gateway is running.
api_key | str | Your project API key from POST /keys/create.

agent.complete()

Route a prompt to the best available model. X25 classifies the task, selects a tier, dispatches, judges quality, escalates if needed, and updates its own learning. All in one call.

agent.complete(
    prompt: str,
    hint: str = None,
) -> X25Response

Parameter | Type | Description
prompt | str | The text to route.
hint | str | Optional task type hint. One of: summary, code, classification, extraction, qa, reasoning, creative. If omitted, X25 classifies automatically.

X25Response

@dataclass
class X25Response:
    text:             str    # the model's answer
    model_used:       str    # e.g. "deepseek/deepseek-v4-flash"
    provider:         str    # e.g. "deepseek"
    task_type:        str    # classified task: "summary", "code", etc.
    cost_usd:         float  # actual cost of this call
    latency_ms:       float  # end-to-end latency in milliseconds
    quality_score:    float  # 0.0–1.0, LLM-as-judge
    cascade_steps:    int    # 1 = first tier passed, 2+ = escalated
    audit_hash:       str    # SHA-256 hash, tamper-evident
    goal_match:       dict   # per-dimension score vs your optimize_for

optimize_for

Tell X25 what matters most for your project. Weights should sum to 1.0; if they don't, X25 normalises them automatically.

Key | What it controls | Example value
cost | How aggressively to prefer cheaper tiers | 0.6
quality | Minimum quality bar and reward weight | 0.3
latency | Preference for faster models | 0.1
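The automatic normalisation can be pictured as dividing each weight by the total. The sketch below is an assumption about how this might work, not the gateway's actual code; the function name `normalise_weights` and the handling of missing, negative, or all-zero weights are illustrative choices.

```python
def normalise_weights(weights):
    """Scale cost/quality/latency weights so they sum to 1.0.

    Missing keys are treated as 0.0 (an assumption). Negative weights
    and an all-zero input are rejected, since no sensible scaling exists.
    """
    keys = ("cost", "quality", "latency")
    values = {key: float(weights.get(key, 0.0)) for key in keys}
    if any(value < 0 for value in values.values()):
        raise ValueError("weights must be non-negative")
    total = sum(values.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {key: value / total for key, value in values.items()}
```

Under this scheme, `{"cost": 6, "quality": 3, "latency": 1}` would be treated the same as the example values in the table.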

Common profiles

# Cost-first, for high-volume pipelines
optimize_for={"cost": 0.7, "quality": 0.2, "latency": 0.1}

# Quality-first, for research and publication
optimize_for={"cost": 0.1, "quality": 0.8, "latency": 0.1}

# Balanced, default
optimize_for={"cost": 0.33, "quality": 0.34, "latency": 0.33}

# Latency-first, for real-time use
optimize_for={"cost": 0.1, "quality": 0.2, "latency": 0.7}

API Reference

POST /route

The core routing endpoint. The SDK calls this on every agent.complete().

POST /route
Field | Type | Description
prompt | string | The text to route.
org | string | Project ID. Resolved from API key if provided.
optimize_for | object | Cost / quality / latency weights.
hint | string? | Optional task type hint.
policy | object? | Hard constraints.
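A request body for this endpoint can be assembled from the fields in the table. The following is a sketch under assumptions: the helper `build_route_payload` is hypothetical, the body is assumed to be JSON, optional fields are assumed to be omitted when unset, and the default weights are borrowed from the SDK constructor's defaults.

```python
import json

# Hint values listed under agent.complete().
VALID_HINTS = {"summary", "code", "classification", "extraction",
               "qa", "reasoning", "creative"}
# Defaults borrowed from the X25() constructor signature.
DEFAULT_WEIGHTS = {"cost": 0.33, "quality": 0.34, "latency": 0.33}


def build_route_payload(prompt, org=None, optimize_for=None, hint=None, policy=None):
    """Assemble a dict matching the documented POST /route fields."""
    if hint is not None and hint not in VALID_HINTS:
        raise ValueError(f"unknown hint: {hint!r}")
    payload = {"prompt": prompt, "optimize_for": optimize_for or DEFAULT_WEIGHTS}
    if org is not None:        # may be resolved from the API key instead
        payload["org"] = org
    if hint is not None:
        payload["hint"] = hint
    if policy:
        payload["policy"] = policy
    return payload


# Serialise for sending; the Bearer header is added separately.
body = json.dumps(build_route_payload("Summarise this abstract.", hint="summary"))
```

Validating the hint client-side is a design choice in this sketch; the gateway may apply its own validation.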

GET /stats/{project}

Aggregated stats for a project. Use all as the project name to aggregate across all projects.

GET /stats/my-research-lab
200 Response
{
  "total_calls": 301,
  "total_cost_usd": 0.055,
  "total_saved_usd": 0.307,
  "avg_quality": 0.957,
  "avg_latency_ms": 10341,
  "model_distribution": { "deepseek-v4-flash": 233, "...": 68 },
  "tier_distribution": { "slm": 269, "mid": 14, "frontier": 18 }
}

GET /stage/{project}

Current stage and progression for a project.

GET /stage/my-research-lab
200 Response
{
  "stage": 3,
  "stage_name": "Feedback",
  "total_calls": 235,
  "calls_to_next_stage": 265,
  "progress_in_stage": 0.117,
  "improvement_available": true
}

GET /thompson/{project}

Live Thompson Sampling state. Shows Beta distributions for each tier.

GET /thompson/my-research-lab
200 Response
{
  "org": "my-research-lab",
  "arms": [
    { "tier": "slm", "alpha": 207.4, "beta": 2.0, "mean_reward": 0.99, "confidence": 1.0 },
    { "tier": "mid", "alpha": 3.0, "beta": 2.6, "mean_reward": 0.54, "confidence": 0.11 },
    { "tier": "frontier", "alpha": 5.0, "beta": 4.2, "mean_reward": 0.54, "confidence": 0.18 }
  ]
}

Concepts

Stages

X25 autonomously advances through four stages as calls accumulate. Each stage unlocks new capabilities.

Stage 1: Explore (0 – 49 calls)
X25 tries all three tiers and learns which models handle your task types reliably.

Stage 2: Exploit (50 – 199 calls)
Routing converges to the cheapest reliable tier. Drift detection starts weekly.

Stage 3: Feedback (200 – 499 calls)
POST /feedback unlocked. Submit labelled examples to improve routing accuracy.

Stage 4: Fine-tune (500+ calls)
Audit log extracted to JSONL. Unsloth LoRA training script generated. Custom model enters routing pool at ~$0.01/1M tokens.
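The call-count thresholds above are enough to reproduce the progression fields returned by GET /stage/{project}. This is a sketch for illustration, assuming progress is the fraction of the current stage's call range already completed. The function name `stage_for` and the values reported for Stage 4 (which has no upper bound) are assumptions.

```python
# (stage number, name, first call count in stage, first call count of next stage)
STAGES = [
    (1, "Explore", 0, 50),
    (2, "Exploit", 50, 200),
    (3, "Feedback", 200, 500),
    (4, "Fine-tune", 500, None),   # final stage, open-ended
]


def stage_for(total_calls):
    """Map a call count to stage number, name, and progress fields."""
    if total_calls < 0:
        raise ValueError("total_calls must be non-negative")
    for number, name, start, end in STAGES:
        if end is None:
            # Assumption: the final stage reports no remaining calls.
            return {"stage": number, "stage_name": name,
                    "calls_to_next_stage": 0, "progress_in_stage": 1.0}
        if total_calls < end:
            return {"stage": number, "stage_name": name,
                    "calls_to_next_stage": end - total_calls,
                    "progress_in_stage": round((total_calls - start) / (end - start), 3)}
```

With 235 calls this yields stage 3, 265 calls to the next stage, and a progress of 0.117, which agrees with the example response for GET /stage/my-research-lab.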

Thompson Sampling

X25 models each tier as a Beta distribution Beta(α, β) representing the posterior over "reward probability": how often this tier passes quality on your tasks.

# Each call:
θ_slm      = Beta(α_slm, β_slm).sample()
θ_mid      = Beta(α_mid, β_mid).sample()
θ_frontier = Beta(α_frontier, β_frontier).sample()
tier       = argmax([θ_slm, θ_mid, θ_frontier])

# After observing quality, update the dispatched arm:
α += reward
β += (1 - reward)

Cheaper tiers naturally receive more traffic as their α grows. Frontier is selected mainly when SLM and mid lack sufficient evidence or have been failing the quality bar.
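The sampling and update steps above can be written as runnable Python using only the standard library. This is a minimal sketch rather than the gateway's implementation: the `Arm` class, the `choose_tier` helper, and the uniform Beta(1, 1) starting prior are assumptions. The `mean_reward` property uses the standard Beta mean α / (α + β), which matches the values shown in the GET /thompson example.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    tier: str
    alpha: float = 1.0   # assumed uniform prior Beta(1, 1)
    beta: float = 1.0

    def sample(self, rng):
        """Draw one plausible reward probability from the posterior."""
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward):
        """Fold an observed reward in [0, 1] into the posterior."""
        self.alpha += reward
        self.beta += 1.0 - reward

    @property
    def mean_reward(self):
        return self.alpha / (self.alpha + self.beta)


def choose_tier(arms, rng=random):
    """Sample every arm once and dispatch to the highest draw."""
    return max(arms, key=lambda arm: arm.sample(rng))


arms = [Arm("slm"), Arm("mid"), Arm("frontier")]
chosen = choose_tier(arms, random.Random(0))   # seeded for repeatability
chosen.update(0.9)                             # e.g. a judged quality reward
```

Because each arm is sampled rather than ranked by its mean, arms with little evidence still win occasionally, which is what keeps exploration alive in later stages.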

Audit trail

Every routing decision is written to a hash-chained audit log. Any modification to a historical record breaks the chain.

# Each record contains:
record_hash = SHA256(prev_hash + record_content)

# Verify integrity:
curl http://localhost:8000/verify
# → {"intact": true, "total_records": 301}

The audit trail is the foundation of Stage 4. The call history is extracted, formatted as Alpaca JSONL, and used to fine-tune a custom SLM on your exact task patterns.
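To make the hash-chaining idea concrete, here is a small self-contained sketch of appending to and verifying such a chain. It follows the formula `SHA256(prev_hash + record_content)` given above, but everything else is an assumption: the all-zero genesis hash, the JSON serialisation of record content, and the helper names are illustrative and may differ from what the gateway stores.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64   # assumed starting value for the first record


def record_hash(prev_hash, content):
    """SHA-256 over the previous hash plus a stable serialisation."""
    serialised = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + serialised).encode("utf-8")).hexdigest()


def append_record(chain, content):
    """Append a record linked to the hash of the previous one."""
    prev = chain[-1]["record_hash"] if chain else GENESIS_HASH
    chain.append({"content": content, "prev_hash": prev,
                  "record_hash": record_hash(prev, content)})


def verify_chain(chain):
    """Recompute every hash; any edited record breaks the chain."""
    prev = GENESIS_HASH
    for record in chain:
        if record["prev_hash"] != prev:
            return False
        if record["record_hash"] != record_hash(prev, record["content"]):
            return False
        prev = record["record_hash"]
    return True
```

Serialising with sorted keys matters here: without a stable byte representation, the same record could hash differently on re-verification.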

Fine-tuning (Stage 4)

At 500 calls, X25 extracts your highest-quality call history and generates a LoRA fine-tuning script for Llama 3.2 3B.

# Trigger at Stage 4:
curl -X POST http://localhost:8000/improve/my-research-lab \
  -H "Authorization: Bearer sk-x25-..."

Step | What happens
Extract | Audit log filtered for quality_score > 0.7, formatted as Alpaca JSONL
Train | Unsloth LoRA on Llama 3.2 3B, ~45 min on Colab T4, ~8GB VRAM
Register | POST /improve/{project}/register. Custom model enters routing pool at tier=slm
Route | Thompson Sampling explores the custom model. If its reward stays at or above 0.6, it is selected for ~85-92% of calls at ~$0.01/1M tokens
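The Extract step can be illustrated with a short sketch that filters records by quality and writes them in the Alpaca layout (`instruction`, `input`, `output`). The audit record field names used here (`prompt`, `text`, `quality_score`) are assumptions based on the request and response fields documented earlier, and the function name `to_alpaca_jsonl` is hypothetical.

```python
import json


def to_alpaca_jsonl(records, min_quality=0.7):
    """Keep records scoring above `min_quality` and emit Alpaca JSONL.

    Each output line is one JSON object with the Alpaca keys. The
    `input` field is left empty because the prompt carries the whole
    task in this sketch.
    """
    lines = []
    for record in records:
        if record["quality_score"] > min_quality:   # strict, as in the table
            lines.append(json.dumps({
                "instruction": record["prompt"],
                "input": "",
                "output": record["text"],
            }, ensure_ascii=False))
    return "\n".join(lines)
```

The strict comparison mirrors the `quality_score > 0.7` filter in the table, so a record scoring exactly 0.7 is excluded.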
Views

Manager terminal

A business-facing view of X25. Shows weekly savings, call volume, model mix, and a chat interface where managers can ask questions about AI spend and routing decisions. No code required.


What managers see

Panel | What it shows
Hero banner | Weekly savings vs naive frontier, total calls, quality delta
Chat | Ask questions about AI spend. X25 answers with routing receipts showing exactly which model ran and why
Observatory | 30-day savings curve, model mix by call volume, recent routing decisions table
Ticker | Live model prices, router uptime, queue depth, saved today