For builders
Inference your existing OpenAI code already speaks.
Point your OpenAI client at the Orogen gateway and your code runs unchanged, at roughly $0.40 per 1M output tokens for Llama-3.1-70B. Every response comes back with a signed receipt you can verify yourself. You swap in an Orogen API key and change the base URL, and nothing else in your application has to move.
Quickstart
Five minutes from your first request.
Get an API key, set your base URL to the Orogen gateway, and send the same request body you already send to OpenAI. Chat completions, completions, embeddings, and SSE streaming all behave the way your client expects. Here is the whole change:
from openai import OpenAI

# Only the API key and the base URL differ from a stock OpenAI setup.
client = OpenAI(
    api_key="orog_...",
    base_url="https://gateway.orogen.network/v1",
)

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Summarize this contract clause..."}],
)
print(response.choices[0].message.content)

That request runs today against the live gateway. When you want to pick a hardware tier per request or read the receipt object directly, switch to the Orogen SDK for Python or TypeScript. The plain OpenAI client keeps working either way.
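The quickstart says SSE streaming behaves the way your client expects. As a rough illustration of what that means on the wire, here is a minimal sketch that reassembles the text from OpenAI-style stream frames, assuming the gateway emits the same `data: {...}` lines and the same `data: [DONE]` terminator that OpenAI's chat completions stream uses. The function name `collect_stream_text` and the sample frames are hypothetical, and in practice the OpenAI client does this parsing for you.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas from OpenAI-style SSE lines into one string."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker in the OpenAI streaming format
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some frames carry no choices
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Hypothetical frames, shaped like an OpenAI chat completions stream.
sample_frames = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

if __name__ == "__main__":
    print(collect_stream_text(sample_frames))  # Hello
```

If the gateway's frames ever differ from this shape, the assumption above is the first thing to re-check.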
Verifiability
A receipt you can check yourself.
Most inference networks rent you a GPU and ask you to trust whatever comes back. You cannot tell which model actually answered, whether it ran the quantization you paid for, or whether the hardware was what it claimed. Orogen works the other way around. Every response comes back with a signed receipt that binds the model, your input, the output, and a hardware attestation. You verify it locally in one call, and the chain stores the commitment so a disputed result has a record to arbitrate against.
- The receipt binds model and adapter hashes, your prompt hash, the output hash, and the operator's hardware attestation.
- Verify it client-side with a single SDK call, or check the commitment against the chain yourself.
- Validators independently re-run a sample of jobs on identical hardware, and any mismatch opens a challenge.
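To make the binding idea concrete, here is a minimal sketch of the hash half of a receipt check. It is not the Orogen SDK call, and everything about the receipt layout is an assumption: the field names `prompt_hash` and `output_hash`, and the choice of SHA-256 over UTF-8 text, are hypothetical. Signature and hardware-attestation verification are deliberately left out, since the source does not specify their formats.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 of UTF-8 text (ASSUMED hashing scheme, for illustration)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def receipt_binds(receipt: dict, prompt: str, output: str) -> bool:
    """Return True if the receipt's hashes match the prompt and output we hold.

    ASSUMPTION: the receipt is a dict with hypothetical 'prompt_hash' and
    'output_hash' fields. This checks hash binding only, not the signature.
    """
    return (
        receipt.get("prompt_hash") == sha256_hex(prompt)
        and receipt.get("output_hash") == sha256_hex(output)
    )


if __name__ == "__main__":
    prompt = "Summarize this contract clause..."
    output = "The clause limits liability to direct damages."
    # Hypothetical receipt built locally, standing in for one from the gateway.
    receipt = {
        "prompt_hash": sha256_hex(prompt),
        "output_hash": sha256_hex(output),
    }
    print(receipt_binds(receipt, prompt, output))           # True
    print(receipt_binds(receipt, prompt, output + " Also"))  # False
```

The point of the sketch is the shape of the check: if either the prompt or the output is altered after the fact, the hashes no longer match the receipt.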
Pricing
Pricing you can read off a single line.
Roughly $0.40 per 1M output tokens for Llama-3.1-70B at the standard datacenter tier, against $10 to $15 per 1M output tokens for comparable closed-model APIs. You pick the hardware tier per request, you pay only for the tokens you use, and there is no subscription floor. Credits are USD-pegged, so what you top up is what you spend.
- Pay-as-you-go from the first token, with no monthly minimum.
- Per-request tier selection, from premium datacenter GPUs down to commodity hardware, priced accordingly.
- Session-pinned multi-turn calls return to the same warm operator, which lowers your marginal-turn cost.
| Tier | Hardware floor | Typical models | Pricing | Use case |
|---|---|---|---|---|
| dc-premium | 8× B200 / 8× H200 / NVL72 | DeepSeek-V3 671B, Llama-4-MoE, frontier MoE | 1.0× base | Enterprise frontier-tier, premium latency SLA |
| dc-standard | 8× H100 SXM | 30–70B dense, Mixtral, large MoE | 0.6× base | Mainstream production inference |
| cloud-rented | 1–2× H100 PCIe / H200 | 7–30B dense | 0.4× base | Spot capacity, secondary regions |
| prosumer | 1–2× RTX 5090 / PRO 6000 | 7–14B quantized | 0.25× base | Lower-tier, hobbyist-friendly |
| edge | Mac Studio Ultra, dual 3090 | ≤ 32B single-user | 0.15× base | Private single-tenant inference |
| embed-only | CPU AVX-512, Apple M-series | Embeddings, re-ranking, classification | 0.10× base | Cheap embedding + classification flows |
Stake floors are at-launch parameters; governance can adjust them by up to ±20% per epoch, subject to a timelock. Verification layers are described in How it works.
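The pricing arithmetic can be sketched in a few lines. The multipliers below are copied from the table. Two things are assumptions: that a tier's rate is simply the model's base rate times its multiplier, and the 50 million token monthly workload, which is a hypothetical figure chosen for the example. The function names are illustrative, not part of any Orogen SDK.

```python
# Tier multipliers copied from the pricing table.
TIER_MULTIPLIER = {
    "dc-premium": 1.0,
    "dc-standard": 0.6,
    "cloud-rented": 0.4,
    "prosumer": 0.25,
    "edge": 0.15,
    "embed-only": 0.10,
}


def token_cost_usd(output_tokens: int, rate_per_million_usd: float) -> float:
    """Cost of a number of output tokens at a flat per-1M-token rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return output_tokens / 1_000_000 * rate_per_million_usd


def tier_rate_usd(base_rate_per_million_usd: float, tier: str) -> float:
    """ASSUMPTION: tier rate = model base rate x the table's multiplier."""
    return base_rate_per_million_usd * TIER_MULTIPLIER[tier]


if __name__ == "__main__":
    monthly_tokens = 50_000_000  # hypothetical workload
    # Quoted Llama-3.1-70B rate at the standard datacenter tier.
    print(round(token_cost_usd(monthly_tokens, 0.40), 2))
    # The comparison range, read as a per-1M-token rate.
    print(round(token_cost_usd(monthly_tokens, 10.00), 2))
    print(round(token_cost_usd(monthly_tokens, 15.00), 2))
```

For that hypothetical workload, the quoted rate comes to about $20 a month, against $500 to $750 if the $10 to $15 comparison figure is read as a per-1M-token rate.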
Service levels
A real SLA, per tier.
Each hardware tier commits the operator to a published latency target, a verification posture, and deterministic kernels, and your request inherits that commitment when you choose the tier. Routing accounts for served demand, attestation status, and prior verification results, so a request lands on hardware that has earned it. The standard datacenter tier carries replay sampling on a share of jobs. Lower-cost tiers carry more sampling, because cheaper hardware earns less benefit of the doubt.
What to expect
Where verifiability ends, said plainly.
The receipt is cryptographic at the response level. You can prove that a specific operator returned a specific output for your specific input, on attested hardware. Coverage across the network is statistical. Validators replay a sample of jobs rather than every job, which is what keeps inference cheap while still catching operators who cut corners. We do not claim a HIPAA-equivalent posture today, and side-channel risk on shared GPU hardware is a known class of concern that the attestation layer reduces rather than eliminates. We disclose any confirmed verification or security issue within 30 days. If your workload needs proof on every single query, the zkML path exists and earns its cost only for high-stakes jobs, so talk to us before you build on it.
- Cryptographic proof per response, statistical coverage across the network.
- No HIPAA-equivalent claim today, and side-channel risk on shared hardware is named, not hidden.
- Confirmed issues disclosed within 30 days.
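"Statistical coverage" can be given a rough number. Under two simplifying assumptions, that each job is independently replayed with probability s and that a replayed bad job always produces a mismatch, an operator who cuts corners on a fraction f of n jobs is caught at least once with probability 1 - (1 - s·f)^n. The sketch below computes that. The sampling and cheating rates in the example are hypothetical, since no actual rates are published here.

```python
def detection_probability(sample_rate: float, cheat_rate: float, jobs: int) -> float:
    """Chance that at least one bad job is replayed across `jobs` jobs.

    ASSUMPTIONS: jobs are sampled independently with probability `sample_rate`,
    the operator cheats on a `cheat_rate` fraction of jobs, and every replayed
    bad job is detected as a mismatch.
    """
    if not (0.0 <= sample_rate <= 1.0 and 0.0 <= cheat_rate <= 1.0):
        raise ValueError("rates must be between 0 and 1")
    if jobs < 0:
        raise ValueError("jobs must be non-negative")
    return 1.0 - (1.0 - sample_rate * cheat_rate) ** jobs


if __name__ == "__main__":
    # Hypothetical: 5% of jobs replayed, operator cheats on 10% of jobs.
    for n in (100, 1_000, 10_000):
        print(n, round(detection_probability(0.05, 0.10, n), 4))
```

With those illustrative rates, the chance of catching the operator passes 99% by 1,000 jobs. That is the sense in which sampling stays cheap while still making sustained corner-cutting hard to hide. A single bad response on a single job can still go unsampled.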
Who it is for
Metered compute, built for applications.
Orogen is an inference API you wire into a product, not a free chatbot. You bring your own application, you pay for the tokens you generate, and you get back work you can audit. If you are running OpenAI in production and want the same interface at open-model prices with a receipt attached, the move is one base URL away.