Drop-in OpenAI replacement. Runs locally via Ollama.
Point your client's base_url at it.
Your models, your keys, zero cloud.
Corporate API
Usage bills you can't predict. Rate limits you can't control. Your prompts on someone else's servers.
BlackBox
Unlimited calls. Zero per-token cost. Every request stays on your machine. No rewrites needed.
Bearer token extracted from header, SHA-256 hashed, matched against local SQLite api_keys table. Raw key shown once — on creation.
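The check itself is a few lines. A minimal sketch of that lookup, assuming a `key_hash` column in the `api_keys` table and a local DB file name (both are assumptions, not the project's actual schema):

```python
# Key verification as described above: hash the presented key,
# look the digest up locally. The raw key is never stored.
import hashlib
import sqlite3

def verify_key(authorization_header: str, db_path: str = "blackbox.db") -> bool:
    # Expect "Authorization: Bearer bk-..."
    if not authorization_header.startswith("Bearer "):
        return False
    raw_key = authorization_header.removeprefix("Bearer ")

    # Only the SHA-256 digest is ever compared or stored.
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()

    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT 1 FROM api_keys WHERE key_hash = ?", (key_hash,)
        ).fetchone()
    return row is not None
```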
POST /v1/chat/completions
Standard OpenAI payload intercepted and mapped to the Ollama schema, routed asynchronously via httpx to your local model. No streaming delay.
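The translation is small, since OpenAI's chat schema and Ollama's are nearly aligned. A simplified sketch of the hop, assuming Ollama's standard `/api/chat` endpoint on its default port; the real proxy also handles streaming, errors, and auth:

```python
# Sketch of the OpenAI -> Ollama forward. Happy path only.
import httpx

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default port

async def forward_chat(openai_payload: dict) -> dict:
    # model + messages map one-to-one between the two schemas
    ollama_payload = {
        "model": openai_payload["model"],
        "messages": openai_payload["messages"],
        "stream": False,
    }
    async with httpx.AsyncClient() as client:
        r = await client.post(OLLAMA_URL, json=ollama_payload, timeout=None)
        r.raise_for_status()
    data = r.json()

    # Re-wrap Ollama's reply as an OpenAI-style completion object.
    return {
        "object": "chat.completion",
        "model": data["model"],
        "choices": [{
            "index": 0,
            "message": data["message"],
            "finish_reason": "stop",
        }],
        "usage": {
            "prompt_tokens": data.get("prompt_eval_count", 0),
            "completion_tokens": data.get("eval_count", 0),
        },
    }
```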
GET /v1/usage
Every request logged locally: model, tokens, latency, key used. Query and filter at any time. Nothing leaves your machine.
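The log write is one SQLite insert per request. A sketch under an assumed schema; `usage_log` and its columns are illustrative, not the project's actual table:

```python
# Per-request log line: model, tokens, latency, key used.
# Table and column names are assumptions.
import sqlite3
import time

def log_request(db_path: str, model: str, tokens: int,
                latency_ms: float, key_hash: str) -> None:
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS usage_log "
            "(ts REAL, model TEXT, tokens INTEGER, latency_ms REAL, key_hash TEXT)"
        )
        conn.execute(
            "INSERT INTO usage_log VALUES (?, ?, ?, ?, ?)",
            (time.time(), model, tokens, latency_ms, key_hash),
        )

# Example filter: total tokens per model.
# conn.execute("SELECT model, SUM(tokens) FROM usage_log GROUP BY model")
```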
Quickstart
Run it. Grab the auto-generated admin key from stdout. Point your existing OpenAI client at localhost:8000. Nothing else changes.
```bash
# install
pip install -r requirements.txt

# start: admin key printed once to stdout
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

```python
# nothing else changes in your code
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="bk-xxxxxxxxxxxxxxxxxxxx",
)

resp = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Hello"}],
)
```

→ works. exactly like OpenAI. locally.
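And since usage is just another endpoint, you can read the log back over HTTP too. A sketch, assuming the same bk- key is accepted by GET /v1/usage; the response shape isn't shown here:

```python
# Read the local usage log via the endpoint listed above.
import httpx

resp = httpx.get(
    "http://localhost:8000/v1/usage",
    headers={"Authorization": "Bearer bk-xxxxxxxxxxxxxxxxxxxx"},
)
print(resp.json())
```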