Drop-in OpenAI replacement. Runs locally via Ollama.
Point your client's base_url at it.
Your models, your keys, zero cloud.
Corporate API
Usage bills you can't predict. Rate limits you can't control. Your prompts on someone else's servers.
BlackBox
Unlimited calls. Zero per-token cost. Every request stays on your machine. No rewrites needed.
Bearer token extracted from header, SHA-256 hashed, matched against local SQLite api_keys table. Raw key shown once — on creation.
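The check itself is a few lines. A minimal sketch of that lookup, assuming a `key_hash` column in the `api_keys` table and a local DB file name (both are assumptions, not the project's actual schema):

```python
# Key verification as described above: hash the presented key,
# look the digest up locally. The raw key is never stored.
import hashlib
import sqlite3

def verify_key(authorization_header: str, db_path: str = "blackbox.db") -> bool:
    # Expect "Authorization: Bearer bk-..."
    if not authorization_header.startswith("Bearer "):
        return False
    raw_key = authorization_header.removeprefix("Bearer ")

    # Only the SHA-256 digest is ever compared or stored.
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()

    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT 1 FROM api_keys WHERE key_hash = ?", (key_hash,)
        ).fetchone()
    return row is not None
```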
POST /v1/chat/completions
Standard OpenAI payload intercepted and mapped to the Ollama schema, routed asynchronously via httpx to your local model. No streaming delay.
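The translation is small, since OpenAI's chat schema and Ollama's are nearly aligned. A simplified sketch of the hop, assuming Ollama's standard `/api/chat` endpoint on its default port; the real proxy also handles streaming, errors, and auth:

```python
# Sketch of the OpenAI -> Ollama forward. Happy path only.
import httpx

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default port

async def forward_chat(openai_payload: dict) -> dict:
    # model + messages map one-to-one between the two schemas
    ollama_payload = {
        "model": openai_payload["model"],
        "messages": openai_payload["messages"],
        "stream": False,
    }
    async with httpx.AsyncClient() as client:
        r = await client.post(OLLAMA_URL, json=ollama_payload, timeout=None)
        r.raise_for_status()
    data = r.json()

    # Re-wrap Ollama's reply as an OpenAI-style completion object.
    return {
        "object": "chat.completion",
        "model": data["model"],
        "choices": [{
            "index": 0,
            "message": data["message"],
            "finish_reason": "stop",
        }],
        "usage": {
            "prompt_tokens": data.get("prompt_eval_count", 0),
            "completion_tokens": data.get("eval_count", 0),
        },
    }
```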
GET /v1/usage
Every request logged locally: model, tokens, latency, key used. Query and filter at any time. Nothing leaves your machine.
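The log write is one SQLite insert per request. A sketch under an assumed schema; `usage_log` and its columns are illustrative, not the project's actual table:

```python
# Per-request log line: model, tokens, latency, key used.
# Table and column names are assumptions.
import sqlite3
import time

def log_request(db_path: str, model: str, tokens: int,
                latency_ms: float, key_hash: str) -> None:
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS usage_log "
            "(ts REAL, model TEXT, tokens INTEGER, latency_ms REAL, key_hash TEXT)"
        )
        conn.execute(
            "INSERT INTO usage_log VALUES (?, ?, ?, ?, ?)",
            (time.time(), model, tokens, latency_ms, key_hash),
        )

# Example filter: total tokens per model.
# conn.execute("SELECT model, SUM(tokens) FROM usage_log GROUP BY model")
```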
Quickstart
Run it. Grab the auto-generated admin key from stdout. Point your existing OpenAI client at localhost:8000. Nothing else changes.
```bash
# install
pip install -r requirements.txt

# start: admin key printed once to stdout
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

```python
# nothing else changes in your code
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="bk-xxxxxxxxxxxxxxxxxxxx",
)

resp = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Hello"}],
)
```

→ works. exactly like OpenAI. locally.
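And since usage is just another endpoint, you can read the log back over HTTP too. A sketch, assuming the same bk- key is accepted by GET /v1/usage; the response shape isn't shown here:

```python
# Read the local usage log via the endpoint listed above.
import httpx

resp = httpx.get(
    "http://localhost:8000/v1/usage",
    headers={"Authorization": "Bearer bk-xxxxxxxxxxxxxxxxxxxx"},
)
print(resp.json())
```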