skip to content
docs · beta

The shortest path from a model file to a useful runtime.

These pages are a working draft. The engine, the storage layer, and the CLI all exist; the documentation around them is catching up. Read the {quickstart} first — it’ll take you about ten minutes to a first generated token on your own machine.

01
ten minutes, one token

Quickstart

Conifer ships as a single native binary plus an optional desktop shell. Install via the curl one-liner, point it at a GGUF file, and watch it pick a quantization, load it, and start streaming.

$ curl -fsSL https://conifer.build/install.sh | sh
$ conifer init # detect hardware, write ~/.conifer/config.toml
$ conifer pull qwen2-7b # fetch + quantize for your machine
$ conifer run qwen2-7b "summarize: $(cat README.md)"

For the desktop app, see /install. For the browser-only surface (no install at all), see /chat.

02
what conifer owns

Concepts

Conifer isn’t a model — it’s everything that has to be true before a model can run well on a particular machine. The pieces:

engine
The inference runtime. Reads GGUF/safetensors, fuses ops for your accelerator, schedules tokens, returns logits. Backed by conifer-engine, a native Rust binary.
model store
The on-disk pool of weights, manifests, and quantization variants. Conifer dedupes shared tensors across models and never downloads something it already has on disk.
quantization planner
Given a target accelerator, picks the precision profile that fits in memory and stays above your latency floor. Defaults are opinionated; everything is overridable.
scheduler
Owns the latency budget. Decides batch size, KV-cache placement, and prefetch order. Optimizes for first-token latency at batch 1, not throughput.
hardware profile
What Conifer learned about your machine the first time it ran: accelerator type, memory bandwidth, unified-memory yes/no, thermal headroom. Stored at ~/.conifer/profile.json; refresh with conifer init --reprobe.
03
recipes

How-to

pick a model that fits
conifer fit takes a model name (or URL) and reports whether it fits, at what quantization, and what tokens/sec you can expect at batch 1. No download until you accept the plan.
embed conifer in your app
Use the CLI’s --json stream mode and pipe it into your process, or call the engine’s C ABI directly. See reference for the FFI surface.
run a headless server
conifer serve exposes an OpenAI-compatible /v1/chat/completions on localhost:8080. No authentication by default — bind to a unix socket if you want access control.
diagnose a slow first token
conifer trace writes a per-phase breakdown of one generation. Look for the phase taking the most wall time and adjust the scheduler from there.
04
surfaces

Reference

CLI
Full flag listing lives in conifer --help and in this page’s next pass. The shape that matters today: conifer {init|pull|fit|run|serve|trace}.
HTTP API
OpenAI-compatible subset: /v1/chat/completions, /v1/models, /v1/embeddings. Streaming via server-sent events. Conifer-specific extensions sit under /v1/conifer/* and are documented inline.
FFI
The Rust crate exports a stable C ABI (conifer_engine_init, conifer_engine_step, conifer_engine_free). Header file ships next to the binary releases.
config file
~/.conifer/config.toml. Tier overrides, model paths, accelerator selection, scheduler knobs. conifer config edit opens it in $EDITOR.

Spotted something wrong? File it on GitHub or write to [email protected]. We’d rather hear about a wrong line of docs than ship around it.