Choosing a model
The setup wizard offers four model options. Your hardware decides which is realistic.
| Model | VRAM | Disk | Speed | Notes |
|---|---|---|---|---|
| qwen3.6:35b-a3b (default) | 24 GB+ | ~23 GB | Fast | MoE. 35B params, only 3B active at a time. Best quality, surprisingly fast for its size. |
| qwen3.5:9b | 8 GB+ | ~5.2 GB | Medium | Older fallback. Works fine on a 12 GB card. |
| gemma3:4b | 4 GB+ or CPU | ~3.3 GB | Slow | Google. CPU-friendly. |
| llama3.2:3b | CPU only | ~2 GB | Very slow | Anything with 4 GB RAM. Goals take much longer. |
What “fast” actually means
For the recommended qwen3.6 on a 24 GB+ GPU:
- A goal cycle is typically 200-600 seconds (3-10 min).
- One existence-loop call (picking the next goal) is ~5-30s.
- One planning call is ~30-90s.
- One LLM step (
ollama_chat) is ~50-100s. - Validation (semantic + fact-check) is ~30-120s.
On CPU with llama3.2:3b: multiply everything by ~10. The system still works but you’ll see fewer goal completions per hour.
Why qwen3.6 is the default
- Local-only (no cloud calls)
- MoE means inference is fast (3B active params) but reasoning quality matches 30B+ dense models
- Handles 32K context comfortably, important because the existence prompt is large (worldview + suffering + peer state + capabilities)
- Doesn’t produce schema-shaped output with empty fields under
format=json(a problem with some thinking-mode models)
Switching models later
Edit config.json:
{
"ollama": {
"default_model": "qwen3.5:9b"
}
}
Then python3 hollow.py stop and python3 hollow.py to restart. Make sure the model is pulled first: ollama pull qwen3.5:9b.
What about Claude / OpenAI / etc.?
Not supported and won’t be. The system is designed for sustained 24/7 unattended operation; cloud LLMs would burn $100+/day in goal-selection alone. Everything runs locally on whatever model you pick, and the substrate’s design (suffering, capability locks, lesson promotion) is calibrated for local inference speeds.
The exception is invoke_claude: agents can submit implementation requests to YOU, the operator. If you happen to use Claude Code as your editor, you can hook MCP and read/satisfy those requests there. But it’s a human-in-the-loop channel, not an agent-side LLM call.