The short answer
Which model should you use?
This is the question we get asked most. The honest answer depends on what you're building, what you're willing to spend, and how much you care about privacy. Here's the quick version.
“Best all-around for agent tasks”
Anthropic Claude (Sonnet 4.6 or Opus 4.6)
Strongest reasoning and tool-calling. Sonnet for daily use, Opus for complex multi-step work. This is what we use for every client project.
“Tightest budget, still cloud-hosted”
OpenAI Codex OAuth or Qwen free tier
Codex OAuth lets you reuse your ChatGPT subscription with OpenClaw. Qwen gives you 2,000 free requests per day. Both get you running without any API billing.
“Zero API cost, full privacy”
Ollama with Llama 3.3 or Qwen 2.5 Coder
Run on your own hardware. Requires a decent GPU for good speed, or CPU inference if you're patient. Data never leaves your machine.
“Want to try many models quickly”
OpenRouter (200+ models, one API key)
Swap models without managing multiple accounts. Pay provider rates plus a small markup. We use this whenever we're evaluating a new model.
“Team with spend controls needed”
LiteLLM proxy with virtual keys
Self-hosted proxy. Set max_budget and budget_duration per key. Route to any backend. This is our go-to for team deployments.
Cloud providers
Premium Hosted
These are the heavy hitters. If you want the best reasoning, the most reliable tool calling, and you're okay paying per token, this is where you start. We run Anthropic Claude for every client build at The Operator Vault, and it's what we recommend to anyone doing serious agent work.
Anthropic (Claude)
~$3-75/M tokens (varies by model)
Models: Claude Opus 4.6, Sonnet 4.6, Haiku 4.5
Auth: API key
Claude is our default recommendation for serious agent work. The reasoning quality and tool calling are a step above everything else we've tested. If you're building automations that need to think through multi-step problems, Claude is where you should be.
OpenAI (GPT)
~$0.15-60/M tokens (varies by model)
Models: GPT-5.1 Codex, GPT-4o, GPT-4o-mini
Auth: API key or ChatGPT Codex OAuth
Solid all-around choice, especially if you already have an OpenAI account. Tool calling is strong, and Codex OAuth lets you reuse your existing ChatGPT subscription instead of paying per token. Great as a fallback behind Claude.
Amazon Bedrock
Same as direct API + AWS margin
Models: Claude, Llama, Mistral, and more via AWS
Auth: AWS credentials (IAM role or access key)
If your team already lives in AWS, Bedrock saves you from managing another API key. OpenClaw auto-discovers which models are available in your region. The slight AWS markup is worth it for the unified billing.
Run on your hardware
Local / Self-Hosted
If you want to experiment without spending a dollar, this is where to start. Download a model, point OpenClaw at localhost, and you're running. Quality won't match Claude or GPT for complex tasks, but for simple automations and learning the ropes, local models work surprisingly well. Most of our community members start here.
Ollama
$0/month
Models: Llama 3.3, Qwen 2.5 Coder, Mistral, DeepSeek R1
Auth: None needed (local)
The easiest way to run local models. OpenClaw auto-discovers whatever you have installed, so you can start running agents immediately. Just pull a model and go.
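If you prefer the terminal, a minimal first session looks like this. It assumes Ollama is installed and its daemon is running on the default port, 11434:

```shell
# Pull a model from the Ollama registry, then confirm it's installed.
ollama pull llama3.3
ollama list

# Ollama serves an HTTP API on localhost:11434 by default;
# /api/tags lists every model you have pulled.
curl http://localhost:11434/api/tags
```

Once `/api/tags` shows your model, point OpenClaw at it and you're done.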
vLLM
$0/month
Models: Any HuggingFace model you load
Auth: None needed (local)
Better throughput than Ollama for production workloads. If you're running multiple agents or need faster inference, vLLM's OpenAI-compatible API is the move. Takes a bit more setup, but worth it for heavier use.
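A minimal sketch of standing up vLLM's OpenAI-compatible server. This assumes a recent vLLM release (where the `vllm serve` entrypoint exists) and a GPU with enough memory for the model you pick; the model id is just an example:

```shell
pip install vllm

# Serve a HuggingFace model behind an OpenAI-compatible API on port 8000.
vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --port 8000

# In another terminal: confirm the server is up and see what it exposes.
curl http://localhost:8000/v1/models
```

Because the API is OpenAI-compatible, anything that can talk to OpenAI's endpoints (including OpenClaw) can talk to this server.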
LM Studio
$0/month
Models: Any GGUF model you download
Auth: None needed (local)
The most visual option. Download models through a clean GUI, then OpenClaw connects via the openai-responses API on localhost:1234. Good for people who prefer clicking over terminal commands.
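Once LM Studio's local server is running (it listens on localhost:1234 by default), you can sanity-check it from the terminal. The model id below is a placeholder; use one from the list the first command returns:

```shell
# List the models LM Studio is currently serving.
curl http://localhost:1234/v1/models

# Quick test completion against whichever model is loaded.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-id", "messages": [{"role": "user", "content": "Say hi"}]}'
```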
Model brokers
Aggregators / Gateways
Think of these as model brokers. Instead of managing API keys for every provider, you get one key that routes to whatever model you need. We lean on OpenRouter for testing and LiteLLM for team deployments where we need spend controls. If you're not sure which model to commit to, start here.
OpenRouter
Provider pricing + small markup
Models: 200+ models across all major providers
Auth: OpenRouter API key
We use OpenRouter when testing new models before committing to a provider. One API key, instant access to 200+ models. The markup is small and worth it for the flexibility. Swap between Claude, GPT, Llama, and dozens of others without touching your config.
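To sanity-check your key outside OpenClaw, you can hit OpenRouter's OpenAI-compatible endpoint directly. The model id here is illustrative; check OpenRouter's model catalog for current ids:

```shell
export OPENROUTER_API_KEY="sk-or-..."   # your key

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "One-line hello."}]
  }'
```

Swapping models is literally just changing the `model` string; that's the whole appeal.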
LiteLLM
Self-hosted proxy, you control billing
Models: Any model behind your LiteLLM proxy
Auth: LiteLLM API key
This is what we recommend for teams. Set up virtual keys with spend limits so each team member or project has its own budget. You control the proxy, you see every request, and you can route to any backend provider. Takes 10 minutes to set up, saves headaches later.
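Virtual keys are created through the proxy's admin API. A sketch, assuming you already have a LiteLLM proxy running on localhost:4000 with a master key configured:

```shell
# Create a virtual key capped at $25 that resets every 30 days.
# max_budget and budget_duration are LiteLLM key parameters;
# the alias and limits are examples, adjust for your team.
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "key_alias": "project-alpha",
    "max_budget": 25.0,
    "budget_duration": "30d"
  }'
```

Hand the returned key to the team member or project, and the proxy enforces the budget on every request that passes through it.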
Hugging Face Inference
Free tier available, then pay-per-use
Models: Open-source models hosted by HF
Auth: HF token (fine-grained, with inference permission)
A solid free option for testing open-source models without running them locally. Use the :fastest or :cheapest suffix to let HF pick the best backend. Good for experimenting before you decide whether to run the model yourself.
Data stays yours
Privacy-First
Some projects require that data never touches a third-party server. Whether it's client confidentiality, regulatory requirements, or personal preference, these options keep everything under your control. We've used Venice AI for client work where data can't leave a controlled environment, and Ollama for fully air-gapped setups.
Venice AI
Credit-based (private models cheaper)
Models: Llama, Qwen, DeepSeek, Mistral (private mode) + Claude, GPT (anonymized)
Auth: Venice API key
Venice is interesting for privacy-conscious operators. Their private mode is zero-logging, and their anonymized mode strips metadata before proxying requests to the major providers. A good middle ground between privacy and model quality.
Ollama (local)
$0
Models: Any model you download
Auth: None
The ultimate privacy option. Nothing leaves your machine, period. No API calls, no logging, no third parties. If you're handling sensitive data and need a complete air gap, run Ollama on a machine that never touches the internet after you download the model.
Use what you already pay for
Subscription Reuse
Already paying for a ChatGPT subscription or have access to Qwen's free tier? You can use those with OpenClaw instead of paying for a separate API key. This is the most budget-friendly way to run cloud-hosted models.
OpenAI Codex OAuth
Your ChatGPT subscription
Models: GPT-5.1 Codex, GPT-4o
Auth: OAuth login via openclaw onboard --auth-choice openai-codex
If you're already paying for ChatGPT Plus or Pro, this lets you use that subscription with OpenClaw instead of paying for API tokens on top. No API key needed. Just authenticate through OAuth and you're running. One of the easiest ways to get started.
Qwen OAuth
Free tier: 2,000 requests/day
Models: Qwen Coder, Qwen Vision
Auth: Device-code OAuth (plugin required)
Genuinely free and surprisingly capable. Run openclaw plugins enable qwen-portal-auth to activate it, authenticate via device code, and you get 2,000 requests per day at zero cost. A great starting point if you want to test OpenClaw without any financial commitment.
Reliability
Automatic model failover
Provider outages happen. Rate limits hit at the worst times. Instead of manually switching models when something breaks, you can set up a chain: OpenClaw tries your first choice and, if that fails, automatically moves to the next. We set this up on every production deployment, because the last thing you want is an agent going dark at 2 AM over one API hiccup.
1. Try the primary model: anthropic/claude-sonnet-4-6
2. If Anthropic is down or rate-limited, try the first fallback: openai/gpt-4o
3. If OpenAI also fails, fall back to local: ollama/llama3.3 (no API, no downtime)
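In config terms, a chain like this is just an ordered list. The JSON below is an illustrative sketch, not OpenClaw's exact schema; field names like "primary" and "fallbacks" are assumptions, so check the config reference for your version before touching a real config file:

```shell
# Illustrative only -- field names are assumed, verify against your
# OpenClaw version's docs before overwriting a real config file.
cat > openclaw-model-chain.json <<'EOF'
{
  "model": {
    "primary": "anthropic/claude-sonnet-4-6",
    "fallbacks": ["openai/gpt-4o", "ollama/llama3.3"]
  }
}
EOF
```

The ordering is the policy: the Gateway only reaches for the next entry when the one before it errors out or hits a rate limit.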
The real tradeoff
Local vs Hosted models
This is the first decision you need to make. Both paths work. The right choice depends on your budget, your hardware, and how much you care about model quality versus privacy. Here's an honest breakdown from what we've seen building automations for clients.
Hosted (API)
Claude, GPT, and Gemini run on massive GPU clusters managed by their providers. You pay per token, and the quality is consistently high. For complex agent tasks (multi-step reasoning, tool calling chains, long context analysis), hosted models are still significantly ahead of local alternatives. This is what we use for every client build, and what we recommend if your budget allows it.
Local (Ollama / vLLM)
Open-source models run on your own hardware. Zero API costs, full privacy, and no rate limits. The tradeoff is compute: you need a capable GPU for fast inference, or accept slower CPU-only speeds. Tool calling support varies by model. If you're just learning OpenClaw or running simple automations, local models are a great way to start without spending anything.
For most of our clients, we start with Claude Sonnet as the primary model and set up Ollama as a local fallback. This gives you the best reasoning when it matters, with zero-cost coverage when the API is down. If you're just starting out and want to keep costs at zero, go Ollama-only and upgrade when you need more capability.
Setup reference
How to configure each provider
Setting up a provider takes about 30 seconds. Set the environment variable, pick the model reference, and you're running. The onboarding wizard handles most of this automatically, but here's the cheat sheet if you want to configure things manually.
| Provider | Env Variable | Model Ref Example |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | anthropic/claude-opus-4-6 |
| OpenAI | OPENAI_API_KEY | openai/gpt-5.1-codex |
| Ollama | OLLAMA_API_KEY="ollama-local" | ollama/llama3.3 |
| OpenRouter | OPENROUTER_API_KEY | openrouter/anthropic/claude-sonnet-4-6 |
| Venice AI | VENICE_API_KEY | venice/llama-3.3-70b |
| LiteLLM | LITELLM_API_KEY | (your proxy model ref) |
| Amazon Bedrock | AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY | amazon-bedrock/us.anthropic.claude-opus-4-6-v1:0 |
| Hugging Face | HF_TOKEN | huggingface/(model-name) |
Most people overthink provider setup. Pick one, set the env var, restart the Gateway. You can always swap later. If you're paralyzed by choice, start with Anthropic (best quality) or Ollama (free). You can add the others in 30 seconds whenever you want.
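One thing that trips people up: setting a key in one shell and launching the Gateway in another. Here's a tiny bash helper (ours, not part of OpenClaw) to confirm which provider keys the current shell actually exports:

```shell
# Report whether a given provider key is exported in this shell.
# Uses bash indirect expansion (${!1}), so run it under bash.
check_key() {
  if [ -n "${!1}" ]; then
    echo "$1 is set"
  else
    echo "$1 is NOT set"
  fi
}

check_key ANTHROPIC_API_KEY
check_key OPENAI_API_KEY
check_key OPENROUTER_API_KEY
```

If a key shows as NOT set in the shell that starts the Gateway, that's your problem, not the provider.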
Written by
Kevin Jeppesen
Founder, The Operator Vault
Kevin is an early OpenClaw adopter who has saved an estimated 400 to 500 hours through AI automation. He stress-tests new workflows daily, sharing what actually works through step-by-step guides and a security-conscious approach to operating AI with real tools.
Models FAQ
Common questions about models
Configure your first AI model in the Workshop.
Our free course walks you through OpenClaw setup, including model provider configuration. One hour, 100% free, lifetime access.
