Getting Started with Models
Models in fast-agent are selected with a model string:
The shortest useful examples are aliases:
fast-agent --model sonnet
fast-agent --model gpt55
fast-agent --model gemini
fast-agent --model grok
fast-agent --model kimi
Full alias tables and model capabilities are generated from the source tree to reduce drift:
- Providers and Models lists provider configuration and generated alias tables.
- Models Reference lists generated model capabilities such as structured outputs, reasoning, verbosity, and supported input modalities.
First-class providers
These providers have native fast-agent support and provider-specific feature handling.
| Provider | Start with | Main features |
|---|---|---|
| OpenAI Responses | gpt55, gpt54, gpt52, gpt-5-mini, codex |
GPT-5 class models, reasoning, text verbosity, structured outputs, web_search, SSE/WebSocket transports, service tiers, connectors (e.g. GMail, Dropbox etc) |
| Anthropic | sonnet, opus, opus47, haiku |
Claude 4.x, prompt caching, reasoning budgets/adaptive thinking, structured outputs, web_search, web_fetch, long context, task budget where supported |
gemini, gemini3, gemini3.1, gemini3flash |
Gemini native API, structured outputs, thinking controls, text/image/PDF/audio/video input, YouTube links through media attachments | |
| xAI | grok, grok4, grok-4.3 |
Grok via Responses-compatible API, structured outputs, reasoning controls, web_search, x_search, SSE/WebSocket transports |
| Hugging Face | kimi, kimi26, deepseek, glm, minimax, qwen35, gpt-oss |
Inference Providers routing, explicit provider suffixes, curated aliases, structured/tool-use tested aliases, reasoning toggles where supported |
OpenAI Responses
Use the responses provider for GPT-5 class OpenAI models.
fast-agent --model "responses.gpt-5.5?reasoning=medium"
fast-agent --model "responses.gpt-5.5?web_search=on"
fast-agent --model "responses.gpt-5.5?verbosity=high&transport=ws"
fast-agent --model "responses.gpt-5.5?service_tier=fast"
Useful query parameters:
reasoning=none|minimal|low|medium|high|xhighdepending on modelverbosity=low|medium|highweb_search=on|offtransport=sse|ws|autoservice_tier=fast|flexwhere supported
Use the openai provider for Chat Completions-style models such as openai.gpt-4.1.
Anthropic
Anthropic support includes Claude-specific reasoning, caching, web tools, and structured-output selection.
fast-agent --model sonnet
fast-agent --model "sonnet?reasoning=4096"
fast-agent --model "opus?reasoning=auto"
fast-agent --model "opus?web_search=on&web_fetch=on"
fast-agent --model "opus?task_budget=128k"
Useful query parameters and config:
reasoning=auto|low|medium|high|max|offon adaptive-thinking modelsreasoning=<tokens>on budget-thinking models, for examplereasoning=4096web_search=on|offweb_fetch=on|offtask_budget=20k|128k|offwhere supportedanthropic.cache_mode: auto|prompt|offanthropic.cache_ttl: 5m|1h
Structured outputs default to JSON schema on models that support Anthropic's structured-output
feature. Older models fall back to the legacy tool_use flow.
Use the native Google provider for Gemini models.
fast-agent --model gemini
fast-agent --model "gemini3?reasoning=auto"
fast-agent --model "google.gemini-3.1-pro-preview?reasoning=high"
Google models support structured outputs and multimodal inputs. Current fast-agent model metadata advertises text, image, PDF, audio, and video tokenization for Gemini models. YouTube links can be attached as media links when using a model that supports video input.
Useful query parameters:
reasoning=auto|minimal|low|medium|high|offstructured=json- sampling controls such as
temperature,top_p, andtop_kwhere applicable
xAI Grok
Use the xai provider for Grok models.
fast-agent --model grok
fast-agent --model "xai.grok-4.3?reasoning=high"
fast-agent --model "xai.grok-4.3?web_search=on"
fast-agent --model "xai.grok-4.3?x_search=on"
Useful query parameters:
reasoning=none|low|medium|highon reasoning-capable Grok modelsweb_search=on|offfor xAI web searchx_search=on|offfor xAI's X Search remote tool
web_search and x_search are distinct provider-managed tools.
Hugging Face Inference Providers
Use the hf provider for Hugging Face Inference Providers.
fast-agent --model kimi
fast-agent --model kimi26instant
fast-agent --model "hf.moonshotai/Kimi-K2.6:novita?reasoning=on"
fast-agent --model "hf.deepseek-ai/DeepSeek-V4-Pro:together"
Syntax:
If no provider suffix is supplied, Hugging Face auto-routes the request. Curated aliases such as
kimi, deepseek, glm, and minimax include provider choices and request defaults that have
been tested with fast-agent features such as structured outputs and tool use. Capability can still
vary by backing provider.
Model string format
Model strings follow this format:
- provider: the LLM provider, for example
responses,anthropic,google,xai,hf,azure,openrouter,generic, ortensorzero - model_name: the model or deployment name
- query parameters: provider/model-specific overrides such as
reasoning,structured,context,transport,service_tier,temperature(tempalias),web_search,web_fetch,x_search, andtask_budget
Examples:
responses.gpt-5.5?reasoning=mediumresponses.gpt-5.5?web_search=onsonnet?reasoning=4096opus?web_search=on&web_fetch=ongemini3?reasoning=autoxai.grok-4.3?x_search=onkimi26instanthf.moonshotai/Kimi-K2.6:novita?reasoning=onazure.my-deploymentgeneric.llama3.2:latestopenrouter.google/gemini-2.5-pro-exp-03-25:freetensorzero.my_tensorzero_function
Precedence
Model specifications follow this precedence order, highest to lowest:
- Explicitly set in agent decorators
- Command-line arguments with
--model - Default model in
fast-agent.yaml FAST_AGENT_MODELenvironment variable- System default (
gpt-5-mini?reasoning=low)
Reasoning
You can also set reasoning directly in the model string query. This is especially useful for provider-specific reasoning modes:
responses.gpt-5.5?reasoning=mediumsonnet?reasoning=4096(budget tokens)opus?reasoning=auto(adaptive default)gemini3?reasoning=highxai.grok-4.3?reasoning=none
Temperature and sampling
You can set sampling temperature directly in the model string query:
responses.gpt-5.5?temperature=0.2openai.gpt-4.1?temp=0.7hf.moonshotai/Kimi-K2.6:novita?temperature=1.0&top_p=0.95
If temperature is omitted, fast-agent does not send a temperature parameter.
Only explicit values (for example via ?temperature= / ?temp= or request
params/config) are forwarded.
Model presets and model references
For convenience, popular models have built-in model presets such as codex or sonnet.
These are documented on the LLM Providers page.
You can also create local model overlays. These are environment-local named model entries that
bundle endpoint settings, auth, request defaults, and local metadata under a short token such as
qwen-local. See Model Overlays.
You can also define your own namespaced model references in fast-agent.yaml and
reference them with exact tokens like $system.fast.
If a configured model reference cannot be resolved, fast-agent logs a warning and automatically falls back to the next lower-precedence model source.
Default configuration
You can set a default model for your application in your fast-agent.yaml:
History saving
You can save the conversation history to a file by sending a ***SAVE_HISTORY <filename> message. This can then be reviewed, edited, loaded, or served with the prompt-server or replayed with the playback model.
File Format / MCP Serialization
If the filetype is json, fast-agent saves a {"messages": [...]} JSON container. It can contain either MCP PromptMessage objects (legacy) or PromptMessageExtended objects (preserves tool calls, channels, etc). fast_agent.load_prompt and prompt-server will load either the text or JSON format directly.
This can be helpful when developing applications to:
- Save a conversation for editing
- Set up in-context learning
- Produce realistic test scenarios to exercise edge conditions etc. with the Playback model