Getting Started with Models

Models in fast-agent are selected with a model string:

provider.model_name[.reasoning_effort][?query=value&...]

The shortest useful examples are aliases:

fast-agent --model sonnet
fast-agent --model gpt55
fast-agent --model gemini
fast-agent --model grok
fast-agent --model kimi

Full alias tables and model capabilities are generated from the source tree to reduce drift:

Providers and Models lists provider configuration and generated alias tables.
Models Reference lists generated model capabilities such as structured outputs, reasoning, verbosity, and supported input modalities.

First-class providers

These providers have native fast-agent support and provider-specific feature handling.

Provider	Start with	Main features
OpenAI Responses	`gpt55`, `gpt54`, `gpt52`, `gpt-5-mini`, `codex`	GPT-5 class models, reasoning, text verbosity, structured outputs, `web_search`, SSE/WebSocket transports, service tiers, connectors (for example Gmail and Dropbox)
Anthropic	`sonnet`, `opus`, `opus47`, `haiku`	Claude 4.x, prompt caching, reasoning budgets/adaptive thinking, structured outputs, `web_search`, `web_fetch`, long context, task budget where supported
Google	`gemini`, `gemini3`, `gemini3.1`, `gemini3flash`	Gemini native API, structured outputs, thinking controls, text/image/PDF/audio/video input, YouTube links through media attachments
xAI	`grok`, `grok4`, `grok-4.3`	Grok via Responses-compatible API, structured outputs, reasoning controls, `web_search`, `x_search`, SSE/WebSocket transports
Hugging Face	`kimi`, `kimi26`, `deepseek`, `glm`, `minimax`, `qwen35`, `gpt-oss`	Inference Providers routing, explicit provider suffixes, curated aliases, structured/tool-use tested aliases, reasoning toggles where supported

OpenAI Responses

Use the responses provider for GPT-5 class OpenAI models.

fast-agent --model "responses.gpt-5.5?reasoning=medium"
fast-agent --model "responses.gpt-5.5?web_search=on"
fast-agent --model "responses.gpt-5.5?verbosity=high&transport=ws"
fast-agent --model "responses.gpt-5.5?service_tier=fast"

Useful query parameters:

reasoning=none|minimal|low|medium|high|xhigh depending on model
verbosity=low|medium|high
web_search=on|off
transport=sse|ws|auto
service_tier=fast|flex where supported

Use the openai provider for Chat Completions-style models such as openai.gpt-4.1.
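The query parameters above combine with & in a single model string. As a minimal sketch, the helper below assembles such a string from keyword arguments. build_model_string is a hypothetical name used only for illustration; it is not part of fast-agent.

```python
from urllib.parse import urlencode


def build_model_string(base: str, **params: str) -> str:
    """Append query parameters to a model string.

    Hypothetical helper for illustration; not a fast-agent API.
    """
    if not params:
        return base
    # urlencode joins key=value pairs with "&" in insertion order.
    return f"{base}?{urlencode(params)}"


print(build_model_string("responses.gpt-5.5", verbosity="high", transport="ws"))
```

The resulting string can be passed to --model in the same way as the hand-written examples above.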

Anthropic

Anthropic support includes Claude-specific reasoning, caching, web tools, and structured-output selection.

fast-agent --model sonnet
fast-agent --model "sonnet?reasoning=4096"
fast-agent --model "opus?reasoning=auto"
fast-agent --model "opus?web_search=on&web_fetch=on"
fast-agent --model "opus?task_budget=128k"

Useful query parameters and config:

reasoning=auto|low|medium|high|max|off on adaptive-thinking models
reasoning=<tokens> on budget-thinking models, for example reasoning=4096
web_search=on|off
web_fetch=on|off
task_budget=20k|128k|off where supported
anthropic.cache_mode: auto|prompt|off
anthropic.cache_ttl: 5m|1h
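The two reasoning forms listed above (a named level on adaptive-thinking models, a token count on budget-thinking models) can be told apart by inspecting the value. The sketch below illustrates that distinction only. classify_reasoning is a hypothetical helper, and how fast-agent itself validates the value is not shown here.

```python
from __future__ import annotations

# Named levels taken from the parameter list above.
ADAPTIVE_LEVELS = {"auto", "low", "medium", "high", "max", "off"}


def classify_reasoning(value: str) -> tuple[str, int | str]:
    """Classify a reasoning= value as a token budget or a named level.

    Hypothetical helper for illustration; not a fast-agent API.
    """
    if value.isdigit():
        return ("budget", int(value))
    if value in ADAPTIVE_LEVELS:
        return ("adaptive", value)
    raise ValueError(f"unrecognised reasoning value: {value!r}")


print(classify_reasoning("4096"))
print(classify_reasoning("auto"))
```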

Structured outputs default to JSON schema on models that support Anthropic's structured-output feature. Older models fall back to the legacy tool_use flow.

Google

Use the native Google provider for Gemini models.

fast-agent --model gemini
fast-agent --model "gemini3?reasoning=auto"
fast-agent --model "google.gemini-3.1-pro-preview?reasoning=high"

Google models support structured outputs and multimodal inputs. Current fast-agent model metadata lists text, image, PDF, audio, and video as supported inputs for Gemini models. YouTube links can be attached as media links when using a model that supports video input.

Useful query parameters:

reasoning=auto|minimal|low|medium|high|off
structured=json
sampling controls such as temperature, top_p, and top_k where applicable

xAI Grok

Use the xai provider for Grok models.

fast-agent --model grok
fast-agent --model "xai.grok-4.3?reasoning=high"
fast-agent --model "xai.grok-4.3?web_search=on"
fast-agent --model "xai.grok-4.3?x_search=on"

Useful query parameters:

reasoning=none|low|medium|high on reasoning-capable Grok models
web_search=on|off for xAI web search
x_search=on|off for xAI's X Search remote tool

web_search and x_search are distinct provider-managed tools.

Hugging Face Inference Providers

Use the hf provider for Hugging Face Inference Providers.

fast-agent --model kimi
fast-agent --model kimi26instant
fast-agent --model "hf.moonshotai/Kimi-K2.6:novita?reasoning=on"
fast-agent --model "hf.deepseek-ai/DeepSeek-V4-Pro:together"

Syntax:

hf.<model_name>[:provider][?query=value&...]

If no provider suffix is supplied, Hugging Face auto-routes the request. Curated aliases such as kimi, deepseek, glm, and minimax include provider choices and request defaults that have been tested with fast-agent features such as structured outputs and tool use. Capabilities can still vary by backing provider.
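To make the hf syntax concrete, the sketch below separates the model name from the optional provider suffix. split_hf_model is a hypothetical helper written for this page, not fast-agent's own parsing code. A result of None for the provider corresponds to the auto-routing case described above.

```python
from __future__ import annotations


def split_hf_model(spec: str) -> tuple[str, str | None]:
    """Split 'hf.<model_name>[:provider][?query]' into (model_name, provider).

    Hypothetical helper for illustration; not a fast-agent API.
    """
    body = spec.split("?", 1)[0]  # drop any query string
    if not body.startswith("hf."):
        raise ValueError("expected a model string starting with 'hf.'")
    model_name, _, provider = body[len("hf."):].partition(":")
    # An empty suffix means no provider was chosen explicitly.
    return model_name, (provider or None)


print(split_hf_model("hf.moonshotai/Kimi-K2.6:novita?reasoning=on"))
```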

Model string format

Model strings follow this format:

provider.model_name[.reasoning_effort][?query=value&...]

provider: the LLM provider, for example responses, anthropic, google, xai, hf, azure, openrouter, generic, or tensorzero
model_name: the model or deployment name
query parameters: provider/model-specific overrides such as reasoning, structured, context, transport, service_tier, temperature (temp alias), web_search, web_fetch, x_search, and task_budget

Examples:

responses.gpt-5.5?reasoning=medium
responses.gpt-5.5?web_search=on
sonnet?reasoning=4096
opus?web_search=on&web_fetch=on
gemini3?reasoning=auto
xai.grok-4.3?x_search=on
kimi26instant
hf.moonshotai/Kimi-K2.6:novita?reasoning=on
azure.my-deployment
generic.llama3.2:latest
openrouter.google/gemini-2.5-pro-exp-03-25:free
tensorzero.my_tensorzero_function
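The examples above can be decomposed mechanically. The following is a minimal sketch under stated assumptions: a leading segment counts as a provider only if it appears in the provider list on this page, anything else is treated as an alias or bare model name, and a dotted reasoning-effort suffix is not handled. ModelSpec and parse_model_string are hypothetical names, not fast-agent APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import parse_qsl

# Provider prefixes named on this page; assumed complete for this sketch only.
KNOWN_PROVIDERS = {
    "responses", "openai", "anthropic", "google", "xai",
    "hf", "azure", "openrouter", "generic", "tensorzero",
}


@dataclass
class ModelSpec:
    provider: str | None  # None when the string is an alias such as "sonnet"
    model_name: str
    params: dict[str, str] = field(default_factory=dict)


def parse_model_string(spec: str) -> ModelSpec:
    """Split a model string into provider, model name, and query parameters."""
    body, _, query = spec.partition("?")
    head, dot, rest = body.partition(".")
    if dot and head in KNOWN_PROVIDERS:
        provider, model_name = head, rest
    else:
        # Aliases may contain dots themselves (for example "gemini3.1").
        provider, model_name = None, body
    return ModelSpec(provider, model_name, dict(parse_qsl(query)))


print(parse_model_string("responses.gpt-5.5?reasoning=medium"))
print(parse_model_string("generic.llama3.2:latest"))
```

Checking the prefix against a known provider set is a design choice of this sketch: it keeps dotted model names and dotted aliases intact instead of splitting on every dot.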

Precedence

Model specifications follow this precedence order, highest to lowest:

Explicitly set in agent decorators
Command-line arguments with --model
Default model in fast-agent.yaml
FAST_AGENT_MODEL environment variable
System default (gpt-5-mini?reasoning=low)
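The precedence order above amounts to taking the first source that is set. A minimal sketch, assuming each source is already available as an optional string: resolve_model and its parameter names are hypothetical and do not reflect fast-agent's internal API.

```python
from __future__ import annotations

import os
from typing import Mapping

SYSTEM_DEFAULT = "gpt-5-mini?reasoning=low"


def resolve_model(
    decorator_model: str | None = None,
    cli_model: str | None = None,
    config_default: str | None = None,
    env: Mapping[str, str] | None = None,
) -> str:
    """Return the first model source that is set, highest precedence first.

    Hypothetical helper for illustration; not a fast-agent API.
    """
    env = os.environ if env is None else env
    candidates = [
        decorator_model,              # agent decorator
        cli_model,                    # --model
        config_default,               # default_model in fast-agent.yaml
        env.get("FAST_AGENT_MODEL"),  # environment variable
        SYSTEM_DEFAULT,               # built-in fallback
    ]
    return next(c for c in candidates if c)


print(resolve_model(cli_model="sonnet", config_default="haiku", env={}))
```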

Reasoning

You can set reasoning directly in the model string query, which is especially useful for provider-specific reasoning modes:

responses.gpt-5.5?reasoning=medium
sonnet?reasoning=4096 (budget tokens)
opus?reasoning=auto (adaptive default)
gemini3?reasoning=high
xai.grok-4.3?reasoning=none

Temperature and sampling

You can set sampling temperature directly in the model string query:

responses.gpt-5.5?temperature=0.2
openai.gpt-4.1?temp=0.7
hf.moonshotai/Kimi-K2.6:novita?temperature=1.0&top_p=0.95

If temperature is omitted, fast-agent does not send a temperature parameter. Only explicit values (for example via ?temperature= / ?temp= or request params/config) are forwarded.
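The forwarding rule above can be sketched as follows: a temperature value appears in the outgoing parameters only when the query sets it explicitly. sampling_params is a hypothetical helper, and the choice to let temperature win over temp when both are present is an assumption of this sketch.

```python
from __future__ import annotations


def sampling_params(query: dict[str, str]) -> dict[str, float]:
    """Return only the sampling values that were set explicitly.

    Hypothetical helper for illustration; not a fast-agent API.
    """
    params: dict[str, float] = {}
    # Assumption: "temperature" takes priority over its "temp" alias.
    raw = query.get("temperature", query.get("temp"))
    if raw is not None:
        params["temperature"] = float(raw)
    return params


print(sampling_params({}))               # nothing is forwarded
print(sampling_params({"temp": "0.7"}))  # explicit value is forwarded
```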

Model presets and model references

For convenience, popular models have built-in model presets such as codex or sonnet. These are documented on the LLM Providers page.

Local model overlays are also available. These are environment-local named model entries that bundle endpoint settings, auth, request defaults, and local metadata under a short token such as qwen-local. See Model Overlays.

In addition, you can define your own namespaced model references in fast-agent.yaml and reference them with exact tokens such as $system.fast.

If a configured model reference cannot be resolved, fast-agent logs a warning and automatically falls back to the next lower-precedence model source.

Default configuration

You can set a default model for your application in your fast-agent.yaml:

default_model: "gpt-5-mini?reasoning=low"

History saving

You can save the conversation history to a file by sending a ***SAVE_HISTORY <filename> message. The saved file can then be reviewed, edited, loaded, served with the prompt-server, or replayed with the playback model.

File Format / MCP Serialization

If the file type is json, fast-agent saves a {"messages": [...]} JSON container. It can contain either MCP PromptMessage objects (legacy) or PromptMessageExtended objects (which preserve tool calls, channels, and so on). fast_agent.load_prompt and prompt-server load either the text or the JSON format directly.

Saved histories are helpful when developing applications to:

Save a conversation for editing
Set up in-context learning
Produce realistic test scenarios that exercise edge conditions with the playback model
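For quick inspection of a saved JSON history outside fast-agent, the container described under File Format can be read with the standard library. This is a minimal sketch that relies only on the top-level "messages" key. load_saved_messages is a hypothetical helper, the per-message fields are deliberately left uninterpreted, and fast_agent.load_prompt remains the supported way to load these files.

```python
import json
from pathlib import Path


def load_saved_messages(path):
    """Return the list stored under "messages" in a saved JSON history.

    Hypothetical helper for illustration; not a fast-agent API.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    messages = data.get("messages") if isinstance(data, dict) else None
    if not isinstance(messages, list):
        raise ValueError('expected a JSON object with a "messages" list')
    return messages
```

Calling len(load_saved_messages("history.json")) on a saved file, where history.json is a placeholder name, gives the number of stored messages before the file is edited or replayed.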