agent.run()

Run an LLM agent that can use a sandboxed container as a tool.

The agent receives a prompt, calls an LLM, and iteratively invokes built-in tools backed by the container image you provide. Results stream back in real time.

```typescript
const result = await agent.run(options);
```

Options

Required

| Field | Type | Description |
| --- | --- | --- |
| name | string | Unique task/agent name |
| prompt | string | Initial user message or instruction |
| model | string | Model specifier: provider/model-name (see Providers) |
| image | string | Container image for the sandbox (e.g., "alpine", "ubuntu:22.04") |

Optional

| Field | Type | Description |
| --- | --- | --- |
| mounts | object | Volume mounts: { "name": volumeHandle } |
| outputVolumePath | string | Path inside the container to write result.json |
| llm | object | LLM generation overrides (see LLM Config) |
| thinking | object | Extended thinking config (see Thinking) |
| safety | object | Safety filter overrides (see Safety) |
| contextGuard | object | Context window management (see Context Guard) |
| limits | object | Hard turn/token limits (see Limits) |
| context | object | Pre-inject prior task outputs into session (see Context) |
| validation | object | Output validation via Expr expressions (see Validation) |
| outputSchema | object | JSON schema for structured output (see Output Schema) |
| toolTimeout | string | Per-tool timeout duration, e.g. "60s", "5m" (see Tool Timeout) |
| tools | array | Agent + task tools callable by the LLM (see Tools) |
| memory | object | Cross-run memory (see Agent Memory) |

Providers

The model field uses the format provider/model-name. Values after the first / are passed verbatim to the provider.

| Provider | model prefix | Auth env var | Notes |
| --- | --- | --- | --- |
| anthropic | anthropic/... | ANTHROPIC_API_KEY | Direct Anthropic API |
| openai | openai/... | OPENAI_API_KEY | Direct OpenAI API |
| openrouter | openrouter/... | OPENROUTER_API_KEY | Proxies 200+ models |
| ollama | ollama/... | (none) | Local Ollama at localhost:11434 |

API key resolution order (first match wins):

  1. Pipeline-scoped secret agent/<provider>
  2. Global-scoped secret agent/<provider>
  3. Environment variable {PROVIDER}_API_KEY
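As a sketch, the lookup amounts to a chain of fallbacks. The function and the secret-store key format below are illustrative assumptions, not part of the runtime API:

```typescript
// Illustrative sketch of the resolution order above; the real secret store
// is not exposed this way. `secrets` maps "scope:key" pairs to values.
function resolveApiKey(
  provider: string,
  secrets: Map<string, string>,
  env: Record<string, string | undefined>,
): string | undefined {
  return (
    secrets.get(`pipeline:agent/${provider}`) ?? // 1. pipeline-scoped secret
    secrets.get(`global:agent/${provider}`) ??   // 2. global-scoped secret
    env[`${provider.toUpperCase()}_API_KEY`]     // 3. environment variable
  );
}
```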

LLM Config

Fine-tune generation parameters. All fields are optional; omitting a field uses the provider's built-in default.

```yaml
llm:
  temperature: 0.2 # float, 0.0–2.0 (default: provider default, typically 1.0)
  max_tokens: 8192 # int, max output tokens (default: provider default)
```

```typescript
llm: {
  temperature?: number;  // 0.0–2.0; lower = more deterministic
  max_tokens?: number;   // caps output length; provider default if omitted
}
```

Note for Anthropic with thinking: max_tokens must be greater than the thinking budget. If you set a thinking.budget without max_tokens, the runtime defaults max_tokens to 8192.
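That rule can be sketched as follows (a hedged illustration of the documented behaviour, not the runtime's actual code):

```typescript
// Sketch: with a thinking budget present, max_tokens defaults to 8192 and
// must exceed the budget. The error message wording is an assumption.
function effectiveMaxTokens(
  maxTokens: number | undefined,
  thinkingBudget?: number,
): number | undefined {
  if (thinkingBudget === undefined) return maxTokens; // no thinking: pass through
  const max = maxTokens ?? 8192; // default applied when thinking is enabled
  if (max <= thinkingBudget) {
    throw new Error(`max_tokens (${max}) must exceed thinking.budget (${thinkingBudget})`);
  }
  return max;
}
```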

Thinking

Enable extended reasoning / chain-of-thought for supported models. The extra thinking tokens are billed but not included in result.text.

```yaml
thinking:
  budget: 10000 # int >= 1024, required when this block is present
  level: medium # string, Gemini only; omit for Anthropic
```

```typescript
thinking: {
  budget: number;  // thinking token budget (minimum 1024)
  level?: "low" | "medium" | "high" | "minimal"; // Gemini only
}
```
| Field | Provider support | Notes |
| --- | --- | --- |
| budget | All | Anthropic: maps to ThinkingBudgetTokens |
| level | Gemini only | Ignored for Anthropic; controls depth of reasoning |

Safety

Override per-category safety filters. Keys are harm category names (case-insensitive); values are threshold names.

```yaml
safety:
  harassment: block_none
  dangerous_content: block_none
```

```typescript
safety?: {
  [category: string]: string;
};
```

Category names

| YAML key | Maps to |
| --- | --- |
| harassment | HARM_CATEGORY_HARASSMENT |
| hate_speech | HARM_CATEGORY_HATE_SPEECH |
| sexually_explicit | HARM_CATEGORY_SEXUALLY_EXPLICIT |
| dangerous_content | HARM_CATEGORY_DANGEROUS_CONTENT |
| civic_integrity | HARM_CATEGORY_CIVIC_INTEGRITY |

Threshold values

| YAML value | Effect |
| --- | --- |
| block_none | Allow all content |
| block_only_high | Block only high-confidence harm |
| block_medium_and_above | Block medium + high (provider default) |
| block_low_and_above | Block low, medium, and high |
| off | Disable the filter entirely |

Safety settings are applied to Gemini and OpenAI-compatible models. Anthropic manages safety at the API level and ignores this field.

Context Guard

Automatically manage the context window to prevent token-limit errors on long agent runs.

```yaml
context_guard:
  strategy: threshold # "threshold" or "sliding_window"
  max_tokens: 100000 # for threshold strategy (default: 128000)
  max_turns: 30 # for sliding_window strategy (default: 30)
```

```typescript
contextGuard?: {
  strategy: "threshold" | "sliding_window";
  max_tokens?: number;  // threshold: evict history when total exceeds this
  max_turns?: number;   // sliding_window: keep only the last N turns
};
```
| Strategy | max_tokens default | max_turns default | Behaviour |
| --- | --- | --- | --- |
| threshold | 128000 | N/A | Truncates history once total tokens exceed the limit |
| sliding_window | N/A | 30 | Keeps only the most-recent N conversation turns |

Strategy inference: If contextGuard is provided with only max_turns (no strategy), the strategy automatically becomes "sliding_window". If only max_tokens is provided, the strategy becomes "threshold". An error is returned if an invalid strategy is explicitly specified.

Omitting contextGuard entirely disables context management; the full conversation history is sent to the model on every turn.
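The inference rules can be sketched like this (illustrative; the behaviour when both limits are given without a strategy is an assumption):

```typescript
type ContextGuardConfig = {
  strategy?: string;
  max_tokens?: number;
  max_turns?: number;
};

// Sketch of the strategy-inference rules described above.
function inferStrategy(cfg: ContextGuardConfig): "threshold" | "sliding_window" {
  if (cfg.strategy === "threshold" || cfg.strategy === "sliding_window") {
    return cfg.strategy; // explicit, valid strategy wins
  }
  if (cfg.strategy !== undefined) {
    throw new Error(`invalid context guard strategy: ${cfg.strategy}`);
  }
  // Only max_turns present -> sliding_window; only max_tokens -> threshold.
  if (cfg.max_turns !== undefined && cfg.max_tokens === undefined) {
    return "sliding_window";
  }
  return "threshold";
}
```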

Progressive Persistence

Agent runs write results incrementally as the agent executes, not just when finished. This ensures visibility into progress and durability against crashes:

  • Usage metrics (token counts, LLM requests, tool calls): written to the task storage key after every LLM turn. The UI task tree automatically polls and displays live token/tool counts while the agent is running.
  • Audit log: each event is appended to a dedicated storage namespace (/agent-audit/{runID}/...) as it occurs, before the next LLM call. Events are not returned in the web UI (they're for post-run analysis), but they are stored immediately to prevent data loss if the agent crashes.

This behavior is transparent to pipeline code. The result returned by agent.run() still includes the complete auditLog array in memory.

Built-in Tools

Every agent run has these tools available automatically — no configuration required.

| Tool | Description |
| --- | --- |
| run_script | Run a multi-line shell script inside the sandbox container |
| read_file | Read file contents with optional line offset and limit (default: 2,000 lines) |
| grep | Search file contents with regex patterns. Supports glob_filter and max_results |
| glob | Find files by name pattern (e.g. **/*.go). Returns matching paths sorted |
| write_file | Create or overwrite a file. Reads previous content first (truncated to 4 KB) for agent context |
| list_tasks | List all tasks in the current pipeline run with their status and timing |
| get_task_result | Fetch the stdout, stderr, and exit code for a specific task by name |

list_tasks is always pre-fetched and injected into the session before the agent's first turn, so the agent knows the run state immediately without spending a tool-call round-trip on orientation.

get_task_result supports fuzzy name matching — a partial or approximate task name is fine. Byte-length truncation is applied when the output is large (default 4096 bytes; override with max_bytes in the tool call or via context.max_bytes).
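The truncation described above can be approximated as follows (a sketch; the marker text appended after the cut is an assumption):

```typescript
// Byte-accurate truncation sketch: cut at maxBytes, then decode what remains.
function truncateBytes(output: string, maxBytes = 4096): string {
  const bytes = new TextEncoder().encode(output);
  if (bytes.length <= maxBytes) return output; // fits: return unchanged
  const cut = new TextDecoder("utf-8").decode(bytes.subarray(0, maxBytes));
  return cut + "\n[truncated]"; // marker text is illustrative
}
```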

Limits

Hard limits that stop agent execution to prevent runaway agents.

```yaml
limits:
  max_turns: 50 # max LLM round-trips (default: 50)
  max_total_tokens: 0 # total token budget; 0 = unlimited
```

```typescript
limits?: {
  max_turns?: number;        // default: 50
  max_total_tokens?: number; // 0 or omitted = unlimited
};
```

A warning is emitted 2 turns before the limit is reached. When the limit is hit, the agent is stopped and result.status is "limit_exceeded".
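A sketch of that turn accounting (illustrative, not the runtime's code):

```typescript
// Warn two turns before max_turns; stop once it is reached.
function checkTurnLimit(turn: number, maxTurns: number): "ok" | "warn" | "limit_exceeded" {
  if (turn >= maxTurns) return "limit_exceeded"; // agent is stopped here
  if (turn >= maxTurns - 2) return "warn";       // warning window
  return "ok";
}
```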

Validation

Validate the agent's final text output using an Expr boolean expression. If the expression returns false, a follow-up prompt is sent asking the model to correct its output.

```yaml
validation:
  expr: 'text != "" && text contains "summary"'
  prompt: >-
    Output valid JSON with a "summary" field and an "issues" array.
```

```typescript
validation?: {
  expr: string;    // Expr boolean expression; env: { text, status }
  prompt?: string; // custom follow-up message on failure
};
```

The expression environment provides text (the agent's output) and status ("success"). If prompt is omitted, a generic follow-up is used.
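For clarity, the example Expr expression corresponds to this TypeScript predicate (the real check is evaluated by the Expr engine, not by JavaScript):

```typescript
// TypeScript analogue of: text != "" && text contains "summary"
function passesValidation(text: string): boolean {
  return text !== "" && text.includes("summary");
}
```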

Output Schema

Request structured JSON output from the agent. The schema is included in the system prompt and validated after the agent finishes. If the output does not conform, a follow-up turn is sent asking the agent to fix its response.

```yaml
output_schema:
  summary: string
  issues[]:
    severity: critical|high|medium|low
    description: string
    file?: string
    line?: int
```

Compact DSL

| Syntax | Meaning |
| --- | --- |
| "string" | Required string field |
| "int" | Required integer field |
| "number" | Required number (float) field |
| "bool" | Required boolean field |
| "a\|b\|c" | Required string enum |
| "field?" | Optional field (suffix ? on key) |
| "field[]" | Required array (suffix [] on key) |
| { nested: ... } | Nested object |

Schema validation checks: JSON parse, required fields, types, enums, array items, and nested objects. If validation fails, the error message is sent to the agent so it can retry.
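As an illustration, a hand-written validator equivalent to the example schema above would look roughly like this (the runtime derives such checks from the DSL automatically; the error-message format is an assumption):

```typescript
const SEVERITIES = ["critical", "high", "medium", "low"];

// Returns null when the output conforms, or a description of the first error.
function validateReview(out: any): string | null {
  if (typeof out?.summary !== "string") return "summary: required string";
  if (!Array.isArray(out?.issues)) return "issues: required array";
  for (let i = 0; i < out.issues.length; i++) {
    const issue = out.issues[i];
    if (!SEVERITIES.includes(issue?.severity)) {
      return `issues[${i}].severity: expected one of ${SEVERITIES.join("|")}`;
    }
    if (typeof issue?.description !== "string") {
      return `issues[${i}].description: required string`;
    }
    if (issue.file !== undefined && typeof issue.file !== "string") {
      return `issues[${i}].file: expected string`;
    }
    if (issue.line !== undefined && !Number.isInteger(issue.line)) {
      return `issues[${i}].line: expected int`;
    }
  }
  return null;
}
```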

Tool Timeout

Set a per-tool execution timeout for all sandbox-backed tools. If a tool exceeds the timeout, an error is returned to the agent so it can adjust its approach (e.g., break a large operation into smaller steps).

```yaml
tool_timeout: "120s" # overrides the default for all tools, including run_script
```

```typescript
toolTimeout?: string; // Go duration format: "60s", "5m", etc.
```

| Tool | Default timeout |
| --- | --- |
| run_script | 5 minutes (300s) |
| All other tools | 1 minute (60s) |

When tool_timeout is set, it overrides the default for all tools (including run_script). If you need long-running scripts but fast tool timeouts, prefer leaving tool_timeout unset and relying on the defaults.
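The default/override rule can be sketched as follows (illustrative; the parser below covers only the duration forms shown in this section):

```typescript
// Minimal Go-duration parser for the "60s" / "5m" / "2h" forms used here.
function parseGoDuration(d: string): number {
  const m = /^(\d+)(s|m|h)$/.exec(d);
  if (!m) throw new Error(`unsupported duration: ${d}`);
  const n = Number(m[1]);
  return m[2] === "s" ? n : m[2] === "m" ? n * 60 : n * 3600;
}

// tool_timeout, when set, overrides the default for every tool.
function effectiveTimeoutSeconds(tool: string, toolTimeout?: string): number {
  if (toolTimeout !== undefined) return parseGoDuration(toolTimeout);
  return tool === "run_script" ? 300 : 60; // built-in defaults
}
```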

Context

Pre-fetch selected task outputs into the agent's session history before the first turn. This saves the agent from calling get_task_result explicitly for outputs it is likely to need.

```yaml
context:
  max_bytes: 8192 # max bytes per field; default 4096
  tasks:
    - name: build # fuzzy-matched against task names in the run
      field: stdout # "stdout" | "stderr" | "both" (default: "both")
    - name: lint
```

```typescript
context?: {
  max_bytes?: number;           // truncation limit per field (default 4096)
  tasks?: Array<{
    name: string;               // task name (fuzzy matched)
    field?: "stdout" | "stderr" | "both"; // which field(s) to include
  }>;
};
```

Each entry is injected as a synthetic get_task_result tool-call/response pair. The agent sees the output as if it had already called the tool, and the audit log records these under type: "pre_context".

Concourse YAML — Agent Step Ergonomics

When writing agent steps in Concourse-compatible YAML pipelines, two defaults reduce boilerplate.

Auto-output volume

If you do not declare a config.outputs block, the runtime automatically creates an output volume named after the agent. This volume holds the result.json written at the end of the agent run and is registered in the job's known mounts so subsequent steps can reference it.

```yaml
# Before — explicit output declaration required
- agent: code-reviewer
  prompt: Review the code
  model: openrouter/google/gemini-2.0-flash
  config:
    platform: linux
    image: alpine
    outputs:
      - name: code-reviewer # redundant — same as agent name

# After — outputs block can be omitted entirely
- agent: code-reviewer
  prompt: Review the code
  model: openrouter/google/gemini-2.0-flash
  config:
    platform: linux
    image: alpine
```

Auto-inputs from context.tasks

If context.tasks references a prior agent by name and that agent produced an auto-named output volume (see above), the volume is automatically mounted as an input. There is no need to list it in config.inputs.

```yaml
# Before — explicit inputs + context.tasks both required
- agent: summarizer
  prompt: Summarize the findings
  model: openrouter/google/gemini-2.0-flash
  config:
    platform: linux
    image: alpine
    inputs:
      - name: code-reviewer # duplicate: also listed in context.tasks
  context:
    tasks:
      - name: code-reviewer

# After — config.inputs can be omitted
- agent: summarizer
  prompt: Summarize the findings
  model: openrouter/google/gemini-2.0-flash
  config:
    platform: linux
    image: alpine
  context:
    tasks:
      - name: code-reviewer
```

Loading config from a URI

Both task steps and agent steps accept a uri field as an alternative to file. The uri field supports three schemes:

| Scheme | Description |
| --- | --- |
| file:// | Load from a volume mount (same path format as file) |
| http:// | Fetch config over HTTP |
| https:// | Fetch config over HTTPS |

file and uri are mutually exclusive — specifying both is a validation error.

file:// URIs

A file:// URI resolves against known volume mounts, using the same mountname/relative/path format as the file field. Path traversal with .. is not allowed.

```yaml
# These two are equivalent:
- task: my-task
  file: repo/tasks/build.yml

- task: my-task
  uri: "file://repo/tasks/build.yml"
```

http:// and https:// URIs

HTTP URIs fetch the YAML config from a remote server. The server must respond with a 2xx status code; any other response is treated as an error.

```yaml
- task: my-task
  uri: "https://example.com/tasks/build.yml"

- agent: code-reviewer
  uri: "https://example.com/agents/reviewer.yml"
  model: openrouter/google/gemini-2.0-flash
```

The fetched content is parsed as YAML and merged with any inline fields on the step (inline values override fetched values, prompts are concatenated).
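That merge rule can be sketched as follows (illustrative; the separator used when concatenating prompts is an assumption):

```typescript
// Inline values win on conflicts; prompts are concatenated instead.
function mergeStepConfig(
  fetched: Record<string, unknown>,
  inline: Record<string, unknown>,
): Record<string, unknown> {
  const merged = { ...fetched, ...inline };
  if (typeof fetched.prompt === "string" && typeof inline.prompt === "string") {
    merged.prompt = `${fetched.prompt}\n\n${inline.prompt}`;
  }
  return merged;
}
```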

Tools

The tools array lets you give an agent additional capabilities beyond the built-in sandbox tools. Each entry is either an agent tool (LLM sub-agent) or a task tool (container command), distinguished by field presence.

Agent tools

An entry with an agent: field creates an LLM sub-agent that the parent can call as a tool. The sub-agent gets its own prompt, model, and optionally its own container image.

```yaml
- agent: orchestrator
  file: repo/agents/orchestrator.yml
  config:
    platform: linux
    image: alpine/git
    inputs:
      - name: repo
      - name: diff
  tools:
    - agent: code-quality-reviewer
      file: repo/agents/code-quality.yml
    - agent: security-reviewer
      file: repo/agents/security.yml
```

| Field | Type | Description |
| --- | --- | --- |
| agent | string | Required. Tool name the parent LLM uses to call this agent |
| file | string | Load prompt, model, and config from a YAML file on a volume |
| uri | string | Load config from a URI (file://, http://, https://) |
| prompt | string | Agent instruction (concatenated with the file:/uri: prompt if both exist) |
| model | string | Model specifier; defaults to the parent's model if omitted |

Shared-container (agent image matches the parent's or is omitted): the sub-agent runs inside the parent's ADK session, sharing the same sandbox, mounts, and tool set.

Own-container (agent declares a different config.image): a separate sandbox is spun up. The agent runs to completion and returns its final text to the parent. Results are persisted at a nested storage path:

jobs/{job}/N/agent/{parent}/sub-agents/{tool-name}/run

Task tools

An entry with a task: field creates a container command the LLM can call as a tool. The command runs in the parent's sandbox and returns stdout/stderr/exit code.

```yaml
tools:
  - task: run-linter
    description: "Run the project linter and return results"
    config:
      run:
        path: golangci-lint
        args: ["run", "./..."]
      env:
        GOPROXY: "off"
  - task: post-comment
    description: "Post a GitHub PR comment"
    file: repo/tasks/post-comment.yml
```

| Field | Type | Description |
| --- | --- | --- |
| task | string | Required. Tool name the parent LLM uses to call this task |
| description | string | Description shown to the LLM (defaults to "Run task: {name}") |
| config | object | Task config with run.path, run.args, env, image |
| file | string | Load task config from a YAML file on a volume |
| uri | string | Load task config from a URI (file://, http://, https://) |

When the LLM calls a task tool, it can pass a request string that is set as the TOOL_REQUEST environment variable, allowing dynamic input.
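From the task's point of view the contract is a single environment variable; a sketch (a real task would read process.env directly):

```typescript
// The LLM's request string, or "" when no request was passed.
function readToolRequest(env: Record<string, string | undefined>): string {
  return env["TOOL_REQUEST"] ?? "";
}
```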

YAML example — pr-review pipeline

Below is a simplified version of the multi-reviewer PR analysis pipeline from examples/agent/pr-review.yml:

```yaml
- agent: final-review
  file: repo/examples/agent/agents/final-reviewer.yml
  tools:
    - agent: code-quality-reviewer
      file: repo/examples/agent/agents/specialist-reviewer.yml
      prompt: "Review for code quality — readability, naming, structure, DRY violations."
    - agent: security-reviewer
      file: repo/examples/agent/agents/specialist-reviewer.yml
      prompt: "Audit for security issues — injection, authentication, data exposure, OWASP Top 10."
    - agent: maintainability-reviewer
      file: repo/examples/agent/agents/specialist-reviewer.yml
      prompt: "Evaluate for maintainability — test coverage, cyclomatic complexity, documentation."
  validation:
    expr: 'text != "" && text contains "summary"'
    prompt: >-
      You must output valid JSON containing a "summary" field and an
      "issues" array. Do not include prose outside the JSON object.
```

All three specialist tools share a single specialist-reviewer.yml template, differentiated by the inline prompt field. The orchestrator's prompt (in final-reviewer.yml) instructs it to call all three and synthesize their findings into JSON.

Return Value

```typescript
{
  text: string; // final agent response text
  status: string; // "success", or "limit_exceeded" when a hard limit stopped the run
  toolCalls: Array<{
    name: string;
    args?: Record<string, unknown>;
    result?: Record<string, unknown>;
    exitCode?: number;
  }>;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
    llmRequests: number;
    toolCallCount: number;
  };
  auditLog: Array<AuditEvent>; // full ordered conversation log (see below)
}
```

Audit Log

The auditLog field contains every event that occurred during the agent run in chronological order. It is stored in the pipeline run's storage payload alongside stdout, toolCalls, and usage for offline inspection.

```typescript
interface AuditEvent {
  type:
    | "pre_context"
    | "user_message"
    | "tool_call"
    | "tool_response"
    | "model_text"
    | "model_final";
  timestamp?: string; // ISO 8601 UTC
  invocationId?: string; // groups events within one LLM turn
  author?: string; // agent name or "user"
  text?: string; // model text or user prompt
  toolName?: string; // for tool_call / tool_response / pre_context
  toolCallId?: string; // pairs a tool_call with its tool_response
  toolArgs?: Record<string, unknown>;
  toolResult?: Record<string, unknown>;
  usage?: { // per-event token counts (model events only)
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
}
```

| type | When emitted |
| --- | --- |
| pre_context | Synthetic tool result injected before the first turn (list_tasks or a context.tasks entry) |
| user_message | The initial prompt sent by the pipeline |
| tool_call | The model requests a tool invocation |
| tool_response | The tool result is returned to the model |
| model_text | An intermediate text chunk from the model |
| model_final | The concluding model response (last turn) |
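One common way to consume the log is to pair each tool_call with its tool_response via toolCallId; a sketch using a trimmed-down event type:

```typescript
type ToolEvent = {
  type: string;
  toolName?: string;
  toolCallId?: string;
  toolResult?: Record<string, unknown>;
};

// Pair each tool_call with the tool_response sharing its toolCallId.
function pairToolEvents(log: ToolEvent[]): Array<{ call: ToolEvent; response?: ToolEvent }> {
  const responses = new Map<string, ToolEvent>();
  for (const e of log) {
    if (e.type === "tool_response" && e.toolCallId) responses.set(e.toolCallId, e);
  }
  return log
    .filter((e) => e.type === "tool_call")
    .map((call) => ({
      call,
      response: call.toolCallId ? responses.get(call.toolCallId) : undefined,
    }));
}
```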

Callbacks

Agent runs support optional callbacks for streaming output, incremental usage updates, and audit event notifications. All callbacks are optional.

```typescript
await agent.run({
  name: "agent",
  prompt: "Build the project",
  model: "anthropic/claude-3-5-sonnet-20241022",
  image: "golang:1.22",
  onOutput: (stream, chunk) => {
    // stream is "stdout" or "stderr"
    console.log(`${stream}: ${chunk}`);
  },
  onUsage: (usage) => {
    // called after every LLM turn or tool invocation that changes token counts
    console.log(`Tokens: ${usage.totalTokens}, Requests: ${usage.llmRequests}`);
  },
  onAuditEvent: (event) => {
    // called once per audit event (model_text, tool_call, etc.)
    console.log(`Event: ${event.type}`);
  },
});
```

| Callback | Type | When invoked |
| --- | --- | --- |
| onOutput | (stream: "stdout" \| "stderr", chunk: string) => void | Each stdout/stderr chunk from sandbox tool calls |
| onUsage | (usage: UsageMetrics) => void | After every token count or tool call count change |
| onAuditEvent | (event: AuditEvent) => void | After every audit event (model turn, tool result) |

Type definitions:

```typescript
interface UsageMetrics {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  llmRequests: number;
  toolCallCount: number;
}
```

The onUsage and onAuditEvent callbacks power the progressive persistence feature — they are automatically wired to incremental storage writes when called from the pipeline runner. Custom callbacks can observe the same events in real time.

Examples

Minimal agent

```typescript
const result = await agent.run({
  name: "summarize",
  prompt: "Summarize the files in /workspace",
  model: "openrouter/google/gemini-2.0-flash",
  image: "alpine",
});

console.log(result.text);
```

Agent with volumes and LLM tuning

```typescript
const repo = await volumes.create("repo", 500);

await runtime.run({
  name: "clone",
  image: "alpine/git",
  command: {
    path: "git",
    args: ["clone", "https://github.com/example/app", "/repo"],
  },
  mounts: { "/repo": repo },
});

const result = await agent.run({
  name: "review",
  prompt: "Review the code for security issues and summarize findings.",
  model: "anthropic/claude-3-5-sonnet-20241022",
  image: "alpine",
  mounts: { "repo": repo },
  llm: { temperature: 0.1, max_tokens: 4096 },
  thinking: { budget: 2048 },
  safety: { dangerous_content: "block_only_high" },
  contextGuard: { strategy: "threshold", max_tokens: 80000 },
});
```

Streaming output callback

```typescript
await agent.run({
  name: "agent",
  prompt: "Run the test suite and report failures.",
  model: "openrouter/anthropic/claude-3-5-sonnet",
  image: "golang:1.22",
  mounts: { "src": srcVolume },
  onOutput: (stream, chunk) => {
    // stream is "stdout" or "stderr"
    process.stdout.write(chunk);
  },
});
```