execute_batch_tools is an explicit computer action that lets the agent run several computer actions back-to-back as a single trajectory step, with one screenshot at the end instead of one after every action. It is the explicit counterpart to the implicit “natively batched tool calls” behavior that the v1 agent harnesses already support. Both forms produce the exact same trajectory step (execute_batch_tools with the same tool_calls payload), so they are interchangeable for caching, replay, and billing.
Use the computer tool with action='execute_batch_tools' and pass the actions in the tool_calls parameter.
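As a rough illustration of that call shape, the Python sketch below assembles the input for one computer tool call. The helper name build_batch_call is hypothetical and not part of any Cyberdesk SDK; the name/input wrapper mirrors the canonical form described under Parameters.

```python
import json
from typing import Any


def build_batch_call(actions: list[dict[str, Any]]) -> dict[str, Any]:
    """Assemble one computer tool call that batches `actions` (hypothetical helper)."""
    return {
        "name": "computer",
        "input": {
            "action": "execute_batch_tools",
            # Order matters: the actions run in the order listed here.
            "tool_calls": list(actions),
        },
    }


call = build_batch_call([
    {"action": "left_click", "coordinate": [120, 240]},
    {"action": "key", "text": "tab"},
])
print(json.dumps(call, indent=2))
```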

Why this tool exists

When the agent generates multiple computer tool calls in a single response, Cyberdesk already collapses them into one execute_batch_tools trajectory step automatically. That implicit form continues to work as before. This explicit tool gives you (and the agent) a way to express the same intent inside a single tool call rather than relying on the model emitting parallel tool calls. It is useful when:
  • The model you are running does not parallelize tool calls reliably (some models prefer to emit one tool call per turn).
  • You want the prompt to make batching unambiguous: “use execute_batch_tools to do X, Y, Z” reads more clearly than “batch these in one response”.
  • You want a clear, copy-pasteable shape for grouping actions inside <batched_tool_call> blocks in your workflow instructions.
Both implicit (native) and explicit batching map to the same env.execute_batch_tools(tool_calls) call internally. Existing trajectories that used implicit batching continue to replay unchanged. New trajectories that use the explicit tool are saved and replayed in the same format.

Why batching saves time

Every non-batched computer action automatically attaches a fresh full-screen screenshot to the next model turn. That round-trip — screenshot, base64-encode, send to the model, wait for the next decision — is the dominant cost on tasks where the agent already knows the next several actions from the current screen. Batching a sequence of actions into one trajectory step:
  • Skips intermediate screenshots. Only the last computer action in the batch produces a screenshot (with one exception — see below).
  • Skips intermediate model calls. The model decides on the whole batch once, instead of being re-asked after each action.
  • Stays cache-replayable. A batched trajectory step replays the actions in order on cached runs, with the same speed-up. Loops also expand and re-execute batched bodies correctly.
For form filling, multi-key navigation, hover-then-click sequences, and similar deterministic flows, this can be many seconds faster per turn.
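The saving can be estimated with simple arithmetic: a batch of n actions skips n - 1 intermediate screenshot and model round-trips. The sketch below is only a back-of-the-envelope calculator. The round_trip_s value is something you would measure for your own setup (an assumption, not a published figure), and the estimate ignores the small inter-action pacing delay and any extract_prompt screenshots inside the batch.

```python
def round_trips_skipped(num_actions: int) -> int:
    """Intermediate screenshot/model round-trips avoided by batching."""
    if num_actions < 1:
        raise ValueError("num_actions must be at least 1")
    # Only the last action in the batch still produces a screenshot.
    return num_actions - 1


def estimated_seconds_saved(num_actions: int, round_trip_s: float) -> float:
    """Rough saving; round_trip_s is a value you measure yourself (assumption)."""
    return round_trips_skipped(num_actions) * round_trip_s


print(round_trips_skipped(5))           # 4
print(estimated_seconds_saved(5, 2.0))  # 8.0
```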

When to use it (and when NOT to)

Use execute_batch_tools when every action in the batch is safe to run without re-checking the screen between them. Good candidates:
  • Filling a known form: left_click field → type value → key to advance → type next value → …
  • Hover/move-then-click sequences where the click target is already determined by the current screenshot.
  • Pressing a known sequence of keys (tab tab tab enter).
  • Closing a known modal then clicking the next button.
Do NOT batch when any later action depends on:
  • Whether a click opened or changed something on screen.
  • Whether validation, navigation, a popup, a dialog, a dropdown, or a layout change occurred.
  • Discovering a coordinate or target that is not already clearly known on the most recent screenshot.
If you are unsure, keep the actions separate — a single missed condition can cascade into multiple wrong actions before the agent gets the next screenshot.

Things to avoid

  • Do not put an execute_batch_tools call inside another execute_batch_tools.tool_calls array. Nesting is rejected.
  • Do not put focused_action, start_loop, or end_loop_iteration inside a batch unless the rest of the workflow can tolerate the loop / focused-action firing without intermediate observation. These actions still work inside a batch, but you usually want a fresh screenshot afterwards.
  • Do not rely on execute_batch_tools to “force” the model to plan farther ahead than it should. Batching is for actions that you genuinely know are safe right now.
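The first two rules are easy to check mechanically before a batch is sent. The following is a minimal pre-flight lint, assuming entries arrive in either the wrapper or the shorthand form described under Parameters. The function names are hypothetical, and the real service performs its own validation; this only mirrors the rules listed above.

```python
from typing import Any, Optional

BATCH_ACTION = "execute_batch_tools"
# Actions that work inside a batch but usually deserve a fresh screenshot afterwards.
OBSERVE_AFTER = {"focused_action", "start_loop", "end_loop_iteration"}


def inner_action(entry: dict[str, Any]) -> Optional[str]:
    """Return the action name for a wrapper-form or shorthand-form entry."""
    payload = entry["input"] if isinstance(entry.get("input"), dict) else entry
    return payload.get("action")


def lint_batch(tool_calls: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a proposed tool_calls list."""
    errors: list[str] = []
    warnings: list[str] = []
    for i, entry in enumerate(tool_calls):
        action = inner_action(entry)
        if action == BATCH_ACTION:
            errors.append(f"tool_calls[{i}]: nested {BATCH_ACTION} is rejected")
        elif action in OBSERVE_AFTER:
            warnings.append(
                f"tool_calls[{i}]: {action} usually wants a fresh screenshot afterwards"
            )
    return errors, warnings


# Other parameters of start_loop are omitted here on purpose.
errors, warnings = lint_batch([
    {"action": "left_click", "coordinate": [120, 240]},
    {"name": "computer", "input": {"action": BATCH_ACTION, "tool_calls": []}},
    {"action": "start_loop"},
])
print(errors)
print(warnings)
```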

Parameters

Parameter | Type | Description
action | string | Must be "execute_batch_tools".
tool_calls | array | Ordered list of computer actions to run. Each entry is shaped exactly like a normal computer tool invocation: an object with its own action field and that action’s parameters.
Each entry in tool_calls may use either the canonical wrapper form (which exactly matches what natively batched tool calls produce on the wire):
{ "name": "computer", "input": { "action": "left_click", "coordinate": [120, 240] } }
…or the equivalent shorthand the model can emit directly:
{ "action": "left_click", "coordinate": [120, 240] }
Both are accepted and normalize to the same trajectory shape.
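To make the equivalence concrete, here is one way the normalization could look. This is a sketch under the assumption that the stored shape is the wrapper form; normalize_entry is a hypothetical name, not the service's actual implementation.

```python
from typing import Any


def normalize_entry(entry: dict[str, Any]) -> dict[str, Any]:
    """Convert a wrapper-form or shorthand-form entry to the wrapper form."""
    if entry.get("name") == "computer" and isinstance(entry.get("input"), dict):
        # Already canonical: copy so the caller's object is not shared.
        return {"name": "computer", "input": dict(entry["input"])}
    if "action" in entry:
        # Shorthand: the entry itself is the tool input.
        return {"name": "computer", "input": dict(entry)}
    raise ValueError(f"not a computer tool call: {entry!r}")


wrapper = {"name": "computer", "input": {"action": "left_click", "coordinate": [120, 240]}}
shorthand = {"action": "left_click", "coordinate": [120, 240]}

print(normalize_entry(wrapper) == normalize_entry(shorthand))  # True
```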

Examples

Filling a login form

Use execute_batch_tools with tool_calls=[
  {"action": "left_click", "coordinate": [320, 410]},
  {"action": "type", "text": "{username}"},
  {"action": "key", "text": "tab"},
  {"action": "type", "text": "{$password}"},
  {"action": "left_click", "coordinate": [320, 540]}
]
This runs five actions in a single trajectory step, with a single screenshot at the end (after the final left_click). It is equivalent to the agent emitting five computer tool calls in one response.

Closing a popup, then opening a menu

In a batched tool call, close the cookie banner and open the user menu:
- left_click on the banner's "Accept" button at the known coordinate
- left_click on the avatar in the top-right corner
The agent should produce one execute_batch_tools call (or one batch of native tool calls) covering both clicks, since the avatar coordinate is already known from the current screenshot.

Pressing a sequence of keys

Use execute_batch_tools with tool_calls=[
  {"action": "key", "text": "ctrl+a"},
  {"action": "key", "text": "delete"},
  {"action": "type", "text": "{{new_value}}"}
]

How it behaves at runtime

  • Single trajectory step. The whole batch is recorded as one execute_batch_tools step in the trajectory, with tool_calls as its argument. This is the same shape that natively batched parallel tool calls produce, so caching and replay are identical for both.
  • Single screenshot. Only the last computer action attaches a screenshot, except when an inner action is screenshot with an extract_prompt (those always produce their image because the extraction needs it).
  • Inter-action pacing. A small delay is inserted between consecutive actions to keep dense bursts from overwhelming the target machine. This matches the pacing used by implicit batching and cached replay.
  • Loops. When the body of a loop iteration uses execute_batch_tools, the system expands the batch when replaying the loop, so each remaining iteration reproduces the same actions in the same order.
  • Errors. If one action inside the batch fails, the failure is included in the aggregated tool result alongside the successes. The model sees the full per-action outcome list and can decide how to recover.
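The single-screenshot rule above can be expressed as a small function that reports which positions in a batch attach an image. This is an illustrative model of the documented behavior, not the runtime's code. The name screenshot_indices is hypothetical, and the extract_prompt text in the example is made up.

```python
from typing import Any


def screenshot_indices(tool_calls: list[dict[str, Any]]) -> list[int]:
    """Indices of batch entries that attach a screenshot to the result."""
    if not tool_calls:
        return []
    # The last action in the batch always produces the closing screenshot.
    keep = {len(tool_calls) - 1}
    for i, call in enumerate(tool_calls):
        payload = call["input"] if isinstance(call.get("input"), dict) else call
        # A screenshot action with an extract_prompt keeps its image,
        # because the extraction needs it.
        if payload.get("action") == "screenshot" and payload.get("extract_prompt"):
            keep.add(i)
    return sorted(keep)


batch = [
    {"action": "left_click", "coordinate": [320, 410]},
    {"action": "screenshot", "extract_prompt": "Read the order total"},
    {"action": "key", "text": "tab"},
]
print(screenshot_indices(batch))  # [1, 2]
```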

Backward compatibility

  • Existing trajectories that used the implicit native-batching path (multiple computer tool calls in one response) continue to load and replay unchanged. Their stored step is already named execute_batch_tools, so nothing about replay or billing changes.
  • New runs may freely mix both forms — the agent can natively batch in one turn and explicitly call execute_batch_tools in another, and both turns are persisted in the same trajectory format.
  • v0 harnesses do not expose this tool. It is available on the v1 main and v1 focused agents only.

Related

  • Async extraction patterns — how extract_prompt interacts with batched and run-scoped extractions.
  • Trajectories — how trajectory steps are recorded and replayed.
  • Usage-based billing — how batched steps are counted (execute_batch_tools is billed by the number of inner actions).