What is Grab Reference Image?
`grab_reference_image` lets the agent retrieve a prompt image at runtime by its filename (the tail only, not the full path). This is useful for consulting reference visuals during analysis.
This returns an image for analysis only. When focusing on small regions, prefer taking a fresh full screenshot and using `zoom_bounding_box`; include a small margin in the region so labels and edges aren’t cropped. Before generating coordinate actions, the agent should take a fresh full screenshot (without zoom) and base coordinates on that.
Full‑Page Screenshot Requirements
When providing reference images, always capture the entire screen that the agent sees.
- Use full‑page screenshots only (no crops or scaled‑down images).
- Match the agent’s viewport exactly: same aspect ratio and pixel resolution.
- Cropped or smaller images often lead to misaligned coordinates and targeting errors.
- If you need to focus on a region, instruct the agent to take a fresh full screenshot and use zoom features for analysis rather than providing a cropped reference image.
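One way to check the requirements above before attaching a reference image is to compare its pixel dimensions with the agent's viewport. The following is a minimal sketch, not part of the tool: the helper names `png_size` and `matches_viewport` are hypothetical, the viewport size is something you supply yourself, and only PNG files are handled.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def matches_viewport(data: bytes, viewport_width: int, viewport_height: int) -> bool:
    """Return True when the image has exactly the viewport's pixel resolution."""
    # Hypothetical helper: the viewport dimensions are assumed to be known to the caller.
    return png_size(data) == (viewport_width, viewport_height)
```

An exact size match also guarantees the same aspect ratio; a cropped or scaled-down screenshot fails the check.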
Parameters
- text (string, required): The tail filename of the prompt image (e.g., `logo.png`). Include the file extension (e.g., `.png`, `.jpg`).
Behavior
- Returns the image as base64 (PNG or JPEG). Other formats are not supported.
- If multiple prompt images share the same tail filename, the call returns an error. Use unique names.
- If the filename is not found, the call returns an error.
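The lookup rules above can be illustrated with a small sketch. This is not the tool's actual implementation: the function `grab_reference_image_sketch`, the `prompt_images` mapping of paths to raw bytes, the exception types, and the accepted extension list (including `.jpeg`) are all assumptions made for illustration.

```python
import base64
from pathlib import PurePosixPath

# Assumed extension list; the documentation only states PNG and JPEG are supported.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def grab_reference_image_sketch(text: str, prompt_images: dict[str, bytes]) -> str:
    """Hypothetical stand-in: resolve a prompt image by tail filename, return base64."""
    if PurePosixPath(text).suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported or missing file extension in {text!r}")

    # Compare only the final path component (the tail) of each prompt image.
    matches = [
        data
        for path, data in prompt_images.items()
        if PurePosixPath(path).name == text
    ]
    if not matches:
        raise LookupError(f"no prompt image named {text!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} prompt images are named {text!r}; use unique names")

    return base64.b64encode(matches[0]).decode("ascii")
```

With two prompt images stored under different folders but both named `logo.png`, the call fails instead of picking one, which is why unique filenames matter.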
Best Practices
- Keep filenames unique in a prompt. If you import duplicates in the editor, rename them.
- Place the image near the instructions it relates to.
- Use full‑page screenshots; avoid cropped images. When you need to focus on a region, rely on zoom features after taking a fresh full screenshot.
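When zooming into a region of a fresh full screenshot, the small margin recommended earlier can be computed by padding the region and clamping it to the screen. This is a sketch under stated assumptions: `expand_box` is a hypothetical helper, the `(x1, y1, x2, y2)` pixel format is assumed, and it does not reflect the actual parameters of `zoom_bounding_box`.

```python
def expand_box(
    box: tuple[int, int, int, int],
    margin: int,
    screen_width: int,
    screen_height: int,
) -> tuple[int, int, int, int]:
    """Pad an (x1, y1, x2, y2) region by `margin` pixels, clamped to the screen."""
    x1, y1, x2, y2 = box
    return (
        max(0, x1 - margin),              # never extend past the left edge
        max(0, y1 - margin),              # never extend past the top edge
        min(screen_width, x2 + margin),   # never extend past the right edge
        min(screen_height, y2 + margin),  # never extend past the bottom edge
    )
```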
Example Prompt Snippets
Related
- See Save Screenshot for capturing run attachments
- See Prompting Overview for where to add and reference images