Extract Prompt is a vision-based extraction tool that you attach to a screenshot action. It uses AI vision models to read, interpret, and extract data from the screenshot. Unlike clipboard-based extraction, which requires selectable text, Extract Prompt can extract data from any visible content, including images, PDFs, charts, tables, and complex layouts.
In your prompts, use the screenshot action with the extract_prompt parameter to trigger this feature.
Trajectories & Extract Prompt: Extract prompts work seamlessly with Cyberdesk’s trajectory caching system. When a trajectory is replayed, extract prompts re-execute to capture fresh data from the current screen, ensuring dynamic data extraction even in cached workflows.
Cyberdesk offers three complementary methods for data extraction:
Clipboard Extraction (copy_to_clipboard)
- Direct copy via Ctrl+C, deterministic and instant
- Best for: Selectable text fields (IDs, numbers, dates)
- Limitation: Only works with copyable text
Focused Action (focused_action)
- Dynamic decision-making with vision-based extraction
- Best for: Runtime decisions, conditional logic, setting runtime variables
- Use when: You need to make decisions during workflow execution
Extract Prompt (This Tool)
- Pure vision-based extraction with flexible async processing
- Best for: Large-scale data extraction, non-copyable content, parallel processing
- Use when: You are extracting data for output that is not needed for navigation decisions
Extract Prompt excels at extracting data from:
- Non-selectable or non-copyable text
- Images, PDFs, scanned documents
- Charts, graphs, and visualizations
- Complex tables and multi-column layouts
- Dynamic content that changes between runs
How It Works
Basic Flow
- Agent takes a screenshot (optionally with zoom)
- Screenshot is sent to a strong vision model with your extraction instruction
- Vision model reads the screen and extracts the requested data
- Extracted text is returned as the tool result
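The four steps above can be sketched as a single function. This is only an illustration of the flow, not Cyberdesk's implementation: `run_extraction`, `fake_screenshot`, and `fake_vision_model` are hypothetical names, and both the screen and the model are stubbed so the example runs offline.

```python
from typing import Callable, Optional, Tuple

# Hypothetical type aliases; none of these names come from Cyberdesk.
BoundingBox = Tuple[int, int, int, int]
Screenshotter = Callable[[Optional[BoundingBox]], bytes]
VisionModel = Callable[[bytes, str], str]


def run_extraction(
    take_screenshot: Screenshotter,
    vision_model: VisionModel,
    extract_prompt: str,
    zoom: Optional[BoundingBox] = None,
) -> str:
    """Mirror the basic flow: screenshot -> vision model -> extracted text."""
    image = take_screenshot(zoom)                 # step 1: capture (optionally zoomed)
    answer = vision_model(image, extract_prompt)  # steps 2-3: model reads the screen
    return answer                                 # step 4: text becomes the tool result


# Stubs standing in for the real screen and the real model.
def fake_screenshot(zoom: Optional[BoundingBox]) -> bytes:
    return b"fake-png-bytes"


def fake_vision_model(image: bytes, prompt: str) -> str:
    return '{"customer_name": "Ada Lovelace", "customer_id": "C-1001"}'


result = run_extraction(
    fake_screenshot, fake_vision_model, "Extract customer name and ID as JSON"
)
print(result)
```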
With Async Processing
When using process_async, extractions can run in the background:
- Batch scope: Extractions run in parallel within current tool call batch
- Run scope: Extractions run for entire workflow lifetime, only awaited at the end
The process_async Parameter
The process_async parameter controls when and how extractions are processed:
Synchronous (Default)
screenshot with extract_prompt="Extract customer name and ID as JSON"
- When: process_async not set, false, or None
- Behavior: Blocks until extraction completes
- Use for: Single extractions where you need immediate results
Batch-Scoped Async
screenshot with extract_prompt="Extract order data" and process_async="batch"
- When: process_async=true or process_async="batch"
- Behavior: Extraction runs in parallel with other batch extractions, all complete before next agent step
- Use for: Scrolling through lists, extracting from multiple views in sequence
- Special: Extraction agent has access to the upsert_runtime_values tool (see below)
Run-Scoped Async
screenshot with extract_prompt="Extract all product catalog data" and process_async="run"
- When: process_async="run"
- Behavior: Extraction runs completely in background for entire run, only awaited at final output
- Use for: Large extractions not needed for navigation, maximum parallelism
- Special: Extraction agent has access to the upsert_runtime_values tool (see below)
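The mapping from process_async values to the three modes can be written down compactly. The sketch below restates the "When" lines above in Python. `resolve_mode` is a hypothetical helper, and raising on any other value is an assumption; how Cyberdesk treats unsupported values is not stated here.

```python
from typing import Union


def resolve_mode(process_async: Union[bool, str, None] = None) -> str:
    """Map a process_async value to 'sync', 'batch', or 'run'."""
    if process_async is None or process_async is False:
        return "sync"    # default: blocks until the extraction completes
    if process_async is True or process_async == "batch":
        return "batch"   # awaited before the next agent step
    if process_async == "run":
        return "run"     # awaited only at final output
    # Assumption: anything else is treated as a mistake in this sketch.
    raise ValueError(f"unsupported process_async value: {process_async!r}")


for value in (None, False, True, "batch", "run"):
    print(repr(value), "->", resolve_mode(value))
```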
When to Use Each Mode
Use Synchronous When:
- Extracting a single value you need immediately
- The extraction result determines next steps
- Speed is not critical (< 5 extractions total)
- You want simple, predictable behavior
Example:
"Navigate to the order details page. Take a screenshot with extract_prompt
'Extract the order status as a single word: Pending, Processing, or Shipped'
and use this to decide the next action."
Use Batch-Scoped Async When:
- Scrolling through lists or paginated content
- Extracting from multiple sequential views
- Extractions don’t depend on each other
- Results should be ready for next agent decision
Example:
"Scroll through the inventory list and extract data from each page:
- Take screenshot with extract_prompt='Extract all visible product SKUs, names,
and quantities as JSON array' and process_async='batch'
- Scroll down
- Take screenshot with extract_prompt='Extract all visible product SKUs, names,
and quantities as JSON array' and process_async='batch'
- Repeat until bottom of list
All batch extractions will complete in parallel before the next agent step."
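When you scroll through a list, consecutive screenshots may overlap, so the same row can appear in two results. The following sketch shows one way to merge the per-page JSON arrays afterwards. `merge_pages` is a hypothetical post-processing helper, the sample data is made up, and it assumes each extraction returned a valid JSON array with a unique `sku` per product.

```python
import json
from typing import Dict, List


def merge_pages(page_results: List[str], key: str = "sku") -> List[Dict]:
    """Combine JSON arrays from several screenshots, keeping the first row seen per key."""
    merged: Dict[str, Dict] = {}
    for raw in page_results:
        for row in json.loads(raw):
            merged.setdefault(row[key], row)  # rows repeated by scroll overlap are skipped
    return list(merged.values())


pages = [
    '[{"sku": "A1", "name": "Bolt", "quantity": 40}, {"sku": "A2", "name": "Nut", "quantity": 12}]',
    '[{"sku": "A2", "name": "Nut", "quantity": 12}, {"sku": "A3", "name": "Washer", "quantity": 7}]',
]
inventory = merge_pages(pages)
print([row["sku"] for row in inventory])
```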
Use Run-Scoped Async When:
- Extracting large amounts of data
- Extraction is only needed for final output, not navigation
- You want maximum performance (fully non-blocking)
- You may want to store specific values as runtime variables mid-extraction
Example:
"Take a screenshot of the dashboard with extract_prompt='Extract all customer
metrics, revenue charts, and KPIs as detailed JSON' and process_async='run'.
Continue with other tasks—this extraction will complete in the background
and be included automatically in the final output."
When using process_async (either "batch" or "run"), the extraction becomes a proper agent loop with access to the upsert_runtime_values tool. This provides powerful flexibility:
The extraction agent can:
- Store specific values as runtime variables via upsert_runtime_values
- Provide final observations as text
- Do BOTH: Store values AND provide observations
System Prompt (Async Modes - Both Batch and Run)
When using any async mode, the extraction agent receives:
You are an extraction assistant. You have two capabilities:
1. Call upsert_runtime_values to store specific extracted values that should
be available throughout the workflow as {{key_name}} placeholders.
2. Provide final observations as text describing what you see on screen.
You can do BOTH: store specific values AND provide observations, or just do
one or the other. Your final text message will be recorded as the extraction result.
Batch-Scoped: Store and Observe During List Processing:
"Scroll through product catalog:
- Take screenshot with extract_prompt='Extract product_sku as runtime variable
using upsert_runtime_values. Then describe the product details including name,
price, and description.' and process_async='batch'
- Scroll down
- Take screenshot with extract_prompt='Extract product_sku as runtime variable.
Then describe product details.' and process_async='batch'
- Repeat
All extractions run in parallel. The {{product_sku}} values become available before
the next agent step, and all product descriptions are included in final output."
Run-Scoped: Store Specific Fields, Observe the Rest:
"Take a screenshot with extract_prompt='Extract the invoice number as
invoice_number and store it as a runtime variable. Also describe the payment
status and due date in your observation.' and process_async='run'
Continue with other workflow steps. The {{invoice_number}} will be available
immediately once the extraction completes, and the full observation will be
in the final output."
Pure Observation (No Runtime Variables):
"Take a screenshot with extract_prompt='Describe all visible customer
information including name, address, contact details, and account status.
Format as JSON.' and process_async='run'
This large extraction will run in background and be included in final output."
Multiple Runtime Variables with Observation:
"Take a screenshot with extract_prompt='Extract customer_id and order_total
as runtime variables. Then provide a detailed summary of the order including
line items, shipping address, and special instructions.' and process_async='run'
The {{customer_id}} and {{order_total}} will be available for later workflow
steps once extraction completes."
Async extractions (batch or run) with upsert_runtime_values enable powerful patterns where you can store specific runtime variables AND provide comprehensive observations in a single extraction. Synchronous extractions use a simple vision model call without tool access.
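To make the relationship between stored values and {{key_name}} placeholders concrete, here is a small local model of it. It is a sketch only: the `upsert_runtime_values` function below is a stand-in written for this example, not the Cyberdesk tool itself, and `render` is a hypothetical helper. Leaving unknown placeholders untouched is an assumption.

```python
import re
from typing import Dict

runtime_values: Dict[str, str] = {}


def upsert_runtime_values(values: Dict[str, str]) -> None:
    """Local stand-in for the tool of the same name: insert or overwrite keys."""
    runtime_values.update(values)


def render(template: str, values: Dict[str, str]) -> str:
    """Replace {{key}} placeholders; unknown keys are left untouched."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: values.get(m.group(1), m.group(0)),
        template,
    )


upsert_runtime_values({"customer_id": "C-1001", "order_total": "249.90"})
step = render(
    "Refund {{order_total}} to customer {{customer_id}} via {{method}}",
    runtime_values,
)
print(step)
```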
Real-World Examples
Synchronous (Simple):
"Navigate to patient record for MRN {patient_mrn}. Take a screenshot with
extract_prompt 'Extract the patient age as a number' to determine if
pediatric workflow is needed."
Batch-Scoped (List Processing):
"Go to the lab results page and extract all results:
- Take screenshot with extract_prompt='Extract all visible lab tests as JSON
array with fields: test_name, result, reference_range, status' and
process_async='batch'
- Scroll down to next page
- Take screenshot with extract_prompt='Extract all visible lab tests as JSON
array with fields: test_name, result, reference_range, status' and
process_async='batch'
- Continue until all pages extracted
All extractions will complete in parallel at end of batch."
Run-Scoped (Large Extraction):
"Take a screenshot of the entire patient chart summary with extract_prompt=
'Extract comprehensive patient data including demographics, vital signs,
medications, allergies, and recent visit notes. Format as detailed JSON.'
and process_async='run'
Continue documenting the visit—the chart data will be extracted in background
and included in final documentation output."
Batch-Scoped with Runtime Variable:
"Navigate to product catalog. For each page:
- Take screenshot with extract_prompt='Extract all product data as JSON array:
{sku, name, price, stock}. If you see a product with sku={target_sku}, store
its price as target_product_price runtime variable.' and process_async='batch'
- Scroll to next page
- Repeat
Once {{target_product_price}} is set, use it for price comparison calculations."
Run-Scoped (Full Catalog):
"Take screenshot of product grid with extract_prompt='Extract all visible products
with full details: SKU, name, description, price, images, ratings, reviews.
Format as comprehensive JSON array.' and process_async='run'
Continue with inventory reconciliation workflow. The full product data will be
extracted in background and included in final export."
Finance: Transaction Data
Synchronous (Decision-Making):
"Open transaction {transaction_id}. Take screenshot with extract_prompt=
'Extract the transaction status: Pending, Completed, or Failed' to determine
if manual review is needed."
Run-Scoped with Multiple Runtime Variables:
"Open the monthly statement. Take screenshot with extract_prompt='Extract
statement_date and total_balance as runtime variables, then provide detailed
breakdown of all transactions, fees, and interest charges.' and process_async='run'
Use {{statement_date}} and {{total_balance}} in the reconciliation report filename."
Insurance: Claims Processing
Batch-Scoped (Multiple Claims):
"Navigate to pending claims queue. For each claim:
- Take screenshot with extract_prompt='Extract claim_id, patient_name,
service_date, amount, status as JSON' and process_async='batch'
- Click next claim
- Repeat for first 20 claims
All claim extractions process in parallel before next step."
Run-Scoped (Detailed Claim):
"Open claim {claim_id}. Take screenshot with extract_prompt='Extract claim_id
and policy_number as runtime variables. Then extract complete claim details:
diagnosis codes, procedure codes, provider info, dates of service, all line items
with amounts, adjustments, and approval status.' and process_async='run'
Continue with approval workflow. The {{claim_id}} and {{policy_number}} become
available as soon as the extraction completes, and full claim data will be in final output."
Always Request JSON for Structured Data
Best Practice: Always request strict JSON with explicit keys for structured data extraction.
Good:
extract_prompt='Extract customer data as JSON: {customer_id: string, name: string,
email: string, phone: string, status: string}'
Also Good:
extract_prompt='Extract all visible products as JSON array where each item has:
{sku: string, name: string, price: number, stock: number}'
Avoid:
extract_prompt='Extract the customer information' // Too vague, may return prose
Simple Object:
{
"field_name": "type",
"another_field": "type"
}
Array of Objects:
[
{
"field1": "value1",
"field2": "value2"
}
]
Nested Structure:
{
"customer": {
"id": "string",
"name": "string"
},
"orders": [
{
"order_id": "string",
"total": "number"
}
]
}
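If you write many prompts of this shape, generating them from a field specification keeps the keys and types explicit. The helper below is a hypothetical convenience (`build_extract_prompt` is not part of Cyberdesk); it only assembles the prompt text in the style of the "Good" examples above.

```python
from typing import Dict


def build_extract_prompt(subject: str, fields: Dict[str, str], as_array: bool = False) -> str:
    """Render a prompt that names every key and its type explicitly."""
    shape = "{" + ", ".join(f"{name}: {kind}" for name, kind in fields.items()) + "}"
    if as_array:
        return f"Extract {subject} as JSON array where each item has: {shape}"
    return f"Extract {subject} as JSON: {shape}"


prompt = build_extract_prompt(
    "customer data",
    {"customer_id": "string", "name": "string", "email": "string"},
)
print(prompt)
```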
Integration with Output Schemas
When your workflow has an output schema defined, extracted data automatically flows into the final structured output.
How It Works
- Define Output Schema in workflow settings:
{
"customer_id": "string",
"order_total": "number",
"order_items": "array",
"shipping_address": "string"
}
- Extract Data During Workflow:
"Take screenshot with extract_prompt='Extract customer_id, order_total,
order_items array, and shipping_address as JSON' and process_async='run'"
- Automatic Transformation:
- At run completion, extraction results, together with any runtime variables and focused action observations, are automatically transformed to match your output schema
- No manual output construction needed!
Optimization Tip: If all your data is already in runtime values and you want zero LLM transformation, set your output schema to {"only_runtime_values": true} to return runtime values directly. The transformation agent also automatically references existing runtime values instead of regenerating them, which reduces information loss. Learn more about output optimization.
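The effect of the {"only_runtime_values": true} schema can be pictured as a short-circuit in front of the transformation step. The sketch below is a conceptual model, not Cyberdesk's code: `build_output` and `transform_stub` are hypothetical, and the stub simply keeps schema keys where the real step uses an LLM.

```python
from typing import Any, Callable, Dict


def build_output(
    schema: Dict[str, Any],
    runtime_values: Dict[str, Any],
    transform: Callable[[Dict[str, Any], Dict[str, Any]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return runtime values as-is when the schema asks for that; otherwise transform."""
    if schema == {"only_runtime_values": True}:
        return dict(runtime_values)  # no transformation step at all
    return transform(schema, runtime_values)


def transform_stub(schema: Dict[str, Any], values: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for the LLM-backed transformation; here it just keeps schema keys.
    return {key: values.get(key) for key in schema}


values = {"order_id": "SO-77", "order_total": 249.9, "scratch": "tmp"}
print(build_output({"only_runtime_values": True}, values, transform_stub))
print(build_output({"order_id": "string"}, values, transform_stub))
```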
Output Schema:
{
"patient_mrn": "string",
"vital_signs": {
"blood_pressure": "string",
"heart_rate": "number",
"temperature": "number"
},
"medications": "array",
"lab_results": "array"
}
Workflow:
"Navigate to patient {patient_name} record.
Take screenshot with extract_prompt='Extract complete patient data as JSON
matching the schema: patient_mrn, vital_signs (blood_pressure, heart_rate,
temperature), medications array, and lab_results array. Store patient_mrn as
a runtime variable for later use.' and process_async='run'
Continue with documentation workflow. The patient_mrn is available as
{{patient_mrn}}, and all extracted data will be automatically transformed
to the output schema."
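For critical workflows you may want to check the final output against the schema yourself. The following sketch is one possible check, written for the simple type names used in the schema above ("string", "number", "array", plus nested objects). `schema_errors` is a hypothetical helper, and the sample patient data is made up.

```python
from typing import Any, Dict, List

# Type names as they appear in the schema examples above.
CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
}


def schema_errors(data: Dict[str, Any], schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return a list of human-readable mismatches between data and schema."""
    errors: List[str] = []
    for key, expected in schema.items():
        where = f"{path}{key}"
        if key not in data:
            errors.append(f"{where}: missing")
        elif isinstance(expected, dict):
            if isinstance(data[key], dict):
                errors.extend(schema_errors(data[key], expected, where + "."))
            else:
                errors.append(f"{where}: expected object")
        elif not CHECKS[expected](data[key]):
            errors.append(f"{where}: expected {expected}")
    return errors


schema = {
    "patient_mrn": "string",
    "vital_signs": {"blood_pressure": "string", "heart_rate": "number", "temperature": "number"},
    "medications": "array",
    "lab_results": "array",
}
extracted = {
    "patient_mrn": "MRN-0042",
    "vital_signs": {"blood_pressure": "120/80", "heart_rate": "72", "temperature": 36.8},
    "medications": ["lisinopril"],
}
problems = schema_errors(extracted, schema)
print(problems)
```

Here the heart rate came back as a string and lab_results is absent, so both are reported.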
| Feature | Extract Prompt | Focused Action | Copy to Clipboard |
|---|---|---|---|
| Text Type | ✅ Any visible text | ✅ Any visible text | ⚠️ Only copyable text |
| Speed | 🐌 2-5 sec (sync) ⚡ Non-blocking (async) | 🐌 5-10 sec | ⚡ Instant |
| Decision Making | ❌ Read-only | ✅ Dynamic decisions | ❌ Read-only |
| Runtime Variables | ✅ Yes (async modes only) | ✅ Yes | ✅ Yes |
| Async Options | ✅ Batch & Run scopes | ❌ Always synchronous | ❌ Always synchronous |
| Use Case | Large-scale extraction, non-copyable content | Dynamic navigation, decisions | Fast field extraction |
| Cost | 💳 Vision tokens | 💳 Vision tokens | 💰 No AI cost |
When to Use Each
Use Extract Prompt When:
- Extracting large amounts of data for output
- Working with non-copyable content (images, PDFs, charts)
- Need parallel/async processing for speed
- Extraction doesn’t affect navigation decisions
- Want to store runtime variables from extraction (batch or run scope)
Use Focused Action When:
- Making dynamic decisions during workflow
- Conditional logic based on screen content
- Selecting from lists based on runtime criteria
- Verification steps that determine next actions
Use Copy to Clipboard When:
- Text is selectable and copyable
- Need deterministic, byte-exact extraction
- Speed is critical
- Working with simple text fields (IDs, numbers, dates)
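The guidance above reduces to two questions: does the result decide what happens next, and is the text copyable? The helper below encodes that as a rule of thumb. `pick_method` is hypothetical and deliberately simplified; for example, it ignores the case where a synchronous extract_prompt feeds a simple decision.

```python
def pick_method(copyable: bool, drives_navigation: bool) -> str:
    """Suggest an extraction method from two yes/no questions."""
    if drives_navigation:
        return "focused_action"     # result decides the next step
    if copyable:
        return "copy_to_clipboard"  # deterministic, no AI cost
    return "extract_prompt"         # vision-based, can run async


print(pick_method(copyable=True, drives_navigation=False))
print(pick_method(copyable=False, drives_navigation=False))
print(pick_method(copyable=False, drives_navigation=True))
```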
Advanced Patterns
Combine all three methods for optimal performance:
"Extract order data using the most efficient method for each field:
Fast clipboard extraction (copyable fields):
- Triple-click Order ID field, use copy_to_clipboard with key name 'order_id'
- Triple-click Customer Name field, use copy_to_clipboard with key name 'customer_name'
Vision-based extraction (non-copyable content):
- Take screenshot of order items table with extract_prompt='Extract all line items
as JSON array: {item_name, sku, quantity, unit_price, total}' and process_async='batch'
- Take screenshot of shipping label image with extract_prompt='Extract tracking
number and carrier name' and process_async='batch'
Dynamic decision (navigation):
- Use focused_action to check if order requires special handling based on total amount
All values are automatically included in the final output."
Extract summary data first, then detailed data based on results:
"Step 1 - Summary (Synchronous):
Take screenshot with extract_prompt='Count how many pending orders are visible
and return as: {pending_count: number}'
Step 2 - Details (Batch-Scoped):
If pending_count > 0, for each pending order:
- Take screenshot with extract_prompt='Extract order details as JSON' and
process_async='batch'
- Click next order
- Repeat
Step 3 - Analytics (Run-Scoped):
Take screenshot of analytics dashboard with extract_prompt='Extract complete
analytics: sales trends, top products, customer segments' and process_async='run'
Continue workflow while analytics extraction runs in background."
Start with high-level extraction, add detail as needed:
"Navigate to dashboard.
Level 1 - Overview (Run-Scoped):
Take screenshot with extract_prompt='Extract high-level metrics: total_revenue,
total_orders, active_customers, growth_rate. Store total_revenue as runtime
variable.' and process_async='run'
Level 2 - Category Breakdown (Batch-Scoped):
For each product category:
- Take screenshot with extract_prompt='Extract category sales data as JSON'
and process_async='batch'
- Click next category
Level 3 - Individual Products (Decision-Based):
Use focused_action to identify top 5 products by revenue, then for each:
- Take screenshot with extract_prompt='Extract detailed product metrics'
and process_async='batch'
All extractions complete before final output generation."
Runtime Variable Integration
Use run-scoped extraction to set variables for later workflow steps:
"Open invoice {invoice_number}.
Take screenshot with extract_prompt='Extract invoice_date, due_date, and
total_amount as runtime variables. Then extract complete line item details,
tax breakdown, and payment terms.' and process_async='run'
While that extraction runs in background:
- Navigate to customer portal
- Login with credentials
- Go to payment page
- Wait for {{total_amount}} to be available
- Enter {{total_amount}} in payment field
- Set payment date to {{due_date}}
- Submit payment
The full invoice details will be in the final output along with payment confirmation."
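The timing in this pattern (extraction in the background, workflow waiting only at the point where it needs a value) can be modeled with asyncio. This is a conceptual simulation, not how Cyberdesk schedules work internally: `background_extraction` and `workflow` are hypothetical, the sleeps stand in for model latency, and the amount is made up.

```python
import asyncio
from typing import Dict


async def background_extraction(values: Dict[str, str], ready: asyncio.Event) -> str:
    """Stand-in for a run-scoped extraction that stores a value, then keeps working."""
    await asyncio.sleep(0.02)           # model reads the invoice
    values["total_amount"] = "1250.00"  # upsert the runtime variable
    ready.set()
    await asyncio.sleep(0.02)           # rest of the observation
    return "line items, tax breakdown, payment terms"


async def workflow() -> Dict[str, str]:
    values: Dict[str, str] = {}
    ready = asyncio.Event()
    task = asyncio.create_task(background_extraction(values, ready))

    # These steps do not need the extraction, so nothing blocks on it yet.
    steps = ["navigate to portal", "log in", "open payment page"]
    await ready.wait()                  # "wait for {{total_amount}} to be available"
    steps.append(f"enter {values['total_amount']}")

    observation = await task            # awaited at the end of the run
    return {"last_step": steps[-1], "observation": observation}


outcome = asyncio.run(workflow())
print(outcome)
```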
Error Handling and Best Practices
Clear Instructions
Be Specific: The clearer your extraction prompt, the better the results.
Good:
extract_prompt='Extract patient vital signs as JSON: {systolic: number,
diastolic: number, heart_rate: number, temperature: number, oxygen_saturation: number}'
Avoid:
extract_prompt='Get the vital signs' // Too vague
Field Type Specification
Always specify expected data types:
extract_prompt='Extract order data as JSON: {
order_id: string,
order_date: string (YYYY-MM-DD),
total: number,
items: array of {name: string, quantity: number, price: number},
status: string (one of: Pending, Shipped, Delivered)
}'
Handling Missing Data
Instruct the model how to handle missing fields:
extract_prompt='Extract customer data as JSON. If any field is not visible
or not available, use null for that field: {name: string|null, email: string|null,
phone: string|null, address: string|null}'
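On the consuming side, it helps to normalize the result so that every expected key exists even if the model omitted it. A minimal sketch, assuming the extraction returned valid JSON; `parse_with_nulls` is a hypothetical helper and the sample record is made up.

```python
import json
from typing import Any, Dict, Iterable


def parse_with_nulls(raw: str, expected_keys: Iterable[str]) -> Dict[str, Any]:
    """Parse an extraction result and guarantee every expected key is present."""
    data = json.loads(raw)
    return {key: data.get(key) for key in expected_keys}  # absent keys become None


raw = '{"name": "Ada Lovelace", "email": null, "phone": "555-0100"}'
customer = parse_with_nulls(raw, ["name", "email", "phone", "address"])
print(customer)
```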
Vision Model Limitations
Vision Model Considerations:
- Complex tables may require multiple extractions
- Very small text may not be readable (use zoom if needed)
- Similar-looking characters (0/O, 1/l) may be confused
- For critical data, consider verification steps
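One inexpensive verification step is a format check that flags values which only look valid after swapping look-alike characters. The sketch below does this for fixed-length numeric IDs. `check_numeric_id` is a hypothetical helper, and the look-alike table is an assumption you would adjust to your own data.

```python
import re

# Letters that are easily confused with digits (assumed mapping for this example).
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def check_numeric_id(value: str, length: int) -> dict:
    """Flag values that only pass a digits-only check after look-alike substitution."""
    pattern = rf"\d{{{length}}}"  # e.g. \d{8} for length=8
    if re.fullmatch(pattern, value):
        return {"value": value, "status": "ok"}
    repaired = value.translate(LOOKALIKES)
    if re.fullmatch(pattern, repaired):
        return {"value": repaired, "status": "needs_review"}
    return {"value": value, "status": "invalid"}


print(check_numeric_id("12345678", 8))
print(check_numeric_id("1O3456l8", 8))
print(check_numeric_id("12-45678", 8))
```

A "needs_review" result is a prompt for a second look (for example a zoomed re-extraction), not a guarantee that the repaired value is right.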
Zoom for Better Accuracy
"Take screenshot with zoom_bounding_box=[x1, y1, x2, y2] and extract_prompt=
'Extract the small-print account number visible in the zoomed area' for
higher accuracy on tiny text"
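If you compute the zoom region programmatically, a little padding around the target avoids clipping characters at the edges. The helper below is a sketch under two assumptions that are not confirmed here: that zoom_bounding_box takes pixel coordinates, and that the box should stay within the screen. `padded_box` is a hypothetical name.

```python
from typing import List


def padded_box(x1: int, y1: int, x2: int, y2: int,
               screen_w: int, screen_h: int, pad: int = 20) -> List[int]:
    """Grow a region by `pad` pixels on each side, clamped to the screen."""
    return [
        max(0, x1 - pad),
        max(0, y1 - pad),
        min(screen_w, x2 + pad),
        min(screen_h, y2 + pad),
    ]


box = padded_box(10, 1050, 400, 1075, screen_w=1920, screen_h=1080)
print(box)
```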
Use Async When Possible
Performance Tip: If extraction results aren’t needed for navigation, always use process_async="run" for maximum parallelism and speed.
Instead of taking one screenshot per field:
Take screenshot, extract field1
Take screenshot, extract field2
Take screenshot, extract field3
Combine related fields into a single extraction:
Take screenshot with extract_prompt='Extract all three fields as JSON:
{field1, field2, field3}'
Before (Slow):
Extract A (synchronous, blocks 3s)
Extract B (synchronous, blocks 3s)
Extract C (synchronous, blocks 3s)
Total: 9 seconds
After (Fast):
Extract A with process_async='batch'
Extract B with process_async='batch'
Extract C with process_async='batch'
All complete in parallel: ~3 seconds total
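The arithmetic behind this comparison (sum of durations versus the longest single duration) can be demonstrated with a small asyncio simulation. Nothing here calls Cyberdesk: `fake_extract` is a hypothetical stand-in that just sleeps, and the durations are scaled down so the example finishes quickly.

```python
import asyncio
import time


async def fake_extract(name: str, seconds: float) -> str:
    """Stand-in for one vision extraction that takes `seconds` to finish."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def sequential(names, seconds):
    # Each extraction blocks the next one, like synchronous mode.
    return [await fake_extract(n, seconds) for n in names]


async def parallel(names, seconds):
    # All extractions are in flight at once, like a batch of async extractions.
    return list(await asyncio.gather(*(fake_extract(n, seconds) for n in names)))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq_result, seq_time = timed(sequential(["A", "B", "C"], 0.05))
par_result, par_time = timed(parallel(["A", "B", "C"], 0.05))
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

The results are identical in both cases; only the total wall-clock time differs.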
Common Pitfalls
Avoid These Mistakes:
- Using synchronous for large extractions that aren’t needed for navigation
- Not specifying JSON format for structured data
- Vague instructions that lead to unpredictable results
- Using Extract Prompt for copyable text instead of copy_to_clipboard
- Not using async when extracting from multiple pages/views
❌ Incorrect Usage
"Extract the data from the screen" // Too vague
"Get all the information" // No format specified
"Read the table" // No structure guidance
✅ Correct Usage
"Take screenshot with extract_prompt='Extract invoice data as JSON: {invoice_number:
string, date: string, total: number, line_items: array of {description: string,
amount: number}}' and process_async='run'"
Complete Workflow Example
Scenario: E-Commerce Order Processing
Output Schema:
{
"order_id": "string",
"customer": {
"name": "string",
"email": "string"
},
"items": "array",
"total": "number",
"shipping": {
"address": "string",
"method": "string",
"tracking": "string"
}
}
Workflow Instructions:
"Log into admin portal with {admin_username} and {$admin_password}.
Navigate to order {order_id} details page.
Extract order data using optimal methods:
1. Fast clipboard extraction (copyable fields):
- Triple-click Order ID, use copy_to_clipboard with key name 'order_id'
- Triple-click Customer Email, use copy_to_clipboard with key name 'customer_email'
2. Vision extraction with runtime variables (run-scoped):
Take screenshot with extract_prompt='Extract customer_name and order_total as
runtime variables. Then extract complete order details: all line items with
names/prices/quantities, shipping address, shipping method, and tracking number.
Format as detailed JSON matching the output schema.' and process_async='run'
3. Continue workflow while extraction runs:
- Generate shipping label
- Update inventory system with {{order_total}}
- Send notification to {{customer_name}}
- Download packing slip and mark_file_for_export
The extracted data, clipboard values, and runtime variables will automatically
be transformed into the final output_data JSON."
Result:
- Order ID and email extracted instantly via clipboard
- Large extraction runs in background while workflow continues
- Runtime variables available for inventory and notification steps
- All data automatically formatted to output schema
- Maximum performance with hybrid approach
Summary
Extract Prompt is your go-to tool for vision-based data extraction with flexible async processing:
- Synchronous: Simple, immediate extractions for decision-making
- Batch-Scoped Async: Parallel processing within tool batches (great for lists)
- Run-Scoped Async: Fully non-blocking extraction for entire run lifetime
Combine with focused_action for dynamic decisions and copy_to_clipboard for fast clipboard extraction to create highly efficient workflows.
Remember: Choose the right tool for each task—clipboard for copyable text (fast), extract_prompt for large-scale/async extraction (flexible), and focused_action for decisions (dynamic).