Extract Prompt - Cyberdesk Docs

What is Extract Prompt?

Extract Prompt is a vision-based extraction tool that you add on top of a screenshot. It uses AI vision models to read, interpret, and extract data from the captured screen. Unlike clipboard-based extraction, which requires selectable text, Extract Prompt can extract data from any visible content, including images, PDFs, charts, tables, and complex layouts.

In your prompts, use the screenshot action with the extract_prompt parameter to trigger this feature.

Trajectories & Extract Prompt: Extract prompts work seamlessly with Cyberdesk’s trajectory caching system. When a trajectory is replayed, extract prompts re-execute to capture fresh data from the current screen, ensuring dynamic data extraction even in cached workflows.

Why Extract Prompt Exists

Cyberdesk offers three complementary methods for data extraction:

Clipboard Extraction (copy_to_clipboard)

Direct copy via Ctrl+C, deterministic and instant
Best for: Selectable text fields (IDs, numbers, dates)
Limitation: Only works with copyable text

Focused Action (focused_action)

Dynamic decision-making with vision-based extraction
Best for: Runtime decisions, conditional logic, setting runtime variables
Use when: You need to make decisions during workflow execution

Extract Prompt (This Tool)

Pure vision-based extraction with flexible async processing
Best for: Large-scale data extraction, non-copyable content, parallel processing
Use when: Extracting data for output, not needed for navigation decisions

Extract Prompt excels at extracting data from:

Non-selectable or non-copyable text
Images, PDFs, scanned documents
Charts, graphs, and visualizations
Complex tables and multi-column layouts
Dynamic content that changes between runs

How It Works

Basic Flow

Agent takes a screenshot (optionally with zoom)
Screenshot is sent to a strong vision model with your extraction instruction
Vision model reads the screen and extracts the requested data
Extracted text is returned as the tool result

With Async Processing

When using process_async, extractions can run in the background:

Batch scope: Extractions run in parallel within the current tool call batch
Run scope: Extractions run in the background for the entire workflow lifetime and are only awaited at the end

The process_async Parameter

The process_async parameter controls when and how extractions are processed:

Synchronous (Default)

screenshot with extract_prompt="Extract customer name and ID as JSON"

When: process_async not set, false, or None
Behavior: Blocks until extraction completes
Use for: Single extractions where you need immediate results

Batch-Scoped Async

screenshot with extract_prompt="Extract order data" and process_async="batch"

When: process_async=true or process_async="batch"
Behavior: Extraction runs in parallel with other batch extractions; all of them complete before the next agent step
Use for: Scrolling through lists, extracting from multiple views in sequence
Special: Extraction agent has access to upsert_runtime_values tool (see below)

Run-Scoped Async

screenshot with extract_prompt="Extract all product catalog data" and process_async="run"

When: process_async="run"
Behavior: Extraction runs entirely in the background for the whole run and is only awaited at final output
Use for: Large extractions not needed for navigation, maximum parallelism
Special: Extraction agent has access to upsert_runtime_values tool (see below)
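
The three modes differ mainly in when the workflow waits for a result. The sketch below models that timing with Python's asyncio as a mental model only. `fake_extract`, `ExtractionScheduler`, and its methods are hypothetical names invented for illustration; they are not part of Cyberdesk's API, and the real scheduler may work differently.

```python
import asyncio


async def fake_extract(prompt: str, delay: float = 0.01) -> str:
    """Stand-in for a vision-model call (hypothetical, not a Cyberdesk API)."""
    await asyncio.sleep(delay)
    return f"result for: {prompt}"


class ExtractionScheduler:
    """Toy model of the three process_async modes described above."""

    def __init__(self) -> None:
        self.batch_tasks: list = []
        self.run_tasks: list = []

    async def screenshot(self, prompt: str, process_async=None):
        if process_async in (None, False):
            # Synchronous: block until the extraction completes.
            return await fake_extract(prompt)
        if process_async in (True, "batch"):
            bucket = self.batch_tasks
        elif process_async == "run":
            bucket = self.run_tasks
        else:
            raise ValueError(f"unknown process_async value: {process_async!r}")
        # Async: start the extraction now, collect the result later.
        bucket.append(asyncio.create_task(fake_extract(prompt)))
        return None

    async def end_of_batch(self) -> list:
        """Batch scope: everything finishes before the next agent step."""
        results = await asyncio.gather(*self.batch_tasks)
        self.batch_tasks.clear()
        return list(results)

    async def end_of_run(self) -> list:
        """Run scope: awaited only when the final output is assembled."""
        results = await asyncio.gather(*self.run_tasks)
        self.run_tasks.clear()
        return list(results)


async def demo() -> dict:
    s = ExtractionScheduler()
    sync_result = await s.screenshot("order status")
    await s.screenshot("catalog", process_async="run")
    await s.screenshot("page 1", process_async="batch")
    await s.screenshot("page 2", process_async=True)  # true behaves like "batch"
    batch = await s.end_of_batch()
    run = await s.end_of_run()
    return {"sync": sync_result, "batch": batch, "run": run}


print(asyncio.run(demo()))
```

In this model the run-scoped task keeps going across batches and is collected only once, at the end.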

When to Use Each Mode

Use Synchronous When:

Extracting a single value you need immediately
The extraction result determines next steps
Speed is not critical (< 5 extractions total)
You want simple, predictable behavior

Example:

"Navigate to the order details page. Take a screenshot with extract_prompt 
'Extract the order status as a single word: Pending, Processing, or Shipped' 
and use this to decide the next action."

Use Batch-Scoped Async When:

Scrolling through lists or paginated content
Extracting from multiple sequential views
Extractions don’t depend on each other
Results should be ready for next agent decision

Example:

"Scroll through the inventory list and extract data from each page:
- Take screenshot with extract_prompt='Extract all visible product SKUs, names, 
  and quantities as JSON array' and process_async='batch'
- Scroll down
- Take screenshot with extract_prompt='Extract all visible product SKUs, names, 
  and quantities as JSON array' and process_async='batch'
- Repeat until bottom of list

All batch extractions will complete in parallel before the next agent step."

Use Run-Scoped Async When:

Extracting large amounts of data
Extraction is only needed for final output, not navigation
You want maximum performance (fully non-blocking)
You may want to store specific values as runtime variables mid-extraction

Example:

"Take a screenshot of the dashboard with extract_prompt='Extract all customer 
metrics, revenue charts, and KPIs as detailed JSON' and process_async='run'. 
Continue with other tasks—this extraction will complete in the background 
and be included automatically in the final output."
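
The guidance above can be condensed into a small decision helper. This is a simplification for illustration: `choose_process_async` and its two flags are hypothetical and reduce the bullet lists to two questions, so treat it as a starting point rather than a rule Cyberdesk enforces.

```python
from typing import Optional


def choose_process_async(needed_for_next_step: bool, multiple_views: bool) -> Optional[str]:
    """Return a process_async value following the guidance above.

    Hypothetical helper: the two flags are a simplification of the
    bullet lists, not parameters that Cyberdesk defines.
    """
    if not needed_for_next_step:
        # Only needed for the final output: run fully in the background.
        return "run"
    if multiple_views:
        # Independent extractions whose results must be ready at the next step.
        return "batch"
    # A single value that determines the next action: stay synchronous.
    return None


print(choose_process_async(needed_for_next_step=True, multiple_views=False))
print(choose_process_async(needed_for_next_step=True, multiple_views=True))
print(choose_process_async(needed_for_next_step=False, multiple_views=True))
```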

Async Extraction: Advanced Capabilities with upsert_runtime_values

When using process_async (either "batch" or "run"), the extraction becomes a proper agent loop with access to the upsert_runtime_values tool. This provides powerful flexibility:

The upsert_runtime_values Tool

The extraction agent can:

Store specific values as runtime variables via upsert_runtime_values
Provide final observations as text
Do BOTH: Store values AND provide observations

System Prompt (Async Modes - Both Batch and Run)

When using any async mode, the extraction agent receives:

You are an extraction assistant. You have two capabilities:

1. Call upsert_runtime_values to store specific extracted values that should 
   be available throughout the workflow as {{key_name}} placeholders.

2. Provide final observations as text describing what you see on screen.

You can do BOTH: store specific values AND provide observations, or just do 
one or the other. Your final text message will be recorded as the extraction result.

Use Cases for Async Extraction with Runtime Variables

Batch-Scoped: Store and Observe During List Processing:

"Scroll through product catalog:
- Take screenshot with extract_prompt='Extract product_sku as runtime variable 
  using upsert_runtime_values. Then describe the product details including name, 
  price, and description.' and process_async='batch'
- Scroll down
- Take screenshot with extract_prompt='Extract product_sku as runtime variable. 
  Then describe product details.' and process_async='batch'
- Repeat

All extractions run in parallel. The {{product_sku}} values become available before 
the next agent step, and all product descriptions are included in final output."

Run-Scoped: Store Specific Fields, Observe the Rest:

"Take a screenshot with extract_prompt='Extract the invoice number as 
invoice_number and store it as a runtime variable. Also describe the payment 
status and due date in your observation.' and process_async='run'

Continue with other workflow steps. The {{invoice_number}} will be available 
as soon as the extraction completes, and the full observation will be 
in the final output."

Pure Observation (No Runtime Variables):

"Take a screenshot with extract_prompt='Describe all visible customer 
information including name, address, contact details, and account status. 
Format as JSON.' and process_async='run'

This large extraction will run in background and be included in final output."

Multiple Runtime Variables with Observation:

"Take a screenshot with extract_prompt='Extract customer_id and order_total 
as runtime variables. Then provide a detailed summary of the order including 
line items, shipping address, and special instructions.' and process_async='run'

The {{customer_id}} and {{order_total}} will be available for later workflow 
steps once extraction completes."

Async extractions (batch or run) with upsert_runtime_values enable powerful patterns where you can store specific runtime variables AND provide comprehensive observations in a single extraction. Synchronous extractions use a simple vision model call without tool access.
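
To make the {{key_name}} placeholder idea concrete, here is a minimal sketch of how stored runtime values could be substituted into a later instruction. The function `fill_placeholders` and the sample data are assumptions made for illustration; Cyberdesk performs this substitution itself, and its exact rules may differ.

```python
import re

# Matches {{key_name}} but not single-brace input variables such as {order_id}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_placeholders(text: str, runtime_values: dict) -> str:
    """Replace {{key}} with its runtime value; leave unknown keys untouched."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(runtime_values[key]) if key in runtime_values else match.group(0)
    return PLACEHOLDER.sub(substitute, text)


runtime_values = {}
# What an async extraction might store via upsert_runtime_values (illustrative data).
runtime_values.update({"customer_id": "C-1042", "order_total": 129.5})

step = "Enter {{order_total}} for customer {{customer_id}} and note {{tracking}}"
print(fill_placeholders(step, runtime_values))
# prints: Enter 129.5 for customer C-1042 and note {{tracking}}
```

Leaving unknown placeholders untouched makes it obvious when a value has not been stored yet.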

Real-World Examples

Healthcare: Patient Data Extraction

Synchronous (Simple):

"Navigate to patient record for MRN {patient_mrn}. Take a screenshot with 
extract_prompt 'Extract the patient age as a number' to determine if 
pediatric workflow is needed."

Batch-Scoped (List Processing):

"Go to the lab results page and extract all results:
- Take screenshot with extract_prompt='Extract all visible lab tests as JSON 
  array with fields: test_name, result, reference_range, status' and 
  process_async='batch'
- Scroll down to next page
- Take screenshot with extract_prompt='Extract all visible lab tests as JSON 
  array with fields: test_name, result, reference_range, status' and 
  process_async='batch'
- Continue until all pages extracted

All extractions will complete in parallel at end of batch."

Run-Scoped (Large Extraction):

"Take a screenshot of the entire patient chart summary with extract_prompt=
'Extract comprehensive patient data including demographics, vital signs, 
medications, allergies, and recent visit notes. Format as detailed JSON.' 
and process_async='run'

Continue documenting the visit—the chart data will be extracted in background 
and included in final documentation output."

E-Commerce: Product Catalog Extraction

Batch-Scoped with Runtime Variable:

"Navigate to product catalog. For each page:
- Take screenshot with extract_prompt='Extract all product data as JSON array: 
  {sku, name, price, stock}. If you see a product with sku={target_sku}, store 
  its price as target_product_price runtime variable.' and process_async='batch'
- Scroll to next page
- Repeat

Once {{target_product_price}} is set, use it for price comparison calculations."

Run-Scoped (Full Catalog):

"Take screenshot of product grid with extract_prompt='Extract all visible products 
with full details: SKU, name, description, price, images, ratings, reviews. 
Format as comprehensive JSON array.' and process_async='run'

Continue with inventory reconciliation workflow. The full product data will be 
extracted in background and included in final export."

Finance: Transaction Data

Synchronous (Decision-Making):

"Open transaction {transaction_id}. Take screenshot with extract_prompt=
'Extract the transaction status: Pending, Completed, or Failed' to determine 
if manual review is needed."

Run-Scoped with Multiple Runtime Variables:

"Open the monthly statement. Take screenshot with extract_prompt='Extract 
statement_date and total_balance as runtime variables, then provide detailed 
breakdown of all transactions, fees, and interest charges.' and process_async='run'

Use {{statement_date}} and {{total_balance}} in the reconciliation report filename."

Insurance: Claims Processing

Batch-Scoped (Multiple Claims):

"Navigate to pending claims queue. For each claim:
- Take screenshot with extract_prompt='Extract claim_id, patient_name, 
  service_date, amount, status as JSON' and process_async='batch'
- Click next claim
- Repeat for first 20 claims

All claim extractions process in parallel before next step."

Run-Scoped (Detailed Claim):

"Open claim {claim_id}. Take screenshot with extract_prompt='Extract claim_id 
and policy_number as runtime variables. Then extract complete claim details: 
diagnosis codes, procedure codes, provider info, dates of service, all line items 
with amounts, adjustments, and approval status.' and process_async='run'

Continue with approval workflow. The {{claim_id}} and {{policy_number}} become 
available once the extraction completes, and full claim data will be in final output."

Formatting Guidelines

Always Request JSON for Structured Data

Best Practice: Always request strict JSON with explicit keys for structured data extraction.

Good:

extract_prompt='Extract customer data as JSON: {customer_id: string, name: string, 
email: string, phone: string, status: string}'

Also Good:

extract_prompt='Extract all visible products as JSON array where each item has: 
{sku: string, name: string, price: number, stock: number}'

Avoid:

extract_prompt='Extract the customer information'  // Too vague, may return prose
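
If you consume extraction results in your own code, it helps to parse them defensively. The sketch below assumes the result arrives as plain text and that a model may occasionally wrap JSON in a Markdown fence or a sentence of prose. That behavior is an assumption, not something this page specifies, and `parse_extraction` is a hypothetical helper.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the snippet readable


def parse_extraction(text: str):
    """Parse an extraction result that is expected to contain JSON.

    Assumption: the model may wrap JSON in a Markdown fence or add a
    sentence around it, so fall back to the outermost {...} or [...] span.
    """
    cleaned = text.strip()
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, cleaned, re.DOTALL)
    if fenced:
        cleaned = fenced.group(1).strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    starts = [i for i in (cleaned.find("{"), cleaned.find("[")) if i != -1]
    end = max(cleaned.rfind("}"), cleaned.rfind("]"))
    if not starts or end <= min(starts):
        raise ValueError("no JSON found in extraction result")
    return json.loads(cleaned[min(starts):end + 1])


print(parse_extraction('Here you go: [{"sku": "A1", "price": 9.5}]'))
```

A vague prompt that returns prose only will raise `ValueError` here, which is a useful signal to tighten the prompt.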

Multi-Field Extraction Templates

Simple Object:

{
  "field_name": "type",
  "another_field": "type"
}

Array of Objects:

[
  {
    "field1": "value1",
    "field2": "value2"
  }
]

Nested Structure:

{
  "customer": {
    "id": "string",
    "name": "string"
  },
  "orders": [
    {
      "order_id": "string",
      "total": "number"
    }
  ]
}
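
A template like the nested one above can also double as a lightweight check on what comes back. The following sketch walks a parsed result against such a template. The `mismatches` function and the three type names it understands ("string", "number", "array") are assumptions made for this example, not a Cyberdesk feature.

```python
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
}

NESTED_TEMPLATE = {
    "customer": {"id": "string", "name": "string"},
    "orders": [{"order_id": "string", "total": "number"}],
}


def mismatches(data, template, path="$"):
    """Return a list of paths where data does not match the template."""
    if isinstance(template, dict):
        if not isinstance(data, dict):
            return [f"{path}: expected object"]
        problems = []
        for key, sub in template.items():
            if key not in data:
                problems.append(f"{path}.{key}: missing")
            else:
                problems += mismatches(data[key], sub, f"{path}.{key}")
        return problems
    if isinstance(template, list):
        if not isinstance(data, list):
            return [f"{path}: expected array"]
        if not template:
            return []  # no item template given, accept any items
        problems = []
        for i, item in enumerate(data):
            problems += mismatches(item, template[0], f"{path}[{i}]")
        return problems
    check = TYPE_CHECKS.get(template)
    if check is None:
        return [f"{path}: unknown type {template!r}"]
    return [] if check(data) else [f"{path}: expected {template}"]


result = {"customer": {"id": 7}, "orders": [{"order_id": "A", "total": "12"}]}
for problem in mismatches(result, NESTED_TEMPLATE):
    print(problem)
```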

Integration with Output Schemas

When your workflow has an output schema defined, extracted data automatically flows into the final structured output.

How It Works

Define Output Schema in workflow settings:

{
  "customer_id": "string",
  "order_total": "number",
  "order_items": "array",
  "shipping_address": "string"
}

Extract Data During Workflow:

"Take screenshot with extract_prompt='Extract customer_id, order_total, 
order_items array, and shipping_address as JSON' and process_async='run'"

Automatic Transformation:

At run completion, extraction results, along with any runtime variables and focused action observations, are automatically transformed to match your output schema
No manual output construction is needed

Optimization Tip: If all your data is already in runtime values and you want zero LLM transformation, set your output schema to {"only_runtime_values": true} to return runtime values directly. The transformation agent also automatically references existing runtime values instead of regenerating them, reducing lossiness. Learn more about output optimization.

Example: Healthcare Record Extraction

Output Schema:

{
  "patient_mrn": "string",
  "vital_signs": {
    "blood_pressure": "string",
    "heart_rate": "number",
    "temperature": "number"
  },
  "medications": "array",
  "lab_results": "array"
}

Workflow:

"Navigate to patient {patient_name} record.

Take screenshot with extract_prompt='Extract complete patient data as JSON 
matching the schema: patient_mrn, vital_signs (blood_pressure, heart_rate, 
temperature), medications array, and lab_results array. Store patient_mrn as 
a runtime variable for later use.' and process_async='run'

Continue with documentation workflow. The patient_mrn is available as 
{{patient_mrn}}, and all extracted data will be automatically transformed 
to the output schema."

Comparison with Other Extraction Methods

Feature	Extract Prompt	Focused Action	Copy to Clipboard
Text Type	✅ Any visible text	✅ Any visible text	⚠️ Only copyable text
Speed	🐌 2-5 sec (sync) ⚡ Non-blocking (async)	🐌 5-10 sec	⚡ Instant
Decision Making	❌ Read-only	✅ Dynamic decisions	❌ Read-only
Runtime Variables	✅ Yes (async modes only)	✅ Yes	✅ Yes
Async Options	✅ Batch & Run scopes	❌ Always synchronous	❌ Always synchronous
Use Case	Large-scale extraction, non-copyable content	Dynamic navigation, decisions	Fast field extraction
Cost	💳 Vision tokens	💳 Vision tokens	💰 No AI cost

When to Use Each

Use Extract Prompt When:

Extracting large amounts of data for output
Working with non-copyable content (images, PDFs, charts)
Need parallel/async processing for speed
Extraction doesn’t affect navigation decisions
Want to store runtime variables from extraction (async modes: batch or run)

Use Focused Action When:

Making dynamic decisions during workflow
Conditional logic based on screen content
Selecting from lists based on runtime criteria
Verification steps that determine next actions

Use Copy to Clipboard When:

Text is selectable and copyable
Need deterministic, byte-exact extraction
Speed is critical
Working with simple text fields (IDs, numbers, dates)

Advanced Patterns

Hybrid Extraction Strategy

Combine all three methods for optimal performance:

"Extract order data using the most efficient method for each field:

Fast clipboard extraction (copyable fields):
- Triple-click Order ID field, use copy_to_clipboard with key name 'order_id'
- Triple-click Customer Name field, use copy_to_clipboard with key name 'customer_name'

Vision-based extraction (non-copyable content):
- Take screenshot of order items table with extract_prompt='Extract all line items 
  as JSON array: {item_name, sku, quantity, unit_price, total}' and process_async='batch'
- Take screenshot of shipping label image with extract_prompt='Extract tracking 
  number and carrier name' and process_async='batch'

Dynamic decision (navigation):
- Use focused_action to check if order requires special handling based on total amount

All values automatically included in final output."

Cascading Extractions

Extract summary data first, then detailed data based on results:

"Step 1 - Summary (Synchronous):
Take screenshot with extract_prompt='Count how many pending orders are visible 
and return as: {pending_count: number}'

Step 2 - Details (Batch-Scoped):
If pending_count > 0, for each pending order:
- Take screenshot with extract_prompt='Extract order details as JSON' and 
  process_async='batch'
- Click next order
- Repeat

Step 3 - Analytics (Run-Scoped):
Take screenshot of analytics dashboard with extract_prompt='Extract complete 
analytics: sales trends, top products, customer segments' and process_async='run'

Continue workflow while analytics extraction runs in background."

Progressive Detail Extraction

Start with high-level extraction, add detail as needed:

"Navigate to dashboard.

Level 1 - Overview (Run-Scoped):
Take screenshot with extract_prompt='Extract high-level metrics: total_revenue, 
total_orders, active_customers, growth_rate. Store total_revenue as runtime 
variable.' and process_async='run'

Level 2 - Category Breakdown (Batch-Scoped):
For each product category:
- Take screenshot with extract_prompt='Extract category sales data as JSON' 
  and process_async='batch'
- Click next category

Level 3 - Individual Products (Decision-Based):
Use focused_action to identify top 5 products by revenue, then for each:
- Take screenshot with extract_prompt='Extract detailed product metrics' 
  and process_async='batch'

All extractions complete before final output generation."

Runtime Variable Integration

Use run-scoped extraction to set variables for later workflow steps:

"Open invoice {invoice_number}.

Take screenshot with extract_prompt='Extract invoice_date, due_date, and 
total_amount as runtime variables. Then extract complete line item details, 
tax breakdown, and payment terms.' and process_async='run'

While that extraction runs in background:
- Navigate to customer portal
- Login with credentials
- Go to payment page
- Wait for {{total_amount}} to be available
- Enter {{total_amount}} in payment field
- Set payment date to {{due_date}}
- Submit payment

The full invoice details will be in the final output along with payment confirmation."

Model Override (Optional)

You can optionally specify which model to use for extraction by adding the model parameter:

Take screenshot with extract_prompt="Extract all invoice data as JSON" and model="Sonnet 4.5"

Unlike focused_action, any vision-capable model works for extract_prompt—it doesn’t need computer use capabilities since it only reads the screen without performing actions.

In the prompt editor, type model="" or use the / slash menu and select “Model Override” to access the model picker. See Model Configuration for details on per-action model overrides and using different models for cost optimization.

Error Handling and Best Practices

Clear Instructions

Be Specific: The clearer your extraction prompt, the better the results.

Good:

extract_prompt='Extract patient vital signs as JSON: {systolic: number, 
diastolic: number, heart_rate: number, temperature: number, oxygen_saturation: number}'

Avoid:

extract_prompt='Get the vital signs'  // Too vague

Field Type Specification

Always specify expected data types:

extract_prompt='Extract order data as JSON: {
  order_id: string,
  order_date: string (YYYY-MM-DD),
  total: number,
  items: array of {name: string, quantity: number, price: number},
  status: string (one of: Pending, Shipped, Delivered)
}'
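
Constraints such as a date format or a fixed set of status values are easy to verify once the result is parsed. A minimal sketch, assuming the extraction has already been loaded into a Python dict; `check_order` is a hypothetical helper, and the allowed statuses are copied from the prompt above.

```python
from datetime import datetime

ALLOWED_STATUSES = {"Pending", "Shipped", "Delivered"}


def check_order(order: dict) -> list:
    """Return human-readable problems found in an extracted order."""
    problems = []
    try:
        # Raises ValueError on a wrong format, TypeError if the value is not a string.
        datetime.strptime(order.get("order_date"), "%Y-%m-%d")
    except (ValueError, TypeError):
        problems.append("order_date is not YYYY-MM-DD")
    if order.get("status") not in ALLOWED_STATUSES:
        problems.append("status is not one of the allowed values")
    return problems


print(check_order({"order_date": "2024-05-01", "status": "Shipped"}))
print(check_order({"order_date": "05/01/2024", "status": "Lost"}))
```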

Handling Missing Data

Instruct the model how to handle missing fields:

extract_prompt='Extract customer data as JSON. If any field is not visible 
or not available, use null for that field: {name: string|null, email: string|null, 
phone: string|null, address: string|null}'
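
On the consuming side, null fields can then be separated from real values before further processing. The sketch below assumes the result is a JSON string with the four customer fields named in the prompt above; `split_missing` is a hypothetical helper.

```python
import json

EXPECTED_FIELDS = ("name", "email", "phone", "address")


def split_missing(raw_json: str):
    """Return (present, missing) for the customer fields requested above."""
    data = json.loads(raw_json)
    # Treat an absent key the same as an explicit null.
    present = {k: data[k] for k in EXPECTED_FIELDS if data.get(k) is not None}
    missing = [k for k in EXPECTED_FIELDS if data.get(k) is None]
    return present, missing


present, missing = split_missing('{"name": "Ada", "email": null, "phone": "555-0100"}')
print(present)
print(missing)
```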

Vision Model Limitations

Vision Model Considerations:

Complex tables may require multiple extractions
Very small text may not be readable (use zoom if needed)
Similar-looking characters (0/O, 1/l) may be confused
For critical data, consider verification steps
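
One possible verification step is to read a critical field twice (for example, once via extract_prompt and once via copy_to_clipboard where the text is copyable) and compare the readings. The sketch below flags differences caused only by look-alike characters. The character map is illustrative and not exhaustive, and `compare_readings` is a hypothetical helper.

```python
# Map visually similar characters to one canonical form (illustrative, not exhaustive).
CONFUSABLE = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def compare_readings(a: str, b: str) -> str:
    """Classify two readings of the same field.

    Returns "match", "confusable" (the readings differ only by
    look-alike characters), or "mismatch".
    """
    if a == b:
        return "match"
    if a.translate(CONFUSABLE) == b.translate(CONFUSABLE):
        return "confusable"
    return "mismatch"


print(compare_readings("INV-1O01", "INV-1001"))  # prints: confusable
```

A "confusable" result suggests re-extracting with zoom or preferring the clipboard value, while "mismatch" points to a larger reading error.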

Zoom for Better Accuracy

"Take screenshot with zoom_bounding_box=[x1, y1, x2, y2] and extract_prompt=
'Extract the small-print account number visible in the zoomed area' for 
higher accuracy on tiny text"
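
If you compute the zoom region programmatically, a small helper can keep the box inside the screen. This sketch assumes zoom_bounding_box takes pixel coordinates with the origin at the top-left corner, which this page does not state explicitly; `zoom_box` and its parameters are hypothetical.

```python
def zoom_box(cx: int, cy: int, width: int, height: int,
             screen_w: int, screen_h: int) -> list:
    """Return [x1, y1, x2, y2] centered on (cx, cy), clamped to the screen."""
    x1 = max(0, cx - width // 2)
    y1 = max(0, cy - height // 2)
    x2 = min(screen_w, x1 + width)
    y2 = min(screen_h, y1 + height)
    # If the box hit the right or bottom edge, shift it back to keep its size.
    x1 = max(0, x2 - width)
    y1 = max(0, y2 - height)
    return [x1, y1, x2, y2]


print(zoom_box(1900, 1070, 200, 100, screen_w=1920, screen_h=1080))
# prints: [1720, 980, 1920, 1080]
```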

Performance Optimization

Use Async When Possible

Performance Tip: If extraction results aren’t needed for navigation, always use process_async="run" for maximum parallelism and speed.

Batch Similar Extractions

Instead of:

Take screenshot, extract field1
Take screenshot, extract field2  
Take screenshot, extract field3

Do this:

Take screenshot with extract_prompt='Extract all three fields as JSON: 
{field1, field2, field3}'
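
When prompts are generated from code, the combined request can be built from a single field list so that every field is asked for in one extraction. A minimal sketch: `build_extract_prompt` is a hypothetical helper that only produces the prompt text in the style recommended under Formatting Guidelines.

```python
def build_extract_prompt(subject: str, fields: dict, as_array: bool = False) -> str:
    """Build one extract_prompt string that requests every field at once."""
    spec = ", ".join(f"{name}: {kind}" for name, kind in fields.items())
    if as_array:
        return f"Extract {subject} as JSON array where each item has: {{{spec}}}"
    return f"Extract {subject} as JSON: {{{spec}}}"


prompt = build_extract_prompt(
    "customer data",
    {"customer_id": "string", "name": "string", "email": "string"},
)
print(prompt)
# prints: Extract customer data as JSON: {customer_id: string, name: string, email: string}
```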

Minimize Synchronous Extractions

Before (Slow):

Extract A (synchronous, blocks 3s)
Extract B (synchronous, blocks 3s)
Extract C (synchronous, blocks 3s)
Total: 9 seconds

After (Fast):

Extract A with process_async='batch'
Extract B with process_async='batch'
Extract C with process_async='batch'
All complete in parallel: ~3 seconds total
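
The timing difference can be reproduced with a small simulation. Here each extraction is replaced by a short sleep, so the numbers only illustrate the shape of the speedup, not real Cyberdesk latencies. `fake_extract`, `sequential`, and `parallel` are hypothetical names.

```python
import asyncio
import time


async def fake_extract(name: str, seconds: float) -> str:
    """Stand-in for one extraction; sleeps instead of calling a model."""
    await asyncio.sleep(seconds)
    return name


async def sequential(seconds: float) -> float:
    """Run A, B, C one after another and return the elapsed time."""
    start = time.perf_counter()
    for name in ("A", "B", "C"):
        await fake_extract(name, seconds)
    return time.perf_counter() - start


async def parallel(seconds: float) -> float:
    """Run A, B, C concurrently and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_extract(n, seconds) for n in ("A", "B", "C")))
    return time.perf_counter() - start


seq = asyncio.run(sequential(0.05))
par = asyncio.run(parallel(0.05))
print(f"sequential: {seq:.2f}s, parallel: {par:.2f}s")
```

Sequential time grows with the number of extractions, while parallel time stays close to the slowest single extraction.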

Common Pitfalls

Avoid These Mistakes:

Using synchronous for large extractions that aren’t needed for navigation
Not specifying JSON format for structured data
Vague instructions that lead to unpredictable results
Extracting copyable text instead of using copy_to_clipboard
Not using async when extracting from multiple pages/views

❌ Incorrect Usage

"Extract the data from the screen"  // Too vague
"Get all the information"           // No format specified
"Read the table"                    // No structure guidance

✅ Correct Usage

"Take screenshot with extract_prompt='Extract invoice data as JSON: {invoice_number: 
string, date: string, total: number, line_items: array of {description: string, 
amount: number}}' and process_async='run'"

Complete Workflow Example

Scenario: E-Commerce Order Processing

Output Schema:

{
  "order_id": "string",
  "customer": {
    "name": "string",
    "email": "string"
  },
  "items": "array",
  "total": "number",
  "shipping": {
    "address": "string",
    "method": "string",
    "tracking": "string"
  }
}

Workflow Instructions:

"Log into admin portal with {admin_username} and {$admin_password}.

Navigate to order {order_id} details page.

Extract order data using optimal methods:

1. Fast clipboard extraction (copyable fields):
   - Triple-click Order ID, use copy_to_clipboard with key name 'order_id'
   - Triple-click Customer Email, use copy_to_clipboard with key name 'customer_email'

2. Vision extraction with runtime variables (run-scoped):
   Take screenshot with extract_prompt='Extract customer_name and order_total as 
   runtime variables. Then extract complete order details: all line items with 
   names/prices/quantities, shipping address, shipping method, and tracking number. 
   Format as detailed JSON matching the output schema.' and process_async='run'

3. Continue workflow while extraction runs:
   - Generate shipping label
   - Update inventory system with {{order_total}}
   - Send notification to {{customer_name}}
   - Download packing slip and mark_file_for_export

The extracted data, clipboard values, and runtime variables will automatically 
be transformed into the final output_data JSON."

Result:

Order ID and email extracted instantly via clipboard
Large extraction runs in background while workflow continues
Runtime variables available for inventory and notification steps
All data automatically formatted to output schema
Maximum performance with hybrid approach

Summary

Extract Prompt is your go-to tool for vision-based data extraction with flexible async processing:

Synchronous: Simple, immediate extractions for decision-making
Batch-Scoped Async: Parallel processing within tool batches (great for lists)
Run-Scoped Async: Fully non-blocking extraction for entire run lifetime

Combine with focused_action for dynamic decisions, copy_to_clipboard for fast clipboard extraction, and looping tools for efficient batch processing to create highly efficient workflows.

Remember: Choose the right tool for each task—clipboard for copyable text (fast), extract_prompt for large-scale/async extraction (flexible), focused_action for decisions (dynamic), and loops for repetitive patterns (efficient).
