
# Async Extraction Patterns

> Understanding synchronous, batch-scoped, and run-scoped async extraction modes

## Overview

Cyberdesk provides flexible async extraction modes that let you optimize workflow performance based on when and how you need extracted data. Understanding these patterns is key to building fast, efficient workflows.

<Info>
  Async extraction works seamlessly with [Cyberdesk's trajectory caching system](/concepts/trajectories). During trajectory replay, extract prompts re-execute to capture fresh data, so you get both the speed benefits of caching and the flexibility of dynamic data extraction.
</Info>

## The Three Processing Modes

### Synchronous (Default)

**When**: `process_async` is omitted

**Behavior**: Extraction blocks until complete

**Processing Time**: 2-5 seconds per extraction

**Use When**:

* You need the result immediately for the next decision
* Extracting a single value
* The extraction determines workflow branching
* Simple workflows with \< 5 total extractions

**Example**:

```text theme={null}
"Navigate to order details. Take a screenshot with extract_prompt='Extract 
the order status as one word: Pending, Processing, or Shipped' to determine 
if the order needs manual intervention."
```

**Timing Diagram**:

```
Agent Step 1 → Extract (3s) → Agent Step 2 → Extract (3s) → Agent Step 3
Total: 6 seconds of extraction time
```
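To make the timing concrete, here is a toy Python model of synchronous extraction. It does not use the Cyberdesk SDK: `fake_extract` is a hypothetical stand-in for one `extract_prompt` call, and the latency is scaled down from \~3 seconds so the sketch runs instantly. Because each extraction is awaited before the next agent step, extractions never overlap.

```python
import asyncio

# Hypothetical latency for one extraction, scaled down from ~3 s.
LATENCY = 0.01


async def fake_extract(prompt: str, log: list[str]) -> str:
    """Hypothetical stand-in for a single extract_prompt call."""
    log.append(f"start {prompt}")
    await asyncio.sleep(LATENCY)  # simulated vision-model latency
    log.append(f"end {prompt}")
    return f"result for {prompt}"


async def synchronous_run(prompts: list[str]) -> list[str]:
    log: list[str] = []
    for prompt in prompts:
        # Synchronous mode: block on each extraction before the next agent step,
        # so total extraction time grows as N x LATENCY.
        await fake_extract(prompt, log)
        log.append("agent step")
    return log


if __name__ == "__main__":
    print(asyncio.run(synchronous_run(["status", "total"])))
```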

### Batch-Scoped Async

**When**: `process_async="batch"`

**Behavior**: When multiple screenshot extractions run in the same batched tool phase, they run in parallel and complete before the next agent step. If Cyberdesk executes the screenshot as a standalone tool call instead, it falls back to synchronous extraction.

**Processing Time**: About as long as the slowest extraction in the batch (\~3 seconds), largely independent of how many extractions the batch contains

**Use When**:

* Scrolling through lists or paginated content
* Extracting from multiple sequential views
* Extractions don't depend on each other
* Results must be ready for the next agent decision
* You want runtime variables from the extractions stored before the next agent turn

**Runtime Values**: Like any `extract_prompt` call, the extraction agent can call `upsert_runtime_values` when your prompt explicitly tells it to save or store values. In batch mode, those values become available before the next agent step once the batch finishes.

**Example**:

```text theme={null}
"Scroll through the product catalog and extract all data:
- Take screenshot with extract_prompt='Extract visible products as JSON' 
  and process_async='batch'
- Scroll down
- Take screenshot with extract_prompt='Extract visible products as JSON' 
  and process_async='batch'
- Scroll down
- Take screenshot with extract_prompt='Extract visible products as JSON' 
  and process_async='batch'

All extractions complete in parallel before next agent turn."
```

**Timing Diagram**:

```
Agent Step 1: [Screenshot + Extract] → [Scroll] → [Screenshot + Extract] → [Scroll] → [Screenshot + Extract]
              └─────────── All 3 extractions run in parallel (3s total) ──────────┘
                                                                                    ↓
                                                                            Agent Step 2
Total: ~3 seconds for all extractions combined
```

**Performance Benefit**: 3-5x faster than synchronous when extracting from multiple views in one batched phase
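The same toy model can illustrate batch scope. Again, this is not the Cyberdesk API; it is a sketch assuming that each screenshot's extraction is started as a background task, that the agent's next actions (such as scrolling) proceed without waiting, and that the batch boundary awaits every task before the next agent step. `fake_extract` and `batch_phase` are hypothetical names.

```python
import asyncio

# Hypothetical latency for one extraction, scaled down from ~3 s.
LATENCY = 0.01


async def fake_extract(page: int, log: list[str]) -> dict:
    """Hypothetical stand-in for extract_prompt with process_async='batch'."""
    log.append(f"start page {page}")
    await asyncio.sleep(LATENCY)
    log.append(f"end page {page}")
    return {"page": page, "products": []}


async def batch_phase(pages: list[int]) -> tuple[list[dict], list[str]]:
    log: list[str] = []
    tasks = []
    for page in pages:
        # Schedule the extraction without awaiting it; it starts running as
        # soon as this coroutine yields, while the agent keeps acting.
        tasks.append(asyncio.create_task(fake_extract(page, log)))
        log.append(f"scroll after page {page}")
    # Batch boundary: the extractions overlap and all finish here, so the
    # phase costs roughly one LATENCY instead of len(pages) x LATENCY.
    results = await asyncio.gather(*tasks)
    log.append("agent step 2")
    return list(results), log


if __name__ == "__main__":
    print(asyncio.run(batch_phase([1, 2, 3])))
```

`asyncio.gather` returns results in the order the tasks were passed, regardless of which finished first, so in this model per-page results stay in page order.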

### Run-Scoped Async

**When**: `process_async="run"`

**Behavior**: Extraction runs entirely in the background for the rest of the workflow and is awaited only at final output generation

**Requirement**: The workflow must have an `output_schema`; otherwise Cyberdesk returns an error and asks you to use synchronous or batch mode instead

**Processing Time**: Non-blocking, completes while workflow continues

**Use When**:

* Large data extractions not needed for navigation
* Extraction is only for final output
* You want maximum parallelism
* Need to set runtime variables from extraction that won't be used until later

**Runtime Values**: Like any `extract_prompt` call, the extraction agent can call `upsert_runtime_values` when your prompt explicitly tells it to save or store values. In run scope, those values become available once the background extraction finishes.

**Example**:

```text theme={null}
"Navigate to analytics dashboard. Take screenshot with extract_prompt='Extract 
complete analytics data: all metrics, charts, KPIs, trends as detailed JSON' 
and process_async='run'

Continue with report generation workflow. The analytics extraction will complete 
in the background and be included automatically in final output."
```

**Timing Diagram**:

```
Agent Step 1 → [Start Extract (non-blocking)] → Agent Step 2 → Agent Step 3 → Agent Step 4
                        ↓ (running in background)
                        ↓
                        ↓
               [Extraction completes while workflow continues]
                        ↓
                        └─────────────────────────────────────→ Final Output Generation
                                                                 (waits for completion)
Total: 0 seconds of blocking time, extraction happens in parallel with workflow
```

**Performance Benefit**: Maximum parallelism, zero blocking time during workflow execution
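A run-scoped extraction fits the same toy model as a task that is awaited only at final output generation. The `output_schema` check mirrors the documented requirement; `fake_extract` and `run_scoped_workflow` are hypothetical names, not part of any Cyberdesk SDK.

```python
import asyncio
from typing import Optional

LATENCY = 0.05  # hypothetical, scaled down from ~3 s


async def fake_extract(name: str, log: list[str]) -> dict:
    log.append(f"start {name}")
    await asyncio.sleep(LATENCY)
    log.append(f"end {name}")
    return {name: "extracted data"}


async def run_scoped_workflow(output_schema: Optional[dict]) -> tuple[list[dict], list[str]]:
    # Mirrors the documented rule: run scope needs an output_schema.
    if output_schema is None:
        raise ValueError("process_async='run' requires an output_schema; use sync or batch")
    log: list[str] = []
    # Step 1: launch the large extraction and keep going without waiting.
    pending = [asyncio.create_task(fake_extract("analytics", log))]
    for step in (2, 3, 4):
        log.append(f"agent step {step}")
        await asyncio.sleep(0)  # yield control, as a real agent step would
    # Final output generation is the only place that waits for the extraction.
    results = await asyncio.gather(*pending)
    log.append("final output")
    return list(results), log


if __name__ == "__main__":
    print(asyncio.run(run_scoped_workflow({"type": "object"})))
```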

## Comparing the Modes

| Aspect                | Synchronous                                             | Batch-Scoped                                            | Run-Scoped                                              |
| --------------------- | ------------------------------------------------------- | ------------------------------------------------------- | ------------------------------------------------------- |
| **Blocking**          | ✅ Blocks each time                                      | ✅ Blocks at end of batch                                | ❌ Non-blocking                                          |
| **Parallelism**       | ❌ Sequential                                            | ✅ Within batch                                          | ✅ Across entire run                                     |
| **Best For**          | Single extraction, decisions                            | Lists, pagination                                       | Large output data                                       |
| **Result Available**  | Immediately                                             | Before next step                                        | At final output                                         |
| **Runtime Variables** | ✅ Yes, if the prompt explicitly asks to save/store them | ✅ Yes, if the prompt explicitly asks to save/store them | ✅ Yes, if the prompt explicitly asks to save/store them |
| **Performance**       | Slowest (N × 3s)                                        | Fast (3s per batch)                                     | Fastest (0s blocking)                                   |

## Performance Examples

### Scenario: Extract from 10 Pages of Data

**Synchronous**:

```
Page 1: Extract (3s) → Navigate
Page 2: Extract (3s) → Navigate
...
Page 10: Extract (3s)
Total: 30 seconds of extraction time
```

**Batch-Scoped**:

```
Batch 1: [Page 1 Extract, Navigate, Page 2 Extract, Navigate, Page 3 Extract]
         All 3 extractions in parallel: 3s
Batch 2: [Page 4 Extract, Navigate, Page 5 Extract, Navigate, Page 6 Extract]
         All 3 extractions in parallel: 3s
Batch 3: [Page 7 Extract, Navigate, Page 8 Extract, Navigate, Page 9 Extract]
         All 3 extractions in parallel: 3s
Batch 4: [Page 10 Extract]
         1 extraction: 3s
Total: 12 seconds of extraction time (2.5x faster)
```

**Run-Scoped** (if extraction not needed for navigation):

```
Each page: Start extract (non-blocking) → Navigate to next page
           All 10 extractions run in the background while the workflow continues
Total: 0 seconds of blocking during navigation (final output generation waits only for extractions still running)
```
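The arithmetic behind these totals can be written as a small helper. It is a back-of-the-envelope model only: it assumes a fixed \~3 seconds per extraction and a fixed number of extractions per batched phase (three, as in the batch example above), and it counts run-scoped extractions as zero blocking even though final output generation may still wait for the last ones. `blocking_seconds` is a hypothetical name.

```python
import math


def blocking_seconds(n: int, mode: str, per_extraction: float = 3.0, batch_size: int = 3) -> float:
    """Rough time spent blocked on n extractions (illustrative model)."""
    if mode == "sync":
        # Every extraction blocks in turn.
        return n * per_extraction
    if mode == "batch":
        # Each batched phase costs about one extraction's latency.
        return math.ceil(n / batch_size) * per_extraction
    if mode == "run":
        # Nothing blocks mid-workflow; leftover work is awaited at final output.
        return 0.0
    raise ValueError(f"unknown mode: {mode}")


sync_total = blocking_seconds(10, "sync")    # 30.0
batch_total = blocking_seconds(10, "batch")  # 12.0
print(f"sync={sync_total}s batch={batch_total}s speedup={sync_total / batch_total}x")
```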

## Advanced Pattern: Hybrid Extraction

Combine multiple modes for optimal performance:

```text theme={null}
"Navigate to customer portal.

Step 1 - Quick Decision (Synchronous):
Take screenshot with extract_prompt='Extract customer account status: 
Active, Suspended, or Closed' to determine workflow path.

Step 2 - List Processing (Batch-Scoped):
If status is Active, extract recent transactions:
- Take screenshot with extract_prompt='Extract visible transactions as JSON' 
  and process_async='batch'
- Scroll down
- Take screenshot with extract_prompt='Extract visible transactions as JSON' 
  and process_async='batch'
- Repeat for all pages

Step 3 - Detailed Analytics (Run-Scoped):
Take screenshot with extract_prompt='Extract complete account analytics: 
spending patterns, category breakdown, payment history' and process_async='run'

Continue with report generation. The analytics extraction completes in background."
```

**Result**:

* Fast decision making (synchronous where needed)
* Efficient list processing (batch-scoped parallelism)
* Zero blocking for large data (run-scoped for final output)

## Extraction Modes with Runtime Variables

All `extract_prompt` modes use the extraction agent, and that agent can call `upsert_runtime_values` when your prompt explicitly tells it to save or store runtime values. The main difference is timing:

* **Synchronous**: Variables are available immediately when the extraction returns
* **Batch-Scoped**: Variables are available before the next agent step once the batch finishes
* **Run-Scoped**: Variables are available when the background extraction completes

### The Extraction Agent

Every `extract_prompt` call uses the extraction agent, whatever the mode. The async modes (`batch` and `run`) change only when it runs, not what it can do; in every mode, the agent can:

1. **Call upsert\_runtime\_values** to store specific fields
2. **Provide final observations** as text
3. **Do both**: Store values AND provide observations

### System Prompt (All Extraction Modes)

When an extraction runs, the extraction agent receives guidance like:

```
You are an extraction assistant. You have two capabilities:

1. Call upsert_runtime_values to store specific extracted values that should 
   be available throughout the workflow as {{key_name}} placeholders.

2. Provide final observations as text describing what you see on screen.

You can do BOTH: store specific values AND provide observations, or just do 
one or the other. Your final text message will be recorded as the extraction result.
```

### Example: Synchronous with Runtime Variables (Available Immediately)

```text theme={null}
"Open the customer profile.

Take screenshot with extract_prompt='Extract customer_id and membership_tier and
store them as runtime variables using upsert_runtime_values. Then provide a
short summary of the visible account status.'

Use {{customer_id}} immediately in the next step."
```

**What Happens**:

1. Screenshot taken
2. Extraction agent analyzes screenshot
3. Calls `upsert_runtime_values({customer_id: "C-1024", membership_tier: "Gold"})`
4. Provides observation about the visible account status
5. **Variables are available immediately** when the extraction returns
6. Agent proceeds with `{{customer_id}}` and `{{membership_tier}}` available
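Once a value is stored, later steps refer to it as a `{{key}}` placeholder. The sketch below shows one plausible way such substitution could work; the `render` helper and its rule of leaving unknown placeholders untouched are assumptions for illustration, not Cyberdesk's actual implementation. Single-brace placeholders such as `{invoice_number}` are not matched by this sketch.

```python
import re

# Matches {{key}} placeholders; single-brace {key} text is not matched.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, runtime_values: dict[str, object]) -> str:
    """Hypothetical helper: fill {{key}} placeholders from stored runtime values."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders as-is so a missing value stays visible.
        return str(runtime_values[key]) if key in runtime_values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


values = {"customer_id": "C-1024", "membership_tier": "Gold"}
print(render("Open customer {{customer_id}} ({{membership_tier}})", values))
```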

### Example: Batch-Scoped with Runtime Variables (Available Before Next Step)

```text theme={null}
"Scroll through order list:
- Take screenshot with extract_prompt='Extract order_id as runtime variable 
  using upsert_runtime_values, then describe order status and customer name' 
  and process_async='batch'
- Use {{order_id}} to determine if this order needs special handling
- If yes, click on order
- Repeat for next order"
```

**What Happens**:

1. Screenshot taken
2. Extraction agent analyzes screenshot
3. Calls `upsert_runtime_values({order_id: "ORD-123"})`
4. Provides observation about status and customer
5. All batch extractions complete in parallel
6. **Variables are available before the next agent step**, so that step can use them
7. Agent proceeds with `{{order_id}}` available
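The batch-scoped timing can be modeled with the same asyncio approach: upserts from parallel extractions are collected and applied at the batch boundary, so they are visible to the next agent step but not to tool calls inside the batch. This is a hypothetical model; `extract` and `batch_then_next_step` are illustrative names, and the returned dictionaries stand in for the extraction agent's `upsert_runtime_values` calls.

```python
import asyncio


async def extract(upserts: dict[str, str]) -> dict[str, str]:
    """Hypothetical batch extraction; the returned dict stands in for upsert_runtime_values."""
    await asyncio.sleep(0.01)  # simulated extraction latency
    return upserts


async def batch_then_next_step(runtime_values: dict[str, str]) -> dict[str, str]:
    tasks = [
        asyncio.create_task(extract({"order_id": "ORD-123"})),
        asyncio.create_task(extract({"customer_name": "Jane Doe"})),
    ]
    # A tool call later in the same batch still sees the old values.
    seen_inside_batch = dict(runtime_values)
    # Batch boundary: wait for every extraction, then apply all upserts,
    # so the next agent step sees them.
    for upserts in await asyncio.gather(*tasks):
        runtime_values.update(upserts)
    return seen_inside_batch


if __name__ == "__main__":
    values: dict[str, str] = {}
    print(asyncio.run(batch_then_next_step(values)), values)
```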

### Example: Run-Scoped with Runtime Variables (Available When Extraction Completes)

```text theme={null}
"Open invoice {invoice_number}.

Take screenshot with extract_prompt='Extract the invoice_date and total_amount 
and store them as runtime variables using upsert_runtime_values. Then provide 
a detailed description of all line items, tax breakdown, and payment terms.' 
and process_async='run'

Continue with payment workflow. The {{invoice_date}} and {{total_amount}} 
variables will be available once extraction completes in background, and the 
full invoice details will be in the final output."
```

**What Happens**:

1. Extraction starts in background
2. Workflow continues with other tasks
3. Extraction agent analyzes screenshot (in background)
4. Calls `upsert_runtime_values({invoice_date: "2024-01-15", total_amount: 1250.00})`
5. Provides the detailed observation about line items, taxes, etc.
6. **Variables become available** once the extraction completes
7. Both the runtime variables and observation text are included in final output
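For run scope, the same values appear only once the background extraction finishes. In this hypothetical model (all names are illustrative), the workflow checks for `total_amount` after launching the extraction, finds it missing, and sees it only after awaiting the task, which is what final output generation does.

```python
import asyncio


async def extract_invoice() -> tuple[dict[str, object], str]:
    """Hypothetical background extraction returning (upserts, observation)."""
    await asyncio.sleep(0.01)  # simulated extraction latency
    upserts = {"invoice_date": "2024-01-15", "total_amount": 1250.00}
    observation = "Line items, tax breakdown, and payment terms ..."
    return upserts, observation


async def workflow() -> tuple[bool, dict[str, object], str]:
    runtime_values: dict[str, object] = {}
    task = asyncio.create_task(extract_invoice())
    # The workflow keeps going; the value has not been published yet.
    await asyncio.sleep(0)
    available_early = "total_amount" in runtime_values
    # Await the task (as final output generation would), then publish its values.
    upserts, observation = await task
    runtime_values.update(upserts)
    return available_early, runtime_values, observation


if __name__ == "__main__":
    print(asyncio.run(workflow()))
```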

### Example: Pure Observation (No Runtime Variables)

```text theme={null}
"Take screenshot with extract_prompt='Describe all customer information 
visible on screen including contact details, order history, and preferences. 
Format as detailed JSON.' and process_async='run'

This large extraction runs in background and will be in final output."
```

### Example: Multiple Runtime Variables

```text theme={null}
"Take screenshot with extract_prompt='Extract customer_id, order_total, and 
expected_delivery_date as runtime variables using upsert_runtime_values. Then 
provide a summary of order contents, shipping address, and special instructions.' 
and process_async='run'

Use {{customer_id}} and {{order_total}} in subsequent workflow steps once available."
```

### Array and Object Operators

When accumulating data across multiple extractions (e.g., scrolling through a list), use MongoDB-style operators to update arrays and objects in place instead of replacing the whole value:

| Operator   | Description                      | Example                                  |
| ---------- | -------------------------------- | ---------------------------------------- |
| `$append`  | Append item to array             | `{"items": {"$append": "new_item"}}`     |
| `$prepend` | Prepend item to array            | `{"items": {"$prepend": "first"}}`       |
| `$concat`  | Concatenate arrays               | `{"items": {"$concat": ["a", "b"]}}`     |
| `$merge`   | Shallow merge objects            | `{"config": {"$merge": {"key": "val"}}}` |
| `$remove`  | Remove first occurrence by value | `{"tags": {"$remove": "old"}}`           |
| `$pop`     | Remove last element              | `{"stack": {"$pop": true}}`              |
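The operator semantics in the table can be sketched as a small Python function. This is an illustrative model, not Cyberdesk's implementation: `apply_upsert` is a hypothetical name, and treating a missing key as an empty array (or an empty object for `$merge`) extends the documented `$append` behavior to the other operators as an assumption.

```python
from typing import Any


def apply_upsert(store: dict[str, Any], updates: dict[str, Any]) -> None:
    """Hypothetical model of upsert_runtime_values with array/object operators."""
    for key, value in updates.items():
        is_operator = isinstance(value, dict) and len(value) == 1 and next(iter(value)).startswith("$")
        if not is_operator:
            store[key] = value  # plain value: replace whatever was stored
            continue
        op, arg = next(iter(value.items()))
        current = store.get(key)
        if op == "$append":
            store[key] = list(current or []) + [arg]  # missing key -> new array
        elif op == "$prepend":
            store[key] = [arg] + list(current or [])
        elif op == "$concat":
            store[key] = list(current or []) + list(arg)
        elif op == "$merge":
            store[key] = {**(current or {}), **arg}  # shallow merge
        elif op == "$remove":
            items = list(current or [])
            if arg in items:
                items.remove(arg)  # list.remove drops only the first match
            store[key] = items
        elif op == "$pop":
            items = list(current or [])
            if items:
                items.pop()  # remove the last element; the argument is ignored
            store[key] = items
        else:
            raise ValueError(f"unknown operator: {op}")


runtime_values: dict[str, Any] = {}
apply_upsert(runtime_values, {"products": {"$append": {"sku": "A1"}}})
apply_upsert(runtime_values, {"products": {"$append": {"sku": "B2"}}})
print(runtime_values["products"])
```

A plain (non-operator) value replaces whatever was stored, which is why accumulating across pages needs `$append` or `$concat`.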

#### Example: Accumulating Extracted Items Across Pages

```text theme={null}
"Scroll through the products list and extract all items:

Page 1:
- Take screenshot with extract_prompt='Extract all visible products as JSON array. 
  For each product, call upsert_runtime_values with {\"products\": {\"$append\": {...product data...}}} 
  to add it to the running list.' and process_async='batch'

Page 2:
- Scroll down
- Take screenshot with extract_prompt='Extract all visible products as JSON array. 
  For each product, call upsert_runtime_values with {\"products\": {\"$append\": {...product data...}}} 
  to add it to the running list.' and process_async='batch'

After all pages, {{products}} contains the complete list of all extracted products."
```

<Tip>
  The `$append` operator creates the array if it doesn't exist, so you don't need to initialize `{{products}}` before the first extraction.
</Tip>

## Choosing the Right Mode

Use this decision tree:

```
Do you need the extraction result to make the next decision?
├─ YES → Use Synchronous
│         Example: "Extract status to determine next action"
│
└─ NO → Is this a list/multiple views?
    ├─ YES → Use Batch-Scoped Async
    │         Example: "Scroll and extract from each page"
    │
    └─ NO → Is this only for final output?
        ├─ YES → Use Run-Scoped Async (requires `output_schema`)
        │         Example: "Extract analytics for report"
        │
        └─ NO → Can the value arrive later while the workflow keeps going?
            ├─ YES → Use Run-Scoped Async (requires `output_schema`)
            │         Example: "Extract invoice_number for later use"
            │
            └─ NO → Use Synchronous if < 3 extractions
                    Use Batch-Scoped if >= 3 independent extractions in one batched phase
```
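The decision tree can also be expressed as a function, which may help when generating prompts programmatically. The function and parameter names are hypothetical; falling back to synchronous or batch-scoped extraction when there is no `output_schema` follows the run-scope requirement described earlier.

```python
def choose_mode(
    *,
    needed_for_next_decision: bool,
    multiple_views: bool,
    only_for_final_output: bool,
    can_arrive_later: bool,
    has_output_schema: bool,
    independent_extractions: int = 1,
) -> str:
    """Return 'sync', 'batch', or 'run' following the decision tree (illustrative)."""
    if needed_for_next_decision:
        return "sync"
    if multiple_views:
        return "batch"
    if (only_for_final_output or can_arrive_later) and has_output_schema:
        return "run"
    # Without a usable run scope, pick by how many independent extractions share one phase.
    return "sync" if independent_extractions < 3 else "batch"


print(choose_mode(
    needed_for_next_decision=False,
    multiple_views=False,
    only_for_final_output=True,
    can_arrive_later=True,
    has_output_schema=True,
))
```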

## Real-World Patterns

### Healthcare: Patient Record Processing

```text theme={null}
"Navigate to patient {patient_mrn}.

Quick Check (Synchronous):
Take screenshot with extract_prompt='Extract patient age as number' to 
determine if pediatric workflow is needed.

Medical History (Batch-Scoped):
Navigate through each section of medical history:
- Take screenshot with extract_prompt='Extract diagnosis history' and process_async='batch'
- Go to medications tab
- Take screenshot with extract_prompt='Extract current medications' and process_async='batch'
- Go to allergies tab
- Take screenshot with extract_prompt='Extract allergies' and process_async='batch'

Comprehensive Record (Run-Scoped):
Take screenshot of complete patient chart with extract_prompt='Extract full 
patient record including all demographics, vitals, lab results, imaging reports, 
and visit notes. Store patient_mrn as runtime variable.' and process_async='run'

Continue with documentation workflow, using {{patient_mrn}} for file naming once it is available."
```

### E-Commerce: Inventory Extraction

```text theme={null}
"Log into inventory system.

Category Check (Synchronous):
Take screenshot with extract_prompt='Count visible categories' to determine 
navigation depth.

Per-Category Extraction (Batch-Scoped):
For each category:
- Navigate to category
- Take screenshot with extract_prompt='Extract all products as JSON' and process_async='batch'
- Repeat

Full Catalog Analytics (Run-Scoped):
Take screenshot with extract_prompt='Extract complete inventory analytics: 
stock levels, trends, low-stock alerts, reorder recommendations. Store 
total_products and low_stock_count as runtime variables.' and process_async='run'

Generate inventory report, using {{total_products}} and {{low_stock_count}} in the report header once they are available."
```

### Finance: Transaction Processing

```text theme={null}
"Open account {account_number}.

Balance Check (Synchronous):
Take screenshot with extract_prompt='Extract current balance as number' to 
verify sufficient funds.

Recent Transactions (Batch-Scoped):
Scroll through transaction history:
- Take screenshot with extract_prompt='Extract visible transactions' and process_async='batch'
- Scroll down
- Take screenshot with extract_prompt='Extract visible transactions' and process_async='batch'
- Repeat

Account Analysis (Run-Scoped):
Take screenshot with extract_prompt='Extract complete account analysis: 
spending categories, monthly trends, unusual patterns. Store account_type 
and risk_level as runtime variables.' and process_async='run'

Continue with report generation, using {{account_type}} and {{risk_level}} for classification once they are available."
```

## Best Practices

### 1. Start with Synchronous, Optimize Later

Begin with simple synchronous extraction, then optimize bottlenecks:

```text theme={null}
# Initial (works, but slow)
"Extract field A (sync) → Extract field B (sync) → Extract field C (sync)"

# Optimized (3x faster)
"Extract all three fields with process_async='batch' in one batch"
```

### 2. Use Batch-Scoped for Lists

Any time you're iterating (scrolling, clicking next, navigating pages), use batch-scoped:

```text theme={null}
"For each item in list:
- Extract data with process_async='batch'
- Navigate to next
All extractions complete in parallel"
```

### 3. Use Run-Scoped for Final Output Only

If the extracted data doesn't influence navigation or decisions, make it run-scoped:

```text theme={null}
"Take screenshot with extract_prompt='Extract complete report data' 
and process_async='run'

Continue workflow - extraction completes in background"
```

### 4. Combine with Other Extraction Methods

Use [copy\_to\_clipboard](/workflow-prompting/copy-to-clipboard) for fast copyable text, and [extract\_prompt](/workflow-prompting/extract-prompt) for vision-based extraction:

```text theme={null}
"Extract order data:
- Triple-click Order ID, use copy_to_clipboard with key 'order_id' (instant)
- Take screenshot with extract_prompt='Extract order items' and process_async='batch' (parallel)
- Take screenshot with extract_prompt='Extract shipping analytics' and process_async='run' (background)

Optimal performance using all available methods"
```

### 5. Set Runtime Variables from Run-Scoped Extractions

When you need specific values mid-workflow but also want large extractions:

```text theme={null}
"Take screenshot with extract_prompt='Extract customer_id and order_total 
as runtime variables. Then extract complete order details, history, and 
preferences as detailed JSON.' and process_async='run'

The {{customer_id}} and {{order_total}} become available once extraction completes, 
while full details are in final output."
```

## Performance Metrics

Illustrative estimates based on typical workflow patterns, assuming roughly 3 seconds per extraction:

| Scenario                    | Synchronous | Batch-Scoped | Run-Scoped | Speedup (Batch vs. Sync) |
| --------------------------- | ----------- | ------------ | ---------- | ------------------------ |
| 1 extraction                | 3s          | 3s           | 0s\*       | 1x                       |
| 5 extractions (sequential)  | 15s         | 3-6s         | 0s\*       | 2.5-5x                   |
| 10 extractions (sequential) | 30s         | 9-12s        | 0s\*       | 2.5-3.3x                 |
| 20 extractions (sequential) | 60s         | 18-24s       | 0s\*       | 2.5-3.3x                 |

\*Run-scoped shows 0s of blocking time during the workflow, but extractions still process in the background, and final output generation waits for any that are still running

## Common Patterns Summary

| Pattern                       | Mode         | Example                                               |
| ----------------------------- | ------------ | ----------------------------------------------------- |
| **Decision Making**           | Synchronous  | "Extract status to determine next step"               |
| **List Processing**           | Batch-Scoped | "Scroll and extract from each page"                   |
| **Final Output Data**         | Run-Scoped   | "Extract analytics for report"                        |
| **Background Runtime Values** | Run-Scoped   | "Extract and store ID for later use"                  |
| **Mixed Requirements**        | Hybrid       | Sync for decisions + Batch for lists + Run for output |

## Migration Guide

### From Synchronous to Batch-Scoped

**Before**:

```text theme={null}
"Extract from page 1
Navigate
Extract from page 2
Navigate
Extract from page 3"
```

**After**:

```text theme={null}
"Extract from page 1 with process_async='batch'
Navigate
Extract from page 2 with process_async='batch'
Navigate
Extract from page 3 with process_async='batch'"
```

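**Benefit**: Up to 3x faster when all three extractions land in the same batched tool phase; a screenshot executed as a standalone tool call still runs synchronously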

### From Batch-Scoped to Run-Scoped

**Before**:

```text theme={null}
"Extract complete data with process_async='batch'
Continue workflow (must wait for extraction)"
```

**After**:

```text theme={null}
"Extract complete data with process_async='run'
Continue workflow (extraction in background)"
```

**Benefit**: Zero blocking time, maximum parallelism

**Caution**: Only use this when the extraction result is not needed for navigation, and only in workflows that define an `output_schema`

## Summary

* **Synchronous**: Simple, reliable, blocks until complete. Use for decisions or values you need immediately.
* **Batch-Scoped**: Parallel within batch, 3-5x faster for lists. Use for iteration.
* **Run-Scoped**: Maximum parallelism, zero blocking. Use for output-only data in workflows with an `output_schema`.

Choose based on when you need the data, not on the assumption that async is always faster; the right pattern depends on your workflow structure.

For more information, see:

* [Extract Prompt](/workflow-prompting/extract-prompt) - Detailed extraction syntax and examples
* [Trajectories 101](/concepts/trajectories) - How caching amplifies these performance benefits
