Cyberdesk Docs

Tutorials

Learn how to use the Cyberdesk TypeScript SDK effectively with these step-by-step tutorials.

Setup: All examples assume you have initialized the SDK client as shown in the Quickstart:

import { createCyberdeskClient } from 'cyberdesk';
 
const cyberdesk = createCyberdeskClient({
  apiKey: 'YOUR_API_KEY',
});

Tutorial 1: Creating and Interacting with a Desktop

This tutorial walks you through creating a desktop instance and performing basic interactions using the SDK.

Step 1: Create a Desktop Instance

Use launchDesktop:

async function createDesktop() {
  console.log('Creating desktop...');
  const launchResult = await cyberdesk.launchDesktop({
    body: { timeout_ms: 3600000 } // Optional: 1-hour timeout
  });
 
  if ('error' in launchResult) {
    console.error('Failed to create desktop:', launchResult.error);
    throw new Error('Failed to create desktop: ' + launchResult.error);
  }
 
  console.log(`Desktop created with ID: ${launchResult.id}`);
  // Note: You might need to wait or use getDesktop to get the streamUrl
  return launchResult.id;
}
 
// Example usage:
// const desktopId = await createDesktop();

Step 2: Perform Mouse Actions

Use executeComputerAction with type: 'click_mouse':

async function clickOnDesktop(desktopId: string) {
  console.log(`Clicking on desktop ${desktopId}...`);
  const clickResult = await cyberdesk.executeComputerAction({
    path: { id: desktopId },
    body: {
      type: 'click_mouse',
      x: 100,
      y: 200,
      button: 'left' // Optional
    }
  });
 
  if ('error' in clickResult) {
    console.error('Mouse click failed:', clickResult.error);
    throw new Error('Mouse click failed: ' + clickResult.error);
  }
 
  console.log('Mouse click successful:', clickResult);
}
 
// Example usage:
// await clickOnDesktop(desktopId);

Step 3: Type Text

Use executeComputerAction with type: 'type':

async function typeOnDesktop(desktopId: string) {
  console.log(`Typing on desktop ${desktopId}...`);
  const typeResult = await cyberdesk.executeComputerAction({
    path: { id: desktopId },
    body: {
      type: 'type',
      text: 'Hello, Cyberdesk SDK!'
    }
  });
 
  if ('error' in typeResult) {
    console.error('Typing failed:', typeResult.error);
    throw new Error('Typing failed: ' + typeResult.error);
  }
 
  console.log('Typing successful:', typeResult);
}
 
// Example usage:
// await typeOnDesktop(desktopId);

Step 4: Take a Screenshot

Use executeComputerAction with type: 'screenshot':

async function captureScreenshot(desktopId: string) {
  console.log(`Taking screenshot of desktop ${desktopId}...`);
  const screenshotResult = await cyberdesk.executeComputerAction({
    path: { id: desktopId },
    body: {
      type: 'screenshot'
    }
  });
 
  if ('error' in screenshotResult) {
    console.error('Screenshot failed:', screenshotResult.error);
    throw new Error('Screenshot failed: ' + screenshotResult.error);
  }
 
  console.log('Screenshot successful.');
  // Access the image data:
  // const base64Image = screenshotResult.base64_image;
  // console.log('Base64 Image length:', base64Image?.length);
  return screenshotResult.base64_image;
}
 
// Example usage:
// const imageData = await captureScreenshot(desktopId);

Step 5: Execute a Bash Command

Use executeBashAction:

async function runCommand(desktopId: string) {
  console.log(`Running command on desktop ${desktopId}...`);
  const bashResult = await cyberdesk.executeBashAction({
    path: { id: desktopId },
    body: {
      command: 'ls -la /tmp'
    }
  });
 
  if ('error' in bashResult) {
    console.error('Bash command failed:', bashResult.error);
    throw new Error('Bash command failed: ' + bashResult.error);
  }
 
  console.log('Command successful.');
  console.log('Output:', bashResult.output);
  return bashResult.output;
}
 
// Example usage:
// const commandOutput = await runCommand(desktopId);

Step 6: Stop the Desktop

Use terminateDesktop:

async function stopDesktop(desktopId: string) {
  console.log(`Stopping desktop ${desktopId}...`);
  const stopResult = await cyberdesk.terminateDesktop({
    path: { id: desktopId }
  });
 
  if ('error' in stopResult) {
    console.error('Failed to stop desktop:', stopResult.error);
    // Decide if you need to throw an error or just log
  }
 
  console.log('Desktop stop requested.', stopResult);
}
 
// Example usage:
// await stopDesktop(desktopId);

Tutorial 2: Automating Web Testing

This tutorial demonstrates how to use the Cyberdesk SDK to automate simple web browser tasks.

Step 1: Create a Desktop Instance

Create a desktop instance using the createDesktop function from Tutorial 1.

// const desktopId = await createDesktop();

Step 2: Open a Web Browser and Wait

Use executeBashAction to open a browser (e.g., Firefox) in the background, then use executeComputerAction with type: 'wait' to allow time for it to load.

async function openBrowserAndWait(desktopId: string, url: string, waitMs: number = 5000) {
  console.log(`Opening ${url} on desktop ${desktopId}...`);
  const openResult = await cyberdesk.executeBashAction({
    path: { id: desktopId },
    body: {
      command: `firefox ${url} &` // Run in background
    }
  });
 
  if ('error' in openResult) {
    console.error('Failed to send open browser command:', openResult.error);
    throw new Error('Failed to send open browser command: ' + openResult.error);
  }
  console.log('Browser opening command sent.');
 
  console.log(`Waiting ${waitMs}ms for browser to load...`);
  const waitResult = await cyberdesk.executeComputerAction({
    path: { id: desktopId },
    body: {
      type: 'wait',
      ms: waitMs
    }
  });
 
  if ('error' in waitResult) {
    console.error('Wait action failed:', waitResult.error);
    throw new Error('Wait action failed: ' + waitResult.error);
  }
  console.log('Wait complete.');
}
 
// Example usage:
// await openBrowserAndWait(desktopId, 'https://example.com');

Step 3: Take a Screenshot for Verification

Capture a screenshot using the captureScreenshot function from Tutorial 1 to verify the page loaded.

async function verifyPageLoad(desktopId: string) {
  console.log('Taking verification screenshot...');
  const imageData = await captureScreenshot(desktopId);
  if (imageData) {
    console.log('Verification screenshot captured (length:', imageData.length, ')');
    // Add logic here to analyze the screenshot if needed
  } else {
    console.warn('Could not capture verification screenshot.');
  }
}
 
// Example usage:
// await verifyPageLoad(desktopId);

Step 4: Stop the Desktop

Remember to stop the desktop instance using the stopDesktop function from Tutorial 1 when the test is complete.

// await stopDesktop(desktopId);

Tutorial 3: Integrating with AI Models for Computer Use

This tutorial demonstrates how to integrate the Cyberdesk SDK with AI models (like Anthropic's Claude) using the Vercel AI SDK to create agents that can use computers.

Step 1: Prerequisites and Setup

Install the necessary dependencies:

npm install cyberdesk @ai-sdk/anthropic ai

Initialize the Cyberdesk SDK client (likely in a shared module, e.g., @/lib/cyberdeskClient.ts):

// src/lib/cyberdeskClient.ts
import { createCyberdeskClient } from 'cyberdesk';
 
const client = createCyberdeskClient({
  apiKey: process.env.CYBERDESK_API_KEY || 'YOUR_API_KEY', // Use environment variable
});
 
export default client;

Ensure you have CYBERDESK_API_KEY set in your environment variables.

Step 2: Implement executeComputerAction Utility

Create a utility function that maps AI tool parameters to the Cyberdesk SDK's executeComputerAction method. This function handles the various action types supported by the AI tool.

// src/utils/computer-use.ts
import client from '@/lib/cyberdeskClient';
import type { ExecuteComputerActionParams } from "cyberdesk"
 
// Define the action types your AI tool might use
export type ClaudeComputerActionType0124 = /* ... (action types like 'left_click', 'type', etc.) ... */
  | "left_click"
  | "right_click"
  // ... (include all action types from the provided code) ...
  | "screenshot";
 
export async function executeComputerAction(
  action: ClaudeComputerActionType0124,
  desktopId: string,
  coordinate?: { x: number; y: number },
  text?: string,
  duration?: number,
  scroll_amount?: number,
  scroll_direction?: "left" | "right" | "down" | "up",
  start_coordinate?: { x: number; y: number }
): Promise<string | { type: "image"; data: string }> {
  try {
    let requestBody: ExecuteComputerActionParams['body'];
 
    // Map the AI tool action to the Cyberdesk SDK's expected format
    switch (action) {
      case 'left_click':
        requestBody = {
          type: 'click_mouse',
          x: coordinate?.x,
          y: coordinate?.y,
          button: 'left',
          click_type: 'click',
          num_of_clicks: 1
        };
        break;
      // ... Map other actions (right_click, type, scroll, screenshot, etc.) ...
      case 'type':
        requestBody = {
          type: 'type',
          text: text || ''
        };
        break;
      case 'screenshot':
         requestBody = {
           type: 'screenshot'
         };
         break;
      // ... (Include all case mappings from the provided computer-use.ts code) ...
      default: {
        const _exhaustiveCheck: never = action;
        throw new Error(`Unhandled action: ${action}`);
      }
    }
 
    const clientParams: ExecuteComputerActionParams = {
      path: { id: desktopId },
      body: requestBody
    };
 
    // *** Use the Cyberdesk SDK client ***
    const result = await client.executeComputerAction(clientParams);
 
    // Check the raw response status embedded in the SDK result
    if (result.response.status !== 200) {
      let errorDetails = `Failed with status: ${result.response.status}`;
      try {
        // Attempt to parse error details from the response body
        const errorBody = await result.response.json();
        errorDetails = errorBody.message || errorBody.error || JSON.stringify(errorBody);
      } catch (e) { /* Failed to parse body */ }
      throw new Error(`Failed to execute computer action ${action}: ${errorDetails}`);
    }
 
    const data = result.data; // Access the parsed data from the SDK result
 
    if (data?.base64_image) {
      return {
        type: "image",
        data: data.base64_image
      };
    }
 
    return data?.output || 'Action completed successfully';
 
  } catch (error) {
    console.error(`Error executing computer action ${action}:`, error);
    throw error; // Re-throw to be handled by the AI SDK
  }
}

Note: The full mapping logic for all action types is omitted for brevity but should be included as shown in the computer-use.ts file.

Step 3: Implement executeBashCommand Utility

Create a similar utility for bash commands, calling the executeBashAction SDK method.

// src/utils/bash.ts
import client from '@/lib/cyberdeskClient';
 
export async function executeBashCommand(
  command: string,
  desktopId: string
): Promise<string> {
  try {
    // *** Use the Cyberdesk SDK client ***
    const result = await client.executeBashAction({
        path: { id: desktopId },
        body: { command },
    });
 
    // Check the raw response status
    if (result.response.status !== 200) {
      let errorDetails = `Failed with status: ${result.response.status}`;
      try {
        const errorBody = await result.response.json();
        errorDetails = errorBody.message || errorBody.error || JSON.stringify(errorBody);
      } catch (e) { /* Failed to parse body */ }
      throw new Error(`Failed to execute bash command: ${errorDetails}`);
    }
 
    const data = result.data; // Access the parsed data
    return data?.output || ''; // Return output or empty string
 
  } catch (error) {
    console.error(`Error executing bash command "${command}":`, error);
    // Return a meaningful error message for the AI to potentially see
    return 'Error executing bash command: ' + (error as Error).message;
  }
}

Step 4: Set Up the AI Tools and API Route

Create an API route (e.g., /api/chat) that uses the Vercel AI SDK (streamText) and defines tools that call your utility functions.

// src/app/api/chat/route.ts
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';
import { executeComputerAction } from '../../../utils/computer-use'; // Adjust path
import { executeBashCommand } from '../../../utils/bash'; // Adjust path
 
// Define result types if needed
interface ComputerActionResult { type: "image"; data: string; }
 
export const maxDuration = 300;
 
export async function POST(req: Request) {
  const desktopId = req.headers.get('X-Desktop-Id');
  const { messages } = await req.json();
 
  if (!desktopId) {
    return Response.json({ error: "Desktop ID is required" }, { status: 400 });
  }
 
  const lastMessage = messages[messages.length - 1];
  const userContent = /* ... logic to extract text from lastMessage.content ... */ "";
 
  // Define the computer tool using the AI SDK
  const computerTool = anthropic.tools.computer_20250124({
    displayWidthPx: 1024,
    displayHeightPx: 768,
    // The execute function calls *your* utility function, which uses the Cyberdesk SDK
    execute: async ({ action, coordinate, duration, scroll_amount, scroll_direction, start_coordinate, text }) => {
      const coordinateObj = coordinate ? { x: coordinate[0], y: coordinate[1] } : undefined;
      const startCoordinateObj = start_coordinate ? { x: start_coordinate[0], y: start_coordinate[1] } : undefined;
      
      // *** Call your utility function ***
      const result = await executeComputerAction(
        action,
        desktopId,
        coordinateObj,
        text,
        duration,
        scroll_amount,
        scroll_direction,
        startCoordinateObj
      );
 
      // Format the result for the AI SDK tool
      return (typeof result === 'string')
        ? { type: "text" as const, text: result }
        : { type: "image" as const, data: result.data };
    },
    // Optional: Format the tool result content for the AI model
    experimental_toToolResultContent(result: { type: "text"; text: string } | ComputerActionResult) {
      return result.type === 'text'
        ? [{ type: 'text', text: result.text }]
        : [{ type: 'image', data: result.data, mimeType: 'image/jpeg' }];
    },
  });
 
  // Define the bash tool using the AI SDK
  const bashTool = anthropic.tools.bash_20250124({
    // The execute function calls *your* utility function
    execute: async ({ command }) => {
        // *** Call your utility function ***
        const output = await executeBashCommand(command, desktopId);
        return output; // Return the string output directly
    }
  });
 
  try {
    // Call the AI model with the tools
    const response = streamText({
      model: anthropic("claude-3-7-sonnet-20250219"),
      prompt: userContent,
      system: "You are an AI assistant that can control a computer...", // Your system prompt
      tools: {
        computer: computerTool,
        bash: bashTool
      },
      maxSteps: 100
    });
 
    return response.toDataStreamResponse();
 
  } catch (error) {
    console.error("Error calling Anthropic:", error);
    return Response.json({ error: "Failed to process request" }, { status: 500 });
  }
}

Step 5: Frontend Implementation

A frontend application would:

  1. Create a desktop instance (perhaps via another API route that uses cyberdesk.launchDesktop).
  2. Render a chat interface.
  3. Display the desktop stream using the stream_url.
  4. Send user messages to the /api/chat endpoint, including the desktopId in the X-Desktop-Id header.
  5. Process the streamed response from the API, updating the chat UI.

(Refer to the Cyberdesk Starter Assistant UI template for a full frontend example).

Conclusion

By wrapping the Cyberdesk SDK methods within utility functions called by your AI tool's execute logic, you can seamlessly integrate robust desktop control into your AI agents. The SDK handles the direct API communication, authentication, and response parsing, simplifying your integration code.

On this page