Automation API

Android device automation

Writing Scripts

TypeScript basics, Agent API, and common patterns

Automation scripts are written in TypeScript and have access to the agent object, which provides all the APIs for interacting with the device.

The Agent Object

The agent object is globally available and provides:

TypeScript
agent = {
  actions: { ... }, // Device interactions (tap, swipe, type, etc.)
  utils: { ... }, // Utilities (random helpers, job tasks, files)
  info: { ... }, // Device information
  control: { ... }, // Automation control (pause, delay)
  display: { ... }, // Display settings
  email: { ... }, // Email utilities
  notifications: { ... }, // Notification callbacks
  constants: { ... }, // Action constants
  arguments: { ... }, // Parameters and job variables
}

Common Actions

Touch Gestures

Tapping and Swiping
TypeScript
// Simple tap at coordinates
await agent.actions.tap(500, 1000);
// Swipe from point A to B over 500ms
await agent.actions.swipe(500, 1500, 500, 500, 500);
// Long press for 2 seconds
await agent.actions.hold(500, 1000, 2000);
// Double tap with 100ms interval
await agent.actions.doubleTap(500, 1000, 100);
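// Human-like random tap within node bounds
// (screen is the result of agent.actions.screenContent(), see Screen Content)
const screen = await agent.actions.screenContent();
const button = screen.findTextOne("Submit");
if (button) {
  button.randomClick();
}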
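
To illustrate what a random tap within node bounds involves, here is a minimal sketch that picks a point inside a rectangle while staying away from its edges. The helper name randomPointInBounds, the margin parameter, and the injectable random source are assumptions made for this example; they are not part of the agent API, and this is not necessarily how randomClick is implemented. The Bounds shape mirrors the boundsInScreen fields used in the Node Interactions example.

```typescript
// Rectangle shape matching the boundsInScreen fields used in this guide.
interface Bounds {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Hypothetical helper (not part of the agent API): pick a random integer
// point inside the bounds, keeping a margin (a fraction of the width and
// height) away from every edge.
function randomPointInBounds(
  bounds: Bounds,
  margin: number = 0.2,
  random: () => number = Math.random
): { x: number; y: number } {
  const width = bounds.right - bounds.left;
  const height = bounds.bottom - bounds.top;
  const insetX = width * margin;
  const insetY = height * margin;
  const x = bounds.left + insetX + random() * (width - 2 * insetX);
  const y = bounds.top + insetY + random() * (height - 2 * insetY);
  return { x: Math.round(x), y: Math.round(y) };
}

// Possible use inside a script:
//   const { x, y } = randomPointInBounds(button.boundsInScreen);
//   await agent.actions.tap(x, y);
```

When node.randomClick() is available it is the simpler choice; a helper like this is mainly useful when only raw coordinates are at hand.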

Text Input

Typing and Clipboard
TypeScript
// Type text (requires focused input field)
await agent.actions.writeText("Hello, World!");
// Copy text to clipboard
await agent.actions.copyText("Text to copy");
// Paste from clipboard
await agent.actions.paste();
// Hide keyboard
await agent.actions.hideKeyboard();

Navigation

Navigation Actions
TypeScript
// System navigation
await agent.actions.goBack();
await agent.actions.goHome();
await agent.actions.recents();
// D-pad navigation (Android 13+ only)
await agent.actions.dpad("down");
await agent.actions.dpad("center");

App Management

Launching Apps
TypeScript
// Launch app by package name
await agent.actions.launchApp("com.example.myapp");
// Launch fresh (tries closing the existing app first)
await agent.actions.launchApp("com.example.myapp", true);
// Open a URL in the in-app browser
await agent.actions.browse("https://example.com");
// List all installed apps
const apps = await agent.actions.listApps();
console.log(apps["com.android.chrome"]); // "Chrome"
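
Since the listApps result above is indexed by package name and yields a display label, a reverse lookup can be handy when only the label is known. The sketch below assumes the result is a plain object mapping package names to labels, as the example suggests; findPackageByLabel and sampleApps are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: find the package name for a given app label.
// Assumes a map of package name -> display label, as in the example above.
function findPackageByLabel(
  apps: Record<string, string>,
  label: string
): string | undefined {
  const wanted = label.trim().toLowerCase();
  return Object.keys(apps).find(
    pkg => (apps[pkg] ?? "").trim().toLowerCase() === wanted
  );
}

// Sample data standing in for the result of agent.actions.listApps()
const sampleApps: Record<string, string> = {
  "com.android.chrome": "Chrome",
  "com.example.myapp": "My App",
};

console.log(findPackageByLabel(sampleApps, "chrome")); // "com.android.chrome"
```

The comparison ignores case and surrounding whitespace, so the first matching package is returned even if the label is typed loosely.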

Screen Content

Get the current screen content as an accessibility tree:

Reading Screen Content
TypeScript
// Get current screen content
const screen = await agent.actions.screenContent();
// Get all nodes recursively
const allNodes = getAllNodes(screen);
// Find nodes by text
const buttons = allNodes.filter(node =>
  node.text?.toLowerCase().includes("submit")
);
// Find nodes by ID
const loginBtn = allNodes.find(node =>
  node.viewId === "com.example:id/login_button"
);
// Find nodes by class
const editTexts = allNodes.filter(node =>
  node.className === "android.widget.EditText"
);

Helper Functions

These helper functions are globally available:

Node Helper Functions
TypeScript
// Get all descendant nodes recursively
const allNodes = getAllNodes(screen);
// Find nodes by viewId (resourceId)
const nodes = findNodesById(screen, "com.example:id/button");
// Find nodes by exact text
const textNodes = findNodesByText(screen, "Submit");
// Check if node has specific text
const hasText = nodeHasText(node, "Continue");
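
To make the recursive traversal concrete, here is a rough sketch of how a helper like getAllNodes could flatten a tree. It assumes each node exposes its children as a children array, which this guide does not confirm; TreeNode and flattenNodes are hypothetical names, and the built-in helper may behave differently (for example, it may exclude the root).

```typescript
// Assumed minimal node shape; the children field is a guess for illustration.
interface TreeNode {
  text?: string;
  viewId?: string;
  children?: TreeNode[];
}

// Hypothetical stand-in for getAllNodes: returns the root followed by all
// of its descendants in depth-first (pre-order) order.
function flattenNodes(root: TreeNode): TreeNode[] {
  const result: TreeNode[] = [root];
  for (const child of root.children ?? []) {
    result.push(...flattenNodes(child));
  }
  return result;
}

const sampleTree: TreeNode = {
  text: "root",
  children: [{ text: "a", children: [{ text: "b" }] }, { text: "c" }],
};

console.log(flattenNodes(sampleTree).map(n => n.text)); // root, a, b, c
```

Flattening first keeps later searches simple: every filter or find in this guide then works on a single array instead of a nested structure.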

Interacting with Nodes

Node Interactions
TypeScript
// Find a button and tap its center
const button = allNodes.find(n => n.text === "Submit" && n.clickable);
if (button) {
  const { left, top, right, bottom } = button.boundsInScreen;
  const centerX = Math.round((left + right) / 2);
  const centerY = Math.round((top + bottom) / 2);
  await agent.actions.tap(centerX, centerY);
}
// Or use an accessibility action for more reliable clicks
if (button) {
  await button.performAction(agent.constants.ACTION_CLICK);
}
// Scroll a node (scrollableNode is a node found the same way as button)
await scrollableNode.performAction(agent.constants.ACTION_SCROLL_FORWARD);
// Focus an input field (inputField is a node found the same way as button)
await inputField.performAction(agent.constants.ACTION_FOCUS);
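
The two click approaches above can be combined: try the accessibility action first and fall back to a coordinate tap if it throws. The sketch below is one possible way to do that, not a documented pattern. ClickableNode and clickNode are hypothetical names, the tap function is passed in so the example stays self-contained, and it assumes that performAction signals failure by throwing, which this guide does not state.

```typescript
// Assumed minimal node shape for this sketch.
interface ClickableNode {
  boundsInScreen: { left: number; top: number; right: number; bottom: number };
  performAction(action: unknown): Promise<unknown>;
}

// Hypothetical helper: prefer the accessibility click and fall back to
// tapping the center of the node if the action throws.
async function clickNode(
  node: ClickableNode,
  clickAction: unknown,
  tap: (x: number, y: number) => Promise<unknown>
): Promise<"action" | "tap"> {
  try {
    await node.performAction(clickAction);
    return "action";
  } catch {
    const { left, top, right, bottom } = node.boundsInScreen;
    await tap(Math.round((left + right) / 2), Math.round((top + bottom) / 2));
    return "tap";
  }
}

// Possible use inside a script:
//   await clickNode(button, agent.constants.ACTION_CLICK,
//     (x, y) => agent.actions.tap(x, y));
```

The return value records which path was taken, which can be useful in logs when diagnosing screens where accessibility actions are unreliable.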

Screenshots

Taking Screenshots
TypeScript
// Take a screenshot (maxWidth, maxHeight, quality)
const screenshot = await agent.actions.screenshot(1080, 1920, 80);
// Result contains base64 image data
const { base64, width, height } = screenshot;
// Use for debugging or storing
console.log(`Screenshot: ${width}x${height}`);

Common Patterns

Sleep/Wait Function

Delay Utilities
TypeScript
// Simple sleep function
function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}
// Sleep with random range (more human-like)
function sleepRandom(min: number, max: number): Promise<void> {
  const ms = Math.floor(Math.random() * (max - min) + min);
  return new Promise(resolve => setTimeout(resolve, ms));
}
// Usage
await sleep(2000); // Wait 2 seconds
await sleepRandom(1000, 3000); // Wait 1-3 seconds randomly

Wait for Screen State

Waiting for Conditions
TypeScript
// Wait for a specific element to appear
async function waitForElement(
  condition: (nodes: AndroidNode[]) => boolean,
  timeout: number = 10000
): Promise<boolean> {
  const startTime = Date.now();
  while (Date.now() - startTime < timeout) {
    const screen = await agent.actions.screenContent();
    const allNodes = getAllNodes(screen);
    if (condition(allNodes)) {
      return true;
    }
    await sleep(500);
  }
  return false;
}
// Usage: Wait for "Home" text to appear
const found = await waitForElement(nodes =>
  nodes.some(n => n.text === "Home")
);
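
A variation on the polling loop above returns the matched value instead of a boolean, so the caller can act on the node it was waiting for without querying the screen again. This is a sketch under assumptions: waitForValue is a hypothetical name, and the poll callback is injected so the example runs on its own without the agent object.

```typescript
// Hypothetical helper: poll until the callback yields a value or the
// timeout elapses. Always polls at least once.
async function waitForValue<T>(
  poll: () => Promise<T | undefined>,
  timeoutMs: number = 10000,
  intervalMs: number = 500
): Promise<T | undefined> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await poll();
    if (value !== undefined) return value;
    // Stop if another wait would run past the deadline
    if (Date.now() + intervalMs > deadline) return undefined;
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}

// Possible use inside a script:
//   const home = await waitForValue(async () => {
//     const screen = await agent.actions.screenContent();
//     return getAllNodes(screen).find(n => n.text === "Home");
//   });
//   if (home) { /* interact with the node */ }
```

Returning undefined on timeout leaves the decision to the caller: retry, log the unknown state, or abort the job.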

Retry with Backoff

Retry Pattern
TypeScript
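async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelay: number = 1000
): Promise<T> {
  let lastError: Error | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Normalize non-Error throws so .message is always available
      lastError = error instanceof Error ? error : new Error(String(error));
      console.log(`Attempt ${attempt} failed: ${lastError.message}`);
      if (attempt < maxAttempts) {
        // Exponential backoff: baseDelay, then 2x, 4x, ...
        const delay = baseDelay * Math.pow(2, attempt - 1);
        await sleep(delay);
      }
    }
  }
  // lastError is only undefined when maxAttempts is less than 1
  throw lastError ?? new Error("withRetry: maxAttempts must be at least 1");
}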
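
To see the waits this backoff formula produces, the small sketch below computes the delay schedule for a given attempt count and base delay. backoffDelays is a hypothetical helper written for this illustration; it uses the same formula as withRetry and, like withRetry, does not wait after the final attempt.

```typescript
// Hypothetical helper: list the delays (in ms) that withRetry would sleep
// between attempts, using baseDelay * 2^(attempt - 1).
function backoffDelays(maxAttempts: number, baseDelay: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    delays.push(baseDelay * Math.pow(2, attempt - 1));
  }
  return delays;
}

console.log(backoffDelays(3, 1000)); // 1000, 2000
console.log(backoffDelays(5, 500)); // 500, 1000, 2000, 4000
```

With the defaults (3 attempts, 1000 ms base), a call that fails every time therefore spends 3 seconds sleeping in total before the last error is thrown.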

Logging

Console Logging
TypeScript
// Standard console methods are available
console.log("Info message");
console.warn("Warning message");
console.error("Error message");
// Log objects
console.log("Screen content:", { nodeCount: allNodes.length });
// Debug with context
console.log(`[Stage: ${currentStage}] Processing screen...`);

File Operations

Reading and Writing Files
TypeScript
// Check if file exists
const exists = agent.utils.files.exists("/sdcard/Download/data.json");
// Read file content
const content = agent.utils.files.readFullFile("/sdcard/Download/data.json");
const data = JSON.parse(content);
// List directory
const files = agent.utils.files.list("/sdcard/Download");
for (const file of files) {
  console.log(file.name, file.isDirectory);
}
// Save file to device
await agent.actions.saveFile("result.json", JSON.stringify(data));
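
The read example above parses the file without using the result of the existence check, so a missing or malformed file would throw. A defensive variant might look like the sketch below. JsonFileReader and readJsonOrDefault are hypothetical names, and the interface only assumes the two calls shown above (exists and readFullFile) with the return types their usage implies.

```typescript
// Assumed subset of agent.utils.files, based on the calls shown above.
interface JsonFileReader {
  exists(path: string): boolean;
  readFullFile(path: string): string;
}

// Hypothetical helper: return the parsed JSON, or the fallback when the
// file is missing or does not contain valid JSON.
function readJsonOrDefault<T>(
  files: JsonFileReader,
  path: string,
  fallback: T
): T {
  if (!files.exists(path)) return fallback;
  try {
    return JSON.parse(files.readFullFile(path)) as T;
  } catch {
    return fallback;
  }
}

// Possible use inside a script:
//   const data = readJsonOrDefault(agent.utils.files,
//     "/sdcard/Download/data.json", { items: [] });
```

Note that the cast to T is not validated at runtime; if the file's shape matters, check the parsed fields before relying on them.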

Network Monitoring

Network Callback
TypeScript
// Track network connectivity
let isOnline = true;
agent.utils.setNetworkCallback((networkAvailable) => {
  isOnline = networkAvailable;
  if (!networkAvailable) {
    console.warn("Network connection lost!");
  }
});
// Check before network operations
if (!isOnline) {
  console.log("Waiting for network...");
  await sleep(5000);
}
// Refresh mobile IP (airplane mode toggle)
await agent.actions.airplane();

Best Practices

Use Random Delays

Add random delays between actions to appear more human-like. Use sleepRandom(1000, 2000) instead of fixed delays.

Use randomClick for Taps

Instead of tapping the exact center of an element, use node.randomClick() to tap at a random position within the element's bounds.

Prefer performAction over tap

Use node.performAction() with accessibility actions for more reliable interactions, especially for buttons and inputs.

Handle Unknown Screens

Always have fallback logic for screens you don't recognize. Log unknown states and retry after a short delay.
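
One way to structure this fallback is a list of screen handlers that are tried in order, with a catch-all for anything unrecognized. The sketch below is an assumption about how such a dispatcher could look, not a prescribed pattern: ScreenNode, ScreenHandler, and dispatchScreen are hypothetical names, and the nodes are plain objects so the example is self-contained.

```typescript
// Assumed minimal node shape for this sketch.
interface ScreenNode {
  text?: string;
}

// A handler recognizes one kind of screen and knows how to act on it.
interface ScreenHandler {
  name: string;
  matches(nodes: ScreenNode[]): boolean;
  handle(nodes: ScreenNode[]): Promise<void>;
}

// Runs the first matching handler. If none matches, logs some of the
// visible text and calls the fallback. Returns the handler name or "unknown".
async function dispatchScreen(
  nodes: ScreenNode[],
  handlers: ScreenHandler[],
  onUnknown: (nodes: ScreenNode[]) => Promise<void>
): Promise<string> {
  for (const handler of handlers) {
    if (handler.matches(nodes)) {
      await handler.handle(nodes);
      return handler.name;
    }
  }
  const visibleText = nodes
    .map(n => n.text)
    .filter((t): t is string => !!t)
    .slice(0, 10);
  console.warn("Unknown screen, visible text:", visibleText);
  await onUnknown(nodes);
  return "unknown";
}
```

In a script, the fallback could wait with sleepRandom(1000, 2000) or press back with agent.actions.goBack() before the main loop reads the screen again.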
