Automation API

Android device automation

Image Recognition

Actions

OCR and image analysis capabilities

Access these methods through agent.actions. Extract text and analyze images using ML Kit.

recognizeText()

TypeScript
recognizeText(imageBase64: string): Promise<TextJSON>

Performs OCR on an image using ML Kit.

Parameters

Name         Type    Description
imageBase64  string  Base64-encoded image

Returns

TextJSON  Hierarchical text structure with confidence and bounding boxes

Examples

TypeScript
const { screenshot } = await agent.actions.screenshot(1080, 1920, 90);
const result = await agent.actions.recognizeText(screenshot);
console.log("Full text:", result.text);
// Access individual text blocks
for (const block of result.textBlocks) {
  console.log("Block:", block.text, "at", block.boundingBox);
}

Return Types

TextJSON

Root level OCR result containing all recognized text.

TypeScript
interface TextJSON {
  text: string; // Complete recognized text
  textBlocks: TextBlock[]; // Array of text blocks
}

TextBlock

A block of text, typically a paragraph.

TypeScript
interface TextBlock {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  lines: TextLine[];
}
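
The nesting of blocks, lines, and elements can be walked with plain loops. The following is a minimal sketch, not part of the API: collectWords is a hypothetical helper, the *Like interfaces are trimmed local copies of the types in this section, and sampleBlocks is hand-written data standing in for result.textBlocks.

```typescript
// Trimmed local copies of the documented types, holding only the fields used here.
interface WordLike { text: string }
interface LineLike { text: string; elements: WordLike[] }
interface BlockLike { text: string; lines: LineLike[] }

// Hypothetical helper: collect every element's text in the order the result lists it.
function collectWords(blocks: BlockLike[]): string[] {
  const words: string[] = [];
  for (const block of blocks) {
    for (const line of block.lines) {
      for (const element of line.elements) {
        words.push(element.text);
      }
    }
  }
  return words;
}

// Hand-written sample data (an assumption, not real OCR output).
const sampleBlocks: BlockLike[] = [
  {
    text: "Sign in\nForgot password?",
    lines: [
      { text: "Sign in", elements: [{ text: "Sign" }, { text: "in" }] },
      { text: "Forgot password?", elements: [{ text: "Forgot" }, { text: "password?" }] },
    ],
  },
];

console.log(collectWords(sampleBlocks)); // [ 'Sign', 'in', 'Forgot', 'password?' ]
```

With a real result, the call would presumably be collectWords(result.textBlocks).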

TextLine

A line of text within a block.

TypeScript
interface TextLine {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  elements: TextElement[];
  confidence: number;
  angle: number;
}
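
The confidence field makes it possible to discard doubtful lines before using the text. The sketch below assumes confidence is a score between 0 and 1, which this page does not state and should be checked against real output. confidentLines and the 0.8 default threshold are hypothetical, and sampleLines is hand-written data.

```typescript
// Trimmed local copy of TextLine with only the fields used here.
interface ScoredLine { text: string; confidence: number }

// Hypothetical helper: keep lines whose confidence meets the threshold.
// Assumes confidence lies in [0, 1]; the default of 0.8 is an arbitrary choice.
function confidentLines<T extends ScoredLine>(lines: T[], minConfidence: number = 0.8): T[] {
  return lines.filter((line) => line.confidence >= minConfidence);
}

// Hand-written sample data (an assumption, not real OCR output).
const sampleLines: ScoredLine[] = [
  { text: "Total: $42.00", confidence: 0.97 },
  { text: "T0ta1 $4Z", confidence: 0.41 },
];

const kept = confidentLines(sampleLines);
console.log(kept.map((line) => line.text)); // [ 'Total: $42.00' ]
```

A suitable threshold depends on the screen content, so it is worth logging the rejected lines while tuning it.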

TextElement

Individual text element (usually a word).

TypeScript
interface TextElement {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  symbols: TextSymbol[];
  confidence: number;
  angle: number;
}
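
Because each element carries its own boundingBox, a common use is locating a word on screen. The sketch below is one way to do that: findWords is a hypothetical helper, the local interfaces are trimmed copies of the documented types, and the sample data is hand-written.

```typescript
// Trimmed local copies of the documented types, holding only the fields used here.
interface Box { left: number; top: number; right: number; bottom: number }
interface WordLike { text: string; boundingBox: Box }
interface LineLike { elements: WordLike[] }
interface BlockLike { lines: LineLike[] }

// Hypothetical helper: return every element whose text equals `query`, ignoring case.
function findWords(blocks: BlockLike[], query: string): WordLike[] {
  const needle = query.toLowerCase();
  const matches: WordLike[] = [];
  for (const block of blocks) {
    for (const line of block.lines) {
      for (const element of line.elements) {
        if (element.text.toLowerCase() === needle) {
          matches.push(element);
        }
      }
    }
  }
  return matches;
}

// Hand-written sample data (an assumption, not real OCR output).
const sampleBlocks: BlockLike[] = [
  {
    lines: [
      {
        elements: [
          { text: "Sign", boundingBox: { left: 40, top: 100, right: 120, bottom: 140 } },
          { text: "in", boundingBox: { left: 130, top: 100, right: 160, bottom: 140 } },
        ],
      },
    ],
  },
];

const hits = findWords(sampleBlocks, "sign");
console.log(hits.length, hits[0].boundingBox.left); // 1 40
```

Exact matching is deliberately strict here. OCR output often includes trailing punctuation, so a real lookup may need to normalize the text first.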

TextSymbol

Individual character/symbol.

TypeScript
interface TextSymbol {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  confidence: number;
  angle: number;
}

BoundingBox

Edge coordinates of the rectangle enclosing a piece of recognized text.

TypeScript
interface BoundingBox {
  left: number;
  top: number;
  right: number;
  bottom: number;
}
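
A box given by its four edges converts to a size and a center point with simple arithmetic, which is useful when a coordinate is needed for a later action. The sketch below assumes the values are pixel coordinates in the image passed to recognizeText(), with right greater than left and bottom greater than top. boxSize and boxCenter are hypothetical helpers, not part of the API.

```typescript
// Local copy of the documented BoundingBox shape.
interface Box { left: number; top: number; right: number; bottom: number }

// Hypothetical helper: width and height of the box.
function boxSize(box: Box): { width: number; height: number } {
  return { width: box.right - box.left, height: box.bottom - box.top };
}

// Hypothetical helper: midpoint of the box on both axes.
function boxCenter(box: Box): { x: number; y: number } {
  return { x: (box.left + box.right) / 2, y: (box.top + box.bottom) / 2 };
}

// Hand-written sample box (an assumption, not real OCR output).
const sampleBox: Box = { left: 100, top: 200, right: 300, bottom: 260 };

console.log(boxSize(sampleBox));   // { width: 200, height: 60 }
console.log(boxCenter(sampleBox)); // { x: 200, y: 230 }
```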