Image Recognition

Actions

OCR and image analysis capabilities

Access these methods through `agent.actions`. Extract text and analyze images using ML Kit.

`recognizeText()`

TypeScript

recognizeText(imageBase64: string): Promise<TextJSON>

Performs OCR on an image using ML Kit.

Parameters

Name	Type	Description
`imageBase64`	`string`	Base64-encoded image

Returns

`TextJSON`: Hierarchical text structure with confidence and bounding boxes

Examples

TypeScript

const { screenshot } = await agent.actions.screenshot(1080, 1920, 90);
const result = await agent.actions.recognizeText(screenshot);
console.log("Full text:", result.text);

// Access individual text blocks
for (const block of result.textBlocks) {
  console.log("Block:", block.text, "at", block.boundingBox);
}

Return Types

TextJSON

Root-level OCR result containing all recognized text.

TypeScript

interface TextJSON {
  text: string;           // Complete recognized text
  textBlocks: TextBlock[]; // Array of text blocks
}
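
Because the result is nested (blocks contain lines, lines contain elements), a common first step is to flatten it. The following is a minimal sketch, not part of `agent.actions`: `collectWords` is a hypothetical helper, the interfaces are trimmed local copies of the documented shapes, and the sample object is hand-written rather than real OCR output.

```typescript
// Trimmed local copies of the documented shapes (only the fields used here).
interface TextElement { text: string; }
interface TextLine { text: string; elements: TextElement[]; }
interface TextBlock { text: string; lines: TextLine[]; }
interface TextJSON { text: string; textBlocks: TextBlock[]; }

// Hypothetical helper: walk blocks -> lines -> elements and collect each word
// in the order the arrays list them.
function collectWords(ocr: TextJSON): string[] {
  const words: string[] = [];
  for (const block of ocr.textBlocks) {
    for (const line of block.lines) {
      for (const element of line.elements) {
        words.push(element.text);
      }
    }
  }
  return words;
}

// Hand-written sample for illustration only.
const flattenSample: TextJSON = {
  text: "Hello world\nSign in",
  textBlocks: [
    {
      text: "Hello world",
      lines: [{ text: "Hello world", elements: [{ text: "Hello" }, { text: "world" }] }],
    },
    {
      text: "Sign in",
      lines: [{ text: "Sign in", elements: [{ text: "Sign" }, { text: "in" }] }],
    },
  ],
};

const flattenedWords = collectWords(flattenSample);
```

In real use, the argument would be the value returned by `recognizeText()`.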

TextBlock

A block of text, typically a paragraph.

TypeScript

interface TextBlock {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  lines: TextLine[];
}

TextLine

A line of text within a block.

TypeScript

interface TextLine {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  elements: TextElement[];
  confidence: number;
  angle: number;
}
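
The `confidence` field can be used to discard unreliable lines before further processing. The sketch below assumes confidence is reported on a 0.0 to 1.0 scale, which this page does not state, so check the values your device returns. `confidentLines` is a hypothetical helper and the sample data is hand-written.

```typescript
// Trimmed local copies of the documented shapes (only the fields used here).
interface TextLine { text: string; confidence: number; }
interface TextBlock { lines: TextLine[]; }
interface TextJSON { textBlocks: TextBlock[]; }

// Hypothetical helper: return the text of every line whose confidence is at
// least minConfidence. Assumes a 0.0 to 1.0 confidence scale.
function confidentLines(ocr: TextJSON, minConfidence: number): string[] {
  const kept: string[] = [];
  for (const block of ocr.textBlocks) {
    for (const line of block.lines) {
      if (line.confidence >= minConfidence) {
        kept.push(line.text);
      }
    }
  }
  return kept;
}

// Hand-written sample for illustration only.
const confidenceSample: TextJSON = {
  textBlocks: [
    {
      lines: [
        { text: "Settings", confidence: 0.95 },
        { text: "W1-F1 ~", confidence: 0.4 },
      ],
    },
    { lines: [{ text: "Bluetooth", confidence: 0.8 }] },
  ],
};

const reliableLines = confidentLines(confidenceSample, 0.8);
```

The threshold of 0.8 is arbitrary; a suitable value depends on image quality and how costly a misread is for the task.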

TextElement

Individual text element (usually a word).

TypeScript

interface TextElement {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  symbols: TextSymbol[];
  confidence: number;
  angle: number;
}
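
Since elements usually correspond to words, they are a natural level at which to search for a specific label on screen. The following is a sketch under that assumption: `findWord` is a hypothetical helper (not part of `agent.actions`), it matches case-insensitively on the whole element text, and the sample data is hand-written.

```typescript
// Trimmed local copies of the documented shapes (only the fields used here).
interface BoundingBox { left: number; top: number; right: number; bottom: number; }
interface TextElement { text: string; boundingBox: BoundingBox; confidence: number; }
interface TextLine { elements: TextElement[]; }
interface TextBlock { lines: TextLine[]; }
interface TextJSON { textBlocks: TextBlock[]; }

// Hypothetical helper: return the first element whose text equals `word`,
// ignoring case, or undefined when no element matches.
function findWord(ocr: TextJSON, word: string): TextElement | undefined {
  const target = word.toLowerCase();
  for (const block of ocr.textBlocks) {
    for (const line of block.lines) {
      for (const element of line.elements) {
        if (element.text.toLowerCase() === target) {
          return element;
        }
      }
    }
  }
  return undefined;
}

// Hand-written sample for illustration only.
const searchSample: TextJSON = {
  textBlocks: [
    {
      lines: [
        {
          elements: [
            { text: "Forgot", boundingBox: { left: 40, top: 500, right: 160, bottom: 540 }, confidence: 0.9 },
            { text: "Login", boundingBox: { left: 400, top: 900, right: 680, bottom: 980 }, confidence: 0.97 },
          ],
        },
      ],
    },
  ],
};

const loginElement = findWord(searchSample, "login");
const missingElement = findWord(searchSample, "logout");
```

Multi-word labels span several elements, so matching them would mean searching at the `TextLine` level instead.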

TextSymbol

Individual character/symbol.

TypeScript

interface TextSymbol {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  confidence: number;
  angle: number;
}

BoundingBox

TypeScript

interface BoundingBox {
  left: number;
  top: number;
  right: number;
  bottom: number;
}
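
The four edges are enough to derive a box's size and center, for example to decide where to tap on a recognized word. This sketch assumes pixel coordinates with the origin at the top-left of the image, so that `right >= left` and `bottom >= top`; this page does not state the coordinate system. `boxSize` and `boxCenter` are hypothetical helpers.

```typescript
interface BoundingBox {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Hypothetical helper: width and height of a box.
// Assumes right >= left and bottom >= top.
function boxSize(box: BoundingBox): { width: number; height: number } {
  return { width: box.right - box.left, height: box.bottom - box.top };
}

// Hypothetical helper: midpoint of a box, in the same units as its edges.
function boxCenter(box: BoundingBox): { x: number; y: number } {
  return { x: (box.left + box.right) / 2, y: (box.top + box.bottom) / 2 };
}

// Hand-written sample box for illustration only.
const exampleBox: BoundingBox = { left: 100, top: 200, right: 300, bottom: 260 };
const exampleSize = boxSize(exampleBox);     // { width: 200, height: 60 }
const exampleCenter = boxCenter(exampleBox); // { x: 200, y: 230 }
```

If the image passed to `recognizeText()` was scaled relative to the screen, the center would need to be scaled back before being used as a screen coordinate.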