Image Recognition
OCR and image analysis capabilities
Access these methods through agent.actions. Extract text and analyze images using ML Kit.
recognizeText()
TypeScript
recognizeText(imageBase64: string): Promise<TextJSON>
Performs OCR on an image using ML Kit.
Parameters
| Name | Type | Description |
|---|---|---|
| imageBase64 | string | Base64-encoded image |
Returns
TextJSON
Hierarchical text structure with confidence and bounding boxes
Examples
TypeScript
const { screenshot } = await agent.actions.screenshot(1080, 1920, 90);
const result = await agent.actions.recognizeText(screenshot);
console.log("Full text:", result.text);
// Access individual text blocks
for (const block of result.textBlocks) {
  console.log("Block:", block.text, "at", block.boundingBox);
}
Return Types
TextJSON
Root level OCR result containing all recognized text.
TypeScript
interface TextJSON {
  text: string;            // Complete recognized text
  textBlocks: TextBlock[]; // Array of text blocks
}
TextBlock
A block of text, typically a paragraph.
TypeScript
interface TextBlock {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  lines: TextLine[];
}
TextLine
A line of text within a block.
TypeScript
interface TextLine {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  elements: TextElement[];
  confidence: number;
  angle: number;
}
TextElement
Individual text element (usually a word).
TypeScript
interface TextElement {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  symbols: TextSymbol[];
  confidence: number;
  angle: number;
}
TextSymbol
Individual character/symbol.
TypeScript
interface TextSymbol {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  confidence: number;
  angle: number;
}
BoundingBox
TypeScript
interface BoundingBox {
  left: number;
  top: number;
  right: number;
  bottom: number;
}
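The return types nest as blocks, lines, elements, and symbols, so a common task is walking that hierarchy to locate a word and work out where it sits in the image. The sketch below is one way this might look, not part of the documented API: `findWords` and `WordHit` are hypothetical names, the interfaces are local copies trimmed to the fields used here, and it assumes `confidence` is on a 0 to 1 scale and that `boundingBox` values are pixel coordinates in the source image. The sample data is hand-built so the block runs without a device.

```typescript
// Local, trimmed copies of the documented shapes (only the fields used below).
interface BoundingBox { left: number; top: number; right: number; bottom: number; }
interface TextElement { text: string; boundingBox: BoundingBox; confidence: number; }
interface TextLine { text: string; boundingBox: BoundingBox; elements: TextElement[]; confidence: number; }
interface TextBlock { text: string; boundingBox: BoundingBox; lines: TextLine[]; }
interface TextJSON { text: string; textBlocks: TextBlock[]; }

// Hypothetical result shape: a matched word plus the center of its bounding box.
interface WordHit { text: string; confidence: number; centerX: number; centerY: number; }

// Hypothetical helper: walk blocks -> lines -> elements and collect elements whose
// text contains `query` (case-insensitive) and whose confidence is at least
// `minConfidence`. The 0..1 confidence scale is an assumption.
function findWords(result: TextJSON, query: string, minConfidence = 0.5): WordHit[] {
  const needle = query.toLowerCase();
  const hits: WordHit[] = [];
  for (const block of result.textBlocks) {
    for (const line of block.lines) {
      for (const element of line.elements) {
        if (element.confidence < minConfidence) continue;
        if (!element.text.toLowerCase().includes(needle)) continue;
        const { left, top, right, bottom } = element.boundingBox;
        hits.push({
          text: element.text,
          confidence: element.confidence,
          centerX: (left + right) / 2,
          centerY: (top + bottom) / 2,
        });
      }
    }
  }
  return hits;
}

// Hand-built sample standing in for a recognizeText() result.
const sample: TextJSON = {
  text: "Sign in",
  textBlocks: [
    {
      text: "Sign in",
      boundingBox: { left: 100, top: 200, right: 300, bottom: 260 },
      lines: [
        {
          text: "Sign in",
          boundingBox: { left: 100, top: 200, right: 300, bottom: 260 },
          confidence: 0.97,
          elements: [
            { text: "Sign", boundingBox: { left: 100, top: 200, right: 190, bottom: 260 }, confidence: 0.98 },
            { text: "in", boundingBox: { left: 210, top: 200, right: 300, bottom: 260 }, confidence: 0.3 },
          ],
        },
      ],
    },
  ],
};

// One hit for "Sign", centered at (145, 230); the low-confidence "in" is filtered out.
console.log(findWords(sample, "sign"));
```

Against a live result, the same helper would be called as `findWords(await agent.actions.recognizeText(screenshot), "sign")`. Lowering `minConfidence` trades precision for recall when the text is small or blurry.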