# Analyze Image

**Action ID:** `describe_image`

## Description

Use AI vision models to analyze and describe image content.

## Connection

| Name               | Description                                 | Required | Category |
| ------------------ | ------------------------------------------- | -------- | -------- |
| PixelML Connection | The PixelML connection used to call the PixelML API. | True     | pixelml  |

## Input Parameters

| Name       | Type     | Required | Default                                   | Description                                                                                                                     |
| ---------- | -------- | :------: | ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| image\_url | string   |     ✓    | -                                         | The URL of the image to analyze and describe                                                                                    |
| model      | dropdown |     -    | Google Gemini 1.5 Flash                   | The AI vision model to use for image analysis. Available options: Google Gemini 1.5 Flash, Google Gemini 1.5 Pro, OpenAI GPT-4o |
| prompt     | string   |     -    | Describe the image as an alternative text | Instructions for how the AI should analyze and describe the image                                                               |

<details>

<summary>View JSON Schema</summary>

### Input Schema

```json
{
  "description": "Describe image node input.",
  "properties": {
    "image_url": {
      "description": "The URL of the image to analyze and describe",
      "title": "Image URL",
      "type": "string"
    },
    "model": {
      "default": "Google Gemini 1.5 Flash",
      "description": "The AI vision model to use for image analysis. Gemini Flash offers fast, efficient image understanding.",
      "enum": [
        "Google Gemini 1.5 Flash",
        "Google Gemini 1.5 Pro",
        "OpenAI GPT-4o"
      ],
      "title": "Vision Model",
      "type": "string"
    },
    "prompt": {
      "default": "Describe the image as an alternative text",
      "description": "Instructions for how the AI should analyze and describe the image",
      "title": "Instructions",
      "type": "string"
    }
  },
  "required": [
    "image_url"
  ],
  "title": "DescribeImageInput",
  "type": "object"
}
```

</details>
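The input contract above can be checked client-side before the node runs. The sketch below mirrors the schema's rules (required `image_url`, enum-constrained `model`, the documented defaults) using only the standard library; the `validate_input` helper is illustrative and not part of any PixelML SDK.

```python
# Illustrative validator mirroring the DescribeImageInput schema.
# Not part of an official SDK; shown to make the contract concrete.

VISION_MODELS = {
    "Google Gemini 1.5 Flash",
    "Google Gemini 1.5 Pro",
    "OpenAI GPT-4o",
}

def validate_input(payload: dict) -> dict:
    """Return a normalized payload with the schema's defaults applied."""
    image_url = payload.get("image_url")
    if not isinstance(image_url, str) or not image_url:
        raise ValueError("image_url is required and must be a non-empty string")
    model = payload.get("model", "Google Gemini 1.5 Flash")
    if model not in VISION_MODELS:
        raise ValueError(f"model must be one of {sorted(VISION_MODELS)}")
    prompt = payload.get("prompt", "Describe the image as an alternative text")
    return {"image_url": image_url, "model": model, "prompt": prompt}
```

Passing only `image_url` yields a payload with both defaults filled in, matching the defaults in the table and schema above.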

## Output Parameters

| Name    | Type   | Description                                           |
| ------- | ------ | ----------------------------------------------------- |
| content | string | The AI-generated description or analysis of the image |

<details>

<summary>View JSON Schema</summary>

### Output Schema

```json
{
  "description": "Describe image node output.",
  "properties": {
    "content": {
      "title": "Content",
      "type": "string"
    }
  },
  "required": [
    "content"
  ],
  "title": "DescribeImageOutput",
  "type": "object"
}
```

</details>

## How It Works

This node uses AI vision models to analyze and describe image content. You provide an image URL and optional instructions, and the AI processes the visual information to generate a text description. The node supports multiple vision models (Google Gemini 1.5 Flash, Google Gemini 1.5 Pro, and OpenAI GPT-4o), ranging from fast, cost-effective description to more detailed analysis of complex images.
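The flow described above can be sketched in a few lines. `run_describe_image` below is a hypothetical stand-in for the node execution that happens inside an AgenticFlow workflow; it only shows how the input parameters and output shape fit together, with a stub in place of the real PixelML API call.

```python
# Hypothetical sketch of the node's input -> output flow.
# `call` stands in for the real PixelML API invocation.

def run_describe_image(inputs: dict, call) -> dict:
    """Apply the schema defaults, invoke the vision model, return the output shape."""
    payload = {
        "image_url": inputs["image_url"],
        "model": inputs.get("model", "Google Gemini 1.5 Flash"),
        "prompt": inputs.get("prompt", "Describe the image as an alternative text"),
    }
    content = call(payload)      # real implementation calls the PixelML API here
    return {"content": content}  # matches DescribeImageOutput

# Usage with a stub in place of the real API:
out = run_describe_image(
    {"image_url": "https://example.com/product-photo.jpg"},
    call=lambda p: f"Description of {p['image_url']} via {p['model']}",
)
```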

## Usage Examples

### Example 1: Generate Alt Text

**Input:**

```
image_url: "https://example.com/product-photo.jpg"
model: "Google Gemini 1.5 Flash"
prompt: "Describe the image as an alternative text"
```

**Output:**

```
content: "A modern stainless steel coffee maker on a white kitchen counter, with a glass carafe and digital display showing 7:30 AM"
```

### Example 2: Detailed Scene Analysis

**Input:**

```
image_url: "https://example.com/landscape.jpg"
model: "Google Gemini 1.5 Pro"
prompt: "Provide a detailed description of this landscape including weather conditions, time of day, prominent features, and mood"
```

**Output:**

```
content: "A serene mountain landscape captured during golden hour, with the setting sun casting warm orange and pink hues across the sky. Snow-capped peaks dominate the background, while a pristine alpine lake reflects the colorful sky in the foreground. Pine trees line the lake's edge, and the overall mood is peaceful and contemplative. The clear weather and long shadows suggest late afternoon in autumn or early winter."
```

### Example 3: Product Feature Extraction

**Input:**

```
image_url: "https://example.com/smartphone.jpg"
model: "OpenAI GPT-4o"
prompt: "List all visible features and specifications of this product"
```

**Output:**

```
content: "The image shows a modern smartphone with the following visible features: a 6.7-inch edge-to-edge display, triple camera system on the back, metallic frame in silver color, front-facing camera centered at the top, volume buttons on the left side, power button on the right side, USB-C charging port at the bottom, and what appears to be a fingerprint sensor integrated into the power button."
```

## Common Use Cases

* **Accessibility**: Generate alt text for images to improve web accessibility
* **Content Moderation**: Analyze images for inappropriate or unwanted content
* **Product Cataloging**: Extract product features and details from images
* **Image Search**: Create searchable descriptions for large image libraries
* **Quality Assurance**: Verify that images meet specific criteria or contain required elements
* **Social Media Management**: Generate captions and descriptions for social media posts
* **E-commerce**: Automatically generate product descriptions from images

## Error Handling

| Error Type         | Cause                                             | Solution                                                  |
| ------------------ | ------------------------------------------------- | --------------------------------------------------------- |
| Invalid Image URL  | URL is malformed or inaccessible                  | Verify the image URL is valid and publicly accessible     |
| Image Not Found    | The URL returns a 404 error                       | Check that the image exists at the specified URL          |
| Unsupported Format | Image format is not supported by the vision model | Use common formats like JPEG, PNG, or WEBP                |
| Image Too Large    | Image file size exceeds limit                     | Reduce image file size or resolution                      |
| Model Unavailable  | Selected model is temporarily unavailable         | Try a different model or retry later                      |
| Connection Failed  | Unable to access PixelML API                      | Check PixelML connection credentials and API availability |
| Rate Limited       | Too many requests in a short period               | Wait before making additional requests                    |

## Notes

* **Model Selection**: Gemini 1.5 Flash is fastest and most cost-effective for simple descriptions. Gemini 1.5 Pro and GPT-4o offer more detailed analysis for complex images.
* **Prompt Customization**: Customize the prompt to get specific types of descriptions (accessibility text, detailed analysis, feature lists, etc.).
* **Image Quality**: Higher resolution images with good lighting produce more accurate and detailed descriptions.
* **Processing Time**: Response time varies by model: Flash is the fastest (1-2 seconds), while Pro and GPT-4o take slightly longer (2-5 seconds).
* **Context Understanding**: Vision models can understand context, relationships between objects, and even read text within images.
* **Language Support**: All models support multilingual prompts and can generate descriptions in multiple languages.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.agenticflow.ai/reference/nodes/describe_image.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
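The query URL above can be built with any HTTP client; the sketch below constructs it with Python's standard library so the question is properly percent-encoded. Only URL construction is shown; the actual GET request is left to the caller.

```python
from urllib.parse import urlencode

BASE = "https://docs.agenticflow.ai/reference/nodes/describe_image.md"

def ask_url(question: str) -> str:
    """Build the documentation-query URL for a natural-language question."""
    return f"{BASE}?{urlencode({'ask': question})}"

url = ask_url("What image formats does describe_image accept?")
# An HTTP GET on `url` returns a direct answer with excerpts and sources.
```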
