Analyze Image

Action ID: describe_image

Description

Use AI vision models to analyze and describe image content

Connection

| Name | Description | Required | Category |
| --- | --- | --- | --- |
| PixelML Connection | The PixelML connection used to call the PixelML API. | True | pixelml |

Input Parameters

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| image_url | string | Yes | - | The URL of the image to analyze and describe |
| model | dropdown | No | Google Gemini 1.5 Flash | The AI vision model to use for image analysis. Available options: Google Gemini 1.5 Flash, Google Gemini 1.5 Pro, OpenAI GPT-4o |
| prompt | string | No | Describe the image as an alternative text | Instructions for how the AI should analyze and describe the image |


Input Schema

```json
{
  "description": "Describe image node input.",
  "properties": {
    "image_url": {
      "description": "The URL of the image to analyze and describe",
      "title": "Image URL",
      "type": "string"
    },
    "model": {
      "default": "Google Gemini 1.5 Flash",
      "description": "The AI vision model to use for image analysis. Gemini Flash offers fast, efficient image understanding.",
      "enum": [
        "Google Gemini 1.5 Flash",
        "Google Gemini 1.5 Pro",
        "OpenAI GPT-4o"
      ],
      "title": "Vision Model",
      "type": "string"
    },
    "prompt": {
      "default": "Describe the image as an alternative text",
      "description": "Instructions for how the AI should analyze and describe the image",
      "title": "Instructions",
      "type": "string"
    }
  },
  "required": [
    "image_url"
  ],
  "title": "DescribeImageInput",
  "type": "object"
}
```
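Before invoking the action, it can help to validate a payload against the rules the schema above encodes (required `image_url`, the model enum, and the two defaults). A minimal sketch using only the standard library; `prepare_input` is a hypothetical helper name, not part of the PixelML API:

```python
# Minimal sketch: apply the schema defaults and check required fields and
# enum values from DescribeImageInput before sending a request.

ALLOWED_MODELS = {
    "Google Gemini 1.5 Flash",
    "Google Gemini 1.5 Pro",
    "OpenAI GPT-4o",
}
DEFAULTS = {
    "model": "Google Gemini 1.5 Flash",
    "prompt": "Describe the image as an alternative text",
}

def prepare_input(payload: dict) -> dict:
    """Fill defaults and validate a DescribeImageInput payload."""
    if "image_url" not in payload:
        raise ValueError("image_url is required")
    result = {**DEFAULTS, **payload}
    if result["model"] not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {result['model']}")
    return result

prepared = prepare_input({"image_url": "https://example.com/photo.jpg"})
# prepared now carries the default model and prompt alongside image_url.
```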

Output Parameters

| Name | Type | Description |
| --- | --- | --- |
| content | string | The AI-generated description or analysis of the image |


Output Schema

```json
{
  "description": "Describe image node output.",
  "properties": {
    "content": {
      "title": "Content",
      "type": "string"
    }
  },
  "required": [
    "content"
  ],
  "title": "DescribeImageOutput",
  "type": "object"
}
```

How It Works

This node uses AI vision models to analyze and describe image content. You provide an image URL and optional instructions, and the model processes the visual information to generate a text description. The node supports three vision models, Google Gemini 1.5 Flash, Google Gemini 1.5 Pro, and OpenAI GPT-4o, each suited to different analysis needs.
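In code, a call to this action is a POST carrying a `DescribeImageInput` payload whose response contains the `content` field. A hedged sketch follows; the endpoint URL and bearer-token auth are assumptions for illustration, since the real route and credential scheme come from your PixelML connection:

```python
import json
import urllib.request

# Hypothetical endpoint: the actual PixelML API route is configured by
# your PixelML connection, not documented here.
API_URL = "https://api.example.com/v1/actions/describe_image"

def build_request(image_url,
                  model="Google Gemini 1.5 Flash",
                  prompt="Describe the image as an alternative text",
                  api_key="YOUR_API_KEY"):
    """Build an HTTP request carrying a DescribeImageInput payload."""
    body = json.dumps({
        "image_url": image_url,
        "model": model,
        "prompt": prompt,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

# To execute the call and read the output parameter:
# resp = urllib.request.urlopen(build_request("https://example.com/cat.jpg"))
# description = json.load(resp)["content"]
```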

Usage Examples

Example 1: Generate Alt Text

Input:

image_url: "https://example.com/product-photo.jpg"
model: "Google Gemini 1.5 Flash"
prompt: "Describe the image as an alternative text"

Output:

content: "A modern stainless steel coffee maker on a white kitchen counter, with a glass carafe and digital display showing 7:30 AM"

Example 2: Detailed Scene Analysis

Input:

image_url: "https://example.com/landscape.jpg"
model: "Google Gemini 1.5 Pro"
prompt: "Provide a detailed description of this landscape including weather conditions, time of day, prominent features, and mood"

Output:

content: "A serene mountain landscape captured during golden hour, with the setting sun casting warm orange and pink hues across the sky. Snow-capped peaks dominate the background, while a pristine alpine lake reflects the colorful sky in the foreground. Pine trees line the lake's edge, and the overall mood is peaceful and contemplative. The clear weather and long shadows suggest late afternoon in autumn or early winter."

Example 3: Product Feature Extraction

Input:

image_url: "https://example.com/smartphone.jpg"
model: "OpenAI GPT-4o"
prompt: "List all visible features and specifications of this product"

Output:

content: "The image shows a modern smartphone with the following visible features: a 6.7-inch edge-to-edge display, triple camera system on the back, metallic frame in silver color, front-facing camera centered at the top, volume buttons on the left side, power button on the right side, USB-C charging port at the bottom, and what appears to be a fingerprint sensor integrated into the power button."

Common Use Cases

  • Accessibility: Generate alt text for images to improve web accessibility

  • Content Moderation: Analyze images for inappropriate or unwanted content

  • Product Cataloging: Extract product features and details from images

  • Image Search: Create searchable descriptions for large image libraries

  • Quality Assurance: Verify that images meet specific criteria or contain required elements

  • Social Media Management: Generate captions and descriptions for social media posts

  • E-commerce: Automatically generate product descriptions from images
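For the accessibility use case, the `content` output can be dropped straight into an `alt` attribute. A small sketch; `describe` stands in for whatever function runs this action and returns its output dict (stubbed here, since the invocation mechanism depends on your setup):

```python
import html

def img_tag_with_alt(image_url, describe):
    """Build an accessible <img> tag from this action's output.

    `describe` is a placeholder for a function that runs describe_image
    with the given payload and returns {"content": ...}.
    """
    alt = describe({
        "image_url": image_url,
        "prompt": "Describe the image as an alternative text",
    })["content"]
    # Escape both URL and description so they are safe inside attributes.
    return (f'<img src="{html.escape(image_url, quote=True)}" '
            f'alt="{html.escape(alt, quote=True)}">')

# Example with a stubbed describe function:
stub = lambda payload: {"content": "A red bicycle leaning on a wall"}
tag = img_tag_with_alt("https://example.com/bike.jpg", stub)
```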

Error Handling

| Error Type | Cause | Solution |
| --- | --- | --- |
| Invalid Image URL | URL is malformed or inaccessible | Verify the image URL is valid and publicly accessible |
| Image Not Found | The URL returns a 404 error | Check that the image exists at the specified URL |
| Unsupported Format | Image format is not supported by the vision model | Use a common format such as JPEG, PNG, or WEBP |
| Image Too Large | Image file size exceeds the limit | Reduce the image file size or resolution |
| Model Unavailable | Selected model is temporarily unavailable | Try a different model or retry later |
| Connection Failed | Unable to reach the PixelML API | Check the PixelML connection credentials and API availability |
| Rate Limited | Too many requests in a short period | Wait before making additional requests |
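The transient errors above (Rate Limited, Model Unavailable, Connection Failed) are good candidates for retries with exponential backoff. A generic sketch; `describe` is a placeholder for whatever function invokes the action, and raising `RuntimeError(error_type)` on failure is an assumption made for this example:

```python
import time
import random

# Error types from the table above that are worth retrying.
RETRYABLE = {"Rate Limited", "Model Unavailable", "Connection Failed"}

def call_with_retries(describe, payload, max_attempts=4, sleep=time.sleep):
    """Retry transient failures with exponential backoff and jitter.

    `describe` stands in for the function that runs describe_image; this
    sketch assumes it raises RuntimeError(error_type) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return describe(payload)
        except RuntimeError as exc:
            if str(exc) not in RETRYABLE or attempt == max_attempts:
                raise  # permanent error, or retries exhausted
            # Back off: 1s, 2s, 4s, ... plus up to 0.5s of jitter.
            sleep(2 ** (attempt - 1) + random.random() * 0.5)
```

The injectable `sleep` parameter keeps the helper testable without real delays.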

Notes

  • Model Selection: Gemini 1.5 Flash is fastest and most cost-effective for simple descriptions. Gemini 1.5 Pro and GPT-4o offer more detailed analysis for complex images.

  • Prompt Customization: Customize the prompt to get specific types of descriptions (accessibility text, detailed analysis, feature lists, etc.).

  • Image Quality: Higher resolution images with good lighting produce more accurate and detailed descriptions.

  • Processing Time: Response time varies by model: Flash is fastest (1-2 seconds), while Pro and GPT-4o take slightly longer (2-5 seconds).

  • Context Understanding: Vision models can understand context, relationships between objects, and even read text within images.

  • Language Support: All models support multilingual prompts and can generate descriptions in multiple languages.
