Create AI-Assisted UGC-Style Video (Nano Banana PRO + VEO3)

Template link: https://agenticflow.ai/app/marketplace/items/01KF3642641Y1JHMR19ADFTQV2

👉 Goal

Build a single AI agent that can create a complete UGC-style video end-to-end, from AI model image generation to final video output, within one guided agent experience.

Veo 3.1 only supports videos up to 8 seconds. We recommend creating multiple short videos with this template and stitching them together to form a longer final video.

The entire process, from the first prompt to the final video, is handled inside one agent, ensuring:

  • Consistent AI model identity

  • Clear user guidance

  • Reliable, repeatable output for marketing and UGC use cases


⚙️ Overall Process

  • (Start) ↓

  • Step 1: Create Model Prompt ↓

  • Step 2: Generate Base Model Image ↓

  • Step 3: Upload Clothing Image ↓

  • Step 4: Generate Model Wearing Clothing ↓

  • Step 5: Generate 5 Video Frames ↓

  • Step 6: Generate UGC Video with VEO3 ↓

  • (Finish / revise)


🧰 Required Tools

  • 4 workflows using Nano Banana PRO

    • NanoBanana_generate_model_ref

    • Nanobanana_wear_clothes_on_model

    • Nanobanana_generate_model_frame

    • Nanobanana_generate_model_wear_frame

  • 1 workflow using VEO3

    • VEO3_generate_video_from_frames
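The process and tool list above can be pictured as an ordered pipeline with two confirmation gates (Steps 2 and 4). This is a minimal sketch for orientation, not how AgenticFlow runs the agent internally: the `Step` class, the `advance` function, and especially which workflow each step calls are assumptions inferred from the workflow names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    tools: tuple[str, ...]     # workflows the agent may call here (empty = user input only)
    needs_confirmation: bool   # agent waits for the user's "yes" before moving on


# Hypothetical step-to-workflow mapping, inferred from the workflow names.
PIPELINE = [
    Step(1, "Create Model Prompt", (), False),
    Step(2, "Generate Base Model Image", ("NanoBanana_generate_model_ref",), True),
    Step(3, "Upload Clothing Image", (), False),
    Step(4, "Generate Model Wearing Clothing", ("Nanobanana_wear_clothes_on_model",), True),
    Step(5, "Generate 5 Video Frames",
         ("Nanobanana_generate_model_frame", "Nanobanana_generate_model_wear_frame"), False),
    Step(6, "Generate UGC Video with VEO3", ("VEO3_generate_video_from_frames",), False),
]


def advance(step_number: int, user_confirmed: bool) -> int:
    """Return the next step number; len(PIPELINE) + 1 means the run is finished."""
    step = PIPELINE[step_number - 1]
    if step.needs_confirmation and not user_confirmed:
        return step_number  # stay on this step so the agent can regenerate
    return step_number + 1


def all_tools() -> list[str]:
    """Every workflow referenced by the pipeline, in step order."""
    return [tool for step in PIPELINE for tool in step.tools]
```

For example, `advance(2, False)` keeps the user on Step 2 until they approve the base model image, which mirrors the "If not, you can ask to regenerate" behavior described below.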


📝 HOW TO USE

-> Example inputs: Model Image and Product Image link

Step 1: Describe the AI Model You Want

What you need to do

Describe the AI person (model) you want to appear in your video. You can mention:

  • Gender

  • Age range

  • Style (casual, influencer, lifestyle, professional, etc.)

  • Vibe or personality (friendly, confident, calm, cheerful…)

Example prompts

  • “A young female lifestyle influencer in her mid-20s, friendly and natural.”

  • “A confident woman around 30, clean casual style, suitable for product reviews.”

  • “A modern female content creator with a warm smile and approachable vibe.”

📌 You don’t need to use technical wording; the system refines your description for you.
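The agent does this refinement itself, but a small sketch can show how the four Step 1 fields might combine with the base-image constraints listed under Step 2. `ModelBrief` and `build_model_prompt` are hypothetical names, and the sentence template is an assumption, not the agent's actual wording.

```python
from dataclasses import dataclass

# Base-image constraints listed under Step 2 of this guide.
BASE_IMAGE_CONSTRAINTS = (
    "plain white, clean background",
    "no accessories (no glasses, hats, or jewelry)",
    "neutral pose",
)


@dataclass
class ModelBrief:
    gender: str
    age_range: str
    style: str   # e.g. casual, influencer, lifestyle, professional
    vibe: str    # e.g. friendly, confident, calm, cheerful


def build_model_prompt(brief: ModelBrief) -> str:
    """Hypothetical helper: combine the Step 1 fields with the Step 2 constraints."""
    subject = f"A {brief.vibe} {brief.gender} {brief.style} creator, {brief.age_range}"
    return subject + "; " + "; ".join(BASE_IMAGE_CONSTRAINTS) + "."
```

For instance, `build_model_prompt(ModelBrief("female", "mid-20s", "lifestyle", "friendly"))` returns `"A friendly female lifestyle creator, mid-20s; plain white, clean background; no accessories (no glasses, hats, or jewelry); neutral pose."`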


Step 2: Generate the AI Model Image

What you need to do

Nothing. The AI model is generated automatically from your Step 1 description.

What happens next

An AI model image is generated with:

  • Plain white / clean background

  • No accessories (no glasses, hats, jewelry)

  • Neutral pose

Your confirmation

Check the image and confirm:

“Yes, this model looks good.”

If not, ask the agent to regenerate the image.


Step 3: Upload the Clothing / Product Image

What you need to do

Upload 1 image of the clothing or product you want to feature.

Requirements

  • Only one item

  • No human wearing it

  • Flat lay or mannequin is best

Example

  • A shirt laid flat on a table

  • A dress on a mannequin

📌 This image is used only as the clothing input, not as a model.


Step 4: Dress the AI Model

What you need to do

Nothing new to upload; just review the result. The system generates an image in which:

  • The AI model is wearing the clothing you uploaded

Your confirmation

Confirm:

“Yes, this looks good. Let’s make the video.”


Step 5: Create the Video Frames (Most Important Step)

To make a video, the agent needs 5 images (frames). You describe each frame one by one, and the system generates each frame right after you describe it.

👉 For each frame, you only need to write a short description of:

  • Pose

  • Action

  • Emotion
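Each frame description boils down to three short fields. A minimal sketch of turning five pose/action/emotion entries into frame prompts, assuming you draft them outside the chat first; `FrameSpec` and `frame_prompts` are hypothetical, and the `(model_image)` placeholder simply mirrors the example prompts in this guide.

```python
from dataclasses import dataclass

NUM_FRAMES = 5  # Step 5 asks for five frames


@dataclass
class FrameSpec:
    pose: str
    action: str
    emotion: str


def frame_prompts(specs: list[FrameSpec]) -> list[str]:
    """Hypothetical helper: one short prompt per frame, in order."""
    if len(specs) != NUM_FRAMES:
        raise ValueError(f"expected {NUM_FRAMES} frames, got {len(specs)}")
    # "(model_image)" mirrors the placeholder used in this guide's example prompts.
    return [
        f"The girl from (model_image), {s.pose}, {s.action}, {s.emotion}"
        for s in specs
    ]
```

Checking the count up front catches a missing frame before any image generation is spent on it.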


✍️ Example Ideas (For Inspiration Only)

These are just examples to help you understand what a good description looks like. You are free to write your own ideas.

  • The girl stretching, one hand rubbing her eye like she just woke up

  • The girl holding a gray package and smiling gently

  • The girl holding the package without revealing what’s inside

  • The girl holding the shirt and showing its design

  • The girl wearing the outfit, posing confidently and smiling


Frame-by-Frame: What You Write

First Frame (Opening)

What to describe: how the video starts

Example:

“The girl from (model_image) is holding a gray plastic package with both hands, smiling gently at the camera”


Last Frame (Ending)

What to describe: how the video ends

Example:

“She then holds up the clothes from (clothing_image), clearly showing the design, fabric, and shape to the camera. Her movements are slow and natural, as if explaining or highlighting key details. Then she is wearing the outfit, same background, standing and dancing confidently and smiling happily at the camera”

📌 You don’t need to say separately whether the model should be wearing the outfit; the system infers this from your description.

Because Veo 3.1 accepts either first/last frames or reference images, but not both, you can modify the VEO3_generate_video_from_frames workflow to match the mode you want to use. For details, see the Gemini API documentation:

https://ai.google.dev/gemini-api/docs/video?example=dialogue#reference-images
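If you adapt the workflow, a quick pre-check can keep a request from mixing the two input modes. This is a hypothetical sketch, not the Gemini SDK: `VideoInputs` and `choose_mode` are illustrative names, and the three-image reference limit is an assumption taken from the linked docs that should be re-checked before relying on it.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: limit taken from the linked Gemini docs; re-check before relying on it.
MAX_REFERENCE_IMAGES = 3


@dataclass
class VideoInputs:
    prompt: str
    first_frame: Optional[str] = None   # image path or URL
    last_frame: Optional[str] = None
    reference_images: list[str] = field(default_factory=list)


def choose_mode(inputs: VideoInputs) -> str:
    """Hypothetical check: return 'first_last', 'reference', or 'text_only'."""
    uses_frames = inputs.first_frame is not None or inputs.last_frame is not None
    uses_refs = len(inputs.reference_images) > 0
    if uses_frames and uses_refs:
        raise ValueError("use either first/last frames or reference images, not both")
    if uses_refs:
        if len(inputs.reference_images) > MAX_REFERENCE_IMAGES:
            raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images allowed")
        return "reference"
    return "first_last" if uses_frames else "text_only"
```

One possible split for this template: pass the model image and clothing image as reference images, or pass the opening and closing frames from Step 5 as first/last frames, but not both in the same request.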


Step 6: Write the Video Prompt (Final Step)

What you need to do

Write one overall video prompt that explains:

  • The story flow

  • The mood

  • The UGC style

Example video prompt

“Create a short, natural UGC-style video with smooth and realistic transitions. The video should feel casual and authentic, like a real person recording a product review. The girl from (model_image) is holding a gray plastic package with both hands, smiling gently at the camera. Her posture is relaxed and natural, as if introducing a product she just received. She continues holding the same gray package but without revealing its contents. Her expression remains friendly and slightly curious, creating a sense of anticipation. Movements should be subtle and continuous. She then holds up the clothes from (clothing_image), clearly showing the design, fabric, and shape to the camera. Her movements are slow and natural, as if explaining or highlighting key details.”

📌 No images are regenerated in this step; the prompt only controls how the video is animated.
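The video prompt combines a UGC style line, a mood line, and the story beats in order. A minimal sketch of assembling those pieces, assuming you keep them as separate strings while drafting; `build_video_prompt` is a hypothetical helper, since the agent itself accepts free text.

```python
def build_video_prompt(style: str, mood: str, beats: list[str]) -> str:
    """Hypothetical helper: join UGC style, mood, and story beats into one prompt."""
    if not any(b.strip() for b in beats):
        raise ValueError("describe at least one story beat")
    parts = [p.strip() for p in [style, mood, *beats] if p.strip()]
    # End every part with a period so the joined prompt reads as full sentences.
    return " ".join(p if p.endswith(".") else p + "." for p in parts)
```

Keeping the beats as a list makes it easy to reorder the story flow without rewriting the whole prompt.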


🎬 Final Result

The finished result looks like this: Watch your video here

Because Veo 3.1 generates videos of at most 8 seconds, use or adapt this template to create several short clips, then edit and stitch them together into a longer video. This gives you a more complete final video that better fits your intended use case.
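A minimal sketch of planning and joining the clips, assuming ffmpeg is installed and the clips are local MP4 files produced with the same Veo settings. The helper names are hypothetical; the `-f concat`, `-safe 0`, and `-c copy` options are standard ffmpeg concat-demuxer usage.

```python
import math

MAX_CLIP_SECONDS = 8  # Veo 3.1 clip limit stated in this guide


def clips_needed(target_seconds: float) -> int:
    """How many clips of up to 8 seconds are needed to cover the target length."""
    if target_seconds <= 0:
        raise ValueError("target length must be positive")
    return math.ceil(target_seconds / MAX_CLIP_SECONDS)


def concat_list_text(clip_paths: list[str]) -> str:
    """Contents of a list file for ffmpeg's concat demuxer, one clip per line, in order."""
    lines = []
    for path in clip_paths:
        escaped = path.replace("'", "'\\''")  # close quote, escaped quote, reopen
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str = "final.mp4") -> list[str]:
    """ffmpeg arguments that join the listed clips without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output]
```

For a 30-second video, `clips_needed(30)` returns 4. Write `concat_list_text([...])` to `list.txt`, then run `subprocess.run(concat_command("list.txt"), check=True)`. Stream copy (`-c copy`) only works when all clips share codec, resolution, and frame rate; re-encode if they differ.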
