Create AI-Assisted UGC-Style Video (Nano Banana PRO + VEO3)
Template link: https://agenticflow.ai/app/marketplace/items/01KF3642641Y1JHMR19ADFTQV2
👉 Goal
Build a single AI agent that can create a complete UGC-style video end-to-end, from AI model image generation to final video output, within one guided agent experience.
Veo 3.1 only supports videos up to 8 seconds. We recommend creating multiple short videos with this template and stitching them together to form a longer final video.
The entire process — from first prompt to final video — is handled inside one agent, ensuring:
Consistent AI model identity
Clear user guidance
Reliable, repeatable output for marketing and UGC use cases
⚙️ Overall Process
(Start) ↓
Step 1: Create Model Prompt ↓
Step 2: Generate Base Model Image ↓
Step 3: Upload Clothing Image ↓
Step 4: Generate Model Wearing Clothing ↓
Step 5: Generate 5 Video Frames ↓
Step 6: Generate UGC Video with VEO3 ↓
(Finish / revise)
🧰 Required Tools
4 workflows using Nano Banana PRO
NanoBanana_generate_model_ref
Nanobanana_wear_clothes_on_model
Nanobanana_generate_model_frame
Nanobanana_generate_model_wear_frame
1 workflow using VEO3
VEO3_generate_video_from_frames
📝 HOW TO USE
-> Example inputs: Model Image and Product Image link
Step 1: Describe the AI Model You Want
What you need to do
Describe the AI person/model you want to appear in your video. You can mention:
Gender
Age range
Style (casual, influencer, lifestyle, professional, etc.)
Vibe or personality (friendly, confident, calm, cheerful…)
Example prompts
“A young female lifestyle influencer in her mid-20s, friendly and natural.”
“A confident woman around 30, clean casual style, suitable for product reviews.”
“A modern female content creator with a warm smile and approachable vibe.”
📌 You don’t need to worry about technical wording — the system will refine it for you.
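If you would like to prepare the description before pasting it into the agent, the four attributes above can be combined into one sentence. The sketch below is only an illustration: ModelBrief and to_prompt are hypothetical names, not part of the template, and the agent accepts any free-form text.

```python
from dataclasses import dataclass


@dataclass
class ModelBrief:
    """Hypothetical container for the four attributes listed in Step 1."""
    gender: str
    age_range: str
    style: str
    vibe: str

    def to_prompt(self) -> str:
        # Join the attributes into one plain-language sentence for the agent.
        return (
            f"A {self.vibe} {self.gender} model, {self.age_range}, "
            f"with a {self.style} style."
        )


brief = ModelBrief(
    gender="female",
    age_range="mid-20s",
    style="casual lifestyle influencer",
    vibe="friendly",
)
print(brief.to_prompt())
```

Keeping the attributes separate makes it easy to change one of them (for example the vibe) and regenerate the model in Step 2.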
Step 2: Generate the AI Model Image
What you need to do
The AI model will be generated automatically based on your description from Step 1.
What happens next
An AI model image will be generated:
Plain white / clean background
No accessories (no glasses, hats, jewelry)
Neutral pose
Your confirmation
Check the image and confirm:
“Yes, this model looks good.”
If not, you can ask to regenerate.
Step 3: Upload the Clothing / Product Image
What you need to do
Upload 1 image of the clothing or product you want to feature.
Requirements
Only one item
No human wearing it
Flat lay or mannequin is best
Example
A shirt laid flat on a table
A dress on a mannequin
📌 This image is only used as clothing input, not as a model.
Step 4: Dress the AI Model
What you need to do
Nothing new to upload; just review the result.
The system will generate an image where:
The AI model is wearing the clothing you uploaded
Your confirmation
Confirm:
“Yes, this looks good. Let’s make the video.”
Step 5: Create the Video Frames (Most Important Step)
To make a video, we need 5 images (frames). You will describe each frame one by one, and the system will generate them immediately.
👉 For each frame, you only need to write a short description of:
Pose
Action
Emotion
✍️ Example Ideas (For Inspiration Only)
These are just examples to help you understand what a good description looks like. You are free to write your own ideas.
The girl stretching, one hand rubbing her eye like she just woke up
The girl holding a gray package and smiling gently
The girl holding the package without revealing what’s inside
The girl holding the shirt and showing its design
The girl wearing the outfit, posing confidently and smiling
Frame-by-Frame: What You Write
First Frame (Opening)
What to describe: how the video starts
Example:
“The girl from (model_image) is holding a gray plastic package with both hands, smiling gently at the camera”
Last Frame (Ending)
What to describe: how the video ends
Example:
“She then holds up the clothes from (clothing_image), clearly showing the design, fabric, and shape to the camera. Her movements are slow and natural, as if explaining or highlighting key details. Then she is wearing the outfit, same background, standing and dancing confidently and smiling happily at the camera”
📌 You don’t need to specify whether the model should wear the outfit; the system infers this from your description.
VEO 3.1 accepts either first/last frames or reference frames in a single generation, not both, so you can modify the VEO3_generate_video_from_frames workflow to use whichever mode fits your video.
Please refer to the documentation for more details:
https://ai.google.dev/gemini-api/docs/video?example=dialogue#reference-images
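If you modify the workflow, it can help to check which input mode a request uses before sending it. The sketch below assumes a hypothetical request shape (VideoRequest and its field names are illustrative, not the workflow's or the Gemini API's actual schema) and simply rejects requests that mix the two modes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VideoRequest:
    """Hypothetical request shape; field names are illustrative only."""
    prompt: str
    first_frame: Optional[str] = None
    last_frame: Optional[str] = None
    reference_images: List[str] = field(default_factory=list)


def pick_mode(req: VideoRequest) -> str:
    """Return the input mode the request uses, rejecting mixed inputs."""
    uses_frames = req.first_frame is not None or req.last_frame is not None
    uses_refs = len(req.reference_images) > 0
    if uses_frames and uses_refs:
        raise ValueError("Use first/last frames or reference images, not both.")
    if uses_frames:
        return "first_last_frames"
    if uses_refs:
        return "reference_images"
    return "text_only"  # fallback label, used here only for illustration


req = VideoRequest(
    prompt="Short UGC-style product review",
    first_frame="frame_1.png",
    last_frame="frame_5.png",
)
print(pick_mode(req))
```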
Step 6: Write the Video Prompt (Final Step)
What you need to do
Write one overall video prompt that explains:
The story flow
The mood
The UGC style
Example video prompt
“Create a short, natural UGC-style video with smooth and realistic transitions. The video should feel casual and authentic, like a real person recording a product review. The girl from (model_image) is holding a gray plastic package with both hands, smiling gently at the camera. Her posture is relaxed and natural, as if introducing a product she just received. She continues holding the same gray package but without revealing its contents. Her expression remains friendly and slightly curious, creating a sense of anticipation. Movements should be subtle and continuous. She then holds up the clothes from (clothing_image), clearly showing the design, fabric, and shape to the camera. Her movements are slow and natural, as if explaining or highlighting key details.”
📌 No images are regenerated here — this prompt only controls how the video is animated.
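The example video prompt above is essentially a style sentence followed by the frame descriptions from Step 5 in order. If you want to draft it that way, a minimal sketch could look like this; build_video_prompt is a hypothetical helper, not something the template provides, and you can still edit the result by hand.

```python
from typing import List


def build_video_prompt(style_intro: str, frame_descriptions: List[str]) -> str:
    """Join a style sentence and per-frame descriptions into one prompt."""
    # Drop empty entries and make sure each description ends with one period.
    cleaned = [d.strip().rstrip(".") + "." for d in frame_descriptions if d.strip()]
    return " ".join([style_intro.strip()] + cleaned)


frames = [
    "The girl from (model_image) is holding a gray plastic package with both hands",
    "She continues holding the same gray package without revealing its contents",
    "She then holds up the clothes from (clothing_image), showing the design to the camera",
]
prompt = build_video_prompt(
    "Create a short, natural UGC-style video with smooth and realistic transitions.",
    frames,
)
print(prompt)
```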
🎬 Final Result
The result looks like this: Watch your video here
Because Veo 3.1 generates videos of at most 8 seconds, we recommend using or modifying this template to create multiple short videos, then editing and stitching them together into a longer final video that fits your use case.
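To plan a longer video, you can estimate how many 8-second clips you need and prepare a stitching command in advance. The sketch below assumes you stitch with ffmpeg's concat demuxer, which is our choice for illustration and not part of the template; the file names are placeholders, and copying streams without re-encoding generally requires all clips to share the same codec and resolution.

```python
import math
from typing import List

MAX_CLIP_SECONDS = 8  # per-clip limit stated in this guide


def clips_needed(target_seconds: float) -> int:
    """Number of clips of at most 8 seconds needed to cover the target length."""
    return math.ceil(target_seconds / MAX_CLIP_SECONDS)


def concat_list(clip_paths: List[str]) -> str:
    """Text for an ffmpeg concat list file, one clip per line.

    Paths containing single quotes would need escaping; not handled here.
    """
    return "\n".join(f"file '{p}'" for p in clip_paths) + "\n"


def concat_command(list_file: str, output: str) -> List[str]:
    """ffmpeg arguments that join the listed clips without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


n = clips_needed(30)  # a 30-second video needs 4 clips
clips = [f"clip_{i + 1}.mp4" for i in range(n)]
print(n)
print(concat_list(clips))
print(" ".join(concat_command("clips.txt", "final.mp4")))
```

Save the output of concat_list to clips.txt, then run the printed command in the folder that contains the clips.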