VideoAgent Video Studio
pexoai/pexo-skills
This skill creates short AI-generated videos from text prompts or images. It supports text-to-video, image-to-video, and reference-based generation, offers multiple models for different styles and use cases, and automatically selects an appropriate backend. It is designed for users who want to generate videos, animate images, or produce AI-driven clips quickly without setting up API keys.
🎬 VideoAgent Video Studio
Use when: User asks to generate a video, create a video from text, animate an image, make a short clip, or produce AI video. Generate short AI videos with 7 backends. This skill picks the right mode (text-to-video or image-to-video), enhances the prompt for best results, and returns the video URL.
Quick Reference
| User Intent | Mode | Typical Duration |
|---|---|---|
| "Make a video of..." (no image) | text-to-video | 4–10 s |
| "Animate this image" / "Make this move" | image-to-video | 4–6 s |
| "Turn this into a video with..." | image-to-video | 4–6 s |
| Cinematic, story, ad | Prefer text-to-video with detailed prompt | 5–10 s |
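As a rough sketch of the intent-to-mode mapping in the Quick Reference, the function below chooses a mode based on whether the user supplied an image. `pickMode`, `hasImage`, and `typicalSeconds` are hypothetical names introduced for illustration; they are not part of the skill's scripts.

```javascript
// Hypothetical helper: map "did the user supply an image?" to a mode
// and the typical duration range listed in the Quick Reference.
function pickMode({ hasImage }) {
  return hasImage
    ? { mode: "image-to-video", typicalSeconds: [4, 6] }
    : { mode: "text-to-video", typicalSeconds: [4, 10] };
}
```

Cinematic, story, or ad requests without an image fall into the text-to-video branch, where a more detailed prompt is preferred.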
Generation Modes
| Mode | Description | Models |
|---|---|---|
| text-to-video | Text prompt only → video | minimax, kling, veo, hunyuan, grok, seedance |
| image-to-video | Single image + prompt → animated clip | minimax, kling, veo, pixverse, grok, seedance |
| reference-based | Reference images/video → consistent output | minimax, kling, veo, hunyuan, grok, seedance |
Models (use --model <id>)
| Model ID | T2V | I2V | Reference | Notes |
|---|---|---|---|---|
| minimax | Yes | Yes | Yes | Subject reference image, character consistency |
| kling | Yes | Yes | Yes | Multi-element / character / keyframe (O3) |
| veo | Yes | Yes | Yes | Google Veo 3.1, multiple reference images |
| hunyuan | Yes | No | Yes | Video-to-video style transfer |
| pixverse | No | Yes | No | Stylized image-to-video |
| grok | Yes | Yes | Yes | Video editing via reference video |
| seedance | Yes | Yes | Yes | Seedance 1.5 Pro, synchronized audio, 4–12 s |
Full model details and endpoint reference: references/models.md.
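To make the capability tables easier to check programmatically, here is a minimal sketch that transcribes the Generation Modes table into a lookup. `MODEL_MODES`, `modelsForMode`, and `supports` are hypothetical names; the actual proxy may expose this differently (for example through `--list-models`).

```javascript
// Capability map transcribed from the Generation Modes table.
// The constant and helper names are illustrative assumptions.
const ALL_MODES = ["text-to-video", "image-to-video", "reference-based"];

const MODEL_MODES = {
  minimax: ALL_MODES,
  kling: ALL_MODES,
  veo: ALL_MODES,
  hunyuan: ["text-to-video", "reference-based"],
  pixverse: ["image-to-video"],
  grok: ALL_MODES,
  seedance: ALL_MODES,
};

// List the model IDs that support a given mode, in table order.
function modelsForMode(mode) {
  return Object.keys(MODEL_MODES).filter((id) => MODEL_MODES[id].includes(mode));
}

// Check a single model/mode pair; unknown model IDs return false.
function supports(modelId, mode) {
  const modes = MODEL_MODES[modelId];
  return Array.isArray(modes) && modes.includes(mode);
}
```

A check like `supports("pixverse", "text-to-video")` can catch an unsupported combination before a request is sent.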
How to Generate a Video
Step 1: Choose mode and enhance the prompt
- Text-to-video: Expand with subject, action, camera movement, lighting, and style. Be specific about motion (e.g. "camera slowly zooms in", "character walks left to right").
- Image-to-video: Describe the motion to apply to the image (e.g. "gentle breeze in the hair", "camera pans across the scene"). See references/prompt_guide.md for patterns.
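One way to apply the text-to-video guidance is to assemble the prompt from named parts. The sketch below is illustrative only: `enhancePrompt` and its field names are assumptions, and the patterns in references/prompt_guide.md take precedence over this simple comma-joined format.

```javascript
// Hypothetical helper: join subject, action, camera movement, lighting,
// and style into a single comma-separated prompt, skipping empty parts.
function enhancePrompt({ subject, action, camera, lighting, style }) {
  if (!subject) {
    throw new Error("subject is required");
  }
  const lead = action ? `${subject} ${action}` : subject;
  return [lead, camera, lighting, style]
    .filter((part) => typeof part === "string" && part.trim() !== "")
    .map((part) => part.trim())
    .join(", ");
}

const prompt = enhancePrompt({
  subject: "A cat",
  action: "walking through rain",
  camera: "camera slowly zooms in",
  lighting: "cinematic lighting",
  style: "slow motion",
});
// prompt === "A cat walking through rain, camera slowly zooms in, cinematic lighting, slow motion"
```

For image-to-video, the same helper can be called with only the motion description as `subject`.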
Step 2: Run the script
Text-to-video:
node {baseDir}/tools/generate.js \
--mode text-to-video \
--prompt "<enhanced prompt>" \
--duration <seconds> \
--aspect-ratio <ratio>
Image-to-video:
node {baseDir}/tools/generate.js \
--mode image-to-video \
--prompt "<motion description>" \
--image-url "<public image URL>" \
--duration <seconds> \
--aspect-ratio <ratio>
Parameters:
| Parameter | Default | Description |
|---|---|---|
| --mode | text-to-video | text-to-video or image-to-video |
| --prompt | (required) | Scene or motion description |
| --image-url | (none) | Required for image-to-video; public image URL |
| --duration | 5 | Length in seconds (typically 4–10) |
| --aspect-ratio | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4 |
| --model | auto | Model ID (e.g. kling, veo, grok, seedance); auto = proxy picks |
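The parameter table can be turned into a small argument builder that applies the documented defaults and rejects invalid combinations before the script runs. This is a sketch under assumptions: `buildGenerateArgs` and its option names are hypothetical, and `generate.js` may perform its own validation.

```javascript
// Aspect ratios listed in the parameter table.
const ASPECT_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4"];

// Hypothetical helper: build the flag list for tools/generate.js,
// using the defaults documented in the parameter table.
function buildGenerateArgs(options) {
  const {
    mode = "text-to-video",
    prompt,
    imageUrl,
    duration = 5,
    aspectRatio = "16:9",
    model = "auto",
  } = options;

  if (mode !== "text-to-video" && mode !== "image-to-video") {
    throw new Error(`Unsupported --mode: ${mode}`);
  }
  if (!prompt) {
    throw new Error("--prompt is required");
  }
  if (mode === "image-to-video" && !imageUrl) {
    throw new Error("--image-url is required for image-to-video");
  }
  if (!ASPECT_RATIOS.includes(aspectRatio)) {
    throw new Error(`Unsupported --aspect-ratio: ${aspectRatio}`);
  }

  const args = ["--mode", mode, "--prompt", prompt];
  if (mode === "image-to-video") {
    args.push("--image-url", imageUrl);
  }
  args.push("--duration", String(duration), "--aspect-ratio", aspectRatio);
  // "auto" is the default, so the flag is only added for an explicit model.
  if (model !== "auto") {
    args.push("--model", model);
  }
  return args;
}
```

The resulting array can be passed to Node's `child_process.execFile("node", [scriptPath, ...args])`, which avoids shell quoting problems with prompts that contain quotes or spaces.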
Other commands:
| Command | Description |
|---|---|
| node tools/generate.js --list-models | List available models from the proxy |
| node tools/generate.js --status --job-id <id> | Check async job status |
Step 3: Return the result
The script returns JSON:
{
"success": true,
"mode": "text-to-video",
"videoUrl": "https://...",
"duration": 5,
"aspectRatio": "16:9"
}
Send videoUrl to the user.
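A caller might extract `videoUrl` from the script output along the following lines. The success shape matches the JSON shown above; the failure handling is an assumption, since the source does not document what the script prints on error, and `extractVideoUrl` is a hypothetical name.

```javascript
// Hypothetical helper: parse the script's stdout and return videoUrl.
// Assumes any result without "success": true should be treated as a failure.
function extractVideoUrl(stdout) {
  let result;
  try {
    result = JSON.parse(stdout);
  } catch (err) {
    throw new Error(`Script output was not valid JSON: ${err.message}`);
  }
  if (result === null || typeof result !== "object" || result.success !== true) {
    throw new Error("Generation did not report success");
  }
  if (typeof result.videoUrl !== "string" || result.videoUrl === "") {
    throw new Error("Result has no videoUrl");
  }
  return result.videoUrl;
}
```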
Example Conversations
User: "Generate a short video of a cat walking in the rain, cinematic."
node {baseDir}/tools/generate.js \
--mode text-to-video \
--prompt "A cat walking through rain, wet streets, neon reflections, cinematic lighting, slow motion, 4K" \
--duration 5 \
--aspect-ratio 16:9
User: "Animate this photo" (user uploads a landscape)
node {baseDir}/tools/generate.js \
--mode image-to-video \
--prompt "Gentle clouds moving across the sky, subtle grass movement, cinematic atmosphere" \
--image-url "https://..." \
--duration 5 \
--aspect-ratio 16:9
User: "Make a 10-second vertical video of a coffee pour, slow motion."
node {baseDir}/tools/generate.js \
--mode text-to-video \
--prompt "Close-up of coffee pouring into a white cup, slow motion, steam rising, soft lighting, product shot" \
--duration 10 \
--aspect-ratio 9:16
User: "Use Google Veo for a cinematic shot."
node {baseDir}/tools/generate.js \
--mode text-to-video \
--model veo \
--prompt "A dragon flying through cloudy skies, cinematic lighting, 8s" \
--duration 8 \
--aspect-ratio 16:9
User: "Animate this portrait."
node {baseDir}/tools/generate.js \
--mode image-to-video \
--model grok \
--prompt "Gentle smile, subtle head turn" \
--image-url "https://..." \
--duration 5
Setup
Zero API keys by default. Requests go through a hosted proxy. Set these for a custom proxy or token:
| Variable | Required | Description |
|---|---|---|
| VIDEO_STUDIO_PROXY_URL | No | Proxy base URL |
| VIDEO_STUDIO_TOKEN | No | Auth token if the proxy requires it |
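For reference, optional overrides like these could be read as shown below. This is only a sketch: `readProxyConfig` is a hypothetical name, stripping trailing slashes is an illustrative choice, and how the script actually applies the URL and token is not documented here.

```javascript
// Hypothetical helper: collect the optional overrides from the environment.
// Unset variables are simply omitted so the hosted proxy default applies.
function readProxyConfig(env = process.env) {
  const config = {};
  if (env.VIDEO_STUDIO_PROXY_URL) {
    // Assumption: normalise the base URL by dropping trailing slashes.
    config.proxyUrl = env.VIDEO_STUDIO_PROXY_URL.replace(/\/+$/, "");
  }
  if (env.VIDEO_STUDIO_TOKEN) {
    config.token = env.VIDEO_STUDIO_TOKEN;
  }
  return config;
}
```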
Knowledge Base
- references/prompt_guide.md: Prompt patterns for text-to-video and image-to-video.
- references/models.md: Model list, capabilities, and selection guide.
- references/calling_guide.md: Per-model endpoint details, input parameters, and special handling.
GitHub Owner
Owner: pexoai
Files
models.md
- View: https://github.com/pexoai/pexo-skills/blob/HEAD/skills/videoagent-video-studio/references/models.md
prompt_guide.md
calling_guide.md
SKILL.md
name: videoagent-video-studio
version: 2.1.0
author: pexoai
emoji: "🎬"
tags:
- video
- video-generation
- text-to-video
- image-to-video
- veo
- grok
- kling
- seedance
- minimax
- hunyuan
- pixverse
description: >
  Generate short AI videos from text or images (text-to-video, image-to-video, and reference-based generation) with zero API key setup. Use when the user wants to create a video clip, animate an image, or generate video from a description.
metadata:
  openclaw:
    emoji: "🎬"
    install:
      - id: node
        kind: node
        label: "No dependencies needed; all calls go through the hosted proxy"