Videoagent Video Studio

pexoai/pexo-skills

This skill enables the creation of short AI-generated videos from text prompts or images, supporting various modes like text-to-video and image-to-video, including reference-based options. It offers multiple models for different styles and use cases, automatically selecting the appropriate backend for best results. Designed for users seeking to generate videos, animate images, or produce AI-driven clips quickly without requiring API keys.

npx skills add https://github.com/pexoai/pexo-skills --skill videoagent-video-studio

🎬 VideoAgent Video Studio

Use when: User asks to generate a video, create a video from text, animate an image, make a short clip, or produce AI video. Generate short AI videos with 7 backends. This skill picks the right mode (text-to-video or image-to-video), enhances the prompt for best results, and returns the video URL.


Quick Reference

| User Intent | Mode | Typical Duration |
| --- | --- | --- |
| "Make a video of..." (no image) | text-to-video | 4–10 s |
| "Animate this image" / "Make this move" | image-to-video | 4–6 s |
| "Turn this into a video with..." | image-to-video | 4–6 s |
| Cinematic, story, ad | Prefer text-to-video with detailed prompt | 5–10 s |
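The quick-reference rules reduce to one decision: a request that comes with an image maps to image-to-video, and anything else maps to text-to-video. The sketch below is only an illustration. `pickMode` and `clampDuration` are hypothetical helpers that are not part of the skill, and treating the typical durations as bounds to clamp to is an assumption.

```javascript
// Hypothetical helper (not part of tools/generate.js): maps a request to a
// mode plus the typical duration range from the quick-reference table.
function pickMode({ hasImage }) {
  if (hasImage) {
    return { mode: "image-to-video", minSeconds: 4, maxSeconds: 6 };
  }
  return { mode: "text-to-video", minSeconds: 4, maxSeconds: 10 };
}

// Assumption: the typical ranges are used as bounds for the requested duration.
function clampDuration(requested, { minSeconds, maxSeconds }) {
  return Math.min(maxSeconds, Math.max(minSeconds, requested));
}
```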

Generation Modes

| Mode | Description | Models |
| --- | --- | --- |
| text-to-video | Text prompt only → video | minimax, kling, veo, hunyuan, grok, seedance |
| image-to-video | Single image + prompt → animated clip | minimax, kling, veo, pixverse, grok, seedance |
| reference-based | Reference images/video → consistent output | minimax, kling, veo, hunyuan, grok, seedance |

Models (use --model <id>)

| Model ID | T2V | I2V | Reference | Notes |
| --- | --- | --- | --- | --- |
| minimax | ✅ | ✅ | ✅ | Subject reference image, character consistency |
| kling | ✅ | ✅ | ✅ | Multi-element / character / keyframe (O3) |
| veo | ✅ | ✅ | ✅ | Google Veo 3.1, multiple reference images |
| hunyuan | ✅ | — | ✅ | Video-to-video style transfer |
| pixverse | — | ✅ | — | Stylized image-to-video |
| grok | ✅ | ✅ | ✅ | Video editing via reference video |
| seedance | ✅ | ✅ | ✅ | Seedance 1.5 Pro, synchronized audio, 4–12 s |

Full model details and endpoint reference: references/models.md.
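The capability columns of the models table can be mirrored as a small lookup, which is handy for checking a `--model` choice against the requested mode before calling the script. This is a sketch under assumptions: the `MODELS` object and the `modelsFor` helper are hypothetical names, and only the flags are copied from the table.

```javascript
// Capability flags copied from the models table (T2V, I2V, Reference columns).
const MODELS = {
  minimax:  { t2v: true,  i2v: true,  reference: true },
  kling:    { t2v: true,  i2v: true,  reference: true },
  veo:      { t2v: true,  i2v: true,  reference: true },
  hunyuan:  { t2v: true,  i2v: false, reference: true },
  pixverse: { t2v: false, i2v: true,  reference: false },
  grok:     { t2v: true,  i2v: true,  reference: true },
  seedance: { t2v: true,  i2v: true,  reference: true },
};

// Hypothetical helper: list the model IDs that support a given capability
// ("t2v", "i2v", or "reference"), in table order.
function modelsFor(capability) {
  return Object.keys(MODELS).filter((id) => MODELS[id][capability]);
}
```

For example, `modelsFor("i2v")` leaves out hunyuan, and `modelsFor("t2v")` leaves out pixverse.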


How to Generate a Video

Step 1 — Choose mode and enhance the prompt

  • Text-to-video: Expand with subject, action, camera movement, lighting, and style. Be specific about motion (e.g. "camera slowly zooms in", "character walks left to right").
  • Image-to-video: Describe the motion to apply to the image (e.g. "gentle breeze in the hair", "camera pans across the scene"). See references/prompt_guide.md for patterns.
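One way to apply the prompt guidance above is to assemble the enhanced prompt from named parts. The sketch below is illustrative only. `buildPrompt` and its field names are hypothetical, and the comma-separated ordering is an assumption modelled on the example prompts in this document.

```javascript
// Hypothetical helper: joins the non-empty prompt parts in a fixed order
// (subject + action, then camera movement, lighting, style).
function buildPrompt({ subject, action, camera, lighting, style }) {
  const lead = [subject, action].filter(Boolean).join(" ");
  return [lead, camera, lighting, style]
    .filter((part) => typeof part === "string" && part.trim() !== "")
    .map((part) => part.trim())
    .join(", ");
}

const prompt = buildPrompt({
  subject: "A cat",
  action: "walking through rain",
  camera: "camera slowly zooms in",
  lighting: "cinematic lighting",
});
// prompt === "A cat walking through rain, camera slowly zooms in, cinematic lighting"
```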

Step 2 — Run the script

Text-to-video:

node {baseDir}/tools/generate.js \
  --mode text-to-video \
  --prompt "<enhanced prompt>" \
  --duration <seconds> \
  --aspect-ratio <ratio>

Image-to-video:

node {baseDir}/tools/generate.js \
  --mode image-to-video \
  --prompt "<motion description>" \
  --image-url "<public image URL>" \
  --duration <seconds> \
  --aspect-ratio <ratio>

Parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| --mode | text-to-video | text-to-video or image-to-video |
| --prompt | (required) | Scene or motion description |
| --image-url | — | Required for image-to-video; public image URL |
| --duration | 5 | Length in seconds (typically 4–10) |
| --aspect-ratio | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4 |
| --model | auto | Model ID (e.g. kling, veo, grok, seedance); auto = proxy picks |

Other commands:

| Command | Description |
| --- | --- |
| node tools/generate.js --list-models | List available models from the proxy |
| node tools/generate.js --status --job-id <id> | Check async job status |
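When the script is called from code rather than typed by hand, the parameter rules can be checked before anything is run. The following is a minimal sketch, assuming the defaults and allowed values listed in the parameter table. `buildArgs` and `ASPECT_RATIOS` are hypothetical names, and the validation happens in this helper, not necessarily in generate.js itself.

```javascript
// Allowed values taken from the --aspect-ratio row of the parameter table.
const ASPECT_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4"];

// Hypothetical helper: turns an options object into the argument list for
// tools/generate.js, applying the documented defaults.
function buildArgs({
  mode = "text-to-video",
  prompt,
  imageUrl,
  duration = 5,
  aspectRatio = "16:9",
  model,
}) {
  if (!prompt) throw new Error("--prompt is required");
  if (mode === "image-to-video" && !imageUrl) {
    throw new Error("--image-url is required for image-to-video");
  }
  if (!ASPECT_RATIOS.includes(aspectRatio)) {
    throw new Error(`unsupported aspect ratio: ${aspectRatio}`);
  }
  const args = ["--mode", mode, "--prompt", prompt];
  if (imageUrl) args.push("--image-url", imageUrl);
  args.push("--duration", String(duration), "--aspect-ratio", aspectRatio);
  if (model) args.push("--model", model); // omitted: the script default (auto) applies
  return args;
}
```

The resulting array can be passed to Node's `child_process.execFile` together with the script path, which avoids shell quoting problems when the prompt contains quotes or commas.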

Step 3 — Return the result

The script returns JSON:

{
  "success": true,
  "mode": "text-to-video",
  "videoUrl": "https://...",
  "duration": 5,
  "aspectRatio": "16:9"
}

Send videoUrl to the user.
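Pulling `videoUrl` out of the script output could look like the sketch below. `extractVideoUrl` is a hypothetical helper. The source only shows a success payload, so the failure branch is an assumption about how an unsuccessful run might look.

```javascript
// Hypothetical helper: parses the script's stdout and returns the video URL.
// Assumption: anything without success === true and a string videoUrl is a failure.
function extractVideoUrl(stdout) {
  const result = JSON.parse(stdout);
  if (result.success !== true || typeof result.videoUrl !== "string") {
    throw new Error("generation did not return a video URL");
  }
  return result.videoUrl;
}
```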


Example Conversations

User: "Generate a short video of a cat walking in the rain, cinematic."

node {baseDir}/tools/generate.js \
  --mode text-to-video \
  --prompt "A cat walking through rain, wet streets, neon reflections, cinematic lighting, slow motion, 4K" \
  --duration 5 \
  --aspect-ratio 16:9

User: "Animate this photo" (user uploads a landscape)

node {baseDir}/tools/generate.js \
  --mode image-to-video \
  --prompt "Gentle clouds moving across the sky, subtle grass movement, cinematic atmosphere" \
  --image-url "https://..." \
  --duration 5 \
  --aspect-ratio 16:9

User: "Make a 10-second vertical video of a coffee pour, slow motion."

node {baseDir}/tools/generate.js \
  --mode text-to-video \
  --prompt "Close-up of coffee pouring into a white cup, slow motion, steam rising, soft lighting, product shot" \
  --duration 10 \
  --aspect-ratio 9:16

User: "Use Google Veo for a cinematic shot."

node {baseDir}/tools/generate.js \
  --mode text-to-video \
  --model veo \
  --prompt "A dragon flying through cloudy skies, cinematic lighting, 8s" \
  --duration 8 \
  --aspect-ratio 16:9

User: "Animate this portrait."

node {baseDir}/tools/generate.js \
  --mode image-to-video \
  --model grok \
  --prompt "Gentle smile, subtle head turn" \
  --image-url "https://..." \
  --duration 5

Setup

Zero API keys by default. Requests go through a hosted proxy. Set these for a custom proxy or token:

| Variable | Required | Description |
| --- | --- | --- |
| VIDEO_STUDIO_PROXY_URL | No | Proxy base URL |
| VIDEO_STUDIO_TOKEN | No | Auth token if the proxy requires it |
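Both variables are optional, so code that wraps the script can treat a missing value as "use the hosted default". A minimal sketch, assuming nothing beyond the two variable names: `readProxyConfig` is a hypothetical helper, and how generate.js itself consumes these values is not shown in this document.

```javascript
// Hypothetical helper: reads the two optional variables from an env object.
// null means "not set", in which case the hosted proxy and no token apply.
function readProxyConfig(env = process.env) {
  return {
    proxyUrl: env.VIDEO_STUDIO_PROXY_URL || null,
    token: env.VIDEO_STUDIO_TOKEN || null,
  };
}
```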


Knowledge Base

GitHub Owner

Owner: pexoai

Files

models.md

prompt_guide.md

calling_guide.md

SKILL.md


name: videoagent-video-studio
version: 2.1.0
author: pexoai
emoji: "🎬"
tags:
  - video
  - video-generation
  - text-to-video
  - image-to-video
  - veo
  - grok
  - kling
  - seedance
  - minimax
  - hunyuan
  - pixverse
description: >
  Generate short AI videos from text or images — text-to-video, image-to-video, and reference-based generation — with zero API key setup. Use when the user wants to create a video clip, animate an image, or generate video from a description.
metadata:
  openclaw:
    emoji: "🎬"
    install:
      - id: node
        kind: node
        label: "No dependencies needed — all calls go through the hosted proxy"
