Grok Imagine Video
Native audio generation with strong prompt and edit control.
Professional video for every use case
Stylized anime shorts.
Switch footage into anime, retro, watercolor, or cyberpunk looks without rebuilding the whole shot.
Fun social media content creation.
Generate short videos in seconds with built-in sound effects and lip-sync. Turn static images, doodles, or text into engaging social content.
Character-consistent scenes.
Reference images help carry people, props, or outfits into new short scenes without locking the first frame.
Product visualization.
Convert static product photos into short videos with sound effects and a 360° rotating view in one click. Spatial geometry analysis keeps product structures intact, so the output is ready to use as an e-commerce product demonstration.
Native audio generated with the video
Grok Imagine Video generates sound with the visual clip instead of treating audio as a separate afterthought. That makes short outputs feel more finished for demos, concept scenes, and social posts.
- Audio is created in the same generation flow
- Useful for ambience, effects, and mood
- Reduces separate sound-design work
- Helpful for quick social and concept videos
Promptable camera motion and scene direction
xAI showcases the model with moves like zoom out, pan right, tilt up, dolly out, and timelapse. Clear camera language gives you more direct control than a vague cinematic prompt.
- Prompt zooms, pans, tilts, dolly moves, and timelapses
- Useful for product reveals and environmental shots
- Works in both text-to-video and image-to-video
- Clear motion directions improve shot readability
- Built for short clips where framing matters
Text, image, reference, edit, and extension flows
You can start from text, animate a still image, guide the result with reference images, edit an existing clip, or continue a scene from its last frame. That makes the model useful beyond one-off first drafts.
- Text-to-video from scratch
- Image-to-video uses a still as the opening frame
- Reference-to-video supports up to 7 guide images
- Video editing changes specific elements in an existing clip
- Video extension continues from the final frame
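The five flows differ mainly in the inputs they take. As a rough sketch, the hypothetical helper below (`pick_flow` and its parameter names are illustrative and not part of any xAI API) picks a flow from the inputs on hand and enforces the 7-image reference limit. It assumes an opening frame and reference images are not combined in one request, which the description above does not confirm.

```python
from typing import Optional, Sequence

MAX_REFERENCE_IMAGES = 7  # limit stated in the list above


def pick_flow(
    start_image: Optional[str] = None,
    reference_images: Sequence[str] = (),
    source_video: Optional[str] = None,
    extend: bool = False,
) -> str:
    """Return the name of the flow that matches the supplied inputs.

    Hypothetical helper for illustration only; not an xAI client.
    """
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"at most {MAX_REFERENCE_IMAGES} reference images, "
            f"got {len(reference_images)}"
        )
    if source_video is not None:
        # An existing clip is either continued from its final frame or edited.
        return "video-extension" if extend else "video-editing"
    if extend:
        raise ValueError("extension needs a source video")
    if start_image is not None and reference_images:
        # Assumption: the two image flows are mutually exclusive.
        raise ValueError("choose either an opening frame or reference images")
    if start_image is not None:
        return "image-to-video"  # the still becomes the opening frame
    if reference_images:
        return "reference-to-video"  # guides the look without fixing frame one
    return "text-to-video"
```

With no inputs the helper falls back to text-to-video, which matches the "from scratch" case in the list.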
Short-form settings that fit real delivery formats
xAI exposes practical controls for duration, aspect ratio, and resolution, so you can fit clips to reels, feeds, or widescreen mockups. Edited outputs keep the source framing and timing, which helps preserve continuity.
- 1 to 15 second generation range
- 480p or 720p output
- 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, and 2:3
- Edits keep the input clip's duration and aspect ratio
- Edited output resolution is capped at 720p
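The limits above are easy to check before submitting a generation. The sketch below uses a hypothetical `ClipSettings` class (the class, field names, and default values are assumptions for illustration, not xAI's schema) that rejects values outside the listed duration range, resolutions, and aspect ratios.

```python
from dataclasses import dataclass

# Values taken from the list above.
ASPECT_RATIOS = frozenset({"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"})
RESOLUTIONS = frozenset({"480p", "720p"})
MIN_SECONDS, MAX_SECONDS = 1, 15


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical settings container; defaults are arbitrary examples."""

    duration_s: float = 5
    aspect_ratio: str = "16:9"
    resolution: str = "480p"

    def __post_init__(self) -> None:
        if not MIN_SECONDS <= self.duration_s <= MAX_SECONDS:
            raise ValueError(
                f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds, "
                f"got {self.duration_s}"
            )
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution}")


# A vertical 8-second clip at final-delivery quality.
reel = ClipSettings(duration_s=8, aspect_ratio="9:16", resolution="720p")
```

Validating locally like this avoids spending a generation on a request the service would reject anyway.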
How it works
Describe your scene
Start with a direct prompt that names the subject, action, camera move, and mood. If you have a still image to animate or guide the look, add it before generating.
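One way to keep prompts direct is to assemble them from the four parts named above. This is only a sketch: `build_prompt` is a hypothetical helper, and the "Camera:" and "Mood:" labels are an assumed convention, since the model takes free-form text.

```python
def build_prompt(subject: str, action: str, camera_move: str = "", mood: str = "") -> str:
    """Join subject, action, camera move, and mood into one short prompt.

    Hypothetical helper; the labeled-sentence layout is an assumption.
    """
    sentences = [f"{subject.strip()} {action.strip()}"]
    if camera_move.strip():
        sentences.append(f"Camera: {camera_move.strip()}")
    if mood.strip():
        sentences.append(f"Mood: {mood.strip()}")
    return ". ".join(sentences) + "."


prompt = build_prompt(
    subject="a ceramic mug",
    action="rotating slowly on a turntable",
    camera_move="dolly out",
    mood="warm and calm",
)
# prompt == "a ceramic mug rotating slowly on a turntable. Camera: dolly out. Mood: warm and calm."
```

Keeping each part in its own argument makes it simple to change only the camera move or the mood between variations.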
Choose format and length
Pick the aspect ratio, clip length, and quality that match where the video will be used. Short 480p drafts are useful for testing motion, while 720p is better for final delivery.
Generate and refine
Render the first pass, then tighten the prompt to adjust movement, framing, or sound. Once the clip feels right, export it and move on to the next variation.
Pricing for Grok Imagine Video
Runs on credits — no per-model surcharges, no surprise billing.
- 480p (default): 70 credits / sec
- 720p: 100 credits / sec
Credits work across every plan. See /pricing for credit packages.
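As a worked example of the per-second rates listed above, the sketch below estimates the cost of a clip. It assumes the figures are credits per second of generated video and that billing scales linearly with duration; rounding rules for fractional seconds are not stated, so the hypothetical `estimate_credits` helper accepts whole seconds only.

```python
# Rates from the pricing list above (assumed to be credits per second).
CREDITS_PER_SECOND = {"480p": 70, "720p": 100}


def estimate_credits(duration_s: int, resolution: str = "480p") -> int:
    """Estimate credits for one clip, assuming linear per-second billing."""
    if resolution not in CREDITS_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    if not 1 <= duration_s <= 15:
        raise ValueError("duration must be between 1 and 15 seconds")
    return CREDITS_PER_SECOND[resolution] * duration_s


draft = estimate_credits(5, "480p")   # 5 * 70 = 350 credits
final = estimate_credits(15, "720p")  # 15 * 100 = 1500 credits
```

Under these assumptions, a 5-second 480p draft costs 350 credits and a 15-second 720p clip costs 1500.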