How to make an AI music video in 60 seconds

Turn any song into a full music video — synced visuals, AI-generated scenes, auto captions — in under a minute. No footage, no editing, no camera. Just a prompt.

By Prashar · 4 min read

Making a music video used to take a full weekend: picking clips, matching beats, color-grading, adding captions, and syncing everything. With MarsClip, you paste a song and get a finished vertical music video back within minutes.

This guide walks through the exact flow — what to upload, which settings matter, and how to fix the handful of things that trip up first-time users.

What you need before you start

Three things:

  • A song. MP3, WAV, or a direct URL. For your first try, pick something under 90 seconds — a snippet of your favorite track works fine.

  • A visual style in mind. Cinematic, neon, anime, documentary, dreamy. Anything you can describe in a sentence.

  • About 4 minutes of patience. The AI generates each scene in sequence, so longer songs take longer.

You don't need footage. You don't need B-roll. You don't need editing software. MarsClip generates every visual from scratch, synced to the song.
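If your track runs longer than 90 seconds, you can cut a snippet before uploading. Here is a minimal sketch using Python's standard `wave` module. It assumes an uncompressed PCM WAV file (MP3s need a different tool), and `trim_wav` is a hypothetical helper name, not part of MarsClip.

```python
import wave


def trim_wav(src_path: str, dst_path: str, max_seconds: float = 90.0) -> float:
    """Copy at most the first `max_seconds` of a WAV file.

    Returns the duration of the trimmed file in seconds.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frame_rate = src.getframerate()
        # Keep the requested number of frames, capped at the file's real length.
        frames_to_keep = min(src.getnframes(), int(max_seconds * frame_rate))
        audio = src.readframes(frames_to_keep)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width, and sample rate from the source.
        dst.setparams(params)
        # The frame count in the header is corrected when the file is closed.
        dst.writeframes(audio)

    return frames_to_keep / frame_rate
```

Calling `trim_wav("song.wav", "song_90s.wav")` writes the first 90 seconds to a new file and leaves the original untouched. The file names here are placeholders.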

Step 1: Upload your song

From the dashboard, click Music Video. Drag your audio file into the upload zone.

While you wait, MarsClip spends a few seconds analyzing the track:

  • MarsClip transcribes the lyrics (for syncing captions).

  • It detects the tempo and beat drops (so visuals hit on strong beats).

  • It splits the track into scenes — usually one scene every 4-8 seconds. A 90-second track becomes about 12 scenes.
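To see where "about 12 scenes" comes from, here is the arithmetic as a small sketch. It only uses the 4-8 second scene length quoted above. How MarsClip actually places scene boundaries is not documented here, and `scene_count_range` is a hypothetical helper for illustration.

```python
import math


def scene_count_range(
    duration_s: float, min_scene_s: float = 4.0, max_scene_s: float = 8.0
) -> tuple[int, int]:
    """Return (fewest, most) scenes if every scene lasts min_scene_s to max_scene_s."""
    # Fewest scenes: every scene is as long as allowed.
    fewest = math.ceil(duration_s / max_scene_s)
    # Most scenes: every scene is as short as allowed.
    most = math.floor(duration_s / min_scene_s)
    return fewest, most


fewest, most = scene_count_range(90)
print(fewest, most)   # 12 22
print(90 / fewest)    # 7.5 seconds per scene at the low end
```

So a 90-second track at about 12 scenes means scenes near the long end of the range, roughly 7.5 seconds each. A faster-cut video could have up to 22.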

Two tips that save time later:

  • Use the original vocals if you have them. Clean vocals transcribe more accurately than compressed YouTube rips.

  • Name the upload with the song title. It becomes your video's title and your exported MP4's filename.

Step 2: Pick a visual style

This is the only creative decision you have to make. You can pick a preset or write your own style prompt.

The best prompts are specific. Good examples:

  • "Cinematic, moody, shot on 35mm film, shallow depth of field, rain on glass"

  • "Anime style, vibrant colors, Studio Ghibli inspired, outdoor adventure"

  • "80s neon, synthwave, retro-futuristic, palm trees at sunset"

  • "Documentary street photography, handheld, natural light, Brooklyn"

Generic prompts like "cool" or "modern" produce generic results. The more specific the reference, the more coherent the video.

Step 3: Let it generate

Hit Generate.

You'll land in the editor immediately — no waiting on a loading screen. At the top of the preview you'll see a progress bar ("Generating scenes… 20%") and scenes streaming into the sidebar one by one as each completes.

A 90-second song finishes in 3-5 minutes end-to-end. You can close the tab and come back; generation continues in the background.
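For a rough sense of how long a longer track might take, here is a back-of-the-envelope sketch. It assumes generation time scales linearly with track length from the single data point above (90 seconds of audio, 3-5 minutes of generation). That linearity is an assumption, and `estimate_generation_minutes` is a hypothetical helper, not a MarsClip feature.

```python
# Reference point taken from the text: a 90-second song takes 3-5 minutes.
REFERENCE_TRACK_S = 90.0
REFERENCE_MINUTES = (3.0, 5.0)


def estimate_generation_minutes(track_seconds: float) -> tuple[float, float]:
    """Scale the reference range linearly with track length (an assumption)."""
    factor = track_seconds / REFERENCE_TRACK_S
    low, high = REFERENCE_MINUTES
    return low * factor, high * factor


print(estimate_generation_minutes(180))  # (6.0, 10.0)
```

Under that assumption, a full 3-minute track lands somewhere around 6-10 minutes. Treat it as a ballpark rather than a promise.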

Step 4: Review and tweak

Once all scenes have loaded, scroll through the scene list. Three things worth checking:

  • Any scene where the visual doesn't match the lyrics. Hover over the scene, click the pencil icon, and regenerate with a more specific prompt.

  • Transitions. The Transitions tab (left rail) lets you swap fades for cuts, slides, or zooms. Default is a clean fade.

  • Captions. On by default. Toggle with the CC button on the preview. Restyle font, color, and position in the Captions tab.

Most first-time users export without changing anything. The defaults are deliberately conservative and lean on the song's rhythm.

Step 5: Export

Click Export (top right). Pick your aspect ratio:

  • 9:16 for TikTok, Reels, YouTube Shorts (most common)

  • 16:9 for YouTube main feed

  • 1:1 for Instagram grid

Your final MP4 downloads with your song title as the filename, ready to upload anywhere.
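If you want to know what pixel dimensions to expect for each ratio, the sketch below does the conversion. It assumes the shorter side is 1080 pixels, a common upload size for these platforms. MarsClip's actual export resolution isn't stated in this guide, and `export_size` is a hypothetical helper.

```python
def export_size(aspect_ratio: str, short_side: int = 1080) -> tuple[int, int]:
    """Return (width, height) in pixels for a ratio such as "9:16".

    The shorter side is fixed at `short_side`; the longer side scales to match.
    """
    w_part, h_part = (int(part) for part in aspect_ratio.split(":"))
    smaller = min(w_part, h_part)
    # Integer arithmetic keeps the result exact for the common ratios.
    return w_part * short_side // smaller, h_part * short_side // smaller


for ratio in ("9:16", "16:9", "1:1"):
    print(ratio, export_size(ratio))
# 9:16 (1080, 1920)
# 16:9 (1920, 1080)
# 1:1 (1080, 1080)
```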

Typical stats for a MarsClip music video:

  • 60-90 seconds of finished video

  • Under 5 minutes from upload to download

  • ~40-60 credits on a standard plan

  • Zero footage or editing software needed

Common first-time mistakes

A few things that trip people up on their first try:

  • Too long a song. Start with a 60-90 second clip. Full 3-minute tracks work, but they take longer to regenerate if you don't love the first pass.

  • Vague style prompts. "Make it cool" produces generic AI slop. "Neon synthwave, 1980s Miami, palm trees, sunset colors" produces a video worth posting.

  • Skipping the preview. Watch the full preview before exporting. Regenerating one scene takes 10 seconds. Re-exporting the whole video is slower.

When AI music videos actually work

Use cases where MarsClip earns its keep:

  • Musicians putting out a single who need a visual for Spotify/Instagram

  • Content creators using trending sounds on TikTok who want something more interesting than a lip-sync

  • Small labels producing visualizers for every track without a video team

  • Hobbyists who want a 90-second music video for their Spotify Wrapped or a playlist

If you're making a cinema-quality feature film, you'll still hire a director. For everything between "just a waveform" and "full production", MarsClip fills the gap.

Ready to try it?