Sound Models
Mirelo
Mirelo Video2SFX 1.0
Mirelo Video2SFX 1.0 is Berlin-based Mirelo's foundational video-to-sound-effects model. It generates synchronized audio from video input without text prompts and is purpose-built for AI-generated content. The model excels at producing pure sound effects free of unwanted music or speech artifacts, particularly on synthetic footage where competitors struggle. It supports videos up to 10 seconds, generates in real time, and produces 2-4 diverse output variations, while requiring 50x less compute than typical LLMs thanks to its lightweight, specialized architecture.
Official Site: https://www.mirelo.ai
Mirelo Video2SFX 1.5
Mirelo Video2SFX 1.5 is Mirelo's enhanced video-to-sound-effects model, released in 2025, delivering improved audio fidelity, wider scene coverage, and faster processing. It maintains zero-prompt operation, building context-aware soundscapes purely from video analysis, and wins 70-80% preference in blind listening tests thanks to superior handling of synthetic AI content. The release adds enhanced frame-accurate synchronization, multiple-variation generation, and optimized inference, and supports scenarios ranging from natural environments to complex action sequences. Mirelo is a startup with $44M in funding, backed by Index Ventures and Andreessen Horowitz.
Official Site: https://www.mirelo.ai
MMAudio
MMAudio 2
MMAudio 2 is a multimodal audio generation model from the University of Illinois and Sony AI (CVPR 2025) that synthesizes synchronized audio from video and/or text through a joint training framework. With 157M parameters, it generates 8-second clips at 44.1kHz in 1.23 seconds and requires only 6GB of GPU memory. It uses a flow prediction network with Synchformer and CLIP feature extractors, achieving 10% better Fréchet Distance, 15% higher Inception Score, and 14% improved synchronization versus prior models. It supports video-to-audio, text-to-audio, and experimental image-to-audio synthesis, and is open source under the MIT license.
Official Site: https://github.com/hkchengrex/MMAudio
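As a quick sanity check on the throughput figures quoted above, the numbers imply a fixed sample count per clip and a generation speed several times faster than real time. The snippet below is illustrative arithmetic only (the constants come from the entry; it does not use the MMAudio codebase):

```python
# Back-of-envelope check of MMAudio 2's stated specs:
# an 8-second clip at 44.1 kHz generated in 1.23 s of wall-clock time.
SAMPLE_RATE_HZ = 44_100   # output sample rate from the entry above
CLIP_SECONDS = 8          # clip length in seconds
GEN_SECONDS = 1.23        # reported generation time in seconds

# Total audio samples synthesized per clip.
total_samples = SAMPLE_RATE_HZ * CLIP_SECONDS

# How many seconds of audio are produced per second of compute.
realtime_factor = CLIP_SECONDS / GEN_SECONDS

print(f"samples per clip: {total_samples}")              # 352800
print(f"faster than real time: {realtime_factor:.1f}x")  # 6.5x
```

In other words, each invocation synthesizes 352,800 samples, roughly 6.5x faster than real-time playback.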