Google LLC has released Gemini Omni Flash, a voice-controlled AI tool designed to generate and edit videos through conversational prompts [1, 2, 3].

This development represents a shift toward multimodal AI, where users can manipulate complex visual media without traditional editing software. By removing the technical barrier to video production, Google aims to reshape how digital content is created and modified in real time [2, 5].

The tool is part of the Gemini 3.5 Flash model series [1]. It enables a conversational interface where users speak to the AI to request specific changes or new scenes, effectively treating the AI as a creative agent rather than a static tool [1, 4].

Reports on the exact timing of the announcement vary across industry sources. SiliconANGLE said the news occurred on May 19, 2026 [1], while other reports placed the announcement on May 20 [3] or May 28, 2026 [2].

Gemini Omni Flash focuses on speed and fluidity in the creation process. The system is built to handle multimodal inputs, meaning it can process and produce various types of data, such as voice and video, simultaneously [4]. This allows for a more intuitive workflow where a user can describe a visual change and see it implemented almost immediately [2].

Google is positioning this technology to compete in the growing market of AI-driven content creation tools. The company is targeting the ability to create AI agents that can manage the entire video generation pipeline through a simple dialogue [1].

Google released Gemini Omni Flash, a voice-controlled AI tool that lets users generate and edit videos through conversational prompts.

The integration of voice-controlled editing into the Gemini 3.5 Flash model signals a move toward 'invisible' interfaces. By replacing timelines and keyframes with natural language, Google is attempting to democratize high-end video production, potentially disrupting the professional editing market and accelerating the volume of AI-generated synthetic media.