Google introduced the Gemini Omni multimodal AI model during its I/O 2026 conference to generate realistic video from text, photos, or live video [1, 2].
The launch marks a strategic effort by Google to broaden its artificial intelligence offerings and address a market gap left by OpenAI's Sora video model [5]. By enabling on-device video generation, the company aims to integrate high-fidelity creative tools directly into its existing ecosystem [2].
Presented at the Shoreline Amphitheatre in Mountain View, California, the model is also referred to as Gemini 3.5 Flash [3, 4]. The technology allows users to create surreal AI-generated clips by transforming existing real-world video, a feature Google said will be rolled out to YouTube Shorts [2].
Beyond social media integration, Google demonstrated the model's capabilities through a new AI agent within Google Flow [1]. This integration suggests a move toward more autonomous AI assistants that can perceive and generate visual content in real time to assist users with complex tasks.
The conference, which took place from May 14 to 16, 2026 [4], served as the primary venue for these announcements. The Gemini Omni system is designed to handle multiple modalities simultaneously, meaning it can process and output different types of data, such as text and video, within a single framework [1, 3].
Industry analysts noted that the timing of the release aligns with a broader race among tech giants to dominate the generative video space. While previous iterations of Gemini focused heavily on text and image processing, Gemini 3.5 Flash prioritizes speed and fluidity in video production [3, 7].
The introduction of Gemini Omni signals Google's transition from static AI responses to dynamic, real-time visual generation. By integrating this model into YouTube Shorts and Google Flow, the company is not just competing with standalone video generators but is embedding generative AI into the daily workflows of creators and general users, potentially shifting how short-form media is produced and consumed.