Google has announced a slew of advancements in generative AI, including new tools for creating images and videos with sound, as well as a 3D video conferencing solution.
Artificial intelligence took center stage at this year’s Google I/O, the company’s annual developer conference. Google introduced updates for its Gemini AI and for online search. The company also emphasized visual creation, unveiling new models for image and video generation and a video creation interface named Flow. Let’s look at these announcements.
Veo 3 Introduces Sound-Enabled Video Generation
Google has introduced Veo 3, a video generation model that, unusually for the AI field, also generates sound. Users can create custom soundtracks, sound effects, and even dialogue for their videos. How precise these features are has yet to be fully demonstrated, but the initial examples provided by Google are promising. This development could mark a significant step for video generation technology, which was previously limited to silent output.
“Veo 3 excels in all aspects, whether it’s text, image, real-life physics, or accurate lip-syncing. It’s adept at understanding; you can narrate a short story, and the model will return a clip that brings it to life,” explains Google.
Veo 3 is available today in the Gemini app and in Flow for Ultra subscribers, as well as for businesses in Vertex AI. It also offers improved video quality compared with its predecessor, Veo 2. Veo 2 is not being phased out, however: it gains new features, including enhanced creative control through the addition or removal of objects, precise camera movements, and the ability to automatically expand the frame of the scene.
Imagen 4: Google’s Latest Image Generation Model
Google’s image generation capabilities have also received an upgrade with the introduction of Imagen 4. Notable improvements include:
- Enhanced sharpness, especially in fine details like textures
- The ability to produce images in a wider variety of styles
- The capacity to generate images up to 2K resolution
- Improved results in typography and spelling
Imagen 4 is now available in the Gemini app, Whisk, Vertex AI, and within the Workspace ecosystem. A version that is “ten times faster” than Imagen 3 is also under development.
Introducing Flow: Google’s Unified Creative Platform
To integrate the capabilities of its various models, Google has developed a new platform called Flow, which is seen as an evolution of VideoFX. Dedicated to film production, Flow is powered by Veo 3, Imagen 4, and Gemini. Specifically, Flow allows users to input precise camera directions (movements, angles, perspectives), modify or extend scenes narratively, and organize prompts within the interface. A section called Flow TV will feature content created using Veo by other users, complete with detailed prompts for inspiration.
Flow enables the creation of precise video clips. For example, users can import images and enter a prompt to animate them.
However, Google cautions that this tool is still in its infancy. To fully exploit Flow’s potential, the company has engaged videographers who specialize in AI. The various short films showcased reveal the platform’s potential while also displaying the typical distortions associated with AI-generated content.
Flow is available to subscribers of Google AI Pro and Google AI Ultra in the United States and will soon be extended to other countries.
Google Beam Enhances 3D Video Conferencing
Google Beam, an evolution of Project Starline, aims to transform remote communication by making video calls feel as if participants are actually facing each other in the same room. The technology uses a light field display developed in collaboration with HP, motion sensors, and an AI model capable of creating three-dimensional images from standard video feeds. The resulting illusion of depth allows for eye contact and the perception of gestures and expressions, enhancing the quality of interactions. In Meet, Beam also includes a real-time translation feature, enabling seamless conversations between speakers of different languages without losing intonation or subtlety.
The first devices will be showcased in June at the InfoComm exhibition and will be available to select businesses before the year’s end.

Jordan Park writes in-depth reviews and editorial opinion pieces for Touch Reviews. With a background in UI/UX design, Jordan offers a unique perspective on device usability and user experience across smartphones, tablets, and mobile software.