Gemini Update: Google Supercharges API with 2.5 Flash, Pro, and New Multimodal Features

May 27, 2025

Unveiled at Google I/O 2025, the latest update to the Gemini API boasts enhanced power, interactivity, and precision, particularly in the realms of audio and music.

During the I/O 2025 conference, Google introduced significant updates to the Gemini API. These enhancements underscore Google’s commitment to delivering an even more robust generative AI platform, tailored for real-world applications, including those in interactive, multimodal, and low-latency environments. Developers now have access to more sophisticated tools for crafting conversational, audio, and musical experiences through both the Gemini API and Google AI Studio.

Integration of the New Gemini 2.5 Models into the API

Google has upgraded its Gemini API with the new 2.5 models, designed for “enhanced performance and natural interactions”. The Gemini 2.5 Flash preview released on May 20, 2025 brings improvements in reasoning, code generation, and long-context handling. According to Google’s benchmarks, it ranks second on the LMArena leaderboard, just behind the 2.5 Pro model, while using 22% fewer tokens for responses of comparable quality.
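
To make this concrete, here is a minimal sketch of querying the new Flash preview through Google’s google-genai Python SDK. The model ID string follows the May 20, 2025 preview naming and the prompt is invented for illustration; treat both as assumptions.

```python
# Minimal sketch: querying the Gemini 2.5 Flash preview with the
# google-genai Python SDK (pip install google-genai).
import os

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Model ID assumed from the May 20, 2025 preview naming.
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",
    contents="Explain the difference between latency and throughput in two sentences.",
)
print(response.text)
```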

The 2.5 Pro and Flash models can now generate native multilingual audio (in 24 languages) via text-to-speech, complete with voice style control and multi-speaker support. Additionally, Gemini 2.5 Flash Audio Dialog, accessible via the Live API, makes it possible to build expressive real-time voices that detect the emotion in a user’s tone and respond in context. A specialized reasoning variant is also available for more intricate queries. Google is also testing a Deep Think mode for 2.5 Pro, aimed at multi-step tasks in fields like mathematics and programming.
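
As a rough sketch of the text-to-speech path, the following uses the same Python SDK; the TTS model ID and the “Kore” voice name are taken from Google’s preview documentation, but should be read as assumptions rather than stable identifiers. Voice style is steered directly in the prompt.

```python
# Sketch: single-speaker text-to-speech with style steered in the prompt.
import wave

from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # assumed preview TTS model ID
    contents="Say cheerfully: have a wonderful day!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)

# The API returns raw 16-bit PCM at 24 kHz; wrap it in a WAV container.
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("out.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(24000)
    f.writeframes(pcm)
```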

The API also introduces two new capabilities. Lyria RealTime offers continuous music generation over a WebSocket connection, steered by text prompts. The model produces adaptive instrumental sequences, which can be tried out in the PromptDJ-MIDI app. Finally, Gemma 3n, an open model optimized for mobile devices, processes text, audio, and images while keeping compute requirements low thanks to a streamlined architecture and advanced caching techniques.
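
In the Python SDK, the Lyria RealTime WebSocket is wrapped in an async music session. The sketch below follows the pattern from Google’s preview documentation, but the model ID, session methods, and message shape should all be treated as assumptions.

```python
# Hedged sketch: streaming adaptive music from Lyria RealTime.
import asyncio

from google import genai
from google.genai import types

client = genai.Client()


async def main() -> None:
    # Model ID and session API assumed from the preview docs.
    async with client.aio.live.music.connect(model="models/lyria-realtime-exp") as session:
        # Steer the generation with weighted text prompts.
        await session.set_weighted_prompts(
            prompts=[types.WeightedPrompt(text="minimal techno", weight=1.0)]
        )
        await session.set_music_generation_config(
            config=types.LiveMusicGenerationConfig(bpm=120, temperature=1.0)
        )
        await session.play()
        async for message in session.receive():
            # Each message carries a chunk of raw PCM audio for playback.
            chunk = message.server_content.audio_chunks[0].data
            ...  # hand `chunk` to an audio output stream


asyncio.run(main())
```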

Features Designed for Developers

With this latest round of updates, Google has enriched the Gemini API with features aimed at improving transparency, control, and integration of models in complex environments. To help developers understand the logic behind model responses, the API now offers “thought summaries” for the Gemini 2.5 Pro and Flash models. These summaries give a structured view of the model’s reasoning, complete with headers, key details, and information about its tool calls. They can be enabled in the request configuration and are returned alongside the generated content.
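
In the Python SDK this surfaces as an include_thoughts flag: summary parts come back flagged as thoughts, interleaved with the answer. The model ID below is an assumed preview identifier.

```python
# Sketch: requesting thought summaries alongside the answer.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-pro-preview-05-06",  # assumed preview ID
    contents="Plan a three-step migration from a REST API to gRPC.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

# Summary parts carry part.thought == True; the rest is the final answer.
for part in response.candidates[0].content.parts:
    label = "summary" if part.thought else "answer"
    print(f"[{label}] {part.text}")
```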

Another addition is “thinking budgets,” which let developers cap how much internal reasoning (measured in tokens) a model devotes to a task, fine-tuning the balance between latency, cost, and response quality. The feature is already available for 2.5 Flash and will soon come to 2.5 Pro. Concurrently, a new URL Context tool lets models automatically extract content from links provided in requests. It can operate on its own, paving the way for personalized search agents.
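
Both knobs sit in the request configuration in the Python SDK. The field names below follow the preview documentation but should be read as assumptions, and the URLs are placeholders.

```python
# Sketch: capping reasoning tokens and enabling the URL Context tool.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",
    contents=(
        "Compare the positions taken at https://example.com/post-a "
        "and https://example.com/post-b in one paragraph."
    ),
    config=types.GenerateContentConfig(
        # Cap internal reasoning at 1,024 tokens; 0 disables thinking entirely.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
        # Allow the model to fetch and read URLs found in the prompt.
        tools=[types.Tool(url_context=types.UrlContext())],
    ),
)
print(response.text)
```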

Other enhancements include a computer control tool, derived from Project Mariner, which enables an agent to interact with a browser, for example to automate web tasks. Video analysis capabilities have been expanded: models can now summarize, translate, or clip YouTube videos and uploaded content. JSON Schema support has been improved to handle complex structures such as tuples. The Live API now supports asynchronous functions, so an agent can keep responding while a user action continues in the background. Lastly, a Batch API is being tested, allowing grouped requests at a reduced cost, with responses delivered within 24 hours.
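
To give a flavor of the structured-output side, here is a hedged sketch using the SDK’s response_schema mechanism, which validates the model’s JSON against a developer-supplied schema; the Clip model is an invented example, not part of the API.

```python
# Sketch: structured output validated against a developer-supplied schema.
from google import genai
from google.genai import types
from pydantic import BaseModel


class Clip(BaseModel):
    """Hypothetical schema for a video highlight clip."""
    title: str
    start_seconds: int
    end_seconds: int


client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",
    contents="Propose three highlight clips for a ten-minute product demo.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=list[Clip],  # the SDK converts this into a JSON Schema
    ),
)
print(response.parsed)  # parsed into Clip instances
```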
