New Model Dominates Rankings: An Interview with Google Team – How a New AI Leader Emerges

2026-01-15 11:56:36 · · #1

On November 19th, the long-awaited and widely discussed Gemini 3 was finally officially unveiled. Google didn't just offer a minor upgrade this time; it played a trump card—achieving comprehensive leadership in almost all mainstream benchmark tests, potentially rewriting the competitive landscape of large-scale models. Some industry insiders even predicted: "It will be difficult for any company to surpass this achievement within the next six months."

Shortly after the release, OpenAI CEO Sam Altman and Tesla CEO Elon Musk both offered their congratulations publicly. Altman said it "looks like a great model," while commenters joked, "This compliment from a competitor is so heartwarming." Musk, for his part, wrote, "Nice work."

Even Google, known for its cautious approach, has been unusually high-profile this time. The official blog post was titled "Opening a New Era of Intelligence" and repeatedly emphasized "best" and "most advanced." Google employees have also been actively promoting the product on social media, with CEO Sundar Pichai posting eight updates about Gemini 3 today.

Prior to the official release, CBN participated in a small-scale media briefing held by Google. Although the progress of the model was expected, the enthusiastic response from the industry still exceeded expectations. Everyone was amazed by the speed of Google's progress. Designs that were impossible three months ago can now be generated with a single click, and AI programming is now at the "Next Level." Some people exclaimed, "This industry is developing too fast."

In just three years, Google has gone from catching up to leading the pack. Koray Kavukcuoglu, CTO of Google DeepMind, stated at a media briefing that Google's differentiated, full-stack technology solution is crucial, with all aspects from hardware to research interconnected. When asked by CBN about the slowdown in scaling laws, he replied that technological progress is not necessarily reflected in the creation of entirely new capabilities, but rather in "new scenarios that models can empower."

The new model dominates the rankings.

Early this morning, Pichai posted a message containing only one picture, but the picture is convincing enough: Gemini 3 Pro has all but swept the rankings, placing first on every major leaderboard.

Specifically, on "Humanity's Last Exam" (a benchmark that measures deep understanding and requires multi-step logical reasoning and expert-level deductive ability), Gemini 3 Pro scored 37.5% without using any tools, while the second-ranked GPT-5.1 scored only 26.5%, a lead of 11 percentage points.

On the GPQA Diamond test, which measures graduate-level reasoning and knowledge, Gemini 3 Pro scored 91.9%, followed closely by GPT-5.1 at 88.1%. This suggests that Gemini 3 Pro is not only highly capable but also highly reliable in solving science and mathematics problems.

In terms of multimodal capabilities, its understanding and reasoning have reached new heights: Gemini 3 Pro directly broke the record for multimodal reasoning with an 81% MMMU-Pro score and an 87.6% Video-MMMU score.

In terms of reasoning ability, Gemini 3 Pro surpassed the score just set by Grok 4.1, topping the LMArena leaderboard with a score of 1501, while Grok 4.1's thinking model scored 1484.

The rankings are only one part of the capabilities. Google defines the new model as "Gemini 3 can turn any idea into reality," so the actual user experience is more important.

One user tested the model on a poster with sophisticated lighting and shadow effects. Three months ago, Google's Nano Banana lagged significantly behind GPT on this kind of task, but now it produces a much better result. "I never imagined Google would cover such a long road in just three months," the user remarked. Another blogger, after having the model replicate a macOS-style web page, exclaimed, "The Gemini 3 Pro is truly amazing!" and added, "My expectations were already high, but it still exceeded them." "Watching the Gemini 3 Pro complete a web operating system in one go left me speechless," another user commented.

During the briefing, the media also asked the product team about "aha moments" during the training of the new model. Tulsee Doshi, Product Director for the Gemini model at Google DeepMind, said that on first using it for code generation, the most striking thing was that it could generate a variety of games from simple prompts. It also has a significant advantage in detail. For instance, it can generate 3D visualizations and even let users play games directly within them.

Google DeepMind CEO Demis Hassabis, who also uses the model for game development, stated in a post that the model "certainly ranks highly on various leaderboards," but that beyond these benchmarks it also performs exceptionally well in everyday tasks thanks to its distinctive style and powerful features. He mentioned that he has recently been using Gemini 3 for some programming, such as recreating a game in just a few hours with excellent detail.

What other potential use cases does the Gemini agent have? Struhar mentioned at the briefing that he is already using the agent model with great success for two types of tasks: purchasing tickets and organizing his email inbox.

“Every morning when I wake up, I receive more than 50 emails. It would take a long time to check them one by one and decide how to handle them. Now I use an AI agent to help me sort them out: it extracts the to-do tasks from the emails, filters out the emails that require my reply, and also marks the emails that can be ignored, which really saves a lot of time.” Struhar said that he also uses the model to buy concert tickets, letting the AI agent filter suitable ticket combinations for his family members, so that he only needs to click “buy”.

The Google team hopes that users can use the new model to handle "multi-step, complex tasks" they encounter in life, which is the model's strength.

Is a new leader emerging in the AI industry?

Besides the leap in capabilities, Google made two other noteworthy moves: firstly, it integrated Gemini 3 into Google Search on the first day of its release; and secondly, it launched Antigravity, a brand-new "IDE-like" AI programming product, betting on the programming field.

This suggests that the newly released model is mature enough to be applied in commercial scenarios. According to the official statement, Gemini 3 brings strong reasoning capabilities to the search engine and unlocks a new generative UI experience, in which users receive dynamic visual layouts built from interactive tools and simulations generated specifically for their query.

For example, when a user asks about the three-body problem in physics, they can directly get an interactive simulation interface, and the user can observe the results by changing variables.

The team believes that this release is also the company's most powerful "vibe coding" model to date, and that Google's Antigravity, which is built on this model, further improves the product experience. Similar to an AI IDE, it lets an intelligent agent autonomously plan and execute complex end-to-end software tasks on behalf of the user.

When discussing Antigravity during the briefing, Kavukcuoglu argued that large language models have revolutionized programming, enabling engineers and software developers to work "at a higher level," handling complex tasks with the help of intelligent agents. Antigravity is built on this foundation.

There are other IDE products on the market, Kavukcuoglu said. Google's model will still be available in various IDEs and will also be open to developers through APIs. However, Antigravity gives the team "another way to interact with developers," so that it can understand users' usage scenarios, real task requirements, and challenges, and then optimize the model accordingly.

Google's move has led to speculation that it is competing in AI programming with Anthropic's coding models and with tools such as Cursor.

Kavukcuoglu responded that Google continues to maintain a close working relationship with Cursor in this release. The goal is not competition; the team's focus is on "reaching users in their current context." Artificial intelligence development is still in its early stages, and its impact on different fields and industries is still being explored. "We believe it is important to maintain an open and experimental attitude."

Regardless, Google has indeed pulled ahead of its competitors, and these moves will inevitably make the makers of similar products wary. As one remark put it, "Anthropic may be sweating bullets." Anthropic's revenue grew rapidly and its valuation soared on the back of its lead in programming, but Google now appears to have eroded that advantage.

The market believes that Gemini 3 may also be an important milestone for Google. Since the release of ChatGPT at the end of 2022, Google has been considered to be "early to start but late to finish," and is in a state of catching up with OpenAI in the AI race. However, the new model may rewrite the landscape, and Google has the opportunity to take the lead, especially given that OpenAI's GPT-5 has been criticized as "more hype than substance." The AI industry needs a new standard-bearer.

Some have even claimed that "Google is fueling the AI bull market narrative." Recently, Loop Capital upgraded Google's parent company's rating from "hold" to "buy," raising its target price from $260 to $320 per share. Google's stock price surged a few days ago, pushing its market capitalization above $3.5 trillion, a record high. While it has since fallen back to $3.43 trillion, it remains at a historical high.

Previously, Berkshire Hathaway, owned by Warren Buffett, disclosed that it had significantly increased its holdings in Google, making it the company's tenth largest stock holding, which attracted much attention from the capital markets. Loop Capital's analysis pointed out that "search concerns are no longer valid" because Gemini's traffic share has doubled year-on-year. This growing engagement highlights a key insight: Google is effectively leveraging its massive user base and product ecosystem to drive AI adoption, embedding generative capabilities directly into the daily digital experiences of millions of people.

At the briefing, Kavukcuoglu announced Gemini usage data: monthly active users have exceeded 650 million, more than 13 million developers are building models and AI applications on Gemini, and the Gemini-powered AI Overviews feature in Search has more than 2 billion monthly users.

Why was Google able to go from catching up to taking the lead in just three years? Kavukcuoglu believes that one of the core reasons is that the team has always maintained an extremely fast pace of development, and that the most crucial support for this is Google's highly differentiated full-stack technology solution.

This full-stack solution starts with hardware investment: first, the data center. The infrastructure development begins with chips, especially Google's high-performance TPUs (Tensor Processing Units). The network connections between these chips form computing clusters that support model training, thereby supporting Google's cutting-edge AI research. Simply put, from hardware design to the implementation of large-scale training, to groundbreaking research results, and finally to the improvement of the capabilities of basic models, all links are interconnected and work synergistically.

In the past six months, Gemini's user growth has been remarkable. Struhar believes that one of the key factors is the viral effect of Nano Banana, especially in countries such as Thailand, Indonesia, and India. It has been a very successful product: many people enjoy interacting with it and sharing the results with friends, and it has even sparked a trend for collectible figurines.

Since the end of last year, some have argued that the iteration speed of large models has slowed down and the scaling law is no longer effective. However, Google's large model seems to have made some significant progress. How does Google view the current development trend?

Kavukcuoglu told CBN reporters that the key to gauging a field's development is its actual impact on various industries, and that the impact of AI is expanding day by day, with more and more professionals using AI to assist their work.

“AI models are having an increasingly significant impact on daily life, and from this perspective, the pace of technological progress is actually very fast. Looking at the iteration of our own model capabilities, we can also see many exciting advancements.” Kavukcuoglu believes that technological progress should not be measured only by the emergence of entirely new capabilities; “new scenarios that models can empower” is also an indicator. The team has seen positive progress across the entire model development process, from pre-training to post-training, and expects this trend to continue for some time.

Google believes that Gemini 3 is the team's next step towards artificial general intelligence (AGI). Currently, this step is clearly faster than that of competitors like OpenAI and xAI.

In the comments section where Altman congratulated Google on the release of its new model, a popular comment was, "What else do you have in your pocket?" It's time for the competitors to make their moves.

(Article source: CBN)
