A Major Divergence in AI Smartphone Development: Why Did Apple and Google Choose to "Take a Step Back" When Doubao Tried to "Take Over the Screen"?

2026-01-15 12:04:54

The technological approaches behind AI phones have diverged significantly.

On December 19, ByteDance announced a further expansion after launching the nubia M153 "Doubao phone" with ZTE Nubia: it is now working with manufacturers such as vivo, Lenovo, and Transsion on AI phone cooperation.

The camp represented by the "Doubao Phone" has tried to let AI assistants break through application barriers and complete complex tasks using GUI (Graphical User Interface) techniques such as screen reading and simulated clicks, but this triggered a collective "defense" by mainstream apps.

Overseas, by contrast, the camp led by Apple and Google has insisted on standardized APIs (Application Programming Interfaces), an approach that is steady but slow to progress.

The clash between these two technological approaches represents a dramatic collision of the business logic and profit structure of the mobile internet over the past decade. The curtain has already been raised on the migration of traffic entry points, profoundly rewriting the relationship between mobile phone manufacturers, application developers, and users.

GUI Takes Over the Screen: Doubao Opens Up System-Level Permissions, Zhipu's Open-Sourcing Fills the Gap

The catalyst for this debate was the nubia M153, the "Doubao Phone," jointly released by ByteDance and ZTE Nubia in early December. The phone shook the industry with its cross-application AI capabilities: users only need to issue a voice command, and the AI completes a series of cross-app tasks such as ordering takeout, sending WeChat messages, and comparing shopping prices. Its core technology is the deep integration of a multimodal GUI model with system-level permissions.

Image source: ZTE Mall (the website for the Doubao mobile phone)

Zhang He, former Xiaomi OS AI product expert and founder of ExcelMaster.ai, an AI application company expanding overseas, told the Daily Economic News reporter that by establishing deep cooperation with mobile phone manufacturers (such as ZTE) at the operating system level, AI assistants can gain system-level operating permissions that override all apps. The technical logic is to simulate human clicks and swipes, connecting all mobile phone applications to achieve cross-app task execution.

However, the direct "takeover" of the screen by AI assistants quickly triggered a "defensive counterattack" from mainstream apps: WeChat issued abnormal-environment warnings and even banned accounts, Taobao frequently popped up human verification prompts, and major banking apps refused to run while the phone was in screen recording mode.

On December 9, Zhipu AI announced that it was open-sourcing its AutoGLM autonomous task model, offering another possibility for the GUI approach.

AutoGLM is also based on the GUI paradigm and uses a large visual model to automate mobile phone operations. However, it previously could only run in Android accessibility mode because it did not have system-level permission support from manufacturers.

Zhang He pointed out that the accessibility mode has obvious shortcomings: "When the AI is operating, it completely occupies the foreground window. For example, while it spends a minute operating Taobao, the user cannot browse Weibo or chat." However, he emphasized that Doubao and Zhipu AutoGLM essentially come from the same source: both are explorations from the perspective of large model vendors, differing only in whether they are open source. "As long as mobile phone manufacturers cooperate, Zhipu AutoGLM can also achieve silent background operation; the core issue remains system permissions."

Obtaining system-level access on ZTE Nubia phones was one of the core reasons why Doubao was the first to create an AI phone.

However, Zhang He pointed out that the initiative in such cooperation lies with the mobile phone manufacturers. This is not because manufacturers lack comparable R&D capabilities, but because of the strategic question of "whether or not to make (AI phones)".

Manufacturers have two main concerns: First, users are generally worried about privacy leaks, and rashly opening up permissions may seriously affect the user reputation and image of mobile phone brands; second, mobile phone manufacturers hope to keep the system-level AI entry point firmly in their own hands, rather than becoming a technical channel for AI companies.

"This also explains why Doubao's first partner was ZTE Nubia, rather than a leading manufacturer," Zhang He added.

Apple and Google Are "a Step Behind": A Conservative Approach to Their API Strategies

The emergence of the Doubao phone has also sparked a global comparison of the two technical routes for AI phones: the GUI paradigm and the API paradigm.

The GUI approach of Doubao and Zhipu acts like an "AI nanny," watching the phone screen as a human would and operating the phone on the user's behalf. The API approach of Apple and Google, on the other hand, is like giving apps a "manual": each app exposes its functions so that the system can call them.

The advantages and disadvantages of the two are clear. The GUI approach does not rely on the cooperation of app developers: it "reads" the screen through a large visual model and simulates human clicks to operate the app. Its advantage is versatility, since in theory it can handle any app that a human can use. Its disadvantages are equally prominent. First, it faces heavy pressure over privacy protection. Second, it is less efficient, because it must operate step by step and is prone to errors.

The API paradigm, exemplified by Apple Intelligence, doesn't rely on simulation but rather on building an underlying framework and standardized interfaces, allowing AI to complete tasks by "calling capabilities" in a regulated manner. The advantages of this model are stability, privacy protection, and high efficiency; the disadvantages are that it requires active cooperation from app developers, resulting in a longer ecosystem building cycle.
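The contrast between the two paradigms can be illustrated with a minimal Python sketch. Everything here is hypothetical: `MockApp`, its button names, and the `order_item` intent are invented for illustration and do not correspond to any real app, to Apple's or Google's frameworks, or to how Doubao or AutoGLM are implemented. The sketch only shows the structural difference: the GUI-style agent needs several simulated taps that depend on the screen layout, while the API-style agent makes one call but only works if the app has declared the capability.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MockApp:
    """Toy stand-in for a takeout app (hypothetical, not a real SDK)."""
    screen: str = "home"
    orders: List[str] = field(default_factory=list)
    _typed: str = ""

    # GUI surface: what a screen-reading agent can see and touch.
    def visible_buttons(self) -> List[str]:
        layout = {
            "home": ["search"],
            "search": ["type_query", "submit"],
            "results": ["order_first_result"],
            "done": [],
        }
        return layout[self.screen]

    def tap(self, button: str, text: str = "") -> None:
        # A tap only works if the button is on the current screen.
        if button not in self.visible_buttons():
            raise ValueError(f"{button!r} is not on screen {self.screen!r}")
        if button == "search":
            self.screen = "search"
        elif button == "type_query":
            self._typed = text
        elif button == "submit":
            self.screen = "results"
        elif button == "order_first_result":
            self.orders.append(self._typed)
            self.screen = "done"

    # API surface: capabilities the app itself chooses to declare.
    def declared_intents(self) -> Dict[str, Callable[[str], None]]:
        return {"order_item": self._order_item}

    def _order_item(self, item: str) -> None:
        self.orders.append(item)


def gui_agent_order(app: MockApp, item: str) -> int:
    """Drive the app one simulated tap at a time; returns the step count."""
    plan: List[Tuple[str, str]] = [
        ("search", ""),
        ("type_query", item),
        ("submit", ""),
        ("order_first_result", ""),
    ]
    for button, text in plan:
        app.tap(button, text)
    return len(plan)


def api_agent_order(app: MockApp, item: str) -> int:
    """Call the declared capability directly; returns the step count."""
    intents = app.declared_intents()
    if "order_item" not in intents:
        raise LookupError("app has not declared an 'order_item' intent")
    intents["order_item"](item)
    return 1


gui_app, api_app = MockApp(), MockApp()
gui_steps = gui_agent_order(gui_app, "noodles")
api_steps = api_agent_order(api_app, "noodles")
```

In this toy setup both routes place the same order, but the GUI route takes four UI-dependent steps and changes what is on screen, while the API route takes one step and leaves the interface untouched.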

iPhone 17 features Apple Intelligence. Image source: Apple website.

Zhang He described Apple as "the most conservative big player".

In 2022, Apple launched the App Intents framework, encouraging developers to declare functions to the system for Siri to call, but it has resolutely refused to open up the ability to bypass apps by reading the screen. Even the "screen awareness" function, which has not yet shipped, is designed to provide screen content to Siri through an API rather than to control the interface directly.

Currently, Apple Intelligence integrates OpenAI's ChatGPT. However, according to media reports, Apple is planning to adopt Google's Gemini model to provide technical support for upgrades to its Siri voice assistant. Apple hopes to use this technology as a temporary solution until its own model is powerful enough.

Google, on the other hand, takes an edge-cloud collaboration approach, prioritizing deployment on desktop platforms. Its Gemini large model boasts powerful edge-cloud collaboration capabilities, but its mobile version does not employ GUI multimodal operation. Its AppFunctions API aims to address fragmentation within the ecosystem, enabling unified discovery and indexing of application capabilities across the system.

Both Google and Apple tend to encourage app developers to proactively integrate standardized interfaces to enable cross-application collaboration for AI assistants.

Zhang He revealed that neither company has yet launched GUI multimodal operation features on mobile phones, and they are still in the technology reserve stage. "Google needs to coordinate with mobile phone manufacturers in the Android ecosystem on the one hand, and on the other hand, it is also observing market feedback and the maturity of the technology."

AI Ecosystem Reshuffle: Reconstruction of Interests among Smartphone Manufacturers, Super Apps, and Long-Tail Applications

A report by market research firm Canalys shows that, thanks to the rapid development of chip technology and the growing consumer demand for AI features, the global share of AI smartphone shipments will rise from 16% in 2024 to 54% in 2028. The firm predicts a compound annual growth rate of 63% from 2023 to 2028, driven by major players such as Samsung and Apple.
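As a quick check on what a 63% compound annual growth rate implies, the arithmetic can be sketched in Python. This assumes the 63% figure refers to AI smartphone shipment volume over the five compounding periods from 2023 to 2028, which the passage above does not state explicitly; the function names are illustrative.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor implied by compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


# 2023 -> 2028 is five compounding periods.
multiple = growth_multiple(0.63, 5)
print(f"Implied growth over five years: about {multiple:.1f}x")
```

Under that assumption, a 63% rate sustained for five years corresponds to roughly an 11.5-fold increase over the 2023 base.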

The choice of technological route will ultimately lead to a profound restructuring of the mobile internet's profit landscape.

“Relying on AI assistants to shop is equivalent to directly intervening in the transaction, so major internet companies are naturally worried about the impact on their business models.” Zhang He pointed out the core anxiety of the super apps’ collective “self-defense.”

The API route amounts to mobile phone manufacturers negotiating with major internet companies, asking them to build API interfaces for the phone's smart assistant and to open certain functions to AI within a limited scope. In this model, the initiative is in the hands of the major internet companies: whether to open API interfaces, to whom, and how many functions to expose are all bargaining chips for future negotiations.

The GUI agent, by contrast, can operate the app by "looking at the screen and clicking buttons", which is equivalent to bypassing the app's authorization of AI access.

More importantly, the GUI agent intercepts user operations at the mobile system level: users can use the core functions of an app without opening it. This means that in-app advertising will lose its core value, and advertising revenue is a significant source of income for the internet industry.

Zhang He believes that the future AI smartphone ecosystem will present a "layered governance" pattern, with players of different sizes facing vastly different fates.

For super apps like WeChat and Taobao, in the short term, they can use technical means to resist external AI screen reading and protect commercial data and user privacy. The optimal long-term solution is to develop their own AI agents, forming an Agent-to-Agent (A2A) collaboration model: system-level AI transmits user intent to the application agent, which then completes the operation within its permissions. This model both protects the super app's "territory" and integrates into the AI-powered mobile ecosystem.
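The Agent-to-Agent flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation of any published A2A protocol: the `Intent` shape, the `SystemAgent` and `AppAgent` classes, and the action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Intent:
    """User intent as handed over by the system-level AI (hypothetical shape)."""
    app: str
    action: str
    payload: str


class AppAgent:
    """Agent owned by a super app; it only performs actions it has allowed."""

    def __init__(self, name: str, allowed_actions: FrozenSet[str]) -> None:
        self.name = name
        self.allowed_actions = allowed_actions

    def handle(self, intent: Intent) -> str:
        # The app, not the system, decides what is within its permissions.
        if intent.action not in self.allowed_actions:
            return f"{self.name}: refused {intent.action}"
        return f"{self.name}: done {intent.action}({intent.payload})"


class SystemAgent:
    """System-level AI: routes intents to app agents and never touches app UI."""

    def __init__(self) -> None:
        self._agents: Dict[str, AppAgent] = {}

    def register(self, agent: AppAgent) -> None:
        self._agents[agent.name] = agent

    def dispatch(self, intent: Intent) -> str:
        agent = self._agents.get(intent.app)
        if agent is None:
            return f"system: no agent registered for {intent.app}"
        return agent.handle(intent)


system = SystemAgent()
system.register(AppAgent("shop", frozenset({"search", "add_to_cart"})))
ok = system.dispatch(Intent("shop", "search", "running shoes"))
denied = system.dispatch(Intent("shop", "pay", "order-1"))
```

The design point is that the permission check lives inside the app's own agent, which is how this model would let a super app keep control of its "territory" while still accepting requests from the system-level AI.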

The situation will be completely different for millions of mid-to-long-tail apps.

Zhang He believes that long-tail apps, lacking the technology and commercial influence to develop their own agents, will most likely be "directly controlled by system-level AI." For them, rather than investing resources in acquiring new users, it's better to accept system-level AI operations in exchange for traffic distribution within the new ecosystem. Mobile phone manufacturers may formulate standardized profit-sharing terms, allowing long-tail apps to "ride the ecosystem's dividends."

"This is like autonomous driving; it's an irreversible historical trend," Zhang He summarized. This user-centric transformation will drive the ecosystem from "traffic competition" to "value co-creation," ultimately forming a new landscape in which mobile phone manufacturers dominate, while super apps, long-tail applications, and large model vendors each fulfill their respective roles.

(Source: Daily Economic News)
