TPU gets straight to the finals! What did Google do right after ten years of developing this chip?

2026-01-15 13:34:57

With the advent of Gemini 3, Google's TPU has moved from behind the scenes to the forefront, and the dominance of Nvidia's GPUs is already showing signs of being challenged.

A Google Cloud executive revealed that expanding the TPU's adoption in the market could help the company capture 10% of Nvidia's annual revenue. This was followed by a string of market reports, including "Google intensifies its marketing efforts to customers" and "Meta plans to spend billions of dollars to acquire TPUs." Morgan Stanley analysts boldly predict that Google's external TPU sales will reach one million units by 2027.

Investors, for their part, voted with their feet, signaling concern that Nvidia's GPU market share could be eroded. Nvidia hurriedly issued a statement emphasizing that its products offer higher performance, greater versatility, and better fungibility than ASICs.

From a reclusive master training in seclusion to a direct entrant in the AI chip finals, the origin of Google's TPU can be traced back to the "computing power anxiety" of 12 years ago.

▌The optimal solution for deep learning

In 2013, the use of deep learning within Google began to spread rapidly, with the complexity of backend models and the scale of computing power they consumed growing exponentially. Google's chief scientist, Jeff Dean, calculated that if 100 million Android users each used the speech-to-text service for just three minutes a day, the computing power consumed would be more than twice that of all of Google's data centers combined.

While stacking GPUs to gain computing power is the intuitive option, the "von Neumann bottleneck" makes general-purpose computing units such as CPUs and GPUs inefficient at the specific workloads of deep learning, above all large-scale matrix multiplication. In contrast, a self-developed ASIC accelerator can be customized for machine learning tasks, cutting the energy cost of deep neural network inference to roughly one-tenth of its original level, making it more cost-effective in the long run.
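
To see why matrix multiplication dominates these workloads, here is a minimal, illustrative sketch in JAX (Google's Python library for XLA/TPU-oriented numerical computing). The layer shape, sizes, and function name are invented for the example and are not Google's actual code.

```python
import jax
import jax.numpy as jnp

def dense_layer(x, w, b):
    # x: (batch, in_features); w: (in_features, out_features)
    # Virtually all of the arithmetic here is one large matrix multiplication.
    return jax.nn.relu(x @ w + b)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1024, 4096))   # a batch of activations (made-up sizes)
w = jax.random.normal(key, (4096, 4096))   # layer weights
b = jnp.zeros((4096,))

# jax.jit compiles the function via XLA; on a TPU host the same code is
# lowered onto the chip's matrix-multiply unit, on CPU/GPU it runs there.
y = jax.jit(dense_layer)(x, w, b)
print(y.shape)  # (1024, 4096)
```

A chip that only has to do this one kind of operation well can spend its transistors and energy budget almost entirely on multiply-accumulate hardware, which is the cost argument the paragraph above describes.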

So, in late 2013, Google formally launched the first-generation TPU project. Reportedly, three teams within Google were working on ASIC development at the time: Google Brain, DeepMind, and a department focused on custom data center hardware. The last of these won out, and its key members had years of experience in chip architecture design; Jonathan Ross, for example, was one of the TPU's original designers and later founded the AI chip company Groq.

The Google team made rapid progress, completing the design, verification, manufacturing, and even data-center deployment of the first-generation TPU within just 15 months of the project's launch. Project leader Norm Jouppi described the situation at the time: "Our chip design speed was so fast that we started shipping chips without correcting errors or changing masks."

In 2016, the TPU had its first "breakthrough moment": the artificial-intelligence Go program AlphaGo defeated world Go champion Lee Sedol, sparking widespread interest in AI. At the time, AlphaGo ran on first-generation TPUs.

On the research and development side, a major turning point came in 2017. That year, a Google team introduced the Transformer, a new deep learning architecture based on attention mechanisms, in the paper "Attention Is All You Need." Its computational characteristics were highly compatible with the TPU's design, ultimately elevating the TPU's strategic position from a single AI acceleration chip to the foundation of Google's AI infrastructure. Against this backdrop, TPU v2 was born.
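
As a rough illustration of that compatibility, here is a minimal sketch of scaled dot-product attention, the core operation of the Transformer, written with JAX as in the earlier example; the shapes are invented. The heavy steps are two large matrix multiplications, exactly the work a TPU's matrix unit is built for.

```python
import jax
import jax.numpy as jnp

def attention(q, k, v):
    # q, k, v: (sequence_length, head_dim)
    d = q.shape[-1]
    scores = q @ k.T / jnp.sqrt(d)             # matmul #1
    weights = jax.nn.softmax(scores, axis=-1)  # cheap elementwise/reduction work
    return weights @ v                         # matmul #2

key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (128, 64))
k = jax.random.normal(key, (128, 64))
v = jax.random.normal(key, (128, 64))
print(attention(q, k, v).shape)  # (128, 64)
```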

That same year, Google announced that it would make 1,000 Cloud TPUs available for free to developers and researchers, further solidifying the TPU's position as the foundation of its AI infrastructure. In the years that followed, Google introduced large-scale liquid cooling and built supernodes of up to 4,096 chips, using a self-developed torus interconnect to achieve near-lossless cross-chip communication. With these supporting technologies, the TPU's performance skyrocketed.

Beyond technology, Google has not hesitated to put TPUs to work in its money-making businesses, such as advertising and search ranking. At the 2024 Application Development and Infrastructure Summit, Google publicly stated that TPU v6 and subsequent generations aim to be "the most cost-effective business engine for the inference era."

The rest of the story is now well known. With TPU v7 used for training, Gemini 3, widely regarded as the most powerful multimodal model available today, was born. In the market's eyes, Google has transformed from a follower in computing chips into the new king of AI. One can't help but ask: after ten years of building this chip, what exactly did Google do right?

▌An ecosystem moat that is hard to replicate

Looking back at Google's chip-making story, there was no grand vision of AGI, nor unquestioning faith in the Scaling Law. Instead, the effort has always centered on a more pragmatic, business-minded metric: cost.

Dig a little deeper and the history of the TPU actually dates back to 2006, when Google first considered building ASICs for neural networks. But with the internet on the verge of explosive growth, Google's data centers still had plenty of spare computing capacity and the need was not urgent, so Google did not spend heavily to actually push the plan forward.

This "tailor-made" philosophy is even reflected in the design architecture of the TPU itself. Unlike GPUs, TPUs adopt a minimalist "pulsating array" architecture. While this comes at the cost of sacrificing hardware versatility, it also means that all irrelevant hardware can be stripped away , thereby ensuring maximum efficiency in deep learning within the smallest unit.

Today, this design has clearly brought Google returns far beyond "cost savings": the capital to go head-to-head with the other top AI companies.

More importantly, unlike other manufacturers who heavily rely on external computing power and cloud infrastructure, Google's vertically integrated approach, based on TPUs, has built a full-stack AI capability chain encompassing "chip-cloud-model-application." As securities analysts have pointed out, this has created an ecosystem moat that is difficult to replicate, and has earned it a ticket to define the future of AI infrastructure.

(Article source: CLS)
