Meta's major personnel shake-up and an AI guru's product launch bring the road to AGI to a crossroads.

Meta's major personnel reshuffle has once again drawn the industry's attention to world models.

Yann LeCun, Meta's chief AI scientist and a Turing Award winner, is reportedly preparing to leave the company to found a startup. LeCun has worked at Meta for 12 years, but his view of where the technology should go has long been at odds with that of Mark Zuckerberg, who is betting on large language models. The core mission of his startup will be to advance the world model architecture he has envisioned for many years.

Behind this seemingly simple personnel change, AI development has reached a crossroads. World model or large language model? This is a debate about the essence of intelligence, which may determine who will lead the journey towards AGI in the next decade.

Coincidentally, World Labs, the startup founded by AI guru Fei-Fei Li, launched its first product, Marble, on November 13th. Driven by a multimodal world model, it describes itself as "building a spatially intelligent future." The product's premise is that it can build a persistent 3D world from a single image, video, or sentence.

On the other side of the ocean, Wang Xingxing and Huawei Hubble have also recently shown great interest in world models:

Excellent Vision recently completed a new round of financing, raising RMB 100 million in its Series A1 round. The company will continue to advance the research and development of physical AI models centered on world models, accelerate the development of general-purpose embodied humanoid models, and continue to build benchmark scenarios for commercial application. According to business registration information, Excellent Vision's new investors include Hubble Investment and Huakong Fund.

Wang Xingxing, founder and chairman of Unitree Robotics, stated at the 8th Hongqiao International Economic Forum that there are currently two mainstream approaches to embodied intelligence models. One is the VLA+RL model, which can be trained in simulation environments or real-world scenarios but still faces many challenges, notably relatively weak generalization. The other is the world model based on video generation. "Personally, I really like the video-generated world model," he said. However, this approach also faces significant challenges, particularly for small and medium-sized robotics companies, which find such models difficult to run because video generation demands very high computing power and a large number of compute cards. By contrast, some large AI companies and internet companies have more abundant resources for video models and a higher probability of success.

Read ten thousand books and travel ten thousand miles.

Despite the differences in specific technologies and product forms, the core consensus of the "world model school" is that the current dominant large language models in the field of AI have fundamental limitations.

Wittgenstein, the founder of the philosophy of language, once said in his Tractatus Logico-Philosophicus: "The limits of my language are the limits of my world." But this may not apply to AI. Fei-Fei Li said, "I am not a philosopher, but I know that, at least for AI, the world is far more than just words."

In her latest lengthy article, she stated frankly that language is ultimately an abstract signal created by humans for communication. Nature itself has no written language; the physical world follows its own inherent laws. If AI wants to truly understand and interact with the world, it cannot remain merely a game of textual symbols, a "wordsmith in the dark".

LeCun has also repeatedly criticized large language models, arguing that they are at best powerful text databases that remember massive amounts of text but completely fail to understand the physical world behind the text.

What exactly is a so-called world model?

The essence of a world model is to give intelligent agents the ability to understand, predict, and plan by building a high-dimensional internal representation of the real world. Bypassing the conversion to language, it feeds spatial perception data directly into the model, works out the physical dynamics within the model's latent space, and directly outputs action instructions, thereby achieving "intrinsic understanding" and "active reasoning" about the real world.
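To make the perceive, predict, and plan loop described above concrete, here is a deliberately tiny Python sketch. It is not taken from any of the companies or products mentioned in this article: the one-dimensional "world", the hand-written dynamics rule standing in for a learned model, and the names `Latent`, `encode`, `predict`, and `plan` are all assumptions made purely for illustration.

```python
# Toy illustration only: every name, number, and rule below is hypothetical.
from dataclasses import dataclass


@dataclass
class Latent:
    """Compact internal state the agent reasons over instead of raw sensor data."""
    position: float
    velocity: float


def encode(observation):
    """Map a raw observation (two consecutive positions) to a latent state."""
    prev_pos, pos = observation
    return Latent(position=pos, velocity=pos - prev_pos)


def predict(state, action):
    """Stand-in for learned dynamics: advance the latent state by one step."""
    velocity = state.velocity + action
    return Latent(position=state.position + velocity, velocity=velocity)


def plan(state, goal, actions=(-1.0, 0.0, 1.0), horizon=3):
    """Imagine a short rollout per candidate action and keep the best first action."""
    best_action, best_error = None, float("inf")
    for first in actions:
        imagined = predict(state, first)
        for _ in range(horizon - 1):
            imagined = predict(imagined, 0.0)  # coast after the first action
        error = abs(imagined.position - goal)
        if error < best_error:
            best_action, best_error = first, error
    return best_action


state = encode((0.0, 1.0))       # perception: the object moved from 0 to 1
action = plan(state, goal=7.0)   # prediction and planning happen in latent space
```

In a real system the encoder and the dynamics would be learned from video or sensor data rather than written by hand, but the division of labor is the same: no text is produced at any step, and the choice of action comes from imagined futures inside the model.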

In Fei-Fei Li's words, it can elevate "seeing" to "reasoning", transform "perception" into "action", and ground "imagination" into "creation".

It requires AI to not only read extensively but also travel widely—to understand why a cup breaks and predict how a car will turn, thus laying the foundation for true embodied intelligence, autonomous driving, and robots that can work seamlessly with humans.

It's worth noting that in Silicon Valley, it's not just leading AI figures like Fei-Fei Li and LeCun who are backing world models; among tech giants, Google is also leading the way.

In just a year and a half, Google DeepMind has upgraded its Genie world model from 2D to Genie 3, which is capable of generating interactive 3D environments in real time. With just a single text prompt, Genie 3 can create a dynamic world that a user can walk through and view at 720p resolution, with scene details remaining coherent for up to a minute. Beyond gaming, Genie 3 can also provide diverse training scenarios for robots or autonomous driving systems, and can be used to train AI agents, supporting longer and more stable interaction sequences.

It must be acknowledged that research on world models is still in its early stages. Compared with the VLA (vision-language-action) approach, which suits rapid iteration and quick deployment, world models represent a more fundamental form of understanding, one that emphasizes physical laws and spatial understanding and is suited to long-term evolution. However, on this parallel track, a race to define the next decade of AI has already begun, with AI striving to go beyond text and attempt to understand and reshape the physical world we inhabit.

(Source: Science and Technology Innovation Board Daily)
