Summary of Key Points
The team led by Jiang Yao of Tsinghua University (all 8 of his PhDs joined him) has taken a non-mainstream approach to AI research. Their starting point was a finding from neuroscience: basic human actions are instinctive, whereas language has to be learned. Over the past 8 years, they have focused on building robots that act on innate instincts rather than on models trained with large datasets. Instead of using data to build complex algorithms, they have given their robots human-like sensory capabilities, such as automatically adjusting grip strength when friction is detected while grasping an object. According to the team, this approach overcomes the problems that VLA (Vision-Language-Action) and world model frameworks face in industrial applications. The technology is now in commercial use in fast-moving consumer goods industries such as cosmetics and perfumes, where it addresses a common need: switching production lines to new products without re-engineering the machinery.
1. Why Don’t They Follow the AI Trend? – Action and Language Are Different
Mainstream AI systems such as ChatGPT rely on massive datasets to train models, which has led many to assume that robot manipulation can be achieved the same way: vision and language commands plus data-driven training. However, while studying the human brain at Harvard, Jiang Yao realized that language is an acquired skill (people do not learn to speak without being taught), whereas basic actions like grasping objects are instinctive and performed in almost the same way by everyone. This suggests that some behaviors have a biological basis that data alone cannot reproduce.
For example, to pick up a bottle of water with a pre-planned grip, a system would need to know the bottle's weight and friction coefficient in advance. That information is not available before contact, so a data-driven method would have to cover countless scenarios with enormous amounts of data. An instinct-driven robot, by contrast, adjusts its grip strength automatically based on sensory feedback.
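To make the bottle example concrete, consider the textbook Coulomb-friction condition for a two-finger pinch grip (a simplifying assumption chosen for illustration, not the team's own model). If each fingertip presses with normal force N and can supply at most μN of friction, then holding an object of mass m against gravity g requires:

```latex
2\mu N \ge m g \quad\Longrightarrow\quad N_{\min} = \frac{m g}{2\mu}
```

With illustrative numbers m = 0.5 kg, μ = 0.5 and g = 9.81 m/s², N_min ≈ 4.9 N; if the surface is more slippery (μ = 0.25), N_min doubles to about 9.8 N. A planner working open-loop must know m and μ before contact to choose N, whereas a feedback controller can find a sufficient N by reacting to slip as it happens.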
2. Why Do VLA and World Models Fail in Industrial Settings? – Hardware Differences and Contact Mechanics
The popular VLA and world model frameworks have encountered challenges in practical industrial applications:
- VLA’s Problem: These systems tie learned skills (such as grasping) tightly to the specific hardware (the robot arm and gripper) used during training. For instance, a model trained with one gripper may perform poorly with another because the two differ in grip strength. VLA models also lack tactile feedback, so they can only imitate visually observed actions, which often fail in real-world tasks.
- World Models’ Problem: The physical world is hard to simulate. Even a simple action like picking up a pen involves complex factors such as friction and hardware resistance, which current simulators cannot fully reproduce. Such simulations therefore remain largely theoretical and impractical.
3. How Do Robots Gain “Tactile Sensation”? – Touch Sensors and Three Types of Instinctive Reactions
Jiang Yao’s team spent seven years developing touch sensors that can detect the hardness, friction, and slip of objects. They then programmed the robots to exhibit three types of instinctive reactions:
1. Directed Reaction: The robot automatically moves towards an object upon detection.
2. Exploration Reaction: The robot can find objects even in the dark using tactile senses.
3. Grasping Reaction: The robot adjusts its grip strength automatically based on the friction it detects.
For instance, when a robot hand without a thumb tries to pick up an ID card, it may first lift the card to make it easier to grasp. This behavior emerges from the robot's instinctive reactions rather than from pre-programmed instructions, much as children learn to solve problems through trial and error.
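The grasping reaction can be illustrated with a minimal feedback loop. The sketch below is a hypothetical illustration, not the team's controller: `SimulatedObject`, `slips` and `grasp_reflex` are invented names, the boolean slip signal stands in for a real tactile sensor, and the two-finger Coulomb-friction model, the step size and the 20% safety margin are all assumptions. The point it shows is that the controller never reads the object's mass or friction coefficient; it only raises grip force until slip stops.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass
class SimulatedObject:
    """Hypothetical object whose mass and friction are hidden from the controller."""
    mass_kg: float
    friction_coeff: float

    def slips(self, normal_force_n: float) -> bool:
        # Assumed two-finger grip: each contact supplies at most mu * N of friction.
        max_friction = 2 * self.friction_coeff * normal_force_n
        return max_friction < self.mass_kg * G


def grasp_reflex(obj: SimulatedObject, start_force_n: float = 0.5,
                 step_n: float = 0.25, margin: float = 1.2,
                 max_force_n: float = 50.0) -> float:
    """Raise grip force until the slip signal stops, then add a safety margin.

    Only the boolean slip signal is observed (standing in for a tactile
    sensor); obj.mass_kg and obj.friction_coeff are never read here.
    """
    force = start_force_n
    while obj.slips(force):
        force += step_n  # slip detected: squeeze a little harder
        if force > max_force_n:
            raise RuntimeError("object cannot be held within the force limit")
    return force * margin  # hold slightly above the minimum non-slip force
```

For a 0.5 kg bottle with μ = 0.5, `grasp_reflex(SimulatedObject(mass_kg=0.5, friction_coeff=0.5))` stops slipping at 5.0 N and returns roughly 6 N after the margin; a more slippery bottle (μ = 0.25) ends up near 12 N without any change to the controller. A smaller step size finds a tighter minimum force at the cost of more iterations.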
4. Why Choose Fast-Moving Consumer Goods for Commercialization? – The Automotive Industry Proved Difficult
Initially, they targeted the automotive industry but ran into difficulties: the production pace was fast (100 actions per minute), and car manufacturers, operating on low margins, were reluctant to invest in new technologies. They later realized that the consumer goods sector (cosmetics, perfumes) offered a better fit:
- The industry has a wide range of products with frequent changes in production lines.
- Traditional automation methods require significant adjustments for each product change, leading to significant downtime and costs.
- For example, in the fragrance industry, robots can easily perform tasks like positioning the wicks in scented candles, which requires precise tactile control.
Their approach eliminates the need for costly customization: the robots can be deployed immediately without any setup, and their performance improves with use.
5. Why Did All 8 PhDs Join His Venture?
Team members were selected for their shared belief in the power of human instincts; students with a computer science background who could not let go of data-driven thinking were not considered. The 8 PhDs chose to follow Jiang Yao because they:
- Shared a common vision: they believed that instinct-based robotics was a more viable approach.
- Saw that academic research moves slowly compared with the rapid pace of development in industry.
- Trusted Jiang Yao’s commitment to pursuing this path for 10–30 years.
The early startup period was challenging, but their efforts began to pay off once the products started selling and the team began collecting user data to refine the technology.
6. The Value of Going Against Conventional Wisdom
While mainstream AI focuses on data accumulation and simulation, Jiang Yao’s team has built on human instincts, arriving at a more effective approach to robot manipulation. This counterintuitive approach not only addresses real industry challenges but also opens up new possibilities for AI development: learning from human behavior could prove more productive than simply accumulating data.