Huxiu (虎嗅)

"DouBao moves to the left, WeChat moves to the right"

Original title: 豆包向左,微信向右

Summary of Key Points

WeChat is testing an embedded AI Agent that connects to its mini-programs to help users with tasks such as ordering coffee and finding restaurants. Unlike DouBao, which offers a separate AI interface, WeChat is integrating AI into its existing ecosystem. Technically, it relies on a GUI Agent (Graphical User Interface Agent), which lets the AI operate interfaces and buttons the way a human user would, so individual mini-programs do not have to be adapted for it. WeChat has distinctive advantages: intent data from its 1.4 billion users, millions of mini-programs through which tasks can be executed, and a closed payment loop via WeChat Pay. It also faces challenges: high computational costs, hard-to-define permission boundaries, and a limited ability to recognize on-screen feedback after an action. In the future, WeChat's business model may shift from selling advertising space to earning revenue by helping users complete tasks. The result could be two distinct paths for AI products: standalone AI tools and AI integrated into existing platforms.

Detailed Explanation

1. WeChat AI Agent vs DouBao: Two Opposite Approaches to AI

  • DouBao: A separate AI interface – you need to actively open the DouBao app to use its AI services (e.g., for homework help or photo editing).
  • WeChat AI Agent: Integrated into the ecosystem – when you say something like “let’s meet this weekend” in a WeChat chat, the AI can find a restaurant and book a table without you having to open another app.

The difference lies in the user experience: DouBao aims to train users to use its AI, while WeChat focuses on training the AI to work seamlessly within the WeChat platform. With WeChat AI Agent, tasks are completed effortlessly within the context of your regular WeChat usage.

2. GUI Agent: Making AI Interact with Mini-Programs Like a Human

A conventional integration would require developers to build a custom API for each mini-program, which is impractical when there are millions of them. WeChat’s GUI Agent avoids this by letting the AI operate mini-program interfaces directly: it identifies buttons on screen and acts on them as a user would.

The process involves three steps:

  • Understanding the Interface: The AI analyzes a screenshot of the mini-program to accurately locate the “place order” button (WeChat’s team is among the best in this area).
  • Predicting the Outcome: Before clicking a button, the AI predicts the subsequent page layout.
  • Recognizing Feedback: After clicking, the AI determines whether the action succeeded by observing changes on the screen, such as a button turning gray or a “Payment Successful” message appearing. This step still needs improvement, because subtle interface changes can be difficult for the AI to detect.
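
The three steps above can be sketched as a small loop. This is only an illustrative sketch, not WeChat's implementation: the `FakeMiniProgram` class, the `Screen` and `Button` types, and the lookup table inside `predict` are all hypothetical stand-ins. A real GUI Agent would use a vision model on actual screenshots rather than structured data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Button:
    label: str
    enabled: bool = True


@dataclass
class Screen:
    """Hypothetical stand-in for a mini-program screenshot."""
    title: str
    buttons: List[Button] = field(default_factory=list)


class FakeMiniProgram:
    """Toy mini-program (assumption): a menu screen and a result screen."""

    def __init__(self) -> None:
        self.screen = Screen("Menu", [Button("Place Order")])

    def capture(self) -> Screen:
        return self.screen

    def click(self, label: str) -> None:
        if self.screen.title == "Menu" and label == "Place Order":
            # After ordering, the button is grayed out and the title changes.
            self.screen = Screen(
                "Payment Successful", [Button("Place Order", enabled=False)]
            )


def locate(screen: Screen, target: str) -> Optional[Button]:
    """Step 1: understand the interface by finding an enabled target button."""
    for button in screen.buttons:
        if button.enabled and button.label.lower() == target.lower():
            return button
    return None


def predict(button: Button) -> Optional[str]:
    """Step 2: predict the next screen (here a hard-coded, hypothetical table)."""
    expected_next = {"Place Order": "Payment Successful"}
    return expected_next.get(button.label)


def verify(after: Screen, expected_title: Optional[str]) -> bool:
    """Step 3: recognize feedback by comparing the new screen to the prediction."""
    return expected_title is not None and after.title == expected_title


def run_step(app: FakeMiniProgram, target: str) -> bool:
    button = locate(app.capture(), target)
    if button is None:
        return False  # nothing clickable matched the target
    expected = predict(button)
    app.click(button.label)
    return verify(app.capture(), expected)


if __name__ == "__main__":
    app = FakeMiniProgram()
    print(run_step(app, "Place Order"))
```

In this toy version, feedback recognition is an exact title match. The difficulty described above comes from the fact that real interfaces signal success through much subtler visual changes.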

3. WeChat’s Unique Advantages

WeChat has three key advantages that make it well-suited for developing AI Agents:

  • User Intent Data: Real-time user interactions (e.g., discussions in groups or likes on posts) provide valuable information for the AI.
  • Execution Capability: A wide range of mini-programs cover various daily tasks, enabling the AI to perform them directly.
  • Seamless Payment: The entire process (finding a restaurant, placing an order, and making a payment) happens within WeChat, allowing for accurate tracking of transactions.

4. Challenges in Turning a Prototype into a Product

Despite the promising demo, several issues need to be addressed before the AI Agent becomes widely available:

  • High Costs: Each task performed by the AI consumes significant computational resources. In Q1, Tencent spent 37 billion yuan on AI infrastructure, and long-term operational costs are a major concern.
  • Permission Issues: It’s unclear whether the AI can perform actions like making payments or sending confirmation messages on behalf of users.
  • Technical Limitations: The AI cannot yet reliably tell whether an action succeeded, which may lead to failed tasks and a poor user experience.

5. Changing the Business Model

The traditional internet model relies on the “attention economy” (advertising), where advertisers pay for user views. AI Agents could shift this toward earning revenue from completed tasks. For example, if you ask for a birthday cake, the AI could handle the selection and ordering directly, removing the intermediate steps. WeChat’s advertising revenue already rose 20% in Q1 thanks to more accurate recommendations; in the future, the platform could also earn commissions or service fees from transactions the AI completes.

This path is not without risk. Gartner predicts that more than 40% of AI Agent projects will be canceled by the end of 2027. Nevertheless, WeChat’s existing user base, mini-program ecosystem, and payment infrastructure give it a head start.

Conclusion

WeChat’s AI Agent and DouBao represent two different directions for AI development: one focuses on standalone AI tools, the other on integrating AI into existing platforms. WeChat’s approach asks less of the user but requires overcoming technical and operational challenges. In the future, AI may help us complete tasks in the course of ordinary WeChat use, without the need to open a separate app.