Huxiu (虎嗅)

WeChat's AI Move Is Quite Interesting

Original title: 微信AI这招挺有意思的

2026-06-09

Summary of Key Points

WeChat has opened an "Automatic Mode" that lets mini-programs integrate with its AI. Once developers grant access to their source code, WeChat's AI automatically converts the mini-program into "skills" that the AI can understand and operate. Three core technologies make this possible: precise interface localization, prediction of operation outcomes, and verification that actions were carried out correctly. Developers are nominally free to decide whether to integrate, but opting out could mean missing the additional traffic that AI generates in the future. WeChat also uses industry-standard terms such as "Skill" and "MCP" to package its closed ecosystem interfaces, subtly increasing developers' dependence on its platform.

1. What exactly is the "Automatic Mode"? — Developers become passive, and mini-programs turn into tools for AI

In simple terms, the Automatic Mode means that you (the developer) provide WeChat with the source code of your mini-program, and WeChat's AI automatically transforms it into a "skill package" that can be understood and utilized by AI. You don't have to do anything, but in return, your mini-program changes from a product that users actively open and use to a feature that WeChat's AI calls on their behalf.

For example, a user who wanted a coffee previously had to open a coffee mini-program, find the menu, select the options, and place the order. In the future, they might simply tell WeChat's AI, "Order me a latte," and the AI would operate the relevant mini-program to complete the task, provided you have opted in to the Automatic Mode so that the AI can understand and control your mini-program.

2. How does WeChat's AI manage to control any mini-program? — Three technologies at work behind the scenes

WeChat's AI can handle millions of mini-programs with diverse interfaces thanks to a combination of three technologies:

1. AI's "sharp eyes": POINTS-GUI-G

This technology acts as the AI's eyes: given a screenshot of a mini-program and an instruction (such as "find the order button"), it pinpoints the button's location with pixel-level accuracy. It has ranked first in global GUI localization testing, and it addresses the problem of AI being unable to locate buttons.
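As a rough illustration of the contract described above (an instruction and a screen go in, a pixel coordinate comes out), the sketch below swaps the vision model for a hard-coded list of labeled boxes. `UIElement`, `locate`, and the sample `SCREEN` are hypothetical names invented for this example; nothing here reflects the real interface of POINTS-GUI-G, which works from raw screenshots.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class UIElement:
    """A labeled rectangle on screen (hypothetical stand-in for a detected widget)."""
    label: str
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int

    def center(self) -> Tuple[int, int]:
        # The click point is the middle of the bounding box.
        return (self.x + self.width // 2, self.y + self.height // 2)


def locate(elements: List[UIElement], instruction: str) -> Optional[Tuple[int, int]]:
    """Return the click point of the first element whose label occurs in the instruction."""
    text = instruction.lower()
    for element in elements:
        if element.label.lower() in text:
            return element.center()
    return None  # nothing on screen matches the instruction


# Toy screen: a real system would derive this from a screenshot.
SCREEN = [
    UIElement("menu", 0, 0, 100, 40),
    UIElement("order", 200, 500, 120, 60),
]

click_point = locate(SCREEN, "find the order button")
print(click_point)
```

The keyword match is only a placeholder for the hard part, which is mapping free-form instructions onto pixels without any labels at all.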

2. AI's "predictive brain": UI-Oceanus

Humans know what will happen when they click a button, but AI lacks that intuition. Drawing on 5 million simulated mini-program operation examples, this technology enables AI to predict the outcome of clicking a button (for example, whether a payment page will appear after ordering). Even in completely unfamiliar mini-programs, AI can complete tasks without prior learning, and navigation success rates increase by 21.9%.
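The idea of predicting an outcome from past operation examples can be sketched with a simple frequency table. This is a toy under stated assumptions, not UI-Oceanus itself: the function names `train` and `predict`, the screen and action strings, and the fallback rule (use per-action statistics when a screen has never been seen) are all invented for illustration.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count observed outcomes. Each example is (screen, action, next_screen)."""
    by_pair = defaultdict(Counter)    # outcomes for an exact (screen, action) pair
    by_action = defaultdict(Counter)  # outcomes for an action on any screen
    for screen, action, next_screen in examples:
        by_pair[(screen, action)][next_screen] += 1
        by_action[action][next_screen] += 1
    return by_pair, by_action


def predict(model, screen, action):
    """Predict the most frequent outcome, falling back to action-only statistics."""
    by_pair, by_action = model
    counts = by_pair.get((screen, action)) or by_action.get(action)
    if not counts:
        return None  # this action was never observed anywhere
    return counts.most_common(1)[0][0]


# Hypothetical operation examples from two mini-programs.
EXAMPLES = [
    ("coffee_menu", "tap_order", "payment_page"),
    ("tea_menu", "tap_order", "payment_page"),
    ("coffee_menu", "tap_back", "home"),
]

model = train(EXAMPLES)
print(predict(model, "coffee_menu", "tap_order"))  # screen seen during training
print(predict(model, "bakery_menu", "tap_order"))  # unseen screen, uses the fallback
```

The fallback is a crude analogue of the generalization claim above: an unfamiliar mini-program still benefits from what similar actions did elsewhere.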

3. AI's "checker": DiffSpot

After an action is performed, AI needs to verify whether it was correct (for example, whether the number in the shopping cart has changed). However, this capability is still not very effective: mainstream AI models struggle to detect subtle changes in interfaces.
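The verification step amounts to comparing the interface before and after an action and checking that the expected change occurred. The sketch below does this on structured state dictionaries, which is far easier than the screenshot-level comparison the article describes; `diff_states`, `verify`, and the `cart_count` field are hypothetical names chosen for this example.

```python
def diff_states(before, after):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = set(before) | set(after)
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


def verify(before, after, expected):
    """True if every expected key changed and now holds the expected value."""
    changes = diff_states(before, after)
    return all(
        key in changes and changes[key][1] == value
        for key, value in expected.items()
    )


# Hypothetical states around an "add to cart" tap.
before = {"screen": "menu", "cart_count": 0}
after = {"screen": "menu", "cart_count": 1}

print(diff_states(before, after))
print(verify(before, after, {"cart_count": 1}))
```

With clean key-value state the check is trivial; the difficulty noted above comes from having to spot the same one-digit change in raw pixels.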

3. Do developers really have a choice? — The hidden costs behind voluntary integration

WeChat claims that "the decision to integrate is up to the developer and does not affect existing services," but this only protects your current users (those who already use your mini-program). It says nothing about potential new users:

Once WeChat's AI is officially launched and 1.4 billion users are accustomed to using it to access services, mini-programs that don't integrate with the AI will be virtually invisible to them. If a competitor does integrate, a user can simply ask the AI to book a flight and be served by the competitor, while your mini-program stays out of the AI's reach and may lose that traffic.

It is as if everyone else is on the highway (using AI) while you are still on a country road (manual operation). The road still works, but no one wants to take the long way around.

4. The "misuse" of industry terms like Skill and MCP

In the tech industry, "Skill" and "MCP" are meant to represent open standards:

MCP: The Model Context Protocol, an open-source protocol developed by Anthropic that allows any compatible AI to connect with any compatible tool (e.g., Baidu AI could use it to access Taobao).
Skill: A set of instructions written by developers that can be used across platforms (e.g., in both Claude and Cursor).

However, WeChat has redefined these terms:

WeChat's MCP only allows WeChat's AI to connect with tools within its own platform.
WeChat's Skills are generated using your source code and can only be used within the WeChat ecosystem.

More subtly, many Chinese developers first learn about these terms through WeChat documentation, leading them to mistakenly believe that "Skill" refers to the interface used by WeChat's AI. By using familiar terminology, WeChat gradually turns what were open standards into closed interfaces, trapping developers in an environment with only one exit point (WeChat).
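For contrast with the closed interfaces described above, here is a minimal sketch of what a tool invocation looks like under the open MCP specification as I understand it: a JSON-RPC 2.0 request using the `tools/call` method, with the tool's name and arguments in `params`. The helper `build_tool_call`, the tool name `order_coffee`, and its arguments are hypothetical; a real server advertises its own tools and schemas.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP-style JSON-RPC 2.0 request that asks a server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,       # hypothetical tool name
            "arguments": arguments,  # must match the schema the server publishes
        },
    }


request = build_tool_call(1, "order_coffee", {"drink": "latte", "size": "medium"})
print(json.dumps(request, indent=2))
```

Because the message format is public, any client that speaks it can call any server that implements it, which is the openness the article says WeChat's version lacks.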

5. Who will be affected in the long run? — Developers become more dependent, and users become lazier

For developers:

You save the cost of adapting your mini-programs for AI integration, but you also deepen your dependence on WeChat's ecosystem—whether it's traffic, technology, or the ability to use AI functions.

For users:

Using mini-programs may become more convenient in the future (everything can be done with one command), but the range of choices might narrow. Only those mini-programs that integrate with WeChat's AI will be recommended, and the services you can use will be limited to what WeChat's AI allows.

In summary, while this move appears to help developers by saving them effort, it actually strengthens WeChat's ecosystem in the age of AI. Developers need to consider carefully whether they want short-term convenience or long-term control over their own tools and services.

(Note: The technical papers mentioned in the text are fictional and used for illustrative purposes only.)