Huxiu (虎嗅)

Robots aren't yet making money at scale, but the companies selling data have already become unicorns.

Original title: 机器人还没大规模赚钱，“卖数据的”先成独角兽了

2026-06-02

Summary of Key Points

Embodied intelligence (roughly, robot intelligence that can perceive, decide, and act the way humans do) has recently seen a case of "the gold miners haven't made money, but the shovel sellers got rich first": robot bodies themselves have yet to generate substantial profits, while the business of collecting data to train robots is booming. Several data collection companies have secured significant funding and orders, and giants such as Bosch and JD.com have entered the market as well. What began as a side business of robot companies has become an independent industry that investors value and large corporations bet on. Three factors lie behind this: a large data gap, strong demand, and capital moving upstream. Three types of players are now active in the field, the arrival of large companies is reshaping the industry's logic, and data quality and interoperability will ultimately determine the outcome.

1. Why Does Data Collection Generate More Profit Than Robot Bodies?

To understand this, it's important to grasp the "data bottleneck" of embodied intelligence:

The massive data gap: Large language models (such as GPT) can train on decades of internet text, but embodied models need data from robots interacting with the real world, for example 3D trajectories for grasping objects, avoiding obstacles, and operating machinery. Only about 500,000 hours of such high-quality data currently exist worldwide, less than one-twentieth of what large language models use. Moreover, data formats differ across robots and sensors, which makes sharing even harder.
Real demand: Both model developers and robot manufacturers are eager to acquire data, because it lets them train models faster, deliver products earlier, and gain a competitive edge. Companies such as Bosch and CATL, for example, are investing heavily in data collaboration and opening their factory lines for data collection, because they recognize that data quality directly affects what robots can do.
Capital moving upstream: Since 2026, the investment threshold for robot bodies has risen (leading companies are valued in the tens of billions), putting them out of reach for smaller institutions. Those investors are turning to data collection instead, as a steadier bet on the industry's enduring demand for data.

Together, these three factors have made data collection the first segment of the industry to turn a commercial profit.

2. Players in the Field Are Divided into Three Types

Players in the data collection market fall into three groups, each with its own business model:

Specialized Data Companies: These companies do not manufacture robots; they focus on building data infrastructure. For instance, Guanglun Intelligence, founded just three years ago, has become the world's first embodied intelligence data unicorn (valued at over $1 billion) and received 550 million in orders in its first quarter, serving clients such as NVIDIA, ByteDance, and Zhiyuan Robotics. It operates the largest physical training facility in China, which covers six scenarios and generates thousands of hours of data daily, and it secured industry orders while still raising funds.
Robot Makers Spinning Off Data Services: Robot manufacturers have spun off their data businesses. For example, Zhiyuan Robotics split its data collection and trading operations into Meifeng Technology, which raised hundreds of millions in just ten days, a sign that capital markets recognize the independent value of data assets. This model lets robot manufacturers focus on product development while still earning revenue from data services.
Cross-Industry Giants: Companies that already own real-world scenarios use that advantage to enter the market. JD.com, for example, mobilized 600,000 people to collect 10 million hours of real-world data within two years using its own logistics and warehousing facilities. Baidu Smart Cloud has created a "data supermarket" for embodied intelligence that sells data like goods on a shelf, while China Mobile has set up training facilities for home scenarios. These giants are not simply competing with small companies; they aim to build a "data platform" that robot companies can draw on as needed, much as they use cloud computing services.

3. Giants Like JD.com Enter the Market: Not to Compete, But to Rewrite the Industry's Rules

The entry of large companies brings two significant changes:

Scale Effects: JD.com's existing logistics and warehousing infrastructure lets it collect data at a volume that startups could not match in years. That scale lowers data costs and makes high-quality data accessible to downstream robot companies.
Platformization: Platforms like Baidu's "Data Supermarket" and JD.com's "full-chain infrastructure" turn scattered data into standardized products. Small companies will no longer need to build their own data collection teams; they can buy data directly from these platforms, much as online stores rely on cloud services. This moves the industry from decentralized data collection to a platform-based model and forces startups to reposition themselves as platform providers, as data tool vendors, or as specialists deeply integrated with specific scenarios.

4. The Future of the Competition

The winner of this data race will not be whoever holds the most data but whoever sets the standards in two areas:

Data Quality: Companies like CATL and Bosch, which control industrial facilities, are screening data partners carefully and admitting only those that can deliver "industrial-grade accuracy" and proven data quality into their ecosystems. Whoever defines what counts as "good data" (e.g., robot errors of less than 0.1 millimeters in industrial scenarios) will control the standards.
Data Interoperability: If data from different sources (real-world collection, simulation, and scenario-specific data) can be brought under a unified standard, it will become a core asset of the physical AI era, much as oil once underpinned industrial power.
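
To make the two criteria above more concrete, here is a minimal Python sketch; it is an illustration only, not any company's actual pipeline. It assumes two hypothetical vendor formats with different field names and units, converts both into one shared record, and then keeps only episodes under the 0.1 millimeter error figure the article cites as an example. The `Episode` record, the vendor field names, and the helper functions are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """Hypothetical unified record for one robot data episode."""
    source: str          # e.g. "real", "simulation", or "scenario"
    duration_h: float    # episode length in hours
    max_error_mm: float  # worst-case positioning error in millimeters


def from_vendor_a(rec: dict) -> Episode:
    # Assumed format: vendor A reports seconds and meters.
    return Episode(rec["kind"], rec["seconds"] / 3600, rec["err_m"] * 1000)


def from_vendor_b(rec: dict) -> Episode:
    # Assumed format: vendor B reports minutes and millimeters.
    return Episode(rec["origin"], rec["minutes"] / 60, rec["error_mm"])


def industrial_grade(episodes, threshold_mm=0.1):
    # Keep only episodes whose error is below the quality threshold.
    return [e for e in episodes if e.max_error_mm < threshold_mm]


# Made-up sample records, one per hypothetical vendor.
episodes = [
    from_vendor_a({"kind": "real", "seconds": 1800, "err_m": 0.00005}),
    from_vendor_b({"origin": "simulation", "minutes": 90, "error_mm": 0.25}),
]
kept = industrial_grade(episodes)
print(f"{len(kept)} of {len(episodes)} episodes meet the threshold")
```

Once every source is converted into the same record, the quality rule is written once and applied uniformly. That is the practical sense in which a shared standard makes data from different collectors comparable and tradable.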

In the end, what shapes the future of embodied intelligence may not be the robot bodies but the invisible "data fuel" that drives them.