虎嗅

RAG Technology Challenges the Content Ecosystem: How Can Copyright Owners Protect Their Rights Against AI-Based Search Engines?

Original article (Chinese): RAG技术冲击内容生态,版权人如何向AI搜索维权?

Summary of Key Points

RAG (Retrieval-Augmented Generation) technology in AI search lets users get answers without clicking through to the original websites, which undermines the traditional media business model built on clicks that generate advertising revenue and subscription fees. In response, traditional media companies have filed two types of lawsuits against AI companies: copyright suits against companies like Perplexity, which copy content directly and disregard access rules; and antitrust suits against companies like Google, which, although more "civilized," use their dominant market position to force media outlets to accept unfair terms. The article also compares judicial attitudes in China and the United States: litigation is active in the U.S., while Chinese courts are cautious in ruling on cases involving AI training and copyright infringement, in order to protect the country's emerging AI industry.

Detailed Analysis

#### 1. RAG Technology: Why Has It Become a Threat to Media Revenue?

Traditional media relies on two main sources of income: advertising (paid by advertisers) and subscription fees from users who access content. With RAG-based AI search, however, when users ask a question, the AI retrieves information from the internet and answers directly, so users never need to visit the original websites. The media outlet therefore loses the visits that produce ad revenue and subscriptions.

RAG is more aggressive than a conventional large language model. A conventional model generates content from the knowledge it absorbed during training and rarely copies a specific article. RAG, by contrast, retrieves existing online content at query time and integrates it into its answers. Even if AI companies limit how much content they copy, users still have no need to visit the original websites, leaving media without traffic or revenue.
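
The retrieve-then-generate flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-page corpus, the `example.com` URLs, the word-overlap scoring, and the helper names `retrieve` and `build_prompt` are hypothetical, and no real language model is called. It only shows how retrieved source text ends up inside the material the answer is generated from.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for crawled web pages (assumption).
DOCUMENTS = {
    "https://example.com/news/a": "The city council approved the new transit budget on Monday.",
    "https://example.com/news/b": "Local bakery wins the national award for best sourdough bread.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, documents, k=1):
    """Toy retriever: rank pages by how many words they share with the query."""
    query_words = Counter(tokenize(query))
    scored = []
    for url, text in documents.items():
        overlap = sum((query_words & Counter(tokenize(text))).values())
        scored.append((overlap, url, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(url, text) for overlap, url, text in scored[:k] if overlap > 0]

def build_prompt(query, passages):
    """Place the retrieved passages in the prompt a language model would answer from."""
    context = "\n".join(f"[{url}] {text}" for url, text in passages)
    return f"Answer using only the sources below.\n{context}\nQuestion: {query}"

question = "Who approved the transit budget?"
passages = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, passages)
print(prompt)
```

In this sketch the publisher's sentence travels verbatim into the prompt, so the generated answer can satisfy the user without a visit to the source page, which is the traffic loss the article describes.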

#### 2. CNN Sues Perplexity

Perplexity is an AI service provider that has pushed the boundaries of acceptable behavior in RAG technology:

  • Disregarding Rules for Content Retrieval: It uses web crawlers to collect news, images, and videos from CNN, including content explicitly prohibited by the website’s robots.txt file. It also disguises itself as a regular user (e.g., pretending to be a Chrome browser) to bypass security measures.
  • Direct Copying for Profit: When users ask questions, Perplexity copies large chunks of CNN articles directly into its answers and even exposes content that sits behind paywalls. For example, when a user asked which position Rubio had resigned from, Perplexity’s paid version simply reproduced the original CNN article, which is presented as clear evidence of infringement.

CNN therefore sued Perplexity under copyright law, alleging that it had infringed CNN's copyright in that content.
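
The robots.txt rules mentioned above are a voluntary convention: a well-behaved crawler looks up the rules that apply to its own declared name before fetching a page. The sketch below uses Python's standard `urllib.robotparser` to show that check. The robots.txt content, the crawler name `ExampleAIBot`, and the `news.example.com` URL are hypothetical and are not taken from any real site or from the lawsuit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://news.example.com/politics/story.html"

# A crawler that identifies itself honestly is told not to fetch the page.
honest = parser.can_fetch("ExampleAIBot", url)

# The same rules, queried under a browser-like name, fall through to the
# permissive "*" entry. The file cannot verify who is really asking.
spoofed = parser.can_fetch("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0", url)

print("declared crawler allowed:", honest)
print("browser-like agent allowed:", spoofed)
```

Because the rules are matched only against the name a client declares, a crawler that presents itself as an ordinary browser is not covered by the publisher's crawler-specific rule, which is why the disguise described above defeats the restriction.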

#### 3. Google Also Sued: Why Are “Civilized” Companies Targeted by Antitrust Laws?

Google used to be a source of traffic for media, as its search results often led users to their websites. However, with AI-powered search, users can get answers directly without clicking on links, leading to a significant decline in media traffic.

Although Google appears "civilized" because it respects the robots.txt protocol and offers an option to opt out of AI-generated summaries, that option is effectively a trap. A media outlet that opts out of AI summaries also disappears from Google's search results, which is devastating for the many outlets that rely on Google for over 80% of their traffic. Penske Media Corporation (owner of Billboard and other brands) sued Google on antitrust grounds, accusing it of using its monopoly position in the search market to force media to choose between accepting AI-generated summaries and losing access to Google’s search results.

#### 4. The Balance Between Technological Progress and Original Content Rights

While AI search makes it easier for users to find information (without having to navigate multiple pages), it threatens the revenue of original content creators. If media outlets lose income, who will continue to produce high-quality content? In the long run, there may be less content available for AI to search.

These lawsuits do not oppose AI technology itself. They ask how the benefits of technological progress should be distributed fairly. Should AI companies pay for the media content they use, and if not, how can they use it in a way that serves both users and creators? This is a critical challenge that must be addressed as technology advances.

#### 5. Judicial Attitudes in China and the U.S.

AI-related litigation is active in the U.S., covering issues from data infringement to the copyright and antitrust questions raised by RAG technology. Chinese courts, by contrast, are more cautious in ruling on cases involving AI training and copyright infringement, in order to protect the country's emerging AI industry. As a result, Chinese intellectual property lawyers have to study U.S. cases, a situation the article calls regrettable.

In essence, the conflict between AI and media is a clash between old and new business models. The key question is how to make sure that technological progress serves both users and creators without compromising the rights of original content owners.