虎嗅

DeepSeek V4 Takes On Mathematical Proof at a 500-Fold Cost Advantage: Agent System Sets Multiple New Records

Original title (Chinese): DeepSeek V4做数学证明,500倍成本优势:智能体系统刷新多项纪录

Summary of Key Points

AI has recently made notable progress in mathematical proof. A team from Princeton University built the Goedel-Architect system on the Chinese open-source large model DeepSeek-V4-Flash and improved formal theorem proving (rigorous, machine-verifiable proofs) on both cost and performance. According to the reported figures, the system costs roughly one five-hundredth as much as a comparable system driven by Google’s Gemini, yet achieves a higher accuracy rate. By generating a proof “blueprint” first and then refining it, Goedel-Architect addresses both the “verification crisis” and the efficiency problems of AI-generated proofs, giving mathematical research a more reliable and efficient tool.

1. Why Do AI Mathematical Proofs Need to Be “Formalized”? – Solving the “Verification Crisis”

Mathematics demands that every step be correct, but AI now generates proofs faster than humans can verify them (Terence Tao has described a shift from a scarcity of proofs to an abundance of them). For instance, if an AI claims to have disproven an 80-year-old conjecture, how can humans determine whether the claim is valid?

In such cases, formal proofs become a lifeline. A proof written in a language like Lean must spell out every logical step in a form the machine can check. Once the Lean compiler accepts it, the proof is guaranteed to establish the stated theorem, with no need for step-by-step human inspection. However, generating formal proofs used to be extremely expensive (for example, a single run of the Gemini-driven Hilbert system cost $170,000), putting it out of reach for most people.
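To make “machine-checkable” concrete, here is a minimal Lean 4 sketch. It is an illustration only and is not taken from Goedel-Architect; the theorem name `sum_comm` is hypothetical. Lean accepts a file only if every proof type-checks against the statement it claims to prove.

```lean
-- A statement together with a proof term. `Nat.add_comm` is a lemma from
-- Lean's core library; the kernel checks that it proves exactly the
-- statement written after the colon.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete computation can be closed by `rfl`, because both sides
-- evaluate to the same value.
example : 2 + 2 = 4 := rfl

-- By contrast, using `rfl` as the proof of `sum_comm` would be rejected:
-- `a + b` and `b + a` are not equal by mere computation when `a` and `b`
-- are variables.
```

The point is that acceptance is binary: a flawed argument does not compile, however plausible it reads in natural language.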

2. The Core Innovation of Goedel-Architect: Creating a “Blueprint” Before Proceeding

Traditional AI proof systems work like someone assembling building blocks blindfolded: they break a complex problem into smaller ones as they go, and a single dead end can render all previous effort futile. Goedel-Architect takes a different approach:

  • Blueprint generation: It breaks the theorem to be proven into smaller lemmas and uses a directed graph to record the dependencies between them (which lemma relies on which results).
  • Parallel proving: Multiple provers work on different lemmas simultaneously without interfering with each other.
  • Blueprint refinement: If a lemma fails, the system diagnoses the cause:
      • If the lemma itself is incorrect (for example, if the direction of binary addition is reversed), it corrects the lemma directly and updates the dependencies.
      • If the lemma is too complex, it breaks the lemma into smaller sub-lemmas and tries again.

This approach is akin to drawing up construction plans before building a house: errors can be corrected locally without starting over from scratch, which significantly improves efficiency.
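The blueprint loop described above can be sketched in Python. This is a toy model under stated assumptions, not the Goedel-Architect implementation: the `Lemma` class, the `prove_all` and `refine` helpers, and the `toy_prover` rule are all hypothetical stand-ins (a real system would call a language model and the Lean checker), and each lemma is assumed to be attempted with the statements of its dependencies taken as given.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Lemma:
    statement: str
    deps: set = field(default_factory=set)  # names of lemmas this one relies on


def dependency_order(blueprint):
    """Return lemma names with dependencies first; raises CycleError if cyclic."""
    graph = {name: lemma.deps for name, lemma in blueprint.items()}
    return list(TopologicalSorter(graph).static_order())


def prove_all(blueprint, prover, skip=frozenset()):
    """Attempt every lemma not in `skip` in parallel, one task per lemma."""
    names = [name for name in blueprint if name not in skip]
    with ThreadPoolExecutor() as pool:
        outcomes = list(pool.map(lambda n: prover(n, blueprint[n]), names))
    return dict(zip(names, outcomes))


def refine(blueprint, name, diagnosis, fixed_statement=None, sub_lemmas=None):
    """Repair one failed lemma in place instead of discarding the blueprint."""
    lemma = blueprint[name]
    if diagnosis == "wrong_statement":
        lemma.statement = fixed_statement          # correct the lemma itself
    elif diagnosis == "too_hard":
        for sub_name, sub in sub_lemmas.items():   # split into sub-lemmas
            blueprint[sub_name] = sub
            lemma.deps.add(sub_name)
    else:
        raise ValueError(f"unknown diagnosis: {diagnosis}")
    dependency_order(blueprint)  # re-validate: the graph must stay acyclic


def toy_prover(name, lemma):
    # Hypothetical rule standing in for an LLM prover plus the Lean checker:
    # "step" only becomes provable once it has been split into sub-lemmas.
    if name == "step":
        return any(dep.startswith("step_") for dep in lemma.deps)
    return True


blueprint = {
    "base": Lemma("helper fact"),
    "step": Lemma("hard intermediate result", {"base"}),
    "main": Lemma("target theorem", {"base", "step"}),
}

first = prove_all(blueprint, toy_prover)            # "step" fails here
refine(blueprint, "step", "too_hard", sub_lemmas={
    "step_a": Lemma("first half", {"base"}),
    "step_b": Lemma("second half", {"base"}),
})
proved = {name for name, ok in first.items() if ok}
second = prove_all(blueprint, toy_prover, skip=proved)  # retry only what is left
print(first)
print(second)
```

Only the failed lemma and its new sub-lemmas are retried in the second round; the lemmas proved in the first round are kept. In this toy model `main` succeeds in the first round only conditionally, because it assumes the statement of `step`, which is why the whole theorem counts as proved only once every lemma in the graph has succeeded.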

3. An Overwhelming Advantage in Performance and Cost

Goedel-Architect’s performance is impressive:

  • Cost: On the PutnamBench test set (672 competition problems), the Gemini-driven Hilbert system cost $170,000, while Goedel-Architect cost only $294 (a difference of more than 500-fold).
  • Accuracy: Goedel-Architect achieved a 75.6% success rate, compared with Hilbert’s 70%.
  • Problem coverage: It solves almost every problem in the high-school competition set MiniF2F (242/244) and can even handle new problems from the IMO (International Mathematical Olympiad) and USAMO (United States of America Mathematical Olympiad), solving 3 of 6.
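As a quick sanity check on these figures, the arithmetic can be worked through in a few lines of Python. The inputs are the numbers quoted above; the per-problem costs and the percentage are derived here for illustration and are not reported in the article.

```python
# Figures quoted in the article.
hilbert_cost_usd = 170_000
architect_cost_usd = 294
putnam_problems = 672

# Derived values (not reported in the article).
cost_ratio = hilbert_cost_usd / architect_cost_usd
architect_cost_per_problem = architect_cost_usd / putnam_problems
hilbert_cost_per_problem = hilbert_cost_usd / putnam_problems
accuracy_gap_points = round(75.6 - 70.0, 1)
minif2f_rate = 242 / 244

print(f"cost ratio: {cost_ratio:.0f}x")                                  # 578x
print(f"Goedel-Architect per problem: ${architect_cost_per_problem:.2f}")  # $0.44
print(f"Hilbert per problem: ${hilbert_cost_per_problem:.2f}")             # $252.98
print(f"accuracy gap: {accuracy_gap_points} points")                      # 5.6 points
print(f"MiniF2F pass rate: {minif2f_rate:.1%}")                           # 99.2%
```

The headline “500 times” is therefore a conservative rounding: the quoted totals give a ratio of about 578, or well under a dollar per PutnamBench problem versus roughly $253.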

The key is that it runs on the open-source, Chinese-developed DeepSeek model, which avoids the high cost of proprietary models and makes the system accessible to far more users.

4. The Team Behind the System: A Reputable Combination of Mathematics and AI

The team behind this system comes from Princeton University and is led by two prominent researchers:

  • Sanjeev Arora: An authority in the field of computational complexity who has been researching whether AI can become a superhuman mathematician.
  • Danqi Chen: She holds a bachelor’s degree from Tsinghua University and a Ph.D. from Stanford, previously collaborated with Google on SyntaxNet (Google’s syntactic parsing tool), and now focuses on language model reasoning.

The team had already released two versions of the Goedel-Prover model, so this success did not come out of nowhere.

5. The Future Implications: An “Accelerator” for Mathematical Research

The value of Goedel-Architect lies in lowering the barrier to formal proving:

  • Mathematicians will no longer need to spend years verifying details; AI can quickly generate machine-verifiable proofs.
  • Small teams or individuals can attempt to solve complex mathematical problems without relying on large institutional resources.
  • If an AI ever claims to have proven the Riemann Hypothesis, running the resulting Lean proof through the Lean compiler would settle its validity immediately, without decades of peer review.

This could revolutionize the way mathematics is conducted: Humans will focus on formulating ideas, while AI will transform them into rigorous proofs.

In summary, Goedel-Architect is both a breakthrough in AI-driven theorem proving and an important step toward bringing “trustworthy AI” into mathematics. By combining an open-source model with a new proving strategy, it makes formal proofs, once considered out of reach, broadly accessible, which could pave the way for more significant discoveries.