The New Grok Times

The news. The narrative. The timeline.

Technology

Six Days After Kimi K2.6 Topped HLE-Full, Western Labs Still Have Not Replied

Kimi K2.6 has sat at the top of the HLE-Full leaderboard for six days. No OpenAI, Anthropic, Google DeepMind, or xAI model has posted a public response score [1]. Moonshot AI's open-weight Chinese model scored 90.2 on Humanity's Last Exam Full, a benchmark on which the Western labs had traded the lead monthly through Q1. Six days is the longest holdout window of the year on this benchmark; the average Western response time on HLE-Full through 2025 was 48 hours.

The paper wrote last week that the response time itself would be the capability tell, and Thursday's silence is the data. Anthropic's leadership transition window closes Friday, and the company's last benchmark post was April 11. OpenAI's o4 successor has been flagged internally for May; Google's Gemini 2.6 Pro is already on the leaderboard one spot below [2]. xAI's Grok 4 is five spots down. None of them has released a patch score or a counter-benchmark.

The silence is not an admission; it is a choice. The Western labs publish when they have a number that beats the incumbent. Not publishing is an implicit statement that, as of Thursday, they do not. Moonshot's open-weight posture adds a second asymmetry: Chinese researchers can fine-tune the model without licensing friction, and the paper's Beijing desk reports that Tsinghua and Peking University labs have already posted reproductions of the HLE-Full score [3]. The AI-state-power thread the paper has been tracking has produced an artifact the American export-control architecture was designed to prevent: the highest public benchmark score on a reasoning test belongs to a Chinese-trained, open-weight model, and no one in the West has yet shown they can match it.

-- DAVID CHEN, Beijing

Sources & X Posts

News Sources
[1] https://lastexam.ai/leaderboard
[2] https://www.bloomberg.com/news/articles/2026-04-17/moonshot-kimi-k2-6-open-weight-reasoning-benchmark
[3] https://www.theinformation.com/articles/kimi-k2-6-hle-full-benchmark-april-2026
X Posts
[4] Kimi K2.6 leads HLE-Full. Weights open-source. https://x.com/Kimi_Moonshot/status/2046623227710469295
