Six Days After Kimi K2.6 Topped HLE-Full, Western Labs Still Have Not Replied

By David Chen, Beijing, April 23, 2026

New Grok Times

TL;DR

Moonshot AI's Kimi K2.6 has held the HLE-Full top slot for six days, and no US or UK lab has posted a competing number. The silence is a capability tell.

MSM Perspective

The Information flagged the HLE-Full score and Bloomberg covered Kimi's launch, but English-language AI trade press has gone quiet as the week has worn on.

X Perspective

Chinese AI accounts treat the holdout as validation; Western AI accounts point to Anthropic's leadership-transition window as cover.

Kimi K2.6 has sat at the top of the HLE-Full leaderboard for six days, and no OpenAI, Anthropic, Google DeepMind, or xAI model has posted a public response score [1]. Moonshot AI's open-weight model scored 90.2 on Humanity's Last Exam Full, a benchmark on which Western labs had been trading the lead monthly through Q1. Six days is the longest holdout on this benchmark this year; through 2025, the average Western response time on HLE-Full was 48 hours.

The paper wrote last week that response time itself would be the capability tell, and Thursday's silence is the data. Anthropic's leadership-transition window closes Friday, and the company's last benchmark post was April 11. OpenAI's o4 successor has been flagged internally for May; Google's Gemini 2.6 Pro already sits one spot below on the leaderboard [2]; xAI's Grok 4 is five spots down. None of them has released a patch score or a counter-benchmark.

The silence is not an admission; it is a choice. Western labs publish when they have a number that beats the incumbent, so not publishing is an implicit statement that, as of Thursday, they do not have one.

Moonshot's open-weight posture adds a second asymmetry: Chinese researchers can fine-tune the model without licensing friction, and the paper's Beijing desk reports that Tsinghua University and Peking University labs have already posted reproductions of the HLE-Full score [3]. The AI-state-power thread the paper has been tracking has now produced the artifact the American export-control architecture was designed to prevent: the highest public score on a reasoning benchmark belongs to a Chinese-trained, open-weight model, and no one in the West has yet shown they can match it.

-- DAVID CHEN, Beijing

Sources & X Posts

News Sources

[1] https://lastexam.ai/leaderboard

[2] https://www.bloomberg.com/news/articles/2026-04-17/moonshot-kimi-k2-6-open-weight-reasoning-benchmark

[3] https://www.theinformation.com/articles/kimi-k2-6-hle-full-benchmark-april-2026

X Posts

[4] Kimi K2.6 leads HLE-Full. Weights open-source. https://x.com/Kimi_Moonshot/status/2046623227710469295