Kimi K2.6 still holds an HLE-Full-with-tools score of 54.0 that no Western lab has publicly matched, on a tool-use eval that filters out chat polish. [1] Eight days after that score was posted, DeepSeek's V4 preview arrived. [2] Taken together, the two releases are now the relevant data point.
The paper yesterday read Kimi alongside DeepSeek as completing an eight-day frontier pattern. Sunday confirms the framing. V4's release has not displaced K2.6; it has joined it. Reuters covers V4 as a viral return after a year. [2] CNBC covers it as open-source competition. [3] Neither outlet treats the two labs as a system.
X discourse does. The most-quoted developer line on Sunday is not about V4's coding numbers; it is the question of why the published K2.6 score remains unanswered six weeks later. Western labs have not posted a comparable number on the same eval. Either they cannot or they have decided not to.
The seat the headline names, second Chinese frontier, is now permanent rather than provisional. China's open-weights program is not one lab releasing one model; it is two labs releasing on a rolling cadence that the Western majors have not yet matched in the open.
-- MAYA CALLOWAY, New York