Nvidia's developer blog confirms Kimi K2.5 is hosted on build.nvidia.com via Nvidia-accelerated endpoints, with NeMo fine-tuning support and Blackwell-optimized inference paths. [1] The announcement lands in parallel with Cloudflare's Workers AI launch of the same model on Sunday, described in this paper as the moment Kimi turned from benchmark news into developer habit. Two U.S. infrastructure providers now distribute the same Chinese open-weights model in the same week.
The familiar export-control framing, in which Nvidia defends CUDA against the fallout of Washington's sanctions on Beijing, looks incomplete as of Wednesday. Nvidia's developer platform is hosting the Chinese model those export controls were intended to constrain. The company is not losing China's developers in the abstract; it is hosting their preferred model on its own server fleet, a different problem from the one the chip ban was designed to solve. [2]
The distribution layer is now two-tier. Cloudflare ships Kimi as a Workers AI model string. Nvidia ships Kimi with NIM microservices, NeMo fine-tuning, and Blackwell-optimized inference. A developer who wants the Chinese model gets it from a U.S. provider in either tier. The China-loss-as-developer-loss thread this paper opened Sunday now has its second instance: the same Chinese model, a second U.S. distributor, a different stack.
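The two tiers can be sketched from the developer's side. The sketch below builds the request shapes a developer would send to each provider; the Kimi model slugs (`@cf/moonshotai/kimi-k2.5`, `moonshotai/kimi-k2.5`) are illustrative assumptions, not confirmed identifiers, and the endpoint patterns (Cloudflare's Workers AI REST `ai/run` route, Nvidia's OpenAI-compatible `integrate.api.nvidia.com` host) are the providers' published conventions rather than anything specific to this launch.

```typescript
// Two U.S. distribution tiers for the same open-weights model, as request
// shapes. No network calls are made; this only shows how each tier is
// addressed. Model slugs are hypothetical.

interface ChatRequest {
  url: string;
  body: { model?: string; messages: { role: string; content: string }[] };
}

// Tier 1: Cloudflare Workers AI. Inside a Worker the model is just a string
// passed to env.AI.run(slug, inputs); the REST form addresses the same slug.
function workersAiRequest(accountId: string, prompt: string): ChatRequest {
  const slug = "@cf/moonshotai/kimi-k2.5"; // hypothetical slug
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${slug}`,
    body: { messages: [{ role: "user", content: prompt }] },
  };
}

// Tier 2: Nvidia build.nvidia.com. NIM endpoints follow the OpenAI
// chat-completions convention, so the model rides in the request body.
function nimRequest(prompt: string): ChatRequest {
  return {
    url: "https://integrate.api.nvidia.com/v1/chat/completions",
    body: {
      model: "moonshotai/kimi-k2.5", // hypothetical model id
      messages: [{ role: "user", content: prompt }],
    },
  };
}
```

Either way, the developer-facing surface is a U.S. provider's API key and endpoint; the model's origin appears only in the slug.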
For policy, the announcement complicates next steps. The Treasury sanctions architecture has not yet addressed Chinese open-weights models running on U.S. infrastructure. Wednesday's news is that the question is no longer hypothetical. [3]
-- DAVID CHEN, Beijing