The New Grok Times

The news. The narrative. The timeline.

Technology

Kimi Is Now a Cloudflare Workers AI Endpoint

Cloudflare's Workers AI changelog now lists @cf/moonshotai/kimi-k2.6 as available, with REST, binding (env.AI.run()), and OpenAI-compatible endpoints at /v1/chat/completions. [1] The model, Moonshot AI's frontier 1-trillion-parameter mixture-of-experts release with a 262,144-token context window, multimodal inputs, and tool calling, was added April 20 with what Cloudflare calls "Day 0 support" from Moonshot. [2]
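
For the binding route, a minimal Worker sketch, assuming a Workers AI binding named AI in the project's wrangler config; the model string is the one from Cloudflare's changelog, and the interface below is a local stand-in for the generated binding types:

    interface Env {
      AI: { run(model: string, inputs: unknown): Promise<unknown> };
    }

    export default {
      async fetch(_req: Request, env: Env): Promise<Response> {
        // Chat-style input, per the Workers AI binding convention.
        const out = await env.AI.run("@cf/moonshotai/kimi-k2.6", {
          messages: [{ role: "user", content: "Say hello from the edge." }],
        });
        return Response.json(out);
      },
    };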

The paper's Tuesday account of how Cloudflare turned Kimi from benchmark news into developer habit framed the launch as the moment a Chinese open-weights model became infrastructure. The Wednesday read is that frame with the receipts attached: model string, endpoint format, OpenAI compatibility. The migration cost from a U.S. frontier model to Kimi K2.6 is now a three-line config change.

The OpenAI-compatible endpoint is the operative detail. A developer using OpenAI's SDK changes the base URL to Cloudflare's endpoint, the model name to @cf/moonshotai/kimi-k2.6, and the API key — and the rest of the codebase stays the same. [3] The compatibility layer was designed to ease migration toward Workers AI generally; it incidentally eases migration toward a Chinese model specifically.
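
A sketch of that swap, assuming the openai npm SDK and a placeholder account ID; the /v1 path is Workers AI's documented OpenAI-compatible surface:

    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.CLOUDFLARE_API_TOKEN, // swapped from the OpenAI key
      baseURL:
        "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/ai/v1",
    });

    const completion = await client.chat.completions.create({
      model: "@cf/moonshotai/kimi-k2.6", // swapped from a GPT-series model name
      messages: [{ role: "user", content: "Hello from the same codebase." }],
    });
    console.log(completion.choices[0].message.content);

Three values change; the call sites, streaming handlers, and tool schemas do not.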

K2.6's benchmark scores against U.S. frontier models matter for the procurement question. Cloudflare's changelog cites BrowseComp at 83.2, SWE-Bench Verified at 80.2, and Terminal-Bench 2.0 at 66.7 — competitive with GPT-5.4 and Claude Opus 4.6 on agentic and coding workloads. [2] For the developer choosing among hosted frontier models on a per-token cost basis, the comparison is now real, not hypothetical.

The pricing is part of the migration math. K2.6 prices at roughly $0.60 per million input tokens, $0.10 per million cached input tokens, and $3.00 per million output tokens on Workers AI. [4] Those rates undercut OpenAI's GPT-5-class output pricing materially, and the cached-input discount makes long-context agentic workloads — the workload K2.6 is designed for — cheaper to run on Cloudflare than on most competing hosts.
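
A back-of-envelope sketch with hypothetical volumes (10 million input tokens at an assumed 80 percent cache-hit rate, plus 2 million output tokens) priced at the rates above:

    // Hypothetical monthly agentic workload; rates are per million tokens.
    const freshInput = 2_000_000 * (0.60 / 1_000_000); // $1.20
    const cachedInput = 8_000_000 * (0.10 / 1_000_000); // $0.80
    const output = 2_000_000 * (3.00 / 1_000_000); // $6.00
    console.log(freshInput + cachedInput + output); // $8.00 total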

Nvidia, separately, is hosting K2.5 on build.nvidia.com via NVIDIA-accelerated endpoints, with NeMo fine-tuning support and Blackwell-optimized inference. [5] The same Chinese model now ships through two American distribution surfaces: Cloudflare's Workers AI for application deployment and Nvidia's developer platform for evaluation and customization. The export-control story has two distribution layers, not one. The model itself is open-weights; its hosting is American; its training company is in Beijing.

The U.S. policy frame has not caught up to the distribution. Export controls focus on advanced semiconductors leaving the United States and on the export of frontier model weights trained on U.S.-controlled infrastructure. [3] They do not currently constrain a Chinese-trained open-weights model running on U.S. cloud infrastructure for U.S. application developers. Cloudflare and Moonshot's "Day 0" partnership formalizes that gap as a product.

The April 28 thread asked whether usage data or hyperscaler counter-distribution would arrive. The first move is small but specific: Cloudflare's Agents SDK starter now uses K2.5 (and presumably K2.6) as its default model. [6] That choice, making a Chinese model the default for new developer projects on Cloudflare's agent framework, sets the developer-habit migration path the predecessor story anticipated.

The migration math has its own gravity. Once a project is built on a model with a particular tool-calling format, prompt-cache behavior, and reasoning-output schema, switching costs rise. Kimi K2.6's API differences from K2.5 — chat_template_kwargs.thinking replacing enable_thinking; reasoning replacing reasoning_content — are minor, but they are the kind of cumulative differences that lock in a model choice over a year. [2] What ships in 2026 on Workers AI as Kimi will be running in production in 2027 unless explicitly migrated.
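
As request bodies, the diff is small; the field names below are the changelog's, the surrounding payload is illustrative:

    // K2.5-style request: thinking toggled by a top-level flag.
    const k25Request = {
      model: "@cf/moonshotai/kimi-k2.5",
      messages: [{ role: "user", content: "Plan the migration." }],
      enable_thinking: true,
    };

    // K2.6-style request: the toggle moves under chat_template_kwargs,
    // and responses rename reasoning_content to reasoning.
    const k26Request = {
      model: "@cf/moonshotai/kimi-k2.6",
      messages: [{ role: "user", content: "Plan the migration." }],
      chat_template_kwargs: { thinking: true },
    };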

What this changes for the U.S. AI competitive frame: the assumption that frontier models on U.S. cloud infrastructure are American models is no longer descriptive. Cloudflare runs Kimi. Nvidia hosts Kimi. AWS, Azure, and Google Cloud have not yet announced equivalent partnerships, but the floor has shifted. The next question — whether OpenAI, Anthropic, or Google adjust pricing or distribution to retain developer share against a Chinese frontier alternative — is the question Wednesday's edition leaves with the major U.S. labs. [7]

-- DAVID CHEN, Beijing

Sources & X Posts

News Sources
[1] https://developers.cloudflare.com/workers-ai/models/kimi-k2.6/
[2] https://developers.cloudflare.com/changelog/post/2026-04-20-kimi-k2-6-workers-ai/
[3] https://developers.cloudflare.com/workers-ai/
[4] https://developers.cloudflare.com/ai/models/@cf/moonshotai/kimi-k2.5/
[5] https://developer.nvidia.com/blog/build-with-kimi-k2-5-multimodal-vlm-using-nvidia-gpu-accelerated-endpoints/
[6] https://developers.cloudflare.com/changelog/post/2026-03-19-kimi-k2-5-workers-ai/
[7] https://blockchain.news/news/nvidia-gpu-endpoints-kimi-k2-5-multimodal-model
X Posts
[8] I have a little extension which adds support for cloudflare provider in pi, to be able to use any model from workers-ai including Kimi-K2.6 https://x.com/iaktech/status/2047072814939697579
