Eating some mooncake
This episode discusses Kimi's serving architecture, the use of Mooncake to offload GPU memory, the prevalence of vLLM, and the evolving standard LLM stack.