
spott 4 months ago \| on: Ask HN: How can ChatGPT serve 700M users when I ca...

KV cache for dense models is on the order of 50% of the parameters. For sparse MoE models it can be significantly smaller, I believe, but I don't think it is measured in KB.
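
For a rough sense of scale, here is a back-of-the-envelope sketch using the usual KV cache size estimate for standard multi-head or grouped-query attention. The function name `kv_cache_bytes` and the config numbers are hypothetical, picked only for illustration and not taken from the thread or from any specific model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Estimate bytes needed to cache keys and values for every layer."""
    # Factor 2: one cached tensor for keys and one for values, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical dense config (assumption for illustration), fp16 cache entries.
cfg = dict(n_layers=32, n_kv_heads=32, head_dim=128)

per_token = kv_cache_bytes(**cfg, seq_len=1)
full_context = kv_cache_bytes(**cfg, seq_len=4096)

print(per_token / 2**10, "KiB per token")
print(full_context / 2**30, "GiB for one 4096-token sequence")
```

Under these assumed numbers a single token already costs 512 KiB and one 4096-token sequence costs 2 GiB, so the total is nowhere near KB. The estimate depends only on the attention dimensions, sequence length, and batch size, not on the feed-forward or expert weights, which is one plausible reason the cache would be a smaller fraction of total parameters for a sparse MoE model.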