Hugging Face[0] says that this is a 358B model. What kind of hardware is necessar...

rz2k · 2025-12-22T18:39:52 1766428792

It is a mixture-of-experts model, so only a fraction of the parameters are active for any given token. That is why it will run on a computer with a lot of RAM and a GPU.

Alternatively, on an M3 Ultra Mac Studio with 256GB of unified memory, you can run a 4-bit quant of GLM-4.6 at about 20 tokens/second. That compares to about 40 t/s for a 6-bit quant of MiniMax M2. I am not sure how fast these would run on a 512GB Mac Studio, which can load the unquantized versions of the models.
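
For a rough sense of why the quant level matters so much, here is a back-of-the-envelope sketch of weight memory at different bit widths. It assumes the 358B figure from the question, counts weights only (no KV cache, activations, or runtime overhead), and treats every weight as stored at the same width, which real quant formats only approximate. The function name is my own, not from any library.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 10^9 bytes), weights only.

    Assumption: every parameter is stored at the same bit width.
    """
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # 358 is the parameter count (in billions) quoted in the question.
    for bits in (4, 6, 8, 16):
        print(f"{bits}-bit: ~{weight_memory_gb(358, bits):.1f} GB")
```

By this estimate a 4-bit quant of a 358B model needs roughly 179 GB for weights alone, which is consistent with it fitting in 256GB of unified memory. A 16-bit copy would come out around 716 GB, so whether an unquantized model fits in 512GB depends on the precision the weights are released in.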