Per the paper, phi3-mini (which is English-only) quantised to 4-bit uses 1.8 GB of RAM and outputs 1212 tokens/sec (correction: 12 tokens/sec) on iOS.
A model on par with GPT-3.5 running on phones!
(weights haven't been released, though)
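As a rough sanity check on that 1.8 GB figure (assuming the paper's stated ~3.8B parameter count for phi3-mini; ignoring activation memory and quantisation overhead):

```python
# Back-of-the-envelope memory estimate for a 4-bit quantised model.
params = 3.8e9          # phi3-mini parameter count per the Phi-3 paper
bytes_per_param = 0.5   # 4-bit quantisation = half a byte per weight
gb = params * bytes_per_param / 1e9
print(f"{gb:.1f} GB")   # ~1.9 GB, in line with the quoted 1.8 GB
```

The small gap vs. 1.8 GB plausibly comes from the specific quantisation scheme (group sizes, mixed precision for some layers), but the order of magnitude checks out.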
Phi-1, Phi-1.5, and Phi-2 have all had their weights released, and those weights are available under the MIT License.
Hopefully Microsoft will continue that trend with Phi-3.
> outputs 1212 tokens/sec on iOS
I think you meant "12 tokens/sec", which is still nice, just a little less exciting than a kilotoken/sec.
Thanks! The HTML version on archive.is has messed up markup and shows 1212 instead: https://archive.is/Ndox6