
Please link to the original post:

https://www.ai21.com/blog/announcing-jamba

Jamba looks fabulous. Good performance for its size and much more efficient than the available open alternatives.

The key idea: one out of every eight transformer blocks in Jamba applies dot-product attention, with cost quadratic in sequence length, while the other seven apply a Mamba layer with linear cost. On top of that, the model is a mixture of experts (MoE), so only ~12B parameters are active at once during inference.
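
To make that layout concrete, here is a minimal PyTorch sketch of the pattern described above. This is not AI21's code: the `MambaStandIn` module, the top-1 router, the expert count, and the dimensions are illustrative placeholders, and a real Mamba layer would come from a library such as `mamba_ssm`.

```python
import torch
import torch.nn as nn


class MambaStandIn(nn.Module):
    """Placeholder for a real Mamba (selective state-space) layer.

    A real implementation scans over the sequence in linear time;
    this stand-in only keeps the interface and shapes.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):                        # x: (batch, seq, d_model)
        return x + self.proj(x)


class AttentionBlock(nn.Module):
    """Standard dot-product self-attention: cost is quadratic in seq length."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return x + out


class MoEMLP(nn.Module):
    """Tiny top-1 mixture-of-experts MLP: one expert runs per token,
    so only a fraction of the total parameters is active at inference."""
    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        choice = self.router(x).argmax(dim=-1)   # (batch, seq) expert index
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = choice == e
            if mask.any():
                out[mask] = expert(x[mask])
        return x + out


class JambaStyleStack(nn.Module):
    """Every 8th mixer is attention; the other seven are Mamba stand-ins."""
    def __init__(self, n_blocks: int = 16, d_model: int = 256):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.ModuleList([
                AttentionBlock(d_model) if i % 8 == 7 else MambaStandIn(d_model),
                MoEMLP(d_model),
            ])
            for i in range(n_blocks)
        )

    def forward(self, x):
        for mixer, moe in self.blocks:
            x = mixer(x)    # residual connections live inside each module
            x = moe(x)
        return x


if __name__ == "__main__":
    model = JambaStyleStack()
    tokens = torch.randn(2, 128, 256)            # (batch, seq, d_model)
    print(model(tokens).shape)                   # -> torch.Size([2, 128, 256])
```

The point is only the structure: a cheap linear-time mixer in most blocks, full attention in a few, and sparse expert MLPs so the active parameter count stays well below the total.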

Thank you to the folks at AI21 for making Jamba available!



I haven't seen anyone mention this yet, so I'll be the first: how does it compare to StripedHyena? https://www.together.ai/blog/stripedhyena-7b


Mamba came out of the same research group, Hazy Research, led by Chris Ré. This new Jamba model, which interleaves Mamba and dot-product attention layers, has ~8x more parameters than the largest open Striped Hyena and appears to work much better.



