Seems like the model isn't limited to those though, from the paper:
> as well as some additional relevant languages (Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian).
https://arxiv.org/pdf/2409.16235
The paper also goes into detail on the training set sources; I feel like the curation of those might be considered the main contribution of this publication?