A lot of training data was curated in Kenya[0]. I would imagine if LLM data was ...

erikig · 2025-12-15T20:09:54 1765829394

The Indian-born textbook author mentioned (Malkiat Singh [0]) had an inordinate influence on many Kenyan students because his textbooks were the de facto standard for years. It's interesting how this influence extends as his students get to curate the LLMs on which the world has come to rely.

[0] https://en.wikipedia.org/wiki/Malkiat_Singh

jojobas · 2025-12-16T05:49:41 1765864181

So twists of training data procurement bring us the best of doing the needful through Africa.

m4rtink · 2025-12-15T14:55:00 1765810500

You are completely right dajou~ ^_^ !

delis-thumbs-7e · 2025-12-16T07:59:37 1765871977

Maybe we all should start writing Japanglish to show our authenticity? Or rather, "Maybe we all should start writing the Japanglish, so that peoples can feel our real soul, you know?"

bakugo · 2025-12-15T20:35:10 1765830910

I guess it can't be helped.

koakuma-chan · 2025-12-16T00:41:18 1765845678

It's not because I like you or anything.

bpodgursky · 2025-12-15T21:29:10 1765834150

This is a wild misunderstanding of LLMs. Data labeling has nothing to do with generating the astronomical text corpus used to train modern LLMs.

heavyset_go · 2025-12-15T22:05:19 1765836319

The HF part of RLHF, which refines the output of LLMs, also happens in these places.

astrange · 2025-12-16T09:42:19 1765878139

Note that RLHF can only perform selection on existing model outputs; adding new data is SFT or else just more pretraining.

ChatGPT speaking African English was mostly just 3.5. 4o speaks like a TikTok user from LA. 5 seems kind of generic.

casey2 · 2025-12-15T22:46:58 1765838818

樣 is just setting us up for

ChatGPT :|

ChatGPT (japan) XD