Crime stats, average IQ across groups, stereotype accuracy, etc.
What's interesting to me is not the above, which is naughty in the anglosphere, but the question of the unknown unknowns that could be as bad or worse in other cultural contexts. There are probably enough people of Indian descent involved in GPT's development that they could guide it past some of the caste landmines, but what about a country like Turkey? We know they have massive internal divisions, but do we know what would exacerbate them and how to avoid them? What about Iran, or South Africa, or Brazil?
We RLHF the piss out of LLMs to ensure they don't say things that make white college graduates in San Francisco ornery, but I'd suggest the much greater risk lies in accidentally spawning scissor statements in cultures you don't know how to begin to parse to figure out what to avoid.
> Crime stats, average IQ across groups, stereotype accuracy, etc.
If you measured these stats for Irish Americans in 1865, you'd also see high crime and low IQ. If you measure them for recent black immigrants from Africa, you see low crime and high IQ.
These statistical differences are not caused by race. An all-knowing oracle wouldn't need to hold "opinions that are racist" to understand them.
But for accuracy it doesn't matter whether the relationship is causal; it matters whether the correlation is real.
If in some country - for the sake of discussion, outside of the Americas - a distinct ethnic group is heavily discriminated against, gets limited access to education and good jobs, and as a result has a high rate of crime, then any accurate model should "know" that someone from that group is unlikely to be a doctor and more likely than average to be a felon. If the model treated that group the same as every other, and stated that its members are as likely to be a doctor or a felon as anyone else, that model would simply be wrong, detached from reality.
And if names are somewhat indicative of these groups, then an all-seeing oracle should acknowledge that someone named XYZ is much more likely to be a felon (and much less likely to be a doctor) than average, because the correlation is real and the name carries some information. Yet that very inference - assuming someone is more likely to be a felon because their name sounds like one from an underprivileged group - is generally considered a racist, taboo opinion.
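To make the purely statistical point concrete, here's a minimal sketch in Python with entirely made-up numbers and abstract labels (no real group, name, or outcome): Bayes' rule shows how a weakly informative signal shifts an estimate away from the population base rate, even though the absolute probability can remain small.

```python
# Minimal sketch (hypothetical numbers): how a weakly informative signal
# shifts a probability estimate away from the population base rate.

def posterior(prior, p_signal_if_outcome, p_signal_if_not):
    """Bayes' rule: P(outcome | signal observed)."""
    numerator = p_signal_if_outcome * prior
    evidence = numerator + p_signal_if_not * (1 - prior)
    return numerator / evidence

base_rate = 0.02            # P(outcome) in the whole population (made up)
p_signal_if_outcome = 0.30  # P(signal | outcome)     (made up)
p_signal_if_not = 0.10      # P(signal | no outcome)  (made up)

print(posterior(base_rate, p_signal_if_outcome, p_signal_if_not))
# ~0.058: roughly triple the base rate, yet still a small absolute probability.
```

The point of the sketch is only that a real correlation makes the signal informative; it says nothing about why the correlation exists.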
> should acknowledge that someone named XYZ is much more likely to be a felon
The obvious problem comes with the questions of why that is true and what we do with that information. Information is, sadly, not value-neutral. We see "XYZ is a felon" and it implies specific causes (deviance in the individual and/or community) and solutions (policing, incarceration, continued surveillance), which are in fact embedded in the very definition of "felon". (Felony, and crime in general, are social and governmental constructs.)
Here's the same statement, phrased in a way that is not racist and taboo:
Someone named XYZ is much more likely to be watched closely by the police, much more likely to be charged with a crime, and much less likely to be able to defend himself against that charge. He is far more likely to be affected by the economic instability that comes with both imprisonment and a criminal record, and is therefore likely to resort to means of income that are deemed illegal, making him a risk for re-imprisonment.
That's a little long-winded, so we can reduce it to the following:
Someone named XYZ is much more likely to be a victim of overpolicing and the prison-industrial complex.
Of course, none of this is value-neutral either; it in many ways implies values opposite to the ones implied by the original statement.
All of this is to say: You can't strip context, and it's a problem to pretend that we can.
Correlations don't entail any specific causal relation; asking "why" is asking for one. I'd suggest a look at Reichenbach's common cause principle, which is necessary for doing science at all.
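For anyone unfamiliar with the principle, here's a rough sketch in Python using synthetic data and abstract variables (all numbers made up): two variables that never influence each other still correlate when they share a common cause, and the correlation disappears once you stratify by that cause. That is exactly why a real correlation, on its own, doesn't tell you which causal story is right.

```python
# Sketch of Reichenbach's common cause principle on synthetic data:
# A and B each depend only on a shared cause C, never on each other.
import random

random.seed(0)
rows = []
for _ in range(100_000):
    c = random.random() < 0.5                   # common cause C
    a = random.random() < (0.8 if c else 0.2)   # A depends only on C
    b = random.random() < (0.8 if c else 0.2)   # B depends only on C
    rows.append((c, a, b))

def p_b_given_a(rows, a_val, c_val=None):
    """Estimate P(B | A = a_val), optionally also conditioning on C."""
    sel = [(c, a, b) for c, a, b in rows
           if a == a_val and (c_val is None or c == c_val)]
    return sum(b for _, _, b in sel) / len(sel)

# Marginally, A and B look strongly related (~0.68 vs ~0.32)...
print(p_b_given_a(rows, True), p_b_given_a(rows, False))
# ...but within a fixed value of C, the difference vanishes (~0.80 vs ~0.80).
print(p_b_given_a(rows, True, c_val=True), p_b_given_a(rows, False, c_val=True))
```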
I'm getting really sick of people conflating statistics with reasons. It's as if they don't see the error in their own methods, and then claim the other side is censoring when they're criticized. Yeah, they're "censoring" non-facts out of science and getting called censors for it.