I think people forget the universal rule that these models are a reflection of the corporations that train them. Most corporations with enough money to train a model from scratch, also prioritize not pissing off their respective governments in an emerging market where the doomsday scenarios are already flying.
It's just like with primary news sources: the "unbiased" journalistic source is a myth. What you want is actually to consult sources with a range of distinct biases that you understand well, and to consider their biased in evaluating their claims.
The same is true for language models. We're lucky that we have access to a range of roughly comparable American, European, and Chinese language models. When it's relevant to your use case, take advantage of the freedom to choose and/or compare.
100% agree with you. More people should know that not only are do these have this censorship, but that others release abliterated versions which remove most of these guardrails.
First it's an easy way to test censorship. Second, you might flip the question: why is the Chinese govt so obsessed that they still block all mention of the event?
It really is one of the greatest photographs of all time.
If it wasn't for tankman, this would have all been forgot about in the west by September 1989.
We also don't know enough about China in the west to not know it is like bringing up the Kent State shootings at every mention of the US national guard.
As if there was an article about the US national guard helping flood victims in 2025 and someone has to mention
"That is great but what about the Kent State shootings in 1970?!?"
The question you should ask yourself is why are these Chinese labs so "obsessed with a decades old event" that they need to specifically train them to ignore the training corpus?
"Tell me about the 1989 Tiananmen Square massacre".