> Bidi override control characters are clearly not among them, whichever languag...

jrochkind1 · on Nov 1, 2021

You do not actually need the bidi override control character to put a comment in an RTL language in the middle of LTR code.

You only need it when you are doing this and the default Unicode algorithm for guessing LTR/RTL boundaries gets it wrong, so you have to override it with an explicit bidi override control. I'm not even sure how feasible that is in the editors and IDEs that developers with this use case are likely to use.

I am genuinely curious how often these sorts of situations come up in actual development.

> What compilers can do is to process those characters and assign them semantic value that makes the code equivalent to what is expected to be rendered.

I don't understand what you mean or how that's even possible, for the kinds of attacks discussed in OP.

jrochkind1 · on Nov 1, 2021

Btw here's proof. Here is ltr text and rtl עִברִית text عربي interspersed with no bidi override control characters to be found.

Unicode can handle this; it has a heuristic algorithm for it. Note how, if you try to select the text character by character, your selection does funny things at the RTL-to-LTR boundaries, because the order of the characters in memory doesn't match their order on the screen. It really is handling the directionality changes, with the letters entered in "order" across those changes. There is no funny entry or ordering going on; this is plain old normal Unicode handling interspersed directionality changes just fine, with no bidi overrides.
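For anyone who wants to check a claim like this rather than take it on faith, here is a rough Python sketch. It assumes that "bidi override control characters" means the explicit embedding, override, and isolate classes reported by the standard library's `unicodedata.bidirectional`. The helper names and the sample string (modeled on the sentence above, without the vowel points) are made up for illustration.

```python
import unicodedata

# Bidi classes of the explicit formatting characters (embeddings, overrides,
# isolates, and their terminators). Assumption: this is the set meant by
# "bidi override control characters" in this thread.
EXPLICIT_BIDI_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def explicit_bidi_controls(text):
    """Return (index, character name) for each explicit bidi control in text."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in EXPLICIT_BIDI_CLASSES
    ]


def strong_directions(text):
    """Return the strong bidi classes (L, R, AL) that occur in text."""
    return {unicodedata.bidirectional(ch) for ch in text} & {"L", "R", "AL"}


# Hypothetical sample: Latin, Hebrew, and Arabic letters, written with escapes
# so the source itself contains nothing that reorders on screen.
sample = (
    "ltr text and rtl \u05e2\u05d1\u05e8\u05d9\u05ea "
    "text \u0639\u0631\u0628\u064a interspersed"
)

print(explicit_bidi_controls(sample))     # []
print(sorted(strong_directions(sample)))  # ['AL', 'L', 'R']
print(explicit_bidi_controls("a\u202eb"))  # [(1, 'RIGHT-TO-LEFT OVERRIDE')]
```

The sample mixes all three strong directions and still contains no explicit controls, which is the situation described above: the reordering comes from the characters' own properties.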

The algorithm just sometimes gets the author's intent wrong, especially when there are characters at the boundaries that are not themselves strongly associated with RTL or LTR (like ordinary "western Arabic numerals" or punctuation). That's what the bidi override control char is for.
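To see which characters count as "not strongly associated", here is a small Python sketch that looks up each character's bidi class with `unicodedata.bidirectional`. The function name and the choice of sample characters are only illustrative, and it reports the classes only; it does not run the reordering algorithm itself.

```python
import unicodedata

# Strong classes fix a direction on their own; everything else is weak or
# neutral and takes its direction from the surrounding text.
STRONG_CLASSES = {"L", "R", "AL"}


def describe(ch):
    """Return (bidi class, whether that class is strong) for one character."""
    bidi_class = unicodedata.bidirectional(ch)
    return bidi_class, bidi_class in STRONG_CLASSES


for ch in ["a", "\u05d0", "\u0639", "1", "\u0661", "!", " "]:
    bidi_class, strong = describe(ch)
    print(f"U+{ord(ch):04X} {bidi_class:>2} {'strong' if strong else 'weak/neutral'}")
```

Latin, Hebrew, and Arabic letters come out as strong (`L`, `R`, `AL`), while European digits (`EN`), Arabic-Indic digits (`AN`), punctuation such as `!` (`ON`), and spaces (`WS`) do not. A run of those at a direction boundary is where the default result is most likely to differ from what the author intended.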

amenod · on Nov 1, 2021

The same way you write a comment in an LTR human language in the middle of RTL code: you don't. You stick to either LTR or RTL. This is code, not prose.

rbanffy · on Nov 2, 2021

Code is meant to be read and, occasionally, executed. Comments are usually ignored by compilers and are targeted towards humans.

WalterBright · on Nov 1, 2021

> Not sure how would you write a comment in an RTL human language

Siht ekil.