Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

If you know your data is UTF-8, then bytes 0xFE and 0xFF are guaranteed to be free. Strictly speaking, 0xC0, 0xC1, and 0xF5 through 0xFD also are, but the two top values are free even if you are very lax and allow overlong encodings as well as codepoints up to 2³² − 1.




I think it would probably be better to invest in a proper framing design than trying to poke holes in UTF-8.

(This is true regardless of UTF-8 -- in-band encodings are almost always brittle!)




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: