> For monospace fonts, used by Warp for terminal input and output, glyph advance is a constant - a fixed amount of spacing is used between all pairs of glyphs.
It's more complicated than that, too. (But just about every terminal I've ever used at one point or another has made that buggy assumption!) Emoji and wide CJK characters typically take two terminal cells.
For example (and HN will corrupt the output of this command, as it will remove the emoji):
» python3 -c 'print("1234567890\n\N{PILE OF POO}x\u4e09x")'
1234567890
x三x
The second "x" should align under the 6. 4 glyphs, 6 cells.
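A toy way to compute that cell count, under the assumption that Python's unicodedata tables match the terminal's width rules (real terminals also special-case zero-width marks, control characters, etc.):

```python
import unicodedata

def cell_width(ch: str) -> int:
    # Wide (W) and Fullwidth (F) characters occupy two terminal cells;
    # everything else is approximated as one cell in this sketch.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def line_cells(s: str) -> int:
    return sum(cell_width(c) for c in s)

print(line_cells("\N{PILE OF POO}x\u4e09x"))  # 4 glyphs -> 6 cells
```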
(I am a bit curious about the subpixel stuff. I've done font atlases, but IME Harfbuzz — the shaping engine I've used — only seems to emit integral advances? (Although I'm now wanting to re-test that assumption.) And I'm not entirely sure I know how to ask FreeType to render a glyph with an offset, either.)
Good call-out - definitely a bit of a simplification there. We account for this in our terminal grid by allocating an empty cell following any double-width glyphs (e.g.: emoji and wide chars, as you mentioned).
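A minimal sketch of that spacer-cell idea (the names here are hypothetical, not Warp's actual data structures): each double-width glyph is followed by a placeholder so column arithmetic stays simple.

```python
import unicodedata

SPACER = object()  # placeholder occupying the second column of a wide glyph

def layout_row(text: str) -> list:
    cells = []
    for ch in text:
        cells.append(ch)
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            cells.append(SPACER)  # reserve the trailing cell
    return cells

row = layout_row("x\u4e09x")
print(len(row))  # "三" takes two cells, so 4 cells total
```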
Core Text supports rasterizing at sub-pixel offsets (though some configuration of the graphics context is necessary to do so properly) by applying a transformation to the graphics context before the rasterization call. I'll definitely have to figure out the FreeType angle when we start working on Linux support; if I uncover anything (and remember this comment thread), I'll report back with my findings. :)
Unfortunately many of the grandfathered symbols in the emoji block have ambiguous width and there is little recourse for applications to determine the terminal display width of any particular codepoint. Gnome VTE gives you a setting to force them one way or another but that can interact badly with fonts that expect a different width.
I think this is for the codepoints that have both an emoji form and a non-emoji form? (And yeah, that's admittedly confusing for the app.)
I think Unicode does specify a way to force those codepoints to either the emoji form or the non-emoji form, but there is still an encoding that is ambiguous by spec, I think.
E.g., the "TM" symbol falls into this; on macOS if you open the character selector and search "trade", you will get both variants.
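Concretely, Unicode variation selectors VS15 (U+FE0E) and VS16 (U+FE0F) request the text and emoji presentations respectively; a bare U+2122 is the ambiguous case:

```python
TM = "\u2122"             # TRADE MARK SIGN, ambiguous presentation
as_text  = TM + "\ufe0e"  # VS15: request text (monochrome) form
as_emoji = TM + "\ufe0f"  # VS16: request emoji (color) form

# Each styled form is two codepoints: the base character plus a selector.
print([hex(ord(c)) for c in as_emoji])
```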
Opening this website significantly degrades scrolling performance in Edge (Chromium). This impacts all my open tabs. Scrolling becomes fast again when the Warp tab is closed.
Doesn't exactly instill confidence in the advertised product.
I'm fascinated by text rendering and just scratching the surface to learn more.
A few questions I have:
1. Is the pixel grid global to the screen, or is it specific to an application? Can different applications on an OS have different pixel grids? I'm not sure where pixel grids come from.
2. It sounds painful to write your own font rendering system. Why doesn't Warp use OS-level libraries like Core Text?
3. I know there are many font rendering techniques, such as texture-, distance-, and geometry-based approaches. The most recent one is the Pathfinder technique, which is geometry-based? Which one does Warp use?
4. In which step does the font rendering pipeline spend most of its time? Is it shaping? Rasterization? Font file parsing?
1. I guess the answer is that it is both global and local? An application window draws its contents to a buffer (basically, an RGBA image), which the operating system can copy to the buffer that represents the full contents of the screen. The buffer for a given window is effectively a grid of pixels; when we lay out the contents of our application, we compute an (x,y) position and (w,h) size for each primitive (whether a rectangle or a glyph). These four values (x,y,w,h) don't have to be integral, but if they're not, some extra work has to be done in order to correctly set the colors of the buffer's pixels that "sit beneath" the objects that we're drawing.
2. We do use Core Text, in fact, via a cross-platform Rust library called font-kit. It provides shaping and glyph rasterization APIs for us; for performance reasons, we cache the results of shaping at a line level and the results of rasterization at a glyph level (as individual glyphs frequently appear in multiple places in a single frame).
3. We're currently delegating rasterization to Core Text; I'm not sure what it uses internally. My sense is that most of the development of new rasterization strategies is for applications like game engines, which need to be able to draw text efficiently at a variety of scales, a requirement that doesn't apply for a terminal. We prioritize crispness of text (and, to some extent, consistency with text rendering in other applications) over size scalability, as people will spend a lot of time reading text at a single size.
4. That's a good question, and I don't have an answer handy. We load (most) font data in the background, as we only use two fonts at any time (one for monospace text, like in the terminal grid, and one for UI strings). We currently don't do any shaping in the terminal grid (and therefore don't support ligatures there). That said, we cache shaping results at a per-line level and rasterization results at a per-glyph level, so we don't need to do either step frequently. In aggregate, over a user session, the vast majority of the time will be spent "copying" an already-rasterized glyph from our atlas to the window's backing buffer.
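The two-level caching described above might look roughly like this (a hypothetical sketch, not Warp's code): shaping is cached per line of text, rasterization per glyph, so repeated glyphs are rasterized once.

```python
class Renderer:
    def __init__(self, shape_fn, rasterize_fn):
        self._shape = shape_fn          # line text -> sequence of glyph ids
        self._rasterize = rasterize_fn  # glyph id -> bitmap
        self._shaped = {}               # line-level cache of shaping results
        self._atlas = {}                # glyph-level cache of rasterized bitmaps

    def glyphs_for_line(self, line: str):
        if line not in self._shaped:
            self._shaped[line] = self._shape(line)
        return self._shaped[line]

    def bitmap_for_glyph(self, glyph_id):
        if glyph_id not in self._atlas:
            self._atlas[glyph_id] = self._rasterize(glyph_id)
        return self._atlas[glyph_id]

# Toy usage: identity "shaping" and a call log to show rasterization reuse.
calls = []
r = Renderer(shape_fn=list, rasterize_fn=lambda g: calls.append(g) or f"<{g}>")
for line in ["abcabc", "cab"]:
    for g in r.glyphs_for_line(line):
        r.bitmap_for_glyph(g)
print(len(calls))  # only 3 distinct glyphs rasterized
```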
From my research, it seems like SDF is best suited for game engines to use for text, especially non-UI text (which may appear at arbitrary sizes), as it is a fast, space-efficient, and easily scalable approximation.
Unfortunately, SDF tends to lose precision at corners, making it less suitable for text-oriented applications. Multi-channel SDF does seem to improve on the corner precision issues (as highlighted by the repo you linked to), but in a terminal we don't frequently need to render text at different scales, so I'm not sure there is much benefit. (If a user changes font sizes, we can clear out and rebuild the atlas sufficiently quickly.)
That said, I could borrow some ideas from MSDF to improve space efficiency by encoding the three different offset variants into separate color channels at the same atlas location, reducing the size of the atlas itself as well as the hashmap used to look up atlas position from glyph ID and whatnot.
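For context, those "offset variants" come from snapping a fractional x-position to one of a few pre-rasterized subpixel positions. A toy bucketing function (the three-variant count is taken from the comment above; this is an illustrative sketch, not Warp's implementation):

```python
import math

def subpixel_bucket(x: float, variants: int = 3):
    """Split x into an integral pixel and one of `variants` subpixel offsets."""
    base = math.floor(x)
    frac = x - base
    # Bucket 0 covers [0, 1/variants), bucket 1 covers [1/variants, 2/variants), ...
    return base, min(int(frac * variants), variants - 1)

print(subpixel_bucket(10.5))  # (10, 1)
print(subpixel_bucket(3.0))   # (3, 0)
```

A glyph cache keyed by (glyph_id, bucket) then holds at most `variants` copies of each glyph, one per subpixel offset.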
How do you do font fallback? For example, a user uses an English font like Arial, and suddenly she types a Chinese character. The character isn't present in Arial, so we need to find a fallback font that does contain it.
Let's say we find one, but the fallback font has different sizing. How to scale the character to match the other font? What about kerning and other positioning issues?
And also, is your cache ever increasing? Imagine a user zooms in, do you retire those small glyphs?
My understanding is that some, but not all, shaping libraries will handle fallback fonts. For example, I believe Core Text (on macOS) will specify which sections of a laid-out line should be rendered using which fonts, factoring in fallback font selections. Kerning should work within a run of a single font, but I'm not sure there's any way to "properly" kern a glyph pair from two separate fonts - there wouldn't be any information available about proper alignment (as kerning data is part of a font file). In terms of sizing, one would hope that font creators all respect the em-square when constructing glyph vectors, leading to two different fonts using the same point size having comparably-sized glyphs. If one doesn't want to rely on that, font metrics like ascent and descent could be utilized in an attempt to normalize sizing across different fonts.
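A sketch of that metrics-based normalization (the numbers below are hypothetical; real fonts expose unitsPerEm, ascent, and descent in their metrics tables):

```python
def pixels_per_font_unit(point_size: float, units_per_em: int,
                         dpi: float = 72.0) -> float:
    # Standard conversion: 1 pt = 1/72 inch, so at 72 dpi one point is one pixel.
    return point_size * dpi / 72.0 / units_per_em

def fallback_scale(primary_ascent: float, primary_descent: float,
                   fb_ascent: float, fb_descent: float) -> float:
    # Scale a fallback font so its ascent+descent matches the primary font's,
    # rather than trusting both fonts to fill their em-squares consistently.
    return (primary_ascent + primary_descent) / (fb_ascent + fb_descent)

# E.g. a 2048-upem font at 12pt / 72dpi: 12/2048 pixels per font unit.
print(pixels_per_font_unit(12, 2048))
```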
In terms of cache size: at the moment, we rarely empty out the cache (but we should do it more often). I have some ideas around triggers for emptying the cache and letting it get rebuilt (e.g.: changes to font size, changes to font family), but haven't wired it up yet. In addition, we could consider clearing the cache periodically when the application is sitting in the background (allowing us to re-rasterize the needed glyphs without blocking painting a frame). So tl;dr: we don't currently but we should and will do so in the future.
Thank you for the answers! I built a Rust lib to generate multi-channel signed distance field font textures. I wanted to make it a text rendering lib, but after learning how complex that is, I guess I should give up and simply use Core Text...
MSDF seems like a great choice, depending on your use-case. Given that we don't need to draw text at multiple scales, it doesn't seem to provide much benefit for us over caching rasterized glyphs provided by Core Text. (Copying a rasterized glyph is certainly faster than evaluating an SDF.)
Using Core Text also means our text looks the same as it does in other applications, which avoids bug reports from users like "text in Warp looks different than in iTerm". :)