3 points by olavk 6140 days ago | link | parent

I believe accented characters like é are typically drawn as a single glyph by fonts, but are composed of two Unicode code points: the base character (e) and a combining character (´).

This is complicated by the fact that Unicode also provides the combined character as a separate single code point, for backwards compatibility with legacy character sets. However, the decomposed (normalized) form is the recommended one.
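
To make the two encodings concrete, here is a small Python sketch (Python only for illustration, not Arc). It assumes the combining character in question is U+0301 COMBINING ACUTE ACCENT; `code_points` is a hypothetical helper, and `unicodedata.normalize` is the standard library call for converting between the forms.

```python
import unicodedata

# Two ways to write the same visible character "é".
precomposed = "\u00e9"   # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # base letter "e" followed by COMBINING ACUTE ACCENT


def code_points(s):
    """Hypothetical helper: list the code points of a string as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]


# NFD splits the precomposed character into base + combining mark;
# NFC goes the other way and recombines them.
as_nfd = unicodedata.normalize("NFD", precomposed)
as_nfc = unicodedata.normalize("NFC", decomposed)

print(code_points(precomposed), code_points(as_nfd))
print(code_points(decomposed), code_points(as_nfc))
```

The two strings have different lengths (1 and 2 code points) even though they render the same.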



1 point by almkglor 6140 days ago | link

True. A bit of research also suggests that it would be better for both forms to be considered "equal" when comparing individual characters.
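
A rough sketch of what such an "equal" comparison could look like, in Python rather than Arc: normalize both sides to the same form before comparing. `canonically_equal` is a made-up name for illustration, and picking NFD over NFC here is an arbitrary assumption; either works as long as both sides use the same form.

```python
import unicodedata


def canonically_equal(a, b):
    """Hypothetical helper: compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00e9"   # é as one code point
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

print(precomposed == decomposed)                   # plain code point comparison
print(canonically_equal(precomposed, decomposed))  # comparison after normalization
```

Note that a "character" in the decomposed form spans more than one code point, so a per-character comparison would have to work on the base character together with its combining marks.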

-----