This page looks at issues around the generation of Knuth elements for line break possibilities. It does not deal with how line break possibilities are determined; it concentrates only on the Knuth elements to be generated for a particular line break possibility. Because the topic is closely related, it also covers the Knuth elements required for text justification, that is, the Knuth elements generated for elastic spaces.
The following shorthands are used in the sample sequences:
- spb-start = the sum of the space-start, border-start and padding-start lengths
- spb-end = the sum of the space-end, border-end and padding-end lengths
- sp-width = the width of a nominal space character
- hyp-width = the width of a hyphenation character
Commonly occurring Knuth sequences
A simple break
A forced break
An elastic break
The width, stretch and shrink values shown depend on the word-spacing property.
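The three basic breaks can be sketched as follows. This is a minimal illustration and not the element classes of any particular implementation: the `Glue` and `Penalty` types, the `INFINITE` constant and the helper names are hypothetical, and the way the minimum/optimum/maximum components of word-spacing are mapped onto shrink, width and stretch is an assumption.

```python
from dataclasses import dataclass

INFINITE = 1000  # assumed sentinel for an "infinite" penalty


@dataclass(frozen=True)
class Glue:
    w: int      # natural width
    y: int = 0  # stretch
    z: int = 0  # shrink


@dataclass(frozen=True)
class Penalty:
    w: int = 0  # width added to the line if the break is taken
    p: int = 0  # cost of breaking here


def simple_break():
    """A break opportunity with no cost and no width."""
    return [Penalty(w=0, p=0)]


def forced_break():
    """A break that must be taken."""
    return [Penalty(w=0, p=-INFINITE)]


def elastic_break(sp_width, ws_min, ws_opt, ws_max):
    """A single glue; its elasticity comes from word-spacing (assumed mapping)."""
    return [Glue(w=sp_width + ws_opt, y=ws_max - ws_opt, z=ws_opt - ws_min)]
```

For example, `elastic_break(6000, -1000, 0, 2000)` yields a glue with width 6000, stretch 2000 and shrink 1000 (all lengths in an arbitrary integer unit).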
The Knuth approach of using box, glue and penalty elements can not only be used for justified text but also for text with ragged left or right margins and centered text.
Breaks in justified text
For justified alignment (text-align="justify") the normal elastic break sequence (single glue element) as above is used.
Breaks in text with ragged margins (left or right)
For left or right alignment (text-align="left" or text-align="right") a constant stretch is added at the end of the line:
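A sketch of one such sequence, loosely following the ragged-right construction described by Knuth and Plass: stretch is added before the break and cancelled again by the glue after it, so that an unbroken space behaves like a rigid space. The type names and the `line_stretch` parameter are hypothetical, and the amount of stretch is an assumption to be tuned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glue:
    w: int      # natural width
    y: int = 0  # stretch
    z: int = 0  # shrink


@dataclass(frozen=True)
class Penalty:
    w: int = 0
    p: int = 0


def ragged_break(sp_width, line_stretch):
    return [
        Glue(w=0, y=line_stretch),          # remains at the end of the line if a break is taken
        Penalty(w=0, p=0),                  # the break opportunity itself
        Glue(w=sp_width, y=-line_stretch),  # discarded at a break, cancels the stretch otherwise
    ]


def unbroken_totals(seq):
    """Sum of (width, stretch, shrink) over the glue when no break is taken."""
    glues = [e for e in seq if isinstance(e, Glue)]
    return (sum(g.w for g in glues), sum(g.y for g in glues), sum(g.z for g in glues))
```

With `ragged_break(6, 18)` the unbroken totals are `(6, 0, 0)`: a plain, non-elastic space.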
Breaks in centered text
For center alignment (text-align="center") a constant stretch is added on both sides of the break:
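The centered case can be sketched in the same spirit, again loosely following Knuth and Plass. The zero-width box stops the discarding of elements after a break, and the infinite penalty keeps the final glue from becoming a break opportunity of its own. All names and the `line_stretch` value are assumptions for illustration; `split_at_break` is a hypothetical helper that shows what ends one line and what starts the next.

```python
from dataclasses import dataclass

INFINITE = 1000  # assumed sentinel for an "infinite" penalty


@dataclass(frozen=True)
class Box:
    w: int


@dataclass(frozen=True)
class Glue:
    w: int
    y: int = 0  # stretch
    z: int = 0  # shrink


@dataclass(frozen=True)
class Penalty:
    w: int = 0
    p: int = 0


def centered_break(sp_width, line_stretch):
    return [
        Glue(w=0, y=line_stretch),              # stretch at the end of the current line
        Penalty(w=0, p=0),                      # the break opportunity
        Glue(w=sp_width, y=-2 * line_stretch),  # discarded at a break, cancels both stretches otherwise
        Box(w=0),                               # stops the discarding after a break
        Penalty(w=0, p=INFINITE),               # no break allowed before the next glue
        Glue(w=0, y=line_stretch),              # stretch at the start of the next line
    ]


def split_at_break(seq):
    """Break at the first finite penalty; drop glue and penalties up to the next box."""
    i = next(k for k, e in enumerate(seq) if isinstance(e, Penalty) and e.p < INFINITE)
    before, after = seq[:i], seq[i + 1:]
    while after and not isinstance(after[0], Box):
        after = after[1:]
    return before, after
```

Taking the break leaves one stretchable glue on each side of it; not taking it leaves a rigid space of `sp_width`.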
Space/Border/Padding around a break
A common occurrence at a break is the presence of space/border/padding on one or both sides of a break. The generic Knuth sequence for such a situation is very similar to the centered text above:
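One way to model this is sketched below, under the assumption that the space/border/padding only materialises when the break is actually taken. The structure mirrors the centered sequence, with fixed widths in place of stretch. The type and function names are hypothetical.

```python
from dataclasses import dataclass

INFINITE = 1000  # assumed sentinel for an "infinite" penalty


@dataclass(frozen=True)
class Box:
    w: int


@dataclass(frozen=True)
class Glue:
    w: int
    y: int = 0
    z: int = 0


@dataclass(frozen=True)
class Penalty:
    w: int = 0
    p: int = 0


def sbp_break(spb_start, spb_end):
    return [
        Glue(w=spb_end),                 # closes the current line if a break is taken
        Penalty(w=0, p=0),               # the break opportunity
        Glue(w=-(spb_end + spb_start)),  # discarded at a break, cancels both widths otherwise
        Box(w=0),                        # stops the discarding after a break
        Penalty(w=0, p=INFINITE),        # no break allowed before the next glue
        Glue(w=spb_start),               # opens the next line if a break is taken
    ]


def natural_width(elements):
    """Natural width of boxes and glue; penalties only count when broken at."""
    return sum(e.w for e in elements if isinstance(e, (Box, Glue)))
```

For `sbp_break(spb_start=4, spb_end=7)` the unbroken sequence has width 0, the part before the penalty has width 7, and the part from the zero-width box onwards has width 4.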
Space/Border/Padding combined with Alignments
These sequences combine the space/border/padding sequence with alignment sequences.
Space/Border/Padding with justified alignment
Space/Border/Padding with Left/Right alignment
Space/Border/Padding with Center alignment
Specific Knuth sequences
The following cases have been identified:
1. Non breaking / non elastic
Example: U+202F NARROW NO-BREAK SPACE
This is actually the normal character case, but it also covers some characters that Unicode classifies as spaces. A consecutive sequence of non breaking / non elastic characters with the same properties is mapped into a single Knuth box element with the combined width of all the characters. It is important to aggregate rather than generate individual box elements so that kerning can be taken into account.
These box elements are not related to the identification of words in the text required by the hyphenation subsystem.
However, the hyphenation algorithm would need to be given the word: Bargain.
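The aggregation into a single box described above can be sketched as follows. The `advance` and `kerning` tables are hypothetical stand-ins for font metrics, and the numbers are invented for illustration only.

```python
def text_box_width(text, advance, kerning):
    """Combined width of a run of characters, including pairwise kerning.

    advance: dict mapping a character to its advance width (assumed font data)
    kerning: dict mapping a (left, right) character pair to an adjustment
    """
    width = sum(advance[ch] for ch in text)
    # Kerning applies between neighbours, which is why the run must be
    # measured as a whole rather than one box per character.
    width += sum(kerning.get(pair, 0) for pair in zip(text, text[1:]))
    return width


# Invented metrics for illustration only.
advance = {"A": 700, "V": 680}
kerning = {("A", "V"): -80}
```

Here `text_box_width("AV", advance, kerning)` gives 1300, whereas two separate boxes would add up to 1380.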
2. Non breaking / elastic space
Example: U+00A0 Non breaking space
For this character class the Knuth elements must prevent a break from being generated while still participating in text justification.
Whether a character falls into this class depends on the combination of the treat-as-word-space property and its Unicode value.
The Knuth sequence for text-align not equal to "justify":
and for text-align="justify":
The width, stretch and shrink values above depend on the word-spacing property.
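How a space can remain elastic without becoming a break opportunity follows from the Knuth-Plass rule for legal breakpoints: a glue is only a breakpoint if it directly follows a box, and a penalty only if its value is less than infinity. The sketch below illustrates that rule; it is not the exact sequence used on this page, and all names and values are hypothetical.

```python
from dataclasses import dataclass

INFINITE = 1000  # assumed sentinel for an "infinite" penalty


@dataclass(frozen=True)
class Box:
    w: int


@dataclass(frozen=True)
class Glue:
    w: int
    y: int = 0
    z: int = 0


@dataclass(frozen=True)
class Penalty:
    w: int = 0
    p: int = 0


def legal_breakpoints(seq):
    """Indices at which a line break may occur."""
    points = []
    for i, e in enumerate(seq):
        if isinstance(e, Penalty) and e.p < INFINITE:
            points.append(i)
        elif isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box):
            points.append(i)
    return points


# A normal space between two words: the glue is a break opportunity.
normal = [Box(10), Glue(6, 3, 2), Box(10)]
# An infinite penalty in front of the glue: still elastic, but no break.
non_breaking = [Box(10), Penalty(0, INFINITE), Glue(6, 3, 2), Box(10)]
```

`legal_breakpoints(normal)` returns `[1]`, while `legal_breakpoints(non_breaking)` returns `[]`; the glue in the second sequence still contributes its stretch and shrink to the line.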
3. Break / non elastic
Example: U+200B Zero Width Space
This type involves all break possibilities which don't add, remove or change any characters. However, when a break is generated border and padding must be taken into account as must certain text-align values. These sequences are identical to the generic sequences mentioned above.
In addition a change in width due to kerning may need to be considered.
4. Break / non elastic / add character if break
When something needs to be added to the end of the line at a break, the Knuth solution is to assign a non-zero width to the penalty for that break. For hyphens the penalty will also be flagged (given a non-zero value):
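A minimal sketch of such a penalty is given below. The names are hypothetical, and the default cost of 50 is only an assumption (it happens to be TeX's default hyphen penalty).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    w: int


@dataclass(frozen=True)
class Penalty:
    w: int = 0             # width added to the line only if the break is taken
    p: int = 0             # cost of breaking here
    flagged: bool = False  # lets the algorithm discourage consecutive hyphenated lines


def hyphenation_break(hyp_width, cost=50):
    return Penalty(w=hyp_width, p=cost, flagged=True)


def line_width_if_broken_at(seq, i):
    """Width of the boxes before index i plus the width of the penalty at i."""
    return sum(e.w for e in seq[:i] if isinstance(e, Box)) + seq[i].w


# Two word fragments with a hyphenation opportunity between them (invented widths).
fragments = [Box(30), hyphenation_break(hyp_width=5), Box(40)]
```

Breaking at the penalty gives a first line of width 35 (fragment plus hyphen); without a break the two boxes simply add up to 70 and the hyphen width never appears.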
This can be easily combined with the common sequences for Space/Border/Padding and/or alignment. For example the Knuth sequence for a break possibility with a hyphen for Space/Border/Padding and text-align="center" would be:
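One plausible way to build this combination is to merge the three ingredients element by element: the widths and stretches of the space/border/padding and centering glue are added together, and the hyphen width goes on the penalty. This is a sketch under that assumption and not necessarily the sequence intended here; all names, the `line_stretch` value and the default cost are hypothetical.

```python
from dataclasses import dataclass

INFINITE = 1000  # assumed sentinel for an "infinite" penalty


@dataclass(frozen=True)
class Box:
    w: int


@dataclass(frozen=True)
class Glue:
    w: int
    y: int = 0
    z: int = 0


@dataclass(frozen=True)
class Penalty:
    w: int = 0
    p: int = 0
    flagged: bool = False


def hyphen_sbp_centered_break(hyp_width, spb_start, spb_end, line_stretch, cost=50):
    return [
        Glue(w=spb_end, y=line_stretch),                         # end of the current line
        Penalty(w=hyp_width, p=cost, flagged=True),              # hyphen appears only at a break
        Glue(w=-(spb_end + spb_start), y=-2 * line_stretch),     # cancels everything if unbroken
        Box(w=0),                                                # stops discarding after a break
        Penalty(w=0, p=INFINITE),                                # no break before the next glue
        Glue(w=spb_start, y=line_stretch),                       # start of the next line
    ]


def glue_totals(seq):
    """Sum of (width, stretch) over all glue in the sequence."""
    glues = [e for e in seq if isinstance(e, Glue)]
    return (sum(g.w for g in glues), sum(g.y for g in glues))
```

If the break is not taken, the glue widths and stretches cancel to `(0, 0)`, so the word is set as if no break opportunity existed.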
This doesn't cater for change in spelling or kerning in the presence of hyphenation.
5. Break / non elastic / remove if not break
Example: U+00AD Soft hyphen
As these characters have zero width in the non-break situation, they behave, with respect to the Knuth sequences, identically to the hyphenation case above.
6. Break / non elastic / removable
Example: U+2000 EN QUAD and other fixed width spaces
The Knuth algorithm removes all glue elements at the beginning of the line, therefore this sequence will do the trick:
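The discarding behaviour this relies on can be sketched as follows: after a break, glue and penalties are dropped until the next box is reached. Representing the fixed-width space as a rigid glue behind a zero-cost penalty is an assumption made for illustration, as are all names and widths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    w: int


@dataclass(frozen=True)
class Glue:
    w: int
    y: int = 0
    z: int = 0


@dataclass(frozen=True)
class Penalty:
    w: int = 0
    p: int = 0


def start_of_next_line(seq, break_index):
    """Elements that start the next line after breaking at break_index."""
    rest = seq[break_index + 1:]
    # Glue and penalties directly after a break are discarded up to the next box.
    while rest and not isinstance(rest[0], Box):
        rest = rest[1:]
    return rest


# word, break opportunity, rigid fixed-width space, word (invented widths)
single = [Box(50), Penalty(0, 0), Glue(w=10), Box(40)]
# the same with a run of three removable spaces
run = [Box(50), Penalty(0, 0), Glue(w=10), Glue(w=10), Glue(w=10), Box(40)]
```

Breaking at index 1 in either sequence leaves `[Box(w=40)]` as the start of the next line, so the spaces vanish at a break but keep their full width when no break is taken.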
Again this can be combined with Space/Border/Padding and alignment as this example for text-align="left/right" shows:
XSL-FO does not define these characters as removable white space, but would they be removed at a line break under common typesetting conventions?
7. Break / elastic / non removable
Example: U+3000 Ideographic space
This can be handled like a combination of a non breaking space (case 2.) followed by a zero width space (case 3.). For example text-align="justify" with Space/Border/Padding:
XSL-FO does not define U+3000 as removable white space, but would it be removed at a line break under common CJK typesetting conventions?
Unicode does not break before a space as it assumes spaces are removed from the end of a line. This is not the case here. Do we need to allow for a break before?
8. Break / elastic / removable
Example: U+0020 Space
If white-space-collapse="false" and white-space-treatment="ignore..." there can be a run of spaces which must be removed if a break is generated. Assuming each space generates its own glue element (or at least that there may be multiple glue elements if the spaces cross FO boundaries), the simplest case yields sequences similar to case 6:
Again this can be combined with the Space/Border/Padding and/or alignment sequences.