This page looks at issues around the generation of Knuth elements for line break possibilities. It does not deal with determining the line break possibilities themselves. It covers only the Knuth elements to be generated for a given line break possibility. Because it is closely related, it also covers the Knuth elements required for text justification, that is, the Knuth elements generated for elastic spaces.

The following shorthands are used in the sample sequences:

  • spb-start = the sum of the space-start, border-start and padding-start lengths
  • spb-end = the sum of the space-end, border-end and padding-end lengths
  • sp-width = the width of a nominal space character
  • hyp-width = the width of a hyphenation character

Commonly occurring Knuth sequences

A simple break

1  pen   w="0" p="0"

A forced break

1  pen   w="0" p="-INF"

An elastic break

1  glue  w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"

(warning) The width, stretch and shrink values shown do depend on the word-spacing property.
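To make the sequences on this page easier to check, they can be modelled in a few lines of code. The following is a minimal Python sketch under stated assumptions, not FOP's actual classes: `Box`, `Glue`, `Pen`, `legal`, `totals` and `between_boxes` are hypothetical names, `INF` is assumed to be 1000 following the TeX convention, and the nominal space width `SP` is an arbitrary 6.0. `legal` applies Knuth's rule for legal breaks (a penalty below infinity, or a glue immediately preceded by a box) to the three common sequences, each placed between two word boxes.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; these are not FOP's classes.
INF = 1000  # assumption: TeX convention, a penalty >= INF never allows a break

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Pen:
    w: float
    p: float
    flagged: bool = False

def legal(seq, i):
    """Knuth's rule: a penalty below INF, or a glue directly after a box."""
    e = seq[i]
    if isinstance(e, Pen):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

def totals(items):
    """(width, stretch, shrink); a penalty that is not the chosen break adds nothing."""
    w = st = sh = 0.0
    for e in items:
        if not isinstance(e, Pen):
            w += e.w
        if isinstance(e, Glue):
            st += e.stretch
            sh += e.shrink
    return (w, st, sh)

SP = 6.0  # hypothetical nominal space width (sp-width)

def between_boxes(*elems):
    """Place a break sequence between two hypothetical word boxes."""
    return [Box(10.0), *elems, Box(20.0)]

simple = between_boxes(Pen(0, 0))
forced = between_boxes(Pen(0, -INF))
elastic = between_boxes(Glue(SP, SP / 2, SP / 3))

for name, seq in [("simple", simple), ("forced", forced), ("elastic", elastic)]:
    print(name, legal(seq, 1), totals(seq[1:2]))
```

The forced break differs from the simple break only in its penalty value. A real line breaker must also break at a -INF penalty, which this legality check alone does not express.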

Alignments

The Knuth approach of using box, glue and penalty elements can be used not only for justified text but also for text with ragged left or right margins and for centered text.

Breaks in justified text

For justified alignment (text-align="justify") the elastic break sequence shown above (a single glue element) is used.

Breaks in text with ragged margins (left or right)

For left or right alignment (text-align="left" or text-align="right") a constant stretch is added at the end of the line:

1  glue  w="0" stretch="3 * sp-width" shrink="0"
2  pen   w="0" p="0"
3  glue  w="sp-width" stretch="- 3 * sp-width" shrink="0"

Explanation:

element 1 is a legal break point, but it is never chosen because 2 is better
    (breaking at 1 discards element 1 and with it the stretch the line needs)
element 2 is a legal break point: if it is chosen, glue 3 is discarded
    and no space is reserved at the start of the next line. The stretch of
    element 1 stays at the end of the line, which makes the margin ragged.
element 3 is NOT a legal break because of the preceding penalty. If there is
    no break, the sequence is equivalent to a fixed-width glue of "sp-width"
    because the stretch values of elements 1 and 3 cancel each other.
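The explanation above can be checked with a small hypothetical model (a minimal sketch, not FOP code, re-declared so the block runs on its own). `Box`, `Glue`, `Pen`, `legal`, `totals` and `split` are illustrative names; `INF` = 1000, `SP` = 6.0 and the word boxes of width 10 and 20 are arbitrary assumptions. `split` applies Knuth's discard rules: elements before the break stay on the line, a chosen penalty adds its own width, and glue and penalties at the start of the next line are dropped up to the first box. The leading word box makes the list indices match the element numbers above.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; these are not FOP's classes.
INF = 1000  # assumption: TeX convention, a penalty >= INF never allows a break

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Pen:
    w: float
    p: float
    flagged: bool = False

def legal(seq, i):
    """Knuth's rule: a penalty below INF, or a glue directly after a box."""
    e = seq[i]
    if isinstance(e, Pen):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

def totals(items):
    """(width, stretch, shrink); a penalty that is not the chosen break adds nothing."""
    w = st = sh = 0.0
    for e in items:
        if not isinstance(e, Pen):
            w += e.w
        if isinstance(e, Glue):
            st += e.stretch
            sh += e.shrink
    return (w, st, sh)

def split(seq, b):
    """Totals before and after a break at index b, using Knuth's discard rules."""
    w, st, sh = totals(seq[:b])
    if isinstance(seq[b], Pen):
        w += seq[b].w  # a chosen penalty adds its width to the line end
    rest = seq[b + 1:]
    while rest and not isinstance(rest[0], Box):
        rest = rest[1:]  # glue and penalties at the start of the next line are dropped
    return (w, st, sh), totals(rest)

SP = 6.0  # hypothetical nominal space width (sp-width)
ragged = [Box(10.0), Glue(0, 3 * SP), Pen(0, 0), Glue(SP, -3 * SP), Box(20.0)]

print([legal(ragged, i) for i in range(len(ragged))])  # only elements 1 and 2 are legal
print(totals(ragged[1:4]))  # no break: a fixed space of SP with zero net stretch
print(split(ragged, 2))     # break at 2: stretch 3*SP stays at the line end
print(split(ragged, 1))     # break at 1: this sequence gives the line no stretch
```

Breaking at element 1 leaves the line without the stretch contributed by this sequence, which is why element 2 always wins.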

Breaks in centered text

For center alignment (text-align="center") a constant stretch is added on both sides of the break:

1  glue  w="0" stretch="3 * sp-width" shrink="0"
2  pen   w="0" p="0"
3  glue  w="sp-width" stretch="- 6 * sp-width" shrink="0"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="0" stretch="3 * sp-width" shrink="0"

Explanation:

element 1 is a legal break point, but it is never chosen because 2 is better
element 2 is a legal break point: if it is chosen, glue 3 is discarded
    and an equal amount of stretch is reserved at the end of the line (element 1)
    and at the beginning of the next line (element 6)
element 3 is NOT a legal break because of the preceding penalty. If there is
    no break, the sequence is equivalent to a fixed-width glue of "sp-width"
    because the stretch values of elements 1, 3 and 6 cancel each other.
element 4 prevents element 6 from being discarded if element 2 is chosen as a break
element 5 is NOT a legal break because its penalty value is infinite
element 6 is NOT a legal break because of the preceding penalty
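The same hypothetical model (a minimal sketch, not FOP code; `Box`, `Glue`, `Pen`, `legal`, `totals`, `split`, `INF` = 1000 and `SP` = 6.0 are illustrative assumptions) confirms the role of each element in the centered sequence. The last lines remove the zero-width box (element 4) to show why it is needed.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; these are not FOP's classes.
INF = 1000  # assumption: TeX convention, a penalty >= INF never allows a break

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Pen:
    w: float
    p: float
    flagged: bool = False

def legal(seq, i):
    """Knuth's rule: a penalty below INF, or a glue directly after a box."""
    e = seq[i]
    if isinstance(e, Pen):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

def totals(items):
    """(width, stretch, shrink); a penalty that is not the chosen break adds nothing."""
    w = st = sh = 0.0
    for e in items:
        if not isinstance(e, Pen):
            w += e.w
        if isinstance(e, Glue):
            st += e.stretch
            sh += e.shrink
    return (w, st, sh)

def split(seq, b):
    """Totals before and after a break at index b, using Knuth's discard rules."""
    w, st, sh = totals(seq[:b])
    if isinstance(seq[b], Pen):
        w += seq[b].w  # a chosen penalty adds its width to the line end
    rest = seq[b + 1:]
    while rest and not isinstance(rest[0], Box):
        rest = rest[1:]  # glue and penalties at the start of the next line are dropped
    return (w, st, sh), totals(rest)

SP = 6.0  # hypothetical nominal space width (sp-width)
centered = [Box(10.0),                                      # preceding word
            Glue(0, 3 * SP), Pen(0, 0), Glue(SP, -6 * SP),  # elements 1-3
            Box(0), Pen(0, INF), Glue(0, 3 * SP),           # elements 4-6
            Box(20.0)]                                      # following word

print([i for i in range(len(centered)) if legal(centered, i)])  # [1, 2]
print(totals(centered[1:7]))  # no break: (6.0, 0.0, 0.0)
print(split(centered, 2))     # 3*SP stretch on both sides of the break

# Without the zero-width box (element 4), glue 6 is discarded after a break.
no_box = centered[:4] + centered[5:]
print(split(no_box, 2)[1])    # the next line start loses its stretch
```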

Space/Border/Padding around a break

A common occurrence at a break is the presence of space/border/padding on one or both sides of the break. The generic Knuth sequence for this situation is very similar to the one for centered text above:

1  glue  w="spb-end"
2  pen   w="0" p="0"
3  glue  w="- (spb-end + spb-start)"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start"

Explanation:

element 1 is a legal break point, but it is never chosen because 2 is better
element 2 is a legal break point: if it is chosen, the ending line
    reserves a width of spb-end for space, border and padding, and the next line
    reserves a width of spb-start (glue 3 is discarded)
element 3 is NOT a legal break because of the preceding penalty
element 4 prevents element 6 from being discarded if element 2 is chosen as a break
element 5 is NOT a legal break because its penalty value is infinite
element 6 is NOT a legal break because of the preceding penalty
if there is no break, the overall width is spb-end + (-(spb-end + spb-start)) + spb-start = 0
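A minimal sketch with the same hypothetical model (not FOP code; `Box`, `Glue`, `Pen`, `totals`, `split` and `INF` = 1000 are illustrative) makes the width bookkeeping concrete. The values `SPB_END` = 4.0 and `SPB_START` = 5.0 are arbitrary assumptions. Without a break the sequence adds nothing; with a break the ending line gains spb-end (10 + 4) and the next line starts with spb-start (5 + 20).

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; these are not FOP's classes.
INF = 1000  # assumption: TeX convention, a penalty >= INF never allows a break

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Pen:
    w: float
    p: float
    flagged: bool = False

def totals(items):
    """(width, stretch, shrink); a penalty that is not the chosen break adds nothing."""
    w = st = sh = 0.0
    for e in items:
        if not isinstance(e, Pen):
            w += e.w
        if isinstance(e, Glue):
            st += e.stretch
            sh += e.shrink
    return (w, st, sh)

def split(seq, b):
    """Totals before and after a break at index b, using Knuth's discard rules."""
    w, st, sh = totals(seq[:b])
    if isinstance(seq[b], Pen):
        w += seq[b].w  # a chosen penalty adds its width to the line end
    rest = seq[b + 1:]
    while rest and not isinstance(rest[0], Box):
        rest = rest[1:]  # glue and penalties at the start of the next line are dropped
    return (w, st, sh), totals(rest)

SPB_END, SPB_START = 4.0, 5.0  # hypothetical spb-end and spb-start
sbp = [Box(10.0),
       Glue(SPB_END), Pen(0, 0), Glue(-(SPB_END + SPB_START)),
       Box(0), Pen(0, INF), Glue(SPB_START),
       Box(20.0)]

print(totals(sbp[1:7]))  # no break: (0.0, 0.0, 0.0)
print(split(sbp, 2))     # ((14.0, 0.0, 0.0), (25.0, 0.0, 0.0))
```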

Space/Border/Padding combined with Alignments

These sequences combine the space/border/padding sequence with alignment sequences.

Space/Border/Padding with justified alignment

1  glue  w="spb-end" stretch="0" shrink="0"
2  pen   w="0" p="0"
3  glue  w="sp-width -(spb-end + spb-start)" stretch="sp-width/2" shrink="sp-width/3"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start"

Space/Border/Padding with Left/Right alignment

1  glue  w="spb-end" stretch="3 * sp-width" shrink="0"
2  pen   w="0" p="0"
3  glue  w="sp-width -(spb-end + spb-start)" stretch="- 3 * sp-width" shrink="0"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start"

Space/Border/Padding with Center alignment

1  glue  w="spb-end" stretch="3 * sp-width" shrink="0"
2  pen   w="0" p="0"
3  glue  w="sp-width -(spb-end + spb-start)" stretch="- 6 * sp-width" shrink="0"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start" stretch="3 * sp-width" shrink="0"
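The three combined sequences share the same shape and differ only in their stretch and shrink values, so one function can produce all of them. The sketch below is hypothetical (not FOP's API): `sbp_break` and `wrap` are illustrative helpers on top of the same assumed `Box`/`Glue`/`Pen` model, with `SP` = 6.0, spb-end = 4.0 and spb-start = 5.0 chosen arbitrarily. In every case the unbroken width is sp-width; only justified text keeps a net stretch and shrink.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; these are not FOP's classes.
INF = 1000  # assumption: TeX convention, a penalty >= INF never allows a break

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Pen:
    w: float
    p: float
    flagged: bool = False

def totals(items):
    """(width, stretch, shrink); a penalty that is not the chosen break adds nothing."""
    w = st = sh = 0.0
    for e in items:
        if not isinstance(e, Pen):
            w += e.w
        if isinstance(e, Glue):
            st += e.stretch
            sh += e.shrink
    return (w, st, sh)

def split(seq, b):
    """Totals before and after a break at index b, using Knuth's discard rules."""
    w, st, sh = totals(seq[:b])
    if isinstance(seq[b], Pen):
        w += seq[b].w  # a chosen penalty adds its width to the line end
    rest = seq[b + 1:]
    while rest and not isinstance(rest[0], Box):
        rest = rest[1:]  # glue and penalties at the start of the next line are dropped
    return (w, st, sh), totals(rest)

SP, SPB_END, SPB_START = 6.0, 4.0, 5.0  # hypothetical widths

def sbp_break(align, sp=SP, spb_end=SPB_END, spb_start=SPB_START):
    """Hypothetical helper: the six elements for a break at a space with
    space/border/padding, following the three tables above."""
    if align == "justify":
        end_st, mid_st, mid_sh, start_st = 0.0, sp / 2, sp / 3, 0.0
    elif align in ("left", "right"):
        end_st, mid_st, mid_sh, start_st = 3 * sp, -3 * sp, 0.0, 0.0
    elif align == "center":
        end_st, mid_st, mid_sh, start_st = 3 * sp, -6 * sp, 0.0, 3 * sp
    else:
        raise ValueError(f"unsupported text-align: {align}")
    return [Glue(spb_end, end_st),
            Pen(0, 0),
            Glue(sp - (spb_end + spb_start), mid_st, mid_sh),
            Box(0),
            Pen(0, INF),
            Glue(spb_start, start_st)]

def wrap(align):
    """Place the sequence between two hypothetical word boxes."""
    return [Box(10.0), *sbp_break(align), Box(20.0)]

for align in ("justify", "left", "center"):
    print(align, totals(wrap(align)[1:7]), split(wrap(align), 2))
```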

Specific Knuth sequences

The following cases have been identified:

1. Non breaking / non elastic
Example: U+202F NARROW NO-BREAK SPACE

This is the normal character case, but it can also include some characters that Unicode classifies as spaces. A consecutive sequence of non breaking / non elastic characters with the same properties is mapped to a single Knuth box element with the combined width of all the characters. It is important to aggregate them rather than generate one box element per character, so that kerning between adjacent characters can be taken into account.

1  box w="<width of sequence>"

(warning) These box elements are not related to the identification of words in the text required by the hyphenation subsystem.

For example:

<fo:inline font-size="2em">B</fo:inline>argain

would generate:

1  box   w="width of 'B'"
2  box   w="width of 'argain'"

However, the hyphenation subsystem would still need to be given the whole word: Bargain.
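The aggregation rule and the example above can be sketched as follows. This is a hypothetical illustration, not FOP's text layout code: "same properties" is simplified to "same font size", and the glyph advances and the single kerning pair in `ADVANCE` and `KERN` are invented values in 1/1000 em. It shows that measuring a whole run lets the kerning pair apply, while per-character boxes lose it, and that the word for hyphenation has to be reassembled across boxes.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical font metrics in 1/1000 em; real values come from the font.
ADVANCE = {"B": 600, "a": 500, "r": 400, "g": 500, "i": 250, "n": 500}
KERN = {"ar": -20}  # hypothetical kerning pair

@dataclass
class Box:
    w: float
    text: str  # the characters covered, kept only for illustration

def measure(text, size):
    """Run width: glyph advances plus kerning between adjacent pairs, scaled."""
    w = sum(ADVANCE[c] for c in text)
    w += sum(KERN.get(text[i:i + 2], 0) for i in range(len(text) - 1))
    return w * size

def boxes(chars):
    """chars: (character, font_size) pairs. One box per run of equal properties."""
    result = []
    for size, run in groupby(chars, key=lambda c: c[1]):
        text = "".join(c for c, _ in run)
        result.append(Box(measure(text, size), text))
    return result

# <fo:inline font-size="2em">B</fo:inline>argain
chars = [("B", 2)] + [(c, 1) for c in "argain"]
bx = boxes(chars)
print([(b.text, b.w) for b in bx])           # [('B', 1200), ('argain', 2630)]
print(sum(measure(c, 1) for c in "argain"))  # 2650: per-character boxes lose the kerning
print("".join(b.text for b in bx))           # word for hyphenation: Bargain
```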

2. Non breaking / elastic space
Example: U+00A0 Non breaking space

For this character class the Knuth elements must prevent a break from being generated, while the character still participates in text justification.
(warning) Whether a character falls into this class depends on the combination of the treat-as-word-space property and its Unicode value.

The Knuth sequence for text-align not equal to "justify":

1  pen   w="0" p="INF"
2  glue  w="sp-width" stretch="0" shrink="0"

and for text-align="justify":

1  pen w="0" p="INF"
2  glue w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"

(warning) The width, stretch and shrink values above do depend on the word-spacing property.
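The ordering matters: because the penalty comes first, the glue is never directly preceded by a box, so neither element is a legal break. A minimal sketch with the hypothetical `Box`/`Glue`/`Pen` model (not FOP code; `nbsp`, `INF` = 1000 and `SP` = 6.0 are illustrative assumptions) checks both variants.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; these are not FOP's classes.
INF = 1000  # assumption: TeX convention, a penalty >= INF never allows a break

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Pen:
    w: float
    p: float
    flagged: bool = False

def legal(seq, i):
    """Knuth's rule: a penalty below INF, or a glue directly after a box."""
    e = seq[i]
    if isinstance(e, Pen):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

def totals(items):
    """(width, stretch, shrink); a penalty that is not the chosen break adds nothing."""
    w = st = sh = 0.0
    for e in items:
        if not isinstance(e, Pen):
            w += e.w
        if isinstance(e, Glue):
            st += e.stretch
            sh += e.shrink
    return (w, st, sh)

SP = 6.0  # hypothetical nominal space width (sp-width)

def nbsp(justify, sp=SP):
    """Hypothetical helper: sequence for a non-breaking, elastic space."""
    if justify:
        return [Pen(0, INF), Glue(sp, sp / 2, sp / 3)]
    return [Pen(0, INF), Glue(sp, 0.0, 0.0)]

for justify in (False, True):
    seq = [Box(10.0), *nbsp(justify), Box(20.0)]
    # no element is a legal break; only the stretch and shrink differ
    print(justify, any(legal(seq, i) for i in range(len(seq))), totals(seq[1:3]))
```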

3. Break / non elastic

Example: U+200B Zero Width Space

This type covers all break possibilities that don't add, remove or change any characters. However, when a break is generated, border and padding must be taken into account, as must certain text-align values. These sequences are identical to the generic sequences described above.
(warning) In addition a change in width due to kerning may need to be considered.

4. Break / non elastic / add character if break
Example: Hyphenation

In the Knuth model, if something needs to be added to the end of the line when a break is generated, the penalty for that break is given a non-zero width. For hyphens the penalty is also flagged (its flag is set to a non-zero value), which lets the algorithm penalize consecutive hyphenated lines:

1  pen   w="hyp-width" p="FLAGGED"

This can easily be combined with the common sequences for Space/Border/Padding and/or alignment. For example, the Knuth sequence for a break possibility with a hyphen, Space/Border/Padding and text-align="center" would be:

1  glue  w="spb-end" stretch="3 * sp-width" shrink="0"
2  pen   w="hyp-width" p="FLAGGED"
3  glue  w="- (spb-end + spb-start)" stretch="- 6 * sp-width" shrink="0"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start" stretch="3 * sp-width" shrink="0"

(warning) This doesn't cater for changes in spelling or kerning in the presence of hyphenation.
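A minimal sketch with the hypothetical `Box`/`Glue`/`Pen` model (not FOP code) shows that the hyphen costs nothing when no break occurs and appears only at the end of the broken line. All widths are arbitrary assumptions, and the penalty value 50 is borrowed from plain TeX's default `\hyphenpenalty`; the page itself only says FLAGGED.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; these are not FOP's classes.
INF = 1000  # assumption: TeX convention, a penalty >= INF never allows a break

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Pen:
    w: float
    p: float
    flagged: bool = False

def legal(seq, i):
    """Knuth's rule: a penalty below INF, or a glue directly after a box."""
    e = seq[i]
    if isinstance(e, Pen):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

def totals(items):
    """(width, stretch, shrink); a penalty that is not the chosen break adds nothing."""
    w = st = sh = 0.0
    for e in items:
        if not isinstance(e, Pen):
            w += e.w
        if isinstance(e, Glue):
            st += e.stretch
            sh += e.shrink
    return (w, st, sh)

def split(seq, b):
    """Totals before and after a break at index b, using Knuth's discard rules."""
    w, st, sh = totals(seq[:b])
    if isinstance(seq[b], Pen):
        w += seq[b].w  # a chosen penalty adds its width (the hyphen) to the line end
    rest = seq[b + 1:]
    while rest and not isinstance(rest[0], Box):
        rest = rest[1:]  # glue and penalties at the start of the next line are dropped
    return (w, st, sh), totals(rest)

SP, HYP = 6.0, 3.0             # hypothetical sp-width and hyp-width
SPB_END, SPB_START = 4.0, 5.0  # hypothetical spb-end and spb-start
HYPHEN_PENALTY = 50            # assumption: plain TeX's default \hyphenpenalty

hyph = [Box(10.0),
        Glue(SPB_END, 3 * SP),
        Pen(HYP, HYPHEN_PENALTY, flagged=True),
        Glue(-(SPB_END + SPB_START), -6 * SP),
        Box(0), Pen(0, INF), Glue(SPB_START, 3 * SP),
        Box(20.0)]

print(legal(hyph, 2), hyph[2].flagged)  # the hyphen penalty is a flagged legal break
print(totals(hyph[1:7]))  # no break: the sequence adds no width and no stretch
print(split(hyph, 2))     # line end: 10 + spb-end + hyphen; next line: spb-start + 20
```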

5. Break / non elastic / remove if not break
Example: U+00AD Soft hyphen

As these characters have zero width when no break occurs, their Knuth sequences are identical to those of the hyphenation case above.

6. Break / non elastic / removable
Example: U+2000 EN QUAD and other fixed width spaces

The Knuth algorithm discards all glue (and penalty) elements at the beginning of a line, up to the first box, so this sequence does the trick:

1  pen   w="0" p="0"
2  glue  w="char width"

Again, this can be combined with Space/Border/Padding and alignment, as this example for text-align="left" or text-align="right" shows:

1  glue  w="spb-end" stretch="3 * sp-width" shrink="0"
2  pen   w="0" p="0"
3  glue  w="char width - (spb-end + spb-start)" stretch="- 3 * sp-width" shrink="0"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start"

(warning) XSL-FO does not define these characters as removable white space, but under common typesetting conventions, would they be removed at a line break?
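A minimal sketch with the hypothetical `Box`/`Glue`/`Pen` model (not FOP code) checks both sequences for a removable fixed-width space. The EN QUAD width `EN` = 5.0 and the other widths are arbitrary assumptions. The fixed-width glue keeps its width when there is no break and vanishes from the start of the next line when there is one.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; these are not FOP's classes.
INF = 1000  # assumption: TeX convention, a penalty >= INF never allows a break

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Pen:
    w: float
    p: float
    flagged: bool = False

def legal(seq, i):
    """Knuth's rule: a penalty below INF, or a glue directly after a box."""
    e = seq[i]
    if isinstance(e, Pen):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

def totals(items):
    """(width, stretch, shrink); a penalty that is not the chosen break adds nothing."""
    w = st = sh = 0.0
    for e in items:
        if not isinstance(e, Pen):
            w += e.w
        if isinstance(e, Glue):
            st += e.stretch
            sh += e.shrink
    return (w, st, sh)

def split(seq, b):
    """Totals before and after a break at index b, using Knuth's discard rules."""
    w, st, sh = totals(seq[:b])
    if isinstance(seq[b], Pen):
        w += seq[b].w  # a chosen penalty adds its width to the line end
    rest = seq[b + 1:]
    while rest and not isinstance(rest[0], Box):
        rest = rest[1:]  # glue and penalties at the start of the next line are dropped
    return (w, st, sh), totals(rest)

SP, EN = 6.0, 5.0              # hypothetical sp-width and EN QUAD width
SPB_END, SPB_START = 4.0, 5.0  # hypothetical spb-end and spb-start

plain = [Box(10.0), Pen(0, 0), Glue(EN), Box(20.0)]
left = [Box(10.0),
        Glue(SPB_END, 3 * SP), Pen(0, 0), Glue(EN - (SPB_END + SPB_START), -3 * SP),
        Box(0), Pen(0, INF), Glue(SPB_START),
        Box(20.0)]

print([i for i in range(len(plain)) if legal(plain, i)])  # only the penalty
print(totals(plain[1:3]), split(plain, 1))  # the EN QUAD vanishes at a break
print(totals(left[1:7]), split(left, 2))
```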

7. Break / elastic / non removable
Example: U+3000 Ideographic space

This can be handled like a combination of a non breaking space (case 2.) followed by a zero width space (case 3.). For example text-align="justify" with Space/Border/Padding:

1  pen   w="0" p="INF"
2  glue  w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"
3  glue  w="spb-end"
4  pen   w="0" p="0"
5  glue  w="- (spb-end + spb-start)"
6  box   w="0"
7  pen   w="0" p="INF"
8  glue  w="spb-start"

(warning) XSL-FO does not define U+3000 as removable white space, but under common CJK typesetting conventions, would it be removed at a line break?
(warning) The Unicode line breaking algorithm (UAX #14) does not allow a break before a space, as it assumes that spaces are removed from the end of a line. This is not the case here. Do we need to allow for a break before the space?
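A minimal sketch with the hypothetical `Box`/`Glue`/`Pen` model (not FOP code; all widths are arbitrary assumptions) confirms that the only legal break in the ideographic-space sequence is element 4. When the break is taken, the space stays at the end of the line and keeps its stretch and shrink, i.e. it is not removed.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; these are not FOP's classes.
INF = 1000  # assumption: TeX convention, a penalty >= INF never allows a break

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Pen:
    w: float
    p: float
    flagged: bool = False

def legal(seq, i):
    """Knuth's rule: a penalty below INF, or a glue directly after a box."""
    e = seq[i]
    if isinstance(e, Pen):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

def totals(items):
    """(width, stretch, shrink); a penalty that is not the chosen break adds nothing."""
    w = st = sh = 0.0
    for e in items:
        if not isinstance(e, Pen):
            w += e.w
        if isinstance(e, Glue):
            st += e.stretch
            sh += e.shrink
    return (w, st, sh)

def split(seq, b):
    """Totals before and after a break at index b, using Knuth's discard rules."""
    w, st, sh = totals(seq[:b])
    if isinstance(seq[b], Pen):
        w += seq[b].w  # a chosen penalty adds its width to the line end
    rest = seq[b + 1:]
    while rest and not isinstance(rest[0], Box):
        rest = rest[1:]  # glue and penalties at the start of the next line are dropped
    return (w, st, sh), totals(rest)

SP = 6.0                       # hypothetical sp-width of U+3000
SPB_END, SPB_START = 4.0, 5.0  # hypothetical spb-end and spb-start

ideo = [Box(10.0),                              # preceding character
        Pen(0, INF), Glue(SP, SP / 2, SP / 3),  # elements 1-2
        Glue(SPB_END), Pen(0, 0),               # elements 3-4
        Glue(-(SPB_END + SPB_START)), Box(0),   # elements 5-6
        Pen(0, INF), Glue(SPB_START),           # elements 7-8
        Box(20.0)]                              # following character

print([i for i in range(len(ideo)) if legal(ideo, i)])  # only element 4
print(totals(ideo[1:9]))  # no break: one elastic space of SP
print(split(ideo, 4))     # the space stays, still elastic, at the end of the line
```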

8. Break / elastic / removable
Example: U+0020 Space

If white-space-collapse="false" and white-space-treatment="ignore...", there can be a run of spaces that must be removed if a break is generated. Assuming each space generates its own glue element (or at least that there may be multiple glue elements when the spaces cross FO boundaries), the simplest case yields a sequence similar to case 6:

1  glue  w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"
2  glue  w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"
...
n  glue  w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"

Explanation:

element 1 is a legal break point (assuming it follows a box element). If it is chosen
    as a break point, all elements are discarded.
elements 2..n are not legal break points, as only a glue directly after a box element
    constitutes a legal break.

Again this can be combined with the Space/Border/Padding and/or alignment sequences.
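A minimal sketch with the hypothetical `Box`/`Glue`/`Pen` model (not FOP code; `SP` = 6.0 and a run of `N` = 3 spaces are arbitrary assumptions) checks the explanation: only the first glue is a legal break, and breaking there removes the whole run because the remaining glues are discarded at the start of the next line.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; these are not FOP's classes.
INF = 1000  # assumption: TeX convention, a penalty >= INF never allows a break

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Pen:
    w: float
    p: float
    flagged: bool = False

def legal(seq, i):
    """Knuth's rule: a penalty below INF, or a glue directly after a box."""
    e = seq[i]
    if isinstance(e, Pen):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

def totals(items):
    """(width, stretch, shrink); a penalty that is not the chosen break adds nothing."""
    w = st = sh = 0.0
    for e in items:
        if not isinstance(e, Pen):
            w += e.w
        if isinstance(e, Glue):
            st += e.stretch
            sh += e.shrink
    return (w, st, sh)

def split(seq, b):
    """Totals before and after a break at index b, using Knuth's discard rules."""
    w, st, sh = totals(seq[:b])
    if isinstance(seq[b], Pen):
        w += seq[b].w  # a chosen penalty adds its width to the line end
    rest = seq[b + 1:]
    while rest and not isinstance(rest[0], Box):
        rest = rest[1:]  # glue and penalties at the start of the next line are dropped
    return (w, st, sh), totals(rest)

SP = 6.0  # hypothetical nominal space width (sp-width)
N = 3     # hypothetical run of three preserved spaces

spaces = [Box(10.0)] + [Glue(SP, SP / 2, SP / 3) for _ in range(N)] + [Box(20.0)]

print([i for i in range(len(spaces)) if legal(spaces, i)])  # only the first glue
print(totals(spaces[1:N + 1]))  # no break: N elastic spaces
print(split(spaces, 1))         # break: every space in the run is gone
```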
