
Authors: OlegTkachenko (ot), VictorMote (wvm), <add yourself>.

This page is an editable working draft of the Writing modes and Bidi support design.

Goals

The main goal is to achieve robust support for all three main writing modes defined in the XSL Recommendation [6]: lr-tb (e.g. Western writing systems), rl-tb (e.g. Hebrew, Arabic) and tb-rl (e.g. Japanese, Chinese). (I'm not sure about tb-rl actually [ot].) Additional writing modes defined in Appendix A.1 of the Recommendation [9] are out of scope of this design.


Definitions

The writing-mode property [6] represents global cultural traditions for placing information on a medium, and it is the main way in XSL to control the direction in which glyphs, blocks, lines, words, etc. are placed. This is done by deriving the block-progression-direction (BPDir) and inline-progression-direction (IPDir) traits from the writing-mode property. In addition, the IPDir for a sequence of characters may be implicitly determined using the Unicode Character Database (UCD) [10] and the Unicode Bidirectional Algorithm (Bidi) [8].

In most cases that information is enough to process bidirectional text properly, but sometimes the current writing mode and the implicit direction need to be overridden. Such fine-tuning of bidirectional processing is controlled by the direction and unicode-bidi properties of the fo:bidi-override formatting object.

writing-mode

The writing-mode property [6] applies only to the following formatting objects:

  1. fo:simple-page-master
  2. fo:region-*
  3. fo:table
  4. fo:block-container
  5. fo:inline-container

Values: lr-tb (default) | rl-tb | tb-rl | lr (shorthand for lr-tb) | rl (shorthand for rl-tb) | tb (shorthand for tb-rl) | inherit. Each value of writing-mode sets the IPDir, BPDir and shift-direction traits on the reference area.

The mapping of the corresponding properties from absolute to relative directions is as follows:

  • lr-tb: top->before, bottom->after, left->start, right->end
  • rl-tb: top->before, bottom->after, left->end, right->start
  • tb-rl: top->start, bottom->end, left->after, right->before
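
The mapping above follows directly from the IPDir and BPDir that each writing mode sets. The sketch below derives it in Python; the names WRITING_MODES, SHORTHANDS and absolute_to_relative are made up for this illustration and are not part of any existing API.

```python
# Illustrative sketch only: all names here are hypothetical.
# Each writing mode is described by (IPDir, BPDir), written as
# two-letter codes: "lr" = left-to-right, "tb" = top-to-bottom, etc.
WRITING_MODES = {
    "lr-tb": ("lr", "tb"),
    "rl-tb": ("rl", "tb"),
    "tb-rl": ("tb", "rl"),
}
SHORTHANDS = {"lr": "lr-tb", "rl": "rl-tb", "tb": "tb-rl"}
EDGES = {"l": "left", "r": "right", "t": "top", "b": "bottom"}


def absolute_to_relative(writing_mode):
    """Map absolute edges (top, left, ...) to relative ones (before, start, ...)."""
    mode = SHORTHANDS.get(writing_mode, writing_mode)
    ipd, bpd = WRITING_MODES[mode]
    return {
        EDGES[bpd[0]]: "before",  # edge where block progression begins
        EDGES[bpd[1]]: "after",   # edge where block progression ends
        EDGES[ipd[0]]: "start",   # edge where inline progression begins
        EDGES[ipd[1]]: "end",     # edge where inline progression ends
    }


for mode in WRITING_MODES:
    print(mode, absolute_to_relative(mode))
```

Deriving the table from the two progression directions, rather than hard-coding it, keeps the mapping consistent if further writing modes are ever added.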
 

How writing-mode is used:

  1. IPDir and BPDir are used for stacking inline and block areas, respectively.
  2. fo:simple-page-master: placement of the regions on the master.
  3. fo:region-body: stacking of columns and the default flow of text from column to column.
  4. fo:table: layout of rows and columns (BPDir determines the row-stacking direction, and IPDir determines the column-stacking direction and the cell order within a row).

direction

Values: ltr (default) | rtl | inherit.

Use of the direction property [4] in XSL is almost deprecated, except for controlling or overriding the IPDir determined by the current writing mode and the implicit direction determined by the UCD and Bidi. It applies only to the fo:bidi-override formatting object.

The property only has an effect on text in which the orientation of the glyphs is perpendicular to the IPDir. Therefore, for the lr-tb and rl-tb writing modes the direction property affects only non-rotated glyphs, and for tb-rl only rotated glyphs.

unicode-bidi

The unicode-bidi property [5] applies only to the fo:bidi-override formatting object. Values: normal (default) | embed | bidi-override | inherit. Along with the direction property, it opens a new embedding level or creates an override with respect to Bidi; see [8].

How it affects Bidi processing:

  • <bidi-override direction="ltr" unicode-bidi="embed"> foo </bidi-override> -> LRE foo PDF
  • <bidi-override direction="rtl" unicode-bidi="embed"> foo </bidi-override> -> RLE foo PDF
  • <bidi-override direction="ltr" unicode-bidi="bidi-override"> foo </bidi-override> -> LRO foo PDF
  • <bidi-override direction="rtl" unicode-bidi="bidi-override"> foo </bidi-override> -> RLO foo PDF

Where LRE is U+202A, RLE is U+202B, PDF is U+202C, LRO is U+202D and RLO is U+202E.

The "normal" value was designed in CSS to change the directional type of non-textual entities such as images. Because in XSL the property applies only to the fo:bidi-override formatting object, the "normal" value is not used, and all non-textual entities are treated as neutral characters (more specifically, as OBJECT REPLACEMENT CHARACTER (U+FFFC) according to the Bidi algorithm).
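
The correspondence between the direction and unicode-bidi values and the directional formatting codes can be sketched as a small lookup. This is only an illustration of the table above; the function name wrap_with_bidi_codes is an assumption made for this example.

```python
# Illustrative sketch only: wrap_with_bidi_codes is a hypothetical name.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"

# (unicode-bidi, direction) -> opening directional formatting code
OPENING_CODE = {
    ("embed", "ltr"): LRE,
    ("embed", "rtl"): RLE,
    ("bidi-override", "ltr"): LRO,
    ("bidi-override", "rtl"): RLO,
}


def wrap_with_bidi_codes(text, direction="ltr", unicode_bidi="normal"):
    """Return text bracketed by the codes a fo:bidi-override would contribute."""
    opening = OPENING_CODE.get((unicode_bidi, direction))
    if opening is None:
        # unicode-bidi="normal" contributes no formatting codes.
        return text
    return opening + text + PDF


# Show the code points so the invisible characters can be inspected.
wrapped = wrap_with_bidi_codes("foo", direction="rtl", unicode_bidi="embed")
print([f"U+{ord(ch):04X}" for ch in wrapped])
```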

XSL Bidi processing conceptual model

The final phase of refinement uses the Bidi algorithm and the UCD "...to convert the implicit directionality of the text into explicit markup in terms of formatting objects". E.g. the text LLLRRR with ltr IPDir would be translated into LLL<fo:bidi-override direction="rtl" unicode-bidi="bidi-override">RRR</fo:bidi-override>. The Bidi algorithm as defined in [8] requires some adaptations to fit into the XSL processing model. The adopted conceptual model is as follows:

Step 1. Breaking into DTRs

In XSL, the Bidi algorithm is applied to delimited text ranges (DTRs) instead of paragraphs. A DTR is a maximal flattened sequence of characters (FSC) that does not contain any delimiters. An FSC is created by a pre-order traversal of an FO tree fragment down to the fo:character level. During the traversal, every fo:character formatting object adds a character to the sequence, and every fo:bidi-override formatting object whose unicode-bidi property has a value of "embed" or "bidi-override" adds the appropriate directional formatting code (LRE, RLE, LRO or RLO) before its content is traversed and a PDF code after it. The delimiters are: any formatting object that generates block-areas, fo:multi-case, and any text with a glyph orientation that is not perpendicular to the dominant-baseline.

For each DTR, the default bidirectional orientation (paragraph embedding level) is determined according to the IPDir of the nearest ancestor-or-self formatting object that generates a block-area.
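
A minimal sketch of Step 1 in Python, under several assumptions: the Fo class is a toy stand-in for an FO tree node, the DELIMITERS set lists only a few block-level objects, glyph-orientation delimiters are ignored, and a delimiter nested inside a fo:bidi-override is not handled specially. None of these names come from an existing code base.

```python
from dataclasses import dataclass, field

LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENING_CODE = {
    ("embed", "ltr"): LRE,
    ("embed", "rtl"): RLE,
    ("bidi-override", "ltr"): LRO,
    ("bidi-override", "rtl"): RLO,
}

# Assumption: a small, incomplete set of delimiting formatting objects.
DELIMITERS = {"fo:block", "fo:block-container", "fo:table", "fo:multi-case"}


@dataclass
class Fo:
    """Hypothetical, simplified FO tree node."""
    name: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def split_into_dtrs(root):
    """Pre-order traversal that flattens the tree into a list of DTR strings."""
    dtrs, current = [], []

    def close_range():
        if current:
            dtrs.append("".join(current))
            current.clear()

    def visit(node):
        if node.name in DELIMITERS:
            close_range()              # a delimiter ends the running DTR
            for child in node.children:
                visit(child)
            close_range()
        elif node.name == "fo:character":
            current.append(node.props["character"])
        elif node.name == "fo:bidi-override":
            key = (node.props.get("unicode-bidi", "normal"),
                   node.props.get("direction", "ltr"))
            code = OPENING_CODE.get(key)
            if code:
                current.append(code)   # LRE, RLE, LRO or RLO before the content
            for child in node.children:
                visit(child)
            if code:
                current.append(PDF)    # PDF after the content
        else:
            for child in node.children:
                visit(child)

    visit(root)
    close_range()
    return dtrs


def ch(c):
    return Fo("fo:character", {"character": c})


tree = Fo("fo:block", children=[
    ch("a"), ch("b"),
    Fo("fo:bidi-override", {"direction": "rtl", "unicode-bidi": "embed"}, [ch("c")]),
    Fo("fo:block", children=[ch("d")]),
    ch("e"),
])
print([dtr.encode("unicode_escape").decode() for dtr in split_into_dtrs(tree)])
```

In this toy tree the nested fo:block splits the outer block's text into two DTRs, with the inner block's text forming a third one between them.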

Step 2. Resolution of the embedding levels

Each character in the DTR is labeled with a resolved embedding level. rtl text will always end up with an odd level, and ltr and numeric text will always end up with an even level. Then new fo:bidi-override formatting objects with appropriate values of the direction and unicode-bidi properties are inserted into the FO tree fragment that was flattened into the DTR, such that the following constraints are satisfied:

  1. For any character in the DTR, the IPDir matches the resolved embedding level.
  2. Newly inserted fo:bidi-override formatting objects don't break the nesting relationship and retain computed property values.
  3. The minimum number of fo:bidi-override formatting objects is inserted.
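
The insertion of fo:bidi-override objects can be illustrated on flat text. The sketch below assumes the embedding levels have already been resolved (it does not implement the Bidi algorithm itself), ignores any existing FO nesting (constraint 2), and does not escape XML. The function name insert_overrides is hypothetical.

```python
def insert_overrides(text, levels, base_level=0):
    """Wrap each maximal run above base_level in a fo:bidi-override, recursively.

    text   -- characters of a DTR, in logical order
    levels -- precomputed resolved embedding level of each character
    """
    out = []
    i, n = 0, len(text)
    while i < n:
        if levels[i] <= base_level:
            out.append(text[i])        # already matches the surrounding IPDir
            i += 1
            continue
        j = i
        while j < n and levels[j] > base_level:
            j += 1                     # extend the run of deeper levels
        inner_level = base_level + 1
        direction = "rtl" if inner_level % 2 else "ltr"   # odd level means rtl
        inner = insert_overrides(text[i:j], levels[i:j], inner_level)
        out.append(
            f'<fo:bidi-override direction="{direction}" '
            f'unicode-bidi="bidi-override">{inner}</fo:bidi-override>'
        )
        i = j
    return "".join(out)


# The LLLRRR example with ltr IPDir: L characters at level 0, R at level 1.
print(insert_overrides("LLLRRR", [0, 0, 0, 1, 1, 1]))
```

Wrapping maximal runs keeps the number of inserted objects small for flat text. A real implementation would also have to respect the existing element boundaries of the FO tree fragment.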

Step 3. Reordering the text

The final text reordering step is not done during refinement. Instead, the XSL equivalent of reordering is done during formatting. The IPDir of each glyph, which is explicitly determined during the previous step, is used to control the stacking of glyphs.

(wvm+ot) Note that according to the Bidi algorithm, reordering acts on a per-line basis, and the process of breaking a paragraph into lines is outside the scope of the Bidi algorithm. (This effectively means that the line-breaking algorithm acts on text in logical order.) For example, suppose that lower-case letters represent Latin text, upper-case letters represent Hebrew text, and the layout context is lr-tb. The text "abc defg ABCD EFG hijk lmno", requiring a line break between the two Hebrew words, should be rendered as follows:

 abc defg DCBA
 GFE hijk lmno

NOT as:

 abc defg GFE
 DCBA hijk lmno
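
The example above can be reproduced with a small sketch of per-line reordering: within a line, every run of characters at or above a given level is reversed, working from the highest level down to the lowest odd level. The toy_levels helper is a stand-in for real level resolution and only understands the upper-case-is-Hebrew convention used here; both function names are assumptions made for this illustration.

```python
def toy_levels(text):
    """Toy level resolution: upper case is rtl (1), everything else ltr (0).
    A space between two upper-case letters also gets level 1."""
    levels = [1 if c.isupper() else 0 for c in text]
    for i in range(1, len(text) - 1):
        if text[i] == " " and text[i - 1].isupper() and text[i + 1].isupper():
            levels[i] = 1
    return levels


def reorder_line(text, levels):
    """Reorder one line from logical to visual order using its levels."""
    cells = list(zip(text, levels))
    odd = [lvl for lvl in levels if lvl % 2 == 1]
    if not odd:
        return text                    # nothing to reverse on a pure ltr line
    for threshold in range(max(levels), min(odd) - 1, -1):
        i = 0
        while i < len(cells):
            if cells[i][1] < threshold:
                i += 1
                continue
            j = i
            while j < len(cells) and cells[j][1] >= threshold:
                j += 1
            cells[i:j] = cells[i:j][::-1]   # reverse the run at >= threshold
            i = j
    return "".join(c for c, _ in cells)


# Correct: break the logical text into lines first, then reorder each line.
lines = ["abc defg ABCD", "EFG hijk lmno"]
for line in lines:
    print(reorder_line(line, toy_levels(line)))

# Incorrect: reordering the whole paragraph before breaking it into lines
# swaps the two Hebrew words.
whole = "abc defg ABCD EFG hijk lmno"
print(reorder_line(whole, toy_levels(whole)))
```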

Bidi mirroring

When characters are shaped into glyphs, the mirroring process defined in the Bidi algorithm must take place. This takes resolved embedding levels into account: the character needs to be mirrored if glyph-orientation="90" and the embedding level is odd, or if glyph-orientation="-90" and the embedding level is even.


Implementation

Bidi implementation

Requirements

The Bidi implementation must provide an API for the following tasks:

  • Checking whether a text requires bidi processing.
  • Resolution of embedding levels within a text.
  • Reordering objects in a visual order according to their levels.
  • Mirroring of characters.
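
As an illustration of the first task, the sketch below uses Python's unicodedata module to check whether a text contains any character that could force right-to-left ordering. The function name requires_bidi and the choice of Bidi classes are assumptions for this example; it is a conservative heuristic, not a definition taken from the Bidi algorithm.

```python
import unicodedata

# Assumption: a text needs bidi processing if it contains a strong rtl
# character (R, AL) or an explicit rtl embedding, override or isolate.
RTL_CLASSES = {"R", "AL", "RLE", "RLO", "RLI"}


def requires_bidi(text):
    """Return True if any character's Bidi class can introduce rtl ordering."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


print(requires_bidi("abc 123"))            # plain Latin text and digits
print(requires_bidi("abc \u05d0\u05d1"))   # Latin followed by Hebrew letters
```

A cheap check like this lets a formatter skip level resolution and reordering entirely for text that is purely left-to-right.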

Available implementations

First of all, mirroring is actually a rather simple task: there is a predefined character-to-character mapping of about 150 characters that defines bidi mirroring. We can implement it ourselves, or alternatively we can make use of the static <code>mirrorChar(int c)</code> method of the <code>org.apache.batik.gvt.text.BidiAttributedCharacterIterator</code> class.
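
To show how small a do-it-yourself version could be, here is a sketch in Python. The MIRROR_PAIRS table is a hand-picked excerpt for illustration only; a full table would be loaded from the UCD file BidiMirroring.txt. The function name mirror_char is hypothetical.

```python
import unicodedata

# Assumption: a tiny excerpt of the mirroring table, for illustration only.
MIRROR_PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}


def mirror_char(ch):
    """Return the mirrored counterpart of ch, or ch itself if there is none."""
    # unicodedata.mirrored() reports the Bidi_Mirrored property of a character.
    if unicodedata.mirrored(ch):
        return MIRROR_PAIRS.get(ch, ch)
    return ch


print(mirror_char("("), mirror_char("a"))
```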

...

Reference materials

  • [1] The spec, 1.2.5 Internationalization and Writing-Modes -- http://www.w3.org/TR/xsl/slice1.html#section-N1002-Internationalization-and-Writing-Modes
  • [2] The spec, 5.5.3 Writing-mode and Direction Properties (refinement) -- http://www.w3.org/TR/xsl/slice5.html#refine-writing-mode
  • [3] The spec, 7.27 Writing-mode-related Properties -- http://www.w3.org/TR/xsl/slice7.html#writing-mode-related
  • [4] The spec, 7.27.1 "direction" -- http://www.w3.org/TR/xsl/slice7.html#direction
  • [5] The spec, 7.27.6 "unicode-bidi" -- http://www.w3.org/TR/xsl/slice7.html#unicode-bidi
  • [6] The spec, 7.27.7 "writing-mode" -- http://www.w3.org/TR/xsl/slice7.html#writing-mode
  • [7] The spec, 5.8 Unicode BIDI Processing -- http://www.w3.org/TR/xsl/slice5.html#section-N6720-Unicode-BIDI-Processing
  • [8] The Bidirectional Algorithm -- http://www.unicode.org/unicode/reports/tr9/
  • [9] The spec, A.1 Additional "writing-mode" values -- http://www.w3.org/TR/xsl/sliceA.html#writing-mode-add
  • [10] Unicode Character Database -- http://www.unicode.org/Public/UNIDATA/