Discussion thread

https://lists.apache.org/thread/87z01dpvb3xgsxgf4rccf8p4pck3sgjz

https://lists.apache.org/thread/srohfn3hvrndjkrtf52n5s68jndjvxc4

Vote thread
JIRA

FLINK-11526, FLINK-11529

Release 1.9

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Over the past year, the Chinese community has been building a Chinese Flink website and documentation to help Chinese-speaking users. We host the website at https://flink-china.org, which has received a lot of traffic since it went online. At the same time, we have seen huge demand for Chinese language support.

In order to follow the "Apache Way" and grow the Apache Flink community, we want to contribute the content of https://flink-china.org to Apache Flink and establish a mechanism to support multiple languages for Apache Flink. The contribution contains two parts:

(1) a Chinese translation of the Flink website, i.e. https://flink.apache.org/

(2) a Chinese translation of the Flink documentation, i.e. https://ci.apache.org/projects/flink/flink-docs-master/

Goals

  1. Propose a documentation contribution process that keeps the English and Chinese docs in sync.

  2. Propose a Chinese translation specification and glossary table so that translations are high quality and consistent in style.

  3. Make the framework of flink-web and flink/docs support multiple languages (currently only English and Simplified Chinese).

  4. Translate the content of flink-web and flink/docs into Chinese page by page.

Proposed Changes

1. Documentation Contribution Process Changes

  • Adjust our PR review bot to include two checks: "updated required English documentation" and "opened a JIRA for Chinese documentation translation sync". A PR should not be merged until both checks are approved. The same checklist should also be added to the Review Checklist on the website.
  • Add a "chinese-translation" component to our JIRA, which should be set on the Chinese documentation sync JIRAs opened in the previous step.
  • Add a translation guideline to our documentation contribution page, including guidance for Chinese-speaking developers on how to search for these issues in JIRA.
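
The merge rule for the two checks can be sketched as follows. This is only an illustration of the gating logic, not the review bot's actual implementation; the names `REQUIRED_CHECKS` and `can_merge` are hypothetical.

```python
# Hypothetical sketch: the two check names are quoted from the proposal,
# everything else (names, data types) is an assumption for illustration.
REQUIRED_CHECKS = (
    "updated required English documentation",
    "opened a JIRA for Chinese documentation translation sync",
)


def can_merge(approved_checks):
    """Return True only if every required documentation check is approved."""
    return all(check in approved_checks for check in REQUIRED_CHECKS)
```

A PR with only the English documentation check approved would therefore still be blocked until the translation sync JIRA check is approved as well.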

2. Chinese Translation Specification and Glossary Table

The proposed translation specification covers typesetting, style, terminology translation, and some tips for improving translation quality. Its purpose is to improve translation quality, reduce the reviewers' workload, keep the style consistent, and give readers a better reading experience.

The community should reach a consensus on the translation specification and then follow the specification as much as possible.

The proposed translation specification can be found here: https://docs.google.com/document/d/1zhGyPU4bVJ7mCIdUTsQr9lHLn4vltejhngG_SOV6g1w

I would propose to convert it into a wiki page once it is accepted, and to link it from the documentation contribution page.

3. Multiple Languages Support


We have tried some language plugins for Jekyll, such as jekyll-multiple-languages-plugin and jekyll-multiple-languages. Different plugins may require different file directory structures. After some experiments, I would propose to use jekyll-multiple-languages as the plugin, which is also used by Kylin.

Build Process

The build process is exactly the same as before. You can build the site using Docker or `build_docs.sh -p`.

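URL Address

The English content stays at its original URL. The Chinese content will be built into the /zh subdirectory under the original URL. For example, if the English home page is at https://flink.apache.org/index.html, the Chinese home page is at https://flink.apache.org/zh/index.html.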


File Name and File Directory Structure

  • There are no changes to the existing files.

  • English File Names: keep the same as usual, i.e. $page_name.$ext, for example: index.md

  • Chinese File Names: add .zh after $page_name, i.e. $page_name.zh.$ext, for example: index.zh.md

  • The Chinese file will be built into the /zh subdirectory of the destination directory, for example: content/zh/index.html.

  • The Chinese file and English file should be placed under the same directory.
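
The naming rules above can be sketched as a small path-mapping function. This is a minimal illustration, not the plugin's actual implementation: the function name `build_output_path`, the default destination directory `content`, and the assumption that every page renders to an `.html` file are chosen only to match the examples in this section.

```python
from pathlib import PurePosixPath


def build_output_path(source: str, dest: str = "content") -> str:
    """Map a source page to its assumed build output path.

    English pages ($page_name.$ext) keep their relative location;
    Chinese pages ($page_name.zh.$ext) are placed under the zh/ subdirectory.
    """
    path = PurePosixPath(source)
    stem = path.stem      # "index.zh.md" -> "index.zh", "index.md" -> "index"
    parent = path.parent  # "." for top-level files, dropped when joining

    if stem.endswith(".zh"):
        page_name = stem[: -len(".zh")]
        out = PurePosixPath(dest) / "zh" / parent / f"{page_name}.html"
    else:
        out = PurePosixPath(dest) / parent / f"{stem}.html"
    return str(out)
```

Under these assumptions, `build_output_path("dev/api.zh.md")` gives `content/zh/dev/api.html`, while `build_output_path("dev/api.md")` gives `content/dev/api.html`.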


Here is an example of the new file structure and the build result.

File structure:

.
├── index.md
├── index.zh.md
├── usecases.md
├── usecases.zh.md
└── dev
    ├── api.md
    └── api.zh.md

Build result:

.
└── content
    ├── index.html
    ├── usecases.html
    ├── dev
    │   └── api.html
    └── zh
        ├── index.html
        ├── usecases.html
        └── dev
            └── api.html


Compatibility, Deprecation, and Migration Plan

The build process and the existing markdown files are not modified. The translation files are purely additions. So the change is compatible and does not alter current behavior.

Implementation Plan

The implementation contains two parts: the Chinese version of flink-web and the Chinese version of flink/docs. flink-web is much smaller than the docs, and the smaller scope might help us find tricks that ease the integration of the documentation. So we propose to support a Chinese version of flink-web (flink.apache.org) first.

flink-web:

  1. support multiple languages in the framework and create all the needed `xx.zh.md` files
  2. translate the Home page into Chinese
  3. translate the "Use Cases" page into Chinese
  4. ...

flink/docs:

  1. support multiple languages in the framework and create all the needed `xx.zh.md` files
  2. add a PR checklist to the review bot and the website page
  3. add a translation guideline to the website page
  4. translate the "Dataflow Programming Model" page into Chinese
  5. translate the "DataStream API Tutorial" page into Chinese
  6. ...
  7. [last] add a link between the English version and the Chinese version
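
Creating all the needed `xx.zh.md` files requires knowing which English pages still lack a Chinese counterpart. A minimal sketch of such a check, assuming the `$page_name.zh.$ext` naming rule from this proposal, could look like the following. The function name `missing_translations` is hypothetical, and a real script would probably also need to skip build output or vendored directories.

```python
from pathlib import Path
from typing import List


def missing_translations(root: str, ext: str = ".md") -> List[str]:
    """Return English pages under `root` that have no `.zh` sibling file."""
    missing = []
    for page in sorted(Path(root).rglob(f"*{ext}")):
        if page.name.endswith(f".zh{ext}"):
            continue  # this file is itself a Chinese page
        # Per the naming rule, the translation lives next to the original.
        sibling = page.with_name(f"{page.stem}.zh{ext}")
        if not sibling.exists():
            missing.append(page.relative_to(root).as_posix())
    return missing
```

Running it against the repository root returns the relative paths of pages that still need a translation file, which could serve as a to-do list for the page-by-page translation steps.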


Rejected Alternatives

Use Docusaurus/Crowdin to support localization

# What are Docusaurus and Crowdin?
Docusaurus is a documentation framework that supports document versioning and localization (via Crowdin). IMO, Docusaurus is similar to Jekyll.
Crowdin is a localization management platform. Users can upload content (e.g. markdown source files) to Crowdin and then translate, collaborate, and manage the process there.
The English content is kept in the original repository, and the translated content for other languages is kept in Crowdin. We need to download the translated content from Crowdin and build it into the localized website.
Apache Pulsar uses Docusaurus for its website and documentation.
Here is the Pulsar project on Crowdin: https://crowdin.com/project/apache-pulsar/zh-CN#
And here is a test project I created for Flink: https://crowdin.com/project/flink-test/zh-CN#

# How can Flink fit into them?
I'm afraid Flink is hard to fit into Docusaurus unless we rework our website and docs from Jekyll to Docusaurus.

How about Jekyll + Crowdin?
We need a build job to make it work. The build job is triggered when a commit is merged into master.
The build job does the following things:
1) upload the latest content (English markdown source files) from git to Crowdin.
    - If the source content has changed, the corresponding translation is lost and needs re-translation.
2) download the translated content
3) build the website and publish it
 
But it seems that Crowdin doesn't fit well with Jekyll.
Crowdin breaks content into multiple lines for translation according to its own syntax rules. This breaks the layout.
For example, the translated metric page is not rendered as expected:
this is the original `metric.md`: https://user-images.githubusercontent.com/5378924/52795366-9a22f080-30ac-11e9-9cfd-4de82041aa77.png
this is the file downloaded from Crowdin: https://user-images.githubusercontent.com/5378924/52795379-9f803b00-30ac-11e9-9700-0a4077b5882d.png
this is the page after rendering: https://user-images.githubusercontent.com/5378924/52795389-a5761c00-30ac-11e9-9821-35c705a8d65b.png

So it seems that Crowdin currently works less well when the markdown contains HTML and Liquid code.

# Conclusion (Docusaurus/Crowdin or current approach)

We stay with the current approach and avoid the effort of porting the documentation. We can reconsider Docusaurus if we want to restructure the documentation completely at some point. If we migrate to Docusaurus in the future, we can find a way to move the translated content to Crowdin to reuse the translation effort.