Proposers
Approvers
- Vinoth Chandar : [APPROVED/REQUESTED_INFO/REJECTED]
- Balaji Varadarajan : [APPROVED/REQUESTED_INFO/REJECTED]
Status
Current state: IN PROGRESS
Discussion thread: here
JIRA: HUDI-504
Released: <Hudi Version>
Abstract
This RFC aims at improving the Hudi web documentation for users and the process of updating docs for developers.
Background
There are a few gaps we observed regarding the docs:
- We only keep one version of the docs, for the latest release, on the asf-site branch. Each version brings new features and improvements, some involving configuration and parameter changes compared to previous versions, so a single version of the docs can confuse users running a previous release of Hudi.
- There are no API docs generated from the code.
- The current process of building, testing, and deploying docs (i.e., the content powering hudi.apache.org) is mostly manual.
Implementation
To address these gaps, the docs need to be restructured to make the process easier. By integrating with Travis CI, we can build the asf-site branch and execute callback scripts automatically.
The diagram below shows the workflow.
Key Points
There are several key points to consider:
- How to integrate with Travis CI
- How to use git commands safely in Travis CI (important)
To use git commands safely without exposing the password, we need to use the form below.
git clone https://${GIT_TOKEN}@github.com/${GIT_REPO}/${GIT_PROJECT}.git
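A minimal sketch of how the clone URL could be assembled and masked before it is logged. The helper names build_clone_url and mask_token are hypothetical; GIT_TOKEN, GIT_REPO and GIT_PROJECT are the variables used in this RFC, and the token is assumed to contain only letters, digits and underscores.

```shell
# Build the authenticated clone URL from its parts.
# $1 = token, $2 = GitHub organization, $3 = project name
build_clone_url() {
  printf 'https://%s@github.com/%s/%s.git\n' "$1" "$2" "$3"
}

# Replace every occurrence of the token with *** before text is printed.
# $1 = token, $2 = text to sanitize
mask_token() {
  printf '%s\n' "$2" | sed "s|$1|***|g"
}
```

In the CI job the token would come from a hidden environment variable, for example `git clone --quiet "$(build_clone_url "$GIT_TOKEN" "$GIT_REPO" "$GIT_PROJECT")"`, and any URL echoed to the build log would go through mask_token first.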
Steps to get the {GIT_TOKEN}
Steps - 01 : Enable two-factor authentication
https://help.github.com/en/github/authenticating-to-github/about-two-factor-authentication
Steps - 02 : Generate personal access tokens
Steps to integrate with travis-ci
Steps - 01 : Activate repository
Steps - 02 : Add {GIT_TOKEN} to environment variables
Steps - 03 : Add .travis.yml file to asf-site branch
language: ruby
rvm:
  - 2.6.3
git:
  clone: false
env:
  global:
    - GIT_USER="CI BOT"
    - GIT_EMAIL="cibot@test.com"
    - GIT_REPO="apache"
    - GIT_PROJECT="incubator-hudi"
    - GIT_BRANCH="asf-site"
    - DOCS_ROOT="`pwd`/${GIT_PROJECT}/docs"
before_install:
  # Quote the values: GIT_USER contains a space
  - git config --global user.name "${GIT_USER}"
  - git config --global user.email "${GIT_EMAIL}"
  - git clone https://${GIT_TOKEN}@github.com/${GIT_REPO}/${GIT_PROJECT}.git
  - cd ${GIT_PROJECT} && git checkout ${GIT_BRANCH}
  - gem install bundler:2.0.2
script:
  - pushd ${DOCS_ROOT}
  - bundle install
  - bundle update --bundler
  - bundle exec jekyll build _config.yml --source . --destination _site
  - popd
after_success:
  - \cp -rf ${DOCS_ROOT}/_site/* test-content
  - git add -A
  - git commit -am "Travis CI build asf-site"
  - git push origin asf-site --force
branches:
  only:
    - asf-site
Steps - 04 : Check whether CI works as expected
End
Hopefully, this will work well.
Test
I tested this with https://github.com/lamber-ken/lamber-ken.github.io: on each commit, it generates the docs and pushes the content to the asf-site branch.
42 Comments
Vinoth Chandar
Main comments here seem to revolve around how content is split and here is my take
My point is, we are going to build scripts to do this anyway. So instead of having to remember what is release docs and what is site docs, this could be simpler?
Y Ethan Guo
Sorry for the late update...
Based on the discussion in the community meeting on Nov 19, the consensus is that:
I've updated the RFC accordingly.
Vinoth Chandar
Y Ethan Guo can we get an update on this work? If you are not actively working on this, lamber-ken, are you interested in contributing here, given you know the site really well now?
lamber-ken
Hi Vinoth Chandar, I'd be glad to work on this RFC. We can refer to the Flink project, which has solved this problem in a simple way.
Flink faced the same issue [1], which is about building the website automatically. So I think we can learn from it.
The solution uses Apache buildbot [2], which can build and deploy snapshots automatically. It seems to need a PMC member or committer to complete the next steps.
[1] https://issues.apache.org/jira/browse/FLINK-1370
[2] https://ci.apache.org/buildbot.html
[3] https://ci.apache.org/projects/flink/flink-docs-master
https://ci.apache.org/projects/flink/flink-docs-release-1.8
https://ci.apache.org/projects/flink/flink-docs-release-1.9
Here is the architecture I have in mind.
Y Ethan Guo
Vinoth Chandar I'm not actively working on this. lamber-ken feel free to take over the tasks regarding this RFC.
lamber-ken
Thanks Y Ethan Guo.
Vinoth Chandar
lamber-ken IIUC buildbot gives an ability to test and deploy the site.. but can we write a test that can bring up a site and test if its working correctly..
We tried getting an account on ci.apache before.. not sure if we had much luck with getting that through.. Balaji Varadarajan do you remember what the issue was?
lamber-ken
Vinoth Chandar Right. IMO, we can test that.
lamber-ken
Hi Vinoth Chandar, from https://ci.apache.org/buildbot.html we can see that buildbot supports testing and deploying the site.
Vinoth Chandar
I think the issue was more around getting an account provisioned for us.. lets wait for balaji to chime in
lamber-ken
I filed an infra jira to ask infra for help. It can be tracked at https://issues.apache.org/jira/browse/INFRA-19775
Vinoth Chandar
lamber-ken can you please update this RFC to accurately reflect the proposed plan? and split up the work into different JIRAs.
There are a few different use cases/problems here.
Would love to understand the big picture and how we plan to solve each ..
lamber-ken
Hi Vinoth Chandar, I'm willing to update the RFC based on my understanding of buildbot; any suggestions are welcome.
Vinoth Chandar
Left some comments above.. Overall, I feel we should try to work off the asf-site branch alone and leave master alone for now..
Vinoth Chandar
> When a contributor submits a code change to the site, a staging site is built so reviewer can test it out.
I am not sure how/if this requirement is met..
> Once PR to the site is merged, the hudi.apache.org site is refreshed automatically.
I think you are hinting that it will be refreshed nightly, which is fine as well.
>Maintaining release specific docs.
I don't understand why you keep mentioning that the site gets deployed to ci.apache.org/projects/hudi/.... Are you saying hudi.apache.org will serve the "latest" (unreleased version), while releases will be served off ci.apache.org? Not following this at all
lamber-ken
> When a contributor submits a code change to the site, a staging site is built so reviewer can test it out.
I think this may not be achievable. For example, if many users submit prs, it may need many staging sites to service
> Try to work off the asf-site branch alone and leave master alone for now..
Agree
> Maintaining release specific docs.
1, https://hudi.apache.org always points to the docs of the latest stable release.
2, When a new hudi version released, we can use script to build site like before does. May this can migrate with GH actions.
3, Once PR to the site is merged, the https://ci.apache.org/projects/hudi/hudi-site-master is refreshed automatically.
Vinoth Chandar
> if many users submit prs, it may need many staging sites to service
lets may be leave this out for now..
> 3, Once PR to the site is merged, the https://ci.apache.org/projects/hudi/hudi-site-master is refreshed automatically.
Would hudi.apache.org refresh automatically as well, every night?
> When a new hudi version released, we can use script to build site like before does. May this can migrate with GH actions.
I am wondering if we just tackle the versioning like we do today.. We can script the release specific docs creation like I did and keep it as just one site..
All we need here is some automation to generate the site and refresh hudi.apache.org regularly right?
lamber-ken
> 3, Once PR to the site is merged, the https://ci.apache.org/projects/hudi/hudi-site-master is refreshed automatically.
The content of hudi.apache.org only be refreshed until next hudi stable version released.
> When a new hudi version released, we can use script to build site like before does. May this can migrate with GH actions.
The https://ci.apache.org/projects/hudi/hudi-site-master is refreshed automatically.
I think there is no need to refresh https://hudi.apache.org, because when users run hudi using the below command, they need the release-0.5.1 version docs. So, they can visit http://hudi.apache.org/docs/quick-start-guide.html
If users build the master branch themselves, they can visit https://ci.apache.org/projects/hudi/hudi-site-master/docs/quick-start-guide.html
Vinoth Chandar
>The content of hudi.apache.org only be refreshed until next hudi stable version released.
got it.. That's how we do it now anyway I guess.. but would this process be automated? How do we make small edits to the live site, e.g. fixing typos and so on?
Still don't clearly understand where the docs for old releases will reside? I would like them to be served off the hudi.apache.org site always.. i.e when we make a new release,
And we need to automate this. Do you agree?
lamber-ken
We should not push the master content to the current site, even if we fix some typos.
The content of the current site is a snapshot for the latest released hudi version.
For example, if someone reports bugs in 0.5.0-incubating, we can't modify the previous hudi jar.
We can fix these bugs in the master branch; users can build the master branch and then use it.
lamber-ken
From my side, using buildbot is a better way to host multi version docs. But I don't have much experience with it; I have only learned about it from its official documentation and can't try it myself to check whether it meets our needs. So we can only test it step by step.
Vinoth Chandar
>From my side, using buildbot is a better way to host multi version docs
We need to really prove this statement with good data points.. I understand Flink serves release docs from ci.apache.org , but Spark, Hadoop, Kafka all don't.. I am leaning towards this model of serving old and latest releases from hudi.apache.org. master or unreleased docs can be staged via build bot .. can we agree on this? So we can start making some progress..
lamber-ken
Agree, let me know if I can help.
lamber-ken
Right, we can also learn from Spark about how they did. Let's try buildbot first.
Vinoth Chandar
Sounds good.. please let me know if I need to step in to unblock you..
But high level. we will be first making the master site publish on ci.apache.org?
lamber-ken
Hi Vinoth Chandar
Right, the first thing for us is making the master site publish on ci.apache.org.
Also, I took some time to study how Spark and Hadoop do it: they both generate docs manually.
HADOOP-SITE
https://github.com/apache/hadoop-site
https://github.com/apache/hadoop-site/tree/asf-site/content/docs
Add 2.10.0 https://github.com/apache/hadoop-site/commit/cefbabb6a0f2615e5f565c8baed1aee437cf3dc7
Add 3.1.2 https://github.com/apache/hadoop-site/commit/ab0f7cdbab46075b9b8e9aa9f48014f8b0e20d0e
Add 3.2.1 https://github.com/apache/hadoop-site/commit/d23deb5f5f40f259e0c133ee1d855bf5f7f5d3ac
SPARK-SITE
https://github.com/apache/spark-website
https://github.com/apache/spark-website/tree/asf-site/site/docs
Add v2.3.4 https://github.com/apache/spark-website/commit/8aa1175f99847d71c6bffe63018c6d9ff4a4dc61
Add v2.4.5 https://github.com/apache/spark-website/commit/4c0c162a2f6677d6c87fcc7907c62fc4660f4073
Add 3.0.0-preview2 https://github.com/apache/spark-website/commit/840b1b1da7cc23b121b5b304848053d8f6f019e2
Add 3.0.0-preview https://github.com/apache/spark-website/commit/564843ab32b2434d03aa8ae8ad732252acb84e1a
Vinoth Chandar
lamber-ken sorry was busy for past few days with other things.. They seem to "promote" a release specific doc to the site ..
I suggest we take the following approach.
This is an incremental approach and allows us to learn how it goes with buildbot, while still solving the biggest problem we have (which is automatically generating master docs).. I'd still prefer to serve released versions of hudi out of hudi.apache.org
Hopefully, this gives you enough clarity to start tackling this
lamber-ken
Hi Vinoth Chandar, you're welcome, no need to say sorry. : )
Goals
1, web-site supports multi version docs
2, generate docs automatically
3, need a staging site for visit after each PR merged
Plan
>> web-site supports multi version docs
1) if we use buildbot, like below
2) if not, I think we may need to use GitHub Actions. IMO, it can build the site and push the build result to the asf-site branch
GitHub Actions
- 1, build docs by script
- 2, execute git commands to push the build result to asf-site branch
>> generate docs automatically
As far as I know, only Flink uses buildbot to build master docs. Spark, Hadoop, etc. don't support this.
IMO, we can do
1, use buildbot to build master docs
2, use GitHub Actions to build release docs
>> need a staging site for visit after each PR merged
If we use buildbot, staging site:
https://ci.apache.org/projects/hudi/hudi-site-master
If we use GA, staging site:
http://hudi.apache.org/docs/master-quick-start-guide.html
Current
I am learning GitHub Actions and trying it with my own project these days, to see if it can build the site and push the result by git command.
If succeed, we may don't need use buildbot.
Vinoth Chandar
>> IMO, we can do
>> 1, use buildbot to build master docs
>> 2, use GitHub Actions to build release docs
This is a good tradeoff..
>> If succeed, we may don't need use buildbot.
+1 In general, actually trying these tools and playing with it, often gives much more clarity.
lamber-ken
Yeah, I updated the RFC content. I tested it with my own project using travis ci. The most important thing is to use git commands safely in the travis ci env.
Next steps require a git token which can access the hudi project, so let the authorized folks continue to work on it. : )
Steps:
Vinoth Chandar
lamber-ken thanks for updating the RFC.. So, the first step here to build the master docs, push directly onto the asf-site branch?
Let's have enough checks in there to ensure only the relevant folders are touched .. is this a new script that we will write as a part of github actions ?
cc Shaofeng Li who can also help you more quickly in your time zone ..
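A minimal sketch of such a check, assuming the generated site lives under a single folder. The function name only_touches and the folder name `content` are hypothetical and not part of the current `.travis.yml`.

```shell
# Succeed only if every path read from stdin lies under the allowed folder.
# $1 = allowed top-level folder
only_touches() {
  prefix="$1"
  while IFS= read -r changed; do
    case "$changed" in
      "$prefix"/*) ;;  # inside the allowed folder, keep going
      *) echo "unexpected change: $changed" >&2; return 1 ;;
    esac
  done
  return 0
}
```

Before committing, the job could run `git diff --cached --name-only | only_touches content` and skip the push when the check fails.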
Vinoth Chandar
lamber-ken could you help structure the tasks in the effort a bit more as sub task jiras under the root one at the top? IIUC so far, we have a solution to automating the doc build, seems like..
lets also think about, what happens if someone accidentally breaks the site.. ? We just rollback the last commit manually?
lamber-ken
Hi Vinoth Chandar
>> who can also help you more quickly in your time zone
Shaofeng Li contacted me yesterday.
>> is this a new script that we will write as a part of github actions ?
No, just use travis ci, https://www.travis-ci.org
>> the first step here to build the master docs, push directly onto the asf-site branch?
Right, I wrote the `.travis.yml` myself and checked it. Travis will use it to build the docs, then push the content to asf-site
>> if someone accidentally breaks the site
1. The git push action is only executed after a successful build
2. We can also roll back the last commit manually
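A manual rollback could be sketched as below. The function name rollback_last_site_commit and the DRY_RUN switch are hypothetical. It uses `git revert`, which adds a new commit instead of rewriting history, so no force push is needed.

```shell
# Revert the most recent commit on the site branch and publish the revert.
# $1 = branch to push (defaults to asf-site)
# With DRY_RUN set, the git commands are printed instead of executed.
rollback_last_site_commit() {
  branch="${1:-asf-site}"
  run="${DRY_RUN:+echo}"
  $run git revert --no-edit HEAD && $run git push origin "$branch"
}
```

Running `DRY_RUN=1 rollback_last_site_commit` first shows the two commands without touching the repository.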
Vinoth Chandar
lamber-ken so the next step is for me to try repeating those steps you did.. correct?
lamber-ken
Correct
Vinoth Chandar
Okay will attempt this and report back..
Vinoth Chandar
lamber-ken Finally got around to trying some stuff.. I have a concern here..
Per https://docs.travis-ci.com/user/job-lifecycle/ `after_success` executes if the `script` step is successful, i.e. exits with a zero exit code..
So if someone makes a change and opens a PR, it will trigger the job, which will check out the git repo (I think this is done by travis anyway. separate topic).. , run the steps in script and just push the site? without even needing the PR to be approved? I think we should use the `deploy` step?
lamber-ken
Usually, the travis only be triggered when the PR be approved(I think it is)
We can control the default behavior, here is the travis doc,
https://docs.travis-ci.com/user/web-ui/#build-pushed-branches
lamber-ken
Right, only push the site
Vinoth Chandar
> Usually, the travis only be triggered when the PR be approved(I think it is)
For non asf-site branch , this is not true. travis is triggered once you submit the PR.
I don't know how to make a custom configuration, that just works for PRs against asf-site branch.. That switch you mention will affect every PR right
lamber-ken
We can use the "branches" config and place `.travis.yml` on the asf-site branch (I did that when I tested).
We can give it a try
lamber-ken
Hi Vinoth Chandar, I found a better way to control whether to push the build result or not: using the $TRAVIS_PULL_REQUEST env variable.
The .travis.yml will then be updated accordingly.
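A minimal sketch of such a guard, not the final `.travis.yml` change. The function name should_publish is hypothetical. Travis CI sets TRAVIS_PULL_REQUEST to "false" for push builds and to the PR number for pull request builds, and TRAVIS_BRANCH holds the branch being built.

```shell
# Succeed only for push builds (not pull requests) on the asf-site branch.
should_publish() {
  [ "${TRAVIS_PULL_REQUEST:-}" = "false" ] && [ "${TRAVIS_BRANCH:-}" = "asf-site" ]
}

# Possible use in the publish step:
#   should_publish && git push origin asf-site
```

With this guard, a PR against asf-site still builds the site for validation, but nothing is pushed until the change is merged.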