Status
Current state: Under Discussion
Discussion thread: here
JIRA: here
Motivation
The current Apache Kafka documentation website (https://kafka.apache.org/documentation/) uses raw HTML embedded within the source code. This approach presents several challenges:
- Inefficient Maintenance: Editing raw HTML is cumbersome and mixes content with styling, making updates difficult and increasing the barrier to contribution, especially for developers unfamiliar with HTML/CSS. Testing changes requires deploying to a web server.
- Styling Inconsistencies: The site exhibits inconsistencies in styling, such as inconsistent heading levels across different pages (e.g., some pages start with H2 headings, while others use H4).
- Long Pages: Several pages are excessively long, impacting readability and maintainability.
- Server-Side Dependencies: The website relies on server-side includes (SSI) for dynamic page generation, which introduces potential security vulnerabilities. For example, improper configuration of SSI can lead to information disclosure or cross-site scripting (XSS) attacks. More details on SSI vulnerabilities can be found in resources like OWASP's documentation on Server-Side Includes injection. The site also uses Handlebars.js as a templating engine, further complicating the structure as raw HTML is embedded within <script> tags.
This KIP proposes migrating the Apache Kafka documentation website from its current raw HTML format to Markdown. This change will leverage modern static site generation tools like Hugo and the Docsy theme to improve maintainability, readability, and testability of the documentation, while also enabling richer features and a more consistent user experience.
Public Interfaces
N/A, since there are no changes to the Apache Kafka codebase.
Proposed Changes
Migrate the Apache Kafka documentation website to Markdown and use Hugo (https://gohugo.io/) for static site generation. Leverage the Docsy theme (https://www.docsy.dev/docs/get-started/), which is used by other successful open-source projects like Kubernetes (https://github.com/kubernetes/website/tree/main/content) and Istio (https://github.com/istio/istio.io/tree/master/content). This approach offers several advantages:
- Modern Tooling: Markdown offers a simpler, more readable, and maintainable format compared to raw HTML. Hugo provides a powerful and efficient static site generator with a rich ecosystem of tools and themes.
- Improved Maintainability and Testability: Markdown's simplicity makes it easier to write, edit, and review documentation. Static site generation simplifies testing as the entire site can be built locally without requiring a live web server.
- Richer Features: Using Hugo and Docsy opens up access to a wide range of features, including improved navigation, local search functionality, better mobile responsiveness, and potential integration with other Kafka resources.
- Content Refactoring: This migration provides an opportunity to refactor the content for better organization and readability. As detailed in the automation repository (https://github.com/hvishwanath/ak2md) and the process.yaml file (https://github.com/hvishwanath/ak2md/blob/main/process.yaml), specific refactoring tasks are planned. For example, the current documentation includes several long pages. These will be split into smaller, more manageable sections. Additionally, the information about Kafka Connectors will be reorganized to improve the user experience. The current documentation also has some inconsistencies in the structure of the quickstart guides. This will be standardized as part of the migration to Markdown.
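The page splitting mentioned above could look roughly like the following. This is a minimal sketch and not the logic used by ak2md: it assumes the converted page uses ATX-style headings, that each level-2 heading should become its own file, and that file names are derived from the heading text. The names split_by_h2 and slugify are hypothetical.

```python
import re

# Matches an ATX level-2 heading such as "## Datacenters" (not "###").
H2 = re.compile(r"^## +(.+?)\s*$")


def slugify(title: str) -> str:
    """Lower-case a heading and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def split_by_h2(markdown: str) -> dict[str, str]:
    """Map a file name to the text of each level-2 section.

    Text before the first level-2 heading goes to "_index.md".
    Assumption: headings inside fenced code blocks are not handled.
    """
    sections: dict[str, list[str]] = {"_index.md": []}
    current = "_index.md"
    for line in markdown.splitlines():
        match = H2.match(line)
        if match:
            current = slugify(match.group(1)) + ".md"
            sections.setdefault(current, [])
        sections[current].append(line)
    return {name: "\n".join(lines).strip() + "\n" for name, lines in sections.items()}
```

Keying the result by file name keeps the splitter free of file I/O, so the proposed layout can be reviewed before anything is written to disk.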
Markdown Conversion
Convert all raw html source files to their markdown equivalent.
- Remove the use of SSI: Develop a corresponding Hugo template that allows referencing and rendering HTML files.
- Handle handlebars.js templates: Parse raw HTML content that is written inside <script type="text/x-handlebars-template"/> tags, substituting the right values for template variables during conversion.
- Refactoring: Refactor very long documents into shorter sections for better readability and maintainability. Adjust heading levels to be consistent across the website.
- Handling static assets:
  - Tool generated content under the generated and javadoc folders will not be converted into corresponding Markdown equivalents. Instead, it will be served as static assets and linked into the corresponding locations in the Markdown documents.
  - Other assets such as applicable images, logos, CSS, and JS files will be served as static assets as well.
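The Handlebars step above can be illustrated with a short sketch. It is not the ak2md implementation: it assumes templates only use simple {{name}} placeholders (no helpers or block expressions), and the function name render_templates and the variable name fullDotVersion in the usage note are illustrative assumptions.

```python
import re

# Captures the body of <script type="text/x-handlebars-template"> ... </script>.
TEMPLATE = re.compile(
    r'<script[^>]*type="text/x-handlebars-template"[^>]*>(.*?)</script>',
    re.DOTALL,
)
# Matches simple {{name}} or {{ name }} placeholders only.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render_templates(html: str, values: dict[str, str]) -> str:
    """Return the template bodies found in `html` with known placeholders filled in.

    Unknown placeholders are left untouched so they can be spotted in review.
    """
    def fill(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))

    bodies = TEMPLATE.findall(html)
    return "\n".join(PLACEHOLDER.sub(fill, body).strip() for body in bodies)
```

For example, calling render_templates(page_html, {"fullDotVersion": "3.9.0"}) would yield plain HTML that a separate HTML-to-Markdown step can then convert. Leaving unknown placeholders in place, rather than replacing them with an empty string, makes missing values visible in the converted output.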
Styling
Maintain the existing color themes, layout, and design for the most part. Wherever possible, use their latest equivalents.
AK website documentation
Complete a one-time migration of the website repo content to Markdown. After this point, any updates required will follow the existing process:
- For AK version specific documents, updates will be done in the docs directory of the corresponding branch in the core Kafka repo. As part of the build process, the documentation will be copied over to the corresponding location in the website repo.
- For the rest, updates will be done by raising PRs directly on the website repo. These documents will be refactored as mentioned above for better readability and organization.
Sample directory layout:
. (a few sample directories below, other versions will have similar structure)
├── 39
│   ├── _index.md
│   ├── apis
│   ├── configuration
│   ├── design
│   ├── getting-started
│   ├── implementation
│   ├── kafka-connect
│   ├── operations
│   ├── security
│   └── streams
├── _index.md
├── blog
│   ├── _index.md
│   └── releases
├── community
│   ├── _index.md
│   ├── books_and_papers.md
│   ├── committers.md
│   ├── contact.md
│   ├── developer.md
│   ├── downloads.md
│   ├── events.md
│   ├── project_security.md
│   ├── trademark.md
│   └── videos.md
├── search.md
└── testimonials
    └── _index.md
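The build-time copy of version specific documents into the website repo might be sketched as follows. This is an illustration under stated assumptions, not the existing build script: the function names are hypothetical, and the rule that a version maps to a directory by dropping its dots is inferred only from the "39" directory in the sample layout.

```python
import shutil
from pathlib import Path


def version_dir(version: str) -> str:
    """Turn a version such as "3.9" into a directory name such as "39".

    Assumption: directory names are the version with the dots removed.
    """
    return version.replace(".", "")


def sync_docs(kafka_docs: Path, site_content: Path, version: str) -> Path:
    """Copy one branch's Markdown docs into the website content tree."""
    target = site_content / version_dir(version)
    # dirs_exist_ok (Python 3.8+) lets a rebuild overwrite an earlier copy.
    shutil.copytree(kafka_docs, target, dirs_exist_ok=True)
    return target
```

Note that this sketch only adds or overwrites files. Pages deleted from a branch would linger in the target directory unless it is cleared first.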
AK version specific documentation
We store Kafka version specific documentation under the docs folder in the corresponding branch of the core Kafka repo. Convert the HTML source files as specified above and update the docs directory with their Markdown equivalents. We will do this for the following branches:
3.9, 3.8, 3.7, 3.6, 3.5, 3.4, 3.3, 3.2, 3.1, 3.0, 2.8, 2.7, 2.6, 2.5, 2.4, 2.3, 2.2, 2.1, 2.0, 1.1, 1.0, 0.11.0, 0.10.2, 0.10.1, 0.10.0, 0.9.0, 0.8.2, 0.8.1, 0.8, 0.7
Sample directory layout:
docs/
├── _index.md
├── apis
│   ├── _index.md
│   └── api.md
├── configuration
│   ├── _index.md
│   └── configuration.md
├── design
│   ├── _index.md
│   ├── design.md
│   └── protocol.md
├── getting-started
│   ├── _index.md
│   ├── docker.md
│   ├── ecosystem.md
│   ├── introduction.md
│   ├── quickstart.md
│   ├── upgrade.md
│   └── uses.md
├── implementation
│   ├── _index.md
│   ├── distribution.md
│   ├── log.md
│   ├── message-format.md
│   ├── messages.md
│   └── network-layer.md
├── kafka-connect
│   ├── _index.md
│   ├── administration.md
│   ├── connector-development-guide.md
│   ├── overview.md
│   └── user-guide.md
├── operations
│   ├── _index.md
│   ├── basic-kafka-operations.md
│   ├── datacenters.md
│   ├── enter-migration-mode-on-the-brokers.md
│   ├── finalizing-the-migration.md
│   ├── geo-replication-(cross-cluster-data-mirroring).md
│   ├── hardware-and-os.md
│   ├── java-version.md
│   ├── kafka-configuration.md
│   ├── kraft.md
│   ├── limitations.md
│   ├── migrating-brokers-to-kraft.md
│   ├── migration-phases.md
│   ├── monitoring.md
│   ├── multi-tenancy.md
│   ├── preparing-for-migration.md
│   ├── provisioning-the-kraft-controller-quorum.md
│   ├── reverting-to-zookeeper-mode-during-the-migration.md
│   ├── terminology.md
│   ├── tiered-storage.md
│   └── zookeeper.md
├── security
│   ├── _index.md
│   ├── authentication-using-sasl.md
│   ├── authorization-and-acls.md
│   ├── encryption-and-authentication-using-ssl.md
│   ├── incorporating-security-features-in-a-running-cluster.md
│   ├── listener-configuration.md
│   ├── security-overview.md
│   ├── zookeeper-authentication.md
│   └── zookeeper-encryption.md
└── streams
    ├── _index.md
    ├── architecture.md
    ├── core-concepts.md
    ├── developer-guide
    │   ├── _index.md
    │   ├── app-reset-tool.md
    │   ├── config-streams.md
    │   ├── datatypes.md
    │   ├── dsl-api.md
    │   ├── dsl-topology-naming.md
    │   ├── interactive-queries.md
    │   ├── manage-topics.md
    │   ├── memory-mgmt.md
    │   ├── processor-api.md
    │   ├── running-app.md
    │   ├── security.md
    │   ├── testing.md
    │   └── write-streams-app.md
    ├── introduction.md
    ├── quickstart.md
    ├── tutorial.md
    └── upgrade-guide.md
Development Workflow
One of the key advantages of using Hugo is its built-in live reload functionality. During content creation and editing, Hugo automatically detects changes in source files and refreshes the browser, providing immediate visual feedback. As soon as a change is saved, the documentation site in the browser updates, eliminating the need for manual rebuilds and refreshes. This rapid feedback loop lets contributors iterate quickly on their changes and preview the rendered output as they write, which makes writing and refining documentation content considerably more efficient.
While Docsy provides a rich and well-structured theme for technical documentation, it does have some dependencies to be aware of. Docsy relies on Go for an easier setup, and additionally utilizes Node.js libraries such as autoprefixer and postcss for asset processing and styling. Managing these dependencies and ensuring consistent versions across different developer environments can sometimes introduce friction into the workflow.
To address dependency management and guarantee a uniform development experience for all contributors, I propose a Docker-based development workflow. This approach involves providing a pre-configured Docker image that encapsulates all necessary dependencies for Hugo and Docsy, including Go and Node.js with the required libraries. Developers can then mount their local documentation source directory into a container started from this image.
This Dockerized setup offers a few benefits:
- Consistent Development Environment: Docker ensures that every developer works within an identical environment, eliminating "works on my machine" issues caused by differing dependency versions or operating system configurations.
- Simplified Dependency Management: The Docker image pre-installs and manages all dependencies, removing the burden from individual developers to set up and maintain their local environments. This simplifies onboarding for new contributors and reduces potential compatibility problems.
- Live Reload Support within Docker: Hugo's live reload functionality seamlessly integrates with the Dockerized environment. By mounting the local source directory into the container, Hugo running inside Docker can still monitor file changes and trigger browser refreshes on the host machine, preserving the rapid feedback loop essential for efficient development.
An example of this is in the prototype.
Furthermore, since we do not rely on any server-side constructs for serving the website generated by Hugo, what the developer sees during the development lifecycle is what will be available in production.
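The Docker-based workflow described above might be launched with a small helper like the one below. This is a sketch, not the prototype's actual setup: the image name passed in, the /src mount point, and the function name hugo_server_command are assumptions, while 1313 is Hugo's default development server port.

```python
from pathlib import Path


def hugo_server_command(source_dir: Path, image: str, port: int = 1313) -> list[str]:
    """Build a `docker run` command that serves the site with live reload."""
    return [
        "docker", "run", "--rm", "-it",
        # Mount the local checkout so edits on the host are visible to Hugo.
        "-v", f"{source_dir.resolve()}:/src",
        "-w", "/src",
        # Publish Hugo's development server port to the host.
        "-p", f"{port}:{port}",
        image,
        # Bind to all interfaces so the host browser can reach the container.
        "hugo", "server", "--bind", "0.0.0.0", "--port", str(port),
    ]
```

The returned list can be passed to subprocess.run(..., check=True). Building the command as a list, rather than a shell string, avoids quoting problems when the checkout path contains spaces.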
Build and Deployment
Use the Hugo and Docsy toolchain to generate a static HTML website from the Markdown source. Package the website as a Docker container and host it behind the existing website serving infrastructure.
Prototype
To demonstrate the feasibility of this transition, I created a working prototype of the Apache Kafka documentation using Hugo and Docsy.
- Source code for the website: https://github.com/hvishwanath/kafka-site-md. Specifically, the “content/en” directory shows the Markdown source with some refactoring for improved maintainability.
I wrote some automation to help with this: https://github.com/hvishwanath/ak2md
Compatibility, Deprecation, and Migration Plan
- Compatibility: This change is not expected to impact the compatibility of Apache Kafka itself. It only affects the documentation website.
- Deprecation: The existing HTML-based documentation website will be deprecated once the new Markdown-based version is launched.
- Migration Plan:
- Complete the conversion of any remaining HTML documentation to Markdown.
- Thoroughly test the migrated documentation to ensure accuracy and consistency.
- Deploy the new documentation website.
- Update any relevant links or references to the documentation.
Test Plan
Write automation to ensure all HTML source files are converted to Markdown. Perform manual testing to ensure that the new website is complete and correct and provides the required functionality. Run a broken link checker to verify that internal links within the documentation are correct.
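The conversion check above could start from something like this sketch. It assumes each HTML source maps to a Markdown file at the same relative path, which will not hold for pages that are split or reorganized during refactoring, so those would need an explicit mapping. The function name missing_conversions is hypothetical.

```python
from pathlib import Path


def missing_conversions(html_root: Path, md_root: Path) -> list[Path]:
    """List HTML sources (as relative paths) with no Markdown file at the same path."""
    missing = []
    for html_file in sorted(html_root.rglob("*.html")):
        relative = html_file.relative_to(html_root)
        # Assumption: foo/bar.html is converted to foo/bar.md.
        if not (md_root / relative).with_suffix(".md").exists():
            missing.append(relative)
    return missing
```

An empty result means every HTML source has a Markdown counterpart under this one-to-one assumption. It says nothing about whether the converted content is correct, which is what the manual testing covers.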
Rejected Alternatives
Alternative Static Site Generators: While other static site generators exist, Hugo was chosen for its popularity, maturity, and strong community support. The Docsy theme aligns well with the needs of technical documentation.