...

  • Portability Framework
    • The model protos cover all aspects of the portability API and are the source of truth. The proto definitions supersede any design documents. The main design documents are the following:
    • Runner API. Pipeline representation and discussion on primitive/composite transforms and optimizations.

    • Job API. Job submission and management protocol.

    • Fn API. Execution-side control and data protocols and overview.

    • Container contract. Execution-side Docker container invocation and provisioning protocols. See CONTAINERS.md for how to build container images.

    • Cross language. Options and tradeoffs for how to handle various kinds of multi-language/multi-SDK pipelines.
  • Fn API
    • Apache Beam Fn API Overview [doc]
    • Processing a Bundle [doc]
    • Progress [doc]
    • Graphical view of progress [doc]
    • Fn State API and Bundle Processing [doc]
    • Checkpointing and splitting of Beam bundles over the Fn API, with application to SDF [doc]
    • How to send and receive data [doc]
    • Defining and adding SDK Metrics [doc]
    • SDK harness container contract [doc]
    • Structure and Lifting of Combines [doc]
  • SDK X with Runner Y using Runner API [doc]
  • Flink Portable Runner Overview [doc]
  • Launching portable pipeline on Flink Runner [doc]
  • Portability support [table]
  • Portability Prototype [doc]
  • Portable Artifact Staging [doc]
  • Portable Beam on Flink [doc]
  • Portability API: How to Checkpoint and Split Bundles [doc]
  • Portability API: How to Finalize Bundles [doc]
  • Side Input in Universal Reference Runner [doc]
  • Spark Portable Runner Overview [doc]
  • Cross-Language
    • Cross-language Beam Pipelines [doc]
    • Cross-Language Pipelines & Legacy IO [doc]
    • Artifact Staging in Cross-Language Pipelines [doc]
    • Cross-Language Table Provider [s.apache.org/xlang-table-provider]
    • Auto-generating external transform wrappers [doc]
  • Environment Resources and Annotations [doc]

...

  • Beam Python User State and Timer APIs [doc]
  • Python Kafka connector [doc]
  • Python 3 support [doc]
  • Splittable DoFn for Python SDK [doc]
  • Parquet IO for Python SDK [doc]
  • Building Python Wheels [doc]
  • Beam Type Hints for Python 3 [doc]
  • Pandas Dataframe API for Beam [doc]
  • Batched DoFns [doc]
  • PEP 585 Type Hints for Python 3.9+ [doc]
  • The Current State of Beam Python Type Hinting (as of 2.52.0) [doc]
  • Enrichment transform [doc]
  • Dependency Extras [doc]

Go

  • Apache Beam Go SDK design [doc]
  • Go SDK Vanity Import Path [doc] (unimplemented)
    • Needs to be adjusted to account for Go Modules.
  • Go SDK Integration Tests [doc]
  • Design RFC
    • Assumes Beam knowledge, but points out how Go's features informed the SDK design.
  • User Defined Coders + Original Schema Sketch 
  • Splittable DoFns for the Go SDK [doc]
  • Self-Checkpointing SDFs for the Go SDK [doc]
  • Bundle Finalization in the Go SDK [doc]
  • Watermark Estimation in the Go SDK [doc]
  • State and Timers in the Go SDK [doc]
  • Using Generics for Registration [doc]
  • Side Input Window Mapping [doc]
  • MultiMap Side Input Support [doc]
  • One-Pagers:
    • Investigation: Go Expansion Service Auto-Startup for Dev Environments [doc]

...

  • Custom Inference Functions [doc]
  • Model Updates using Side Inputs [doc]
  • RunInference: ML Inference in Beam [doc]
  • beam.MLTransform [doc]
  • Embeddings in MLTransform [doc]
  • TensorFlow Model Handler [doc]
  • Hugging Face Model Handler [doc]
  • Per Key Inference [doc]
  • Load N Model Copies in RunInference [doc]
  • Benchmarking RunInference with Multi-Process Shared Models [doc]

...