Documents by category

Project Incubation (2016)

  • Technical Vision [doc], [slides]
  • Repository Structure [doc]
  • Flink runner: Current status and development roadmap [doc]
  • Spark Runner Technical Vision [doc]
  • PPMC deep dive [slides]

Beam Model

  • Checkpoints [doc]
  • A New DoFn [doc], [slides]
  • Proposed Splittable DoFn API changes [doc]
  • Splittable DoFn (Obsoletes Source API) [doc]
    • Reimplementing Beam API classes on top of Splittable DoFn on top of Source API [doc]
    • New TextIO features based on SDF [doc]
    • Watch transform [doc]
    • Bundles w/ SplittableDoFns [doc]
  • State and Timers for DoFn [doc]
  • ContextFn [doc]
  • Static Display Data [doc]
  • Lateness (and Panes) in Apache Beam [doc]
  • Triggers in Apache Beam [doc]
  • Triggering is for sinks [doc] (not implemented)
  • Guard against “Trigger Finishing” [doc]
  • Pipeline Drain [doc]
  • Pipelines Considered Harmful [doc]
  • Side-Channel Inputs [doc]
  • Dynamic Pipeline Options [doc]
  • SDK Support for Reading Dynamic PipelineOptions [doc]
  • Fine-grained Resource Configuration in Beam [doc]
  • External Join with KV Stores [doc]
  • Error Reporting Callback (WIP) [doc]
  • Snapshotting and Updating Beam Pipelines [doc]
  • Requiring PTransform to set a coder on its resulting collections [mail]
  • Support of @RequiresStableInput annotation [doc], [mail]
  • [PROPOSAL] @onwindowexpiration [mail]
  • AutoValue Coding and Row Support [doc]
  • HyperLogLog++ Integration with Apache Beam [doc]
  • Retractions [doc]
  • @RequiresTimeSortedInput annotation for stateful DoFns [doc]

IO / Filesystem

  • IOChannelFactory Redesign [doc]
  • Configurable BeamFileSystem [doc]
  • New API for writing files in Beam [doc]
  • Dynamic file-based sinks [doc]
  • Event Time and Watermarks in KafkaIO [doc]
  • Exactly-once Kafka sink [doc]
  • Beam GCP Debuggability Metrics [doc]

Metrics

  • Defining and Adding SDK Metrics via FN API [doc]
  • Histogram Style Metrics [doc]
  • Get Metrics API: Metric Extraction via proto RPC API. [doc]
  • Metrics API [doc]
  • I/O Metrics [doc]
  • Metrics extraction independent from runners / execution engines [doc]
  • Watermark Metrics [doc]
  • Support Dropwizard Metrics in Beam [doc]
  • Beam GCP Debuggability Metrics [doc]

Runners

  • Runner Authoring Guide [doc] (obsoletes [doc] and [doc])
  • Composite PInputs, POutputs, and the Runner API [doc]
  • Side Input Architecture for Apache Beam [doc]
  • Runner supported features plugin [doc]
  • Structured streaming Spark Runner [doc]

SQL / Schema

  • Streams and Tables [doc]
  • Streaming SQL [doc]
  • Schema-Aware PCollections [doc]
  • Pubsub to Beam SQL [doc]
  • Apache Beam Proposal: design of DSL SQL interface [doc]
  • Calcite/Beam SQL Windowing [doc]
  • Reject Unsupported Windowing Strategies in JOIN [doc]
  • Beam DSL_SQL branch API review [doc]
  • Complex Types Support for Beam SQL DDL [mail]
  • [SQL] Reject unsupported inputs to Joins [mail]
  • Integrating runners & IO [doc]
  • Beam SQL Pipeline Options [doc]
  • Unbounded limit [doc]
  • Portable Beam Schemas [doc]
  • Cost Based Optimizer [doc1, doc2]
  • ZetaSQL as a dialect in BeamSQL [doc]
  • Project and predicate push-down [doc]

Portability

  • Portability Framework
    • The model protos contain all aspects of the portability API and are the source of truth. The proto definitions supersede any design documents. The main design documents are the following:
    • Runner API. Pipeline representation and discussion on primitive/composite transforms and optimizations.
    • Job API. Job submission and management protocol.
    • Fn API. Execution-side control and data protocols and overview.
    • Container contract. Execution-side docker container invocation and provisioning protocols. See CONTAINERS.md for how to build container images.
    • Cross language. Options and tradeoffs for how to handle various kinds of multi-language/multi-SDK pipelines.
  • Fn API
    • Apache Beam Fn API Overview [doc]
    • Processing a Bundle [doc]
    • Progress [doc]
    • Graphical view of progress [doc]
    • Fn State API and Bundle Processing [doc]
    • Checkpointing and splitting of Beam bundles over the Fn API, with application to SDF [doc]
    • How to send and receive data [doc]
    • Defining and adding SDK Metrics [doc]
    • SDK harness container contract [doc]
    • Structure and Lifting of Combines [doc]
  • SDK X with Runner Y using Runner API [doc]
  • Flink Portable Runner Overview [doc]
  • Launching portable pipeline on Flink Runner [doc]
  • Portability support [table]
  • Portability Prototype [doc]
  • Portable Artifact Staging [doc]
  • Portable Beam on Flink [doc]
  • Portability API: How to Checkpoint and Split Bundles [doc]
  • Portability API: How to Finalize Bundles [doc]
  • Side Input in Universal Reference Runner [doc]
  • Spark Portable Runner Overview [doc]
  • Cross-Language

Build / Testing

  • More Expressive PAsserts [doc]
  • Mergebot design document [doc]
  • Performance tests for commonly used file-based I/O PTransforms [doc]
  • Performance tests results analysis and basic regression detection [doc]
  • Eventual PAssert [doc]
  • Testing I/O Transforms in Apache Beam [doc]
  • Reproducible Environment for Jenkins Tests By Using Container [doc]
  • Keeping precommit times fast [doc]
  • Increase Beam post-commit tests stability [doc]
  • Beam-Site Automation Reliability [doc]
  • Managing outdated dependencies [doc]
  • Automation For Beam Dependency Check [doc]
  • Test performance of core Apache Beam operations [doc]
  • Add static code analysis quality gates to Beam [doc]
  • Portable batch & streaming load tests in all sdks [doc]
  • Storing, displaying and detecting anomalies in test results using Prometheus and Grafana [doc]
  • Storing, displaying and detecting anomalies in test results (corrected version of the previous proposal) [doc]

Deployment

  • Beam on Flink on Kubernetes [doc]

Python

  • Beam Python User State and Timer APIs [doc]
  • Python Kafka connector [doc]
  • Python 3 support [doc]
  • Splittable DoFn for Python SDK [doc]
  • Parquet IO for Python SDK [doc]
  • Building Python Wheels [doc]
  • Beam Type Hints for Python 3 [doc]

Go

Other

  • Euphoria - High-Level Java 8 DSL [doc]
  • Apache Beam Code Review Guide [doc]
  • Nexmark - Nexmark

Some of the documents are available on this Google Drive.

To add a new design document, it is recommended to use this design document template.
