...
[GSOC] [SkyWalking] Self-Observability of the query subsystem in BanyanDB
Background
SkyWalking BanyanDB is an observability database that aims to ingest, analyze, and store Metrics, Tracing, and Logging data.
Objectives
- Support EXPLAIN[1] for both measure query and stream query
- Add self-observability including trace and metrics for query subsystem
- Support EXPLAIN in the client SDK & CLI and add query plan visualization in the UI
[1]: EXPLAIN in MySQL
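For context, EXPLAIN in MySQL reports the execution plan a query would use without actually running it; an EXPLAIN for BanyanDB's measure and stream queries would expose something analogous. The sketch below is purely illustrative of that kind of plan-tree output: the operator names (`IndexScan`, `Filter`, `Aggregate`) and the rendering format are assumptions for this example, not BanyanDB's actual planner.

```python
# Illustrative sketch of an EXPLAIN-style plan tree for a query engine.
# Operator names and output format are hypothetical, not BanyanDB's.
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    op: str                                # operator name, e.g. "IndexScan"
    detail: str = ""                       # operator-specific info for the user
    children: List["PlanNode"] = field(default_factory=list)

    def explain(self, depth: int = 0) -> str:
        """Render the plan as an indented tree, like EXPLAIN output."""
        line = "  " * depth + f"-> {self.op} ({self.detail})"
        return "\n".join([line] + [c.explain(depth + 1) for c in self.children])


plan = PlanNode("Aggregate", "sum(latency) by service", [
    PlanNode("Filter", "time >= now() - 1h", [
        PlanNode("IndexScan", "index=service_id"),
    ]),
])
print(plan.explain())
```

A tree rendering like this is also the natural input for the query-plan visualization objective in the UI.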
Recommended Skills
- Familiar with Go
- Have a basic understanding of database query engines
- Have experience with Apache SkyWalking or other APMs
Mentor
- Mentor: Jiajing Lu, Apache SkyWalking PMC, lujiajing@apache.org

- Mentor: Hongtao Gao, Apache SkyWalking PMC, Apache ShardingSphere PMC, hanahmily@apache.org

- Mailing List: dev@skywalking.apache.org
Beam
[GSOC][Beam] Build out Beam Yaml features
Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. Beam recently added support for launching jobs using Yaml on top of its other SDKs; this project would focus on adding more features and transforms to the Yaml SDK so that it can be the easiest way to define your data pipelines.
Objectives:
1. Add support for existing Beam transforms (IOs, Machine Learning transforms, and others) to the Yaml SDK
2. Add end to end pipeline use cases using the Yaml SDK
3. (Stretch) Add Yaml SDK support to the Beam playground
Useful links:
Apache Beam repo - https://github.com/apache/beam
Yaml SDK code + docs - https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml
Open issues for the Yaml SDK - https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Ayaml
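For a flavor of what a Yaml SDK pipeline looks like, here is a minimal sketch. The transform names and `chain` structure follow the Beam YAML documentation at the time of writing, but the SDK is evolving quickly, so treat this as illustrative (the file paths are placeholders) and check the docs linked above:

```yaml
# Minimal Beam YAML pipeline sketch: read CSV rows, keep large orders,
# write the result as JSON. Paths are hypothetical placeholders.
pipeline:
  type: chain
  transforms:
    - type: ReadFromCsv
      config:
        path: gs://my-bucket/orders*.csv
    - type: Filter
      config:
        language: python
        keep: "amount > 100"
    - type: WriteToJson
      config:
        path: gs://my-bucket/big-orders
```

Adding more transforms that can be declared this way, with no Python or Java required, is the core of the project.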
[GSOC][Beam] Add connectors to Beam ManagedIO
Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. On top of providing lower level primitives, Beam has also introduced several higher level transforms used for machine learning and some general data processing use cases. One new transform that is being actively worked on is a unified ManagedIO transform, which gives runners the ability to manage (upgrade, optimize, etc.) an IO (input source or output sink) without upgrading the whole pipeline. This project will be about adding one or more IO integrations to ManagedIO.
Objectives:
1. Add a BigTable integration to ManagedIO
2. Add a Spanner integration to ManagedIO
Useful links:
Apache Beam repo - https://github.com/apache/beam
Docs on ManagedIO are relatively light since this is a new project, but here are some docs on existing IOs in Beam - https://beam.apache.org/documentation/io/connectors/
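The core idea behind ManagedIO — the pipeline refers to an IO by name plus declarative config, and the runner owns (and may upgrade) the concrete implementation behind that name — can be sketched in plain Python. This is a conceptual illustration only; `ManagedIORegistry` and everything else here is invented for the sketch and is not Beam's actual API:

```python
# Conceptual sketch of runner-managed IO: the pipeline supplies only a
# name + config; the runner resolves the newest registered implementation,
# so the IO can be upgraded without touching the pipeline. Names invented.
from typing import Callable, Dict, Iterable


class ManagedIORegistry:
    """Maps an IO name to versioned factories for its implementation."""

    def __init__(self) -> None:
        self._impls: Dict[str, Dict[int, Callable]] = {}

    def register(self, name: str, version: int, factory: Callable) -> None:
        self._impls.setdefault(name, {})[version] = factory

    def resolve(self, name: str, config: dict) -> Callable[[], Iterable]:
        # The runner always picks the newest registered implementation;
        # the user's pipeline definition never changes.
        latest = max(self._impls[name])
        return self._impls[name][latest](config)


registry = ManagedIORegistry()
registry.register("bigtable_read", 1, lambda cfg: lambda: iter([("v1", cfg["table"])]))
registry.register("bigtable_read", 2, lambda cfg: lambda: iter([("v2", cfg["table"])]))

source = registry.resolve("bigtable_read", {"table": "orders"})
print(list(source()))  # the v2 implementation is chosen automatically
```

Adding a BigTable or Spanner integration means implementing the real source/sink behind such a managed name, with its configuration expressed declaratively.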
[GSOC][Beam] Build out Beam Use Cases
Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. On top of providing lower level primitives, Beam has also introduced several higher level transforms used for machine learning and some general data processing use cases. This project focuses on identifying and implementing real world use cases that use these transforms.
Objectives:
1. Add real world use cases demonstrating Beam's MLTransform for preprocessing data and generating embeddings
2. Add real world use cases demonstrating Beam's Enrichment transform for enriching existing data with data from a slowly changing source
3. (Stretch) Implement 1 or more additional "enrichment handlers" for interacting with currently unsupported sources
Useful links:
Apache Beam repo - https://github.com/apache/beam
MLTransform docs - https://beam.apache.org/documentation/transforms/python/elementwise/mltransform/
Enrichment code - https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/enrichment.py
Enrichment docs (should be published soon) - https://github.com/apache/beam/pull/30187
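To make the enrichment objective concrete: the pattern is that each incoming element is joined with extra fields fetched from a slowly changing source through a pluggable handler. The plain-Python sketch below shows that pattern only; the `DictEnrichmentHandler` interface is illustrative and is not the actual API in `transforms/enrichment.py` linked above:

```python
# Minimal sketch of the enrichment pattern: each element is merged with
# fields looked up from a slowly changing source via a pluggable handler.
# The handler interface here is illustrative, not Beam's enrichment API.
from typing import Dict, Iterable, Iterator


class DictEnrichmentHandler:
    """Handler backed by an in-memory dict, standing in for e.g. BigTable."""

    def __init__(self, table: Dict[str, dict]) -> None:
        self._table = table

    def lookup(self, key: str) -> dict:
        return self._table.get(key, {})


def enrich(elements: Iterable[dict], handler: DictEnrichmentHandler,
           key_field: str) -> Iterator[dict]:
    """Yield each element merged with the handler's fields for its key."""
    for element in elements:
        yield {**element, **handler.lookup(element[key_field])}


handler = DictEnrichmentHandler({"u1": {"country": "DE"}, "u2": {"country": "US"}})
events = [{"user": "u1", "clicks": 3}, {"user": "u2", "clicks": 1}]
print(list(enrich(events, handler, "user")))
```

Writing a new "enrichment handler" (the stretch objective) amounts to implementing the lookup side of this pattern against a source Beam does not yet support.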
...