The roadmap below gives an indication of the algorithms we aim to implement in the library over the
coming months. If you are interested in helping out, please check our [contribution guide](contribution_guide.html).
Items in bold have already been implemented:

  • Pipelines of transformers and learners
  • Data pre-processing
    • Feature scaling
    • Polynomial feature base mapper
    • Feature hashing
    • Feature extraction for text
    • Dimensionality reduction
  • Model selection and performance evaluation
    • Model evaluation using a variety of scoring functions
    • Cross-validation for model selection and evaluation
    • Hyper-parameter optimization
  • Supervised learning
    • Optimization framework
      • Stochastic Gradient Descent
      • L-BFGS
    • Generalized Linear Models
      • Multiple linear regression
      • LASSO, Ridge regression
      • Multi-class Logistic regression
    • Random forests
    • Support Vector Machines
    • Decision trees
  • Unsupervised learning
    • Clustering
      • K-means clustering
    • Principal Components Analysis
  • Recommendation
    • ALS
  • Text analytics
    • LDA
  • Statistical estimation tools
  • Distributed linear algebra
  • Streaming ML

 

How can I help?

Any contribution to the above roadmap is welcome! You can also check out the list of open issues
for FlinkML on JIRA, or send a message with your idea to the Flink developers list.

We recommend reading the FlinkML contribution guide before starting out. Also be sure to
subscribe to the Flink developers mailing list and post a message there to introduce yourself!


References:

D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary,
and M. Young. Machine learning: The high interest credit card of technical debt. In SE4ML:
Software Engineering for Machine Learning (NIPS 2014 Workshop), 2014.