REEF is an excellent base layer for cloud computing projects and is used for that purpose in several courses. This page collects project ideas for students of those courses.
- Apache ZooKeeper as a service: Many REEF applications need a system like ZooKeeper. This project would provide ZooKeeper as a background service that runs its replicas on the Evaluators the application needs anyway.
- Long-running application management on REEF: When REEF is used for long-running applications (e.g., database backends), it is handy to have a framework that deploys and manages them. Apache Slider is a framework that makes it easy to deploy and manage long-running static applications in a YARN cluster. Its focus is adapting existing applications such as HBase and Accumulo to run on YARN with little modification. Slider is one solution, but it is heavyweight. In this project, you will implement a much lighter set of Slider-like functionality on REEF. The tool is useful in its own right, and the project can also demonstrate that such a tool is easy to implement with REEF.
- New language (e.g., Scala, Python, C++) binding for Tang: Tang is the dependency injection framework for configuring distributed systems developed with REEF. The Tang APIs give configurations that are strongly typed and easily verified for correctness. In this project, you will implement a new language binding for Tang.
- Wake visualiser for profiling: Wake is the event-driven framework that REEF is built on. As scale and concurrency, both already high, continue to increase, subtle performance bugs can crop up that are difficult to diagnose. In this project, you will develop a pluggable instrumentation scheme for Wake and a visualiser that shows Wake status in terms of request rate, queue lengths, thread pool sizes, computation times, etc.
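
As a starting point for the Wake profiling idea, the sketch below shows one way an instrumentation wrapper might collect the metrics a visualiser would plot. It is only an illustration in Python, not Wake code: `StageMetrics`, `InstrumentedStage`, `on_next`, and `drain` are hypothetical names, and the stage is single-threaded for simplicity.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class StageMetrics:
    """Counters that one instrumented stage exposes to a visualiser."""
    events_handled: int = 0
    max_queue_length: int = 0
    handling_seconds: List[float] = field(default_factory=list)


class InstrumentedStage:
    """Hypothetical single-threaded stage: queues events, then handles them."""

    def __init__(self, handler: Callable[[object], None],
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self._handler = handler
        self._clock = clock  # injectable so tests can use a fake clock
        self._queue: Deque[object] = deque()
        self.metrics = StageMetrics()

    def on_next(self, event: object) -> None:
        # Enqueue the event and record the high-water mark of the queue.
        self._queue.append(event)
        self.metrics.max_queue_length = max(
            self.metrics.max_queue_length, len(self._queue))

    def drain(self) -> None:
        # Handle every queued event, timing each handler call.
        while self._queue:
            event = self._queue.popleft()
            start = self._clock()
            self._handler(event)
            self.metrics.handling_seconds.append(self._clock() - start)
            self.metrics.events_handled += 1


# Usage: wrap a trivial handler and inspect the collected metrics.
seen = []
stage = InstrumentedStage(seen.append)
for i in range(3):
    stage.on_next(i)
stage.drain()
print(stage.metrics.events_handled, stage.metrics.max_queue_length)
```

Keeping the metrics in a plain data object separate from the stage is one possible way to make the scheme pluggable: a visualiser can poll `stage.metrics` without knowing how events are handled. A real implementation would also need thread-safe counters and thread pool statistics, which this sketch leaves out.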