Domain-specific language for graph processing: Create a GraphDataSet that
abstracts away the internal representation of a graph, together with a set
of operations on the GraphDataSet. The project involves gathering
requirements for graph processing functionality, architecting the DSL,
implementing it, and possibly optimizing the operations in cases where a
graph operation can be mapped to different DataSet-to-DataSet transformations.
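
As a rough illustration of the abstraction described above, the sketch below
wraps an edge list in a hypothetical GraphDataSet class and expresses two
graph operations as plain transformations over the edges. Python lists stand
in for DataSets, and the class and method names are assumptions made for
illustration, not an existing Flink API.

```python
from collections import Counter


class GraphDataSet:
    """Hypothetical graph wrapper; the edge list stands in for a DataSet."""

    def __init__(self, vertices, edges):
        self.vertices = list(vertices)   # vertex ids
        self.edges = list(edges)         # (source, target) pairs

    def out_degrees(self):
        # Graph operation mapped to a group-by-source and count over the edges.
        counts = Counter(src for src, _ in self.edges)
        # Vertices without outgoing edges get degree 0.
        return {v: counts.get(v, 0) for v in self.vertices}

    def reverse(self):
        # Graph operation mapped to a simple map over the edge set.
        return GraphDataSet(self.vertices, [(t, s) for s, t in self.edges])


graph = GraphDataSet([1, 2, 3], [(1, 2), (1, 3), (2, 3)])
degrees = graph.out_degrees()
in_degrees = graph.reverse().out_degrees()
```

Callers only see graph-level operations; whether in-degrees are computed by
reversing the edges or by grouping on the target field is the kind of choice
an optimizer could make behind this interface.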

Distributed mutable state: Delta iterations currently use a hash index
internally to store the state of the iteration, and they invoke index
merging functionality. One idea would be to surface an operator (with care)
in the APIs that essentially allows mutable state manipulation. Another
idea would be to implement something along the lines of a parameter server
and make such functionality accessible through the APIs.
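
To make the idea of keyed state that is merged in place more concrete, here
is a minimal single-process sketch of a delta-style iteration that computes
connected components by propagating the smallest vertex id. A Python dict
plays the role of the hash index; the function name and data layout are
assumptions for illustration and do not reflect Flink's internal
implementation.

```python
def connected_components(vertices, edges):
    """Min-label propagation with a keyed solution set and a shrinking workset."""
    # Undirected adjacency built from the edge list.
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Solution set: a hash index from vertex id to its current component id.
    solution = {v: v for v in vertices}
    # Workset: only the entries that changed in the previous step.
    workset = dict(solution)

    while workset:
        delta = {}
        for vertex, label in workset.items():
            for n in neighbors[vertex]:
                # Propose the smaller label to each neighbor.
                if label < delta.get(n, solution[n]):
                    delta[n] = label
        # Merge the delta into the index in place (the mutable-state step).
        solution.update(delta)
        workset = delta

    return solution


components = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

The `solution.update(delta)` call is the part a user-facing operator would
have to expose: a point update to indexed state instead of a full rewrite of
the data set.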

Enhance Flink's monitoring capabilities: Flink has a web interface that
tracks the progress of running jobs and shows some basic system information.
We would like to extend the monitoring capabilities to a much broader set of
features, including system performance monitoring (CPU, memory, disk,
network, processes) and application-level monitoring (records processed per
second, garbage collection statistics, input/output ratio, data distribution
information, iteration statistics).
There is also a need to rework the internal APIs of the current web
interface. If the AJAX requests are replaced by a well-defined API (REST,
...), integration with other systems such as Ambari will be much easier.


Domain-specific language for spatial data: Create spatial data types
(point, region, etc.) and operations on them.
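
A minimal sketch of what such types and operations might look like is shown
below. The names Point and Region, the restriction to axis-aligned
rectangles, and the two predicates are assumptions chosen to keep the example
short; an actual DSL would likely need general geometries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle, used here as the simplest possible region."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, p: Point) -> bool:
        # Boundary points count as inside.
        return self.min_x <= p.x <= self.max_x and self.min_y <= p.y <= self.max_y

    def intersects(self, other: "Region") -> bool:
        # Two rectangles overlap if they overlap on both axes.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)


points = [Point(1, 1), Point(5, 5), Point(2, 3)]
area = Region(0, 0, 3, 3)
# A spatial filter expressed as an ordinary filter over a collection of points.
inside = [p for p in points if area.contains(p)]
```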

Integration into Apache BigTop

Integration with Apache Ambari

Pig frontend for Flink: An initial effort was here:
http://kth.diva-portal.org/smash/get/diva2:539046/FULLTEXT01.pdf

Cascading on Flink

Optimizing the integration with columnar file formats (Parquet, ORCFile)
and perhaps eventually pushing filters down to data scans.
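
The benefit of pushing a filter down to the scan can be sketched as follows.
The example assumes that the file is split into row groups that carry
per-column min/max statistics, so a scan can skip a whole group without
reading its rows. The dictionary layout and the function name are
hypothetical and are not the API of a Parquet or ORC reader.

```python
def scan_with_pushdown(row_groups, column, lower_bound):
    """Return rows where row[column] >= lower_bound, skipping whole row groups
    whose max statistic shows that no row can match."""
    matched, groups_read = [], 0
    for group in row_groups:
        if group["stats"][column]["max"] < lower_bound:
            continue  # pruned using metadata only; the rows are never read
        groups_read += 1
        matched.extend(r for r in group["rows"] if r[column] >= lower_bound)
    return matched, groups_read


row_groups = [
    {"stats": {"age": {"min": 1, "max": 20}}, "rows": [{"age": 1}, {"age": 20}]},
    {"stats": {"age": {"min": 30, "max": 50}}, "rows": [{"age": 30}, {"age": 50}]},
]
rows, groups_read = scan_with_pushdown(row_groups, "age", 40)
```

Without pushdown, both row groups would be read and the filter applied
afterwards; here the first group is skipped entirely.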

Statistical operators to extract statistical information from a DataSet
(e.g., histograms of value distributions)
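
One way such an operator could work is to build a partial histogram per
partition and then merge the partial results. The sketch below assumes
fixed-width buckets and uses lists of integers as stand-ins for the
partitions of a DataSet; the function names are illustrative only.

```python
from collections import Counter
from functools import reduce


def partial_histogram(partition, bucket_width):
    # Count values per bucket within one partition; a bucket is named by its lower edge.
    return Counter((value // bucket_width) * bucket_width for value in partition)


def histogram(partitions, bucket_width):
    # Combine the per-partition counts, as a reduce step would.
    partials = (partial_histogram(p, bucket_width) for p in partitions)
    return dict(reduce(lambda a, b: a + b, partials, Counter()))


hist = histogram([[1, 4, 12], [15, 27, 3]], bucket_width=10)
```

Because the partial counts are small and merging is associative, only the
per-bucket counts need to be exchanged between partitions, not the data
itself.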

Integration with Apache Mahout (ongoing effort)

Integration with Apache Tez (ongoing effort)

Flink Streaming (ongoing effort)

Eclipse plugin that includes functionality for execution plan debugging

Local execution of programs using Java Collections
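
The idea can be sketched with a small class that offers DataSet-like
operators but evaluates them eagerly on an in-memory list. LocalDataSet and
its methods are hypothetical names for this illustration (written in Python
for brevity, while the item above refers to Java Collections); they are not
part of the Flink API.

```python
from itertools import groupby


class LocalDataSet:
    """Hypothetical stand-in for a DataSet, evaluated eagerly on a list."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, fn):
        return LocalDataSet(fn(x) for x in self.items)

    def flat_map(self, fn):
        return LocalDataSet(y for x in self.items for y in fn(x))

    def filter(self, pred):
        return LocalDataSet(x for x in self.items if pred(x))

    def group_reduce(self, key_fn, reduce_fn):
        # Sort, then group adjacent equal keys, similar to a sort-based grouping.
        ordered = sorted(self.items, key=key_fn)
        return LocalDataSet(reduce_fn(k, list(g)) for k, g in groupby(ordered, key=key_fn))

    def collect(self):
        return list(self.items)


lines = LocalDataSet(["to be", "or not to be"])
word_counts = (lines.flat_map(str.split)
                    .map(lambda w: (w, 1))
                    .group_reduce(lambda t: t[0], lambda k, g: (k, sum(c for _, c in g)))
                    .collect())
```

Running the same program against plain collections avoids starting any
runtime components, which is useful for unit tests and very small inputs.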

Utility Library