Here are some suggestions for interesting Hadoop projects. For more information, please ask on the Hadoop mailing lists. Please also update and add to these lists.
For good small JIRA issues to get started on, see this list of newbie JIRAs and this list of test failures. There is also a list of all open Hadoop issues with no patch.
Test Projects
Rough estimates are given in hours. These estimates assume an existing understanding of Hadoop.
Description | Estimate | Links |
write JUnit test cases that run the Hadoop examples | 16 | |
MapRed reliability tests | 40 | |
HDFS reliability tests | 40 | |
refactor TestDFSUpgradeFromImage (once HADOOP-1622 is committed) to automatically zip and unzip the supporting DFS image | 6 | |
write compatibility tests for reading the same data set from different HDFS versions | 24 | |
rewrite (or drop) the flaky TestMiniMRWithDFS unit test | 8 | |
write the "system tests" for DFS Upgrade | 20 | |
write new unit tests based on code coverage | 40 | |
pipes and libhdfs benchmark tests | 30 | |
review FindBugs warnings and fix the reasonable warnings | 18 | |
create a distributed JUnit runner on top of Hadoop | 80 | |
implement a Map-Reduce application which can be used to reliably launch speculative tasks | 40 | |
Research Projects
Check out this page of Hadoop Research Projects.
Tool Investigations
We are always looking for open source testing tools that add value to our development and build process. Here are some that need to be investigated.
Description | Links |
evaluate new unit test frameworks (rewriting some existing tests in the new framework to show the benefits) | |
evaluate mock object frameworks for unit testing | |
evaluate PMD | |
evaluate concurrency test tools | |
evaluate Fortify | |
evaluate dashboards like QALab and Panopticode | |
evaluate Faban | |
evaluate NCSS, a source code metrics suite | |
evaluate Java PathFinder, a software model checker | |
evaluate SA4J, a structural dependency analysis tool | |
evaluate JDepend, which generates design quality metrics | |
evaluate Dependency Finder, which generates design quality metrics and dependency graphs | |
evaluate Classycle, which finds class and package cyclic dependencies | |
evaluate XRadar, an extensible code report tool | |
evaluate Crap4j, which combines cyclomatic complexity and code coverage | |
evaluate Eclipse TPTP, a test and performance tools platform | |
evaluate JUnitFactory | |
evaluate code review applications like Codestriker and Review Board | |
evaluate test automation frameworks | |
evaluate QA management platforms | |
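As a rough illustration of what the Crap4j entry above means by combining cyclomatic complexity and code coverage, the sketch below computes the CRAP score as its authors published it: comp^2 * (1 - cov/100)^3 + comp, where comp is a method's cyclomatic complexity and cov is the percentage of the method covered by automated tests. The class and method names here are hypothetical and are not part of Crap4j itself.

```java
public class CrapScore {

    /**
     * Computes CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m).
     *
     * @param complexity      cyclomatic complexity of the method (at least 1)
     * @param coveragePercent test coverage of the method, from 0 to 100
     */
    public static double crap(int complexity, double coveragePercent) {
        if (complexity < 1) {
            throw new IllegalArgumentException("complexity must be >= 1");
        }
        if (coveragePercent < 0.0 || coveragePercent > 100.0) {
            throw new IllegalArgumentException("coverage must be in [0, 100]");
        }
        // Fraction of the method that is NOT covered by tests.
        double uncovered = 1.0 - coveragePercent / 100.0;
        return complexity * complexity * Math.pow(uncovered, 3) + complexity;
    }

    public static void main(String[] args) {
        // The same method scored with no tests, half coverage, and full coverage.
        System.out.println(crap(10, 0.0));   // 110.0
        System.out.println(crap(10, 50.0));  // 22.5
        System.out.println(crap(10, 100.0)); // 10.0
    }
}
```

With full coverage the score falls back to the complexity itself, so an evaluation on Hadoop would mainly flag complex methods that have little or no test coverage.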
Random Ideas
Description | Estimate | Links |
Implement an advanced job control framework to help chain multiple Map-Reduce jobs, i.e., investigate and improve upon the existing org.apache.hadoop.mapred.jobcontrol package. | tbd | |
Implement a library/framework to support Genetic Algorithms on Hadoop Map-Reduce. | tbd | |
Improve the Eclipse Plugin | tbd | |
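To make the job-chaining idea above a little more concrete, here is a minimal sketch of running jobs in dependency order. It deliberately does not use the org.apache.hadoop.mapred.jobcontrol API: JobChain and ChainedJob are hypothetical names, and each job is a plain Runnable standing in for a Map-Reduce job submission. A real framework would also need failure handling and parallel execution of independent jobs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JobChain {

    /** A unit of work plus the names of the jobs it must wait for (hypothetical type). */
    private static final class ChainedJob {
        final Runnable work;
        final List<String> dependsOn;

        ChainedJob(Runnable work, List<String> dependsOn) {
            this.work = work;
            this.dependsOn = dependsOn;
        }
    }

    private final Map<String, ChainedJob> jobs = new LinkedHashMap<>();

    /** Registers a job; dependsOn lists the jobs that must finish first. */
    public JobChain add(String name, Runnable work, String... dependsOn) {
        jobs.put(name, new ChainedJob(work, Arrays.asList(dependsOn)));
        return this;
    }

    /** Runs each job once all of its dependencies are done; returns the execution order. */
    public List<String> runAll() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < jobs.size()) {
            boolean progressed = false;
            for (Map.Entry<String, ChainedJob> entry : jobs.entrySet()) {
                String name = entry.getKey();
                if (done.contains(name)) {
                    continue;
                }
                if (done.containsAll(entry.getValue().dependsOn)) {
                    entry.getValue().work.run();
                    done.add(name);
                    order.add(name);
                    progressed = true;
                }
            }
            if (!progressed) {
                // No job became runnable in a full pass: a cycle or an unknown dependency.
                throw new IllegalStateException("cyclic or missing dependency");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> order = new JobChain()
            .add("report", () -> System.out.println("running report"), "aggregate")
            .add("aggregate", () -> System.out.println("running aggregate"), "extract")
            .add("extract", () -> System.out.println("running extract"))
            .runAll();
        System.out.println(order); // [extract, aggregate, report]
    }
}
```

Jobs can be registered in any order; the loop only starts a job after everything it depends on has completed.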