
Giraph implementation of Nutch LinkRank Algorithm (GSoC 2013)

Author

Renato Marroquin Mogrovejo - renatoj.marroquin at gmail dot com

Project Aim

  • Provide Apache Nutch with a new implementation of website ranking, while letting users extend the ranking algorithms through Apache Giraph.

Project Objectives

  1. Fully integrate into Apache Nutch the LinkRank algorithm as implemented within the Apache Giraph community, since the latest version of Nutch lacks ranking algorithms [1].
  2. Reproduce the example in [3] using Giraph's PageRank implementation instead.
  3. Study approaches to creating variations of the open-source PageRank algorithm [2] as candidate future ranking algorithms for Nutch.
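Objectives 2 and 3 build on PageRank [2]. In a Pregel-style system such as Giraph, each vertex repeatedly sends its current rank, split evenly across its out-links, to its neighbors, and then sets its new rank to (1 - d)/N + d * (sum of received shares). The sketch below simulates those supersteps in plain Java so it runs without a cluster. It does not use the Giraph API: the class and method names are hypothetical, and redistributing the rank of pages without out-links evenly across all pages is one common choice assumed here, not something taken from Nutch's LinkRank.

```java
import java.util.Arrays;

/**
 * Single-process sketch of PageRank written in the vertex-centric,
 * superstep-based style of Pregel-like systems such as Apache Giraph.
 * Hypothetical names; this is not the Giraph API.
 */
public class VertexCentricPageRank {

    /**
     * @param outEdges      outEdges[v] lists the targets of v's outgoing links
     * @param damping       damping factor d (commonly 0.85)
     * @param maxSupersteps number of message-passing rounds to run
     * @return one rank value per vertex; the values sum to 1
     */
    public static double[] run(int[][] outEdges, double damping, int maxSupersteps) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start every vertex with a uniform value

        for (int step = 1; step <= maxSupersteps; step++) {
            // Messages "sent" this round, already summed per receiving vertex.
            double[] inbox = new double[n];
            double danglingMass = 0.0;
            for (int v = 0; v < n; v++) {
                if (outEdges[v].length == 0) {
                    // No out-links: assumed policy is to spread this rank over all vertices.
                    danglingMass += rank[v];
                } else {
                    double share = rank[v] / outEdges[v].length;
                    for (int target : outEdges[v]) {
                        inbox[target] += share; // analogous to sending a message to `target`
                    }
                }
            }
            // Each vertex combines its incoming messages into a new rank.
            for (int v = 0; v < n; v++) {
                rank[v] = (1.0 - damping) / n + damping * (inbox[v] + danglingMass / n);
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] graph = { {1, 2}, {2}, {0} }; // page 0 -> 1, 2; page 1 -> 2; page 2 -> 0
        System.out.println(Arrays.toString(run(graph, 0.85, 50)));
    }
}
```

In this three-page graph, page 2 receives links from both other pages and ends up ranked above page 1. The ranks keep summing to 1 because the rank of any page without out-links is redistributed rather than lost.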

Project Scope

  • Integrate Apache Giraph's PageRank implementation with Apache Nutch 2.x
  • Write a standard API on top of Apache Giraph so that users and developers can create and use new algorithms developed with Apache Giraph
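The proposal does not define what the standard API looks like. One possible shape is a small contract that every ranking algorithm implements, plus a registry through which Nutch-side code selects an algorithm by name. Everything below is hypothetical: `RankingAlgorithm`, `RankingRegistry`, and `InDegreeRank` are illustrative names, the algorithm runs in-process rather than as a Giraph job, and in-degree counting is only a trivial stand-in for a user-defined ranking.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical contract for a pluggable ranking algorithm. */
interface RankingAlgorithm {
    /** Name used to select the algorithm, e.g. from a configuration file. */
    String name();

    /** Returns one score per page, given outEdges[v] = link targets of page v. */
    double[] score(int[][] outEdges);
}

/** Hypothetical registry that looks algorithms up by name. */
final class RankingRegistry {
    private final Map<String, RankingAlgorithm> algorithms = new LinkedHashMap<>();

    void register(RankingAlgorithm algorithm) {
        algorithms.put(algorithm.name(), algorithm);
    }

    double[] run(String name, int[][] outEdges) {
        RankingAlgorithm algorithm = algorithms.get(name);
        if (algorithm == null) {
            throw new IllegalArgumentException("unknown ranking algorithm: " + name);
        }
        return algorithm.score(outEdges);
    }
}

/** Trivial example plug-in: scores each page by its number of in-links. */
final class InDegreeRank implements RankingAlgorithm {
    public String name() {
        return "indegree";
    }

    public double[] score(int[][] outEdges) {
        double[] scores = new double[outEdges.length];
        for (int[] targets : outEdges) {
            for (int target : targets) {
                scores[target] += 1.0; // one point per incoming link
            }
        }
        return scores;
    }
}

public class RankingApiSketch {
    public static void main(String[] args) {
        RankingRegistry registry = new RankingRegistry();
        registry.register(new InDegreeRank());
        int[][] graph = { {1, 2}, {2}, {0} };
        // prints [1.0, 1.0, 2.0]
        System.out.println(Arrays.toString(registry.run("indegree", graph)));
    }
}
```

Keeping the contract this small means a PageRank variant (objective 3) would only need to supply a name and a scoring method to become selectable.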

References

[1] https://wiki.apache.org/nutch/NewScoring
[2] https://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
[3] http://wiki.apache.org/nutch/NewScoringIndexingExample
