Droids Web Services
Added by Mingfai Ma, last edited by Mingfai Ma on Jun 20, 2009

This page is a draft. Refer to the dev mailing list for more information.

Droids Web Services is a proposed module (not yet implemented) that offers web crawling functionality on cloud computing platforms. It works as follows:

  • A web application that exposes Droids core functions as Web APIs
    • supports URL fetching, HTML/image parsing, and data extraction
    • Spring HTTP Invoker is the chosen remoting technology (any binary web remoting technology would work)
  • The original Droids client component, configured to use a remote worker
    • The worker no longer makes local requests to fetch. Instead, it makes remoting calls to the web services and collects the results.
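
The split described above can be sketched in plain Java. This is only an illustration under assumptions: FetchService, FetchRequest, FetchResult, and RemoteWorker are hypothetical names, not existing Droids classes. The worker depends only on the service interface, so a remoting proxy (for example one created by Spring HTTP Invoker) could be injected in place of a local implementation.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical request type: everything the server needs, sent by value. */
final class FetchRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    final String url;

    FetchRequest(String url) {
        this.url = url;
    }
}

/** Hypothetical result type: HTTP status plus the links extracted on the server. */
final class FetchResult implements Serializable {
    private static final long serialVersionUID = 1L;
    final String url;
    final int status;
    final ArrayList<String> outlinks;

    FetchResult(String url, int status, ArrayList<String> outlinks) {
        this.url = url;
        this.status = status;
        this.outlinks = outlinks;
    }
}

/** Hypothetical service contract shared by the client and the web application. */
interface FetchService {
    FetchResult fetch(FetchRequest request);
}

/** Client-side worker: it does not fetch itself, it delegates to the injected service. */
final class RemoteWorker {
    private final FetchService service;

    RemoteWorker(FetchService service) {
        this.service = service;
    }

    /** Returns the outlinks to enqueue, or an empty list if the fetch did not succeed. */
    List<String> process(String url) {
        FetchResult result = service.fetch(new FetchRequest(url));
        return result.status == 200 ? result.outlinks : new ArrayList<>();
    }
}
```

In a test, FetchService can be supplied as a lambda; in a deployment, the same constructor argument would be the remoting proxy.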

Requirements

  • unlimited scalability / extreme throughput
  • support any cloud computing platform, e.g. Google App Engine, Amazon EC2
  • share nothing in the server application: no sessions are used, and every remote method call is a complete, self-contained process
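
A minimal sketch of what "share nothing" could mean for one service method, assuming a hypothetical link extraction call. ExtractRequest and StatelessLinkExtractor are illustrative names, and the regular expression is a crude stand-in for real HTML parsing. The request carries all input, and the service keeps no mutable fields, so any server instance can answer any call.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical request: carries the page content itself, so no server-side lookup is needed. */
final class ExtractRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    final String html;

    ExtractRequest(String html) {
        this.html = html;
    }
}

/** Hypothetical stateless service: no session, no cache, no mutable fields. */
final class StatelessLinkExtractor {
    // Compiled patterns are immutable and safe to share between threads.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /** The result depends only on the request, never on earlier calls. */
    ArrayList<String> extract(ExtractRequest request) {
        ArrayList<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(request.html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }
}
```

Because the outcome depends only on the argument, two different instances return the same links for the same request, which is what allows calls to be load-balanced freely.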

Dependencies

  • Spring
    • allows transparently switching from a local component to a remote component in the client
    • allows easily exposing any service as a Web API
  • Google App Engine API
    • for use in GAE
    • URL Fetch service
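
The idea of transparently switching between a local and a remote component can be illustrated without Spring. The sketch below is an assumption-laden stand-in, not Spring HTTP Invoker itself: it uses a JDK dynamic proxy that copies arguments and results through Java serialization to mimic a remote call. Fetcher, LocalFetcher, and SimulatedRemoting are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

/** Hypothetical service contract used by the client. */
interface Fetcher {
    String fetch(String url);
}

/** Local component: a placeholder for the real fetching logic. */
final class LocalFetcher implements Fetcher {
    public String fetch(String url) {
        return "content of " + url;
    }
}

/** Simulates remoting by passing arguments and results through serialization. */
final class SimulatedRemoting {
    /** Deep-copies a value the way it would be copied when crossing the wire. */
    static Object copy(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    /** Wraps a target so that callers see the same interface as the local component. */
    static Fetcher remoteProxy(Fetcher target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object[] sent = (Object[]) copy(args);       // the "request" crosses the wire
            Object result = method.invoke(target, sent); // executed on the "server"
            return copy(result);                         // the "response" crosses back
        };
        return (Fetcher) Proxy.newProxyInstance(
                Fetcher.class.getClassLoader(), new Class<?>[] {Fetcher.class}, handler);
    }
}
```

Client code holds a Fetcher reference either way; only the wiring decides whether it is a LocalFetcher or the proxy.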

Restrictions

  • any component that is passed to the remote API must be serializable
  • the master task/link queue lives in a single JVM, as in the original Droids
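
The serializability restriction can be checked before a task is ever sent. The following is a minimal sketch, assuming Java serialization is the wire format (as it is for Spring HTTP Invoker by default); LinkTask, BadTask, and SerializationCheck are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical task passed to the remote API: plain data only. */
final class LinkTask implements Serializable {
    private static final long serialVersionUID = 1L;
    final String url;
    final int depth;

    LinkTask(String url, int depth) {
        this.url = url;
        this.depth = depth;
    }
}

/** Hypothetical task that drags a non-serializable collaborator along. */
final class BadTask implements Serializable {
    private static final long serialVersionUID = 1L;
    final Thread owner = Thread.currentThread(); // Thread is not Serializable
}

final class SerializationCheck {
    /** Returns true if the whole object graph can be written with Java serialization. */
    static boolean isSerializable(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false; // some object in the graph does not implement Serializable
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Declaring Serializable on the task class is not enough on its own: every field reachable from the task must be serializable too, or be marked transient.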

Reference: restrictions in Google App Engine

  • 30-second limit per request
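
One possible way to live with a per-request time limit is to give each remote call a time budget and hand back whatever was not processed. This is a sketch under assumptions rather than part of the proposal: DeadlineBatch and BatchOutcome are hypothetical names, the per-URL work is a placeholder, and the clock is injected so the behaviour can be tested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical outcome of one remote call: what was done and what was left over. */
final class BatchOutcome {
    final List<String> processed = new ArrayList<>();
    final List<String> deferred = new ArrayList<>();
}

final class DeadlineBatch {
    /**
     * Processes URLs until the time budget is used up. URLs that do not fit are
     * returned as deferred so the client can put them back on its queue.
     */
    static BatchOutcome run(List<String> urls, long budgetMillis, LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + budgetMillis;
        BatchOutcome outcome = new BatchOutcome();
        for (String url : urls) {
            if (clockMillis.getAsLong() >= deadline) {
                outcome.deferred.add(url);  // out of time: hand back to the client
            } else {
                outcome.processed.add(url); // placeholder for the real per-URL work
            }
        }
        return outcome;
    }
}
```

In real use the clock would be System::currentTimeMillis and the budget would be set comfortably below the platform limit.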