This page is a draft. Refer to the dev mailing list for more information.
Droids Web Services is a proposed module (i.e. not yet implemented) that offers web crawling functionality on cloud computing platforms. It consists of two parts:
- A web application that exposes the Droids core functions as Web APIs
  - supports URL fetching, HTML/image parsing, and data extraction
  - Spring HTTP Invoker is the chosen transport (any binary web remoting technology would work)
- The original Droids client component, configured to call a remote Worker rather than the local Worker
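The split between a local and a remote Worker can be sketched in plain Java. This is only an illustration under stated assumptions: `FetchWorker`, `LocalFetchWorker`, `RemoteFetchWorker`, `Wire` and `WorkerFactory` are hypothetical names, not Droids classes, and the "remote" call is simulated in memory with Java serialization instead of going over HTTP. In a Spring setup, the same switch would presumably be a configuration change, since Spring provides `HttpInvokerServiceExporter` on the server side and `HttpInvokerProxyFactoryBean` on the client side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical service interface; the name is illustrative, not the Droids API.
interface FetchWorker {
    String fetch(String url);
}

// Local implementation: does the work in the client's own JVM.
class LocalFetchWorker implements FetchWorker {
    @Override
    public String fetch(String url) {
        return "content of " + url; // placeholder for real fetching and parsing
    }
}

// Stand-in for the remoting layer: Java serialization to and from byte arrays.
final class Wire {
    static byte[] toBytes(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}

// Remote stand-in: the argument and the result each cross a serialization
// boundary, as they would over HTTP. The "server" here is just another object.
class RemoteFetchWorker implements FetchWorker {
    private final FetchWorker serverSide;

    RemoteFetchWorker(FetchWorker serverSide) {
        this.serverSide = serverSide;
    }

    @Override
    public String fetch(String url) {
        try {
            // Request: the URL is serialized and deserialized "on the server".
            String received = (String) Wire.fromBytes(Wire.toBytes(url));
            // Response: the result is serialized and deserialized "on the client".
            return (String) Wire.fromBytes(Wire.toBytes(serverSide.fetch(received)));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("remote call failed", e);
        }
    }
}

// The client picks an implementation from configuration; callers see only FetchWorker.
final class WorkerFactory {
    static FetchWorker create(boolean useRemote) {
        FetchWorker local = new LocalFetchWorker();
        return useRemote ? new RemoteFetchWorker(local) : local;
    }
}
```

Code that calls `WorkerFactory.create(true).fetch(url)` is identical to code that uses the local worker, which is the property the client-side configuration relies on.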
Requirements
- horizontal scalability and high throughput: capacity should grow by adding server instances
- support any cloud computing platform, e.g. Google App Engine or Amazon EC2
- shared-nothing server application: no sessions are used, and every remote method call is a complete, self-contained unit of work
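One way to read the shared-nothing requirement is that each call carries all of its inputs and returns all of its outputs, so any server instance can handle any call. The sketch below assumes hypothetical `ParseRequest`, `ParseResponse` and `StatelessParseService` types, and uses a deliberately naive `href` regex in place of the real Droids parsers.

```java
import java.io.Serializable;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical request type: carries everything the server needs for one call.
final class ParseRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    final String baseUrl;
    final String html;

    ParseRequest(String baseUrl, String html) {
        this.baseUrl = baseUrl;
        this.html = html;
    }
}

// Hypothetical response type: carries every result back to the caller.
final class ParseResponse implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> links;

    ParseResponse(List<String> links) {
        this.links = links;
    }
}

// Stateless service: no mutable fields and no session, only an immutable constant.
final class StatelessParseService {
    // Naive link pattern, for illustration only; a real parser would be used instead.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    ParseResponse parse(ParseRequest request) {
        URI base = URI.create(request.baseUrl);
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(request.html);
        while (matcher.find()) {
            // Resolve each link against the base URL supplied in the request.
            links.add(base.resolve(matcher.group(1)).toString());
        }
        return new ParseResponse(links);
    }
}
```

Because the service keeps nothing between calls, repeating a call with the same request gives the same response, and the client is free to send consecutive calls to different server instances.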
Dependencies
- Spring
  - allows the client to switch transparently from a local component to a remote component
  - allows any service to be exposed easily through a Web API
- Google App Engine API
  - needed only when running on GAE
  - URL Fetch service
Restrictions
- any component passed to the remote API must be serializable
- the master task/link queue lives in a single JVM, as in the original Droids
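The serialization restriction can be checked early, for example in a unit test, by attempting to write an object with Java serialization. The helper below is a minimal sketch; `SerializationCheck`, `GoodTask` and `BadTask` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationCheck {
    // Returns true if the whole object graph can be written with Java serialization.
    static boolean isSerializable(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false; // some object in the graph does not implement Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical task that holds only serializable state.
final class GoodTask implements Serializable {
    private static final long serialVersionUID = 1L;
    final String url;

    GoodTask(String url) {
        this.url = url;
    }
}

// Hypothetical task that fails at runtime: it is declared Serializable,
// but one of its fields is not.
final class BadTask implements Serializable {
    private static final long serialVersionUID = 1L;
    final String url;
    final Object connection = new Object(); // stands in for a non-serializable resource

    BadTask(String url) {
        this.url = url;
    }
}
```

`BadTask` compiles without warnings, which is why a runtime check of this kind is useful: implementing `Serializable` does not guarantee that every field can be serialized.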
Reference: restrictions in Google App Engine
- each request must complete within 30 s
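A per-request time limit suggests that a remote call handling several URLs should stop before the limit and hand the unprocessed URLs back to the client, which owns the master queue. The following is a sketch under assumptions: `TimeBudget` and `BudgetedBatch` are hypothetical names, the budget is checked only between items (a single slow fetch could still exceed it), and the clock is passed in so the behavior can be tested without waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical result type: what was processed and what must be resubmitted.
final class BudgetedBatch {
    final List<String> done = new ArrayList<>();
    final List<String> remaining = new ArrayList<>();
}

final class TimeBudget {
    // Processes URLs in order until the budget is used up; the rest are
    // returned unprocessed so the caller can send them in a later request.
    static BudgetedBatch run(List<String> urls, long budgetMillis, LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + budgetMillis;
        BudgetedBatch result = new BudgetedBatch();
        for (String url : urls) {
            if (clockMillis.getAsLong() >= deadline) {
                result.remaining.add(url); // out of time: leave for a later call
            } else {
                result.done.add(url);      // placeholder for real fetching and parsing
            }
        }
        return result;
    }
}
```

In production the clock would presumably be `System::currentTimeMillis`, with a budget somewhat below 30 s to leave room for serializing the response; the exact margin is an assumption to be tuned.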