This page is in draft. Refer to the dev mailing list for more information.
Droids Web Services is a proposed module (i.e. not yet implemented) that offers web crawling functionality on cloud computing platforms. It works as follows:
- A web application that exposes Droids core functions as Web APIs
- support URL fetching, HTML/Image parsing, and data extraction
- Spring HTTP Invoker is chosen (any binary web remoting technology would do).
- The original Droids client component that is configured to use a remote worker
- The worker will no longer make local requests to fetch. Instead, it makes remoting calls to the web services and collects the results.
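The split described above can be sketched in plain Java. This is only an illustration under stated assumptions: `FetchService`, `FetchRequest`, `FetchResult`, `StubFetchService` and `Worker` are hypothetical names, not existing Droids types, and the fetch itself is stubbed out. The point is that the client-side worker depends only on an interface, so the implementation behind it can be a local object or a remoting proxy (with Spring HTTP Invoker, that proxy would typically come from `HttpInvokerProxyFactoryBean` on the client and `HttpInvokerServiceExporter` on the server).

```java
import java.io.Serializable;

// Sketch only: every type below is a hypothetical name, not part of Droids.
class RemoteWorkerSketch {

    /** Request sent to the web service; Serializable so it can cross the wire. */
    static final class FetchRequest implements Serializable {
        private static final long serialVersionUID = 1L;
        final String url;

        FetchRequest(String url) {
            this.url = url;
        }
    }

    /** Result returned by the web service; also Serializable. */
    static final class FetchResult implements Serializable {
        private static final long serialVersionUID = 1L;
        final String url;
        final int status;
        final String body;

        FetchResult(String url, int status, String body) {
            this.url = url;
            this.status = status;
            this.body = body;
        }
    }

    /** Contract shared by client and server. */
    interface FetchService {
        FetchResult fetch(FetchRequest request);
    }

    /** Server side: keeps no state between calls. The real fetch is stubbed. */
    static final class StubFetchService implements FetchService {
        @Override
        public FetchResult fetch(FetchRequest request) {
            return new FetchResult(request.url, 200, "<html>stub for " + request.url + "</html>");
        }
    }

    /** Client-side worker: it only sees the interface, never the transport. */
    static final class Worker {
        private final FetchService service;

        Worker(FetchService service) {
            this.service = service;
        }

        String process(String url) {
            FetchResult result = service.fetch(new FetchRequest(url));
            return result.status + " " + result.url;
        }
    }

    public static void main(String[] args) {
        // A local implementation is injected here; a remoting proxy that
        // implements FetchService could be injected instead.
        Worker worker = new Worker(new StubFetchService());
        System.out.println(worker.process("http://example.org/"));
    }
}
```

Because `Worker` receives its `FetchService` through the constructor, moving from local to remote execution becomes a wiring change rather than a code change.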
Requirements
- unlimited scalability / extreme throughput
- support any cloud computing platform, e.g. Google App Engine, Amazon EC2 etc.
- share nothing in the server application: no use of sessions, and every remote method call is a complete process.
Dependencies
- Spring
- allows transparent switching from a local component to a remote component in the client
- allows any service to be exposed easily through a Web API
- Google App Engine API
- for use in GAE
- URL Fetching Service
Restrictions
- any component that is passed to the remote API must be serializable (for sure!)
- the master task/link queue is in a single JVM, as in the original Droids.
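The serializability restriction can be checked early with a round trip through Java's built-in object streams, which is roughly what a binary remoting layer does to its arguments. The sketch below is an assumption-laden illustration: `LinkTask`, `roundTrip` and `isTransferable` are hypothetical names introduced here, not Droids APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch only: LinkTask and the helper methods are hypothetical names.
class SerializableCheck {

    /** Example of a task that may be passed to the remote API. */
    static final class LinkTask implements Serializable {
        private static final long serialVersionUID = 1L;
        final String url;
        final int depth;

        LinkTask(String url, int depth) {
            this.url = url;
            this.depth = depth;
        }
    }

    /** Writes the value to bytes and reads it back, as binary remoting would. */
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    /** Returns false when the value (or something it references) is not serializable. */
    static boolean isTransferable(Object value) {
        try {
            roundTrip(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("unexpected serialization failure", e);
        }
    }

    public static void main(String[] args) throws Exception {
        LinkTask copy = (LinkTask) roundTrip(new LinkTask("http://example.org/", 1));
        System.out.println(copy.url + " depth=" + copy.depth);
        // java.lang.Object does not implement Serializable, so this is rejected.
        System.out.println(isTransferable(new Object()));
    }
}
```

A check like this could run in unit tests for every type that appears in a remote method signature, so that a non-serializable field is caught before deployment rather than at call time.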
Reference: restrictions in Google App Engine
- 30s per request
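One way a remote call could stay inside a per-request time limit is to work against a deadline and hand unfinished URLs back to the caller. The sketch below is an assumption, not part of the proposal: `RequestBudgetSketch`, `BatchResult` and `processWithin` are hypothetical names, the fetch is a placeholder, and the 25 second budget in `main` is an arbitrary margin below the 30s limit. The clock is injected so the behaviour can be demonstrated without waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Sketch only: all names here are hypothetical.
class RequestBudgetSketch {

    /** URLs handled in this call, plus those to be returned to the master queue. */
    static final class BatchResult {
        final List<String> done = new ArrayList<>();
        final List<String> remaining = new ArrayList<>();
    }

    /** Processes URLs until the time budget is used up; the rest are returned untouched. */
    static BatchResult processWithin(List<String> urls, long budgetMillis, LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + budgetMillis;
        BatchResult result = new BatchResult();
        for (String url : urls) {
            if (clockMillis.getAsLong() >= deadline) {
                result.remaining.add(url); // out of time: give it back to the caller
            } else {
                result.done.add(url);      // placeholder for the real fetch and parse
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake clock that advances 10 seconds on every reading.
        long[] now = {0L};
        LongSupplier fakeClock = () -> {
            long t = now[0];
            now[0] += 10_000L;
            return t;
        };
        List<String> urls = List.of("http://a/", "http://b/", "http://c/", "http://d/");
        BatchResult result = processWithin(urls, 25_000L, fakeClock);
        System.out.println("done=" + result.done + " remaining=" + result.remaining);
    }
}
```

In production the clock would be `System::currentTimeMillis`, and the caller would re-queue `remaining` in the master task/link queue.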