If you are new to OODT, the sheer number of components and configuration options can be overwhelming. To make things easier, the sections below walk through a real-world use case step by step, showing how the OODT components work together as one ecosystem.

Scenario

You have a remote server that hosts a set of files that need to be processed via a certain workflow (say Process A → Process B → Process C). Files are added to the remote server dynamically over time, and the processing needs to be automated so that every time one or more files are added, they run through the processing stages and the result is stored (possibly somewhere on the same remote server).
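Before mapping this onto OODT components, the requirement itself can be sketched as a simple polling loop. The sketch below uses plain Python with a local directory standing in for the remote server, and the `process_a`/`process_b`/`process_c` functions are hypothetical placeholders for the real processing stages:

```python
from pathlib import Path

# Hypothetical stand-ins for the real Process A / B / C stages.
def process_a(data: str) -> str:
    return data.upper()

def process_b(data: str) -> str:
    return data[::-1]

def process_c(data: str) -> str:
    return f"result:{data}"

def poll_once(staging: Path, output: Path, seen: set) -> list:
    """One polling pass: run A -> B -> C on each file not yet processed."""
    output.mkdir(parents=True, exist_ok=True)
    processed = []
    for f in sorted(staging.iterdir()):
        # Skip directories and files we have already processed (no duplicates).
        if not f.is_file() or f.name in seen:
            continue
        seen.add(f.name)
        result = process_c(process_b(process_a(f.read_text())))
        (output / f.name).write_text(result)
        processed.append(f.name)
    return processed
```

In a real deployment, each of these responsibilities (polling the remote server, detecting duplicates, running the workflow, archiving the result) is handled by a dedicated OODT component, as the steps below show.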

How OODT can be set up for this scenario

  1. OODT PushPull wakes up every hour and checks a remote FTP server for any new files that haven't been downloaded/ingested.
  2. If it finds new files, it downloads them into a Staging Area; if not, the PushPull daemon goes back to sleep for another hour.
  3. In parallel, the OODT Crawler component runs every 20 minutes over that staging directory, looking for new data. It first checks in with the OODT FileManager to see whether the file has already been ingested/archived/cataloged.
  4. If the file already exists in the catalog, the Crawler ignores it, removing the risk of ingesting duplicates.
  5. If the file is new, the Crawler extracts metadata and passes the file along to the FileManager for archiving and ingestion into the catalog.
  6. After FileManager ingestion succeeds, a PostIngestSuccess action is raised, which starts a workflow within the OODT WorkflowManager.
  7. The WorkflowManager is configured beforehand to submit this workflow as a job to the OODT ResourceManager.
  8. The ResourceManager is aware of the nodes registered in its cluster(s), so it marshals the job out to the nodes and monitors them to see when they are done.
  9. Once the workflow job is complete, the ResourceManager notifies the WorkflowManager that the job is done.
  10. At this point, the OODT Crawler is invoked on the job's output directory; metadata is extracted and the final output products are ingested into the FileManager.
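To make the WorkflowManager step more concrete, below is a sketch of what a workflow definition for the Process A → Process B → Process C chain might look like. This is an illustrative fragment under stated assumptions, not a drop-in configuration: the workflow and task IDs are hypothetical, and each task ID would need to correspond to a task registered in the WorkflowManager's task definitions.

```xml
<cas:workflows xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <!-- Hypothetical workflow chaining the three processing stages in order -->
  <workflow id="urn:example:ProcessABC" name="ProcessABC">
    <tasks>
      <task id="urn:example:ProcessA"/>
      <task id="urn:example:ProcessB"/>
      <task id="urn:example:ProcessC"/>
    </tasks>
  </workflow>
</cas:workflows>
```

Because the tasks are listed sequentially, each stage runs only after the previous one completes, matching the Process A → Process B → Process C ordering in the scenario.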


The steps above highlight certain terms and components used in OODT. You can read more on these topics to understand OODT better.
