Pushpull is deprecated; see OODT-837.


OODT's PushPull component framework provides a client architecture for accessing an array of remote resources. This component is used to pull from remote resources and push to local ones. It is typically used in conjunction with the CAS Crawler component. An example use case would be pulling data products from a remote FTP service and pushing them to a local staging area, from which the CAS Crawler can then inject them into the File Manager.

Download and Install

  1. Download a released tarball/zip from the Downloads page (http://oodt.apache.org/download)
  2. Uncompress it
  3. cd into the apache-oodt-{version} folder
  4. mvn package

The mvn package step downloads the required Maven dependencies into your local Maven repository (~/.m2 by default) and builds each component's distribution under its target directory. Next, deploy PushPull to your local machine or to another server.


  1. cd into pushpull/target
  2. Copy the tarball (cas-pushpull-{version}-dist.tar.gz) to your deployment location
  3. Untar the tarball; you will have a folder named cas-pushpull-{version} with the following directory structure

/bin /etc /lib /logs /policy
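The deployment steps above can be scripted. The following is a minimal sketch, assuming a POSIX shell and a tar that supports -z and -C. The helper name deploy_pushpull is hypothetical (it is not an OODT script). The helper unpacks the tarball into a deployment directory and confirms that the five expected folders are present.

```shell
# Hypothetical helper, not part of OODT.
#   $1 = path to cas-pushpull-{version}-dist.tar.gz
#   $2 = deployment directory (created if missing)
# Prints the path of the unpacked cas-pushpull-{version} folder on success.
deploy_pushpull() {
    mkdir -p "$2" || return 1
    tar -xzf "$1" -C "$2" || return 1
    # Folder name = tarball name without the "-dist.tar.gz" suffix.
    name=$(basename "$1" -dist.tar.gz)
    for d in bin etc lib logs policy; do
        if [ ! -d "$2/$name/$d" ]; then
            echo "missing directory: $2/$name/$d" >&2
            return 1
        fi
    done
    echo "$2/$name"
}
```

The printed path is a natural value for CAS_PP_HOME, for example: export CAS_PP_HOME=$(deploy_pushpull /path/to/tarball /path/to/deploy), where both paths are placeholders.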


Basic Configuration

These configuration steps must be completed to get the Push/Pull framework set up; they are required for even the most basic installations. Deployment-specific configuration is covered in the next section.

This documentation assumes the environment variable CAS_PP_HOME is set to the directory where you untarred the pushpull component. Several configuration properties require a full file path. Either replace [CAS_PP_HOME] with a value that is applicable to your deployment, or export that environment variable and use the following config.
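For the first option, the replacement can be done by hand or scripted. The following is a minimal sketch, assuming a POSIX shell with sed. The helper name expand_pp_home and the install path in the usage comment are hypothetical and do not come from OODT.

```shell
# Hypothetical helper: print the file given as $1 with every literal
# "[CAS_PP_HOME]" placeholder replaced by the value of $CAS_PP_HOME.
expand_pp_home() {
    # "|" is the sed delimiter so slashes in the path need no escaping;
    # this assumes the path itself contains no "|" or "&" characters.
    sed "s|\[CAS_PP_HOME]|${CAS_PP_HOME}|g" "$1"
}

# Example (the install path is an assumption):
#   export CAS_PP_HOME=/usr/local/cas-pushpull
#   expand_pp_home "$CAS_PP_HOME/etc/push_pull_framework.properties"
```

The helper writes to standard output rather than editing in place, so the result can be reviewed before it replaces the original file.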

The following sub-sections reference the path to each file that needs to be edited; each is followed by a block showing the changes that need to be made.


    #external configuration files
    org.apache.oodt.cas.pushpull.config.external.properties.files=[CAS_PP_HOME]/etc/default.properties

    # ingester filemgr url
    org.apache.oodt.cas.filemgr.url=

    #protocolfactory specification for protocol types
    org.apache.oodt.cas.pushpull.config.protocolfactory.info.files=[CAS_PP_HOME]/policy/ProtocolFactoryInfo.xml

    #parser to retrievalmethod map
    org.apache.oodt.cas.pushpull.config.parser.info.files=[CAS_PP_HOME]/policy/ParserToRetrievalMethodMap.xml

    #unique metadata element info
    org.apache.oodt.cas.pushpull.config.type.detection.file=[CAS_PP_HOME]/policy/mimetypes.xml

    #directory below which all data file will be downloaded to
    org.apache.oodt.cas.pushpull.data.files.base.staging.area=[CAS_PP_HOME]/staging

Specific Configuration(s)

Because the combinations of protocols and remote data archives are practically limitless, the following list of examples is NOT exhaustive; it is intended to give you working examples. Each configuration begins with a summary of the problem being solved, followed by the config/setup needed to solve it.


Example of Connecting to a Remote FTP Server to Retrieve All *.he5 Files

Connection Protocol: FTP
Root Path: ftp://l4ftl01.larc.nasa.gov/TES/TL2CO2N.005/
Password Required: NO
Download (All or Subset)?: All

Examples of full path to where the data resides on the FTP server:



Within the mimetypes.xml file we need to map a filename pattern (regex or not) to a custom mimetype. Below we have 3 mimetypes: the first 2 are defaults in pushpull, and the 3rd is a custom one based on the file naming of our desired remote HDF-5 files.

    <mime-type type="metadata/cas_pushpull">
        <glob pattern="*.info.tmp"/>
    </mime-type>
    <mime-type type="metadata/cas_metadata">
        <glob pattern="*.cas"/>
        <glob pattern="*.met"/>
    </mime-type>
    <mime-type type="product/TESLevel2CO2">
        <_comment>Level 2 - CO2 Retrievals from TES</_comment>
        <glob pattern="TES-Aura_L2-CO2-Nadir_r\d{10}\w{2}\d{2}\w\d{2}\.he5" isregex="true"/>
    </mime-type>
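Before deploying a custom pattern it can help to test it against a known filename. The sketch below assumes a POSIX shell with grep -E. POSIX extended regular expressions have no \d or \w, so the pattern is hand-translated to [0-9] and [[:alnum:]_]. This approximates the regex in mimetypes.xml and is not the matcher that pushpull itself uses. The helper name is_tes_l2_co2 is hypothetical. Requiring a whole-name match (-x) is also an assumption.

```shell
# Hypothetical helper: succeed if the filename in $1 matches a POSIX ERE
# translation of the product/TESLevel2CO2 pattern.
is_tes_l2_co2() {
    # \d -> [0-9], \w -> [[:alnum:]_]; -x requires the whole name to match.
    printf '%s\n' "$1" | grep -Eqx \
        'TES-Aura_L2-CO2-Nadir_r[0-9]{10}[[:alnum:]_]{2}[0-9]{2}[[:alnum:]_][0-9]{2}\.he5'
}

# Example:
#   is_tes_l2_co2 TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5 && echo match
```

Note that \w also matches the underscore, which is why \w{2} covers the "_F" portion of the sample name.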


Purpose: ExternalSources.xml contains a list of external data sources such as FTP servers. The login.alias attribute will be used within the RemoteSpecs.xml file. This file is located in the etc/examples folder and contains several great examples that you can tailor to your application. I have removed all unused ExternalSources entries so that I don't download files I don't want. The source.host value doesn't contain the URI prefix (ftp://, http://) and has NO trailing slash; the login.type takes care of the prefix.

    <source host="l4ftl01.larc.nasa.gov">
        <login type="ftp" alias="TESL2CO2">
        </login>
    </source>


Purpose: RemoteSpecs.xml first references the aliases listed in the ExternalSources.xml file from the previous section. Then you can define one or more daemons. The daemon.alias must be listed in ExternalSources.xml so the daemon knows where it should look for files. The propInfo and propFiles elements tell the daemon exactly which directories and files to retrieve. We will need to create an XML file called TESL2CO2.xml and place it in the propInfo.dir location. For simplicity I have kept the alias, propFiles and staging area the same (TESL2CO2). The period attribute on the runInfo tag sets the sleep/wait time for the daemon. The default is 3 minutes, but you may want to adjust this later in production.

        <aliasSpec file="[CAS_PP_HOME]/etc/examples/ExternalSources/ExternalSources.xml"/>

        <daemon alias="TESL2CO2" active="yes">
            <runInfo firstRunDateTime="2011-12-01T00:00:00Z" period="3m" runOnReboot="yes"/>
            <propInfo dir="[CAS_PP_HOME]/etc/examples/DirStructXmlParserFiles">
                <propFiles regExp="TESL2CO2\.xml" parser="org.apache.oodt.cas.pushpull.filerestrictions.parsers.DirStructXmlParser"/>
            </propInfo>
            <dataInfo stagingArea="TESL2CO2" deleteFromServer="no"/>
        </daemon>
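The requirement that each daemon alias also appears in ExternalSources.xml can be checked before launch. The following is a rough sketch, assuming a POSIX shell with grep. The helper name alias_is_defined is hypothetical. It does a plain-text search rather than a real XML parse, so it assumes the alias attribute sits on the same line as the <login tag and that the alias contains no regex metacharacters.

```shell
# Hypothetical helper: succeed if the alias given as $1 appears as an
# alias="..." attribute of a <login> element in the file given as $2.
alias_is_defined() {
    grep -q "<login [^>]*alias=\"$1\"" "$2"
}

# Example:
#   alias_is_defined TESL2CO2 \
#       "$CAS_PP_HOME/etc/examples/ExternalSources/ExternalSources.xml" \
#       || echo "alias TESL2CO2 is not defined" >&2
```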


Purpose: This file (TESL2CO2.xml) tells pushpull how to parse the remote directory structure. In this example the starting_path is static for all of our remote file paths, followed by dynamic folders named in a YYYY.MM.DD format. A simple regex lets pushpull descend into each of those subfolders, and a second regex selects the filenames we want.
The examples/DirStructXmlParserFiles folder contains several other examples to learn from.

    <dirstruct starting_path="/TES/TL2CO2N.005">
        <dir name="\d{4}\.\d{2}\.\d{2}"> <!-- regex matching '2004.09.20' -->
            <!-- regex matching TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5 -->
            <file name="TES-Aura_L2-CO2-Nadir_r\d{10}\w{2}\d{2}\w\d{2}\.he5"/>
        </dir>
    </dirstruct>
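To see which subfolders the directory pattern would select, a listing of folder names can be filtered with an equivalent expression. The sketch below assumes a POSIX shell with grep -E. The helper name filter_date_dirs is hypothetical and the pattern is a hand translation of \d{4}\.\d{2}\.\d{2}. Whether pushpull requires the whole folder name to match is an assumption.

```shell
# Hypothetical helper: read folder names on stdin (one per line) and print
# only those that look like YYYY.MM.DD.
filter_date_dirs() {
    # \d -> [0-9]; -x requires the whole name to match the pattern.
    grep -Ex '[0-9]{4}\.[0-9]{2}\.[0-9]{2}'
}

# Example (the folder names are illustrative input, not a server listing):
#   printf '2004.09.20\nREADME\n' | filter_date_dirs
```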

Launching the PushPull Daemon

Within $CAS_PP_HOME/bin there is a shell script, pushpull, that you can use to launch the PushPull daemon process. You will need to either edit the pushpull file directly to make the proper adjustments or export 2 environment variables. The following steps assume that we are starting the daemon with the configs listed above.

  1. cd $CAS_PP_HOME/bin
  2. Choose one of the two options below:
    1. Export 2 env vars
    2. Replace CAS_PP_RESOURCES and DAEMONLAUNCHER_PORT in the script with static values

      # Launch command in bin/pushpull as shipped
      ${JAVA_HOME}/bin/java \
      -cp ${LIB_DEPS} -Dcom.sun.management.jmxremote \
      -Djava.util.logging.config.file=../etc/logging.properties \
      -Djavax.net.ssl.trustStore=${CAS_PP_RESOURCES}/jssecacerts \
      org.apache.oodt.cas.pushpull.daemon.DaemonLauncher \
      --rmiRegistryPort ${DAEMONLAUNCHER_PORT} \
      --propertiesFile ${CAS_PP_RESOURCES}/push_pull_framework.properties \
      --remoteSpecsFile ${CAS_PP_RESOURCES}/examples/RemoteSpecsFiles/RemoteSpecs.xml

      # You can leave this file unchanged by merely exporting the following env vars (bash shell)
      export CAS_PP_RESOURCES=$CAS_PP_HOME/etc
      export DAEMONLAUNCHER_PORT=9012

      # Or you can always use this config and not set up env vars
      ${JAVA_HOME}/bin/java \
      -cp ${LIB_DEPS} -Dcom.sun.management.jmxremote \
      -Djava.util.logging.config.file=${CAS_PP_HOME}/etc/logging.properties \
      -Djavax.net.ssl.trustStore=${CAS_PP_HOME}/etc/jssecacerts \
      org.apache.oodt.cas.pushpull.daemon.DaemonLauncher \
      --rmiRegistryPort 9012 \
      --propertiesFile ${CAS_PP_HOME}/etc/push_pull_framework.properties \
      --remoteSpecsFile ${CAS_PP_HOME}/etc/examples/RemoteSpecsFiles/RemoteSpecs.xml
  3. ./pushpull

That should be about it. The daemon should start up on port 9012 (given this config).
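If the daemon fails to start, a quick check of the inputs used by the launch script can narrow things down. The following is a minimal sketch, assuming a POSIX shell and the environment-variable option described above. The helper name pp_preflight is hypothetical and is not shipped with pushpull. It only checks that the two variables are set and that the two files passed to DaemonLauncher exist.

```shell
# Hypothetical pre-launch check for the environment-variable option.
# Returns 0 when both variables are set and both input files exist.
pp_preflight() {
    if [ -z "${CAS_PP_RESOURCES:-}" ]; then
        echo "CAS_PP_RESOURCES is not set" >&2
        return 1
    fi
    if [ -z "${DAEMONLAUNCHER_PORT:-}" ]; then
        echo "DAEMONLAUNCHER_PORT is not set" >&2
        return 1
    fi
    # The two files handed to DaemonLauncher by the launch command.
    for f in \
        "$CAS_PP_RESOURCES/push_pull_framework.properties" \
        "$CAS_PP_RESOURCES/examples/RemoteSpecsFiles/RemoteSpecs.xml"
    do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
    done
    return 0
}
```

One way to use it is to run pp_preflight && ./pushpull from $CAS_PP_HOME/bin, so the daemon is only launched when the check passes.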

FAQ Section

Pushpull keeps re-downloading files I have ingested. How can I prevent PushPull from repeatedly downloading products?

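1. You will need a File Manager that pushpull can query to see whether a product has already been ingested into the archive. Set its URL in the ingester filemgr url property shown under Basic Configuration:

# ingester filemgr url
org.apache.oodt.cas.filemgr.url=<URL of your File Manager>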

2. Then configure the RemoteSpecs.xml file: within each <daemon> block, update the <dataInfo> element and set queryElement="Filename". If you have multiple daemons configured, you will have to configure each one.

<dataInfo stagingArea="MOD09GA-NRT" deleteFromServer="no" queryElement="Filename"/>

No data file is downloaded to my staging directory after running the ./pushpull script. What should I do? 

1. Make sure the remote FTP server actually contains data files that match your configuration.

2. The problem may be caused by protocol issues in the PushPull FTP plugin in use, so try one of the other PushPull FTP plugins. For details, refer to OODT Push Pull Plugins.



  1. Hi Cameron,
    this is an excellent guide, thanks for taking the time to write it. I assume it will become the PushPull User Guide on the OODT Apache site.
    I think there is only one piece of information missing: how to start the daemon with the script provided in the bin directory. This will also explain how the DaemonLauncher takes as input the RemoteSpecs.xml file, which in turn references the other two files in the examples directory. At first I was having trouble figuring out how these XML files would be loaded at startup, and it turns out it's from the bin/ script.
    thanks again, this is great.

    1. Hey Luca,

      Sorry about the long delay in fixing up the How to Launch the Daemon section, but I just added it in today.

      I am not sure where to configure the amount of time the pushpull daemon will sleep and I have been opening files all over the place. If someone else knows off the top of their head, then I will gladly add it to the Launch section. Hopefully this addition will help new users get up to speed.

      Thank you for providing your feedback, I appreciate it.