S4 piper walkthrough

Install S4

There are two ways to get S4:

  • Download the 0.5.0 release. Warning: we recommend getting the "source" release and building it, because the "binary" release requires some dependencies that may not be available on your machine.
  • Or check out the source from the Apache git repository, by following the instructions. The 0.5.0 tag corresponds to the current release.

If you get the binary release, the s4 scripts are immediately available. Otherwise you must build the project:

  1. Compile and install S4 in the local maven repository (omit the -DskipTests option if you also want to run the tests):
  2. Build the startup scripts: 

Warning: If you work on NFS, you may run into problems when building the apps. The ./s4 deploy command may not work properly, depending on the file locking settings of your cluster. You can still build applications, deploy them (as s4r archives, see a later section) and run them, but you may have to tell the build tool (gradle) to use the local file system for caches and repositories, by appending the following options to gradle commands:


Start a new application

S4 provides scripts that simplify the development and testing of applications. Let's see how to create a new project and start a sample application.

Create a new project

  1. Create a new application template (here, we create it in the /tmp directory): 
  2. This creates a sample application in the specified directory, with the following structure:

Have a look at the sample project content

The src/main/java/hello directory contains 3 files: 

  • HelloPE.java: a very simple PE (processing element) that prints the name contained in incoming events
  • HelloApp.java: defines a simple application that exposes an input stream ("names") connected to HelloPE. See the event dispatch configuration page for more information about how events are dispatched.
  • HelloInputAdapter.java: a simple adapter that reads lines of characters from a socket, converts them into events, and sends the events to interested S4 apps through the "names" stream

Run the sample app

To run an S4 application, you need to:

  • set up a cluster: provision a cluster and start S4 nodes for that cluster
  • package the app
  • publish the app on the cluster

  1. Set up the cluster:
    1. In 2 steps:
      1. Start a Zookeeper server instance:
      2. Define a new cluster, for example a cluster named "cluster1" with 2 partitions, whose nodes listen on ports starting from 12000:
    2. Alternatively, you may combine these two steps into a single one by passing the cluster configuration inline with the zkServer command:
  2. Start 2 S4 nodes with the default configuration, and attach them to cluster "cluster1":
    and again (maybe in another shell):
  3. Build, package and publish the app to cluster1:
    1. You may do that in a single step (currently, you must use the name of the current project, and you need to specify the gradle build file with a complete path).
      Note that specifying the app class is optional, but it avoids issues when the script tries to guess the app class automatically:
    2. You may also do that in 2 separate steps:
      1. Create an s4r archive. The following creates an archive named myApp.s4r (you may specify an arbitrary name here) in build/libs.
        Again, specifying the app class is optional:
      2. Publish the s4r archive (you may first copy it to a more suitable location). The name of the app is arbitrary:
        Tip: You can follow this method for a distributed deployment (by copying the s4r to a shared location on a distributed file system)
  4. S4 nodes will detect the new application, download it, load it and start it. You will get something like:

Great! The application is now deployed on 2 S4 nodes.

Tip: You can check the status of the application, nodes and streams with the "status" command:

Now what we need is some input!

We can get input through an adapter, i.e. an S4 app that converts an external stream into S4 events and injects those events into S4 clusters. In the sample application, the adapter is a very basic class that extends App, listens on an input socket on port 15000, and converts each received line of characters into a generic S4 event, in which the line data is kept in a "name" field. We specify:

  • the adapter class
  • the name of the output stream
  • the cluster where to deploy this app
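
To make the line-to-event conversion concrete, here is a minimal, self-contained sketch in plain Java. It does not use the S4 API: LineAdapterSketch, toEvent, readEvents and sendLines are hypothetical names, a Map stands in for the generic S4 event, and the sketch binds to any free port rather than 15000.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a socket adapter; this is not the S4 API. */
final class LineAdapterSketch {

    /** Wraps one line of text in a generic event with a single "name" field. */
    static Map<String, String> toEvent(String line) {
        return Collections.singletonMap("name", line);
    }

    /** Accepts one connection and converts every received line into an event. */
    static List<Map<String, String>> readEvents(ServerSocket server) throws IOException {
        List<Map<String, String>> events = new ArrayList<>();
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                events.add(toEvent(line));
            }
        }
        return events;
    }

    /** Plays the role of the external stream: sends lines to a local port. */
    static void sendLines(int port, String... lines) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port; the sample adapter listens on 15000.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            sendLines(server.getLocalPort(), "Bob", "Alice");
            System.out.println(readEvents(server));
        }
    }
}
```

The real adapter would put each event on its output stream instead of collecting events in a list; the list only keeps the sketch easy to run and inspect.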

For easy testing, we provide a facility to start a node with an adapter app without having to package and deploy the adapter app.

  1. First, we need to define a new S4 subcluster for that app:
  2. Then we can start the adapter. We use "names" to identify the output stream (this is the same name that the myApp app uses for its input stream)
    Warning: The adapter command must be run from the root of your S4 project (the myApp directory in our case).
  3. Now let's provide some data to the external stream (our adapter is listening on port 15000):
  4. One of the nodes should output in its console:

If you keep sending messages, the nodes will alternately display the "hello" messages, because by default the adapter app sends keyless events on the "names" stream in a round-robin fashion.
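
As an illustration of round-robin dispatch, the following sketch cycles through partitions in order, which is why two nodes take turns. It is a simplified model written for this walkthrough, not S4's dispatcher: RoundRobinSketch and nextPartition are hypothetical names, and the partition count of 2 mirrors "cluster1".

```java
/** Hypothetical model of round-robin dispatch for keyless events. */
final class RoundRobinSketch {
    private final int partitionCount;
    private int next = 0;

    RoundRobinSketch(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    /** Returns the partition for the next keyless event, then advances the cursor. */
    int nextPartition() {
        int chosen = next;
        next = (next + 1) % partitionCount;
        return chosen;
    }

    public static void main(String[] args) {
        RoundRobinSketch dispatcher = new RoundRobinSketch(2);
        for (String name : new String[] {"Bob", "Alice", "Carol", "Dave"}) {
            System.out.println(name + " -> partition " + dispatcher.nextPartition());
        }
    }
}
```

With 2 partitions, consecutive events go to partitions 0, 1, 0, 1, and so on.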

What happened?

The following figures illustrate the various steps we have taken. The local file system is used as the S4 application repository in our example.


Run the Twitter trending example

Let's have a look at another application, which computes trending Twitter topics by listening to the spritzer stream from the Twitter API. This application was adapted from a previous example in S4 0.3.

Overview

This application is divided into:

  • twitter-counter, in test-apps/twitter-counter/: extracts topics from tweets and maintains a count of the most popular ones, periodically dumped to disk
  • twitter-adapter, in test-apps/twitter-adapter/: listens to the feed from Twitter, converts status text into S4 events, and passes them to the "RawStatus" stream
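
The "most popular topics" step can be pictured with the sketch below, which keeps only topics that reach a minimum count and returns the most frequent ones. It is an assumption-laden illustration in plain Java, not the code in test-apps/twitter-counter: TopTopicsSketch, top, n and minCount are hypothetical names, and the counts are made-up sample values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical top-N selection over topic counts; not the twitter-counter code. */
final class TopTopicsSketch {

    /** Returns at most n topics with count >= minCount, most frequent first. */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n, int minCount) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() >= minCount)
                // Highest count first; ties are broken alphabetically for stable output.
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("s4", 25);      // sample values, invented for the sketch
        counts.put("java", 12);
        counts.put("rare", 3);     // below the threshold, so it is filtered out
        System.out.println(top(counts, 10, 10));
    }
}
```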

Have a look at the code in these directories. You'll note that:

  • the build.gradle file must be tailored to include new dependencies (twitter4j libs in twitter-adapter)
  • events are partitioned through various keys
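
One common way to partition events by key is to hash the key and take the result modulo the number of partitions, so that all events sharing a key reach the same partition. The sketch below shows that idea only; it assumes a plain String key and Java's String.hashCode, and S4's actual hashing may differ. KeyPartitionSketch and partitionFor are hypothetical names.

```java
/** Hypothetical key-based partitioning; S4's own hashing may differ. */
final class KeyPartitionSketch {

    /** Maps a key to a partition in the range [0, partitionCount). */
    static int partitionFor(String key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        // The same topic always maps to the same partition.
        for (String topic : new String[] {"s4", "java", "s4"}) {
            System.out.println(topic + " -> partition " + partitionFor(topic, 2));
        }
    }
}
```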

Run it!

Note: You need a twitter4j.properties file in your home directory with the following content (debug is optional):

  1. Start a Zookeeper instance. From the S4 base directory, do:
  2. Define 2 clusters: 1 for deploying the twitter-counter app, and 1 for the adapter app
  3. Start 2 app nodes (you may want to start each node in a separate console):
  4. Start 1 node for the adapter app:
  5. Deploy twitter-counter app (you may also first build the s4r then publish it, as described in the previous section)
  6. Deploy twitter-adapter app. In this example, we don't directly specify the app class of the adapter; instead, we use the deployment approach for apps (remember, the adapter is also an app).
  7. Observe the current 10 most popular topics in the file TopNTopics.txt. The file is updated at regular intervals and only lists topics with at least 10 occurrences, so you may have to wait a little before it is updated:
  8. You may also check the status of the S4 node with:

What next?

You have now seen some basic applications, and you know how to run them and how to get events into the system. You may now try to code your own apps with your own data.

This page explains how to specify your own dependencies.

There are more parameters available for the scripts (typing the name of the task will list the options). In particular, if you want distributed deployments, you'll need to pass the Zookeeper connection strings when you start the nodes.

You may also customize the communication and the core layers of S4 by tweaking configuration files and modules.

Last, the javadoc will help you when writing applications.

We hope this will help you start rapidly, and remember: we're happy to help!
