
Install S4

There are 2 ways:

  • Download the 0.5.0 release. Warning: we recommend getting the "source" release and building it, because the "binary" release requires some dependencies that may not be available on your machine.
  • or check out from the Apache git repository by following the instructions. The 0.5.0 tag corresponds to the current release.

If you get the binary release, s4 scripts are immediately available. Otherwise you must build the project:

Note: these instructions are valid for the piper branch, which corresponds to the 0.5.0 release.

...

  1. Compile and install S4 in the local Maven repository (you can also let the tests run by omitting the -DskipTests option, but this is currently quite long: we're not yet using mocks):
    Code Block
    S4:incubator-s4$ ./gradlew install -DskipTests
    .... verbose logs ...
    
  2. Build the startup scripts: 
    Code Block
    S4:incubator-s4$ ./gradlew s4-tools:installApp
    .... verbose logs 
    ...:s4-tools:installApp
    

...

  • HelloApp.java defines a simple application: it exposes an input stream ("names") connected to the HelloPE (a minimal sketch of the HelloPE class is shown after the code below). See the event dispatch configuration page for more information about how events are dispatched.
    Code Block
    // App parent class provides integration with the S4 platform
    public class HelloApp extends App {
    
        @Override
        protected void onStart() {
        }
    
        @Override
        protected void onInit() {
            // That's where we define PEs and streams
            // create a prototype
            HelloPE helloPE = createPE(HelloPE.class);
        // Create a stream that listens to the "names" stream and passes events to helloPE instances.
            createInputStream("names", new KeyFinder<Event>() {
                    // the KeyFinder is used to identify keys
                @Override
                public List<String> get(Event event) {
                    return Arrays.asList(new String[] { event.get("name") });
                }
            }, helloPE);
        }
    // skipped remaining methods
    }

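  • HelloPE.java: a minimal sketch of what the HelloPE class could look like, based on the "Hello Bob!" output produced later in this walkthrough. This is a hedged illustration rather than the exact sample source: it assumes a ProcessingElement base class with onCreate()/onRemove() lifecycle methods and an onEvent(Event) handler invoked by the S4 dispatcher.
    Code Block
    // One HelloPE instance is created per distinct value of the "name" key
    // extracted by the KeyFinder above.
    public class HelloPE extends ProcessingElement {

        // called by the S4 dispatcher for each event of the "names" stream
        public void onEvent(Event event) {
            System.out.println("Hello " + event.get("name") + "!");
        }

        @Override
        protected void onCreate() {
            // nothing to initialize in this simple example
        }

        @Override
        protected void onRemove() {
            // nothing to clean up
        }
    }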
...

  • to set-up a cluster: provision a cluster and start S4 nodes for that cluster
  • to package the app
  • to publish the app on the cluster
  1. Set-up the cluster:
    1. In 2 steps:
      1. Start a Zookeeper server instance:
        Code Block
        S4:myApp$ ./s4 zkServer
        calling referenced s4 script : /Users/S4/tmp/s4-22/incubator-s4/s4
        [main] INFO  org.apache.s4.tools.ZKServer - Starting zookeeper server on port [2181]
        [main] INFO  org.apache.s4.tools.ZKServer - cleaning existing data in [/var/folders/8V/8VdgKWU3HCiy2yV4dzFpDk+++TI/-Tmp-/tmp/zookeeper/data] and [/var/folders/8V/8VdgKWU3HCiy2yV4dzFpDk+++TI/-Tmp-/tmp/zookeeper/log]
      2. Define a new cluster. Say a cluster named "cluster1" with 2 partitions, nodes listening to ports starting from 12000:
        Code Block
        S4:myApp$ ./s4 newCluster -c=cluster1 -nbTasks=2 -flp=12000
        calling referenced s4 script : /Users/S4/tmp/s4-22/incubator-s4/s4
        [main] INFO  org.apache.s4.tools.DefineCluster - preparing new cluster [cluster1] with [2] node(s)
        [main] INFO  org.apache.s4.tools.DefineCluster - New cluster configuration uploaded into zookeeper

    1. Alternatively you may combine these two steps into a single one, by passing the cluster configuration inline with the zkServer command:
      Code Block
       S4:incubator-s4$ ./s4 zkServer -clusters=c=cluster1:flp=12000:nbTasks=2
  2. Start 2 S4 nodes with the default configuration, and attach them to cluster "cluster1" (you may start the second S4 node in a different console):
    Code Block
    S4:myApp$ ./s4 node -c=cluster1
    calling referenced s4 script : /Users/S4/tmp/s4-22/incubator-s4/s4
    15:50:18.996 [main] INFO  org.apache.s4.core.Main - Initializing S4 node with :
    - comm module class [org.apache.s4.comm.DefaultCommModule]
    - comm configuration file [default.s4.comm.properties from classpath]
    - core module class [org.apache.s4.core.DefaultCoreModule]
    - core configuration file[default.s4.core.properties from classpath]
    -extra modules: []
    [main] INFO  org.apache.s4.core.Main - Starting S4 node. This node will automatically download applications published for the cluster it belongs to
    
    and again (maybe in another shell):
    Code Block
    
     S4:myApp$ ./s4 node -c=cluster1
    
  3. Build, package and publish the app to cluster1:
    1. You may do that in a single step (currently, you must use the name of the current project, and you need to specify the gradle build file with a complete path).
      Note that specifying the app class is optional, but it avoids issues when the script tries to guess the app class automatically:
      Code Block
      S4:myApp$ ./s4 deploy -appName=myApp -c=cluster1 -b=`pwd`/build.gradle -a=hello.HelloApp
      .... verbose logs for compiling, building the package, and publishing it to Zookeeper...
      15:46:16.486 [main] INFO  org.apache.s4.tools.Deploy - uploaded application [myApp] to cluster [cluster1], using zookeeper znode [/s4/clusters/cluster1/apps/myApp]
      
    2. You may also do that in 2 separate steps:
      1. Create an s4r archive. The following creates an archive named myApp.s4r (here you may specify an arbitrary name) in build/libs.
        Again, specifying the app class is optional:
        Code Block
        ./s4 s4r -a=hello.HelloApp -b=`pwd`/build.gradle myApp
      2. Publish the s4r archive (you may first copy it to a more suitable location). The name of the app is arbitrary:
        Code Block
        ./s4 deploy -s4r=`pwd`/build/libs/myApp.s4r -c=cluster1 -appName=myApp
        Tip: you can follow this method for a distributed deployment (by copying the s4r to a shared location on a distributed file system).
  4. S4 nodes will detect the new application, download it, load it and start it. You will get something like:
    Code Block
    [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s.d.DistributedDeploymentManager - Detected new application(s) to deploy {}[myApp]
    [ZkClient-EventThread-15-localhost:2181] INFO  org.apache.s4.core.Server - Local app deployment: using s4r file name [myApp] as application name
    [ZkClient-EventThread-15-localhost:2181] INFO  org.apache.s4.core.Server - App class name is: hello.HelloApp
    [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to { nbNodes=0,name=unknown,mode=unicast,type=,nodes=[]} from null
    [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClusterFromZK - Adding topology change listener:org.apache.s4.comm.tcp.TCPEmitter@79b2591c
    [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s.comm.topology.AssignmentFromZK - New session:87684175268872203; state is : SyncConnected
    [ZkClient-EventThread-19-localhost:2181] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to { nbNodes=1,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=myMachine.myNetwork,taskId=Task-0}]} from { nbNodes=0,name=unknown,mode=unicast,type=,nodes=[]}
    [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s.comm.topology.AssignmentFromZK - Successfully acquired task:Task-1 by myMachine.myNetwork
    [ZkClient-EventThread-19-localhost:2181] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to { nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=myMachine.myNetwork,taskId=Task-0}, {partition=1,port=12001,machineName=myMachine.myNetwork,taskId=Task-1}]} from { nbNodes=1,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=myMachine.myNetwork,taskId=Task-0}]}
    [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClustersFromZK - New session:87684175268872205
    [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClustersFromZK - Detected new stream [names]
    [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClustersFromZK - New session:87684175268872206
    [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to { nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=myMachine.myNetwork,taskId=Task-0}, {partition=1,port=12001,machineName=myMachine.myNetwork,taskId=Task-1}]} from null
    [ZkClient-EventThread-15-localhost:2181] INFO  org.apache.s4.core.Server - Loaded application from file /tmp/deploy-test/cluster1/myApp.s4r
    [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s.d.DistributedDeploymentManager - Successfully installed application myApp
    [ZkClient-EventThread-15-localhost:2181] DEBUG o.a.s.c.g.OverloadDispatcherGenerator - Dumping generated overload dispatcher class for PE of class [class hello.HelloPE]
    [ZkClient-EventThread-15-localhost:2181] DEBUG o.a.s4.comm.topology.ClustersFromZK - Adding input stream [names] for app [-1] in cluster [cluster1]
    [ZkClient-EventThread-15-localhost:2181] INFO  org.apache.s4.core.App - Init prototype [hello.HelloPE].
    

Great! The application is now deployed on 2 S4 nodes. It needs some input though...

Tip: you can check the status of the application, nodes and streams with the "status" command:

Code Block

 ./s4 status

Now what we need is some input!

We can get input through an adapter, i.e. an S4 app that converts an external stream into S4 events and injects those events into S4 clusters. In the sample application, the adapter is a very basic class that extends App, listens to an input socket on port 15000, and converts each received line of characters into a generic S4 event, in which the line data is kept in a "name" field (a rough sketch of such an adapter is shown after the steps below). We specify:

...

  1. First, we need to define a new S4 subcluster for that app:
    Code Block
    S4:myApp$ ./s4 newCluster -c=cluster2 -nbTasks=1 -flp=13000
  2. Then we can start the adapter, and we use "names" for identifying the output stream (this is the same name used as input by the myApp app).
    Warning: the adapter command must be run from the root of your S4 project (the myApp dir in our case).
Code Block
./s4 adapter -appClass=hello.HelloInputAdapter -c=cluster2 -p=s4.adapter.output.stream=names
  3. Now let's just provide some data to the external stream (our adapter is listening to port 15000):
    Code Block
    S4:~$ echo "Bob" | nc localhost 15000
  4. One of the nodes should output in its console:
    Code Block
    Hello Bob!
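For reference, here is a rough sketch of what the HelloInputAdapter class used above could look like. This is not the actual sample source: it assumes an AdapterApp base class (an App subclass for adapters) that exposes the output stream configured through the s4.adapter.output.stream parameter via a getRemoteStream() accessor, and the Event population call is an assumption as well (check the adapter sources under test-apps and the javadoc for the exact API).

Code Block

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
// S4 imports omitted: package names depend on the S4 version

// Hypothetical sketch: reads lines from a socket on port 15000 and injects
// each line as an S4 event carrying the line in a "name" field.
public class HelloInputAdapter extends AdapterApp {

    @Override
    protected void onStart() {
        new Thread(new Runnable() {
            public void run() {
                try {
                    ServerSocket serverSocket = new ServerSocket(15000);
                    while (true) {
                        Socket socket = serverSocket.accept();
                        BufferedReader reader = new BufferedReader(
                                new InputStreamReader(socket.getInputStream()));
                        String line;
                        while ((line = reader.readLine()) != null) {
                            Event event = new Event();
                            // keep the line data in the "name" field, as expected by HelloPE
                            // (assumed Event API; check the javadoc)
                            event.put("name", String.class, line);
                            // inject into the stream named by s4.adapter.output.stream
                            getRemoteStream().put(event);
                        }
                        socket.close();
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        }).start();
    }
}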

...

  1. Start a Zookeeper instance. From the S4 base directory, do:
    Code Block
    ./s4 zkServer
  2. Define 2 clusters: 1 for deploying the twitter-counter app, and 1 for the adapter app
    Code Block
    ./s4 newCluster -c=cluster1 -nbTasks=2 -flp=12000; ./s4 newCluster -c=cluster2 -nbTasks=1 -flp=13000
  3. Start 2 app nodes (you may want to start each node in a separate console):
    Code Block
    ./s4 node -c=cluster1
    ./s4 node -c=cluster1
    
  4. Start 1 node for the adapter app:
    Code Block
    ./s4 node -c=cluster2 -p=s4.adapter.output.stream=RawStatus
  5. Deploy the twitter-counter app (you may also first build the s4r and then publish it, as described in the previous section):
    Code Block
    ./s4 deploy -appName=twitter-counter -c=cluster1 -b=`pwd`/test-apps/twitter-counter/build.gradle
  6. Deploy the twitter-adapter app. In this example, we don't directly specify the app class of the adapter; instead, we use the deployment approach for apps (remember, the adapter is also an app).
    Code Block
    ./s4 deploy -appName=twitter-adapter -c=cluster2 -b=`pwd`/test-apps/twitter-adapter/build.gradle
  7. Observe the current 10 most popular topics in the file TopNTopics.txt. The file gets updated at regular intervals and only outputs topics with a minimum of 10 occurrences, so you may have to wait a little before the file is updated:
    Code Block
    tail -f TopNTopics.txt
  8. You may also check the status of the S4 nodes with:
    Code Block
    ./s4 status

...

What next?

You have now seen some basic applications, you know how to run them, and you know how to get events into the system. You may now try to code your own apps with your own data.

...

You may also customize the communication and the core layers of S4 by tweaking configuration files and modules.
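For instance, since the node startup logs above list Guice-style module classes (DefaultCommModule, DefaultCoreModule), a custom module could be sketched as follows; the class and binding names below are purely hypothetical, and the actual extension points should be checked against the S4 sources and javadoc.

Code Block

import com.google.inject.AbstractModule;

// Hypothetical sketch of a custom module: the binding in the comment is
// illustrative only; check the S4 javadoc for the real interfaces to override.
public class MyCommModule extends AbstractModule {

    @Override
    protected void configure() {
        // e.g. swap the default emitter implementation for a custom one
        // (names are hypothetical):
        // bind(Emitter.class).to(MyCustomEmitter.class);
    }
}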

Last, the javadoc will help you when writing applications.

We hope this will help you get started rapidly. Remember: edges are still a bit rough, more aspects need to be documented, and this is not a final version, but it should let you start, and we're happy to help!