Here are instructions on profiling Spark applications using YourKit Java Profiler.
On Spark EC2 images
1. After logging into the master node, download the YourKit Java Profiler for Linux from the YourKit downloads page (at the time of writing, the latest version is yjp-12.0.5-linux.tar.bz2; you will need to substitute different paths if using a newer version). This file is fairly large (~100 MB) and the YourKit download site is somewhat slow, so you may want to mirror the file or include it on a custom AMI.
2. Untar this file somewhere (in `/root` in our case):
tar xvjf yjp-12.0.5-linux.tar.bz2
3. Copy the expanded YourKit files (`/root/yjp-12.0.5` in our case) to each node using the `copy-dir` script.
4. Configure the Spark JVMs to use the YourKit profiling agent by editing `~/spark/conf/spark-env.sh` and adding the lines:
SPARK_DAEMON_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
export SPARK_DAEMON_JAVA_OPTS
SPARK_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
export SPARK_JAVA_OPTS
5. Copy the updated `spark-env.sh` to each node, again using `copy-dir`.
6. Restart your Spark cluster so that the master and worker JVMs pick up the new options.
7. By default, the YourKit profiler agents use ports 10001-10010. To connect the YourKit desktop application to the remote profiler agents, you'll have to open these ports in the cluster's EC2 security groups.
To do this, sign into the AWS Management Console. Go to the EC2 section and select `Security Groups` from the `Network & Security` section on the left side of the page. Find the security groups corresponding to your cluster; if you launched a cluster named `test_cluster`, then you will want to modify the settings for the `test_cluster-slaves` and `test_cluster-master` security groups. For each group, select it from the list, click the `Inbound` tab, and create a new `Custom TCP Rule` opening the port range `10001-10010`. Finally, click `Apply Rule Changes`. Make sure to do this for both security groups.
Note: by default, `spark-ec2` re-uses security groups: if you stop this cluster and launch another cluster with the same name, your security group settings will be re-used.
8. Launch the YourKit profiler on your desktop.
9. Select "Connect to remote application..." from the welcome screen and enter the address of your Spark master or worker machine, e.g. `ec2--.compute-1.amazonaws.com`.
10. YourKit should now be connected to the remote profiling agent. It may take a few moments for profiling information to appear.
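If the connection in step 9 does not succeed, one quick check is whether the agent ports from step 7 are reachable from your desktop. The following is a minimal sketch, not part of YourKit or Spark: the function names `yjp_ports`, `port_open`, and `check_yjp_ports` are hypothetical, and it assumes `bash` (for `/dev/tcp`) and the coreutils `timeout` and `seq` commands are available locally.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions; none of these names come from YourKit or Spark.

# Print the default agent port range from step 7, one port per line.
yjp_ports() {
  seq 10001 10010
}

# Return 0 if a TCP connection to host $1, port $2 succeeds within 3 seconds.
# Uses bash's /dev/tcp redirection and the coreutils `timeout` command.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Try every default port on the given host and print its status.
check_yjp_ports() {
  local host="$1" port
  for port in $(yjp_ports); do
    if port_open "$host" "$port"; then
      echo "$port open"
    else
      echo "$port closed or filtered"
    fi
  done
}
```

After sourcing the file, call `check_yjp_ports` with the address of your master or a worker. An agent typically listens on only one port in the range, so a single open port is what you would expect to see. If every port is reported as closed or filtered, recheck the security group rules and confirm the cluster was restarted after editing `spark-env.sh`.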
Please see the YourKit documentation for the full list of profiler agent startup options.
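The `spark-env.sh` lines from step 4 repeat the same long agent path twice, which is easy to get wrong when moving to a newer YourKit version. As a rough sketch, the same configuration could be written with the version factored out. The names `YJP_VERSION`, `YJP_HOME`, and `yjp_agent_opt` are made up for this example, and the paths simply mirror the ones used above.

```shell
# Assumed values taken from the steps above; change YJP_VERSION for a newer release.
YJP_VERSION="12.0.5"
YJP_HOME="/root/yjp-${YJP_VERSION}"

# Print the JVM flag that loads the agent with the given startup
# options (hypothetical helper; defaults to "sampling" as in step 4).
yjp_agent_opt() {
  printf '%s' "-agentpath:${YJP_HOME}/bin/linux-x86-64/libyjpagent.so=${1:-sampling}"
}

# Append the flag to both option variables, keeping anything already set.
SPARK_DAEMON_JAVA_OPTS="${SPARK_DAEMON_JAVA_OPTS:-} $(yjp_agent_opt)"
SPARK_JAVA_OPTS="${SPARK_JAVA_OPTS:-} $(yjp_agent_opt)"
export SPARK_DAEMON_JAVA_OPTS SPARK_JAVA_OPTS
```

Passing a different argument to `yjp_agent_opt` swaps in other agent startup options without retyping the path. After changing the file, it still has to be copied to every node and the cluster restarted, as in steps 5 and 6. On spark-ec2 images the `copy-dir` script is usually found under `~/spark-ec2/`, and the Spark distribution ships `stop-all.sh` and `start-all.sh` scripts for the restart, but their exact locations depend on the image and Spark version, so treat those paths as assumptions.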
In Spark unit tests
When running Spark tests through SBT, add
javaOptions in Test += "-agentpath:/path/to/yjp",
to `SparkBuild.scala` to launch the tests with the YourKit profiler agent enabled. The platform-specific paths to the profiler agents are listed in the YourKit documentation.
Moved permanently to http://spark.apache.org/developer-tools.html#profiling