
How to run a Spark cluster on Mesos on your Mac

On my current project we are running Spark on top of Yarn. Since Hadoop causes dependency problems and feels a bit ancient, I was looking for an alternative. At JPoint we have a few days a year to try things out, and this was one of them. I paired with Eelco to get things up and running.

This article will show you how to run Spark on top of Mesos on your Mac (or on Linux, or probably a combination of the two).

I won’t explain what Mesos is. The people at Apache already did a great job.

I found a few guides on Mesosphere (a product to run Mesos in ‘the cloud’) explaining how to run Mesos on your Mac. They were a great help, but not enough to get everything running, so that’s why I wrote this article.

“Installing Mesos onto a Mac with Homebrew” is the starting point of your installation. I will repeat the relevant steps here; when you need more information, just visit the guide mentioned earlier.

To run Spark you only have to install ZooKeeper and Mesos; we don’t need Marathon. I assume you have Java and Homebrew installed.

brew install zookeeper
brew install mesos

That was the complete installation.

Now start ZooKeeper:

zkServer start

Start a Mesos master:

/usr/local/sbin/mesos-master --registry=in_memory --ip=192.168.1.21 --zk=zk://192.168.1.21:2181/jpoint-mesos

Note that I use my network IP in the configuration. Using localhost is possible, but on a Mac you sometimes run into trouble with localhost/127.0.0.1/0.0.0.0/hostname, and you will eventually have more than one cluster node, so it saves time in the end to do it right from the start. jpoint-mesos is the location in ZooKeeper where the information about the cluster is stored; it can be anything but / and will be created automatically.

After the logging calms down you should see a web console at http://192.168.1.21:5050

Now it’s time to start a slave:

/usr/local/sbin/mesos-slave --master=zk://192.168.1.21:2181/jpoint-mesos

When you start a slave it should appear in the web console:
http://192.168.1.21:5050/#/slaves

With Spark you need at least two executors (that’s two slave nodes). But when you start a second slave on the same machine you will get conflicts with port numbers and the working dir. You can solve this with the port and work_dir parameters (or just hijack the laptop of a co-worker):

/usr/local/sbin/mesos-slave --master=zk://192.168.1.21:2181/jpoint-mesos --port=5052 --work_dir=/tmp/mesos2
/usr/local/sbin/mesos-slave --master=zk://192.168.1.21:2181/jpoint-mesos --port=5053 --work_dir=/tmp/mesos3

All right, there is a cluster running. It’s time to install Spark on it.

Installing Spark

Our goal was to run Spark without Hadoop, but unfortunately we had to download the Hadoop distribution of Spark. Download the Spark tgz and extract it.
You could probably rebuild Spark without Hadoop (if that’s even possible) to get rid of it, but we only had one day, and while writing this article I still hadn’t found out how to get that working. For a proof of concept the Hadoop distribution will suffice.

Mesos slaves eventually need a Spark distribution to run Spark jobs. The distribution can be served over HDFS, S3 or HTTP. The simplest solution is HTTP. Just start a Python web server in the directory where you downloaded Spark:

python -m SimpleHTTPServer 9914

You will get a message (e.g. “Serving HTTP on 0.0.0.0 port 9914 ...“). On Python 3 the equivalent command is python -m http.server 9914.

Now there is a web server on port 9914. With the current Spark version the distribution is available at http://192.168.1.21:9914/spark-1.3.1-bin-hadoop2.6.tgz

The next step is to tell Spark where Mesos and the tgz on HTTP live. Copy spark-env.sh.template (in the conf dir) to spark-env.sh and add the following entries:

export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/Cellar/mesos/0.22.0/lib/libmesos.dylib 
export SPARK_EXECUTOR_URI=http://192.168.1.21:9914/spark-1.3.1-bin-hadoop2.6.tgz

(on Linux, replace libmesos.dylib with libmesos.so and adjust the directory)

Now run spark-shell from the directory where you extracted Spark:

./bin/spark-shell --master mesos://192.168.1.21:5050

This must be a Mesos url (note that I tried to use ZooKeeper urls wherever possible; for a multi-master setup Spark also accepts mesos://zk://192.168.1.21:2181/jpoint-mesos).

Run a Job on Spark

Let’s run a spark-shell to calculate pi. Go to the directory where spark-shell is located (the bin directory of Spark) and execute:

./spark-shell --master mesos://192.168.1.21:5050

Now paste the following code (adapted from http://spark.apache.org/examples.html):

val NUM_SAMPLES = 1000
val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = math.random
  val y = math.random
  if (x * x + y * y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

You should see some activity in your Mesos console. When things go too quickly, change NUM_SAMPLES into something bigger (val NUM_SAMPLES = Int.MaxValue is the maximum value and takes at least a minute on my laptop).

Of course you can run your job outside a REPL; just use mesos://192.168.1.21:5050 as your master url, e.g.:

./spark-submit --class nl.jpoint.PiJob --master mesos://192.168.1.21:5050 \
--num-executors 2 --driver-memory 1g --executor-memory 1g \
/Users/jeroen/spark-test2/target/spark-test2-1.0-SNAPSHOT*.jar
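For reference, here is a minimal sketch of what a class like nl.jpoint.PiJob could look like. The package and class name are taken from the spark-submit line above; the body is my own sketch of the same pi estimation from the spark-shell example, not the actual job:

```scala
package nl.jpoint

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch of the PiJob referenced above.
// The master url is supplied by spark-submit, so it is not hard-coded here.
object PiJob {
  def main(args: Array[String]): Unit = {
    val numSamples = 1000000
    val conf = new SparkConf().setAppName("PiJob")
    val sc = new SparkContext(conf)

    // Same Monte Carlo estimation as the spark-shell example above.
    val count = sc.parallelize(1 to numSamples).map { _ =>
      val x = math.random
      val y = math.random
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)

    println("Pi is roughly " + 4.0 * count / numSamples)
    sc.stop()
  }
}
```

Build this into a jar (with spark-core as a provided dependency) and pass that jar to spark-submit as shown above.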

Troubleshooting/debugging

chown

When you get an error like ‘failed to chown workdir‘, try starting the slave this way:

MESOS_SWITCH_USER=0 /usr/local/sbin/mesos-slave --master=zk://192.168.1.21:2181/jpoint-mesos


logging

To get the logging, go to the console in your browser and click the sandbox link of the node you’re interested in. You should see a stdout and a stderr file. These files are also located in the work dir of your nodes.

ip-conflicts

When you’re playing around with different machines/networks, things get ugly pretty fast. The best thing to do is to delete your work dirs (of masters and slaves) and remove the ZooKeeper tree.
Removing the ZooKeeper tree is done via:

zkCli
rmr /jpoint-mesos

Mesos just isn’t made for this kind of fiddling around (which isn’t a bad thing, since you won’t run into these kinds of problems on production clusters).

Killing a Job

Killing a job (e.g. a Spark Streaming job) can be done via the Spark web UI. This UI usually runs on port 4040 on the machine where you started Spark. Look for a ‘kill’ link near your job.
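As an alternative to the ‘kill’ link, a streaming job can also stop itself from the driver. A sketch, assuming a simple socket-based stream (the host, port and batch interval are made up for illustration):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical sketch of a streaming job that shuts itself down gracefully.
object StoppableStreamingJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StoppableStreamingJob")
    val ssc = new StreamingContext(conf, Seconds(10))

    // At least one output operation is required before start().
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.print()

    ssc.start()
    // Wait at most an hour, then stop; stopGracefully lets the job
    // finish processing the data it has already received.
    ssc.awaitTermination(60 * 60 * 1000)
    ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}
```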

Conclusion

That’s all, I hope it’s useful to you. If you have any improvements or spot any errors, feel free to leave a comment and I’ll try to update the article.

One final tip: don’t use Wi-Fi on your cluster 😉; uploading the Spark tgz over Wi-Fi is too slow.

Sources / useful links

Mesos

https://docs.mesosphere.com/getting-started/developer/brew-install/
https://docs.mesosphere.com/tutorials/run-spark-on-mesos/
Apache Spark cluster deployment Part 2: Mesos and YARN
http://ampcamp.berkeley.edu/3/exercises/mesos.html
