Home > English, java, work > How to run a Spark cluster on Mesos on your Mac

How to run a Spark cluster on Mesos on your Mac

On my current project we are running Spark on top of Yarn. Since Hadoop causes dependency problems and feels a bit ancient I was looking for an alternative. At JPoint we have a few a days a year to try things out, this was one of them. I paired with Eelco to get things up and running.

This article will show you how to run Spark on top of Mesos on your Mac (or Linux and probably a combination of these two).

I won’t explain what Mesos is. The people at apache already did a great job.

I found a few guides on Mesosphere (a product to run Mesos in ‘the cloud’) explaining how to run Mesos on your Mac. They were a great help, but not enough to get everything running, so that’s why I wrote this article.

Installing Mesos onto a Mac with Homebrew” is the starting point of your installation. I will repeat the relevant steps here, when you need more information just visit the link mentioned earlier.

To run Spark you only have to install Zookeeper and Mesos, we don’t need Marathon. I assume you have java and homebrew installed.

brew install zookeeper
brew install mesos

That was the complete installation.

Now start Zookeeper

zkServer start

Start a Mesos master :

/usr/local/sbin/mesos-master --registry=in_memory --ip= --zk=zk://

Note that I use my network-ip in the configuration. It’s possible with localhost but on a Mac you sometimes run into trouble with localhost/ and you eventually will have more than one cluster node, so it will save you time in the end to do it right from the start. jpoint-mesos is the location in zookeeper where the information about the cluster is stored, it can be anything but / and will be automatically created.

After the logging cools down you should see a console at http://:5050

Now it’s time to start a slave :

/usr/local/sbin/mesos-slave --master=zk://

When you start a slave it should appear in the web console :

With Spark you need at leas two executors (that’s two slave nodes). But when you start a second slave on the same machine you will get conflicts with port numbers and the working dir. You can solve this with the port and work_dir parameters (or just hijack the laptop of a co-worker) :

/usr/local/sbin/mesos-slave --master=zk:// --port=5052 --work_dir=/tmp/mesos2
/usr/local/sbin/mesos-slave --master=zk:// --port=5053 --work_dir=/tmp/mesos3

All right, there is a cluster running. It’s time to install Spark on it.

Installing Spark

Our goal was to run Spark without Hadoop, but unfortunately we had to download the Hadoop-distribution of Spark. Download the tgz of Spark and extract it.
You probably have to rebuild Spark without Hadoop (if that’s even possible) to get rid of Hadoop, but we only had one day and while writing this article I still didn’t found out how to get it working, but for a proof of concept it will suffice.

Mesos slaves eventually want a Spark distribution to run Spark jobs. The distribution can be served on hdfs, s3 or http. The simplest solution is http. Just start a Python web server in the directory where you downloaded Spark :

python -m SimpleHTTPServer 9914

You will get a message (ie : “Serving HTTP on port 9914 ...“)

Now there is a web server on port 9914. With the current spark version it is available at

The next step is to tell Spark where Mesos and the tgz on http live. Copy spark-env.sh.template (in the conf dir) to spark-env.sh and add the following entries :

export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/Cellar/mesos/0.22.0/lib/libmesos.dylib 

(replace libmesos.dylib with libmesos.so (and the right directory) on Linux)

Now run spark-shell in the directory where you extracted Spark:

./bin/spark-shell --master mesos://

This must be a Mesos url (note that I tried to use Zookeeper-urls wherever possible).

Run a Job on Spark

Let’s run a spark-shell to calculate pi. Go to the directory where spark-shell is located (in the bin directory of spark) and execute :

./spark-shell --master mesos://

now paste the following code (adapted from http://spark.apache.org/examples.html)

val NUM_SAMPLES=1000
val count = sc.parallelize(1 to NUM_SAMPLES).map{i =>
   val x = Math.random()
   val y = Math.random()
   if (x*x + y*y < 1) 1 else 0
 }.reduce(_ + _)
 println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

You should see some activity in your Mesos console. When things go to quickly change the NUM_SAMPLES into something bigger (val NUM_SAMPLES=Int.MaxValue is the maximum value and takes at least a minute on my laptop)

Of course you can run your Job outside a REPL, just use mesos:// as your master url. ie:

./spark-submit --class nl.jpoint.PiJob --master mesos://localhost:5050 \
--num-executors 2 --driver-memory 1g --executor-memory 1g \



When you get an error with ‘failed to chown workdir‘ try starting the slave this way :

MESOS_SWITCH_USER=0 /usr/local/sbin/mesos-slave --master=zk://



To get the logging go to the console in your browser and click on a sandbox-link of a node you’re interested in. You should see a stdout and stderr file. These files are also located in the work dir of your nodes.


When you’re playing around with different machines/networks things get ugly pretty fast. The best thing to do is to delete your working dirs (of masters and slaves) and remove the zookeeper tree.
Removing the zookeeper tree is done via

rmr /jpoint-mesos

Mesos just isn’t suitable for this kind of stuff (which isn’t a bad thing since you won’t run into these kind of problems on production clusters).

Killing a Job

Killing a Job (ie. Spark streaming) can be done via the Spark web ui. This ui usually runs in a web interface on port 4040 on the machine where you started Spark. Look for a ‘kill’ link near your job.


That’s all, I hope it’s useful to you. When you have any improvements/errors feel free to leave a comment and I’ll try to update the article.

One final tip : don’t use wifi on your cluster 😉 , uploading the Spark-tgz is too slow.

Sources / useful links


Apache Spark cluster deployment Part 2: Mesos and YARN

Categories: English, java, work Tags: , ,
  1. Akhil
    7 November 2017 at 10:41

    I am getting following error when I try to run -registry=in_memory –ip= –zk=zk://

    -bash: -registry=in_memory: command not found

    Is there a way to run spark over mesos without Zookeeper as there is only one master which I am using.

    Kindly Help


    • Jeroen van Wilgenburg
      7 November 2017 at 10:53

      Does your command start with ‘/usr/local/sbin/mesos-master’ ? And are the double hyphens copied correctly, wordpress sometimes copies them as a single large hyphen.

  2. Akhil
    7 November 2017 at 10:42

    Kindly let me know in case its possible to run spark over mesos without Zookeeper.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: