Understanding Spark parameters – A step by step guide to tune your Spark job

15 February 2015, 1 comment

After using Spark for a few months we thought we had a pretty good grip on how to use it. The Spark documentation appeared pretty decent and we had acceptable performance on most jobs. On one job, however, we kept hitting limits that were much lower than with that job's predecessor (Storm). When we did some research we found out we didn't understand Spark as well as we thought.
My colleague Jethro pointed me to an article by Gerard Maas and I found another great article by Michael Noll. Combined with the Spark docs and some Googlin' I wrote this article to help you tune your Spark job. We improved our throughput by 600% (and then the Elasticsearch cluster became the new bottleneck).

Read more…

Categories: English, java, work

How to fix the classpath in Spark – Rebuilding Spark to get rid of old jars

After fixing an earlier classpath problem, a new and more difficult problem showed up. Again the evildoer is an ancient version of Jackson.
Unfortunately, with Spark/Hadoop/Yarn all classes are loaded from one big container, which causes a lot of problems. The spark.files.userClassPathFirst option is still 'experimental', which in our case meant it just didn't work. But again, we found a solution. Our system engineers also wanted a reproducible solution, so in the end it's just a small recipe.

Read more…

Categories: English, java, work

Using a Spring Context within Hadoop Mappers and Reducers with environment specific property files

Bootstrapping a Spring Context from the classpath is quite simple, even from Mappers/Reducers in Hadoop. When you want to use environment-specific property files it gets a bit harder. In our Mappers and Reducers we rely heavily on Spring Beans that perform the magic. Besides that, we load different properties per environment (e.g. for MongoDB). After several iterations we came up with a nice solution. You can also leave out the Spring part and use this solution to broadcast Properties to all your Mappers/Reducers.
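
To give an idea of what broadcasting Properties could look like, here is a minimal sketch. It is not necessarily the solution from the full post: it assumes the properties are flattened into a single String, which the driver could then store in the job's Hadoop Configuration and each Mapper/Reducer could read back in its setup method. The class name `PropertiesCodec` and the key `mongo.host` are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper, not taken from the original post.
class PropertiesCodec {

    // Flatten the properties into one String, so they fit under a single
    // key in a String-based job configuration.
    static String encode(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            // A StringWriter does not fail on write; rethrow just in case.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    // Rebuild the properties on the receiving side (e.g. in a Mapper's setup).
    static Properties decode(String encoded) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(encoded));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Assumption: these values would come from an environment specific file.
        Properties env = new Properties();
        env.setProperty("mongo.host", "mongo.test.example.com:27017");

        String wire = encode(env);          // driver side
        Properties restored = decode(wire); // mapper/reducer side
        System.out.println(restored.getProperty("mongo.host"));
    }
}
```

The restored Properties object could then be handed to whatever needs it, for instance as a property source for a Spring Context.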

Read more…

Overriding Hadoop jars with MapReduce v2 – How to fix the classpath

7 September 2014, 1 comment

We ran into a nasty problem with Hadoop when we wanted to use a 'new' version of Jackson. Hadoop decided to throw a NoSuchMethodError. It turned out that Hadoop uses an ancient version of Jackson. Some terrible memories of classloading and JBoss EAP came into my head. After some colleagues had calmed me down, I ultimately found a solution.
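
When a NoSuchMethodError like this shows up, it can help to first check which jar a class was actually loaded from. The snippet below is a small diagnostic sketch, not the fix described in the full post; the class name `JarLocator` is made up for this example. In a real job you would pass in the suspect library class (for instance a Jackson class) instead of the JDK classes used here.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not taken from the original post.
class JarLocator {

    static final String UNKNOWN = "<bootstrap or unknown>";

    // Returns the jar or directory a class was loaded from.
    // Classes from the JDK itself usually have no code source.
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null) {
            return UNKNOWN;
        }
        URL location = source.getLocation();
        return location == null ? UNKNOWN : location.toString();
    }

    public static void main(String[] args) {
        // A JDK class: no jar to report.
        System.out.println(locationOf(String.class));
        // An application class: typically the jar or classes directory it came from.
        System.out.println(locationOf(JarLocator.class));
    }
}
```

Logging this location from inside a Mapper or Reducer shows whether the framework's copy of a library or your own copy won the classpath race.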

Read more…

Categories: English, java, work

How to solve a “does not contain shard key for pattern” at a Mongo WriteConcernException

A few days ago I got stuck a bit longer than usual because of a WriteConcernException in MongoDB. Inside the exception I got the message: “does not contain shard key for pattern { _id: 1.0 }”. The id was in the query, but apparently not in the right place.

Read more…

Categories: English, work


This post is a collection of interesting things I found about creativity on the internet.

Read more…

Categories: creativity, English


This post is a collection of interesting things I found about food on the internet.

Read more…

Categories: English, food
