By default, a Spark job loses data when it blows up. This isn’t a bad choice per se, but at my current project we need higher reliability. In this article I’ll talk about at-least-once delivery with the Spark write-ahead log. The article focuses on Kafka, but the approach can also be applied to other Receivers.
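As a minimal sketch of what enabling the write-ahead log looks like (a single property in spark-defaults.conf; the comment describes the behaviour as I understand it):

```
# Persist received data to the write-ahead log before processing,
# so data from a crashed receiver can be replayed (at-least-once delivery)
spark.streaming.receiver.writeAheadLog.enable  true
```

On top of this, the StreamingContext needs a fault-tolerant checkpoint directory (e.g. on HDFS), since that is where the log segments are stored.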
Since HTTP/2 is gaining momentum, I thought it would be a nice experiment to see if it’s possible to convert some applications to HTTP/2. We have a bunch of Spring Boot microservices that communicate with each other via REST calls, with all communication in JSON (Jackson 2). Running Spring Boot with HTTP/2 should be easy, and hopefully Spring’s RestTemplate supports HTTP/2 for the inter-service communication. Let’s see…
With HTTP/2 it’s possible to deliver data to a client before the client even asks for it. This can significantly improve latency and perceived download speed. On our last hack day at JPoint we spent some time with HTTP/2 server push. I’ll show you how to get it running on your machine, along with some tools to monitor and prove all this goodness.
At the last JavaOne I attended ‘HTTP 2.0 – What do I need to know?’, an excellent talk by Hadi Hariri. HTTP/2 solves many of the problems I face daily. Since I’m a curious guy, I wanted to know what was happening at packet level in all this awesomeness. I soon ran into the problem of decoding SSL-encrypted packets (in practice, all HTTP/2 traffic is encrypted). Hadi mentioned that Wireshark has support for solving this problem. He made it sound very easy but, as this article shows, it was a bit harder.
The method I’ll explain for decoding HTTP/2 can also be applied to HTTP/1.1.
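As a minimal sketch of the key-logging part (the file path is just an example): modern browsers honour the `SSLKEYLOGFILE` environment variable and write their TLS session keys to that file, and Wireshark can read it to decrypt the captured traffic.

```shell
# Make the browser dump its TLS session keys (path is an example)
export SSLKEYLOGFILE="$HOME/tls-keys.log"
# Start the browser from this same shell so it inherits the variable,
# e.g. on a Mac: open -a "Google Chrome"
# Then point Wireshark at the file under
#   Preferences > Protocols > SSL > (Pre-)Master-Secret log filename
echo "key log will be written to $SSLKEYLOGFILE"
```

Once the keys are in the file, previously captured packets for those sessions can be decrypted as well, since Wireshark matches keys against the recorded handshakes.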
| # | Date | Event | Location | Distance | Time |
|----|-----------|------------------------------------------------|-------|--------------|-------------|
| 1. | 12-apr-15 | ETU EK Powerman Holland Duathlon 2015 – Sprint | Horst | 5-20-2,5km | 1:07:37 [u] |
| 2. | 29-aug-15 | Run Bike Run Borne | Borne | 6,5-30-4,8km | 1:47:22 [u] |
- 31-dec-15 – Sylvestercross, Soest
Bold indicates a personal record (PR).
One of my complaints about Spark was that it wasn’t possible to set a dynamic maximum rate. This is a problem in many jobs, since the maximum throughput isn’t always linear with the output rate. Another issue is local testing: you have to set the rate to extremely low values and experiment a lot to make a Spark job usable on a local machine.
But all these problems are a thing of the past with the introduction of backpressure (I believe it’s spelled ‘back pressure’, but I’ll stick to the Spark notation).
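As far as I can tell this is a single switch; a sketch for spark-defaults.conf (the initial-rate value is just an example):

```
# Let Spark estimate the ingestion rate from the current processing
# speed instead of using a fixed maximum
spark.streaming.backpressure.enabled  true
# Optional: cap the very first batch, before any rate has been estimated
spark.streaming.backpressure.initialRate  100
```

With this enabled, a fixed maximum rate is no longer needed for local testing, since the rate adapts to whatever the local machine can handle.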
On my current project we are running Spark on top of YARN. Since Hadoop causes dependency problems and feels a bit ancient, I was looking for an alternative. At JPoint we have a few days a year to try things out, and this was one of them. I paired with Eelco to get things up and running.
This article will show you how to run Spark on top of Mesos on your Mac (or on Linux; the steps are largely the same).