While preparing my presentation I discovered that boosting with wildcards wasn’t working. I kind of promised to blog about it, so here it is.
Last Thursday I gave a talk about Elasticsearch at the Dutch Java User Group conference (JFall) called “Full text search met ElasticSearch in de praktijk”. Yes, it’s in Dutch, but I still want to share it with you since it might contain some useful tips and tricks.
Currently I’m using a small dataset (about 3500 records) on ElasticSearch and saw some strange scoring. Hits that should have exactly the same score had _almost_ the same score. Almost the same is a problem, since we sort our data on score and then on name. After some research it turned out that sharding was the issue. With only one shard the problem is solved. If you want to know how I found out and what alternatives you have, please stay tuned.
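To make the sharding effect a bit more concrete, here is a minimal sketch. It assumes classic TF-IDF scoring where every shard computes its term statistics locally; the shard names and document counts are made up for illustration and are not taken from the real dataset.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene-style inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical split of 3500 documents over two shards.
# Each tuple is (documents on the shard, documents containing the term).
shards = {"shard_a": (1750, 40), "shard_b": (1750, 25)}

# Each shard scores with its own local statistics.
per_shard_idf = {name: idf(n, df) for name, (n, df) in shards.items()}

# With a single shard the statistics cover the whole index.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_docs, total_df)

for name, value in per_shard_idf.items():
    print(f"{name}: idf = {value:.4f}")
print(f"single shard: idf = {global_idf:.4f}")
```

Two otherwise identical documents that live on different shards get slightly different IDF values in this sketch, and therefore slightly different scores. With a single shard there is only one set of statistics, so equal documents score equally.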
Updated Sep 13: Zachary Tong pointed me to an article with a much better solution (read: the right solution). I added the second-to-last paragraph to explain it.
A few weeks ago I solved an issue with ElasticSearch where some search results with diacritics didn’t show up in the results. At least, I thought I did. There are quite a few places where it can go wrong. In this article I’ll explain these places and how to fix the issues.
| No. | Date | Event | Location | Distance | Time |
|---|---|---|---|---|---|
| 1. | 20-May-13 | 29e Triatlon Woerden | Woerden | 2.3km + 7.4km (trio-duathlon) | 8:32 + 26:24 [u] |
| 2. | 19-Jul-13 | Wagenings Loop-, Spring- en Smijtwerk – 19 juli 2013 | Wageningen | 5000m (track) | 18:26.16 [u] |
| 3. | 17-Aug-13 | Hulsbeektriathlon 2013 | Oldenzaal | 500m + 19km + 5km (1/8 triathlon) | 1:07:44 [u] |
| 4. | 24-Aug-13 | Run Bike Run Borne | Borne | 6.5km + 30km + 4.8km | 1:37:05 [u] |
| 5. | 31-Aug-13 | Triathlon Veenendaal | Veenendaal | 9.5km running (trio) | 37:18 [u] |
| 6. | 20-Sep-13 | Wagenings Loop-, Spring- en Smijtwerk – 20 september 2013 | Wageningen | 5000m (track) | 18:13.20 [u] |
| 7. | 5-Oct-13 | Cromwijckloop Woerden | Woerden | 10km | 41:46 [u] |
| 9. | 24-Nov-13 | Run Rondje Rhenoy | Beesd | 10km | 38:06 [u] |

Bold marks a personal record (PR).
I saw the errors ‘The last packet successfully received from the server was x milliseconds ago’ and ‘Communications link failure’ a lot lately, so much that it almost drove me crazy. I knew it had something to do with the timeout of 8 hours that MySQL has by default for every connection. This is a simple problem, but the internet is so polluted with everybody parroting each other that searching didn’t get me any further.
After a few days I fixed the problem. It was indeed very simple, but because I did something stupid with JPA it became very complex.
In this article I’ll explain the simple solution and tell you about my complex problem. Maybe you’re having the same trouble.
Last week a ‘bug’ was filed where the end users of our application wanted search results with a match on the name to get more priority than results with a match on the address (we’re talking about searching for a company). Since I have used Lucene a lot, I thought it was just a matter of ‘boosting’ the name field. It turned out to be a bit more difficult. Maybe because Elasticsearch behaves differently, but probably because my Lucene knowledge is a bit rusty.