I'm currently using a small dataset (about 3500 records) on ElasticSearch and saw some strange scoring. Hits that should have had exactly the same score had _almost_ the same score. Almost the same is a problem, since we sort our data on score first and name second, so a tiny score difference overrides the name ordering. After some research it turned out sharding was the cause. With only one shard the problem is solved. If you want to know how I found out and what alternatives you have, please stay tuned.
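To make the sharding effect concrete, here is a minimal sketch of why identical hits can end up with slightly different scores. It assumes the classic Lucene-style inverse document frequency, `1 + ln(N / (df + 1))`, calculated per shard; the shard sizes and term frequencies are made up for illustration, and the similarity in use may differ per Elasticsearch version. The two dicts at the end show the real `number_of_shards` index setting and the `dfs_query_then_fetch` search type as plain request fragments, not as a claim about which one was used here.

```python
import math


def classic_idf(doc_count, doc_freq):
    """Classic Lucene-style IDF: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Hypothetical split of roughly 3500 records over two shards.
# doc_freq is the number of documents on that shard containing the term.
shards = {
    "shard_0": {"doc_count": 1740, "doc_freq": 12},
    "shard_1": {"doc_count": 1760, "doc_freq": 9},
}

# Each shard scores with its own local statistics, so the same term
# gets a slightly different IDF depending on where the document lives.
local_idf = {
    name: classic_idf(stats["doc_count"], stats["doc_freq"])
    for name, stats in shards.items()
}

# With a single shard (or globally collected statistics) there is
# only one IDF value for the term, so equal hits score equally.
total_docs = sum(stats["doc_count"] for stats in shards.values())
total_freq = sum(stats["doc_freq"] for stats in shards.values())
global_idf = classic_idf(total_docs, total_freq)

# Request fragments for the two usual ways around the problem.
single_shard_settings = {"settings": {"index": {"number_of_shards": 1}}}
dfs_search_params = {"search_type": "dfs_query_then_fetch"}

if __name__ == "__main__":
    for name, value in local_idf.items():
        print(name, round(value, 4))
    print("global", round(global_idf, 4))
```

A single shard is fine for a dataset this small; `dfs_query_then_fetch` keeps multiple shards but adds an extra round trip to collect term statistics before scoring.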
Updated Sep 13: Zachary Tong pointed me to an article with a much better solution (read: the right solution 😉). I added the second-to-last paragraph to explain it.
A few weeks ago I solved an issue with ElasticSearch where some hits containing diacritics didn't show up in the search results. At least, I thought I did. There are quite a few places where it can go wrong. In this article I'll explain these places and how to fix the issues.
Last week a 'bug' was filed: the end users of our application wanted search results with a match on the name to get more priority than results with a match on the address (we're talking about searching for a company). Since I've used Lucene a lot, I thought it was just a matter of 'boosting' the name field. It turned out to be a bit more difficult. Maybe because Elasticsearch behaves differently, but probably because my Lucene knowledge has gotten a bit rusty.
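For reference, the naive starting point of boosting one field over another can be sketched as a `multi_match` query with a per-field boost. The field names `name` and `address` follow the description above, while the helper `company_query` and the boost factor of 3 are assumptions for illustration only. As noted, a plain boost like this was not the whole story.

```python
def company_query(text, name_boost=3.0):
    """Build a multi_match body that weights `name` above `address`.

    The `field^boost` syntax is standard Elasticsearch query DSL;
    the boost value itself is an arbitrary example.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": [f"name^{name_boost}", "address"],
            }
        }
    }


# Example request body for a hypothetical search term.
body = company_query("acme")
```

The returned dict can be sent as the JSON body of a `_search` request with whichever HTTP client or Elasticsearch client is at hand.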
I'm working on a project where we need to search the data the 'Google way' and keep a history of every change in the data. Since one requirement is that we store the data in an SQL database, I started with Hibernate JPA. Hibernate Envers was added for versioning. For the Google-style search (or just full-text search) I needed something with Lucene in the background. Hibernate Search seemed like a good fit.
Pretty soon I found out that Envers and Search don't mix very well, and a quick search on the Hibernate forum confirmed it. Envers and Search are great products, don't get me wrong, but this time it didn't work out.
Furthermore, it's good to know that I'm also using Spring, which can mess things up pretty badly. Again, it's the cocktail that is lethal; it has nothing to do with the quality of the individual frameworks.