Boosting with wildcards in elasticsearch

While preparing my presentation I discovered that boosting with wildcards wasn’t working. I kind of promised to blog about it, so here it is 🙂

To show the problem I generated 10 random words and inserted their cartesian product (100 documents), with each pair stored as a ‘left’ and a ‘right’ field: interlocutrice, kobold, paralipomena, uncongruous, quandary, cruiserweight, punctual, stichic, paradoxical, draco, iteratively, ahuehuete, yellowknife, hit, gules, initiatory, overeasiness, accusatory, leucothea, googolplex
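
If you want to rebuild a similar data set yourself, a minimal Python sketch could look like the one below. The function name `build_documents` and the three-word sample are only illustrative, and this is not necessarily how the original data was generated; indexing the documents into elasticsearch is left out.

```python
from itertools import product


def build_documents(words):
    """Return one document per ordered (left, right) pair of words."""
    return [
        {"left": left, "right": right}
        for left, right in product(words, repeat=2)
    ]


# A three-word sample taken from the list above; a longer list works the same way.
sample_words = ["kobold", "yellowknife", "hit"]
documents = build_documents(sample_words)

print(len(documents))  # 9 documents for 3 words (3 x 3)
for document in documents:
    print(document["left"], document["right"])
```

Because the product includes every ordered pair, both yellowknife/hit and hit/yellowknife end up in the index, as well as pairs such as yellowknife/yellowknife.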

Note that today I left out the ‘from’ and ‘size’ fields in the queries, so when you run them yourself you will only get 10 results.

Fix it without wildcards

My first search string is [yellowknife hit]

{
    "query": {
        "query_string": {
            "fields": [ "_all", "right^2.0" ], 
            "query": "yellowknife hit"
        }
    }
}

The results are not what I expected (not all results are shown, only the relevant ones for this blog).

score  left         right
3.4    yellowknife  hit
3.4    hit          yellowknife
1.4    kobold       yellowknife
1.4    kobold       hit
1.4    yellowknife  yellowknife
1.4    hit          hit
0.3    yellowknife  kobold
0.3    hit          kobold

yellowknife/yellowknife should have a higher score than 1.4 for my client’s use case. Preferably 3.4, but anything higher is also OK (this turned out to be another problem, which I won’t go into right now).
To fix the yellowknife/yellowknife score, disable use_dis_max. Why? Read this excellent 😉 blog
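
As a rough intuition for why this helps: with use_dis_max enabled, the per-field scores for a term are combined by keeping the best one, while with it disabled they are combined in a bool query and added up. The toy Python sketch below assumes made-up per-field scores and ignores everything else Lucene does (coord, normalization), so its numbers are not meant to match the table above. The function names are hypothetical.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Best field score, plus tie_breaker times the sum of the other scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


def bool_score(field_scores):
    """A bool query of 'should' clauses simply adds the matching scores."""
    return sum(field_scores)


# Hypothetical scores for one term that matches in both "_all" and the
# boosted "right" field of the same document.
scores = [0.5, 1.0]

print(dis_max_score(scores))  # 1.0 -> only the best field counts
print(bool_score(scores))     # 1.5 -> both fields contribute
```

In this simplified picture, a document that matches the same term in several fields gains nothing extra under dis_max, but is rewarded for every matching field once dis_max is switched off.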

Dis_max disabled

When dis_max is disabled this is your query:

{
    "query": {
        "query_string": {
            "fields": [ "_all", "right^2.0" ], 
            "query": "yellowknife hit", 
            "use_dis_max": false
        }
    }
}

The query above will solve the problem. Let’s add a wildcard!

{
    "query": {
        "query_string": {
            "fields": [ "_all", "right^2.0" ], 
            "query": "yellow* hit", 
            "use_dis_max": false
        }
    }
}

As you might have guessed, this also gives some strange results: it appears that yellow* isn’t important at all. To make it count, you have to set the rewrite option to scoring_boolean. In JSON the query looks like this:

{
    "query": {
        "query_string": {
            "fields": [ "_all", "right^2.0" ], 
            "query": "yellow* hit", 
            "rewrite": "scoring_boolean", 
            "use_dis_max": false
        }
    }
}

To set the option in Java you have to call a method on the QueryStringQueryBuilder: builder.rewrite("scoring_boolean");
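
If you build the request body from a script instead, the same options can be assembled as a plain dictionary. This is only a sketch: the helper name `build_query` and its parameters are made up for illustration, and sending the body to the search endpoint is left out.

```python
import json


def build_query(text, fields, use_dis_max=None, rewrite=None, size=None):
    """Build a query_string search body; options left as None are omitted."""
    query_string = {"fields": list(fields), "query": text}
    if use_dis_max is not None:
        query_string["use_dis_max"] = use_dis_max
    if rewrite is not None:
        query_string["rewrite"] = rewrite

    body = {"query": {"query_string": query_string}}
    if size is not None:
        # 'size' sits next to 'query' at the top level of the request body.
        body["size"] = size
    return body


body = build_query(
    "yellow* hit",
    ["_all", "right^2.0"],
    use_dis_max=False,
    rewrite="scoring_boolean",
)
print(json.dumps(body, indent=4))
```

Passing `size=100` would also lift the default limit of 10 results mentioned earlier.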

I’m not really able to explain what is happening, so today I’m only sharing the solution😉

How did I find the solution?

That’s a good question! I found the solution on the mailing list, even though the questioner claims it doesn’t work. But the path to the solution is fuzzy. Probably a mix of Google and Stack Overflow. Next time I’ll remember how I did it.

It’s also quite hard to find documentation about scoring_boolean on the elasticsearch site. Like last time, the solution will be somewhere on the elasticsearch website, but I don’t know where 😉

Source

Random words
Solution
Github issue of the problem
