Symfony2: improving elasticsearch results

In my previous post I looked at integrating elasticsearch into a Symfony2 app using Elastica and the FOQElasticaBundle bundle. By the end we were indexing a Site entity and performing basic searches against the index. In this post I will look at improving how we index and search the Site entities.

We can improve the indexing of the name and keywords by switching to a different analyzer. Currently we will only find whole word matches. For example, if we index Lime Thinking as a site name then it will be found by a search for thinking, but not by think or thinks. We can change this by using the snowball analyzer instead. This is a built-in analyzer which is the same as the standard analyzer but with the addition of the snowball filter, which stems the tokens. This means that words are indexed as their stems, so thinking will be indexed as think. Once the search terms are stemmed in the same way, we can find it with searches for words such as think, thinks and thinkings. I will have a more detailed look at analyzers and filters in a future post.
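One way to see what the snowball analyzer does to a piece of text is to ask elasticsearch directly through its analyze API. The following is a minimal sketch in plain PHP rather than anything provided by Elastica or the bundle. It assumes a node on localhost:9200 and a version of elasticsearch from the same era as this post, which accepts the analyzer and text as query string parameters. The function names buildAnalyzeUrl and extractTokens are my own, made up for illustration.

```php
<?php

// Build the URL for the analyze API. Host and port are assumptions
// chosen to match the client config used in this post.
function buildAnalyzeUrl($text, $analyzer = 'snowball', $host = 'localhost', $port = 9200)
{
    $params = array('analyzer' => $analyzer, 'text' => $text);

    return sprintf(
        'http://%s:%d/_analyze?%s',
        $host,
        $port,
        http_build_query($params, '', '&')
    );
}

// Pull the token strings out of a decoded analyze response,
// which is expected to hold a "tokens" list of token details.
function extractTokens(array $response)
{
    $tokens = array();
    $entries = isset($response['tokens']) ? $response['tokens'] : array();

    foreach ($entries as $entry) {
        $tokens[] = $entry['token'];
    }

    return $tokens;
}

// Usage (needs a running elasticsearch node):
// $json = file_get_contents(buildAnalyzeUrl('Lime Thinking'));
// print_r(extractTokens(json_decode($json, true)));
```

If the analyzer behaves as described above, the token returned for Thinking should be the stem think rather than the whole word.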

We just need to make a small config change to start using this analyzer for indexing:

foq_elastica:
    clients:
        default: { host: localhost, port: 9200 }
    indexes:
        bookmarks:
            client: default
            types:
                site:
                    mappings:
                        name: { analyzer: snowball }
                        keywords: { analyzer: snowball }
                    doctrine:
                        driver: orm
                        model: LimeThinking\ExampleBundle\Entity\Site
                        provider:
                        listener:
                        finder:

We need to make some further changes to get the benefits of this, though. We also need to make sure that the search terms are analyzed with the same analyzer as the indexed field. If this does not happen we will only get matches when we search for the stemmed token, e.g. think will find Lime Thinking but thinking will not. Our simple query does not specify which field we are searching, which means it searches the built-in _all field which, unsurprisingly, contains all the fields. This means we cannot use different analyzers for searching different fields. We are going to want to add the url at some point using a different analyzer, so we need to specify each field we want to search separately.

So we now need to split up our query into several parts. For this we need to use Elastica's query builder objects. To search on a specific field we can use a Text query, so to search on the name field we use:

/**
 * @Route("/sites/search/", name="site_search")
 * @Method({ "head", "get" })
 * @Template
 */
public function searchAction(Request $request)
{
    $finder = $this->get('foq_elastica.finder.bookmarks.site');
    $searchTerm = $request->query->get('search');
    
    $nameQuery = new \Elastica_Query_Text();
    $nameQuery->setFieldQuery('name', $searchTerm);
    
    $sites = $finder->find($nameQuery);
    return array('sites' => $sites);
}

Notice that we pass the query object into the same method on the finder as before. This method accepts simple search strings as well as queries built through objects. According to the elasticsearch documentation the analyzer will default to the field specific analyzer or the default one, which to me suggests that the above query will automatically use the analyzer set for the field. However, this does not work for me. Fortunately it is easy to specify the analyzer to use for the field:

/**
 * @Route("/sites/search/", name="site_search")
 * @Method({ "head", "get" })
 * @Template
 */
public function searchAction(Request $request)
{
    $finder = $this->get('foq_elastica.finder.bookmarks.site');
    $searchTerm = $request->query->get('search');
    
    $nameQuery = new \Elastica_Query_Text();
    $nameQuery->setFieldQuery('name', $searchTerm);
    $nameQuery->setFieldParam('name', 'analyzer', 'snowball');

    $sites = $finder->find($nameQuery);
    return array('sites' => $sites);
}

Our current query will of course only search the name field. What we want to do is search both the name field and the keywords field using the snowball analyzer. This is done by creating another query for the keywords field in the same way as above, and then using a boolean query to combine the two individual queries into one:

/**
 * @Route("/sites/search/", name="site_search")
 * @Method({ "head", "get" })
 * @Template
 */
public function searchAction(Request $request)
{
    $finder = $this->get('foq_elastica.finder.bookmarks.site');
    $searchTerm = $request->query->get('search');
    
    $nameQuery = new \Elastica_Query_Text();
    $nameQuery->setFieldQuery('name', $searchTerm);
    $nameQuery->setFieldParam('name', 'analyzer', 'snowball');

    $keywordsQuery = new \Elastica_Query_Text();
    $keywordsQuery->setFieldQuery('keywords', $searchTerm);
    $keywordsQuery->setFieldParam('keywords', 'analyzer', 'snowball');

    $boolQuery = new \Elastica_Query_Bool();
    $boolQuery->addShould($nameQuery);
    $boolQuery->addShould($keywordsQuery);

    $sites = $finder->find($boolQuery);
    return array('sites' => $sites);
}

Whilst this looks complicated, each constituent part is simple, and this is a good way to build more complicated queries.
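Since the name and keywords queries are built in exactly the same way, the repetition could be moved into a small helper as more fields are added. This is only a sketch: buildSnowballQuery is a hypothetical function of my own, not part of Elastica or the bundle, and it uses nothing beyond the Elastica calls already shown in the controller above.

```php
<?php

/**
 * Build a bool query that matches the search term against each given field.
 *
 * Hypothetical helper: it only wraps the Elastica calls used in the
 * controller above and assumes the Elastica classes are autoloaded.
 */
function buildSnowballQuery($searchTerm, array $fields, $analyzer = 'snowball')
{
    $boolQuery = new \Elastica_Query_Bool();

    foreach ($fields as $field) {
        // One text query per field, analyzed with the given analyzer
        $fieldQuery = new \Elastica_Query_Text();
        $fieldQuery->setFieldQuery($field, $searchTerm);
        $fieldQuery->setFieldParam($field, 'analyzer', $analyzer);

        // A document matching any of the fields is a candidate result
        $boolQuery->addShould($fieldQuery);
    }

    return $boolQuery;
}

// In the controller action:
// $sites = $finder->find(buildSnowballQuery($searchTerm, array('name', 'keywords')));
```

A field that needs a different analyzer, such as the url, would still want its own query rather than going through this helper.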

A really helpful recent addition to the bundle is logging to the web profiler toolbar, so you can see the parsed JSON query that is sent to elasticsearch. The combined query from above looks like this:

Method: GET
{ query: { bool: { should: [{ text: { name: { query: thinking, analyzer: snowball } } }, { text: { keywords: { query: thinking, analyzer: snowball } } }] } } }
Time: 8.19 ms 
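The profiler shows the query without quotes around the keys and values. To make the structure easier to follow, here is the same query written out as a plain PHP array and encoded with json_encode. This is only an illustration of the shape of the request body, it is not how Elastica builds it internally, and textClause is a hypothetical helper of my own.

```php
<?php

// Build one "text" clause for a field (hypothetical helper, plain arrays only).
function textClause($field, $term, $analyzer)
{
    return array(
        'text' => array(
            $field => array('query' => $term, 'analyzer' => $analyzer),
        ),
    );
}

// The bool query with two "should" clauses, mirroring the logged query.
$body = array(
    'query' => array(
        'bool' => array(
            'should' => array(
                textClause('name', 'thinking', 'snowball'),
                textClause('keywords', 'thinking', 'snowball'),
            ),
        ),
    ),
);

echo json_encode($body), "\n";
// {"query":{"bool":{"should":[{"text":{"name":{"query":"thinking","analyzer":"snowball"}}},{"text":{"keywords":{"query":"thinking","analyzer":"snowball"}}}]}}}
```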

We have seen the Text query and the Boolean query here, but these are just two of the available query types. There is more information on each in the elasticsearch documentation. There is little in the way of documentation for the Elastica objects for creating these query types, but the test suite provides quite a lot of examples of putting them to use.