Symfony2: Integrating elasticsearch

Over a short series of posts I am going to have a look at using elasticsearch with Symfony2.

Elasticsearch is built on top of Lucene and indexes data as JSON documents, much as MongoDB stores data. As with Mongo, this means it is schemaless and creates fields on the fly. It is queried over HTTP using queries which are themselves defined in JSON. I am not going to go into details about using elasticsearch in this way; there is plenty of information in its online documentation.
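
To give a flavour of what such a query looks like, here is a minimal sketch that builds the JSON body of a simple full-text search in PHP. The search term, and the host, index and type in the URL mentioned in the comment, are assumptions for illustration only (they match the example used later in this post); query_string is one of elasticsearch's standard query types.

```php
<?php

// Illustrative only: the JSON body of a simple full-text search.
// The search term "symfony" is an arbitrary example value.
$query = array(
    'query' => array(
        'query_string' => array('query' => 'symfony'),
    ),
);

$body = json_encode($query);

// This body would be sent over HTTP to a search endpoint such as
// http://localhost:9200/bookmarks/site/_search (assumed host, index and type).
echo $body, PHP_EOL;
```

This is exactly the sort of plumbing the rest of the post avoids writing by hand.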

Reading through the documentation makes it look as though there is a steep learning curve to getting started with elasticsearch. What I want to do is look at how you can avoid having to deal with issuing JSON queries over HTTP from a Symfony2 app and actually get started using elasticsearch in a very simple way. This is possible by using Elastica, a PHP library which abstracts the details of the queries, along with the FOQElasticaBundle, which integrates Elastica into Symfony2 applications. The bundle is not just a basic wrapper to make Elastica into a Symfony2 service, though: its integration with Doctrine, which makes indexing ORM entities or ODM documents simple, is fantastic and is what I am going to look at here.

To get started you need to install elasticsearch itself, as well as installing Elastica and the FOQElasticaBundle in the usual way.
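
For reference, "the usual way" on a Symfony2 project means adding both libraries to your vendors (and to the autoloader, if you are not using Composer) and then enabling the bundle in the kernel. The sketch below shows only the kernel step and assumes a Symfony Standard Edition layout; the comment stands in for whatever bundles your application already registers, which will include at least FrameworkBundle and the Doctrine bundle.

```php
<?php
// app/AppKernel.php (assumed Standard Edition location)

use Symfony\Component\Config\Loader\LoaderInterface;
use Symfony\Component\HttpKernel\Kernel;

class AppKernel extends Kernel
{
    public function registerBundles()
    {
        $bundles = array(
            // The bundles your application already registers go here.

            // Enables the foq_elastica configuration and its services.
            new FOQ\ElasticaBundle\FOQElasticaBundle(),
        );

        return $bundles;
    }

    public function registerContainerConfiguration(LoaderInterface $loader)
    {
        // Loads app/config/config_dev.yml, config_prod.yml, and so on.
        $loader->load(__DIR__.'/config/config_'.$this->getEnvironment().'.yml');
    }
}
```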

As an example of how easy the integration is I will look at a very basic application for bookmarking sites and searching for them. For simplicity's sake we are just going to have a single entity to model each site: it is just a name, the URL and some keywords stored as a comma-separated list. So here it is as a Doctrine entity class:

<?php

namespace LimeThinking\ExampleBundle\Entity;

use Doctrine\ORM\Mapping as ORM;
use Symfony\Component\Validator\Constraints as Assert;

/**
 * @ORM\Entity
 * @ORM\Table(name="site")
 */
class Site
{
    /**
     * @ORM\Id
     * @ORM\Column(type="integer")
     * @ORM\GeneratedValue(strategy="AUTO")
     */
    private $id;

    /**
     * @ORM\Column(type="string", length=255)
     * @Assert\NotBlank()
     * @Assert\MaxLength(255)
     */
    private $name;

    /**
     * @ORM\Column(type="string", length=255)
     * @Assert\NotBlank()
     * @Assert\Url()
     * @Assert\MaxLength(255)
     */
    private $url;

    /**
     * @ORM\Column(type="string", length=255)
     * @Assert\MaxLength(255)
     */
    private $keywords;

    public function getId()
    {
        return $this->id;
    }

    public function setName($name)
    {
        $this->name = $name;
    }

    public function getName()
    {
        return $this->name;
    }

    public function setUrl($url)
    {
        $this->url = $url;
    }

    public function getUrl()
    {
        return $this->url;
    }

    public function setKeywords($keywords)
    {
        $this->keywords = $keywords;
    }

    public function getKeywords()
    {
        return $this->keywords;
    }
}

We can then set up the bundle to index the fields of our entity. By choosing to use the integration with Doctrine we can make this very simple:

foq_elastica:
    clients:
        default: { host: localhost, port: 9200 }
    indexes:
        bookmarks:
            client: default
            types:
                site:
                    mappings:
                        name:
                        keywords:
                    doctrine:
                        driver: orm
                        model: LimeThinking\ExampleBundle\Entity\Site
                        provider:

Whilst there are quite a few settings here, it is fairly straightforward. The client setting gives the host and port to use for the HTTP communication. The bookmarks setting under indexes is the name of the index we will create. Within each index you can have a type for each of your entity types; we just have the one type (site) here at the moment.

We have specified that we are using the ORM, the entity class and which fields to map: for now just the name and keywords (I will return to indexing the URL in my next post). That is enough to get any existing Sites stored in the database into the search index. Running the following console command will do this:

php app/console foq:elastica:populate

It is as easy as that! All the sites already stored in the database are now indexed without writing any code, just a small amount of configuration. Great as that is, it would be even better if new entities were indexed automatically, and if the index entries were updated and removed as the entities are updated and removed in the database, without having to rerun the console command. This is just as easy to achieve with only one extra item added to the configuration:

foq_elastica:
    clients:
        default: { host: localhost, port: 9200 }
    indexes:
        bookmarks:
            client: default
            types:
                site:
                    mappings:
                        name:
                        keywords:
                    doctrine:
                        driver: orm
                        model: LimeThinking\ExampleBundle\Entity\Site
                        provider:
                        listener:

This enables the bundle's built-in Doctrine event listeners, which will then do just that: keep the search index up to date with any changes we make to the entities, again without any additional code needed in typical CRUD controllers.
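
To illustrate the point about CRUD controllers, here is a minimal sketch of a create action. The controller class, the action name and the request parameter names are hypothetical, and form handling and validation are left out to keep it short. Note that it contains nothing elasticsearch-specific: with the listener enabled, flushing the entity manager is what leads to the new Site being indexed.

```php
<?php

namespace LimeThinking\ExampleBundle\Controller;

use LimeThinking\ExampleBundle\Entity\Site;
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

// Hypothetical controller, for illustration only.
class SiteController extends Controller
{
    public function createAction(Request $request)
    {
        // Populate the entity from POST data (assumed parameter names).
        $site = new Site();
        $site->setName($request->request->get('name'));
        $site->setUrl($request->request->get('url'));
        $site->setKeywords($request->request->get('keywords'));

        // Plain Doctrine persistence; there are no calls to the search index here.
        $em = $this->getDoctrine()->getEntityManager();
        $em->persist($site);
        $em->flush();

        return new Response('Created site '.$site->getId(), 201);
    }
}
```

Update and delete actions would look just as ordinary, since the listener reacts to the corresponding Doctrine events.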

Before looking at searching the index there is one more bit of config which can be added to make integration easy:

foq_elastica:
    clients:
        default: { host: localhost, port: 9200 }
    indexes:
        bookmarks:
            client: default
            types:
                site:
                    mappings:
                        name:
                        keywords:
                    doctrine:
                        driver: orm
                        model: LimeThinking\ExampleBundle\Entity\Site
                        provider:
                        listener:
                        finder:

By adding the finder line we activate the support for returning the search results as Doctrine entities, so the bundle will do the work of fetching the relevant entities from the database after querying the elasticsearch index.

So how do we query the index? The bundle dynamically creates a service you can request from the container with the format foq_elastica.finder.index-name.type-name. These match the values in our config, so the service we need is foq_elastica.finder.bookmarks.site. We can now issue queries using this service:

/**
 * @Route("/sites/search/", name="site_search")
 * @Method({ "head", "get" })
 * @Template
 */
public function searchAction(Request $request)
{
    $finder = $this->get('foq_elastica.finder.bookmarks.site');
    $searchTerm = $request->query->get('search');
    $sites = $finder->find($searchTerm);
    return array('sites' => $sites);
}
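
Because the finder hands back ordinary Site entities, the results can be used like any other Doctrine result set. The following is a minimal sketch of a plain-text variant of the action above; the controller and action names are hypothetical, skipping the search when the term is empty is my own design choice, and the second argument to find(), which I am assuming caps the number of results, should be checked against the version of the bundle you have installed.

```php
<?php

namespace LimeThinking\ExampleBundle\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

// Hypothetical controller, for illustration only.
class SiteSearchController extends Controller
{
    public function searchTextAction(Request $request)
    {
        $searchTerm = trim($request->query->get('search', ''));
        $lines = array();

        // Only query the index when there is something to search for.
        if ($searchTerm !== '') {
            $finder = $this->get('foq_elastica.finder.bookmarks.site');

            // Assumption: the second argument limits the number of results.
            foreach ($finder->find($searchTerm, 10) as $site) {
                // Each result is a Site entity, so its getters are available.
                $lines[] = $site->getName().' <'.$site->getUrl().'>';
            }
        }

        return new Response(
            implode("\n", $lines),
            200,
            array('Content-Type' => 'text/plain')
        );
    }
}
```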

Elastica provides an OO query builder for creating more complicated queries, but I will leave that for another day. Hopefully I have shown just how straightforward it is to get started using elasticsearch with a Symfony2 app. As always, it is not limited to such simplicity: you can override these built-in services with your own providers, finders and listeners if you have more complex requirements.