Searching

Introduction

Searching is a huge topic, hence an entire chapter has been devoted to a behavior called Searchable. It is a fulltext indexing and searching tool. It can be used for indexing and searching both database and files.

Consider we have a class called NewsItem with the following definition:

// models/NewsItem.php
class NewsItem extends Doctrine_Record
{
    public function setTableDefinition()
    {
        $this->hasColumn('title', 'string', 255);
        $this->hasColumn('body', 'clob');
    }
}

Here is the same example in YAML format. You can read more about YAML in the YAML Schema Files chapter:

---
# schema.yml

# ...
NewsItem:
  columns:
    title: string(255)
    body: clob

Now lets say we have an application where users are allowed to search for different news items, an obvious way to implement this would be building a form and based on that forms submitted values build DQL queries such as:

// test.php

// ...
$q = Doctrine_Query::create()
    ->from('NewsItem i')
    ->where('n.title LIKE ? OR n.content LIKE ?');

As the application grows these kind of queries become very slow. For example when using the previous query with parameters %framework% and %framework% (this would be equivalent of ‘find all news items whose title or content contains word ‘framework’) the database would have to traverse through each row in the table, which would naturally be very very slow.

Doctrine solves this with its search component and inverse indexes. First lets alter our definition a bit:

// models/NewsItem.php
class NewsItem extends Doctrine_Record
{
    // ...
    public function setUp()
    {
        $this->actAs('Searchable', array(
                'fields' => array('title', 'content')
            )
        );
    }
}

Here is the same example in YAML format. You can read more about YAML in the YAML Schema Files chapter:

---
# schema.yml

# ...
NewsItem:
  actAs:
    Searchable:
      fields: [title, content]
# ...

Lets check the SQL that is generated by the above models:

// test.php

// ...
$sql = Doctrine_Core::generateSqlFromArray(array('NewsItem'));
echo $sql[0] . "";
echo $sql[1] . "";
echo $sql[2];

The above code would output the following SQL query:

CREATE TABLE news_item_index (id BIGINT,
keyword VARCHAR(200),
field VARCHAR(50),
position BIGINT,
PRIMARY KEY(id,
keyword,
field,
position)) ENGINE = INNODB
CREATE TABLE news_item (id BIGINT AUTO_INCREMENT,
title VARCHAR(255),
body LONGTEXT,
PRIMARY KEY(id)) ENGINE = INNODB
ALTER TABLE news_item_index ADD FOREIGN KEY (id) REFERENCES news_item(id) ON UPDATE CASCADE ON DELETE CASCADE

Here we tell Doctrine that NewsItem class acts as searchable (internally Doctrine loads :php:class:`Doctrine_Template_Searchable`) and fields title and content are marked as fulltext indexed fields. This means that every time a NewsItem is added or updated Doctrine will:

  1. Update the inverse search index or
  2. Add new pending entry to the inverse search index (sometimes it can be efficient to update the inverse search index in batches)

Index structure

The structure of the inverse index Doctrine uses is the following:

[ (string) keyword] [ (string) field ] [ (integer) position ] [ (mixed) [foreign_keys] ]

Column Description
keyword The keyword in the text that can be searched for.
field The field where the keyword was found.
position The position where the keyword was found.
[foreign_keys] The foreign keys of the record being indexed.

In the NewsItem example the [foreign_keys] would simply contain one field named id with foreign key references to NewsItem(id) and with onDelete => CASCADE constraint.

An example row in this table might look something like:

keyword field position id
database title 3 1

In this example the word database is the third word of the title field of NewsItem with id of 1.

Index Building

Whenever a searchable record is being inserted into database Doctrine executes the index building procedure. This happens in the background as the procedure is being invoked by the search listener. The phases of this procedure are:

  1. Analyze the text using a :php:class:`Doctrine_Search_Analyzer` based class
  2. Insert new rows into index table for all analyzed keywords

Sometimes you may not want to update the index table directly when new searchable entries are added. Rather you may want to batch update the index table in certain intervals. For disabling the direct update functionality you’ll need to set the batchUpdates option to true when you attach the behavior:

// models/NewsItem.php
class NewsItem extends Doctrine_Record
{
    // ...
    public function setUp()
    {
        $this->actAs('Searchable', array(
                'fields' => array('title', 'content')
                'batchUpdates' => true
            )
        );
    }
}

Here is the same example in YAML format. You can read more about YAML in the YAML Schema Files chapter:

---
# schema.yml

# ...
NewsItem:
  actAs:
    Searchable:
      fields: [title, content]
      batchUpdates: true
# ...

The actual batch updating procedure can be invoked with the batchUpdateIndex() method. It takes two optional arguments: limit and offset. Limit can be used for limiting the number of batch indexed entries while the offset can be used for setting the first entry to start the indexing from.

First lets insert a new NewsItem records:

// test.php

// ...
$newsItem        = new NewsItem();
$newsItem->title = 'Test';
$newsItem->body  = 'test';
$newsItem->save();

Note

If you don’t have batch updates enabled then the index will be automatically updated for you when you insert or update NewsItem records. If you do have batch updates enabled then you can perform the batch updates by using the following code:

// test.php

// ...
$newsItemTable = Doctrine_Core::getTable('NewsItem');
$newsItemTable->batchUpdateIndex();

Text Analyzers

By default Doctrine uses :php:class:`Doctrine_Search_Analyzer_Standard` for analyzing the text. This class performs the following things:

  • Strips out stop-keywords (such as ‘and’, ‘if’ etc.) As many commonly used words such as ‘and’, ‘if’ etc. have no relevance for the search, they are being stripped out in order to keep the index size reasonable.
  • Makes all keywords lowercased. When searching words ‘database’ and ‘DataBase’ are considered equal by the standard analyzer, hence the standard analyzer lowercases all keywords.
  • Replaces all non alpha-numeric marks with whitespace. In normal text many keywords might contain non alpha-numeric chars after them, for example ‘database.’. The standard analyzer strips these out so that ‘database’ matches ‘database.’.
  • Replaces all quotation marks with empty strings so that “O’Connor” matches “oconnor”

You can write your own analyzer class by making a class that implements :php:class:`Doctrine_Search_Analyzer_Interface`. Here is an example where we create an analyzer named MyAnalyzer:

// models/MyAnalyzer.php
class MyAnalyzer implements Doctrine_Search_Analyzer_Interface
{
    public function analyze($text)
    {
        $text = trim($text);
        return $text;
    }
}

Note

The search analyzers must only contain one method named analyze() and it should return the modified inputted text to be indexed.

This analyzer can then be applied to the search object as follows:

// test.php

// ...
$newsItemTable = Doctrine_Core::getTable('NewsItem');
$search = $newsItemTable
    ->getTemplate('Doctrine_Template_Searchable')
    ->getPlugin();

$search->setOption('analyzer', new MyAnalyzer());

Query language

:php:class:`Doctrine_Search` provides a query language similar to Apache Lucene. The :php:class:`Doctrine_Search_Query` converts human readable, easy-to-construct search queries to their complex DQL equivalents which are then converted to SQL like normal.

Performing Searches

Here is a simple example to retrieve the record ids and relevance data.

// test.php

// ...
$newsItemTable = Doctrine_Core::getTable('NewsItem');
$results       = $newsItemTable->search('test');
print_r($results);

The above code executes the following query:

SELECT
COUNT(keyword) AS relevance,
id
FROM article_index
WHERE id IN (SELECT
id
FROM article_index WHERE keyword = ?)
AND id IN (SELECT
id
FROM article_index
WHERE keyword = ?)
GROUP BY id
ORDER BY relevance DESC

The output of the code above would be the following:

$ php test.php
Array
(
    [0] => Array
        (
            [relevance] => 1
            [id] => 1
        )
)

Now you can use those results in another query to retrieve the actual NewsItem objects:

// test.php

// ...
$ids = array();
foreach ($results as $result) {
    $ids[] = $result['id'];
}

$q = Doctrine_Query::create()
    ->from('NewsItem i')
    ->whereIn('i.id', $ids);

$newsItems = $q->execute();

print_r($newsItems->toArray());

The above example would produce the following output:

$ php test.php
Array
(
    [0] => Array
        (
            [id] => 1
            [title] => Test
            [body] => test
        )
)

You can optionally pass the search() function a query object to modify with a where condition subquery to limit the results using the search index.

// test.php

// ...
$q = Doctrine_Query::create()
    ->from('NewsItem i');

$q = Doctrine_Core::getTable('Article')
    ->search('test', $q);

echo $q->getSqlQuery();

The above call to getSql() would output the following SQL query:

SELECT
n.id AS n**id,
n.title AS n**title,
n.body AS n__body
FROM news_item n
WHERE n.id IN (SELECT
id
FROM news_item_index
WHERE keyword = ?
GROUP BY id)

Now we can execute the query and get the NewsItem objects:

// test.php

// ...
$newsItems = $q->execute();

print_r($newsItems->toArray());

The above example would produce the following output:

$ php test.php
Array
(
    [0] => Array
        (
            [id] => 1
            [title] => Test
            [body] => test
        )
)

File searches

As stated before :php:class:`Doctrine_Search` can also be used for searching files. Lets say we have a directory which we want to be searchable. First we need to create an instance of :php:class:`Doctrine_Search_File` which is a child of :php:class:`Doctrine_Search` providing some extra functionality needed for the file searches.

// test.php

// ...
$search = new Doctrine_Search_File();

Second thing to do is to generate the index table. By default Doctrine names the database index class as FileIndex.

Lets check the SQL that is generated by the above models created:

// test.php

// ...
$sql = Doctrine_Core::generateSqlFromArray(array('FileIndex'));

The above code would output the following SQL query:

CREATE TABLE file_index (url VARCHAR(255),
keyword VARCHAR(200),
field VARCHAR(50),
position BIGINT,
PRIMARY KEY(url,
keyword,
field,
position)) ENGINE = INNODB

You can create the actual table in the database by using the Doctrine_Core::createTablesFromArray() method:

// test.php

// ...
Doctrine_Core::createTablesFromArray(array('FileIndex'));

Now we can start using the file searcher. In this example lets just index the models directory:

// test.php

// ...
$search->indexDirectory('models');

The indexDirectory() iterates recursively through given directory and analyzes all files within it updating the index table as necessary.

Finally we can start searching for pieces of text within the indexed files:

// test.php

// ...
$results = $search->search('hasColumn');
print_r($results);

The above example would produce the following output:

$ php test.php
Array
(
    [0] => Array
        (
            [relevance] => 2
            [url] => models/generated/BaseNewsItem.php
        )
)

Conclusion

Now that we have learned all about the Searchable behavior we are ready to learn in more detail about the NestedSet behavior in the Hierarchical Data chapter. The NestedSet is a large topic like the Searchable behavior so it got its own dedicated chapter as well.

Fork me on GitHub