Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Sphinx Full Text Search

Mariusz Gil
February 09, 2013

Sphinx Full Text Search

Slides from MeetPHP #8 presentation.

Mariusz Gil

February 09, 2013
Tweet

More Decks by Mariusz Gil

Other Decks in Programming

Transcript

  1. SPHINX full textsearch mYSQL postgresql indexing updates mva delta real

    time distributed live updates Sphinx QL mariusz gil json
  2. mysql> EXPLAIN SELECT * FROM film WHERE title = 'LEGEND

    JEDI' \G *************************** 1. row *************************** select_type: SIMPLE type: ref possible_keys: idx_title key: idx_title rows: 1 mysql> EXPLAIN SELECT * FROM film WHERE title LIKE 'LEGEND%' \G *************************** 1. row *************************** select_type: SIMPLE type: range possible_keys: idx_title key: idx_title key_len: 767 mysql> EXPLAIN SELECT * FROM film WHERE title LIKE '%LEGEND%' \G *************************** 1. row *************************** select_type: SIMPLE table: film type: ALL possible_keys: NULL key: NULL rows: 1026
  3. The data to be indexed can generally come from very

    different sources: SQL databases, plain text files, HTML files, mailboxes, and so on. data sources
  4. data sources source base { type = mysql sql_host =

    localhost sql_user = root sql_pass = sql_db = sakila sql_port = 3306 sql_sock = /opt/local/var/run/mysql55/mysqld.sock }
  5. Attributes are additional values associated with each document that can

    be used to perform additional filtering and sorting during search. documentS and attributes
  6. documentS and attributes source sakila_film : base { sql_query_pre =

    SET NAMES 'utf8' sql_query = \ SELECT \ f.film_id, \ f.title, \ f.description, \ f.release_year, \ f.language_id, \ f.original_language_id \ FROM \ film AS f sql_query_info = SELECT * FROM film WHERE film_id = $id sql_attr_bigint = release_year sql_attr_uint = language_id sql_attr_uint = original_language_id sql_attr_multi = bigint category_id from query; \ SELECT film_id, category_id FROM film_category sql_attr_multi = bigint actor_id from query; \ SELECT film_id, actor_id FROM film_actor }
  7. To be able to answer full-text search queries fast, Sphinx

    needs to build a special data structure optimized for such queries from your text data. This structure is called index. indexes
  8. indexes index sakila_film { source = sakila_film path = /opt/local/var/data/sphinx/sakila_film

    docinfo = extern charset_type = utf-8 } indexer { mem_limit = 32M }
  9. indexes $ indexer --config /etc/sphinx/sphinx.conf --all Sphinx 2.0.6-release (r3473) Copyright

    (c) 2001-2012, Andrew Aksyonoff Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/etc/sphinx/sphinx.conf'... indexing index 'sakila_actor'... collected 200 docs, 0.0 MB sorted 0.0 Mhits, 100.0% done total 200 docs, 2307 bytes total 0.013 sec, 166282 bytes/sec, 14415.45 docs/sec indexing index 'sakila_film'... collected 1000 docs, 0.1 MB collected 6462 attr values sorted 0.0 Mvalues, 100.0% done sorted 0.0 Mhits, 100.0% done total 1000 docs, 108077 bytes total 0.051 sec, 2101561 bytes/sec, 19445.03 docs/sec total 15930 reads, 0.017 sec, 0.0 kb/call avg, 0.0 msec/call avgno total 17 writes, 0.000 sec, 24.6 kb/call avg, 0.0 msec/call avg
  10. searching searchd { listen = 9312 listen = 9306:mysql41 log

    = /opt/local/var/log/sphinx/searchd.log query_log = /opt/local/var/log/sphinx/query.log pid_file = /opt/local/var/log/sphinx/searchd.pid max_matches = 1000 preopen_indexes = 1 unlink_old = 1 workers = threads # for RT to work binlog_path = /opt/local/var/data }
  11. searching $ searchd --config /etc/sphinx/sphinx.conf Sphinx 2.0.6-release (r3473) Copyright (c)

    2001-2012, Andrew Aksyonoff Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/etc/sphinx/sphinx.conf'... listening on all interfaces, port=9312 listening on all interfaces, port=9306 precaching index 'sakila_actor' precaching index 'sakila_film' precached 2 indexes in 0.005 sec $ searchd --config /etc/sphinx/sphinx.conf --stop
  12. php sphinx api $client = new SphinxClient(); $client->SetServer('localhost', 9312); $client->SetMatchMode(SPH_MATCH_ALL);

    $client->SetFilterRange('release_year', 2005, 2006); $client->SetArrayResult(true); $client->SetLimits(0, 1); $results = $client->Query('fight', 'sakila_film');
  13. php sphinx api addQuery RunQueries setlimits setmaxquerytime setoverride setselect setmatchmode

    setrankingmode setsortmode setweights setfieldsweights setindexweights setidrange setfilter setfilterrange setfilterfloatrange setgeoanchor setgroupby setgroupdistinct
  14. sphinx ql $ mysql -P 9306 -h 127.0.0.1 mysql> SELECT

    * FROM sakila_film WHERE MATCH('fight') AND release_year BETWEEN 2005 AND 2006 LIMIT 1; +------+--------+--------------+-------------+--------------+ | id | weight | release_year | category_id | actor_id | +------+--------+--------------+-------------+--------------+ | 11 | 1599 | 2006 | 9 | 40,81,90,174 | +------+--------+--------------+-------------+--------------+
  15. json support! source json_test { ... sql_query = select *

    from test sql_attr_json = j } $ mysql -P 9306 -h 127.0.0.1 mysql> SELECT * FROM json_test WHERE j.uid > 100; +------+------+-------------------------------------+ | id | gid | j | +------+------+-------------------------------------+ | 1 | 1 | {"name":"Alice","uid":123} | | 2 | 1 | {"name":"Bob","uid":234,"gid":12} | | 3 | 2 | {"name":"Charlie","uid":345} | | 4 | 2 | {"name":"Damon","uid":456,"gid":23} | +------+------+-------------------------------------+
  16. delta index 0:00 6:00 12:00 18:00 main index contains all

    changes from last main index generation 0:00 index merge or rotation