Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Sphinx Full Text Search

34be88398f623c109b61d23e8215bd23?s=47 Mariusz Gil
February 09, 2013

Sphinx Full Text Search

Slides from MeetPHP #8 presentation.

34be88398f623c109b61d23e8215bd23?s=128

Mariusz Gil

February 09, 2013
Tweet

Transcript

  1. SPHINX full textsearch mYSQL postgresql indexing updates mva delta real

    time distributed live updates Sphinx QL mariusz gil json
  2. ABOUT ME

  3. INTRODUCTION

  4. most of apps need search

  5. None
  6. how to support it?

  7. One tool to rule them all

  8. why not sql?

  9. mysql> EXPLAIN SELECT * FROM film WHERE title = 'LEGEND

    JEDI' \G *************************** 1. row *************************** select_type: SIMPLE type: ref possible_keys: idx_title key: idx_title rows: 1 mysql> EXPLAIN SELECT * FROM film WHERE title LIKE 'LEGEND%' \G *************************** 1. row *************************** select_type: SIMPLE type: range possible_keys: idx_title key: idx_title key_len: 767 mysql> EXPLAIN SELECT * FROM film WHERE title LIKE '%LEGEND%' \G *************************** 1. row *************************** select_type: SIMPLE table: film type: ALL possible_keys: NULL key: NULL rows: 1026
  10. why not...lucene solr xapian elasticsearch whatever?

  11. so... what is sphinx?

  12. sphinx isfree open source full featured efficient scalable full text

    search
  13. php perl ruby java official Sphinxapi for

  14. perl c# haskell and many more third party libraries for

  15. 4 terms data sources attributes indexes searching

  16. installation $ tar xzvf sphinx-2.0.6.tar.gz $ cd sphinx $ ./configure

    $ make && make install
  17. The data to be indexed can generally come from very

    different sources: SQL databases, plain text files, HTML files, mailboxes, and so on. data sources
  18. data sources source base { type = mysql sql_host =

    localhost sql_user = root sql_pass = sql_db = sakila sql_port = 3306 sql_sock = /opt/local/var/run/mysql55/mysqld.sock }
  19. Attributes are additional values associated with each document that can

    be used to perform additional filtering and sorting during search. documentS and attributes
  20. documentS and attributes source sakila_film : base { sql_query_pre =

    SET NAMES 'utf8' sql_query = \ SELECT \ f.film_id, \ f.title, \ f.description, \ f.release_year, \ f.language_id, \ f.original_language_id \ FROM \ film AS f sql_query_info = SELECT * FROM film WHERE film_id = $id sql_attr_bigint = release_year sql_attr_uint = language_id sql_attr_uint = original_language_id sql_attr_multi = bigint category_id from query; \ SELECT film_id, category_id FROM film_category sql_attr_multi = bigint actor_id from query; \ SELECT film_id, actor_id FROM film_actor }
  21. To be able to answer full-text search queries fast, Sphinx

    needs to build a special data structure optimized for such queries from your text data. This structure is called index. indexes
  22. indexes index sakila_film { source = sakila_film path = /opt/local/var/data/sphinx/sakila_film

    docinfo = extern charset_type = utf-8 } indexer { mem_limit = 32M }
  23. indexes $ indexer --config /etc/sphinx/sphinx.conf --all Sphinx 2.0.6-release (r3473) Copyright

    (c) 2001-2012, Andrew Aksyonoff Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/etc/sphinx/sphinx.conf'... indexing index 'sakila_actor'... collected 200 docs, 0.0 MB sorted 0.0 Mhits, 100.0% done total 200 docs, 2307 bytes total 0.013 sec, 166282 bytes/sec, 14415.45 docs/sec indexing index 'sakila_film'... collected 1000 docs, 0.1 MB collected 6462 attr values sorted 0.0 Mvalues, 100.0% done sorted 0.0 Mhits, 100.0% done total 1000 docs, 108077 bytes total 0.051 sec, 2101561 bytes/sec, 19445.03 docs/sec total 15930 reads, 0.017 sec, 0.0 kb/call avg, 0.0 msec/call avgno total 17 writes, 0.000 sec, 24.6 kb/call avg, 0.0 msec/call avg
  24. searching searchd { listen = 9312 listen = 9306:mysql41 log

    = /opt/local/var/log/sphinx/searchd.log query_log = /opt/local/var/log/sphinx/query.log pid_file = /opt/local/var/log/sphinx/searchd.pid max_matches = 1000 preopen_indexes = 1 unlink_old = 1 workers = threads # for RT to work binlog_path = /opt/local/var/data }
  25. searching $ searchd --config /etc/sphinx/sphinx.conf Sphinx 2.0.6-release (r3473) Copyright (c)

    2001-2012, Andrew Aksyonoff Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/etc/sphinx/sphinx.conf'... listening on all interfaces, port=9312 listening on all interfaces, port=9306 precaching index 'sakila_actor' precaching index 'sakila_film' precached 2 indexes in 0.005 sec $ searchd --config /etc/sphinx/sphinx.conf --stop
  26. standard flow

  27. web app sphinx sql database 1 6 2 3 4

    5
  28. php sphinx api $client = new SphinxClient(); $client->SetServer('localhost', 9312); $client->SetMatchMode(SPH_MATCH_ALL);

    $client->SetFilterRange('release_year', 2005, 2006); $client->SetArrayResult(true); $client->SetLimits(0, 1); $results = $client->Query('fight', 'sakila_film');
  29. php sphinx api addQuery RunQueries setlimits setmaxquerytime setoverride setselect setmatchmode

    setrankingmode setsortmode setweights setfieldsweights setindexweights setidrange setfilter setfilterrange setfilterfloatrange setgeoanchor setgroupby setgroupdistinct
  30. sphinx ql $ mysql -P 9306 -h 127.0.0.1 mysql> SELECT

    * FROM sakila_film WHERE MATCH('fight') AND release_year BETWEEN 2005 AND 2006 LIMIT 1; +------+--------+--------------+-------------+--------------+ | id | weight | release_year | category_id | actor_id | +------+--------+--------------+-------------+--------------+ | 11 | 1599 | 2006 | 9 | 40,81,90,174 | +------+--------+--------------+-------------+--------------+
  31. json support! source json_test { ... sql_query = select *

    from test sql_attr_json = j } $ mysql -P 9306 -h 127.0.0.1 mysql> SELECT * FROM json_test WHERE j.uid > 100; +------+------+-------------------------------------+ | id | gid | j | +------+------+-------------------------------------+ | 1 | 1 | {"name":"Alice","uid":123} | | 2 | 1 | {"name":"Bob","uid":234,"gid":12} | | 3 | 2 | {"name":"Charlie","uid":345} | | 4 | 2 | {"name":"Damon","uid":456,"gid":23} | +------+------+-------------------------------------+
  32. demo

  33. refreshing data main+delta real time indexes

  34. delta index 0:00 6:00 12:00 18:00 main index contains all

    changes from last main index generation 0:00 index merge or rotation
  35. distributed flow

  36. sphinx master sphinx sql shard sphinx sql shard ... ...

    1 2 2 3 3 5 4
  37. want more?

  38. None
  39. thanks! any questions? feel free to contact me mariusz@mariuszgil.pl or

    twitter @mariuszgil