Slide 1

Slide 1 text

SPHINX full textsearch mYSQL postgresql indexing updates mva delta real time distributed live updates Sphinx QL mariusz gil json

Slide 2

Slide 2 text

ABOUT ME

Slide 3

Slide 3 text

INTRODUCTION

Slide 4

Slide 4 text

most of apps need search

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

how to support it?

Slide 7

Slide 7 text

One tool to rule them all

Slide 8

Slide 8 text

why not sql?

Slide 9

Slide 9 text

mysql> EXPLAIN SELECT * FROM film WHERE title = 'LEGEND JEDI' \G *************************** 1. row *************************** select_type: SIMPLE type: ref possible_keys: idx_title key: idx_title rows: 1 mysql> EXPLAIN SELECT * FROM film WHERE title LIKE 'LEGEND%' \G *************************** 1. row *************************** select_type: SIMPLE type: range possible_keys: idx_title key: idx_title key_len: 767 mysql> EXPLAIN SELECT * FROM film WHERE title LIKE '%LEGEND%' \G *************************** 1. row *************************** select_type: SIMPLE table: film type: ALL possible_keys: NULL key: NULL rows: 1026

Slide 10

Slide 10 text

why not...lucene solr xapian elasticsearch whatever?

Slide 11

Slide 11 text

so... what is sphinx?

Slide 12

Slide 12 text

sphinx isfree open source full featured efficient scalable full text search

Slide 13

Slide 13 text

php perl ruby java official Sphinxapi for

Slide 14

Slide 14 text

perl c# haskell and many more third party libraries for

Slide 15

Slide 15 text

4 terms data sources attributes indexes searching

Slide 16

Slide 16 text

installation $ tar xzvf sphinx-2.0.6.tar.gz $ cd sphinx $ ./configure $ make && make install

Slide 17

Slide 17 text

The data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on. data sources

Slide 18

Slide 18 text

data sources source base { type = mysql sql_host = localhost sql_user = root sql_pass = sql_db = sakila sql_port = 3306 sql_sock = /opt/local/var/run/mysql55/mysqld.sock }

Slide 19

Slide 19 text

Attributes are additional values associated with each document that can be used to perform additional filtering and sorting during search. documentS and attributes

Slide 20

Slide 20 text

documentS and attributes source sakila_film : base { sql_query_pre = SET NAMES 'utf8' sql_query = \ SELECT \ f.film_id, \ f.title, \ f.description, \ f.release_year, \ f.language_id, \ f.original_language_id \ FROM \ film AS f sql_query_info = SELECT * FROM film WHERE film_id = $id sql_attr_bigint = release_year sql_attr_uint = language_id sql_attr_uint = original_language_id sql_attr_multi = bigint category_id from query; \ SELECT film_id, category_id FROM film_category sql_attr_multi = bigint actor_id from query; \ SELECT film_id, actor_id FROM film_actor }

Slide 21

Slide 21 text

To be able to answer full-text search queries fast, Sphinx needs to build a special data structure optimized for such queries from your text data. This structure is called index. indexes

Slide 22

Slide 22 text

indexes index sakila_film { source = sakila_film path = /opt/local/var/data/sphinx/sakila_film docinfo = extern charset_type = utf-8 } indexer { mem_limit = 32M }

Slide 23

Slide 23 text

indexes $ indexer --config /etc/sphinx/sphinx.conf --all Sphinx 2.0.6-release (r3473) Copyright (c) 2001-2012, Andrew Aksyonoff Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/etc/sphinx/sphinx.conf'... indexing index 'sakila_actor'... collected 200 docs, 0.0 MB sorted 0.0 Mhits, 100.0% done total 200 docs, 2307 bytes total 0.013 sec, 166282 bytes/sec, 14415.45 docs/sec indexing index 'sakila_film'... collected 1000 docs, 0.1 MB collected 6462 attr values sorted 0.0 Mvalues, 100.0% done sorted 0.0 Mhits, 100.0% done total 1000 docs, 108077 bytes total 0.051 sec, 2101561 bytes/sec, 19445.03 docs/sec total 15930 reads, 0.017 sec, 0.0 kb/call avg, 0.0 msec/call avgno total 17 writes, 0.000 sec, 24.6 kb/call avg, 0.0 msec/call avg

Slide 24

Slide 24 text

searching searchd { listen = 9312 listen = 9306:mysql41 log = /opt/local/var/log/sphinx/searchd.log query_log = /opt/local/var/log/sphinx/query.log pid_file = /opt/local/var/log/sphinx/searchd.pid max_matches = 1000 preopen_indexes = 1 unlink_old = 1 workers = threads # for RT to work binlog_path = /opt/local/var/data }

Slide 25

Slide 25 text

searching $ searchd --config /etc/sphinx/sphinx.conf Sphinx 2.0.6-release (r3473) Copyright (c) 2001-2012, Andrew Aksyonoff Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/etc/sphinx/sphinx.conf'... listening on all interfaces, port=9312 listening on all interfaces, port=9306 precaching index 'sakila_actor' precaching index 'sakila_film' precached 2 indexes in 0.005 sec $ searchd --config /etc/sphinx/sphinx.conf --stop

Slide 26

Slide 26 text

standard flow

Slide 27

Slide 27 text

web app sphinx sql database 1 6 2 3 4 5

Slide 28

Slide 28 text

php sphinx api $client = new SphinxClient(); $client->SetServer('localhost', 9312); $client->SetMatchMode(SPH_MATCH_ALL); $client->SetFilterRange('release_year', 2005, 2006); $client->SetArrayResult(true); $client->SetLimits(0, 1); $results = $client->Query('fight', 'sakila_film');

Slide 29

Slide 29 text

php sphinx api addQuery RunQueries setlimits setmaxquerytime setoverride setselect setmatchmode setrankingmode setsortmode setweights setfieldsweights setindexweights setidrange setfilter setfilterrange setfilterfloatrange setgeoanchor setgroupby setgroupdistinct

Slide 30

Slide 30 text

sphinx ql $ mysql -P 9306 -h 127.0.0.1 mysql> SELECT * FROM sakila_film WHERE MATCH('fight') AND release_year BETWEEN 2005 AND 2006 LIMIT 1; +------+--------+--------------+-------------+--------------+ | id | weight | release_year | category_id | actor_id | +------+--------+--------------+-------------+--------------+ | 11 | 1599 | 2006 | 9 | 40,81,90,174 | +------+--------+--------------+-------------+--------------+

Slide 31

Slide 31 text

json support! source json_test { ... sql_query = select * from test sql_attr_json = j } $ mysql -P 9306 -h 127.0.0.1 mysql> SELECT * FROM json_test WHERE j.uid > 100; +------+------+-------------------------------------+ | id | gid | j | +------+------+-------------------------------------+ | 1 | 1 | {"name":"Alice","uid":123} | | 2 | 1 | {"name":"Bob","uid":234,"gid":12} | | 3 | 2 | {"name":"Charlie","uid":345} | | 4 | 2 | {"name":"Damon","uid":456,"gid":23} | +------+------+-------------------------------------+

Slide 32

Slide 32 text

demo

Slide 33

Slide 33 text

refreshing data main+delta real time indexes

Slide 34

Slide 34 text

delta index 0:00 6:00 12:00 18:00 main index contains all changes from last main index generation 0:00 index merge or rotation

Slide 35

Slide 35 text

distributed flow

Slide 36

Slide 36 text

sphinx master sphinx sql shard sphinx sql shard ... ... 1 2 2 3 3 5 4

Slide 37

Slide 37 text

want more?

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

thanks! any questions? feel free to contact me [email protected] or twitter @mariuszgil