Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MySQL Casual Talks Vol.4 「MySQL-5.6で始める全文検索 〜In...

MySQL Casual Talks Vol.4 「MySQL-5.6で始める全文検索 〜InnoDB FTS編〜」

MySQL Casual Talks Vol.4 でのライトニングトークに利用した資料です。
MySQL-5.6.4より「InnoDB FTS」としてInnoDBに全文検索機能が加わりました。
この全文検索機能を利用し、日本語の全文検索エンジンとしての可能性を探ります。

Kentaro Yoshida

April 17, 2013
Tweet

More Decks by Kentaro Yoshida

Other Decks in Technology

Transcript

  1. Sphinx SE (Search Engine) ֓ཁ by wikipedia http://en.wikipedia.org/wiki/ Sphinx_(search_engine) ೔ຊޠղઆ

    by IBM http://www.ibm.com/developerworks/jp/ opensource/library/os-sphinx/? cmp=dw&cpb=dwope&ct=dwrss&cr=dwrss& ccy=jp&csr=120911 storage pluginରԠ http://sphinxsearch.com/docs/ current.html#sphinxse-using ϦΞϧλΠϜΠϯσοΫ εඇରԠʢٙ࿭ʣ http://www.slideshare.net/conmame/ ss-12117195/9 ϕϯνϚʔΫ http://www.percona.com/files//presentations/ opensql2008_sphinx.pdf http://www.slideshare.net/billkarwin/practical-full- text-search-with-my-sql Πϯετʔϧํ๏ http://www.howtoforge.com/sphinx-as- mysql-storage-engine-sphinxse ೔ຊޠ (CJK) ରԠঢ়گ http://www.ivinco.com/blog/using-sphinx- search-engine-with-chinese-japanese-and- korean-language-documents/ MySQL-5.6.10ରԠঢ়گ http://sphinxsearch.com/bugs/view.php? id=1419
  2. શจݕࡧσϞ (mroonga) ςʔϒϧ࡞੒ CREATE TABLE search_with_mroonga ( id INT PRIMARY

    KEY AUTO_INCREMENT, subject TEXT, content TEXT, FULLTEXT INDEX (subject,content) ) ENGINE = mroonga DEFAULT CHARSET utf8 collate utf8_unicode_ci ;
  3. શจݕࡧσϞ (mroonga) σʔλొ࿥ INSERT INTO search_with_mroonga (subject, content) VALUES (

    ‘MySQL’, ’MySQLʢϚΠΤεΩϡʔΤϧʣ͸ɺΦϥΫϧ͕։ ൃ͢ΔRDBMSʢϦϨʔγϣφϧσʔλϕʔεΛ؅ཧɺӡ ༻͢ΔͨΊͷγεςϜʣͷ࣮૷ͷҰͭͰ͋Δɻ’ );
  4. શจݕࡧσϞ (InnoDB FTS) ςʔϒϧ࡞੒ CREATE TABLE search_with_innodb ( id INT

    PRIMARY KEY AUTO_INCREMENT, subject TEXT, content TEXT, FULLTEXT INDEX (subject,content) ) ENGINE=InnoDB DEFAULT CHARSET utf8 collate utf8_unicode_ci ;
  5. શจݕࡧσϞ (InnoDB FTS) ෼͔ͪॻ͖ with Ruby # -*- encoding: utf-8

    -*- require 'MeCab' wakati = MeCab::Tagger.new('-O wakati') puts wakati.parse('ຊ೔΋ྑ͍ఱؾͰ͢')
  6. શจݕࡧσϞ (InnoDB FTS) σʔλొ࿥ INSERT INTO search_with_innodb (subject, content) VALUES

    ( ‘MySQL’, ’MySQL ʢ ϚΠΤεΩϡʔΤϧ ʣ ͸ ɺ ΦϥΫ ϧ ͕ ։ൃ ͢Δ RDBMS ʢ ϦϨʔγϣφϧ σʔλ ϕʔε Λ ؅ཧ ɺ ӡ༻ ͢Δ ͨΊ ͷ γεςϜ ʣ ͷ ࣮૷ ͷ Ұͭ Ͱ ͋Δ ɻ’ );
  7. ϕϯνϚʔΫ search_with_mroonga_ngram CREATE TABLE search_with_mroonga_ngram ( id INT PRIMARY KEY

    AUTO_INCREMENT, subject TEXT, content TEXT, FULLTEXT (subject,content) ) ENGINE=mroonga DEFAULT CHARSET utf8 collate utf8_unicode_ci ;
  8. ϕϯνϚʔΫ search_with_mroonga_mecab CREATE TABLE search_with_mroonga_mecab ( id INT PRIMARY KEY

    AUTO_INCREMENT, subject TEXT, content TEXT, FULLTEXT (subject,content) COMMENT 'parser "TokenMecab"' ) ENGINE=mroonga DEFAULT CHARSET utf8 collate utf8_unicode_ci ;
  9. ϕϯνϚʔΫ search_with_innodb_ngram CREATE TABLE search_with_innodb_ngram ( id INT PRIMARY KEY

    AUTO_INCREMENT, subject TEXT, content TEXT, FULLTEXT (subject,content) ) ENGINE=InnoDB DEFAULT CHARSET utf8 collate utf8_unicode_ci ;
  10. ϕϯνϚʔΫ search_with_innodb_mecab CREATE TABLE search_with_innodb_mecab ( id INT PRIMARY KEY

    AUTO_INCREMENT, subject TEXT, content TEXT, FULLTEXT (subject,content) ) ENGINE=InnoDB DEFAULT CHARSET utf8 collate utf8_unicode_ci ;
  11. ϕϯνɿߋ৽ੑೳ 0 1000 2000 3000 4000 75ສ݅ͷొ࿥ॴཁ࣌ؒʢඵʣ 3054.54 329.8 629.57

    178.01 mroonga mecab InnoDB FTS mecab mroonga ngram InnoDB FTS ngram
  12. ϕϯνɿߋ৽ੑೳ 0 100 200 300 400 50k 100k 150k 200k

    250k 300k 350k 400k 450k 500k 550k 600k 650k 700k 750k mroonga mecab InnoDB FTS mecab mroonga ngram InnoDB FTS ngram 5ສ݅୯ҐͰͷॻ͖ࠐΈॴཁ࣌ؒʢඵʣ
  13. ϕϯνɿߋ৽ੑೳ 0 15 30 45 60 50k 100k 150k 200k

    250k 300k 350k 400k 450k 500k 550k 600k 650k 700k 750k mroonga mecab InnoDB FTS mecab 5ສ݅୯ҐͰͷॻ͖ࠐΈॴཁ࣌ؒʢඵʣ
  14. ϕϯνɿશจݕࡧੑೳ 1ϫʔυ 0 50 100 150 200 61.638 187.944 55.152

    163.824 mroonga mecab InnoDB FTS mecab mroonga ngram InnoDB FTS ngram 75ສϨίʔυʹରͯ͠5ສ୯ޠΛॱ࣍SELECTͨ͠ࡍͷ࣮ߦॴཁ࣌ؒʢඵʣ