Slide 1

Slide 1 text

Software stack for the visualization and analysis around Apache Solr with Parallel SQL
[email protected]
Some rights reserved by Sebastian Sikora

Slide 2

Slide 2 text

Self-introduction
Minoru OSUKA
 [email protected]
Committer and PMC Member on the ManifoldCF Project at the Apache Software Foundation
 http://manifoldcf.apache.org
Contributor on the Solr Project at the Apache Software Foundation
 http://lucene.apache.org/solr/
Author of "[Revised New Edition] Introduction to Apache Solr" (［改訂新版］Apache Solr入門)
 http://gihyo.jp/book/2014/978-4-7741-6163-1
Some rights reserved by QuinnDombrowski

Slide 3

Slide 3 text

Agenda
• Parallel SQL
• SQL Request Handler
• JDBC
• SQL Syntax
• ETL and Search
• The Elastic Stack
• Apache Family
• Visualizing Data
• Demonstration
Some rights reserved by scui3asteveo

Slide 4

Slide 4 text

Parallel SQL
Aggregation uses either MapReduce-like shuffling or the JSON Facet API.
SQL is translated into a Solr query by the Presto SQL Parser, and the data is retrieved through the Streaming API.
Available through the SQL Request Handler and the JDBC Driver.
Some rights reserved by cogdogblog

Slide 5

Slide 5 text

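Parallel SQL
• SQL statements are compiled into Streaming Expressions that run in parallel across multiple worker nodes in SolrCloud.
• A SolrCloud Collection is abstracted as a relational table.
• The WHERE clause supports Lucene / Solr query syntax.
• Many operations, such as grouping and aggregation, are parallelized automatically using the real-time MapReduce capability of Streaming Expressions.
• Grouping / aggregation can be slow in some cases; the JSON Facet API can improve its performance.
• Only SolrCloud mode is currently supported. Standalone mode is not.
• The SQL feature is currently experimental, and not all SQL syntax is implemented.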

Slide 6

Slide 6 text

SQL Request Handler
By default, Solr provides a request handler named /sql that accepts SQL.
Issuing a SQL statement to this request handler extracts the documents (data) that match the conditions from the Solr index.
Some rights reserved by christiaan_008

Slide 7

Slide 7 text

SQL Request Handler
• Request sample
$ curl --data-urlencode \
 'stmt=SELECT to, count(*) FROM collection4 GROUP BY to ORDER BY count(*) desc LIMIT 10' \
 'http://localhost:8983/solr/collection4/sql?aggregationMode=facet'
• Response sample
{"result-set":{"docs":[
 {"count(*)":9158,"to":"[email protected]"},
 {"count(*)":6244,"to":"[email protected]"},
 {"count(*)":5874,"to":"[email protected]"},
 {"count(*)":5867,"to":"[email protected]"},
 {"count(*)":5595,"to":"[email protected]"},
 {"count(*)":4904,"to":"[email protected]"},
 {"count(*)":4622,"to":"[email protected]"},
 {"count(*)":3819,"to":"[email protected]"},
 {"count(*)":3678,"to":"[email protected]"},
 {"count(*)":3653,"to":"[email protected]"},
 {"EOF":"true","RESPONSE_TIME":10}]}
 }
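The request and response above can also be handled from a script. The following is a minimal Python sketch, not part of the original slides: it builds the same POST request that the curl command sends and strips the trailing EOF marker from the "result-set". The function names and the default base URL layout are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sql_request(base_url, collection, stmt, aggregation_mode="facet"):
    """Build a POST request for the /sql handler (same shape as the curl sample)."""
    query = urlencode({"aggregationMode": aggregation_mode})
    url = f"{base_url}/{collection}/sql?{query}"
    # Form-encoded body, equivalent to curl's --data-urlencode 'stmt=...'
    body = urlencode({"stmt": stmt}).encode("utf-8")
    return Request(url, data=body)


def rows_from_response(text):
    """Return the result rows, dropping the final {"EOF": ...} marker document."""
    docs = json.loads(text)["result-set"]["docs"]
    return [doc for doc in docs if "EOF" not in doc]


def run_sql(base_url, collection, stmt, aggregation_mode="facet"):
    """Send the statement to a running Solr node and return the rows."""
    request = build_sql_request(base_url, collection, stmt, aggregation_mode)
    with urlopen(request) as response:
        return rows_from_response(response.read().decode("utf-8"))
```

With a SolrCloud node running locally, `run_sql("http://localhost:8983/solr", "collection4", "SELECT to, count(*) FROM collection4 GROUP BY to ORDER BY count(*) desc LIMIT 10")` would return the ten rows shown in the response sample as a list of dictionaries.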

Slide 8

Slide 8 text

JDBC Driver
With the addition of SQL support, a JDBC driver is built into SolrJ, the existing Java client library.
Through the JDBC driver, the Solr index can be accessed via the SQL Request Handler.
Some rights reserved by WashuOtaku

Slide 9

Slide 9 text

JDBC Driver
JDBC driver class name: org.apache.solr.client.solrj.io.sql.DriverImpl
Connection string: jdbc:solr:///?collection=
To use the JDBC driver, place the following files in a directory on the Java classpath.
$ find solr/solr-6.0.1/dist -name '*.jar' | grep -E "(/solr-solrj-.*\.jar)|(/solrj-lib/.*\.jar)"
 solr/solr-6.0.1/dist/solr-solrj-6.0.1.jar
 solr/solr-6.0.1/dist/solrj-lib/commons-io-2.4.jar
 solr/solr-6.0.1/dist/solrj-lib/httpclient-4.4.1.jar
 solr/solr-6.0.1/dist/solrj-lib/httpcore-4.4.1.jar
 solr/solr-6.0.1/dist/solrj-lib/httpmime-4.4.1.jar
 solr/solr-6.0.1/dist/solrj-lib/jcl-over-slf4j-1.7.7.jar
 solr/solr-6.0.1/dist/solrj-lib/noggit-0.6.jar
 solr/solr-6.0.1/dist/solrj-lib/slf4j-api-1.7.7.jar
 solr/solr-6.0.1/dist/solrj-lib/stax2-api-3.1.4.jar
 solr/solr-6.0.1/dist/solrj-lib/woodstox-core-asl-4.4.1.jar
 solr/solr-6.0.1/dist/solrj-lib/zookeeper-3.4.6.jar
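The same jar selection can be scripted when a classpath string is needed, for example to start a JDBC client from the command line. This is a minimal Python sketch under the assumption that the distribution has the layout shown in the listing above (a solr-solrj jar directly under dist and its dependencies under dist/solrj-lib); the function name is hypothetical.

```python
import os
from pathlib import Path


def solrj_classpath(dist_dir):
    """Join the SolrJ jar and its dependency jars into one classpath string."""
    dist = Path(dist_dir)
    # The SolrJ client jar itself, e.g. solr-solrj-6.0.1.jar
    jars = sorted(dist.glob("solr-solrj-*.jar"))
    # Its dependencies, e.g. solrj-lib/zookeeper-3.4.6.jar
    jars += sorted((dist / "solrj-lib").glob("*.jar"))
    # os.pathsep is ":" on Unix-like systems and ";" on Windows
    return os.pathsep.join(str(jar) for jar in jars)
```

The result can be passed to `java -cp`, or the listed files can be copied into the driver directory of a JDBC client.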

Slide 10

Slide 10 text

SQL Syntax
SELECT statement
• WHERE clause
• LIMIT clause
• ORDER BY clause
• DISTINCT clause
• GROUP BY clause
Statistical (aggregate) functions: count, min, max, sum, avg

Slide 11

Slide 11 text

SQL Syntax
Search fieldC for the phrase 'term1 term2':
 WHERE fieldC = 'term1 term2'
Search fieldC for 'term1' OR 'term2':
 WHERE fieldC = '(term1 term2)'
Search fieldC for values from 0 to 100 inclusive (Solr range query):
 WHERE fieldC = '[0 TO 100]'
Combine multiple conditions:
 WHERE ((fieldC = 'term1' AND fieldA = 'term2') OR (fieldB = 'term3'))
NOT search:
 WHERE (fieldA = 'term1') AND NOT (fieldB = 'term2')
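The WHERE patterns above are plain strings, so they are easy to generate. The following Python sketch shows one possible set of helpers that reproduce the phrase, OR, and range forms; all function names are hypothetical, and the sketch simply rejects values containing single quotes instead of assuming any escaping rule.

```python
def _check(term):
    # Assumption: refuse single quotes rather than guess how they should be escaped.
    if "'" in term:
        raise ValueError("single quotes are not handled in this sketch")
    return term


def phrase(field, text):
    """Phrase search: field = 'term1 term2'."""
    return f"{field} = '{_check(text)}'"


def any_of(field, terms):
    """OR search: field = '(term1 term2)'."""
    joined = " ".join(_check(term) for term in terms)
    return f"{field} = '({joined})'"


def in_range(field, low, high):
    """Inclusive Solr range search: field = '[low TO high]'."""
    return f"{field} = '[{low} TO {high}]'"


def select(collection, fields, where, limit=None):
    """Assemble a SELECT statement from the pieces above."""
    columns = ", ".join(fields)
    stmt = f"SELECT {columns} FROM {collection} WHERE {where}"
    if limit is not None:
        stmt += f" LIMIT {limit}"
    return stmt
```

For example, `select("collection4", ["to"], any_of("fieldC", ["term1", "term2"]), limit=10)` produces a statement that can be sent as the stmt parameter of the /sql handler.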

Slide 12

Slide 12 text

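Visualization and Analysis
• Data Source
 The external origin of the data, such as a database.
• Data Lake
 Storage for handling large volumes of data, such as a file store.
• Data Mart
 Data extracted and aggregated for a specific use or purpose, then stored in a form that is easy to use.
• User Interface
 Searches and extracts data with ad hoc queries and renders tables, graphs, and similar views.
Some rights reserved by MMU Engage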

Slide 13

Slide 13 text

Visualization and Analysis
[Architecture diagram. Stages: Data Source, Data Lake, Data Mart, User Interface. Labels: Data Service, Log, Storage, Data Stream, Search Engine, KVS, RDBMS, Search, Graph, Chart.]

Slide 14

Slide 14 text

The Elastic Stack
Logstash
 https://www.elastic.co/products/logstash
 A tool for managing events and logs. It can collect logs and time-based event data from any system.
Elasticsearch
 https://www.elastic.co/products/elasticsearch
 A flexible, powerful, open-source, distributed, real-time search engine based on Lucene.
Kibana
 https://www.elastic.co/products/kibana
 Data visualization software with dashboard features that uses Elasticsearch as its backend.
Some rights reserved by paisleyorguk

Slide 15

Slide 15 text

Apache Family
Apache Flume
 http://flume.apache.org
 A distributed service for efficiently collecting, aggregating, and moving large amounts of data.
Apache Solr
 http://lucene.apache.org/solr/
 A fast, open-source search platform derived from the Apache Lucene project.
Apache Zeppelin
 https://zeppelin.incubator.apache.org
 A web-based interactive UI. It can format the results of SQL and Streaming commands as tables or plot them as graphs.
Some rights reserved by QuinnDombrowski

Slide 16

Slide 16 text

In The Case of Solr
[Architecture diagram. Stages: Data Source, Data Stream, Data Lake / Data Mart, User Interface. Labels: github.com/mosuka/logstash-output-solr, github.com/mosuka/fluent-plugin-output-solr, Banana / Silk, SQuirreL SQL.]

Slide 17

Slide 17 text

Apache Solr
[Architecture diagram: Data Source, Data Lake, Data Mart, User Interface]
Some rights reserved by NASA Goddard Photo and Video

Slide 18

Slide 18 text

Apache Solr
Solr is used both as the Data Lake, to persist the data, and as the Data Mart, to extract the data that is needed.
Apply data-driven schemaless mode and enable autoCommit and autoSoftCommit.
If the add-unknown-fields-to-the-schema Update Request Processor Chain would misinterpret a field in a way that causes problems, define that field in advance with the Schema API.
Solr 6.0.1 has a bug: in a single-shard SolrCloud environment, an exception is thrown when the aggregation mode of the SQL Request Handler is set to facet (aggregationMode=facet).
 ClassCastException occurs in /sql handler with GROUP BY aggregationMode=facet and single shard
Use the latest source from https://github.com/apache/lucene-solr.
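Defining a field ahead of time, as recommended above, means sending an "add-field" command to the collection's /schema endpoint. The Python sketch below only builds that request; the helper name, the example field name, and the "string" field type are assumptions for illustration, and the field type must exist in the collection's configset.

```python
import json
from urllib.request import Request


def add_field_request(base_url, collection, name, field_type,
                      stored=True, indexed=True, multi_valued=False):
    """Build a Schema API request that defines one field explicitly."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,  # assumed to be defined in the configset
            "stored": stored,
            "indexed": indexed,
            "multiValued": multi_valued,
        }
    }
    return Request(
        f"{base_url}/{collection}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
    )
```

Passing the returned request to `urllib.request.urlopen` against a running node, before the first document is indexed, keeps schemaless mode from guessing the type of that field.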

Slide 19

Slide 19 text

Apache Flume
[Architecture diagram: Data Source, Data Lake, Data Mart, User Interface]
Some rights reserved by post406

Slide 20

Slide 20 text

Apache Flume
Flume is used to extract data from the Data Source and transfer it to the Data Lake. Stream processing makes near-real-time handling possible.
The current release, Flume 1.6.0, supports only Solr 4.3, which is old.
It can send data to Solr in standalone mode, but because of API and specification changes since Solr 5.x, it cannot send data to Solr in SolrCloud mode.
 https://issues.apache.org/jira/browse/FLUME-2919
Use https://github.com/mosuka/flume, which adds Solr 6.x support.

Slide 21

Slide 21 text

Apache Zeppelin
[Architecture diagram: Data Source, Data Lake, Data Mart, User Interface]
Some rights reserved by probabilistic

Slide 22

Slide 22 text

Apache Zeppelin
Zeppelin is used as the User Interface. It enables interactive querying and analysis of the Data Mart data with ad hoc queries as needed.
The current release, Zeppelin 0.5.6-incubating, does not support the JDBC Driver. JDBC support is planned for 0.6.0.
Use https://github.com/apache/incubator-zeppelin, which supports the JDBC Driver.
How to configure the connection to Solr is also described at the URL below.
Solr JDBC - Apache Zeppelin (incubating)
 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=63406991

Slide 23

Slide 23 text

Lucidworks Banana
[Architecture diagram: Data Source, Data Lake, Data Mart, User Interface]
Some rights reserved by danbri

Slide 24

Slide 24 text

Lucidworks Banana
 https://github.com/lucidworks/banana
Banana is used as the User Interface to visualize the Data Mart data in real time.
Banana is a port of Kibana 3.x to Solr.
The current release, Banana 1.6.0, has a bug: in a multi-node configuration, it cannot reference a remote Solr when saving Dashboard settings to Solr. In addition, because of a change in the Solr collection naming rules, the name of the collection that stores Dashboard settings must be changed.
Pull Request: https://github.com/lucidworks/banana/pull/270
Use https://github.com/mosuka/banana, which fixes the bug above.

Slide 25

Slide 25 text

Lucidworks Silk
[Architecture diagram: Data Source, Data Lake, Data Mart, User Interface]
Some rights reserved by karen2754

Slide 26

Slide 26 text

Lucidworks Silk
 https://github.com/lucidworks/silk
Silk is used as the User Interface to visualize the Data Mart data in real time.
Silk is a port of Kibana 4.x to Solr.
The current dev branch contains Field type definitions in schema.xml that are not supported as of Solr 6.x.
Pull Request: https://github.com/lucidworks/silk/pull/11
Use https://github.com/mosuka/silk, which fixes this.

Slide 27

Slide 27 text

SQuirreL SQL
[Architecture diagram: Data Source, Data Lake, Data Mart, User Interface]
Some rights reserved by likeaduck

Slide 28

Slide 28 text

SQuirreL SQL
 http://squirrel-sql.sourceforge.net
SQuirreL SQL is a graphical desktop application.
It is written in Java and runs on any OS, as long as a JVM is installed on the machine.
It can connect to any relational database that provides a JDBC Driver.
How to configure the connection to Solr is also described at the URL below.
Solr JDBC - SQuirreL SQL
 https://cwiki.apache.org/confluence/display/solr/Solr+JDBC+-+SQuirreL+SQL

Slide 29

Slide 29 text

Demonstration
[Architecture diagram. Stages: Data Source, Data Stream, Data Lake / Data Mart, User Interface. Labels: Banana / Silk, SQuirreL SQL.]
Some rights reserved by Sebastian Sikora

Slide 30

Slide 30 text

Demonstration http://qiita.com/mosuka/items/c8185d759553535452f6

Slide 31

Slide 31 text

Demonstration https://github.com/mosuka/the-18th-lucene-solr-meetup

Slide 32

Slide 32 text

Documents and Software
Apache Solr Reference Guide
 https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
Flume 1.6.0 User Guide
 https://flume.apache.org/FlumeUserGuide.html
Zeppelin 0.6.0-SNAPSHOT document
 https://zeppelin.incubator.apache.org/docs/0.6.0-SNAPSHOT/
lucidworks/banana: Banana for Solr - A Port of Kibana
 https://github.com/lucidworks/banana
lucidworks/silk: Silk is a port of Kibana 4 project.
 https://github.com/lucidworks/silk
Cloudera Morphlines Reference Guide
 http://www.slideshare.net/cloudera/using-morphlines-for-onthefly-etl
Building a data visualization platform by combining the OSS tools "Solr", "Flume", and "Banana" (CodeZine article, in Japanese)
 https://codezine.jp/article/detail/8707
[Revised New Edition] Introduction to Apache Solr: The Open Source Full-Text Search Engine (［改訂新版］Apache Solr入門, in Japanese)
 http://gihyo.jp/book/2014/978-4-7741-6163-1
Server/Infrastructure Engineer Training Reader: Log Collection to Visualization (サーバ/インフラエンジニア養成読本 ログ収集〜可視化編, in Japanese)
 http://gihyo.jp/book/2014/978-4-7741-6983-5
Some rights reserved by Miss Dilettante

Slide 33

Slide 33 text

Documents and Software
Apache Solr
 https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
Apache Flume
 https://flume.apache.org/FlumeUserGuide.html
Logstash
 https://www.elastic.co/products/logstash
logstash-output-solr
 https://github.com/mosuka/logstash-output-solr
Fluentd
fluent-plugin-output-solr
 https://github.com/mosuka/fluent-plugin-output-solr
Apache Zeppelin
 https://zeppelin.incubator.apache.org/docs/0.6.0-SNAPSHOT/
Lucidworks Banana
 https://github.com/lucidworks/banana
Lucidworks Silk
 https://github.com/lucidworks/silk
SQuirreL SQL
 http://squirrel-sql.sourceforge.net
