Operating elasticsearch with a Morphological Analyzer

seapy
September 05, 2015

September 5, 2015. The 4th Docker Seoul Meetup (hosted by Open Container Korea).

A case-study talk on applying the eunjeon (은전한닢) Korean morphological analyzer to elasticsearch and operating it.

- Building a separate image per user dictionary
- Applying HTTP basic auth with nginx

Transcript

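  1. Once search traffic at Pangyo Market (판교장터) grows and we gain better experience, I will share again.

    Today: elasticsearch, the morphological analyzer, and earlier experience operating a cluster. Docker Seoul Meetup #4, 2015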
  2. Installing eunjeon (은전한닢) [2]

    • mecab-ko: morphological analyzer
    • mecab-ko-dic: Korean dictionary
    • mecab-java: a version with the memory leak fixed
    • elasticsearch mecab-ko plugin

    [2] "Applying the Korean morphological analyzer eunjeon to elasticsearch", nacyot's programming blog
  3. $ tar zxfv mecab-ko-XX.tar.gz
     $ cd mecab-ko-XX
     $ ./configure && make && make check
     $ sudo make install
     $ tar zxfv mecab-ko-dic-XX.tar.gz
     $ cd mecab-ko-dic-XX
     $ ./configure && make
     $ sudo make install
     #... (omitted)

    For detailed installation steps, see the related document [2].
  4. • If an automake version problem causes an error during installation, you can do the following.
     • If you get an error saying libmecab.so.2 cannot be found, you can do the following.
  5. The Docker image used at Pangyo Market

     $ docker run -d --name elasticsearch \
         -v /home/data-xxx:/data \
         -p 9200:9200 n42corp/elasticsearch-n42

    • Korean morphological analyzer + elasticsearch [3]
    • User dictionary and synonym list for Korean search [4]
    • Only a few entries, but exactly what Pangyo Market uses

    [3] https://github.com/nacyot/elasticsearch
    [4] https://github.com/n42corp/search-ko-dic
  6. Nginx (free)

     server {
       listen *:80;
       location / {
         auth_basic "ElasticSearch";
         auth_basic_user_file /etc/nginx/search.htpasswd;
         try_files @elasticsearch @elasticsearch;
       }
       location @elasticsearch {
         proxy_pass http://xxxx.com:9200;
       }
     }
  7. Dockerizing

     $ docker run -d -p 80:80 \
         --link elasticsearch \
         -e SEARCH_USER='xxx' \
         -e SEARCH_PASSWORD_ENCRYPTED='encrypted password' \
         n42corp/elasticsearch-proxy-nginx
     $ curl -XGET http://xxx:passwd@localhost
  8. docker-compose.yml

     elasticsearch:
       image: n42corp/elasticsearch
       volumes:
         - /home/data:/data
     nginx:
       image: n42corp/elasticsearch-proxy-nginx
       links:
         - elasticsearch
       ports:
         - "80:80"
       environment:
         - SEARCH_USER=username
         - SEARCH_PASSWORD_ENCRYPTED=$apr1$o2nD6/0t$U6DaCfEqLaIZptGKYw84Y.
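The `SEARCH_PASSWORD_ENCRYPTED` value above carries the `$apr1$` prefix of an Apache-style MD5 password hash, a format nginx accepts in an `auth_basic_user_file`. The slides do not say how the value was produced. As a sketch only, assuming `openssl` is installed and using a made-up password, one way to generate such a hash is:

```shell
# Hypothetical credentials, for illustration only.
SEARCH_USER='username'
SEARCH_PASSWORD='change-me'

# openssl's apr1 mode prints an Apache-style MD5 hash: $apr1$<salt>$<hash>.
# The salt is random, so the output differs on every run.
SEARCH_PASSWORD_ENCRYPTED=$(openssl passwd -apr1 "$SEARCH_PASSWORD")

# An htpasswd entry has the form user:hash.
HTPASSWD_LINE="${SEARCH_USER}:${SEARCH_PASSWORD_ENCRYPTED}"
printf '%s\n' "$HTPASSWD_LINE"
```

When passing the hash on the command line, wrap it in single quotes so the shell does not expand `$apr1`. Newer docker-compose releases interpolate `$` in the YAML file; there a literal dollar sign is written as `$$`.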
  9. $ docker-compose up -d
     $ curl -XGET http://username:[email protected]/
     {
       "status" : 200,
       "name" : "Fusion",
       "cluster_name" : "elasticsearch",
       "version" : {
         "number" : "1.7.0",
         "build_hash" : "929b9739cae115e73c346cb5f9a6f24ba735a743",
         "build_timestamp" : "2015-07-16T14:31:07Z",
         "build_snapshot" : false,
         "lucene_version" : "4.10.4"
       },
       "tagline" : "You Know, for Search"
     }
     $ docker-compose stop
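The `user:password@host` form in the curl URL is shorthand for HTTP basic auth: curl sends an `Authorization: Basic` header whose value is the Base64 encoding of `user:password`. A minimal sketch of what goes over the wire, using placeholder credentials rather than the real ones:

```shell
# Placeholder credentials, not the ones used in the talk.
USER_PASS='username:password'

# Basic auth is just Base64 of "user:password" (an encoding, not encryption).
TOKEN=$(printf '%s' "$USER_PASS" | base64)
AUTH_HEADER="Authorization: Basic ${TOKEN}"
echo "$AUTH_HEADER"

# Equivalent request with an explicit header (host is a placeholder):
#   curl -H "$AUTH_HEADER" http://localhost/
```

Because the credentials are only encoded, a proxy that listens on plain port 80 as in the slides exposes them to anyone who can observe the traffic.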
  10. ONBUILD

    n42corp/elasticsearch Dockerfile

      #... (omitted)
      # Install the user dictionary
      ONBUILD COPY servicecustom.csv /opt/mecab-ko-dic-2.0.1-20150707/user-dic/servicecustom.csv
      ONBUILD RUN cd /opt/mecab-ko-dic-2.0.1-20150707 &&\
        tools/add-userdic.sh &&\
        make install
      # Add synonyms
      ONBUILD COPY synonym.txt /elasticsearch/config/synonym.txt
      #... (omitted)
  11. Building an image per service

    After creating a Dockerfile, copy the user dictionary (servicecustom.csv) and the synonyms (synonym.txt) [4] into the same folder as the Dockerfile.

      FROM n42corp/elasticsearch

      $ docker build -t n42corp/elasticsearch-n42 .

    [4] https://github.com/n42corp/search-ko-dic
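The per-service build only works if `servicecustom.csv` and `synonym.txt` sit next to the child Dockerfile, because the `ONBUILD COPY` steps of the base image run during the child build. The sketch below prepares such a folder and checks it before building. The dictionary entry and the synonym line are made-up examples; the CSV column layout is assumed to follow the mecab-ko-dic user dictionary convention and the synonym line is assumed to be a Solr-style comma-separated list.

```shell
# Hypothetical build folder for one service.
BUILD_DIR=$(mktemp -d)

# Child Dockerfile: the base image's ONBUILD steps run during this build.
printf 'FROM n42corp/elasticsearch\n' > "$BUILD_DIR/Dockerfile"

# Example user dictionary entry (assumed mecab-ko-dic user-dic columns).
printf '%s\n' '마사지기,,,,NNP,*,F,마사지기,*,*,*,*' > "$BUILD_DIR/servicecustom.csv"

# Example synonym line (assumed Solr-style synonym format).
printf '%s\n' '마사지기, 안마기' > "$BUILD_DIR/synonym.txt"

# Fail early if a file that the ONBUILD COPY steps need is missing.
MISSING=""
for f in Dockerfile servicecustom.csv synonym.txt; do
  [ -f "$BUILD_DIR/$f" ] || MISSING="$MISSING $f"
done

if [ -n "$MISSING" ]; then
  echo "missing files:$MISSING" >&2
else
  echo "ready: docker build -t n42corp/elasticsearch-n42 $BUILD_DIR"
fi
```

The script only prints the build command, so it can be run on a machine without Docker to validate the folder first.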
  12. But what about with Docker?

    Different Docker hosts cannot find each other.
  13. Setting the IP directly

     $ docker run -d -p 9200:9200 -p 9300:9300 \
         -v /home/data-xxx:/data \
         n42corp/elasticsearch \
         --cluster.name=pangyo_market \
         --node.name=$(hostname) \
         --network.publish_host=$(hostname -i) \
         --discovery.zen.ping.multicast.enabled=false \
         --discovery.zen.ping.unicast.hosts=x.x.x.x:9300,y.y.y.y:9300
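With multicast disabled, every node needs the same comma-separated list of peers in `--discovery.zen.ping.unicast.hosts`. As a small sketch, the list can be assembled from a set of addresses instead of being typed by hand. The IP addresses below are hypothetical placeholders, not the hosts from the talk.

```shell
# Hypothetical node addresses; replace with the real host IPs.
NODE_IPS="10.0.0.11 10.0.0.12 10.0.0.13"
TRANSPORT_PORT=9300

# Join the addresses as ip:port,ip:port,... without a trailing comma.
UNICAST_HOSTS=""
for ip in $NODE_IPS; do
  UNICAST_HOSTS="${UNICAST_HOSTS:+$UNICAST_HOSTS,}${ip}:${TRANSPORT_PORT}"
done

echo "--discovery.zen.ping.unicast.hosts=${UNICAST_HOSTS}"
```

The printed option can be appended to the `docker run` command on each host.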
  14. If you use AWS [5]

     $ docker run -d --name elasticsearch \
         -v /home/data-xxx:/data \
         -e AWS_ACCESS_KEY_ID=xxxx \
         -e AWS_SECRET_KEY=yyyy \
         -p 9200:9200 \
         n42corp/elasticsearch \
         --cluster.name=pangyo_market \
         --node.name=$(hostname) \
         --discovery.type=ec2

    This did not work well for us.

    [5] https://github.com/elastic/elasticsearch-cloud-aws
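  15. Logging in to each server and running containers directly

    • You have to access every server one by one
    • Put run options such as environment variables and volumes into a script and back it up
    • Deployments cause service downtime, so haproxy, nginx, ELB, or similar needs to be set up
    • I think this is a realistic option at small scale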
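The second bullet, keeping run options in a script, might look like the sketch below. The file name `run-elasticsearch.sh` and the `DRY_RUN` switch are hypothetical additions for illustration; the image, volume, ports, and cluster name are the ones shown in the earlier slides.

```shell
#!/bin/sh
# Hypothetical run-elasticsearch.sh: one file holding the run options,
# so it can be versioned and backed up.
IMAGE="n42corp/elasticsearch-n42"
DATA_DIR="/home/data-xxx"
CLUSTER_NAME="pangyo_market"
NODE_NAME=$(hostname)

DOCKER_CMD="docker run -d --name elasticsearch \
-v ${DATA_DIR}:/data -p 9200:9200 -p 9300:9300 \
${IMAGE} --cluster.name=${CLUSTER_NAME} --node.name=${NODE_NAME}"

# DRY_RUN defaults to 1, so the script only prints the command.
# Set DRY_RUN=0 to actually start the container.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$DOCKER_CMD"
else
  # Unquoted on purpose: the values above contain no spaces.
  $DOCKER_CMD
fi
```

Printing by default makes it easy to review the exact command on a server before running it.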
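  16. AWS ECS

    • ECS itself has no separate usage fee
    • Zero-downtime deployment by integrating with ELB
    • You must use the OS built for ECS instances, so a single server cannot run the existing services and ECS at the same time, which adds cost

    + kubernetes, fleet

    • Too heavy when you only want to run about one server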
  17. docker-machine & docker-compose

    • Access remote servers from the development computer (docker-machine)
    • Run from a predefined yaml file (docker-compose)
    • docker-compose is not yet recommended for production
    • Still working out how to set things up for zero-downtime deployment
  18. Built with sinatra, then dockerized

     $ docker run -d -p 4567:4567 \
         n42corp/korean-morpheme-sinatra
     # Return only morphemes with posids 150 (common noun) and 151 (proper noun)
     $ curl -XGET 'http://192.168.59.103:4567/morpheme' \
         -d 'text=눈 마사지기 사요&posids=150,151'

    https://github.com/n42corp/dockerfiles/tree/master/korean-morpheme-sinatra
  19. {
       "morps": [
         {
           "surface": "눈",
           "posid": 150,
           "desc": "일반 명사",
           "feature": "NNG,*,T,눈,*,*,*,*"
         },
         {
           "surface": "마사지기",
           "posid": 151,
           "desc": "고유 명사",
           "feature": "NNP,*,F,마사지기,*,*,*,*"
         },
         {
           "surface": "사요",
           "posid": 150,
           "desc": "일반 명사",
           "feature": "NNG,*,F,사요,*,*,*,*"
         }
       ]
     }
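To use the response in a script, the `surface` fields can be pulled out with standard tools. The sketch below works on a shortened copy of the response above, stored in a variable instead of being fetched from the service. It relies on simple pattern matching, which is an assumption that holds for this flat structure but is not a general JSON parser.

```shell
# Shortened copy of the response shown above (no network call).
RESPONSE='{"morps":[{"surface":"눈","posid":150},{"surface":"마사지기","posid":151},{"surface":"사요","posid":150}]}'

# Keep each "surface": "..." pair, then strip everything but the value.
SURFACES=$(printf '%s' "$RESPONSE" \
  | grep -o '"surface": *"[^"]*"' \
  | sed 's/.*: *"\(.*\)"/\1/')

# One morpheme per line.
echo "$SURFACES"
```

For anything beyond a quick check, a real JSON tool is the safer choice.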
  20. Running on the RORLab server

     curl -XGET 'http://rorla2.rorlab.org:4567/morpheme' \
         -d 'text=눈 마사지기 사요&posids=150,151'

    The service may be shut down if it receives too many requests, so please use it for testing only.