Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Enabling Enterprise Search Platform with Elastic Stack

Enabling Enterprise Search Platform with Elastic Stack

ElasticsearchとLogstashで始めるEnterprise Search Platform
Elasticsearch勉強会スライド
2016年6月27日

Kosho Owa

June 27, 2016
Tweet

More Decks by Kosho Owa

Other Decks in Technology

Transcript

  1. ‹#› Kosho Owa, Solutions Architect, Elastic Elasticsearch Meetup Tokyo, 2016-06-27

    ElasticsearchͱLogstashͰ࢝ΊΔ Enterprise Search Platform
  2. υΩϡϝϯτͷ࡞੒ɺߋ৽ɺ࡟আͷݕग़ 5 ΠϯλʔϑΣΠε ಛ௃ inotify ଟ͘ͷσΟετϦϏϡʔγϣϯͰར༻Մೳ ؂ࢹର৅σΟϨΫτϦ͸શͯྻڍ͢Δඞཁ͕͋Δ fanotify มߋ͕ߦΘΕΔલʹڐՄɾෆڐՄΛܾΊΒΕΔ ରԠ͍ͯ͠ͳ͍σΟετϦϏϡʔγϣϯ͕ଟ͍

    Linux Security Module ϑΝΠϧʹର͢Δଟ͘ͷΦϖϨʔγϣϯʹରԠ͍ͯ͠Δ ΧʔωϧϞδϡʔϧͱͯ͠࡞੒͢Δඞཁ͕͋Δ Samba VFS - full_audit ଟ͘ͷσΟετϦϏϡʔγϣϯͰར༻Մೳ ؂ࠪϩάʹϑΝΠϧͷมߋ͕ग़ྗ͞ΕΔ
  3. vfs_full_audit ग़ྗαϯϓϧ 6 Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|open|ok|w|sample.txt Jun

    18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|fstat|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|kernel_flock|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|create_file|ok|0x12019f|file| open_if|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|stat|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|sys_acl_get_file|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|get_nt_acl|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_lock|ok|sample.txt:0-9:0 Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|pread|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_unlock|ok|sample.txt: 0-9:0 Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|fstat|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|ntimes|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_lock|ok|sample.txt:0-9:1 Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|fstat|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|pwrite|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_unlock|ok|sample.txt: 0-9:1 Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|kernel_flock|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|close|ok|sample.txt
  4. Sambaͷઃఆ • SambaͰڞ༗͍ͯ͠ΔྖҬͷvfs objectͱͯ͠full_auditΛࢦఆ • pwrite, rename, unlink ΠϕϯτͷΈ؂ࢹ͢Δ 7

    # /etc/samba/smb.conf [public] comment = Public Stuff path = /home/samba … vfs objects = full_audit full_audit:success = pwrite rename unlink full_audit:failure = none
  5. υΩϡϝϯτͷύʔε Mapper Attachments Plugin 8 $ cd elasticserach $ bin/plugin

    install mapper-attachments • Elasticsearch Plugins and Integrations > Mapper Plugins > Mapper Attachments Plugin -https:// www.elastic.co/guide/en/elasticsearch/plugins/2.3/mapper-attachments.html • Elasticsearch΁ͷΠϯετʔϧ • PPT, XLS, PDFͳͲͷҰൠతͳϑΥʔϚοτͷυΩϡϝϯτΛTikaΛ࢖༻ ͯ͠ςΩετ৘ใΛൈ͖ग़͠ɺΠϯσοΫε͢Δ • Elasticsearch 5.0.0Ͱ͸ingest-attachmentʹஔ͖׵͑ݟࠐΈ
  6. Mappingͷ࡞੒ 9 $ curl -d localhost:9200/docs -d ‘ { "mappings"

    : { "_default_" : { "properties" : { "file" : { "type" : "attachment", "fields" : { "content" : { "type" : "string", "store" : true, "term_vector" : "with_positions_offsets", "analyzer" : "kuromoji" }, "author" : { "type" : "string", "store" : true, "analyzer" : "kuromoji" }, "title" : { "type" : "string", "store" : true, "analyzer" : "kuromoji" }, "name" : { "type" : "string", "store" : true, "analyzer" : "kuromoji" }, "date" : { "type" : "string", "store" : true }, "keywords" : { "type" : "string", "store" : true }, "content_type" : { "type" : "string", "store" : true }, "content_length" : { "type" : "string", "store" : true }, "language" : { "type" : "string", "store" : true } }}}}}}’
  7. σʔλͷ౤ೖ • ϑΝΠϧͷύεͷSHA-1ΛElasticsearchͷυΩϡϝϯτͷ_idͱͯ͠࠾༻͢Δ • ϑΝΠϧͷ಺༰ΛBase64ͰΤϯίʔυͯ͠ɺ_contentϑΟʔϧυͷ಺༰ͱ͠ ͯPUT͢Δ • ϑΝΠϧ͕࡟আ΍໊લ͕มߋ͞Εͨ৔߹ʹ͸ݹ͍υΩϡϝϯτΛDELETE͢Δ 10 $

    curl localhost:9200/docs/doc/`echo sample.txt | sha1sum | cut -d’ ‘ -f1` -d ‘ { "_content": “ewogICJtYXBwaW5nc....gfQp9Cg==" }' $ curl -XDELETE localhost:9200/docs/doc/`echo sample.txt | md5sum | cut -d’ ‘ -f1`
  8. ΠϕϯτͷύΠϓϥΠϯॲཧ Logstash 11 input filter output stdin file syslog jdbc

    kafka s3 … grok geoip anonymize date mutate ruby csv … elasticsearch file csv http kafka stdout syslog …
  9. Logstash - filter {} 1/2 full_audit ϩάͷύʔεͱɺ_idͷܭࢉ 13 filter {

    grok { match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{DATA:process}: % {USER:user}\|%{IP:clientip}\|%{DATA:operation}\|%{DATA:result}\|%{GREEDYDATA:file}" } add_field => { "file_hash" => "%{file}" } } anonymize { key => "something_secret" algorithm => "SHA1" fields => ["file_hash"] } …
  10. Logstash - filter {} 2/2 ϑΝΠϧ໊มߋ࣌ͷ_idͷܭࢉ 14 filter { …

    if "rename" in [operation] { grok { match => { "file" => "%{DATA:file_prev}\|%{GREEDYDATA:file_aft}" } add_field => { "file_prev_hash" => "%{file_prev}" "file_aft_hash" => "%{file_aft}" } } anonymize { key => "something_secret" algorithm => "SHA1" fields => ["file_prev_hash", "file_aft_hash"] } }
  11. Logstash - output {} 1/3 ϑΝΠϧͷ࡞੒ɺߋ৽ 15 output { if

    "pwrite" in [operation] { exec { command => "temp=$(mktemp) ; echo \{ \"file\": \{ \"_content\": \”$(base64 /home/samba/ {file})\”, \"_name\": \"%{file}\"\}\} > $temp ; curl -XPUT localhost:9200/docs/doc/%{file_hash} - d@$temp; rm $temp" } } … }
  12. Logstash - output {} 2/3 ϑΝΠϧ໊ͷมߋ 16 output { …

    if "rename" in [operation] { exec { command => "curl -XDELETE localhost:9200/docs/doc/%{file_prev_hash}" } exec { command => "temp=$(mktemp) ; echo \{ \"file\": \{ \"_content\": \"$(base64 /home/samba/% {file_aft})\”, \"_name\": \"%{file_aft}\"\}\} > $temp ; curl -XPUT localhost:9200/docs/doc/% {file_aft_hash} -d@$temp; rm $temp" } } … }
  13. Logstash - output {} 2/3 ϑΝΠϧͷ࡟আ 17 output { …

    if "unlink" in [operation] { exec { command => "curl -XDELETE localhost:9200/docs/doc/%{file_hash}" } } }
  14. Search 18 $ curl “localhost:9200/docs/_search?q=*&fields=file.title” { "took": 17, "timed_out": false,

    "_shards": { "total": 1, "successful": 1, "failed": 0 }, "hits": { "total": 13, "max_score": 1, "hits": [ { "_index": "docs", "_type": "doc", "_id": "dfb573e25b4b4b612b6694edc63b9ad17450daf0", "_score": 1, "fields": { "file.title": [ “PDFαϯϓϧϑΝΠϧ" ]