Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Enabling Enterprise Search Platform with Elasti...
Search
Kosho Owa
June 27, 2016
Technology
2.4k
0
Share
Enabling Enterprise Search Platform with Elastic Stack
ElasticsearchとLogstashで始めるEnterprise Search Platform
Elasticsearch勉強会スライド
2016年6月27日
Kosho Owa
June 27, 2016
More Decks by Kosho Owa
See All by Kosho Owa
Introducing Machine Learning for the Elastic Stack
kosho
2
13k
Elastic Stack X-Pack 5.0 for IT Security Workshop
kosho
1
350
Elastic Stack X-Pack 5.0 for IT Ops Workshop
kosho
0
350
[Developers Summit 2017] Anomaly Detection with the Elastic Stack
kosho
1
740
Anomaly Detection with the Elastic Stack
kosho
1
1.8k
Getting Started with Elastic Cloud and Beats for Log Analytics
kosho
0
130
Elastic{ON} Seminar Tokyo 2016 Product Update
kosho
0
180
Introducing Elastic Cloud
kosho
0
84
Gearing Up for Elastic Stack, X-Pack 5.0 Releases
kosho
0
160
Other Decks in Technology
See All in Technology
AI駆動1on1〜AIに自分を育ててもらう〜
yoshiakiyasuda
0
120
60分で学ぶ最新Webフロントエンド
mizdra
PRO
35
18k
ハーネスエンジニアリングをやりすぎた話 ~そのハーネスは解体された~
gotalab555
4
1.7k
Sansan Engineering Unit 紹介資料
sansan33
PRO
1
4.3k
AI バイブコーティングでキーボード不要?!
samakada
0
540
研究開発部メンバーの働き⽅ / Sansan R&D Profile
sansan33
PRO
4
23k
ネットワーク運用を楽にするAWS DevOps Agent活用法!! / 20260421 Masaki Okuda
shift_evolve
PRO
2
210
自立を加速させる神器 - EMOasis #11
stanby_inc
0
140
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
3k
基盤を育てる 外部SaaS連携の運用
gamonges_dresscode
1
120
マルチエージェント × ハーネスエンジニアリング × GitLab Duo Agent Platformで実現する「AIエージェントに仕事をさせる時代へ。」 / 20260421 GitLab Duo Agent Platform
n11sh1
0
150
インターネットの技術 / Internet technology
ks91
PRO
0
200
Featured
See All Featured
Why Mistakes Are the Best Teachers: Turning Failure into a Pathway for Growth
auna
0
120
Testing 201, or: Great Expectations
jmmastey
46
8.1k
Winning Ecommerce Organic Search in an AI Era - #searchnstuff2025
aleyda
1
2k
Color Theory Basics | Prateek | Gurzu
gurzu
0
290
Become a Pro
speakerdeck
PRO
31
5.9k
Neural Spatial Audio Processing for Sound Field Analysis and Control
skoyamalab
0
260
Speed Design
sergeychernyshev
33
1.6k
Google's AI Overviews - The New Search
badams
0
970
Code Review Best Practice
trishagee
74
20k
Statistics for Hackers
jakevdp
799
230k
Odyssey Design
rkendrick25
PRO
2
570
BBQ
matthewcrist
89
10k
Transcript
‹#› Kosho Owa, Solutions Architect, Elastic Elasticsearch Meetup Tokyo, 2016-06-27
ElasticsearchͱLogstashͰ࢝ΊΔ Enterprise Search Platform
Elastic StackͷϢʔεέʔε 2 ϩά + ੳ ݕࡧ
ElasticsearchͱLogstashͰ࢝ΊΔEnterprise Search • ϢʔβSambaͰڞ༗͞ΕͨετϨʔδʹυΩϡϝϯτΛอଘ͢Δ • υΩϡϝϯτͷՃɺߋ৽ɺআλΠϜϦʔʹै͠ɺυΩϡϝϯτΛ ΠϯσοΫε͢Δ • ϢʔβΣϒͷΠϯλʔϑΣΠεΛ௨ͯ͡ݕࡧ͢Δ (ࠓճͷείʔϓ֎)
3
ϑϩʔ 4 ࡞ɺߋ৽ɺআ͞ ΕͨϑΝΠϧͷࢹ ϑΝΠϧͷύʔε υΩϡϝϯτͷ࡞ɺ ߋ৽ɺআ ݕࡧΠϯλʔϑΣΠ εͷఏڙ σʔλͷೖ
υΩϡϝϯτͷ࡞ɺߋ৽ɺআͷݕग़ 5 ΠϯλʔϑΣΠε ಛ inotify ଟ͘ͷσΟετϦϏϡʔγϣϯͰར༻Մೳ ࢹରσΟϨΫτϦશͯྻڍ͢Δඞཁ͕͋Δ fanotify มߋ͕ߦΘΕΔલʹڐՄɾෆڐՄΛܾΊΒΕΔ ରԠ͍ͯ͠ͳ͍σΟετϦϏϡʔγϣϯ͕ଟ͍
Linux Security Module ϑΝΠϧʹର͢Δଟ͘ͷΦϖϨʔγϣϯʹରԠ͍ͯ͠Δ ΧʔωϧϞδϡʔϧͱͯ͠࡞͢Δඞཁ͕͋Δ Samba VFS - full_audit ଟ͘ͷσΟετϦϏϡʔγϣϯͰར༻Մೳ ࠪϩάʹϑΝΠϧͷมߋ͕ग़ྗ͞ΕΔ
vfs_full_audit ग़ྗαϯϓϧ 6 Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|open|ok|w|sample.txt Jun
18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|fstat|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|kernel_flock|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|create_file|ok|0x12019f|file| open_if|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|stat|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|sys_acl_get_file|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|get_nt_acl|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_lock|ok|sample.txt:0-9:0 Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|pread|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_unlock|ok|sample.txt: 0-9:0 Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|fstat|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|ntimes|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_lock|ok|sample.txt:0-9:1 Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|fstat|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|pwrite|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_unlock|ok|sample.txt: 0-9:1 Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|kernel_flock|ok|sample.txt Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|close|ok|sample.txt
Sambaͷઃఆ • SambaͰڞ༗͍ͯ͠ΔྖҬͷvfs objectͱͯ͠full_auditΛࢦఆ • pwrite, rename, unlink ΠϕϯτͷΈࢹ͢Δ 7
# /etc/samba/smb.conf [public] comment = Public Stuff path = /home/samba … vfs objects = full_audit full_audit:success = pwrite rename unlink full_audit:failure = none
υΩϡϝϯτͷύʔε Mapper Attachments Plugin 8 $ cd elasticserach $ bin/plugin
install mapper-attachments • Elasticsearch Plugins and Integrations > Mapper Plugins > Mapper Attachments Plugin -https:// www.elastic.co/guide/en/elasticsearch/plugins/2.3/mapper-attachments.html • ElasticsearchͷΠϯετʔϧ • PPT, XLS, PDFͳͲͷҰൠతͳϑΥʔϚοτͷυΩϡϝϯτΛTikaΛ༻ ͯ͠ςΩετใΛൈ͖ग़͠ɺΠϯσοΫε͢Δ • Elasticsearch 5.0.0Ͱingest-attachmentʹஔ͖͑ݟࠐΈ
Mappingͷ࡞ 9 $ curl -d localhost:9200/docs -d ‘ { "mappings"
: { "_default_" : { "properties" : { "file" : { "type" : "attachment", "fields" : { "content" : { "type" : "string", "store" : true, "term_vector" : "with_positions_offsets", "analyzer" : "kuromoji" }, "author" : { "type" : "string", "store" : true, "analyzer" : "kuromoji" }, "title" : { "type" : "string", "store" : true, "analyzer" : "kuromoji" }, "name" : { "type" : "string", "store" : true, "analyzer" : "kuromoji" }, "date" : { "type" : "string", "store" : true }, "keywords" : { "type" : "string", "store" : true }, "content_type" : { "type" : "string", "store" : true }, "content_length" : { "type" : "string", "store" : true }, "language" : { "type" : "string", "store" : true } }}}}}}’
σʔλͷೖ • ϑΝΠϧͷύεͷSHA-1ΛElasticsearchͷυΩϡϝϯτͷ_idͱͯ͠࠾༻͢Δ • ϑΝΠϧͷ༰ΛBase64ͰΤϯίʔυͯ͠ɺ_contentϑΟʔϧυͷ༰ͱ͠ ͯPUT͢Δ • ϑΝΠϧ͕আ໊લ͕มߋ͞Εͨ߹ʹݹ͍υΩϡϝϯτΛDELETE͢Δ 10 $
curl localhost:9200/docs/doc/`echo sample.txt | sha1sum | cut -d’ ‘ -f1` -d ‘ { "_content": “ewogICJtYXBwaW5nc....gfQp9Cg==" }' $ curl -XDELETE localhost:9200/docs/doc/`echo sample.txt | md5sum | cut -d’ ‘ -f1`
ΠϕϯτͷύΠϓϥΠϯॲཧ Logstash 11 input filter output stdin file syslog jdbc
kafka s3 … grok geoip anonymize date mutate ruby csv … elasticsearch file csv http kafka stdout syslog …
Logstash - input {} ࠪϩάͷมߋΛࢹ͢Δ 12 input { file {
path => "/var/log/messages" } }
Logstash - filter {} 1/2 full_audit ϩάͷύʔεͱɺ_idͷܭࢉ 13 filter {
grok { match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{DATA:process}: % {USER:user}\|%{IP:clientip}\|%{DATA:operation}\|%{DATA:result}\|%{GREEDYDATA:file}" } add_field => { "file_hash" => "%{file}" } } anonymize { key => "something_secret" algorithm => "SHA1" fields => ["file_hash"] } …
Logstash - filter {} 2/2 ϑΝΠϧ໊มߋ࣌ͷ_idͷܭࢉ 14 filter { …
if "rename" in [operation] { grok { match => { "file" => "%{DATA:file_prev}\|%{GREEDYDATA:file_aft}" } add_field => { "file_prev_hash" => "%{file_prev}" "file_aft_hash" => "%{file_aft}" } } anonymize { key => "something_secret" algorithm => "SHA1" fields => ["file_prev_hash", "file_aft_hash"] } }
Logstash - output {} 1/3 ϑΝΠϧͷ࡞ɺߋ৽ 15 output { if
"pwrite" in [operation] { exec { command => "temp=$(mktemp) ; echo \{ \"file\": \{ \"_content\": \”$(base64 /home/samba/ {file})\”, \"_name\": \"%{file}\"\}\} > $temp ; curl -XPUT localhost:9200/docs/doc/%{file_hash} - d@$temp; rm $temp" } } … }
Logstash - output {} 2/3 ϑΝΠϧ໊ͷมߋ 16 output { …
if "rename" in [operation] { exec { command => "curl -XDELETE localhost:9200/docs/doc/%{file_prev_hash}" } exec { command => "temp=$(mktemp) ; echo \{ \"file\": \{ \"_content\": \"$(base64 /home/samba/% {file_aft})\”, \"_name\": \"%{file_aft}\"\}\} > $temp ; curl -XPUT localhost:9200/docs/doc/% {file_aft_hash} -d@$temp; rm $temp" } } … }
Logstash - output {} 2/3 ϑΝΠϧͷআ 17 output { …
if "unlink" in [operation] { exec { command => "curl -XDELETE localhost:9200/docs/doc/%{file_hash}" } } }
Search 18 $ curl “localhost:9200/docs/_search?q=*&fields=file.title” { "took": 17, "timed_out": false,
"_shards": { "total": 1, "successful": 1, "failed": 0 }, "hits": { "total": 13, "max_score": 1, "hits": [ { "_index": "docs", "_type": "doc", "_id": "dfb573e25b4b4b612b6694edc63b9ad17450daf0", "_score": 1, "fields": { "file.title": [ “PDFαϯϓϧϑΝΠϧ" ]
Future Work 19 ݕࡧUI ൚༻ੑ ෆ߹ͷղফ ηΩϡϦςΟ
Thank you! ϒϩάͷຊޠίϯςϯπੋඇ͝ཡ͍ͩ͘͞ 20