Kosho Owa
June 27, 2016
Enabling Enterprise Search Platform with Elastic Stack
Getting Started with an Enterprise Search Platform Using Elasticsearch and Logstash
Slides from the Elasticsearch study meetup (Elasticsearch勉強会)
June 27, 2016
Transcript
Kosho Owa, Solutions Architect, Elastic
Elasticsearch Meetup Tokyo, 2016-06-27
Getting Started with an Enterprise Search Platform Using Elasticsearch and Logstash
Elastic Stack use cases
• Logs + analytics
• Search
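Getting started with Enterprise Search using Elasticsearch and Logstash
• Users save documents to storage shared via Samba
• Document additions, updates, and deletions are followed in a timely manner, and the documents are indexed
• Users search through a web interface (out of scope for this talk)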
Flow
• Monitor files that are created, updated, or deleted
• Parse the files
• Create, update, or delete documents
• Provide the search interface
• Ingest the data
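Detecting document creation, updates, and deletions
Interface | Characteristics
inotify | Available on many distributions. Every directory to be watched must be enumerated.
fanotify | Can allow or deny a change before it is made. Many distributions do not support it.
Linux Security Module | Supports many file operations. Must be written as a kernel module.
Samba VFS - full_audit | Available on many distributions. File changes are written to the audit log.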
vfs_full_audit sample output
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|open|ok|w|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|fstat|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|kernel_flock|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|create_file|ok|0x12019f|file|open_if|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|stat|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|sys_acl_get_file|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|get_nt_acl|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_lock|ok|sample.txt:0-9:0
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|pread|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_unlock|ok|sample.txt:0-9:0
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|fstat|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|ntimes|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_lock|ok|sample.txt:0-9:1
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|fstat|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|pwrite|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|strict_unlock|ok|sample.txt:0-9:1
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|kernel_flock|ok|sample.txt
Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|close|ok|sample.txt
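To make the structure of these audit lines concrete, here is a minimal Python sketch that splits one line into a syslog prefix and pipe-delimited fields. The function name `parse_audit_line`, the regular expression, and the dictionary keys are assumptions chosen for illustration; the talk itself does this parsing with a Logstash grok filter.

```python
import re

# Syslog prefix as seen in the sample lines: "Jun 18 05:50:08 host process: payload".
# The pattern and field names are illustrative, not taken from the slides.
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) (?P<process>[^:]+): (?P<payload>.*)$"
)

def parse_audit_line(line):
    """Split one smbd_audit syslog line into fields, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    parts = m.group("payload").split("|")
    if len(parts) < 4:
        return None
    user, clientip, operation, result = parts[:4]
    return {
        "timestamp": m.group("timestamp"),
        "hostname": m.group("hostname"),
        "process": m.group("process"),
        "user": user,
        "clientip": clientip,
        "operation": operation,
        "result": result,
        "args": parts[4:],  # remaining fields; the file name is the last one in the samples
    }

sample = "Jun 18 05:50:08 ip-172-30-2-93 smbd_audit: nobody|172.30.2.196|pwrite|ok|sample.txt"
event = parse_audit_line(sample)
print(event["operation"], event["args"])
```

Note that some operations (for example `open` and `create_file`) carry extra fields before the file name, so a consumer that only needs the path would take the last element of `args`.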
Samba configuration
• Specify full_audit as the vfs object for the area shared by Samba
• Audit only the pwrite, rename, and unlink events
# /etc/samba/smb.conf
[public]
  comment = Public Stuff
  path = /home/samba
  …
  vfs objects = full_audit
  full_audit:success = pwrite rename unlink
  full_audit:failure = none
Parsing documents: Mapper Attachments Plugin
$ cd elasticsearch
$ bin/plugin install mapper-attachments
• Elasticsearch Plugins and Integrations > Mapper Plugins > Mapper Attachments Plugin - https://www.elastic.co/guide/en/elasticsearch/plugins/2.3/mapper-attachments.html
• Installed into Elasticsearch
• Uses Tika to extract text from documents in common formats such as PPT, XLS, and PDF, and indexes it
• Expected to be replaced by ingest-attachment in Elasticsearch 5.0.0
Creating the mapping
$ curl -XPUT localhost:9200/docs -d '
{
  "mappings" : {
    "_default_" : {
      "properties" : {
        "file" : {
          "type" : "attachment",
          "fields" : {
            "content" : { "type" : "string", "store" : true, "term_vector" : "with_positions_offsets", "analyzer" : "kuromoji" },
            "author" : { "type" : "string", "store" : true, "analyzer" : "kuromoji" },
            "title" : { "type" : "string", "store" : true, "analyzer" : "kuromoji" },
            "name" : { "type" : "string", "store" : true, "analyzer" : "kuromoji" },
            "date" : { "type" : "string", "store" : true },
            "keywords" : { "type" : "string", "store" : true },
            "content_type" : { "type" : "string", "store" : true },
            "content_length" : { "type" : "string", "store" : true },
            "language" : { "type" : "string", "store" : true }
          }
        }
      }
    }
  }
}'
Ingesting data
• Use the SHA-1 of the file path as the _id of the Elasticsearch document
• Base64-encode the file contents and PUT them as the value of the _content field
• DELETE the old document when a file is deleted or renamed
$ curl -XPUT localhost:9200/docs/doc/`echo sample.txt | sha1sum | cut -d' ' -f1` -d '
{
  "_content": "ewogICJtYXBwaW5nc....gfQp9Cg=="
}'
$ curl -XDELETE localhost:9200/docs/doc/`echo sample.txt | sha1sum | cut -d' ' -f1`
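The two steps above (hashing the path for the `_id`, Base64-encoding the contents for `_content`) can be sketched in Python as follows. The helper names `doc_id` and `doc_body` are hypothetical. One detail worth noticing: `echo sample.txt | sha1sum` hashes the path plus a trailing newline, so the sketch reproduces that by default.

```python
import base64
import hashlib
import json

def doc_id(path, trailing_newline=True):
    """SHA-1 hex digest of a file path.

    trailing_newline=True mimics `echo path | sha1sum`, where echo appends "\n".
    """
    data = path + ("\n" if trailing_newline else "")
    return hashlib.sha1(data.encode("utf-8")).hexdigest()

def doc_body(content):
    """JSON request body carrying the Base64-encoded file contents in `_content`."""
    return json.dumps({"_content": base64.b64encode(content).decode("ascii")})

print(doc_id("sample.txt"))
print(doc_body(b"hello"))  # {"_content": "aGVsbG8="}
```

Whichever convention is chosen for the newline, the indexing and deleting sides have to agree on it, otherwise a DELETE will target an `_id` that was never written.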
Event pipeline processing: Logstash
input: stdin, file, syslog, jdbc, kafka, s3, …
filter: grok, geoip, anonymize, date, mutate, ruby, csv, …
output: elasticsearch, file, csv, http, kafka, stdout, syslog, …
Logstash - input {}
Watch the audit log for changes
input {
  file {
    path => "/var/log/messages"
  }
}
Logstash - filter {} 1/2
Parse the full_audit log and compute the _id
filter {
  # Split the syslog line into fields and copy the file path into file_hash
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{DATA:process}: %{USER:user}\|%{IP:clientip}\|%{DATA:operation}\|%{DATA:result}\|%{GREEDYDATA:file}" }
    add_field => { "file_hash" => "%{file}" }
  }
  # Replace file_hash with its SHA1 hash, which becomes the document _id
  anonymize {
    key => "something_secret"
    algorithm => "SHA1"
    fields => ["file_hash"]
  }
  …
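Because the anonymize filter takes a `key`, the resulting hash is presumably a keyed digest rather than a plain SHA-1. The Python sketch below assumes the filter computes an HMAC-SHA1 hex digest of the field value with that key; this is an assumption about the plugin's behaviour, not something stated on the slides, and `keyed_sha1` is a hypothetical helper name.

```python
import hashlib
import hmac

def keyed_sha1(value, key):
    """HMAC-SHA1 hex digest of `value` under `key` (assumed to match the filter's output)."""
    return hmac.new(key.encode("utf-8"), value.encode("utf-8"), hashlib.sha1).hexdigest()

path = "sample.txt"
keyed = keyed_sha1(path, "something_secret")
plain = hashlib.sha1(path.encode("utf-8")).hexdigest()

# A keyed digest differs from the unkeyed one, so ids computed by hand with
# sha1sum would not line up with ids produced inside the pipeline.
print(keyed != plain)
```

If that assumption holds, every component that needs to address a document by `_id` should compute it the same way, with the same key.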
Logstash - filter {} 2/2
Compute the _id values when a file is renamed
filter {
  …
  if "rename" in [operation] {
    # For rename events the file field holds "old|new"; split it and hash both names
    grok {
      match => { "file" => "%{DATA:file_prev}\|%{GREEDYDATA:file_aft}" }
      add_field => {
        "file_prev_hash" => "%{file_prev}"
        "file_aft_hash" => "%{file_aft}"
      }
    }
    anonymize {
      key => "something_secret"
      algorithm => "SHA1"
      fields => ["file_prev_hash", "file_aft_hash"]
    }
  }
}
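As a small illustration of the rename handling, the grok pattern `%{DATA:file_prev}\|%{GREEDYDATA:file_aft}` splits the `file` field at the first pipe character. A rough Python equivalent, with the hypothetical helper name `split_rename`, might look like this:

```python
def split_rename(file_field):
    """Split "old|new" at the first pipe; return None when there is no pipe."""
    prev, sep, aft = file_field.partition("|")
    if not sep:
        return None
    return prev, aft

print(split_rename("old.txt|new.txt"))  # ('old.txt', 'new.txt')
```

A file name that itself contains a pipe character would be split in the wrong place, which is a limitation of the pipe-delimited audit format rather than of either implementation.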
Logstash - output {} 1/3
File creation and update
output {
  if "pwrite" in [operation] {
    # Write the Base64-encoded file into a temporary JSON body, PUT it, then clean up
    exec {
      command => "temp=$(mktemp) ; echo \{ \"file\": \{ \"_content\": \"$(base64 /home/samba/%{file})\", \"_name\": \"%{file}\"\}\} > $temp ; curl -XPUT localhost:9200/docs/doc/%{file_hash} -d@$temp; rm $temp"
    }
  }
  …
}
Logstash - output {} 2/3
File rename
output {
  …
  if "rename" in [operation] {
    # Remove the document stored under the old name
    exec {
      command => "curl -XDELETE localhost:9200/docs/doc/%{file_prev_hash}"
    }
    # Index the file again under the new name
    exec {
      command => "temp=$(mktemp) ; echo \{ \"file\": \{ \"_content\": \"$(base64 /home/samba/%{file_aft})\", \"_name\": \"%{file_aft}\"\}\} > $temp ; curl -XPUT localhost:9200/docs/doc/%{file_aft_hash} -d@$temp; rm $temp"
    }
  }
  …
}
Logstash - output {} 3/3
File deletion
output {
  …
  if "unlink" in [operation] {
    exec {
      command => "curl -XDELETE localhost:9200/docs/doc/%{file_hash}"
    }
  }
}
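Taken together, the three output slides map each audited operation to one or two HTTP requests. The Python sketch below summarizes that mapping as a pure function, without sending anything. `plan_requests` and `BASE_URL` are hypothetical names, and the sketch compares the operation with `==`, whereas the Logstash conditionals use `in`.

```python
# Index and type names are taken from the slides; the helper itself is illustrative.
BASE_URL = "http://localhost:9200/docs/doc"

def plan_requests(event):
    """Return the (method, url) pairs that one parsed audit event would trigger."""
    op = event["operation"]
    if op == "pwrite":
        # Created or updated file: (re)index it under the hash of its path
        return [("PUT", f"{BASE_URL}/{event['file_hash']}")]
    if op == "rename":
        # Renamed file: drop the old document, then index under the new hash
        return [
            ("DELETE", f"{BASE_URL}/{event['file_prev_hash']}"),
            ("PUT", f"{BASE_URL}/{event['file_aft_hash']}"),
        ]
    if op == "unlink":
        # Deleted file: remove its document
        return [("DELETE", f"{BASE_URL}/{event['file_hash']}")]
    return []

for method, url in plan_requests(
    {"operation": "rename", "file_prev_hash": "aaa", "file_aft_hash": "bbb"}
):
    print(method, url)
```

Keeping the decision logic separate from the request execution makes it straightforward to test the mapping without a running Elasticsearch node.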
Search
$ curl "localhost:9200/docs/_search?q=*&fields=file.title"
{
  "took": 17,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 13,
    "max_score": 1,
    "hits": [
      {
        "_index": "docs",
        "_type": "doc",
        "_id": "dfb573e25b4b4b612b6694edc63b9ad17450daf0",
        "_score": 1,
        "fields": {
          "file.title": [ "PDFサンプルファイル" ]
…
Future Work
• Search UI
• Generality
• Fixing known issues
• Security
Thank you!
Please also take a look at the Japanese-language content on our blog.