Slide 1
Slide 1 text
Searchnos & Search on Nostr
@darashi npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c
2023-11-03, Nostrasia day 3
Slide 2
Slide 2 text
Let me introduce myself,
Slide 3
Slide 3 text
through my work...
Slide 4
Slide 4 text
Articles:
Slide 5
Slide 5 text
I wrote "NIP-01 を読む (Reading NIP-01)" and a full Japanese translation of NIP-01
Slide 6
Slide 6 text
for ...
Slide 8
Slide 8 text
"Hello, Nostr!" Fanzine https://nip-book.nostr-jp.org/
Slide 9
Slide 9 text
And the second issue is in press now,
Slide 11
Slide 11 text
"Hello, Nostr! Yo Bluesky!" Fanzine will be available on 12th November 2023, at Techbookfest 15, Ikebukuro https://nip-book.nostr-jp.org/
Slide 12
Slide 12 text
I wrote a short summary of the NIP-01 updates since the first issue, and
Slide 13
Slide 13 text
the Japanese translation of the latest version of NIP-01.
Slide 14
Slide 14 text
I also wrote "作ってわかる Nostr プロトコル (Understanding the Nostr Protocol by Writing Code)" for a series on Nostr,
Slide 16
Slide 16 text
in the November 2023 issue of Software Design.
Slide 17
Slide 17 text
My article is the 5th in the series,
Slide 18
Slide 18 text
which is written by members of the Japanese Nostr community.
Slide 20
Slide 20 text
You can buy Software Design magazine at bookstores in Japan.
Slide 21
Slide 21 text
Good as a souvenir :)
Slide 22
Slide 22 text
Software:
Slide 23
Slide 23 text
murasaki: Nostr to Speech. A client that reads notes aloud with text-to-speech.
Slide 24
Slide 24 text
Mapnos: Map Notes and Other Stuff. Shows geotagged kind 1 notes on a map. https://mapnos.vercel.app/
Slide 26
Slide 26 text
nostrbuzzs: Buzzphrase detector for Nostr. Detects trending phrases in real time. https://nostrbuzzs.deno.dev/
Slide 28
Slide 28 text
This is a kind of "algo" you might not like.
Slide 29
Slide 29 text
The point is that anyone can implement their own algo with Nostr.
Slide 30
Slide 30 text
I just think it's fun to see what's going on in Nostr,
Slide 31
Slide 31 text
at least in this early stage of Nostr.
Slide 32
Slide 32 text
nos.today: Web client for NIP-50 search
Slide 33
Slide 33 text
Searchnos: NIP-50 relay
Slide 34
Slide 34 text
Today, I'm going to talk about "Searchnos".
Slide 35
Slide 35 text
Searchnos is a NIP-50 relay with Elasticsearch as its backend.
Slide 36
Slide 36 text
It's open source and available on GitHub. https://github.com/darashi/searchnos
Slide 37
Slide 37 text
Motivation:
Slide 38
Slide 38 text
I want to search on Nostr in Japanese. (for nostrbuzzs)
Slide 39
Slide 39 text
As far as I know, at the time I started developing Searchnos,
Slide 40
Slide 40 text
relay.nostr.band was the only public relay that supported NIP-50.
Slide 41
Slide 41 text
relay.nostr.band works very well in many cases,
Slide 42
Slide 42 text
but I noticed that it sometimes returns unexpected results for queries in Japanese.
Slide 43
Slide 43 text
I'm not sure, but it may be due to tokenization.
Slide 44
Slide 44 text
A typical full-text search approach is to tokenize the text into words:
Slide 45
Slide 45 text
"One beer, please." -> ["one", "beer", "please"]
Slide 46
Slide 46 text
The query "please beer" (AND search) should match the text "One beer, please."
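A minimal Python sketch of this word-based approach: an inverted index plus AND search by intersecting posting sets. The tokenizer is a naive assumption (lowercase, keep ASCII letter and digit runs); real analyzers such as Elasticsearch's do much more. The function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive English tokenizer: lowercase, then keep runs of ASCII letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: token -> ids of the documents containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def and_search(index: dict[str, set[str]], query: str) -> set[str]:
    # AND search: intersect the posting sets of every query token.
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {"n1": "One beer, please.", "n2": "One coffee, please."}
index = build_index(docs)
print(and_search(index, "please beer"))  # {'n1'}
```

Note that this tokenizer finds no words at all in ビールを一杯ください。, which is exactly where Japanese needs something else.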
Slide 47
Slide 47 text
In Japanese, words are not separated by spaces:
Slide 48
Slide 48 text
ビールを一杯ください。 bii-ru o ippai kudasai (One beer, please.)
Slide 49
Slide 49 text
Using a technique called morphological analysis, it is possible to break such text into words.
Slide 50
Slide 50 text
ビールを一杯ください。 -> ["ビール", "を", "一杯", "ください"]
Slide 51
Slide 51 text
This analysis depends on dictionaries,
Slide 52
Slide 52 text
and it's not always correct.
Slide 53
Slide 53 text
It is especially weak at handling new words.
Slide 54
Slide 54 text
Another approach is to use N-gram indexing.
Slide 55
Slide 55 text
ビールを一杯ください。 -> ["ビー", "ール", "ルを", "を一", "一杯", "杯く", "くだ", "ださ", "さい", "い。"] (bi-gram)
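Bi-gram tokenization itself takes only a few lines. A Python sketch (the function name is hypothetical); it emits every overlapping pair of characters, including the one that spans the final punctuation:

```python
def bigrams(text: str) -> list[str]:
    # Overlapping 2-character tokens; a text shorter than 2 characters yields none.
    return [text[i:i + 2] for i in range(len(text) - 1)]


print(bigrams("ビールを一杯ください。"))
# ['ビー', 'ール', 'ルを', 'を一', '一杯', '杯く', 'くだ', 'ださ', 'さい', 'い。']
```

An 11-character sentence gives 10 bi-grams, and no dictionary is involved, so new words are not a problem.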
Slide 56
Slide 56 text
If we query "ルをビー" (which doesn't make sense), it will be tokenized as ["ルを", "ビー"],
Slide 57
Slide 57 text
If we treat these tokens in the same way as English words,
Slide 58
Slide 58 text
it can result in false positives, because
Slide 59
Slide 59 text
["ビー", "ール", "ルを", "を一", "一杯", "杯く", "くだ", "ださ", "さい", "い。"] ⊇ ["ルを", "ビー"]
Slide 60
Slide 60 text
We need to use N-gram indexing and also consider the positions of the tokens.
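A Python sketch contrasting the two checks for a single document and character bi-grams. `bag_match` only asks whether every query token occurs somewhere, which reproduces the false positive; `phrase_match` also requires each token at the right offset from the first one. The names and the `step` parameter are hypothetical; `step=2` cuts the query into non-overlapping pairs like ["ルを", "ビー"].

```python
def ngrams_with_offsets(text: str, n: int = 2, step: int = 1) -> list[tuple[str, int]]:
    # (token, character offset) pairs. step=1 gives overlapping n-grams; step=2 gives
    # non-overlapping bi-grams, which can leave the last character of an odd-length
    # text uncovered.
    return [(text[i:i + n], i) for i in range(0, len(text) - n + 1, step)]


def positional_index(text: str) -> dict[str, set[int]]:
    # Bi-gram -> positions; for character n-grams, position == character offset.
    index: dict[str, set[int]] = {}
    for token, offset in ngrams_with_offsets(text):
        index.setdefault(token, set()).add(offset)
    return index


def bag_match(doc: str, query_tokens: list[tuple[str, int]]) -> bool:
    # Position-blind: every query token occurs somewhere in the document.
    index = positional_index(doc)
    return all(token in index for token, _ in query_tokens)


def phrase_match(doc: str, query_tokens: list[tuple[str, int]]) -> bool:
    # Position-aware: some start s puts every token at s + (its offset in the query).
    if not query_tokens:
        return False
    index = positional_index(doc)
    first, first_offset = query_tokens[0]
    for pos in index.get(first, set()):
        start = pos - first_offset
        if all(start + offset in index.get(token, set()) for token, offset in query_tokens):
            return True
    return False


doc = "ビールを一杯ください。"
query = ngrams_with_offsets("ルをビー", step=2)  # [("ルを", 0), ("ビー", 2)]
print(bag_match(doc, query), phrase_match(doc, query))  # True False
```

"ルを" is at position 2 in the document, so "ビー" would have to be at position 4, but it only occurs at position 0; the positional check rejects the match that the bag check accepts.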
Slide 61
Slide 61 text
I won't go into further detail today...
Slide 62
Slide 62 text
Anyway, some effort is needed.
Slide 63
Slide 63 text
In order to tackle problems specific to Japanese,
Slide 64
Slide 64 text
it seemed like a good idea to have my own relay implementation.
Slide 65
Slide 65 text
So I made Searchnos.
Slide 66
Slide 66 text
Architecture:
Slide 68
Slide 68 text
To simplify the implementation,
Slide 69
Slide 69 text
Searchnos continuously polls Elasticsearch after EOSE.
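A simplified Python model of this polling loop. Everything here is hypothetical: `FakeIndex` stands in for Elasticsearch, and the cursor is the insertion position of the last event seen. Against a real Elasticsearch index some equivalent marker would be needed (for example a timestamp written at indexing time), because `created_at` is chosen by the event author and can be arbitrarily old or new. EVENT messages carry only the event id here, instead of the full event object.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    id: str
    content: str


@dataclass
class FakeIndex:
    """Hypothetical stand-in for Elasticsearch: an append-only list of events."""
    events: list = field(default_factory=list)

    def index(self, event: Event) -> None:
        self.events.append(event)

    def search(self, terms: list[str], after: int = 0) -> tuple[list[Event], int]:
        # AND-search events indexed at position >= after; also return the new cursor.
        hits = [e for e in self.events[after:]
                if all(t in e.content.lower() for t in terms)]
        return hits, len(self.events)


def serve_req(index, sub_id, terms, send, polls=3, interval=1.0, before_poll=None):
    # Initial query over stored events, then EOSE.
    hits, cursor = index.search(terms)
    for e in hits:
        send(["EVENT", sub_id, e.id])
    send(["EOSE", sub_id])
    # After EOSE: wait, re-query only events indexed since the last poll, send matches.
    # A real relay would keep looping until CLOSE instead of a fixed number of polls.
    for i in range(polls):
        if before_poll:
            before_poll(i)  # demo hook: simulates the indexer adding events
        time.sleep(interval)
        hits, cursor = index.search(terms, after=cursor)
        for e in hits:
            send(["EVENT", sub_id, e.id])


idx = FakeIndex()
idx.index(Event("a", "Hello Nostrasia"))
idx.index(Event("b", "good morning"))
out = []
serve_req(idx, "s1", ["nostrasia"], out.append, polls=2, interval=0.0,
          before_poll=lambda i: idx.index(Event(f"new{i}", "Nostrasia day 3")))
print(out)
# [['EVENT', 's1', 'a'], ['EOSE', 's1'], ['EVENT', 's1', 'new0'], ['EVENT', 's1', 'new1']]
```

The latency of live results is bounded below by `interval`, and every open subscription costs one query per interval.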
Slide 70
Slide 70 text
Sequence diagram:
Slide 71
Slide 71 text
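Participants: Client, Searchnos Relay, Elasticsearch, Indexer, Source Relay
1. Client -> Searchnos Relay: REQ
2. Searchnos Relay -> Elasticsearch: query
3. Elasticsearch -> Searchnos Relay: response
4. Searchnos Relay -> Client: EVENT (if matched)
5. Searchnos Relay -> Client: EVENT (if matched)
6. Searchnos Relay -> Client: EOSE
loop:
7. Source Relay -> Indexer: EVENT
8. Indexer -> Searchnos Relay: EVENT
9. Searchnos Relay -> Elasticsearch: index request
loop:
10. Searchnos Relay: wait
11. Searchnos Relay -> Elasticsearch: query
12. Elasticsearch -> Searchnos Relay: response
13. Searchnos Relay -> Client: EVENT (if matched)
14. Client -> Searchnos Relay: CLOSE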
Slide 72
Slide 72 text
I'm running Searchnos at wss://search.nos.today
Slide 73
Slide 73 text
Some details & future work:
Slide 74
Slide 74 text
(1) Stop polling Elasticsearch for events after EOSE
Slide 75
Slide 75 text
Doing this can reduce the latency of search results and the load on Elasticsearch.
Slide 76
Slide 76 text
We need to implement a "filter evaluator",
Slide 77
Slide 77 text
and it would duplicate the matching that is currently done by Elasticsearch.
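A filter evaluator boils down to checking each incoming event against the subscription's filters in memory. A Python sketch under that assumption (not Searchnos code): `ids`, `authors`, `kinds`, `since`, `until` and single-letter tag filters are checked directly, `limit` is ignored because it only applies to the initial query, and `search` uses a deliberately naive AND of substrings.

```python
def matches_filter(event: dict, flt: dict) -> bool:
    """Evaluate one NIP-01 filter (plus a naive NIP-50 "search") against an event."""
    if "ids" in flt and event["id"] not in flt["ids"]:
        return False
    if "authors" in flt and event["pubkey"] not in flt["authors"]:
        return False
    if "kinds" in flt and event["kind"] not in flt["kinds"]:
        return False
    if "since" in flt and event["created_at"] < flt["since"]:
        return False
    if "until" in flt and event["created_at"] > flt["until"]:
        return False
    for key, values in flt.items():
        # Tag filters such as "#e", "#p", "#t": some tag with that name must have a listed value.
        if len(key) == 2 and key[0] == "#":
            if not any(len(tag) >= 2 and tag[0] == key[1] and tag[1] in values
                       for tag in event["tags"]):
                return False
    if "search" in flt:
        # Naive AND over space-separated terms. This is the part that duplicates (and
        # could disagree with) the analysis Elasticsearch performs.
        text = event["content"].lower()
        if not all(term in text for term in flt["search"].lower().split()):
            return False
    return True


def matches_req(event: dict, filters: list[dict]) -> bool:
    # A REQ matches if any of its filters matches (filters are ORed).
    return any(matches_filter(event, f) for f in filters)


# Toy event, not a valid signed event.
note = {"id": "e1", "pubkey": "p1", "kind": 1, "created_at": 1698969600,
        "tags": [["t", "nostrasia"]], "content": "Hello Nostrasia day 3"}
print(matches_req(note, [{"kinds": [1], "search": "nostrasia"}]))  # True
```

For Japanese, the substring check would have to be replaced with the same n-gram and position logic the index uses, which is why keeping the two consistent is the hard part.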
Slide 78
Slide 78 text
(2) Different index lifetime by kind
Slide 79
Slide 79 text
Currently Searchnos indexes events into a separate index per day
Slide 80
Slide 80 text
and keeps 30 of these sub-indices (30 days; configurable).
Slide 81
Slide 81 text
Elasticsearch can search multiple indices transparently,
Slide 82
Slide 82 text
and can also delete an index quite efficiently.
Slide 83
Slide 83 text
In this way, Searchnos puts TTLs on events.
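A Python sketch of the daily-index scheme. The `nostr-` prefix, the date format, and routing by the event's `created_at` (rather than by the time it was received) are assumptions for illustration. Searching all days at once can use a wildcard index pattern such as `nostr-*`, and expiring a whole day is a single index deletion.

```python
from datetime import date, datetime, timedelta, timezone

PREFIX = "nostr-"  # hypothetical index name prefix


def index_name_for(created_at: int) -> str:
    # Route an event to the daily index for its UTC date.
    day = datetime.fromtimestamp(created_at, tz=timezone.utc).date()
    return f"{PREFIX}{day:%Y.%m.%d}"


def indices_to_delete(existing: list[str], today: date, keep_days: int = 30) -> list[str]:
    # Keep the newest `keep_days` daily indices (today included); older ones are stale.
    oldest_kept = today - timedelta(days=keep_days - 1)
    stale = [name for name in existing
             if datetime.strptime(name[len(PREFIX):], "%Y.%m.%d").date() < oldest_kept]
    return sorted(stale)


print(index_name_for(1698969600))  # nostr-2023.11.03
print(indices_to_delete(["nostr-2023.10.04", "nostr-2023.10.05", "nostr-2023.11.03"],
                        today=date(2023, 11, 3)))  # ['nostr-2023.10.04']
```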
Slide 84
Slide 84 text
I chose this design because I was mainly targeting real-time search at first.
Slide 85
Slide 85 text
But it would be better to have longer TTLs for some kinds,
Slide 86
Slide 86 text
for example, kinds 0 and 30023 (NIP-23 long-form content).
Slide 87
Slide 87 text
So I want to make TTLs configurable per kind.
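One way to keep cheap whole-index deletion while varying TTLs is to keep one family of daily indices per TTL value. A Python sketch, where the naming scheme and the TTL numbers are hypothetical:

```python
from datetime import date, datetime, timedelta, timezone

DEFAULT_TTL_DAYS = 30
TTL_DAYS_BY_KIND = {0: 365, 30023: 365}  # hypothetical per-kind settings


def ttl_days(kind: int) -> int:
    return TTL_DAYS_BY_KIND.get(kind, DEFAULT_TTL_DAYS)


def index_name(kind: int, created_at: int) -> str:
    # One family of daily indices per TTL, e.g. "nostr-ttl30d-2023.11.03".
    day = datetime.fromtimestamp(created_at, tz=timezone.utc).date()
    return f"nostr-ttl{ttl_days(kind)}d-{day:%Y.%m.%d}"


def is_expired(name: str, today: date) -> bool:
    # Each family is pruned with its own TTL, so deletion still removes whole indices.
    _, ttl_part, day_part = name.split("-", 2)
    ttl = int(ttl_part.removeprefix("ttl").removesuffix("d"))
    day = datetime.strptime(day_part, "%Y.%m.%d").date()
    return day < today - timedelta(days=ttl - 1)


print(index_name(1, 0), index_name(0, 0))  # nostr-ttl30d-1970.01.01 nostr-ttl365d-1970.01.01
```

Kinds that share a TTL share a family, so the number of indices grows with the number of distinct TTLs rather than the number of kinds.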
Slide 88
Slide 88 text
(3) Support policy plugins (like strfry)
Slide 89
Slide 89 text
Searchnos uses an indexer to filter which events get indexed.
Slide 90
Slide 90 text
The indexer sends events to the Searchnos relay using NIP-01,
Slide 91
Slide 91 text
to the special administrative endpoint
Slide 92
Slide 92 text
for spam prevention
Slide 93
Slide 93 text
With a policy plugin system, Searchnos could receive events from users directly.
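A sketch of what such a policy could look like, modeled loosely on strfry's write-policy plugins, which read one JSON request per line (containing the incoming event) and answer with one JSON line holding the event `id`, an `action` such as `accept` or `reject`, and a `msg`. Searchnos has no such interface yet, so the rules and constants below are hypothetical; strfry's plugin documentation is the reference for the exact fields.

```python
import json
import sys

ALLOWED_KINDS = {0, 1, 30023}  # hypothetical policy: kinds worth indexing
MAX_CONTENT_LEN = 10_000       # hypothetical policy: reject very long notes


def decide(request: dict) -> dict:
    # One request describes one incoming event; answer accept or reject for it.
    event = request["event"]
    if event["kind"] not in ALLOWED_KINDS:
        return {"id": event["id"], "action": "reject", "msg": "blocked: kind not indexed"}
    if len(event["content"]) > MAX_CONTENT_LEN:
        return {"id": event["id"], "action": "reject", "msg": "blocked: content too long"}
    return {"id": event["id"], "action": "accept", "msg": ""}


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    # Line-delimited JSON in, one JSON decision per line out.
    for line in stdin:
        if line.strip():
            stdout.write(json.dumps(decide(json.loads(line))) + "\n")
            stdout.flush()
```

A plugin script would call `main()` at the end; the call is left out here so the module can be imported without blocking on stdin.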
Slide 94
Slide 94 text
(4) Support more sophisticated queries.
Slide 95
Slide 95 text
Currently Searchnos treats a query as an AND search over space-separated terms.
Slide 96
Slide 96 text
(4-a) Logical operations: AND, OR, NOT, ...
Slide 97
Slide 97 text
To achieve this, a search query parser needs to be implemented.
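A small recursive-descent parser is enough for AND/OR/NOT with parentheses. This Python sketch assumes a hypothetical grammar in which a space means AND, AND binds tighter than OR, and NOT applies to the next term or group. `evaluate` matches terms as plain substrings, which a real relay would replace with its analyzer.

```python
import re

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse(query: str):
    """Parse into nested tuples: ("term", w), ("not", x), ("and", x, y), ("or", x, y)."""
    tokens = TOKEN_RE.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def or_expr():
        node = and_expr()
        while peek() == "OR":
            take()
            node = ("or", node, and_expr())
        return node

    def and_expr():
        node = unary()
        while peek() not in (None, ")", "OR"):  # adjacent terms mean AND
            if peek() == "AND":
                take()
            node = ("and", node, unary())
        return node

    def unary():
        tok = peek()
        if tok is None or tok == ")":
            raise ValueError(f"unexpected {tok or 'end of query'}")
        take()
        if tok == "NOT":
            return ("not", unary())
        if tok == "(":
            node = or_expr()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return node
        return ("term", tok.lower())

    node = or_expr()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()}")
    return node


def evaluate(node, text: str) -> bool:
    op = node[0]
    if op == "term":
        return node[1] in text.lower()
    if op == "not":
        return not evaluate(node[1], text)
    if op == "and":
        return evaluate(node[1], text) and evaluate(node[2], text)
    return evaluate(node[1], text) or evaluate(node[2], text)


ast = parse("nostrasia (tokyo OR bitcoin) NOT spam")
print(evaluate(ast, "Nostrasia in Tokyo"))  # True
```

With this grammar, `a OR b c` parses as `a OR (b AND c)`.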
Slide 98
Slide 98 text
(4-b) Language-specific search
Slide 99
Slide 99 text
Users may want to search for something like "nostrasia lang:ja".
Slide 100
Slide 100 text
Searchnos internally detects languages using Elasticsearch's language detector.
Slide 101
Slide 101 text
But how should "lang:ja" be handled?
Slide 102
Slide 102 text
Where should the query be parsed?
Slide 103
Slide 103 text
In the relay, or in the client?
Slide 104
Slide 104 text
I'm not sure how this should be done.
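If the relay does the parsing, one option is to pull out known `key:value` qualifiers before running the free-text search. A Python sketch, assuming the detected language is stored with each event (passed in as `event_lang`); the function names and the qualifier set are hypothetical.

```python
KNOWN_QUALIFIERS = ("lang",)  # hypothetical set of supported keys


def split_qualifiers(query: str) -> tuple[list[str], dict[str, list[str]]]:
    # Separate "key:value" qualifiers with a known key from free-text terms.
    terms: list[str] = []
    quals: dict[str, list[str]] = {}
    for word in query.split():
        key, sep, value = word.partition(":")
        if sep and value and key in KNOWN_QUALIFIERS:
            quals.setdefault(key, []).append(value)
        else:
            terms.append(word)
    return terms, quals


def matches(event_lang: str, content: str, query: str) -> bool:
    # event_lang: language detected at indexing time (assumed to be stored per event).
    terms, quals = split_qualifiers(query)
    if "lang" in quals and event_lang not in quals["lang"]:
        return False
    text = content.lower()
    return all(term.lower() in text for term in terms)


print(split_qualifiers("nostrasia lang:ja"))  # (['nostrasia'], {'lang': ['ja']})
```

Restricting qualifiers to known keys keeps words such as `https://nos.today` from being misread as qualifiers.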
Slide 105
Slide 105 text
(4-c) Searching for events from a specific user
Slide 106
Slide 106 text
Users may want to query "nostrasia from:darashi".
Slide 107
Slide 107 text
This is more difficult than the language specific search,
Slide 108
Slide 108 text
because we need to join the search results with kind 0 events (user metadata).
Slide 109
Slide 109 text
If we know the user's pubkey, a NIP-01 filter can select the events from that user.
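Once a name has been resolved to pubkeys, building the filter is mechanical. A client-side Python sketch: `resolve` is a hypothetical lookup from a name to candidate pubkeys (for example built from kind 0 metadata; names are self-declared and not unique, so it returns a list), and the pubkey in the demo is a placeholder.

```python
from typing import Callable, Optional


def to_filter(query: str, resolve: Callable[[str], list[str]],
              kinds: tuple[int, ...] = (1,)) -> Optional[dict]:
    # Turn "from:<name>" into NIP-01 "authors"; the remaining words go to NIP-50 "search".
    terms: list[str] = []
    authors: list[str] = []
    wants_author = False
    for word in query.split():
        if word.startswith("from:") and len(word) > len("from:"):
            wants_author = True
            authors.extend(resolve(word[len("from:"):]))
        else:
            terms.append(word)
    if wants_author and not authors:
        return None  # the name did not resolve, so nothing can match
    flt: dict = {"kinds": list(kinds)}
    if terms:
        flt["search"] = " ".join(terms)
    if authors:
        flt["authors"] = sorted(set(authors))
    return flt


PLACEHOLDER_PUBKEY = "ab" * 32  # not a real user's key
names = {"darashi": [PLACEHOLDER_PUBKEY]}  # hypothetical name index
print(to_filter("nostrasia from:darashi", lambda n: names.get(n, [])))
```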
Slide 110
Slide 110 text
But who should convert the query "from:darashi" into a NIP-01 filter?
Slide 111
Slide 111 text
In the relay, or in the client?
Slide 112
Slide 112 text
It's not realistic to expect all clients to implement this.
Slide 113
Slide 113 text
How about making libraries for clients?
Slide 114
Slide 114 text
It's not a bad idea, but we would need to implement it in many programming languages.
Slide 115
Slide 115 text
On the other hand, if we implement it on the relay side,
Slide 116
Slide 116 text
implementation differences between search relays may lead to inconsistent results.
Slide 117
Slide 117 text
Then how should we implement this?
Slide 118
Slide 118 text
A hybrid approach may be a solution.
Slide 119
Slide 119 text
What about converting the query to an intermediate representation at the client,
Slide 120
Slide 120 text
and sending it to the relay?
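A Python sketch of what such an intermediate representation could look like. The JSON shape, the `version` field, and how it would be sent to the relay (NIP-50 only defines a plain `search` string) are all hypothetical. The client resolves what needs its own context, such as names to pubkeys, and leaves free text untouched so each relay can apply its own language-aware analyzer; the relay then only has to understand a small, fixed set of clause types.

```python
import json
from typing import Callable


def to_ir(query: str, resolve: Callable[[str], list[str]]) -> dict:
    # Client side: resolve names to pubkeys, pass language hints and free text through.
    clauses = []
    for word in query.split():
        key, sep, value = word.partition(":")
        if sep and value and key == "from":
            clauses.append({"authors": resolve(value)})
        elif sep and value and key == "lang":
            clauses.append({"lang": value})
        else:
            clauses.append({"text": word})  # left unanalyzed for the relay
    return {"version": 1, "all": clauses}


ALLOWED_CLAUSES = {"text", "lang", "authors"}


def relay_accepts(ir: dict) -> bool:
    # Relay side: validate against the small fixed vocabulary before executing.
    return ir.get("version") == 1 and all(
        len(clause) == 1 and set(clause) <= ALLOWED_CLAUSES for clause in ir.get("all", [])
    )


names = {"darashi": ["ab" * 32]}  # hypothetical name index; placeholder pubkey
ir = to_ir("nostrasia lang:ja from:darashi", lambda n: names.get(n, []))
print(json.dumps(ir))
```

Keeping the vocabulary small is what would let many independent relays implement it with a little effort.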
Slide 121
Slide 121 text
Anyway,
Slide 122
Slide 122 text
I think the beauty of Nostr is
Slide 123
Slide 123 text
that each part can be implemented with a little effort.
Slide 124
Slide 124 text
Complicated specs easily destroy that.
Slide 125
Slide 125 text
Do you have any good ideas?
Slide 126
Slide 126 text
Conclusion
Slide 127
Slide 127 text
I made Searchnos and it's working.
Slide 128
Slide 128 text
But there are still many things to do.
Slide 129
Slide 129 text
Building a relay is open to everyone, and it's fun. Try it! Thank you.
Slide 130
Slide 130 text
(Please help me if you can translate the questions and answers.)
Slide 131
Slide 131 text
npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c