Slide 1

Slide 1 text

Searchnos & Search on Nostr @darashi npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c 2023-11-03 Nostrasia day 3

Slide 2

Slide 2 text

Let me introduce myself,

Slide 3

Slide 3 text

with my works...

Slide 4

Slide 4 text

Articles:

Slide 5

Slide 5 text

I wrote "NIP-01 を読む (Reading NIP-01)" and a full Japanese translation of NIP-01

Slide 6

Slide 6 text

for ...

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

"Hello, Nostr!" Fanzine https://nip-book.nostr-jp.org/

Slide 9

Slide 9 text

And the second issue is in press now,

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

"Hello, Nostr! Yo Bluesky!" Fanzine will be available on 12th November 2023, at Techbookfest 15, Ikebukuro https://nip-book.nostr-jp.org/

Slide 12

Slide 12 text

I wrote a short summary of the NIP-01 updates since the first issue, and

Slide 13

Slide 13 text

the Japanese translation of the latest version of NIP-01.

Slide 14

Slide 14 text

I also wrote "作ってわかる Nostr プロトコル (Understanding the Nostr Protocol by Writing Code)" for the series on Nostr,

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

in the November 2023 issue of Software Design.

Slide 17

Slide 17 text

My article is the fifth in the series,

Slide 18

Slide 18 text

written by members of the Japanese Nostr community.

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

You can buy Software Design magazine at bookstores in Japan.

Slide 21

Slide 21 text

Good as a souvenir :)

Slide 22

Slide 22 text

Software:

Slide 23

Slide 23 text

murasaki: Nostr to Speech. A client that reads notes aloud with text-to-speech.

Slide 24

Slide 24 text

Mapnos: Map Notes and Other Stuff. Shows geotagged kind 1 notes on a map. https://mapnos.vercel.app/

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

nostrbuzzs: buzzphrase detector for Nostr. Detects trending phrases in real time. https://nostrbuzzs.deno.dev/

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

This is a kind of "algo" you might not like.

Slide 29

Slide 29 text

The point is that anyone can implement their own algo with Nostr.

Slide 30

Slide 30 text

I just think it's fun to see what's going on in Nostr,

Slide 31

Slide 31 text

especially at this early stage of Nostr.

Slide 32

Slide 32 text

nos.today: Web client for NIP-50 search

Slide 33

Slide 33 text

Searchnos: NIP-50 relay

Slide 34

Slide 34 text

Today, I'm going to talk about "Searchnos".

Slide 35

Slide 35 text

Searchnos is a NIP-50 relay with Elasticsearch as its backend.

Slide 36

Slide 36 text

It's open source and available on GitHub. https://github.com/darashi/searchnos

Slide 37

Slide 37 text

Motivation:

Slide 38

Slide 38 text

I want to search on Nostr in Japanese. (for nostrbuzzs)

Slide 39

Slide 39 text

As far as I know, at the time I started developing Searchnos,

Slide 40

Slide 40 text

relay.nostr.band was the only public relay that supported NIP-50.

Slide 41

Slide 41 text

relay.nostr.band works very well in many cases,

Slide 42

Slide 42 text

but I noticed that it sometimes returns unexpected results for queries in Japanese.

Slide 43

Slide 43 text

I'm not sure, but this may be due to tokenization.

Slide 44

Slide 44 text

A typical full-text search approach is to tokenize the text into words:

Slide 45

Slide 45 text

"One beer, please." -> ["one", "beer", "please"]

Slide 46

Slide 46 text

The query "please beer" (AND search) should match the text "One beer, please."
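The word-based approach can be sketched in a few lines of Python. This is only an illustration: the helper names `tokenize` and `matches_all` are made up here, and a real search engine would use an inverted index instead of scanning each text.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all(query: str, text: str) -> bool:
    """AND search: every word of the query must be one of the text's words."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))


print(tokenize("One beer, please."))                     # ['one', 'beer', 'please']
print(matches_all("please beer", "One beer, please."))   # True
print(matches_all("please bear", "One beer, please."))   # False
```

Word order in the query does not matter, but every word has to be present, so a misspelled word such as "bear" makes the match fail.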

Slide 47

Slide 47 text

In Japanese, words are not separated by spaces:

Slide 48

Slide 48 text

ビールを一杯ください。 bii-ru o ippai kudasai (One beer, please.)

Slide 49

Slide 49 text

Using a technique called morphological analysis, it is possible to break them into words.

Slide 50

Slide 50 text

ビールを一杯ください。 -> ["ビール", "を", "一杯", "ください"]

Slide 51

Slide 51 text

This analysis depends on dictionaries,

Slide 52

Slide 52 text

and it's not always correct.

Slide 53

Slide 53 text

It is especially weak with new words.

Slide 54

Slide 54 text

Another approach is to use N-gram indexing.

Slide 55

Slide 55 text

ビールを一杯ください。 -> ["ビー", "ール", "ルを", "を一", "一杯", "杯く", "くだ", "ださ", "さい", "い。"] (bi-gram)

Slide 56

Slide 56 text

If we query "ルを ビー" (which doesn't make sense), it will be tokenized as ["ルを", "ビー"].

Slide 57

Slide 57 text

If we treat these tokens in the same way as English words,

Slide 58

Slide 58 text

it can result in false positives, because

Slide 59

Slide 59 text

["ビー", "ール", "ルを", "を一", "一杯", "杯く", "くだ", "ださ", "さい", "い。"] ⊇ ["ルを", "ビー"]

Slide 60

Slide 60 text

We need to use N-gram indexing and consider the position of the tokens.
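A small Python sketch may help show why token positions matter. It is not how Searchnos or Elasticsearch implements this; the function names and the second sample text (トマトとポテト, "tomato and potato") are assumptions chosen so that the difference is visible.

```python
def bigrams(text: str) -> list[str]:
    """Split a string into overlapping two-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def naive_and_match(query: str, text: str) -> bool:
    """Treat bi-grams like English words: only check that each one occurs."""
    indexed = set(bigrams(text))
    return all(gram in indexed
               for term in query.split()
               for gram in bigrams(term))


def positional_match(query: str, text: str) -> bool:
    """Require each term's bi-grams to occur at consecutive positions."""
    grams = bigrams(text)

    def term_found(term: str) -> bool:
        if len(term) < 2:          # too short to form a bi-gram
            return term in text
        need = bigrams(term)
        return any(grams[i:i + len(need)] == need
                   for i in range(len(grams) - len(need) + 1))

    return all(term_found(term) for term in query.split())


text = "トマトとポテト"
print(naive_and_match("テトマ", text))    # True  (false positive)
print(positional_match("テトマ", text))   # False ("テトマ" never occurs)
print(positional_match("ポテト", text))   # True
```

The query "テトマ" produces the bi-grams "テト" and "トマ", which both occur in the text, but never next to each other. Checking positions removes that false positive.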

Slide 61

Slide 61 text

I won't go into more detail today ...

Slide 62

Slide 62 text

Anyway, some effort is needed.

Slide 63

Slide 63 text

In order to tackle Japanese-specific problems,

Slide 64

Slide 64 text

it seemed like a good idea to have my own relay implementation.

Slide 65

Slide 65 text

So I made Searchnos.

Slide 66

Slide 66 text

Architecture:

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

In order to simplify the implementation,

Slide 69

Slide 69 text

Searchnos continuously polls Elasticsearch after EOSE.

Slide 70

Slide 70 text

Sequence diagram:

Slide 71

Slide 71 text

[Sequence diagram] Participants: Client, Searchnos Relay, Elasticsearch, Indexer, Source Relay. The diagram contains two loops.
Messages, in order: (1) REQ, (2) query, (3) response, (4) EVENT (if matched), (5) EVENT (if matched), (6) EOSE, (7) EVENT, (8) EVENT, (9) index request, (10) wait, (11) query, (12) response, (13) EVENT (if matched), (14) CLOSE.
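The polling flow can be sketched roughly as follows. This is a simplified illustration, not the Searchnos code: `FakeIndex` is an in-memory stand-in for Elasticsearch, `serve_subscription` and `before_poll` are hypothetical names, and the number of polls is fixed so that the example terminates. It also ignores events that share a timestamp or arrive out of order.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class FakeIndex:
    """In-memory stand-in for Elasticsearch (hypothetical, for illustration)."""
    events: list = field(default_factory=list)

    def search(self, term: str, after: int) -> list:
        """Return events containing `term` that are newer than `after`, oldest first."""
        hits = [e for e in self.events
                if term in e["content"] and e["created_at"] > after]
        return sorted(hits, key=lambda e: e["created_at"])


def serve_subscription(index: FakeIndex, sub_id: str, term: str, polls: int,
                       before_poll: Optional[Callable[[], None]] = None) -> Iterator[list]:
    """Yield relay messages: stored events, EOSE, then events found by polling."""
    last_seen = -1
    for event in index.search(term, last_seen):
        last_seen = event["created_at"]
        yield ["EVENT", sub_id, event]
    yield ["EOSE", sub_id]
    # A real relay would keep polling until CLOSE and wait between polls.
    for _ in range(polls):
        if before_poll is not None:
            before_poll()  # simulates the indexer adding events in the meantime
        for event in index.search(term, last_seen):
            last_seen = event["created_at"]
            yield ["EVENT", sub_id, event]


index = FakeIndex([{"content": "hello nostr", "created_at": 1}])
messages = list(serve_subscription(
    index, "sub1", "nostr", polls=1,
    before_poll=lambda: index.events.append({"content": "nostr search", "created_at": 2}),
))
print([m[0] for m in messages])  # ['EVENT', 'EOSE', 'EVENT']
```

Because the same search is reused before and after EOSE, the relay needs no separate matching logic, at the cost of extra queries and some latency.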

Slide 72

Slide 72 text

I'm running Searchnos at wss://search.nos.today

Slide 73

Slide 73 text

Some details & future work:

Slide 74

Slide 74 text

(1) Stop polling Elasticsearch for events after EOSE

Slide 75

Slide 75 text

Doing this can reduce the latency of search results and load on Elasticsearch.

Slide 76

Slide 76 text

We would need to implement a "filter evaluator",

Slide 77

Slide 77 text

and it would duplicate the filtering already done with Elasticsearch.
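As a rough idea of what such a filter evaluator involves, here is a minimal Python sketch for the basic NIP-01 filter fields. The function name `matches_filter` is an assumption, and the NIP-50 `search` field is deliberately left out: evaluating it in the relay is exactly the part that would have to mirror Elasticsearch's text matching.

```python
def matches_filter(event: dict, filt: dict) -> bool:
    """Check one event against one NIP-01 filter (the `search` field is ignored)."""
    if "ids" in filt and event["id"] not in filt["ids"]:
        return False
    if "authors" in filt and event["pubkey"] not in filt["authors"]:
        return False
    if "kinds" in filt and event["kind"] not in filt["kinds"]:
        return False
    if "since" in filt and event["created_at"] < filt["since"]:
        return False
    if "until" in filt and event["created_at"] > filt["until"]:
        return False
    # Tag filters such as "#e" or "#t": at least one listed value must be present.
    for key, wanted in filt.items():
        if len(key) == 2 and key.startswith("#"):
            values = {tag[1] for tag in event["tags"]
                      if len(tag) >= 2 and tag[0] == key[1]}
            if values.isdisjoint(wanted):
                return False
    return True


# Placeholder event; the id and pubkey are dummy values, not real keys.
event = {"id": "a" * 64, "pubkey": "b" * 64, "kind": 1, "created_at": 100,
         "tags": [["t", "nostrasia"]], "content": "hello"}
print(matches_filter(event, {"kinds": [1], "#t": ["nostrasia"]}))  # True
print(matches_filter(event, {"kinds": [0]}))                       # False
```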

Slide 78

Slide 78 text

(2) Different index lifetime by kind

Slide 79

Slide 79 text

Currently Searchnos indexes all events on a daily basis

Slide 80

Slide 80 text

and keeps 30 sub-indices (covering 30 days; configurable).

Slide 81

Slide 81 text

Elasticsearch can search multiple indices transparently,

Slide 82

Slide 82 text

and can also delete an index quite efficiently.

Slide 83

Slide 83 text

In this way, Searchnos puts TTLs on events.
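The idea of daily indices acting as a TTL can be sketched like this. The index naming scheme (`nostr-YYYY.MM.DD`) and the function names are assumptions for illustration, not necessarily what Searchnos uses.

```python
from datetime import date, datetime, timedelta, timezone


def index_name(created_at: int, prefix: str = "nostr") -> str:
    """Name of the daily index for an event's `created_at` (Unix time, UTC)."""
    day = datetime.fromtimestamp(created_at, tz=timezone.utc).date()
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(existing: list[str], today: date,
                    ttl_days: int = 30, prefix: str = "nostr") -> list[str]:
    """Return the indices that fall outside the most recent `ttl_days` days."""
    cutoff = today - timedelta(days=ttl_days)
    expired = []
    for name in existing:
        day = datetime.strptime(name, f"{prefix}-%Y.%m.%d").date()
        if day <= cutoff:
            expired.append(name)
    return expired


print(index_name(1698969600))  # nostr-2023.11.03
print(expired_indices(["nostr-2023.10.03", "nostr-2023.10.05"], date(2023, 11, 3)))
# ['nostr-2023.10.03']
```

Expiring events then means deleting whole indices instead of deleting documents one by one.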

Slide 84

Slide 84 text

This design was chosen because I was mainly targeting real-time search at first.

Slide 85

Slide 85 text

But it would be better to have longer TTLs for some kinds,

Slide 86

Slide 86 text

for example kind 0 and kind 30023 (NIP-23 long-form content).

Slide 87

Slide 87 text

So I want to make it possible to configure TTLs according to kinds.
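One possible shape for such a configuration, sketched in Python. The TTL values and all names here are hypothetical; this feature is future work, so nothing below reflects actual Searchnos settings.

```python
SECONDS_PER_DAY = 86400
DEFAULT_TTL_DAYS = 30
# Hypothetical per-kind overrides: keep profiles and long-form content longer.
TTL_DAYS_BY_KIND = {0: 365, 30023: 365}


def ttl_days_for(kind: int) -> int:
    """Look up the TTL for an event kind, falling back to the default."""
    return TTL_DAYS_BY_KIND.get(kind, DEFAULT_TTL_DAYS)


def is_expired(event: dict, now: int) -> bool:
    """True if the event is older than the TTL configured for its kind."""
    age_seconds = now - event["created_at"]
    return age_seconds > ttl_days_for(event["kind"]) * SECONDS_PER_DAY


now = 1698969600
forty_days_ago = now - 40 * SECONDS_PER_DAY
print(is_expired({"kind": 1, "created_at": forty_days_ago}, now))  # True
print(is_expired({"kind": 0, "created_at": forty_days_ago}, now))  # False
```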

Slide 88

Slide 88 text

(3) Support policy plugins (like strfry)

Slide 89

Slide 89 text

Searchnos uses an indexer to filter the events to be indexed.

Slide 90

Slide 90 text

The indexer sends events to the Searchnos relay using NIP-01,

Slide 91

Slide 91 text

to the special administrative endpoint

Slide 92

Slide 92 text

for spam prevention

Slide 93

Slide 93 text

If we have a policy plugin system, Searchnos can receive events from users directly.

Slide 94

Slide 94 text

(4) Support more sophisticated queries.

Slide 95

Slide 95 text

Currently Searchnos treats a query as an AND search over space-separated terms.
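As an illustration, a space-separated AND query could be turned into an Elasticsearch query body along these lines. This is a sketch, not the query Searchnos actually sends: the field name `content` and the use of `match_phrase` for each term are assumptions.

```python
import json


def build_es_query(query: str, field: str = "content") -> dict:
    """Build a bool query in which every space-separated term must match."""
    terms = query.split()
    return {
        "query": {
            "bool": {
                # Each clause must match (AND). match_phrase takes token
                # positions into account, which matters for N-gram fields.
                "must": [{"match_phrase": {field: term}} for term in terms]
            }
        }
    }


print(json.dumps(build_es_query("nostrasia ビール"), ensure_ascii=False, indent=2))
```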

Slide 96

Slide 96 text

(4-a) Logical operations: AND, OR, NOT, ...

Slide 97

Slide 97 text

To achieve this, a search query parser needs to be implemented.

Slide 98

Slide 98 text

(4-b) Language-specific search

Slide 99

Slide 99 text

Users may want to search with queries like "nostrasia lang:ja".
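Wherever the parsing ends up happening, separating such modifiers from the free text could look roughly like this. The function name `split_modifiers` and the `key:value` syntax rules are assumptions for the sake of the example.

```python
def split_modifiers(query: str, known: tuple = ("lang",)) -> tuple[str, dict]:
    """Split a query into free text and recognized `key:value` modifiers."""
    terms, modifiers = [], {}
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key in known and value:
            modifiers[key] = value
        else:
            terms.append(token)  # unknown prefixes (e.g. URLs) stay in the text
    return " ".join(terms), modifiers


print(split_modifiers("nostrasia lang:ja"))       # ('nostrasia', {'lang': 'ja'})
print(split_modifiers("https://nos.today beer"))  # ('https://nos.today beer', {})
```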

Slide 100

Slide 100 text

Searchnos internally detects languages using Elasticsearch's language detector.

Slide 101

Slide 101 text

But how should "lang:ja" be treated?

Slide 102

Slide 102 text

Where should the query be parsed?

Slide 103

Slide 103 text

Relay? or Client?

Slide 104

Slide 104 text

I'm not sure how this should be done.

Slide 105

Slide 105 text

(4-c) Search events from a specific user

Slide 106

Slide 106 text

Users may want to query "nostrasia from:darashi".

Slide 107

Slide 107 text

This is more difficult than the language-specific search,

Slide 108

Slide 108 text

because we need to join kind 0 events with the search results.

Slide 109

Slide 109 text

If we know the pubkey of the user, a NIP-01 filter can select the events from that user.
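For example, once the pubkey is known, a client could combine the NIP-01 `authors` field with the NIP-50 `search` field in one REQ. The sketch below only builds the message; the function name, the subscription id, and the placeholder pubkey are assumptions, and sending it over a WebSocket is not shown.

```python
import json

HEX_DIGITS = set("0123456789abcdef")


def build_author_search_req(sub_id: str, text: str, pubkey_hex: str,
                            kinds: tuple = (1,)) -> str:
    """Build a REQ message that searches `text` in events from one author."""
    if len(pubkey_hex) != 64 or not set(pubkey_hex) <= HEX_DIGITS:
        raise ValueError("pubkey must be 64 lowercase hex characters")
    filt = {"authors": [pubkey_hex], "kinds": list(kinds), "search": text}
    return json.dumps(["REQ", sub_id, filt], ensure_ascii=False)


placeholder_pubkey = "ab" * 32  # dummy value, not a real user's key
print(build_author_search_req("sub1", "nostrasia", placeholder_pubkey))
```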

Slide 110

Slide 110 text

But who should convert the query "from:darashi" into a NIP-01 filter?

Slide 111

Slide 111 text

Relay? or Client?

Slide 112

Slide 112 text

It's not realistic to expect all clients to implement this.

Slide 113

Slide 113 text

How about making libraries for clients?

Slide 114

Slide 114 text

It's not a bad idea, but we would need to implement it in many programming languages.

Slide 115

Slide 115 text

On the other hand, if we implement on the relay side,

Slide 116

Slide 116 text

implementation differences between search relays may lead to inconsistent results.

Slide 117

Slide 117 text

Then how should we implement this?

Slide 118

Slide 118 text

A hybrid approach may be a solution.

Slide 119

Slide 119 text

What about converting the query to an intermediate representation at the client,

Slide 120

Slide 120 text

and sending it to the relay?

Slide 121

Slide 121 text

Anyway,

Slide 122

Slide 122 text

I think the beauty of Nostr is

Slide 123

Slide 123 text

that each part can be implemented with a little effort.

Slide 124

Slide 124 text

Complicated specs easily destroy that.

Slide 125

Slide 125 text

Do you have any good ideas?

Slide 126

Slide 126 text

Conclusion

Slide 127

Slide 127 text

I made Searchnos and it's working.

Slide 128

Slide 128 text

But there are still many things to do.

Slide 129

Slide 129 text

Building a relay is open to everyone, and it's fun. Try it! Thank you.

Slide 130

Slide 130 text

(Please help me if you can translate the questions and answers.)

Slide 131

Slide 131 text

npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c