Slide 1

Slide 1 text

How nostrbuzzs works
npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c
2023-03-10 Nostr Meetup #1

Slide 2

Slide 2 text

@darashi
- Author of buzztter.com (no longer in operation), a service that captured what was buzzing on Twitter.
- Created nostrbuzzs, a service that captures trending phrases on Nostr (from notes written in Japanese).
- NEW! I made Nostr Village Broadcasting Station, a service that converts kind 1 notes to speech and streams it via HLS.

Slide 3

Slide 3 text

nostrbuzzs https://nostrbuzzs.deno.dev/

Slide 4

Slide 4 text

System overview
[Diagram. Hosting: deno deploy, fly.io. Components: indexer, analyzer, ElasticSearch, nostr-rs-relay, static assets (nostrbuzzs.deno.dev), web-browser. External relays (nostrverse): relay.damus.io, nos.lol. Data flows: kind:1, recent notes, phrase frequency, kind:38225.]

Slide 5

Slide 5 text

Indexer
- Connects to relays and collects kind 1 messages.
- Guesses the language of each note's content and indexes the notes predicted to be Japanese into Elasticsearch.
- Language detection uses https://github.com/pemistahl/lingua-go.
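To make the filtering step concrete, here is a deliberately crude stand-in written in Python. It is not how lingua-go works (lingua uses statistical language models); it only checks the share of hiragana and katakana characters. The function names and the 0.1 threshold are assumptions made for this illustration.

```python
def kana_ratio(text: str) -> float:
    """Fraction of non-space characters that are hiragana or katakana."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # U+3040..U+309F is the hiragana block, U+30A0..U+30FF the katakana block.
    kana = sum(1 for c in chars if "\u3040" <= c <= "\u30ff")
    return kana / len(chars)


def looks_japanese(text: str, threshold: float = 0.1) -> bool:
    """Hypothetical filter: treat a note as Japanese if enough of it is kana."""
    return kana_ratio(text) >= threshold


notes = ["今日はいい天気ですね", "good morning nostr"]
japanese_notes = [n for n in notes if looks_japanese(n)]
```

A heuristic like this misses notes written only in kanji, which is one reason to rely on a real detector as the Indexer does.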

Slide 6

Slide 6 text

Elasticsearch
- Uses Elasticsearch (https://www.elastic.co/) as is.
- Stores the data sent from the Indexer.
- Provides the information the Analyzer needs for its analysis.
- Indexes the data as N-grams, without tokenizing it into morphemes or the like, so that it can be analyzed later.
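As an illustration of what N-gram indexing means here, the sketch below splits a string into overlapping character N-grams in Python. Elasticsearch does this with its own ngram tokenizer; the gram size of 2 and the function name are assumptions, since the slides do not state the index settings.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


grams = char_ngrams("ノストラ", 2)
# grams == ["ノス", "スト", "トラ"]
```

Because no dictionary is involved, new slang and product names become searchable as soon as they appear, at the cost of a larger index.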

Slide 7

Slide 7 text

Analyzer
- Fetches data from Elasticsearch and computes buzz phrases.
- Sends the analysis results to the relay as Parameterized Replaceable Events (NIP-33).

Slide 8

Slide 8 text

Web UI
- An SPA written in Fresh (https://fresh.deno.dev/).
- For the initial version, I chose Deno Deploy because the app included an API server.
- It is currently just a static site.

Slide 9

Slide 9 text

More details about Analyzer

Slide 10

Slide 10 text

Analyzer 1/2 -- Frequent Phrase Extraction
1. Retrieve the notes of the last 2 hours from Elasticsearch.
2. Calculate the similarity between notes using SimHash [1] (implementation: https://crates.io/crates/simhash) and delete highly similar notes, leaving only one of them.
3. Parse the notes into morphemes using Sudachi.
4. Use PrefixSpan [2] (custom implementation) to extract frequent phrases.

[1] Charikar, Moses S. "Similarity estimation techniques from rounding algorithms." Proceedings of the thirty-fourth annual ACM symposium on Theory of computing. 2002.
[2] Han, Jiawei, et al. "Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth." Proceedings of the 17th international conference on data engineering. IEEE, 2001.
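Steps 2 and 4 can be sketched in Python as follows. This is a minimal illustration, not the Analyzer's code: the real service uses the Rust simhash crate and a custom PrefixSpan whose details the slides do not give. The bigram features, the Hamming-distance threshold of 3, and the assumption that notes arrive already tokenized (step 3) are all choices made for this example.

```python
import hashlib
from collections import defaultdict


def simhash(text: str, n: int = 2, bits: int = 64) -> int:
    """SimHash fingerprint built from character n-gram features."""
    counts = [0] * bits
    features = [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]
    for feat in features:
        digest = hashlib.blake2b(feat.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    # A bit is set when more features voted 1 than 0 at that position.
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def dedupe(notes: list[str], max_distance: int = 3) -> list[str]:
    """Keep a note only if its fingerprint is far from every kept note."""
    kept, hashes = [], []
    for note in notes:
        h = simhash(note)
        if all(bin(h ^ other).count("1") > max_distance for other in hashes):
            kept.append(note)
            hashes.append(h)
    return kept


def prefixspan(sequences: list[list[str]], min_support: int):
    """Basic PrefixSpan for sequences of single items (gaps allowed)."""
    patterns = []

    def mine(prefix, projected):
        # projected: (sequence index, position just after the prefix match)
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] += 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns.append((new_prefix, support))
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                if item in seq[start:]:
                    new_projected.append((sid, seq.index(item, start) + 1))
            mine(new_prefix, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return patterns


tokenized = [["nostr", "は", "楽しい"], ["nostr", "は", "すごい"], ["今日", "は", "晴れ"]]
frequent = dict(prefixspan(tokenized, min_support=2))
```

In this toy input, the phrase ("nostr", "は") is found because it occurs in two of the three token sequences. Textbook PrefixSpan also matches patterns with gaps between tokens; a phrase extractor may want to restrict matches to adjacent tokens, which is one plausible reason for a custom implementation.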

Slide 11

Slide 11 text

Analyzer 2/2 -- Score Calculation
5. Normalize the frequent phrases (case and NFKC) and group together phrases that result in the same text. The most frequent spelling is used as the representative.
6. If the same pubkey mentions a phrase in multiple notes, only the latest note is used as the representative.
7. Weight each occurrence by the difference between the start time of the analysis and the time of the occurrence. More recent occurrences get a higher weight (which also smooths out the old end of the analysis window).
8. Query Elasticsearch for how often the phrase occurred in the past. Give a high score to phrases with many recent occurrences and few past occurrences.
9. Send the analysis results to the relay.
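Steps 5, 7, and 8 can be sketched in Python as below. The slides give neither the weighting function nor the score formula, so the linear decay and the ratio used here are assumptions chosen only to show the shape of the computation; all function names are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def normalize(phrase: str) -> str:
    """Step 5: NFKC normalization plus case folding."""
    return unicodedata.normalize("NFKC", phrase).casefold()


def group_spellings(phrases: list[str]) -> dict[str, str]:
    """Map each normalized key to its most frequent original spelling."""
    spellings = defaultdict(Counter)
    for p in phrases:
        spellings[normalize(p)][p] += 1
    return {key: c.most_common(1)[0][0] for key, c in spellings.items()}


def recency_weight(age_seconds: float, window_seconds: float = 7200.0) -> float:
    """Step 7 (assumed linear decay): 1.0 when brand new, 0.0 at the old edge."""
    return max(0.0, 1.0 - age_seconds / window_seconds)


def buzz_score(ages: list[float], past_count: int,
               window_seconds: float = 7200.0) -> float:
    """Step 8 (assumed formula): weighted recent mentions over past mentions."""
    recent = sum(recency_weight(a, window_seconds) for a in ages)
    return recent / (1.0 + past_count)


representatives = group_spellings(["Nostr", "nostr", "Ｎｏｓｔｒ", "nostr"])
fresh = buzz_score([0.0, 3600.0], past_count=0)   # phrase never seen before
stale = buzz_score([0.0, 3600.0], past_count=2)   # phrase already common
```

With these assumptions, the same two recent mentions score three times higher for a phrase that never appeared before than for one that appeared twice in the past, which matches the intent of step 8.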

Slide 12

Slide 12 text

Interesting points from Nostr's point of view
Passing the analysis results using NIP-33
https://github.com/nostr-protocol/nips/blob/master/33.md

Slide 13

Slide 13 text

An example
- The analysis result is put in the content as JSON.
- The kind is 38225 and the event has the d tag buzz-phrases:ja.
- The event is sent to my own relay.

    ❯ echo '["REQ", "_", {"kinds": [38225] }]' | nostcat --stream wss://example.com | jq .
    [
      "EVENT",
      "_",
      {
        "content": "{\"phrases\":[{\"text\":\"bluesky といえば\"}, ...snip... ,{\"text\":\"Windows100\"}],\"created_at\":\"2023-02-26T08:36:14.559572950+00:00\",\"language\":\"ja\"}",
        "created_at": 1677400574,
        "id": "789cdedc73c7472428c40a39ada18177fa44a1996c30783f0ec0b194ca676cb8",
        "kind": 38225,
        "pubkey": "fe295340106bb7b8f5b08f8b7c22000862abc9731dbb86f2f141301e13b4d024",
        "sig": "f5c6aaf310cd0b6c631bdaf1f909563b1ebf7f45c5db8cced25ab3fced93fba6d8337a03277f23e2e58a1e7aaf1c94cf9996daa58d97a6e80bbde64829fda4f9",
        "tags": [
          [
            "d",
            "buzz-phrases:ja"
          ]
        ]
      }
    ]
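A client has to decode two layers of JSON: the relay message, and the content string inside the event. The Python sketch below shows this under the assumption that the content always has the shape seen in the example (a "phrases" list of objects with a "text" field); the function name is hypothetical, and the author notes elsewhere that the format may change.

```python
import json


def extract_phrases(relay_message: str) -> list[str]:
    """Return the phrase texts carried by an EVENT message, else an empty list."""
    msg = json.loads(relay_message)
    if msg[0] != "EVENT":
        return []  # e.g. EOSE or NOTICE
    event = msg[2]
    payload = json.loads(event["content"])  # content is itself a JSON string
    return [p["text"] for p in payload["phrases"]]


# Build a shortened message shaped like the example for demonstration.
content = json.dumps({"phrases": [{"text": "bluesky といえば"}, {"text": "Windows100"}],
                      "language": "ja"})
message = json.dumps(["EVENT", "_", {"kind": 38225, "content": content,
                                     "tags": [["d", "buzz-phrases:ja"]]}])
phrases = extract_phrases(message)
```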

Slide 14

Slide 14 text

Advantages of this setup
There is no need for an API server that returns the analysis results to the browser.
Such an API server would have to:
- immediately return the latest cached results to newly connected browsers;
- broadcast new results from the Analyzer to all connected browsers.
These processes are simple, but they require storage, which can be a bit cumbersome. (The initial version was implemented this way.)
NIP-33 resolves this.

Slide 15

Slide 15 text

NIP-33: Parameterized Replaceable Events
"Replace the old event with a new one if the kind, pubkey, and the first d tag are the same"
In other words, the relay keeps and returns only the latest event. Very convenient.
Moreover, a redundant configuration can be achieved simply by publishing to multiple relays.
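The replacement rule can be illustrated with a toy in-memory store in Python. This is a sketch of the rule quoted above, not relay code: the class and method names are hypothetical, signature checks are omitted, and ties on created_at are resolved by simply keeping the stored event.

```python
class ReplaceableStore:
    """Keeps one event per (kind, pubkey, first d tag), as NIP-33 describes."""

    def __init__(self) -> None:
        self._events: dict[tuple[int, str, str], dict] = {}

    def publish(self, event: dict) -> bool:
        """Store the event if it is newer than the current one for its key."""
        d = next((t[1] for t in event["tags"] if len(t) >= 2 and t[0] == "d"), "")
        key = (event["kind"], event["pubkey"], d)
        current = self._events.get(key)
        if current is not None and current["created_at"] >= event["created_at"]:
            return False  # an equally new or newer event is already stored
        self._events[key] = event
        return True

    def query(self, kind: int) -> list[dict]:
        """Return the stored events of the given kind."""
        return [e for (k, _, _), e in self._events.items() if k == kind]


store = ReplaceableStore()
old = {"kind": 38225, "pubkey": "pk", "created_at": 100,
       "tags": [["d", "buzz-phrases:ja"]], "content": "old"}
new = {**old, "created_at": 200, "content": "new"}
store.publish(old)
store.publish(new)
latest = [e["content"] for e in store.query(38225)]
```

A browser that connects at any time therefore receives exactly one event per analysis stream, which is the cache-and-return behavior the API server used to provide.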

Slide 16

Slide 16 text

Applications
- Anyone can create a site that receives and displays buzz phrases. (The specifications may change without notice...)
- If someone implements another buzz detection engine that speaks the same protocol, users may be able to switch between engines. They can be distinguished by pubkey, for example.
- What is a "trend"? Anyone can send out a trend.
- Events for communication between people are the mainstream today, but a world where services interact with each other via Nostr would be interesting.
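Switching engines by pubkey comes down to changing the authors field of the subscription filter. The Python sketch below builds such a REQ message; the function name is hypothetical, and the kind and d tag values are taken from the example event and may change along with the specification.

```python
import json


def buzz_request(sub_id: str, engine_pubkey: str,
                 d_tag: str = "buzz-phrases:ja") -> str:
    """Build a REQ for the buzz phrases published by one particular engine."""
    filt = {
        "kinds": [38225],            # kind used by nostrbuzzs in the example
        "authors": [engine_pubkey],  # selects which engine to trust
        "#d": [d_tag],               # tag filter on the d tag
    }
    return json.dumps(["REQ", sub_id, filt])


req = buzz_request("buzz", "fe295340106bb7b8f5b08f8b7c22000862abc9731dbb86f2f141301e13b4d024")
```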

Slide 17

Slide 17 text

Operating a dedicated relay
- With nostr-rs-relay, you can set one up quickly.
- All I did was list the Analyzer's pubkey in the [authorization] section of config.toml.
- It is easy if you treat the relay as exclusive to your own service.
- You get a real-time API server that works out of the box.
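For reference, the relevant fragment of a nostr-rs-relay config.toml might look like the sketch below. The pubkey shown is the one that signed the example event earlier in the deck, used here on the assumption that it is the Analyzer's key; the author's actual configuration is not shown in the slides.

```toml
[authorization]
# Only events signed by these pubkeys (hex) are accepted for publishing.
# The value below is an illustrative assumption, not the author's real config.
pubkey_whitelist = [
  "fe295340106bb7b8f5b08f8b7c22000862abc9731dbb86f2f141301e13b4d024",
]
```

Reading stays open to everyone, so browsers can subscribe while only the Analyzer can publish.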

Slide 18

Slide 18 text

Conclusion
- I created nostrbuzzs, a service that analyzes and displays what is buzzing on Nostr.
- It is built with a Nostr-native architecture that makes use of a relay.