How nostrbuzzs works

npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c
2023-03-10, Nostr Meetup #1

@darashi
Author of buzztter.com (no longer in operation), a service that captured the buzz on Twitter. Created nostrbuzzs, a service that captures trending phrases on Nostr (from Japanese-language notes). (NEW! I made Nostr Village Broadcasting Station, a service that converts kind 1 notes to speech and streams it via HLS.)

nostrbuzzs https://nostrbuzzs.deno.dev/

System overview
[Diagram: nostrbuzzs.deno.dev (static assets), indexer, analyzer, ElasticSearch and nostr-rs-relay, spread across deno deploy, fly.io and the nostrverse (relay.damus.io, nos.lol), with a web browser as the client. Labeled flows: kind:1, recent notes, phrase frequency, kind:38225.]

Indexer
Connects to relays and collects kind 1 events (notes). It guesses the language of each note's content and indexes the notes predicted to be Japanese into Elasticsearch. Language detection uses https://github.com/pemistahl/lingua-go.
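The talk relies on lingua-go, a statistical detector, for this step. The Python sketch below does not reproduce it; as a crude stand-in, it assumes a note is Japanese when at least 20% of its letters are hiragana or katakana (kanji alone cannot tell Japanese from Chinese). The threshold and all function names are hypothetical.

```python
def kana_ratio(text: str) -> float:
    """Share of alphabetic characters that are hiragana (U+3040-309F) or katakana (U+30A0-30FF)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    kana = sum(1 for ch in letters if "\u3040" <= ch <= "\u30ff")
    return kana / len(letters)


def is_probably_japanese(text: str, threshold: float = 0.2) -> bool:
    """Crude heuristic stand-in for a real language detector such as lingua-go."""
    return kana_ratio(text) >= threshold


def select_for_indexing(notes: list[dict]) -> list[dict]:
    """Keep only kind 1 notes whose content looks Japanese; these would go to Elasticsearch."""
    return [
        n for n in notes
        if n.get("kind") == 1 and is_probably_japanese(n.get("content", ""))
    ]
```

A real detector handles short or kanji-heavy notes far better; the sketch only shows where the filter sits, namely before indexing.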

Elasticsearch
Uses Elasticsearch (https://www.elastic.co/) as is. It stores the data sent by the Indexer and provides the Analyzer with the information it needs. For later analysis, the text is indexed as N-grams rather than tokenized into morphemes or similar units.
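A minimal sketch of why N-gram indexing suits this job: with character bigrams (the N actually used is not stated; 2 is an assumption), any phrase can be looked up through an inverted index without deciding in advance how the text splits into morphemes. Elasticsearch provides an n-gram tokenizer for this; the plain-Python helpers below are hypothetical and only mimic the idea.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Overlapping character n-grams, as a fixed-size n-gram tokenizer would emit."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs: dict[str, str], n: int = 2) -> dict[str, set[str]]:
    """Inverted index: n-gram -> ids of the documents that contain it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def search_phrase(index: dict[str, set[str]], docs: dict[str, str], phrase: str, n: int = 2) -> set[str]:
    """Find documents containing an arbitrary phrase, with no morphological analysis."""
    grams = char_ngrams(phrase, n)
    if not grams:  # phrase shorter than n: fall back to a scan
        return {d for d, t in docs.items() if phrase in t}
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # n-gram overlap is only a candidate filter; confirm with a substring check
    return {d for d in candidates if phrase in docs[d]}
```

Because no tokenizer has to agree on word boundaries, a phrase such as "といえば" is found even when it straddles what a morphological analyzer would treat as separate words.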

Analyzer
Fetches data from Elasticsearch and computes buzz phrases. Sends the analysis results to the relay as Parameterized Replaceable Events (NIP-33).

Web UI
A single-page application (SPA) written with Fresh (https://fresh.deno.dev/). I chose Deno Deploy for the initial version because the app included an API server; it is now just a static site.

More details about Analyzer

Analyzer 1/2 -- Frequent Phrase Extraction
1. Retrieve the last 2 hours of notes from Elasticsearch.
2. Calculate the similarity between notes using SimHash [1] (implementation: https://crates.io/crates/simhash); of each group of highly similar notes, keep only one.
3. Split the notes into morphemes using Sudachi.
4. Extract frequent phrases using PrefixSpan [2] (custom implementation).

[1] Charikar, Moses S. "Similarity estimation techniques from rounding algorithms." Proceedings of the thirty-fourth annual ACM symposium on Theory of computing. 2002.
[2] Han, Jiawei, et al. "PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth." Proceedings of the 17th international conference on data engineering. IEEE, 2001.
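Steps 2 and 4 can be sketched in Python as follows. This is not the Analyzer's code (which uses the Rust simhash crate and a custom PrefixSpan): the character-bigram features, the 3-bit distance threshold, and all names are assumptions, and the token lists stand in for Sudachi's morpheme output.

```python
import hashlib
from collections import defaultdict


# --- Step 2: near-duplicate removal with a 64-bit SimHash ---

def simhash(text: str) -> int:
    """64-bit SimHash over character bigrams (the feature choice is an assumption)."""
    features = [text[i:i + 2] for i in range(len(text) - 1)] or [text]
    totals = [0] * 64
    for feature in features:
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            totals[bit] += 1 if (h >> bit) & 1 else -1
    # each fingerprint bit is the sign of the summed feature votes
    return sum(1 << bit for bit in range(64) if totals[bit] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def dedup(notes: list[str], max_distance: int = 3) -> list[str]:
    """Keep a note only if it is more than max_distance bits away from every kept note."""
    kept, fingerprints = [], []
    for note in notes:
        fp = simhash(note)
        if all(hamming(fp, other) > max_distance for other in fingerprints):
            kept.append(note)
            fingerprints.append(fp)
    return kept


# --- Step 4: PrefixSpan over token sequences ---

def prefixspan(db: list[list[str]], min_support: int) -> list[tuple[tuple[str, ...], int]]:
    """Return (pattern, support) for every frequent sequential pattern.

    Support = number of sequences containing the pattern. As in standard
    PrefixSpan, the tokens of a pattern need not be adjacent.
    """
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start of the suffix after the prefix)
        support = defaultdict(set)
        for sid, start in projected:
            for token in set(db[sid][start:]):
                support[token].add(sid)
        for token in sorted(support):
            if len(support[token]) < min_support:
                continue
            pattern = prefix + (token,)
            results.append((pattern, len(support[token])))
            # project each sequence on the first occurrence of token
            next_projected = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == token:
                        next_projected.append((sid, pos + 1))
                        break
            mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(db))])
    return results
```

Standard PrefixSpan allows gaps, so ("a", "c") is counted in ["a", "b", "c"]; a phrase extractor would likely want adjacent morphemes only, and how the custom implementation handles this is not stated in the talk.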

Analyzer 2/2 -- Score Calculation
5. Normalize the frequent phrases (case and NFKC) and group together phrases whose normalized text is the same. The most frequently occurring spelling is used as the representative.
6. If the same pubkey mentions a phrase multiple times, only its latest note is used.
7. Weight each occurrence by the difference between the start time of the analysis and the time of the occurrence: more recent occurrences get a higher weight, and the weight tapers off smoothly toward the old end of the analysis window.
8. Query Elasticsearch for how often each phrase occurred in the past. Phrases with many recent occurrences and few past occurrences get a high score.
9. Send the analysis results to the relay.
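Steps 5 to 8 can be combined into a sketch like the one below. The talk does not give the weighting curve or the scoring formula, so the raised-cosine weight and the score `weight / (1 + past count)` are hypothetical choices that merely have the described properties; `past_counts` stands in for the Elasticsearch query of step 8.

```python
import math
import unicodedata
from collections import Counter, defaultdict

WINDOW_SECONDS = 2 * 60 * 60  # analysis window: the last 2 hours (step 1)


def normalize(phrase: str) -> str:
    """Step 5: NFKC normalization plus lowercasing."""
    return unicodedata.normalize("NFKC", phrase).lower()


def time_weight(age_seconds: float, window: float = WINDOW_SECONDS) -> float:
    """Step 7 (hypothetical curve): 1.0 for the newest notes, tapering smoothly
    to 0.0 at the old end of the window (a raised cosine)."""
    if age_seconds <= 0:
        return 1.0
    if age_seconds >= window:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * age_seconds / window))


def score_phrases(occurrences, past_counts, now):
    """occurrences: iterable of (spelling, pubkey, created_at) per extracted phrase.
    past_counts: normalized phrase -> past frequency (step 8, from Elasticsearch)."""
    spellings = defaultdict(Counter)  # normalized phrase -> counts of original spellings
    latest = {}                       # (normalized phrase, pubkey) -> newest created_at
    for spelling, pubkey, created_at in occurrences:
        key = normalize(spelling)
        spellings[key][spelling] += 1
        if created_at > latest.get((key, pubkey), float("-inf")):
            latest[(key, pubkey)] = created_at  # step 6: one note per pubkey

    recent = defaultdict(float)
    for (key, _pubkey), created_at in latest.items():
        recent[key] += time_weight(now - created_at)

    scores = {}
    for key, weight in recent.items():
        representative = spellings[key].most_common(1)[0][0]  # most frequent spelling
        # hypothetical score: many recent, few past occurrences -> high score
        scores[representative] = weight / (1 + past_counts.get(key, 0))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Usage: with three distinct pubkeys writing "bluesky" just now (one of them also earlier as fullwidth "Ｂｌｕｅｓｋｙ") and a past count of 2, the phrase scores 3 / 3 = 1.0 and is reported under its most common spelling, "bluesky".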

Interesting points from Nostr's point of view
Passing the results of analysis using NIP-33: https://github.com/nostr-protocol/nips/blob/master/33.md

An example
Put the analysis result, as JSON, in the content. The kind is 38225 and the d tag is buzz-phrases:ja. Publish the event to my own relay.

    ❯ echo '["REQ", "_", {"kinds": [38225] }]' | nostcat --stream wss://example.com | jq .
    [
      "EVENT",
      "_",
      {
        "content": "{\"phrases\":[{\"text\":\"bluesky といえば\"}, ...snip... ,{\"text\":\"Windows100\"}],\"created_at\":\"2023-02-26T08:36:14.559572950+00:00\",\"language\":\"ja\"}",
        "created_at": 1677400574,
        "id": "789cdedc73c7472428c40a39ada18177fa44a1996c30783f0ec0b194ca676cb8",
        "kind": 38225,
        "pubkey": "fe295340106bb7b8f5b08f8b7c22000862abc9731dbb86f2f141301e13b4d024",
        "sig": "f5c6aaf310cd0b6c631bdaf1f909563b1ebf7f45c5db8cced25ab3fced93fba6d8337a03277f23e2e58a1e7aaf1c94cf9996daa58d97a6e80bbde64829fda4f9",
        "tags": [
          [
            "d",
            "buzz-phrases:ja"
          ]
        ]
      }
    ]
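To show how such an event is put together, here is a sketch that builds an unsigned event shaped like the one above and computes its id as NIP-01 specifies (SHA-256 over the serialized array `[0, pubkey, created_at, kind, tags, content]`). Function names are hypothetical, and signing is left out because it needs a secp256k1 Schnorr library.

```python
import hashlib
import json

BUZZ_KIND = 38225          # kind used in the example event
D_TAG = "buzz-phrases:ja"  # d tag used in the example event


def event_id(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    """NIP-01 event id: SHA-256 of the compact JSON array
    [0, pubkey, created_at, kind, tags, content], encoded as UTF-8.
    (json.dumps matches NIP-01 escaping for ordinary text; rare control
    characters are escaped differently.)"""
    serialized = json.dumps([0, pubkey, created_at, kind, tags, content],
                            separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def build_buzz_event(pubkey: str, phrases: list[str], created_at: int, iso_time: str) -> dict:
    """Unsigned kind 38225 event whose content is the analysis result as JSON."""
    content = json.dumps(
        {"phrases": [{"text": p} for p in phrases], "created_at": iso_time, "language": "ja"},
        separators=(",", ":"), ensure_ascii=False)
    tags = [["d", D_TAG]]
    return {
        "content": content,
        "created_at": created_at,
        "id": event_id(pubkey, created_at, BUZZ_KIND, tags, content),
        "kind": BUZZ_KIND,
        "pubkey": pubkey,
        "tags": tags,
        # "sig" (a BIP-340 Schnorr signature over the id) is omitted in this sketch
    }
```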

Advantages of this setup
There is no need for an API server to deliver analysis results to the browser. Such an API server would have to immediately return the latest cached results to newly connected browsers, and broadcast new results from the Analyzer to all connected browsers. These tasks are simple but require storage, which is a bit cumbersome. (The initial version was implemented this way.) NIP-33 solves this.

NIP-33: Parameterized Replaceable Events
"Replace the old event with the new one if the kind, the pubkey, and the first d tag are the same." In other words, the relay keeps and returns only the latest event. Very convenient. Moreover, redundancy can be achieved simply by publishing to multiple relays.
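The replacement rule can be sketched from the relay's side as below. The key (kind, pubkey, first d tag) follows the quote above, and NIP-33 reserves kinds 30000-39999 for these events (38225 falls in that range). Two details come from the NIP texts rather than the talk: a missing d tag counts as an empty string (NIP-33), and on equal `created_at` the event with the lower id is kept (current NIP-01). Class and method names are hypothetical.

```python
from typing import Optional


def first_d_value(event: dict) -> str:
    """Value of the first "d" tag; a missing d tag (or one without a value) counts as ""."""
    for tag in event.get("tags", []):
        if tag and tag[0] == "d":
            return tag[1] if len(tag) > 1 else ""
    return ""


def replaces(new: dict, old: dict) -> bool:
    """Newer created_at wins; on a tie, the lower id is kept."""
    if new["created_at"] != old["created_at"]:
        return new["created_at"] > old["created_at"]
    return new["id"] < old["id"]


class ParameterizedReplaceableStore:
    """Keeps only the latest event per (kind, pubkey, first d tag) for kinds 30000-39999."""

    def __init__(self) -> None:
        self.latest: dict = {}

    def add(self, event: dict) -> bool:
        """Store the event if it replaces the current one; return whether it was stored."""
        if not 30000 <= event["kind"] < 40000:
            raise ValueError("not a parameterized replaceable event kind")
        key = (event["kind"], event["pubkey"], first_d_value(event))
        current = self.latest.get(key)
        if current is None or replaces(event, current):
            self.latest[key] = event
            return True
        return False

    def get(self, kind: int, pubkey: str, d: str) -> Optional[dict]:
        return self.latest.get((kind, pubkey, d))
```

Because every relay applies the same rule independently, publishing the same event to several relays gives each of them the same "latest" state.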

Applications
Anyone can create a site that receives and displays buzz phrases. (The specifications may change without notice...) If someone builds another buzz-detection engine that implements the same protocol, users may be able to switch between engines; they can be distinguished by pubkey, for example. What is a "trend", anyway? Anyone can publish one. Events for communication between people are the mainstream, but it would be interesting to see a world where services interact with each other via Nostr.
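A minimal client-side sketch: subscribe to one engine's buzz events with a NIP-01 filter on kind, author, and d tag, and pull the phrase texts out of incoming EVENT messages. Choosing the author pubkey is how a client would pick between engines. The payload layout is taken from the example event and, as noted, may change without notice; the WebSocket transport and signature verification are omitted, and the function names are hypothetical.

```python
import json

BUZZ_KIND = 38225


def buzz_subscription(sub_id: str, engine_pubkey: str, d_tag: str = "buzz-phrases:ja") -> str:
    """NIP-01 REQ message selecting one engine's buzz events by kind, author and d tag."""
    flt = {"kinds": [BUZZ_KIND], "authors": [engine_pubkey], "#d": [d_tag]}
    return json.dumps(["REQ", sub_id, flt], ensure_ascii=False)


def phrases_from_message(raw: str) -> list[str]:
    """Extract phrase texts from a relay EVENT message; other messages yield []."""
    msg = json.loads(raw)
    if not (isinstance(msg, list) and len(msg) >= 3 and msg[0] == "EVENT"):
        return []
    event = msg[2]
    if event.get("kind") != BUZZ_KIND:
        return []
    payload = json.loads(event["content"])  # analysis result stored as JSON in content
    return [p["text"] for p in payload.get("phrases", [])]
```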

Operating a dedicated relay
With nostr-rs-relay, you can set one up quickly. All I did was list the Analyzer's pubkey in the [authorization] section of config.toml. This makes it easy to treat the relay as exclusive to my service: a real-time API server, out of the box.
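For reference, a fragment of what that can look like. In the sample config.toml distributed with nostr-rs-relay, the list under [authorization] is called `pubkey_whitelist`; check the sample for the version you run. The pubkey below is a placeholder, not the real Analyzer key.

```toml
[authorization]
# Only events signed by these pubkeys (hex) are accepted for publishing.
pubkey_whitelist = [
  "<Analyzer's pubkey in hex>",
]
```

This restricts who may publish, not who may read, which is what lets browsers subscribe to the relay directly.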

Conclusion
Created nostrbuzzs, a service that analyzes and displays buzz on Nostr. It is built with a Nostr-style architecture that makes use of a relay.

Yoji Shidara
