How nostrbuzzs works

2023-03-10 Nostr Meetup #1

This is an English translation of the material I presented at the "Nostr勉強会 #1" meetup.

Yoji Shidara

March 10, 2023

Transcript

  1. @darashi

    Author of buzztter.com (no longer in operation), a service that captured buzz on Twitter. Created nostrbuzzs, a service that captures trending phrases on Nostr (from notes written in Japanese). (NEW! I made Nostr Village Broadcasting Station, a service that streams kind 1 notes as speech via HLS.)
  2. System overview

    [Architecture diagram. Labels: deno deploy, fly.io, nostrverse, kind:1, recent notes, phrase frequency, kind:38225, static assets, nostrbuzzs.deno.dev, indexer, analyzer, ElasticSearch, nostr-rs-relay, relay.damus.io, nos.lol, web-browser.]
  3. Indexer

    Connects to relays and collects kind 1 messages. Guesses the language of each note's content and indexes the notes predicted to be Japanese into Elasticsearch. Language detection uses https://github.com/pemistahl/lingua-go.
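
A minimal sketch of the Indexer's filtering step, not the actual nostrbuzzs code. It assumes a NIP-01 `REQ` subscription for kind 1 and swaps lingua-go (a Go library) for a crude kana check, which is a far weaker technique than real language detection. The function names are hypothetical.

```python
import json


def build_req(subscription_id: str) -> str:
    """Serialize a NIP-01 REQ message asking a relay for kind 1 notes."""
    return json.dumps(["REQ", subscription_id, {"kinds": [1]}])


def looks_japanese(text: str) -> bool:
    """Crude stand-in for a language detector (assumption, not lingua-go):
    true if the text contains any hiragana or katakana (U+3040 to U+30FF)."""
    return any("\u3040" <= ch <= "\u30ff" for ch in text)


def select_japanese_notes(events: list[dict]) -> list[dict]:
    """Keep only kind 1 events whose content looks Japanese."""
    return [
        e for e in events
        if e.get("kind") == 1 and looks_japanese(e.get("content", ""))
    ]
```

The selected notes would then be sent to Elasticsearch; that step is omitted here because the index layout is not described in the slides.
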
  4. Elasticsearch

    Uses Elasticsearch (https://www.elastic.co/) as is. Stores the data sent from the Indexer and provides the information the Analyzer needs. The data is indexed as N-grams, without morphological tokenization, for later analysis.
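
To illustrate what N-gram indexing means here, the sketch below splits text into overlapping character N-grams. The slides do not give the N-gram size or the Elasticsearch analyzer settings, so `n = 2` and the function name are assumptions.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return every run of n consecutive characters in text.

    Text shorter than n yields an empty list.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

Because no morphological dictionary is involved, any substring can later be looked up as a run of adjacent N-grams, including new words that a tokenizer would not know.
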
  5. Analyzer

    Fetches data from Elasticsearch and computes buzz phrases. Sends the analysis results to the relay as Parameterized Replaceable Events (NIP-33).
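
A rough sketch of how such an event could be assembled before signing. The kind (38225) and the `d` tag format follow the example shown elsewhere in this deck; the id is computed as NIP-01 describes (SHA-256 over the serialized array). Signing is omitted, the content carries only `phrases` and `language`, and the function name is hypothetical. Python's `json.dumps` escapes rare control characters slightly differently from NIP-01, so treat this as an approximation.

```python
import hashlib
import json

KIND_BUZZ_PHRASES = 38225  # kind used by nostrbuzzs, per the slides


def build_unsigned_event(pubkey: str, created_at: int,
                         phrases: list[str], language: str = "ja") -> dict:
    """Build a parameterized replaceable event without the 'sig' field."""
    content = json.dumps(
        {"phrases": [{"text": p} for p in phrases], "language": language},
        ensure_ascii=False,
    )
    tags = [["d", f"buzz-phrases:{language}"]]
    # NIP-01: the id is the SHA-256 of this array, serialized without whitespace.
    serialized = json.dumps(
        [0, pubkey, created_at, KIND_BUZZ_PHRASES, tags, content],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    event_id = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    return {
        "id": event_id,
        "pubkey": pubkey,
        "created_at": created_at,
        "kind": KIND_BUZZ_PHRASES,
        "tags": tags,
        "content": content,
    }
```
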
  6. Web UI

    An SPA written with Fresh (https://fresh.deno.dev/). For the initial version, I chose deno deploy because the app included an API server. It is currently just a static site.
  7. Analyzer 1/2 -- Frequent Phrase Extraction

    1. Retrieve the notes from the last 2 hours from Elasticsearch.
    2. Calculate the similarity between notes using SimHash [1] (implementation: https://crates.io/crates/simhash) and, among notes with high similarity, delete all but one.
    3. Parse the notes into morphemes using Sudachi.
    4. Use PrefixSpan [2] (custom implementation) to extract frequent phrases.

    [1] Charikar, Moses S. "Similarity estimation techniques from rounding algorithms." Proceedings of the thirty-fourth annual ACM symposium on Theory of computing. 2002.
    [2] Han, Jiawei, et al. "PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth." Proceedings of the 17th international conference on data engineering. IEEE, 2001.
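
Steps 2 and 4 can be sketched as follows. This is not the Analyzer's code: it assumes notes are already split into tokens (the Sudachi step is skipped), uses SHA-256 as the token hash, and picks a Hamming-distance threshold of 3, none of which comes from the slides. The PrefixSpan shown is the textbook single-item version, which allows gaps between items; whether the custom implementation restricts patterns to contiguous phrases is not stated.

```python
import hashlib
from collections import defaultdict

BITS = 64


def simhash(tokens: list[str]) -> int:
    """Charikar-style fingerprint: a per-bit majority vote over token hashes."""
    votes = [0] * BITS
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def dedupe(notes: list[list[str]], max_distance: int = 3) -> list[list[str]]:
    """Keep the first note of every group of near-duplicates."""
    kept, fingerprints = [], []
    for tokens in notes:
        fp = simhash(tokens)
        if all(hamming(fp, seen) > max_distance for seen in fingerprints):
            kept.append(tokens)
            fingerprints.append(fp)
    return kept


def prefixspan(sequences: list[list[str]],
               min_support: int) -> list[tuple[list[str], int]]:
    """Return (pattern, support) for patterns found in >= min_support sequences."""
    results = []

    def grow(prefix, projected):
        # projected: (sequence index, offset) pairs marking the suffix after prefix
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seen = set()
            for pos in range(start, len(sequences[seq_id])):
                item = sequences[seq_id][pos]
                if item not in seen:  # count each sequence once per item
                    seen.add(item)
                    occurrences[item].append((seq_id, pos + 1))
        for item, next_projected in sorted(occurrences.items()):
            if len(next_projected) >= min_support:
                pattern = prefix + [item]
                results.append((pattern, len(next_projected)))
                grow(pattern, next_projected)

    grow([], [(i, 0) for i in range(len(sequences))])
    return results
```

In this sketch, deduplication runs first so that a note reposted many times does not inflate the support counts that PrefixSpan reports.
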
  8. Analyzer 2/2 -- Score Calculation

    5. Normalize the frequent phrases (case folding and NFKC) and group phrases that normalize to the same text. The most frequent spelling becomes the representative.
    6. If the same pubkey mentions a phrase in multiple notes, use the latest note as the representative.
    7. Weight each occurrence by the difference between the start time of the analysis and the time of the occurrence. More recent occurrences get a higher weight (and the old end of the analysis window is smoothed out).
    8. Query Elasticsearch for how often the phrase occurred in the past. Give a high score to phrases with many recent occurrences and few past occurrences.
    9. Send the analysis results to the relay.
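
Steps 5 to 8 might look roughly like this. Normalization and grouping follow the description directly. The slides do not give the weighting or scoring formulas, so the linear decay in `recency_weight` and the ratio in `score` are purely illustrative assumptions, as are all function names.

```python
import unicodedata
from collections import Counter, defaultdict


def normalize(phrase: str) -> str:
    """Apply NFKC and fold case so that variant spellings compare equal."""
    return unicodedata.normalize("NFKC", phrase).casefold()


def group_spellings(occurrences: list[str]) -> dict[str, tuple[str, int]]:
    """Map each normalized phrase to (most frequent spelling, total count)."""
    spellings = defaultdict(Counter)
    for phrase in occurrences:
        spellings[normalize(phrase)][phrase] += 1
    return {
        key: (counter.most_common(1)[0][0], sum(counter.values()))
        for key, counter in spellings.items()
    }


def latest_per_pubkey(mentions: list[dict]) -> list[dict]:
    """Keep one mention per pubkey: the one with the greatest created_at."""
    latest = {}
    for mention in mentions:
        current = latest.get(mention["pubkey"])
        if current is None or mention["created_at"] > current["created_at"]:
            latest[mention["pubkey"]] = mention
    return list(latest.values())


def recency_weight(age_seconds: float, window_seconds: float = 7200.0) -> float:
    """Assumed weighting: 1.0 for a brand-new note, falling linearly to 0.0
    at the old end of the 2-hour window."""
    return max(0.0, 1.0 - age_seconds / window_seconds)


def score(recent_weight_sum: float, past_count: int) -> float:
    """Assumed scoring: many recent and few past occurrences score high."""
    return recent_weight_sum / (past_count + 1)
```
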
  9. Interesting points from Nostr's point of view

    Passing the analysis results using NIP-33: https://github.com/nostr-protocol/nips/blob/master/33.md
  10. An example

    The analysis result is put in the content as JSON. The kind is 38225 and the event has the d tag buzz-phrases:ja. It is sent to my own relay.

        ❯ echo '["REQ", "_", {"kinds": [38225] }]' | nostcat --stream wss://example.com | jq .
        [
          "EVENT",
          "_",
          {
            "content": "{\"phrases\":[{\"text\":\"bluesky といえば\"}, ...snip... ,{\"text\":\"Windows100\"}],\"created_at\":\"2023-02-26T08:36:14.559572950+00:00\",\"language\":\"ja\"}",
            "created_at": 1677400574,
            "id": "789cdedc73c7472428c40a39ada18177fa44a1996c30783f0ec0b194ca676cb8",
            "kind": 38225,
            "pubkey": "fe295340106bb7b8f5b08f8b7c22000862abc9731dbb86f2f141301e13b4d024",
            "sig": "f5c6aaf310cd0b6c631bdaf1f909563b1ebf7f45c5db8cced25ab3fced93fba6d8337a03277f23e2e58a1e7aaf1c94cf9996daa58d97a6e80bbde64829fda4f9",
            "tags": [
              [
                "d",
                "buzz-phrases:ja"
              ]
            ]
          }
        ]
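
On the receiving side, a client could handle this event along the following lines. This is a sketch rather than the Web UI's code: the function names are hypothetical, and the `#d` filter narrows the request to one `d` tag value, which is an addition not used in the command above.

```python
import json


def build_buzz_req(subscription_id: str, language: str = "ja") -> str:
    """REQ for the buzz-phrase event of one language."""
    return json.dumps([
        "REQ",
        subscription_id,
        {"kinds": [38225], "#d": [f"buzz-phrases:{language}"]},
    ])


def extract_phrases(relay_message: str) -> list[str]:
    """Return the phrase texts from an EVENT message, or [] for other messages."""
    message = json.loads(relay_message)
    if len(message) != 3 or message[0] != "EVENT":
        return []
    # The event's content field is itself a JSON document stored as a string.
    payload = json.loads(message[2]["content"])
    return [phrase["text"] for phrase in payload["phrases"]]
```
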
  11. Advantages of this setup

    There is no need for an API server to return analysis results to the browser. Such an API server would have to immediately return the latest cached results to newly connected browsers and, when new results come from the Analyzer, broadcast them to all connected browsers. These processes are simple but require storage, which can be a bit cumbersome. (The initial version was implemented this way.) NIP-33 resolves this.
  12. NIP-33: Parameterized Replaceable Events

    "Replace the old event with a new one if the kind, pubkey, and the first d tag are the same." In other words, the relay keeps and returns only the latest event. Very convenient. Moreover, redundancy can be achieved simply by publishing to multiple relays.
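
The replacement rule can be modeled in a few lines. This toy in-memory store is only an illustration of the behavior described above, not how nostr-rs-relay is implemented; the class and function names are hypothetical, and signature checks and the kind-range check (NIP-33 covers kinds 30000 to 39999) are left out.

```python
def replace_key(event: dict) -> tuple:
    """(kind, pubkey, value of the first d tag); no d tag counts as ""."""
    d_value = next(
        (tag[1] if len(tag) > 1 else ""
         for tag in event["tags"] if tag and tag[0] == "d"),
        "",
    )
    return (event["kind"], event["pubkey"], d_value)


class LatestOnlyStore:
    """Keeps only the newest event per (kind, pubkey, d) key."""

    def __init__(self) -> None:
        self._events = {}

    def publish(self, event: dict) -> bool:
        """Store the event if it is newer than the one it would replace."""
        key = replace_key(event)
        current = self._events.get(key)
        if current is None or event["created_at"] > current["created_at"]:
            self._events[key] = event
            return True
        return False

    def query(self, kind: int) -> list[dict]:
        """Return the stored events of one kind."""
        return [e for e in self._events.values() if e["kind"] == kind]
```

A newly connected browser that queries such a store gets the latest analysis result straight away, which is the caching job the API server used to do.
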
  13. Applications

    Anyone can create a site that receives and displays buzz phrases. (The specifications may change without notice...) If someone implements another buzz detection engine that speaks the same protocol, users may be able to switch between engines. They can be distinguished by pubkey, for example. What is a "trend"? Anyone can publish one. Events for communication between people are the mainstream today, but a world where services interact with each other via Nostr would be interesting.
  14. Operating a dedicated relay

    With nostr-rs-relay, you can set one up quickly. All I did was list the Analyzer's pubkey in the [authorization] section of config.toml. It is easy to think of it as a relay exclusively for my service: a real-time API server that works out of the box.
  15. Conclusion

    Created nostrbuzzs, a service that analyzes and displays Nostr buzzes. Built with a Nostr-style architecture that makes use of relays.