How nostrbuzzs works

2023-03-10 Nostr Meetup #1

This is an English translation of the material I presented at the "Nostr勉強会 #1" (Nostr study meetup #1).

Yoji Shidara

March 10, 2023

Transcript

  1. How nostrbuzzs works
    npub1q7qyk7rvdga5qzmmyrvmlj29qd0n45snmfuhkrzsj4rk0sm4c4psvqwt9c
    2023-03-10 Nostr Meetup #1

  2. @darashi
    Author of buzztter.com (no longer in operation), a service that captured buzz on
    Twitter.
    Created nostrbuzzs, a service that captures trending phrases on Nostr (from
    Japanese-language notes).
    (NEW! I made Nostr Village Broadcasting Station, a service that converts kind 1 notes
    to speech and streams it via HLS.)

  3. nostrbuzzs
    https://nostrbuzzs.deno.dev/

  4. System overview
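    [Diagram: relays in the nostrverse (relay.damus.io, nos.lol) send kind:1 events to the
    indexer. On fly.io, the indexer stores kind:1 events in Elasticsearch, the analyzer reads
    recent notes and phrase frequency from Elasticsearch, and the analyzer publishes
    kind:38225 events to nostr-rs-relay. The web browser loads static assets from
    nostrbuzzs.deno.dev (deno deploy) and receives kind:38225 events from nostr-rs-relay.]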

  5. Indexer
    Connects to relays and collects kind 1 events.
    Guesses the language of each note's content and indexes notes predicted to be in
    Japanese into Elasticsearch.
    Uses https://github.com/pemistahl/lingua-go for language detection.
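
    The filtering step above could be sketched roughly as follows. This is not the
    actual indexer: the real service uses lingua-go for language detection, while
    this sketch swaps in a crude kana-ratio heuristic. The function names and the
    0.2 threshold are assumptions made for illustration.

```python
import json


def looks_japanese(text: str, threshold: float = 0.2) -> bool:
    """Crude stand-in for a language detector (NOT lingua-go).

    Returns True when at least `threshold` of the non-space characters
    are hiragana or katakana (U+3040 to U+30FF).
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    kana = sum(1 for c in chars if "\u3040" <= c <= "\u30ff")
    return kana / len(chars) >= threshold


def select_for_indexing(relay_messages):
    """Yield kind 1 events whose content looks Japanese.

    `relay_messages` is an iterable of raw JSON strings as sent by a relay,
    e.g. '["EVENT", "<subscription id>", {...event...}]'.
    """
    for raw in relay_messages:
        msg = json.loads(raw)
        if len(msg) == 3 and msg[0] == "EVENT":
            event = msg[2]
            if event.get("kind") == 1 and looks_japanese(event.get("content", "")):
                yield event
```

    Events that pass the filter would then be sent to Elasticsearch; that part is
    omitted here.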

  6. Elasticsearch
    Uses Elasticsearch (https://www.elastic.co/) as is.
    Stores the data sent from the Indexer.
    Provides the information the Analyzer needs for its analysis.
    Indexes the data as N-grams, without morphological tokenization, to support later
    analysis.
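
    To illustrate what indexing as N-grams means, here is a minimal sketch of
    character N-gram splitting. The deck does not state the N-gram size or the
    Elasticsearch settings, so n = 2 and the function name are assumptions.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping character N-grams, ignoring word boundaries."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Every substring of length n becomes a searchable term, so a phrase can be
# looked up later without deciding on word boundaries at index time.
print(char_ngrams("といえば"))  # ['とい', 'いえ', 'えば']
```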

  7. Analyzer
    Fetches data from Elasticsearch and computes buzz phrases.
    Sends the analysis results to the relay as Parameterized Replaceable Events (NIP-33).

  8. Web UI
    An SPA.
    Written with Fresh (https://fresh.deno.dev/).
    For the initial version, I chose deno deploy because the app included an API
    server. It is now just a static site.

  9. More details about Analyzer

  10. Analyzer 1/2 -- Frequent Phrase Extraction
    1. Retrieve notes from the last 2 hours from Elasticsearch.
    2. Calculate the similarity between notes using SimHash [1] (implementation:
    https://crates.io/crates/simhash) and remove highly similar notes, keeping only one
    of each group.
    3. Parse notes into morphemes using Sudachi.
    4. Use PrefixSpan [2] (custom implementation) to extract frequent phrases.
    [1] Charikar, Moses S. "Similarity estimation techniques from rounding algorithms."
    Proceedings of the thirty-fourth annual ACM symposium on Theory of computing. 2002.
    [2] Han, Jiawei, et al. "PrefixSpan: Mining sequential patterns efficiently by
    prefix-projected pattern growth." Proceedings of the 17th international conference on
    data engineering. IEEE, 2001.
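
    Steps 2 and 4 could be sketched as below. This is an illustration only, not the
    analyzer's code (which uses the Rust simhash crate and a custom PrefixSpan). The
    feature choice (character bigrams), the MD5-based hash, the distance threshold of
    3, and all function names are assumptions. The PrefixSpan part follows the
    textbook algorithm for single-item sequences, which allows gaps between items;
    the deck does not say whether the custom implementation does.

```python
import hashlib


def bigrams(text):
    """Character bigrams used as SimHash features (an assumed feature choice)."""
    return [text[i:i + 2] for i in range(len(text) - 1)] or [text]


def simhash(features, bits=64):
    """Charikar-style fingerprint: each feature votes +1 or -1 on every bit."""
    votes = [0] * bits
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(notes, max_distance=3):
    """Keep a note only if no already-kept note is within max_distance bits."""
    kept = []  # list of (fingerprint, note)
    for note in notes:
        fp = simhash(bigrams(note))
        if all(hamming(fp, other) > max_distance for other, _ in kept):
            kept.append((fp, note))
    return [note for _, note in kept]


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for all frequent sequential patterns."""
    results = {}

    def mine(prefix, projected):
        # projected: (sequence index, start offset) pairs; the suffix from
        # `start` is what remains of each sequence after matching `prefix`.
        counts = {}
        for seq_idx, start in projected:
            for item in set(sequences[seq_idx][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            next_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((seq_idx, pos + 1))
                        break
            mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results
```

    In the pipeline described above, each sequence passed to prefixspan would be the
    morpheme list of one note that survived deduplication.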

  11. Analyzer 2/2 -- Score Calculation
    5. Normalize frequent phrases (case folding and NFKC) and group phrases that
    normalize to the same text. The most frequent spelling is used as the
    representative.
    6. If the same pubkey mentions a phrase more than once, the latest note is used as
    the representative.
    7. Weight each occurrence by the difference between the start time of the analysis
    and the time of the occurrence. More recent occurrences get a higher weight, and the
    old end of the analysis window is smoothed out.
    8. Query Elasticsearch for how often the phrase occurred in the past. Give a high
    score to phrases with many recent occurrences and few past occurrences.
    9. Send the analysis results to the relay.
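
    Steps 5 to 8 might look roughly like this. The deck gives no formulas, so the
    linear decay, the 2-hour window constant, the scoring formula, and every function
    name below are assumptions chosen only to show the shape of the computation.

```python
import unicodedata
from collections import Counter, defaultdict


def normalize(phrase):
    """Step 5: NFKC normalization followed by case folding."""
    return unicodedata.normalize("NFKC", phrase).casefold()


def group_spellings(phrases):
    """Step 5: map each normalized form to its most frequent raw spelling."""
    groups = defaultdict(Counter)
    for phrase in phrases:
        groups[normalize(phrase)][phrase] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in groups.items()}


def latest_per_pubkey(mentions):
    """Step 6: keep only the latest created_at per pubkey.

    `mentions` is an iterable of (pubkey, created_at) pairs for one phrase.
    """
    latest = {}
    for pubkey, created_at in mentions:
        latest[pubkey] = max(created_at, latest.get(pubkey, created_at))
    return latest


def recency_weight(age_seconds, window_seconds=7200.0):
    """Step 7 (assumed shape): 1.0 for a brand-new note, fading linearly to 0.0
    at the old end of the window so old notes drop out smoothly."""
    return max(0.0, 1.0 - age_seconds / window_seconds)


def score(ages_seconds, past_count):
    """Step 8 (assumed formula): weighted recent count over 1 + past count."""
    recent = sum(recency_weight(age) for age in ages_seconds)
    return recent / (1.0 + past_count)
```

    With these assumptions, a phrase seen twice in the window (just now and one hour
    ago) scores 1.5 if it never appeared before and 0.5 if it appeared twice in the
    past.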

  12. Interesting points from Nostr's point of view
    Passing the results of analysis using NIP-33
    https://github.com/nostr-protocol/nips/blob/master/33.md

  13. An example:
    The analysis result is stored as JSON in the content.
    The kind is 38225, and the event has the d tag buzz-phrases:ja.
    The event is sent to my own relay.
    ❯ echo '["REQ", "_", {"kinds": [38225] }]' | nostcat --stream wss://example.com | jq .
    [
      "EVENT",
      "_",
      {
        "content": "{\"phrases\":[{\"text\":\"bluesky
        といえば\"},
        ...snip...
        ,{\"text\":\"Windows100\"}],\"created_at\":\"2023-02-26T08:36:14.559572950+00:00\",\"language\":\"ja\"}",
        "created_at": 1677400574,
        "id": "789cdedc73c7472428c40a39ada18177fa44a1996c30783f0ec0b194ca676cb8",
        "kind": 38225,
        "pubkey": "fe295340106bb7b8f5b08f8b7c22000862abc9731dbb86f2f141301e13b4d024",
        "sig": "f5c6aaf310cd0b6c631bdaf1f909563b1ebf7f45c5db8cced25ab3fced93fba6d8337a03277f23e2e58a1e7aaf1c94cf9996daa58d97a6e80bbde64829fda4f9",
        "tags": [
          [
            "d",
            "buzz-phrases:ja"
          ]
        ]
      }
    ]
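
    Because the content field is itself a JSON string, a client has to decode twice.
    A minimal sketch of that, assuming the payload shape shown in the example above
    (a "phrases" list of objects with a "text" field); the function name and the
    shortened sample event are made up for illustration.

```python
import json


def extract_phrases(relay_message: str) -> list[str]:
    """Decode an ["EVENT", sub_id, event] message and return the phrase texts."""
    msg = json.loads(relay_message)
    if len(msg) != 3 or msg[0] != "EVENT" or msg[2].get("kind") != 38225:
        raise ValueError("not a kind 38225 EVENT message")
    payload = json.loads(msg[2]["content"])  # content is a JSON string
    return [phrase["text"] for phrase in payload["phrases"]]


# Shortened sample event (id, pubkey, and sig omitted for brevity).
sample_content = json.dumps({"phrases": [{"text": "Windows100"}], "language": "ja"})
sample_message = json.dumps(
    ["EVENT", "_", {"kind": 38225, "content": sample_content,
                    "tags": [["d", "buzz-phrases:ja"]]}]
)
print(extract_phrases(sample_message))  # ['Windows100']
```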

  14. Advantages of this setup
    There is no need for an API server to return analysis results to the browser.
    Such an API server would have to:
      Immediately return the latest cached results to newly connected browsers.
      Broadcast new results from the Analyzer to all connected browsers.
    These tasks are simple but require storage, which can be a bit cumbersome.
    (The initial version was implemented this way.)
    NIP-33 removes the need for it.

  15. NIP-33: Parameterized Replaceable Events
    "Replace the old event with a new one if the kind, pubkey, and the first d tag are
    the same"
    In other words, the Relay only keeps and returns the latest event.
    Very convenient.
    Moreover, redundant configurations can be achieved simply by publishing to multiple
    relays.
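
    The replacement rule quoted above can be pictured as a store keyed by (kind,
    pubkey, d tag). This is a toy model of relay behavior, not code from any relay;
    the class name is made up, and ignoring events with an equal created_at is a
    simplifying assumption.

```python
def d_tag(event):
    """Return the value of the first d tag, or "" if there is none."""
    for tag in event.get("tags", []):
        if len(tag) >= 2 and tag[0] == "d":
            return tag[1]
    return ""


class ReplaceableStore:
    """Keeps only the newest event per (kind, pubkey, first d tag)."""

    def __init__(self):
        self._latest = {}

    def put(self, event) -> bool:
        """Store the event if it is newer than the current one; return True if stored."""
        key = (event["kind"], event["pubkey"], d_tag(event))
        current = self._latest.get(key)
        if current is not None and current["created_at"] >= event["created_at"]:
            return False  # older or same-age event: keep what we have
        self._latest[key] = event
        return True

    def query(self, kind):
        """Return the stored events of the given kind."""
        return [e for (k, _, _), e in self._latest.items() if k == kind]
```

    A newly connected browser that queries kind 38225 therefore gets exactly one
    event per analyzer pubkey and d tag, which is the cached latest result.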

  16. Applications
    Anyone can create a site that receives and displays buzz phrases.
    The specifications may change without notice...
    If someone builds another buzz detection engine that implements the same
    protocol, users may be able to switch between them.
    You can distinguish between them by pubkey, for example.
    What is a "trend"? Anyone can publish one.
    Events for communication between people are the mainstream today, but a world where
    services interact with each other via Nostr would be interesting to see.
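
    Choosing an engine by pubkey could be done in the subscription filter itself. A
    minimal sketch, assuming a client that builds its own REQ message; the function
    name is hypothetical, and the d tag default is taken from the example event shown
    earlier.

```python
import json


def buzz_subscription(sub_id, author_pubkey, d_value="buzz-phrases:ja"):
    """Build a REQ message for buzz phrases published by one specific engine."""
    filt = {
        "kinds": [38225],            # buzz-phrase events
        "authors": [author_pubkey],  # which engine to trust
        "#d": [d_value],             # which phrase list (here: Japanese)
    }
    return json.dumps(["REQ", sub_id, filt])


# Switching engines means swapping the pubkey; the rest of the client is unchanged.
print(buzz_subscription(
    "buzz",
    "fe295340106bb7b8f5b08f8b7c22000862abc9731dbb86f2f141301e13b4d024",
))
```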

  17. Operating a dedicated relay
    With nostr-rs-relay, it is quick to set up. All I did was list the Analyzer's
    pubkey in the [authorization] section of config.toml.
    It is simple if you treat the relay as dedicated to your own service.
    It works as a real-time API server out of the box.
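
    For reference, the relevant part of config.toml might look like the fragment
    below. The key name pubkey_whitelist follows the sample configuration shipped with
    nostr-rs-relay as far as I know; check it against the version you run. The pubkey
    value is a placeholder, not the Analyzer's real key.

```toml
[authorization]
# Only events signed by these pubkeys (hex) are accepted for publishing.
pubkey_whitelist = [
  "<analyzer pubkey in hex>",
]
```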

  18. Conclusion
    Created nostrbuzzs, a service that analyzes and displays buzz on Nostr.
    Built with a Nostr-style architecture that makes use of a relay.
