that captures the buzz on Twitter. I created nostrbuzzs, a service that captures trending phrases on Nostr (from notes written in Japanese). (NEW! I made Nostr Village Broadcasting Station, a kind 1 to speech streaming service over HLS.)
[Architecture diagram. Labels: indexer, analyzer, Elasticsearch, nostr-rs-relay, relay.damus.io, nos.lol, static assets (nostrbuzzs.deno.dev), web browser, phrase frequency, kind:38225]
Indexer: Connects to relays and collects kind 1 messages. Guesses the language of each note's content and indexes the notes predicted to be Japanese into Elasticsearch. https://github.com/pemistahl/lingua-go is used for language detection.
Elasticsearch: Uses https://www.elastic.co/ as is. Stores the data sent from the Indexer and provides the information the Analyzer needs for its analysis. Indexes the data as N-grams, without tokenizing it into morphemes, for later analysis.
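To illustrate what indexing as N-grams means, here is a minimal sketch in Python. The helper name `char_ngrams` and the choice of n = 2 are assumptions for illustration only; the slides do not state the N used, and in the real service the splitting is presumably done inside Elasticsearch rather than in application code.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return all overlapping character n-grams of `text`.

    No dictionary or morphological analysis is involved: the text is
    simply cut into every run of n consecutive characters.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical sample text; n = 2 is an assumed value.
print(char_ngrams("東京タワー", 2))  # ['東京', '京タ', 'タワ', 'ワー']
```

Because no word boundaries are needed, new slang and coined words remain searchable, which suits a buzz detector.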
Analyzer: Fetches data from Elasticsearch and computes buzz phrases. Sends the analysis results to the relay as Parameterized Replaceable Events (NIP-33).
Static assets (nostrbuzzs.deno.dev): An SPA written in https://fresh.deno.dev/. For the initial version, I chose Deno Deploy because the app included an API server. It is currently just a static site.
Elasticsearch for the last 2 hours.
2. Calculate the similarity between notes using SimHash [1] (implementation: https://crates.io/crates/simhash) and delete notes with high similarity, leaving only one.
3. Parse notes into morphemes using Sudachi.
4. Use PrefixSpan [2] (custom implementation) to extract frequent phrases.

[1] Charikar, Moses S. "Similarity estimation techniques from rounding algorithms." Proceedings of the thirty-fourth annual ACM symposium on Theory of computing. 2002.
[2] Han, Jiawei, et al. "PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth." Proceedings of the 17th international conference on data engineering. IEEE, 2001.
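The near-duplicate removal in step 2 can be sketched roughly as follows. This is a toy SimHash in Python, not the Rust crate linked above; the character-bigram features, the MD5-based 64-bit hash, and the `max_distance` threshold of 3 are all assumptions chosen for illustration.

```python
import hashlib


def simhash(text: str, n: int = 2, bits: int = 64) -> int:
    """Compute a SimHash fingerprint from character n-gram features."""
    counts = [0] * bits
    # Texts shorter than n contribute themselves as a single feature.
    features = [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]
    for feat in features:
        digest = hashlib.md5(feat.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit feature hash
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    # Each fingerprint bit is the majority vote of the feature hashes.
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def dedupe(notes: list[str], max_distance: int = 3) -> list[str]:
    """Keep the first note of every group of near-duplicates."""
    kept: list[tuple[str, int]] = []
    for note in notes:
        fp = simhash(note)
        if all(hamming(fp, other) > max_distance for _, other in kept):
            kept.append((note, fp))
    return [note for note, _ in kept]


# Hypothetical sample notes: the exact repeat is dropped.
print(dedupe(["おはようございます", "おはようございます"]))
```

Similar texts share most features, so their fingerprints differ in only a few bits; this keeps copy-pasted or spammed notes from inflating phrase counts.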
and NFKC), and group together phrases that result in the same text. The most frequently occurring spelling is used as the representative.
6. If the same pubkey mentions a phrase multiple times, the latest note is used as the representative.
7. The difference between the start time of the analysis and the time of each phrase occurrence is used to weight the occurrence. More recent occurrences get a higher weight (and the old end of the analysis window is smoothed out).
8. Query Elasticsearch to determine how often the phrase occurred in the past. Give a high score to phrases that have many recent occurrences and few past occurrences.
9. Send the analysis results to the relay.
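The grouping by normalized text can be sketched as below. This Python sketch applies NFKC only; the other normalization named in the original step is not reproduced here, and the function name `group_by_normal_form` and the sample phrases are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def group_by_normal_form(phrases: list[str]) -> dict[str, str]:
    """Map each NFKC-normalized phrase to its most frequent spelling."""
    groups: dict[str, Counter] = defaultdict(Counter)
    for phrase in phrases:
        key = unicodedata.normalize("NFKC", phrase)
        groups[key][phrase] += 1
    # The most common original spelling represents the whole group.
    return {key: spellings.most_common(1)[0][0]
            for key, spellings in groups.items()}


# Full-width and ASCII spellings collapse into one group.
print(group_by_normal_form(["Ｎｏｓｔｒ", "Nostr", "Nostr"]))  # {'Nostr': 'Nostr'}
```

Counting occurrences per group rather than per spelling keeps a phrase from being split into several weaker candidates merely because people type it with different character widths.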
API server to return analysis results to the browser. This API server does two things: for newly connected browsers, it immediately returns the latest cached results; when new results arrive from the Analyzer, it broadcasts them to all connected browsers. These processes are simple but require storage, which can be a bit cumbersome. (The initial version was implemented this way.) NIP-33 resolves this.
new one if the kind, pubkey, and the first d tag are the same." In other words, the relay keeps and returns only the latest event. Very convenient. Moreover, redundancy can be achieved simply by publishing to multiple relays.
buzz phrases. The specifications may change without notice... If someone builds another buzz detection engine that implements the same protocol, users may be able to switch between them. You can distinguish between them by pubkey, for example. What is a "trend"? Anyone can send out a trend. Events for communication between people are the mainstream, but it would be interesting to see a world where services interact with each other via Nostr.
quickly. All I did was list the Analyzer's pubkey in the [authorization] section of config.toml. It is easy to think of it as a relay exclusively for my service. Real-time API server for out-of-the-box use