Slide 1

Slide 1 text

Hierarchical Clustering in Improving Microblog Stream Summarization
Andrei Olariu
University of Bucharest, Faculty of Mathematics and Computer Science
CICLING 2013

Slide 2

Slide 2 text

Outline
1 Context
  Microblogging
  Previous Work
  Motivation
2 Our Summarizing System
  Approach Outline
  Message Clustering based on Events
  Hierarchical Event Analysis
  Summarization
3 Results
  Corpus
  Metrics
  Summarization Results
4 Conclusions and Future Work

Slide 4

Slide 4 text

What is Microblogging
microblogging: a form of blogging characterized by very short posts
microblogging platforms: Twitter, Tumblr, Facebook
Twitter's main highlights:
  over 500 million posts per day
  data is publicly accessible (unlike Facebook)
  posts are mainly text (unlike Tumblr, which is mostly images)
  posts are limited to 140 characters
  specific vocabulary (internet slang), abbreviations, misspelled words

Slide 7

Slide 7 text

What is Microblogging
Data on Twitter is organized as a stream (a sequence of posts).

Slide 9

Slide 9 text

Microblog Event Detection
detect the main topics in a stream

Slide 10

Slide 10 text

Microblog Event Detection
model an event based on a stream of related posts
cluster similar messages
detect words that experience an increased frequency

Slide 11

Slide 11 text

Multi-sentence Compression
multi-sentence compression: generate a short sentence that summarizes a group of related sentences
Example:
  The wife of a former U.S. president Bill Clinton Hillary Clinton visited China last Monday.
  Hillary Clinton wanted to visit China last month but postponed her plans till Monday last week.
  Hillary Clinton paid a visit to the People Republic of China on Monday.
  Last week the Secretary of State Ms. Clinton visited Chinese officials.

Slide 13

Slide 13 text

Multi-sentence Compression
The Multi-sentence Compression algorithm finds a path minimizing a cost function in a word graph.
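The path search can be illustrated with a minimal sketch. It rests on assumptions not stated in the slides: every lowercased word is one graph node, the cost of an edge is the inverse of how often the two words appear next to each other, and the cheapest path is found with Dijkstra's algorithm. The names `build_word_graph` and `compress` are hypothetical, and the published algorithm uses a more elaborate graph and cost function.

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence boundary markers


def build_word_graph(sentences):
    """Count how often each word directly follows another across all sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts


def compress(sentences):
    """Return the cheapest START-to-END path; edge cost is 1 / transition count (assumed)."""
    graph = build_word_graph(sentences)
    heap = [(0.0, [START])]
    settled = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            return " ".join(path[1:-1])
        if node in settled:
            continue
        settled.add(node)
        for nxt, count in graph[node].items():
            if nxt not in settled:
                heapq.heappush(heap, (cost + 1.0 / count, path + [nxt]))
    return ""
```

Frequent transitions are cheap, so the path tends to follow wording shared by many input sentences. Merging every occurrence of a word into one node can also produce ungrammatical paths, which is one reason real implementations add further constraints.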

Slide 14

Slide 14 text

Summarizing Microblogging Streams
approached in two ways:
  choose a post that best describes the input stream
  generate a short sentence based on the stream (the Phrase Reinforcement algorithm)
both approaches have been developed for streams of messages related to a given event

Slide 17

Slide 17 text

Phrase Reinforcement
Phrase Reinforcement: an algorithm that generates a summary starting from a given keyphrase and a stream of posts related to that keyphrase
Example:
  A tragedy: Ted Kennedy died today of cancer
  Ted Kennedy died today
  Ted Kennedy was a leader
  Ted Kennedy died at age 77

Slide 19

Slide 19 text

Phrase Reinforcement
The graph built starting from the keyphrase "Ted Kennedy" (figure not reproduced).
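A rough sketch of the idea, not the published algorithm: starting from the keyphrase, it greedily extends the summary to the right and to the left with the most common neighbouring word, as long as at least `min_count` posts agree. The function names, the greedy rule, and the `min_count` parameter are assumptions made for illustration; the original algorithm weights whole paths in the graph instead.

```python
from collections import Counter


def _extend(continuations, min_count):
    """Greedily follow the most common next word while enough posts support it."""
    chosen = []
    active = [c for c in continuations if c]
    while active:
        word, count = Counter(c[len(chosen)] for c in active).most_common(1)[0]
        if count < min_count:
            break
        chosen.append(word)
        # Keep only posts that agree with the path so far and still have words left.
        active = [c for c in active
                  if c[len(chosen) - 1] == word and len(c) > len(chosen)]
    return chosen


def phrase_reinforcement(posts, keyphrase, min_count=2):
    """Build a summary around `keyphrase` from the words that most posts share."""
    key = keyphrase.lower().split()
    lefts, rights = [], []
    for post in posts:
        tokens = post.lower().split()
        for i in range(len(tokens) - len(key) + 1):
            if tokens[i:i + len(key)] == key:
                lefts.append(tokens[:i][::-1])  # reversed: nearest word first
                rights.append(tokens[i + len(key):])
                break
    left = _extend(lefts, min_count)[::-1]
    right = _extend(rights, min_count)
    return " ".join(left + key + right)
```

On the four Ted Kennedy posts from the example, "died" follows the keyphrase in three posts and "today" follows "died" in two, so with `min_count=2` the sketch stops after "today".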

Slide 21

Slide 21 text

Motivation
All previous summarizing techniques require as input a stream of related posts:
  posts are filtered based on a given set of keywords
  keywords are manually selected to match a specific event/topic
Yet, most streams are not about a specific event/topic and suffer from a large amount of noise.
How can we approach summarizing any kind of stream?

Slide 23

Slide 23 text

Motivation
Contributions:
  developed a system for summarizing unfiltered streams
  adapted the Phrase Reinforcement algorithm in order to integrate it into our system

Slide 25

Slide 25 text

Approach Outline

Slide 28

Slide 28 text

Message Clustering based on Events
event detection:
  detect words that show an unusual increase in frequency
  cluster words based on how often they appear together in posts
  each cluster of words represents an event
message clustering:
  for each message, determine the word cluster most similar to it
  if the similarity is above a threshold, assign the message to the event; otherwise consider it noise
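The two stages can be sketched as follows. Everything specific here is an assumption made for illustration, not the system's actual design: the burst test (a smoothed frequency ratio against background posts), the word grouping (connected components over co-occurrence counts), the similarity measure (share of an event's words found in the post), and all thresholds and function names.

```python
from collections import Counter
from itertools import combinations


def bursty_words(current, background, ratio=3.0, min_count=2):
    """Words whose share of current posts is at least `ratio` times their background share."""
    cur = Counter(w for post in current for w in set(post.lower().split()))
    bg = Counter(w for post in background for w in set(post.lower().split()))
    n_cur, n_bg = max(len(current), 1), max(len(background), 1)
    return {w for w, c in cur.items()
            if c >= min_count and c / n_cur >= ratio * (bg[w] + 1) / (n_bg + 1)}


def cluster_words(current, words, min_cooccur=2):
    """Group bursty words that appear together in at least `min_cooccur` posts."""
    pair_counts = Counter()
    for post in current:
        present = sorted(words & set(post.lower().split()))
        pair_counts.update(combinations(present, 2))
    parent = {w: w for w in words}

    def find(w):  # union-find root lookup with path halving
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), c in pair_counts.items():
        if c >= min_cooccur:
            parent[find(a)] = find(b)
    clusters = {}
    for w in words:
        clusters.setdefault(find(w), set()).add(w)
    return list(clusters.values())


def assign_messages(current, clusters, threshold=0.3):
    """Label each post with the index of its most similar event, or None for noise."""
    labels = []
    for post in current:
        tokens = set(post.lower().split())
        best, best_sim = None, 0.0
        for i, cluster in enumerate(clusters):
            sim = len(tokens & cluster) / len(cluster)
            if sim > best_sim:
                best, best_sim = i, sim
        labels.append(best if best_sim >= threshold else None)
    return labels
```

Posts labelled `None` are the ones treated as noise and left out of the later summarization steps.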

Slide 34

Slide 34 text

Hierarchical Event Analysis
group very similar messages together in information blocks
apply agglomerative clustering on the information blocks
we use cosine similarity based on word n-grams
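A minimal sketch of agglomerative clustering with cosine similarity over word n-grams. The slides do not give the linkage criterion or the n-gram sizes, so this version assumes unigrams plus bigrams and compares clusters through their summed n-gram counts; `ngram_profile`, `cosine`, and `agglomerate` are hypothetical names. Each input string stands for one information block.

```python
import math
from collections import Counter


def ngram_profile(text, n_max=2):
    """Bag of word n-grams (n = 1..n_max) for one information block."""
    tokens = text.lower().split()
    return Counter(
        tuple(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    )


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(blocks, k):
    """Repeatedly merge the two most similar clusters until only k remain."""
    # Each cluster is a pair: (indices of its blocks, summed n-gram profile).
    clusters = [([i], ngram_profile(text)) for i, text in enumerate(blocks)]
    while len(clusters) > max(k, 1):
        i, j = max(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]),
        )
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [sorted(members) for members, _ in clusters]
```

Stopping the merges at `k` clusters gives the same grouping as building the full tree and cutting it at the level that has `k` clusters.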

Slide 38

Slide 38 text

Summarization Approaches
We test two different approaches:
  Multi-sentence Compression (MSC)
  Frequent Phrase Summarization (FPS)
    an adaptation of Phrase Reinforcement that does not require a starting keyphrase
    the algorithm retrieves a popular sequence of words from the input stream
    one of our contributions
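To make "a popular sequence of words" concrete, here is a toy stand-in rather than the FPS algorithm itself. It assumes a scoring rule the slides do not give: each contiguous word sequence is scored by the number of posts containing it times its length, and the top-scoring sequence is returned. The name `frequent_phrase` and the length limits are hypothetical.

```python
from collections import Counter


def frequent_phrase(posts, min_len=2, max_len=8):
    """Return the word sequence with the highest (posts containing it) x (length) score."""
    counts = Counter()
    for post in posts:
        tokens = post.lower().split()
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                seen.add(tuple(tokens[i:i + n]))
        counts.update(seen)  # each phrase counts once per post
    if not counts:
        return ""
    # Assumed scoring rule; ties are broken in favour of the longer phrase.
    best = max(counts, key=lambda p: (counts[p] * len(p), len(p)))
    return " ".join(best)
```

No keyphrase is needed as input, which is the property that matters once messages have already been grouped by event. A rule like this also favours long, heavily repeated text such as a popular retweet.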

Slide 41

Slide 41 text

Corpus
we used the Twitter API to retrieve recent tweets
we experimented on 1.6 million tweets collected between the 4th and the 8th of July 2012
we used another 1.7 million tweets (collected during the previous week) as background data

Slide 42

Slide 42 text

Events
The Event Detection module discovered an average of 20 events per day.
Examples of events:
  real:
    sporting events (wrestling, basketball, football)
    Independence Day
    celebrity news
    other: finding the Higgs boson, the European debt crisis
  virtual:
    memes: thingsidislike
    popular hashtags
    popular retweets

Slide 45

Slide 45 text

Metrics
the summaries were rated regarding:
  completeness: how much information the summary expresses relative to the detected event
  grammaticality: the degree of grammatical and syntactical correctness
  redundancy: whether a multi-sentence summary repeats the same information

Slide 46

Slide 46 text

Metrics
hierarchical summarization procedure:
  each cluster tree was cut to the level where it has 4 clusters
  the 4 clusters were summarized, generating a multi-sentence summary
  trees with fewer than 4 clusters were removed from the analysis
  we were left with 50 sets of summaries
a group of 4 volunteers assigned ratings on a scale of 1 to 5

Slide 48

Slide 48 text

Metrics
Examples of ratings:
Summary: there is nothing wrong with america that cannot be cured by what is right with america. ~ bill clinton happy4th
Ratings: Completeness: 3, Grammaticality: 5
Summary: happy birthday 'merica they call me happy4th happy 4th of july merica there is nothing wrong with america that cannot be cured by what is right with america. ~ bill clinton happy4th
Ratings: Completeness: 4, Grammaticality: 4, Non-redundancy: 3

Slide 50

Slide 50 text

Summarization without Clustering
MSC generates a meaningless summary
4th of July summary:
  MSC: rt TWID you to the TWID URL
summaries generated by MSC receive a grammaticality rating of 1 and a completeness rating of 1

Slide 53

Slide 53 text

Summarization without Clustering
FPS picks a long and frequent phrase (usually the one that was retweeted the most)
4th of July summary:
  FPS: rt TWID dear mom&dad thank you for everything you've done to me i can never pay back all of them but i'm trying to be the best for both of you
summaries generated by FPS receive a grammaticality rating of 5 and a completeness rating of 1.

Slide 56

Slide 56 text

Summarization with Clustering
Completeness scores:
Rated feature      Summary size     Average rating (standard deviation)   Improvement
MSC completeness   One sentence     3.05 (1.03)                           40.3%
                   Four sentences   4.28 (0.85)
FPS completeness   One sentence     3.28 (0.99)                           25.3%
                   Four sentences   4.11 (0.86)
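The Improvement column appears to be the relative change of the four-sentence average over the one-sentence average. A quick check under that assumption (the helper name `relative_improvement` is ours):

```python
def relative_improvement(one_sentence, four_sentences):
    """Percentage change of the four-sentence rating relative to the one-sentence rating."""
    return (four_sentences - one_sentence) / one_sentence * 100


# Average completeness ratings taken from the table above.
msc = relative_improvement(3.05, 4.28)
fps = relative_improvement(3.28, 4.11)
print(f"MSC completeness: {msc:+.1f}%")
print(f"FPS completeness: {fps:+.1f}%")
```

Rounded to one decimal place, both values agree with the percentages reported for completeness.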

Slide 57

Slide 57 text

Summarization with Clustering
Grammaticality scores:
Rated feature        Summary size     Average rating (standard deviation)   Improvement
MSC grammaticality   One sentence     4.05 (1.21)                           -1.2%
                     Four sentences   4.00 (1.00)
FPS grammaticality   One sentence     4.25 (1.10)                           -15.0%
                     Four sentences   3.61 (1.10)

Slide 58

Slide 58 text

Summarization with Clustering
Redundancy scores:
Rated feature        Summary size     Average rating (standard deviation)
MSC non-redundancy   Four sentences   4.01 (1.14)
FPS non-redundancy   Four sentences   3.82 (1.16)

Slide 59

Slide 59 text

Conclusions
We showed that summarizing streams can be significantly improved by clustering messages together and removing noise.
The steps of the summarizing algorithm are:
  detecting the events people are talking about
  clustering posts related to those events
  applying classical summarizing algorithms to each cluster of posts

Slide 61

Slide 61 text

Future Work
fast online processing of streams
develop a visual interface for rendering hierarchical summaries and investigating how large streams can be analyzed by users

Slide 62

Slide 62 text

Thank You
Thank you for your time. Do you have any questions?
Contact: [email protected]
http://andrei.olariu.org