Hierarchical Clustering in Improving Microblog Stream Summarization

Andrei Olariu

March 27, 2013

Transcript

  1. 1.

    Hierarchical Clustering in Improving Microblog Stream Summarization

    Andrei Olariu
    University of Bucharest, Faculty of Mathematics and Computer Science
    CICLING 2013
  2. 2.

    Outline

    1 Context
        Microblogging
        Previous Work
        Motivation
    2 Our Summarizing System
        Approach Outline
        Message Clustering based on Events
        Hierarchical Event Analysis
        Summarization
    3 Results
        Corpus
        Metrics
        Summarization Results
    4 Conclusions and Future Work
  4. 4.

    What is Microblogging

    microblogging: a form of blogging characterized by very short posts
    microblogging platforms: Twitter, Tumblr, Facebook

    Twitter's main highlights:
    - over 500 million posts per day
    - data is publicly accessible (unlike Facebook)
    - posts are mainly text (unlike Tumblr, which is mostly images)
    - posts are limited to 140 characters
    - specific vocabulary (internet slang), abbreviations, misspelled words
  7. 7.

    What is Microblogging

    Data on Twitter is organized as a stream (a sequence of posts).
  9. 9.

    Microblog Event Detection

    - detect the main topics in a stream
  10. 10.

    Microblog Event Detection

    - model an event based on a stream of related posts
    - cluster similar messages
    - detect words that experience an increased frequency
  11. 11.

    Multi-sentence Compression

    multi-sentence compression: generate a short sentence that summarizes a group of related sentences

    Example
    - The wife of a former U.S. president Bill Clinton Hillary Clinton visited China last Monday.
    - Hillary Clinton wanted to visit China last month but postponed her plans till Monday last week.
    - Hillary Clinton paid a visit to the People Republic of China on Monday.
    - Last week the Secretary of State Ms. Clinton visited Chinese officials.
  13. 13.

    Multi-sentence Compression

    The Multi-sentence Compression algorithm finds a path minimizing a cost function in a word graph.

    (Figure: word graph; not reproduced in the transcript.)
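A minimal sketch of the word-graph idea, under stated assumptions: the slide does not give the cost function, so the edge cost used here (1 / transition count) is an illustrative stand-in, and nodes are plain lowercased words rather than the richer node mapping a full Multi-sentence Compression implementation would use. The function names and the toy input are hypothetical.

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"

def build_word_graph(sentences):
    """Count how often each word directly follows another across all sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts

def cheapest_path(counts):
    """Dijkstra from START to END; edge cost is 1 / transition count (assumed)."""
    heap = [(0.0, [START])]
    done = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            return path[1:-1]  # drop the START/END markers
        if node in done:
            continue
        done.add(node)
        for nxt, n in counts[node].items():
            if nxt not in done:
                heapq.heappush(heap, (cost + 1.0 / n, path + [nxt]))
    return []

# Toy input (hypothetical), loosely modeled on the example sentences.
sentences = [
    "hillary clinton visited china on monday",
    "hillary clinton visited china last week",
    "clinton visited china",
    "hillary clinton visited china",
]
compressed = cheapest_path(build_word_graph(sentences))
print(" ".join(compressed))
```

Frequent transitions are cheap, so the path follows the wording that most sentences share. A practical cost function would also need to discourage very short paths.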
  14. 14.

    Summarizing Microblogging Streams

    approached in two ways:
    - choose a post that best describes the input stream
    - generate a short sentence based on the stream (Phrase Reinforcement algorithm)

    both approaches have been developed for streams of messages related to a given event
  17. 17.

    Phrase Reinforcement

    Phrase Reinforcement: an algorithm that generates a summary starting from a given keyphrase and a stream of posts related to that keyphrase

    Example
    - A tragedy: Ted Kennedy died today of cancer
    - Ted Kennedy died today
    - Ted Kennedy was a leader
    - Ted Kennedy died at age 77
  19. 19.

    Phrase Reinforcement

    The graph built starting from the keyphrase "Ted Kennedy".

    (Figure: Phrase Reinforcement graph; not reproduced in the transcript.)
  21. 21.

    Motivation

    All previous summarizing techniques require as input a stream of related posts:
    - posts are filtered based on a given set of keywords
    - keywords are manually selected to match a specific event/topic

    Yet, most streams are not about a specific event/topic and suffer from a large amount of noise.

    How can we approach summarizing any kind of stream?
  23. 23.

    Motivation

    Contributions:
    - developed a system for summarizing unfiltered streams
    - adapted the Phrase Reinforcement algorithm in order to integrate it into our system
  25. 25.

    Approach Outline
  28. 28.

    Message Clustering based on Events

    event detection
    - detect words that show an unusual increase in frequency
    - cluster words based on how often they appear together in posts
    - each cluster of words represents an event

    message clustering
    - for each message, determine the word cluster most similar to it
    - if the similarity is above a threshold, assign it to the event; otherwise consider it noise
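The two stages above can be sketched roughly as follows. This is an illustration under assumptions, not the system's actual implementation: the burst rule (a frequency ratio against background data with add-one smoothing), the Jaccard similarity, the thresholds, and all function names are hypothetical. The word-clustering step is skipped, so the event word clusters are passed in directly.

```python
from collections import Counter

def bursty_words(current_posts, background_posts, min_ratio=3.0, min_count=2):
    """Words whose relative frequency rose by at least min_ratio (assumed rule)."""
    cur = Counter(w for p in current_posts for w in p.lower().split())
    bg = Counter(w for p in background_posts for w in p.lower().split())
    cur_total = sum(cur.values()) or 1
    bg_total = sum(bg.values()) or 1
    result = set()
    for word, count in cur.items():
        if count < min_count:
            continue  # too rare in the current window to be considered
        cur_rate = count / cur_total
        bg_rate = (bg[word] + 1) / (bg_total + 1)  # add-one smoothing for unseen words
        if cur_rate / bg_rate >= min_ratio:
            result.add(word)
    return result

def assign_message(message, events, threshold=0.2):
    """Return the index of the most similar word cluster, or None for noise."""
    tokens = set(message.lower().split())
    if not tokens:
        return None
    best_index, best_score = None, 0.0
    for index, words in enumerate(events):
        score = len(tokens & words) / len(tokens | words)  # Jaccard overlap (assumed)
        if score > best_score:
            best_index, best_score = index, score
    return best_index if best_score >= threshold else None

# Toy data (hypothetical).
current = ["higgs boson found", "higgs boson confirmed", "good morning"]
background = ["good morning", "good night", "good morning all",
              "have a good day", "what a day"]
burst = bursty_words(current, background)

events = [{"higgs", "boson"}, {"july", "fireworks"}]
print(sorted(burst))
print(assign_message("the higgs boson is real", events))
print(assign_message("good morning everyone", events))
```

Messages that fall below the threshold for every event return `None`, which is how noise gets dropped before summarization.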
  34. 34.

    Hierarchical Event Analysis

    - group very similar messages together in information blocks
    - apply agglomerative clustering on the information blocks
    - we use cosine similarity based on word n-grams
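A small sketch of agglomerative clustering with cosine similarity over word n-grams. The slides do not specify the linkage criterion or the n-gram order, so this version assumes unigrams plus bigrams and compares clusters by their summed n-gram counts; the function names and the toy blocks are hypothetical.

```python
import math
from collections import Counter

def ngram_vector(text, n_max=2):
    """Bag of word n-grams (unigrams up to n_max-grams) as a Counter."""
    tokens = text.lower().split()
    vec = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            vec[" ".join(tokens[i:i + n])] += 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def agglomerate(blocks, k):
    """Repeatedly merge the two most similar clusters until only k remain."""
    clusters = [([i], ngram_vector(text)) for i, text in enumerate(blocks)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        # A merged cluster is represented by the sum of its members' n-gram counts (assumed).
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [sorted(members) for members, _ in clusters]

# Toy information blocks (hypothetical).
blocks = [
    "happy 4th of july",
    "happy 4th of july america",
    "higgs boson found at cern",
    "cern confirms higgs boson",
]
groups = agglomerate(blocks, 2)
print(groups)
```

Stopping at `k` clusters corresponds to cutting the cluster tree at a fixed level; recording each merge instead would give the full hierarchy.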
  38. 38.

    Summarization Approaches

    We test two different approaches:
    - Multi-sentence Compression (MSC)
    - Frequent Phrase Summarization (FPS)
        - an adaptation of Phrase Reinforcement that does not require a starting keyphrase
        - the algorithm retrieves a popular sequence of words from the input stream
        - one of our contributions
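To illustrate what "retrieving a popular sequence of words" could look like, here is a deliberately simple stand-in, not the FPS algorithm itself: it scores every word n-gram by count times length and returns the best one. The scoring rule, the length limits, and the function name are assumptions; the input reuses the Ted Kennedy posts from the Phrase Reinforcement example.

```python
from collections import Counter

def frequent_phrase(posts, min_len=2, max_len=8):
    """Return the word sequence maximizing count * length (an assumed score)."""
    counts = Counter()
    for post in posts:
        tokens = post.lower().split()
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    if not counts:
        return ""
    # Ties on the score are broken in favor of the longer phrase.
    best = max(counts, key=lambda p: (counts[p] * len(p), len(p)))
    return " ".join(best)

posts = [
    "A tragedy: Ted Kennedy died today of cancer",
    "Ted Kennedy died today",
    "Ted Kennedy was a leader",
    "Ted Kennedy died at age 77",
]
summary = frequent_phrase(posts)
print(summary)
```

No keyphrase is supplied here: the phrase emerges from the counts alone, which is the property the slide highlights.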
  41. 41.

    Corpus

    - we used the Twitter API to retrieve recent tweets
    - we experimented on 1.6 million tweets collected between the 4th and the 8th of July 2012
    - we used another 1.7 million tweets (collected during the previous week) as background data
  42. 42.

    Events

    The Event Detection module discovered an average of 20 events per day.

    Examples of events:
    - real
        - sporting events (wrestling, basketball, football)
        - Independence Day
        - celebrity news
        - other: finding the Higgs boson, the European debt crisis
    - virtual
        - memes: thingsidislike
        - popular hashtags
        - popular retweets
  45. 45.

    Metrics

    the summaries were rated regarding:
    - completeness: how much information the summary expresses relative to the detected event
    - grammaticality: the degree of grammatical and syntactical correctness
    - redundancy: if a multi-sentence summary repeats the same information
  46. 46.

    Metrics

    hierarchical summarization procedure:
    - each cluster tree was cut to the level where it has 4 clusters
    - the 4 clusters were summarized, generating a multi-sentence summary
    - trees with fewer than 4 clusters were removed from the analysis
    - we were left with 50 sets of summaries

    a group of 4 volunteers assigned ratings on a scale of 1 to 5
  48. 48.

    Metrics

    Examples of ratings:

    Summary: there is nothing wrong with america that cannot be cured by what is right with america. ~ bill clinton happy4th
    Ratings: Completeness: 3, Grammaticality: 5

    Summary: happy birthday 'merica they call me happy4th happy 4th of july merica there is nothing wrong with america that cannot be cured by what is right with america. ~ bill clinton happy4th
    Ratings: Completeness: 4, Grammaticality: 4, Non-redundancy: 3
  50. 50.

    Summarization without Clustering

    MSC generates a meaningless summary.

    4th of July summary:
    MSC: rt TWID you to the TWID URL

    Summaries generated by MSC receive a grammaticality rating of 1 and a completeness rating of 1.
  53. 53.

    Summarization without Clustering

    FPS picks a long and frequent phrase (usually the one that was retweeted the most).

    4th of July summary:
    FPS: rt TWID dear mom&dad thank you for everything you've done to me i can never pay back all of them but i'm trying to be the best for both of you

    Summaries generated by FPS receive a grammaticality rating of 5 and a completeness rating of 1.
  56. 56.

    Summarization with Clustering

    Completeness scores:

    Rated feature    | Summary size   | Average rating (standard deviation) | Improvement
    MSC completeness | One sentence   | 3.05 (1.03)                         |
    MSC completeness | Four sentences | 4.28 (0.85)                         | 40.3%
    FPS completeness | One sentence   | 3.28 (0.99)                         |
    FPS completeness | Four sentences | 4.11 (0.86)                         | 25.3%
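The Improvement figures appear to be the relative change from the one-sentence rating to the four-sentence rating. That reading is an assumption, but it reproduces the reported completeness percentages; the helper name below is hypothetical.

```python
def relative_improvement(one_sentence, four_sentences):
    """Percentage change from the one-sentence to the four-sentence rating."""
    return 100.0 * (four_sentences - one_sentence) / one_sentence

msc = relative_improvement(3.05, 4.28)
fps = relative_improvement(3.28, 4.11)
print(f"MSC: {msc:.1f}%  FPS: {fps:.1f}%")
```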
  57. 57.

    Summarization with Clustering

    Grammaticality scores:

    Rated feature      | Summary size   | Average rating (standard deviation) | Improvement
    MSC grammaticality | One sentence   | 4.05 (1.21)                         |
    MSC grammaticality | Four sentences | 4.00 (1.00)                         | -1.2%
    FPS grammaticality | One sentence   | 4.25 (1.10)                         |
    FPS grammaticality | Four sentences | 3.61 (1.10)                         | -15.0%
  58. 58.

    Summarization with Clustering

    Redundancy scores:

    Rated feature      | Summary size   | Average rating (standard deviation)
    MSC non-redundancy | Four sentences | 4.01 (1.14)
    FPS non-redundancy | Four sentences | 3.82 (1.16)
  59. 59.

    Conclusions

    We showed that summarizing streams can be significantly improved by clustering messages together and removing noise.

    The steps of the summarizing algorithm are:
    - detecting the events people are talking about
    - clustering posts related to those events
    - applying classical summarizing algorithms to each cluster of posts
  61. 61.

    Future Work

    - fast online processing of streams
    - develop a visual interface for rendering hierarchical summaries and investigating how large streams can be analyzed by users
  62. 62.

    Thank You

    Thank you for your time. Do you have any questions?

    Contact: andrei@olariu.org
    http://andrei.olariu.org