
Real-time event detection in social network data streams

Today, online social network services challenge state-of-the-art social media mining algorithms because of their real-time nature, their scale and the amount of unstructured data they generate. Given the volume and cadence of the data made available by services like Twitter, classical text mining techniques are not suited to these new mining challenges. Event detection is no exception: state-of-the-art algorithms rely on text mining techniques applied to previously known datasets, processed with no restrictions on computational complexity or on the execution time allowed per document. From the point of view of natural language processing, text mining and unsupervised learning, detecting events in unbounded text streams is already hard; dealing with dynamic networks with millions of nodes and edges is not an easy task either.

This work presents a research proposal towards a robust real-time event detection system for social network text streams that combines text stream mining and network analysis methods. The proposal reviews the current state-of-the-art systems, algorithms and methodologies for event detection in streaming environments. It is based on the premise that the precision and accuracy of an event detection algorithm can be improved by considering the properties of the social network at the time events happen. The aim is to show that events are better predicted by taking into account additional information about the network than by considering the text or terms of the data stream alone.

Mário Cordeiro

March 17, 2014

Transcript

  1. PRODEI031 PhD Thesis Proposal, FEUP ProDEI – 7th Edition

    Mário Miguel Fernandes Cordeiro
    Supervisor: Dr. João Gama, FEP, LIAAD, Universidade do Porto, Portugal
    Co-supervisors: Dr. Ricardo Morla, FEUP, INESC Porto, Universidade do Porto, Portugal; Dr. Miles Osborne, School of Informatics, University of Edinburgh, UK
    17/03/2014
    Image Source: http://www.breakingnews.com/
  2. Introduction: Motivation • Problem statement

    Real-time Event Detection Proposal: Hypothesis • Research questions • Evaluation
    Planning
    Discussion
  3. Figure: timeline of news sources, 1995 to 2020 (Source: "Timeline: How Our News Sources Changed in the Last 200+ Years")
  4. Hussein, D., Alaa, G., & Hamad, A. (2011). Towards usage-centered design patterns for social networking systems.
  5. Lardinois, F. (2010). The short lifespan of a tweet: Retweets only happen within the first hour. ReadWriteSocial.

    Barabási, A.-L. (2005). The origin of bursts and heavy tails in human dynamics.
  6. Figures (Sources: "Twitter Company conversations mapped"; "How Stuff Spreads: How Videos Go Viral, part I")

    • rich social connections
    • temporal attributes of each text piece
    • more context sensitive
  9. Each OSN user is regarded as a sensor of the real world; each message as sensory information.

    Sakaki, T., Okazaki, M., & Matsuo, Y. (2010). Earthquake shakes Twitter users: real-time event detection by social sensors.
    Image Source: http://socialbits.org
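Treating users as sensors means that several independent reports of the same thing should raise confidence that it really happened. A minimal sketch of that reasoning, assuming every report is an independent false alarm with the same probability `p_false` (a hypothetical parameter, not a value taken from this proposal): all n reports are false with probability p_false^n, so an event occurred with probability 1 − p_false^n.

```python
def event_probability(n_reports: int, p_false: float) -> float:
    """Probability that an event occurred, given n independent reports.

    Assumption: each report is a false alarm with probability p_false,
    independently of the others, so all n are false with p_false ** n.
    """
    if not 0.0 <= p_false <= 1.0:
        raise ValueError("p_false must be in [0, 1]")
    if n_reports < 0:
        raise ValueError("n_reports must be non-negative")
    return 1.0 - p_false ** n_reports
```

For example, `event_probability(3, 0.5)` gives 0.875. The independence assumption is strong: retweets of the same message are clearly not independent sensors.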
  10. Topic detection and tracking (TDT)

    Event detection:
    • Yang, Y., Pierce, T., & Carbonell, J. G. (1998). A Study of Retrospective and On-Line Event Detection.
    First story detection / novelty detection:
    • Allan, J., Lavrenko, V., & Jin, H. (2000). First Story Detection In TDT Is Hard.
    • Allan, J., Lavrenko, V., Malin, D., & Swan, R. (2000). Detections, Bounds, and Timelines: UMass and TDT-3.
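First story (novelty) detection in this line of work compares each incoming document with the documents already seen. The document is flagged as new when even its closest neighbour is not similar enough. A minimal sketch under simplifying assumptions: documents are whitespace-tokenised term-count vectors, similarity is cosine, every earlier document is scanned, and the `threshold` value is a hypothetical choice.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_stories(docs, threshold=0.5):
    """Return indices of documents whose nearest earlier neighbour is too dissimilar.

    Novelty of a document = 1 - max cosine similarity to any earlier document.
    The first document is always novel. The exhaustive scan costs O(n^2) overall.
    """
    seen = []
    novel = []
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best = max((cosine(vec, prev) for prev in seen), default=0.0)
        if 1.0 - best >= threshold:
            novel.append(i)
        seen.append(vec)
    return novel
```

With the stream `["earthquake hits tokyo", "strong earthquake hits tokyo today", "new phone released today"]`, the second message is close to the first and is not flagged, while the third is flagged as a new story. The quadratic scan is what makes this approach impractical on unbounded streams.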
  11. Advent and massification of OSNs and the big data era

    First story detection:
    • Petrovic, S. (2012). Real-time Event Detection in Massive Streams. University of Edinburgh.
    Survey of event detection:
    • Atefeh, F., & Khreich, W. (2013). A Survey of Techniques for Event Detection in Twitter. Computational Intelligence.
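Petrovic's streaming approach relies on locality-sensitive hashing (LSH) to find approximate nearest neighbours without scanning the whole history. The sketch below shows only the standard random-hyperplane LSH family for cosine similarity, not the full system. Each vector is hashed to a bit signature per table, and stored vectors that share a bucket in any table become candidate neighbours. The dense vectors, `n_bits`, `n_tables` and `seed` are hypothetical choices.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity over dense vectors."""

    def __init__(self, dim: int, n_bits: int = 8, n_tables: int = 4, seed: int = 0):
        rng = random.Random(seed)
        # One set of n_bits random Gaussian hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    @staticmethod
    def _signature(vec, planes):
        # Bit k is 1 when vec lies on the non-negative side of hyperplane k.
        return tuple(
            1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in planes
        )

    def add(self, doc_id, vec):
        """Store doc_id in one bucket per table."""
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(vec, planes)].append(doc_id)

    def candidates(self, vec):
        """Ids of stored vectors sharing a bucket with vec in any table."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._signature(vec, planes), []))
        return found
```

More bits per table make buckets more selective. More tables raise the chance that true neighbours collide at least once. The returned candidates would then be compared exactly, for instance with cosine similarity, to compute a novelty score.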
  12. A real-time event detection system should be able to continuously mine high-volume, open-ended streams of social network documents as they arrive, interpret their network relations, and be ready to detect new events at any time.
  13. Natural Language Processing • Data Stream Mining • Social Network Analysis

    Data Mining » Machine Learning » Unsupervised Learning
  14. Hypothesis: in real-time event detection on social networks using data stream algorithms, major events are better predicted by correlating peaks in the mentions of a specific set of topics in the text stream with the spontaneous creation or growth of the network communities linked to those topics.
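One simple way to quantify the correlation described above, offered only as a hypothetical sketch:
- Bin the stream into time windows.
- Count the mentions of a topic per window.
- Record the size of the community linked to that topic per window, for example as reported by a dynamic community detector.
- Measure how strongly the two series move together.

Below, the measure is the Pearson correlation of window-to-window changes. The choice of statistic and both input series are assumptions, not part of the proposal.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no co-movement signal
    return cov / (sx * sy)


def diffs(series):
    """Window-to-window changes, so steady levels do not dominate the score."""
    return [b - a for a, b in zip(series, series[1:])]


def mention_community_coupling(mentions, community_sizes):
    """Correlation between changes in topic mentions and in community size."""
    return pearson(diffs(mentions), diffs(community_sizes))
```

A mention peak that coincides with a jump in community size, such as mentions `[10, 12, 11, 80, 75, 20]` against sizes `[50, 51, 50, 140, 150, 70]`, yields a coupling close to 1.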
  15. Is the abrupt increase of topic mentions in a social network text stream representative of the occurrence of an event?
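Answering this question requires a working definition of an "abrupt increase". A minimal sketch assumes per-window mention counts and a hypothetical rule. A window is flagged when its count exceeds the mean of the previous `history` windows by more than `z` standard deviations. This is a generic streaming baseline, not the detector proposed in this work. Its memory is bounded by `history`.

```python
import math
from collections import deque


def detect_bursts(counts, history=5, z=3.0, min_std=1.0):
    """Yield indices of windows whose count is anomalously high.

    A window is flagged when count > mean + z * std over the previous
    `history` windows. `min_std` puts a floor on the standard deviation so
    that tiny fluctuations over a near-constant history are not flagged.
    """
    recent = deque(maxlen=history)
    for i, c in enumerate(counts):
        if len(recent) == history:
            mean = sum(recent) / history
            var = sum((x - mean) ** 2 for x in recent) / history
            std = max(math.sqrt(var), min_std)
            if c > mean + z * std:
                yield i
        recent.append(c)
```

For counts `[10, 11, 9, 10, 10, 50, 12, 10]`, only the window at index 5 is flagged. A flagged peak is only a candidate event, which is exactly what the question asks to verify.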
  16. Can the accuracy of a social network event detection algorithm be enhanced by taking into account the dynamics of the network and its information spreading patterns?
  17. Reference systems:

    • dynamic community detection: Louvain method (Blondel et al., 2008)
    • event detection: UMass system (Allan et al., 2000b); LSH (Petrovic, 2012)
    Datasets:
    • FSD Twitter corpus: 50 million tweets; 27 manually annotated events; 3035 tweets labeled as on-topic for one of the 27 events (Osborne et al., 2012)
    Figure: example of a DET curve from the TDT 2000 evaluation (Fiscus and Doddington, 2002)
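A DET curve plots miss probability against false-alarm probability as the decision threshold varies. TDT evaluations also summarise a single operating point with a detection cost. The sketch below computes the normalised cost. The default constants (C_miss = 1, C_FA = 0.1, P_target = 0.02) are the values commonly used for first story detection in TDT. Treat them as assumptions to be checked against the evaluation plan (Fiscus and Doddington, 2002).

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalised TDT-style detection cost for one operating point.

    p_miss: fraction of true first stories the system missed.
    p_fa:   fraction of non-first stories wrongly flagged as new.
    """
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    # Normalise by the better of the two trivial systems:
    # flag nothing (p_miss = 1) or flag everything (p_fa = 1).
    return cost / min(c_miss * p_target, c_fa * (1.0 - p_target))
```

A normalised cost of 1.0 means the system is no better than the best trivial detector, and lower is better. Sweeping the novelty threshold and plotting (p_fa, p_miss) pairs gives the points of a DET curve.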
  18. Dynamic community detection algorithm based on the Louvain method (Blondel et al., 2008): adding and removing nodes and edges

    Image Source: https://sites.google.com/site/findcommunities/
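The Louvain method greedily moves nodes between communities to increase modularity Q. The sketch below shows how Q is computed for an undirected, unweighted graph and a given partition:

Q = Σ_c [L_c/m − (d_c/(2m))²]

Here m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The code only scores a partition. It is neither the Louvain optimisation itself nor the proposed dynamic variant that handles node and edge additions and removals.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted simple graph.

    edges:     iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = 0
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    if m == 0:
        return 0.0
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )
```

Take two triangles joined by a single edge, with each triangle as its own community. That partition scores 5/14 (about 0.357), while putting every node in one community scores 0. A dynamic variant would update L_c and d_c incrementally as edges arrive or disappear, instead of recomputing them from scratch.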
  19. 2014:

    • Sarmento, R. P., Cordeiro, M., & Gama, J. (2014). Streaming Approach for Visualizing Large Scale Telecommunications Networks. 15th IEEE International Conference on Mobile Data Management. (submitted)
    • Cordeiro, M., Sarmento, R. P., & Gama, J. (2014). Dynamic Community Detection in Evolving Large Scale Networks using Locality Modularity Optimization. (in preparation)
    2012:
    • Cordeiro, M. (2012). Twitter event detection: combining wavelet analysis and topic inference summarization. In the Doctoral Symposium on Informatics Engineering (DSIE’12). (3 citations, 21 Mendeley readers)
  20. November 2013:

    • Big Data Spain – http://www.bigdataspain.org
    • Strata Conf EU – http://strataconf.com/strataeu2013
    July 2013:
    • 3rd Lisbon Machine Learning School – http://lxmls.it.pt/2013
  21. Oporto MongoDB User Group:

    • Founder of the user group
    • Community with 140 members
    • 3 meetups in total (35 participants on average) – http://www.meetup.com/Oporto-mongoDB-User-Group/
  22. Books:

    • Gama, J. (2010). Knowledge Discovery from Data Streams. CRC Press.
    • Rajaraman, A., & Ullman, J. D. (2011). Mining of Massive Datasets. Lecture notes for Stanford CS345A Web Mining.
    • Easley, D., & Kleinberg, J. (2010). Networks, Crowds, and Markets. Cambridge: Cambridge University Press.
    • Cook, D. J., & Holder, L. B. (Eds.). (2007). Mining Graph Data. Wiley-Interscience.
    • Ross, S. M. (2009). Introduction to Probability Models (10th ed.). Academic Press.
  23. TDT definitions:

    • story: “a topically cohesive segment of news that includes two or more declarative independent clauses about a single event.”
    • event: “something that happens at some specific time and place along with all necessary preconditions and unavoidable consequences.”
    • topic: “a seminal event or activity, along with all directly related events and activities.”
  24. Data Stream Mining

    • Properties:
      – approximate answers, dependent on the chosen accuracy
      – models based on a summary or "sketch" of the data stream kept in memory
    • Requirements:
      – process one example at a time, inspecting it only once
      – use a limited amount of memory
      – work in a limited amount of time
      – be ready to predict at any time
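A Count-Min sketch is one common in-memory summary for approximate term frequencies in a stream, and it illustrates the "sketch" requirement listed above. It processes each item once and uses fixed memory (width × depth counters). It can also be queried at any time. The proposal does not name a specific sketch, so this is only an example. Width, depth and the row-prefixed BLAKE2 hashing are assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate counts in fixed memory; with non-negative increments,
    estimates never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item: str):
        # One hash per row, derived by prefixing the row index to the item.
        for row in range(self.depth):
            data = f"{row}:{item}".encode("utf-8")
            digest = hashlib.blake2b(data, digest_size=8).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        """Record `count` (assumed non-negative) occurrences of item."""
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the row minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._buckets(item))
```

Feeding each incoming term to `add` and calling `estimate` at any time gives approximate term frequencies per window. Errors shrink as `width` grows, and the chance of a bad estimate shrinks as `depth` grows.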
  25. Social Network Analysis

    • Community detection:
      – based on modularity
      – spectral analysis
    • The network is not static; it evolves over time:
      – creation, growth and disbanding of communities
    • Group formation:
      – exploring the principles by which groups develop and evolve in large-scale social networks
    • Information spreading:
      – identification of “social sensors” that pass information quickly
      – cascading behavior (in blogs)
  26. Natural Language Processing

    • Text representation models:
      – unstructured text: vector space model (VSM)
      – feature extraction: bag-of-words, entity recognition, summarization, sentiment analysis
    • Text analysis:
      – term trend approach: trends in text streams (frequencies)
      – semantic space approach (categories found in the collection)
    • Topic extraction:
      – Latent Dirichlet Allocation (LDA), Dirichlet Compound Multinomial (DCM) mixtures and von Mises-Fisher (vMF) mixture models
    • Event detection:
      – statistical methods (LSH), wavelets, topic models (LDA)
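The sketch below illustrates the vector space model listed above with TF-IDF weighting over a small batch of bag-of-words documents. Several choices are assumptions made for brevity:
- whitespace tokenisation
- raw term frequency
- the smoothed inverse document frequency idf = log((1 + N)/(1 + df)) + 1

In a stream, document frequencies would have to be maintained incrementally or approximated rather than computed over a fixed collection.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (bag-of-words VSM)."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({t: c * idf[t] for t, c in tf.items()})
    return vectors
```

Terms shared by many documents, such as "quake" in `["quake hits city", "quake hits again", "match tonight"]`, receive lower weights than rarer terms such as "city". These sparse vectors can be fed directly to cosine-based novelty detection or to LSH.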