Slide 1

Slide 1 text

PRODEI031 PhD Thesis Proposal
FEUP ProDEI – 7th Edition
Mário Miguel Fernandes Cordeiro
[email protected]
Supervisor: Dr. João Gama, FEP, LIAAD, Universidade do Porto, Portugal
Co-supervisors:
Dr. Ricardo Morla, FEUP, INESC Porto, Universidade do Porto, Portugal
Dr. Miles Osborne, School of Informatics, University of Edinburgh, UK
17/03/2014
Image Source: http://www.breakingnews.com/

Slide 2

Slide 2 text

Introduction
• Motivation
• Problem statement
Real-time Event Detection
Proposal
• Hypothesis
• Research questions
• Evaluation
Planning
Discussion

Slide 3

Slide 3 text

Source: Timeline: How Our News Sources Changed in the Last 200+ Years (timeline axis: 1995 to 2020)

Slide 4

Slide 4 text

Image Source: http://www.foreignpolicy.com/articles/2011/06/20/the_revolution_will_be_tweeted Hounshell, B. (2011). Foreign Policy, (187):20–21.

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Hussein, D., Alaa, G., and Hamad, A. (2011). Towards usage-centered design patterns for social networking systems.

Slide 8

Slide 8 text

Lardinois, F. (2010) Readwritesocial: The short lifespan of a tweet: Retweets only happen within the first hour. A.-L. Barabasi (2005). The origin of bursts and heavy tails in human dynamics.

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Source: Twitter Company conversations mapped
Source: How Stuff Spreads: How Videos Go Viral part I
• rich social connections
• temporal attributes of each text piece
• more context sensitive

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Image Source: http://socialbits.org Each OSN user is regarded as a sensor of the real world; each message as sensory information. Sakaki, T., Okazaki, M., & Matsuo, Y. (2010). Earthquake shakes Twitter users: real-time event detection by social sensors

Slide 16

Slide 16 text

Image Source: http://globalhighered.files.wordpress.com/2011/04/beauchesnemap1.jpg

Slide 18

Slide 18 text

Image Source: http://www.forbes.com/sites/gilpress/2013/04/23/the-big-data-landscape-revisited/ minutes

Slide 19

Slide 19 text

Image Source: http://www.datasciencecentral.com/profiles/blogs/data-veracity

Slide 20

Slide 20 text

Diagram labels: Volume, Velocity; Exact Solution, Quick Response; low data volume and/or velocity

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Topic detection and tracking (TDT):
event detection
• Yang, Y., Pierce, T., & Carbonell, J. G. (1998). A Study of Retrospective and On-Line Event Detection.
first story detection / novelty detection
• Allan, J., Lavrenko, V., & Jin, H. (2000). First Story Detection In TDT Is Hard.
• Allan, J., Lavrenko, V., Malin, D., & Swan, R. (2000). Detections, Bounds, and Timelines: UMass and TDT-3.

Slide 23

Slide 23 text

Advent and massification of OSNs and the big data era:
first story detection
• Petrovic, S. (2012). Real-time Event Detection in Massive Streams. University of Edinburgh.
survey of event detection
• Atefeh, F., & Khreich, W. (2013). A Survey of Techniques for Event Detection in Twitter. Computational Intelligence.

Slide 24

Slide 24 text

Should be able to continuously mine high-volume, open-ended social network data streams, process documents as they arrive, interpret their network relations, and be ready to detect new events at any time.

Slide 25

Slide 25 text

Natural Language Processing
Data Stream Mining
Social Network Analysis
Data Mining » Machine Learning » Unsupervised Learning

Slide 26

Slide 26 text

In real-time event detection on social networks using data stream algorithms, major events are better predicted by correlating peaks in the mentions of a specific set of topics in the text stream with the spontaneous creation or growth of the network communities linked to those topics.

Slide 27

Slide 27 text

Can a data stream algorithm provide robust community identification and tracking in dynamic social networks?

Slide 28

Slide 28 text

Is the abrupt increase of topic mentions in a social network text stream representative of the occurrence of an event?
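
One way to make "abrupt increase" concrete is to compare each new mention count against a running mean and standard deviation of earlier counts. The sketch below is only an illustration under assumptions not stated in the slides: the class name `BurstDetector`, the threshold of 3 standard deviations, the warm-up of 5 windows, and the toy counts are all hypothetical.

```python
import math


class BurstDetector:
    """Flags a count as a burst when it exceeds mean + threshold * std of past counts."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # hypothetical choice: number of standard deviations
        self.warmup = warmup        # hypothetical choice: windows seen before flagging starts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations (Welford's method)

    def update(self, count):
        """Test the new count against past statistics, then fold it into them."""
        is_burst = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            # Floor the deviation at 1.0 so a perfectly flat history does not flag noise.
            is_burst = count > self.mean + self.threshold * max(std, 1.0)
        # One-pass update: no past counts are stored.
        self.n += 1
        delta = count - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (count - self.mean)
        return is_burst


# Toy mention counts per time window for one topic (made-up numbers).
counts = [10, 12, 9, 11, 10, 11, 95, 12]
detector = BurstDetector()
flags = [detector.update(c) for c in counts]
print(flags)
```

Only the window with 95 mentions is flagged here. Whether such a peak corresponds to a real-world event is exactly what the research question asks, so the flag should be read as a candidate, not a detection.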

Slide 29

Slide 29 text

Can the accuracy of a social network event detection algorithm be enhanced with the dynamics of the network and its information spreading patterns?

Slide 30

Slide 30 text

Reference systems:
dynamic community detection
• Louvain method (Blondel et al., 2008)
event detection
• UMASS system (Allan et al., 2000b)
• LSH (Petrovic, 2012)
Datasets:
FSD Twitter corpus
• 50 million tweets
• 27 manually annotated events
• 3035 tweets labeled as being on-topic for one of the 27 events (Osborne et al., 2012)
Example of DET curve from the TDT 2000 evaluation (Fiscus and Doddington, 2002)
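
A DET curve plots the miss rate against the false alarm rate as the decision threshold varies. The following minimal sketch computes one such point per threshold; the function name `det_point`, the convention that a higher score means "more likely a new event", and the toy scores and labels are assumptions made for illustration, not data from the FSD corpus.

```python
def det_point(scores, labels, threshold):
    """Return (miss_rate, false_alarm_rate) when items scoring >= threshold are flagged."""
    targets = [s for s, is_event in zip(scores, labels) if is_event]
    nontargets = [s for s, is_event in zip(scores, labels) if not is_event]
    misses = sum(1 for s in targets if s < threshold)            # true events not flagged
    false_alarms = sum(1 for s in nontargets if s >= threshold)  # non-events flagged
    return misses / len(targets), false_alarms / len(nontargets)


# Made-up detector scores and ground-truth labels (True = first story of an event).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
labels = [True, True, True, False, False, False, False, False]

# Sweeping the threshold traces the trade-off that a DET curve displays.
curve = [(t, *det_point(scores, labels, t)) for t in (0.25, 0.5, 0.75)]
for threshold, miss, false_alarm in curve:
    print(threshold, round(miss, 3), round(false_alarm, 3))
```

Raising the threshold lowers false alarms at the cost of more misses, which is why systems are compared over the whole curve and not at a single operating point.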

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Dynamic Community Detection Algorithm:
• Based on the Louvain method (Blondel et al., 2008)
• Adding and removing nodes and edges
Image Source: https://sites.google.com/site/findcommunities/
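
The Louvain method optimizes modularity, so a dynamic variant has to cope with modularity changing as edges arrive and leave. The sketch below is not the proposed algorithm: it only shows a graph that supports edge insertion and removal and recomputes modularity for a fixed community assignment. The class name `DynamicGraph`, the toy graph, and the assignment are hypothetical.

```python
from collections import defaultdict


class DynamicGraph:
    """Undirected, unweighted graph supporting edge insertion and removal."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def remove_edge(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def modularity(self, community):
        """Modularity Q of a node -> community mapping, recomputed from scratch."""
        m = sum(len(nbrs) for nbrs in self.adj.values()) / 2  # number of edges
        if m == 0:
            return 0.0
        internal = defaultdict(float)  # edges with both endpoints in the community
        degree = defaultdict(float)    # total degree of the community's nodes
        for u, nbrs in self.adj.items():
            degree[community[u]] += len(nbrs)
            for v in nbrs:
                if community[u] == community[v]:
                    internal[community[u]] += 0.5  # each internal edge is visited twice
        return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (c, d).
graph = DynamicGraph()
for u, v in [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]:
    graph.add_edge(u, v)
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

q_before = graph.modularity(community)
graph.remove_edge("c", "d")  # the network evolves: the bridge disappears
q_after = graph.modularity(community)
print(q_before, q_after)
```

Recomputing from scratch costs a full pass over the graph per change; a streaming approach would instead update only the communities touched by the added or removed edge.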

Slide 33

Slide 33 text

2014:
• Sarmento, R. P., Cordeiro, M., Gama, J. (2014). Streaming Approach for Visualizing Large Scale Telecommunications Networks. 15th IEEE International Conference on Mobile Data Management. (submitted)
• Cordeiro, M., Sarmento, R. P., Gama, J. (2014). Dynamic Community Detection in Evolving Large Scale Networks using Locality Modularity Optimization. (in preparation)
2012:
• Cordeiro, M. (2012). Twitter event detection: combining wavelet analysis and topic inference summarization. In the Doctoral Symposium on Informatics Engineering - DSIE’12. (3 citations, 21 readers on Mendeley)

Slide 34

Slide 34 text

November 2013:
• Big Data Spain – http://www.bigdataspain.org
• Strata Conf EU – http://strataconf.com/strataeu2013
July 2013:
• 3rd Lisbon Machine Learning School – http://lxmls.it.pt/2013

Slide 35

Slide 35 text

Oporto MongoDB User Group:
• Founder of the user group
• Community with 140 members
• Total of 3 meetups (average of 35 participants) – http://www.meetup.com/Oporto-mongoDB-User-Group/

Slide 36

Slide 36 text

Books:
• Gama, J. (2010). Knowledge Discovery from Data Streams. CRC Press.
• Rajaraman, A., & Ullman, J. D. (2011). Mining of Massive Datasets. Lecture Notes for Stanford CS345A Web Mining.
• Easley, D., & Kleinberg, J. (2010). Networks, Crowds, and Markets. Cambridge: Cambridge University Press.
• Cook, D. J., & Holder, L. B. (Eds.) (2007). Mining Graph Data. Wiley-Interscience.
• Ross, S. M. (2009). Introduction to Probability Models, Tenth Edition. Academic Press.

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

“a topically cohesive segment of news that includes two or more declarative independent clauses about a single event.”
“something that happens at some specific time and place along with all necessary preconditions and unavoidable consequences.”
“a seminal event or activity, along with all directly related events and activities.”

Slide 40

Slide 40 text

Data Stream Mining
• Properties:
– approximate answer, dependent on chosen accuracy
– models based on a summary or "sketch" of the data stream in memory
• Requirements:
– Process one example at a time, and inspect it only once
– Use a limited amount of memory
– Work in a limited amount of time
– Be ready to predict at any time
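
The properties and requirements above can be illustrated with a Count-Min sketch, a standard fixed-memory summary that returns approximate (never underestimated) item counts. It is used here only as an example of a "sketch"; the slides do not name it. The class name, the width and depth values, and the toy hashtag stream are assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary: estimates never fall below the true count."""

    def __init__(self, width=1024, depth=4):
        self.width = width   # counters per row: larger means fewer collisions
        self.depth = depth   # independent rows: more rows means tighter estimates
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by prefixing the row number to the item.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item):
        """Process one example; it is never stored or inspected again."""
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        """Answer at any time: the minimum counter bounds the true count from above."""
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


# Made-up stream of hashtags, consumed one item at a time.
stream = ["#earthquake", "#football", "#earthquake", "#music", "#earthquake"]
sketch = CountMinSketch()
for tag in stream:
    sketch.add(tag)

print(sketch.estimate("#earthquake"))
```

Memory stays at `width * depth` counters no matter how long the stream runs, and each update takes `depth` hash evaluations, which matches the limited-memory and limited-time requirements at the cost of an approximate answer.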

Slide 41

Slide 41 text

Social Network Analysis
• Community detection:
– Based on modularity
– Spectral analysis
• Network is not static, evolves over time:
– Creation, growth and disbanding of communities
• Group formation:
– exploring the principles by which groups develop and evolve in large-scale social networks
• Information spreading:
– Identification of “social sensors” that pass information quickly
– Cascading behavior (in blogs)

Slide 42

Slide 42 text

Natural Language Processing
• Text representation models:
– unstructured text: vector space model (VSM)
– feature extraction: bag-of-words, entity recognition, summarization, sentiment analysis
• Text analysis:
– term trend approach: trends in text streams (frequencies)
– semantic space approach (category found in the collection)
• Topic extraction:
– Latent Dirichlet Allocation (LDA), Dirichlet Compound Multinomial (DCM) mixtures and von Mises-Fisher (vMF) mixture models
• Event detection:
– statistical methods (LSH), wavelets, topic models (LDA)
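
To connect the vector space model with event detection, the sketch below treats each document as a bag-of-words vector and flags it as a first story when its cosine similarity to every earlier document stays below a threshold. This is an exhaustive nearest-neighbor comparison written for illustration; it is not the LSH-based method, which replaces the full scan with an approximate search. The function names, the 0.5 threshold, and the toy messages are assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: lowercase term -> raw term frequency."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def first_stories(stream, threshold=0.5):
    """Return one flag per document: True when no earlier document is similar enough."""
    seen = []
    flags = []
    for text in stream:
        vec = vectorize(text)
        nearest = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(nearest < threshold)
        seen.append(vec)  # keeps every document: fine for a toy, not for a real stream
    return flags


# Made-up messages arriving in order.
stream = [
    "earthquake hits the city center",
    "strong earthquake hits city center",
    "team wins the championship final",
]
flags = first_stories(stream)
print(flags)
```

The second message shares four of five terms with the first and is treated as the same story, while the third shares only a stop word and is flagged as new. Comparing against every stored document grows linearly with the stream, which is the cost that hashing-based approaches aim to avoid.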

Slide 43

Slide 43 text

No content