Slide 1

Slide 1 text

Representation Learning and Sampling for Networks Tanay Kumar Saha May 13, 2018 Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 1 / 45

Slide 2

Slide 2 text

Outline 1 About Me 2 Introduction and Motivation 3 Properties of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 2 / 45

Slide 3

Slide 3 text

About Me Defended PhD thesis (Advisors: Mohammad Al Hasan and Jennifer Neville) Problems/areas worked on: (1) Latent Representation in Networks, (2) Network Sampling, (3) Total Recall, (4) Name Disambiguation Already published: ECML/PKDD (1), CIKM (1), TCBB (1), SADM (1), SNAM (1), IEEE Big Data (1), ASONAM (1), Complenet (1), IEEE CNS (1), BIOKDD (1) Poster presentations: RECOMB (1), IEEE Big Data (1) Papers under review: KDD (1), JBHI (1), TMC (1) In preparation: ECML/PKDD (1), CIKM (1) Reproducible research: Released code for all works related to the thesis Served as a reviewer: TKDE, TOIS Provisional patent applications (3): Apparatus and Method of Implementing Batch-Mode Active Learning for Technology-Assisted Review (iControlESI); Apparatus and Method of Implementing Enhanced Batch-Mode Active Learning for Technology-Assisted Review of Documents (iControlESI); Method and System for Log-Based Computer Server Failure Diagnosis (NEC Labs) Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 3 / 45

Slide 4

Slide 4 text

Outline 1 About Me 2 Introduction and Motivation 3 Properties of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 4 / 45

Slide 5

Slide 5 text

Data Representation For machine learning algorithms, we may need to represent data (i.e., learn a feature set X) in a d-dimensional space (learn d factors of variation) For classification, we learn a function F that maps a feature set X to the corresponding label Y, i.e., F : X → Y For clustering, we learn a function F that maps a feature set X to an unknown label Z, i.e., F : X → Z Representation Learning: Learn a function that converts the raw data into a suitable feature representation, i.e., F : D[, Y] → X task-agnostic vs. task-sensitive localist vs. distributed How do we define the suitability of a feature set X (quantification/qualification)? Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse Disentangled Representations in Neural Models by Tenenbaum et al. Representation Learning: A Review and New Perspectives by Bengio et al. www.cs.toronto.edu/~bonner/courses/2016s/csc321/webpages/lectures.htm Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 5 / 45

Slide 6

Slide 6 text

Data Representation Good criteria for learning representations (learning X)? There is no clearly defined objective Different from machine learning tasks such as classification and clustering (where we have a clearly defined objective) A good representation must disentangle the underlying factors of variation in the training data? How do we translate this objective into appropriate training criteria? Is it even necessary to do anything but maximize likelihood under a good model? Representation Learning: A Review and New Perspectives by Bengio et al. (Introduce Bias) Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings How do we even decide how many factors of variation are best for an application? Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 6 / 45

Slide 7

Slide 7 text

Data Representation For link prediction in a network, we may represent edges in the space of the total number of nodes in the network (d = |V|) (Figure: a toy network with nodes 1–5) Representation of nodes: id, V1, V2, V3, ...; V1: 0, 1, 1, ...; V2: 1, 0, 1, ... Representation of edges: id, V1, V2, V3, ...; V1-V2: 0, 0, 1, ... Edge features: Common neighbor, Adamic-Adar (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 7 / 45

Slide 8

Slide 8 text

Data Representation For link prediction in a network, we may represent edges in the space of the total number of nodes in the network (d = |V|) (Figure: a toy network with nodes 1–5) Representation of nodes: id, V1, V2, V3, ...; V1: 0, 1, 1, ...; V2: 1, 0, 1, ... Representation of edges: id, V1, V2, V3, ...; V1-V2: 0, 0, 1, ... Edge features: Common neighbor, Adamic-Adar For document summarization, we may represent a particular sentence in the space of the vocabulary/word size (d = |W|) Sentence representation: Sent id, Content, w1, w2, w3, ...; S1, "This place is nice": 1, 0, 1, ...; S2, "This place is beautiful": 1, 1, 0, ... (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 7 / 45
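As a concrete illustration of these localist edge features, below is a minimal Python sketch that computes the common-neighbor and Adamic-Adar scores from an adjacency list. The toy graph and all names (adj, common_neighbors, adamic_adar) are illustrative, not taken from the thesis code.

```python
from math import log

# Toy undirected graph as an adjacency list (illustrative only).
adj = {
    "V1": {"V2", "V3"},
    "V2": {"V1", "V3"},
    "V3": {"V1", "V2", "V4"},
    "V4": {"V3", "V5"},
    "V5": {"V4"},
}

def common_neighbors(u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])

def adamic_adar(u, v):
    """Adamic-Adar score: shared neighbors weighted by 1/log(degree)."""
    return sum(1.0 / log(len(adj[w])) for w in adj[u] & adj[v])

print(common_neighbors("V1", "V2"), adamic_adar("V1", "V2"))
```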

Slide 9

Slide 9 text

Data Representation in the Latent Space Capture syntactic (homophily) and semantic (structural equivalence) properties of textual units (words, sentences) and network units (nodes, edges) For link prediction in a network, we may represent edges as a fixed-length vector (Figure: a toy network with nodes 1–5) Representation of nodes: id, a1, a2, a3; V1: 0.2, 0.3, 0.1; V2: 0.1, 0.2, 0.3 Representation of edges: id, a1, a2, a3; V1-V2: 0.02, 0.06, 0.03 (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 8 / 45

Slide 10

Slide 10 text

Data Representation in the Latent Space Capture syntactic (homophily) and semantic (structural equivalence) properties of textual units (words, sentences) and network units (nodes, edges) For link prediction in a network, we may represent edges as a fixed-length vector (Figure: a toy network with nodes 1–5) Representation of nodes: id, a1, a2, a3; V1: 0.2, 0.3, 0.1; V2: 0.1, 0.2, 0.3 Representation of edges: id, a1, a2, a3; V1-V2: 0.02, 0.06, 0.03 Also for document summarization, we may represent a particular sentence as a fixed-length vector (say, in a 3-dimensional space) Sentence representation: Sent id, Content, a1, a2, a3; S1, "This place is nice": 0.2, 0.3, 0.4; S2, "This place is beautiful": 0.2, 0.3, 0.4 (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 8 / 45
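The V1-V2 row in the table (0.02, 0.06, 0.03) is consistent with taking the element-wise (Hadamard) product of the two node embeddings, one common way to build an edge representation from node vectors. A minimal sketch under that assumption (the vectors are copied from the slide; everything else is illustrative):

```python
import numpy as np

# Node embeddings from the slide (3-dimensional latent space).
phi = {
    "V1": np.array([0.2, 0.3, 0.1]),
    "V2": np.array([0.1, 0.2, 0.3]),
}

# Element-wise (Hadamard) product of the endpoint embeddings; this reproduces
# the V1-V2 row of the table: [0.02, 0.06, 0.03].
edge_vec = phi["V1"] * phi["V2"]
print(edge_vec)
```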

Slide 11

Slide 11 text

Data Representation (Higher-order feature/substructure as a feature) (Figure: three input graphs G1, G2, G3 with labeled nodes) Given a set of networks, such as G1, G2, and G3 (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45

Slide 12

Slide 12 text

Data Representation (Higher-order feature/substructure as a feature) (Figure: three input graphs G1, G2, G3 with labeled nodes) Given a set of networks, such as G1, G2, and G3 Find frequent subgraphs of different sizes and use them as features (Figure: 2-node frequent subgraphs) (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45

Slide 13

Slide 13 text

Data Representation (Higher-order feature/substructure as a feature) (Figure: three input graphs G1, G2, G3 with labeled nodes) Given a set of networks, such as G1, G2, and G3 Find frequent subgraphs of different sizes and use them as features (Figure: 2-node and 3-node frequent subgraphs) (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45

Slide 14

Slide 14 text

Data Representation (Higher-order feature/substructure as a feature) (Figure: three input graphs G1, G2, G3 with labeled nodes) Given a set of networks, such as G1, G2, and G3 Find frequent subgraphs of different sizes and use them as features (Figure: 2-node, 3-node, and 4-node frequent subgraphs) (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45

Slide 15

Slide 15 text

Data Representation (Higher-order feature/substructure as a feature) (Figure: three input graphs G1, G2, G3 with labeled nodes) Given a set of networks, such as G1, G2, and G3 Find frequent subgraphs of different sizes and use them as features (Figure: 2-node and 3-node frequent induced subgraphs) (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45

Slide 16

Slide 16 text

Data Representation (Higher-order feature/substructure as a feature) (Figure: three input graphs G1, G2, G3 with labeled nodes) Given a set of networks, such as G1, G2, and G3 Find frequent subgraphs of different sizes and use them as features Similar to learning compositional semantics (learning representations for phrases, sentences, paragraphs, or documents) in the text domain Graphs can have cycles, so a Tree-LSTM-style recursive structure is not an option (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45

Slide 17

Slide 17 text

Data Representation (Higher-order feature/substructure as a feature) Given a single large undirected network, find the concentration of 3-, 4-, and 5-size graphlets (Figure: all 3-, 4-, and 5-node subgraph topologies) These substructure statistics can be used for structural information diffusion in representation learning (within or across data modalities) (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 10 / 45
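For intuition, here is a brute-force sketch of what "concentration of 3-size graphlets" means on a tiny undirected graph: count the connected 3-node induced subgraphs (triangles vs. open wedges) and normalize. The edge set is illustrative; on large graphs one samples instead of enumerating, which is the MCMC approach discussed later.

```python
from itertools import combinations

# Toy undirected graph as an edge set (illustrative only).
edges = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)}
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def graphlet_concentration_3(adj):
    """Concentration of connected 3-node graphlets: triangles vs. wedges."""
    counts = {"triangle": 0, "wedge": 0}
    for a, b, c in combinations(adj, 3):
        m = (b in adj[a]) + (c in adj[a]) + (c in adj[b])  # edges inside {a, b, c}
        if m == 3:
            counts["triangle"] += 1
        elif m == 2:
            counts["wedge"] += 1            # path of length 2 (open wedge)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else counts

print(graphlet_concentration_3(adj))        # e.g. {'triangle': 0.25, 'wedge': 0.75}
```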

Slide 18

Slide 18 text

Data Representation (Higher-order feature/substructure as a feature) Given a single large directed network, find the concentration of 3-, 4-, and 5-size directed graphlets (Figure: the 13 unique 3-graphlet types ω3,i, i = 1, 2, ..., 13) These substructure statistics can be used for structural information diffusion in representation learning (within or across data modalities) (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 11 / 45

Slide 19

Slide 19 text

Outline 1 About Me 2 Introduction and Motivation 3 Properties of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 12 / 45

Slide 20

Slide 20 text

Properties of Algorithm for Learning Representation Good criteria for learning representations (learning X)? Representation Learning: F : D[, Y] → X There is no clearly defined objective Different from machine learning tasks such as classification and clustering (where we have a clearly defined objective) A good representation must disentangle the underlying factors of variation in the training data? How do we translate this objective into appropriate training criteria? Is it even necessary to do anything but maximize likelihood under a good model? Representation Learning: A Review and New Perspectives by Bengio et al. Disentangled Representation for Manipulation of Sentiment in Text Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 13 / 45

Slide 21

Slide 21 text

Outline 1 About Me 2 Introduction and Motivation 3 Properties of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 14 / 45

Slide 22

Slide 22 text

Static vs Dynamic (Evolving) Network Static: A single snapshot of a network at a particular time-stamp (Figure: snapshot G1 of a toy evolving network; G1, G2, and G3 are three snapshots of the network.)

Slide 23

Slide 23 text

Static vs Dynamic (Evolving) Network Static: A single snapshot of a network at a particular time-stamp (Figure: A toy evolving network; G1, G2, and G3 are three snapshots of the network.) Evolving: Multiple snapshots of a network at various time-stamps Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 15 / 45

Slide 24

Slide 24 text

Latent Representation of Nodes in a Static Network (Figure: a toy network with nodes 1–5) Is it even necessary to do anything but maximize likelihood under a good model? word2vec, sen2vec, paragraph2vec, doc2vec || DeepWalk, LINE, Node2Vec || Speech2Vec Manifold learning: LLE, ISOMAP; Dimensionality reduction: PCA, SVD Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 16 / 45

Slide 25

Slide 25 text

Latent Representation of Nodes in a Static Network (Figure: a toy network with nodes 1–5) Create a corpus of random walks, e.g., 3→4→5, 1→3→2, 2→3→4, 3→4→5 Is it even necessary to do anything but maximize likelihood under a good model? word2vec, sen2vec, paragraph2vec, doc2vec || DeepWalk, LINE, Node2Vec || Speech2Vec Manifold learning: LLE, ISOMAP; Dimensionality reduction: PCA, SVD Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 16 / 45

Slide 26

Slide 26 text

Latent Representation of Nodes in a Static Network (Figure: a toy network with nodes 1–5) Create a corpus of random walks, e.g., 3→4→5, 1→3→2, 2→3→4, 3→4→5 Learn the representation: train a skip-gram version of a language model by minimizing the negative log likelihood, e.g., −log P(4 | 3) and −log P(5 | 4) Usually solved using negative sampling instead of a full softmax Is it even necessary to do anything but maximize likelihood under a good model? word2vec, sen2vec, paragraph2vec, doc2vec || DeepWalk, LINE, Node2Vec || Speech2Vec Manifold learning: LLE, ISOMAP; Dimensionality reduction: PCA, SVD Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 16 / 45
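A minimal sketch of the corpus-creation step only: uniform first-order random walks over the toy graph, each walk treated as a "sentence". The resulting corpus would then be fed to a skip-gram model trained with negative sampling, minimizing terms such as -log P(4 | 3), as the slide describes. The adjacency list, walk length, and function names are illustrative.

```python
import random

# Adjacency list of the toy 5-node network (illustrative).
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}

def random_walk(adj, start, length, rng=random):
    """Uniform first-order random walk used to create the node 'corpus'."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

# A few walks per node; each walk plays the role of a sentence for a
# skip-gram (word2vec-style) model with negative sampling.
corpus = [random_walk(adj, v, length=3) for v in adj for _ in range(2)]
print(corpus)
```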

Slide 27

Slide 27 text

Latent Representation of Nodes in a Static Network Most of the existing works cannot capture structural equivalence as advertised Lyu et al. show that external information, such as the orbit participation of nodes, may be helpful in this regard Is it even necessary to do anything but maximize likelihood under a good model? word2vec, sen2vec, paragraph2vec, doc2vec || DeepWalk, LINE, Node2Vec || Speech2Vec Enhancing the Network Embedding Quality with Structural Similarity Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 17 / 45

Slide 28

Slide 28 text

Latent Representation in an Evolving Network (Figure: snapshot G1 with embedding φ1) The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot And it should also not drift far from its position in the previous time-step (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?

Slide 29

Slide 29 text

Latent Representation in an Evolving Network (Figure: snapshots G1, G2 with embeddings φ1, φ2) The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot And it should also not drift far from its position in the previous time-step (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?

Slide 30

Slide 30 text

Latent Representation in an Evolving Network (Figure: snapshots G1, G2, G3 with embeddings φ1, φ2, φ3) The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot And it should also not drift far from its position in the previous time-step (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?

Slide 31

Slide 31 text

Latent Representation in an Evolving Network (Figure: snapshots G1, G2, G3 with embeddings φ1, φ2, φ3, φ4) The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot And it should also not drift far from its position in the previous time-step (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model? Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 18 / 45

Slide 32

Slide 32 text

Latent Representation in an Evolving Network (Figure: snapshots G1, G2, G3, G4 with embeddings φ1, φ2, φ3, φ4) The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot And it should also not drift far from its position in the previous time-step (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model? Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 18 / 45

Slide 33

Slide 33 text

Our Solution (Figure: our expectation, snapshots G1–G4 with embeddings φ1–φ4) (Figure: toy illustration of our method over G1, G2, G3: (a) RET model (DeepWalk embeddings followed by retrofitting), (b) Homo LT model (a single transformation W between consecutive φ's), (c) Heter LT model (per-step transformations W1, W2 smoothed into W)) Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 19 / 45

Slide 34

Slide 34 text

Solution Sketch Figure: A conceptual sketch of retrofitting (top) and linear transformation (bottom) based temporal smoothness. Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 20 / 45

Slide 35

Slide 35 text

Mathematical Formulation
Mathematical formulation for the retrofitted models: J(\phi_t) = \sum_{v \in V} \alpha_v \lVert \phi_t(v) - \phi_{t-1}(v) \rVert^2 (temporal smoothing) + \sum_{(v,u) \in E_t} \beta_{u,v} \lVert \phi_t(u) - \phi_t(v) \rVert^2 (network proximity)   (1)
Mathematical formulation for the homogeneous transformation models: J(W) = \lVert W X - Z \rVert^2, where X = [\phi_1; \phi_2; \ldots; \phi_{T-1}] and Z = [\phi_2; \phi_3; \ldots; \phi_T]   (2)
Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 21 / 45
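Eq. (2) is an ordinary least-squares problem once the per-snapshot embeddings are stacked. Below is a minimal numpy sketch with randomly generated embeddings; all shapes and names are illustrative, and W is fit on the side that matches a row-wise (node x dimension) embedding layout.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, d = 4, 5, 3                                   # snapshots, nodes, dimension
phis = [rng.normal(size=(n, d)) for _ in range(T)]  # toy per-snapshot embeddings

# Stack phi_1..phi_{T-1} into X and phi_2..phi_T into Z, then solve the
# least-squares problem of Eq. (2) for a single d x d projection matrix.
X = np.vstack(phis[:-1])                            # ((T-1)*n) x d
Z = np.vstack(phis[1:])                             # ((T-1)*n) x d
W, *_ = np.linalg.lstsq(X, Z, rcond=None)
print(W.shape)                                      # (d, d)
```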

Slide 36

Slide 36 text

Mathematical Formulation
Heterogeneous transformation models: J(W_t) = \lVert W_t \phi_t - \phi_{t+1} \rVert^2, for t = 1, 2, \ldots, T-1   (3)
(a) Uniform smoothing: we weight all projection matrices equally and linearly combine them: W^{(avg)} = \frac{1}{T-1} \sum_{t=1}^{T-1} W_t   (4)
(b) Linear smoothing: we increment the weights of the projection matrices linearly with time: W^{(linear)} = \sum_{t=1}^{T-1} \frac{t}{T-1} W_t   (5)
(c) Exponential smoothing: we increase the weights exponentially, using an exponential operator (exp) and a weighted-collapsed tensor (wct): W^{(exp)} = \sum_{t=1}^{T-1} \exp\left(\frac{t}{T-1}\right) W_t   (6), W^{(wct)} = \sum_{t=1}^{T-1} (1 - \theta)^{T-1-t} W_t   (7)
Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 22 / 45
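A small sketch of the four smoothing schemes of Eqs. (4)-(7), applied to toy per-step matrices. The function name, the toy matrices, and the default theta are illustrative.

```python
import numpy as np

def combine(Ws, scheme="avg", theta=0.3):
    """Combine per-step projections W_1..W_{T-1} as in Eqs. (4)-(7)."""
    Tm1 = len(Ws)                                    # T - 1
    if scheme == "avg":                              # Eq. (4): uniform weights
        weights = [1.0 / Tm1] * Tm1
    elif scheme == "linear":                         # Eq. (5): weights t/(T-1)
        weights = [(t + 1) / Tm1 for t in range(Tm1)]
    elif scheme == "exp":                            # Eq. (6): weights exp(t/(T-1))
        weights = [np.exp((t + 1) / Tm1) for t in range(Tm1)]
    elif scheme == "wct":                            # Eq. (7): (1-theta)^(T-1-t)
        weights = [(1 - theta) ** (Tm1 - 1 - t) for t in range(Tm1)]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return sum(w * W for w, W in zip(weights, Ws))

Ws = [np.eye(3) * (t + 1) for t in range(3)]         # toy W_1, W_2, W_3
print(combine(Ws, "wct"))
```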

Slide 37

Slide 37 text

Similarity in Other Modality Exploiting Similarities among Languages for Machine Translation Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation? Retrofitting word vectors to semantic lexicons (vision, text, knowledge graph) Cross-modal Knowledge Transfer: Improving the word embeddings of Apple by Looking at Oranges Disentangled Representations for Manipulation of Sentiment of Text ("unfortunately, this is a bad movie that is just plain bad"; "overall, this is a good movie that is just good") Deep manifold traversal: Changing labels with convolutional features (vision) Transform a smiling portrait into an angry one, and make one individual look more like someone else without changing clothing and background Controllable text generation Learning to generate reviews and discovering sentiment A neural algorithm of artistic style Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 23 / 45

Slide 38

Slide 38 text

Outline 1 About Me 2 Introduction and Motivation 3 Properties of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 24 / 45

Slide 39

Slide 39 text

Motivation (Latent Representation of Sentences) Most existing Sen2Vec methods disregard context of a sentence Meaning of one sentence depends on the meaning of its neighbors I eat my dinner. Then I take some rest. After that I go to bed. Our approach: incorporate extra-sentential context into Sen2Vec We propose two methods: regularization and retrofitting We experiment with two types of context: discourse and similarity. Regularized and Retrofitted models for Learning Sentence Representation with Context CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 25 / 45

Slide 40

Slide 40 text

Motivation (Discourse is Important) A simple strategy of decoding the concatenation of the previous and current sentence leads to good performance A novel strategy of multiencoding and decoding of two sentences leads to the best performance Target side context is important in translation Evaluating Discourse Phenomena in Neural Machine Translation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 26 / 45

Slide 41

Slide 41 text

Our Approach Consider content as well as context of a sentence Treat the context sentences as atomic linguistic units Similar in spirit to (Le & Mikolov, 2014) Efficient to train compared to compositional methods like encoder-decoder models (e.g., SDAE, Skip-Thought) Sen2Vec, SDAE, SAE, Fast-Sent, Skip-Thought, w2v-avg, c-phrase Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 27 / 45

Slide 42

Slide 42 text

Content Model (Sen2Vec) Treats sentences and words similarly Both are represented by vectors in a shared embedding matrix, φ : V → R^d (look-up) (Figure: Distributed bag of words, or DBOW (Le & Mikolov, 2014), shown for the sentence v: "I eat my dinner") Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 28 / 45

Slide 43

Slide 43 text

Regularized Models (Reg-dis, Reg-sim)
Incorporate neighborhood directly into the objective function of the content-based model (Sen2Vec) as a regularizer
Objective function: J(\phi) = \sum_{v \in V} \big( L_c(v) + \beta L_r(v, N(v)) \big) = \sum_{v \in V} L_c(v) (content loss) + \beta \sum_{(v,u) \in E} \lVert \phi(u) - \phi(v) \rVert^2 (graph smoothing)   (8)
Train with SGD
Regularization with discourse context ⇒ Reg-dis
Regularization with similarity context ⇒ Reg-sim
CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec
Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 29 / 45

Slide 44

Slide 44 text

Pictorial Depiction (Figure: (a) a sequence of sentences: u: "And I was wondering about the GD LEV." v: "Is it reusable?" y: "Or is it discarded to burn up on return to LEO?"; (b) Sen2Vec (DBOW) for v with content loss Lc; (c) Reg-dis for v with content loss Lc and regularization losses Lr toward the neighboring sentences u and y) Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 30 / 45

Slide 45

Slide 45 text

Retrofitted Model (Ret-dis, Ret-sim)
Retrofit vectors learned from Sen2Vec s.t. the revised vector φ(v) is: Similar to the prior vector, φ'(v) Similar to the vectors of its neighboring sentences, φ(u)
Objective function: J(\phi) = \sum_{v \in V} \alpha_v \lVert \phi(v) - \phi'(v) \rVert^2 (close to prior) + \sum_{(v,u) \in E} \beta_{u,v} \lVert \phi(u) - \phi(v) \rVert^2 (graph smoothing)   (9)
Solve using the Jacobi iterative method
Retrofit with discourse context ⇒ Ret-dis
Retrofit with similarity context ⇒ Ret-sim
Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 31 / 45
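A minimal sketch of the Jacobi-style update implied by Eq. (9): each vector is repeatedly replaced by a weighted average of its prior (Sen2Vec) vector and its current graph neighbors. The constant alpha and beta, the toy vectors, and the function name are illustrative.

```python
import numpy as np

def retrofit(prior, edges, alpha=1.0, beta=1.0, iters=10):
    """Jacobi iterations for Eq. (9): pull each phi(v) toward its prior
    vector and toward the vectors of its graph neighbors."""
    phi = {v: vec.copy() for v, vec in prior.items()}
    nbrs = {v: [] for v in prior}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    for _ in range(iters):
        new_phi = {}
        for v in phi:
            num = alpha * prior[v] + beta * sum(
                (phi[u] for u in nbrs[v]), np.zeros_like(prior[v]))
            den = alpha + beta * len(nbrs[v])
            new_phi[v] = num / den                   # closed-form Jacobi update
        phi = new_phi
    return phi

prior = {"s1": np.array([0.2, 0.3, 0.4]), "s2": np.array([0.6, 0.1, 0.0])}
print(retrofit(prior, edges=[("s1", "s2")]))
```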

Slide 46

Slide 46 text

Outline 1 About Me 2 Introduction and Motivation 3 Properties of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 32 / 45

Slide 47

Slide 47 text

Frequent Subgraph Mining (Sampling Substructure) Perform a first-order random walk over the fixed-size substructure space The MH algorithm calculates the acceptance probability using the following equation: \alpha(x, y) = \min\left( \frac{\pi(y) \, q(y, x)}{\pi(x) \, q(x, y)}, 1 \right)   (10) For mining frequent substructures from a set of graphs, we use the average support (s1) and the set-intersection support (s2) as the target distribution, i.e., \pi = s_1 or \pi = s_2 For collecting statistics from a single large graph, we use the uniform probability distribution as our target distribution In both cases, we use the uniform distribution as our proposal distribution, i.e., q(x, y) = 1/d_x F-S-Cube: A sampling based method for top-k frequent subgraph mining Finding network motifs using MCMC sampling Discovery of Functional Motifs from the Interface Region of Oligomeric Proteins using Frequent Subgraph Mining ACTS: Extracting android App topological signature through graphlet sampling Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 33 / 45
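A minimal sketch of the Metropolis-Hastings accept/reject step of Eq. (10) under the uniform proposal q(x, y) = 1/dx used here, so the proposal ratio reduces to dx/dy. The numeric values in the example call are illustrative.

```python
import random

def mh_accept(d_x, d_y, pi_x, pi_y, rng=random):
    """Accept/reject for Eq. (10) with uniform proposals:
    alpha(x, y) = min(pi(y) * d_x / (pi(x) * d_y), 1)."""
    ratio = (pi_y * d_x) / (pi_x * d_y)
    return rng.random() <= min(1.0, ratio)

# Example: pi proportional to the average support s1 of each subgraph state.
print(mh_accept(d_x=21, d_y=13, pi_x=2.67, pi_y=2.0))
```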

Slide 48

Slide 48 text

Data Representation (Frequent Subgraph Mining) (Figure: the graph database G1, G2, G3 and its 2-node and 3-node frequent induced subgraphs) We find the support sets of the edges BD, BE, and DE of the subgraph gBDE, which are {G1, G2, G3}, {G2, G3}, and {G2, G3}, respectively So, for gBDE, s1(gBDE) = (3 + 2 + 3)/3 = 2.67, and s2(gBDE) = 2 Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 34 / 45
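The two support measures are easy to state in code: s1 averages the sizes of the edge support sets, while s2 takes the size of their intersection. The support sets below are illustrative placeholders (chosen so that s1 = 2.67 and s2 = 2, the values reported on the slide), not the exact sets of the example.

```python
# Illustrative support sets for the three edges of a 3-node candidate subgraph.
edge_supports = [
    {"G1", "G2", "G3"},   # support set of the first edge
    {"G2", "G3"},         # support set of the second edge
    {"G1", "G2", "G3"},   # support set of the third edge
]

s1 = sum(len(s) for s in edge_supports) / len(edge_supports)  # average support
s2 = len(set.intersection(*edge_supports))                    # set-intersection support
print(round(s1, 2), s2)   # 2.67 2
```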

Slide 49

Slide 49 text

Frequent Subgraph Mining (Sampling Substructure) (Figure: neighbor generation mechanism. (a) A graph G with the current state of the random walk, {1, 2, 3, 4}, and the neighborhood of each state vertex: 1 → {5, 6, 7, 8, 9, 10}, 2 → {5, 6, 7, 8, 10}, 3 → {5, 6, 7, 8, 9, 10}, 4 → {5, 6, 8, 9}; (b) the state after one transition, {1, 2, 3, 8}, with updated neighborhoods: 1 → {4, 9}, 2 → {4, 5, 6, 9, 12}, 3 → {4, 9}, 8 → {4, 5, 6, 9}.) For this example, dx = 21, dy = 13 Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 35 / 45

Slide 50

Slide 50 text

Frequent Subgraph Mining (Sampling Substructure)
Algorithm 1: SampleIndSubGraph pseudocode
Input: graph Gi; subgraph size
  x ← state saved at Gi
  dx ← neighbor count of x
  a_supx ← score of graph x
  while a neighbor state y is not found do
    y ← a random neighbor of x
    dy ← neighbor count of y
    a_supy ← score of graph y
    accp_val ← (dx · a_supy) / (dy · a_supx)
    accp_probability ← min(1, accp_val)
    if uniform(0, 1) ≤ accp_probability then
      return y
Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 36 / 45

Slide 51

Slide 51 text

Frequent Subgraph Mining (Sampling Substructure)
Algorithm 2: SampleIndSubGraph pseudocode
Input: graph Gi; subgraph size
  x ← state saved at Gi
  dx ← neighbor count of x
  a_supx ← score of graph x
  while a neighbor state y is not found do
    y ← a random neighbor of x
    dy ← neighbor count of y
    a_supy ← score of graph y
    accp_val ← dx / dy
    accp_probability ← min(1, accp_val)
    if uniform(0, 1) ≤ accp_probability then
      return y
Motif Counting Beyond Five Nodes
Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 37 / 45

Slide 52

Slide 52 text

Frequent Subgraph Mining (Sampling Substructure) The random walks are ergodic They satisfy the reversibility condition, so the walk achieves the target distribution We use the spectral gap (λ = 1 − max{λ1, |λm−1|}) to measure the mixing rate of our random walk We computed the mixing time (inverse of the spectral gap) for size-6 subgraphs of the Mutagen dataset and found that it is approximately 15 units We suggest using multiple chains along with a suitable distance measure (for example, Jaccard distance) for choosing a suitable iteration count We show that the acceptance probability of our technique is quite high (a large number of rejected moves indicates a poorly designed proposal distribution)
Table: Probability of acceptance of FS3 for the Mutagen and PS datasets
Mutagen: size = 8, size = 9, size = 10; PS: size = 6, size = 7, size = 8
Acceptance (%), strategy s1: 82.70 ± 0.04, 83.89 ± 0.03, 81.66 ± 0.03; 91.08 ± 0.01, 92.23 ± 0.02, 93.08 ± 0.01
Acceptance (%), strategy s2: 75.27 ± 0.05, 76.74 ± 0.03, 75.20 ± 0.03; 85.08 ± 0.05, 87.46 ± 0.06, 89.41 ± 0.07
Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 38 / 45
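A minimal sketch of the spectral-gap computation (and the inverse-gap mixing-time estimate) for a toy transition matrix. The 3-state matrix is illustrative only; it is not the walk over the subgraph space.

```python
import numpy as np

def spectral_gap(P):
    """Spectral gap 1 - max(lambda_1, |lambda_{m-1}|) of a transition matrix P,
    with eigenvalues sorted as 1 = lambda_0 >= lambda_1 >= ... >= lambda_{m-1}."""
    eig = np.sort(np.linalg.eigvals(P).real)[::-1]
    return 1.0 - max(eig[1], abs(eig[-1]))

# Toy 3-state random walk (rows sum to 1).
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
gap = spectral_gap(P)
print(gap, 1.0 / gap)   # spectral gap and the corresponding mixing-time estimate
```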

Slide 53

Slide 53 text

Outline 1 About Me 2 Introduction and Motivation 3 Properties of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 39 / 45

Slide 54

Slide 54 text

The problem of Total Recall Vanity search: Find out everything about me Fandom: Find out everything about my hero Research: Find out everything about my PhD topic Investigation: Find out everything about something or some activity Systematic review: Find all published studies evaluating some method or effect Patent search: Find all prior art Electronic discovery: Find all documents responsive to a request for production in a legal matter Creating archival collections: Label all relevant documents, for posterity, future IR evaluation, etc. Batch-mode active learning for technology-assisted review A large scale study of SVM based methods for abstract screening in systematic reviews Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 40 / 45

Slide 55

Slide 55 text

An Active Learning Algorithm
Algorithm 3: SelectABatch
Input: hc, the current hyperplane; D, available instances; k, batch size; t, similarity threshold
Output: A batch Bc of k documents to be included in training
  if Strategy is DS then
    Bc ← EmptySet()
    I ← ArgSort(Distance(hc, D), order = increase)
    while Size(Bc) < k do
      Insert(Bc, I[1])
      S ← GetSimilar(I[1], I, D, t, similarity = cosine)
      I ← Remove(I, S)
  else if Strategy is BPS then
    Bc ← EmptySet()
    w ← 1.0 / Distance(hc, D)^2
    w ← Normalize(w)
    I ← List(D)
    while Size(Bc) < k do
      c ← Choose(I, prob = w, num = 1)
      Insert(Bc, c)
      S ← GetSimilar(c, I, D, t, similarity = cosine)
      I ← Remove(I, S)
      w ← Normalize(w[I])
  return Bc
Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 41 / 45

Slide 56

Slide 56 text

Outline 1 About Me 2 Introduction and Motivation 3 Properties of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 42 / 45

Slide 57

Slide 57 text

Name Disambiguation The graph in this figure corresponds to the ego network of u, Gu We also assume that u is a multi-node consisting of two name entities So the removal of the node u (along with all of its incident edges) from Gu creates two disjoint clusters (Figure: A toy example of clustering-based entity disambiguation) (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model? Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 43 / 45

Slide 58

Slide 58 text

Name Disambiguation
Calculate the normalized-cut score: NC = \sum_{i=1}^{k} \frac{W(C_i, \bar{C}_i)}{W(C_i, C_i) + W(C_i, \bar{C}_i)}   (11)
Modeling temporal mobility (Figure: number of papers per year, 2005–2013, for Cluster1, Cluster2, and Cluster3)
Calculating the temporal-mobility score: TM\text{-}score = \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} w(Z_i, Z_j) \cdot \left( D(Z_i \| Z_j) + D(Z_j \| Z_i) \right)}{k \times \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} w(Z_i, Z_j)}   (12)
Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 44 / 45
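A minimal sketch of the normalized-cut score as reconstructed in Eq. (11): for each cluster, the weight leaving the cluster divided by the within-cluster weight plus the leaving weight. The similarity matrix, cluster labels, and function name are illustrative.

```python
import numpy as np

def normalized_cut(W, labels):
    """Eq. (11)-style score over a symmetric similarity matrix W and a
    cluster label per node."""
    labels = np.asarray(labels)
    score = 0.0
    for c in np.unique(labels):
        inside = labels == c
        w_in = W[np.ix_(inside, inside)].sum()    # within-cluster weight
        w_cut = W[np.ix_(inside, ~inside)].sum()  # weight leaving the cluster
        score += w_cut / (w_in + w_cut)
    return score

# Toy similarity matrix for 4 nodes split into two clusters.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
print(normalized_cut(W, [0, 0, 1, 1]))
```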

Slide 59

Slide 59 text

Thanks! Sen2Vec Code and Datasets: https://github.com/tksaha/con-s2v/tree/jointlearning Temporal node2vec Code: https://gitlab.com/tksaha/temporalnode2vec.git Motif Finding Code: https://github.com/tksaha/motif-finding Frequent Subgraph Mining Code: https://github.com/tksaha/fs3-graph-mining Finding Functional Motif Code: https://gitlab.com/tksaha/func_motif Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 45 / 45