
Presentation on my Research

Tanay Kumar Saha
May 13, 2018

In this presentation, I intend to motivate the topic of representation learning, point out my research, and relate it to some recent interesting work.


Transcript

  1. Representation Learning and Sampling for Networks Tanay Kumar Saha May

    13, 2018 Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 1 / 45
  2. Outline 1 About Me 2 Introduction and Motivation 3 Properties

    of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 2 / 45
  3. About Me Defended PhD Thesis (Advisor(s): Mohammad Al Hasan and

    Jennifer Neville) Problem/Areas Worked: (1) Latent Representation in Networks, (2) Network Sampling, (3) Total Recall, (4) Name Disambiguation Already Published: ECML/PKDD(1), CIKM (1), TCBB (1), SADM (1), SNAM (1), IEEE Big Data (1), ASONAM (1), Complenet (1), IEEE CNS (1), BIOKDD (1) Poster Presentation: RECOMB (1), IEEE Big Data (1) Paper Under Review: KDD (1), JBHI (1), TMC(1) In Preparation: ECML/PKDD(1), CIKM(1) Reproducible Research: Released codes for all the works related to the thesis Served as a Reviewer: TKDE, TOIS Provisional Patent Application (3) Apparatus and Method of Implementing Batch-mode active learning for Technology-Assisted Review (iControlESI) Apparatus and Method of Implementing Enhanced Batch-Mode Active Learning for Technology-Assisted Review of Documents (iControlESI) Method and System for Log Based Computer Server Failure Diagnosis (NEC Labs) Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 3 / 45
  4. Outline 1 About Me 2 Introduction and Motivation 3 Properties

    of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 4 / 45
  5. Data Representation For machine learning algorithms, we may need to

    represent data (i.e., learn a feature-set X) in a d-dimensional space (learn d factors of variation). For classification, we learn a function F which maps a feature-set X to the corresponding label Y, i.e., F : X → Y. For clustering, we learn a function F which maps a feature-set X to an unknown label Z, i.e., F : X → Z. Representation Learning: learn a function which converts the raw data into a suitable feature representation, i.e., F : D[, Y] → X. task-agnostic vs. task-sensitive; localist vs. distributed. How do we define the suitability of a feature-set X (quantification/qualification)? Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse. Disentangled Representations in Neural Models by Tenenbaum et al. Representation Learning: A Review and New Perspectives by Bengio et al. www.cs.toronto.edu/~bonner/courses/2016s/csc321/webpages/lectures.htm Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 5 / 45
  6. Data Representation Good criteria for learning representation (learning X )?

    There is no clearly defined objective; this differs from machine learning tasks such as classification and clustering, where we have a clearly defined objective. Should a good representation disentangle the underlying factors of variation in the training data? How do we translate that objective into appropriate training criteria? Is it even necessary to do anything but maximize likelihood under a good model? Representation Learning: A Review and New Perspectives by Bengio et al. (Introduce Bias) Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings How do we even decide how many factors of variation are best for an application? Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 6 / 45
  7. Data Representation For link prediction in network, we may represent

    edges in the space of total number of nodes in a network (d = |V |)
      Network: a toy graph on nodes 1-5 (figure)
      Repre. of Nodes:
        id     V1  V2  V3  · · ·
        V1      0   1   1  · · ·
        V2      1   0   1  · · ·
      Repre. of Edges:
        id     V1  V2  V3  · · ·
        V1-V2   0   0   1  · · ·
      Edge features: Common neighbor, Adamic-Adar
    (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 7 / 45
  8. Data Representation For link prediction in network, we may represent

    edges in the space of total number of nodes in a network (d = |V |)
      Network: a toy graph on nodes 1-5 (figure)
      Repre. of Nodes:
        id     V1  V2  V3  · · ·
        V1      0   1   1  · · ·
        V2      1   0   1  · · ·
      Repre. of Edges:
        id     V1  V2  V3  · · ·
        V1-V2   0   0   1  · · ·
      Edge features: Common neighbor, Adamic-Adar
      For document summarization, we may represent a particular sentence in the space of vocabulary/word size (d = |W|)
      Sentence Representation:
        Sent id   Content                    w1  w2  w3  · · ·
        S1        This place is nice          1   0   1  · · ·
        S2        This place is beautiful     1   1   0  · · ·
    (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 7 / 45
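To make these localist representations concrete, here is a minimal sketch (not code from the deck) that builds the adjacency-row representation of nodes, a common-neighbor-style edge vector in node space, and a bag-of-words sentence vector; the toy edge list and vocabulary are assumptions chosen to be consistent with the tables on this slide.

```python
import numpy as np

nodes = ["V1", "V2", "V3", "V4", "V5"]
edges = [("V1", "V2"), ("V1", "V3"), ("V2", "V3"), ("V3", "V4"), ("V4", "V5")]  # assumed toy graph

idx = {v: i for i, v in enumerate(nodes)}
A = np.zeros((len(nodes), len(nodes)), dtype=int)
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1      # node Vi is represented by row i of A

def edge_vector(u, v):
    """Edge (u, v) in node space: indicator of common neighbors (d = |V|)."""
    return A[idx[u]] & A[idx[v]]

sentences = {"S1": "This place is nice", "S2": "This place is beautiful"}
vocab = sorted({w for s in sentences.values() for w in s.lower().split()})

def bow(sentence):
    """Bag-of-words vector in vocabulary space (d = |W|)."""
    words = sentence.lower().split()
    return np.array([words.count(w) for w in vocab])

print(edge_vector("V1", "V2"))         # V3 is the only common neighbor, as in the edge table
print(vocab, bow(sentences["S1"]))     # sparse, localist sentence representation
```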
  9. Data Representation in the Latent Space Capture syntactic (homophily) and

    semantic (structural equivalence) properties of textual (words, sentences) and network units (nodes, edges) For link prediction in network, we may represent edges as a fixed-length vector
      Network: a toy graph on nodes 1-5 (figure)
      Repre. of Nodes:
        id   a1   a2   a3
        V1   0.2  0.3  0.1
        V2   0.1  0.2  0.3
      Repre. of Edges:
        id      a1    a2    a3
        V1-V2   0.02  0.06  0.03
    (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 8 / 45
  10. Data Representation in the Latent Space Capture syntactic (homophily) and

    semantic (structural equivalence) properties of textual (words, sentences) and network units (nodes, edges) For link prediction in network, we may represent edges as a fixed-length vector
      Network: a toy graph on nodes 1-5 (figure)
      Repre. of Nodes:
        id   a1   a2   a3
        V1   0.2  0.3  0.1
        V2   0.1  0.2  0.3
      Repre. of Edges:
        id      a1    a2    a3
        V1-V2   0.02  0.06  0.03
      Also for document summarization, we may represent a particular sentence as a fixed-length vector (say, 3-dimensional space)
      Sentence Representation:
        Sent id   Content                    a1   a2   a3
        S1        This place is nice         0.2  0.3  0.4
        S2        This place is beautiful    0.2  0.3  0.4
    (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 8 / 45
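The numbers in the edge table above are consistent with composing the edge vector as the elementwise (Hadamard) product of the two node vectors, one of the standard binary operators for building edge embeddings from node embeddings (e.g., in node2vec); a small illustrative sketch with the slide's toy values:

```python
import numpy as np

phi = {"V1": np.array([0.2, 0.3, 0.1]),   # latent node vectors from the slide
       "V2": np.array([0.1, 0.2, 0.3])}

def edge_embedding(u, v, op="hadamard"):
    """Compose a fixed-length edge vector from two node vectors."""
    if op == "hadamard":
        return phi[u] * phi[v]
    if op == "average":
        return (phi[u] + phi[v]) / 2.0
    raise ValueError(f"unknown operator: {op}")

print(edge_embedding("V1", "V2"))   # -> [0.02 0.06 0.03], matching the V1-V2 row
```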
  11. Data Representation (Higher-order feature/substructure as a feature) A B F

    C D D A D B E C B E D G1 G2 G3 Given a set of networks, such as G1, G2, and G3 (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45
  12. Data Representation (Higher-order feature/substructure as a feature) A B F

    C D D A D B E C B E D G1 G2 G3 Given a set of networks, such as G1, G2, and G3 Find frequent subgraphs of different sizes and use them as features A B B C B D B E D E 2-node frequent subgraphs (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45
  13. Data Representation (Higher-order feature/substructure as a feature) A B F

    C D D A D B E C B E D G1 G2 G3 Given a set of networks, such as G1, G2, and G3 Find frequent subgraphs of different sizes and use them as features A B B C B D B E D E 2-node frequent subgraphs D B E A B C A B D B D E B E D C B D B D E 3-node frequent subgraphs (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45
  14. Data Representation (Higher-order feature/substructure as a feature) A B F

    C D D A D B E C B E D G1 G2 G3 Given a set of networks, such as G1, G2, and G3 Find frequent subgraphs of different sizes and use them as features A B B C B D B E D E 2-node frequent subgraphs D B E A B C A B D B D E B E D C B D B D E 3-node frequent subgraphs A B C D 4-node frequent subgraphs (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45
  15. Data Representation (Higher-order feature/substructure as a feature) A B F

    C D D A D B E C B E D G1 G2 G3 Given a set of networks, such as G1, G2, and G3 Find frequent subgraphs of different sizes and use them as features A B B C B D B E D E 2-node frequent subgraphs A B C B D E 3-node frequent subgraphs Induced Subgraphs (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45
  16. Data Representation (Higher-order feature/substructure as a feature) A B F

    C D D A D B E C B E D G1 G2 G3 Given a set of networks, such as G1, G2, and G3, find frequent subgraphs of different sizes and use them as features. This is similar to learning compositional semantics (learning representations for phrases, sentences, paragraphs, or documents) in the text domain. Graphs can have cycles, so a Tree-LSTM-style recursive structure is not an option. (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 9 / 45
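As a toy illustration of the "frequent substructures as features" idea (not the thesis code), the sketch below treats each graph as a set of labelled edges, keeps the 2-node patterns (unordered label pairs) that occur in at least minsup graphs, and encodes every graph as a binary vector over those patterns. The edge sets for G1-G3 are hypothetical, and larger patterns would need real subgraph-isomorphism machinery.

```python
from collections import Counter

graphs = {                                   # hypothetical labelled edge sets, in the spirit of G1-G3
    "G1": {("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")},
    "G2": {("A", "B"), ("B", "D"), ("B", "E"), ("D", "E")},
    "G3": {("B", "C"), ("B", "D"), ("B", "E"), ("D", "E")},
}

def patterns(g):
    return {frozenset(e) for e in g}         # 2-node pattern = unordered label pair

support = Counter(p for g in graphs.values() for p in patterns(g))
minsup = 2
frequent = sorted((p for p, c in support.items() if c >= minsup), key=sorted)

def feature_vector(g):
    present = patterns(g)
    return [1 if p in present else 0 for p in frequent]

for name, g in graphs.items():
    print(name, feature_vector(g))           # graph-level feature vector over frequent patterns
```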
  17. Data Representation (Higher-order feature/substructure as a feature) Given a single

    large undirected network, find the concentration of 3-, 4-, and 5-node graphlets. Figure: All 3-, 4-, and 5-node topologies (3-node, 4-node, and 5-node subgraph patterns). This type of substructure statistic can be used for structural information diffusion in representation learning (within or across data modalities). (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 10 / 45
  18. Data Representation (Higher-order feature/substructure as a feature) Given a single

    large directed network, find the concentration of 3-, 4-, and 5-node directed graphlets. Figure: The 13 unique 3-graphlet types ω3,i (i = 1, 2, . . . , 13). This type of substructure statistic can be used for structural information diffusion in representation learning (within or across data modalities). (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 11 / 45
  19. Outline 1 About Me 2 Introduction and Motivation 3 Properties

    of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 12 / 45
  20. Properties of Algorithm for Learning Representation Good criteria for learning

    representations (learning X)? Representation Learning: F : D[, Y] → X. There is no clearly defined objective; this differs from machine learning tasks such as classification and clustering, where we have a clearly defined objective. Should a good representation disentangle the underlying factors of variation in the training data? How do we translate that objective into appropriate training criteria? Is it even necessary to do anything but maximize likelihood under a good model? Representation Learning: A Review and New Perspectives by Bengio et al. Disentangled Representation for Manipulation of Sentiment in Text Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 13 / 45
  21. Outline 1 About Me 2 Introduction and Motivation 3 Properties

    of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 14 / 45
  22. Static vs Dynamic (Evolving) Network Static: A Single snapshot of

    a network at a particular time-stamp 4 1 3 2 5 G1 Figure: A Toy Evolving Network. G1 , G2 and G3 are three snapshots of the Network.
  23. Static vs Dynamic (Evolving) Network Static: A Single snapshot of

    a network at a particular time-stamp 4 1 3 2 5 G1 4 1 3 2 5 G2 4 1 3 2 5 G3 Figure: A Toy Evolving Network. G1 , G2 and G3 are three snapshots of the Network. Evolving: Multiple snapshots of a network at various time-stamps Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 15 / 45
  24. Latent Representation of Nodes in a Static Network Network 1

    2 3 4 5 Is it even necessary to do anything but maximize likelihood under a good model? word2vec, sen2vec, paragraph2vec, doc2vec || Deepwalk, LINE, Node2Vec || Speech2Vec Manifold learning: LLE, ISOMAP; Dimensionality Reduction: PCA, SVD Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 16 / 45
  25. Latent Representation of Nodes in a Static Network Network Create

    Corpus 1 2 3 4 5 3 4 5 1 3 2 2 3 4 3 4 5 Is it even necessary to do anything but maximize likelihood under a good model? word2vec, sen2vec, paragraph2vec, doc2vec || Deepwalk, LINE, Node2Vec || Speech2Vec Manifold learning: LLE, ISOMAP; Dimensionality Reduction: PCA, SVD Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 16 / 45
  26. Latent Representation of Nodes in a Static Network Network Create

    Corpus Learn Representation 1 2 3 4 5 3 4 5 1 3 2 2 3 4 3 4 5 −log P(4 | 3), −log P(5 | 4) Train a skip-gram version of the language model: minimize the negative log-likelihood, usually solved using negative sampling instead of softmax. Is it even necessary to do anything but maximize likelihood under a good model? word2vec, sen2vec, paragraph2vec, doc2vec || Deepwalk, LINE, Node2Vec || Speech2Vec Manifold learning: LLE, ISOMAP; Dimensionality Reduction: PCA, SVD Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 16 / 45
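A minimal DeepWalk-style sketch of the "create corpus, then learn representations" pipeline: generate first-order random walks over an assumed toy 5-node topology and feed them to a skip-gram model with negative sampling. The walk generator is hypothetical, and the training step assumes gensim >= 4 is available; any skip-gram implementation would do.

```python
import random
from gensim.models import Word2Vec

adj = {"1": ["2", "3"], "2": ["1", "3"], "3": ["1", "2", "4"],
       "4": ["3", "5"], "5": ["4"]}                      # assumed toy topology

def random_walk(start, length=5):
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))        # first-order random walk
    return walk

walks = [random_walk(v, length=5) for v in adj for _ in range(10)]

# Skip-gram with negative sampling over the walk corpus, i.e. minimize the
# -log P(4 | 3), -log P(5 | 4), ... terms sketched on the slide.
model = Word2Vec(sentences=walks, vector_size=16, window=2,
                 sg=1, negative=5, min_count=0, epochs=20)
print(model.wv["3"][:4])                                 # learned node vector (first 4 dims)
```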
  27. Latent Representation of Nodes in a Static Network Most of

    the existing works cannot capture structural equivalence as advertised. Lyu et al. show that external information, such as orbit participation of nodes, may be helpful in this regard. Is it even necessary to do anything but maximize likelihood under a good model? word2vec, sen2vec, paragraph2vec, doc2vec || Deepwalk, LINE, Node2Vec || Speech2Vec Enhancing the Network Embedding Quality with Structural Similarity Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 17 / 45
  28. Latent Representation in an Evolving Network 4 1 3 2

    5 G1 φ1 The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot and, at the same time, should not drift far from its position in the previous time-step. (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?
  29. Latent Representation in an Evolving Network 4 1 3 2

    5 G1 4 1 3 2 5 G2 φ1 φ2 The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot and, at the same time, should not drift far from its position in the previous time-step. (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?
  30. Latent Representation in an Evolving Network 4 1 3 2

    5 G1 4 1 3 2 5 G2 4 1 3 2 5 G3 φ1 φ2 φ3 The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot and, at the same time, should not drift far from its position in the previous time-step. (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?
  31. Latent Representation in an Evolving Network 4 1 3 2

    5 G1 4 1 3 2 5 G2 4 1 3 2 5 G3 φ1 φ2 φ3 φ4 The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot and, at the same time, should not drift far from its position in the previous time-step. (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model? Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 18 / 45
  32. Latent Representation in an Evolving Network 4 1 3 2

    5 G1 4 1 3 2 5 G2 4 1 3 2 5 G3 4 1 3 2 5 G4 φ1 φ2 φ3 φ4 The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot and, at the same time, should not drift far from its position in the previous time-step. (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model? Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 18 / 45
  33. Our Solution Figure: Our Expectation (snapshots G1, G2, G3, G4 of the evolving

    network with the corresponding representations φ1, φ2, φ3, φ4). Figure: Toy illustration of our method: (a) RET model (DeepWalk followed by RET), (b) Homo LT model, (c) Heter LT model with smoothing of W1 and W2 into W. Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 19 / 45
  34. Solution Sketch Figure: A conceptual sketch of retrofitting (top) and

    linear transformation (bottom) based temporal smoothness. Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 20 / 45
  35. Mathematical Formulation Mathematical Formulation for Retrofitted Models

    J(φ_t) = Σ_{v∈V} α_v ||φ_t(v) − φ_{t−1}(v)||²  (temporal smoothing)  +  Σ_{(v,u)∈E_t} β_{u,v} ||φ_t(u) − φ_t(v)||²  (network proximity)   (1)
    Mathematical Formulation for Homogeneous Transformation Models
    J(W) = ||WX − Z||², where X = [φ_1; φ_2; . . . ; φ_{T−1}] and Z = [φ_2; φ_3; . . . ; φ_T].   (2)
    Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 21 / 45
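A small numerical sketch of the homogeneous-transformation objective in Eq. (2): stack consecutive snapshot embeddings and fit a single projection by least squares. The φ_t below are random placeholders, and the row-vector convention X·W ≈ Z is used (the transpose of the W·X notation above).

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, d = 4, 5, 3                                    # snapshots, nodes, dimensions
phi = [rng.normal(size=(n, d)) for _ in range(T)]    # placeholder per-snapshot embeddings

X = np.vstack(phi[:-1])                              # phi_1 ... phi_{T-1}
Z = np.vstack(phi[1:])                               # phi_2 ... phi_T

W, *_ = np.linalg.lstsq(X, Z, rcond=None)            # minimizes ||X W - Z||^2
print("residual:", np.linalg.norm(X @ W - Z))
```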
  36. Mathematical Formulation Heterogeneous Transformation Models

    J(W_t) = ||W_t φ_t − φ_{t+1}||², for t = 1, 2, . . . , (T − 1).   (3)
    (a) Uniform smoothing: We weight all projection matrices equally, and linearly combine them: W^(avg) = (1 / (T − 1)) Σ_{t=1}^{T−1} W_t.   (4)
    (b) Linear smoothing: We increment the weights of the projection matrices linearly with time: W^(linear) = Σ_{t=1}^{T−1} (t / (T − 1)) W_t.   (5)
    (c) Exponential smoothing: We increase weights exponentially, using an exponential operator (exp) and a weighted-collapsed tensor (wct): W^(exp) = Σ_{t=1}^{T−1} exp(t / (T − 1)) W_t   (6)  and  W^(wct) = Σ_{t=1}^{T−1} (1 − θ)^{T−1−t} W_t.   (7)
    Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 22 / 45
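The smoothing schemes in Eqs. (4)-(7) just combine the per-step matrices W_t with different weights; a compact sketch with placeholder matrices (weights indexed so that t runs from 1 to T−1, as in the equations):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 4
Wt = [rng.normal(size=(3, 3)) for _ in range(T - 1)]     # placeholder W_1 ... W_{T-1}

def smooth(Wt, scheme="avg", theta=0.3):
    T1 = len(Wt)                                         # = T - 1
    if scheme == "avg":                                  # Eq. (4): uniform
        weights = [1.0 / T1] * T1
    elif scheme == "linear":                             # Eq. (5): linear in t
        weights = [(t + 1) / T1 for t in range(T1)]
    elif scheme == "exp":                                # Eq. (6): exponential in t
        weights = [np.exp((t + 1) / T1) for t in range(T1)]
    elif scheme == "wct":                                # Eq. (7): weighted-collapsed tensor
        weights = [(1 - theta) ** (T1 - 1 - t) for t in range(T1)]
    return sum(w * M for w, M in zip(weights, Wt))

for s in ["avg", "linear", "exp", "wct"]:
    print(s, np.round(smooth(Wt, s)[0], 3))              # first row of each smoothed W
```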
  37. Similarity in Other Modality Exploiting Similarities among Languages for Machine

    Translation Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation? Retrofitting word vectors to semantic lexicons (vision, text, knowledge graph) Cross-modal Knowledge Transfer: Improving the word embeddings of Apple by Looking at Oranges Disentangled Representations for Manipulation of Sentiment of Text unfortunately, this is a bad movie that is just plain bad overall, this is a good movie that is just good Deep manifold traversal: Changing labels with convolutional features (vision) Transform a smiling portrait into an angry one and make one individual look more like someone else without changing clothing and background Controllable Text generation Learning to generate reviews and discovering sentiment A neural algorithm of artistic style Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 23 / 45
  38. Outline 1 About Me 2 Introduction and Motivation 3 Properties

    of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 24 / 45
  39. Motivation (Latent Representation of Sentences) Most existing Sen2Vec methods disregard

    context of a sentence Meaning of one sentence depends on the meaning of its neighbors I eat my dinner. Then I take some rest. After that I go to bed. Our approach: incorporate extra-sentential context into Sen2Vec We propose two methods: regularization and retrofitting We experiment with two types of context: discourse and similarity. Regularized and Retrofitted models for Learning Sentence Representation with Context CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 25 / 45
  40. Motivation (Discourse is Important) A simple strategy of decoding the

    concatenation of the previous and current sentence leads to good performance A novel strategy of multiencoding and decoding of two sentences leads to the best performance Target side context is important in translation Evaluating Discourse Phenomena in Neural Machine Translation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 26 / 45
  41. Our Approach Consider content as well as context of a

    sentence Treat the context sentences as atomic linguistic units Similar in spirit to (Le & Mikolov, 2014) Efficient to train compared to compositional methods like encoder-decoder models (e.g., SDAE, Skip-Thought) Sen2Vec, SDAE, SAE, Fast-Sent, Skip-Thought, w2v-avg, c-phrase Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 27 / 45
  42. Content Model (Sen2Vec) Treats sentences and words similarly Represented by

    vectors in a shared embedding matrix, φ : V → R^d (look-up). Figure: Distributed bag of words or DBOW (Le & Mikolov, 2014), illustrated with a sentence v: "I eat my dinner". Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 28 / 45
  43. Regularized Models (Reg-dis, Reg-sim) Incorporate neighborhood directly into the objective

    function of the content-based model (Sen2Vec) as a regularizer. Objective function:
    J(φ) = Σ_{v∈V} [ L_c(v) + β L_r(v, N(v)) ] = Σ_{v∈V} L_c(v)  (content loss)  +  β Σ_{(v,u)∈E} ||φ(u) − φ(v)||²  (graph smoothing)   (8)
    Train with SGD. Regularization with discourse context ⇒ Reg-dis; regularization with similarity context ⇒ Reg-sim. CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 29 / 45
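For intuition, here is one toy SGD step under Eq. (8) for a single sentence v: the gradient of the graph-smoothing term is 2β Σ_{u∈N(v)} (φ(v) − φ(u)), added to the gradient of the content loss, which is left as a placeholder since the DBOW negative-sampling details are omitted here; the vectors and context graph are assumed toy values.

```python
import numpy as np

d, beta, lr = 3, 0.5, 0.1
rng = np.random.default_rng(0)
phi = {s: rng.normal(size=d) for s in ["u", "v", "y"]}   # toy sentence vectors
neighbours = {"v": ["u", "y"]}                            # discourse context N(v)

def content_grad(s):
    """Placeholder for the gradient of the DBOW content loss L_c."""
    return np.zeros(d)

v = "v"
reg_grad = 2 * beta * sum(phi[v] - phi[u] for u in neighbours[v])   # d/d phi(v) of the regularizer
phi[v] = phi[v] - lr * (content_grad(v) + reg_grad)                 # one SGD step on Eq. (8)
print(phi[v])
```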
  44. Pictorial Depiction y : Or is it discarded to burn

    up on return to LEO? v: Is it reusable? u: And I was wondering about the GD LEV. Figure: (a) a sequence of sentences u, v, y; (b) Sen2Vec (DBOW) for v with content loss Lc; (c) Reg-dis, which adds regularization losses Lr toward the neighboring sentences u and y. Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 30 / 45
  45. Retrofitted Model (Ret-dis, Ret-sim) Retrofit vectors learned from Sen2Vec s.t.

    the revised vector φ(v) is (i) similar to the prior vector φ′(v) and (ii) similar to the vectors of its neighboring sentences, φ(u). Objective function:
    J(φ) = Σ_{v∈V} α_v ||φ(v) − φ′(v)||²  (close to prior)  +  Σ_{(v,u)∈E} β_{u,v} ||φ(u) − φ(v)||²  (graph smoothing)   (9)
    Solve using the Jacobi iterative method. Retrofit with discourse context ⇒ Ret-dis; retrofit with similarity context ⇒ Ret-sim. Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 31 / 45
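A minimal sketch of the Jacobi-style retrofitting iteration for Eq. (9), in the spirit of Faruqui et al.'s retrofitting: each revised vector becomes a weighted average of its Sen2Vec prior and its neighbors' current vectors. The priors, the context graph, and the α, β values below are toy assumptions.

```python
import numpy as np

prior = {"u": np.array([1.0, 0.0]),
         "v": np.array([0.0, 1.0]),
         "y": np.array([1.0, 1.0])}                       # toy Sen2Vec vectors
edges = {"v": ["u", "y"], "u": ["v"], "y": ["v"]}          # discourse/similarity context
alpha, beta = 1.0, 1.0

phi = {s: p.copy() for s, p in prior.items()}
for _ in range(10):                                        # Jacobi iterations
    new = {}
    for s in phi:
        nbrs = edges[s]
        new[s] = (alpha * prior[s] + beta * sum(phi[u] for u in nbrs)) \
                 / (alpha + beta * len(nbrs))              # closed-form update for phi(s)
    phi = new

print(phi["v"])   # pulled toward u and y while staying close to its prior
```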
  46. Outline 1 About Me 2 Introduction and Motivation 3 Properties

    of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 32 / 45
  47. Frequent Subgraph Mining (Sampling Substructure) Perform a first-order random-walk over

    the fixed-size substructure space. The MH algorithm calculates the acceptance probability using the following equation:
    α(x, y) = min( [π(y) q(y, x)] / [π(x) q(x, y)], 1 )   (10)
    For mining frequent substructures from a set of graphs, we use average support (s1) and set-intersection support (s2) as the target distribution, i.e., π = s1 or π = s2. For collecting statistics from a single large graph, we use the uniform distribution as our target distribution. In both cases, we use the uniform distribution over neighbors as our proposal distribution, i.e., q(x, y) = 1/d_x. F-S-Cube: A sampling based method for top-k frequent subgraph mining Finding network motifs using MCMC sampling Discovery of Functional Motifs from the Interface Region of Oligomeric Proteins using Frequent Subgraph Mining ACTS: Extracting android App topological signature through graphlet sampling Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 33 / 45
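With the uniform proposal q(x, y) = 1/d_x, the acceptance step of Eq. (10) reduces to min(1, π(y)·d_x / (π(x)·d_y)); a tiny sketch, where π, d_x, d_y are stand-ins for the target score and neighbor counts of the current and proposed subgraphs:

```python
import random

def mh_accept(pi_x, pi_y, d_x, d_y):
    """One Metropolis-Hastings acceptance test with uniform proposal q(x, y) = 1/d_x."""
    # q(y, x) / q(x, y) = d_x / d_y, so Eq. (10) becomes:
    alpha = min((pi_y * d_x) / (pi_x * d_y), 1.0)
    return random.random() <= alpha

# With a uniform target (pi_x == pi_y) this reduces to min(1, d_x / d_y),
# which is the acceptance value used in Algorithm 2 later in the deck.
print(mh_accept(pi_x=1.0, pi_y=1.0, d_x=21, d_y=13))   # d_x, d_y as in the neighbor-generation example
```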
  48. Data Representation (Frequent Subgraph Mining) A B F C D

    D A D B E C B E D G1 G2 G3 Graph Database A B B C B D B E D E 2-node frequent subgraphs A B C B D E 3-node frequent subgraphs Frequent Induced Subgraphs We find the support-sets of the edges BD, BE and DE of g13, which are {G1, G2, G3}, {G2, G3}, and {G2, G3} respectively. So, for gBDE, s1(gBDE) = (3 + 2 + 3)/3 = 2.67, and s2(gBDE) = 2. Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 34 / 45
  49. Frequent Subgraph Mining (Sampling Substructure)

    Figure: Neighbor generation mechanism. (a) Left: a graph G (nodes 1-12) with the current state {1, 2, 3, 4} of the random walk; Right: neighborhood information of the current state: 1 → {5,6,7,8,9,10}, 2 → {5,6,7,8,10}, 3 → {5,6,7,8,9,10}, 4 → {5,6,8,9}. (b) Left: the state of the random walk on G (panel a) after one transition; Right: updated neighborhood information: 1 → {4,9}, 2 → {4,5,6,9,12}, 3 → {4,9}, 8 → {4,5,6,9}. For this example, dx = 21 and dy = 13. Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 35 / 45
  50. Frequent Subgraph Mining (Sampling Substructure) Algorithm 1: SampleIndSubGraph Pseudocode

    Input: Graph Gi; size of subgraph
    x ← state saved at Gi
    dx ← neighbor count of x
    a_supx ← score of graph x
    while a neighbor state y is not found do
        y ← a random neighbor of x
        dy ← neighbor count of y
        a_supy ← score of graph y
        accp_val ← (dx · a_supy) / (dy · a_supx)
        accp_probability ← min(1, accp_val)
        if uniform(0, 1) ≤ accp_probability then
            return y
    Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 36 / 45
  51. Frequent Subgraph Mining (Sampling Substructure) Algorithm 2: SampleIndSubGraph Pseudocode

    Input: Graph Gi; size of subgraph
    x ← state saved at Gi
    dx ← neighbor count of x
    a_supx ← score of graph x
    while a neighbor state y is not found do
        y ← a random neighbor of x
        dy ← neighbor count of y
        a_supy ← score of graph y
        accp_val ← dx / dy
        accp_probability ← min(1, accp_val)
        if uniform(0, 1) ≤ accp_probability then
            return y
    Motif Counting Beyond Five Nodes Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 37 / 45
  52. Frequent Subgraph Mining (Sampling Substructure) The random walks are ergodic

    They satisfy the reversibility condition, so the walk achieves the target distribution. We have used the spectral gap (λ = 1 − max{λ1, |λm−1|}) to measure the mixing rate of our random walk. We computed the mixing time (the inverse of the spectral gap) for size-6 subgraphs of the Mutagen dataset and found it to be approximately 15 units. We suggest using multiple chains, along with a suitable distance measure (for example, Jaccard distance), for choosing a suitable iteration count. We show that the acceptance probability of our technique is quite high (a large number of rejected moves indicates a poorly designed proposal distribution).
    Table: Probability of acceptance of FS3 for the Mutagen and PS datasets
                                    Mutagen                                     PS
                                    size 8        size 9        size 10       size 6        size 7        size 8
      Acceptance (%), strategy s1   82.70 ± 0.04  83.89 ± 0.03  81.66 ± 0.03  91.08 ± 0.01  92.23 ± 0.02  93.08 ± 0.01
      Acceptance (%), strategy s2   75.27 ± 0.05  76.74 ± 0.03  75.20 ± 0.03  85.08 ± 0.05  87.46 ± 0.06  89.41 ± 0.07
    Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 38 / 45
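A small sketch of the spectral-gap diagnostic mentioned above, applied to a toy reversible transition matrix rather than the actual subgraph walk: the gap is 1 − max{λ1, |λ_{m−1}|} over the sorted eigenvalues, and its inverse gives the mixing-time scale.

```python
import numpy as np

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])                  # toy reversible transition matrix

eig = np.sort(np.linalg.eigvals(P).real)[::-1]      # lambda_0 = 1 >= lambda_1 >= ... >= lambda_{m-1}
gap = 1.0 - max(eig[1], abs(eig[-1]))               # spectral gap
print("spectral gap:", round(gap, 3), "mixing-time scale:", round(1.0 / gap, 2))
```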
  53. Outline 1 About Me 2 Introduction and Motivation 3 Properties

    of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 39 / 45
  54. The problem of Total Recall Vanity search: Find out everything

    about me Fandom: find out everything about my hero Research: Find out everything about my PhD topic Investigation: Find out everything about something or some activity Systematic review: Find all published studies evaluating some method or effect Patent search: find all prior art Electronic discovery: Find all documents responsive to a request for production in a legal matter Creating archival collections: Label all relevant documents, for posterity, future IR evaluation, etc. Batch-mode active learning for technology-assisted review A large scale study of SVM based methods for abstract screening in systematic reviews Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 40 / 45
  55. An Active Learning Algorithm Algorithm 3: SelectABatch Input : hc,

    current hyperplane; D, available instances; k, batch size; t, similarity threshold
    Output: A batch of k documents to be included in training
    if Strategy is DS then
        Bc ← EmptySet()
        I ← ArgSort(Distance(hc, D), order = increasing)
        while Size(Bc) < k do
            Insert(Bc, I[1])
            S ← GetSimilar(I[1], I, D, t, similarity = cosine)
            I ← Remove(I, S)
    else if Strategy is BPS then
        Bc ← EmptySet()
        w ← 1.0 / Distance(hc, D)²
        w ← Normalize(w)
        I ← List(D)
        while Size(Bc) < k do
            c ← Choose(I, prob = w, num = 1)
            Insert(Bc, c)
            S ← GetSimilar(c, I, D, t, similarity = cosine)
            I ← Remove(I, S)
            w ← Normalize(w[I])
    return Bc
    Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 41 / 45
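A compact Python rendering of SelectABatch (a sketch, not the released implementation): it assumes the distances to the current hyperplane and the pairwise cosine similarities are precomputed (e.g., with a linear SVM's decision_function and sklearn's cosine_similarity), and implements the DS (closest-first) and BPS (distance-biased probabilistic) strategies with near-duplicate suppression at threshold t.

```python
import numpy as np

def select_a_batch(dist, sim, k, t, strategy="DS", rng=None):
    """dist: |D| distances to the hyperplane; sim: |D| x |D| cosine similarities."""
    rng = rng or np.random.default_rng()
    batch, alive = [], np.ones(len(dist), dtype=bool)
    if strategy == "DS":                       # deterministic: closest-to-hyperplane first
        order = np.argsort(dist)
        while len(batch) < k and alive.any():
            c = next(i for i in order if alive[i])
            batch.append(c)
            alive[c] = False
            alive &= sim[c] < t                # drop near-duplicates of c
    else:                                      # "BPS": sample proportionally to 1/dist^2
        w = 1.0 / dist ** 2
        while len(batch) < k and alive.any():
            p = np.where(alive, w, 0.0)
            c = int(rng.choice(len(dist), p=p / p.sum()))
            batch.append(c)
            alive[c] = False
            alive &= sim[c] < t
    return batch

# Example with random scores: pick a batch of 3 from 10 candidate documents.
rng = np.random.default_rng(0)
dist = np.abs(rng.normal(size=10))
sim = rng.uniform(size=(10, 10))
print(select_a_batch(dist, sim, k=3, t=0.9, strategy="BPS", rng=rng))
```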
  56. Outline 1 About Me 2 Introduction and Motivation 3 Properties

    of Algorithm for Learning Representation 4 Representation Learning of Nodes in an Evolving Network 5 Representation Learning of Sentences 6 Substructure Sampling 7 Total Recall 8 Name Disambiguation Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 42 / 45
  57. Name Disambiguation The graph in this figure corresponds to the

    ego network of u, Gu. We also assume that u is a multi-node consisting of two name entities, so the removal of the node u (along with all of its incident edges) from Gu yields two disjoint clusters. Figure: A toy example of clustering-based entity disambiguation. (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model? Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 43 / 45
  58. Name Disambiguation Calculate the Normalized-cut score

    NC = Σ_{i=1}^{k} W(C_i, C̄_i) / ( W(C_i, C_i) + W(C_i, C̄_i) )   (11)
    Modeling Temporal Mobility. Figure: number of papers per year (2005-2013) in each of the clusters Cluster1, Cluster2, and Cluster3. Calculating the Temporal Mobility score:
    TM-score = [ Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} w(Z_i, Z_j) · ( D(Z_i, Z_j) + D(Z_j, Z_i) ) ] / [ k × Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} w(Z_i, Z_j) ]   (12)
    Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 44 / 45
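A short sketch of the normalized-cut score as reconstructed in Eq. (11), evaluated on a toy ego network (with u already removed) that splits into two clean clusters; W(A, B) sums the edge weights between the two node sets.

```python
import numpy as np

def w(A, B, adj):
    """Total edge weight between node sets A and B."""
    return sum(adj[i, j] for i in A for j in B)

def normalized_cut(clusters, adj):
    nodes = set(range(len(adj)))
    score = 0.0
    for C in clusters:
        Cbar = nodes - set(C)
        score += w(C, Cbar, adj) / (w(C, C, adj) + w(C, Cbar, adj))
    return score

adj = np.array([[0, 1, 1, 0, 0],      # toy ego network after removing u:
                [1, 0, 1, 0, 0],      # nodes {0,1,2} and {3,4} form two clusters
                [1, 1, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 1, 0]])
print(normalized_cut([[0, 1, 2], [3, 4]], adj))   # 0.0 for perfectly separated clusters
```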
  59. Thanks! Sen2Vec Code and Datasets: https://github.com/tksaha/con-s2v/tree/jointlearning Temporal node2vec Code: https://gitlab.com/tksaha/temporalnode2vec.git

    Motif Finding Code: https://github.com/tksaha/motif-finding Frequent Subgraph Mining Code: https://github.com/tksaha/fs3-graph-mining Finding Functional Motif Code: https://gitlab.com/tksaha/func motif Tanay Kumar Saha (My Research Presentation) Latent Representation and Sampling May 13, 2018 45 / 45