a novel neural embedding method, StarSpace [Wu et al., 2017]. The paper assumes that readers already understand Word2vec [Mikolov et al., 2013], and since many of you may be unfamiliar with that algorithm, I will introduce Word2vec as well. To do that, I should briefly explain the Distributional Hypothesis, so that is where I will start [Harris, 1954, Firth, 1957].
the semantic theory of language usage, i.e. words that are used and occur in the same contexts tend to purport similar meanings [Harris, 1954]. The underlying idea that "a word is characterized by the company it keeps" was popularized by Firth [Firth, 1957]. (From Wikipedia)
a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. (From Wikipedia)
P(+1 \mid w, c) \prod_{c' \in \mathrm{Unigram}_k(D)} P(-1 \mid w, c')

Here
∗ P(+1|w, c): the probability that word c is one of the contexts of w
∗ P(-1|w, c'): the probability that word c' is not a context of w
∗ Unigram_k(D): k contexts sampled from the unigram distribution P(w), i.e. (pseudo) negative sampling (sketched below)
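As a rough illustration (not code from the paper), a minimal NumPy sketch of this per-pair quantity might look as follows; the names W_in, W_out, unigram_probs and the sizes V, d, k are all assumptions for the example:

    import numpy as np

    V, d, k = 10000, 100, 5                       # vocabulary size, embedding dim, negatives per pair (assumed)
    rng = np.random.default_rng(0)
    W_in = rng.normal(scale=0.1, size=(V, d))     # "input" word vectors
    W_out = rng.normal(scale=0.1, size=(V, d))    # "output" (context) vectors
    unigram_probs = np.full(V, 1.0 / V)           # stand-in for the empirical unigram distribution P(w)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_pair_objective(w, c):
        """P(+1|w,c) * prod over k sampled c' of P(-1|w,c') for one (word, context) pair."""
        pos = sigmoid(W_in[w] @ W_out[c])                     # P(+1|w, c)
        negs = rng.choice(V, size=k, p=unigram_probs)         # Unigram_k(D): k (pseudo) negative contexts
        neg = np.prod(sigmoid(-(W_in[w] @ W_out[negs].T)))    # prod P(-1|w, c')
        return pos * neg

Training maximizes (the log of) this quantity summed over all observed (word, context) pairs in the corpus D.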
Word2vec and PMI (pointwise mutual information)
∗ [Arora et al., 2015]: discusses the mathematical operations in the vector space. A good summary of the article.
∗ The original paper was not crystal clear to me; the review papers [Rong, 2014, Goldberg and Levy, 2014] were quite helpful.
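For reference, the connection behind "Word2vec and PMI" is made explicit in [Levy and Goldberg, 2014]: skip-gram with negative sampling implicitly factorizes a shifted PMI matrix,

    \mathrm{PMI}(w, c) = \log \frac{P(w, c)}{P(w)\,P(c)}, \qquad \vec{w} \cdot \vec{c} \approx \mathrm{PMI}(w, c) - \log k,

where k is the number of negative samples.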
\sum_{(a, b) \in E^+,\; b^- \in E^-} L^{batch}\bigl(\mathrm{sim}(a, b),\ \mathrm{sim}(a, b_1^-),\ \dots,\ \mathrm{sim}(a, b_k^-)\bigr)

∗ The generator of positive entity pairs (a, b) coming from the set E+. This is task dependent and will be described subsequently.
∗ The generator of negative entities b_i^- coming from the set E-. We utilize a k-negative sampling strategy [Mikolov et al., 2013].
∗ The similarity function sim(·, ·): cosine similarity or inner product.
∗ The loss function that compares the positive pair (a, b) with the negative pairs (a, b_i^-): ranking loss or the negative log loss of a softmax (same as Word2vec); see the sketch below.
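A minimal sketch of how these components combine, assuming cosine similarity, a margin ranking loss, and that every entity is a bag of discrete features embedded by summing rows of a feature dictionary F (the names here are mine, not the reference implementation):

    import numpy as np

    def cosine(u, v):
        return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

    def embed(entity, F):
        """Embed a bag-of-features entity by summing its feature vectors from the dictionary F."""
        return np.sum([F[f] for f in entity], axis=0)

    def ranking_loss(a, b_pos, b_negs, F, margin=0.1):
        """sum_i max(0, margin - sim(a, b) + sim(a, b_i^-)) over the k sampled negatives."""
        ea, eb = embed(a, F), embed(b_pos, F)
        return sum(max(0.0, margin - cosine(ea, eb) + cosine(ea, embed(bn, F)))
                   for bn in b_negs)

The alternative mentioned above, the negative log loss of a softmax over {sim(a, b), sim(a, b_1^-), ..., sim(a, b_k^-)}, plugs into the same place.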
Multiclass Classification (e.g. text classification): The positive pair generator comes directly from a training set of labeled data specifying (a, b) pairs, where a are documents (bags-of-words) and b are labels (singleton features). Negative entities b^- are sampled from the set of possible labels.

Learning Sentence Embeddings: Learning word embeddings (e.g. as above) and using them to embed sentences does not seem optimal when you can learn sentence embeddings directly. Given a training set of unlabeled documents, each consisting of sentences, we select a and b as a pair of sentences both coming from the same document; b^- are sentences coming from other documents.
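A hedged sketch of the two pair generators just described; docs, labels, label_set, and documents are placeholder inputs (a list of bags-of-words, their labels, the full label set, and a list of documents given as lists of sentences):

    import random

    def classification_pairs(docs, labels, label_set, k=5):
        """Positive pair: (document, its label); negatives: k other labels."""
        for a, b in zip(docs, labels):
            b_negs = random.sample([l for l in label_set if l != b], k)
            yield a, b, b_negs

    def sentence_pairs(documents, k=5):
        """Positive pair: two sentences from the same document; negatives: sentences from other documents."""
        for i, doc in enumerate(documents):
            if len(doc) < 2:
                continue
            a, b = random.sample(doc, 2)
            others = [s for j, other in enumerate(documents) if j != i for s in other]
            yield a, b, random.sample(others, min(k, len(others)))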
Multitask Learning: Any of the above tasks can be combined and trained at the same time if they share some features in the base dictionary F. For example, one could combine supervised classification with unsupervised word or sentence embedding, giving semi-supervised learning.
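Read together with the sketches above, semi-supervised training is then just the shared dictionary F receiving gradients from both objectives; schematically (placeholder names again, reusing ranking_loss from the earlier sketch):

    def multitask_loss(F, labeled_batch, unlabeled_batch):
        """Both tasks update the same feature dictionary F."""
        supervised   = sum(ranking_loss(a, b, negs, F) for a, b, negs in labeled_batch)
        unsupervised = sum(ranking_loss(a, b, negs, F) for a, b, negs in unlabeled_batch)
        return supervised + unsupervised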
[Arora et al., 2015] Arora, S., Li, Y., Liang, Y., Ma, T., and Risteski, A. (2015). RAND-WALK: A Latent Variable Model Approach to Word Embeddings.
[Firth, 1957] Firth, J. R. (1957). A synopsis of linguistic theory.
[Goldberg and Levy, 2014] Goldberg, Y. and Levy, O. (2014). word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method.
[Harris, 1954] Harris, Z. S. (1954). Distributional Structure. WORD, 10(2-3):146–162.
[Levy and Goldberg, 2014] Levy, O. and Goldberg, Y. (2014). Neural Word Embedding as Implicit Matrix Factorization. In Advances in Neural Information Processing Systems, pages 2177–2185.
[Mikolov et al., 2013] Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality.
[Rong, 2014] Rong, X. (2014). word2vec Parameter Learning Explained. arXiv.org.