Networks and Natural Language Processing

A Survey of Graph-Based Algorithms used in Natural Language Processing.

Ahmed Magdy

June 01, 2012


  1. Networks and NLP
    Networks and Natural
    Language Processing
    Presented by: Ahmed Magdy Ezzeldin

  2. Graphs in NLP

    Graphs are used in many NLP applications, such as:
    - Text Summarization
    - Syntactic parsing
    - Word sense disambiguation
    - Ontology construction
    - Sentiment and subjectivity analysis
    - Text clustering

    Associative or semantic networks are used to
    represent the language units and their relations
    where language units are the vertices (nodes) and
    the relations are the edges (links).

  3. Networks are Graphs
    Nodes are Vertices
    Links are Edges
    - Nodes can represent text units: words,
    collocations, word senses, sentences, ...
    - Graph nodes do not have to be of the same type
    - Edges can represent relations: co-occurrence,
    collocation, syntactic dependency, lexical similarity, ...

  4. Outline

    Syntax
    1- Dependency Parsing
    2- Prepositional Phrase Attachment
    3- Co-reference Resolution

    Lexical Semantics
    1- Lexical Networks
    2- Semantic Similarity and Relatedness
    3- Word Sense Disambiguation
    4- Sentiment and Subjectivity Analysis

    Other Applications
    1- Summarization
    2- Semi-supervised Passage Retrieval
    3- Keyword Extraction

  5. Syntax

  6. 1- Dependency Parsing

    An approach to sentence parsing

    The dependency tree of a sentence is
    a directed subgraph of the full
    graph connecting all words in the
    sentence

    So this subgraph is a tree

    The root of the tree is the main
    predicate that takes arguments
    which are the child nodes

  7. (McDonald et al., 2005) built a parser that
    finds the highest-scoring tree using the CLE
    (Chu-Liu-Edmonds) algorithm for the maximum
    spanning tree (MST) of a directed graph.

    Each node picks the incoming edge with the highest
    score, which yields either a spanning tree or a cycle

    CLE collapses each cycle into a single node

    CLE runs in O(n^2)
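
    The greedy pick, cycle contraction, and expansion steps can be sketched in
    Python as follows. This is a minimal illustration, not the parser of
    McDonald et al.: the function names and the edge scores for the toy
    sentence "John saw Mary" are invented for the example, and the code
    assumes every non-root word has at least one incoming edge.

```python
def find_cycle(head):
    """Return a list of nodes forming a cycle in the head map, or None."""
    for start in head:
        path = []
        node = start
        while node in head and node not in path:
            path.append(node)
            node = head[node]
        if node in path:
            return path[path.index(node):]
    return None


def chu_liu_edmonds(scores, root):
    """scores maps (head, dependent) -> score; returns {dependent: head}."""
    nodes = {n for edge in scores for n in edge}
    # Greedy step: every non-root node picks its best incoming edge.
    best = {}
    for dep in nodes - {root}:
        incoming = [(s, h) for (h, d), s in scores.items() if d == dep and h != dep]
        best[dep] = max(incoming, key=lambda pair: pair[0])[1]
    cycle = find_cycle(best)
    if cycle is None:
        return best
    # Contract the cycle into one new node.
    members = set(cycle)
    merged = ("cycle", tuple(cycle))
    new_scores, origin = {}, {}
    for (h, d), s in scores.items():
        if d == root or (h in members and d in members):
            continue
        if d in members:
            # Entering the cycle at d replaces d's current cycle edge.
            key, s = (h, merged), s - scores[(best[d], d)]
        elif h in members:
            key = (merged, d)
        else:
            key = (h, d)
        if key not in new_scores or s > new_scores[key]:
            new_scores[key], origin[key] = s, (h, d)
    contracted = chu_liu_edmonds(new_scores, root)
    # Expand: map each chosen edge back to the original edge it stands for.
    tree = {}
    for d, h in contracted.items():
        orig_head, orig_dep = origin[(h, d)]
        tree[orig_dep] = orig_head
    for d in members:
        tree.setdefault(d, best[d])  # keep cycle edges except the broken one
    return tree


# Hypothetical edge scores (head, dependent) for "John saw Mary".
scores = {
    ("ROOT", "saw"): 10, ("ROOT", "John"): 9, ("ROOT", "Mary"): 9,
    ("saw", "John"): 30, ("saw", "Mary"): 30,
    ("John", "saw"): 20, ("John", "Mary"): 3,
    ("Mary", "saw"): 0, ("Mary", "John"): 11,
}
tree = chu_liu_edmonds(scores, "ROOT")
```

    In this toy case the greedy step first produces the cycle John <-> saw;
    after contraction and expansion, "saw" attaches to ROOT and both names
    attach to "saw". The recursive version favors clarity and is not tuned
    to reach the O(n^2) bound.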

  8. No tree covers all nodes so the closest 2 nodes
    are collapsed

  9. We repeat this step until
    all nodes are collapsed
    then an MST is constructed
    by reversing the procedure
    and expanding all nodes.

    McDonald et al. achieved
    excellent results on a
    standard English data set
    and even better results on
    Czech (a free word order
    language)

  10. 2- Prepositional Phrase Attachment

    (Toutanova et al., 2004) A preposition like "with" is either
    attached to the main predicate (high verbal attachment) or the
    noun phrase before it (low nominal attachment).
    - “I ate pizza with olives.”
    - “I ate pizza with a knife.”

    They proposed a semi-supervised learning process in which a
    graph of nouns and verbs is constructed, and two words that
    appear in the same context are connected with an edge.

    A random walk is run on the graph until convergence

    The method reached 87.54% classification accuracy,
    close to the human performance of 88.20%

  11. 3- Co-reference Resolution

    Identifying relations between entity
    references in a text

    References can be nouns or pronouns

    Approximate the correct assignment of
    references to entities in a text by using a
    graph-cut algorithm.
    A graph is constructed for each entity

    Every entity is linked to all of its possible
    co-references with weighted edges, where the weights
    are the confidence of each co-reference.

    Min-cut partitioning separates each entity and its
    co-references

  12. Lexical Semantics
    Semantic Analysis, Machine Translation, Information
    retrieval, question answering, knowledge acquisition,
    word sense disambiguation, semantic role labeling,
    textual entailment, lexical acquisition, semantic relations

  13. 1- Lexical Networks
    a- Unsupervised lexical acquisition (Widdows and
    Dorow, 2002)
    Goal: build semantic classes automatically from raw
    text

    Build a co-occurrence graph from the British National
    Corpus where nodes are words linked by conjunction

    Over 100,000 nodes and over half a million edges.

    Representative nouns are manually selected and put in
    a seed set.

    The word with the largest number of links to the seed
    set is added to the seed set
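
    The seed-growing step can be sketched as below. This is a simplified
    reading of the procedure on the slide, not the authors' exact method:
    the conjunction pairs, the function name grow_class, and the
    alphabetical tie-breaking are assumptions made for the example.

```python
from collections import defaultdict


def grow_class(graph, seeds, steps):
    """Repeatedly add the non-member word with the most links to the current set."""
    members = list(seeds)
    for _ in range(steps):
        links = {}
        for word in members:
            for neighbour in graph.get(word, ()):
                if neighbour not in members:
                    links[neighbour] = links.get(neighbour, 0) + 1
        if not links:
            break
        # Most links first; ties broken alphabetically (an assumption).
        best = min(links, key=lambda w: (-links[w], w))
        members.append(best)
    return members


# Hypothetical noun pairs seen in conjunctions such as "apples and pears".
pairs = [
    ("apple", "pear"), ("apple", "banana"), ("pear", "banana"),
    ("banana", "cherry"), ("pear", "cherry"), ("cherry", "stone"),
    ("apple", "computer"),
]
graph = defaultdict(set)
for a, b in pairs:
    graph[a].add(b)
    graph[b].add(a)

fruit_class = grow_class(graph, ["apple", "pear"], steps=2)
```

    Starting from the seeds "apple" and "pear", "banana" is added first
    because it links to both seeds, while "computer" links to only one.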

  14. Result:
    Accuracy of 82%, which is
    far better than earlier results
    The drawback of this
    method is low coverage,
    as it is limited to words in
    a conjunction relation only.

  15. 1- Lexical Networks [continued]
    b- Lexical Network Properties (Ferrer-i-Cancho and
    Solé, 2001)

    Observe the properties of lexical networks

    Build a co-occurrence network where words are
    nodes that are linked with edges if they appear in the
    same sentence at a distance of at most 2 words.

    Half a million nodes with over 10 million edges

    Small-world effect: 2-3 jumps can connect any 2 words

    Distribution of node degree is scale-free
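
    A small sketch of how such a network and its two headline properties
    could be measured. The helper names and toy sentences are invented
    here, and "distance of at most 2 words" is read as word positions that
    differ by at most 2, which is an assumption.

```python
from collections import Counter, defaultdict, deque


def cooccurrence_graph(sentences, max_distance=2):
    """Link two words if they occur in one sentence within max_distance positions."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for other in words[i + 1:i + 1 + max_distance]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def degree_distribution(graph):
    """Map each degree to the number of nodes that have it."""
    return Counter(len(neighbours) for neighbours in graph.values())


def average_path_length(graph):
    """Mean shortest-path length over all connected ordered pairs (BFS per node)."""
    total = pairs = 0
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0


network = cooccurrence_graph([
    "the cat sat on the mat",
    "the dog sat on the rug",
])
print(degree_distribution(network))
print(average_path_length(network))
```

    On a real corpus, a short average path length would reflect the
    small-world effect, and a heavy-tailed degree count would reflect the
    scale-free distribution; a toy input like this one is far too small to
    show either.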

  16. 2- Semantic Similarity and Relatedness

    Methods include metrics calculated on existing
    semantic networks like WordNet by applying shortest
    path algorithms to identify the closest semantic relation
    between 2 concepts (Leacock et al. 1998)

    Random Walk algorithm (Hughes and Ramage, 2007)

    PageRank computes the stationary distribution over the nodes of
    WordNet, biased toward each word of an input word pair.

    The divergence between the two distributions is calculated to
    show the words' relatedness.
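
    The biased random walk can be sketched with a personalized PageRank
    that always restarts at the query word. The tiny graph below is a
    made-up stand-in for WordNet, and total variation distance is used
    only as a simple placeholder for the divergence measure; both are
    assumptions. The code also assumes every node has at least one
    neighbour.

```python
def personalized_pagerank(graph, source, damping=0.85, iterations=50):
    """Stationary distribution of a random walk that restarts at `source`."""
    nodes = sorted(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) if n == source else 0.0 for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(graph[n])
            for neighbour in graph[n]:
                new[neighbour] += share
        rank = new
    return rank


def distance(graph, word_a, word_b):
    """Total variation distance between the two biased distributions."""
    p = personalized_pagerank(graph, word_a)
    q = personalized_pagerank(graph, word_b)
    return 0.5 * sum(abs(p[n] - q[n]) for n in graph)


# Hypothetical undirected semantic network (not real WordNet data).
graph = {
    "car": {"vehicle"},
    "truck": {"vehicle"},
    "vehicle": {"car", "truck", "object"},
    "object": {"vehicle", "fruit"},
    "fruit": {"object", "apple"},
    "apple": {"fruit"},
}
near = distance(graph, "car", "truck")
far = distance(graph, "car", "apple")
```

    A smaller distance means the two words are more related, so "car" and
    "truck" come out closer than "car" and "apple" on this toy graph.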

  17. 3- Word Sense Disambiguation
    a- Label Propagation Algorithm (Niu et al. 2005)

    Construct a graph of labeled and unlabeled examples for a
    given ambiguous word

    Word sense examples are the nodes and weighted edges are
    drawn by pairwise metric of similarity.

    Known labeled examples form the seed set and are assigned
    their correct labels (manually)

    Labels are propagated through the graph along the weighted
    edges

    Labels are assigned with a certain probability

    The propagation is repeated until the correct labels are
    assigned

    Result: Performs better than SVM when only a small number
    of examples is provided.
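
    A minimal sketch of label propagation with clamped seeds follows. It
    illustrates the general technique rather than the exact formulation of
    Niu et al.; the similarity weights, the example identifiers, and the
    fixed iteration count are assumptions.

```python
def propagate_labels(weights, seeds, labels, iterations=100):
    """weights: node -> {neighbour: similarity}; seeds: node -> known label."""
    # Seeds start with certainty; unlabeled nodes start uniform.
    dist = {}
    for node in weights:
        if node in seeds:
            dist[node] = {l: 1.0 if l == seeds[node] else 0.0 for l in labels}
        else:
            dist[node] = {l: 1.0 / len(labels) for l in labels}
    for _ in range(iterations):
        new = {}
        for node, neighbours in weights.items():
            if node in seeds:
                new[node] = dist[node]  # clamp the seed labels
                continue
            total = sum(neighbours.values())
            new[node] = {
                l: sum(w * dist[nb][l] for nb, w in neighbours.items()) / total
                for l in labels
            }
        dist = new
    # Each node takes its most probable label.
    return {node: max(labels, key=lambda l: dist[node][l]) for node in weights}


# Hypothetical contexts of the ambiguous word "bank" with pairwise similarities.
weights = {
    "ex1": {"ex2": 0.9},
    "ex2": {"ex1": 0.9, "ex3": 0.2},
    "ex3": {"ex2": 0.2, "ex4": 0.8},
    "ex4": {"ex3": 0.8},
}
seeds = {"ex1": "finance", "ex4": "river"}
senses = propagate_labels(weights, seeds, ["finance", "river"])
```

    Each unlabeled example ends up with the sense of the seed it is most
    strongly connected to: "ex2" follows "ex1" and "ex3" follows "ex4".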

  18. b- Knowledge-based word sense disambiguation
    (Mihalcea et al. 2004, Sinha and Mihalcea 2007)

  19. Method:

    Build a graph for a given text and all the senses of its
    words as nodes

    Senses are connected on the basis of their semantic
    relations (synonymy, antonymy ...)

    A random walk results in a set of scores that reflects the
    importance of each word sense.

    Superior to other knowledge-based word sense
    disambiguation methods that did not use graph-based representations.

    Follow-up work:

    Instead of semantic relations, Mihalcea used
    edges weighted by a measure of lexical similarity

    This brought generality, as the method can use any electronic
    dictionary, not just a semantic network like WordNet

  20. c- Comparative Evaluation of Graph Connectivity
    Algorithms (Navigli and Lapata, 2007)

    Applied to word sense graphs derived from WordNet

    Found out that the best measure to use is a closeness

  21. 4- Sentiment and Subjectivity Analysis
    a- Using a min-cut graph algorithm (Pang and Lee 2004)

    Draw a graph where sentences are the nodes and the
    edges are drawn according to the sentences' proximity

    Each node is assigned a score showing the probability that its
    sentence is subjective using a supervised subjectivity classifier

    Use a min-cut algorithm to separate subjective from objective
    sentences

    Better than the supervised subjectivity classifier
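
    One way to set up the min-cut step is sketched below, using a plain
    Edmonds-Karp max-flow. The sentence scores, the proximity weight of
    0.3, and the node names "SUBJ" and "OBJ" are invented for the example,
    and the way capacities are assigned is an assumption about the
    formulation rather than a reproduction of it.

```python
from collections import defaultdict, deque


def min_cut_source_side(capacity, source, sink):
    """Return the nodes on the source side of a minimum cut (Edmonds-Karp)."""
    residual = defaultdict(float)
    neighbours = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        neighbours[u].add(v)
        neighbours[v].add(u)

    def search():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in parent and residual[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = search()
        if sink not in parent:
            return set(parent)  # everything still reachable from the source
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(residual[edge] for edge in path)
        for u, v in path:
            residual[(u, v)] -= flow
            residual[(v, u)] += flow


# Hypothetical classifier scores: probability that each sentence is subjective.
prob_subjective = {"s1": 0.9, "s2": 0.45, "s3": 0.8}
capacity = {}
for sentence, p in prob_subjective.items():
    capacity[("SUBJ", sentence)] = p
    capacity[(sentence, "OBJ")] = 1 - p
# Adjacent sentences are tied together by a proximity weight (assumed 0.3).
for a, b in [("s1", "s2"), ("s2", "s3")]:
    capacity[(a, b)] = capacity[(b, a)] = 0.3

subjective = min_cut_source_side(capacity, "SUBJ", "OBJ") - {"SUBJ"}
```

    Taken alone, the score of 0.45 would label "s2" as objective at a 0.5
    threshold, but its links to two clearly subjective neighbours pull it
    to the subjective side of the cut.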

    b- By assigning subjectivity and polarity labels (Esuli and
    Sebastiani 2007)

    Random walk on a graph seeded with nodes labeled for
    subjectivity and polarity.

  23. Other Applications

  24. 1- Summarization
    a- (Salton et al. 1994, 1997)

    Draw a graph of the corpus where every node is a paragraph

    Lexically similar paragraphs are linked with edges

    A summary is retrieved by following paths defined by different
    algorithms to cover as much of the content of the graph as
    possible

    b- Lexical Centrality (Erkan and Radev 2004) (Mihalcea and
    Tarau 2004)

    Sentences are nodes of the graph

    Random walk to define the most visited nodes as central to
    the documents

    Remove duplicates or near duplicates

    Select sentences with maximal marginal relevance

  25. 2- Semi-supervised Passage Retrieval

    Question Biased Passage Retrieval
    (Otterbacher et al., 2005)
    Answer a question from a group of documents

    Use biased random walk on a graph seeded
    with positive and negative examples

    Each node is labeled according to the
    percentage of random walks that end at this node

    The nodes with the highest score are central to
    the document set and similar to the seed nodes.

  26. 3- Keyword Extraction

    A set of terms that
    best describes the
    document

    Used in terminology
    extraction and
    construction of
    domain-specific
    dictionaries

  27. Mihalcea and Tarau, 2004

    Build a co-occurrence graph for the input text where
    nodes are the words of the text

    Words are linked by a co-occurrence relation limited by
    the distance between words.

    Run a random walk on the graph

    Words ranked as important and found next
    to each other are collapsed into one key phrase

    Performs much better than tf.idf
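
    The pipeline above can be sketched in a few lines. This is a
    simplified illustration, not the published system: the function name,
    the default window and damping values, and the token list are
    assumptions, and any filtering of candidate words by part of speech is
    left out.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, top_k=3, damping=0.85, iterations=50):
    """Rank words by PageRank on a co-occurrence graph, then merge adjacent hits."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    # Random walk scores via the usual PageRank iteration.
    score = {word: 1.0 for word in graph}
    for _ in range(iterations):
        score = {
            word: (1 - damping)
            + damping * sum(score[nb] / len(graph[nb]) for nb in graph[word])
            for word in graph
        }
    keywords = set(sorted(score, key=score.get, reverse=True)[:top_k])
    # Collapse runs of adjacent keywords in the text into key phrases.
    phrases, run = [], []
    for word in list(words) + [None]:
        if word in keywords:
            run.append(word)
            continue
        phrase = " ".join(run)
        if run and phrase not in phrases:
            phrases.append(phrase)
        run = []
    return phrases


# Hypothetical, already tokenized input text.
tokens = ("graph based ranking algorithms rank text units "
          "graph ranking works on a text graph").split()
print(textrank_keywords(tokens))
```

    Words that co-occur with many different neighbours collect the highest
    scores, and the final pass turns neighbouring high scorers into a
    single multi-word key phrase.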

  28. References
    Networks and Natural Language Processing
    (Mihalcea and Radev 2008)
    Dragomir Radev, University of Michigan
    Rada Mihalcea, University of North Texas
