Networks and Natural Language Processing

Presented by: Ahmed Magdy Ezzeldin

Graphs in NLP
• Graphs are used in many NLP applications, such as:
  - Text summarization
  - Syntactic parsing
  - Word sense disambiguation
  - Ontology construction
  - Sentiment and subjectivity analysis
  - Text clustering
• Associative or semantic networks represent language units and their relations: the language units are the vertices (nodes) and the relations are the edges (links).

Networks are Graphs
Nodes are vertices; links are edges.
- Nodes can represent text units: words, collocations, word senses, sentences, documents
- Graph nodes do not have to be of the same category
- Edges can represent relations: co-occurrence, collocation, syntactic dependency, lexical similarity

Outline
• Syntax
  1- Dependency Parsing
  2- Prepositional Phrase Attachment
  3- Co-reference Resolution
• Lexical Semantics
  1- Lexical Networks
  2- Semantic Similarity and Relatedness
  3- Word Sense Disambiguation
  4- Sentiment and Subjectivity Analysis
• Other Applications
  1- Summarization
  2- Semi-supervised Passage Retrieval
  3- Keyword Extraction

Syntax

1- Dependency Parsing
• An approach to sentence parsing
• The dependency tree of a sentence is a directed subgraph of the full graph connecting all words in the sentence
• This subgraph is a tree
• The root of the tree is the main predicate, which takes arguments that are its child nodes

• McDonald et al. (2005) built a parser that finds the highest-scoring tree using the Chu-Liu-Edmonds (CLE) algorithm for the maximum spanning tree (MST) in a directed graph.
• Each node picks the neighbor with the highest-scoring incoming edge, which yields either a spanning tree or a cycle
• CLE collapses each cycle into a single node
• CLE runs in O(n^2)

• When no tree covers all nodes, the 2 closest nodes are collapsed into one

• This step is repeated until all nodes are collapsed; an MST is then constructed by reversing the procedure and expanding all nodes.
• McDonald et al. achieved excellent results on a standard English data set and even better results on Czech (a free word order language)
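The greedy choice, cycle collapse, and expansion steps above can be sketched in a few lines of Python. This is a minimal recursive sketch, not McDonald et al.'s parser and not the O(n^2) implementation: it assumes integer node ids, an artificial root at node 0, at least one incoming edge for every other node, and made-up edge scores for the sentence "John saw Mary". The function names are hypothetical.

```python
def find_cycle(head):
    """Return the nodes of one cycle in a dependent -> head map, or None."""
    visited_from = {}
    for start in head:
        path, node = [], start
        while node is not None and node not in visited_from:
            visited_from[node] = start
            path.append(node)
            node = head.get(node)  # the root has no head, so this ends the walk
        if node is not None and visited_from[node] == start:
            return path[path.index(node):]
    return None


def chu_liu_edmonds(scores, root=0):
    """scores: {(head, dependent): score}. Returns {dependent: head}."""
    nodes = {h for h, _ in scores} | {d for _, d in scores}
    # Greedy step: every non-root node keeps its best incoming edge.
    best = {}
    for (h, d), s in scores.items():
        if d == root or h == d:
            continue
        if d not in best or s > scores[(best[d], d)]:
            best[d] = h
    cycle = find_cycle(best)
    if cycle is None:
        return best  # the greedy choice is already a tree
    # Collapse the cycle into one new node (assumes integer node ids).
    in_cycle, merged = set(cycle), max(nodes) + 1
    new_scores, origin = {}, {}
    for (h, d), s in scores.items():
        if h in in_cycle and d in in_cycle:
            continue
        if d in in_cycle:
            # Entering edge: score change from replacing d's edge inside the cycle.
            key, value = (h, merged), s - scores[(best[d], d)]
        elif h in in_cycle:
            key, value = (merged, d), s
        else:
            key, value = (h, d), s
        if key not in new_scores or value > new_scores[key]:
            new_scores[key], origin[key] = value, (h, d)
    contracted = chu_liu_edmonds(new_scores, root)
    # Expand: map contracted edges back, then keep the remaining cycle edges.
    tree = {}
    for d, h in contracted.items():
        orig_head, orig_dep = origin[(h, d)]
        tree[orig_dep] = orig_head
    for node in cycle:
        tree.setdefault(node, best[node])
    return tree


# Illustrative scores: 0 = ROOT, 1 = John, 2 = saw, 3 = Mary.
scores = {
    (0, 1): 9, (0, 2): 10, (0, 3): 9,
    (2, 1): 30, (1, 2): 20, (2, 3): 30,
    (3, 2): 0, (1, 3): 3, (3, 1): 11,
}
tree = chu_liu_edmonds(scores)
print(tree)
```

In this toy input the greedy step first produces the cycle John <-> saw; after the collapse, the root attaches to "saw" and both nouns depend on it.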

2- Prepositional Phrase Attachment
• (Toutanova et al., 2004) A preposition like "with" attaches either to the main predicate (high verbal attachment) or to the noun phrase before it (low nominal attachment).
  - “I ate pizza with olives.” (low nominal attachment)
  - “I ate pizza with a knife.” (high verbal attachment)
• They proposed a semi-supervised learning process: a graph of nouns and verbs is constructed, and 2 words are connected with an edge if they appear in the same context.
• A random walk is run on the graph until convergence
• Reached a classification accuracy of 87.54%, close to human performance of 88.20%

3- Co-reference Resolution
• Identifying relations between entity references in a text
• References can be nouns or pronouns
• Approximate the correct assignment of references to entities in a text by using a graph-cut algorithm.
Method:
• A graph is constructed for each entity
• Every entity is linked to all its possible co-references with weighted edges, where the weights are the confidence of each co-reference.
• Min-cut partitioning separates each entity and its co-references.

Lexical Semantics
Semantic analysis, machine translation, information retrieval, question answering, knowledge acquisition, word sense disambiguation, semantic role labeling, textual entailment, lexical acquisition, semantic relations

1- Lexical Networks
a- Unsupervised lexical acquisition (Widdows and Dorow, 2002)
Goal: build semantic classes automatically from raw corpora
Method:
• Build a co-occurrence graph from the British National Corpus where nodes are words linked by conjunction (and/or)
• Over 100,000 nodes and over half a million edges.
• Representative nouns are manually selected and put in a seed set.
• The word with the largest number of links to the seed set is added to the seed set

Result: Accuracy of 82%, far better than earlier results.
The drawback of this method is low coverage, as it is limited to words in a conjunction relation.
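The seed-growing step can be illustrated with a small sketch. It assumes a toy conjunction graph stored as an adjacency dict, a fixed number of growth steps, and alphabetical tie-breaking; none of these details come from Widdows and Dorow, and `grow_class` is a hypothetical name.

```python
def grow_class(graph, seeds, steps):
    """Repeatedly add the outside word with the most links into the current set."""
    members = set(seeds)
    for _ in range(steps):
        link_counts = {}
        for word in members:
            for neighbor in graph.get(word, ()):
                if neighbor not in members:
                    link_counts[neighbor] = link_counts.get(neighbor, 0) + 1
        if not link_counts:
            break  # nothing left that touches the class
        # Most links first; ties broken alphabetically (an assumption).
        best = min(link_counts, key=lambda w: (-link_counts[w], w))
        members.add(best)
    return members


# Toy graph: an edge means the two nouns were seen joined by "and"/"or".
graph = {
    "apple": {"pear", "banana", "computer"},
    "pear": {"apple", "banana", "plum"},
    "banana": {"apple", "pear", "plum"},
    "plum": {"pear", "banana"},
    "computer": {"apple", "printer"},
    "printer": {"computer"},
}
fruit = grow_class(graph, {"apple", "pear"}, steps=2)
print(sorted(fruit))
```

Here "computer" is linked to "apple" but never gathers as many links into the class as the fruit nouns do, so it stays out.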

1- Lexical Networks [continued]
b- Lexical Network Properties (Ferrer-i-Cancho and Sole, 2001)
Goal:
• Observe the properties of lexical networks
Method:
• Build a co-occurrence network where words are nodes, linked by an edge if they appear in the same sentence at a distance of at most 2 words.
• Half a million nodes with over 10 million edges
Result:
• Small-world effect: 2-3 jumps can connect any 2 words
• The distribution of node degree is scale-free
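A co-occurrence network of this kind can be built as follows. This is a minimal sketch that assumes whitespace tokenization, lowercasing, and that "distance of at most 2" means token positions differing by 1 or 2; the function name is hypothetical.

```python
from collections import defaultdict


def cooccurrence_graph(sentences, max_distance=2):
    """Link two words if they occur within max_distance positions in a sentence."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            # Look ahead only; edges are undirected, so both ends are updated.
            for other in words[i + 1 : i + 1 + max_distance]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


graph = cooccurrence_graph(["the cat sat down", "the dog sat"])
for word in sorted(graph):
    print(word, sorted(graph[word]))
```

The node degree used in the scale-free observation is simply `len(graph[word])` in this representation.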

2- Semantic Similarity and Relatedness
• Methods include metrics calculated on existing semantic networks like WordNet, applying shortest-path algorithms to identify the closest semantic relation between 2 concepts (Leacock et al. 1998)
• Random walk algorithm (Hughes and Ramage, 2007)
  - PageRank gets the stationary distribution of nodes in WordNet, biased toward each word of an input word pair.
  - The divergence between these distributions is calculated to measure the relatedness of the words.
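The shortest-path idea can be sketched with a breadth-first search over a toy is-a network. The taxonomy below is invented for illustration and is not WordNet; the sketch returns only the raw path length and omits the scaling that Leacock et al. apply to it.

```python
from collections import deque


def build_undirected(edges):
    """Turn (child, parent) pairs into an undirected adjacency dict."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Number of edges on the shortest path, or None if the concepts are unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Hypothetical mini-taxonomy (child, parent).
taxonomy = build_undirected([
    ("dog", "animal"), ("cat", "animal"),
    ("animal", "entity"), ("vehicle", "entity"), ("car", "vehicle"),
])
print(shortest_path_length(taxonomy, "dog", "cat"))
print(shortest_path_length(taxonomy, "dog", "car"))
```

A shorter path means a closer semantic relation, so "dog" is nearer to "cat" than to "car" in this toy network.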

3- Word Sense Disambiguation
a- Label Propagation Algorithm (Niu et al. 2005)
Method:
• Construct a graph of labeled and unlabeled examples for a given ambiguous word
• Word sense examples are the nodes, and weighted edges are drawn using a pairwise similarity metric.
• Known labeled examples form the seed set and are assigned their correct labels (manually)
• Labels are propagated through the graph along the weighted edges
• Labels are assigned with a certain probability
• The propagation is repeated until the labels converge.
Result: Performs better than SVM when only a small number of labeled examples is provided.
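The propagation loop can be sketched as below. This is a simplified illustration, not Niu et al.'s exact formulation: it assumes a symmetric similarity graph given as nested dicts, clamps the seed labels on every pass, and runs a fixed number of iterations instead of testing convergence. The similarity values and the name `propagate_labels` are made up.

```python
def propagate_labels(weights, seeds, labels, iterations=50):
    """weights: {node: {neighbor: similarity}}; seeds: {node: label}."""
    # Every node holds a probability distribution over labels.
    dist = {node: {label: 0.0 for label in labels} for node in weights}
    for node, label in seeds.items():
        dist[node][label] = 1.0
    for _ in range(iterations):
        updated = {}
        for node, neighbors in weights.items():
            if node in seeds:
                updated[node] = dist[node]  # seed labels stay fixed
                continue
            total = sum(neighbors.values())
            # New distribution: similarity-weighted average of the neighbors.
            updated[node] = {
                label: sum(w * dist[m][label] for m, w in neighbors.items()) / total
                for label in labels
            }
        dist = updated
    return {node: max(labels, key=lambda label: dist[node][label]) for node in weights}


# Four contexts of the ambiguous word "bank"; e1 and e2 are labeled seeds.
weights = {
    "e1": {"e3": 0.9, "e4": 0.1},
    "e2": {"e3": 0.1, "e4": 0.8},
    "e3": {"e1": 0.9, "e2": 0.1, "e4": 0.5},
    "e4": {"e1": 0.1, "e2": 0.8, "e3": 0.5},
}
senses = propagate_labels(weights, {"e1": "finance", "e2": "river"}, ["finance", "river"])
print(senses)
```

The unlabeled examples end up with the sense of the seed they are most strongly connected to, directly or through each other.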

b- Knowledge-based word sense disambiguation (Mihalcea et al. 2004, Sinha and Mihalcea 2007)
Method:
• Build a graph for a given text with all the senses of its words as nodes
• Senses are connected on the basis of their semantic relations (synonymy, antonymy ...)
• A random walk results in a set of scores that reflect the importance of each word sense.
Result:
• Superior to other knowledge-based word sense disambiguation methods that did not use graph-based representations.
Follow-up work:
• Instead of semantic relations, Mihalcea used edges weighted by a measure of lexical similarity
• This brought generality, as the method can use any electronic dictionary, not just a semantic network like WordNet

c- Comparative Evaluation of Graph Connectivity Algorithms (Navigli and Lapata, 2007)
• Applied to word sense graphs derived from WordNet
• Found that the best measure to use is a closeness measure

4- Sentiment and Subjectivity Analysis
a- Using a min-cut graph algorithm (Pang and Lee 2004)
Method:
• Draw a graph where sentences are the nodes and edges are drawn according to sentence proximity
• Each node is assigned a score showing the probability that its sentence is subjective, using a supervised subjectivity classifier
• Use a min-cut algorithm to separate subjective from objective sentences.
Results:
• Better than the supervised subjectivity classifier alone
b- By assigning subjectivity and polarity labels (Esuli and Sebastiani 2007)
Method:
• Random walk on a graph seeded with nodes labeled for subjectivity and polarity.
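The min-cut formulation in (a) can be sketched with a small max-flow routine. This is an illustrative setup, not Pang and Lee's code: it assumes classifier scores given as integer percentages, hypothetical proximity weights between neighboring sentences, and a plain Edmonds-Karp max flow. The names `SRC`, `SNK`, `min_cut_source_side`, and `split_subjective` are invented here.

```python
from collections import defaultdict, deque


def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; returns the nodes on the source side of a minimum cut."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, neighbors in capacity.items():
        for v, cap in neighbors.items():
            residual[u][v] += cap
            residual[v][u] += 0  # make sure the reverse edge exists
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # everything still reachable from the source
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= flow
            residual[v][u] += flow


def split_subjective(probabilities, associations):
    """probabilities: {sentence: percent subjective}; associations: {(i, j): weight}."""
    capacity = defaultdict(dict)
    for sentence, p in probabilities.items():
        capacity["SRC"][sentence] = p        # cost of calling it objective
        capacity[sentence]["SNK"] = 100 - p  # cost of calling it subjective
    for (i, j), weight in associations.items():
        capacity[i][j] = weight              # cost of separating neighbors
        capacity[j][i] = weight
    return min_cut_source_side(capacity, "SRC", "SNK") - {"SRC"}


subjective = split_subjective({0: 80, 1: 45, 2: 10}, {(0, 1): 20, (1, 2): 5})
print(sorted(subjective))
```

Taken alone, sentence 1 (45%) would be labeled objective; the strong proximity link to sentence 0 makes it cheaper for the cut to keep both on the subjective side, which is how the graph can improve on the classifier alone.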

Other Applications

1- Summarization
a- (Salton et al. 1994, 1997)
• Draw a graph of the corpus where every node is a paragraph
• Lexically similar paragraphs are linked with edges
• A summary is retrieved by following paths defined by different algorithms to cover as much of the content of the graph as possible.
b- Lexical Centrality (Erkan and Radev 2004) (Mihalcea and Tarau 2004)
Method:
• Sentences are the nodes of the graph
• A random walk identifies the most visited nodes as central to the documents
• Remove duplicates or near duplicates
• Select sentences with maximal marginal relevance
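The last two steps of (b), dropping near duplicates and selecting by maximal marginal relevance, can be sketched as a greedy loop. The sketch assumes centrality scores have already been computed by the random walk, and it takes a hypothetical pairwise similarity function and trade-off value; it illustrates the general MMR idea, not the exact procedure of the cited papers.

```python
def select_mmr(centrality, similarity, k, trade_off=0.7):
    """Greedily pick k sentences that are central but not redundant."""
    selected = []
    candidates = set(centrality)
    while candidates and len(selected) < k:
        def marginal_relevance(candidate):
            # Penalize similarity to the closest sentence already in the summary.
            redundancy = max((similarity(candidate, s) for s in selected), default=0.0)
            return trade_off * centrality[candidate] - (1 - trade_off) * redundancy

        best = max(sorted(candidates), key=marginal_relevance)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical inputs: sentences 0 and 1 are near duplicates.
centrality = {0: 0.9, 1: 0.8, 2: 0.5}
pair_similarity = {frozenset({0, 1}): 0.95, frozenset({0, 2}): 0.1, frozenset({1, 2}): 0.1}


def similarity(a, b):
    return pair_similarity[frozenset({a, b})]


summary = select_mmr(centrality, similarity, k=2)
print(summary)
```

Sentence 1 is more central than sentence 2, but it repeats sentence 0, so the second slot goes to sentence 2.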

2- Semi-supervised Passage Retrieval
• Question-Biased Passage Retrieval (Otterbacher et al., 2005)
Goal: answer a question from a group of documents
Method:
• Use a biased random walk on a graph seeded with positive and negative examples
• Each node is scored by the percentage of random walks that end at this node
• The nodes with the highest scores are central to the document set and similar to the seed nodes.

3- Keyword Extraction
• A set of terms that best describes the document
• Used in terminology extraction and construction of domain-specific dictionaries

• Mihalcea and Tarau, 2004
Method:
• Build a co-occurrence graph for the input text where the nodes are the words of the text
• Words are linked by a co-occurrence relation limited by the distance between words.
• Run a random walk on the graph
• Words ranked as important and found next to each other are collapsed into one key phrase
Result:
• Much better than tf.idf
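The whole keyword pipeline fits in a short sketch. It assumes a pre-tokenized text with stopwords already removed, an unweighted undirected graph, a standard PageRank iteration with damping 0.85 standing in for the random walk, and a fixed number of top-ranked words; these settings and the function names are illustrative assumptions, not the configuration of Mihalcea and Tarau.

```python
def pagerank(neighbors, damping=0.85, iterations=100):
    """Random-walk score for each node of an undirected, unweighted graph."""
    n = len(neighbors)
    rank = {word: 1.0 / n for word in neighbors}
    for _ in range(iterations):
        rank = {
            word: (1 - damping) / n
            + damping * sum(rank[v] / len(neighbors[v]) for v in neighbors[word])
            for word in neighbors
        }
    return rank


def extract_keyphrases(tokens, top_k, window=1):
    # Co-occurrence graph: link words at most `window` positions apart.
    neighbors = {}
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors.setdefault(word, set()).add(other)
                neighbors.setdefault(other, set()).add(word)
    rank = pagerank(neighbors)
    top = set(sorted(rank, key=rank.get, reverse=True)[:top_k])
    # Collapse runs of adjacent top-ranked words into one key phrase.
    phrases, current = [], []
    for word in tokens + [None]:  # the sentinel flushes the final run
        if word in top:
            current.append(word)
            continue
        if current:
            phrase = " ".join(current)
            if phrase not in phrases:
                phrases.append(phrase)
        current = []
    return phrases


tokens = ("graph ranking finds keywords graph ranking uses edges "
          "graph ranking scores nodes").split()
keyphrases = extract_keyphrases(tokens, top_k=2)
print(keyphrases)
```

The two best-connected words, "graph" and "ranking", always appear side by side in the toy text, so they are merged into a single key phrase.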

References
Networks and Natural Language Processing (Mihalcea and Radev 2008)
Dragomir Radev, University of Michigan
Rada Mihalcea, University of North Texas
