parsing tools.
• Morphology Analyzer – Motivation and Morphology Approaches
• DM-based Morphology Analyzer
• Hindi Nominal and Verbal Inflection in DM
• Development of Lexicon – Lexicon Entry Tool
• DM-based MA Implementation and its Demo
• Output and Result
• Future Work
• References
and coherent account of the various aspects of Hindi inflectional morphology.
• Apply it concretely in the development of Natural Language Processing tools (such as a Morphology Analyzer) for Hindi.
• Our implementation is rule-based, i.e. it requires us to analyze and describe the various inflectional forms in Hindi.
• This analysis is presented here in the theoretical framework of Distributed Morphology.
to develop a number of language processing systems for tasks such as spell-checking, stemming, morphological analysis, part-of-speech tagging, chunking and machine translation.
• It accounts for the morphological properties of languages in a systematic manner, enabling us to understand how words are formed, what their constituents are, how they may be arranged to make larger units, what the semantic and grammatical constraints involved are, and how morphological processes interact with syntactic and phonological ones.
ways to produce roots of different POS categories.
• मेरे कई खाते है mere kai khate hai {I have many bank accounts}
  Analysis: khata (root) + suffix /-e/
• वे रोज चावल खाते है ve roz chaval khate hai {They eat rice every day}
  Analysis: kha (root) + suffixes /-t-/ and /-e/
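The ambiguity of khate in the two sentences above can be sketched as a small suffix-stripping routine. This is a minimal illustration in romanized form, not the authors' implementation: the names `LEXICON`, `SUFFIX_RULES` and `analyze`, and the idea of restoring the root-final "a" of khata after stripping /-e/, are assumptions made for the example.

```python
# Hypothetical two-entry lexicon: root -> category.
LEXICON = {"khata": "noun", "kha": "verb"}

# Each hypothetical rule: (surface string to strip, morphemes it stands for,
# category of the root it attaches to, material restored on the stem).
SUFFIX_RULES = [
    ("e", ("-e",), "noun", "a"),        # khat + a gives the root khata
    ("te", ("-t-", "-e"), "verb", ""),  # kha + -t- + -e
]


def analyze(token):
    """Return every (root, category, morphemes) analysis licensed by the rules."""
    analyses = []
    for suffix, morphemes, category, restore in SUFFIX_RULES:
        if not token.endswith(suffix):
            continue
        root = token[: len(token) - len(suffix)] + restore
        # Keep the analysis only if the lexicon lists the root with this category.
        if LEXICON.get(root) == category:
            analyses.append((root, category, morphemes))
    return analyses


for root, category, morphemes in analyze("khate"):
    print(root, category, " + ".join(morphemes))
```

Both analyses are returned for the same surface token; choosing between them needs sentence context, which is outside the scope of this sketch.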
morphology-based MA (using a paradigm-based approach), as statistical methods often fail to learn and represent morphological patterns and linguistic generalizations correctly.
• Emphasis on efficiency, high accuracy and high coverage.
Assumes a sharp division between syntax and morphology.
– Words are formed in the lexicon before they enter the terminal nodes of the syntactic tree.
– Syntax cannot alter word-internal structure; it can only rearrange words to form phrases.
– Problem: dealing with freely formed phrasal compounds in Afrikaans.
– Ex: [[saas laat in die bed] kinders] (children who go to bed late)
– Holds that there is no independent component such as Morphology.
– All words are formed syntactically using WFRs (Word Formation Rules); all kinds of derivation are treated as syntax.
– No affixes in the model.
– Problematic stance: the model cannot treat stem modification and affixation as two different operations.
– Ex: the following verb pairs cannot be treated as the same kind of operation: sing-sang (partial stem modification); go-went (total stem modification); cut-cut (no modification); play-played (affixation).
[Halle and Marantz 1993], the morphological structure of a word or word form is generated using syntactic operations.
• It is syntax that provides the features and structures upon which morphology operates.
• The components of morphology are distributed among various levels in the process of word computation.
• It combines features of both the Lexicalist and the Affixless approaches.
rules (ordered and well-constrained).
• Rules that can both analyze and generate the possible morphological forms.
• Increase in efficiency and accuracy over the existing Hindi Morphological Analyzers.
• Ease of computational implementation; can be used in other NLP tools such as POS taggers and chunkers (word groupers).
• Ability to mimic, as far as possible, a native speaker's use of morphological knowledge; the representation is not counter-intuitive.
• Use of a simple lexicon with fewer inflectional classes makes it easier for lexicographers to classify words.
Feminine – क्रोध
• Class B+: Feminine; take याँ/यां in the 'plural, direct' and ओं (which becomes यों after glide insertion) in the 'plural, oblique'. लड़की, उपलब्धि, गुड़िया.
• Class C+: Feminine; take the suffix एँ/एं in the 'plural, direct' and /-õ/ in the 'plural, oblique' case. रात, माला, ऋतु, बहू, लौ.
• Class D+: Masculine ending in आ or या. लड़का, धागा, कुआँ, साया.
• Class E+: Masculine nouns that inflect only in the 'plural, oblique' form, where they take the suffix -ओं (which becomes -यों for ई- and इ-ending roots due to glide insertion), and take a null suffix for all other case-number values. आलू, साधु, माली, कवि, खेत, घर, राजा, पिता, भैया.
ों/X,X,NC/[+pl, +oblique]
Stem Obtained: घर
Root Formation [looked up in the lexicon]
Only one lexicon entry found: <घर> <E+,+masc,NC> <noun>
Apply Readjustment Rules: where readjustment rules can produce new stems, we apply them and look up the lexicon again to obtain new roots.
Readjustment Rule: None [in this case]
Output Morphological Analysis
------------------Set of Roots and Features are----------------------
Token : घरों, Total Output : 1
[ Root : घर, Class : E, Category : noun, Suffix : ों ]
[ Gender : +masc, Number : +pl, Person : x, Case : +oblique, Tense : x, Aspect : x, Mood : x ]
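The steps of this trace (match a vocabulary item, strip the suffix, apply readjustment, look up the lexicon, emit features) can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: `VOCABULARY_ITEMS`, `LEXICON`, `readjust` and `analyze` are hypothetical names, the token is assumed to be the plural oblique form घरों, and the single vocabulary item and lexicon entry are copied from the trace.

```python
# Hypothetical vocabulary item: a suffix, the inflection classes it attaches to,
# and the features it realizes.
VOCABULARY_ITEMS = [
    {"suffix": "ों", "classes": {"E+"},
     "features": {"number": "+pl", "case": "+oblique"}},
]

# Hypothetical lexicon: root -> (inflection class, gender, category).
LEXICON = {"घर": ("E+", "+masc", "noun")}


def readjust(stem):
    """Return candidate roots for a stem. No readjustment rule applies in
    this example, so the stem itself is the only candidate."""
    return [stem]


def analyze(token):
    """Strip each matching suffix and keep analyses whose root is in the lexicon
    with an inflection class the suffix is allowed to attach to."""
    results = []
    for item in VOCABULARY_ITEMS:
        suffix = item["suffix"]
        if not token.endswith(suffix) or len(token) == len(suffix):
            continue
        stem = token[: -len(suffix)]
        for root in readjust(stem):
            entry = LEXICON.get(root)
            if entry is None or entry[0] not in item["classes"]:
                continue
            infl_class, gender, category = entry
            results.append({"root": root, "class": infl_class,
                            "category": category, "suffix": suffix,
                            "gender": gender, **item["features"]})
    return results


print(analyze("घरों"))
```

Keeping `readjust` as a separate step mirrors the trace, where readjustment is attempted even when no rule fires.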
> infinitive > passive marker > person-number > Modal > Aspect > Tense/Mood > gender-number
Morpheme arrangement rules enable us to identify valid word forms and rule out invalid ones.
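A morpheme arrangement check of this kind can be sketched as a rank comparison over slot labels. The sketch below is illustrative only: `SLOT_ORDER` lists just the slots visible in the ordering above (whatever precedes "infinitive" is not included), the label spellings and the function name `is_valid_order` are assumptions, and it assumes each slot occurs at most once in a word form.

```python
# Slots in the order shown above; labels are hypothetical spellings.
SLOT_ORDER = [
    "infinitive",
    "passive",
    "person-number",
    "modal",
    "aspect",
    "tense/mood",
    "gender-number",
]
RANK = {slot: index for index, slot in enumerate(SLOT_ORDER)}


def is_valid_order(slots):
    """Return True if the slot labels appear in strictly increasing template order."""
    if any(slot not in RANK for slot in slots):
        return False  # unknown slot label
    ranks = [RANK[slot] for slot in slots]
    return all(a < b for a, b in zip(ranks, ranks[1:]))


print(is_valid_order(["aspect", "gender-number"]))
print(is_valid_order(["gender-number", "aspect"]))
```

The first call follows the template order and the second reverses it, so only the first is accepted.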
that always remain unchanged: consonant-ending (बहादुर, शान्त/शांत) or vowel-ending, such as भारी.
• Class B: Adjectives that do not inflect in the masculine but take आ to mark the feminine, for example अबल-अबला, महोदय-महोदया.
• Class C: Adjectives that inflect for the feminine, the plural and the oblique, as in अच्छा, अच्छे, अच्छी (likewise लंबा, छोटा, काला).
word and provides the root, the grammatical category, the inflectional class and the feature values associated with the word.
• Detailed morphological analysis for each morpheme that constitutes the word form.
• The morpheme analysis of each suffix is produced as a seven-field record with values for the features gender, number, person, case, tense, aspect and mood.
Input Token: TOKEN_IN
• Possible Root 1: class: category: suffix: morphemes (morpheme 1, morpheme 2, ...): morpheme analysis (morpheme 1, morpheme 2, ...)
• Possible Root 2: class: category: suffix: morphemes (morpheme 1, morpheme 2, ...): morpheme analysis (morpheme 1, morpheme 2, ...)
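The seven-field morpheme analysis can be represented as a small record type. The following is a sketch under assumptions: the class name `MorphemeAnalysis`, the `render` method, and the use of "x" for an unspecified feature (as in the sample output of the analyzer) are choices made for illustration, not the authors' data structure.

```python
from dataclasses import dataclass, fields


@dataclass
class MorphemeAnalysis:
    """Seven feature fields; "x" marks a feature the morpheme does not specify."""
    gender: str = "x"
    number: str = "x"
    person: str = "x"
    case: str = "x"
    tense: str = "x"
    aspect: str = "x"
    mood: str = "x"

    def render(self):
        """Format the record as a bracketed 'Name : value' list, in field order."""
        parts = [f"{f.name.capitalize()} : {getattr(self, f.name)}"
                 for f in fields(self)]
        return "[ " + ", ".join(parts) + " ]"


analysis = MorphemeAnalysis(gender="+masc", number="+pl", case="+oblique")
print(analysis.render())
```

Defaulting every field to "x" keeps the record the same width for nominal and verbal suffixes, so analyses of different morphemes can be printed or compared uniformly.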
idea is to see whether the framework is able to capture all kinds of word forms in Hindi, both regular and irregular.
• The implementation will not be very different from that of the MA.
• The linguistic resources used in the DM-based MA, namely the vocabulary items (suffixal entries) and the readjustment rules, need to be applied in the reverse direction: root entries from the root list are combined with the affixal entries to generate fully inflected surface forms.
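Applying the resources in the reverse direction can be sketched for one noun class. The example below covers only masculine nouns that inflect in the plural oblique alone (class E+ in this deck): it attaches the plural oblique suffix and applies glide insertion for ई- and इ-ending roots. It is a hedged sketch, not the planned generator: the function name `generate_e_plus` is hypothetical, and the handling of roots ending in आ, उ and ऊ follows standard Hindi forms rather than anything stated in the slides.

```python
# Devanagari pieces, written as escapes so the combining signs stay unambiguous.
I_SHORT = "\u093f"          # vowel sign i
I_LONG = "\u0940"           # vowel sign ii
U_SHORT = "\u0941"          # vowel sign u
U_LONG = "\u0942"           # vowel sign uu
AA = "\u093e"               # vowel sign aa
GLIDE_SUFFIX = "\u092f\u094b\u0902"   # ya + vowel sign o + anusvara
VOWEL_SUFFIX = "\u0913\u0902"         # independent o + anusvara
MATRA_SUFFIX = "\u094b\u0902"         # vowel sign o + anusvara


def generate_e_plus(root, plural, oblique):
    """Generate the surface form of a class E+ noun for a number/case value."""
    if not root:
        raise ValueError("root must be non-empty")
    if not (plural and oblique):
        return root  # null suffix for every other case-number value
    last = root[-1]
    if last == I_LONG:   # long vowel shortens, then glide insertion
        return root[:-1] + I_SHORT + GLIDE_SUFFIX
    if last == I_SHORT:  # glide insertion only
        return root + GLIDE_SUFFIX
    if last == U_LONG:   # assumption: long vowel shortens before the suffix
        return root[:-1] + U_SHORT + VOWEL_SUFFIX
    if last in (AA, U_SHORT):  # assumption: suffix keeps its independent form
        return root + VOWEL_SUFFIX
    return root + MATRA_SUFFIX  # consonant-final root


for noun in ["घर", "माली", "कवि"]:
    print(noun, generate_e_plus(noun, plural=True, oblique=True))
```

The same suffix entry and readjustment used for analysis are read here from root to surface form, which is the reversal described above.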
within the framework of Distributed Morphology.
• The study analyses the formation of inflectional forms of Hindi through the application of suffix insertion rules and phonological readjustment rules.
• Implementation of a DM-based Hindi Morphological Analyzer that uses a set of ordered contextual rules to extract suffixes from a word form and to provide a detailed morpheme analysis.
• We show that the DM-based Hindi Morphological Analyzer is quite accurate and reliable, capable of both analysis and generation.
• We show that, using Distributed Morphology, the representation of Hindi morphology is minimal, affix-driven, efficient and accurate for the tasks of stemming and morphological analysis.
and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. 2nd edition. Prentice-Hall.
• Smriti Singh. 2012. Hindi Inflectional Morphology and its Implementation in Language Processing Tools: A Distributed Morphology Approach. PhD thesis.
• Morris Halle and Alec Marantz. 1993. Distributed Morphology and the Pieces of Inflection. In The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, eds. K. Hale and S. J. Keyser, 111–176. Cambridge, Mass.: MIT Press.