Natural Language Processing (1) Introduction

Kazuhide Yamamoto
Nagaoka University of Technology

Natural Language Processing Laboratory (自然言語処理研究室)

September 06, 2013

Transcript

  1. 1 / 31 Natural Language Processing (1) Introduction
    Kazuhide Yamamoto
    Dept. of Electrical Engineering, Nagaoka University of Technology
  2. 3 / 31 About this course
    • For graduate students of the Dept. of Electrical Engineering.
    • Classes are held in the Information Processing Center.
    • Some classes (perhaps one or two) may be cancelled.
    • No textbook is used; slides are used instead.
      – They will be available on the Web just before each class.
    • Assessment is based on a final report and on some questions given during class.
  3. 4 / 31 My policy
    • Although students are expected to have a background in information processing, all students are welcome to join.
    • Broader topics are introduced so that you understand our technical field correctly.
    • Some technical details are omitted, as most of you will not become researchers or engineers in this field.
    • I try to include recent topics.
  4. 6 / 31 What is “language”?
    • Japanese, English, Chinese, ...
    • Esperanto
    • sign language (for the hearing impaired), braille (for the visually impaired)
    • C++, Java, Python, MATLAB, ...
    • musical scores, mathematical formulas
  5. 7 / 31 Definition of “language”
    Language is a system of objects or symbols, such as sounds or character sequences, that can be combined in various ways following a set of rules, especially to communicate thoughts, feelings, or instructions.
    The American Heritage® Science Dictionary. Copyright © 2005 by Houghton Mifflin Company. Published by Houghton Mifflin Company. All rights reserved.
  6. 8 / 31 Natural language
    • What is "natural language"?
      – Is it language that is "natural"?
      – Is the term used in contrast to "unnatural language"?
  7. 9 / 31 Natural language (cont'd)
    • Natural language is language used by humans for communication.
    • There are also languages NOT meant for human communication, such as computer languages, so we speak of "natural" language for humans and "formal" language for mathematics (and computers).
    • I don't know why it is called "natural" language, or who first called it that.
  8. 10 / 31 Natural language processing, NLP
    • Also called "computational linguistics." Recently, some people call it "human language technology."
    • A technical field in which computers process (i.e., read, write, and hopefully interpret) natural language.
  9. 11 / 31 Related fields
    • artificial intelligence, AI
    • speech recognition and character recognition
    • information retrieval, IR
    • databases
    Wider research areas are also related; in our university, researchers in the departments of Electrical Engineering and of Management and Information System Engineering, the Language Center, the International Center, and the E-learning Center are engaged to some extent.
  10. 12 / 31 NLP applications
    • language input (of Japanese, Chinese, etc.)
    • Web page search (Google, Yahoo!, Bing, etc.)
    • spelling correction and proofreading
    • machine translation (also called "automatic translation")
    • document summarization
    • and more.
  11. 14 / 31 Significance of NLP (1)
    NLP is significant for human-to-human communication.
    • We use language every day. Language is a most important "tool" or "medium" for human communication.
    • We humans now have a huge amount of language information available on the Web. It is still impossible to find what we want without the help of computers.
    • sign language and braille
    • communication with nonnative speakers
  12. 15 / 31 Significance of NLP (2)
    NLP is significant for human-to-computer communication.
    • Easy-to-use computers are required, as the number of computer users (including beginners) is increasing day by day.
    • You would be very happy if computers understood our language, since
      – you may want to ask them to translate these slides into Japanese,
      – and to summarize them so that you can read them later.
  13. 16 / 31 Significance of NLP (3)
    NLP is significant as basic research for related research fields:
    • AI, to find out the mechanism of knowledge computation,
    • brain science, to find out the mechanism of brain processing, and
    • cognitive science, to find out the mechanism of language acquisition.
  14. 18 / 31 History (1) 1950's and before
    • The history of NLP started soon after the invention of the electronic computer.
    • The history started with machine translation.
      – 1947: research started in Britain and France.
      – 1954: demonstration of Russian-English translation.
      – 1955: research started in Japan (in a national lab and Kyushu Univ.)
      – around 1960: first peak of researchers in the US.
  15. 19 / 31 History (2) 1960's
    • ALPAC (Automatic Language Processing Advisory Committee), set up by the US Government in 1964, issued its report in 1966;
      – it emphasized the need for basic research in computational linguistics.
      – This report eventually caused the US Government to reduce its funding of the topic dramatically.
    • The boom had passed.
  16. 20 / 31 History (3) 1970's
    • Europe was in a different situation;
      – machine translation was necessary because of its multilingual community.
      – 1982-1991: seven-language machine translation project
    • In Japan;
      – word processors became available to the public.
      – 1982-1985: Mu project (joint research by the Science and Technology Agency and Kyoto University)
  17. 21 / 31 History (4) 1980's
    The "AI boom" started.
    • Computer power greatly improved.
    • Some experimental AI systems were developed;
      – ELIZA (1966): dialogue system
      – SHRDLU (1972): natural language understanding system in a small "blocks world"
    "Office Automation": (Japanese) companies started to use computers.
    • The number of texts stored on computers increased.
  18. 22 / 31 History (5) 1990's
    • Computers became widespread at home as well.
      – The number of electronic texts increased further.
    • The Internet era began.
      – Electronic texts started to be distributed (as e-mails and Web pages).
  19. 23 / 31 History (6) 2000's
    More and more texts are stored "in the cloud."
      – This affects advertising, education, and many other areas of our lives.
      – As people casually write blog posts every day, blogs are becoming an essential communication tool.
    The rise of Google
      – organizing the world's information and making it universally accessible and useful
    Japan: some national projects again:
      – "Information-explosion" project
      – "Information Grand Voyage" project
      – Project for authenticity verification of information
  20. 25 / 31 History (7) present
    • Kana-Kanji conversion (IME)
    • Web page search engines
    • machine translation
    • spelling correction and proofreading
    • sentiment analysis
  21. 27 / 31 Difficulties of language processing
    Processing language is not easy, for the following reasons:
    • It has little (or no) mathematical or physical foundation (cf. speech processing, image processing, and numerical processing).
    • Language phenomena are difficult to formulate (in order to apply mathematical or statistical techniques).
    • Language has only loose rules. Its regularities have many exceptions, as language changes day by day.
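  22. 28 / 31 Change of language
    • 食べれる, 投げれる ("can eat", "can throw")
      – change of conjugation
    • すごいきれい ("really pretty")
      – change of part of speech
    • 全然OK ("totally OK")
      – change of usage; 全然 was formerly used only in negative sentences.
    • 超気持ちいい / てゆうか、... / … みたいな。 ("feels super good" / "I mean, ..." / "..., like.")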
  23. 29 / 31 Style of language
    • Law and contract
      – written, technical, long
    • Comic duo (Manzai)
      – spoken, dialect, dialog
    • Web BBS (such as 2-channel)
      – intermediate between written and spoken (?), intentionally broken, future style (?)
  24. 30 / 31 Exception of rules: an example
    Hypothesis: Use 「お」 for Japanese-origin words, 「ご/御」 for Chinese-origin words, and never use such prefixes for Western-derived words.
    Examples:
      お名前、おさかな、お食べになる
      ご氏名、ご苦労、ご感想、御入学
      (?)おデータ、(?)ごテニス、(?)御バイク
    Exception:
  25. 31 / 31 Today's keywords
    • history of language processing
    • significance of language
    • characteristics of language
    • difficulties of language processing
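
The お/ご hypothesis from the "Exception of rules" slide can be sketched as a simple lookup. This is only an illustration and not part of the lecture: the `ORIGIN` table, its labels, and the function name `add_honorific` are assumptions, and a real system would need a full lexicon of word origins. The sketch applies the hypothesis literally, so it does not handle any exceptions.

```python
# Hypothetical, hand-labelled word origins for the nouns on the slide.
# A real system would need a full lexicon; this table is an assumption.
ORIGIN = {
    "名前": "japanese",
    "さかな": "japanese",
    "氏名": "chinese",
    "苦労": "chinese",
    "感想": "chinese",
    "入学": "chinese",
    "データ": "western",
    "テニス": "western",
    "バイク": "western",
}

# The slide's hypothesis: お for Japanese-origin words, ご for
# Chinese-origin words, and no prefix for Western-derived words.
PREFIX = {"japanese": "お", "chinese": "ご", "western": ""}


def add_honorific(word: str) -> str:
    """Attach the prefix predicted by the hypothesis.

    Raises KeyError for words missing from the ORIGIN table.
    """
    return PREFIX[ORIGIN[word]] + word


if __name__ == "__main__":
    for w in ORIGIN:
        print(w, "->", add_honorific(w))
```

Because the rule is a plain table lookup, any word whose actual usage departs from the hypothesis would have to be listed separately, which is the point the slide makes about loose rules in language.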