Natural Language Processing (1) Introduction

Kazuhide Yamamoto
Nagaoka University of Technology

Natural Language Processing Laboratory

September 06, 2013

Transcript

  1. 1.

    Natural Language Processing (1) Introduction
    Kazuhide Yamamoto
    Dept. of Electrical Engineering, Nagaoka University of Technology
  2. 3.

    About this course
    • For graduate students of the Dept. of Electrical Engineering.
    • Classes are held in the Information Processing Center.
    • Some classes (maybe one or two) may be cancelled.
    • No textbook is used; slides are used instead.
      – You will find them on the Web just before each class.
    • Assessment is based on a final report and on some questions given during class.
  3. 4.

    My policy
    • Although students are expected to have a background in information processing, all students are welcome to join.
    • Broader topics are introduced so that you understand our technology field correctly.
    • Some technical details are omitted, as most of you will not become researchers or engineers in this field.
    • I try to include recent topics.
  4. 6.

    What is “language”?
    • Japanese, English, Chinese, ...
    • Esperanto
    • sign language (for the hearing impaired), braille (for the visually impaired)
    • C++, Java, Python, MATLAB, ...
    • musical scores, mathematical formulas
  5. 7.

    Definition of “language”
    Language is a system of objects or symbols, such as sounds or character sequences, that can be combined in various ways following a set of rules, especially to communicate thoughts, feelings, or instructions.
    The American Heritage® Science Dictionary. Copyright © 2005 by Houghton Mifflin Company. Published by Houghton Mifflin Company. All rights reserved.
  6. 8.

    Natural language
    • What is "natural language"?
      – Is it language that is "natural"?
      – Is the term used in contrast to "unnatural language"?
  7. 9.

    Natural language (cont'd)
    • Natural language is language used by humans for communication.
    • There are also languages that are NOT used for human communication, such as computer languages, so we say "natural" language for human languages and "formal" language for those of mathematics (and computers).
    • I don't know why it is called "natural" language or who started calling it that.
  8. 10.

    Natural language processing, NLP
    • Also called "computational linguistics." Recently, some people call it "human language technology."
    • A technical field in which computers process (i.e., read, write, and hopefully interpret) natural language.
  9. 11.

    Related fields
    • artificial intelligence, AI
    • speech recognition and character recognition
    • information retrieval, IR
    • databases
    A wide range of research areas is related; at our university, researchers in the Department of Electrical Engineering, the Department of Management and Information System Engineering, the Language Center, the International Center, and the E-learning Center are engaged to some extent.
  10. 12.

    NLP applications
    • language input (for Japanese, Chinese, etc.)
    • Web page search (Google, Yahoo!, Bing, etc.)
    • spelling correction and proofreading
    • machine translation (also called "automatic translation")
    • document summarization
    • and more.
  11. 14.

    Significance of NLP (1)
    NLP is significant for human-to-human communication.
    • We use language every day. Language is one of the most important "tools" or "media" for human communication.
    • We humans now have a huge amount of language information available on the Web, but it is still impossible to find what we want without the help of computers.
    • sign language and braille
    • communication with nonnative speakers
  12. 15.

    Significance of NLP (2)
    NLP is significant for human-to-computer communication.
    • Easy-to-use computers are required, as the number of computer users (including beginners) is increasing day by day.
    • You would be very happy if computers understood our language, since
      – you may want to ask them to translate these slides into Japanese,
      – and to summarize them so that you can read them later.
  13. 16.

    Significance of NLP (3)
    NLP is significant as basic research for related research fields:
    • AI, to find out the mechanism of knowledge computation,
    • brain science, to find out the mechanism of brain processing, and
    • cognitive science, to find out the mechanism of language acquisition.
  14. 18.

    History (1) 1950s and before
    • The history of NLP started soon after the invention of the electronic computer.
    • The history started with machine translation.
      – 1947: research started in Britain and France.
      – 1954: demonstration of Russian-English translation.
      – 1955: research started in Japan (at a national lab and Kyushu Univ.)
      – around 1960: first peak in the number of researchers in the US.
  15. 19.

    History (2) 1960s
    • ALPAC (Automatic Language Processing Advisory Committee), set up by the US Government in 1964, issued its report in 1966;
      – the report emphasized the need for basic research in computational linguistics.
      – This report eventually caused the US Government to reduce its funding of the topic dramatically.
    • The boom passed.
  16. 20.

    History (3) 1970s
    • Europe was in a different situation;
      – machine translation was a necessity because of its multilingual community.
      – 1982-1991: seven-language machine translation project
    • In Japan;
      – word processors became available to the public.
      – 1982-1985: Mu project (joint research by the Science and Technology Agency and Kyoto University)
  17. 21.

    History (4) 1980s
    The "AI boom" starts.
    • Computing power improves greatly.
    • Some experimental AI systems had been developed;
      – ELIZA (1966): dialogue system
      – SHRDLU (1972): natural language understanding system in a small "blocks world"
    "Office Automation": (Japanese) companies start to use computers.
    • The amount of text stored on computers increases.
  18. 22.

    History (5) 1990s
    • Computers also become widespread at home.
      – The amount of electronic text increases further.
    • The Internet era starts.
      – Electronic texts start to be distributed (as e-mails and Web pages).
  19. 23.

    History (6) 2000s
    More and more texts are stored "in the cloud."
      – This affects advertising, education, and many other fields of our lives.
      – As people casually write blog posts every day, blogs are becoming an essential communication tool.
    The rise of Google
      – organizing the world's information and making it universally accessible and useful
    Japan: some national projects again:
      – "Information-explosion" project
      – "Information Grand Voyage" project
      – Project for authenticity verification of information
  20. 24.
  21. 25.

    History (7) present
    • Kana-Kanji conversion (IME)
    • Web page search engine
    • machine translation
    • spelling correction and proofreading
    • sentiment analysis
  22. 27.

    Difficulties of language processing
    Processing language is not easy, for the following reasons:
    • Little (or no) mathematical or physical foundation (cf. speech processing, image processing, and numerical processing).
    • Language phenomena are difficult to formulate (in order to apply mathematical or statistical techniques).
    • Language has only loose rules. There are many exceptions to its regularities, as language changes day by day.
  23. 28.

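    Change of language
    • 食べれる ("can eat"), 投げれる ("can throw") – change of conjugation
    • すごいきれい ("very pretty") – change of part-of-speech
    • 全然OK ("totally OK") – change of usage; 全然 was formerly used only in negative sentences.
    • 超気持ちいい ("feels super good") / てゆうか、... ("I mean, ...") / … みたいな。 ("..., like.")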
  24. 29.

    Style of language
    • Law and contract – written, technical, long
    • Comic duo (Manzai) – spoken, dialect, dialog
    • Web BBS (such as 2-channel) – between written and spoken (?), intentionally broken, future style (?)
  25. 30.

    Exception of rules: an example
    Hypothesis: Use 「お」 for Japanese-origin words, 「ご/御」 for Chinese-origin words, and never use such prefixes for Western-derived words.
    Examples:
      お名前、おさかな、お食べになる
      ご氏名、ご苦労、ご感想、御入学
      (?)おデータ、(?)ごテニス、(?)御バイク
    Exception:
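
    The hypothesis on this slide can be written down as a small rule, which also shows why such rules stay "loose." The following is only a minimal sketch under stated assumptions: the `ORIGIN` lexicon, its labels, and the function names are hypothetical and made up for illustration, and treating every all-katakana word as a Western-derived word is a simplification, not a fact about Japanese.

```python
# Toy lexicon: word -> origin class. These labels are illustrative
# assumptions for this sketch, not taken from a real dictionary.
ORIGIN = {
    "名前": "native",    # Japanese-origin word
    "さかな": "native",
    "氏名": "sino",      # Chinese-origin word
    "苦労": "sino",
    "感想": "sino",
    "入学": "sino",
}


def is_katakana(word: str) -> bool:
    """True if every character lies in the Unicode Katakana block."""
    return bool(word) and all("\u30a0" <= ch <= "\u30ff" for ch in word)


def add_prefix(word: str) -> str:
    """Apply the slide's hypothesis; unknown words are returned unchanged."""
    if is_katakana(word):
        # Assumption: katakana spelling stands in for "Western-derived".
        return word
    origin = ORIGIN.get(word)
    if origin == "native":
        return "お" + word
    if origin == "sino":
        return "ご" + word
    return word


if __name__ == "__main__":
    for w in ["名前", "氏名", "データ"]:
        print(w, "->", add_prefix(w))
```

    A rule like this covers the slide's examples, but any word whose actual usage departs from its origin class has to be listed by hand, which is the difficulty the slide points at.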
  26. 31.

    Today's keywords
    • history of language processing
    • significance of language
    • characteristics of language
    • difficulties of language processing