of Electrical Dept. • Classes are held in the Information Processing Center. • One or two classes may be cancelled. • No textbook is used; slides are used instead. – They will be posted on the Web just before each class. • Assessment is based on a final report and on questions given during class.
to have a background in information processing, all students are welcome to join. • Broad topics are introduced so that you gain a correct understanding of our technical field. • Some technical details are omitted, as most of you will not become researchers or engineers in this field. • Recent topics are included where possible.
language used by humans for communication. • There are also languages NOT used for human communication, such as computer languages, so we distinguish "natural" language for humans from "formal" language for mathematics (and computers). • I don't know why it is called "natural" language or who first called it that.
as "computational linguistics." Recently, some people call it "human language technology." • A technical field in which computers process (i.e., read, write, and hopefully interpret) natural language.
speech recognition and character recognition • information retrieval (IR) • databases
Wider research areas are also related; at our university, researchers in the departments of Electrical Engineering and of Management and Information System Engineering, the Language Center, the International Center, and the E-learning Center are involved to some extent.
for human-to-human communication. • We use language every day. Language is one of the most important "tools" or "media" for human communication. • We humans now have a huge amount of language information available on the Web, yet it is still impossible to find what we want without the help of computers. • sign language and braille • communication with non-native speakers
human-to-computer communication. • Easy-to-use computers are required, as the number of (beginner) computer users is increasing day by day. • You would be very happy if computers understood our language, since – you may want to ask them to translate these slides into Japanese, – and to summarize them so that you can read them later.
basic research for related research fields: • AI, to find out the mechanism of knowledge computation, • brain science, to find out the mechanism of brain processing, and • cognitive science, to find out the mechanism of language acquisition.
NLP started soon after the invention of the electronic computer. • The history started with machine translation. – 1947: research started in Britain and France. – 1954: demonstration of Russian-English translation. – 1955: research started in Japan (at a national laboratory and Kyushu Univ.). – around 1960: first peak in the number of researchers in the US.
Processing Advisory Committee) of the US Government, formed in 1964, issued its report in 1966; – it emphasized the need for basic research in computational linguistics. – The report eventually led the US Government to reduce its funding for the topic dramatically. • The boom had passed.
situation: – machine translation was needed because of its multilingual community. – 1982-1991: seven-language machine translation project • In Japan: – word processors became available to the public. – 1982-1985: Mu project (joint research by the Science and Technology Agency and Kyoto University)
• Computing power greatly improved. • Some experimental AI systems were developed: – ELIZA (1966): dialogue system – SHRDLU (1972): natural language understanding system in a small "blocks world" • "Office Automation": (Japanese) companies started to use computers. • The number of computer-readable texts increased, and they were stored.
in widespread use at home. – The number of electronic texts increased further. • The Internet era started. – Electronic texts started to be distributed (as e-mails and Web pages).
are stored "in the cloud." – This affects advertising, education, and many other areas of our lives. – As people casually post messages to blogs every day, blogs are becoming an essential communication tool. • The rise of Google – organizing the world's information and making it universally accessible and useful • Japan: national projects again: – "Information-explosion" project – "Information Grand Voyage" project – Project for authenticity verification of information
not so easy, for the following reasons: • Little (or no) mathematical or physical foundation (cf. speech processing, image processing, and numerical processing). • Language phenomena are difficult to formalize (in order to apply mathematical or statistical techniques). • Language has only loose rules. There are many exceptions to its regularities, as language changes day by day.
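change of conjugation • すごいきれい ("really pretty") – change of part of speech • 全然OK ("totally OK") – change of usage; 全然 used to appear only in negative sentences. • 超気持ちいい ("feels super good") / てゆうか、... ("I mean, ...") / … みたいな。 ("... or something like that.")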
– written, technical, long • Comic duo (Manzai) – spoken, dialect, dialogue • Web BBS (such as 2-channel) – intermediate between written and spoken (?), intentionally broken, future style (?)
「お」 (o) for Japanese-origin words, 「ご/御」 (go) for Chinese-origin words, and no such prefix for Western-derived words. Examples: お名前、おさかな、お食べになる; ご氏名、ご苦労、ご感想、御入学; (?)おデータ、(?)ごテニス、(?)御バイク Exception: