Build Mandarin AI Conversational Agent with Rasa

陳皓遠 / Hao Yuan Chen (@circlelychen)
2019/11/23 @ Taichung.py

Community Experiences
• Speaker @ PyCon Taiwan 2019: 打造面向金融場景的中文自然語言理解引擎 (Building a Chinese natural language understanding engine for financial scenarios)
  • Slides: https://speakerdeck.com/circlelychen/da-zao-mian-xiang-jin-rong-chang-jing-de-zhong-wen-zi-ran-yu-yan-li-jie-yin-qing
  • Video: https://www.youtube.com/watch?v=o7DMrWVMZCA

Outline
• Introduction to Conversational Agent
• Rasa framework
  • Introduction
  • Simple Tutorial
• Build Rasa custom components based on ckiptagger
  • Motivation and mechanism
  • Introduction to ckiptagger
  • Components Implementation
• Demo

Levels of Conversational Assistants
• Level 1: Notifications
• Level 2: FAQs
• Level 3: Contextual Assistants
• Level 4: Personal Assistants
• Level 5: Autonomous Organization of Assistants
(Chart axes: Intelligence, Time)

Notifications
Example: "Hao Yuan Chen (circlelychen) opened !417 update nlu corpus in drd-ctbc/nlp"
Example: "Hao Yuan Chen pushed to branch master of drd-ctbc/nlp (Compare changes)"
• Push notification
• Users passively receive notifications
• Nothing happens when users reply

FAQs
User: I need to renew my renter's insurance. How much will it be?
Bot: You can calculate your renewal price on our website: https://xxx.bbb/site
• Users get a response by asking a simple question
• Simulates a basic FAQ page with a search tool
• Most common type of assistant right now

Contextual Assistants
User: I need to renew my renter's insurance. How much will it be?
Bot: I’d be happy to check for you. Firstly, are you still living in the same apartment?
User: Yes
Bot: Great, so just confirming it’s 980 sq ft?
User: Yes
Bot: Thanks! Your renewal rate from Sept. 1st onwards would be $10/month
• Allows users to chat freely and open-endedly, as they would expect
• Capable of understanding and responding to multiple follow-up questions
• Context: what the user has said before is treated as known

Simple tutorial
Build Rasa custom components based on ckiptagger
• Motivation and mechanism on customizing Rasa NLU pipeline
• Introduction to ckiptagger
• Components Implementation
Demo

Task-oriented conversational agent https://www.csie.ntu.edu.tw/~yvchen/s105-icb/doc/170321_Ontology.pdf

Performance Benchmark on Conversational Agents
"Benchmarking Natural Language Understanding Services for building Conversational Agents"
Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser
Accepted by IWSDS 2019

Strategy Survey on Conversational Agents
• Frameworks compared: Rasa, Dialogflow, LUIS, Watson
• Criteria: on premise, free, open source, extensible, privacy

Rasa Modules
Rasa: an open source machine learning toolkit for developers, for conversational AI
• NLU: natural language processing
  • Intent classification
  • Entity extraction
• Core: contextual dialogue management
  • Decision making
  • Context tracking
• Integration
  • Knowledge base interaction
  • Language generation
• Channel: user interface
  • Message
  • Voice

Rasa Architecture
(Diagram: Rasa NLU, Rasa Core, Integration, Channel)

Rasa Dialogue Flow
User: I am sick. I need a GP in 94310.
Bot: Should we look for an earlier appointment?
(Diagram label: Context)

Project Structure
QuickStart
  $> pip3 install rasa[tensorflow]
  $> rasa init
• config.yml
  • Pipeline for NLU
  • Policies for Core
• nlu.md
  • Training data for NLU
• stories.md
  • Training data for Core
• domain.yml
  • Chatbot’s domain
• Actions

Steps for Preparing NLU Training Data
Step1: Collect dialogue samples
• Leverage the knowledge of domain experts
• Check the most common search queries and questions
Step2: Define labels
• Define intents by observing the dialogue set
• Define entities by checking search queries
Step3: Compose data/nlu.md and domain.yml
• Annotate samples with intents and entities in data/nlu.md
• List intents and entities in domain.yml
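The "check the most common search queries" step can be sketched in a few lines of Python. This is only an illustration: the query log below is made up, and `most_common_queries` is a hypothetical helper, not part of Rasa.

```python
from collections import Counter


def most_common_queries(queries, top_n=3):
    """Return the top_n most frequent queries after light normalisation."""
    # Lower-case and collapse repeated whitespace so near-duplicates are merged.
    normalised = [" ".join(q.lower().split()) for q in queries]
    return Counter(normalised).most_common(top_n)


# Hypothetical query log; in practice this would come from search or chat logs.
query_log = [
    "find me a cheap vietnamese restaurant",
    "Find me a cheap Vietnamese restaurant",
    "good morning",
    "thank you",
    "good morning",
    "find me a cheap  vietnamese restaurant",
]

top_queries = most_common_queries(query_log, top_n=2)
```

Frequent queries found this way are natural candidates for intents, and the words that vary between them (price, cuisine, location) are candidates for entities.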

Step 1: Collect dialogue samples
Dialogue 1
  User: Good morning
  Bot: how can I help you?
  User: I want a british restaurant in the east part of town
  Bot: here's what I found: xxx, yyy
  User: Thank you
  Bot: Bye-bye
Dialogue 2
  User: hello there
  Bot: how can I help you?
  User: can you book a table in london in a expensive price range with spanish cuisine for two
  Bot: ok let me see what I can find
  …
Dialogue 3
  User: hey bot
  Bot: how can I help you?
  User: find me a cheap vietnamese restaurant
  Bot: where?
  User: west part of town
  Bot: for how many people?
  …

Step 2: Define labels
• Intent: Inform
  • im looking for an expensive restaurant in the east town
  • want something in the south side of town thats moderately priced
• Intent: Greet
  • good morning
  • hello there
  • hey bot
  • good evening
  • good afternoon
  • hey
• Intent: Thankyou
  • okay thank you
  • thank you goodbye
  • thanks goodbye
  • thank you good bye
  • thank you goodbye
  • you rock
• Entity: price, location

Step 3: Compose data/nlu.md and domain.yml
data/nlu.md
  ## intent:greet
  - good morning
  - hello there
  …
  ## intent:thankyou
  - okay thank you
  - thank you bye
  …
  ## intent:inform
  - im looking for an [expensive](price) restaurant in the [east](location) town
  - want something in the [south](location) side of town that’s [moderately](price:moderate) priced
  - what about [italian](cuisine)
  …
domain.yml
  intents:
  - greet
  - thankyou
  - inform
  …
  entities:
  - location
  - price
  - cuisine
  …
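To make the inline annotation style concrete, here is a minimal Python sketch that extracts entities from one annotated example. It is not Rasa's own parser: the regular expression and the `parse_example` helper are assumptions written for this illustration, covering only the `[text](entity)` and `[text](entity:value)` forms shown on the slide.

```python
import re

# Matches [text](entity) and [text](entity:value).
ENTITY_PATTERN = re.compile(
    r"\[(?P<text>[^\]]+)\]\((?P<entity>[^):]+)(?::(?P<value>[^)]+))?\)"
)


def parse_example(line):
    """Split an annotated example into plain text and a list of entities."""
    entities = []
    plain_parts = []
    cursor = 0      # position in the annotated line
    plain_len = 0   # length of the plain text built so far
    for match in ENTITY_PATTERN.finditer(line):
        before = line[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)
        text = match.group("text")
        entities.append({
            "start": plain_len,
            "end": plain_len + len(text),
            "entity": match.group("entity"),
            # Without an explicit value, the surface text is the value.
            "value": match.group("value") or text,
        })
        plain_parts.append(text)
        plain_len += len(text)
        cursor = match.end()
    plain_parts.append(line[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_example(
    "want something in the [south](location) side of town "
    "that's [moderately](price:moderate) priced"
)
```

The `price:moderate` form maps the surface word "moderately" to the normalised value "moderate", which is how synonyms are expressed inline in this format.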

Setup Rasa NLU Pipeline
Step1: Choose a built-in pipeline or configure one component by component
Step2: Compose config.yml
Built-in pipeline
  language: "en"
  pipeline: "supervised_embeddings"
or
  language: "en"
  pipeline: "pretrained_embeddings_spacy"
Pipeline-embedded components
pretrained_embeddings_spacy:
  language: "en"
  pipeline:
  - name: "SpacyNLP"
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "SklearnIntentClassifier"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
supervised_embeddings:
  language: "en"
  pipeline:
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    analyzer: 'word'
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
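The relationship between a built-in pipeline name and its component list can be sketched as a simple lookup. This is an illustration only: the component lists are copied from the slide, `resolve_pipeline` is a hypothetical helper, and Rasa's own template registry may differ between versions.

```python
# Component lists as shown on the slide; illustrative, not Rasa's registry.
PIPELINE_TEMPLATES = {
    "pretrained_embeddings_spacy": [
        "SpacyNLP",
        "SpacyTokenizer",
        "SpacyFeaturizer",
        "SklearnIntentClassifier",
        "CRFEntityExtractor",
        "EntitySynonymMapper",
    ],
    "supervised_embeddings": [
        "WhitespaceTokenizer",
        "RegexFeaturizer",
        "CRFEntityExtractor",
        "EntitySynonymMapper",
        "CountVectorsFeaturizer",
        "EmbeddingIntentClassifier",
    ],
}


def resolve_pipeline(config):
    """Expand a template name into a component list; pass explicit lists through."""
    pipeline = config["pipeline"]
    if isinstance(pipeline, str):
        return [{"name": name} for name in PIPELINE_TEMPLATES[pipeline]]
    return list(pipeline)


expanded = resolve_pipeline({"language": "en", "pipeline": "supervised_embeddings"})
custom = resolve_pipeline(
    {"language": "en", "pipeline": [{"name": "WhitespaceTokenizer"}]}
)
```

Writing the component list out explicitly is what makes it possible to swap individual components later, for example replacing the tokenizer with a Mandarin-aware one.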

Pros and Cons of the Two Built-in Pipelines
supervised_embeddings
• Pros
  • The model picks up domain-specific vocabulary
  • Supports any language that can be tokenized
• Cons
  • Plenty of data required
  • More training time
pretrained_embeddings_spacy
• Pros
  • Better model performance with less training data required
  • Faster training time
• Cons
  • Requires pre-trained word embeddings
  • No domain-specific vocabulary

Decision Making on Pipeline Selection

How NLU Pipelines Work
(Diagrams: pipeline flow, component steps, Rasa NLU training lifecycle)
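The pipeline flow can be sketched as components called one after another on the same message, each adding what later components need. The three classes below are toy stand-ins written for this illustration, not Rasa components, and the message is a plain dict rather than Rasa's Message object.

```python
class WhitespaceTokenizerSketch:
    """Adds 'tokens' to the message by splitting on whitespace."""

    def process(self, message):
        message["tokens"] = message["text"].split()


class BagOfWordsFeaturizerSketch:
    """Adds 'features': one count per vocabulary word. Requires 'tokens'."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def process(self, message):
        tokens = message["tokens"]
        message["features"] = [tokens.count(word) for word in self.vocabulary]


class KeywordIntentClassifierSketch:
    """Adds 'intent' from the strongest feature. Requires 'features'."""

    def __init__(self, intents):
        self.intents = intents  # one intent per vocabulary entry

    def process(self, message):
        features = message["features"]
        best = max(range(len(features)), key=features.__getitem__)
        message["intent"] = self.intents[best] if features[best] > 0 else None


def run_pipeline(components, text):
    """Pass one message through every component, in order."""
    message = {"text": text}
    for component in components:
        component.process(message)
    return message


vocabulary = ["hello", "thanks"]
pipeline = [
    WhitespaceTokenizerSketch(),
    BagOfWordsFeaturizerSketch(vocabulary),
    KeywordIntentClassifierSketch(["greet", "thankyou"]),
]
result = run_pipeline(pipeline, "hello there")
```

Order matters in this design: the featurizer fails without tokens, and the classifier fails without features, which mirrors why a component declares what it provides and requires.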

Rasa NLU Pipeline Anatomy
supervised_embeddings
  language: "en"
  pipeline:
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    analyzer: 'word'
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
• Components for entity extraction include CRFEntityExtractor and EntitySynonymMapper
• Components for intent recognition include CountVectorsFeaturizer and EmbeddingIntentClassifier

Demo Rasa NLU with built-in pipeline

Rasa NLU Demo Commands
  # NLU model training
  $> rasa train nlu
  # NLU model inference via shell
  $> rasa shell nlu
  # Evaluation based on confusion matrix
  $> rasa test
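For readers unfamiliar with the confusion matrix mentioned above, the following Python sketch builds one from true and predicted intents. It is not how `rasa test` is implemented, and the labels below are invented for illustration.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (labels, matrix) where matrix[i][j] counts true i predicted as j."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Invented evaluation results: one "greet" example is misread as "inform".
true_intents = ["greet", "greet", "inform", "thankyou", "inform"]
predicted_intents = ["greet", "inform", "inform", "thankyou", "inform"]

labels, matrix = confusion_matrix(true_intents, predicted_intents)
```

Off-diagonal cells show which intents get confused with each other, which usually points at intents whose training examples overlap.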

Steps for Preparing Rasa Core Training Data
Step1: Design dialogue flow
  User: Good morning
  Bot: how can I help you?
  User: I want a british restaurant in the east part of town
  Bot: what kind of cuisine would you like?
  User: afghan food
  Bot: for how many people?
  …
Step2: Design dialogue flow in terms of intents and entities
  ## story_1
  * greet
    - utter_ask_howcanhelp
  * inform{"location": "london"}
    - utter_ask_cuisine
  * inform{"cuisine": "spanish"}
    - utter_ask_numpeople
  …
Step3: Compose data/stories.md and domain.yml
  templates:
    utter_ask_cuisine:
    - text: "what kind of cuisine would you like?"
    utter_ask_howcanhelp:
    - text: "how can I help you?"
    utter_ask_numpeople:
    - text: "for how many people?"
    …
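The story format can be read as an alternating sequence of user turns (lines starting with `*`) and bot actions (lines starting with `-`). The parser below is a minimal sketch of that reading, written for this illustration; `parse_story` is hypothetical and much simpler than Rasa's own story reader.

```python
import json
import re

# "* intent" optionally followed by a JSON object of entities.
USER_TURN = re.compile(r"^\*\s*(?P<intent>\w+)\s*(?P<entities>\{.*\})?\s*$")


def parse_story(lines):
    """Turn story lines into a list of (kind, name, entities) events."""
    events = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("*"):
            match = USER_TURN.match(line)
            if match is None:
                continue  # skip lines this sketch does not understand
            raw_entities = match.group("entities")
            entities = json.loads(raw_entities) if raw_entities else {}
            events.append(("user", match.group("intent"), entities))
        elif line.startswith("-"):
            events.append(("action", line[1:].strip(), {}))
    return events


story = [
    "## story_1",
    "* greet",
    "  - utter_ask_howcanhelp",
    '* inform{"location": "london"}',
    "  - utter_ask_cuisine",
    '* inform{"cuisine": "spanish"}',
    "  - utter_ask_numpeople",
]
events = parse_story(story)
```

Each action name such as `utter_ask_cuisine` has to match a response template in domain.yml, which is why the two files are composed together.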

Steps for Rasa Core Policy
Step1: Design dialogue flow
Step2: Choose built-in or custom policies
Step3: Compose config.yml
  policies:
  - name: "FallbackPolicy"
  - name: "MappingPolicy"
  - name: "KerasPolicy"
  - name: "MemoizationPolicy"
Priority on rule matching
• Priority 1: FormPolicy
• Priority 2: FallbackPolicy, TwoStageFallbackPolicy
• Priority 3: MemoizationPolicy, AugmentedMemoizationPolicy
• Priority 4: MappingPolicy
• Priority 5: EmbeddingPolicy, KerasPolicy, SklearnPolicy
https://rasa.com/docs/rasa/core/policies/
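How several policies combine into one decision can be sketched as follows. Two assumptions are made here: that the prediction with the highest confidence wins and priority only breaks ties, and that "Priority 1" on the slide takes precedence over "Priority 5". The `pick_action` helper and the confidence values are invented for illustration.

```python
def pick_action(predictions):
    """Pick the winning prediction from several policies.

    Each prediction is a dict with 'policy', 'rank', 'action', 'confidence'.
    The highest confidence wins; on a tie, the smaller rank number wins
    (assumption: rank follows the slide, where Priority 1 comes first).
    """
    return min(predictions, key=lambda p: (-p["confidence"], p["rank"]))


# Invented predictions for a single dialogue turn.
predictions = [
    {"policy": "KerasPolicy", "rank": 5,
     "action": "utter_ask_cuisine", "confidence": 0.8},
    {"policy": "MemoizationPolicy", "rank": 3,
     "action": "utter_ask_numpeople", "confidence": 1.0},
    {"policy": "MappingPolicy", "rank": 4,
     "action": "utter_ask_howcanhelp", "confidence": 1.0},
]
winner = pick_action(predictions)

# With no exact match available, the learned policy beats a weak fallback.
learned_wins = pick_action([
    {"policy": "KerasPolicy", "rank": 5,
     "action": "utter_ask_cuisine", "confidence": 0.8},
    {"policy": "FallbackPolicy", "rank": 2,
     "action": "action_default_fallback", "confidence": 0.3},
])
```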

Demo Rasa NLU + Rasa Core with built-in pipeline

Rasa NLU + Rasa Core Demo Commands
  # Model training (NLU + Core)
  $> rasa train
  # Model inference via shell
  $> rasa shell

Simple tutorial
Build Rasa custom components based on ckiptagger
• Motivation and mechanism
• Introduction to ckiptagger
• Components Implementation
Demo

Motivation
Challenge: poor performance on Mandarin
Reasons
• Word segmentation is a hard problem because Mandarin has no whitespace delimiter between words
• Token-based feature extraction for Mandarin needs language-specific techniques
Reference: 寫個能幹的中文斷詞系統 (Writing a capable Chinese word segmentation system) @ PyCon Taiwan 2019
https://tw.pycon.org/2019/en-us/events/talk/852751430614778081/
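A two-line experiment shows why a whitespace-based tokenizer is a poor fit for Mandarin. The function below is a stand-in for whitespace tokenization in general, not Rasa's WhitespaceTokenizer, and the sentences are made up.

```python
def whitespace_tokenize(text):
    """Split on whitespace, as a whitespace-based tokenizer would."""
    return text.split()


# English words are separated by spaces, so this yields five tokens.
english_tokens = whitespace_tokenize("find me a cheap restaurant")

# Mandarin has no spaces between words, so the whole sentence is one token.
mandarin_tokens = whitespace_tokenize("我想找便宜的餐廳")
```

With the entire sentence as a single token, downstream featurizers see almost every sentence as an unseen word, which is the performance problem the custom components are meant to address.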

Proposed Solution
Modify supervised_embeddings with custom components
  language: "zh"
  pipeline:
  - name: "CKIPTokenizer"
  - name: "CKIPFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    analyzer: 'word'
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
• Create pipeline for zh
• Implement tokenizer for Mandarin
• Implement featurizer for Mandarin tokens
Goal: bring Mandarin capability to Rasa-based chatbots

Mechanism
Step1: Create component skeleton
  from … import Component
  class Tokenizer(Component):
      """ Build our own custom component """
      # define attributes …
      # implement methods …
Step2: Define attributes
• name
• provides
• requires
• defaults
• language_list
Step3: Implement required methods
• __init__
• train
• process
• persist
• load
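The three steps can be sketched end to end as below. To keep the example runnable without Rasa installed, `Component` and `Message` are local stand-ins rather than Rasa's classes, `CharTokenizer` is a toy that emits one token per character instead of calling ckiptagger, and the method signatures are simplified assumptions that vary between Rasa versions.

```python
class Component:
    """Stand-in for Rasa's Component base class (assumption, not the real one)."""
    name = None
    provides = []
    requires = []
    defaults = {}
    language_list = None

    def __init__(self, component_config=None):
        # Merge user-supplied settings over the class defaults.
        self.component_config = {**self.defaults, **(component_config or {})}


class Message(dict):
    """Stand-in for Rasa's Message: a dict of text and derived properties."""


class CharTokenizer(Component):
    """Toy tokenizer: one token per non-space character."""

    # Step 2: attributes
    name = "CharTokenizer"
    provides = ["tokens"]
    requires = []
    defaults = {"keep_punctuation": True}
    language_list = ["zh"]

    # Step 3: methods
    def train(self, training_data, config=None, **kwargs):
        # Nothing to learn; just tokenize every training example.
        for example in training_data:
            self.process(example)

    def process(self, message, **kwargs):
        message["tokens"] = self._tokenize(message["text"])

    def persist(self, file_name, model_dir):
        # No learned state, so there is nothing to save.
        return None

    @classmethod
    def load(cls, meta=None, model_dir=None, **kwargs):
        return cls(meta)

    def _tokenize(self, text):
        tokens = []
        for offset, char in enumerate(text):
            if char.isspace():
                continue
            if not self.component_config["keep_punctuation"] and not char.isalnum():
                continue
            # Each token keeps its character offset into the original text.
            tokens.append({"text": char, "offset": offset})
        return tokens


message = Message(text="我想訂位")
CharTokenizer().process(message)
```

A real Mandarin tokenizer would replace `_tokenize` with a call to a word segmenter, while keeping the same contract: tokens carry their text and their offset into the original string.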

ckiptagger
• Deep learning based tool for
  • Word segmentation
  • POS tagging
  • Named entity recognition
• Pure Python package with simple dependencies
  • Tensorflow >= 1.13
• GPL-3.0 license
https://github.com/ckiplab/ckiptagger

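Features
• Word Segmentation and POS Tagging
• Named Entity Recognition
Text: 傅達仁今將執行安樂死，卻突然爆出自己20年前遭緯來體育台封殺，他不懂自己哪裡得罪到電視台。美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會，預料她將會很順利通過參議院支持，成為該國有史以來第一位的華裔女性內閣成員。
(Translation: Fu Da-ren is to undergo euthanasia today, yet it has suddenly emerged that he was blacklisted by Videoland Sports 20 years ago; he does not understand how he offended the station. The US Senate is holding a confirmation hearing for Elaine Chao, nominated today by President Bush as Secretary of Labor; she is expected to win the Senate's support easily and become the first Chinese-American woman cabinet member in the country's history.)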

Implement Rasa NLU components that embed ckiptagger
https://github.com/circlelychen/rukip

Demo
Rasa NLU + Rasa Core + rukip + Google assistant
• Intent recognition
  • CKIPTokenizer (customized)
  • EmbeddingIntentClassifier (built-in)
• Named Entity Recognition
  • CKIPTokenizer (customized)
  • CKIPFeaturizer (customized)
