Build a Mandarin AI Conversational Agent with Rasa
陳皓遠(@circlelychen)
2019/11/23 @ Taichung.py
Community Experiences
• Speaker @ PyCon Taiwan 2019: Building a Chinese Natural Language Understanding Engine for Financial Scenarios (打造面向金融場景的中文自然語言理解引擎)
• Slides: https://speakerdeck.com/circlelychen/da-zao-mian-xiang-jin-rong-chang-jing-de-zhong-wen-zi-ran-yu-yan-li-jie-yin-qing
• Video: https://www.youtube.com/watch?v=o7DMrWVMZCA
Outline
Introduction to Conversational Agents
Rasa framework
• Introduction
• Simple Tutorial
Build Rasa custom components based on ckiptagger
• Motivation and mechanism
• Introduction to ckiptagger
• Components Implementation
Demo
Levels of Conversational Assistants
Level 1: Notifications
Level 2: FAQs
Level 3: Contextual Assistants
Level 4: Personal Assistants
Level 5: Autonomous Organization of Assistants
(Figure: the five levels plotted along axes of time and intelligence)
Notifications
Hao Yuan Chen (circlelychen) opened !417 update nlu corpus in drd-ctbc/nlp
Hao Yuan Chen pushed to branch master of drd-ctbc/nlp (Compare changes)
• Push notifications
• Users passively receive notifications
• Nothing happens when users reply
FAQs
User: I need to renew my renter's insurance. How much will it be?
Bot: You can calculate your renewal price on our website: https://xxx.bbb/site
• Users get a response by asking a simple question
• Simulates basic FAQ pages with a search tool
• Most common type of assistant right now
Contextual Assistants
User: I need to renew my renter's insurance. How much will it be?
Bot: I'd be happy to check for you. Firstly, are you still living in the same apartment?
User: Yes
Bot: Great, so just confirming it's 980 sq ft?
User: Yes
Bot: Thanks! Your renewal rate from Sept. 1st onwards would be $10/month.
• Allow users to chat freely, for as long as they need
• Be capable of understanding and responding with multiple follow-up questions
• Context
• What the user has said before is expected to be remembered
Outline
Introduction to Conversational Agents
Rasa framework
• Introduction
• Simple tutorial
Build Rasa custom components based on ckiptagger
• Motivation and mechanism for customizing the Rasa NLU pipeline
• Introduction to ckiptagger
• Components Implementation
Demo
Performance Benchmark on Conversational Agents
Benchmarking Natural Language Understanding Services for Building Conversational Agents
Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser
Accepted at IWSDS 2019
Strategy Survey on Conversational Agents
Frameworks compared: Rasa, Dialogflow, LUIS, Watson
Criteria: On premise, Free, Open source, Extendible, Privacy
Rasa Modules
Rasa: open source machine learning tools for developers to build conversational AI
NLU: Natural language processing
• Intent classification
• Entity extraction
Integration: Knowledge base interaction
• Knowledge base interaction
• Language generation
Channel: User interface
• Message
• Voice
Core: Contextual dialogue management
• Decision making
• Context tracking
Rasa Architecture
(Figure: architecture diagram connecting Rasa NLU, Rasa Core, Integration, and Channel)
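To make the division of labour between these four modules concrete, here is a toy, pure-Python model of one conversation turn: the channel hands the text to NLU, Core updates the context and picks the next action, and an integration step renders the reply. It is a sketch under loud assumptions: keyword rules stand in for trained models, and `Tracker`, `nlu_parse`, `core_predict`, `run_action`, `handle_message` and `action_search_restaurants` are hypothetical names, not Rasa's API.

```python
# Toy model of one turn: Channel -> NLU -> Core -> Integration -> Channel.
# All names are hypothetical; keyword rules stand in for trained models.
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Conversation context: slots remembered across turns."""
    slots: dict = field(default_factory=dict)


def nlu_parse(text: str) -> dict:
    """NLU stand-in: intent classification and entity extraction by keywords."""
    lowered = text.lower()
    entities = {}
    for cuisine in ("spanish", "italian", "british"):
        if cuisine in lowered:
            entities["cuisine"] = cuisine
    intent = "greet" if lowered in ("hello there", "good morning", "hey bot") else "inform"
    return {"intent": intent, "entities": entities}


def core_predict(parsed: dict, tracker: Tracker) -> str:
    """Core stand-in: update the context, then choose the next action."""
    tracker.slots.update(parsed["entities"])
    if parsed["intent"] == "greet":
        return "utter_ask_howcanhelp"
    if "cuisine" not in tracker.slots:
        return "utter_ask_cuisine"
    return "action_search_restaurants"


def run_action(action: str, tracker: Tracker) -> str:
    """Integration stand-in: render a template or query a (fake) knowledge base."""
    if action == "action_search_restaurants":
        return f"here's what I found for {tracker.slots['cuisine']} food"
    templates = {
        "utter_ask_howcanhelp": "how can I help you?",
        "utter_ask_cuisine": "what kind of cuisine would you like?",
    }
    return templates[action]


def handle_message(text: str, tracker: Tracker) -> str:
    """Channel stand-in: one user message in, one bot reply out."""
    return run_action(core_predict(nlu_parse(text), tracker), tracker)


if __name__ == "__main__":
    tracker = Tracker()
    for user_text in ("good morning", "I want a restaurant", "spanish please"):
        print(user_text, "->", handle_message(user_text, tracker))
```

Because the slot filled in one turn stays in the tracker, the third message gets a search result instead of another question, which is the contextual behaviour described earlier.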
Rasa Dialogue Flow
User: I am sick. I need a GP in 94310.
(context)
Bot: Should we look for an earlier appointment?
Project Structure
QuickStart
$> pip3 install rasa[tensorflow]
$> rasa init
• config.yml
• Pipeline for NLU
• Policies for Core
• nlu.md
• Training data for NLU
• stories.md
• Training data for Core
• domain.yml
• Chatbot's domain
• Actions
Steps for Preparing NLU Training Data
Step 1: Collect dialogue samples
• Leverage the knowledge of domain experts
• Check the most common search queries and questions
Step 2: Define labels
• Define intents by observing the dialogue set
• Define entities by checking search queries
Step 3: Compose data/nlu.md and data/domain.yml
• Annotate samples with intents and entities in data/nlu.md
• Dump intents and entities into data/domain.yml
Step 1: Collect dialogue samples
User: Good morning
Bot: how can I help you?
User: I want a british restaurant in the east part of town
Bot: here's what I found: xxx, yyy
User: Thank you
Bot: Bye-bye
User: hello there
Bot: how can I help you?
User: can you book a table in london in a expensive price range with spanish cuisine for two
Bot: ok let me see what I can find
User: hey bot
Bot: how can I help you?
User: find me a cheap vietnamese restaurant
Bot: where?
User: west part of town
Bot: for how many people?
…
Step 2: Define labels
im looking for an expensive
restaurant in the east town
want something in the
south side of town thats
moderately priced
good morning
hello there
hey bot
good evening
good afternoon
hey
okay thank you
thank you goodbye
thanks goodbye
thank you good bye
thank you goodbye
you rock
Labels:
• Intent: inform (restaurant requests)
• Intent: greet (greetings)
• Intent: thankyou (thanks and goodbyes)
• Entities: price, location
Step 3: Compose data/nlu.md and data/domain.yml
data/nlu.md
## intent:greet
- good morning
- hello there
…
## intent:thankyou
- okay thank you
- thank you bye
…
## intent:inform
- im looking for an [expensive](price) restaurant in the [east](location) town
- want something in the [south](location) side of town that’s [moderately](price:moderate) priced
- what about [italian](cuisine)
…
data/domain.yml
…
entities:
- location
- price
- cuisine
intents:
- greet
- thankyou
- inform
…
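The annotation syntax in data/nlu.md, `[text](entity)` with an optional `:synonym`, describes entities as character spans of the plain sentence. A minimal sketch of that mapping, which also collects the intent and entity names that data/domain.yml has to declare. It assumes only `## intent:` sections with `- ` example lines matter (other section types such as synonyms, regex or lookup tables are skipped), and `parse_example`, `parse_nlu_md` and `domain_labels` are hypothetical helpers, not Rasa's training-data reader.

```python
import re
from typing import Dict, List

# [surface text](entity) or [surface text](entity:synonym value)
ENTITY = re.compile(r"\[(?P<text>[^\]]+)\]\((?P<entity>[^):]+)(?::(?P<value>[^)]+))?\)")


def parse_example(line: str) -> Dict:
    """Strip the annotations and record each entity's character offsets."""
    text, entities, cursor = "", [], 0
    for match in ENTITY.finditer(line):
        text += line[cursor:match.start()]
        start = len(text)
        text += match.group("text")
        entities.append({
            "start": start,
            "end": len(text),
            "entity": match.group("entity"),
            # a synonym after ":" becomes the value; otherwise the surface text
            "value": match.group("value") or match.group("text"),
        })
        cursor = match.end()
    text += line[cursor:]
    return {"text": text, "entities": entities}


def parse_nlu_md(markdown: str) -> List[Dict]:
    """Read "## intent:<name>" sections and their "- example" lines."""
    examples, intent = [], None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # any non-intent section (synonym, regex, ...) switches collection off
            intent = line[len("## intent:"):].strip() if line.startswith("## intent:") else None
        elif line.startswith("- ") and intent is not None:
            example = parse_example(line[2:])
            example["intent"] = intent
            examples.append(example)
    return examples


def domain_labels(examples: List[Dict]) -> Dict[str, List[str]]:
    """Intent and entity names that data/domain.yml needs to declare."""
    return {
        "intents": sorted({ex["intent"] for ex in examples}),
        "entities": sorted({e["entity"] for ex in examples for e in ex["entities"]}),
    }


if __name__ == "__main__":
    print(parse_example("want something in the [south](location) side of town "
                        "that's [moderately](price:moderate) priced"))
```

Deriving the domain lists from the annotated data, rather than typing them twice, is one way to keep the entity and intent names in both files from drifting apart.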
Pros and Cons of the Two Built-in Pipelines
supervised_embeddings
Pros
• The model picks up domain-specific vocabulary
• Supports any language that can be tokenized
Cons
• Plenty of training data required
• Longer training time
pretrained_embeddings_spacy
Pros
• Better model performance with less training data
• Faster training time
Cons
• Depends on pre-trained word embeddings being available for the language
• Does not pick up domain-specific vocabulary
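The trade-off above follows from where the features come from. supervised_embeddings builds them from the training data itself, roughly as bag-of-words counts over the vocabulary seen in training (the job of CountVectorsFeaturizer), so it learns domain words in any tokenizable language but needs enough examples to cover them. A minimal pure-Python illustration, not Rasa's implementation; `build_vocab` and `count_vector` are hypothetical names, and the Mandarin tokens are segmented by hand.

```python
from collections import Counter
from typing import Dict, List


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign a column index to every token seen in the training data."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def count_vector(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Bag-of-words counts; tokens never seen in training are simply dropped."""
    vector = [0] * len(vocab)
    for tok, n in Counter(tokens).items():
        if tok in vocab:
            vector[vocab[tok]] = n
    return vector


if __name__ == "__main__":
    # Hand-segmented examples: "I want to book a restaurant", "thanks".
    vocab = build_vocab([["我", "想", "訂", "餐廳"], ["謝謝"]])
    print(count_vector(["我", "想", "訂", "餐廳"], vocab))
    print(count_vector(["飯店"], vocab))  # unseen word ("hotel"): all zeros
```

The all-zero vector for an unseen word is why this pipeline needs plenty of data, whereas pre-trained embeddings carry knowledge learned elsewhere.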
Decision Making on Pipeline Selection
How NLU Pipelines Work
(Figures: pipeline flow, component steps, and the Rasa NLU training lifecycle)
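The figures describe a pipeline as an ordered list of components that are trained one after another and then run in the same order on each incoming message. A toy sketch of that lifecycle, loosely modelled on the train/process split of Rasa NLU components; the classes are hypothetical and far simpler than Rasa's (no persistence, configuration, or context passing).

```python
from collections import Counter, defaultdict
from typing import Dict, List


class Component:
    """Hypothetical base class: train() on all examples, process() one message."""

    def train(self, examples: List[Dict]) -> None:
        # Default: annotate the training data so later components can use it.
        for example in examples:
            self.process(example)

    def process(self, message: Dict) -> None:
        raise NotImplementedError


class WhitespaceTokenizer(Component):
    def process(self, message: Dict) -> None:
        message["tokens"] = message["text"].lower().split()


class KeywordIntentClassifier(Component):
    """Scores intents by how often each token co-occurred with them in training."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, examples: List[Dict]) -> None:
        for example in examples:  # relies on "tokens" set by the tokenizer
            for token in example["tokens"]:
                self.counts[token][example["intent"]] += 1

    def process(self, message: Dict) -> None:
        scores: Counter = Counter()
        for token in message["tokens"]:
            scores.update(self.counts.get(token, Counter()))
        message["intent"] = scores.most_common(1)[0][0] if scores else None


class Pipeline:
    def __init__(self, components: List[Component]) -> None:
        self.components = components

    def train(self, examples: List[Dict]) -> None:
        # Order matters: each component sees what earlier ones added.
        for component in self.components:
            component.train(examples)

    def parse(self, text: str) -> Dict:
        message = {"text": text}
        for component in self.components:
            component.process(message)
        return message


if __name__ == "__main__":
    data = [{"text": "good morning", "intent": "greet"},
            {"text": "thank you goodbye", "intent": "thankyou"}]
    pipeline = Pipeline([WhitespaceTokenizer(), KeywordIntentClassifier()])
    pipeline.train(data)
    print(pipeline.parse("Good evening"))
```

Swapping the tokenizer while leaving the classifier untouched is the same idea the custom Mandarin components later rely on.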
Rasa NLU Demo Commands
# Train the NLU model
$> rasa train nlu
# Run NLU inference interactively via the shell
$> rasa shell nlu
# Evaluate the NLU model (the report includes a confusion matrix)
$> rasa test nlu
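The evaluation command above summarises intent classification with a confusion matrix: rows are the true intents, columns the predicted ones, and the diagonal counts correct predictions. A minimal from-scratch sketch of what the matrix counts (not how Rasa builds its report); the labels are made up.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def confusion_matrix(true_labels: Sequence[str],
                     predicted_labels: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Rows are true intents, columns are predicted intents."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    labels = sorted(set(true_labels) | set(predicted_labels))
    pair_counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[pair_counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Share of predictions on the diagonal (true == predicted)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Made-up results for four test utterances.
    true = ["greet", "greet", "inform", "thankyou"]
    pred = ["greet", "inform", "inform", "thankyou"]
    labels, matrix = confusion_matrix(true, pred)
    for label, row in zip(labels, matrix):
        print(label, row)
    print(accuracy(matrix))
```

Off-diagonal cells point at the intent pairs that need more or better-separated training examples.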
Steps for Preparing Rasa Core Training Data
Step 1: Design the dialogue flow
Step 2: Design the dialogue flow in terms of intents and entities
Step 3: Compose data/stories.md and domain.yml
User: Good morning
Bot: how can I help you?
User: I want a british restaurant in the east part of town
Bot: what kind of cuisine would you like?
User: afghan food
Bot: for how many people?
…
## story_1
* greet
  - utter_ask_howcanhelp
* inform{"location": "london"}
  - utter_ask_cuisine
* inform{"cuisine": "spanish"}
  - utter_ask_numpeople
…
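A story is a sequence of user turns (`* intent{entities}`) and bot actions (`- action`), and Core learns which action should follow a given history. A minimal sketch that reads the story format shown above and turns each bot action into a (history, next action) pair. It is a simplification under stated assumptions: it does not handle checkpoints or slot events, the history keeps only intent and action names, and `parse_stories` and `next_action_examples` are hypothetical helpers, not Rasa's story reader or featurizer.

```python
import json
import re
from typing import Dict, List, Tuple

# "* intent" or "* intent{...json...}" (the space after "*" is optional here)
USER_TURN = re.compile(r"^\*\s*(?P<intent>\w+)(?P<entities>\{.*\})?\s*$")

Turn = Tuple[str, str, Dict]  # (speaker, intent or action name, entities)


def parse_stories(markdown: str) -> Dict[str, List[Turn]]:
    stories: Dict[str, List[Turn]] = {}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = stories.setdefault(line[3:].strip(), [])
        elif current is None:
            continue  # text before the first story header
        elif line.startswith("*"):
            match = USER_TURN.match(line)
            if match is None:
                raise ValueError(f"cannot parse user turn: {line!r}")
            entities = json.loads(match.group("entities") or "{}")
            current.append(("user", match.group("intent"), entities))
        elif line.startswith("-"):
            current.append(("bot", line[1:].strip(), {}))
    return stories


def next_action_examples(turns: List[Turn]) -> List[Tuple[List[str], str]]:
    """Pair every bot action with the names of all turns that came before it."""
    examples = []
    for i, (speaker, name, _) in enumerate(turns):
        if speaker == "bot":
            history = [turn_name for _, turn_name, _ in turns[:i]]
            examples.append((history, name))
    return examples


if __name__ == "__main__":
    story = ('## story_1\n* greet\n  - utter_ask_howcanhelp\n'
             '* inform{"location": "london"}\n  - utter_ask_cuisine\n')
    for history, action in next_action_examples(parse_stories(story)["story_1"]):
        print(history, "->", action)
```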
templates:
  utter_ask_cuisine:
  - text: "what kind of cuisine would you like?"
  utter_ask_howcanhelp:
  - text: "how can I help you?"
  utter_ask_numpeople:
  - text: "for how many people?"
  …
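At run time an `utter_` action looks up its template in the domain. A minimal sketch of that lookup, assuming the templates are already loaded into a Python dict: it picks one variant at random when a template lists several, and fills `{slot}` placeholders from slot values, which Rasa templates also support. `TEMPLATES`, `render` and the `utter_confirm` template are hypothetical.

```python
import random
from typing import Dict, List, Optional

TEMPLATES: Dict[str, List[Dict[str, str]]] = {
    "utter_ask_cuisine": [{"text": "what kind of cuisine would you like?"}],
    "utter_ask_howcanhelp": [{"text": "how can I help you?"}],
    "utter_ask_numpeople": [{"text": "for how many people?"}],
    # Hypothetical template with slot placeholders and two variants.
    "utter_confirm": [
        {"text": "a {cuisine} restaurant for {people}, right?"},
        {"text": "so, {cuisine} food for {people}?"},
    ],
}


def render(action: str, slots: Dict[str, str],
           rng: Optional[random.Random] = None) -> str:
    """Pick one variant of the action's template and fill in slot values."""
    rng = rng or random.Random()
    variant = rng.choice(TEMPLATES[action])
    # str.format raises KeyError if a placeholder has no slot value.
    return variant["text"].format(**slots)


if __name__ == "__main__":
    print(render("utter_ask_cuisine", {}))
    print(render("utter_confirm", {"cuisine": "spanish", "people": "two"}))
```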
Rasa Demo Commands (NLU + Core)
# Train the NLU and Core models
$> rasa train
# Talk to the assistant via the shell
$> rasa shell
Outline
Introduction to Conversational Agents
Rasa framework
• Introduction
• Simple tutorial
Build Rasa custom components based on ckiptagger
• Motivation and mechanism
• Introduction to ckiptagger
• Components Implementation
Demo
Motivation
Challenge: poor performance on Mandarin
Reasons
• Word segmentation is a hard problem: Mandarin text has no whitespace delimiters between words
• Token-based feature extraction for Mandarin requires specialized skills
Writing a Capable Chinese Word Segmentation System (寫個能幹的中文斷詞系統) @ PyCon Taiwan 2019
https://tw.pycon.org/2019/en-us/events/talk/852751430614778081/
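The first reason is easy to see in code: splitting on whitespace, a crude but workable tokenizer for English, leaves a Mandarin sentence as a single token, and falling back to single characters breaks words apart. A minimal illustration; the sample sentences are made up.

```python
def whitespace_tokens(text: str) -> list:
    """Split on whitespace, as a simple tokenizer would."""
    return text.split()


def char_tokens(text: str) -> list:
    """Treat every non-space character as a token."""
    return [ch for ch in text if not ch.isspace()]


english = "I want a british restaurant"
mandarin = "我想訂一間英式餐廳"  # "I want to book a British restaurant"

print(whitespace_tokens(english))   # five words
print(whitespace_tokens(mandarin))  # the whole sentence as a single token
print(char_tokens(mandarin))        # nine characters; 餐廳 ("restaurant") is split in two
```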
Proposed Solution
Modify supervised_embeddings with custom components
language: "zh"
pipeline:
- name: "CKIPTokenizer"
- name: "CKIPFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
  analyzer: "word"
  token_pattern: '(?u)\b\w+\b'
- name: "EmbeddingIntentClassifier"
• Create a pipeline for zh
• Implement a tokenizer for Mandarin
• Implement a featurizer for Mandarin tokens
Empower Rasa-based chatbots with Mandarin capability
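One plausible reason for overriding `token_pattern` in the config above, assuming CountVectorsFeaturizer sees the tokenizer's words joined by spaces and otherwise keeps scikit-learn's default pattern `(?u)\b\w\w+\b`: that default only keeps tokens of two or more characters, which silently drops single-character Mandarin words, while `(?u)\b\w+\b` keeps them. The segmentation below is written by hand, not produced by ckiptagger.

```python
import re

DEFAULT_PATTERN = r"(?u)\b\w\w+\b"  # scikit-learn CountVectorizer default
CONFIG_PATTERN = r"(?u)\b\w+\b"     # token_pattern from the pipeline config

# Hand-segmented "I want to book a British restaurant", words joined by spaces.
segmented = "我 想 訂 一間 英式 餐廳"

default_tokens = re.findall(DEFAULT_PATTERN, segmented)
config_tokens = re.findall(CONFIG_PATTERN, segmented)

print(default_tokens)  # single-character words 我, 想, 訂 are lost
print(config_tokens)   # all six words survive
```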
ckiptagger
• Deep learning based tool for
• Word segmentation
• POS tagging
• Named entity recognition
• Pure Python package with simple dependencies
• TensorFlow >= 1.13
• GPL-3.0 license
https://github.com/ckiplab/ckiptagger
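Features
• Named Entity Recognition
• Word Segmentation and POS Tagging
Sample text:
傅達仁今將執行安樂死,卻突然爆出自己20年前遭緯來體育台封殺,他不懂自己哪裡得罪到電視台。
(Fu Da-ren is to undergo euthanasia today, but it has suddenly come out that he was blacklisted by Videoland Sports 20 years ago; he does not understand how he offended the TV station.)
美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會,預料她將會很順利通過參議院支持,成為該國有史以來第一位的華裔女性內閣成員。
(The US Senate today opened confirmation hearings for Elaine Chao, President Bush's nominee for Secretary of Labor; she is expected to win Senate approval smoothly and become the first Chinese-American woman cabinet member in the country's history.)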
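ckiptagger's README shows its NER output as a set of (start, end, entity type, entity text) tuples with character offsets. If one wanted to surface those directly as Rasa-style entity dicts, a conversion could look like the sketch below. This is an assumption for illustration, not necessarily what rukip does (the pipeline in this talk extracts entities with CRFEntityExtractor), the tuples are typed in by hand in that shape rather than produced by a model run, and `ckip_to_rasa_entities` and the `extractor` name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

NerTuple = Tuple[int, int, str, str]  # (start, end, entity_type, entity_text)


def ckip_to_rasa_entities(text: str, ner: Iterable[NerTuple],
                          extractor: str = "CKIPNER") -> List[Dict]:
    """Convert (start, end, type, text) tuples into Rasa-style entity dicts."""
    entities = []
    for start, end, entity_type, value in sorted(ner):  # sort by start offset
        if text[start:end] != value:
            raise ValueError(f"offsets {start}:{end} do not match {value!r}")
        entities.append({
            "start": start,
            "end": end,
            "value": value,
            "entity": entity_type,
            "extractor": extractor,  # hypothetical extractor name
        })
    return entities


SENTENCE = "傅達仁今將執行安樂死,卻突然爆出自己20年前遭緯來體育台封殺,他不懂自己哪裡得罪到電視台。"
# Hand-written tuples in the shape described above (not a model run).
NER_TUPLES = {(0, 3, "PERSON", "傅達仁"), (18, 22, "DATE", "20年前"), (23, 28, "ORG", "緯來體育台")}

for entity in ckip_to_rasa_entities(SENTENCE, NER_TUPLES):
    print(entity["entity"], entity["value"], entity["start"], entity["end"])
```

Checking each span against the text guards against offset mismatches, for example if the text was normalised between tagging and conversion.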
Implementing Rasa NLU components that embed ckiptagger
https://github.com/circlelychen/rukip
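The repository above holds the actual components. As a rough sketch of the two jobs they have, under assumptions: a tokenizer must map the segmenter's word list back to character offsets (Rasa 1.x tokens carry their text plus an offset), and a featurizer can then attach per-token features, such as the POS tag of the token and its neighbours, for CRFEntityExtractor. The segmenter is injected as a plain function so the sketch runs without ckiptagger; with ckiptagger it might be `lambda t: ws([t])[0]` for a `WS` instance. `Token`, `tokenize`, `crf_features` and the POS tags in the example are hypothetical, not rukip's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Token:
    text: str
    offset: int  # character offset of the word in the original message

    @property
    def end(self) -> int:
        return self.offset + len(self.text)


def tokenize(text: str, segment: Callable[[str], List[str]]) -> List[Token]:
    """Align the segmenter's words with character offsets in `text`."""
    tokens, cursor = [], 0
    for word in segment(text):
        start = text.find(word, cursor)
        if start < 0:
            raise ValueError(f"segmenter returned {word!r}, not found in text")
        tokens.append(Token(word, start))
        cursor = start + len(word)
    return tokens


def crf_features(tokens: List[Token], pos_tags: List[str]) -> List[Dict]:
    """Per-token features: the word, its POS tag, and neighbouring POS tags."""
    if len(tokens) != len(pos_tags):
        raise ValueError("need exactly one POS tag per token")
    features = []
    for i, token in enumerate(tokens):
        features.append({
            "word": token.text,
            "length": len(token.text),
            "pos": pos_tags[i],
            "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
            "next_pos": pos_tags[i + 1] if i + 1 < len(tokens) else "EOS",
        })
    return features


if __name__ == "__main__":
    # Stand-in segmenter returning a hand-written segmentation.
    fake_segmenter = lambda text: ["我", "想", "訂", "一間", "英式", "餐廳"]
    toks = tokenize("我想訂一間英式餐廳", fake_segmenter)
    print([(t.text, t.offset, t.end) for t in toks])
    # Illustrative placeholder tags; ckiptagger itself uses the CKIP tag set.
    print(crf_features(toks, ["PRON", "VERB", "VERB", "DET", "ADJ", "NOUN"])[3])
```

Keeping offsets exact matters because entity annotations in nlu.md are character spans; a token whose offset is wrong cannot be matched to its label during CRF training.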
Demo
• Intent recognition
• CKIPTokenizer (customized)
• EmbeddingIntentClassifier (built-in)
• Named Entity Recognition
• CKIPTokenizer (customized)
• CKIPFeaturizer (customized)
Rasa NLU + Rasa Core + rukip + Google Assistant