Slide 1

Slide 1 text

Build a Mandarin AI Conversational Agent with Rasa
陳皓遠 (Hao Yuan Chen, @circlelychen)
2019/11/23 @ Taichung.py

Slide 2

Slide 2 text

Community Experiences
• Speaker @ PyCon Taiwan 2019: 打造面向金融場景的中文自然語言理解引擎 (Building a Chinese Natural Language Understanding Engine for Financial Scenarios)
  • Slides: https://speakerdeck.com/circlelychen/da-zao-mian-xiang-jin-rong-chang-jing-de-zhong-wen-zi-ran-yu-yan-li-jie-yin-qing
  • Video: https://www.youtube.com/watch?v=o7DMrWVMZCA

Slide 3

Slide 3 text

Outline
• Introduction to Conversational Agents
• Rasa framework
  • Introduction
  • Simple tutorial
• Build Rasa custom components based on ckiptagger
  • Motivation and mechanism
  • Introduction to ckiptagger
  • Components implementation
• Demo

Slide 4

Slide 4 text

Levels of Conversational Assistants
• Level 1: Notifications
• Level 2: FAQs
• Level 3: Contextual Assistants
• Level 4: Personal Assistants
• Level 5: Autonomous Organization of Assistants
(Figure: the levels plotted by intelligence over time)

Slide 5

Slide 5 text

Notifications
Example: "Hao Yuan Chen (circlelychen) opened !417 update nlu corpus in drd-ctbc/nlp" / "Hao Yuan Chen pushed to branch master of drd-ctbc/nlp (Compare changes)"
• Push notifications
• Users passively receive notifications
• Nothing happens when users reply
FAQs
Example: User: "I need to renew my renter's insurance. How much will it be?" Bot: "You can calculate your renewal price on our website: https://xxx.bbb/site"
• Users get a response by asking a simple question
• Simulate basic FAQ pages with a search tool
• Most common type of assistant right now

Slide 6

Slide 6 text

Contextual Assistants
Example:
User: "I need to renew my renter's insurance. How much will it be?"
Bot: "I'd be happy to check for you. Firstly, are you still living in the same apartment?"
User: "Yes"
Bot: "Great, so just confirming it's 980 sq ft?"
User: "Yes"
Bot: "Thanks! Your renewal rate from Sept. 1st onwards would be $10/month."
• Let users chat freely, the way they naturally would
• Understand and respond across multiple follow-up questions
• Context: what the user has said before is treated as known
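To make "context" concrete, here is a minimal, hypothetical sketch in Python (not Rasa's API) of a tracker that remembers which question the assistant just asked, so a bare "Yes" from the user can be resolved against it. The slot names and the yes/no handling are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tracker:
    """Toy conversation tracker: remembers slots and the last question asked."""

    slots: Dict[str, bool] = field(default_factory=dict)
    pending_slot: Optional[str] = None  # slot the last bot question is about
    history: List[str] = field(default_factory=list)

    def ask(self, slot: str, question: str) -> str:
        # Remember which slot a follow-up yes/no answer should fill.
        self.pending_slot = slot
        self.history.append(f"bot: {question}")
        return question

    def hear(self, text: str) -> None:
        self.history.append(f"user: {text}")
        answer = text.strip().lower()
        # A bare "yes"/"no" only has meaning in the context of the pending question.
        if self.pending_slot and answer in ("yes", "no"):
            self.slots[self.pending_slot] = answer == "yes"
            self.pending_slot = None


tracker = Tracker()
tracker.hear("I need to renew my renter's insurance. How much will it be?")
tracker.ask("same_apartment", "Are you still living in the same apartment?")
tracker.hear("Yes")
tracker.ask("size_confirmed", "Just confirming it's 980 sq ft?")
tracker.hear("Yes")
print(tracker.slots)  # {'same_apartment': True, 'size_confirmed': True}
```

The same word "Yes" fills two different slots, which is exactly what a Level 2 FAQ assistant without context cannot do.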

Slide 7

Slide 7 text

Outline
• Introduction to Conversational Agents
• Rasa framework
  • Introduction
  • Simple tutorial
• Build Rasa custom components based on ckiptagger
  • Motivation and mechanism for customizing the Rasa NLU pipeline
  • Introduction to ckiptagger
  • Components implementation
• Demo

Slide 8

Slide 8 text

Task-oriented conversational agent https://www.csie.ntu.edu.tw/~yvchen/s105-icb/doc/170321_Ontology.pdf

Slide 9

Slide 9 text

Performance Benchmark on Conversational Agents
Source: Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser. "Benchmarking Natural Language Understanding Services for building Conversational Agents." Accepted at IWSDS 2019.

Slide 10

Slide 10 text

Strategy Survey on Conversational Agents
Platforms compared: Rasa, Dialogflow, LUIS, Watson
Criteria: on-premise deployment, free of charge, open source, extensible, privacy

Slide 11

Slide 11 text

Rasa Modules
Rasa: open-source machine learning tools for developers building conversational AI
• NLU: natural language processing
  • Intent classification
  • Entity extraction
• Core: contextual dialogue management
  • Decision making
  • Context tracking
• Integration
  • Knowledge base interaction
  • Language generation
• Channel: user interface
  • Message
  • Voice

Slide 12

Slide 12 text

Rasa Architecture

Slide 13

Slide 13 text

Rasa Architecture
Building blocks: Rasa NLU, Rasa Core, Integration, Channel

Slide 14

Slide 14 text

Rasa Dialogue Flow
User: "I am sick. I need a GP in 94310." → Context → Bot: "Should we look for the earliest appointment?"

Slide 15

Slide 15 text

Project Structure
QuickStart:
$> pip3 install rasa[tensorflow]
$> rasa init
• config.yml
  • Pipeline for NLU
  • Policies for Core
• nlu.md
  • Training data for NLU
• stories.md
  • Training data for Core
• domain.yml
  • The chatbot's domain
  • Actions

Slide 16

Slide 16 text

Steps for Preparing NLU Training Data
Step 1: Collect dialogue samples
  • Leverage the knowledge of domain experts
  • Check the most common search queries and questions
Step 2: Define labels
  • Define intents by observing the dialogue set
  • Define entities by checking search queries
Step 3: Compose data/nlu.md and domain.yml
  • Annotate samples with intents and entities in data/nlu.md
  • List the intents and entities in domain.yml

Slide 17

Slide 17 text

Step 1: Collect dialogue samples
Dialogue A
  User: Good morning
  Bot: how can I help you?
  User: I want a british restaurant in the east part of town
  Bot: here's what I found: xxx, yyy
  User: Thank you
  Bot: Bye-bye
Dialogue B
  User: hello there
  Bot: how can I help you?
  User: can you book a table in london in a expensive price range with spanish cuisine for two
  Bot: ok let me see what I can find
Dialogue C
  User: hey bot
  Bot: how can I help you?
  User: find me a cheap vietnamese restaurant
  Bot: where?
  User: west part of town
  Bot: for how many people?
…

Slide 18

Slide 18 text

Step 2: Define labels
Sample utterances:
• im looking for an expensive restaurant in the east town
• want something in the south side of town thats moderately priced
• good morning / hello there / hey bot / good evening / good afternoon / hey
• okay thank you / thank you goodbye / thanks goodbye / thank you good bye / thank you goodbye / you rock

Slide 19

Slide 19 text

Step 2: Define labels
• Intent: inform (entities: price, location)
  • im looking for an expensive restaurant in the east town
  • want something in the south side of town thats moderately priced
• Intent: greet
  • good morning / hello there / hey bot / good evening / good afternoon / hey
• Intent: thankyou
  • okay thank you / thank you goodbye / thanks goodbye / thank you good bye / thank you goodbye / you rock

Slide 20

Slide 20 text

Step 3: Compose data/nlu.md and domain.yml
data/nlu.md
## intent:greet
- good morning
- hello there
…
## intent:thankyou
- okay thank you
- thank you bye
…
## intent:inform
- im looking for an [expensive](price) restaurant in the [east](location) town
- want something in the [south](location) side of town thats [moderately](price:moderate) priced
- what about [italian](cuisine)
…
domain.yml
intents:
  - greet
  - thankyou
  - inform
entities:
  - location
  - price
  - cuisine
…
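To show what the markdown annotations above encode, here is a small, hypothetical parser (not Rasa's own training-data loader) that groups examples by `## intent:` header and turns `[text](entity)` or `[text](entity:synonym)` into character spans. It assumes only the simple markdown syntax shown on this slide.

```python
import re
from typing import Dict, List

# [text](entity) or [text](entity:value), as used in the nlu.md example above
ENTITY_RE = re.compile(r"\[(?P<text>[^\]]+)\]\((?P<entity>[^):]+)(?::(?P<value>[^)]+))?\)")


def parse_nlu_md(md: str) -> Dict[str, List[dict]]:
    """Hypothetical helper: group examples by intent and extract entity spans."""
    intents: Dict[str, List[dict]] = {}
    current = None
    for raw in md.splitlines():
        line = raw.strip()
        if line.startswith("## intent:"):
            current = line[len("## intent:"):].strip()
            intents.setdefault(current, [])
        elif line.startswith("- ") and current is not None:
            annotated = line[2:]
            text, entities, pos = "", [], 0
            for m in ENTITY_RE.finditer(annotated):
                text += annotated[pos:m.start()]  # plain text before the annotation
                start = len(text)
                text += m.group("text")
                entities.append({
                    "start": start,
                    "end": len(text),
                    # a synonym like price:moderate overrides the surface text
                    "value": m.group("value") or m.group("text"),
                    "entity": m.group("entity"),
                })
                pos = m.end()
            text += annotated[pos:]
            intents[current].append({"text": text, "entities": entities})
    return intents


sample = """
## intent:greet
- good morning
## intent:inform
- want something in the [south](location) side of town thats [moderately](price:moderate) priced
"""
parsed = parse_nlu_md(sample)
print(parsed["inform"][0]["text"])
# → want something in the south side of town thats moderately priced
```

The `price:moderate` form is how the slide maps the surface word "moderately" onto the canonical value "moderate", which `EntitySynonymMapper` later relies on.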

Slide 21

Slide 21 text

Setup Rasa NLU Pipeline
Step 1: Choose a built-in pipeline, or configure one component by component
Step 2: Compose config.yml
Built-in pipelines:
  language: "en"
  pipeline: "supervised_embeddings"
  language: "en"
  pipeline: "pretrained_embeddings_spacy"
Components embedded in each built-in pipeline:
  # pretrained_embeddings_spacy
  language: "en"
  pipeline:
  - name: "SpacyNLP"
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "SklearnIntentClassifier"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  # supervised_embeddings
  language: "en"
  pipeline:
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    analyzer: "word"
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"

Slide 22

Slide 22 text

Pros and Cons of the Two Built-in Pipelines
supervised_embeddings
  Pros
  • The model picks up domain-specific vocabulary
  • Supports any language that can be tokenized
  Cons
  • Requires plenty of training data
  • Longer training time
pretrained_embeddings_spacy
  Pros
  • Better model performance with less training data
  • Faster training
  Cons
  • Depends on pre-trained word embeddings being available for the language
  • The embeddings do not cover domain-specific vocabulary

Slide 23

Slide 23 text

Decision Making on Pipeline Selection
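The trade-offs above can be condensed into a rule of thumb. The sketch below is a hypothetical helper, and the 1000-example threshold is an assumption based on the "choosing a pipeline" guidance in the Rasa 1.x documentation; treat it as a starting point to tune, not a hard rule.

```python
def choose_pipeline(num_examples: int, spacy_model_available: bool,
                    threshold: int = 1000) -> str:
    """Pick a Rasa 1.x built-in NLU pipeline name (rule-of-thumb sketch).

    `threshold` is an assumed cut-off on the number of labelled examples.
    """
    if num_examples < threshold and spacy_model_available:
        # Little data: lean on pre-trained word vectors.
        return "pretrained_embeddings_spacy"
    # Plenty of data, or no spaCy model for the language:
    # learn embeddings from your own domain data instead.
    return "supervised_embeddings"


print(choose_pipeline(300, spacy_model_available=True))   # pretrained_embeddings_spacy
print(choose_pipeline(300, spacy_model_available=False))  # supervised_embeddings
```

The second call is the Mandarin case this talk cares about: without usable pre-trained embeddings, `supervised_embeddings` is the only built-in option left, which is why the later slides customize it.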

Slide 24

Slide 24 text

How NLU Pipelines Work
(Figure: pipeline flow, component steps, and the Rasa NLU training lifecycle)
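The lifecycle in the figure can be mimicked with a toy pipeline. This is a simplified sketch with hypothetical `Toy*` classes, not Rasa's implementation: during training each component runs in order and sees what earlier components added to the examples; at inference the same components process one message in the same order.

```python
from collections import Counter
from typing import Dict, List


class ToyComponent:
    """Minimal stand-in for a pipeline component (not Rasa's real classes)."""

    def train(self, examples: List[Dict]) -> None:
        pass  # many components have nothing to learn

    def process(self, message: Dict) -> None:
        raise NotImplementedError


class ToyWhitespaceTokenizer(ToyComponent):
    def process(self, message: Dict) -> None:
        message["tokens"] = message["text"].split()


class ToyKeywordClassifier(ToyComponent):
    """Scores intents by how often each token appeared with them in training."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {}

    def train(self, examples: List[Dict]) -> None:
        # Relies on "tokens", added by the tokenizer earlier in the pipeline.
        for ex in examples:
            self.counts.setdefault(ex["intent"], Counter()).update(ex["tokens"])

    def process(self, message: Dict) -> None:
        scores = {intent: sum(c[t] for t in message["tokens"])
                  for intent, c in self.counts.items()}
        message["predicted_intent"] = max(scores, key=scores.get)


class ToyPipeline:
    def __init__(self, components: List[ToyComponent]) -> None:
        self.components = components

    def train(self, examples: List[Dict]) -> None:
        # Training: each component learns, then annotates the data for the next one.
        for component in self.components:
            component.train(examples)
            for ex in examples:
                component.process(ex)

    def parse(self, text: str) -> Dict:
        # Inference: the same components process a single message in order.
        message = {"text": text}
        for component in self.components:
            component.process(message)
        return message


examples = [
    {"text": "good morning", "intent": "greet"},
    {"text": "hello there", "intent": "greet"},
    {"text": "okay thank you", "intent": "thankyou"},
    {"text": "thank you goodbye", "intent": "thankyou"},
]
nlu = ToyPipeline([ToyWhitespaceTokenizer(), ToyKeywordClassifier()])
nlu.train(examples)
print(nlu.parse("hello bot"))
# → {'text': 'hello bot', 'tokens': ['hello', 'bot'], 'predicted_intent': 'greet'}
```

The ordering dependency is the point: swapping the two components would fail, which is what Rasa's `requires`/`provides` attributes are meant to guard against.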

Slide 25

Slide 25 text

Rasa NLU Pipeline Anatomy (supervised_embeddings)
  language: "en"
  pipeline:
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    analyzer: "word"
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
• Components for entity extraction: WhitespaceTokenizer, RegexFeaturizer, CRFEntityExtractor, EntitySynonymMapper
• Components for intent recognition: CountVectorsFeaturizer, EmbeddingIntentClassifier

Slide 26

Slide 26 text

Demo Rasa NLU with built-in pipeline

Slide 27

Slide 27 text

Rasa NLU Demo Commands
# Evaluation based on a confusion matrix
$> rasa test
# NLU model training
$> rasa train nlu
# NLU model inference via shell
$> rasa shell nlu
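To make the evaluation output easier to read, here is a pure-Python sketch of what a confusion matrix and per-intent precision/recall/F1 mean, computed from (true intent, predicted intent) pairs. It is an illustration only, not the code `rasa test` runs, and the sample pairs are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_matrix(pairs: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Rows are true intents, columns are predicted intents."""
    matrix: Dict[str, Counter] = {}
    for true, pred in pairs:
        matrix.setdefault(true, Counter())[pred] += 1
    return matrix


def per_intent_scores(pairs: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    labels = sorted({label for pair in pairs for label in pair})
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly predicted as label
        fn = sum(1 for t, p in pairs if t == label and p != label)  # label missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Made-up evaluation results: one "inform" utterance was misread as "greet".
pairs = [
    ("greet", "greet"), ("greet", "greet"),
    ("inform", "inform"), ("inform", "greet"),
    ("thankyou", "thankyou"),
]
print(confusion_matrix(pairs)["inform"])
print(per_intent_scores(pairs)["greet"])
```

An off-diagonal cell such as inform → greet lowers greet's precision and inform's recall at the same time, which is usually the first thing to look at when adding training examples.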

Slide 28

Slide 28 text

Steps for Preparing Rasa Core Training Data
Step 1: Design the dialogue flow
Step 2: Express the dialogue flow in terms of intents and entities
Step 3: Compose data/stories.md and domain.yml
Example dialogue:
  User: Good morning
  Bot: how can I help you?
  User: I want a british restaurant in the east part of town
  Bot: what kind of cuisine would you like?
  User: afghan food
  Bot: for how many people?
  …

Slide 29

Slide 29 text

Steps for Preparing Rasa Core Training Data
Step 1: Design the dialogue flow
Step 2: Express the dialogue flow in terms of intents and entities
Step 3: Compose data/stories.md and domain.yml
Example dialogue:
  User: Good morning
  Bot: how can I help you?
  User: I want a british restaurant in the east part of town
  Bot: what kind of cuisine would you like?
  User: afghan food
  Bot: for how many people?
  …
data/stories.md
## story_1
* greet
  - utter_ask_howcanhelp
* inform{"location": "london"}
  - utter_ask_cuisine
* inform{"cuisine": "spanish"}
  - utter_ask_numpeople
…

Slide 30

Slide 30 text

Steps for Preparing Rasa Core Training Data
Step 1: Design the dialogue flow
Step 2: Express the dialogue flow in terms of intents and entities
Step 3: Compose data/stories.md and domain.yml
Example dialogue:
  User: Good morning
  Bot: how can I help you?
  User: I want a british restaurant in the east part of town
  Bot: what kind of cuisine would you like?
  User: afghan food
  Bot: for how many people?
  …
data/stories.md
## story_1
* greet
  - utter_ask_howcanhelp
* inform{"location": "london"}
  - utter_ask_cuisine
* inform{"cuisine": "spanish"}
  - utter_ask_numpeople
…
domain.yml
templates:
  utter_ask_cuisine:
  - text: "what kind of cuisine would you like?"
  utter_ask_howcanhelp:
  - text: "how can I help you?"
  utter_ask_numpeople:
  - text: "for how many people?"
…
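To see how a story becomes something a policy can learn from, here is a hypothetical sketch in the spirit of a memoization policy: it flattens the story into events and remembers which bot action followed the last few events. It drops entities and slots, and its `max_history` is just a parameter of this sketch, so it is far simpler than Rasa's actual `MemoizationPolicy`.

```python
import re
from typing import Dict, List, Optional, Tuple

STORY = """
## story_1
* greet
  - utter_ask_howcanhelp
* inform{"location": "london"}
  - utter_ask_cuisine
* inform{"cuisine": "spanish"}
  - utter_ask_numpeople
"""


def story_events(story: str) -> List[str]:
    """Flatten a markdown story into user intents and bot actions (entities dropped)."""
    events = []
    for line in story.splitlines():
        line = line.strip()
        if line.startswith("*"):
            intent = re.sub(r"\{.*\}$", "", line[1:].strip())  # strip {"entity": ...}
            events.append("intent:" + intent)
        elif line.startswith("-"):
            events.append("action:" + line[1:].strip())
    return events


def memorize(events: List[str], max_history: int = 2) -> Dict[Tuple[str, ...], str]:
    """Map the last `max_history` events before each bot action to that action."""
    lookup = {}
    for i, event in enumerate(events):
        if event.startswith("action:"):
            key = tuple(events[max(0, i - max_history):i])
            lookup[key] = event[len("action:"):]
    return lookup


def predict(lookup: Dict[Tuple[str, ...], str], recent: List[str],
            max_history: int = 2) -> Optional[str]:
    # Returns None when the conversation has not been seen in any story.
    return lookup.get(tuple(recent[-max_history:]))


lookup = memorize(story_events(STORY))
print(predict(lookup, ["intent:greet", "action:utter_ask_howcanhelp", "intent:inform"]))
# → utter_ask_cuisine
```

Note that the same intent `inform` leads to different actions depending on the preceding bot action; that dependence on history is what stories teach Rasa Core.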

Slide 31

Slide 31 text

Rasa Core Policies
Step 1: Design the dialogue flow
Step 2: Choose built-in or custom policies
Step 3: Compose config.yml
  policies:
  - name: "FallbackPolicy"
  - name: "MappingPolicy"
  - name: "KerasPolicy"
  - name: "MemoizationPolicy"
Priority in rule matching (1 = highest):
• Priority 1: FormPolicy
• Priority 2: FallbackPolicy, TwoStageFallbackPolicy
• Priority 3: MemoizationPolicy, AugmentedMemoizationPolicy
• Priority 4: MappingPolicy
• Priority 5: EmbeddingPolicy, KerasPolicy, SklearnPolicy
https://rasa.com/docs/rasa/core/policies/
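How does a priority list combine several policies? A simplified model, based on the Rasa 1.x policy documentation as I understand it: every policy predicts a next action with a confidence, the highest confidence wins, and priority only breaks ties. The `Prediction` type and the sample numbers below are hypothetical, and the real ensemble may apply extra special cases this sketch ignores.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    policy: str
    action: str
    confidence: float
    rank: int  # 1 = highest priority, matching the ranking on the slide


def choose_action(predictions: List[Prediction]) -> Prediction:
    # Highest confidence wins; on a tie, the better-ranked (smaller rank) policy wins.
    return max(predictions, key=lambda p: (p.confidence, -p.rank))


tie = [
    Prediction("KerasPolicy", "utter_ask_numpeople", 1.0, 5),
    Prediction("MappingPolicy", "utter_ask_howcanhelp", 1.0, 4),
    Prediction("MemoizationPolicy", "utter_ask_cuisine", 0.0, 3),  # no story matched
]
print(choose_action(tie).policy)  # MappingPolicy
```

With distinct confidences the ranking never comes into play, so a well-trained ML policy can still override a rule-based one unless the rule-based policy is also confident.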

Slide 32

Slide 32 text

Demo Rasa NLU + Rasa Core with built-in pipeline

Slide 33

Slide 33 text

Rasa NLU + Core Demo Commands
# Train both the NLU and Core models
$> rasa train
# Talk to the assistant via shell
$> rasa shell

Slide 34

Slide 34 text

Outline
• Introduction to Conversational Agents
• Rasa framework
  • Introduction
  • Simple tutorial
• Build Rasa custom components based on ckiptagger
  • Motivation and mechanism
  • Introduction to ckiptagger
  • Components implementation
• Demo

Slide 35

Slide 35 text

Motivation
Challenge: poor performance on Mandarin
Reasons:
• Mandarin text has no whitespace delimiters, so word segmentation is a hard problem in its own right
• Token-based feature extraction for Mandarin requires specialized skills
Reference: 寫個能幹的中文斷詞系統 (Writing a Capable Chinese Word Segmentation System) @ PyCon Taiwan 2019
https://tw.pycon.org/2019/en-us/events/talk/852751430614778081/
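A quick way to see the problem: whitespace tokenization returns a whole Mandarin sentence as a single token. For contrast, the sketch below uses forward maximum matching, a classic greedy dictionary baseline. It is not how ckiptagger works (ckiptagger uses deep learning models), and the toy vocabulary and sentence ("I want to book an Italian restaurant in Taichung") are made up for illustration.

```python
from typing import Iterable, List


def whitespace_tokenize(text: str) -> List[str]:
    return text.split()


def forward_max_match(text: str, vocab: Iterable[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right longest-match segmentation (dictionary baseline)."""
    words = set(vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return tokens


sentence = "我想訂台中的義大利餐廳"
vocab = {"台中", "義大利", "餐廳"}
print(whitespace_tokenize(sentence))      # ['我想訂台中的義大利餐廳']
print(forward_max_match(sentence, vocab))  # ['我', '想', '訂', '台中', '的', '義大利', '餐廳']
```

Greedy matching breaks down on ambiguous strings and words missing from the dictionary, which is why a learned segmenter is worth plugging into the pipeline.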

Slide 36

Slide 36 text

Proposed Solution
Modify supervised_embeddings with custom components:
  language: "zh"
  pipeline:
  - name: "CKIPTokenizer"
  - name: "CKIPFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    analyzer: "word"
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
• Create a pipeline for zh
• Implement a tokenizer for Mandarin
• Implement a featurizer for Mandarin tokens
Goal: empower Rasa-based chatbots with Mandarin capability
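The `token_pattern` in this config matters for Mandarin. The sketch below applies the pattern with Python's `re` to English, to raw Mandarin, and to Mandarin whose words are joined by spaces. Whether the featurizer actually receives the custom tokenizer's words in space-joined form is an assumption here, not something the slide states. The comparison with scikit-learn's default `CountVectorizer` pattern shows why `\w+` (which keeps one-character words) suits Mandarin better than `\w\w+`.

```python
import re
from typing import List

SLIDE_PATTERN = r"(?u)\b\w+\b"      # token_pattern from the pipeline above
SKLEARN_DEFAULT = r"(?u)\b\w\w+\b"  # scikit-learn CountVectorizer's default token_pattern


def tokens(text: str, pattern: str = SLIDE_PATTERN) -> List[str]:
    return re.findall(pattern, text)


english = "im looking for an expensive restaurant"
mandarin_raw = "我想訂台中的義大利餐廳"
mandarin_segmented = " ".join(["我", "想", "訂", "台中", "的", "義大利", "餐廳"])

print(tokens(english))             # ['im', 'looking', 'for', 'an', 'expensive', 'restaurant']
print(tokens(mandarin_raw))        # ['我想訂台中的義大利餐廳'] (one giant "word")
print(tokens(mandarin_segmented))  # ['我', '想', '訂', '台中', '的', '義大利', '餐廳']
print(tokens(mandarin_segmented, SKLEARN_DEFAULT))  # ['台中', '義大利', '餐廳']
```

CJK characters all count as `\w`, so without segmentation there is no word boundary anywhere inside the sentence.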

Slide 37

Slide 37 text

Mechanism
Step 1: Create the component skeleton
Step 2: Define attributes: name, provides, requires, defaults, language_list
Step 3: Implement required methods: __init__, train, process, persist, load
  from … import Component
  class Tokenizer(Component):
      """Build our own custom component"""
      # define attributes
      …
      # implement methods
      …
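A filled-in version of the skeleton might look like the sketch below. To keep it runnable without Rasa installed, it defines a small stand-in `Component` base class; Rasa's real base class lives in the module elided on the slide, and its exact method signatures and message objects may differ. `MandarinTokenizer`, its `segmenter` argument, and the `drop_whitespace` option are all hypothetical, and messages are plain dicts rather than Rasa `Message` objects.

```python
import json
import os
from typing import Any, Callable, Dict, List, Optional, Text


class Component:
    """Stand-in for Rasa's Component base class, so the sketch runs without Rasa."""

    name: Text = ""
    provides: List[Text] = []
    requires: List[Text] = []
    defaults: Dict[Text, Any] = {}
    language_list: Optional[List[Text]] = None

    def __init__(self, component_config: Optional[Dict[Text, Any]] = None) -> None:
        # Merge user settings from config.yml over the class defaults.
        self.component_config = {**self.defaults, **(component_config or {})}


class MandarinTokenizer(Component):
    """Hypothetical tokenizer; `segmenter` is any callable from text to a word list."""

    name = "MandarinTokenizer"
    provides = ["tokens"]
    defaults = {"drop_whitespace": True}  # hypothetical option
    language_list = ["zh"]

    def __init__(
        self,
        component_config: Optional[Dict[Text, Any]] = None,
        segmenter: Optional[Callable[[Text], List[Text]]] = None,
    ) -> None:
        super().__init__(component_config)
        # Fallback segmenter: one token per character.
        self.segmenter = segmenter or list

    def train(self, training_data: List[Dict[Text, Any]], config=None, **kwargs) -> None:
        # A tokenizer learns nothing; it just annotates every training example.
        for example in training_data:
            self.process(example)

    def process(self, message: Dict[Text, Any], **kwargs) -> None:
        text = message["text"]
        tokens, cursor = [], 0
        for word in self.segmenter(text):
            start = text.index(word, cursor)  # character offset of this word
            cursor = start + len(word)
            if self.component_config["drop_whitespace"] and not word.strip():
                continue
            tokens.append({"text": word, "offset": start})
        message["tokens"] = tokens

    def persist(self, file_name: Text, model_dir: Text) -> Dict[Text, Any]:
        # Save settings; in this sketch the returned dict is what `load` gets as `meta`.
        path = os.path.join(model_dir, file_name + ".json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.component_config, f, ensure_ascii=False)
        return {"file": file_name + ".json"}

    @classmethod
    def load(cls, meta: Dict[Text, Any], model_dir: Text, **kwargs) -> "MandarinTokenizer":
        with open(os.path.join(model_dir, meta["file"]), encoding="utf-8") as f:
            return cls(json.load(f))


words = ["我", "想", "訂", "台中", "的", "義大利", "餐廳"]
tokenizer = MandarinTokenizer(segmenter=lambda text: words)
message = {"text": "我想訂台中的義大利餐廳"}
tokenizer.process(message)
print(message["tokens"][3])  # {'text': '台中', 'offset': 3}
```

Character offsets are kept because entity extraction needs to map tokens back to spans in the original text.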

Slide 38

Slide 38 text

ckiptagger
• Deep-learning-based tool for
  • Word segmentation
  • POS tagging
  • Named entity recognition
• Pure Python package with simple dependencies
  • TensorFlow >= 1.13
• GPL-3.0 license
https://github.com/ckiplab/ckiptagger

Slide 39

Slide 39 text

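Features
• Word segmentation and POS tagging
• Named entity recognition
Example input text:
• 傅達仁今將執行安樂死,卻突然爆出自己20年前遭緯來體育台封殺,他不懂自己哪裡得罪到電視台。
  (Fu Da-ren is about to undergo euthanasia, but it suddenly came out that he was blacklisted by Videoland Sports 20 years ago; he does not understand how he offended the TV station.)
• 美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會,預料她將會很順利通過參議院支持,成為該國有史以來第一位的華裔女性內閣成員。
  (The US Senate today opened a confirmation hearing on Elaine Chao, President Bush's nominee for Secretary of Labor; she is expected to win Senate support smoothly and become the first Chinese-American woman cabinet member in the country's history.)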
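The three features chain together: segmentation feeds POS tagging, and both feed NER. The sketch below wraps that chain following the usage in the ckiptagger README (`WS`, `POS`, `NER` constructed from a model directory), assuming ckiptagger and TensorFlow are installed and the model files were downloaded beforehand. It also adds a hypothetical converter from ckiptagger's entity tuples, assumed to be `(start, end, type, text)` as printed in its README, to Rasa-style entity dicts. This converter is one possible alternative for illustration, not how the rukip demo extracts entities (that uses CKIPTokenizer and CKIPFeaturizer), and the extractor name is made up.

```python
from typing import Dict, Iterable, List, Tuple


def ckip_analyze(sentences: List[str], data_dir: str = "./data"):
    """Run ckiptagger segmentation, POS tagging and NER on a batch of sentences.

    Assumes ckiptagger is installed and its models are already in `data_dir`.
    """
    from ckiptagger import NER, POS, WS  # lazy import: heavy TensorFlow dependency

    ws, pos, ner = WS(data_dir), POS(data_dir), NER(data_dir)
    words = ws(sentences)          # list of word lists
    tags = pos(words)              # list of POS-tag lists, aligned with words
    entities = ner(words, tags)    # list of sets of entity tuples
    return words, tags, entities


def ckip_entities_to_rasa(entities: Iterable[Tuple[int, int, str, str]],
                          extractor: str = "CKIPEntityExtractor") -> List[Dict]:
    """Convert (start, end, type, text) tuples into Rasa-style entity dicts, sorted by start."""
    return [
        {"start": start, "end": end, "value": text, "entity": label, "extractor": extractor}
        for start, end, label, text in sorted(entities)
    ]


# Hand-made tuples (not real model output) for the second example sentence.
sample = {(21, 24, "PERSON", "趙小蘭"), (0, 2, "GPE", "美國")}
for entity in ckip_entities_to_rasa(sample):
    print(entity)
```

Sorting by start offset gives a stable order, since the NER output for each sentence is an unordered set.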

Slide 40

Slide 40 text

Implementing Rasa NLU components that embed ckiptagger
https://github.com/circlelychen/rukip

Slide 41

Slide 41 text

Demo: Rasa NLU + Rasa Core + rukip + Google Assistant
• Intent recognition
  • CKIPTokenizer (customized)
  • EmbeddingIntentClassifier (built-in)
• Named entity recognition
  • CKIPTokenizer (customized)
  • CKIPFeaturizer (customized)