Bring your chatbots to production

According to Gartner, in the next 3 years we will spend 50% of our IT workloads on chatbot development rather than mobile app development. Isn't it crazy that we still use chatbot development processes comparable to how we built apps 10 years ago? To create a self-fulfilling prophecy, let's build chatbots the modern way.

This talk is about automatic deployments; Development, Test and Production chatbot agents; unit testing chatbots; and TP, TN, FP and FN. Enterprises, get ready to bring your chatbots to production!

Lee Boonstra

June 18, 2019

Transcript

  1. Bring your bots to production by using continuous integration pipelines

    Lee Boonstra Sales engineer Google Cloud
  2. Bring your bots to production by using continuous integration pipelines

    Introduction
    At Google Cloud Next 2019, ING saw a presentation by the credit card company Discover on bringing virtual assistants to production using continuous integration / development approaches. This helps Discover enable DF agents to focus on more complex interactions over multiple channels. Discover showed how they make use of metrics, and afterwards gave a demo of the staging portal they created for their product owners / scrum team. They explained the flow of bringing chatbot model updates from dev to staging to production. ING expressed interest in cracking the same problem. In the next 20 minutes, I will explain how you can collect metrics and automate the process of bringing chatbot model updates to production by building a continuous integration pipeline for Dialogflow.
  3. Enterprise teams working with Dialogflow

    Typical enterprise organization
    • IT department
      ◦ Set up the cloud environment, IAM roles, network, rights/roles for usage of Dialogflow, Compute (Kubernetes), ML APIs, Pub/Sub, BigQuery...
    • Data Scientists
      ◦ Collect metrics and analytics of frequently asked questions and customer experiences.
      ◦ Test the conversation.
    • UX Conversational Designers & Content Writers
      ◦ Write the conversation.
    • Engineers
      ◦ Build fulfillments.
      ◦ Integrate with web services & APIs.
      ◦ Configure the chatbot output channels.
  4. Typical flow of building a chatbot

    • IT department: Set up GCP account
    • UX Conversation Designers: Create conversation in Dialogflow UI
    • Engineers: Integrate the channels with Dialogflow SDK
    • Engineers (optional): Build fulfillment
    • UX Conversation Designers / Engineers: Deploy agents
    • Data Scientists / UX Conversation Designers: Test agents in production channel
    • Data Scientists / UX Conversation Designers: Agent training in Dialogflow UI
    • Data Scientists / UX Conversation Designers: Gather metrics (Dialogflow UI / BigQuery)
    • UX Conversation Designers: Optimize conversation in Dialogflow UI
  5. Use Case: Discover

    Discover presented their use case live at Google Cloud Next 2019 in San Francisco. The recording is available here: https://www.youtube.com/watch?v=L7nbmHPbrEo
  6. Bring your Dialogflow experience to the next level

    Learnings Discover shared at Google Cloud Next 2019:
    • Dev Environment
      ◦ Creation / edits of conversations in Dialogflow UI
    • Staging Environment
      ◦ Export the dev intent changes, run a diff to see what's newly added.
      ◦ Validation
      ◦ Run unit tests with test user queries: request intent matching confidence score
      ◦ Regression tests, to ensure previously created intents aren't broken
      ◦ Metrics / confusion matrix
    • Production Environment
      ◦ Enable / disable intents
      ◦ Per-intent threshold
      ◦ Overrule intents
      ◦ Live analytics
  7. Flow with Dev, Staging and Production

    • IT department: Set up GCP account
    • UX Conversation Designers: Create conversation in Dialogflow UI in Dev
    • Engineers: Integrate the channels with Dialogflow SDK
    • Engineers (optional): Build fulfillment
    • UX Conversation Designers / Engineers: Deploy to Staging
    • Data Scientists / UX Conversation Designers: Test agents in Staging environment
    • Data Scientists / UX Conversation Designers: Agent training in Dialogflow UI
    • Data Scientists / UX Conversation Designers: Gather metrics (Dialogflow UI / BigQuery)
    • UX Conversation Designers: Optimize conversation in Dialogflow UI
    • UX Conversation Designers / Engineers: Deploy to Production
  8. Example

    “The mobile app is not working? Is the service down?”
    MATCHED INTENT: -service down-
    Response: “Thank you for your comment. We will log this issue.”
    But when there is a known issue with the app, the business can overrule this without modifying the Dialogflow model, by turning the intent off:
    Response: “Yes. The network is down. We are currently under maintenance.”
  9. Example

    General Confidence Threshold set to 20%:
    “I want to block my card.”
    “This is my account, I want to block card IBAN...”
    MATCHED INTENT: -blockcard-
    Except when asked: “I want to block my account.”
    Confidence Threshold 90%
    MATCHED INTENT: -blockaccount-
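
The per-intent threshold from this example can be sketched as a small check on the detection result. This is a minimal sketch, not Discover's or Dialogflow's implementation: `applyThresholds`, the `thresholds` map and the `FALLBACK` name are hypothetical, and only the intent names and percentages come from the slide.

```javascript
// Hypothetical configuration; the values mirror the slide example.
const GENERAL_THRESHOLD = 0.2; // general confidence threshold (20%)
const thresholds = { blockaccount: 0.9 }; // stricter per-intent threshold (90%)
const FALLBACK = 'fallback'; // assumed name for the fallback outcome

// Accept the detected intent only when its confidence reaches the threshold
// configured for that intent, or the general threshold when none is set.
function applyThresholds(detected) {
  const hasOwnRule = Object.prototype.hasOwnProperty.call(thresholds, detected.intent);
  const required = hasOwnRule ? thresholds[detected.intent] : GENERAL_THRESHOLD;
  return detected.confidence >= required ? detected.intent : FALLBACK;
}

// The same confidence passes for one intent and is rejected for the other.
console.log(applyThresholds({ intent: 'blockcard', confidence: 0.45 }));
console.log(applyThresholds({ intent: 'blockaccount', confidence: 0.45 }));
```

Keeping the thresholds in a separate map means the business can tighten a risky intent without retraining or editing the agent.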
  10. Dialogflow Features

  11. Dialogflow Intent Detection

    Intents: Card Intent, Account Intent, Mortgage Intent
    1. User says: <“I want to disable my account.”>
    2. Dialogflow searches for the highest intent “match”. It returns a confidence level.
    3. Dialogflow returns response: <“Your account has been disabled.”>
  12. Dialogflow Intent Detection

    • Programmatically you can retrieve the queryResult
      ◦ projects.agent.sessions.detectIntent
      ◦ Requires: queryInput
        ▪ Can be text, spoken text, event trigger
      ◦ Returns: DetectIntentResponse
        ▪ Contains queryResult, which contains the matched intent and intentDetectionConfidence
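
As a rough illustration of what a caller does with that response, the sketch below pulls the matched intent and confidence out of a `queryResult`. The helper name `summarizeDetection` and the sample object are made up for illustration; the field names (`intent.displayName`, `intentDetectionConfidence`, `fulfillmentText`) follow the Dialogflow v2 response shape as I understand it.

```javascript
// Extract the fields a test or dashboard usually needs from a detectIntent response.
function summarizeDetection(response) {
  const result = response.queryResult || {};
  return {
    intent: result.intent ? result.intent.displayName : null,
    confidence: result.intentDetectionConfidence || 0,
    reply: result.fulfillmentText || '',
  };
}

// Hand-written sample, not real API output.
const sampleResponse = {
  responseId: 'example-id',
  queryResult: {
    queryText: 'I want to disable my account.',
    intent: { displayName: 'Account Intent' },
    intentDetectionConfidence: 0.87,
    fulfillmentText: 'Your account has been disabled.',
  },
};

console.log(summarizeDetection(sampleResponse));
```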
  13. Dialogflow Confidence Level

    • Intent Detection Confidence Score: how confident the model was in the intent detection, expressed as a percentage.
    • Minimum Confidence Threshold: configurable value in Dialogflow Settings.
  14. Imports & Exports

    • Dialogflow agents can be exported to a zip file, which contains JSON files for each intent and entity.
    • Programmatically you can export, import and train agents by using the API:
      ◦ projects.agent.export
      ◦ projects.agent.import
      ◦ projects.agent.train
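
For orientation, the three methods above map onto agent-level REST calls. The sketch below only builds the request URLs, assuming the Dialogflow v2 REST layout (`/v2/projects/{project}/agent:{method}`, sent as an authenticated POST); `agentMethodUrl` and the project ID are placeholders, so check the API reference before relying on the exact paths.

```javascript
// Assumed base URL for the Dialogflow v2 REST API.
const BASE = 'https://dialogflow.googleapis.com/v2';

// Build the URL for one of the agent-level methods named on the slide.
function agentMethodUrl(projectId, method) {
  const allowed = ['export', 'import', 'train'];
  if (!allowed.includes(method)) {
    throw new Error(`Unsupported agent method: ${method}`);
  }
  return `${BASE}/projects/${projectId}/agent:${method}`;
}

// 'my-dev-project' is a placeholder project ID.
console.log(agentMethodUrl('my-dev-project', 'export'));
console.log(agentMethodUrl('my-dev-project', 'train'));
```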
  15. Dialogflow Environments

    • There's a default feature in Dialogflow to create multiple environments.
    • Unique webhooks per environment.
    • Switch versions / rollback options
  16. Demos

  17. Babs the Banking Bot Web Chat Google Assistant Hey Google,

    let me talk to Babs The Banking Bot Welcome, how can I help you? I want to transfer money. Let’s get Babs the Banking Bot How much do you want to transfer? 100 euro.
  18. Which customers are unhappy and why? (Analytics)

  19. How can I improve the user experience? (Analytics)

  20. Collect real-time chats from Dialogflow SDK

  21. Mask sensitive Information with DLP API

  22. Understand the text with NLP API

  23. Store all data in a data-warehouse

  24. Optimize your agent

  25. Advanced Chatflow with machine learning bot analytics

    Confidential + Proprietary
    [Architecture diagram] User types to custom UI or channel; chatbot replies.
    • Dialogflow Enterprise
    • Customer Client: JS, Angular 5 web front-end (Kubernetes Engine)
    • Chat Server: Dialogflow SDK / socket.io (Kubernetes Engine)
    • Back-end CRM: Python / Django (Kubernetes Engine)
    • Container Registry: container images can be stored in the Container Registry
    • Messaging Publisher: Pub/Sub
    • Webhook Router: Cloud Function
    • Webhook
    • Container Builder: building dev pipelines
  26. Advanced Chatflow with machine learning and bot analytics

    [Architecture diagram] User types to custom UI or channel; chatbot replies.
    • Dialogflow Enterprise
    • Customer Client: JS, Angular 5 web front-end (Kubernetes Engine)
    • Chat Server: Dialogflow SDK / socket.io (Kubernetes Engine)
    • Back-end CRM: Python / Django (Kubernetes Engine)
    • Subscription: Cloud Function
    • Sensitivity Filter: DLP API
    • Sentiment Detector: NLP API
    • Data Warehouse: BigQuery
    • Messaging Publisher: Pub/Sub
    • Webhook Router: Cloud Function
    • Webhook
  27. Metrics

    Once you have built your chatbot, the most important question is: how good is your ML model?
  28. Test Datasets

    • UX / content writers create the validation data set. They use this to train the Dialogflow agent model by entering it as user phrases.
    • To create test data sets that aren't biased, use logs from chat / IVR / virtual assistants, separate from the intent user phrases that were created. User PII data can be anonymized / masked. (Note the FutureBank.nl BigQuery / dashboard demo.)
    • Create a unit test that passes the anonymized test phrase to the detectIntent API method. The detected intent and the confidence score can then be evaluated against your validation dataset.
  29. Example: True Positive (TP)

    A true positive is an outcome where the chatbot correctly detects the right (positive) intent.
    • Dialogflow user phrases / data to train the Dialogflow agent model
      ◦ “Did my salary come in yet?”
      ◦ “Have I received my salary?”
      ◦ INTENT: Salary Intent
    • Test data:
      ◦ “My salary, when will I receive it?”
      ◦ Expected Intent: Salary Intent
      ◦ Detected Intent: Salary Intent
  30. Example: True Negative (TN) / Unsupported Request

    Similarly, a true negative is an outcome where the chatbot correctly maps the user phrase to a fallback.
    • Dialogflow user phrases / data to train the Dialogflow agent model
      ◦ Everything that can't be mapped.
      ◦ Global Fallback
    • Test data:
      ◦ “My salary, when will I receive it?”
      ◦ Expected Intent: Global Fallback
      ◦ Detected Intent: Global Fallback
  31. Example: False Positive (FP) / Misunderstood Request

    A false positive is an outcome where the chatbot matches the wrong intent. (It should have been a different intent or, in case it didn't exist, a fallback intent.)
    • Dialogflow user phrases / data to train the Dialogflow agent model
      ◦ “I want to block my card.”
      ◦ INTENT: Block Card
      ◦ “I want to renew my card.”
      ◦ INTENT: Renew Card
    • Test data:
      ◦ “My account is blocked, can I get a new card?”
      ◦ Should be: Renew Card
      ◦ Instead returned: Block Card
  32. Example: False Negative (FN) / Missed Request

    A false negative is an outcome where the intent exists, but the chatbot didn't detect it and therefore a fallback was triggered.
    • Dialogflow user phrases / data to train the Dialogflow agent model
      ◦ “Did my salary come in yet?”
      ◦ INTENT: Salary Intent
    • Test data:
      ◦ “My salary, when will I receive it?”
      ◦ Should be: INTENT: Salary Intent
      ◦ Instead returned: Fallback Message
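
The four outcomes defined on the last few slides can be written down as one small function. This is a minimal sketch: `classifyOutcome` and the `FALLBACK` constant are illustrative names, and it assumes the fallback shows up as an intent named "Global Fallback".

```javascript
// Assumed display name of the fallback intent.
const FALLBACK = 'Global Fallback';

// Label one test result as TP, TN, FP or FN, following the slide definitions.
function classifyOutcome(expected, detected) {
  if (expected === detected) {
    // Correct result: a correct fallback is a true negative.
    return expected === FALLBACK ? 'TN' : 'TP';
  }
  // Wrong result: an unexpected fallback is a missed request (FN),
  // any other wrong match is a misunderstood request (FP).
  return detected === FALLBACK ? 'FN' : 'FP';
}

console.log(classifyOutcome('Salary Intent', 'Salary Intent'));
console.log(classifyOutcome('Renew Card', 'Block Card'));
```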
  33. Calculate Accuracy

    The ratio of correctly predicted observations to the total observations. (The ratio of all correctly handled intents.)
    total correct = total TP + total TN
    total incorrect = total FP + total FN
    accuracy = correct / (correct + incorrect)
  34. Calculate Precision

    The ratio of correct positive predictions to all positive predictions. (To determine if there are problems with false positives / misunderstood requests. The higher the precision, the lower the FP rate.)
    precision = total TP / (total TP + total FP)
  35. Calculate Recall

    A sensitivity ratio. (To determine if intents are too narrowly defined and requests are missed. When it's above 0.5 it can be considered good.)
    recall = total TP / (total TP + total FN)
  36. Calculate F1 Score

    The harmonic mean of precision and recall. (To determine if intents are too narrowly defined and requests are missed. When it's above 0.5 it can be considered good.)
    f1 score = 2 * (recall * precision) / (recall + precision)
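
The four formulas can be computed together from the TP, TN, FP and FN totals. A minimal sketch follows; `computeMetrics` is an illustrative name, the counts in the usage line are made up, and returning 0 for an empty denominator is a choice made here, not something the slides prescribe.

```javascript
// Compute accuracy, precision, recall and F1 from outcome totals.
function computeMetrics({ tp, tn, fp, fn }) {
  // Guard against division by zero when a category is empty.
  const ratio = (num, den) => (den === 0 ? 0 : num / den);
  const accuracy = ratio(tp + tn, tp + tn + fp + fn);
  const precision = ratio(tp, tp + fp);
  const recall = ratio(tp, tp + fn);
  const f1 = ratio(2 * recall * precision, recall + precision);
  return { accuracy, precision, recall, f1 };
}

// Made-up totals, only to show the call.
console.log(computeMetrics({ tp: 3, tn: 1, fp: 1, fn: 3 }));
```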
  37. Metrics

    Detected intent by Dialogflow versus expected intent by you: True Positive, False Negative, True Negative, False Positive.
    True positives and true negatives are the observations that are correctly detected and are therefore shown in green. We want to minimize the false positives and false negatives (red).
  38. Confusion Matrix

    A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known.
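
For a chatbot, the rows can be the expected intents and the columns the detected intents. The sketch below counts test results into such a table; `buildConfusionMatrix` and the sample results are illustrative, with intent names borrowed from the earlier examples.

```javascript
// Count (expected, detected) pairs into a nested object: matrix[expected][detected].
function buildConfusionMatrix(results) {
  const labels = [...new Set(results.flatMap((r) => [r.expected, r.detected]))].sort();
  const matrix = {};
  for (const row of labels) {
    matrix[row] = {};
    for (const col of labels) {
      matrix[row][col] = 0;
    }
  }
  for (const { expected, detected } of results) {
    matrix[expected][detected] += 1;
  }
  return { labels, matrix };
}

// Made-up test results for illustration.
const sampleResults = [
  { expected: 'Block Card', detected: 'Block Card' },
  { expected: 'Renew Card', detected: 'Block Card' },
  { expected: 'Salary Intent', detected: 'Global Fallback' },
  { expected: 'Salary Intent', detected: 'Salary Intent' },
];

console.table(buildConfusionMatrix(sampleResults).matrix);
```

Counts on the diagonal are correct detections; everything off the diagonal shows which intents get confused with each other.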
  39. AUC - ROC curve

    The ROC curve tells us how good the model is at distinguishing the given intents, in terms of the detected probability. The steeper the line, the better. Using this information, you can decide how to set the confidence thresholds.
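
One way to get such a curve from test results is to sweep the confidence threshold and record, for each value, how many correct and incorrect matches would be accepted. This sketch treats "the detected intent was correct" as the positive class, which is an assumption of this example rather than something stated on the slide; `rocPoints` and the sample data are illustrative.

```javascript
// For each threshold, compute the true positive rate (correct matches accepted)
// and the false positive rate (incorrect matches accepted).
function rocPoints(samples, thresholds) {
  const positives = samples.filter((s) => s.correct).length;
  const negatives = samples.length - positives;
  return thresholds.map((threshold) => {
    const accepted = samples.filter((s) => s.confidence >= threshold);
    const truePos = accepted.filter((s) => s.correct).length;
    const falsePos = accepted.length - truePos;
    return {
      threshold,
      tpr: positives === 0 ? 0 : truePos / positives,
      fpr: negatives === 0 ? 0 : falsePos / negatives,
    };
  });
}

// Made-up detection results: confidence score plus whether the match was right.
const detections = [
  { confidence: 0.95, correct: true },
  { confidence: 0.8, correct: true },
  { confidence: 0.6, correct: false },
  { confidence: 0.3, correct: false },
];

console.table(rocPoints(detections, [0.2, 0.5, 0.9]));
```

A threshold that keeps the true positive rate high while the false positive rate drops is a reasonable candidate for the confidence setting.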
  40. Build

    • You will need to know the intent name, and the user phrases it was trained on.
    • Write your own phrases
    • Run unit tests on your phrases

        // detectIntent is assumed to be your own helper around the Dialogflow
        // detectIntent call, returning an object with the matched intent name.
        it('TP', () => {
          const myUserPhrase = 'Can I block my card?';
          const myUserPhrase2 = 'Please cancel my pass.';
          const intentName = 'BLOCK_CARD';
          expect(detectIntent(myUserPhrase).intent).toBe(intentName);
          expect(detectIntent(myUserPhrase2).intent).toBe(intentName);
        });

    • Count the total TP, TN, FP, FN, F1, Precision and Recall
    • Based on these generate a confusion matrix
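
Outside a test framework, the same idea can be sketched as a small runner that feeds every test phrase to a detection function and records the outcome. Everything here is illustrative: `runSuite` is a hypothetical helper and `fakeDetectIntent` is a keyword-matching stub standing in for a real Dialogflow call, so the confidence values are invented.

```javascript
// Stub standing in for a real Dialogflow detectIntent call.
async function fakeDetectIntent(phrase) {
  const text = phrase.toLowerCase();
  if (text.includes('block') && text.includes('card')) {
    return { intent: 'BLOCK_CARD', confidence: 0.82 };
  }
  return { intent: 'FALLBACK', confidence: 0.1 };
}

// Run every test case through the detect function and record the result.
async function runSuite(cases, detect) {
  const results = [];
  for (const { phrase, expected } of cases) {
    const { intent, confidence } = await detect(phrase);
    results.push({ phrase, expected, detected: intent, confidence, passed: intent === expected });
  }
  return results;
}

const cases = [
  { phrase: 'Can I block my card?', expected: 'BLOCK_CARD' },
  { phrase: 'Please cancel my pass.', expected: 'BLOCK_CARD' },
];

runSuite(cases, fakeDetectIntent).then((results) => {
  const failed = results.filter((r) => !r.passed);
  console.log(`${results.length - failed.length} passed, ${failed.length} failed`);
});
```

Passing the detect function in as a parameter keeps the runner independent of the Dialogflow client, so the same suite can target the dev, staging or production agent.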
  41. Solving this programmatically.

  42. Dialogflow Agents

    • Development / Training: all manual agent updates happen here.
    • Staging: export from dev to perform all acceptance and regression testing. An artifact is created and versioned from the process. API access only.
    • Production: only artifacts are deployed to prod. API access only.
  43. Acceptance Dashboard

    1. Export the dev intent changes, and run a diff to collect the new intents and their user / training phrases.
    2. Upload / create a validation set based on collected metrics, or create your own for the new intents.
    3. Run a unit test and test it against intent and confidence score, by making detectIntent API calls.
    4. Run a regression test to ensure previously created intents aren't broken.
    5. Plot the results in a confusion matrix.
    6. Compare metrics against the previous version.
    7. Export a summary report.
    8. When tests are approved, push intents to the production environment. (import)
    9. When tests are disapproved, send a message to the development team?
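
Step 1 can be sketched as a plain comparison of two exports. The sketch assumes each export has already been parsed into an object mapping intent names to arrays of training phrases; that shape, the function name `diffIntents` and the sample data are assumptions of this example, not the layout of the exported zip file.

```javascript
// Compare two intent snapshots: { intentName: [trainingPhrase, ...] }.
// Reports new intents, removed intents, and phrases added to existing intents.
function diffIntents(previous, current) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const changes = { added: [], removed: [], changed: [] };
  for (const name of Object.keys(current)) {
    if (!has(previous, name)) {
      changes.added.push(name);
      continue;
    }
    const before = new Set(previous[name]);
    const newPhrases = current[name].filter((phrase) => !before.has(phrase));
    if (newPhrases.length > 0) {
      changes.changed.push({ name, newPhrases });
    }
  }
  for (const name of Object.keys(previous)) {
    if (!has(current, name)) {
      changes.removed.push(name);
    }
  }
  return changes;
}

// Made-up snapshots for illustration.
const acceptance = { 'Block Card': ['I want to block my card.'] };
const dev = {
  'Block Card': ['I want to block my card.', 'Please block my card.'],
  'Renew Card': ['I want to renew my card.'],
};

console.log(JSON.stringify(diffIntents(acceptance, dev), null, 2));
```

The added intents and new phrases are the ones that need fresh validation data in step 2; everything else is covered by the regression run in step 4.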
  44. Production flow

    Current. User types user queries in a chatbot (website). Dialogflow matches an intent and replies to a user session.
    • Dialogflow Enterprise
    • Customer Client: JS, Angular 5 web front-end (Kubernetes Engine)
    • Chat Server: Dialogflow SDK (Kubernetes Engine)
  45. Production flow

    Advanced flow. User types user queries in a chatbot (website). Dialogflow matches an intent and checks the results with the production config. The chatbot admin server overrules the response with custom messages, thresholds or fallbacks.
    • Dialogflow Enterprise
    • Customer Client: JS, Angular 5 web front-end (Kubernetes Engine)
    • Chat Server: Dialogflow SDK (Kubernetes Engine)
    • Admin Server: Dialogflow SDK (Kubernetes Engine)
  46. Admin Dashboard

    1. All controlled intents need to have fulfillment enabled.
    2. List all intents, with switches to enable / disable intents.
      a. When enabled, it gets the response from the DF UI.
      b. When disabled, catch and respond with nothing / overrule responses.
    3. Live analytics dashboard (like the futurebank.nl example)
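
The enable / disable switch can be sketched as a lookup in a production config before the reply goes out. This is a hypothetical sketch of the idea, not the demo's actual admin server: `resolveReply` and the `productionConfig` shape are invented, and the messages are taken from the earlier "service down" example.

```javascript
// Hypothetical production config, keyed by intent name.
const productionConfig = {
  'service down': {
    enabled: false,
    override: 'Yes. The network is down. We are currently under maintenance.',
  },
};

// Decide what to send back for a detected intent.
function resolveReply(detected, config) {
  const rule = config[detected.intent];
  // Intents without a rule, or with the switch on, keep the Dialogflow response.
  if (!rule || rule.enabled) {
    return detected.reply;
  }
  // Disabled intents return the override message, or nothing when none is set.
  return rule.override || '';
}

console.log(
  resolveReply(
    { intent: 'service down', reply: 'Thank you for your comment. We will log this issue.' },
    productionConfig
  )
);
```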
  47. Demo

  48. Dev, Acceptance and Production setup

    [Architecture diagram]
    • Dev: Dialogflow DEV
    • Acceptance: Dialogflow Acceptance; Acceptance Board (JS, Angular 5 web front-end)
    • Prod.: Dialogflow Production; Customer Client (JS, Angular 5 web front-end); Chat Server (Dialogflow SDK); Admin Panel (JS, Angular 5 web front-end); Production Config (Dialogflow SDK)
    • Flow: export dev intents, import to acceptance agent, run unit test / metrics, push to production
  49. Thanks