
Bring your chatbots to production


According to Gartner, in the next 3 years we will spend 50% of our IT workload on chatbot development rather than on mobile app development. Isn't it crazy that we still use chatbot development processes comparable to how we built apps 10 years ago? To create a self-fulfilling prophecy, let's build chatbots the modern way.

This talk is about automatic deployments; Development, Test and Production chatbot agents; unit testing chatbots; and TP, TN, FP and FN. Enterprises, get ready to bring your chatbots to production!

Lee Boonstra

June 18, 2019



Transcript

  1. Bring your bots to production
    by using continuous integration pipelines
    Lee Boonstra
    Sales engineer Google Cloud


  2. Bring your bots to production
    by using continuous integration pipelines
    During Google Cloud Next 2019, ING saw the presentation by credit card company Discover on bringing virtual assistants
    to production by using continuous integration / development approaches.
    This helps Discover to enable DF agents to focus on more complex interactions over multiple channels.
    Discover showed how they make use of metrics. Afterwards they gave a demo of their staging portal, which they
    created for their product owners / scrum team. They explained the flow of bringing chatbot model updates from dev to
    staging to production.
    ING expressed interest in cracking the same problem.
    In the next 20 min, I will explain how you can collect metrics and automate the process of bringing chatbot model
    updates to production by building a continuous integration pipeline for Dialogflow.
    Introduction


  3. Enterprise teams working with
    Dialogflow
    ● IT department
    ○ Set up the cloud environment, IAM roles, network, rights/roles for usage of
    Dialogflow, Compute (Kubernetes), ML APIs, Pub/Sub, BigQuery...
    ● Data Scientists
    ○ Collect metrics and analytics of frequently asked questions and customer
    experiences.
    ○ Test the conversation.
    ● UX Conversational Designers & Content Writers
    ○ Write the conversation.
    ● Engineers
    ○ Build fulfillments.
    ○ Integrate with web services & APIs.
    ○ Configure the chatbot output channels.
    Typical enterprise organization


  4. Typical flow, of building a chatbot
    ● IT department: Setup GCP Account
    ● UX Conversation Designers: Create Conversation in Dialogflow UI
    ● Engineers: Integrate the channels with Dialogflow SDK
    ● Engineers (optional): Build fulfillment
    ● UX Conversation Designers / Engineers: Deploy Agents
    ● Data Scientists / UX Conversation Designers: Test Agents in Production Channel
    ● Data Scientists / UX Conversation Designers: Agent Training in Dialogflow UI
    ● Data Scientists / UX Conversation Designers: Gather Metrics (Dialogflow UI / BigQuery)
    ● UX Conversation Designers: Optimize Conversation in Dialogflow UI


  5. Use Case: Discover
    Discover presented their use case live at
    Google Cloud Next 2019 in San Francisco.
    See the recording here:
    https://www.youtube.com/watch?v=L7nbmHPbrEo


  6. Bring your Dialogflow
    experience to the next level
    Learnings Discover shared at Google Cloud Next 2019:
    ● Dev Environment
    ○ Creation/Edits of Conversations in Dialogflow UI
    ● Staging Environment
    ○ Export the Dev Intent Changes, run a Diff to see what’s newly added.
    ○ Validation
    ○ Run Unit Test with Test User Queries - Request Intent Matching Confidence Score
    ○ Regression Tests, to ensure previously created intents aren’t broken
    ○ Metrics / Confusion Matrix
    ● Production Environment
    ○ Enable / Disable Intents
    ○ Per Intent Threshold
    ○ Overrule Intents
    ○ Live Analytics
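The "run a Diff" step in staging could look roughly like the sketch below. It assumes each export has already been parsed into a plain object that maps an intent name to its list of training phrases; that shape and the name `diffIntents` are hypothetical and not part of the Dialogflow API.

```javascript
// Hypothetical helper: compares two parsed agent exports.
// Each argument maps an intent name to an array of training phrases (assumed shape).
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function diffIntents(previous, current) {
  const addedIntents = [];
  const addedPhrases = {};
  for (const [name, phrases] of Object.entries(current)) {
    if (!has(previous, name)) {
      addedIntents.push(name); // intent did not exist in the previous export
      continue;
    }
    const known = new Set(previous[name]);
    const fresh = phrases.filter((phrase) => !known.has(phrase));
    if (fresh.length > 0) addedPhrases[name] = fresh; // new phrases on an existing intent
  }
  const removedIntents = Object.keys(previous).filter((name) => !has(current, name));
  return { addedIntents, addedPhrases, removedIntents };
}
```

The new intents and phrases returned here are the ones that need fresh test queries; everything else is covered by the regression tests.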


  7.
    ● IT department: Setup GCP Account
    ● UX Conversation Designers: Create Conversation in Dialogflow UI in Dev
    ● Engineers: Integrate the channels with Dialogflow SDK
    ● Engineers (optional): Build fulfillment
    ● UX Conversation Designers / Engineers: Deploy to Staging
    ● Data Scientists / UX Conversation Designers: Test Agents in Staging Environment
    ● Data Scientists / UX Conversation Designers: Agent Training in Dialogflow UI
    ● Data Scientists / UX Conversation Designers: Gather Metrics (Dialogflow UI / BigQuery)
    ● UX Conversation Designers: Optimize Conversation in Dialogflow UI
    ● UX Conversation Designers / Engineers: Deploy to Production


  8. Example
    “ The mobile app is not working? Is the service down?”
    MATCHED INTENT: -service down-
    Response: “Thank you for your comment. We will log this issue.”
    But when there is a known issue with the app, the business can overrule the response
    without modifying the Dialogflow model, by turning the intent off:
    Response: “Yes. The network is down. We are currently under maintenance.”


  9. Example
    General Confidence Threshold set to 20%
    “ I want to block my card.”
    “ This is my account, I want to block card IBAN...”
    MATCHED INTENT: -blockcard-
    Except when asked:
    “ I want to block my account.”
    Confidence Threshold 90%
    MATCHED INTENT: -blockaccount-
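One way to read this example is as a general threshold with a stricter per-intent override. The sketch below assumes that reading; the config shape, the `applyThreshold` name and the `fallback` intent name are hypothetical and would live in your own chat or admin server, not in Dialogflow.

```javascript
// Hypothetical production config: a general threshold plus per-intent overrides.
const thresholds = {
  general: 0.2,
  perIntent: { blockaccount: 0.9 },
};

// Accept the detected intent only if its confidence reaches the threshold
// that applies to it; otherwise return the fallback intent name.
function applyThreshold(detected, config, fallbackIntent = 'fallback') {
  const hasOverride = Object.prototype.hasOwnProperty.call(config.perIntent, detected.intent);
  const required = hasOverride ? config.perIntent[detected.intent] : config.general;
  return detected.confidence >= required ? detected.intent : fallbackIntent;
}
```

With this config, `blockcard` at a confidence of 0.35 is accepted, while `blockaccount` at 0.6 falls back and only passes at 0.9 or higher.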


  10. Dialogflow Features


  11. Dialogflow Intent Detection
    Intents: Card Intent, Account Intent, Mortgage Intent
    1. User says: <“I want to disable my account.”>
    2. Dialogflow searches for the highest intent “match”. It returns a confidence level.
    3. Dialogflow returns response: <“Your account has been disabled.”>


  12. Dialogflow Intent Detection
    ● Programmatically you can retrieve the queryResult
    ○ projects.agent.sessions.detectIntent
    ○ Requires: queryInput
    ■ Can be text, spoken text, event trigger
    ○ Returns: DetectIntentResponse
    ■ Contains queryResult that contains the
    matched intent and
    intentDetectionConfidence
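A minimal sketch of such a call, assuming the Node.js client library (`@google-cloud/dialogflow`) and its `SessionsClient`. The helper names below are hypothetical, and the client is passed in as a parameter so the helpers can also be exercised with a stub instead of a live agent.

```javascript
// Hypothetical helper names; only the request/response field names follow
// the Dialogflow v2 detectIntent API.

// Build the request body for a text query.
function buildDetectIntentRequest(projectId, sessionId, text, languageCode) {
  return {
    session: `projects/${projectId}/agent/sessions/${sessionId}`,
    queryInput: { text: { text, languageCode } },
  };
}

// Reduce a detectIntent response to the fields needed for testing.
function summarizeResponse(response) {
  const result = response.queryResult || {};
  return {
    intent: result.intent ? result.intent.displayName : null,
    confidence: result.intentDetectionConfidence || 0,
    fulfillmentText: result.fulfillmentText || '',
  };
}

// sessionsClient is injected. With the real SDK it is assumed to be created as
// `new (require('@google-cloud/dialogflow').SessionsClient)()`.
async function detectIntent(sessionsClient, projectId, sessionId, text, languageCode = 'en') {
  const request = buildDetectIntentRequest(projectId, sessionId, text, languageCode);
  const [response] = await sessionsClient.detectIntent(request);
  return summarizeResponse(response);
}
```

Keeping the request building and response parsing in separate pure functions makes them easy to unit test without network access.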


  13. Dialogflow Confidence Level
    Intent Detection Confidence Score – Percentage of how confident the model was in the Intent detection
    Minimum Confidence Threshold – Configurable value in Dialogflow Settings.


  14. Import & Exports
    ● Dialogflow Agents can be exported
    to a zip file, which contains JSON
    files for each intent and entities.
    ● Programmatically you can export,
    import and train agents by using the
    API:
    ○ projects.agent.export
    ○ projects.agent.import
    ○ projects.agent.train
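The three calls can be chained to move an agent from one project to another. The sketch below assumes the Node.js `AgentsClient` from `@google-cloud/dialogflow`, where each call is assumed to return a long-running operation; `promoteAgent` is a hypothetical name, and the client is injected so the flow can be tried against a stub first.

```javascript
// agentsClient is injected. With the real SDK it is assumed to be an AgentsClient
// from '@google-cloud/dialogflow'; each call is assumed to resolve to [operation],
// and operation.promise() to [response].
async function promoteAgent(agentsClient, sourceProjectId, targetProjectId) {
  // 1. Export the source (e.g. dev) agent as zipped content.
  const [exportOp] = await agentsClient.exportAgent({ parent: `projects/${sourceProjectId}` });
  const [exported] = await exportOp.promise();

  // 2. Import the zip into the target (e.g. staging) agent.
  const [importOp] = await agentsClient.importAgent({
    parent: `projects/${targetProjectId}`,
    agentContent: exported.agentContent,
  });
  await importOp.promise();

  // 3. Retrain the target agent so the imported intents are used for matching.
  const [trainOp] = await agentsClient.trainAgent({ parent: `projects/${targetProjectId}` });
  await trainOp.promise();
}
```

Import adds or replaces intents and entities by name and leaves the rest of the target agent in place; if the target should become an exact copy, the restore method is the alternative to consider.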


  15. Dialogflow Environments
    ● There’s a default feature in
    Dialogflow to create multiple
    environments.
    ● Unique Webhooks per environment.
    ● Switch versions / rollback options


  16. Demos


  17. Babs the Banking Bot
    Web Chat Google Assistant
    Hey Google, let me talk to Babs The Banking Bot
    Welcome, how can I help you?
    I want to transfer money.
    Let’s get Babs the Banking Bot
    How much do you want to transfer?
    100 euro.


  18. Which customers are unhappy and why?
    (Analytics)


  19. How can I improve the user experience?
    (Analytics)


  20. Collect real-time chats
    from Dialogflow SDK


  21. Mask sensitive
    Information
    with DLP API


  22. Understand the text
    with NLP API


  23. Store all data in
    a data-warehouse


  24. Optimize your agent


  25. Confidential + Proprietary
    Advanced Chatflow with machine learning and bot analytics
    User types to custom UI
    or channel
    Chatbot replies
    Dialogflow
    Enterprise
    Customer Client
    JS Angular 5 web front-end
    Kubernetes Engine
    Chat Server
    Dialogflow SDK / socket.io
    Kubernetes Engine
    Back-end CRM
    Python / Django
    Kubernetes Engine
    Container
    Registry
    Container images can be
    stored in the Container Registry
    Messaging Publisher
    Pub/Sub
    Webhook
    Router
    Cloud Function
    Webhook
    Container
    Builder
    Building Dev
    Pipelines


  26. Advanced Chatflow with machine learning and bot analytics
    User types to custom UI
    or channel
    Chatbot replies
    Dialogflow
    Enterprise
    Customer Client
    JS Angular 5 web front-end
    Kubernetes Engine
    Chat Server
    Dialogflow SDK / socket.io
    Kubernetes Engine
    Back-end CRM
    Python / Django
    Kubernetes Engine
    Subscription
    Cloud Function
    Sensitivity
    Filter
    DLP API
    Sentiment
    Detector
    NLP API
    Data
    Warehouse
    BigQuery
    Messaging Publisher
    Pub/Sub
    Webhook
    Router
    Cloud Function
    Webhook


  27. Metrics
    Once you have built your chatbot, the
    most important question that arises is:
    how good is your ML model?


  28. Test Datasets
    ● UX / Content writers create the validation data set. They use this to train the Dialogflow agent model by
    entering it as user phrases.
    ● To create test data sets that aren’t biased, use logs from Chat / IVR / Virtual Assistants, separate from the
    intent user phrases that were created. User PII data can be anonymized / masked. (Note the FutureBank.nl
    BigQuery/Dashboard demo)
    ● Create a unit test that passes the anonymized test phrase to the detectIntent API method. The
    detected intent and the confidence score can then be evaluated against your validation dataset.


  29. Example: True Positive (TP)
    A true positive is an outcome where the chatbot correctly detects the right (positive) intent.
    ● Dialogflow User Phrases / Data to train the Dialogflow Agent Model
    ○ “Did my salary come in yet?”
    ○ “Have I received my salary?”
    ○ INTENT: Salary Intent
    ● Test Data:
    ○ “My salary, when will I receive it?”
    ○ Expected Intent: Salary Intent
    ○ Detected Intent: Salary Intent


  30. Example: True Negative (TN) / Unsupported Request
    Similarly, a true negative is an outcome where the chatbot correctly mapped the user phrase
    to a fallback.
    ● Dialogflow User Phrases / Data to train the Dialogflow Agent Model
    ○ Everything that can’t be mapped.
    ○ Global Fallback
    ● Test Data:
    ○ “My salary, when will I receive it?”
    ○ Expected Intent: Global Fallback
    ○ Detected Intent: Global Fallback


  31. Example: False Positive (FP) / Misunderstood Request
    A false positive is an outcome where the chatbot matches the wrong intent. (It should have
    been a different intent or, in case it didn’t exist, a fallback intent.)
    ● Dialogflow User Phrases / Data to train the Dialogflow Agent Model
    ○ I want to block my card.
    ○ INTENT: Block Card
    ○ I want to renew my card.
    ○ INTENT: Renew Card
    ● Test Data:
    ○ “My account is blocked, can I get a new card?”
    ○ Should be: “Renew Card”
    ○ Instead returned: Block Card
    Misunderstood Request


  32. Example: False Negative (FN) / Missed Request
    A false negative is an outcome where the intent exists, but the chatbot didn’t detect it and
    therefore a fallback was triggered.
    ● Dialogflow User Phrases / Data to train the Dialogflow Agent Model
    ○ “Did my salary come in yet?”
    ○ INTENT: Salary Intent
    ● Test Data:
    ○ “My salary, when will I receive it?”
    ○ Should be: INTENT: Salary Intent
    ○ Instead returned: Fallback Message
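The four outcomes can be derived from the expected and the detected intent of each test phrase. A minimal sketch, assuming the fallback intent is called `Global Fallback`; the function name `classifyOutcome` is hypothetical.

```javascript
// fallbackIntent is the name of the agent's fallback intent (assumed here).
function classifyOutcome(expected, detected, fallbackIntent = 'Global Fallback') {
  if (expected === detected) {
    // Correct result: a correctly triggered fallback is a true negative.
    return expected === fallbackIntent ? 'TN' : 'TP';
  }
  // Wrong result: a fallback where an intent was expected is a missed request (FN),
  // any other mismatch is a misunderstood request (FP).
  return detected === fallbackIntent ? 'FN' : 'FP';
}
```

For example, `classifyOutcome('Renew Card', 'Block Card')` yields `FP`, and `classifyOutcome('Salary Intent', 'Global Fallback')` yields `FN`.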


  33. Calculate Accuracy
    The ratio of correctly predicted observations to the total observations.
    (Ratio of all correctly handled intents.)
    total correct = total TP + total TN.
    total incorrect = total FP + total FN.
    accuracy = correct / (correct + incorrect)


  34. Calculate Precision
    The ratio of correct positive predictions to all positive predictions.
    (To determine if there are problems with False Positives /
    misunderstood requests. The higher the precision, the fewer the false
    positives.)
    precision = total TP / (total TP + total FP)


  35. Calculate Recall
    A sensitivity ratio.
    (To determine if intents are too narrowly defined, which leads to missed requests.
    When it’s above 0.5 it can be considered good.)
    recall = total TP / (total TP + total FN)


  36. Calculate F1 Score
    The harmonic mean of precision and recall.
    (To determine if intents are too narrowly defined, which leads to missed requests.
    When it’s above 0.5 it can be considered good.)
    f1 score = 2 * (recall * precision) / (recall + precision)
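The accuracy, precision, recall and F1 formulas can be computed together from the four totals. A minimal sketch; the function name `computeMetrics` and the choice to return 0 for an empty denominator are assumptions made for illustration.

```javascript
// Computes the four metrics from the total TP, TN, FP and FN counts.
function computeMetrics({ tp, tn, fp, fn }) {
  // Guard against division by zero when a category has no samples (assumed convention: 0).
  const ratio = (numerator, denominator) => (denominator === 0 ? 0 : numerator / denominator);
  const accuracy = ratio(tp + tn, tp + tn + fp + fn);
  const precision = ratio(tp, tp + fp);
  const recall = ratio(tp, tp + fn);
  const f1 = ratio(2 * recall * precision, recall + precision);
  return { accuracy, precision, recall, f1 };
}
```

With 6 TP, 2 TN, 2 FP and 6 FN this gives an accuracy of 0.5, a precision of 0.75, a recall of 0.5 and an F1 score of 0.6.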


  37. Metrics
    [Figure: 2x2 matrix of "Expected Intent by you" against "Detected Intent by Dialogflow", with the cells
    True Positive, False Negative, False Positive and True Negative.]
    True positives and true negatives are the observations that are correctly detected and are therefore shown in green.
    We want to minimize the false positives and false negatives (red).


  38. Confusion Matrix
    A confusion matrix is a table that is often used to describe the performance of a classification model on a set of
    test data for which the true values are known.
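For intents, such a table can be built by counting each (expected, detected) pair from the test run. A minimal sketch; `buildConfusionMatrix` and the input shape are hypothetical.

```javascript
// results: array of { expected, detected } intent names collected from the unit tests.
// Rows of the matrix are expected intents, columns are detected intents.
function buildConfusionMatrix(results) {
  const labels = [...new Set(results.flatMap((r) => [r.expected, r.detected]))].sort();
  const index = new Map(labels.map((label, i) => [label, i]));
  const matrix = labels.map(() => labels.map(() => 0));
  for (const { expected, detected } of results) {
    matrix[index.get(expected)][index.get(detected)] += 1;
  }
  return { labels, matrix };
}
```

Counts on the diagonal are correctly detected intents; every off-diagonal cell shows which intent was confused with which.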


  39. AUC - ROC curve
    The ROC curve tells us how good the model is at distinguishing the given intents, in terms of the detected probability.
    The steeper the line, the better. Using this info, you can decide how to set the confidence
    thresholds.
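The points of such a curve can be obtained by sweeping a confidence threshold over scored test phrases for one intent. A minimal sketch under that assumption; `rocPoints` and the sample shape are hypothetical.

```javascript
// samples: { score, positive } where score is the detection confidence for one intent
// and positive says whether the phrase really belongs to that intent (assumed shape).
function rocPoints(samples, thresholds) {
  const positives = samples.filter((s) => s.positive).length;
  const negatives = samples.length - positives;
  return thresholds.map((threshold) => {
    const accepted = samples.filter((s) => s.score >= threshold);
    const tp = accepted.filter((s) => s.positive).length;
    const fp = accepted.length - tp;
    return {
      threshold,
      tpr: positives === 0 ? 0 : tp / positives, // true positive rate (recall)
      fpr: negatives === 0 ? 0 : fp / negatives, // false positive rate
    };
  });
}
```

Plotting `tpr` against `fpr` for each threshold gives the curve; a threshold where `tpr` is still high and `fpr` has dropped is a candidate for the confidence setting.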


  40. Build
    ● You will need to know the intent name, and the user phrases it was trained on.
    ● Write your own phrases
    ● Run unit tests on your phrases
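    // detectIntent() is a helper you write yourself around the Dialogflow detectIntent API.
    // It is asynchronous, because it calls the API over the network.
    it('TP', async () => {
      const myUserPhrase = 'Can I block my card?';
      const myUserPhrase2 = 'Please cancel my pass.';
      const intentName = 'BLOCK_CARD';
      expect((await detectIntent(myUserPhrase)).intent).toBe(intentName);
      expect((await detectIntent(myUserPhrase2)).intent).toBe(intentName);
    });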
    ● Count the total TP, TN, FP and FN, then calculate precision, recall and F1
    ● Based on these generate a confusion matrix


  41. Solving this programmatically.


  42. Dialogflow Agents
    ● Development / Training – All manual agent updates
    happen here.
    ● Staging – Export from dev to perform all
    acceptance and regression testing. An artifact is
    created and versioned by the process. API access
    only.
    ● Production – Only artifacts are deployed to prod.
    API access only.


  43. Acceptance Dashboard
    1. Export the Dev Intent Changes, and run a diff to collect the new intents and their
    user / training phrases.
    2. Upload / Create a validation set based on collected metrics, or create your own
    for the new intents.
    3. Run a unit test and test it against intent and confidence score, by making
    detectIntent API calls.
    4. Run a regression test to ensure previously created intents aren’t broken.
    5. Plot the results in a confusion matrix.
    6. Compare the metrics against the previous version.
    7. Export a summary report.
    8. When the tests are approved, push intents to the production environment. (import)
    9. When the tests are disapproved, send a message to the development team?
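The comparison against the previous version and the approve / disapprove decision could be automated with a simple gate. A minimal sketch; the function name `evaluateRelease`, the metric names and the tolerance value are hypothetical choices, not something the dashboard prescribes.

```javascript
// current / previous: objects such as { precision, recall, f1 } for two agent versions.
// A metric counts as a regression when it drops by more than the tolerance,
// or when it is missing from the current run.
function evaluateRelease(current, previous, tolerance = 0.01) {
  const regressions = Object.keys(previous).filter(
    (metric) => !(current[metric] >= previous[metric] - tolerance)
  );
  return { approved: regressions.length === 0, regressions };
}
```

An approved result would trigger the import into the production agent; a rejected one would report the regressed metrics back to the development team.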


  44. Production flow
    Current.
    User types user
    queries in
    a chatbot.
    Website
    Dialogflow matches an
    intent and replies to
    a user session
    Dialogflow
    Enterprise
    Customer Client
    JS Angular 5 web front-end
    Kubernetes Engine
    Chat Server
    Dialogflow SDK
    Kubernetes Engine


  45. Production flow
    Advanced flow.
    User types user
    queries in
    a chatbot.
    Website
    The Chatbot admin server
    overrules response with
    custom messages,
    threshold or fallbacks.
    Dialogflow
    Enterprise
    Customer Client
    JS Angular 5 web front-end
    Kubernetes Engine
    Chat Server
    Dialogflow SDK
    Kubernetes Engine
    Admin Server
    Dialogflow SDK
    Kubernetes Engine
    Dialogflow matches an
    intent and checks the
    results with production
    config.


  46. Admin Dashboard
    1. All controlled intents need to have fulfillment enabled.
    2. List all intents, with switches to enable / disable intents.
    a. When enabled, it gets the response from DF UI
    b. When disabled, catch and respond nothing / overrule responses.
    3. Live Analytics Dashboard (like futurebank.nl example)
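The enable / disable switch can be sketched as a lookup in a production config that the admin server keeps next to Dialogflow. The config shape, the intent name and `resolveResponse` are hypothetical; the override text reuses the earlier "service down" example.

```javascript
// Hypothetical production config kept by the admin server, keyed by intent name.
const productionConfig = {
  'service down': {
    enabled: false,
    override: 'Yes. The network is down. We are currently under maintenance.',
  },
};

// Decide which text the chat server sends back for a detected intent.
function resolveResponse(detected, config) {
  const known = Object.prototype.hasOwnProperty.call(config, detected.intent);
  const entry = known ? config[detected.intent] : null;
  // Unknown or enabled intents keep the response that comes from Dialogflow.
  if (!entry || entry.enabled) return detected.fulfillmentText;
  // Disabled intents are overruled, or answered with nothing when no override is set.
  return entry.override || '';
}
```

Because the config lives outside the agent, the business can change a response without retraining or redeploying the Dialogflow model.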


  47. Demo


  48. Confidential + Proprietary
    Dialogflow
    DEV
    Customer Client
    JS Angular 5 web front-end
    Chat Server
    Dialogflow SDK
    Acceptance Board
    JS Angular 5 web front-end
    Admin Panel
    JS Angular 5 web front-end
    Dialogflow
    Acceptance
    Dialogflow
    Production
    Production Config
    Dialogflow SDK
    Dev
    Acceptance
    Prod.
    export dev intents
    import to acceptance agent
    run unit test / metrics
    push to production


  49. Thanks
