#44 Conversational agents for the aerospace domain

Topic: Evaluation of Conversational Agents for Aerospace Domain

Speaker: Gérard Dupont, senior AI researcher at AIRBUS

What if we put a chatbot in a cockpit?

From the original research idea to the actual live experiments in a cockpit... simulator, since it is far from being ready to fly. The objective of the talk is to show the trajectory of this research topic, from founding hypothesis to actual prototyping and results. We will dive into the technical details of the proposed system and how state-of-the-art approaches have been adapted to a real industrial scenario and validated with humans.

The presentation will mostly rely on material published at the CIRCLE 2020 conference, the result of a two-year effort to support pilots and their access to cockpit documentation.

Toulouse Data Science

September 30, 2020

Transcript

  1. Conversational QA for pilots
    from research hypothesis to live human experiments

  2. INTRO

  3. Who are we?
    Central Research & Technology

  4. Who are we?
    Alexandre: Virtual assistant experience regarding architecture, NLP, S2T, planning & reasoning capabilities via RL
    Gérard: More than 10 years on research projects with NLP, search, ML, RL and data processing for Airbus DS. Joined in 2018
    Catherine: NLP expert; before joining in 2018, worked in many industries focusing on S2T and NMT
    François: Adaptive human-machine interactions, ML, deep learning and big data computing expertise
    Pooja: Classical ML & DL background with experience in NLP, knowledge extraction & representation

  5. What is this talk about?

  6. What is this talk about?

  7. THE RESULTS

  8. (published) Results
    Evaluation of Conversational Agents for Aerospace Domain
    We are here!

  9. THE CONTEXT

  10. Incremental autonomy vision for aircraft
    Toward more autonomous aircraft
    At Airbus, we are building certifiable, safe and secure autonomy systems and programmes to power the next generation of commercial aircraft applications.
    [Figure: autonomy levels L1 to L5; details REDACTED]

  11. Future cockpit vision
    https://www.airbus.com/innovation/autonomous-and-connected/autonomous-flight.html

  12. A chatbot for pilots?
    Current Cockpit: cognitive workload split on the 2 pilots
    Future Cockpit vision: high cognitive workload for single pilot, reduced cognitive workload for single pilot with a Virtual Assistant

  13. TIMELINE

  14. Timeline
    Problem statement
    Data collection
    Model training
    Results validation

  15. Timeline
    Problem statement
    Data collection
    Model training
    Results validation

  16. Timeline
    Problem statement
    Data collection
    More model training
    Research problem statement
    Early tests
    Prototyping
    Evaluation protocol
    More data
    Results validation
    Human experiments setup
    Research literature review

  17. Industrial problem statement
    Examples:
    ● Support for taxiing
    ● Pre-flight briefing
    ● In flight troubleshooting
    ● Air traffic control communication
    ● …
    Focus on access to documentation: pilot assistant as a “Smart Librarian”
    Future Cockpit vision: high cognitive workload for single pilot, reduced cognitive workload for single pilot with a Virtual Assistant

  18. Literature review
    Conversational search

    Radlinski & Craswell (2017, p. 120) https://www.felicecurcelli.net/blog/category/architecture-design

  19. Literature review

  20. Data collection
    Learning by collecting…
    - Internal technical documentation
    - Pilot training documents
    - FCOM workshop
    - Engineering Hackathon
    “Real” data is the data you have in hand

  21. Research problem statement

  22. Early tests

  23. Data collection (again)
    Learning by collecting…
    - Internal technical documentation
    - Pilot training documents
    - FCOM workshop
    - Engineering Hackathon
    => PilotQA dataset

  24. THE PROTOTYPE

  25. Quick system overview

  26. A great search platform
    Secret sauce: => Use BM25 not tf/idf

  27. A great search platform
    BM25F probabilistic weighting model for result ranking (best model for the last 25 years)
    Blazing fast: ~100ms search query time on millions of docs
    "Simple" inverted index:
    ● Transformation of any common format (pdf/doc/html…) with Apache Tika (Solr Cell)
    ● Natively multilingual tokenizer and language processing
    ● Lots of features including: dynamic weighting, query rewriting, faceting…
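
To make the "BM25 not tf/idf" point concrete, here is a minimal sketch of plain Okapi BM25 scoring over tokenized documents. It is an illustration only, not the platform's implementation: it ignores the per-field weighting of BM25F, and the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF form are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (plain BM25).

    query: list of terms; docs: non-empty list of token lists.
    k1 controls term-frequency saturation, b controls length normalization.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF, always positive (an assumed variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates instead of growing linearly.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Unlike raw tf/idf, repeating a term four times in a document does not make it four times more relevant: the contribution of term frequency flattens out, and longer documents are penalized relative to the average length.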

  28. A customized BERT QA
    Secret sauce: => multitask training

  29. A customized BERT QA
    Secret sauce: => multitask training

  30. A great chatbot Platform
    Secret sauce: Conversation Driven Development

  31. Retriever/QA/dialog integration
    Question answering skill
    Question: When do you extend the RAT manually?
    Answer: In electrical emergency, when the RAT is not automatically deployed
    [Diagram: Documents, Index, Search engine, Top docs, QA engine (QA model), Top answers, Dialog engine (Dialog model)]
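
The retriever/QA/dialog chain on this slide can be sketched as three small functions. This is a toy stand-in under stated assumptions: every name here (`Answer`, `tokenize`, `retrieve`, `extract_answers`, `answer_question`, `min_score`) is hypothetical, term overlap replaces both the real search engine and the BERT QA model, and the documents in the usage note are made-up text paraphrasing the slide example, not real cockpit documentation.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    score: float
    doc_id: str


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    tokens = (t.strip(".,?!").lower() for t in text.split())
    return [t for t in tokens if t]


def retrieve(question, documents, top_k=2):
    """Search engine stand-in: rank documents by number of shared terms."""
    q = set(tokenize(question))
    ranked = sorted(
        documents.items(),
        key=lambda item: len(q & set(tokenize(item[1]))),
        reverse=True,
    )
    return ranked[:top_k]


def extract_answers(question, top_docs):
    """QA engine stand-in: score each sentence by overlap with the question."""
    q = set(tokenize(question))
    answers = []
    for doc_id, text in top_docs:
        for sentence in text.split(". "):
            overlap = len(q & set(tokenize(sentence)))
            if overlap:
                answers.append(Answer(sentence.strip(". "), overlap / len(q), doc_id))
    return sorted(answers, key=lambda a: a.score, reverse=True)


def answer_question(question, documents, min_score=0.3):
    """Dialog engine stand-in: reply with the best answer or a fallback."""
    answers = extract_answers(question, retrieve(question, documents))
    if answers and answers[0].score >= min_score:
        return answers[0].text
    return "Sorry, I could not find that in the documentation."
```

For instance, with `documents = {"elec": "The RAT is extended manually in electrical emergency, when the RAT is not automatically deployed. Check the generator status.", "taxi": "Taxi speed must remain low on wet taxiways."}`, calling `answer_question("When do you extend the RAT manually?", documents)` returns the first sentence of the `elec` entry, while a question sharing no terms with the corpus gets the fallback reply. The confidence threshold is where a dialog layer can decide to answer, ask for clarification, or decline.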

  32. User interface view

  33. Model training

  34. HUMAN EXPERIMENT

  35. Research protocol
    Interactive experiments with REAL humans
    Pros:
    - Only way to validate hypotheses H2 and H3
    - More concrete feedback on the perception of system performance (is it really better?)
    - Humans are people
    Cons:
    - Much more time consuming
    - Less definitive conclusions (sensitive perception)
    Automated/simulated user evaluations
    Pros:
    - More data
    - Precise metrics
    - Reproducible scenarios and experiments
    Cons:
    - How to simulate a pilot? A human?
    - What is measured?
    - Really reproducible?
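
As an illustration of the "precise metrics" available to automated evaluation, here is a minimal sketch of two measures commonly used for extractive question answering: exact match and token-level F1 between a predicted and a reference answer. The slides do not say which metrics were actually used, so this choice, and the helper names `normalize`, `exact_match` and `token_f1`, are assumptions.

```python
from collections import Counter


def normalize(text):
    """Lowercase, split on whitespace, strip basic punctuation."""
    tokens = (t.strip(".,?!").lower() for t in text.split())
    return [t for t in tokens if t]


def exact_match(prediction, reference):
    """True when both answers contain the same tokens in the same order."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred, ref = normalize(prediction), normalize(reference)
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Such scores are cheap to compute on many examples and fully repeatable, but they only say how close a string is to a reference string, which is exactly the "What is measured?" concern listed above.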

  36. Research protocol
    Interactive experiments with REAL humans
    ● Triple check protocols in literature
    ● Select subject population
    ● Define controlled experimental conditions
    ● Decide on scenarios/tasks
    ● Define constraints (time, cognitive pressure…)
    => Apply for ethical approval
    ● Plan
    ● Test
    ● Replan
    ● Retest
    ● ...

  37. Research protocol

  38. Results

  39. Results

  40. LESSONS LEARNED

  41. AI/ML/Data research path
    Problem statement
    Data collection
    Model training
    Results validation

  42. Timeline
    Problem statement
    Data collection
    More model training
    Research problem statement
    Early tests
    Prototyping
    Evaluation protocol
    More data
    Results validation
    Human experiments setup
    Research literature review

  43. AI/ML/Data research path
    Problem statement
    Data collection
    Model training
    Results validation
    Research literature review
    Research problem statement
    Early tests
    Prototyping
    More data
    More model training
    Evaluation protocol
    Human experiments setup

  44. Acknowledgements
    This study was funded by AIRBUS Central Research & Technology (and executed within a great team).
    With the support of the Aeronautical Computer Human Interaction Lab (ACHIL) of the Ecole Nationale de l’Aviation Civile (ENAC) and of Dr Ying-Hsang LIU from the University of Southern Denmark (SDU), previously at the Australian National University (ANU).

  45. Thanks for listening
    Gérard DUPONT
    More than 10 years on research projects with NLP, search, ML, RL and data processing for Airbus
    @ggdupont
    Most illustrative pictures found on the web (but not the koala) - all credits to their respective authors.
