#44 Conversational agents for the aerospace domain

Topic: Evaluation of Conversational Agents for Aerospace Domain

Speaker: Gérard Dupont, senior AI researcher at AIRBUS

What if we put a chatbot in a cockpit?

From the original research idea to actual live experiments in a cockpit... simulator, since it's far from ready to fly. The objective of the talk is to show the trajectory of this research topic, from founding hypothesis to actual prototyping and results. We will dive into the technical details of the proposed system and how state-of-the-art approaches were adapted to a real industrial scenario and validated with humans.

The presentation will mostly rely on material published at the CIRCLE 2020 conference, which follows a two-year effort to support pilots and their access to cockpit documentation.

Toulouse Data Science

September 30, 2020

Transcript

  1. Conversational QA for pilots: from research hypothesis to live human experiments
  2. INTRO

  3. Who we are? Central Research & Technology

  4. Who we are? Alexandre: virtual assistant experience regarding architecture, NLP, S2T, planning & reasoning capabilities via RL. Gérard: more than 10 years on research projects with NLP, search, ML, RL and data processing for Airbus DS; joined in 2018. Catherine: NLP expert; before joining in 2018, worked in many industries focusing on S2T and NMT. François: adaptive human-machine interactions, ML, deep learning and big data computing expertise. Pooja: classical ML & DL background with experience in NLP, knowledge extraction & representation.
  5. What is this talk about?

  6. What is this talk about?

  7. THE RESULTS

  8. (Published) results: Evaluation of Conversational Agents for Aerospace Domain. We are here!
  9. THE CONTEXT

  10. Incremental autonomy vision for aircraft. Toward more autonomous aircraft: L1, L2, L3, L4, L5. At Airbus, we are building certifiable, safe and secure autonomy systems and programmes to power the next generation of commercial aircraft applications. REDACTED
  11. Future cockpit vision https://www.airbus.com/innovation/autonomous-and-connected/autonomous-flight.html

  12. A chatbot for pilots? Current cockpit: cognitive workload split between the 2 pilots. Future cockpit vision: high cognitive workload for a single pilot; with a virtual assistant: reduced cognitive workload for a single pilot.
  13. TIMELINE

  14. Timeline: Problem statement, Data collection, Model training, Results validation

  15. Timeline: Problem statement, Data collection, Model training, Results validation

  16. Timeline: Problem statement, Data collection, More model training, Research problem statement, Early tests, Prototyping, Evaluation protocol, More data, Results validation, Human experiments setup, Research literature review
  17. Industrial problem statement. Examples: • Support for taxiing • Pre-flight briefing • In-flight troubleshooting • Air traffic control communication • … Focus on access to documentation: pilot assistant as a “Smart Librarian”. Future cockpit vision: high cognitive workload for a single pilot; with a virtual assistant: reduced cognitive workload for a single pilot.
  18. Literature review: Conversational search … Radlinski & Craswell (2017, p. 120) https://www.felicecurcelli.net/blog/category/architecture-design
  19. Literature review

  20. Data collection. Learning by collecting… - Internal technical documentation - Pilot training documents - FCOM workshop - Engineering Hackathon. “Real” data is the data you have in hand
  21. Research problem statement

  22. Early tests
  23. Data collection (again). Learning by collecting… - Internal technical documentation - Pilot training documents - FCOM workshop - Engineering Hackathon => PilotQA dataset
  24. THE PROTOTYPE

  25. Quick system overview

  26. A great search platform. Secret sauce: => Use BM25 not tf/idf
  27. A great search platform. BM25F probabilistic weighting model for result ranking (best model for the last 25 years). Blazing fast: ~100ms search query time on millions of docs. "Simple" inverted index: • Transformation of any common formats (pdf/doc/html…) with Apache Tika (Solr Cell) • Natively multilingual tokenizer and language processing • Lots of features including: dynamic weighting, query rewriting, faceting…
  28. A customized BERT QA Secret sauce: => multitask training

  29. A customized BERT QA Secret sauce: => multitask training

  30. A great chatbot Platform Secret sauce: Conversation Driven Development

  31. Retriever/QA/dialog integration: question answering skill. Example question: "When do you extend the RAT manually?" Example answer: "In electrical emergency, when the RAT is not automatically deployed". Diagram components: Documents, Index, Search engine (top docs), QA engine with QA model (top answers), Dialog engine with Dialog model.
  32. User interface view

  33. Model training

  34. HUMAN EXPERIMENT

  35. Research protocol. Interactive experiments with REAL humans. Pros: - Only way to validate hypotheses H2 and H3 - More concrete feedback on the perception of system performance (is it really better?) - Humans are people. Cons: - Much more time consuming - Less definitive conclusion (sensitive perception). Automated/simulated user evaluations. Pros: - More data - Precise metrics - Reproducible scenarios and experiments. Cons: - How to simulate a pilot? A human? - What is measured? - Really reproducible?
  36. Research protocol. Interactive experiments with REAL humans • Triple check protocols in literature • Select subject population • Define controlled experimental conditions • Decide on scenarios/tasks • Define constraints (time, cognitive pressure…) => Apply for ethical approval • Plan • Test • Replan • Retest • ...
  37. Research protocol

  38. Results

  39. Results

  40. LESSONS LEARNED

  41. AI/ML/Data research path: Problem statement, Data collection, Model training, Results validation
  42. Timeline: Problem statement, Data collection, More model training, Research problem statement, Early tests, Prototyping, Evaluation protocol, More data, Results validation, Human experiments setup, Research literature review
  43. AI/ML/Data research path: Problem statement, Data collection, Model training, Results validation, Research literature review, Research problem statement, Early tests, Prototyping, More data, More model training, Evaluation protocol, Human experiments setup
  44. Acknowledgements. This study was funded by AIRBUS Central Research & Technology (and executed within a great team), with the support of the Aeronautical Computer Human Interaction Lab (ACHIL) of the École Nationale de l’Aviation Civile (ENAC) and of Dr Ying-Hsang LIU from the University of Southern Denmark (SDU), previously at the Australian National University (ANU).
  45. Thanks for listening. Gérard DUPONT: more than 10 years on research projects with NLP, search, ML, RL and data processing for Airbus. @ggdupont. Most illustrative pictures found on the web (but not the koala) - all credits to their respective authors.
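
Note on slides 26 and 27 (BM25 rather than tf/idf): the transcript gives no formula, so the following is only a minimal sketch of plain Okapi BM25 scoring, not the BM25F variant named on the slide and not the search platform's actual implementation. The function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, the whitespace tokenizer and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with plain Okapi BM25.

    Hypothetical helper for illustration only: whitespace tokenization,
    no field weighting (so this is BM25, not BM25F).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF (the "+ 1" inside the log keeps it positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (invented for this sketch, not taken from the PilotQA dataset).
docs = [
    "the rat is extended manually in electrical emergency",
    "pre-flight briefing checklist for the crew",
    "taxiing support and air traffic control communication",
]
scores = bm25_scores("rat manually", docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Unlike raw tf/idf, the term-frequency contribution here saturates as a term repeats and is normalized by document length, which is the usual argument for preferring BM25 in a retriever that feeds a QA model.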