#44 Conversational Agents for the Aerospace Domain

Topic: Evaluation of Conversational Agents for Aerospace Domain

Speaker: Gérard Dupont, senior AI researcher at AIRBUS

What if we put a chatbot in a cockpit?

From the original research idea to actual live experiments in a cockpit... simulator, since the system is far from ready to fly. The objective of the talk is to show the trajectory of this research topic, from funding hypothesis to actual prototyping and results. We will dive into the technical details of the proposed system and how state-of-the-art approaches were adapted to a real industrial scenario and validated with humans.

The presentation relies mostly on material published at the CIRCLE 2020 conference, which follows a two-year effort to support pilots and their access to cockpit documentation.

Toulouse Data Science

September 30, 2020

Transcript

  1. Who are we?
    - Alexandre: virtual assistant experience regarding architecture, NLP, S2T, planning & reasoning capabilities via RL
    - Gérard: more than 10 years on research projects with NLP, search, ML, RL and data processing for Airbus DS. Joined in 2018
    - Catherine: NLP expert; before joining in 2018, worked in many industries focusing on S2T and NMT
    - François: adaptive human-machine interactions, ML, deep learning and big data computing expertise
    - Pooja: classical ML & DL background with experience in NLP, knowledge extraction & representation
  2. Incremental autonomy vision for aircraft
    Toward more autonomous aircraft (figure: levels L1 to L5, details REDACTED).
    At Airbus, we are building certifiable, safe and secure autonomy systems and programmes to power the next generation of commercial aircraft applications.
  3. A chatbot for pilots?
    - Current cockpit: cognitive workload split on the 2 pilots
    - Future cockpit vision: high cognitive workload for single pilot; with a Virtual Assistant, reduced cognitive workload for single pilot
  4. Timeline
    Problem statement; Data collection; More model training; Research problem statement; Early tests; Prototyping; Evaluation protocol; More data; Results validation; Human experiments setup; Research literature review
  5. Industrial problem statement
    Examples:
    • Support for taxiing
    • Pre-flight briefing
    • In flight troubleshooting
    • Air traffic control communication
    • …
    Focus on access to documentation: pilot assistant as a “Smart Librarian”
    Future cockpit vision: high cognitive workload for single pilot; with a Virtual Assistant, reduced cognitive workload for single pilot
  6. Literature review
    Conversational search … Radlinski & Craswell (2017, p. 120)
    https://www.felicecurcelli.net/blog/category/architecture-design
  7. Data collection
    Learning by collecting…
    - Internal technical documentation
    - Pilot training documents
    - FCOM workshop
    - Engineering Hackathon
    “Real” data is the data you have in hand
  8. Data collection (again)
    Learning by collecting…
    - Internal technical documentation
    - Pilot training documents
    - FCOM workshop
    - Engineering Hackathon
    => PilotQA dataset
  9. A great search platform
    BM25F probabilistic weighting model for result ranking (best model for the last 25 years)
    Blazing fast: ~100 ms search query time on millions of docs
    "Simple" inverted index:
    • Transformation of any common format (pdf/doc/html…) with Apache Tika (Solr Cell)
    • Natively multilingual tokenizer and language processing
    • Lots of features including: dynamic weighting, query rewriting, faceting…
  10. Retriever/QA/dialog integration
    Question answering skill (diagram): Documents, Index, Search engine, Top docs, QA engine (QA model), Top answers, Dialog engine (Dialog model)
    Example question: "When do you extend the RAT manually?"
    Example answer: "In electrical emergency, when the RAT is not automatically deployed"
  11. Research protocol
    Interactive experiments with REAL humans
    Pros:
    - Only way to validate hypotheses H2 and H3
    - More concrete feedback on the perception of system performance (is it really better?)
    - Humans are people
    Cons:
    - Much more time consuming
    - Less definitive conclusion (sensitive perception)
    Automated/simulated user evaluations
    Pros:
    - More data
    - Precise metrics
    - Reproducible scenarios and experiments
    Cons:
    - How to simulate a pilot? A human?
    - What is measured?
    - Really reproducible?
  12. Research protocol
    Interactive experiments with REAL humans
    • Triple check protocols in literature
    • Select subject population
    • Define controlled experimental conditions
    • Decide on scenarios/tasks
    • Define constraints (time, cognitive pressure…)
    => Apply for ethical approval
    • Plan
    • Test
    • Replan
    • Retest
    • ...
  13. Timeline
    Problem statement; Data collection; More model training; Research problem statement; Early tests; Prototyping; Evaluation protocol; More data; Results validation; Human experiments setup; Research literature review
  14. AI/ML/Data research path
    Problem statement; Data collection; Model training; Results validation; Research literature review; Research problem statement; Early tests; Prototyping; More data; More model training; Evaluation protocol; Human experiments setup
  15. Acknowledgements
    This study was funded by AIRBUS Central Research & Technology (and executed within a great team).
    With support from the Aeronautical Computer Human Interaction Lab (ACHIL) of the Ecole Nationale de l’Aviation Civile (ENAC), and from Dr Ying-Hsang LIU of the University of Southern Denmark (SDU), previously at the Australian National University (ANU).
  16. Thanks for listening
    Gérard DUPONT: more than 10 years on research projects with NLP, search, ML, RL and data processing for Airbus. @ggdupont
    Most illustrative pictures found on the web (but not the koala); all credits to their respective authors.
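
Slide 9 names BM25F as the ranking model behind the search platform. As a rough illustration of the idea only (not the platform's actual implementation), the sketch below scores documents with a simple fielded BM25: term frequencies are length-normalised per field, combined with per-field weights, then saturated with `k1`. The class name `BM25F`, the field names, the weights, the smoothed IDF variant and the three toy documents are all assumptions made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only (a deliberately naive tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25F:
    """Toy fielded BM25 scorer (hypothetical helper, not a library API)."""

    def __init__(self, docs, weights, k1=1.2, b=0.75):
        # docs: list of dicts mapping field name -> text
        # weights: dict mapping field name -> boost (assumed values)
        self.weights, self.k1, self.b = weights, k1, b
        self.n = len(docs)
        self.tf = [{f: Counter(tokenize(d.get(f, ""))) for f in weights} for d in docs]
        self.lens = [{f: sum(c[f].values()) for f in weights} for c in self.tf]
        self.avg_len = {
            f: (sum(l[f] for l in self.lens) / self.n) or 1.0 for f in weights
        }
        # Document frequency: number of docs containing the term in any field.
        self.df = Counter()
        for c in self.tf:
            self.df.update(set().union(*(c[f].keys() for f in weights)))

    def idf(self, term):
        # Smoothed, always non-negative IDF variant (an assumption of this sketch).
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, i):
        total = 0.0
        for term in tokenize(query):
            pseudo_tf = 0.0
            for f, w in self.weights.items():
                norm = 1 - self.b + self.b * self.lens[i][f] / self.avg_len[f]
                pseudo_tf += w * self.tf[i][f][term] / norm
            total += self.idf(term) * pseudo_tf / (self.k1 + pseudo_tf)
        return total

    def rank(self, query):
        # Indices of matching documents, best score first.
        scored = [(self.score(query, i), i) for i in range(self.n)]
        return [i for s, i in sorted(scored, key=lambda x: -x[0]) if s > 0]


# Invented toy documents, for illustration only.
docs = [
    {"title": "RAT manual extension",
     "body": "Extend the RAT manually in an electrical emergency."},
    {"title": "Taxi procedures",
     "body": "Checklist for ground operations before takeoff."},
    {"title": "Electrical emergency",
     "body": "Overview of the emergency electrical configuration; the RAT may deploy automatically."},
]
index = BM25F(docs, weights={"title": 2.0, "body": 1.0})
ranking = index.rank("RAT manual extension")
```

Here the title field is boosted over the body, so a document matching the query in its title ranks above one that only mentions a query term in its body, and documents with no matching term are dropped.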
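
Slide 10 describes a question answering skill in which a search engine returns top documents, a QA engine extracts top answers from them, and a dialog engine delivers the reply. The sketch below shows that chaining with trivial stand-ins: word overlap replaces the real search engine and QA model, and a fixed fallback sentence replaces the dialog model. Every function name (`search`, `answer`, `reply`, `ask`) and the toy documents are hypothetical; only the three-stage structure comes from the slide.

```python
import re


def tokens(text):
    # Set of lowercase alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(question, documents, top_k=2):
    """Search engine stand-in: rank documents by word overlap with the question."""
    q = tokens(question)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return [d for d in ranked[:top_k] if q & tokens(d)]


def answer(question, top_docs, top_k=1):
    """QA engine stand-in: pick the sentences that best overlap the question."""
    q = tokens(question)
    sentences = [
        s.strip()
        for d in top_docs
        for s in re.split(r"(?<=[.!?])\s+", d)
        if s.strip()
    ]
    sentences.sort(key=lambda s: len(q & tokens(s)), reverse=True)
    return sentences[:top_k]


def reply(top_answers):
    """Dialog engine stand-in: turn the top answers into one utterance."""
    if not top_answers:
        return "I could not find an answer in the documentation."
    return top_answers[0]


def ask(question, documents):
    # Chain the three stages: retrieve, extract, respond.
    return reply(answer(question, search(question, documents)))


# Invented toy documents, for illustration only.
documents = [
    "Extend the RAT manually in an electrical emergency, when it is not "
    "automatically deployed. The RAT is stowed during normal operations.",
    "Taxi checklist: verify the brakes before leaving the stand.",
]
response = ask("When do you extend the RAT manually?", documents)
```

Keeping each stage behind its own function means a stand-in can later be swapped for a real retriever, reader model or dialog manager without touching the others.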