#44 Conversational agents for the aeronautics domain

Conversational QA for pilots from research hypothesis to live human
experiments

Who we are? Central Research & Technology

Who we are?
- Alexandre: virtual assistant experience regarding architecture, NLP, S2T, planning & reasoning capabilities via RL
- Gérard: more than 10 years on research projects with NLP, search, ML, RL and data processing for Airbus DS. Joined in 2018
- Catherine: NLP expert; before joining in 2018, worked in many industries focusing on S2T and NMT
- François: adaptive human-machine interactions, ML, deep learning and big data computing expertise
- Pooja: classical ML & DL background with experience in NLP, knowledge extraction & representation

What is this talk about?

THE RESULTS

Results (published): Evaluation of Conversational Agents for Aerospace Domain. We are here!

THE CONTEXT

Incremental autonomy vision for aircraft: toward more autonomous aircraft (L1, L2, L3, L4, L5)
At Airbus, we are building certifiable, safe and secure autonomy systems and programmes to power the next generation of commercial aircraft applications. REDACTED

Future cockpit vision https://www.airbus.com/innovation/autonomous-and-connected/autonomous-flight.html

A chatbot for pilots?
- Current cockpit: cognitive workload split between the 2 pilots
- Future cockpit vision: high cognitive workload for a single pilot
- Future cockpit vision with a virtual assistant: reduced cognitive workload for the single pilot

TIMELINE

Timeline Problem statement Data collection Model training Results validation

Timeline Problem statement Data collection More model training Research problem
statement Early tests Prototyping Evaluation protocol More data Results validation Human experiments setup Research literature review

Industrial problem statement
Examples:
• Support for taxiing
• Pre-flight briefing
• In-flight troubleshooting
• Air traffic control communication
• …
Focus on access to documentation: pilot assistant as a “Smart Librarian”
Future cockpit vision: high cognitive workload for a single pilot; reduced cognitive workload for the single pilot with a virtual assistant

Literature review Conversational search … Radlinski & Craswell (2017, p.
120) https://www.felicecurcelli.net/blog/category/architecture-design

Literature review

Data collection
Learning by collecting…
- Internal technical documentation
- Pilot training documents
- FCOM workshop
- Engineering Hackathon
“Real” data is the data you have in hand

Research problem statement

Early tests

Data collection (again)
Learning by collecting…
- Internal technical documentation
- Pilot training documents
- FCOM workshop
- Engineering Hackathon
=> PilotQA dataset

THE PROTOTYPE

Quick system overview

A great search platform Secret sauce: => Use BM25 not
tf/idf

A great search platform
BM25F probabilistic weighting model for result ranking (best model for the last 25 years)
Blazing fast: ~100 ms search query time on millions of docs
"Simple" inverted index:
• Transformation of any common format (pdf/doc/html…) with Apache Tika (Solr Cell)
• Natively multilingual tokenizer and language processing
• Many features including: dynamic weighting, query rewriting, faceting…
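To make the BM25 recommendation concrete, here is a minimal sketch of plain Okapi BM25 scoring over a toy in-memory corpus. It is not the field-weighted BM25F variant named on the slide and not the platform's actual implementation; the class name `BM25Index`, the whitespace tokenizer, the sample documents and the parameters `k1=1.2`, `b=0.75` (common defaults) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, for illustration only.
    return text.lower().split()


class BM25Index:
    """Hypothetical toy index implementing plain Okapi BM25 (not BM25F)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1 = k1  # term-frequency saturation
        self.b = b    # strength of document-length normalization
        self.doc_tf = [Counter(tokenize(d)) for d in docs]
        self.doc_len = [sum(tf.values()) for tf in self.doc_tf]
        self.n_docs = len(docs)
        self.avg_len = sum(self.doc_len) / self.n_docs
        # Document frequency: number of documents containing each term.
        self.df = Counter()
        for tf in self.doc_tf:
            self.df.update(tf.keys())

    def idf(self, term):
        n = self.df.get(term, 0)
        # Smoothed, always non-negative idf.
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query, doc_id):
        tf = self.doc_tf[doc_id]
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                # Unlike raw tf-idf, the tf contribution saturates as f grows.
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query, top_k=3):
        scored = [(self.score(query, i), i) for i in range(self.n_docs)]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return scored[:top_k]


if __name__ == "__main__":
    index = BM25Index([
        "extend the rat manually in an electrical emergency",
        "taxi checklist before takeoff",
        "pre-flight briefing notes for the crew",
    ])
    print(index.search("rat emergency"))
```

Compared with raw tf-idf, repeating a term many times in a document gives diminishing returns, and long documents are penalized relative to the average length, which is often the practical reason for preferring BM25.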

A customized BERT QA Secret sauce: => multitask training

A great chatbot Platform Secret sauce: Conversation Driven Development

Retriever/QA/dialog integration: question answering skill
Example question: "When do you extend the RAT manually?"
Example answer: "In electrical emergency, when the RAT is not automatically deployed"
Pipeline: Documents => Index => Search engine => Top docs => QA engine (QA model) => Top answers => Dialog engine (Dialog model)
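The retriever/QA/dialog chain on this slide can be sketched in a few lines. This is only an illustrative toy under stated assumptions: word overlap stands in for the search engine and for the BERT QA model, and the function names (`retrieve`, `extract_answers`, `reply`), the `Answer` type, the `min_score` threshold and the placeholder documents are all hypothetical, not the prototype's code.

```python
import re
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    score: float
    source: str


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Toy stand-in for the search engine: rank documents by word overlap."""
    q = words(question)
    scored = [(len(q & words(text)), doc_id) for doc_id, text in documents.items()]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def extract_answers(question, doc_ids, documents):
    """Toy stand-in for the QA model: best-overlapping sentence per document."""
    q = words(question)
    answers = []
    for doc_id in doc_ids:
        sentences = re.split(r"(?<=[.!?])\s+", documents[doc_id])
        best = max(sentences, key=lambda s: len(q & words(s)))
        answers.append(Answer(best, len(q & words(best)), doc_id))
    answers.sort(key=lambda a: (-a.score, a.source))
    return answers


FALLBACK = "Sorry, I could not find that in the documentation."


def reply(question, documents, min_score=2):
    """Toy dialog engine: return the top answer, or fall back when unsure."""
    answers = extract_answers(question, retrieve(question, documents), documents)
    if not answers or answers[0].score < min_score:
        return FALLBACK
    return f"{answers[0].text} (source: {answers[0].source})"


# Placeholder documents; the first sentence is adapted from the slide's example.
DOCS = {
    "elec": "Extend the RAT manually in electrical emergency, when the RAT is "
            "not automatically deployed. Placeholder sentence about something else.",
    "taxi": "Placeholder note about taxiing.",
}

if __name__ == "__main__":
    print(reply("When do you extend the RAT manually?", DOCS))
```

Keeping the three stages behind separate functions mirrors the slide: each stage can be replaced (a real index, a trained QA model, a dialog policy) without touching the others.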

User interface view

Model training

HUMAN EXPERIMENT

Research protocol
Interactive experiments with REAL humans
Pros:
- Only way to validate hypotheses H2 and H3
- More concrete feedback on the perception of system performance (is it really better?)
- Humans are people
Cons:
- Much more time-consuming
- Less definitive conclusions (sensitive perception)
Automated/simulated user evaluations
Pros:
- More data
- Precise metrics
- Reproducible scenarios and experiments
Cons:
- How to simulate a pilot? A human?
- What is measured?
- Really reproducible?

Research protocol
Interactive experiments with REAL humans
• Triple check protocols in literature
• Select subject population
• Define controlled experimental conditions
• Decide on scenarios/tasks
• Define constraints (time, cognitive pressure…)
=> Apply for ethical approval
• Plan
• Test
• Replan
• Retest
• ...

Research protocol

Results

LESSONS LEARNED

AI/ML/Data research path Problem statement Data collection Model training Results
validation

Timeline Problem statement Data collection More model training Research problem
statement Early tests Prototyping Evaluation protocol More data Results validation Human experiments setup Research literature review

AI/ML/Data research path Problem statement Data collection Model training Results
validation Research literature review Research problem statement Early tests Prototyping More data More model training Evaluation protocol Human experiments setup

Acknowledgements
This study was funded by AIRBUS Central Research & Technology (and executed within a great team), with support from the Aeronautical Computer Human Interaction Lab (ACHIL) of the Ecole Nationale de l’Aviation Civile (ENAC) and from Dr Ying-Hsang LIU of the University of Southern Denmark (SDU), previously at the Australian National University (ANU).

Thanks for listening
Gérard DUPONT (@ggdupont): more than 10 years on research projects with NLP, search, ML, RL and data processing for Airbus
Most illustrative pictures found on the web (but not the koala); all credits to their respective authors.
