PyLadies Dublin Aug Meetup: Text Classification using HuggingFace Transformers

We have Olga Minguett talking about "Text Classification using HuggingFace Transformers" and Sahana Hegde talking about "PySpark 101: Tips and Tricks".

Big thanks to Optum for partnering with us and for having Olga and Sahana give their talks.

👉 Event Page: https://www.meetup.com/PyLadiesDublin/events/279318562/

🎤 TALKS
=========
TALK 1: Text Classification using HuggingFace Transformers
----------------------------------------------------------------------------------
An introduction to HuggingFace Transformers in two sections. Theory: what it is, how to use it, and the datasets and tasks you can perform with it. Practice: a worked example of text classification using HuggingFace Transformers.

ABOUT OLGA: Olga Minguett is a Master’s in Artificial Intelligence student with interest in AI in Healthcare. She currently works as a data scientist for a technology and healthcare services company part of UnitedHealth Group. https://Linkedin.com/in/olgaminguett

TALK 2: PySpark 101: Tips and Tricks
--------------------------------------------------
In this session, I'd like to share a few tips and tricks that I've learnt over the years while using PySpark in my day-to-day activities by showing code snippets. These elements will help you create more efficient code that leads to better/faster results.

ABOUT SAHANA: I am a Data Scientist working with UnitedHealth Group during office hours, and I'm a passionate cook and yoga enthusiast outside. I love to travel in my free time and use my phone's lens to capture beautiful moments. https://www.linkedin.com/in/sahana-hegde

❤️ A BIG THANK YOU
====================
I'd like to thank all those who have been attending and watching our videos; we appreciate your support. It took a lot of work to set up the streaming, and if you are curious, you can read Vicky's post about it: https://dev.to/pyladiesdub/live-streaming-from-zoom-meet-via-obs-to-youtube-2l3h. Any feedback would be helpful to make this process smoother and easier to manage. 🥰

📢 CALL FOR SPEAKERS for 2021 (from Sep onwards)
=========================================
Interested in speaking at our upcoming meetups? Please submit your talk details to: https://pyladiesdublin.typeform.com/to/VvW3iME6

If you have referrals of speakers you would like us to invite, let us know too. Being a virtual event removes the usual boundaries, so we can invite speakers from further afield than Ireland. 😊

🤔 QUESTIONS
==============
Email dublin@pyladies.com.

PyLadies Dublin

August 17, 2021
Transcript

  1. Hello! I am Olga Minguett. I am a Data Scientist, I work with Optum, and I am a Master's in AI student.
  2. © 2021 Optum, Inc. All rights reserved. Confidential property of Optum. Do not distribute or reproduce without express permission from Optum.
  3. Olga Minguett 17 August 2021 Text Classification using Huggingface Transformers

  4. Theory Huggingface Transformers 1

  5. Key terms
     • Natural Language Processing (NLP): the ability of a computer program to process and understand human language as it is spoken and written.
     • Natural Language Understanding (NLU): enables computers to interpret human language using syntactic and semantic analysis of text and speech.
     • Natural Language Generation (NLG): generates written or spoken human language from structured data so the system can respond.
     Examples: customer feedback analysis, automatic translation, email classification, text summarisation.
  6. What is … Huggingface Transformers?
     A library that contains state-of-the-art pretrained models for NLP to perform tasks on text such as: feature extraction, fill-mask, named entity recognition, question answering, sentiment analysis, summarisation, text generation, and translation.
     • APIs to download and use those pretrained models on a given text, and fine-tune them on your own datasets.
     • Each module defining an architecture is fully standalone and can be modified to enable research experiments.
     • Integration with JAX, PyTorch and TensorFlow.
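The `pipeline` API described on this slide can be tried in a couple of lines. A minimal sketch (assumes `transformers` and a backend such as PyTorch are installed; the example sentence is my own, and the first call downloads a default checkpoint):

```python
from transformers import pipeline

# "sentiment-analysis" loads a default pretrained checkpoint on first use
classifier = pipeline("sentiment-analysis")

result = classifier("PyLadies Dublin meetups are brilliant!")
print(result[0]["label"], round(result[0]["score"], 3))
```

Swapping the task string (e.g. `"summarization"`, `"question-answering"`) is enough to exercise the other tasks listed above.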
  7. What is … Huggingface Transformers?
     Overview of the pretrained Transformer models:
     • GPT-like (also called auto-regressive Transformer models)
     • BERT-like (also called auto-encoding Transformer models)
     • BART/T5-like (also called sequence-to-sequence Transformer models)
     Transformers are language models:
     • The general model is trained on large amounts of raw text in a self-supervised fashion, where the objective is computed automatically from the model's inputs, without human-annotated labels.
     • The task-specific model then goes through transfer learning, where it is fine-tuned in a supervised fashion, with human-annotated labels.
  8. What is … Huggingface Transformers?
     • Decoders (GPT-2). Tasks: causal language modeling, natural language generation (NLG).
     • Encoders (BERT). Tasks: sequence classification, question answering, masked language modeling (NLU).
     • Sequence-to-sequence (BART/T5). Tasks: translation, summarisation (NLU/NLG).
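The three architecture families on this slide map directly onto different pipeline tasks. A sketch with one representative checkpoint per family (the example inputs are my own; each model downloads on first use):

```python
from transformers import pipeline

# Encoder (BERT): masked language modeling
fill = pipeline("fill-mask", model="bert-base-uncased")
masked = fill("Dublin is the capital of [MASK].")

# Decoder (GPT-2): causal language modeling / generation
gen = pipeline("text-generation", model="gpt2")
generated = gen("Natural language processing is", max_new_tokens=10)

# Sequence-to-sequence (T5): translation
trans = pipeline("translation_en_to_fr", model="t5-small")
translated = trans("The meetup starts at seven.")

print(masked[0]["token_str"])
print(generated[0]["generated_text"])
print(translated[0]["translation_text"])
```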
  9. Pre-requisites
     • Tasks (T): describe the use-cases that can be performed with different model configurations. 22 tasks.
     • Libraries (L): collections of resources used to optimise tasks. 22 libraries.
     • Datasets (D): collections of data used for a task and to fine-tune the models. 1,178 datasets.
     • Models (M): pretrained transformer language models. 13,524 models.
  10.–11. (image-only slides)
  12. Practice Huggingface Transformers 2

  13. Google Colab
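The Colab notebook itself is not reproduced in the transcript, but the text-classification example it covers typically looks like the following sketch: load a checkpoint already fine-tuned for sentiment, tokenize, and map the highest-scoring logit back to a label (the checkpoint name and input sentence are my assumptions, not taken from the notebook):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# A DistilBERT checkpoint already fine-tuned for binary sentiment classification
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Tokenize the input text and run a forward pass without tracking gradients
inputs = tokenizer("I really enjoyed this talk!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to a human-readable label
pred = logits.argmax(dim=-1).item()
label = model.config.id2label[pred]
print(label)
```

For fine-tuning on your own labels, the same `AutoModelForSequenceClassification` class is combined with the library's `Trainer` API.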
  14. Any questions? You can find me at:
     • Linkedin.com/in/olgaminguett/
     • olgaminguett@gmail.com
     • olga_minguett@optum.com
     Thanks!
  15. References
     Website: https://huggingface.co/
     GitHub: https://github.com/huggingface
     Attention Is All You Need: https://arxiv.org/abs/1706.03762
  16. Credits • Photographs by Paramount Pictures and DreamWorks