PyLadies Dublin Aug Meetup: Text Classification using HuggingFace Transformers

We have Olga Minguett talking about "Text Classification using HuggingFace Transformers" and Sahana Hegde talking about "PySpark 101: Tips and Tricks".

Big thanks to Optum for partnering with us, and to Olga and Sahana for giving their talks.

👉 Event Page: https://www.meetup.com/PyLadiesDublin/events/279318562/

🎤 TALKS
=========
TALK 1: Text Classification using HuggingFace Transformers
----------------------------------------------------------------------------------
An introduction to HuggingFace Transformers in two sections. Theory: what it is, how to use it, and the datasets and tasks you can work with. Practice: a worked example of text classification using HuggingFace Transformers.

ABOUT OLGA: Olga Minguett is a Master's in Artificial Intelligence student with an interest in AI in Healthcare. She currently works as a data scientist for a technology and healthcare services company that is part of UnitedHealth Group. https://Linkedin.com/in/olgaminguett

TALK 2: PySpark 101: Tips and Tricks
--------------------------------------------------
In this session, I'd like to share a few tips and tricks that I've learnt over the years while using PySpark in my day-to-day activities by showing code snippets. These elements will help you create more efficient code that leads to better/faster results.
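As a flavour of the kind of snippet the session shows (this example is illustrative and not taken from Sahana's talk), two common PySpark efficiency habits are preferring built-in column functions over Python UDFs and caching DataFrames that are reused:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("pyspark-101-sketch").getOrCreate()
    df = spark.createDataFrame([("alice", 34), ("bob", 45)], ["name", "age"])

    # Built-in column functions run inside the JVM and avoid the
    # serialisation overhead of an equivalent Python UDF.
    df = df.withColumn("name_upper", F.upper(F.col("name")))

    # Cache a DataFrame that several downstream actions will reuse.
    df.cache()
    df.show()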

ABOUT SAHANA: I am a Data Scientist working with UnitedHealth Group during office hours, and I'm a passionate cook and yoga enthusiast outside. I love to travel in my free time and use my phone's lens to capture beautiful moments. https://www.linkedin.com/in/sahana-hegde

❤️ A BIG THANK YOU
====================
I'd like to thank all those who have been attending and watching our videos. We appreciate your support; it took a lot of work to set up the live stream, and if you are curious you can read Vicky's post about it: https://dev.to/pyladiesdub/live-streaming-from-zoom-meet-via-obs-to-youtube-2l3h. Any feedback would be helpful to make this process smoother and easier to manage. 🥰

📢 CALL FOR SPEAKERS for 2021 (from Sep onwards)
=========================================
Interested in speaking at one of our upcoming meetups? Please submit your talk details to: https://pyladiesdublin.typeform.com/to/VvW3iME6

If there are speakers you would like to refer or have us invite, let us know as well. Being a virtual event makes it easier to invite speakers from further afield than Ireland. 😊

🤔 QUESTIONS
==============
Email [email protected].

PyLadies Dublin

August 17, 2021

Transcript

  1. I am Olga Minguett
    I am a Data Scientist
    I work with Optum
    I am a Master's in AI student
    Hello!

  3. Olga Minguett
    17 August 2021
    Text Classification using Huggingface Transformers

  4. Theory (Part 1)
    Huggingface Transformers

  5. Key terms
    Natural Language Processing (NLP): the ability of a computer program to process and understand human language as it is spoken and written.
    Natural Language Understanding (NLU): enables computers to interpret human language using syntactic and semantic analysis of text and speech.
    Natural Language Generation (NLG): generates written or spoken human language from structured data generated by the system in order to respond.
    Examples:
    ● customer feedback analysis
    ● automatic translation
    ● email classification
    ● text summarisation

  6. What is … Huggingface Transformers?
    A library of state-of-the-art pretrained models for NLP, used to perform tasks on texts such as:
    • Feature Extraction
    • Fill-Mask
    • Named Entity Recognition
    • Question Answering
    • Sentiment Analysis
    • Summarisation
    • Text Generation
    • Translation
    Key features:
    • APIs to download and use those pretrained models on a given text, and to fine-tune them on your own datasets.
    • Each module defining an architecture is fully standalone and can be modified to enable research experiments.
    • Integration with Jax, PyTorch and TensorFlow.

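    A minimal sketch of how these tasks look in code with the pipeline API (the task choice and example sentence are illustrative, not from the slides):

        from transformers import pipeline

        # With no model name given, the library downloads a default
        # pretrained checkpoint for the task on first use.
        classifier = pipeline("sentiment-analysis")
        print(classifier("PyLadies Dublin meetups are great!"))
        # e.g. [{'label': 'POSITIVE', 'score': 0.99}]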

  7. What is … Huggingface Transformers?
    Overview of the pretrained Transformer models:
    • GPT-like (also called auto-regressive Transformer models)
    • BERT-like (also called auto-encoding Transformer models)
    • BART/T5-like (also called sequence-to-sequence Transformer models)
    Transformers are language models:
    • The general (pretrained) model is trained on large amounts of raw text in a self-supervised fashion, where the training objective is computed automatically from the inputs of the model, without human-annotated labels.
    • The task-specific model is then obtained through transfer learning: the pretrained model is fine-tuned in a supervised fashion, with human-annotated labels.

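    A minimal sketch of that transfer-learning step: start from a self-supervised pretrained checkpoint and attach a classification head to be fine-tuned on labelled data (the checkpoint name and num_labels=2 below are illustrative assumptions):

        from transformers import AutoTokenizer, AutoModelForSequenceClassification

        checkpoint = "bert-base-uncased"  # example BERT-like pretrained checkpoint
        tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        # Adds a randomly initialised classification head on top of the
        # pretrained encoder; the head is what gets fine-tuned with labels.
        model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

        inputs = tokenizer("HuggingFace makes transfer learning easy", return_tensors="pt")
        outputs = model(**inputs)
        print(outputs.logits.shape)  # torch.Size([1, 2]) from the untrained head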

  8. What is … Huggingface Transformers?
    • Decoders (GPT-2)
      Tasks: Causal Language Modeling, Natural Language Generation
    • Encoders (BERT)
      Tasks: Sequence Classification, Question Answering, Masked Language Modeling, Natural Language Understanding
    • Sequence-to-Sequence (BART/T5)
      Tasks: Translation, Summarization, NLU / NLG

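    A minimal sketch mapping the three families to pipeline tasks (the checkpoint names are common public examples, not taken from the slides):

        from transformers import pipeline

        generator = pipeline("text-generation", model="gpt2")          # decoder
        unmasker = pipeline("fill-mask", model="bert-base-uncased")    # encoder
        summarizer = pipeline("summarization", model="t5-small")       # sequence-to-sequence

        print(generator("Transformers are", max_length=20)[0]["generated_text"])
        print(unmasker("Dublin is the capital of [MASK].")[0]["token_str"])
        print(summarizer("HuggingFace Transformers is a library of pretrained models "
                         "for natural language processing tasks such as translation "
                         "and summarisation.", max_length=20, min_length=5)[0]["summary_text"])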

  9. Pre-Requisites
    • Task: describes the use-cases that can be performed over different model configurations (22 Tasks)
    • Libraries: collections of resources used to optimize tasks (22 Libraries)
    • Datasets: collections of data used for the task and to fine-tune the models (1178 Datasets)
    • Models: language pre-trained transformer models (13524 Models)

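    A minimal sketch of pulling one of those Hub datasets (the choice of imdb is illustrative, not from the slides):

        from datasets import load_dataset

        dataset = load_dataset("imdb")  # a text-classification dataset from the Hub
        print(dataset)                  # DatasetDict with train/test/unsupervised splits
        print(dataset["train"][0]["label"], dataset["train"][0]["text"][:80])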

  12. Practice (Part 2)
    Huggingface Transformers

  13. Google Colab

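    The Colab notebook itself is not part of this transcript; as a hedged sketch of what a text-classification fine-tune with the Trainer API typically looks like (the checkpoint, dataset and hyperparameters below are assumptions, not necessarily what the demo used):

        from datasets import load_dataset
        from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                                  Trainer, TrainingArguments)

        checkpoint = "distilbert-base-uncased"
        tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

        # Tokenize a labelled text-classification dataset from the Hub.
        dataset = load_dataset("imdb")
        tokenized = dataset.map(
            lambda batch: tokenizer(batch["text"], truncation=True), batched=True
        )

        args = TrainingArguments(output_dir="hft-text-classification",
                                 num_train_epochs=1,
                                 per_device_train_batch_size=16)
        trainer = Trainer(model=model,
                          args=args,
                          train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
                          eval_dataset=tokenized["test"].select(range(500)),
                          tokenizer=tokenizer)  # tokenizer enables dynamic padding
        trainer.train()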

  14. Any questions?
    You can find me at:
    • Linkedin.com/in/olgaminguett/
    [email protected]
    [email protected]
    Thanks!

  15. References
    Website https://huggingface.co/
    GitHub https://github.com/huggingface
    Attention is all you need https://arxiv.org/abs/1706.03762

  16. Credits
    • Photographs by Paramount Pictures and DreamWorks
