Unlock NER for Sensitive Data with NLP - GDSC 2023

Today we’re going to learn about Natural Language Processing (NLP), a technology that teaches machines to understand human language.
Specifically, we will learn how to use an NLP model to teach a machine to recognize when a person shares potentially sensitive information in a consumer or enterprise system.

Noble Ackerson

May 20, 2023

Transcript

  1. NLP for Sensitive Data
    Sensitive Text Detection with Custom Natural Language Processing (NLP) Models
    Noble Ackerson, Applied AI Product Lead, Former Google Developers Expert for Product Strategy
    Entity labels shown: NORP, DATE, ORG, GPE, MONEY, GPE, PERSON, QUANTITY
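Once entity spans like the labels above are available, one simple way to act on sensitive text is to mask it. The following is a minimal sketch, assuming the spans arrive as `(start_char, end_char, label)` tuples from some NER model; the `redact` helper and the example sentence are hypothetical and are not taken from the deck.

```python
from typing import List, Tuple

# Assumed span format: (start_char, end_char, label), end exclusive.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each labeled character span with a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one that was already masked.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hypothetical example: the spans are written by hand, not produced by a model.
text = "Noble paid $40 to Google on May 20, 2023."
spans = [(0, 5, "PERSON"), (11, 14, "MONEY"), (18, 24, "ORG"), (28, 40, "DATE")]
redacted = redact(text, spans)
print(redacted)
```

Masking with the label name keeps the sentence readable for downstream review while hiding the underlying value.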
  2. What is Natural Language Processing (NLP)?
    Input
      ❏ Natural Language
      ❏ Text
      ❏ Speech
    Process
      ❏ Text representation
      ❏ Machine Learning Models
      ❏ Generative
      ❏ Semantic meaning
      ❏ Contextual understanding
    Output
      ❏ General text classification
      ❏ Entity extraction
      ❏ Machine translation
      ❏ Question answering
      ❏ Embeddings & Semantic Search
      ❏ Conversational machine interaction
      ❏ …
    Generative AI (NLU with LLMs)
  3. Process
    Step 1: Identifying the right NLP Use Cases
    Step 2: Annotate data with the help of Generative AI
    Step 3: Train a custom Named Entity Recognition (NER) model & refine
    Step 4: Evaluate and test the custom NER model
    Step 5: Deploy and integrate
    Today’s Tools
      • Google Colab (shared)
      • spaCy NLP library
      • Google Cloud Platform
  4. Production grade ML/NLP
    Components shown: DATA MANAGEMENT; DATA COLLECTION; EXPLORATION & ANALYSIS TOOLS; NLP CODE; FEATURE ENGINEERING (Labeling, Annotations); MODEL TRAINING AT SCALE; AUTOMATION; LOGGING & MANAGEMENT; MONITORING; SERVING INFRASTRUCTURE; NEEDS ANALYSIS
    Adapted from: “Hidden Technical Debt in Machine Learning Systems”, D. Sculley et al., Google
  5. Identifying the right NLP Use Cases
    Automation
      • User doesn’t know how to do something
      • User can’t do something
      • Task is boring, repetitive, or dangerous
    Augmentation
      • User feels responsible for task
      • High stakes situation
      • Complicated personal preferences
    If Machine Learning and NLP are needed, which type is best?
  6. Process
    Step 1: Identifying the right NLP Use Cases
    Step 2: Annotate data with the help of Generative AI
    Step 3: Train a custom NER model & refine
    Step 4: Evaluate and test the custom NER model
    Step 5: Deploy and integrate
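The annotation step usually ends with labeled spans that have to be aligned to tokens before a token-classification model can train on them. Below is a minimal sketch of that alignment using BIO tags. It assumes character-offset annotations of the form `(start, end, label)` and plain whitespace tokenization; both are simplifications chosen for illustration, and a real pipeline would use the tokenizer of its NLP library.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # assumed format: (start_char, end_char, label)


def whitespace_tokens(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and keep each token's character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag each token B-LABEL, I-LABEL, or O from character-offset spans."""
    tagged = []
    for tok, start, end in whitespace_tokens(text):
        tag = "O"
        for s, e, label in spans:
            # Only tokens fully inside a span are tagged; partial overlaps stay "O".
            if start >= s and end <= e:
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged


# Hypothetical annotated sentence, written by hand for this sketch.
text = "Send the invoice to Jane Doe at Acme Corp"
spans = [(20, 28, "PERSON"), (32, 41, "ORG")]
bio = to_bio(text, spans)
for token, tag in bio:
    print(f"{token}\t{tag}")
```

Tokens that straddle a span boundary (for example, a name followed directly by a comma) are left as `O` here, which is one reason annotation tools report misaligned spans instead of silently dropping them.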
  7. Teacher: Class, pay attention
    Transformer Models: A brief bit about how NLU, LLM, GenAI play into NER workflow
    Diagram labels: Deep Learning; Machine Learning; Natural Language Processing (NLP); Large Language Models (LLM); Artificial Intelligence
  8. Process
    Step 1: Identifying the right NLP Use Cases
    Step 2: Annotate data with the help of Generative AI
    Step 3: Train a custom NER model & refine
    Step 4: Evaluate and test the custom NER model
    Step 5: Deploy and integrate
  9. Process
    Step 1: Identifying the right NLP Use Cases
    Step 2: Annotate data with the help of Generative AI
    Step 3: Train a custom NER model & refine
    Step 4: Evaluate and test the custom NER model
    Step 5: Deploy and integrate
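For the evaluate-and-test step, NER models are commonly scored with entity-level precision, recall, and F1, where a prediction counts only if its boundaries and label both match the gold annotation. The sketch below assumes that exact-match convention and the same `(start, end, label)` span tuples; the gold and predicted spans are invented for illustration.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # assumed format: (start_char, end_char, label)


def precision_recall_f1(gold: Iterable[Span], pred: Iterable[Span]):
    """Exact-match entity scoring: boundaries and label must all agree."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: one correct entity, one with a wrong boundary, one spurious.
gold = [(20, 28, "PERSON"), (32, 41, "ORG")]
pred = [(20, 28, "PERSON"), (32, 36, "ORG"), (0, 4, "ORG")]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For sensitive-data detection, recall often deserves the closer look, since a missed entity is information that leaks through.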
  10. Legal Contract Analysis
    Custom Entities for Legal Documents
    Labels shown: RIGHTS CLAUSE CONDITION GPE JURISDICTION PARTIES CONTRACT DATE
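Custom labels such as these have no pretrained model behind them, so a common first step is to bootstrap annotations from a phrase list and correct them by hand before training. The following is a minimal sketch of that idea using only the standard library. It is a rule-based stand-in, not a trained NER model, and the phrase list, company names, and sentence are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical phrase list mapping exact phrases to custom labels.
PHRASES: Dict[str, str] = {
    "State of Delaware": "JURISDICTION",
    "Acme Corp": "PARTIES",
    "Globex LLC": "PARTIES",
}


def find_entities(text: str, phrases: Dict[str, str] = PHRASES) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for every exact phrase match, sorted by position."""
    spans = []
    for phrase, label in phrases.items():
        for match in re.finditer(re.escape(phrase), text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


text = (
    "This agreement between Acme Corp and Globex LLC "
    "is governed by the laws of the State of Delaware."
)
entities = [(text[start:end], label) for start, end, label in find_entities(text)]
for surface, label in entities:
    print(f"{label}: {surface}")
```

Spans produced this way can seed a training set for the custom labels, with a trained model later generalizing beyond the exact phrases.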
  11. Good luck and thank you!
    Noble Ackerson, AI Product Lead, GDE Alumni
    medium.com/@nobleackerson
    youtube.com/c/nobleackerson
    Resources
      • What is Natural Language Processing? [Google Cloud]
      • Natural Language Processing on Google Cloud [Cloud Skills Boost]
      • TensorFlow Models NLP Library [tensorflow.org]
  12. Token Classification with Custom NER models
    Stages: Training; Serving
    Diagram labels: Inputs; Preprocess; Labeling & Annotation; Training & Validation Datasets; Train/Tune Model; Trained Models; Deploy API and Versioning; Prediction Clients (Online Systems); REST API call with input variables; Preprocess & Feature Creation
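The serving side of this diagram has a prediction client sending a REST call with input variables to a versioned model. As a rough sketch of what that exchange could look like, the code below builds a JSON request body and parses a JSON response. The field names (`model_version`, `instances`, `predictions`, `entities`) are assumptions made for this example and do not describe any specific Google Cloud or spaCy API; no network call is made.

```python
import json
from typing import List, Tuple


def build_request(text: str, model_version: str = "v1") -> str:
    """Serialize one input text into a hypothetical prediction request body."""
    payload = {"model_version": model_version, "instances": [{"text": text}]}
    return json.dumps(payload)


def parse_response(body: str) -> List[Tuple[int, int, str]]:
    """Extract (start, end, label) spans from a hypothetical response body."""
    data = json.loads(body)
    entities = data["predictions"][0]["entities"]
    return [(e["start"], e["end"], e["label"]) for e in entities]


# Hand-written stand-in for what a deployed model might return.
sample_response = json.dumps(
    {"predictions": [{"entities": [{"start": 0, "end": 5, "label": "PERSON"}]}]}
)

request_body = build_request("Noble shared an account number.")
spans = parse_response(sample_response)
print(request_body)
print(spans)
```

Sending the model version with each request is one way to keep clients pinned to a known model while a newer one is evaluated.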