
Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware Transformer Models

Emory NLP

July 08, 2021


Transcript

  1. Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware

    Transformer Models Changmao Li, Elaine Fisher, Rebecca S. Thomas, Stephen Pittard, Vicki Hertzberg, and Jinho Choi Emory NLP
  2. Dataset Source: resumes of Clinical Research Coordinator (CRC) applicants. Each resume carries

    two kinds of annotations: 1. The levels the applicant applied for (an applicant can apply for multiple levels). 2. The level the applicant is qualified for, annotated by human experts with measured annotation agreement. There are four levels: CRC1, CRC2, CRC3, and CRC4; if a resume cannot match any level, it is annotated as Not Qualified (NQ). In addition, there is a job description for each level.
  3. Dataset Preprocessing: The original resume files are in DOC or

    PDF format; they are parsed with document-conversion tools, split into 6 sections, and stored as JSON for convenient use. The existence ratio of each section across the CRC levels
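To make the preprocessed format concrete, below is a minimal sketch of what one JSON record could look like. The section names and field names here are illustrative assumptions, not taken from the dataset itself.

```python
# Hypothetical example of one preprocessed resume record; the six section
# names and the annotation fields shown here are illustrative assumptions.
import json

record = {
    "applicant_id": "A0001",                      # anonymized identifier (assumed)
    "sections": {                                 # six parsed resume sections (names assumed)
        "education": "...",
        "work_experience": "...",
        "research_experience": "...",
        "skills": "...",
        "certifications": "...",
        "publications": "...",
    },
    "applied_levels": ["CRC1", "CRC2"],           # annotation 1: levels applied for
    "qualified_level": "CRC1",                    # annotation 2: expert-annotated level (or "NQ")
}

print(json.dumps(record, indent=2))
```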
  4. Dataset Annotation: Two experts with experience in recruiting applicants for

    CRC positions at all levels designed the annotation guidelines over 5 rounds of labeling resumes. Kappa scores were measured for inter-annotator agreement (ITA) during the five rounds of guideline development.
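For reference, inter-annotator agreement of this kind is commonly measured with Cohen's kappa; a minimal sketch with scikit-learn (the labels below are made up):

```python
# Cohen's kappa between two annotators over the same resumes; labels are made up.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["CRC1", "CRC2", "NQ",   "CRC3", "CRC2"]
annotator_2 = ["CRC1", "CRC2", "CRC1", "CRC3", "CRC2"]
print(cohen_kappa_score(annotator_1, annotator_2))  # agreement on these toy labels
```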
  5. Tasks Two novel tasks are proposed for this new dataset:

    1. Multiclass classification (5 classes): given a resume, decide which level of CRC position the applicant is suitable for (the resume is the input; annotation 2 is the gold output). 2. Binary classification: given a resume and the job description of a CRC level, decide whether the applicant is suitable for that particular level (the resume and the job descriptions of the levels applied for are the input; annotations 1 and 2 are combined to derive the binary gold output).
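A minimal sketch of how the binary gold labels could be derived from the two annotations, reusing the hypothetical record format sketched earlier. The rule used here (a pair is positive when the applied level equals the expert-annotated qualified level) is an interpretation, not necessarily the authors' exact labeling rule.

```python
# Sketch of building (resume, job description) pairs with binary labels.
# Assumption: a pair is positive iff the applied level equals the expert-annotated
# qualified level; the paper's actual combination rule may differ.
def build_matching_pairs(record, job_descriptions):
    pairs = []
    for level in record["applied_levels"]:                 # annotation 1
        label = int(level == record["qualified_level"])    # annotation 2
        pairs.append((record["sections"], job_descriptions[level], label))
    return pairs
```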
  6. Approaches Strategies when applying baseline models • Section Trimming for

    baseline models, due to the input length limitation of transformer encoders (figure panels: Task 1, Task 2)
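A minimal sketch of section trimming, assuming it means giving each section a proportional token budget so the concatenated resume fits the encoder's maximum length (512 tokens here); the paper's exact trimming policy may differ.

```python
# Sketch of section trimming: allocate each section a proportional share of the
# encoder's token budget so the whole resume fits. The budget policy is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def trim_sections(sections, max_tokens=512):
    tokenized = {name: tokenizer.tokenize(text) for name, text in sections.items()}
    total = sum(len(toks) for toks in tokenized.values()) or 1
    trimmed = []
    for name, toks in tokenized.items():
        budget = max(1, int(max_tokens * len(toks) / total))
        trimmed.extend(toks[:budget])
    return tokenizer.convert_tokens_to_string(trimmed[:max_tokens])
```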
  7. Approaches Proposed models for the binary classification task: the

    context-aware models use chunk segmenting + section encoding + job description embedding, with multi-head attention between the resume and the job description
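A minimal PyTorch sketch of multi-head attention between resume representations and a job description embedding; the dimensions, pooling, and classifier head below are assumptions, not the paper's exact architecture.

```python
# Sketch of resume-to-job-description multi-head attention (dimensions assumed).
import torch
import torch.nn as nn

class ResumeJobMatcher(nn.Module):
    def __init__(self, hidden=768, heads=8):
        super().__init__()
        # Resume chunk/section encodings attend to the job description embedding.
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.classifier = nn.Linear(hidden, 2)     # suitable vs. not suitable

    def forward(self, resume_states, job_desc_emb):
        # resume_states: (batch, num_chunks, hidden); job_desc_emb: (batch, 1, hidden)
        attended, _ = self.cross_attn(resume_states, job_desc_emb, job_desc_emb)
        pooled = attended.mean(dim=1)              # simple mean pooling (assumption)
        return self.classifier(pooled)

logits = ResumeJobMatcher()(torch.randn(2, 10, 768), torch.randn(2, 1, 768))
```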
  8. Approaches Strategies when applying the proposed models • Section Pruning for the proposed

    “encoding by sections” models, in case a single section exceeds the input length of the transformer encoder
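A minimal sketch of section pruning, assuming it simply means shortening any single section that would overflow the encoder's input length on its own; the paper's pruning heuristic may be more selective than plain head truncation.

```python
# Sketch of section pruning: cut a section down when it alone would exceed the
# transformer encoder's input length. Head truncation is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def prune_section(section_text, max_tokens=512):
    tokens = tokenizer.tokenize(section_text)
    if len(tokens) <= max_tokens:
        return section_text                        # already fits the encoder
    return tokenizer.convert_tokens_to_string(tokens[:max_tokens])
```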
  9. Analysis on Section Pruning (in Appendix) Section lengths before section

    pruning Section lengths after section pruning
  10. Experiments Data split for the multiclass classification task (label distributions preserved):

    Data statistics for the competence-level classification task
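A minimal sketch of a label-stratified train/dev/test split with scikit-learn; the 80/10/10 ratios are an assumption, not the paper's reported split.

```python
# Label-stratified train/dev/test split; the 80/10/10 ratios are assumptions.
from sklearn.model_selection import train_test_split

def stratified_split(resumes, labels, seed=42):
    x_train, x_rest, y_train, y_rest = train_test_split(
        resumes, labels, test_size=0.2, stratify=labels, random_state=seed)
    x_dev, x_test, y_dev, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_dev, y_dev), (x_test, y_test)
```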
  11. Experiments Data split for the binary classification task (label and CRC

    distributions preserved, with no resume overlapping between the training set and the dev or test set): Data statistics for the resume-to-job description matching task
  12. Algorithm to split the dataset while avoiding overlaps between the training and

    evaluation sets (in Appendix). The key idea: 1. Split the data by the targeted label distributions, but with a smaller initial training-set ratio than the target. 2. If any applicants overlap across splits, move all of their overlapping instances into the training set; the training-set ratio then grows close to the targeted ratio while the label distributions are still largely preserved.
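A minimal sketch of this two-step procedure; the field names ("applicant_id", "label") and the initial ratio are assumptions, not the authors' exact implementation.

```python
# Sketch of the overlap-avoiding split: stratify with an under-filled training set,
# then move every overlapping applicant's instances into training.
import random
from collections import defaultdict

def split_no_overlap(examples, initial_train_ratio=0.7, seed=42):
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)

    train, evaluation = [], []
    # Step 1: stratify by label, deliberately under-filling the training set.
    for items in by_label.values():
        rng.shuffle(items)
        cut = int(len(items) * initial_train_ratio)
        train.extend(items[:cut])
        evaluation.extend(items[cut:])

    # Step 2: move overlapping applicants entirely into training, which pushes the
    # training ratio up toward the target while roughly keeping label distributions.
    train_ids = {ex["applicant_id"] for ex in train}
    overlaps = [ex for ex in evaluation if ex["applicant_id"] in train_ids]
    evaluation = [ex for ex in evaluation if ex["applicant_id"] not in train_ids]
    train.extend(overlaps)
    return train, evaluation
```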
  13. Experiments Experimented Models

    Models for the competence-level classification task:
    • W^T: whole-context model + section trimming
    • P: context-aware model + section pruning
    • P⊕I: P + section encoding
    • C: context-aware model + chunk segmenting
    • C⊕I: C + section encoding
    Models for the resume-to-job description matching task:
    • W^TJ: whole context + sec./job_desc. trimming
    • P⊕I⊕J: P⊕I + job_desc. embedding
    • P⊕I⊕J⊕A: P⊕I⊕J + multi-head attention
    • P⊕I⊕J⊕A_E: P⊕I⊕J-E#
    • C⊕I⊕J: C⊕I + job_desc. embedding
    • C⊕I⊕J⊕A: C⊕I⊕J + multi-head attention
    • C⊕I⊕J⊕A_E: C⊕I⊕J-E#
  14. Experiments Analysis for the competence-level classification task. Confusion matrix for

    the best model of the competence-level classification task
  15. Experiments Analysis for the resume-to-job description matching task. Confusion matrix

    for the best model of the resume-to-job description matching task
  16. Error Analysis • The model is unable to identify clinical research experience.

    • It cannot identify dates of experience. • Adjacent CRC positions are hard to distinguish.
  17. Contributions • Introduced a new resume classification dataset. • Proposed

    two new tasks for this new dataset. • Proposed novel context-aware transformer approaches for two tasks. • Conducted experiments with several proposed models. • Conducted both quantitative and qualitative analysis for future improvements.