
Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware Transformer Models

Emory NLP

July 08, 2021


Transcript

  1. Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware

    Transformer Models Changmao Li, Elaine Fisher, Rebecca S. Thomas, Stephen Pittard, Vicki Hertzberg, and Jinho Choi Emory NLP
  2. Dataset Source: resumes of Clinical Research Coordinator (CRC) applicants. Each resume carries

    two kinds of annotations: 1. The levels the applicant applied for (an applicant can apply for multiple levels). 2. The level the applicant is qualified for, annotated by human experts with measured annotation agreement. There are four levels: CRC1, CRC2, CRC3, and CRC4; if a resume cannot match any level, it is annotated as Not Qualified (NQ). In addition, there is a job description for each level.
  3. Dataset Preprocessing: The original resume files are in DOC or

    PDF format; they are parsed with document-conversion tools, split into 6 sections, and stored as JSON for convenient use. The existence ratio of each section across the CRC levels
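To make the preprocessed format concrete, below is a minimal sketch of what one JSON record could look like. The section names and field names here are illustrative assumptions, not taken from the dataset itself.

```python
# Hypothetical example of one preprocessed resume record; the six section
# names and the annotation fields shown here are illustrative assumptions.
import json

record = {
    "applicant_id": "A0001",                      # anonymized identifier (assumed)
    "sections": {                                 # six parsed resume sections (names assumed)
        "education": "...",
        "work_experience": "...",
        "research_experience": "...",
        "skills": "...",
        "certifications": "...",
        "publications": "...",
    },
    "applied_levels": ["CRC1", "CRC2"],           # annotation 1: levels applied for
    "qualified_level": "CRC1",                    # annotation 2: expert-annotated level (or "NQ")
}

print(json.dumps(record, indent=2))
```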
  4. Dataset Annotation: Two experts with experience in recruiting applicants for

    CRC positions at all levels designed the annotation guidelines over 5 rounds of labeling resumes. Kappa scores were measured for inter-annotator agreement (ITA) during the five rounds of guideline development.
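For reference, inter-annotator agreement of this kind is commonly measured with Cohen's kappa; a minimal sketch with scikit-learn (the labels below are made up):

```python
# Cohen's kappa between two annotators over the same resumes; labels are made up.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["CRC1", "CRC2", "NQ",   "CRC3", "CRC2"]
annotator_2 = ["CRC1", "CRC2", "CRC1", "CRC3", "CRC2"]
print(cohen_kappa_score(annotator_1, annotator_2))  # agreement on these toy labels
```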
  5. Tasks Two novel tasks are proposed for this new dataset:

    1. Multiclass classification (5 classes): given a resume, decide which level of CRC position the applicant is suitable for (the resume is the input; annotation 2 is the gold output). 2. Binary classification: given a resume and the job description of a CRC level, decide whether the applicant is suitable for that particular level (the resume and the job descriptions of the levels applied for are the input; annotations 1 and 2 are combined to derive the binary gold output).
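A minimal sketch of how the binary gold labels could be derived from the two annotations, reusing the hypothetical record format sketched earlier. The rule used here (a pair is positive when the applied level equals the expert-annotated qualified level) is an interpretation, not necessarily the authors' exact labeling rule.

```python
# Sketch of building (resume, job description) pairs with binary labels.
# Assumption: a pair is positive iff the applied level equals the expert-annotated
# qualified level; the paper's actual combination rule may differ.
def build_matching_pairs(record, job_descriptions):
    pairs = []
    for level in record["applied_levels"]:                 # annotation 1
        label = int(level == record["qualified_level"])    # annotation 2
        pairs.append((record["sections"], job_descriptions[level], label))
    return pairs
```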
  6. Approaches Strategies when applying baseline models • Section Trimming for

    baseline models, due to the input length limitation of transformer encoders (figure panels: Task 1, Task 2)
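A minimal sketch of section trimming, assuming it means giving each section a proportional token budget so the concatenated resume fits the encoder's maximum length (512 tokens here); the paper's exact trimming policy may differ.

```python
# Sketch of section trimming: allocate each section a proportional share of the
# encoder's token budget so the whole resume fits. The budget policy is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def trim_sections(sections, max_tokens=512):
    tokenized = {name: tokenizer.tokenize(text) for name, text in sections.items()}
    total = sum(len(toks) for toks in tokenized.values()) or 1
    trimmed = []
    for name, toks in tokenized.items():
        budget = max(1, int(max_tokens * len(toks) / total))
        trimmed.extend(toks[:budget])
    return tokenizer.convert_tokens_to_string(trimmed[:max_tokens])
```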
  7. Approaches Proposed models for the binary classification task: the

    context-aware models use chunk segmenting + section encoding + job description embedding, with multi-head attention between the resume and the job description
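A minimal PyTorch sketch of multi-head attention between resume representations and a job description embedding; the dimensions, pooling, and classifier head below are assumptions, not the paper's exact architecture.

```python
# Sketch of resume-to-job-description multi-head attention (dimensions assumed).
import torch
import torch.nn as nn

class ResumeJobMatcher(nn.Module):
    def __init__(self, hidden=768, heads=8):
        super().__init__()
        # Resume chunk/section encodings attend to the job description embedding.
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.classifier = nn.Linear(hidden, 2)     # suitable vs. not suitable

    def forward(self, resume_states, job_desc_emb):
        # resume_states: (batch, num_chunks, hidden); job_desc_emb: (batch, 1, hidden)
        attended, _ = self.cross_attn(resume_states, job_desc_emb, job_desc_emb)
        pooled = attended.mean(dim=1)              # simple mean pooling (assumption)
        return self.classifier(pooled)

logits = ResumeJobMatcher()(torch.randn(2, 10, 768), torch.randn(2, 1, 768))
```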
  8. Approaches Strategies when applying the proposed models • Section Pruning for the proposed

    “encoding by sections” models, in case a single section exceeds the input length of the transformer encoder
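A minimal sketch of section pruning, assuming it simply means shortening any single section that would overflow the encoder's input length on its own; the paper's pruning heuristic may be more selective than plain head truncation.

```python
# Sketch of section pruning: cut a section down when it alone would exceed the
# transformer encoder's input length. Head truncation is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def prune_section(section_text, max_tokens=512):
    tokens = tokenizer.tokenize(section_text)
    if len(tokens) <= max_tokens:
        return section_text                        # already fits the encoder
    return tokenizer.convert_tokens_to_string(tokens[:max_tokens])
```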
  9. Analysis on Section Pruning (in Appendix) Section lengths before section

    pruning Section lengths after section pruning
  10. Experiments Data split for the multiclass classification task (label distributions preserved):

    Data statistics for the competence-level classification task
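A minimal sketch of a label-stratified train/dev/test split with scikit-learn; the 80/10/10 ratios are an assumption, not the paper's reported split.

```python
# Label-stratified train/dev/test split; the 80/10/10 ratios are assumptions.
from sklearn.model_selection import train_test_split

def stratified_split(resumes, labels, seed=42):
    x_train, x_rest, y_train, y_rest = train_test_split(
        resumes, labels, test_size=0.2, stratify=labels, random_state=seed)
    x_dev, x_test, y_dev, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_dev, y_dev), (x_test, y_test)
```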
  11. Experiments Data split for the binary classification task (label and CRC

    distributions preserved, with no resume overlapping between the training set and the dev or test set): Data statistics for the resume-to-job description matching task
  12. Algorithm to split the dataset while avoiding overlaps between the training and

    evaluation sets (in Appendix). The key idea: 1. Split the data by the targeted label distributions, but with a smaller initial training-set ratio than the target. 2. If any applicants overlap across splits, move all of their overlapping instances into the training set; the training-set ratio then grows close to the targeted ratio while the label distributions are still largely preserved.
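A minimal sketch of this two-step procedure; the field names ("applicant_id", "label") and the initial ratio are assumptions, not the authors' exact implementation.

```python
# Sketch of the overlap-avoiding split: stratify with an under-filled training set,
# then move every overlapping applicant's instances into training.
import random
from collections import defaultdict

def split_no_overlap(examples, initial_train_ratio=0.7, seed=42):
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)

    train, evaluation = [], []
    # Step 1: stratify by label, deliberately under-filling the training set.
    for items in by_label.values():
        rng.shuffle(items)
        cut = int(len(items) * initial_train_ratio)
        train.extend(items[:cut])
        evaluation.extend(items[cut:])

    # Step 2: move overlapping applicants entirely into training, which pushes the
    # training ratio up toward the target while roughly keeping label distributions.
    train_ids = {ex["applicant_id"] for ex in train}
    overlaps = [ex for ex in evaluation if ex["applicant_id"] in train_ids]
    evaluation = [ex for ex in evaluation if ex["applicant_id"] not in train_ids]
    train.extend(overlaps)
    return train, evaluation
```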
  13. Experiments Experimented Models

    Models for the competence-level classification task:
    • W^T: whole-context model + section trimming
    • P: context-aware model + section pruning
    • P⊕I: P + section encoding
    • C: context-aware model + chunk segmenting
    • C⊕I: C + section encoding
    Models for the resume-to-job description matching task:
    • W^TJ: whole context + sec./job_desc. trimming
    • P⊕I⊕J: P⊕I + job_desc. embedding
    • P⊕I⊕J⊕A: P⊕I⊕J + multi-head attention
    • P⊕I⊕J⊕A_E: P⊕I⊕J-E#
    • C⊕I⊕J: C⊕I + job_desc. embedding
    • C⊕I⊕J⊕A: C⊕I⊕J + multi-head attention
    • C⊕I⊕J⊕A_E: C⊕I⊕J-E#
  14. Experiments Analysis for the competence-level classification task. Confusion matrix for

    the best model of the competence-level classification task
  15. Experiments Analysis for the resume-to-job description matching task. Confusion matrix

    for the best model of the resume-to-job description matching task
  16. Error Analysis • The model is unable to identify clinical research experience.

    • It cannot identify dates of experience. • Adjacent CRC positions are hard to distinguish.
  17. Contributions • Introduced a new resume classification dataset. • Proposed

    two new tasks for this new dataset. • Proposed novel context-aware transformer approaches for two tasks. • Conducted experiments with several proposed models. • Conducted both quantitative and qualitative analysis for future improvements.