vectors
• Clean up redundant tags

Compute similarity matrices
• User-User
• Job-Job

250k+ tags: 2-5 years experience, MS office, BA degree, …

Measuring the chances of success or failure
• Previous application history on the job
• Similarity of the applicants

Building the model

Study the data
• All categorical data
• 2000 distinct features

Cosine similarity

Similarity weighted average

Reducing feature space: 20X reduction in feature space size
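
The slides name cosine similarity and a similarity weighted average without showing how they combine. The sketch below is one plausible reading, not the presenters' implementation: it assumes each applicant is a one-hot vector over tags, builds a User-User cosine similarity matrix, and scores one applicant as the similarity-weighted average of other applicants' past outcomes on a job. The toy data, the function names `cosine_similarity_matrix` and `similarity_weighted_average`, and the 1/0 outcome encoding are all hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # all-zero rows get similarity 0 instead of NaN
    unit = X / norms
    return unit @ unit.T


def similarity_weighted_average(similarities, outcomes):
    """Average of outcomes weighted by similarity; None if all weights are 0."""
    w = np.asarray(similarities, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    total = w.sum()
    if total == 0.0:
        return None
    return float(w @ y / total)


# Hypothetical toy data: rows are applicants, columns are one-hot tags.
applicants = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
])
user_user = cosine_similarity_matrix(applicants)

# Assumed encoding: past outcomes of applicants 1 and 2 on one job
# (1 = success, 0 = failure).
past_outcomes = [1.0, 0.0]

# Estimated chance of success for applicant 0 on that job.
score = similarity_weighted_average(user_user[0, 1:], past_outcomes)
```

The same `cosine_similarity_matrix` call applied to a job-by-tag matrix would give the Job-Job matrix. With 250k+ raw tags, a sparse representation would be needed in practice; the dense arrays here are only for readability.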