Building a Recommender System for Medical Research Papers

by Jill Cates

Problem
• QxMD’s Read app lets healthcare professionals stay up to date with medical research
• Read has a feature where users can curate “collections” of papers to be shared with the community
• Collections are currently under-utilized

Objective
Build a recommender system for community-curated collections of papers

Dataset

User Features        Datatype
Institution name     str
Specialty            str
Profession           str
Journal access       bool
Registration date    datetime
Last login           datetime

Paper Features       Datatype
Journal              str
Publication date     datetime
MeSH terms           list

Dataset

User-paper interaction        Datatype
Read abstract?                bool
Dwell time on abstract (s)    int
Read full-text?               bool
Dwell time on full-text (s)   int
Shared paper?                 bool
Thumbs-up paper?              bool
Thumbs-down paper?            bool


Data Cleaning

Remove Outliers
• Inactive users: haven’t used the platform in 2 years
• Suspicious users (e.g., bots): multiple shares, paper and abstract reads per minute

Define Threshold for Collection
• >= 5 papers per collection
• Actively curated (last updated in the past 2 years)
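The cleaning rules above could be sketched roughly as follows. This is a minimal illustration, not the code used in the talk: the function names are hypothetical, a "year" is approximated as 365 days, and the per-minute cutoff `MAX_ACTIONS_PER_MINUTE` is an assumed value because the slide gives no number.

```python
from datetime import datetime, timedelta

# Two-year window from the slide (approximated as 2 * 365 days).
TWO_YEARS = timedelta(days=2 * 365)
# Assumption: the slide does not say how many actions per minute is "suspicious".
MAX_ACTIONS_PER_MINUTE = 10
# Minimum collection size from the slide.
MIN_PAPERS_PER_COLLECTION = 5


def is_active(last_login, now):
    """True if the user logged in within the past two years."""
    return now - last_login <= TWO_YEARS


def is_suspicious(action_timestamps):
    """True if any calendar minute holds more than MAX_ACTIONS_PER_MINUTE actions."""
    per_minute = {}
    for ts in action_timestamps:
        minute = ts.replace(second=0, microsecond=0)
        per_minute[minute] = per_minute.get(minute, 0) + 1
    return any(count > MAX_ACTIONS_PER_MINUTE for count in per_minute.values())


def keep_collection(n_papers, last_updated, now):
    """True if the collection has >= 5 papers and was updated in the past two years."""
    return n_papers >= MIN_PAPERS_PER_COLLECTION and now - last_updated <= TWO_YEARS
```

Here `action_timestamps` stands for the timestamps of a user's shares and reads; how those events are logged in the real dataset is not stated in the slides.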

Data Transformation

Two types of recommender systems:
• Collaborative Filtering: similar users like similar things
• Content-Based Filtering: relies on user and item features

[Figure: user-item illustration with example users John, Jim, Anne, Liz and Erica]

Data Transformation

Create a user-collections (“utility”) matrix
• User-collection interaction score: represents a user’s interaction with a collection

[Figure: example utility matrix of interaction scores, with axes labeled users and collections]

Data Transformation

Create a user-collections (“utility”) matrix
• Aggregate number of abstract and full-text reads within a collection

[Figure: example utility matrix of interaction scores, with axes labeled users and collections]
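One way to read the aggregation step is sketched below: sum each user's abstract and full-text reads over the papers in a collection, then lay the totals out as a matrix. The function name, the input shapes, and the choice to count a paper towards every collection it belongs to are assumptions made for illustration.

```python
from collections import defaultdict


def build_utility_matrix(reads, paper_to_collections):
    """Aggregate per-paper read counts into user-collection interaction scores.

    reads: iterable of (user_id, paper_id, n_abstract_reads, n_fulltext_reads).
    paper_to_collections: dict mapping paper_id -> list of collection ids.
    Returns (users, collections, matrix), where matrix[i][j] is the score of
    users[i] for collections[j]. Users with no reads in any collection are omitted.
    """
    scores = defaultdict(int)
    for user, paper, n_abstract, n_fulltext in reads:
        for collection in paper_to_collections.get(paper, ()):
            scores[(user, collection)] += n_abstract + n_fulltext

    users = sorted({user for user, _ in scores})
    collections = sorted({collection for _, collection in scores})
    matrix = [[scores.get((u, c), 0) for c in collections] for u in users]
    return users, collections, matrix
```

A weighted sum (for example, counting a full-text read more heavily than an abstract read) would slot into the `+=` line if a different behaviour should be promoted.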

Data Transformation

Create a user-collections (“utility”) matrix

[Figure: example utility matrix of interaction scores, with axes labeled users and collections]

Pros
• Does not need to know anything about the user or items
• Can easily modify the “interaction” score based on the behaviour that you want to promote (e.g., abstract reads vs. shares)
• Computationally efficient (parallelizable)
• Captures inherent subtle characteristics

Cons
• Does not work for new users or items
• Does not perform well on sparse datasets (i.e., not enough interactions)

[Figure: the same utility matrix with a “new user” added; the Pros and Cons are repeated from the previous slide]

Model Training

Alternating Least Squares
• Matrix factorization: an unsupervised learning technique
• Factorize user-item matrix (R) into two latent factor matrices:
  1) user-factor matrix (n_users, k)
  2) item-factor matrix (k, n_items)

$R_{m \times n} \approx P_{m \times k} \times Q_{n \times k}^{T} = \hat{R}$
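The factorization can be illustrated with a small dense ALS loop in NumPy. This is a simplified sketch under stated assumptions, not the model from the talk: it treats every entry of R (including zeros) as observed, uses plain L2 regularization, and the function name `als` and its default hyperparameters are chosen here for illustration. Production systems typically use a sparse, weighted variant.

```python
import numpy as np


def als(R, k=2, reg=0.1, n_iters=20, seed=0):
    """Approximate R (m x n) as P @ Q.T with P (m x k) and Q (n x k).

    Alternates two regularized least-squares solves: fix Q and solve for P,
    then fix P and solve for Q.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, k))
    Q = rng.normal(scale=0.1, size=(n, k))
    ridge = reg * np.eye(k)

    for _ in range(n_iters):
        # P = R Q (Q^T Q + reg I)^-1, computed via a linear solve.
        P = np.linalg.solve(Q.T @ Q + ridge, Q.T @ R.T).T
        # Q = R^T P (P^T P + reg I)^-1
        Q = np.linalg.solve(P.T @ P + ridge, P.T @ R).T
    return P, Q
```

The predicted score matrix is then `P @ Q.T`, and a user's recommendations are the collections with the highest predicted scores in that user's row.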

Model Evaluation Precision@K Recall@K • Proportion of items in a
user’s top K recommendations that are relevant • Of the user’s top K recommendations, what proportion are relevant? precision = TP TP + FP recall = TP TP + FN • Proportion of relevant items that are captured in a user’s top K recommendation • Of the user’s relevant items, what proportion were captured in a user’s top K recommendations? Minimize number of false positives Minimize number of false negatives
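Both metrics can be computed for a single user from a ranked recommendation list and a set of relevant items. The sketch below is illustrative; the function name and the convention of returning 0.0 when a denominator is empty are assumptions.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one user.

    recommended: list of item ids, ranked best first.
    relevant: collection of item ids the user actually found relevant.
    """
    top_k = recommended[:k]
    relevant = set(relevant)
    true_positives = sum(1 for item in top_k if item in relevant)

    # TP + FP is the number of recommended items; TP + FN is the number of relevant items.
    precision = true_positives / len(top_k) if top_k else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, with `recommended = ["c1", "c2", "c3", "c4"]`, `relevant = {"c2", "c4", "c9"}` and `k = 2`, one of the two recommendations is relevant and one of the three relevant items is captured.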

Model Evaluation

Split data into train and test sets

[Figure: comparison of the train/test split in traditional ML and in recommender systems]
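The slide's figure is not recoverable from the text, so the following shows only one common way to split interaction data, which may differ from the approach in the talk: rather than holding out whole rows, a fraction of each user's non-zero interactions is masked out of the training matrix and moved to a test matrix. The function name and the 20% default are assumptions.

```python
import random


def mask_interactions(matrix, holdout_fraction=0.2, seed=0):
    """Split a utility matrix (list of rows) into train and test matrices.

    For each user (row), a random fraction of the non-zero entries is set to 0
    in the training copy and stored in the test matrix instead.
    """
    rng = random.Random(seed)
    train = [row[:] for row in matrix]
    test = [[0] * len(row) for row in matrix]

    for i, row in enumerate(matrix):
        nonzero_columns = [j for j, value in enumerate(row) if value > 0]
        n_holdout = int(len(nonzero_columns) * holdout_fraction)
        for j in rng.sample(nonzero_columns, n_holdout):
            test[i][j] = row[j]
            train[i][j] = 0
    return train, test
```

A model fitted on `train` can then be scored by checking whether the masked entries in `test` appear among each user's top K recommendations.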

Generating Collection Titles

Top MeSH Terms Using TF-IDF
• Observational studies
• Anesthesia
• Respiratory aspiration
• Child
• Intensive care unit
• Reviews
• Adolescents
• Combined modality therapy
• Longitudinal studies
• Infant, newborn
• Blood glucose
• Metabolic diseases
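A rough sketch of how top MeSH terms might be ranked with TF-IDF follows. It assumes each collection is treated as one "document" whose tokens are the MeSH terms of its papers, and it uses the basic tf × log(N / df) weighting; the function name and these modelling choices are illustrative rather than taken from the talk.

```python
import math
from collections import Counter


def top_mesh_terms(collections, n=3):
    """Rank MeSH terms per collection by TF-IDF.

    collections: dict mapping collection id -> list of MeSH terms
    (with repeats) gathered from the papers in that collection.
    Returns a dict mapping collection id -> list of its top n terms.
    """
    n_collections = len(collections)
    # Document frequency: in how many collections each term appears.
    doc_freq = Counter()
    for terms in collections.values():
        doc_freq.update(set(terms))

    top = {}
    for collection_id, terms in collections.items():
        term_counts = Counter(terms)
        scores = {
            term: (count / len(terms)) * math.log(n_collections / doc_freq[term])
            for term, count in term_counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        top[collection_id] = sorted(scores, key=lambda t: (-scores[t], t))[:n]
    return top
```

Terms that appear in every collection score zero, so generic headings drop out and the remaining terms are candidates for a collection's title.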
