to date with medical research
• Read has a feature where users can curate “collections” of papers to be shared with the community
• Collections are currently under-utilized
Journal access       bool
Registration date    datetime
Last login           datetime

Paper Features       Datatype
Journal              str
Publication date     datetime
MeSH terms           list

Dataset
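The feature tables above could be mirrored in code roughly as follows. This is a minimal sketch: the class and field names (`UserFeatures`, `PaperFeatures`, `journal_access`, and so on) are hypothetical, and only the rows visible in the tables are included.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class UserFeatures:
    # Only the user rows visible in the table; names are illustrative.
    journal_access: bool
    registration_date: datetime
    last_login: datetime


@dataclass
class PaperFeatures:
    journal: str
    publication_date: datetime
    # MeSH terms attached to the paper, stored as a list of strings.
    mesh_terms: list[str] = field(default_factory=list)
```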
2 years
• Suspicious users (e.g., bots): multiple shares, paper and abstract reads per minute

Remove Outliers

Define Threshold for Collection
• >= 5 papers per collection
• Actively curated (last updated in the past 2 years)
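The filters above might be expressed as simple predicates. This is a sketch under assumptions: the function names are hypothetical, "2 years" is approximated as 730 days, and `MAX_EVENTS_PER_MINUTE` is an invented cutoff, since the slides do not state the rate that marks a user as suspicious.

```python
from datetime import datetime, timedelta

MIN_PAPERS = 5                      # ">= 5 papers per collection"
MAX_AGE = timedelta(days=2 * 365)   # "last updated in the past 2 years"
MAX_EVENTS_PER_MINUTE = 30          # hypothetical; no value is given in the slides


def keep_collection(n_papers, last_updated, now):
    """True if a collection is large enough and actively curated."""
    return n_papers >= MIN_PAPERS and (now - last_updated) <= MAX_AGE


def is_suspicious(events_per_minute):
    """True if a user's shares/reads per minute exceed the assumed cutoff."""
    return events_per_minute > MAX_EVENTS_PER_MINUTE
```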
Create a user-collections (“utility”) matrix

[Figure: example utility matrix of interaction scores, axes labeled “users” and “collections”]

Pros
- Does not need to know anything about the user or items
- Can easily modify the “interaction” score based on the behaviour that you want to promote (e.g., abstract reads vs. shares)
- Computationally efficient (parallelizable)
- Captures inherent subtle characteristics

Cons
- Does not work for new users or items
- Does not perform well on sparse datasets (i.e., not enough interactions)
[Figure: the user-collections (“utility”) matrix with a “new user” added]
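One way to build such a matrix from raw interaction events is sketched below. The event format, the function name `build_utility_matrix`, and the weights in `WEIGHTS` are all assumptions for illustration. Changing the weights is how the “interaction” score would be tuned toward the behaviour to promote.

```python
from collections import defaultdict

# Hypothetical weights per interaction type; the slides give no values.
WEIGHTS = {"abstract_read": 1.0, "paper_read": 2.0, "share": 5.0}


def build_utility_matrix(events):
    """events: iterable of (user_id, collection_id, interaction_type)."""
    scores = defaultdict(float)
    for user, collection, kind in events:
        # Accumulate a weighted interaction score per (user, collection).
        scores[(user, collection)] += WEIGHTS[kind]
    users = sorted({u for u, _ in scores})
    collections = sorted({c for _, c in scores})
    # Rows are users, columns are collections; missing pairs score 0.
    matrix = [[scores.get((u, c), 0.0) for c in collections] for u in users]
    return users, collections, matrix
```

A user with no events never appears in `users`, which is the “new user” limitation listed under Cons.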
matrix (R) into two latent factor matrices:
1) user-factor matrix (n_users, k)
2) item-factor matrix (k, n_items)

Model Training
Alternating Least Squares

$R_{m \times n} \approx P_{m \times k} \times Q_{n \times k}^{T} = \hat{R}$
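The factorization can be sketched with NumPy as below. This is a simplified, dense variant of alternating least squares that treats every entry of R (zeros included) as observed. It is not necessarily the variant used in the talk, and the function name `als` and its parameters `k`, `n_iters`, `reg`, and `seed` are illustrative choices.

```python
import numpy as np


def als(R, k=2, n_iters=20, reg=0.1, seed=0):
    """Approximate R (m x n) by P (m x k) @ Q.T (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, k))
    Q = rng.normal(scale=0.1, size=(n, k))
    ridge = reg * np.eye(k)
    for _ in range(n_iters):
        # Hold Q fixed: P = R Q (Q^T Q + reg I)^-1
        P = np.linalg.solve(Q.T @ Q + ridge, Q.T @ R.T).T
        # Hold P fixed: Q = R^T P (P^T P + reg I)^-1
        Q = np.linalg.solve(P.T @ P + ridge, P.T @ R).T
    return P, Q
```

The predicted scores are `P @ Q.T`. Each row of P (and of Q) is solved independently of the others, which is why the method parallelizes well.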
user’s top K recommendations that are relevant
• Of the user’s top K recommendations, what proportion are relevant?
$\text{precision} = \frac{TP}{TP + FP}$
Minimize number of false positives

• Proportion of relevant items that are captured in a user’s top K recommendations
• Of the user’s relevant items, what proportion were captured in a user’s top K recommendations?
$\text{recall} = \frac{TP}{TP + FN}$
Minimize number of false negatives
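Both metrics can be computed for a single user as sketched here. The function name `precision_recall_at_k` and its arguments are hypothetical, and the ranked list is assumed to contain no duplicates.

```python
def precision_recall_at_k(recommended, relevant, k):
    """recommended: ranked list of item ids; relevant: ids the user found relevant."""
    top_k = recommended[:k]
    relevant = set(relevant)
    tp = sum(1 for item in top_k if item in relevant)  # relevant and recommended
    fp = len(top_k) - tp                               # recommended, not relevant
    fn = len(relevant) - tp                            # relevant, not recommended
    precision = tp / (tp + fp) if top_k else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall
```

For example, with recommendations `["c1", "c2", "c3", "c4"]`, relevant items `{"c2", "c4", "c9"}`, and K = 4, two of the four recommendations are relevant (precision 0.5) and two of the three relevant items are captured (recall 2/3).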
care unit
Reviews
Adolescents
Combined modality therapy
Longitudinal studies
Infant, newborn
Blood glucose
Metabolic diseases

Top MeSH Terms Using TF-IDF
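A ranking like this could be produced with a plain TF-IDF computation, sketched below. It assumes each “document” is a group of papers (for example, one collection) represented by the MeSH terms of its papers. The slides do not say how documents were defined, and the function name `top_mesh_terms` is hypothetical.

```python
import math
from collections import Counter


def top_mesh_terms(docs, n=3):
    """docs: dict mapping a group name to the list of MeSH terms of its papers."""
    n_docs = len(docs)
    # Document frequency: number of groups in which each term appears.
    df = Counter(term for terms in docs.values() for term in set(terms))
    top = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        # TF-IDF = (share of the group's terms) * log(n_docs / document frequency)
        scores = {t: (c / len(terms)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for stable output.
        top[name] = sorted(scores, key=lambda t: (-scores[t], t))[:n]
    return top
```

Terms that appear in every group get a score of 0, so only terms that set a group apart rise to the top.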