Zhentao Xu, Veronica Perez-Rosas, Rada Mihalcea
University of Michigan
{frankxu, vrncapr, mihalcea}@umich.edu

The Problem
Background: A growing number of people suffer from mental disorders, and current diagnosis is limited.
Assumption: People who often use words or tags related to mental disorders are more likely to suffer from these disorders.
Research Questions:
- Can we identify a person’s mental health status from his or her social media posts?
- Can we predict the potential future onset of a disorder from current social media posts?

Data Collection
Mental-illness tags: mentalillness, bipolardisorder, schizophrenia, selfharm, depression, bipolar, psychosis, insanity, anxiety, depressed, insane, disorder, suicide, despair

Modalities
Visual modalities from photos
- low-level: color, brightness, texture, lines
- high-level: faces, objects, scenes
Language modalities from captions
- stylistic features, (part-of-speech) n-grams, LIWC
Meta modalities from posts
- views, activity time, EXIF

Top Features in Characterizing Mental Disorder & Research Conclusions

Classification Experiments and Performances
Experiments:
- Identify social media users with a mental disorder (under our assumption).
- Predict users prone to mental disorders using both machine learning classifiers and a two-layer neural network.
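As an illustration of the low-level visual modality listed under Modalities, brightness can be derived directly from pixel values. The poster does not give its formulas, so the following is only a minimal sketch under stated assumptions: brightness is taken as the mean luma (Rec. 601 weights), a simple contrast measure is added as the standard deviation of luma, and the function names and toy images are hypothetical.

```python
from statistics import mean, pstdev


def luma(pixel):
    """Perceived brightness of one (R, G, B) pixel on a 0-255 scale.

    Assumption: Rec. 601 luma weights; the poster does not specify a formula.
    """
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def brightness_contrast(pixels):
    """Return (mean luma, population std-dev of luma) for a list of RGB pixels."""
    values = [luma(p) for p in pixels]
    if not values:
        raise ValueError("image has no pixels")
    return mean(values), pstdev(values)


# Hypothetical 2x2 images, flattened to pixel lists.
dark_high_contrast = [(0, 0, 0), (0, 0, 0), (0, 0, 0), (255, 255, 255)]
bright_flat = [(200, 200, 200)] * 4

dark_stats = brightness_contrast(dark_high_contrast)
flat_stats = brightness_contrast(bright_flat)
```

In this toy example the mostly black image has a lower mean luma and a higher spread than the uniform gray one. Each photo would contribute such numbers as entries in its visual feature vector.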
Performance:
- Mental Disorder Identification Performance
- Mental Disorder Prediction Performance

Top visual features
Top language features
Top meta features

Significant Visual Features (healthy users prefer features with a positive effect size)
Significant Language Features
Significant Meta Features

Temporal pattern between healthy, mental disorder, and potential mental disorder posts

Sample Data

Data collection:
- Seed tag: mentalillness (12056 tags)
- Extend the seed tag with co-occurring tags.
- Filter tags using (1) popularity, (2) PMI, and (3) manual checking.
- The final mental-illness tag set
Tag Level, Post Level
Final Dataset: 15,000 healthy posts; 12,056 mental illness posts; 11,828 pre-mental illness posts

Conclusions:
- Individuals suffering from or prone to mental disorders prefer darker, high-contrast images showing indoor scenes and fewer faces.
- The derived features are useful for predicting a user’s mental health status.
- The combination of visual, language, and meta information leads to better performance than any individual modality.

This material is based in part upon work supported by the Michigan Institute for Data Science (MIDAS).
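The tag-filtering step in data collection (popularity, then PMI) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes PMI is estimated from post-level co-occurrence with the seed tag, the thresholds `min_count` and `min_pmi` are hypothetical values not given on the poster, the toy posts are invented, and the manual check is not modeled.

```python
import math
from collections import Counter


def pmi_with_seed(posts, seed):
    """Score each tag that co-occurs with `seed`.

    PMI(tag, seed) = log2( P(tag, seed) / (P(tag) * P(seed)) ),
    with probabilities estimated as fractions of posts.
    Returns (scores, per-tag post counts).
    """
    n = len(posts)
    tag_count = Counter()    # number of posts containing each tag
    joint_count = Counter()  # number of posts containing the tag and the seed
    for tags in posts:
        unique = set(tags)
        tag_count.update(unique)
        if seed in unique:
            joint_count.update(unique - {seed})

    p_seed = tag_count[seed] / n if n else 0.0
    scores = {}
    for tag, joint in joint_count.items():
        p_joint = joint / n
        p_tag = tag_count[tag] / n
        scores[tag] = math.log2(p_joint / (p_tag * p_seed))
    return scores, tag_count


def filter_tags(posts, seed, min_count=2, min_pmi=0.5):
    """Keep tags passing a popularity threshold and a PMI threshold.

    Both thresholds are hypothetical defaults chosen for the toy data.
    """
    scores, counts = pmi_with_seed(posts, seed)
    return sorted(
        tag for tag, score in scores.items()
        if counts[tag] >= min_count and score >= min_pmi
    )


# Invented toy posts, each represented by its list of tags.
toy_posts = [
    ["mentalillness", "depression", "sad"],
    ["mentalillness", "depression"],
    ["mentalillness", "anxiety", "sunset"],
    ["sunset", "beach"],
    ["sunset", "beach", "happy"],
    ["anxiety", "mentalillness"],
]
kept = filter_tags(toy_posts, seed="mentalillness")
```

In this toy data, a tag that appears mostly without the seed gets a negative PMI and is dropped, while a tag seen in only one post fails the popularity threshold even though its PMI is positive.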