Abstract
Despite our best efforts, tackling hate speech remains an elusive problem for researchers and practitioners alike. What is considered hateful is subject to context, time, geography, and culture. This poses a challenge in defining standard benchmarks and modelling techniques to combat hate. However, it is universally accepted that what underpins hate is the intent to dehumanise and bias against a historically vulnerable group. Unfortunately, determining both intent and power dynamics in an online setting is a formidable task; further, the influence of human evaluators' lived experiences creates a gap between the human and computational understanding of hatefulness.
By examining the role of external priming via contextual signals, we aim to bridge this information gap and improve human-computer alignment in analysing and monitoring hateful content on the Web.
Through a series of five dataset-model pairs, the thesis empirically establishes the efficacy of contextual signals in modelling hate speech-related tasks. The case for contextual signals is further strengthened by the finding that they benefit pipelines ranging from feature-engineered logistic regressors to zero-shot prompted large language models. However, by quantifying the toxic connotations and scalability challenges of certain signals, we caution against a one-size-fits-all setup. To this end, the thesis outlines strategies for deployable, human-centric tools spanning reactive and proactive moderation paradigms, with a focus on the multilingual and implicit nature of hate.