CLASSIEfier: Using Machine Learning to Paint a Picture of Social Sector Trends

CLASSIEFIER: USING MACHINE LEARNING TO PAINT A PICTURE OF SOCIAL TRENDS
Dr. Paola Oliva-Altamirano, Innovation Lab, Our Community, May 2019

Who am I?
• A foreigner: from Honduras to the US to Australia
• From galaxies to taxonomies
Our Community - Innovation Lab

Outline:
• Introducing Our Community’s data initiatives
• Background: CLASSIE, a social dictionary
• How did we scope CLASSIEfier?
• How did CLASSIEfier evolve as a project?
• Data science for social good concept
• Results and conclusions

Our Community is a social enterprise and B Corp that provides advice, connections, training and easy-to-use tech tools for community-builders.
• Donation platform
• Grants database
• Training and networking
• Software for grants applications


From CLASSIE to CLASSIEfier

Main objective – Classification of grants
• Australia lacked a unified taxonomy to classify subjects, beneficiaries and organization types.
• In 2016, OC introduced CLASSIE, the classification system for Australian social sector initiatives and entities.
• CLASSIE opens the door to standard classification.

CLASSIE: A social sector dictionary
• Subjects
• Populations
• Organisation type
Where is the money going? And how is the Australian social sector working?

Hierarchical Classification – e.g. Subjects
• Level 1: 17 categories
• Level 2: 132 categories
• Level 3: 492 categories
• Level 4: 243 categories
Social Sciences: Anthropology, Archeology, Biological anthropology, Interdisciplinary studies, Ethnic studies, Indigenous studies, Asian studies
Sport and recreation: Community recreation, Parks, Camps, Sport, Outdoor sport, Mountain and rock climbing, Hiking and walking, Paralympics

Questions
Now we have the dictionary – how do we apply it?
• How do we ensure that users are choosing the correct category?
• How do we classify historical data? 800,000 grant applications since 2010

CLASSIEfier is a tool that will automatically classify grants

How did we scope CLASSIEfier?

Source: “One model to rule them all” by Christoph Molnar

CLASSIEfier – Two different models
1. To give automatic suggestions to grant applicants
2. To classify historical data
Seems like you are applying for:
☐ Sports and recreation
☐ Art and culture
☐ Community and development

CLASSIEfier: How does it work?

How did CLASSIEfier evolve?

CLASSIEfier – The Algorithm
What do we have?
• 800,000 grant applications
• 4,000 grant applications labeled by users since CLASSIE went live
How do we generate more labels?
• At least 2,000 applications per category

CLASSIEfier – The Algorithm
First phase: simple keyword matching to extract more labels
Keyword matching = the process of searching for literal matches (e.g. “hospital”) in a given piece of text (e.g. a grant description) to identify groups or subjects (e.g. health sector).
Stages:
• Identify keywords for CLASSIE
• Extract applications that exhibit a strong match
• Score the classification done by users
Example: “This project will raise awareness and empower deaf people by providing key mental health information in their primary language (Australian Sign Language).” Category: People with hearing impediment.
We found that:
• Keyword matching accuracy differs from one category to another. For example, “orphans” is a confusing category, while “wildlife welfare” is a straightforward one.
• On average, accuracy is around 80%.
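A minimal sketch of the keyword-matching idea described above, applied to the example description from the slide. The keyword lists, the whole-word matching rule and the `min_hits` threshold for a "strong match" are illustrative assumptions; the talk does not give the real CLASSIE keyword lists or matching rules.

```python
import re

# Hypothetical keyword lists; the real CLASSIE keywords are not given in the talk.
KEYWORDS = {
    "Health": ["hospital", "mental health", "clinic"],
    "People with hearing impediment": ["deaf", "sign language", "hearing"],
}

def keyword_matches(text, keywords=KEYWORDS):
    """Count literal, whole-word keyword hits per category (case-insensitive)."""
    text = text.lower()
    hits = {}
    for category, words in keywords.items():
        count = sum(
            len(re.findall(r"\b" + re.escape(word) + r"\b", text))
            for word in words
        )
        if count:
            hits[category] = count
    return hits

def strong_matches(text, min_hits=2):
    """Keep only categories with at least `min_hits` hits (assumed threshold)."""
    return sorted(c for c, n in keyword_matches(text).items() if n >= min_hits)

description = (
    "This project will raise awareness and empower deaf people by providing "
    "key mental health information in their primary language "
    "(Australian Sign Language)."
)
print(keyword_matches(description))
print(strong_matches(description))
```

With these made-up lists, the description gets one hit for "Health" and two for "People with hearing impediment", so only the latter counts as a strong match.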

CLASSIEfier – The Algorithm
Second phase: Training the machine learning model
Training dataset: 128,000 grant applications classified by keyword matching
• DIFFICULTY #1: Multilabel
• DIFFICULTY #2: Hierarchy
• DIFFICULTY #3: Number of labels per category

Multilabels and Hierarchy
Example: A grant application that is aimed at helping teenagers with autism.
Beneficiaries:
• “Children and youth” at level 1
• “Adolescents” at level 2
And also:
• “People with disabilities” at level 1
• “People with intellectual disabilities” at level 2
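One way to picture the multilabel and hierarchy problem from the example above is to store each label as a path through the taxonomy. This is only a sketch: the two paths come from the slide, while the tuple representation, the helper names and the extra "Seniors" entry in the vocabulary are assumptions made for illustration.

```python
# Each label is a path from level 1 downwards (representation assumed).
application_labels = [
    ("Children and youth", "Adolescents"),
    ("People with disabilities", "People with intellectual disabilities"),
]

def labels_at_level(paths, level):
    """Return the set of labels an application carries at a 1-based level."""
    return {path[level - 1] for path in paths if len(path) >= level}

def multi_hot(labels, vocabulary):
    """Encode a label set as a 0/1 vector over a fixed category list."""
    return [1 if category in labels else 0 for category in vocabulary]

# Hypothetical level-1 vocabulary; "Seniors" is added only to show a zero entry.
level1_vocabulary = ["Children and youth", "People with disabilities", "Seniors"]

level1 = labels_at_level(application_labels, 1)
print(sorted(level1))
print(multi_hot(level1, level1_vocabulary))
```

A single application ends up with several targets at every level, which is why a plain single-label classifier does not fit the task.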

DIFFICULTY #3: Number of labels per category
Niche classification or “black holes”
• Categories such as Confucius, North American people and Nomadic people, among others, will have fewer than 100 grant applications: 20 times fewer than the 2,000 minimum required.

How do we solve it? – Separate training
1. CLASSIEfier reads the application.
2. Classification Level 1: machine learning (e.g. Sports and recreation, Information and communications).
3. Classification Level 2:
• Sports and recreation: we have enough labels, so we use another ML model.
• Information and communications: we do not have enough labels, so we use keyword matching.
4. Classification Level 3: keyword matching.
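The separate-training routing above can be sketched as a small decision function. The label counts below are invented, and using the 2,000-application minimum as the cut-off for "enough labels" is an assumption; the talk only states that level 2 falls back to keyword matching when labels are scarce.

```python
MIN_LABELS = 2000  # minimum per category mentioned in the talk; used here as an assumed cut-off

# Hypothetical counts of labelled applications under each level-1 category.
label_counts = {
    "Sports and recreation": 5000,
    "Information and communications": 300,
}

def choose_method(level, parent_category, counts=label_counts):
    """Pick the classifier type for one level of the hierarchy."""
    if level == 1:
        return "ml"  # level 1 always uses the machine learning model
    if level == 2 and counts.get(parent_category, 0) >= MIN_LABELS:
        return "ml"  # enough labels: train another ML model for this branch
    return "keyword"  # too few labels, or level 3: keyword matching

def plan(parent_category):
    """Methods used at levels 1 to 3 for an application in this branch."""
    return [choose_method(level, parent_category) for level in (1, 2, 3)]

print(plan("Sports and recreation"))
print(plan("Information and communications"))
```

The point of the design is that each branch of the taxonomy only gets its own trained model where the data can support one.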

CLASSIEfier – The Algorithm
Third phase: Model interpretation: scoring and checking for biases
Stages:
• Choose the best model – k-nearest neighbours (k-nn)
• Choose the best parameters
• Choose the best scoring
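For readers unfamiliar with k-nn, here is a toy multilabel version in plain Python. It is not the CLASSIEfier implementation: the bag-of-words features, cosine similarity, the voting rule (`min_votes`) and the four training examples are all assumptions chosen to keep the sketch self-contained.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-split)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def knn_predict(text, training, k=3, min_votes=2):
    """Return labels carried by at least `min_votes` of the k nearest examples."""
    query = vectorize(text)
    nearest = sorted(
        training, key=lambda ex: cosine(query, vectorize(ex[0])), reverse=True
    )[:k]
    votes = Counter(label for _, labels in nearest for label in labels)
    return sorted(label for label, n in votes.items() if n >= min_votes)

# Invented training examples: (description, set of labels).
training = [
    ("new goal posts for the junior football club", {"Sports and recreation"}),
    ("football training camp for teenagers", {"Sports and recreation", "Children and youth"}),
    ("after school art classes for teenagers", {"Art and culture", "Children and youth"}),
    ("community mural painted by local artists", {"Art and culture"}),
]

print(knn_predict("football equipment for teenagers", training))
```

Here `k` and `min_votes` stand in for the "best parameters" stage: in practice they would be tuned against a scoring function rather than fixed by hand.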

Scoring
Recall: $\frac{TP}{TP + FN}$, an indication of missing labels
Precision: $\frac{TP}{TP + FP}$, an indication of bad predictions
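As a sketch of how recall and precision apply to a single multilabel application, the function below treats the predicted and true categories as sets. The example labels are illustrative ("Seniors" is a made-up wrong prediction), and returning 0.0 when a set is empty is an assumption.

```python
def precision_recall(predicted, actual):
    """Per-application precision and recall over sets of category labels."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # correctly predicted labels
    fp = len(predicted - actual)   # bad predictions
    fn = len(actual - predicted)   # missing labels
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if actual else 0.0
    return precision, recall

actual = {"Children and youth", "Adolescents", "People with disabilities"}
predicted = {"Children and youth", "Seniors"}

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example one of the two predictions is wrong (precision 0.5) and two of the three true labels are missing (recall of one third).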

Scoring
Based on the fact that each application has several categories.
Recall: how many categories got picked per application
• 0: None
• 1: <45%
• 2: >45%
• 3: Perfect match
Precision: how many categories are wrong per application
• 0: All
• 1: >55%
• 2: <55%
• 3: None – Perfect match
Combined score: 0 = useless model, 6 = perfect model. CLASSIEfier: ~4-5
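The 0 to 6 scale above can be sketched as two small scoring functions added together. The slide does not say how the exact 45% and 55% boundaries or an empty prediction are handled, so those choices are assumptions here, as are the placeholder labels A to E.

```python
def recall_score(predicted, actual):
    """0 = none picked, 1 = under 45%, 2 = 45% or more, 3 = all picked."""
    found = len(predicted & actual) / len(actual)  # assumes `actual` is non-empty
    if found == 0:
        return 0
    if found == 1:
        return 3
    return 1 if found < 0.45 else 2  # exactly 45% scores 2 (assumed)

def precision_score(predicted, actual):
    """0 = all wrong, 1 = over 55% wrong, 2 = 55% or less wrong, 3 = none wrong."""
    if not predicted:
        return 0  # assumed: no prediction earns no precision credit
    wrong = len(predicted - actual) / len(predicted)
    if wrong == 1:
        return 0
    if wrong == 0:
        return 3
    return 1 if wrong > 0.55 else 2  # exactly 55% scores 2 (assumed)

def total_score(predicted, actual):
    """Combined score from 0 (useless) to 6 (perfect match)."""
    return recall_score(predicted, actual) + precision_score(predicted, actual)

actual = {"A", "B", "C"}
print(total_score({"A", "B", "D"}, actual))
print(total_score({"A", "B", "C"}, actual))
print(total_score({"D", "E"}, actual))
```

The first call finds two of three categories with one wrong prediction, which scores 2 + 2 = 4; a perfect match scores 6 and a completely wrong prediction scores 0.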

Misclassifications and black holes will cause minorities that are already overlooked to be underfunded

The Data Science for Social Good Movement
“The best minds of my generation are thinking about how to make people click ads,” he says. “That sucks.” -- Jeff Hammerbacher (Cloudera and Facebook data leader)

Algorithmic bias
• This will happen if you feed the algorithm data that is already biased, or insufficient data: the algorithm will predict biased classifications.
• Algorithms are mirrors
(Slide image label: Sport people)

Know your Model! xkcd.com/1838/

• SHAP (SHapley Additive exPlanations)
• WEAT tests proposed in Caliskan et al. 2017
• AI Fairness 360

Document everything! – this is how we tackle biases
Choose transparency

Results and conclusions
• It is not feasible to classify human natural languages with 100% accuracy
Example (Church, Religion, Christian):
• Model = Religion
• Reality – A fete in a Catholic school

Results and conclusions
• CLASSIEfier performs similarly to humans, neither better nor worse: ~70-80% accuracy
Out of 200 applications classified by users, we found that:
• 63% were right
• 18% were wrong
• 19% were half right

Results and conclusions
• The model is also discriminating between good and bad applications
Approved grant applications: 85% accuracy
Declined grant applications: 75% accuracy

Results and conclusions
• CLASSIEfier is now feeding back into CLASSIE
Seems like you are applying for:
☐ Sports and recreation
☐ Art and culture
☐ Community and development

CLASSIEfier – More than just an algorithm
• Data preprocessing
• Writing and testing the algorithm
• Production – back and front end product
• Maintenance

DO YOU WANT TO LEARN MORE?
LinkedIn: paola-oliva-altamirano
Email: [email protected]
Innovation lab: https://www.ourcommunity.com.au/innovationlab
