Finding Unavailable Data

Yeli @YellzHeard omayeli.com

What is the male equivalent of a nun?

google.com

quora.com

english.stackexchange.com

1. You can find all the gendered words.
2. You can find the equivalent of a gendered word.

→ lady / gentleman
→ prince / princess
→ king / queen
→ father / mother
→ seamstress / seamster
→ ministress / minister
→ iron man
→ cougar

Where and how to get data

→ APIs
→ Static Data
→ Web Scraping

['woman', 'female', 'girl', 'lady', 'women', 'mother', 'daughter', 'wife']
['man', 'male', 'boy', 'men', 'son', 'father', 'husband']

A gendered word is a word with one of the terms above in its definition.
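The definition above can be sketched as a small check. This is a minimal illustration, not the talk's actual code: the function name `gender_of` and the punctuation-stripping rule are assumptions. It matches whole words only, so that "woman" does not count as a hit for "man".

```python
FEMALE_TERMS = ['woman', 'female', 'girl', 'lady', 'women', 'mother', 'daughter', 'wife']
MALE_TERMS = ['man', 'male', 'boy', 'men', 'son', 'father', 'husband']


def gender_of(definition):
    """Return 'female', 'male', 'both', or None for a definition string.

    Hypothetical helper: compares whole words against the two term lists.
    """
    # Split on whitespace, strip surrounding punctuation, and lowercase.
    words = {w.strip('.,;:()"\'').lower() for w in definition.split()}
    has_female = any(term in words for term in FEMALE_TERMS)
    has_male = any(term in words for term in MALE_TERMS)
    if has_female and has_male:
        return 'both'
    if has_female:
        return 'female'
    if has_male:
        return 'male'
    return None


print(gender_of("A woman who sews, especially one who earns her living by sewing."))
```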

APIs: Application Programming Interface

programmableweb.com

wordnik.com
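A rough sketch of asking a dictionary API for definitions. The endpoint path, the `limit` and `api_key` query parameters, and the `text` field are assumptions based on Wordnik's public v4 API and should be checked against its documentation; the function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base path of the Wordnik v4 word endpoints.
API_ROOT = "https://api.wordnik.com/v4/word.json"


def definitions_url(word, api_key, limit=5):
    """Build the request URL for a word's definitions."""
    query = urlencode({"limit": limit, "api_key": api_key})
    return f"{API_ROOT}/{quote(word)}/definitions?{query}"


def fetch_definitions(word, api_key, limit=5):
    """Fetch definitions and return their text (needs a real API key and network)."""
    with urlopen(definitions_url(word, api_key, limit)) as response:
        entries = json.load(response)
    # Each entry is assumed to be a JSON object with a "text" field.
    return [entry.get("text", "") for entry in entries]
```

Each returned definition could then be checked against the gendered term lists.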


400 words

Static Data -> .json, .txt, .csv, ...
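A minimal sketch of reading a word list from each of these file types. The layouts are assumptions: one word per line for .txt, a flat JSON array for .json, and a column named `word` for .csv. The function names are hypothetical.

```python
import csv
import io
import json


def words_from_txt(f):
    """One word per line; blank lines are skipped."""
    return [line.strip() for line in f if line.strip()]


def words_from_json(f):
    """A flat JSON array of strings."""
    return list(json.load(f))


def words_from_csv(f, column="word"):
    """A CSV file with a header row; `column` names the word column."""
    return [row[column] for row in csv.DictReader(f)]


# In-memory files stand in for files opened with open(path, encoding="utf-8").
print(words_from_txt(io.StringIO("nun\nmonk\n")))
print(words_from_json(io.StringIO('["actor", "actress"]')))
print(words_from_csv(io.StringIO("word,source\nseamstress,demo\n")))
```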

→ king / queen
→ father / mother
→ seamstress
→ ministress
→ iron man

Regular Expressions -> a sequence of characters that defines a search pattern
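As an illustration of a search pattern, the sketch below builds one regular expression from the gendered term lists and finds whole-word matches in a definition. The function name `gendered_terms` is hypothetical, and this is one possible pattern rather than the one used in the talk.

```python
import re

TERMS = ['woman', 'female', 'girl', 'lady', 'women', 'mother', 'daughter', 'wife',
         'man', 'male', 'boy', 'men', 'son', 'father', 'husband']

# \b marks a word boundary, so "man" will not match inside "woman" or "humans".
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, TERMS)) + r")\b", re.IGNORECASE)


def gendered_terms(definition):
    """Return the gendered terms found in a definition, lowercased, in order."""
    return [match.lower() for match in PATTERN.findall(definition)]


print(gendered_terms("A woman belonging to a religious order."))
```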

regextester.com

~ 8000 words

Patterns

Patterns -> object of a preposition

nltk -> Natural Language Toolkit -> for processing the English language

text-processing.com/demo/tokenize

Tokenization -> chopping up a string into pieces (called tokens) -> throwing away certain characters, such as punctuation
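A small sketch of tokenization as described above. It uses only the standard-library `re` module as a stand-in for an nltk tokenizer, so it is cruder than what nltk provides; the function name `tokenize` and the token pattern are assumptions.

```python
import re


def tokenize(text):
    """Chop a string into word tokens and throw away punctuation.

    Keeps a simple apostrophe suffix (as in "don't") attached to its word.
    """
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)


print(tokenize("A garment worn by women, esp. in winter."))
```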

Patterns -> object of a preposition -> clothing items
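One way to read this pattern: in definitions of clothing items, the gendered term often appears as the object of a preposition ("worn by women") rather than as the thing being defined. The sketch below is a crude, assumption-heavy approximation that looks for a gendered term directly after a preposition, skipping an article. A fuller approach would likely use part-of-speech tagging; the preposition list and function name here are hypothetical.

```python
import re

GENDERED = {'woman', 'female', 'girl', 'lady', 'women', 'mother', 'daughter', 'wife',
            'man', 'male', 'boy', 'men', 'son', 'father', 'husband'}
PREPOSITIONS = {'for', 'by', 'of', 'to', 'with', 'on', 'in'}  # assumed short list
DETERMINERS = {'a', 'an', 'the'}


def gendered_object_of_preposition(definition):
    """Return (preposition, gendered term) for the first match, else None."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", definition)]
    for i, token in enumerate(tokens):
        if token not in PREPOSITIONS:
            continue
        j = i + 1
        # Skip articles between the preposition and its object.
        while j < len(tokens) and tokens[j] in DETERMINERS:
            j += 1
        if j < len(tokens) and tokens[j] in GENDERED:
            return (token, tokens[j])
    return None


print(gendered_object_of_preposition("A loose dress worn by women."))
```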

collinsdictionary.com/us/word-list

Web Scraping

Icons made by Smashicons from www.flaticon.com/authors/smashicons

urllib.request -> opening URLs
BeautifulSoup -> parsing HTML documents
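A minimal sketch of combining the two libraries: `urllib.request` fetches the page and BeautifulSoup pulls text out of the HTML. It assumes the `beautifulsoup4` package is installed, that the page is UTF-8, and that the words sit in `<li>` elements; the real structure of any particular word-list page has to be inspected first. The function names are hypothetical.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Open a URL and return the page body as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_list_items(html):
    """Return the stripped text of every <li> element in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    return [li.get_text(strip=True) for li in soup.find_all("li")]


# An inline snippet stands in for a fetched page.
sample = "<ul><li>apron</li><li> bonnet </li></ul>"
print(extract_list_items(sample))
```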

~ 4000 words

→ king / queen
→ father / mother
→ actor / actress

bionlp-www.utu.fi/wv_demo/

Word2Vec -> words to vectors

suriyadeepan.github.io

My meal wasn’t very tasty so I put some maggi on it.
My meal wasn’t very tasty so I put some salt on it.
My meal wasn’t very tasty so I put some seasoning on it.
I sat on the chair to eat my meal.

Gensim -> loading the Google-trained word2vec model
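To show how word vectors can suggest the equivalent of a gendered word, here is a toy sketch in plain Python. The three-dimensional vectors are made up for illustration and are not taken from any trained model. With a real pretrained model loaded in Gensim, the comparable query would be `most_similar(positive=[...], negative=[...])` on its `KeyedVectors`.

```python
import math

# Made-up vectors; axes loosely mean (royalty, maleness, femaleness).
VECTORS = {
    'king': (0.9, 0.9, 0.1),
    'queen': (0.9, 0.1, 0.9),
    'prince': (0.7, 0.9, 0.1),
    'princess': (0.7, 0.1, 0.9),
    'man': (0.1, 0.9, 0.1),
    'woman': (0.1, 0.1, 0.9),
    'chair': (0.05, 0.1, 0.1),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def equivalent(word, source='man', target='woman'):
    """Return the word closest to vec(word) - vec(source) + vec(target)."""
    query = tuple(w - s + t for w, s, t in
                  zip(VECTORS[word], VECTORS[source], VECTORS[target]))
    # Exclude the input words themselves, as analogy queries usually do.
    candidates = [c for c in VECTORS if c not in (word, source, target)]
    return max(candidates, key=lambda c: cosine(query, VECTORS[c]))


print(equivalent('king'))
print(equivalent('princess', source='woman', target='man'))
```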

