Building profanity filters on mobile: clbuttic sh!t

vixentael
September 05, 2015

Open the PDF to be able to tap on links.

-------------------
- profanity filters: why do we need them on mobile at all?
- handling tricky cases: what is wrong with the word 'classic'
- how to filter fast (strings vs. sets)
- gentle filtering, so as not to scare users

Transcript

  1. BUILDING PROFANITY FILTERS clbuttic sh!t

  2. Framework Days. IT Saturday. 5.09.2015 INTERNET CENSORSHIP children religion sexual

    https://en.wikipedia.org/wiki/Censorship
  3. WHY FILTERING GOOD

    PROTECTS: CHILDREN, RELIGION, ETHNICITY, SEXUAL ORIENTATION
  4. WHY FILTERING BAD

    lack of trust in your users → their willingness to break rules
  6. COCK

  7. LET’S BUILD FILTER!

    filter = list of dirty words + list of replacements + filter rule
  8. “Seven Words You Can Never Say on Television”

    Shit, piss, fuck, cunt, cocksucker, motherfucker, and tits.
    – George Carlin, 1972
  9. FILTER RULES

    1. search by entry
    psss…
  10. RANGE OF WORD

    NSRange range = [text rangeOfString:badWord options:NSCaseInsensitiveSearch];
    BOOL hasDirtyWord = [text localizedCaseInsensitiveContainsString:badWord];
  11. RANGE OF WORD

    - (NSArray *)rangesOfBadWordsWithSpaceInString:(NSString *)text {
        __block NSMutableArray *result = [NSMutableArray array];
        [self.listOfBadWordsWithSpace enumerateObjectsUsingBlock:^(NSString *badWord, NSUInteger idx, BOOL *stop) {
            NSRange range = [text rangeOfString:badWord options:NSCaseInsensitiveSearch];
            while (range.location != NSNotFound) {
                [result addObject:[NSValue valueWithRange:range]];
                NSRange nextRange = NSMakeRange(range.location + 1, [text length] - range.location - 1);
                range = [text rangeOfString:badWord options:NSCaseInsensitiveSearch range:nextRange];
            }
        }];
        return result;
    }
  12. SEARCH BY ENTRY

    Get your ass down here! The grass around the creek was new, giving it a velvety look. Dusty, his heartless assassin, had found his mate.
  15. FALSE POSITIVES

  16. FALSE POSITIVES

    Get your ass down here! The grass around the creek was new, giving it a velvety look. Dusty, his heartless assassin, had found his mate.
  17. ‘ASS’ WORDS

    assart, assault, association, assurance, harassment, hassel, hourglass, impassable, pass, passion, piassaba, preassign
    1250 words found: http://www.morewords.com/contains/ass/
  18. REPLACEMENT RULES…

    ass → butt

  19. …FAILS

    classic → clbuttic

  21. AND FAILS AGAIN…

    Constitution → Consbreastution
    medieval → medireview
    Tyson Gay → Tyson Homosexual
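
The failures above all come from replacing a substring wherever it occurs, with no regard for word boundaries. A minimal sketch of such a naive rule follows; the helper name `NaiveReplace` is hypothetical and does not come from the slides.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the slides): replaces every occurrence of
// badWord inside text, ignoring case and ignoring word boundaries.
// Ignoring word boundaries is exactly what produces "clbuttic".
static NSString *NaiveReplace(NSString *text, NSString *badWord, NSString *replacement) {
    return [text stringByReplacingOccurrencesOfString:badWord
                                           withString:replacement
                                              options:NSCaseInsensitiveSearch
                                                range:NSMakeRange(0, text.length)];
}
```

With this sketch, `NaiveReplace(@"classic", @"ass", @"butt")` returns `clbuttic`, and replacing `tit` with `breast` in `Constitution` returns `Consbreastution`.
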
  22. FILTER RULES

    1. search by entry
    2. search whole word
    don’t u know me?
  23. SEARCH WHOLE WORD

    // Setup. The lookup below uses this same set.
    NSSet *badWordsSet = [NSSet setWithArray:self.listOfBadWords];
    NSScanner *scanner = [NSScanner scannerWithString:text];
    NSCharacterSet *wordCharacters = [NSCharacterSet alphanumericCharacterSet];

    // Scan one alphanumeric word and look it up in the set.
    // The slide shows this step only; the surrounding loop and the
    // declaration of `result` are not shown.
    NSString *scanned;
    if ([scanner scanCharactersFromSet:wordCharacters intoString:&scanned]) {
        if ([badWordsSet containsObject:[scanned lowercaseString]]) {
            NSRange range = NSMakeRange(scanner.scanLocation - scanned.length, scanned.length);
            [result addObject:[NSValue valueWithRange:range]];
        }
    }
  24. SEARCH WHOLE WORD

    Get your ass down here! The grass around the creek was new, giving it a velvety look. Dusty, his heartless assassin, had found his mate.
  25. SPACE! AND OTHER PUNCTUATION

  26. PUNCTUATION

    Get your a s s down here! You'd probably fire my a.s.s the first day on the job. You've covered my a_s_s every time I screwed up.
  28. FILTER RULES

    1. search by entry
    2. search whole word
    3. handle punctuation
    don’t tell anyone…
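
One possible way to cover spellings such as `a s s`, `a.s.s` and `a_s_s` is to generate separated variants of each dirty word and search for them as substrings. The sketch below assumes a hypothetical helper `SeparatedVariants` and an illustrative separator list; the slides do not say how such variants were produced.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the slides): for each separator, returns the
// word with that separator inserted between every pair of adjacent characters.
// Works on UTF-16 code units, which is enough for plain ASCII dictionary words.
static NSArray<NSString *> *SeparatedVariants(NSString *word, NSArray<NSString *> *separators) {
    // Split the word into single-character strings.
    NSMutableArray<NSString *> *letters = [NSMutableArray arrayWithCapacity:word.length];
    for (NSUInteger i = 0; i < word.length; i++) {
        [letters addObject:[word substringWithRange:NSMakeRange(i, 1)]];
    }

    // Join the characters back together once per separator.
    NSMutableArray<NSString *> *variants = [NSMutableArray arrayWithCapacity:separators.count];
    for (NSString *separator in separators) {
        [variants addObject:[letters componentsJoinedByString:separator]];
    }
    return variants;
}
```

Each generated variant can then be searched with `rangeOfString:options:`, since a whole-word scan would split these spellings into single letters.
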
  29. 1337 59341<

  30. L33T SPEAK

    HOW MANY DIFFERENT SPELLINGS DOES ONE WORD HAVE?
  31. BITCH

  32. B1TCH

    I → 1
    BITCH
  33. B!TCH

    I → !
    BITCH B1TCH
  34. BI+CH

    T → +
    BITCH B1TCH B!TCH
  35. I3ITCH

    B → I3
    BITCH B1TCH B!TCH BI+CH
  36. BITCH B!TCH B1TCH 8ITCH ßITCH 13ITCH L3ITCH BI7CH BI+CH BI†CH

    BIT[H BIT¢H BIT<H BITC# BITC: B1T¢H 8!†C# 8ITC/-/ 817[# (3][+(:
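
Listing every spelling by hand grows quickly. An alternative technique, not the one shown in the slides, is to normalize look-alike characters back to letters before the dictionary lookup. The sketch below is a rough illustration: `NormalizeLeet` and its mapping table are hypothetical, ASCII-only and far from complete, and multi-character substitutions such as `I3` or `/-/` are not handled.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the slides): lowercases the word and maps
// single look-alike characters to the letters they usually stand for.
static NSString *NormalizeLeet(NSString *word) {
    // Illustrative mapping only; a real table would need tuning per language.
    NSDictionary<NSString *, NSString *> *map = @{
        @"1": @"i", @"!": @"i",
        @"+": @"t", @"7": @"t",
        @"8": @"b",
        @"[": @"c", @"<": @"c",
        @"#": @"h",
        @"0": @"o",
        @"5": @"s",
    };

    NSString *lower = [word lowercaseString];
    NSMutableString *result = [NSMutableString stringWithCapacity:lower.length];
    for (NSUInteger i = 0; i < lower.length; i++) {
        NSString *character = [lower substringWithRange:NSMakeRange(i, 1)];
        NSString *mapped = map[character];
        // Keep the character unchanged when it has no mapping.
        [result appendString:(mapped != nil ? mapped : character)];
    }
    return result;
}
```

The normalized word can then be checked against a plain dictionary. The trade-off is that ordinary digits and punctuation also get rewritten, which may create new false positives.
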
  37. FILTER RULES

    1. search by entry
    2. search whole word
    3. handle punctuation
    4. handle l33t speak
    my name is…
  38. SCUNTHORPE PROBLEM

    https://en.wikipedia.org/wiki/Scunthorpe_problem

  39. NICE TITS

    In 2007, the Royal Society for the Protection of Birds blocked ornithological terms such as cock (male bird), tit, shag and booby from its discussion forums.
  40. FILTER RULES

    1. search by entry
    2. search whole word
    3. handle punctuation
    4. handle l33t speak
    5. remember about exceptions
    blue-footed booby!
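
Rule 5 could be sketched as a check that runs after a dirty word is found: if the match lies inside a known harmless phrase, it is skipped. The helper `RangeIsInsideException` and the exception list are hypothetical; the slides do not show how exceptions were implemented.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the slides): returns YES when badRange lies
// entirely inside some occurrence of a whitelisted phrase in text.
static BOOL RangeIsInsideException(NSString *text, NSRange badRange, NSArray<NSString *> *exceptions) {
    for (NSString *phrase in exceptions) {
        NSRange searchRange = NSMakeRange(0, text.length);
        while (searchRange.length > 0) {
            NSRange found = [text rangeOfString:phrase
                                        options:NSCaseInsensitiveSearch
                                          range:searchRange];
            if (found.location == NSNotFound) {
                break;
            }
            // The dirty word is covered by this occurrence of the phrase.
            if (badRange.location >= found.location &&
                NSMaxRange(badRange) <= NSMaxRange(found)) {
                return YES;
            }
            // Continue searching one character after the current occurrence.
            NSUInteger next = found.location + 1;
            searchRange = NSMakeRange(next, text.length - next);
        }
    }
    return NO;
}
```

With `blue-footed booby` in the exception list, the word `booby` is left alone in a sentence about the bird but is still reported when it appears on its own.
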
  41. TEXT FILTERING ON IOS

    words dictionary (boobs, b00bs, b00b5): whole word scan, NSScanner, NSSet
  42. TEXT FILTERING ON IOS

    phrases dictionary (b o o b s, b.o.o.b.s, b!o!o!bs): substring scan, rangeOfString
  43. TEXT FILTERING ON IOS

    words dictionary (boobs, b00bs, b00b5): whole word scan, NSScanner, NSSet
    +
    phrases dictionary (b o o b s, b.o.o.b.s, bo!obs): substring scan, rangeOfString
  44. HOW FAST IS IT?

    [Chart: filtering time in seconds (axis from 0 to 0.4) against user text length in characters (1000, 5000, 10000, 20000) for three series: range, scanner, both. The dirty words dictionary contains 455 words.]
  45. LIVE FILTERING

    use RAC (ReactiveCocoa) and run the filter every time the user enters a character
  46. IMPROVE FILTERING

    • Keep dictionary up to date
    • Whitelist
    • Levenshtein distance
    • Soundex functions (where a word sounds like another)
    • Naive Bayesian inference filtering of phrases/terms
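
Of the ideas listed above, Levenshtein distance is the easiest to illustrate: it counts the single-character insertions, deletions and substitutions needed to turn one string into another, so a small distance to a dictionary word hints at a disguised spelling. The function below is a standard two-row dynamic-programming sketch; the name `LevenshteinDistance` is hypothetical, and how to pick a threshold is left open because the slides do not specify one.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper (not from the slides): Levenshtein distance between two
// strings, compared by UTF-16 code units and case-sensitively.
static NSUInteger LevenshteinDistance(NSString *a, NSString *b) {
    NSUInteger n = a.length;
    NSUInteger m = b.length;
    if (n == 0) return m;
    if (m == 0) return n;

    // Only two rows of the distance table are kept in memory.
    NSUInteger *previous = calloc(m + 1, sizeof(NSUInteger));
    NSUInteger *current = calloc(m + 1, sizeof(NSUInteger));
    for (NSUInteger j = 0; j <= m; j++) {
        previous[j] = j;  // distance from the empty prefix of a
    }

    for (NSUInteger i = 1; i <= n; i++) {
        current[0] = i;  // distance to the empty prefix of b
        unichar characterA = [a characterAtIndex:i - 1];
        for (NSUInteger j = 1; j <= m; j++) {
            NSUInteger cost = (characterA == [b characterAtIndex:j - 1]) ? 0 : 1;
            NSUInteger deletion = previous[j] + 1;
            NSUInteger insertion = current[j - 1] + 1;
            NSUInteger substitution = previous[j - 1] + cost;
            current[j] = MIN(MIN(deletion, insertion), substitution);
        }
        // Swap rows for the next iteration.
        NSUInteger *swap = previous;
        previous = current;
        current = swap;
    }

    NSUInteger distance = previous[m];
    free(previous);
    free(current);
    return distance;
}
```

For instance, `bitch` and `b1tch` are at distance 1. Short words need care, since many harmless words are also one edit away from a dirty one.
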
  47. POST-MODERATION

    live moderators, solid community, flag abuse
  48. DIRTY WORDS

    • list of dirty words in different languages: https://github.com/shutterstock/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
    • list of dirty words I’ve used: https://gist.github.com/vixentael/5ce4168e3e94d9686405
  49. LAST SLIDE

    @vixentael, iOS developer at Stanfy
  50. THANK YOU FOR WATCHING!