Upgrade to Pro — share decks privately, control downloads, hide ads and more …

The Great Language Game

Lars Yencken
September 02, 2013

The Great Language Game

A brief introduction to the Great Language Game, given to the Melbourne Python User Group.

Lars Yencken

September 02, 2013
Tweet

More Decks by Lars Yencken

Other Decks in Programming

Transcript

  1. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . The Great Language Game Lars Yencken Melbourne Python User Group September 2, 2013
  2. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . I’m a language geek
  3. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . I’m a human language geek
  4. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . The world has something like 7,000 languages
  5. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . The world has something like 7,000 languages So many!
  6. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . The world has something like 7,000 languages Too many to learn!
  7. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . But... with the help of a lil Python we can at least learn to tell the difference between languages
  8. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . Aside: langid.py Distinguish between languages in text form
  9. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . Aside: langid.py Distinguish between languages in text form ผู้สื่อข่าวไทยวิเคราะห์นโยบายผู้ขอลี้ภัยพรรคต่างๆ >>> import langid >>> langid.classify(l.encode(’utf8’)) (’th’, 1.0) >>> langid.classify(’¡Venga hombre!’) (’es’, 0.5726778160604622)
  10. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . First attempt: streaming radio ▶ There’s lots of internet radio out there! ▶ But it’s all in shitty old formats ▶ And Python support for decoding them all is not great ▶ Solution: sh module and mplayer ▶ Still too hard!
  11. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . Second attempt: scrape SBS ▶ Podcasts News podcasts in about 70 languages ▶ Good quality recordings! ▶ (Sometimes) daggy Australian accents ▶ Fetching: pyquery, requests and parse ▶ Processing audio: wave + sh wrapping avconv and mp3gain ▶ Success!
  12. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . Aside: sh Wraps shell calls like a boss! >>> from sh import ffmpeg >>> ffmpeg(’-i’, input_file, output_file) >>> from sh import mp3gain >>> mp3gain(’-r’, ’-k’, ’-t’, ’-s’, ’r’, sound_file)
  13. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . More about languages ▶ Wikipedia: manual data entry ▶ Freebase API: via requests
  14. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . End result: demo time!
  15. . . . .. . . . .. . .

    . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . Thanks http://greatlanguagegame.com/