A brief introduction to the Great Language Game, given to the Melbourne Python User Group.
The Great Language Game
Lars Yencken
Melbourne Python User Group
September 2, 2013
I’m a language geek. A human language geek.
The world has something like 7,000 languages. So many! Too many to learn!
But... with the help of a little Python, we can at least learn to tell the difference between languages.
Aside: langid.py
Distinguishes between languages in written text.
Example inputs: a Thai headline (roughly "Thai reporters analyse the parties’ asylum-seeker policies") and the Spanish "¡Venga hombre!" ("Come on, man!").
>>> import langid
>>> l = u'ผู้สื่อข่าวไทยวิเคราะห์นโยบายผู้ขอลี้ภัยพรรคต่างๆ'
>>> langid.classify(l.encode('utf8'))
('th', 1.0)
>>> langid.classify('¡Venga hombre!')
('es', 0.5726778160604622)
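For the curious: langid.py is built on a naive Bayes classifier over byte n-grams. The snippet below is not langid.py's code or trained model; it is a toy sketch of the same idea, with hypothetical helper names (`byte_ngrams`, `train`, `classify`) and a couple of made-up training sentences per language. Working on bytes rather than characters means scripts like Thai separate almost for free, while Latin-script languages are told apart by spelling patterns.

```python
import math
from collections import Counter

# Hypothetical toy training data: a couple of sentences per language.
SAMPLES = {
    "en": ["the cat sat on the mat", "where is the train station"],
    "es": ["el gato está en la casa", "dónde está la estación de tren"],
    "th": ["ผู้สื่อข่าวไทย", "สวัสดีครับ"],
}


def byte_ngrams(text, n_max=3):
    """Count every byte n-gram (n = 1..n_max) in the UTF-8 encoding of text."""
    data = text.encode("utf-8")
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(data) - n + 1):
            grams[data[i:i + n]] += 1
    return grams


def train(samples):
    """Map each language to (n-gram counts, total n-gram count)."""
    model = {}
    for lang, texts in samples.items():
        counts = Counter()
        for text in texts:
            counts.update(byte_ngrams(text))
        model[lang] = (counts, sum(counts.values()))
    return model


def classify(model, text):
    """Return (language, log-likelihood) of the best-scoring language.

    Multinomial naive Bayes with uniform priors and add-one smoothing,
    so an n-gram never seen in training doesn't zero out a language.
    """
    vocab = set()
    for counts, _ in model.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams
    features = byte_ngrams(text)
    best_lang, best_score = None, -math.inf
    for lang, (counts, total) in model.items():
        score = sum(
            k * math.log((counts[gram] + 1) / (total + vocab_size))
            for gram, k in features.items()
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang, best_score


model = train(SAMPLES)
print(classify(model, "the train is on the mat")[0])  # en
print(classify(model, "ข่าวไทย")[0])  # th
```

With two sentences per language this toy is easy to fool; langid.py's usefulness comes from its large training corpus and careful feature selection.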
First attempt: streaming radio
▶ There’s lots of internet radio out there!
▶ But it’s all in shitty old formats
▶ And Python support for decoding them all is not great
▶ Solution: the sh module and mplayer
▶ Still too hard!
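A minimal sketch of the mplayer route, assuming mplayer is installed and can play the stream: decode a few seconds of a station to a WAV file. It calls mplayer through the standard library's subprocess rather than the sh module, and the helper names and 30-second default are hypothetical. `-really-quiet`, `-vo null`, `-ao pcm:file=...` and `-endpos` are MPlayer options, but check them against your version's man page.

```python
import subprocess


def mplayer_dump_cmd(stream_url, wav_path, seconds=30):
    """Build an mplayer command that decodes stream_url into a WAV file."""
    if any(ch in wav_path for ch in ":,"):
        # mplayer's option parser treats ':' and ',' as separators here
        raise ValueError("keep ':' and ',' out of the output path")
    return [
        "mplayer",
        "-really-quiet",                # suppress console output
        "-vo", "null",                  # no video output
        "-ao", "pcm:file=" + wav_path,  # write decoded audio as WAV
        "-endpos", str(seconds),        # stop after this many seconds
        stream_url,
    ]


def record_clip(stream_url, wav_path, seconds=30):
    """Run mplayer; raises CalledProcessError if it exits non-zero."""
    subprocess.run(mplayer_dump_cmd(stream_url, wav_path, seconds), check=True)
```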
Second attempt: scrape SBS
▶ News podcasts in about 70 languages
▶ Good-quality recordings!
▶ (Sometimes) daggy Australian accents
▶ Fetching: pyquery, requests and parse
▶ Processing audio: wave, plus sh wrapping avconv and mp3gain
▶ Success!
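Once an episode has been converted to WAV (avconv's job in the pipeline above), the standard library's wave module is enough to cut out a short clip. A minimal sketch; the `cut_clip` name and its parameters are hypothetical, and it assumes uncompressed PCM input, which is all wave can read.

```python
import wave


def cut_clip(src, dst, start_s, length_s):
    """Copy length_s seconds of audio, starting at start_s, from src to dst.

    src and dst may be filenames or seekable binary file objects.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        start = int(start_s * rate)
        n_frames = int(length_s * rate)
        if start + n_frames > reader.getnframes():
            raise ValueError("clip runs past the end of the recording")
        reader.setpos(start)
        frames = reader.readframes(n_frames)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)    # same channels, sample width and rate
        writer.writeframes(frames)  # frame count in the header is patched


# Hypothetical usage: a 20-second clip starting one minute in.
# cut_clip("episode.wav", "clip.wav", start_s=60, length_s=20)
```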
Aside: sh
Wraps shell calls like a boss!
>>> from sh import ffmpeg
>>> ffmpeg('-i', input_file, output_file)  # convert between audio formats
>>> from sh import mp3gain
>>> mp3gain('-r', '-k', '-t', '-s', 'r', sound_file)  # normalise loudness without clipping
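What sh gives you is that any program on your PATH becomes a Python callable, with each argument passed through as one argv entry. The class below is a toy imitation of that calling convention on top of subprocess, not how sh is implemented; it skips sh's extras such as piping, background processes and keyword-argument flags. The `Command` name is hypothetical.

```python
import subprocess


class Command:
    """Toy stand-in for an sh command: calling it runs the program."""

    def __init__(self, program):
        self.program = program

    def __call__(self, *args):
        # Each positional argument becomes exactly one argv entry, so
        # filenames with spaces need no shell quoting.
        result = subprocess.run(
            [self.program, *map(str, args)],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit
        )
        return result.stdout


# Usage mirroring the slide (hypothetical paths; needs ffmpeg installed):
# ffmpeg = Command("ffmpeg")
# ffmpeg("-i", "episode.mp3", "episode.wav")
```

Like sh, it raises an exception when the program exits non-zero (here subprocess.CalledProcessError rather than sh's ErrorReturnCode), so a failed conversion doesn't pass silently.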
More about languages
▶ Wikipedia: manual data entry
▶ Freebase API: via requests
End result: demo time!
Thanks!
http://greatlanguagegame.com/