The Great Language Game

Lars Yencken
September 02, 2013

A brief introduction to the Great Language Game, given to the Melbourne Python User Group.

Transcript

  1. The Great Language Game
    Lars Yencken
    Melbourne Python User Group
    September 2, 2013

  2. I’m a language geek

  3. I’m a human language geek

  4. The world has something like 7,000 languages

  5. The world has something like 7,000 languages
    So many!

  6. The world has something like 7,000 languages
    Too many to learn!

  7. But... with the help of a lil Python
    we can at least learn to tell the difference between languages

  8. Aside: langid.py
    Distinguish between languages in text form

  9. Aside: langid.py
    Distinguish between languages in text form
    >>> import langid
    >>> text = u'ผู้สื่อข่าวไทยวิเคราะห์นโยบายผู้ขอลี้ภัยพรรคต่างๆ'  # Thai: "Thai reporter analyses the parties' asylum-seeker policies"
    >>> langid.classify(text.encode('utf8'))
    ('th', 1.0)
    >>> langid.classify('¡Venga hombre!')  # Spanish: "Come on, man!"
    ('es', 0.5726778160604622)
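
The talk does not go into how langid.py works inside. As a rough illustration of the general idea only, the sketch below scores text against per-language character trigram counts using multinomial naive Bayes with add-one smoothing. langid.py itself uses byte n-gram features with a multinomial naive Bayes classifier trained on many languages; the `NgramLanguageId` class, its methods and the one-sentence training data here are hypothetical toys, not langid.py's API or model.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Character n-grams of `text`, padded with spaces so word edges count."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageId:
    """Toy multinomial naive Bayes over character trigrams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}   # language -> Counter of n-grams seen in training
        self.totals = {}   # language -> total number of n-grams seen
        self.vocab = set()

    def train(self, lang, text):
        grams = ngrams(text, self.n)
        self.counts.setdefault(lang, Counter()).update(grams)
        self.totals[lang] = self.totals.get(lang, 0) + len(grams)
        self.vocab.update(grams)

    def classify(self, text):
        """Return (language, confidence), mimicking the shape of langid.classify."""
        grams = ngrams(text, self.n)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        scores = {}
        for lang, counter in self.counts.items():
            total = self.totals[lang]
            # Sum of log P(gram | lang) with add-one smoothing; uniform prior.
            scores[lang] = sum(
                math.log((counter[g] + 1) / (total + v)) for g in grams
            )
        best = max(scores, key=scores.get)
        # Softmax over the log-scores turns them into a normalised confidence.
        top = scores[best]
        z = sum(math.exp(s - top) for s in scores.values())
        return best, 1.0 / z
```

With only a sentence of training text per language, `clf.train("en", "the cat is on the table and the dog is in the house")` and `clf.train("es", "el gato está en la mesa y el perro está en la casa")` are already enough for `clf.classify("the dog and the cat")` to pick `'en'`; a real identifier needs far more data and many more languages.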

  10. First attempt: streaming radio
    ▶ There’s lots of internet radio out there!
    ▶ But it’s all in shitty old formats
    ▶ And Python support for decoding them all is not great
    ▶ Solution: sh module and mplayer
    ▶ Still too hard!
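
The slide does not show how mplayer was driven. As a hedged sketch, assuming MPlayer's `-ao pcm:file=...` audio output (which writes decoded audio to a WAV file) and its `-endpos` option (stop after a given duration), grabbing a short clip from a stream could look like the code below. It uses the standard library's `subprocess` instead of `sh`, and the function names and stream URL are hypothetical.

```python
import shutil
import subprocess


def mplayer_capture_args(stream_url, wav_path, seconds=30):
    """Build an mplayer argv that decodes `seconds` of a stream into a WAV file."""
    return [
        "mplayer",
        "-really-quiet",                 # suppress mplayer's status output
        "-vo", "null",                   # discard any video track
        "-ao", f"pcm:file={wav_path}",   # write decoded audio as a PCM WAV file
        "-endpos", str(seconds),         # stop after this many seconds
        stream_url,
    ]


def capture_clip(stream_url, wav_path, seconds=30):
    """Run mplayer; raises if it is not installed or exits with an error."""
    if shutil.which("mplayer") is None:
        raise RuntimeError("mplayer is not on the PATH")
    subprocess.run(mplayer_capture_args(stream_url, wav_path, seconds), check=True)
```

Keeping the argv builder separate from the call makes the command easy to inspect without mplayer installed. Paths containing `:` or `,` may need escaping in MPlayer's suboption syntax.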

  11. Second attempt: scrape SBS
    ▶ News podcasts in about 70 languages
    ▶ Good quality recordings!
    ▶ (Sometimes) daggy Australian accents
    ▶ Fetching: pyquery, requests and parse
    ▶ Processing audio: wave + sh wrapping avconv and mp3gain
    ▶ Success!
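
The slide lists `wave` for audio processing without showing how it was used. One plausible job for it is cutting a fixed-length clip out of a longer, already decoded recording; the sketch below does that with only the standard library. The `extract_clip` name and the clipping step itself are assumptions for illustration, not the game's actual code.

```python
import wave


def extract_clip(src_path, dst_path, start_s, length_s):
    """Copy `length_s` seconds of audio starting at `start_s` into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        src.setpos(int(start_s * rate))             # seek by frame index
        frames = src.readframes(int(length_s * rate))
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)                       # same channels, sample width, rate
        dst.writeframes(frames)                     # frame count in header is patched
```

`setpos` raises `wave.Error` if `start_s` lies past the end of the file, while `readframes` simply returns fewer frames when the clip would run off the end, so short recordings yield short clips rather than errors.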

  12. Aside: sh
    Wraps shell calls like a boss!
    >>> from sh import ffmpeg
    >>> ffmpeg('-i', input_file, output_file)
    >>> from sh import mp3gain
    >>> mp3gain('-r', '-k', '-t', '-s', 'r', sound_file)
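
To give a rough idea of what a call like `ffmpeg('-i', input_file, output_file)` amounts to, here is a toy wrapper on top of the standard library's `subprocess`: calling a command object runs the program with the given arguments and returns its standard output, raising if the program exits with a non-zero status. This is not how `sh` is implemented (`sh` also provides the `from sh import ...` magic, piping, background processes and more); the `Command` class is hypothetical.

```python
import subprocess


class Command:
    """Minimal sh-style callable wrapper around an external program (toy sketch)."""

    def __init__(self, program):
        self.program = program

    def __call__(self, *args):
        # Arguments are passed as an argv list, not through a shell, so no
        # quoting is needed and shell injection is not possible.
        result = subprocess.run(
            [self.program, *map(str, args)],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
        return result.stdout


# Usage mirroring the slide (needs ffmpeg and mp3gain on the PATH):
# ffmpeg = Command("ffmpeg")
# ffmpeg("-i", input_file, output_file)
# mp3gain = Command("mp3gain")
# mp3gain("-r", "-k", "-t", "-s", "r", sound_file)
```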

  13. More about languages
    ▶ Wikipedia: manual data entry
    ▶ Freebase API: via requests

  14. End result: demo time!

  15. Thanks
    http://greatlanguagegame.com/