Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
The Great Language Game
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Lars Yencken
September 02, 2013
Programming
360
0
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
The Great Language Game
A brief introduction to the Great Language Game, given to the Melbourne Python User Group.
Lars Yencken
September 02, 2013
More Decks by Lars Yencken
See All by Lars Yencken
Linguistics, a whirlwind tour!
larsyencken
0
65
Pycon 2014 Recap
larsyencken
0
76
Nine months of food
larsyencken
0
310
Automation for web development
larsyencken
0
160
Scaling a web stack
larsyencken
4
210
Similarity metrics for Japanese kanji
larsyencken
0
91
Other Decks in Programming
See All in Programming
3Dシーンの圧縮
fadis
1
680
AIエージェントの隔離技術の徹底比較
kawayu
0
470
Claspは野良GASの夢をみるか
takter00
0
180
決定論的オーケストレーションの設計と実装 / Design and Implementation of Deterministic Orchestration
nrslib
3
1.2k
Signal Forms: Beyond the Basics @ngBaguette 2026 in Paris
manfredsteyer
PRO
0
230
TypeScript+Orvalで実現する型安全かつ堅牢でスケーラブルなマルチチャネル通知基盤 / TSKaigi Night talks ~after conference~
d0riven
0
320
脅威をエンジニアリングの糧にして――現場編 / Turning Threats into Engineering Fuel — Field Edition
nrslib
0
270
Lessons from Spec-Driven Development
simas
PRO
0
150
JJUG CCC 2026 Spring: JSpecify で実現する Kotlin フレンドリーな Java API 設計
ternbusty
1
160
PHPで使える日時の表現と、その知り方 #frontend_phpcon_do
o0h
PRO
0
230
作って学ぶ、 JSX (TSX) ランタイムの基本
syumai
7
1.6k
IBM Bobを活用したレガシーアプリの最新化
oniak3ibm
PRO
1
180
Featured
See All Featured
Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation
inesmontani
PRO
3
2.3k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
JAMstack: Web Apps at Ludicrous Speed - All Things Open 2022
reverentgeek
1
470
End of SEO as We Know It (SMX Advanced Version)
ipullrank
3
4.2k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
3
150
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
122
22k
[RailsConf 2023] Rails as a piece of cake
palkan
59
6.7k
Speed Design
sergeychernyshev
33
1.8k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
254
22k
How to Think Like a Performance Engineer
csswizardry
28
2.6k
What the history of the web can teach us about the future of AI
inesmontani
PRO
1
610
Dominate Local Search Results - an insider guide to GBP, reviews, and Local SEO
greggifford
PRO
0
190
Transcript
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . The Great Language Game Lars Yencken Melbourne Python User Group September 2, 2013
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . I’m a language geek
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . I’m a human language geek
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . The world has something like 7,000 languages
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . The world has something like 7,000 languages So many!
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . The world has something like 7,000 languages Too many to learn!
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . But... with the help of a lil Python we can at least learn to tell the difference between languages
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . Aside: langid.py Distinguish between languages in text form
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . Aside: langid.py Distinguish between languages in text form ผู้สื่อข่าวไทยวิเคราะห์นโยบายผู้ขอลี้ภัยพรรคต่างๆ >>> import langid >>> langid.classify(l.encode(’utf8’)) (’th’, 1.0) >>> langid.classify(’¡Venga hombre!’) (’es’, 0.5726778160604622)
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . First attempt: streaming radio ▶ There’s lots of internet radio out there! ▶ But it’s all in shitty old formats ▶ And Python support for decoding them all is not great ▶ Solution: sh module and mplayer ▶ Still too hard!
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . Second attempt: scrape SBS ▶ Podcasts News podcasts in about 70 languages ▶ Good quality recordings! ▶ (Sometimes) daggy Australian accents ▶ Fetching: pyquery, requests and parse ▶ Processing audio: wave + sh wrapping avconv and mp3gain ▶ Success!
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . Aside: sh Wraps shell calls like a boss! >>> from sh import ffmpeg >>> ffmpeg(’-i’, input_file, output_file) >>> from sh import mp3gain >>> mp3gain(’-r’, ’-k’, ’-t’, ’-s’, ’r’, sound_file)
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . More about languages ▶ Wikipedia: manual data entry ▶ Freebase API: via requests
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . End result: demo time!
. . . .. . . . .. . .
. .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . .. . . . .. . . . . .. . . . .. . . . . .. . . . .. . . . .. . Thanks http://greatlanguagegame.com/