Slide 1

Slide 1 text

YOU USED PYTHON FOR WHAT?! James Tauber

Slide 2

Slide 2 text

YOU USED PYTHON FOR WHAT?! James Tauber ARE USING

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Quisition ?

Slide 6

Slide 6 text

habitualist

Slide 7

Slide 7 text

YOU USED PYTHON FOR WHAT?!

Slide 8

Slide 8 text

jtauber.github.com jtauber.com @jtauber

Slide 9

Slide 9 text

THE TOPICS
•Some Mathematics
•Some Weird Programming Languages
•Analyzing Keyboard Performances
•Generating Ancient Greek Graded Readers
•Emulating the Apple ][
•Writing an Operating System

Slide 10

Slide 10 text

PRIMARY GOAL YOU FIND AT LEAST ONE OF THESE INTERESTING

Slide 11

Slide 11 text

SECONDARY GOAL YOU QUESTION MY SANITY

Slide 12

Slide 12 text

I DON’T REALLY UNDERSTAND SOMETHING UNTIL I’VE IMPLEMENTED IT IN PYTHON

Slide 13

Slide 13 text

BRAINF***

Slide 14

Slide 14 text

EIGHT COMMANDS
>  increment the data pointer
<  decrement the data pointer
+  increment the byte at the data pointer
-  decrement the byte at the data pointer
.  output the byte at the data pointer as an ASCII-encoded character
,  accept one byte of input, storing its value in the byte at the data pointer
[  if the byte at the data pointer is zero, jump forward to just after the matching ]
]  if the byte at the data pointer is nonzero, jump back to just after the matching [

Slide 15

Slide 15 text

>+++++++++[<++++++++>-]<. >+++++++[<++++>-]<+. +++++++. . +++. [-]>++++++++[<++++>-]<. >+++++++++++[<++++++++>-]<-. --------. +++. ------. --------. [-]>++++++++[<++++>-]<+. [-]++++++++++.

Slide 16

Slide 16 text

import sys

class brainf:
    def __init__(self, program):
        self.mem = [0] * 65536   # data cells
        self.p = 0               # data pointer
        self.pc = 0              # program counter
        self.program = program

Slide 17

Slide 17 text

    def run(self, stop=None):
        if stop is None:
            stop = len(self.program)
        while self.pc < stop:
            c = self.program[self.pc]
            if c == ">":
                self.p += 1
            elif c == "<":
                self.p -= 1
            elif c == "+":
                self.mem[self.p] += 1
            elif c == "-":
                self.mem[self.p] -= 1
            elif c == ".":
                sys.stdout.write(chr(self.mem[self.p]))
            elif c == ",":
                self.mem[self.p] = ord(sys.stdin.read(1))
            elif c == "[":
                ...

Slide 18

Slide 18 text

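            elif c == "[":
                # find the matching "]"
                depth = 1
                start = end = self.pc
                while depth:
                    end += 1
                    if self.program[end] == "[":
                        depth += 1
                    if self.program[end] == "]":
                        depth -= 1
                # run the loop body recursively while the current cell is nonzero
                while self.mem[self.p]:
                    self.pc = start + 1
                    self.run(end)
                self.pc = end
            elif c == "]":
                # string exceptions are no longer valid Python; raise a real one
                raise SyntaxError("unbalanced ]")
            else:
                pass
            self.pc += 1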
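
A possible variation on the interpreter above, sketched here rather than taken from the talk: pre-compute the matching bracket positions once and loop iteratively instead of recursing. The names run_brainf and jumps are hypothetical, and wrapping cells at 256 and returning the output as a string are assumptions made so the example is easy to check.

```python
def run_brainf(program, input_text=""):
    """Run a Brainf*** program and return its output as a string."""
    # Pre-compute matching bracket positions (hypothetical jump table).
    jumps, stack = {}, []
    for pos, c in enumerate(program):
        if c == "[":
            stack.append(pos)
        elif c == "]":
            if not stack:
                raise SyntaxError("unbalanced ]")
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start
    if stack:
        raise SyntaxError("unbalanced [")

    mem = [0] * 65536
    p = pc = 0
    inputs = iter(input_text)
    out = []
    while pc < len(program):
        c = program[pc]
        if c == ">":
            p += 1
        elif c == "<":
            p -= 1
        elif c == "+":
            mem[p] = (mem[p] + 1) % 256   # assumption: cells wrap like bytes
        elif c == "-":
            mem[p] = (mem[p] - 1) % 256
        elif c == ".":
            out.append(chr(mem[p]))
        elif c == ",":
            mem[p] = ord(next(inputs, "\0"))  # assumption: 0 at end of input
        elif c == "[" and mem[p] == 0:
            pc = jumps[pc]                # skip to the matching "]"
        elif c == "]" and mem[p] != 0:
            pc = jumps[pc]                # go back to the matching "["
        pc += 1                           # any other character is ignored
    return "".join(out)


# The program from slide 15; the spaces are ignored by the interpreter.
HELLO = (
    ">+++++++++[<++++++++>-]<. >+++++++[<++++>-]<+. +++++++. . +++. "
    "[-]>++++++++[<++++>-]<. >+++++++++++[<++++++++>-]<-. --------. +++. "
    "------. --------. [-]>++++++++[<++++>-]<+. [-]++++++++++."
)
print(repr(run_brainf(HELLO)))
```

The jump table trades a little setup time for loops that no longer rescan the program text on every entry.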

Slide 19

Slide 19 text

,>++++++[<-------->-],[<+>-],<.>.

Slide 20

Slide 20 text

GIBBONS-LESTER-BIRD ALGORITHM FOR ENUMERATING THE POSITIVE RATIONALS

Slide 21

Slide 21 text

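# fractions are (numerator, denominator) tuples; tuple parameters in the
# signature are Python 2 only, so each function unpacks its argument explicitly

def reciprocal(fraction):
    n, d = fraction
    return (d, n)

def one_take(fraction):
    n, d = fraction
    return (d - n, d)

def proper_fraction(fraction):
    n, d = fraction
    return (n // d, (n % d, d))

def improper_fraction(i, fraction):
    n, d = fraction
    return ((d * i) + n, d)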

Slide 22

Slide 22 text

def rationals():
    r = (0,1)
    while True:
        n, y = proper_fraction(r)
        z = improper_fraction(n, one_take(y))
        r = reciprocal(z)
        yield r

Slide 23

Slide 23 text

def rationals():
    r = (0,1)
    while True:
        r = (r[1], (r[1] * (r[0] // r[1])) + (r[1] - (r[0] % r[1])))
        yield r

Slide 24

Slide 24 text

def rationals(r=(0,1)):
    while True:
        r = (r[1], (r[1] * (r[0] // r[1])) + (r[1] - (r[0] % r[1])))
        yield r
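
A short usage sketch for the generator above. The use of fractions.Fraction and itertools.islice is an addition for display purposes only; the generator itself is the one from the slide.

```python
from fractions import Fraction
from itertools import islice


def rationals(r=(0, 1)):
    # each step maps x to 1 / (floor(x) + 1 - frac(x))
    while True:
        r = (r[1], (r[1] * (r[0] // r[1])) + (r[1] - (r[0] % r[1])))
        yield r


first = [Fraction(n, d) for n, d in islice(rationals(), 7)]
print(", ".join(str(q) for q in first))
# 1, 1/2, 2, 1/3, 3/2, 2/3, 3
```

Every (numerator, denominator) pair comes out already in lowest terms, which is why no explicit gcd step is needed.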

Slide 25

Slide 25 text

PLOUFFE FORMULA FOR PI IN HEX

Slide 26

Slide 26 text

def pi():
    N = 0
    n, d = 0, 1
    while True:
        xn = (120*N**2 + 151*N + 47)
        xd = (512*N**4 + 1024*N**3 + 712*N**2 + 194*N + 15)
        n = ((16 * n * xd) + (xn * d)) % (d * xd)
        d *= xd
        yield 16 * n // d
        N += 1
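
A usage sketch for the generator above, assuming Python 3 integer arithmetic. The generator yields the hexadecimal digits that follow the leading "3."; the formatting with "%X" and islice is an addition for illustration.

```python
from itertools import islice


def pi():
    N = 0
    n, d = 0, 1          # n/d is the running fractional part, kept exact
    while True:
        xn = (120*N**2 + 151*N + 47)
        xd = (512*N**4 + 1024*N**3 + 712*N**2 + 194*N + 15)
        n = ((16 * n * xd) + (xn * d)) % (d * xd)
        d *= xd
        yield 16 * n // d
        N += 1


digits = "".join("%X" % digit for digit in islice(pi(), 8))
print("3." + digits)
# 3.243F6A88
```

Because n and d are plain integers, nothing is lost to floating point, at the cost of d growing with every digit.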

Slide 27

Slide 27 text

EVEN WHEN I’VE IMPLEMENTED IT IN PYTHON, THAT DOESN’T MEAN I UNDERSTAND IT

Slide 28

Slide 28 text

CHURCH ENCODING

Slide 29

Slide 29 text

TRUE = lambda a: lambda b: (a)
FALSE = lambda a: lambda b: (b)

(TRUE)(True)(False) == True
(FALSE)(True)(False) == False

Slide 30

Slide 30 text

AND = lambda a: lambda b: (a)(b)(a)
OR = lambda a: lambda b: (a)(a)(b)
NOT = lambda a: lambda b: lambda c: (a)(c)(b)

(AND)(TRUE)(FALSE) == (FALSE)
(AND)(TRUE)(FALSE)(True)(False) == False

Slide 31

Slide 31 text

CONS = lambda a: lambda b: lambda c: (c)(a)(b)
CAR = lambda a: (a)(TRUE)
CDR = lambda a: (a)(FALSE)

(CAR)((CONS)(1)(2)) == 1
(CDR)((CONS)(1)(2)) == 2

Slide 32

Slide 32 text

UNCHURCH_BOOLEAN = (CONS)(True)(False)

(UNCHURCH_BOOLEAN)((NOT)(TRUE)) == False
(UNCHURCH_BOOLEAN)((OR)(TRUE)(FALSE)) == True

Slide 33

Slide 33 text

ZERO = FALSE
SUCC = lambda a: lambda b: lambda c: (b)((a)(b)(c))

ONE = (SUCC)(ZERO)
TWO = (SUCC)(ONE)
THREE = (SUCC)(TWO)
FOUR = (SUCC)(THREE)

def church_number(n):
    return SUCC(church_number(n - 1)) if n else FALSE

Slide 34

Slide 34 text

PLUS = lambda a: lambda b: lambda c: lambda d: (a)(c)((b)(c)(d))
MULT = lambda a: lambda b: lambda c: (b)((a)(c))
EXP = lambda a: lambda b: (b)(a)

UNCHURCH_NUMBER = lambda a: (a)(lambda b: b + 1)(0)

Slide 35

Slide 35 text

(UNCHURCH_NUMBER)(ZERO) == 0
(UNCHURCH_NUMBER)(ONE) == 1
(UNCHURCH_NUMBER)(TWO) == 2
(UNCHURCH_NUMBER)((PLUS)(THREE)(TWO)) == 5
(UNCHURCH_NUMBER)((MULT)(THREE)(TWO)) == 6
(UNCHURCH_NUMBER)((EXP)(THREE)(TWO)) == 9
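
One way to see why these definitions work: a Church numeral n is simply "apply a function n times". The sketch below reuses the slide definitions and applies the numerals to a string-building function instead of b + 1; the helper name star is hypothetical.

```python
ZERO = lambda a: lambda b: (b)
SUCC = lambda a: lambda b: lambda c: (b)((a)(b)(c))
PLUS = lambda a: lambda b: lambda c: lambda d: (a)(c)((b)(c)(d))
MULT = lambda a: lambda b: lambda c: (b)((a)(c))
EXP = lambda a: lambda b: (b)(a)

TWO = SUCC(SUCC(ZERO))
THREE = SUCC(TWO)

star = lambda s: s + "*"   # hypothetical function to iterate

print(THREE(star)(""))              # ***
print(PLUS(THREE)(TWO)(star)(""))   # *****
print(MULT(THREE)(TWO)(star)(""))   # ******
print(EXP(THREE)(TWO)(star)(""))    # *********
```

PLUS chains the two iteration counts, MULT nests one iteration inside the other, and EXP applies the numeral TWO to the numeral THREE itself, giving "three times, done three times over".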

Slide 36

Slide 36 text

GIT-STYLE VERSIONING OF PYTHON DATA STRUCTURES

Slide 37

Slide 37 text

GIT FUNDAMENTALS
• each file’s contents (called a blob) are hashed and that hash becomes a key into a dictionary of files
• each directory is viewed as a sorted list of filenames, each paired with the hash of that file’s contents
• this directory listing can then be hashed and so an entire directory tree can be hashed
• a tree’s hash + commit message + committer + timestamp + parent(s) can then be hashed

Slide 38

Slide 38 text

FILES AND DIRECTORIES ARE KINDA LIKE STRINGS AND DICTIONARIES

Slide 39

Slide 39 text

class NodeBase:
    def __init__(self, repo, content):
        self.repo = repo
        self.content = self.shrink(content)

    def __bytes__(self):
        return ("%s\n%r" % (self.__class__, self.content)).encode("utf-8")

class Atom(NodeBase):
    def shrink(self, content):
        return content

    def expand(self):
        return self.content

Slide 40

Slide 40 text

class List(NodeBase):
    def shrink(self, content):
        return [self.repo.shrink(item) for item in content]

    def expand(self):
        return [self.repo.expand(item) for item in self.content]

Slide 41

Slide 41 text

import hashlib

class Repo:
    def __init__(self):
        self.objects = {}     # sha -> stored node
        self.refs = {}        # branch name -> commit sha
        self.HEAD = "master"

    def store(self, obj):
        sha = hashlib.sha1(bytes(obj)).hexdigest()
        self.objects[sha] = obj
        return sha

Slide 42

Slide 42 text

    def shrink(self, content):
        if isinstance(content, dict):
            return self.store(Dictionary(self, content))
        elif isinstance(content, list):
            return self.store(List(self, content))
        elif isinstance(content, tuple):
            return self.store(Tuple(self, content))
        else:
            return self.store(Atom(self, content))

Slide 43

Slide 43 text

    def expand(self, sha):
        return self.objects[sha].expand()

Slide 44

Slide 44 text

    def create_commit(self, obj_sha, message, parents=None):
        if parents is None:
            parents = []
        return self.store(Commit(self, obj_sha, message, parents))

    def commit(self, obj, message):
        obj_sha = self.shrink(obj)
        # the first commit on a branch has no parent yet, so avoid a KeyError
        old_head = self.refs.get(self.HEAD)
        parents = [old_head] if old_head else []
        commit_sha = self.create_commit(obj_sha, message, parents=parents)
        self.refs[self.HEAD] = commit_sha
        return commit_sha

Slide 45

Slide 45 text

    def create_branch(self, branch_name, commit_sha=None):
        if commit_sha is None:
            commit_sha = self.refs[self.HEAD]
        # point the new branch at the requested commit, not always at HEAD
        self.refs[branch_name] = commit_sha

    def checkout_branch(self, branch_name):
        self.HEAD = branch_name
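
The slides do not show the Dictionary, Tuple, or Commit classes, so the following is only a minimal function-based sketch of the same content-addressing idea, not the talk's implementation. The names objects, store, shrink, and expand mirror the slides, but the (kind, payload) tuple layout is an assumption.

```python
import hashlib

objects = {}  # hypothetical object store: sha -> (kind, payload)


def store(kind, payload):
    # hash a textual representation of the node, as NodeBase.__bytes__ does
    data = ("%s\n%r" % (kind, payload)).encode("utf-8")
    sha = hashlib.sha1(data).hexdigest()
    objects[sha] = (kind, payload)
    return sha


def shrink(content):
    """Replace a nested structure by the hash of its hashed parts."""
    if isinstance(content, dict):
        # sorted, like a git tree listing (assumes sortable keys)
        return store("dict", sorted((k, shrink(v)) for k, v in content.items()))
    elif isinstance(content, list):
        return store("list", [shrink(item) for item in content])
    else:
        return store("atom", content)


def expand(sha):
    """Rebuild the original structure from its hash."""
    kind, payload = objects[sha]
    if kind == "dict":
        return {k: expand(v) for k, v in payload}
    elif kind == "list":
        return [expand(item) for item in payload]
    return payload


doc = {"title": "talk", "tags": ["python", "git"]}
root = shrink(doc)
edited = shrink({"title": "talk2", "tags": ["python", "git"]})

print(expand(root) == doc)   # the structure round-trips
print(root != edited)        # changing one atom changes the root hash
print(len(objects))          # the unchanged "tags" list is stored only once
```

As with git, equal content always hashes to the same key, so unchanged substructures are shared between versions for free.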

Slide 46

Slide 46 text

FUTURE PLANS
•more object types and representations
•merging
•diffing
•remotes

Slide 47

Slide 47 text

CZERNY KEYBOARD PERFORMANCE ANALYSIS

Slide 48

Slide 48 text

THE BASIC IDEA
•record a performance of the exercise / piece as MIDI (or similar) events
•align the performed notes with the “score” notes
•identify errors as well as fluctuations in timing, velocity, etc.
•basically a performance “diff”

Slide 49

Slide 49 text

# scale degrees (1-based) for each pattern of the exercise
A_notes = [1, 2, 3, 2, 1, 3, 4, 5, 6, 5, 4, 5, 6, 5, 4, 3]
B_notes = [6, 5, 4, 5, 6, 4, 3, 2, 1, 2, 3, 2, 1, 2, 3, 4]
C_notes = [6, 5, 4, 5, 6, 4, 3, 2, 1, 2, 3, 2, 1, 2, 3, 2]
D_notes = [1]

# semitone offsets of the major scale, extended over three octaves
scale = [0, 2, 4, 5, 7, 9, 11]
full_scale = scale + [12 + i for i in scale] + [24 + i for i in scale]

# (pattern, duration in 64th notes, scale-degree offsets to repeat it at)
sections = [
    (A_notes, 4, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]),
    (B_notes, 4, [13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]),
    (C_notes, 4, [0]),
    (D_notes, 32, [0])
]

# f is an output file opened elsewhere; 48 is the MIDI note number of the lowest note
for section in sections:
    pattern, duration_64, offset = section
    for o in offset:
        for note in pattern:
            print(48 + full_scale[note + o - 1], duration_64, file=f)

Slide 50

Slide 50 text

SCORE
48 4
50 4
52 4
50 4
48 4
52 4
...

Slide 51

Slide 51 text

#!/usr/bin/env python

from mac import CoreMIDI
import time

start = time.time()

def callback(event):
    # 156 == 0x9C, a MIDI note-on status byte
    if event[0] == 156:
        print(time.time() - start, event[1], event[2])

CoreMIDI.pyCallback = callback

while True:
    pass

Slide 52

Slide 52 text

PERFORMANCE
4.44376397133 48 73
4.6487929821 50 59
4.68273806572 48 0
4.81475901604 50 0
4.83673501015 52 66
5.03977394104 50 69
5.04273104668 52 0
5.23374605179 50 0
5.24569511414 48 70
5.45274806023 52 63
5.4536960125 48 0
5.63474702835 52 0

Slide 53

Slide 53 text

align with Needleman-Wunsch algorithm

def note_similarity(score_note, performance_note):
    if score_note[0] == performance_note[1]:
        return 1
    elif abs(score_note[0] - performance_note[1]) < 3:
        return 0.5
    else:
        return 0

with a -1 for insertions and deletions
plan to tweak to include velocity, duration, etc
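
A minimal sketch of how the alignment step might look, assuming score notes are (pitch, duration) tuples and performance notes are (time, pitch, velocity) tuples as in the SCORE and PERFORMANCE listings. The function name align and its return shape are hypothetical; only note_similarity and the -1 gap penalty come from the slide.

```python
def note_similarity(score_note, performance_note):
    if score_note[0] == performance_note[1]:
        return 1
    elif abs(score_note[0] - performance_note[1]) < 3:
        return 0.5
    else:
        return 0


def align(score, performance, gap=-1):
    """Needleman-Wunsch global alignment; returns (total, aligned pairs)."""
    rows, cols = len(score) + 1, len(performance) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + note_similarity(score[i - 1], performance[j - 1]),
                table[i - 1][j] + gap,   # score note was not played
                table[i][j - 1] + gap,   # extra note was played
            )

    # trace back from the bottom-right corner to recover the pairing
    pairs = []
    i, j = len(score), len(performance)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and table[i][j] == (
            table[i - 1][j - 1] + note_similarity(score[i - 1], performance[j - 1])
        ):
            pairs.append((score[i - 1], performance[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:
            pairs.append((score[i - 1], None))
            i -= 1
        else:
            pairs.append((None, performance[j - 1]))
            j -= 1
    pairs.reverse()
    return table[-1][-1], pairs


# made-up example: the third score note (52) is skipped in the performance
score = [(48, 4), (50, 4), (52, 4), (50, 4)]
performance = [(4.44, 48, 73), (4.65, 50, 59), (5.04, 50, 69)]
total, pairs = align(score, performance)
for score_note, played in pairs:
    print(score_note, played)
```

A None on the right marks a missed note and a None on the left marks an extra one, which is exactly the performance "diff".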

Slide 54

Slide 54 text

FUTURE PLANS
•represent performance as MIDI file
•represent score as LilyPond (and/or Sebastian)
•generate annotated score showing mistakes
•with fingering information, highlight deficiencies of particular fingers
•grade performances
•study expression

Slide 55

Slide 55 text

GRADED-READER GENERATION

Slide 56

Slide 56 text

THE BASIC IDEA
•overcoming limitations of inductive, corpus-based learning of a language
•generate graded readers
•build up from much smaller pieces
•optimize ordering of vocabulary
•present in larger context even when context is not yet understood in target language

Slide 57

Slide 57 text

THE MYTH OF VOCABULARY COVERAGE
The 10 most common words account for 37% of the text
The 100 most common words account for 66% of the text

Slide 58

Slide 58 text

THE MYTH OF VOCABULARY COVERAGE
if you learn top 100 words, you’ll know...
one word in 99.9% of verses
50% of words in 91.3% of verses
75% of words in 24.4% of verses
90% of words in 2.1% of verses
95% of words in 0.6% of verses
100% of words in 0.4% of verses

Slide 60

Slide 60 text

VOCABULARY ORDERING
•frequency ordering is far from optimal
•vocabulary ordering as traveling salesman problem
•simulated annealing

Slide 61

Slide 61 text

SIMULATED ANNEALING

def calc_score(self, target_list):
    known_items = set()
    score = 0.0
    num_targets = float(len(target_list))
    for step, target in enumerate(target_list):
        to_learn = target.prereqs - known_items
        score += len(to_learn) * step / num_targets
        known_items.update(to_learn)
    return score

Slide 62

Slide 62 text

while temp > final_temp:
    for i in range(iterations):
        s1 = self.scorer.calc_score(target_list)
        p1 = randrange(0, num_targets)
        p2 = randrange(0, num_targets)
        new_list = self.swap(target_list, p1, p2)
        s2 = self.scorer.calc_score(new_list)
        if s2 > s1:
            target_list = new_list
        elif random() < exp((s2 - s1) / temp):
            target_list = new_list
    temp = temp * alpha
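
A self-contained sketch of the loop above on toy data, following the slide's acceptance rule (a higher score is accepted outright, a lower one with probability exp((s2 - s1) / temp)). The function name anneal, the prereqs dictionary, the parameter defaults, the seeded random generator, and the tracking of the best ordering seen are all assumptions added for illustration.

```python
from math import exp
from random import Random


def calc_score(order, prereqs):
    # same scoring rule as calc_score on the slide, with prereqs in a dict
    known, score = set(), 0.0
    num_targets = float(len(order))
    for step, target in enumerate(order):
        to_learn = prereqs[target] - known
        score += len(to_learn) * step / num_targets
        known.update(to_learn)
    return score


def anneal(order, prereqs, temp=1.0, final_temp=0.01, alpha=0.9,
           iterations=50, seed=0):
    rng = Random(seed)   # seeded so that runs are repeatable
    order = list(order)
    best, best_score = list(order), calc_score(order, prereqs)
    while temp > final_temp:
        for _ in range(iterations):
            s1 = calc_score(order, prereqs)
            p1 = rng.randrange(len(order))
            p2 = rng.randrange(len(order))
            candidate = list(order)
            candidate[p1], candidate[p2] = candidate[p2], candidate[p1]
            s2 = calc_score(candidate, prereqs)
            # always accept an improvement; sometimes accept a worse ordering
            if s2 > s1 or rng.random() < exp((s2 - s1) / temp):
                order = candidate
            if s2 > best_score:
                best, best_score = list(candidate), s2
        temp = temp * alpha   # cool down
    return best, best_score


# made-up targets and the items each one requires
prereqs = {
    "verse1": {"the", "word"},
    "verse2": {"the", "was"},
    "verse3": {"god", "was", "the"},
    "verse4": {"word", "god"},
}
start = ["verse1", "verse2", "verse3", "verse4"]
best, best_score = anneal(start, prereqs)
print(best, best_score)
```

Accepting some worse orderings while the temperature is high is what lets the search escape local optima; as temp shrinks, the loop behaves more and more like plain hill climbing.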

Slide 63

Slide 63 text

# a dictionary mapping targets to a set of items that are
# needed (and initially missing)
MISSING_IN_TARGET = defaultdict(set)

...

    # for each item, a score of how bad it is that it is missing
    MISSING_ITEMS = defaultdict(int)
    for missing in MISSING_IN_TARGET.values():
        for item in missing:
            MISSING_ITEMS[item] += 1. / (2 ** len(missing))

    if not MISSING_ITEMS:
        break

    next_item = sorted(MISSING_ITEMS, key=MISSING_ITEMS.get)[-1]

    for target in TARGETS_MISSING[next_item]:
        MISSING_IN_TARGET[target].remove(next_item)

Slide 64

Slide 64 text

OTHER WORK AND FUTURE PLANS
•consideration of other prerequisite knowledge
•intrinsic difficulty of a word (cognates, etc)
•partial dynamic interlinears
•inline replacement

Slide 65

Slide 65 text

CLEESE AN OPERATING SYSTEM IN PYTHON

Slide 66

Slide 66 text

THE BASIC IDEA
•run the Python VM on bare metal
•remove parts of CPython that assume an operating system
•minimize libc
•add built-ins to directly access memory and I/O ports
•write drivers, etc in Python
•even “must be written in assembly” parts use Python-like syntax

Slide 67

Slide 67 text

static PyMethodDef ports_methods[] = {
    {"inb", ports_inb, METH_VARARGS, NULL},
    {"outb", ports_outb, METH_VARARGS, NULL},
    {NULL, NULL},
};

PyObject *
_Ports_Init(void)
{
    return Py_InitModule4("ports", ports_methods,
        NULL, (PyObject *)NULL, PYTHON_API_VERSION);
}

Slide 68

Slide 68 text

static PyObject *
ports_outb(PyObject *self, PyObject *args)
{
    PyObject *v, *a;

    if(!PyArg_UnpackTuple(args, "outb", 2, 2, &v, &a))
        return NULL;
    if(!PyInt_CheckExact(a))
        return NULL;
    if(PyInt_CheckExact(v))
        outb(PyInt_AS_LONG(a), PyInt_AS_LONG(v));
    else if(PyString_Check(v) && PyString_GET_SIZE(v) == 1)
        outb(PyInt_AS_LONG(a), PyString_AS_STRING(v)[0]);
    else
        return NULL;

    Py_INCREF(Py_True);
    return Py_True;
}

Slide 69

Slide 69 text

SPEAKER

import ports

def on(freq):
    if freq:
        ports.outb(freq, 0x42)
        ports.outb(freq >> 8, 0x42)
        ports.outb(ports.inb(0x61) | 0x03, 0x61)
    else:
        off()

def off():
    # inb lives in the ports module, so it must be qualified
    ports.outb(ports.inb(0x61) & 0xFC, 0x61)

Slide 70

Slide 70 text

KEYBOARD

import ports

def get_scancode():
    while not (ports.inb(0x64) & 0x21) == 0x01:
        pass
    return ports.inb(0x60)

Slide 71

Slide 71 text

TWO APPROACHES TO VM
•start with full CPython and remove bits as you hit problems
•start with nothing and add just the CPython bits you need

Slide 72

Slide 72 text

FUTURE PLANS
•proper memory management
•less task-specific subsetting of CPython
•PyPy instead of CPython?

Slide 73

Slide 73 text

APPLEPY AN APPLE ][ EMULATOR

Slide 74

Slide 74 text

self.ops[0xA8] = lambda: self.TAY()
self.ops[0xA9] = lambda: self.LDA(self.immediate_mode())
self.ops[0xAA] = lambda: self.TAX()
self.ops[0xAC] = lambda: self.LDY(self.absolute_mode())
self.ops[0xAD] = lambda: self.LDA(self.absolute_mode())

Slide 75

Slide 75 text

def immediate_mode(self):
    return self.get_pc()

def absolute_mode(self):
    self.cycles += 2
    return self.read_pc_word()

Slide 76

Slide 76 text

def LDA(self, operand_address):
    self.accumulator = self.update_nz(
        self.read_byte(operand_address))

Slide 77

Slide 77 text

def update_nz(self, value):
    value = value % 0x100
    self.zero_flag = [0, 1][(value == 0)]
    self.sign_flag = [0, 1][((value & 0x80) != 0)]
    return value
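
A small sketch showing what update_nz does to a few values. The Flags class is a hypothetical stand-in for the emulator's CPU object, which the slides show only in fragments.

```python
class Flags:
    """Hypothetical stand-in for the CPU object on the slides."""

    def __init__(self):
        self.zero_flag = 0
        self.sign_flag = 0

    def update_nz(self, value):
        value = value % 0x100                            # keep 8 bits
        self.zero_flag = [0, 1][(value == 0)]            # Z: result is zero
        self.sign_flag = [0, 1][((value & 0x80) != 0)]   # N: bit 7 is set
        return value


cpu = Flags()
results = []
for raw in (0x00, 0x7F, 0x80, 0x100, -1):
    result = cpu.update_nz(raw)
    results.append((result, cpu.zero_flag, cpu.sign_flag))
    print("%5d -> 0x%02X  Z=%d  N=%d" % (raw, result, cpu.zero_flag, cpu.sign_flag))
```

The modulo makes overflow (0x100) and underflow (-1) wrap the way an 8-bit register does, so the same helper can serve loads, increments, and decrements.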

Slide 78

Slide 78 text

class SoftSwitches:
    ...
    def read_byte(self, cycle, address):
        assert 0xC000 <= address <= 0xCFFF
        if address == 0xC000:
            return self.kbd
        elif address == 0xC010:
            self.kbd = self.kbd & 0x7F
        elif address == 0xC030:
            if self.speaker:
                self.speaker.toggle(cycle)

Slide 79

Slide 79 text

THE REAL CHALLENGES
•character generator
•cassette interface
•speaker
•graphics
•(disk)

Slide 80

Slide 80 text

OTHER THINGS
•Forth Interpreter
•Pascal Interpreter (as a prolegomenon to even more insanity)
•Relational Algebra Implementation
•Quantum Computing
•Unicode Collation Algorithm Implementation
•Global Illumination Renderer
•Hacking Data Files from Lord of the Rings Online and Skyrim

Slide 81

Slide 81 text

jtauber.github.com jtauber.com @jtauber