Principles of Dynamic Languages

CSPP51060 - Winter'08
University of Chicago
David Beazley (http://www.dabeaz.com)
Copyright (C) 2008, http://www.dabeaz.com

Section 1: Introduction to Dynamic Languages

A Quote
“There are two types of programming languages--those that everyone hates and those that nobody uses.”
- John Ousterhout (overheard at a conference)
This course is mostly about the first category...

A Recent Publication
IEEE Computer, Feb. 2007

Book Sales (Q1'07)
Source: "State of the Computer Book Market Q1 07" - Mike Hendrickson

    Language        Sales
    Java            63136
    C#              52655
    Javascript      48266
    PHP             41933
    C/C++           41311
    Visual Basic    26385
    Ruby            25380
    SQL             22188
    Perl            10308
    Python           9909

Questions
• What is a "dynamic programming language?"
• What are they used for?
• Where did they come from?
• How do you use them?

Part 1
What is a "dynamic programming language?"

Language Classification
• The terminology is somewhat imprecise
• Static programming languages (a.k.a., "serious" programming languages)
    C, C++, C#, Java, Ada, Pascal, etc.
• Dynamic programming languages (a.k.a., "hacky" programming languages)
    Perl, Python, Ruby, Tcl, Javascript, PHP, etc.

What is the Difference?
• Clearly there is some kind of distinction
• Other than dynamic languages often being derided by "real programmers"
• Let's look at a simple programming problem...

Programming Problem
• Dave's Mortgage
    Dave has taken out a $500,000 mortgage from Guido's Mortgage, Stock, and Viagra trading corporation. He got an unbelievable rate of 4% and a monthly payment of only $499. However, Guido, being kind of soft-spoken, didn't tell Dave that after 2 years, the rate changes to 9% and the monthly payment becomes $3999.
• Question: How much does Dave pay and how many months does it take?

Question
• How do we write a program to solve this problem?
• As this is a computer science course, let's use a "serious" programming language
• For example: C

Solution (ANSI C)

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        double principle = 500000;
        double payment = 499;
        double rate = 0.04;
        int month = 0;
        double total_paid = 0;
        while (principle > 0) {
            principle = principle*(1+rate/12) - payment;
            total_paid += payment;
            month += 1;
            if (month == 24) {   /* teaser period ends after 2 years */
                payment = 3999;
                rate = 0.09;
            }
        }
        printf("Total paid %0.2f\n", total_paid);
        printf("Months %d\n", month);
        return 0;
    }

Or if you prefer, Java...

    public class Mortgage {
        public static void main(String[] args) {
            double principle = 500000;
            double payment = 499;
            double rate = 0.04;
            int month = 0;
            double total_paid = 0;
            while (principle > 0) {
                principle = principle*(1+rate/12) - payment;
                total_paid += payment;
                month += 1;
                if (month == 24) {
                    payment = 3999;
                    rate = 0.09;
                }
            }
            System.out.println("Total paid " + total_paid);
            System.out.println("Months " + month);
        }
    }

Compilation
• In "serious" languages, programs are compiled

    shell % cc mortgage.c -o mortgage.exe
    shell %

    shell % javac Mortgage.java
    shell %

• Requires the use of a compiler/development environment (gcc, Visual Studio, etc.)
• Produces an executable/class file that is separate from the original source code
• That is what you use to run the program

Sample Output
• Running the programs

    shell % mortgage.exe
    Total paid 2623323.00
    Months 677
    shell %

    shell % java Mortgage
    Total paid 2623323.0
    Months 677
    shell %

More on Compilation
• Compilation is a one-time operation. When you want to run the program, you just use the output of the compiler (e.g., the .exe file)
• If you want to make any change to the program, the source must be recompiled.
• Edit/compile/run/debug cycle.

More on Compilation
• Compilers perform extensive error checking/validation.
• Goal is to find errors before the program runs (reported as compiler errors)
• To do this, programs include extra specifications that are used to perform these checks.
• Usually associated with "type-checking"

Type Checking
• All data/variables have a fixed "type" (static)

    int main(int argc, char *argv[]) {
        double principle = 500000;
        double payment = 499;
        double rate = 0.04;
        int month = 0;
        double total_paid = 0;
        ...

• Inconsistent use results in an error

    if (month == 24) {
        payment = "A lot";
        rate = 0.09;
    }

    mortgage.c:15: error: incompatible types in assignment

Type Checking
• All functions/methods have prototypes

    double square(double x) {
        return x*x;
    }

• Inconsistent use results in errors

    double y = square(3,4);      // Error. Too many args
    double y = square("Hello");  // Error. Bad arg type
    char *z = square(4.0);       // Error. Bad return type

• Emphasize: Errors caught during compilation

Static Languages
• In compiled languages, the main focus is the compiler.
• Compiler produces executables, performs validation, reports errors, performs various kinds of optimizations, etc.
• The result is a "static" program: a program whose functionality is rigidly fixed at the time of compilation and that cannot be changed without recompiling.

Static Languages
• Since "static" programs have been successfully compiled, you are reasonably sure that they are free from certain kinds of errors (especially inconsistent use of data).
• (Of course, there may be other bugs)
• Since a compiler provides a framework for analyzing programs, a lot of serious computer science has focused on this.

Dynamic Languages
• A main feature of "dynamic" languages is that they get rid of separate compilation
• You write programs (usually without worrying about low-level details).
• You then just "run" the program.
• Let's look at an example...

Solution (Python)

    # mortgage.py
    principle = 500000
    payment = 499
    rate = 0.04
    month = 0
    total_paid = 0
    while principle > 0:
        principle = principle*(1+rate/12) - payment
        total_paid += payment
        month += 1
        if month == 24:
            payment = 3999
            rate = 0.09
    print "Total paid %0.2f" % total_paid
    print "Months %d" % month

Sample Output
• Running a Python program

    shell % python mortgage.py
    Total paid 2623323.00
    Months 677
    shell %

• The program is executed by an interpreter (python) that reads statements from the input program and runs them one after the other.
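The listing in these slides uses Python 2 print statements. As a rough sketch only, the same calculation might look like this in Python 3. The function name `daves_mortgage` and the spelling `principal` are choices made here for illustration, not part of the original program.

```python
def daves_mortgage():
    """Python 3 sketch of mortgage.py; returns (total_paid, months)."""
    principal = 500000.0      # the slides spell this variable "principle"
    payment = 499.0
    rate = 0.04
    month = 0
    total_paid = 0.0
    while principal > 0:
        principal = principal * (1 + rate / 12) - payment
        total_paid += payment
        month += 1
        if month == 24:       # teaser period ends after two years
            payment = 3999.0
            rate = 0.09
    return total_paid, month


total_paid, month = daves_mortgage()
print("Total paid %0.2f" % total_paid)
print("Months %d" % month)
```

Run with `python3`, this should report the same total and month count as the sample output for the C, Java, and Python 2 versions.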

Some Observations
• There is no separate compilation. You just run Python on the program. The source code is the program.
• If you make changes, they show up next time.
• You don't have to package code into a main() function or anything similar to that.
• A program can be just a sequence of statements.

More on Interpreters
• Interpreters delay error checking/validation to run-time. As a result, programs don't generally involve explicit "type" declarations

    principle = 500000
    payment = 499
    rate = 0.04
    month = 0
    total_paid = 0

  Notice how none of these variable assignments have a "type"
• One consequence: Programs in dynamic languages tend to involve much less typing (at the keyboard)

Dynamic Type Checking

    x = 42          # x is an integer
    ...
    x = "hello"     # x is now a string (OK)

• In dynamic languages, variables are not restricted to a single type of data
• The type of a "variable" is associated with whatever value is currently assigned to the variable---it may change while running!
• This is very different from C/C++/Java.

Dynamic Type Checking

    z = x + y       # Succeeds if x + y makes sense

    x = 37
    y = 42
    z = x + y       # Ok. z = 79

    x = "Hello"
    y = "World"
    z = x + y       # Ok. z = "HelloWorld"

    x = 37
    y = "World"
    z = x + y       # Error!

  This operation fails because the two operands are incompatible (number and string)
• All operations involve run-time checks
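To make the run-time check visible, here is a small Python 3 sketch. The helper `try_add` is a name invented for this illustration; it simply catches the `TypeError` that Python raises when the operands of `+` are incompatible.

```python
def try_add(x, y):
    """Return x + y, or a description of the run-time error if the operands clash."""
    try:
        return x + y
    except TypeError as err:
        # The check happens only when the addition is actually attempted.
        return "TypeError: %s" % err


x = 42
print(type(x).__name__)    # the type belongs to the value currently bound to x
x = "hello"
print(type(x).__name__)

print(try_add(37, 42))
print(try_add("Hello", "World"))
print(try_add(37, "World"))
```

Nothing complains about the last call until it runs, which is the point of the slide: the same line of code succeeds or fails depending on the values it meets at run time.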

Interactive Console
• Since dynamic languages do everything at run-time, the interpreters can often be used interactively (like a shell)

    shell % python
    Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
    [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
    >>> 3 + 4
    7
    >>> print "Hello World"
    Hello World
    >>>

• Sometimes known as a "read-eval-print" loop (REPL)

Dynamic == Run time
• As a general rule, when a computer scientist talks about some part of a program being "dynamic", it means that it occurs while the program runs.
• Dynamic Typing - Type checking at run time.
• Dynamic Binding - Virtual methods in OO
• Dynamic Linking - Run-time linking of program modules/libraries
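As one illustration of "dynamic binding", the following Python 3 sketch shows a method call whose target is chosen while the program runs. The classes `Loan` and `Mortgage` and the function `report` are hypothetical names made up for this example.

```python
class Loan:
    def describe(self):
        return "generic loan"


class Mortgage(Loan):
    def describe(self):
        return "mortgage"


def report(loan):
    # Which describe() runs is decided at run time from the object's class.
    return loan.describe()


print(report(Loan()))
print(report(Mortgage()))
```

The call site in `report` is identical in both cases; only the object passed in at run time differs.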

Some Complaints
• Performance. Dynamic programs often run much slower than static programs, in part because they perform all of the error-checking as the program runs.
• Systems. Hard to do low-level hacking of the hardware (e.g., device drivers)
• Validation. Since errors are not detected until a program runs, programs may have hidden/obscure errors (that would have been caught by a compiler).

Benefits
• Dynamic languages have some benefits.
• Rapid Development. Languages are high-level. Programs assembled from components
• Scripting. Complex applications can be controlled by programmable scripts that can be changed without having to recompile
• Flexibility. Programs can be easily changed/reconfigured.

Benefits
• Ease of use. Dynamic languages are often better suited for end-users. They do not require users to worry about low-level implementation details (such as types).
• Portability. Languages are often so high level, they work easily across different machines.
• Prototyping. Often significantly easier to prototype a system in a dynamic language.

Applications
• Dynamic languages are used almost everywhere--often behind the scenes.
• Internet (Google, Web, etc.)
• Movie-making (special effects)
• Television (control systems)
• Scientific computing (supercomputing)
• Robots
• Video games

Commentary
• Dynamic languages are often viewed as exotic, unreliable, and "unserious" by managers and crusty software engineers
• In many cases, their use in an organization is subversive (initiated by lone programmers, interns, students, etc.)
• In certain cases, the use of such languages is considered to be a strategic advantage (a.k.a., a "trade secret").

The Real Secret
• Programmers are not using dynamic languages as a replacement for C++ or Java.
• They're using these languages in addition to static languages
• They're writing programs that utilize the strengths of both (e.g., C++ for speed, dynamic languages for flexibility).

Part 2
The history of dynamic languages (Where did these languages come from?)

Prehistory
• In the early days, computers were big, very expensive, and quite limited in power (your cell phone has far more compute power).
• Early programs were written directly on the hardware (hard-wired, machine language).
• Later, assembly language.
• Fed to systems on punch cards

Programming Languages
• The first "high-level" programming languages
    • Fortran (1954)
    • Lisp (1958)
    • ALGOL (1958)
    • COBOL (1959)
• Each of these efforts came out of different communities (Fortran - Engineering/Science, Lisp - Mathematics, COBOL - Business)

Programming Languages
• In the early days, no one really knew exactly what they were doing
• "Computer Science" didn't even emerge as a separate discipline until the 1960s
• Aspects of "programming languages" were still being worked out.

Fortran
• A language primarily meant to replace hand-coding of assembly language
• Highly focused on raw performance for science/engineering work
• Initially developed by IBM around 1954
• Still used today for that same purpose (Fortran 2008 standard underway).

Example (Fortran)

    C     CALCULATE DAVE'S MORTGAGE
          PROGRAM MORTGAGE
          PRINCIPLE = 500000.0
          PAYMENT = 499.0
          RATE = 0.04
          NMONTHS = 0
          TOTALPAID = 0
       10 PRINCIPLE = PRINCIPLE*(1+RATE/12.0)-PAYMENT
          TOTALPAID = TOTALPAID + PAYMENT
          NMONTHS = NMONTHS + 1
          IF (NMONTHS .EQ. 24) THEN
              PAYMENT = 3999.0
              RATE = 0.09
          END IF
          IF (PRINCIPLE .GT. 0) GO TO 10
          WRITE(*,*) 'TOTAL PAID', TOTALPAID
          WRITE(*,*) 'MONTHS', NMONTHS
          STOP
          END

Fortran
• A bizarre language in many ways
• Example: Implicit Typing

    NMONTHS = 0         (an integer)
    TOTALPAID = 0       (a real)

• Type determined by first character of the name (I-N are ints, all others are reals)
• And this only scratches the surface.
• Yet, compare the "look" of early Fortran to modern scripting languages
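The implicit typing rule is simple enough to state as code. This Python 3 sketch is only an illustration of the rule as described on the slide; `implicit_type` is a made-up helper, not anything provided by Fortran.

```python
def implicit_type(name):
    """Type Fortran's default implicit rule gives an undeclared variable."""
    # Names starting with I through N are integers; everything else is real.
    return "INTEGER" if name[0].upper() in "IJKLMN" else "REAL"


for name in ("NMONTHS", "TOTALPAID", "PRINCIPLE", "RATE"):
    print(name, implicit_type(name))
```

This is why the example program calls its counter NMONTHS rather than MONTHS or COUNT: the first letter picks the type.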

Side by Side (Visual)
[Figure: the Fortran mortgage program (left) shown next to mortgage.py in Python (right). Both listings appear in full in their own example slides.]

ALGOL
• Developed by a committee of scientists around 1958 (ETH-Zurich)
• Initial motivation was to address perceived problems with FORTRAN (of which there were many)
• Was hugely influential in subsequent computer science research on programming languages, type-systems, compilers, etc.

Example (ALGOL-60)

    begin
        comment Calculate Dave's mortgage;
        real principle,rate,payment,totalpaid;
        integer month;
        principle := 500000;
        rate := 0.04;
        payment := 499;
        totalpaid := 0;
        month := 0;
    next:
        principle := principle*(1+rate/12) - payment;
        totalpaid := totalpaid + payment;
        month := month + 1;
        if month = 24 then
        begin
            rate := 0.09;
            payment := 3999
        end;
        if principle > 0 then go to next;
        outstring(1,"Total paid ");
        outreal(1,totalpaid);
        outstring(1,"\nMonths ");
        outinteger(1,month);
        outstring(1,"\n")
    end

Example (ALGOL-60)
Type declarations (the lines highlighted in the program above):

    real principle,rate,payment,totalpaid;
    integer month;

ALGOL
• Virtually all modern programming languages utilize concepts that were worked out in various versions of ALGOL.
• Very strong focus on the design of compilers
• However, ALGOL itself never really caught on commercially (legacy ALGOL?)
• The language never offered any standard I/O facilities (different on every machine)

COBOL
• A language used widely in business/finance
• Developed in 1959 by committee (Burroughs, IBM, Honeywell, RCA, Sperry Rand, Sylvania, USAF, NIST, etc.)
• Still lives today and has all of the features that you would expect from such a committee effort.

Example (COBOL)

    IDENTIFICATION DIVISION.
    PROGRAM-ID. MORTGAGE.
    DATA DIVISION.
    WORKING-STORAGE SECTION.
    01 PRINCIPLE PIC S9(7)V99 VALUE 500000.00 .
    01 PAYMENT   PIC 9(7)V99  VALUE 499.00 .
    01 RATE      PIC 9V99     VALUE 0.04 .
    01 MONTH     PIC 999      VALUE 0 .
    01 TOTALPAID PIC 9(7)     VALUE 0.00 .
    PROCEDURE DIVISION .
    MAIN.
        PERFORM WITH TEST BEFORE UNTIL PRINCIPLE < 0.00
            COMPUTE PRINCIPLE = PRINCIPLE*(1+RATE/12)-PAYMENT
            ADD PAYMENT TO TOTALPAID
            ADD 1 TO MONTH
            IF MONTH = 24 THEN
                SET PAYMENT TO 3999.00
                SET RATE TO 0.09
            END-IF
        END-PERFORM
        DISPLAY "TOTAL PAID", TOTALPAID
        DISPLAY "MONTHS", MONTH.
        STOP RUN.

COBOL
"The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense."
- Edsger Dijkstra
• So, let's move on...

Lisp
• Conceived as a language for writing computer programs based on the lambda calculus (Alonzo Church)
• Invented by John McCarthy at MIT (1958)
• Name derives from "List Processing Language"
• Many modern variations in use (Common Lisp, Scheme, etc.)

Example (Scheme)

    (define (daves-mortgage principle payment rate total month)
      (if (> principle 0.0)
          (daves-mortgage (- (* principle (+ 1 (/ rate 12))) payment)
                          (if (= month 24) 3999.0 payment)
                          (if (= month 24) 0.09 rate)
                          (+ total payment)
                          (+ 1 month))
          (cons total (- month 1))))

    (display (daves-mortgage 500000.0 499.0 0.04 0.0 1))
    (newline)

(and this solution is pretty clunky---Lisp programmers would be more clever about it)

Lisp
• Lisp is truly unlike any of the other early programming languages
• The entire language is basically based on the "list." Lisp programs themselves are lists (thus programs can process their own code as data).
• Programs written as functions that apply various operations to lists (functional programming). Especially strong reliance on recursive functions, mathematical thinking.
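The "programs are lists" idea can be imitated, very loosely, in Python 3 with nested lists. This is only a sketch: the prefix-list format, the `OPS` table, and the `evaluate` function are assumptions made for illustration and cover nothing beyond four arithmetic operators.

```python
import operator

# Hypothetical operator table for this sketch.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def evaluate(expr):
    """Evaluate a prefix expression such as ["+", 1, ["*", 2, 3]]."""
    if isinstance(expr, list):
        op, *args = expr
        values = [evaluate(arg) for arg in args]   # recursion does the real work
        result = values[0]
        for value in values[1:]:
            result = OPS[op](result, value)
        return result
    return expr                                    # a plain number evaluates to itself


# One month of Dave's mortgage, written as data: principle*(1+rate/12) - payment
step = ["-", ["*", 500000.0, ["+", 1, ["/", 0.04, 12]]], 499.0]
print(evaluate(step))

# Because the program is just a list, other code can rewrite it before running it.
step[2] = 3999.0
print(evaluate(step))
```

In Lisp this is not a trick layered on top of the language; the list form is the program itself.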

Lisp
• Lisp is the first major dynamic language
• Almost all major concepts of dynamic programming languages were first invented with Lisp.
• Modern languages like Python and Ruby borrow heavily from Lisp, but even today, have not replicated all of its features.
• All "real programmers" eventually reinvent some part of Lisp without knowing it.

Lisp Criticisms
• In early computing, machines were quite limited and Lisp, being dynamic, was much slower than the compiled languages
    "A LISP programmer knows the value of everything, but the cost of nothing" - Alan Perlis
• Lisp was not widely adopted by those who were obsessed with high performance (science/engineering/business).

Lisp Criticisms
• Understanding Lisp requires a certain degree of mathematical sophistication. Functions, composition of functions, recursion, etc.
• Let's be honest---a huge majority of the world's programmers are not mathematicians. You don't need a math degree to write accounting software (or make a web page).
• Programs in other languages are more like "recipes" of steps (imperative). Conceptually, this is easier for most people to grasp.

Interlude
• The programming languages of choice for "serious" applications have almost all been compiled programming languages that derive from Fortran/Algol.
• Some major languages from the 70s/80s
    • Fortran (updates, F66, F77)
    • Pascal (1970)
    • C (1972)

Pascal
• Developed in 1970 by Niklaus Wirth
• Derives from ALGOL, but strongly focused on structured programming/data structures
• Initially developed as a teaching language for structured programming.
• Personal note: I remember learning Pascal in high school (1985). Most programmers in the 70s/80s would have encountered it.

Example (Pascal)

    program Mortgage(output);
    var
        principle : Real = 500000;
        rate : Real = 0.04;
        payment : Real = 499;
        month : Integer = 0;
        totalpaid : Real = 0;
    begin
        while principle > 0 do
        begin
            principle := principle*(1+rate/12) - payment;
            totalpaid := totalpaid + payment;
            month := month + 1;
            if month = 24 then
            begin
                rate := 0.09;
                payment := 3999
            end
        end;
        writeln('Total Paid ', totalpaid);
        writeln('Months ', month)
    end.

Using Pascal
• Pascal was very picky about "correctness"
• Very strong type system, very strict in what it allowed and did not allow (pitched as a good teaching language)
• Used as an alternative to BASIC on early PCs (e.g., Turbo Pascal, UCSD Pascal, etc.).
• Early Macintosh systems made heavy use of Pascal (parts of the OS, major applications)

C
• Developed in 1972 at AT&T Bell Labs by Dennis Ritchie in order to implement Unix
• Created as a systems implementation language (a better assembly language)
• Although the language borrows some ideas from ALGOL, the language was always meant to be minimal and low-level.

Some Background
• Despite developments in programming languages, there are situations where programs need to directly manipulate the computer hardware
• Operating systems, device drivers, etc.
• Before C, this code would typically be written in assembly language

Example (Assembly)

    .principle: .double 500000
    .payment:   .double 499
    .rate:      .double 0.04
    main:
        leal 4(%esp), %ecx
        andl $-16, %esp
        pushl -4(%ecx)
        pushl %ebp
        movl %esp, %ebp
        pushl %ecx
        subl $68, %esp
        fldl .principle
        fstpl -48(%ebp)
        fldl .payment
        fstpl -40(%ebp)
        fldl .rate
        fstpl -32(%ebp)
        movl $0, -12(%ebp)
        fldz
        ... etc ...

The C Language
• C was not created to be a better ALGOL. It was a replacement for assembly coding.
• Although it was compiled, "safety" was never really a primary concern
• C allowed direct access to hardware/memory (the complete opposite of Pascal)
• Witness the consequences: Buffer overflow attacks (malware).

C Adoption
• C/C++ is currently the de facto standard for developing systems software
• There are many reasons why this happened
• Not just related to the technical merits (or lack thereof) of C as a programming language

More Background
• Development of minicomputers/microcomputers in the 1970s
• These systems were extremely minimal/resource starved (compare a Commodore-64 with an IBM mainframe).
• If you wanted anything to run fast and fit in memory, you had to write it in assembly.
• A lot of early PC software was assembly

Growth of C
• Use of C really exploded with minis/PCs
• C was minimal, portable, and didn't enforce any morality rules on programming (you could do anything you wanted).
• You could easily write programs that ran almost as fast as hand-written assembler (maybe even faster with optimization).
• Growth completely driven by practical applications (and economics)

C vs. Pascal
• I don't know when C clobbered Pascal, but I'm guessing in the late 80s (I don't remember many people talking about Pascal after about 1990).
• Humor: "How to shoot yourself in the foot"
    C: "You shoot yourself in the foot."
    Pascal: "The compiler won't let you shoot yourself in the foot."

Problems with C
• Despite C becoming dominant, it was extremely crippled in certain ways
• Not just C, but almost all traditional programming languages

Interactive Programs
• Computers had become a lot more interactive
• Early 1970's: Interactive video terminals replaced punch cards. Enabled programs to interact with the user in new ways. (shells)
• Early 1980's: Graphical User Interfaces. A lot of previous research (e.g., Xerox), but the Apple Lisa/Mac was really the first GUI-centric system.

Compute Power
• Increased computing power
• Rapid growth of CPU power, memory capacity, and disk storage
• Enabled new kinds of programs with vastly more complexity than anything before.
• GUIs took it to a whole new level

Software Complexity
• Structured programming was not enough from the standpoint of software engineering.
• How to manage large-scale programming projects and complexity?
• Rise of object-oriented programming, software components, etc. (1980s).
• Example: Development of C++ as a "better C"

Software Components
• Much greater reliance on programming libraries, pre-built software components
• Example: GUI "widgets"
• Writing software was becoming less about creating everything from scratch and more about gluing together components that already existed.

A Quote
"It seems clear that languages somewhat different from those in existence today would enhance the preparation of structured programs. We will perhaps eventually be writing only small modules which are identified by name as they are used to build larger ones, so that devices like indentation [...] might become feasible for expressing local structure in the source language."
- Donald Knuth (1974)

BASIC
• Virtually all early home computers came prepackaged with BASIC (usually Microsoft)
• If you turned on the system, you were often dropped directly into a BASIC interpreter
• Greatly expanded the number of programmers
• Notion of "programmability" (maybe BASIC's only redeeming quality other than peek/poke)

Internet
• Internet was growing (1980's)
• Major universities/corporations were already connected (ARPANET)
• Services for home users (CompuServe, BBS, etc.).
• Growth of free software/open source
• Sharing of ideas and source code

Setting the Stage
• By the mid 1980's, there were a lot of things going on (PCs, GUIs, faster systems, early Internet, objects, etc.).
• Programmers using C/Pascal, but there were many perceived limitations.
• A cauldron of activity.

Part 3
From C to Scripting (Or how application programmers reinvented Lisp)

Programming
• Programmers usually write programs to solve problems.
• Problems that are of interest to people who are usually not programmers
• Therefore, it is important to figure out some way to make programs generally usable (i.e., "user friendly").

A Simple C Program

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        double principle = 500000;
        double payment = 499;
        double rate = 0.04;
        int month = 0;
        double total_paid = 0;
        while (principle > 0) {
            principle = principle*(1+rate/12) - payment;
            total_paid += payment;
            month += 1;
            if (month == 24) {
                payment = 3999;
                rate = 0.09;
            }
        }
        printf("Total paid %0.2f\n", total_paid);
        printf("Months %d\n", month);
    }

A Simple C Program
In the program above:
• Variables are hard-coded with values.
• Program logic is hard-coded too.

Problem
• The previous program just isn't that useful
• Everything (including the underlying logic) is hard-coded into the program
• Despite the possible merits of keeping a programmer employed full-time to make changes, the program isn't reusable
• Thus, an important part of software engineering is how to make code more general purpose
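One way to picture "more general purpose" is to turn the hard-coded values into parameters. The Python 3 sketch below is an illustration only: the function name `mortgage`, its argument names, and the `max_months` safety limit are assumptions introduced here, and the spelling `principal` differs from the `principle` used in the slide code.

```python
def mortgage(principal, payment, rate,
             teaser_payment, teaser_rate, teaser_period,
             max_months=12 * 200):
    """Return (total_paid, months) for a loan with an initial teaser period."""
    current_payment, current_rate = teaser_payment, teaser_rate
    month = 0
    total_paid = 0.0
    while principal > 0:
        if month >= max_months:
            # Guard added for this sketch: the payment never covers the interest.
            raise ValueError("loan is not paid off within max_months")
        principal = principal * (1 + current_rate / 12) - current_payment
        total_paid += current_payment
        month += 1
        if month == teaser_period:
            current_payment, current_rate = payment, rate
    return total_paid, month


# Dave's mortgage, with every number supplied by the caller.
print(mortgage(500000, 3999, 0.09, 499, 0.04, 24))
```

The calculation is unchanged; only the place where the numbers come from has moved, which is the step the next slides take in C with #define and scanf.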

Reusability
• As a general rule, software engineers do not like to write programs where everything is just hard-coded.
• I'd deduct points if you turned in a big programming assignment and you did this.
• An extreme example: A former student (nameless), when asked to create a website that played "tic tac toe", tried to create a separate .html document for every possible game configuration (of which there are many)

User-Defined Parameters

    #define PRINCIPLE 500000
    #define PAYMENT 3999
    #define RATE 0.09
    #define TEASER_PAYMENT 499
    #define TEASER_RATE 0.04
    #define TEASER_PERIOD 24

    int main(int argc, char *argv[]) {
        double payment = TEASER_PAYMENT;
        double rate = TEASER_RATE;
        ...
        if (month == TEASER_PERIOD) {
            payment = PAYMENT;
            rate = RATE;
        }
        ...
    }

User-Defined Parameters
• Problem parameters are specified in one location using symbolic names (the #define lines).
• Use the symbolic names in the later code.

Commentary
• Using defined constants makes it easier to change the code. If you want to make changes to parameters, you just change them in one location.
• However, it's still not very user-friendly. To change a parameter, you have to recompile
• "Pardon me, I'll tell you how much your mortgage will cost as soon as I finish recompiling my mortgage software."

Reading User Input

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        double principle, payment, rate;
        double teaser_payment, teaser_rate;
        int teaser_period;
        scanf("%lf",&principle);
        scanf("%lf",&payment);
        scanf("%lf",&rate);
        scanf("%lf",&teaser_payment);
        scanf("%lf",&teaser_rate);
        scanf("%d",&teaser_period);
        ...
    }

Reading User Input
• Parameters are read from the user when the program runs.

    shell % mortgage.exe
    500000.0
    3999.0
    0.09
    499.0
    0.04
    24
    Total paid 2623323.00
    Months 677

Reading User Input
• Reading parameters from the user works, but does not scale well to more complicated problems.
• Example: A problem where you had to specify several hundred parameters
• Also messy if the program is "branchy."
• End-users probably find this to be clunky

A "Branchy" Interface

    shell % loan.exe
    Loan type (1=Mortgage, 2=Auto, 3=Commercial) : 1
    Mortgage type (1=Conventional, 2=Evil) : 2
    Principle : 500000
    Payment : 3999
    Rate : 0.09
    Teaser Payment : 499
    Teaser Rate : 0.04
    Teaser Period : 24
    Be a sneaky bugger? (Y=Yes, N=No) : Y
    ...

• You might laugh, but a huge amount of "mission-critical" software is often not much more sophisticated than this.
• Many GUIs not much different (dialogs)

Copyright (C) 2008, http://www.dabeaz.com 1- Conﬁguration Languages • As applications
grow, the process of reading input may evolve into a simple "conﬁguration" language. 91 # Dave's mortgage LOAN_TYPE = MORTGAGE MORTGAGE_TYPE = EVIL BE_SNEAKY = YES PRINCIPLE = 500000 PAYMENT = 3999 RATE = 0.09 TEASER_RATE = 0.04 TEASER_PAYMENT = 499 • Notion of "programmability"

Copyright (C) 2008, http://www.dabeaz.com 1- Conﬁguration Languages • Conﬁguration languages
have a tendency to grow new features. • Example : Variable expansion and expressions 92 # Dave's mortgage LOAN_TYPE = MORTGAGE MORTGAGE_TYPE = EVIL BE_SNEAKY = YES PRINCIPLE = 500000 PAYMENT = 3999 RATE = 0.09 TEASER_RATE = $RATE/2 TEASER_PAYMENT = $PAYMENT/10
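Variable expansion of this kind takes surprisingly little code, which is part of why it creeps in. The following is only a sketch in Python 3 (the Python examples later in these slides are Python 2). The names `evaluate` and `parse_config` are made up for illustration, and the evaluator assumes each value is either a plain symbol, a number, or a single `a op b` expression.

```python
import operator
import re

# Arithmetic supported by this toy evaluator (an assumption, not a real spec)
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(text, env):
    """Expand $NAME references, then evaluate 'number' or 'number op number'."""
    expanded = re.sub(r"\$(\w+)", lambda m: str(env[m.group(1)]), text)
    m = re.fullmatch(r"\s*([\d.]+)\s*([-+*/])\s*([\d.]+)\s*", expanded)
    if m:
        return OPS[m.group(2)](float(m.group(1)), float(m.group(3)))
    try:
        return float(expanded)
    except ValueError:
        return expanded.strip()      # Plain symbol such as MORTGAGE

def parse_config(source):
    """Read NAME = VALUE lines in order; later lines may refer to earlier names."""
    env = {}
    for line in source.splitlines():
        line = line.split("#", 1)[0].strip()   # Drop comments and blank lines
        if not line:
            continue
        name, _, value = line.partition("=")
        env[name.strip()] = evaluate(value, env)
    return env

CONFIG = """
# Dave's mortgage
LOAN_TYPE = MORTGAGE
PAYMENT = 3999
RATE = 0.09
TEASER_RATE = $RATE/2
TEASER_PAYMENT = $PAYMENT/10
"""

settings = parse_config(CONFIG)
print(settings["TEASER_RATE"], settings["TEASER_PAYMENT"])
```

Note how quickly the design questions pile up (operator precedence, undefined names, quoting). That is the slippery slope the next slides describe.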

Copyright (C) 2008, http://www.dabeaz.com 1- Conﬁguration Languages • Example :
Conditional Evaluation 93 # Dave's mortgage LOAN_TYPE = MORTGAGE MORTGAGE_TYPE = EVIL BE_SNEAKY = YES PRINCIPLE = 500000 PAYMENT = 3999 RATE = 0.09 if $PRINCIPLE > 350000 TEASER_RATE = $RATE/2 TEASER_PAYMENT = $PAYMENT/10 else TEASER_RATE = $RATE/1.5 TEASER_PAYMENT = $PAYMENT/5 endif

Copyright (C) 2008, http://www.dabeaz.com 1- Scripting • So, left to
their own devices, programmers have had a tendency to create their own weird application-specific command/config languages. • You see this in large apps (e.g., VBA in Microsoft Office) • Typically done without thinking much about programming languages, theory, or previous work however. 94

Copyright (C) 2008, http://www.dabeaz.com 1- Configuration Languages • If left
unchecked, configuration languages may grow into some sort of ad-hoc domain-specific (scripting) language 95 “Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified bug-ridden slow implementation of half of Common Lisp.” - Philip Greenspun

Part 4: From Scripting to Dynamic Languages
(Or how script programmers started fooling around with programming languages)

Copyright (C) 2008, http://www.dabeaz.com 1- Rewind • The concept of
a "scripting language" has been around for a very long time • JCL - IBM System/360 (1964) • sh - Unix Shell (1971) • Rexx - IBM (1979) • Basically, these languages are oriented around controlling applications and the operating system. 97

Command Shells

• All application programmers know something about the operating system shell
• They use it to run their applications, run the compiler, etc.
• The command shell is often cited as an influence on the domain-specific languages that get created

Copyright (C) 2008, http://www.dabeaz.com 1- Application Control • Scripting/Shell languages
focus on running other applications • Most basic operation is running a program • Supplying arguments to a program 99 shell % someprog.exe foo bar 42 blah -x int main(int argc, char *argv[]) { ... } # args

Copyright (C) 2008, http://www.dabeaz.com 1- I/O Routing • Shells provide
mechanisms for I/O 100 shell % someprog.exe > out.txt shell % someprog.exe < in.txt shell % someprog.exe | otherprog.exe • So, you can hook programs up to ﬁles and hook programs up to other programs

Copyright (C) 2008, http://www.dabeaz.com 1- Environment Variables • Shells have
global environment variables 101 shell % printenv PWD=/Users/beazley HOME=/Users/beazley LOGNAME=beazley HOSTTYPE=intel-pc VENDOR=apple OSTYPE=darwin MACHTYPE=i386 GROUP=staff shell % echo $LOGNAME beazley shell % • These variables are passed to applications • Values are just simple strings

Copyright (C) 2008, http://www.dabeaz.com 1- Local Variables • Shells also
have "local" variables 102 shell % x="Hello" shell % y="World" shell % echo "$x $y" Hello World  shell % • These are not passed to applications, but you can export them to the global environment shell % export x

Copyright (C) 2008, http://www.dabeaz.com 1- Variable Interpolation • Shells tend
to work heavily with text • Shell interpreter performs variable substitutions prior to executing any command (known as interpolation). • Usually a special syntax is used ($var) 103 shell % cmd=ls shell % opts=-l shell % $cmd $opts /somedir -rw-r--r-- beazley staff 408 Apr 30 2007 foo -rw-r--r-- beazley staff 658 Apr 30 2007 bar -rw-r--r-- beazley staff 332 Apr 30 2007 spam ...
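The "substitute first, then execute" order can be imitated in Python 3 with the standard `string.Template` class, which happens to use the same `$var` syntax. This is only a sketch of the idea; a real shell also handles quoting, globbing, and much more. The dictionary name `variables` is an assumption for the example.

```python
from string import Template

# Shell-style "local variables", all of them plain strings
variables = {"cmd": "ls", "opts": "-l"}

# Step 1: interpolation happens on the raw text of the command
command_line = Template("$cmd $opts /somedir").substitute(variables)

# Step 2: only afterwards is the text split into a program name and arguments
argv = command_line.split()

print(command_line)
print(argv)
```

The point to notice is that the program being launched never sees `$cmd` or `$opts`, only the expanded words.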

Copyright (C) 2008, http://www.dabeaz.com 1- Control-Flow • Shells have some
basic control-ﬂow features 104 if test $x -gt 0; then echo "$x is greater than 0" else echo "$x is not greater than 0" fi • Loops x="foo bar spam" for i in $x do echo $i done

Copyright (C) 2008, http://www.dabeaz.com 1- Procedures • You can also
deﬁne procedures in the shell 105 add() { echo `expr $1 + $2` } shell % add 3 4 7 shell % • However, all of this starts to get weird pretty fast (e.g., no local variables)

Copyright (C) 2008, http://www.dabeaz.com 1- Problems with Scripting • Small
shell scripts tend to grow into large shell scripts (usually unintelligible) • If you want to write an "application", get convoluted mix of tools hooked together in bizarre ways • Very limited support for data processing and data structures (strings and lists of strings) • Slow 106

A Shell Program

    #!/bin/sh
    # Dave's mortgage
    principle=500000
    payment=499
    rate=0.04
    month=0
    total_paid=0
    while test `echo "$principle > 0" | bc -l` = 1; do
        principle=`echo "$principle*(1+$rate/12)-$payment" | bc -l`
        total_paid=`echo "$total_paid+$payment" | bc -l`
        month=`expr $month + 1`
        if test $month -eq 24; then
            payment=3999
            rate=0.09
        fi
    done
    echo "Total paid $total_paid"
    echo "Months $month"

Beyond Scripting

• Programmers like the interactivity of shells
• But they want to do more than just launch other programs and manipulate strings
• So, there has always been an interest in expanding shells with features from "real" programming languages
• More flexible data structures, proper procedures, control flow, variables, etc.

Part 5: Modern "Popular" Dynamic Languages

Copyright (C) 2008, http://www.dabeaz.com 1- New Languages 110 • Perl
(1987 - Larry Wall) • Tcl (1988 - John Ousterhout) • Python (1990 - Guido van Rossum) • Ruby (1993 - Yukihiro “Matz” Matsumoto) • PHP (1994 - Rasmus Lerdorf ) • Javascript (1995 - Brendan Eich/Netscape) • And many others...

Copyright (C) 2008, http://www.dabeaz.com 1- Commentary 111 • Over a
brief 5-10 year period, there was a sudden ﬂurry of development in which a variety of new programming languages were created • All of these languages were created by single individuals, often without any “ofﬁcial” funding. • Not associated with academic CS. • Question : Why?

Copyright (C) 2008, http://www.dabeaz.com 1- In their own words 112
"The Tcl scripting language grew out of my work on design tools for integrated circuits [...] Each tool needed to have a command language. However, our primary interest was in the tools, not their command languages. Thus, we didn’t invest much effort in command languages and the languages ended up being weak and quirky. Furthermore, the language for one tool couldn’t be carried over to the next so each tool ended up with a different bad command language. After a while, this became rather embarrassing." - John Ousterhout

"Like the typical human, Perl was conceived in secret, and existed for roughly nine months before anyone in the world ever saw it. Its womb was a secret project for the National Security Agency known as the ‘Blacker’ project, which has long since closed down. The goal of that sexy project was not to produce Perl. However, Perl may well have been the most useful thing to come from Blacker. Sex can fool you that way." - Larry Wall (Note: Perl was created to process logs and generate reports)

"My original motivation for creating Python was the perceived need for a higher level language in the Amoeba [Operating Systems] project. I realized that the development of system administration utilities in C was taking too long. Moreover, doing these things in the Bourne shell wouldn't work for a variety of reasons. ... So, there was a need for a language that would bridge the gap between C and the shell." - Guido van Rossum

Copyright (C) 2008, http://www.dabeaz.com 1- In the words of others
115 "PHP, known originally as Personal Home Pages, was first conceived in the autumn of 1994 by Rasmus Lerdorf. He wrote it as a way to track visitors to his online CV. The first version was released in early 1995, by which time Rasmus had found that by making the project open-source, people would fix his bugs.” - From “A History of PHP” (In Q1, '07, there were 78 PHP books on the market).

Commentary

• These programming languages have largely been developed outside of “academia.”
• All are rooted in practical applications
• None was meant to be a “theoretical” experiment in programming languages.
• Much to the dismay of academic researchers in programming languages (“I don’t get no respect” - Rodney Dangerfield)

Copyright (C) 2008, http://www.dabeaz.com 1- Food for Thought... 117 “Over
the Christmas holiday in December 1989, hacker Guido van Rossum of the Netherlands was bored, so he created a descendent of the ABC scripting language for the Unix platform, dubbing it Python, from the British comedy troop Monty Python's Flying Circus." - Timothy Morgan "Python has been an important part of Google since the beginning, and remains so as the system grows and evolves." - Peter Norvig, Google.

Copyright (C) 2008, http://www.dabeaz.com 1- Tcl • Tool Command Language
("Tickle") • Released as open-source in late 80s • One of the most inﬂuential early scripting languages • The big idea : A simple standardized programming language that could be easily added to other applications (like a library). • So you didn't have to write your own 118

Copyright (C) 2008, http://www.dabeaz.com 1- Traditional C Programs • A
traditional C program is launched from some kind of shell/command prompt • Command line arguments are passed as strings in an array (argv) 119 shell % someprog.exe foo bar 42 blah -x int main(int argc, char *argv[]) { ... } # args • There is just one entry point to the program (main) which ﬁgures out what to do

Copyright (C) 2008, http://www.dabeaz.com 1- Tcl Big Picture • A
simple interpreter that can call a collection of C functions using the same idea 120 Tcl Interpreter C Code User • User interacts with interpreter, issuing commands that call into C.

Sample Tcl Command

• For each "command" you write C code

    #include <stdlib.h>
    #include <tcl.h>

    int square(void *clientData, Tcl_Interp *interp,
               int argc, char *argv[]) {
        double x, y;
        if (argc != 2) {
            return TCL_ERROR;
        }
        x = atof(argv[1]);                              /* Convert argument */
        y = x*x;                                        /* Compute something */
        Tcl_SetObjResult(interp, Tcl_NewDoubleObj(y));  /* Return result */
        return TCL_OK;
    }

• Each command looks like a little C main() function

Copyright (C) 2008, http://www.dabeaz.com 1- Tclsh Shell • Tcl then
provided a "shell" where commands could be invoked (tclsh) 122 % square 4 16 % square 5 25 % Command Name Arguments int square(void *clientData, Tcl_Interp *interp, int argc, char *argv[]) { ... } Launches

Copyright (C) 2008, http://www.dabeaz.com 1- Using Tcl • Tcl basically
lifted some ideas from the Unix shell and put them inside C programs • If you had a big C program with several hundred functions, you would take each function and turn it into a Tcl command • High-level control ﬂow of the application driven by a Tcl script (instead of being hard- coded in C). 123
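The "each function becomes a command" idea boils down to a table that maps command names to procedures, each of which receives its arguments as a list of strings, much like `main()`. Here is a rough Python 3 sketch of that dispatch step. It is not how Tcl itself is implemented, and `COMMANDS` and `run_command` are names invented for this example.

```python
def square(argv):
    """Command procedure: argv[0] is the command name, the rest are string arguments."""
    if len(argv) != 2:
        raise ValueError("usage: square x")
    x = float(argv[1])           # Convert argument
    return str(x * x)            # Results go back to the interpreter as strings

# Command table: name -> procedure (a real application would register hundreds)
COMMANDS = {"square": square}

def run_command(line):
    """Split a command line into words and dispatch on the first word."""
    argv = line.split()
    procedure = COMMANDS[argv[0]]
    return procedure(argv)

print(run_command("square 4"))
print(run_command("square 5"))
```

Once such a table exists, the high-level control flow can live in a script that issues commands instead of being hard-coded in the compiled program.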

Copyright (C) 2008, http://www.dabeaz.com 1- The Tcl Language • Tcl
provides a whole language from which you can launch your commands • Variables • Conditionals • Loops • Procedures • Expressions 124

Sample Tcl Program

    # Dave's mortgage
    set principle 500000
    set payment 499
    set rate 0.04
    set month 0
    set total_paid 0
    while {$principle > 0} {
        set principle [expr {$principle*(1+$rate/12)-$payment}]
        incr total_paid $payment
        incr month
        if {$month == 24} {
            set payment 3999
            set rate 0.09
        }
    }
    puts "Total paid $total_paid"
    puts "Months $month"

Copyright (C) 2008, http://www.dabeaz.com 1- Sample Tcl Program 126 •
The language itself is kind of clunky • Everything is a string. • An expansion of shell programming • But, it is a full-featured programming language

Copyright (C) 2008, http://www.dabeaz.com 1- Tcl/Tk Release • Shortly after
Tcl, an optional add-on called "Tk" was released • Tk was a Tcl-based interface to a graphical user interface widget set and toolkit • At the time, it revolutionized GUI programming on UNIX systems. • Could build entire GUI using high level scripts 127

Copyright (C) 2008, http://www.dabeaz.com 1- Tk Example 128 • A
simple button proc pressed {} { puts "You pressed it!" } button .b -text "Press the button" -command pressed pack .b • This replaced several hundred lines of rather gnarly looking C code

Copyright (C) 2008, http://www.dabeaz.com 1- Usage of Tcl/Tk • As
originally envisioned, Tcl was meant to be a small language you added to huge C programs (most code written in C) • Didn't quite turn out that way. • Programmers wrote huge applications entirely in Tcl (> 100K lines) • Tcl/Tk used in a large number of mission critical applications (control systems, etc.) 129

Copyright (C) 2008, http://www.dabeaz.com 1- The Tcl/Tk Experience • Today,
Tcl/Tk is out of fashion, but it was very inﬂuential. • Showed that there was great utility in using a dynamic language to control code written in a static language (mixed languages) • Later became one of the ﬁrst cross- platform GUI development languages • A lot of ground-breaking software engineering related to scripting 130

Copyright (C) 2008, http://www.dabeaz.com 1- Tcl/Tk today • Most modern
dynamic languages have an optional interface to Tcl/Tk for GUI programming. • Python (Tkinter), Perl/Tk, Ruby/Tk, Scheme/ Tk, etc (Tcl is often hidden behind scenes) • Many other languages copied much of the SW-engineering practices of Tcl. 131

Copyright (C) 2008, http://www.dabeaz.com 1- Lessons Learned • Application programmers
learned that a dynamic language made them far more productive • For example, creating a simple GUI in Tcl was something you could do in an afternoon • Mixed-language development. C for systems/ high performance, Tcl for control. 132

Copyright (C) 2008, http://www.dabeaz.com 1- Perl • Released as open-source
in 1987 • The big idea : Take concepts from various facets of shell programming, but create a general purpose programming language • Fix annoying issues with shell programs • Incorporate features from text processing tools (especially sed and awk) 133

Copyright (C) 2008, http://www.dabeaz.com 1- Perl Inﬂuences • Syntax :
Roughly taken from C • Expands shell scripting with some new data structures (lists and associative arrays) • Adds support for regular-expression pattern matching (from sed) • Major goal : Scripting related to data processing, text processing, report generation. 134

Sample Perl

    # Dave's mortgage
    $principle = 500000;
    $payment = 499;
    $rate = 0.04;
    $month = 0;
    $total_paid = 0;
    while ($principle > 0) {
        $principle = $principle*(1+$rate/12)-$payment;
        $total_paid += $payment;
        $month++;
        if ($month == 24) {
            $payment = 3999;
            $rate = 0.09;
        }
    }
    print "Total paid $total_paid\n";
    print "Months $month\n";

Copyright (C) 2008, http://www.dabeaz.com 1- The Perl Experience • Perl
greatly simpliﬁed tasks that were previously done using rather complicated shell scripts (and faster) • Showed that a lot of network, system admin, and web-development code could be written entirely in a "script language" • Completely dominated early web- development (CGI scripting) 136

Copyright (C) 2008, http://www.dabeaz.com 1- The Perl Experience • Perl
user community was very effective at organizing third-party modules, add-ons, and extensions • Huge contributed library (CPAN) • Very inﬂuential on other open-source language projects (Ruby, Python, etc.) 137

Copyright (C) 2008, http://www.dabeaz.com 1- The Other Languages • Other
languages have evolved from earlier experiences with Tcl, Perl, C etc. • Because everything has been done in the open, there is a lot of cross-pollination • For example, Python has copied Perl's regular expression features. Perl copied Python's object system (in a manner) 138

Copyright (C) 2008, http://www.dabeaz.com 1- Big Picture • Extension languages.
When building a large C application, it is useful to have an extension/ control language (e.g., Tcl). • Scripting languages. For scripting, it is useful to have a real programming language with useful data structures and high-level features (e.g., Perl). 139

Part 6: Introduction to Python

Copyright (C) 2008, http://www.dabeaz.com 1- Python Introduction • In this
class, we will be using a wide variety of programming languages • This section serves as an introduction to one of the languages we'll be using more often. • More reference material is available online and in books 141

Copyright (C) 2008, http://www.dabeaz.com 1- What is Python? • An
interpreted, dynamically typed programming language. • In other words: A language that's similar to Perl, Ruby, Tcl, and other so-called "scripting languages." • Created by Guido van Rossum around 1990. • Named in honor of Monty Python 142

Copyright (C) 2008, http://www.dabeaz.com 1- Getting Started • In this
section, we will cover the absolute basics of Python programming • How to start Python • Interactive mode • Creating and running simple programs • Basic calculations and ﬁle I/O. 143

Copyright (C) 2008, http://www.dabeaz.com 1- Python Interpreter • Python is
an interpreter • If you give it a ﬁlename, it interprets the statements in that ﬁle • Otherwise, you get an "interactive" mode where you can experiment • No edit/compile/run/debug cycle 144

Running Python (Unix)

• Command line

    shell % python
    Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
    [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
    Type "help", "copyright", "credits" or "license"
    >>>

• Integrated Development Environment (IDLE)

    shell % idle

Copyright (C) 2008, http://www.dabeaz.com 1- Running Python (win) • Start
Menu (IDLE or PythonWin) 146

Copyright (C) 2008, http://www.dabeaz.com 1- Interactive Mode • Read-eval loop
>>> print "hello world" hello world >>> 37*42 1554 >>> for i in range(5): ... print i ... 0 1 2 3 4 >>> • Executes simple statements typed in directly • Useful for debugging, exploration 147

Copyright (C) 2008, http://www.dabeaz.com 1- Interactive Mode (IDLE) • Interactive
shell plus help features syntax highlights usage information 148

Copyright (C) 2008, http://www.dabeaz.com 1- Getting Help • Online help
is often available • help() command (interactive mode) • Documentation at http://www.python.org 149

Copyright (C) 2008, http://www.dabeaz.com 1- Creating Programs • Programs are
put in .py files # helloworld.py print "hello world" • Source files are simple text files • Create with your favorite editor (e.g., emacs) • Note: May be special editing modes • Can also edit programs with IDLE or other Python IDE (too many to list) 150

Copyright (C) 2008, http://www.dabeaz.com 1- Creating Programs • Creating a
new program in IDLE 151

Copyright (C) 2008, http://www.dabeaz.com 1- Creating Programs • Editing a
new program in IDLE 152

Copyright (C) 2008, http://www.dabeaz.com 1- Creating Programs • Saving a
new Program in IDLE 153

Copyright (C) 2008, http://www.dabeaz.com 1- Running Programs (IDLE) • Select
"Run Module" (F5) • Will see output in IDLE shell window 154

Copyright (C) 2008, http://www.dabeaz.com 1- Running Programs • In production
environments, Python may be run from command line or a script • Command line (Unix) shell % python helloworld.py hello world shell % • Command shell (Windows) C:\Somewhere>c:\python25\python helloworld.py hello world C:\Somewhere> 155

Copyright (C) 2008, http://www.dabeaz.com 1- A Sample Program • The
Sears Tower Problem You are given a standard sheet of paper which you fold in half. You then fold that in half and keep folding. How many folds do you have to make for the thickness of the folded paper to be taller than the Sears Tower? A sheet of paper is 0.1mm thick and the Sears Tower is 442 meters tall. 156

A Sample Program

    # sears.py
    # How many times do you have to fold a piece of paper
    # for it to be taller than the Sears Tower?

    height = 442                 # Meters
    thickness = 0.1*(0.001)      # Meters (0.1 millimeter)
    numfolds = 0
    while thickness <= height:
        thickness = thickness * 2
        numfolds = numfolds + 1
        print numfolds, thickness
    print numfolds, "folds required"
    print "final thickness is", thickness, "meters"

A Sample Program

• Output

    % python sears.py
    1 0.0002
    2 0.0004
    3 0.0008
    4 0.0016
    5 0.0032
    ...
    20 104.8576
    21 209.7152
    22 419.4304
    23 838.8608
    23 folds required
    final thickness is 838.8608 meters
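As a sanity check on the loop, the answer can also be computed directly: each fold doubles the thickness, so we want the smallest whole number n with thickness * 2**n > height. A short Python 3 sketch (note that `print` is a function here, unlike the Python 2 used in the slides):

```python
import math

height = 442.0             # Meters
thickness = 0.1 * 0.001    # Meters (0.1 millimeter)

# Smallest whole number of doublings n such that thickness * 2**n > height
numfolds = math.floor(math.log2(height / thickness)) + 1

print(numfolds, "folds required")
print("final thickness is", thickness * 2**numfolds, "meters")
```

This agrees with the 23 folds found by the loop: 22 folds reach only about 419 meters, and one more doubling passes 442.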

Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Statements •
A Python program is a sequence of statements • Each statement is terminated by a newline • Statements are executed one after the other until you reach the end of the ﬁle. • When there are no more statements, the program stops 159

Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Comments •
Comments are denoted by # # This is a comment height = 442 # Meters 160 • Extend to the end of the line • There are no block comments in Python (e.g., /* ... */).

Copyright (C) 2008, http://www.dabeaz.com 1- Python 101: Variables • A
variable is just a name for some value • Variable names follow same rules as C [A-Za-z_][A-Za-z0-9_]* • You do not declare types (int, ﬂoat, etc.) height = 442 # An integer height = 442.0 # Floating point height = "Really tall" # A string • Differs from C++/Java where variables have a ﬁxed type that must be declared. 161

Copyright (C) 2008, http://www.dabeaz.com 1- Python 101: Keywords • Python
has a basic set of language keywords • These are mostly C-like and have the same meaning in most cases • Variables can not have one of these names 162 and assert break class continue def del elif else except exec finally for from global if import in is lambda not or pass print raise return try while yield

Copyright (C) 2008, http://www.dabeaz.com 1- Python 101: Looping • The
while statement executes a loop • Executes the indented statements underneath while the condition is true 163 while thickness <= height: thickness = thickness * 2 numfolds = numfolds + 1 print numfolds, thickness

Python 101 : Indentation

• Indentation is used to denote blocks of code
• Indentation must be consistent

  (ok)

    while thickness <= height:
        thickness = thickness * 2
        numfolds = numfolds + 1
        print numfolds, thickness

  (error)

    while thickness <= height:
        thickness = thickness * 2
          numfolds = numfolds + 1
      print numfolds, thickness

• Colon (:) always indicates start of new block

    while thickness <= height:

Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Conditionals •
If-else if a < b: print "Computer says no" else: print "Computer says yes" • If-elif-else if a == '+': op = PLUS elif a == '-': op = MINUS elif a == '*': op = TIMES else: op = UNKNOWN 165

Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Relations •
Relations return boolean values (True, False) >>> 3 < 4 True >>> 3 > 4 False >>> 166 • Relational operators < > <= >= == !=

Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Booleans •
Boolean expressions (and, or, not) if b >= a and b <= c: print "b is between a and c" if not (b < a or b > c): print "b is still between a and c" • Non-zero numbers, non-empty objects also evaluate as True. 167 x = 42 if x: # x is nonzero else: # x is zero
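To make the truth-value rule concrete, here is a small Python 3 sketch that tests a handful of values the way an `if` statement would. The helper name `describe` is made up for the example.

```python
def describe(value):
    """Report how a value behaves when used as a condition."""
    if value:
        return "true"
    else:
        return "false"

# Zero, empty strings, empty lists and None count as false; everything else here is true
for value in [42, 0, 0.0, "hello", "", [1, 2], [], None]:
    print(repr(value), "is", describe(value))
```

One thing to watch for: a non-empty container is true even if everything inside it is zero, so `[0]` counts as true.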

Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Printing •
The print statement print x print x,y,z print "Your name is", name print x, # Omits newline • Produces a single line of text • Items are separated by spaces • Works with any kind of Python object • Very useful for debugging 168

Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : pass statement
• Sometimes you will need to specify an empty block of code if name in namelist: # Do something else: pass # Not implemented yet 169 • pass is a "no-op" statement • It does nothing, but serves as a placeholder for statements (possibly to be added later)

Copyright (C) 2008, http://www.dabeaz.com 1- Other Formatting Notes • Can
put single-line bodies on same line for i in range(10): print i • Multiple statements on the same line (;) x = 4; y = 10; z = "hello" • Line continuation (\) if product=="game" and type=="pirate memory" \ and age >= 4 and age <= 8: print "I'll take it!" 170 • Line continuation not needed for (),[],{} if (product=="game" and type=="pirate memory" and age >= 4 and age <= 8): print "I'll take it!"

Copyright (C) 2007, http://www.dabeaz.com 1- Basic Datatypes • Python only
has a few primitive types of data • Numbers • Strings (character text) 171

Copyright (C) 2007, http://www.dabeaz.com 1- Numbers • Python has 5
basic kinds of numbers • Booleans • Integers • Long integers • Floating point • Complex (imaginary numbers) 172

Copyright (C) 2007, http://www.dabeaz.com 1- Booleans (bool) • Two values:
True, False a = True b = False • Evaluated as integers with value 1,0 c = 4 + True # c = 5 d = False if d == 0: print "d is False" • A relatively late addition to Python (v2.3) 173

Copyright (C) 2007, http://www.dabeaz.com 1- Integers (int) • Signed integers
up to machine precision a = 37 b = -299392993 c = 0x7fa8 # Hexadecimal d = 0253 # Octal • Typically 32 bits • Comparable to the C long type 174

Copyright (C) 2007, http://www.dabeaz.com 1- Long Integers (long) • Arbitrary
precision integers a = 37L b = -126477288399477266376467L • Integers that overﬂow promote to longs >>> 3 ** 73 67585198634817523235520443624317923L >>> a = 72883988882883812 >>> a 72883988882883812L >>> • Can almost always be used interchangeably with integers 175

Copyright (C) 2007, http://www.dabeaz.com 1- Integer Operations + Add -
Subtract * Multiply / Divide // Floor divide % Modulo ** Power << Bit shift left >> Bit shift right & Bit-wise AND | Bit-wise OR ^ Bit-wise XOR ~ Bit-wise NOT abs(x) Absolute value pow(x,y[,z]) Power with optional modulo (x**y)%z divmod(x,y) Division with remainder 176

Integer Division

• Classic division (/) - rounds down to an integer when both operands are integers

    >>> 5/4
    1
    >>>

• Floor division (//) - always rounds down (same result for integers)

    >>> 5//4
    1
    >>>

• Future division (/) - Converts to float

    >>> from __future__ import division
    >>> 5/4
    1.25

• Scheduled to become the default behavior of / in Python 3.0 (PEP 238)
• If an integer result is intended, use //
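For comparison, here is how the operators behave in Python 3, where the "future" division is simply the default. The sketch also shows a detail that matters for negative numbers: `//` rounds toward negative infinity, which is not the same as chopping off the fraction. The variable names are only for illustration.

```python
true_div = 5 / 4             # True division always produces a float
floor_div = 5 // 4           # Floor division on positive integers
neg_floor = -5 // 4          # Floors toward negative infinity
neg_trunc = int(-5 / 4)      # int() truncates toward zero instead

print(true_div)     # 1.25
print(floor_div)    # 1
print(neg_floor)    # -2
print(neg_trunc)    # -1
```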

Copyright (C) 2007, http://www.dabeaz.com 1- Floating point (ﬂoat) • Use
a decimal or exponential notation a = 37.45 b = 4e5 c = -1.345e-10 • Represented as double precision using the native CPU representation (IEEE 754) 17 digits of precision Exponent from -308 to 308 • Same as the C double type 178

Copyright (C) 2007, http://www.dabeaz.com 1- Floating point • Be aware
that ﬂoating point numbers are inexact when representing decimal values. >>> a = 3.4 >>> a 3.3999999999999999 >>> 179 • This is not Python, but the underlying ﬂoating point hardware on the CPU.
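A few practical consequences, sketched in Python 3. Newer interpreters print the shortest string that round-trips, so `3.4` displays as `3.4`; asking for 17 significant digits reveals the stored value. Using `math.isclose` and the `decimal` module are two common workarounds, shown here as suggestions rather than as part of the original slides.

```python
import math
from decimal import Decimal

a = 3.4
shown = format(a, ".17g")        # 17 significant digits, as older interpreters displayed
print(shown)                     # 3.3999999999999999

exact_equal = (0.1 + 0.2 == 0.3)             # Exact comparison is fragile
close_enough = math.isclose(0.1 + 0.2, 0.3)  # Compare with a tolerance instead
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(exact_equal)      # False
print(close_enough)     # True
print(decimal_equal)    # True
```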

Copyright (C) 2007, http://www.dabeaz.com 1- Floating Point Operators + Add
- Subtract * Multiply / Divide % Modulo (remainder) ** Power pow(x,y [,z]) Power modulo (x**y)%z abs(x) Absolute value divmod(x,y) Division with remainder • Additional functions are in the math module import math a = math.sqrt(x) b = math.sin(x) c = math.cos(x) d = math.tan(x) e = math.log(x) 180

Copyright (C) 2007, http://www.dabeaz.com 1- Converting Numbers • Type name
can be used to convert a = int(x) # Convert x to integer b = long(x) # Convert x to long c = float(x) # Convert x to float • Only work if type conversion makes sense >>> a = "Hello World" >>> int(a) ValueError: invalid literal for int() >>> • Also work with strings containing numbers >>> a = "3.14159" >>> float(a) 3.14159 >>> int("0xff",16) # Optional integer base 255 181

Copyright (C) 2008, http://www.dabeaz.com 1- Strings • Speciﬁed using quotes
a = "Yeah but no but yeah but..." b = 'computer says no' c = ''' Look into my eyes, look into my eyes, the eyes, the eyes, the eyes, not around the eyes, don't look around the eyes, look into my eyes, you're under. ''' • Standard escape sequences work (e.g., '\n') • Triple quotes capture all literal text enclosed 182

String Escape Codes

    '\n'      Line feed
    '\r'      Carriage return
    '\t'      Tab
    '\xhh'    Hexadecimal value
    '\"'      Literal quote
    '\\'      Backslash

• In literals, standard escape codes work
• Raw strings (leading r) don't interpret escape codes

    a = r"\w+\.\w+"     # String exactly as specified

Copyright (C) 2007, http://www.dabeaz.com 1- String Representation • An ordered
sequence of bytes (characters) 184 • Store 8-bit data (ASCII) • May contain binary data, embedded nulls • Strings are frequently used for both text and for raw-data of any kind

Copyright (C) 2007, http://www.dabeaz.com 1- String Representation • Indexed array
of characters : s[n] a = "Hello world" b = a[4] # b = 'o' c = a[-1] # c = 'd' (Taken from end of string) • Slicing/substrings : s[start:end] d = a[:5] # d = "Hello" e = a[6:] # e = "world" f = a[3:8] # f = "lo wo" g = a[-5:] # g = "world" • Concatenation (+) a = "Hello" + "World" b = "Say " + a 185

Copyright (C) 2007, http://www.dabeaz.com 1- More String Operations • Length
(len) >>> s = "Hello" >>> len(s) 5 >>> • Membership test (in) >>> 'e' in s True >>> 'x' in s False >>> "ello" in s True 186 • Replication (s*n) >>> s = "Hello" >>> s*5 'HelloHelloHelloHelloHello' >>>

Copyright (C) 2007, http://www.dabeaz.com 1- String Methods • Stripping any
leading/trailing whitespace t = s.strip() • Case conversion t = s.lower() t = s.upper() • Replacing text t = s.replace("Hello","Hallo") 187 • Strings have "methods" that perform various operations with the string data.

More String Methods

    s.endswith(suffix)     # Check if string ends with suffix
    s.find(t)              # First occurrence of t in s (-1 if not found)
    s.index(t)             # First occurrence of t in s (raises ValueError if not found)
    s.isalpha()            # Check if characters are alphabetic
    s.isdigit()            # Check if characters are numeric
    s.islower()            # Check if characters are lower-case
    s.isupper()            # Check if characters are upper-case
    s.join(slist)          # Join the strings in slist using s as delimiter
    s.lower()              # Convert to lower case
    s.replace(old,new)     # Replace text
    s.rfind(t)             # Search for t from end of string (-1 if not found)
    s.rindex(t)            # Search for t from end of string (raises ValueError if not found)
    s.split([delim])       # Split string into list of substrings
    s.startswith(prefix)   # Check if string starts with prefix
    s.strip()              # Strip leading/trailing whitespace
    s.upper()              # Convert to upper case

Copyright (C) 2007, http://www.dabeaz.com 1- String Mutability • Strings are
"immutable" • Once created, the value can't be changed >>> s = "Hello World" >>> s[1] = 'a' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'str' object does not support item assignment >>> 189 • All operations and methods that manipulate string data always create new strings

Copyright (C) 2007, http://www.dabeaz.com 1- String Conversions • To convert
any object to string • Produces the same text as print s = str(obj) • Actually, print uses str() for output >>> x = [1,2,3,4] >>> str(x) '[1, 2, 3, 4]' >>> 190

Copyright (C) 2007, http://www.dabeaz.com 1- String Splitting • Strings are
often split into a list of strings >>> line = 'GOOG 100 490.10' >>> fields = line.split() >>> fields ['GOOG', '100', '490.10'] >>> 191 • Example: When reading data from a ﬁle, you might read each line and then split the line into columns or ﬁelds.
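The file-reading scenario above might be sketched as follows. This is written for Python 3, and the variable names `name`, `shares`, and `price` are assumptions made for illustration only.

```python
# Split a whitespace-separated record and convert each field.
line = 'GOOG 100 490.10'
fields = line.split()          # ['GOOG', '100', '490.10']

name = fields[0]               # stays a string
shares = int(fields[1])        # '100' becomes the integer 100
price = float(fields[2])       # '490.10' becomes a float

print(name, shares * price)
```

Note that `split()` always produces strings; any numeric interpretation has to be requested explicitly.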

Copyright (C) 2007, http://www.dabeaz.com 1- Lists • A sequence of
arbitrary objects (an array) names = [ "Elwood", "Jake", "Curtis" ] nums = [ 39, 38, 42, 65, 111] • Can contain mixed types items = [ "Elwood", 39, 1.5 ] • Adding new items items.append("that") # Adds at end items.insert(2,"this") # Inserts in middle 192 • Concatenation : s + t s = [1,2,3] t = ['a','b'] s + t [1,2,3,'a','b']

Copyright (C) 2007, http://www.dabeaz.com 1- Lists (cont) • Negative indices
are from the end names[-1] "Curtis" 193 • Lists are indexed by integers (starting at 0) names = [ "Elwood", "Jake", "Curtis" ] names[0] "Elwood" names[1] "Jake" names[2] "Curtis" • Changing one of the items names[1] = "Joliet Jake"

Copyright (C) 2007, http://www.dabeaz.com 1- More List Operations • Length
(len) >>> s = ['Elwood','Jake','Curtis'] >>> len(s) 3 >>> • Membership test (in) >>> 'Elwood' in s True >>> 'Britney' in s False >>> 194 • Replication (s*n) >>> s = [1,2,3] >>> s*3 [1,2,3,1,2,3,1,2,3] >>>

Copyright (C) 2007, http://www.dabeaz.com 1- Lists (Removal) • Removing an
item
    names.remove("Curtis")
• Deleting an item by index
    del names[2]
• Removal results in items moving down to fill the space vacated (i.e., no "holes").

Copyright (C) 2007, http://www.dabeaz.com 1- File Input and Output •
Opening a file f = open("foo.txt","r") # Open for reading g = open("bar.txt","w") # Open for writing • To read a line of text line = f.readline() • To write text to a file g.write(text) • To print to a file print >>g, "Your name is", name 196

Copyright (C) 2007, http://www.dabeaz.com 1- Looping over a file •
Reading a file line by line f = open("foo.txt","r") for line in f: # Process the line ... f.close() • Alternatively for line in open("foo.txt","r"): # Process the line ... • This reads all lines until you reach the end of the file 197

Copyright (C) 2007, http://www.dabeaz.com 1- Simple Functions • Use functions
for code you want to reuse def square(x): return x*x • Calling a function a = square(3) 198 • A function is just a series of statements that return a result or carry out some task

Copyright (C) 2007, http://www.dabeaz.com 1- Library Functions • Python comes
with a large standard library • Library modules accessed using import import math x = math.sqrt(10) import urllib u = urllib.urlopen("http://www.python.org/index.html") data = u.read() 199 • Will cover in more detail later

Copyright (C) 2008, http://www.dabeaz.com 1- dir() function • dir() returns
list of symbols >>> import sys >>> dir(sys) ['__displayhook__', '__doc__', '__excepthook__', '__name__', '__stderr__', '__stdin__', '__stdout__', '_current_frames', '_getframe', 'api_version', 'argv', 'builtin_module_names', 'byteorder', 'call_tracing', 'callstats', 'copyright', 'displayhook', 'exc_clear', 'exc_info', 'exc_type', 'excepthook', 'exec_prefix', 'executable', 'exit', 'getcheckinterval', ... 'version_info', 'warnoptions'] • Useful for exploring, inspecting objects, etc. 200

Copyright (C) 2007, http://www.dabeaz.com 1- Exception Handling • Errors are
reported as exceptions • Cause the program to stop >>> f = open("file.dat","r") Traceback (most recent call last): File "<stdin>", line 1, in <module> IOError: [Errno 2] No such file or directory: 'file.dat' >>> 201 • For debugging, message describes what happened, where the error occurred, along with a traceback.

Copyright (C) 2008, http://www.dabeaz.com 1- Exceptions • To catch, use
try-except try: f = open(filename,"r") except IOError: print "Could not open", filename • To raise an exception, use raise raise RuntimeError("What a kerfuffle") 202 • Exceptions can be caught
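A minimal sketch of the same try-except pattern in Python 3 syntax (where `print` is a function). The helper name `read_first_line` is hypothetical and not part of the course material.

```python
def read_first_line(filename):
    """Return the first line of a file, or None if it cannot be opened."""
    try:
        f = open(filename, "r")
    except IOError:                 # in Python 3, IOError is an alias of OSError
        print("Could not open", filename)
        return None
    try:
        return f.readline()
    finally:
        f.close()                   # always runs, even if readline() fails

result = read_first_line("/nonexistent/dir/file.dat")
```

Catching only `IOError` means unrelated bugs (for example a `TypeError`) still produce a traceback instead of being silently swallowed.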

Copyright (C) 2008, http://www.dabeaz.com 1- Summary • This has been
an overview of simple Python • Enough to write basic programs • Python code tends to be fairly readable • Just have to know the core datatypes and a few basics (loops, conditions, etc.) 203

Copyright (C) 2008, http://www.dabeaz.com 2- Working with Data and Data
Structures Section 2 2 "The horror! The horror!" - Col. Kurtz

Copyright (C) 2008, http://www.dabeaz.com 2- Introduction 3 • Most programs
need to perform various kinds of data manipulation • Mathematical calculations • Text processing • Row/column oriented data (e.g., databases)

Copyright (C) 2008, http://www.dabeaz.com 2- Overview 4 • In this
section, we take a closer look at how dynamic languages handle data • Topics include: • Variables and values • Primitive data types • Operations on data • Compound data • Memory management

Copyright (C) 2008, http://www.dabeaz.com 2- Disclaimer 5 • We're going
to cover some topics that you normally do not ﬁnd in the "user manual" for various languages. • My goal is to explore the design challenges and decisions that have been made in various languages. • The big picture

Copyright (C) 2008, http://www.dabeaz.com 2- Part 1 6 Variables, values,
and types

Copyright (C) 2008, http://www.dabeaz.com 2- Variables 7 • To work
with data, programs typically assign values to "variables" • A variable has a name which is known as an "identiﬁer" • The identiﬁer is used to identify values in subsequent calculations • The value is some sort of data

Copyright (C) 2008, http://www.dabeaz.com 2- Static Typing double total; int
x; 8 • In static languages such as C, C++, and Java, all variables must be declared and given a specific type in advance (declarations) • Underneath the covers, this binds the variable name to a fixed memory location that holds the value of the variable. • Type and location remain fixed

Copyright (C) 2008, http://www.dabeaz.com 2- Dynamic Typing total = 0.0
x = 42 9 • In dynamic languages, variables are just names for values • As the program runs, the value may change. • And it may change to a completely different type of data x = "foo" • Underneath the covers, it's just a table

Copyright (C) 2008, http://www.dabeaz.com 2- Variable Tables a = 0.0
b = 42 c = "Hello World" 10 'a' 'b' 'c' 0.0 42 "Hello World" Variable table • As your program runs, this table gets dynamically updated as variables are created, values get changed, and variables are destroyed.
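One way to picture the variable table is as a dictionary mapping names to values. The sketch below is a conceptual model only, not a description of how any particular interpreter stores its variables; the name `table` is made up for illustration.

```python
# A toy "variable table": names on the left, values on the right.
table = {}
table['a'] = 0.0               # a = 0.0
table['b'] = 42                # b = 42
table['c'] = "Hello World"     # c = "Hello World"

table['b'] = "foo"             # rebinding: same name, new value of a different type
del table['a']                 # destroying a variable removes its entry

print(table)
```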

Copyright (C) 2008, http://www.dabeaz.com 2- Values 11 • A "value"
represents some kind of data • Usually falls into a couple of categories • Primitive data (numbers and strings) • Compound data (arrays) • Objects • The treatment of values is actually a fairly complex problem (more soon)

Copyright (C) 2008, http://www.dabeaz.com 2- Typeless Languages 12 • In
some languages, all values are the same • For example, in shell scripts and Tcl, all values are just text strings set a 0.0 set b 42 set c "Hello World" 'a' "0.0" 'b' "42" 'c' "Hello World" • Because there are no types, programs simply interpret the value strings in different ways set c "$a + $b" # c -> "0.0 + 42" set c [expr "$a + $b"] # c -> "42.0"

Copyright (C) 2008, http://www.dabeaz.com 2- Typed Languages 13 • Most
dynamic languages use typed values a = 0.0 b = 42 c = "Hello World" 'a' (float, 0.0) 'b' (int, 42) 'c' (str,"Hello World") • Various operations in the language then look at the types to ﬁgure out what to do x = a + b # Ok. x = 42.0 y = b + c # Error. Can't add int and str • However, there is great variation in how "strict" a language is when types are mixed.
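The (type, value) pairs above can be inspected directly in Python. The following sketch (Python 3; the names `tagged` and `error` are illustrative) shows the type tags and what happens when an operation meets incompatible types.

```python
a = 0.0
b = 42
c = "Hello World"

# Pair each value with the name of its runtime type.
tagged = {name: (type(value).__name__, value)
          for name, value in [('a', a), ('b', b), ('c', c)]}

x = a + b                      # float + int is allowed; the result is a float

try:
    y = b + c                  # int + str is rejected
except TypeError as e:
    error = str(e)
    print("TypeError:", error)
```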

Copyright (C) 2008, http://www.dabeaz.com 2- Strong Typing 14 • If
a language is strongly typed, it tends to enforce strict rules about how values are used # Python a = 42 # An integer b = "Hello World" # A string x = a + b # Error • Any operation involving incompatible types may result in some kind of "Type Error"

Copyright (C) 2008, http://www.dabeaz.com 2- Weak Typing 15 • A
language may also be "weakly" typed.
• In this case, the language performs implicit conversions to make certain operations go ahead.
• For example, implicitly treating numbers as strings (shown below).
    // Javascript
    var a = 42             // An integer
    var b = "Hello World"  // A string
    var x = a + b          // x = "42Hello World"

Copyright (C) 2008, http://www.dabeaz.com 2- Terminology 16 • "Strong" versus
"weak" typing • Generally this just refers to whether or not a programming language makes a lot of implicit type conversions. a = 42 b = "Hello World" a + b Error # Strong typing a + b "42Hello World" # Weak typing • For example, even though C is statically typed, it is considered to be weakly typed

Copyright (C) 2008, http://www.dabeaz.com 2- Type Safety 17 • A
related issue that pertains to whether or not a language lets you "cast" values between incompatible data types. • Example : Pointer casting in C Foo *f; int x; x = (int) f; // OK. • This was one big difference between C/ Pascal (C let you do anything you wanted)

Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 18 • In practice,
programming languages don't always fall neatly into any one category
• Certain parts of the language may appear to be strongly typed whereas other parts seem to be weakly typed.
    x = 42      # int
    y = 2.5     # float
    z = x + y   # float (x implicitly converted to float)
• If it's too strict, it's "safe" but quite fussy

Copyright (C) 2008, http://www.dabeaz.com 2- Part 2 19 The Secret
Life of Numbers

Copyright (C) 2008, http://www.dabeaz.com 2- Numeric Data 20 • Numbers
are obviously one of the most common primitive data types • There are two basic kinds of numbers: • Integers : 123, -45, 1234 • Reals : 1.23, 4.5, 12e+34 • However, working with numbers is often a surprisingly difﬁcult problem • Let's talk more about this....

Copyright (C) 2008, http://www.dabeaz.com 2- Math on the CPU 21
• The CPU of your computer supports math with a few primitive types • Integer word (32 or 64 bits) • Floating point (32 or 64 bits) • In static languages (C, Java), these map to very speciﬁc datatypes int # 32 bit integer long # 32 or 64 bit integer (depends) float # 32-bit floating point number double # 64-bit floating point

Copyright (C) 2008, http://www.dabeaz.com 2- Native Integer Math 22 •
On the CPU, integers are a bunch of bits 5 00000000000000000000000000000101 -5 11111111111111111111111111111011 00000000110101111110101000010001 sign bit "value" • Data representation is in 2's complement • Invert all bits and add 1 to go between +/- 32 bits
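The "invert all bits and add 1" rule can be checked with a short sketch. Python integers are not fixed-width, so the code masks to 32 bits explicitly; the function names are hypothetical.

```python
def twos_complement(n, bits=32):
    """Return the bit string of n in two's complement with the given width."""
    return format(n & ((1 << bits) - 1), '0{}b'.format(bits))

def negate(bitstring):
    """Negate a two's complement bit string: invert all bits, then add 1."""
    bits = len(bitstring)
    mask = (1 << bits) - 1
    inverted = ~int(bitstring, 2) & mask
    return format((inverted + 1) & mask, '0{}b'.format(bits))

print(twos_complement(5))
print(twos_complement(-5))
print(negate(twos_complement(5)))
```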

Range of integers (32 bits)
    10000000000000000000000000000000    -2147483648
    00000000000000000000000000000000              0
    01111111111111111111111111111111     2147483647
• Commentary : The representation of numbers is a surprisingly complex problem (there are many ways to do it). Would cover in more detail in a computer architecture course.

Native integers support the usual math operations (+, -, *, /)
• Also, a number of "bitwise" operators
    bitwise-or     10110101 | 00101100    10111101
    bitwise-and    10110101 & 00101100    00100100
    bitwise-xor    10110101 ^ 00101100    10011001
    left shift     10110101 << 1          01101010
    right shift    10110101 >> 1          11011010
    invert         ~00101100              11010011

Copyright (C) 2008, http://www.dabeaz.com 2- Integer Overflow 25 • On
CPU, math operations that exceed the hardware range will overflow
    01001001100101100000001011010010     1234567890
                                 * 2
    10010011001011000000010110100100    -1825831516
  Result overflows into the sign bit
• C/C++ is completely silent when this happens (i.e., you don't get an error).
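Python integers do not overflow, so the wraparound has to be simulated. The sketch below models what 32-bit two's complement hardware typically produces; `wrap_int32` is a hypothetical helper name.

```python
def wrap_int32(n):
    """Interpret n modulo 2**32 as a signed 32-bit integer."""
    n &= 0xFFFFFFFF                      # keep only the low 32 bits
    if n & 0x80000000:                   # sign bit set: value is negative
        return n - 0x100000000
    return n

print(wrap_int32(1234567890 * 2))
print(wrap_int32(2147483647 + 1))
```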

Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 26 • The behavior
of integer math on the CPU is fairly well understood by C/C++ programmers (maybe) • Math operations in those languages are directly mapped to low-level machine instructions (C as a better assembly) • Truncation, overﬂow, and other aspects are just "features" of those languages.

Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Numbers 27 •
Floating point numbers are a representation of the real numbers (decimals)
• A number consists of three parts
    -1.23647223 x 10^34
    sign (+,-)   mantissa   exponent
• "Floating Point" refers to the fact that the position of the decimal point varies

Fixed point numbers
    1.23456   0.12345   0.01234   0.00123   0.00012
• Floating point numbers
    1.23456 x 10^0   1.23456 x 10^-1   1.23456 x 10^-2   1.23456 x 10^-3   1.23456 x 10^-4
  The exponent adjusts the position of the decimal point

On hardware, floating point numbers are merely a different interpretation of the bits
    00000000110101111110101000010001    (32 bits)
    sign bit (1 bit) | exponent (8 bits) | mantissa (23 bits)
  32 bit float
• Value is computed as (+/-) mantissa * 2^exponent
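The bit-field interpretation can be explored with the standard `struct` module. The sketch below assumes IEEE 754 single precision and fills in two details the slide leaves out: the stored exponent carries a bias of 127, and normal numbers have an implicit leading 1 in the mantissa. The function names are made up for illustration.

```python
import struct

def float32_fields(x):
    """Split x, stored as a 32-bit IEEE 754 float, into (sign, exponent, mantissa)."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    sign = bits >> 31                    # 1 bit
    exponent = (bits >> 23) & 0xFF       # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF           # 23 bits, leading 1 is implicit
    return sign, exponent, mantissa

def float32_value(sign, exponent, mantissa):
    """Rebuild the value of a normal float (not zero, denormal, Inf, or NaN)."""
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)

print(float32_fields(1.0))
print(float32_fields(-2.5))
```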

There are two main types of ﬂoats • Described by a standard : IEEE 754 • Single precision (32-bit) sign bit exponent mantissa 8 bits 23 bits • Double precision (64-bit) sign bit exponent mantissa 11 bits 52 bits

Numerical range of floating point
• Single precision
    8 digits of accuracy
    Max value : 3.4 x 10^38
• Double precision
    17 digits of accuracy
    Max value : 1.8 x 10^308
• Given a choice, most people use double

Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Math 32 •
The CPU has a floating point unit to perform math operations (+,-,*,/, sqrt, sin, cos, tan, etc).
• One caution : Floating point cannot exactly represent most decimal fractions (stored values are approximations).
    >>> x = 3.4
    >>> x
    3.39999999999999
    >>>
    >>> x = 0.1 * 0.1
    >>> print x
    0.01
    >>> x == 0.01
    False
    >>>
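Because exact equality is unreliable for floats, comparisons are usually made with a tolerance. A small Python 3 sketch using `math.isclose` (the variable names are illustrative):

```python
import math

x = 0.1 * 0.1

exact_match = (x == 0.01)               # exact comparison of approximations
close_match = math.isclose(x, 0.01)     # comparison within a relative tolerance

print(repr(x), exact_match, close_match)
```

The default relative tolerance of `math.isclose` is 1e-09; whether that is appropriate depends on the calculation.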

Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Math 33 •
Because ﬂoating point is approximate, repeated calculations result in mathematical errors that accumulate in a program. • Normally, this is covered in a numerical analysis/numerical methods class D. Goldberg, "What Every Computer Scientist Should Know About Floating Point Arithmetic" • One reason why ﬂoating point is sometimes avoided in business software

Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Errors 34 •
Certain operations result in exceptions (divide by 0, overflow, sqrt(-1)) • There are three special values +Inf Positive infinity -Inf Minus infinity NaN Not a number • These get encoded in a special way in the number (exponent field set to all 1s).

Design issue : If a math calculation produces an exceptional value (+Inf, -Inf, NaN), should it cause a program to abort or should the program keep running?
• Note : These special values are "sticky". Most arithmetic involving +Inf, -Inf, or NaN produces another one of those values, so a result rarely turns back into a normal number (one exception: a finite number divided by Inf gives 0).
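The stickiness of the special values is easy to observe in Python, which lets these values propagate for most arithmetic. The sketch below also includes one well-known case where a normal number does come back, dividing a finite number by infinity.

```python
import math

inf = float('inf')
nan = float('nan')

# Common arithmetic on special values yields special values again.
sticky = [inf + 1, inf * 2, -inf - 1, nan + 1, nan * 0, inf - inf]
all_special = all(math.isinf(v) or math.isnan(v) for v in sticky)

# A finite number divided by infinity is an ordinary zero.
escaped = 1.0 / inf

print(sticky, all_special, escaped)
```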

If you ignore errors, a program may run for a very long time silently producing garbage data (NaNs, Inf, etc.) • If you cause an abort, an unexpected math error (e.g., due to some kind of transient event) might cause the whole program to mysteriously crash.

Copyright (C) 2008, http://www.dabeaz.com 2- Interlude 37 • Math is
tricky
• Classic example : Ariane 5 Rocket Launch (1996)
• Exploded 37 seconds after launch
• Cause : Overflow in a float to integer conversion produced an uncaught math exception (which then caused the guidance software to dump core).

Copyright (C) 2008, http://www.dabeaz.com 2- Interlude 39 • Yeah, math
is something you worry about...

Copyright (C) 2008, http://www.dabeaz.com 2- Math in Dynamic Languages 40
• Dynamic languages tend to be very high level x = 12345 # An integer y = 123.45 # A float • Design question : Should a high-level language force programmers to think about low-level implementation details regarding math? (e.g., bits, overﬂow, etc.)

Copyright (C) 2008, http://www.dabeaz.com 2- Integers 41 • It is
common to represent integers by mapping them to the native integer type • This is beneﬁcial for performance • If so, integers will have a ﬁxed range: • -2147483648 -> 2147483647 (32 bits) • Question : what happens if you go outside that range?

Copyright (C) 2008, http://www.dabeaz.com 2- Integer Overﬂow 42 • Easy
solution : Do nothing. Just let math operations overﬂow like they do in C. • Example: Tcl set x 1234567890 set y 1234567890 set z [expr $x + $y] puts $z # Outputs -1825831516 • It's all perfectly intuitive if you're a C programmer (in fact, you'd ideally write your program to depend on this "feature" in some sort of very crucial, but diabolical way)

Copyright (C) 2008, http://www.dabeaz.com 2- Digression : Story of Mel
43 "Perhaps my greatest shock came when I found an innocent loop that had no test in it. No Test. None. Common sense said it had to be a closed loop, where the program would circle, forever, endlessly. Program control passed right through it, however, and safely out the other side. It took me two weeks to ﬁgure it out." "The vital clue came when I noticed ... incrementing the instruction address would make it overﬂow..." http://www.cs.utah.edu/~elb/folklore/mel.html

Copyright (C) 2008, http://www.dabeaz.com 2- Promotion to Floats 44 •
Another strategy : Silently promote integers that overﬂow to ﬂoating point numbers • Example: PHP <?php $x = 1234567890; $y = 1234667890; $z = $x + $y; print(gettype($x) . "\n"); // prints "integer" print(gettype($z) . "\n"); // prints "double" ?>

Copyright (C) 2008, http://www.dabeaz.com 2- Promotion to Bignums 45 •
Another strategy : Promote integers to arbitrary precision longs/bignums • Example: Ruby/Python x = 1234567890 # int y = 1234567890 # int z = x + y # long print type(x) # produces <type 'int'> print type(z) # produces <type 'long'> • In this case, integers are allowed to grow to arbitrary size.

Copyright (C) 2008, http://www.dabeaz.com 2- Bignum Implementation 46 • Typically,
a bignum is a sequence of native integers chained together. 32 bits 32 bits 32 bits ... • The number of parts is allowed to grow or shrink dynamically to accommodate the number as necessary bignum
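A rough sketch of the chained-integer idea, using a Python list of 32-bit "limbs" stored least significant first. Real implementations differ in many details (sign handling, limb size, memory layout); the function names here are hypothetical.

```python
def to_limbs(n, bits=32):
    """Split a non-negative integer into base-2**bits digits, least significant first."""
    mask = (1 << bits) - 1
    limbs = []
    while True:
        limbs.append(n & mask)
        n >>= bits
        if n == 0:
            return limbs

def add_limbs(a, b, bits=32):
    """Add two limb lists, propagating the carry from one limb to the next."""
    mask = (1 << bits) - 1
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(total & mask)
        carry = total >> bits
    if carry:
        result.append(carry)             # the number grows by one limb
    return result

def from_limbs(limbs, bits=32):
    """Reassemble an integer from its limbs."""
    return sum(limb << (bits * i) for i, limb in enumerate(limbs))

print(add_limbs(to_limbs(2**32 - 1), to_limbs(1)))
```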

Copyright (C) 2008, http://www.dabeaz.com 2- Bignums for Everything? 47 •
Why not just store all integers using the big number format? • Calculations will be slower due to extra processing overhead • Big numbers take more memory • Since small integer values are the most common, it makes little sense to penalize them.

Copyright (C) 2008, http://www.dabeaz.com 2- Integers as Floats 48 •
Some languages just always represent integers using double precision ﬂoating point numbers • Example : Perl, Javascript $x = 1234567890; # float $y = 1234567890; # float $z = $x + $y; # float • In this case, you just dispense with the problem of having to promote values (all numbers are the same type)

If you use floats, you'll get an extended range of exact integer values (53 bits).
    0 -> 9007199254740992.0 (9.0e+15)
• If you go beyond this, things get "weird"
    9007199254740992.0 + 1  ->  9007199254740992.0  (same)
    9007199254740992.0 + 2  ->  9007199254740994.0
    9007199254740992.0 + 3  ->  9007199254740996.0
• Will start to get "gaps" between numbers
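Python floats are 64-bit doubles, so the 53-bit limit can be reproduced directly. A small sketch (the name `limit` is illustrative):

```python
limit = 2.0 ** 53                  # 9007199254740992.0

below = (limit - 1) + 1            # still exact below the limit
plus_one = limit + 1               # not representable: rounds back to limit
plus_two = limit + 2               # representable
plus_three = limit + 3             # not representable: rounds to a neighbor

print(plus_one == limit, plus_two, plus_three)
```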

There are some downsides • Floating point math is slower than integer math on the hardware. However, maybe you don't care in an interpreted language. • Increased memory footprint. 64-bit ﬂoats take twice as much memory as 32-bit ints. • May ﬁnd special cases at/around 32 bit limit. For example, systems interfaces may only work with 32 bit integer values

Copyright (C) 2008, http://www.dabeaz.com 2- Example : Bitwise Ops 51
• Sometimes silently truncated at 32-bits
    x = 9876543210
    a = x * 2;     # Multiplies x by 2
    b = x << 1;    # Multiplies x by 2

    Language       a              b
    Perl           19753086420    4294967294
    Python/Ruby    19753086420    19753086420
    PHP            19753086420    -2
    Javascript     19753086420    -1721750060
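The Javascript row can be reproduced in Python by truncating to a signed 32-bit value, which is how Javascript's `<<` operator treats its operands. This sketch assumes a shift count between 0 and 31; the function name is hypothetical.

```python
def shift_left_int32(x, n):
    """Shift left, then keep only a signed 32-bit result (assumes 0 <= n < 32)."""
    result = (x << n) & 0xFFFFFFFF       # discard everything above 32 bits
    if result & 0x80000000:              # sign bit set: value is negative
        return result - 0x100000000
    return result

x = 9876543210
a = x * 2                                # bignum arithmetic, no truncation
b = shift_left_int32(x, 1)               # truncated like Javascript's <<

print(a, x << 1, b)
```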

Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 52 • One of
the reasons why Python/Ruby use integer bignums is to provide mathematical consistency across the entire range of integer values • There is no mysterious 32-bit cut-off for some operations and not others

Copyright (C) 2008, http://www.dabeaz.com 2- Integer Use Cases 53 •
There are actually very few practical applications that need the accuracy of really big integer numbers (e.g., cryptography)
• Most uses of integers are for counting and for indexing into data (e.g., array lookup).
• Example : Indexing bytes in a file
    1 Gigabyte                       1073741824.0
    1 Terabyte                       1099511627776.0
    1 Petabyte                       1125899906842624.0
    Largest int in a 64-bit float    9007199254740992.0
• For now, using floats is fine.

Copyright (C) 2008, http://www.dabeaz.com 2- Integer Division 54
• Integer division behaves differently in different languages (a surprise!)
    -7/3  ->  -2.333333333333333   (Perl, Javascript, PHP)
    -7/3  ->  -3                   (Python, Ruby, Tcl)
    -7/3  ->  -2                   (C, C++, Java)
• Choices:
• Convert to floating value (exact value)
• Floor division (closest integer less than the value)
• Truncate towards zero.
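All three choices can be written out explicitly in Python 3, where `/` converts to a float and `//` is floor division. Note that this differs from the Python 2 behavior listed in the table above, where `/` on two integers floors.

```python
import math

a, b = -7, 3

true_div = a / b                   # convert to floating point
floor_div = a // b                 # floor: round toward negative infinity
trunc_div = math.trunc(a / b)      # truncate toward zero, as C does

print(true_div, floor_div, trunc_div)
```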

Copyright (C) 2008, http://www.dabeaz.com 2- Integer Division 55 • Currently,
the trend in dynamic languages may be to make integer division convert the result to a ﬂoating point number if result not exact • Python is changing integer division in v. 3.0 • This change is highly controversial (I was even skeptical when I ﬁrst heard it).

Copyright (C) 2008, http://www.dabeaz.com 2- Why Worry? 56 • In
compiled languages, you can write functions that expect to work with floats, but you can use them fine with integers float midpoint(float x, float y) { return (x+y)/2; } ... float m = midpoint(12,17); // m = 14.5 • Inside the compiler, it knows the arguments are supposed to be floats. So, it automatically converts the integer arguments to floats. • It all just works.

Copyright (C) 2008, http://www.dabeaz.com 2- Why Worry? 57 • In
dynamic languages, functions are written with no type information
    def midpoint(x, y):
        return (x+y)/2
    ...
    m = midpoint(12,17)        # m = 14
    m = midpoint(12.0,17.0)    # m = 14.5
• It is very easy to silently introduce numerical errors into a program.
• You can code around it, but it is error prone, makes code hard to read, and runs slower.
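For comparison, here is how the same function behaves under Python 3 division rules, where `/` always produces a float and truncating behavior must be requested with `//`. The name `midpoint_floor` is made up for this sketch.

```python
def midpoint(x, y):
    return (x + y) / 2             # true division: same answer for ints and floats

def midpoint_floor(x, y):
    return (x + y) // 2            # floor division, requested explicitly

print(midpoint(12, 17), midpoint(12.0, 17.0), midpoint_floor(12, 17))
```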

Copyright (C) 2008, http://www.dabeaz.com 2- Why it's controversial 58 •
There are important uses of truncating integer division • Most common : Date/time calculations seconds = x; minutes = seconds / 60 hours = minutes / 60 days = hours / 24 • A good subject for ﬂame wars involving people with far too much spare time

Copyright (C) 2008, http://www.dabeaz.com 2- Why not Rationals? 59 •
Some programming languages convert integer division into exact rational numbers (fractions) • Example: Common lisp > (setf seconds 123456) 123456 > (setf minutes (/ seconds 60)) 10288/5 > (setf hours (/ minutes 60)) 2572/75 > (setf days (/ hours 24)) 643/450 > • Mathematically interesting, but not practical for most people
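Python does not use rationals for integer division, but its standard `fractions` module can reproduce the Common Lisp session above. A sketch in Python 3:

```python
from fractions import Fraction

seconds = Fraction(123456)
minutes = seconds / 60             # exact rational result
hours = minutes / 60
days = hours / 24

print(minutes, hours, days)
```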

Copyright (C) 2008, http://www.dabeaz.com 2- Numbers and Strings 60 •
Some languages blur the distinction between numbers and text strings x = "42 bottles" y = "37 bottles" $x + $y 79 (Perl, PHP) x + y "42 bottles37 bottles" (Python, Ruby) • If a string is used in a context that expects a number, it may be converted to a number

Copyright (C) 2008, http://www.dabeaz.com 2- Numbers and Strings 61 •
In some cases, it's a little diabolical
    // Javascript
    var x = "42"
    var y = "37"
    var a = x + y;    // a = "4237"
    var b = x * y;    // b = 1554
• If numbers and strings are mixed, it's more common to have separate string/math ops
    # Perl/PHP
    $x = "42";
    $y = "37";
    $a = $x + $y;    # Numeric add : a = 79
    $b = $x . $y;    # String concat : b = "4237"

Copyright (C) 2008, http://www.dabeaz.com 2- Mixed Numbers/Strings 62 • Underneath
the covers, an interpreter may keep multiple representations of data $x = "1 bottle"; str : "1 bottle" num : 1 • If used as a number, the numeric value will be saved and reused in later calculations • Perl does this.

Copyright (C) 2008, http://www.dabeaz.com 2- Numeric Type Promotion 63 •
Many languages have multiple numeric types
    int : 42
    long : 1273894812883991923
    float : 1.2374623
    complex : 1.23 + 4.5j
• In calculations involving mixed types, numbers are converted to the same type.
• Usually done in a way so accuracy is not lost
    42 + 4.5  ->  42.0 + 4.5
• You still need to be careful

Copyright (C) 2008, http://www.dabeaz.com 2- Alternatives to Floats 64 •
Generalized Decimal Arithmetic • Example : Python Decimal() types >>> import decimal >>> a = decimal.Decimal("3.45") >>> b = decimal.Decimal("7.22") >>> a + b Decimal("10.67") >>> a / b Decimal("0.4778393351800554016620498615") • This module performs exact decimal ops • IBM General Decimal Arithmetic Spec.
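To see why decimal arithmetic is attractive for things like money, compare it with binary floats on a simple sum. A sketch in Python 3 (variable names are illustrative):

```python
from decimal import Decimal

float_sum = 0.1 + 0.2                            # binary floating point
decimal_sum = Decimal("0.1") + Decimal("0.2")    # decimal arithmetic

print(repr(float_sum), decimal_sum)
```

Constructing `Decimal` values from strings matters here: `Decimal(0.1)` would inherit the approximation already present in the float.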

Copyright (C) 2008, http://www.dabeaz.com 2- Interlude 65 • As you
can tell, numbers are a bit of a mess • Not entirely standardized across languages • Many possibilities for program errors (especially with weak typing) • Many tradeoffs and design considerations • Let's move on to something more "simple"...

Copyright (C) 2008, http://www.dabeaz.com 2- Part 3 66 The equally
secret, but quite different, life of text

Copyright (C) 2008, http://www.dabeaz.com 2- Strings 67 • A string
typically refers to a sequence of characters x = "Hello World" • Most programmers are generally familiar with working with strings. • Operations on strings are fairly well "standardized" across languages.

Copyright (C) 2008, http://www.dabeaz.com 2- String Literals 68 • There
are a number of quoting conventions
    a = "Hello World"
    b = 'Hello World'
    c = """This is a multiline string.
    It captures all text."""
• "Heredoc" assignment (Perl, PHP, Ruby)
    a = <<END
    All of the text from here on is
    captured just as it is typed.
    END

Copyright (C) 2008, http://www.dabeaz.com 2- String Interpolation 69 • You
can sometimes substitute variables name = "Dave" text = "Your name is $name" # Perl/PHP/Tcl text = "Your name is ${name}" # Alternative text = "Your name is #{name}" # Ruby • This is notably absent in Python/Javascript (although you can sometimes hack it) name = "Dave" text = "Your name is %(name)s" % vars() # Python

Copyright (C) 2008, http://www.dabeaz.com 2- String Operations 70 • There
are common string operations
• stripping/chopping
    " text \n"  ->  "text"
• splitting
    "text1 text2 text3"  ->  [ "text1", "text2", "text3" ]
• replacing
    "Hello World"  ->  "Hello There"
• Just read a manual to find out how

Copyright (C) 2008, http://www.dabeaz.com 2- Strings are Hard 71 •
There are a number of "hard" real-world issues concerning the use, design, and implementation of strings. • These issues are often overlooked/ignored by programmers (at their own peril) • We're not one of those programmers

Copyright (C) 2008, http://www.dabeaz.com 2- Problem 72 • A string
is a sequence of characters x = "Hello World" • Yes, yes, please continue... • Question : What is a character? • Think about it for a moment...

Copyright (C) 2008, http://www.dabeaz.com 2- Characters 73 • A character
is a number
• 65 -> 'A' (ASCII)
• The number is a symbolic representation of some sort of a writing element (e.g., a letter)
• The number 65 represents the letter 'A'
• A character is not the visual presentation (that's called a "glyph").
• Oh, and a character is not a byte

Copyright (C) 2008, http://www.dabeaz.com 2- The Character Song 74 •
Characters are not bytes • Characters are not bytes • Characters are not bytes • Characters are not bytes • Characters are not bytes

Copyright (C) 2008, http://www.dabeaz.com 2- Wait a Minute! 75 •
Characters are bytes and always have been! • Characters 0-127 : ASCII • Characters 128-255 : Everything else • Just look it up in the manual... you'll see. "A string is an array of bytes." - From Ruby in a Nutshell • Yes, it's common for characters to be bytes

Copyright (C) 2008, http://www.dabeaz.com 2- History Lesson 76 • Much
of early computing was invented in the west (US/Europe) • Strong bias towards European languages and European characters • Which, conveniently, happened to mostly ﬁt into a single byte of data. • Example : ASCII character set • But there was other horrible weirdness

Copyright (C) 2008, http://www.dabeaz.com 2- Early Home Computers 77 •
In the late 70s, every manufacturer implemented ASCII, but did whatever they wanted with the rest of the characters Starship Enterprise! Incriminating drink stain "\x09\x0a"

Copyright (C) 2008, http://www.dabeaz.com 2- Example: Microsoft CP437 78 •
Charset for IBM- PC (1981) • Taken from Wang word-processing machines • A hodge-podge of letters, nums, accented letters, and symbols

Copyright (C) 2008, http://www.dabeaz.com 2- Example Use 79 • Pre-GUI
text-based applications on DOS

Copyright (C) 2008, http://www.dabeaz.com 2- Characters as Bytes 80 •
The practice of storing characters in a single byte is not workable in general • Thousands of different world languages • Some have thousands of characters (e.g., Chinese, Japanese, Korean, etc.) • Is everyone going to go create their own mutually incompatible character set? (well, yes, actually).

Copyright (C) 2008, http://www.dabeaz.com 2- Example: Big5 81 • An
encoding of Chinese that emerged out of using MS-DOS in Taiwan (early 80s) CP437 (Multibyte characters sort of "hacked" to overlay with CP437)

Copyright (C) 2008, http://www.dabeaz.com 2- Unicode 82 • A standardized
mapping of characters to numerical codes • First appeared ~1991 and is periodically updated by a consortium • Simple explanation : Assign a unique number to all characters used by humans in all written languages. • (Sounds like a good project for a hapless Ph.D. student).

Copyright (C) 2008, http://www.dabeaz.com 2- Unicode 83 • Characters are
organized into code charts (http://www.unicode.org/charts)

Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Big Picture 86 •
There are currently about 100000 characters • Characters 0-127 correspond to ASCII • Other character sets are mapped to different ranges of numbers (usually given in hex) • Example: Armenian (0530-058F) • Example: Mongolian (1800-18AF)

Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Big Picture 87 •
Unicode just assigns numbers • The unicode standard does NOT specify how characters are supposed to be represented in memory • Does NOT specify how characters are supposed to be stored in ﬁles • And it's not even entirely consistent on how you represent certain characters

Copyright (C) 2008, http://www.dabeaz.com 2- What is a Character? 88
• In theory, there is a unique integer code for each character. • However, in some languages, there are characters and then there are characters with modiﬁers (e.g.,ä, ã, â, á, à, å). • Unicode gives all of these variants a separate numerical code. ä U+00e4 ã U+00e3 â U+00e2 á U+00e1 à U+00e0

Copyright (C) 2008, http://www.dabeaz.com 2- What is a Character? 89
• But certain characters can also be constructed by adding modifiers
• ä = a + ̈ (0061 + 0308)
• So, you might have multiple representations of "Jalapeño"
    004a 0061 006c 0061 0070 0065 00f1 006f         (ñ as a single code)
    004a 0061 006c 0061 0070 0065 006e 0303 006f    (n followed by combining ̃)

Copyright (C) 2008, http://www.dabeaz.com 2- String Comparison 90 • If
the same text has multiple representations, how do you do string comparison?
• Well, in general you don't
• To do this, you would have to "normalize" strings to one standard representation
• A related, but equally nasty problem : Alphabetization (collation)
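Normalization is available in Python through the standard `unicodedata` module. The sketch below (Python 3) builds the two spellings of "Jalapeño", one with a precomposed ñ and one with n plus a combining tilde, and shows that they only compare equal after normalizing to a common form.

```python
import unicodedata

composed = "Jalape\u00f1o"         # ñ as the single code U+00F1
decomposed = "Jalapen\u0303o"      # n (U+006E) followed by combining tilde (U+0303)

raw_equal = (composed == decomposed)

# NFC composes where possible; NFD decomposes.
nfc_equal = (unicodedata.normalize("NFC", composed) ==
             unicodedata.normalize("NFC", decomposed))

print(len(composed), len(decomposed), raw_equal, nfc_equal)
```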

Copyright (C) 2008, http://www.dabeaz.com 2- Character Collation 91 • From
a French-English Dictionary • Collated characters e U+0065 è U+00E8 é U+00E9

Copyright (C) 2008, http://www.dabeaz.com 2- Character Collation 92 • The
collation of characters varies by language/region, not by character set • So, to make sorting work, you would have to have some kind of collation sequence that specifies the desired order [..., c, d, e, è, é, ê, ë, f, g, h, ...] • Bloody hell • Let's move on...

Copyright (C) 2008, http://www.dabeaz.com 2- Strings (Revisited) 93 • So,
a string is just a sequence of "characters" x = "Hello World" • Question : You're the language designer. Are you going to support Unicode? • Well, it's 2008, so let's assume yes...

Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Implementation 94 • How
do you represent Unicode strings? • Unicode characters are just numbers • Well, just store the numbers then...

Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Implementation 95 • Option
1: Make each character a 32-bit int • This is known as UCS-4 • More than enough bits to represent all unicode characters, but it hogs memory • ASCII text takes 4 times as much memory • Memory is cheap--buy more RAM. • Worse performance (e.g., CPU cache)

• Option 2: Make each character a slightly smaller, but still large enough integer. • For example : 21 bits (the largest code point is U+10FFFF) • Fine except that 21 bits is pretty odd • No C, C++, Java datatype for that. • Not natively supported on the CPU. • Will run slow as hell. • Nobody does this.

• Option 3: Make each character a 16-bit int • Known as UCS-2 (very common) • Much less memory overhead. • But 16 bits is not enough to represent all of the unicode characters • However, the Unicode people thought of that...

Copyright (C) 2008, http://www.dabeaz.com 2- Surrogate Pairs 98 • Large
Unicode characters can be encoded into a pair of smaller character codes
• U+D800 - U+DFFF (Surrogate pairs)
• How it works : U+1D122 (musical symbol F clef)
Subtract 0x10000: 0D122 = 00001101000100100010 (20 bits)
Split in half: 0000110100 0100100010 (2x10 bits)
Add the high half to D800 and the low half to DC00: 1101100000110100 1101110100100010
Result: (U+D834, U+DD22) (a pair of 16 bit values)

Copyright (C) 2008, http://www.dabeaz.com 2- Surrogate Pairs 99 • Some
unicode characters now get encoded as a pair of "sort of" characters • U+1D122 becomes (U+D834, U+DD22) • How is that supposed to work in practice? • Does an application programmer check? • If surrogate pairs get handled automatically, you are probably working in a string encoding known as UTF-16
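
As a rough sketch, the surrogate pair arithmetic looks like this in Python 3. It follows the UTF-16 definition: subtract 0x10000 to get a 20-bit value, add the top 10 bits to D800 and the bottom 10 bits to DC00. The function names are made up for this example.

```python
def to_surrogate_pair(codepoint):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need surrogates")
    offset = codepoint - 0x10000       # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

def from_surrogate_pair(high, low):
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

pair = to_surrogate_pair(0x1D122)
print([hex(v) for v in pair])            # ['0xd834', '0xdd22']
print(hex(from_surrogate_pair(*pair)))   # 0x1d122
```

You can cross-check the result against the built-in codec: `"\U0001D122".encode("utf-16-be")` yields the same four bytes.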

• Option 4: Variable length encoding • Example : UTF-8 • Now, this is something you see a lot <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> • But, what is UTF-8 exactly?

Copyright (C) 2008, http://www.dabeaz.com 2- UTF-8 101 • Characters 0-127
are ASCII (backwards compatibility) • The rest of the Unicode character set is encoded using only byte values in the range 128-255. • However, single characters may require a variable number of bytes to be represented

Copyright (C) 2008, http://www.dabeaz.com 2- UTF-8 102 0 - 127
0nnnnnnn (0x0)-(0x7f) ASCII
128 - 2047 110nnnnn 10nnnnnn (0x80)-(0x7ff)
2048 - 65535 1110nnnn 10nnnnnn 10nnnnnn (0x800)-(0xffff)
65536 - 2097151 11110nnn 10nnnnnn 10nnnnnn 10nnnnnn (0x10000)-(0x1fffff)
• Bits set in the first byte determine how many additional data bytes follow
• Data bytes (10nnnnnn) each carry a 6-bit chunk
• Unicode itself stops at 0x10ffff, so the top of the 4-byte range is unused
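
The bit layout in the table can be sketched as a small hand-written encoder in Python 3. This is for illustration only (the name `utf8_encode` is made up, and it does not reject the surrogate range); in real code you would call `str.encode("utf-8")`.

```python
def utf8_encode(codepoint):
    """Encode one code point into UTF-8 bytes, following the table above."""
    if codepoint < 0x80:                       # 0nnnnnnn
        return bytes([codepoint])
    if codepoint < 0x800:                      # 110nnnnn 10nnnnnn
        return bytes([0xC0 | (codepoint >> 6),
                      0x80 | (codepoint & 0x3F)])
    if codepoint < 0x10000:                    # 1110nnnn 10nnnnnn 10nnnnnn
        return bytes([0xE0 | (codepoint >> 12),
                      0x80 | ((codepoint >> 6) & 0x3F),
                      0x80 | (codepoint & 0x3F)])
    if codepoint <= 0x10FFFF:                  # 11110nnn + three data bytes
        return bytes([0xF0 | (codepoint >> 18),
                      0x80 | ((codepoint >> 12) & 0x3F),
                      0x80 | ((codepoint >> 6) & 0x3F),
                      0x80 | (codepoint & 0x3F)])
    raise ValueError("code point out of Unicode range")

for cp in (0x41, 0xF1, 0x20AC, 0x1D122):
    print(hex(cp), utf8_encode(cp).hex())
```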

Copyright (C) 2008, http://www.dabeaz.com 2- UTF-8 103 • UTF-8 has
some nice properties • Can often be plugged into legacy programs that just process characters as bytes • Problem : Not a good internal format for randomly accessing unicode characters. • Example : Array lookup s = "some unicode string" c = s[n] # Does this return the nth character # or does it return the nth byte?
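
Here is one way to see the difference, sketched in Python 3, where text strings index by character and encoded byte strings index by byte.

```python
s = "Jalape\u00f1o"          # text: 8 characters
data = s.encode("utf-8")     # bytes: ñ takes two bytes in UTF-8

print(len(s), len(data))     # 8 9
print(s[6])                  # ñ   (the 7th character)
print(data[6])               # 195 (0xC3, first byte of the two-byte ñ)
print(data[7])               # 177 (0xB1, second byte of ñ)
```

Finding the nth character in the byte form means scanning from the start, which is why UTF-8 is awkward as an internal format for random access.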

Copyright (C) 2008, http://www.dabeaz.com 2- Other Stuff that Sucks 104
• Putting unicode characters in string literals • Source code encoded in Unicode • Read/writing Unicode data from ﬁles (later) • Unicode character properties database a = "¼" b = "x" numeric(a) -> 0.25 numeric(b) -> false
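
For what it's worth, Python exposes part of the character properties database through the standard `unicodedata` module. The wrapper below is a hypothetical `numeric()` mirroring the slide; it passes `False` as the default for characters with no numeric value.

```python
import unicodedata

def numeric(ch):
    """Return the numeric value of a character, or False if it has none."""
    return unicodedata.numeric(ch, False)

a = "\u00bc"   # ¼
b = "x"

print(numeric(a))               # 0.25
print(numeric(b))               # False
print(unicodedata.name(a))      # VULGAR FRACTION ONE QUARTER
print(unicodedata.category(a))  # No  (Number, other)
```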

Copyright (C) 2008, http://www.dabeaz.com 2- What about Bytes? 105 •
There is still a need for processing data as raw sequences of bytes • Example : Processing binary formats (images, sound ﬁles, video, etc.) • Example : Fast ASCII Text processing • Do you have to wedge this into all of the Unicode processing?

Copyright (C) 2008, http://www.dabeaz.com 2- Byte Strings 106 • One
solution is to provide an entirely different primitive datatype for "byte strings" • Example : Python-3000 s = b"just some bytes" • This raises new issues : Do you allow text strings and byte strings to intermix? • If so, what rules deﬁne that relationship?
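
As a sketch of how one language answered the intermixing question: in Python 3, text and byte strings refuse to mix, and you must convert explicitly with a named encoding.

```python
text = "just some text"
raw = b"just some bytes"

# Mixing the two types directly is an error.
try:
    mixed = text + raw
except TypeError:
    mixed = None

# They never compare equal either, even with "the same" content.
same = ("abc" == b"abc")

# The explicit route: decode bytes to text (or encode text to bytes).
joined = text + " / " + raw.decode("ascii")

print(mixed)    # None
print(same)     # False
print(joined)   # just some text / just some bytes
```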

Copyright (C) 2008, http://www.dabeaz.com 2- Interlude 107 • All dynamic
programming languages are wrestling with the unicode problem right now • It's a complicated issue because these languages have grown entirely out of real-world application development. • Many of the issues are quite subtle. • We haven't even discussed I/O yet!

Copyright (C) 2008, http://www.dabeaz.com 2- True Story 108 • I
once met some American programmers working on a news web site that published some articles in Spanish. They couldn't ﬁgure out how to deal with the special "Spanish" characters so they just dropped them entirely. "That's a spicy Jalapeo" • Don't be like those guys...

Copyright (C) 2008, http://www.dabeaz.com 2- Strings (Reprise) 109 • A
string is a sequence of characters x = "Hello World" • Yes, conceptually simple, but some horrible details concerning "characters" • But let's assume you've sorted that out. • Question : How do strings actually behave in our favorite dynamic language?

Copyright (C) 2008, http://www.dabeaz.com 2- String Mutability 110 • Question
: Can you modify the contents of a string after you create it? • Sometimes yes : Perl, PHP, Ruby irb(main):001:0> a = "Hello World" => "Hello World" irb(main):002:0> a[1] = 'a' => "a" irb(main):003:0> a => "Hallo World" • Sometimes no : Python, Javascript >>> a = "Hello World" >>> a[1] = 'a' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'str' object does not support item assignment >>>

Copyright (C) 2008, http://www.dabeaz.com 2- Mutable Strings 111 • Allows
in-place modification of string data • High performance for manipulating huge strings $text =~ s/Foo/Bar/g; # Substitution in perl • It's fast because you can modify the contents without making a new copy in memory. • Question : But what happens here? $other = $text;

Copyright (C) 2008, http://www.dabeaz.com 2- Mutable Strings 112 • Copy
on assignment. When saving the value of a string in a new variable, make a fresh copy. • Copy by reference. Just copy a pointer. Of course, this can lead to bizarre sharing. $other = $text; # ruby a = "Hello World" b = a b.sub!("Hello","Hello Cruel") print a # "Hello Cruel World"

Copyright (C) 2008, http://www.dabeaz.com 2- Mutable Strings 113 • Copy
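on write. Initially copy by reference, but if anyone makes a modification, make a local copy
a = "Hello World"
b = a
b[1] = 'a'
• Before the write, a and b both refer to the same "Hello World"
• After the write, a still refers to "Hello World" and b refers to its own copy, "Hallo World"
• Sounds tricky...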

Copyright (C) 2008, http://www.dabeaz.com 2- Mutable Strings 114 • Because
of sharing, mutable strings may have one set of methods that always return a new string and another set that modify a string "in-place" • Example: Ruby Create new strings In-place ------------------ --------- s.capitalize s.capitalize! s.chomp s.chomp! s.gsub s.gsub! s.strip s.strip! ...

Copyright (C) 2008, http://www.dabeaz.com 2- Dangers 115 • Working with
mutable strings requires a certain degree of programming discipline • Since values might be shared, changes can unexpectedly affect other parts of code (like working with pointers in C++) • Could get real messy if working with Unicode and multibyte character sets because of internal representation and encoding issues.

Copyright (C) 2008, http://www.dabeaz.com 2- Immutable Strings 116 • Immutable
strings are much simpler
• Since they are read-only, all operations that manipulate strings always return new strings
s = "Hello World"
a = s.upper() # a = "HELLO WORLD"
b = s.replace("Hello","Hallo") # b = 'Hallo World'
• Copies are always made by reference
a = "Hello World"
b = a
c = b
• Here a, b, and c all refer to the same "Hello World"

Copyright (C) 2008, http://www.dabeaz.com 2- Immutable Strings 117 • The
fact that strings are immutable allows operations to be optimized inside the interpreter • For example : Use of small strings to refer to named fields, etc. • Gives more freedom in how strings are represented/manipulated internally (since programs aren't allowed to touch the bits)

Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 118 • That might
have been more about numbers and strings than you ever wanted to know • In my experience, working programmers only have a ﬂimsy grasp of the details (especially when it concerns Unicode). • One goal of going into detail has been to inform you of the important issues and pitfalls. • Note : There is not going to be a unicode quiz

Copyright (C) 2008, http://www.dabeaz.com 2- Data Structures 120 • In
programs, it is often necessary to represent data that consists of multiple parts • Example: A holding of stock Name : "GOOG" (string) Shares : 100 (integer) Price : 490.10 (float) 100 shares of GOOG at 490.10 • There are three basic components

Copyright (C) 2008, http://www.dabeaz.com 2- Structs in Static Languages 121
• In static programming languages (C, Java, etc.), data structures are managed by deﬁning a "structure" or "class" struct StockHolding { char name[8]; int shares; double price; }; • This precisely deﬁnes the members, the memory layout, and other low-level details

Copyright (C) 2008, http://www.dabeaz.com 2- Grouping Values 122 • Dynamic
languages don't really have "structs" • Instead, you can group values together g = "GOOG", 100, 490.10 a = "AAPL", 50, 123.45 • This becomes a single object composed of multiple parts (sometimes known as a tuple) • You can pass it around in your program as a single "value"

Copyright (C) 2008, http://www.dabeaz.com 2- Using Compound Data 123 •
When values are grouped, the components are typically ordered (like an array) g = "GOOG", 100, 490.10 name = g[0] shares = g[1] price = g[2] • However, you also just unpack values like this: g = "GOOG", 100, 490.10 ... name, shares, price = g

Copyright (C) 2008, http://www.dabeaz.com 2- Packing/Unpacking 124 • This concept
of packing/unpacking values is surprisingly rich in most dynamic languages • Most programmers aren't even aware of the full extent to which this actually works • Some examples follow

Copyright (C) 2008, http://www.dabeaz.com 2- Example (Python/Ruby) 125 Date =
16, "Jan", 2008 Time = 17, 30 When = Date, Time Who = "David Beazley", "[email protected]" Lecture = "Working with Data", Who, When Packing: Lecture = ( "Working with Data", ("David Beazley","[email protected]"), ( (16,"Jan", 2008), (17, 30) ) )

Copyright (C) 2008, http://www.dabeaz.com 2- Example (Python/Ruby) 126 Unpacking: Lecture
= ( "Working with Data", ("David Beazley","[email protected]"), ( (16,"Jan", 2008), (17, 30) ) ) Just flip the sides and put in some variable names: ( title, (name,email), ( (day,month,year), (hour,minute) ) ) = Lecture
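
A runnable version of the packing/unpacking idea, sketched in Python 3. The email address here is a made-up placeholder, and the last two statements show Python-specific extras (throwaway names and star-unpacking) that are not on the slide.

```python
# Packing: build nested tuples from smaller ones.
date = 16, "Jan", 2008
time = 17, 30
when = date, time
who = "David Beazley", "dave@example.com"   # placeholder address (hypothetical)
lecture = "Working with Data", who, when

# Unpacking: the pattern on the left mirrors the shape of the data.
title, (name, email), ((day, month, year), (hour, minute)) = lecture

# Ignoring parts you don't care about with a throwaway name.
_, _, ((d, m, y), _) = lecture

# Star-unpacking collects "everything else" into a list.
first, *rest = lecture

print(title, name, day, month, year, hour, minute)
```

If the shape on the left does not match the data, Python raises an error at runtime rather than silently misassigning values.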

Copyright (C) 2008, http://www.dabeaz.com 2- Example (Perl) 127 Packing: @Date =
(16, "Jan", 2008); @Time = (17, 30); @When = (@Date, @Time); @Who = ("David Beazley", "[email protected]"); @Lecture = ("Working with Data", @Who, @When); Unpacking: ($title, ($name,$email), ( ($day,$month,$year), ($hour, $min) ) ) = @Lecture; • Note : Perl flattens nested lists on both sides, so @Lecture is one flat list of 8 values and the parentheses in the unpacking are only for readability

Copyright (C) 2008, http://www.dabeaz.com 2- Example (PHP) 128 Packing: $Date =
array(16, "Jan", 2008); $Time = array(17, 30); $When = array($Date, $Time); $Who = array("David Beazley", "[email protected]"); $Lecture = array("Working with Data", $Who, $When); Unpacking: list($title, list($name,$email), list( list($day,$month,$year), list($hour,$minute) ) ) = $Lecture;

Copyright (C) 2008, http://www.dabeaz.com 2- Example (PHP) 129 Packing: $Date =
array(16, "Jan", 2008); $Time = array(17, 30); $When = array($Date, $Time); $Who = array("David Beazley", "[email protected]"); $Lecture = array("Working with Data", $Who, $When); Unpacking (with ignored values): list(, list(,), list( list($day,$month,$year), list(,) ) ) = $Lecture;

Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 130 • Working with
data using packed values tends to be quite efﬁcient • Fairly small memory footprint • Implementation is highly optimized in the interpreter (dynamic languages often rely on these same data structures for their own operation)

Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 131 • Packing and
unpacking values only really works well if data consists of a small number of parts • It would be extremely annoying to do this with a 50-ﬁeld database row • There may be constraints on packed values. For example, in Python, such objects are immutable.

Copyright (C) 2008, http://www.dabeaz.com 2- Using Named Fields 132 •
An alternative approach is to store data using objects involving named fields
• Dictionaries, hashes, associative arrays, etc.
# Python
g = { 'name' : 'GOOG', 'shares' : 100, 'price' : 490.10 }
# Ruby
g = { 'name' => 'GOOG', 'shares' => 100, 'price' => 490.10 }
# PHP
$g = array('name' => 'GOOG', 'shares' => 100, 'price' => 490.10);
# Perl (a hash is initialized from a parenthesized list; braces would create a reference)
%g = ( name => 'GOOG', shares => 100, price => 490.10 );
# Javascript
g = { name : 'GOOG', shares : 100, price : 490.10 };

Copyright (C) 2008, http://www.dabeaz.com 2- Dictionaries/Hashes 133 • The behavior
is almost identical across languages • Work like arrays but you use the field names shares = g['shares']; # Retrieval g['shares'] = 75; # Assignment • Unlike a normal array, there is generally no guaranteed ordering (the details vary by language and version). • Keys aren't stored in alphabetical order, etc.

Copyright (C) 2008, http://www.dabeaz.com 2- Collections of Objects 135 •
Programs often have to work with collections of "objects" • Example : A collection of stocks in a portfolio YHOO 50 19.25 AAPL 100 143.41 SCOX 500 4.21 GOOG 20 490.10 MSFT 50 67.12 JAVA 75 6.23 IBM 50 91.10

Copyright (C) 2008, http://www.dabeaz.com 2- Choices 136 • In most
dynamic languages, there are two very common choices for collections • List or array (ordered sequence of items) • Associative array/hash table (unordered data) • We've already used these in the last section.

Copyright (C) 2008, http://www.dabeaz.com 2- Lists/Arrays 137 • An ordered
sequence of values items = [1, 3.5, "Hello"] • Items are accessed by numerical indices n = len(items) # Number of items a = items[i] # Retrieve the ith item items[i] = b # Change the ith item • There are often append/insert/delete operations items.append(x) items.remove(y) items.insert(i,z) • Read the manual to know exact syntax

Copyright (C) 2008, http://www.dabeaz.com 2- Hashes/Dictionaries 138 • An unordered
collection of values
prices = { 'GOOG' : 523.10, 'AAPL' : 172.23, 'IBM' : 105.44 }
• Values are accessed by keys
n = len(prices) # Number of items
a = prices['GOOG'] # Retrieve 'GOOG' value
prices['SCOX'] = 0 # Add (or change) the 'SCOX' value
• Likewise, there are various operations for manipulating the contents

Copyright (C) 2008, http://www.dabeaz.com 2- Array/Dictionary Data 139 • A
critical part of using containers is knowing that you can store any kind of data that you want inside • This includes other lists and hashes YHOO 50 19.25 AAPL 100 143.41 SCOX 500 4.21 GOOG 20 490.10 MSFT 50 67.12 JAVA 75 6.23 IBM 50 91.10 [ ['YHOO', 50, 19.25], ['AAPL', 100, 143.41], ['SCOX', 500, 4.21], ['GOOG', 20, 490.10], ['MSFT', 50, 67.12], ['JAVA', 75, 6.23], ['IBM', 50, 91.10] ] list of lists

Copyright (C) 2008, http://www.dabeaz.com 2- Another Example 140 YHOO 50
19.25 AAPL 100 143.41 SCOX 500 4.21 GOOG 20 490.10 MSFT 50 67.12 JAVA 75 6.23 IBM 50 91.10 [ { 'name' :'YHOO', 'shares' : 50, 'price' : 19.25 }, { 'name' : 'AAPL', 'shares' : 100, 'price' : 143.41 } ... ] list of dicts

Copyright (C) 2008, http://www.dabeaz.com 2- "First Class" Data 141 •
Sometimes you will hear computer scientists talking about so-called "First Class" objects. • This means that whatever they're talking about can be used as a data value in a program. • You can assign it to a variable • You can store it in an array. • It has equal status with primitive types • In most dynamic languages, everything is first class

Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 142 • An adept
programmer can write signiﬁcant programs that do nothing but perform operations on lists and hashes • These data structures are powerful enough to do almost any kind of data processing you would ever need to do. • Note : You almost never hear of people implementing things like linked lists and search trees in these languages (why bother?)

Copyright (C) 2008, http://www.dabeaz.com 2- Array Implementation 143 • For
ordered data, an array is usually just a resizable array of references to values items = [1, 3.5, "Hello"] 1 3.5 "Hello" • It's an array of pointers to the values items

Copyright (C) 2008, http://www.dabeaz.com 2- Dictionary Implementation 144 • For
dictionaries/hashes, you get a mapping of keys to values 523.10 172.23 105.44 prices 'GOOG' 'IBM' 'AAPL' • The tricky part : Searching for keys prices = { 'GOOG' : 523.10, 'AAPL' : 172.23, 'IBM' : 105.44 }

Copyright (C) 2008, http://www.dabeaz.com 2- Implementing Dictionaries 145 • The
critical part of creating a dictionary is knowing what to do with the keys • Can a key be any object or is it restricted to strings? • How do you perform a fast key-lookup?

Copyright (C) 2008, http://www.dabeaz.com 2- Hashing 146 • Any object
that can be used as a key is given a hash value operation.
• This usually computes an integer value
irb(main):023:0> a = "GOOG"
=> "GOOG"
irb(main):024:0> a.hash
=> 252612492 # Hash value
irb(main):025:0>
• You use the hash value to perform a lookup

Copyright (C) 2008, http://www.dabeaz.com 2- Hashing 147 • A hash
table has some number of "slots" (N), numbered 0 ... N-1 • The key "GOOG" has hashval 252612492, so its entry ("GOOG", 523.10) goes in slot j = hashval % N • Ideally, each key has a different hashing slot

Copyright (C) 2008, http://www.dabeaz.com 2- Hashing 148 • For collisions,
you typically chain the entries that land in the same slot: slot j holds ("name1", Value1), then ("name2", Value2), then ("name3", Value3) • However, the design of this can be complex (consult an algorithms book)
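
To make the slot-and-chain picture concrete, here is a toy hash table in Python 3. It is a minimal sketch, not how any real interpreter does it: the class name `ChainedTable` and its methods are invented for this example, there is no resizing, and real implementations (including Python's own dict) use different collision strategies.

```python
class ChainedTable:
    """A toy hash table: N slots, each holding a chain of (key, value) pairs."""

    def __init__(self, nslots=8):
        self.slots = [[] for _ in range(nslots)]

    def _chain(self, key):
        # slot index j = hashval % N
        return self.slots[hash(key) % len(self.slots)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: replace value
                chain[i] = (key, value)
                return
        chain.append((key, value))       # new key (possibly a collision)

    def get(self, key):
        for k, v in self._chain(key):    # walk the chain comparing keys
            if k == key:
                return v
        raise KeyError(key)

# Only 2 slots for 3 keys, so at least one collision is guaranteed.
table = ChainedTable(nslots=2)
table.put("GOOG", 523.10)
table.put("AAPL", 172.23)
table.put("IBM", 105.44)
print(table.get("GOOG"))   # 523.1
```

With only two slots the chains get long and lookups degrade toward a linear scan, which is why real tables grow as they fill.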

Copyright (C) 2008, http://www.dabeaz.com 2- Comments on Hashing 149 •
Hash tables are one of the most essential data structures in virtually every dynamic language • Used not only by end-users, but for the implementation of the interpreter itself • Reading : A. Kuchling, "Python's Dictionary Implementation : Being All Things To All People", in Beautiful Code (O'Reilly)

Copyright (C) 2008, http://www.dabeaz.com 2- Variable Assignment total = 0.0
x = 42 151 • In dynamic languages, variables are just names for values • As the program runs, the value may change. • And it may change to a completely different type of data x = "foo" • Er..... didn't we already have this slide????

Copyright (C) 2008, http://www.dabeaz.com 2- What is Assignment? x =
42 152 • What does this do? • It assigns a value to a variable, yes. • But what does this do? y = x • And this...? z = [x,y] # A list/array with two items

Copyright (C) 2008, http://www.dabeaz.com 2- Value Binding x = 42
# Binding a value to a name y = x # Binding a value to a name items[2] = x # Binding to container location 153 • When programs run, values (i.e., data) get bound to different locations • But, what really happens? • In the above code, the value 42 has been assigned to three different places. • Does that mean there are three copies of 42 in memory? (Answer : It depends)

Copyright (C) 2008, http://www.dabeaz.com 2- Assignment by Value x =
42 y = x items[2] = x 154 • Assignments always make a local copy of whatever value is being stored. x 42 y items 42 42 2 These are all distinct objects even though they have the same value

Copyright (C) 2008, http://www.dabeaz.com 2- Assignment by Value x =
"... A string with 10 million characters ..." y = x items[2] = x 155 • But, consider this case: • Discuss amongst yourselves... • Maybe this one isn't so clear-cut. • Might depend on how strings were implemented.

Copyright (C) 2008, http://www.dabeaz.com 2- Reference Variables # Perl $x
= 42; $y = \$x; # Reference to $x print $$y,"\n"; # Dereference the value $$y = 37; # Reassign value being referenced 156 • You might introduce special reference/pointer variables (Perl/PHP) • This lets you refer to data instead of making copies, but it also introduces pointers • That may or may not be a good thing

Copyright (C) 2008, http://www.dabeaz.com 2- Assignment by Reference x =
42 y = x items[2] = x 157 • All assignments merely make a reference to the value (like a pointer) x 42 y items 2 There is one object with value 42, many locations point to it.

Copyright (C) 2008, http://www.dabeaz.com 2- Assignment by Reference 158 •
If everything is a reference, there are other issues. x 42 y items 2 • Are primitive types mutable? x = 37 • In general, you want immutable data to avoid making your head explode.

Copyright (C) 2008, http://www.dabeaz.com 2- Reference Counting 159 • How
do you track memory? x 42 y items 2 • Must keep reference counts on values or perform some kind of garbage collection ref=3 • Values (memory) will be reclaimed when no more references
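
If you are curious, CPython lets you peek at reference counts with `sys.getrefcount`. This sketch is specific to the standard CPython interpreter (other implementations track memory differently), and it uses a fresh `object()` rather than 42 because small integers get special treatment.

```python
import sys

value = object()                      # a fresh value with one name bound to it
before = sys.getrefcount(value)

y = value                             # bind a second name
items = [None, None, value]           # and a container location
after = sys.getrefcount(value)

del y                                 # drop the references again
items[2] = None
final = sys.getrefcount(value)

print(after - before)                 # 2
print(final - before)                 # 0
```

The absolute numbers are not meaningful (the call itself temporarily adds a reference); only the differences are.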

Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 160 • Most modern
dynamic languages assign by reference (Python, Ruby, Javascript, etc.) • This is one of the reasons why strings are immutable in Python/Javascript • You can check it out: >>> a = 42 >>> b = a >>> a is b True >>>

Copyright (C) 2008, http://www.dabeaz.com 2- Some Perils 161 • Working
with containers (lists/hashes) is very tricky in this model irb(main):001:0> a = [1,2,3,4] => [1, 2, 3, 4] irb(main):002:0> b = a => [1, 2, 3, 4] irb(main):003:0> b[2] = 99 => 99 irb(main):004:0> a => [1, 2, 99, 4] irb(main):005:0> • Since assignments only make references, you get shared references to the same object a b 1 2 99 4

Copyright (C) 2008, http://www.dabeaz.com 2- Copying 162 • To make
copies of containers, you have to take special steps irb(main):001:0> a = [1,2,3,4] => [1, 2, 3, 4] irb(main):002:0> b = a.clone => [1, 2, 3, 4] irb(main):003:0> b[2] = 99 => 99 irb(main):004:0> a => [1, 2, 3, 4] irb(main):005:0> b => [1, 2, 99, 4] Make a "copy" of the object

Copyright (C) 2008, http://www.dabeaz.com 2- Shallow Copies 163 • Copies
are often only shallow • You get a new object, but the values inside are copied by reference a b a.clone Values 1 2 3 4 99 b[2] = 99

Copyright (C) 2008, http://www.dabeaz.com 2- More Peril 164 • Containers
of containers (python) >>> a = [2,3,[100,101],4] >>> b = list(a) >>> a is b False • However, items in list copied by reference >>> a[2].append(102) >>> b[2] [100,101,102] >>> 100 101 102 2 3 4 a b This list is being shared

Copyright (C) 2008, http://www.dabeaz.com 2- Deep Copies 165 • To
actually copy data, you might have to execute a "deep copy" operation >>> a = [2,3,[100,101],4] >>> import copy >>> b = copy.deepcopy(a) >>> a[2].append(102) >>> b[2] [100,101] >>> • Recursively traverses through the object and copies everything that can be found. • (This is also an interesting CS problem)

Copyright (C) 2008, http://www.dabeaz.com 2- Hidden Secrets 166 • Dynamic
languages use various facets of references, immutable data, and this memory model to perform various kinds of optimization.
• Example : Small integer caching. Small integers are frequently cached and reused.
>>> a = 42
>>> b = 37
>>> c = b + 5
>>> c is a
True
>>>
• Here a and c refer to the same cached object 42, while b refers to 37

Copyright (C) 2008, http://www.dabeaz.com 2- Hidden Secrets 167 • Example:
Sharing dictionary keys and variable names stock = { 'name' : 'GOOG', 'shares' : 100, 'price' : 490.10 } person = { 'name' : 'Dave', 'email' : '[email protected]' } name = "Mondo" • The string 'name' can be stored once and shared by both dictionaries and by the variable name • Programs may use much less memory than you think

Copyright (C) 2008, http://www.dabeaz.com 2- Wrap-up 169 • A lot
of material has been presented in this section, but there are three big take-aways • Data. A close look at primitive datatypes (integers, reals, and strings) • Data structures. How to group data together (lists, arrays, etc.) • Assignment. What happens when you assign variables and manipulate values in a program.

Copyright (C) 2008, http://www.dabeaz.com 2- More Information 170 • Everything
else related to manipulating basic data is user-manual sorts of stuff • E.g., can look up how to append to a list in your favorite language. • Will get a chance to explore in the exercise

Copyright (C) 2008, http://www.dabeaz.com 3- Program Organization,Control Flow, and Functions
Section 3 2

Copyright (C) 2008, http://www.dabeaz.com 3- Introduction 3 • This section
explores the problem of structuring more complicated programs • Program structure and statements • Control ﬂow structures • Functions • Exception handling

Copyright (C) 2008, http://www.dabeaz.com 3- Part 1 4 What is
a Program?

Copyright (C) 2008, http://www.dabeaz.com 3- What is a Program? 5
• A program is a series of statements • The statements perform various operations and generate some kind of result • When a program runs, it executes statements until there is nothing more to do • It seems pretty straightforward---although there are some thorny theoretical questions (e.g., "The Halting Problem").

Copyright (C) 2008, http://www.dabeaz.com 3- An Example Program n =
10 while n > 0: print "T-minus", n n = n - 1 print "Fizzle..." 6 • Count down from 10 • Yippee! T-minus 10 T-minus 9 T-minus 8 T-minus 7 ... T-minus 4 T-minus 3 T-minus 2 T-minus 1 Fizzle...

Copyright (C) 2008, http://www.dabeaz.com 3- Statements 7 • Common types
of statements • Assignment (=) • Conditions (if-else) • Looping (while, for, break, continue) • Functions (deﬁnitions and calls) • Error handling

Copyright (C) 2008, http://www.dabeaz.com 3- Execution Environment n = 10
while n > 0: print "T-minus", n n = n - 1 print "Fizzle..." 8 • Statements never run in isolation! • They always run inside an "environment" • This is where the variables live environment statements 'n' : 10 variables

while n > 0: print "T-minus", n n = n - 1 print "Fizzle..." 9 • As a program runs, the statements tend to do one of several things • They either modify the environment environment statements 'n' : 10 9 variables

while n > 0: print "T-minus", n n = n - 1 print "Fizzle..." 10 • Or they control the next statement that executes (control ﬂow) environment statements 'n' : 9 variables

while n > 0: print "T-minus", n n = n - 1 print "Fizzle..." 11 • Or they perform some kind of input/output with the outside world environment statements 'n' : 9 variables

Copyright (C) 2008, http://www.dabeaz.com 3- Commentary 12 • This is
a fairly simple view, but most programs really don't do much more than what I've described • Of course, the devil is in the details • We're going to look at various facets of program structure and execution

Copyright (C) 2008, http://www.dabeaz.com 3- Assignment of Values 14 •
Assignment stores a value x = 42 avg = (x+y)/2 items[2] = 37 a["name"] = "Elvis" a.response = "Yeah" • General form of assignment location = expression • An expression represents a value • Location speciﬁes the place where stored

Copyright (C) 2008, http://www.dabeaz.com 3- Expressions 15 • An expression
always represents a value • May involve various operations on data 42 # Literal x # Variable x + y # Math operator x[i+n] # Array lookup foo(x) # Function call (x+y) / (a+b) # Grouping • The syntax and set of operators is fairly standard across languages (minor variations)

Copyright (C) 2008, http://www.dabeaz.com 3- Expressions 16 • Critical point
: Expressions always get evaluated in the environment x = a * b environment statements 'a' : 3 variables • Unknown names result in an error Error! ??

Copyright (C) 2008, http://www.dabeaz.com 3- Locations 17 • A location
represents a place where a value is going to be "stored" (known as an "lvalue") • It might be a name x = 42 name = "Elvis" • But it might also involve an expression names[i+n] = "Elvis" • Key point : The left hand side must always represent a place where you can put a value

Copyright (C) 2008, http://www.dabeaz.com 3- Locations 18 • Locations always
refer to a place in the surrounding environment • Storing to an unknown name creates an entry • Storing to an existing place replaces the value a = 3 b = 4 x = [a, b] x[1] = 37 environment statements 'a' : 3 'b' : 4 'x' : [3, 4] variables 37

Copyright (C) 2008, http://www.dabeaz.com 3- Storing Values 19 • What
does it mean to "store" a value? • From last lecture, we saw that this can be more complicated than you might imagine • Might be by value or by reference

Copyright (C) 2008, http://www.dabeaz.com 3- Assignment by Value 20 •
Assign by value (make a copy) a = 42 b = a environment statements variables 'a' 'b' 42 42 • Here, there are two separate objects with the value 42

Copyright (C) 2008, http://www.dabeaz.com 3- Assignment by Reference 21 •
Assign by reference (copy pointers) a = 42 b = a environment statements variables • Here, there is one object with the value 42, but two names refer to it 42 'a' 'b'

Copyright (C) 2008, http://www.dabeaz.com 3- Mutability 22 • Does assignment
overwrite previous values by overwriting the memory? • Or does assignment make a new object? a = 42 ... a = 37 • Overwrite: the memory that 'a' refers to changes from 42 to 37 • Rebind: 'a' is pointed at a new object 37 and the old 42 loses a reference (ref--) • Answer : It depends on the language

Copyright (C) 2008, http://www.dabeaz.com 3- Example : Python 23 •
Overwriting a variable a = 42 b = a a = 37 environment statements variables 42 'a' 'b' 37 old new • You get a new value and the name is rebound • The old value may persist if used elsewhere

Copyright (C) 2008, http://www.dabeaz.com 3- Commentary 24 • The key
point : Assignment is an operation that modiﬁes the environment in which statements execute • Deep thought : The environment sure looks a lot like a hash table/dictionary/associative array
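
The deep thought can be tried out directly in Python 3, where `exec` accepts an ordinary dictionary to use as the environment. This is a sketch to illustrate the idea, not a claim about how every interpreter stores variables; `undefined_name` is deliberately not defined.

```python
program = """
a = 3
b = 4
x = [a, b]
x[1] = 37
"""

env = {}                 # the environment starts out empty
exec(program, env)       # statements run "inside" it

# exec also stores a '__builtins__' entry; filter it out to see our names.
variables = {k: v for k, v in env.items() if k != "__builtins__"}
print(variables)         # {'a': 3, 'b': 4, 'x': [3, 37]}

# Expressions are evaluated in the environment; unknown names are an error.
try:
    exec("y = a * undefined_name", env)
except NameError as err:
    error = str(err)
print(error)
```

Storing to a new name created an entry, storing to `x[1]` replaced a value in place, and the failed statement left the environment without a `y`.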

Copyright (C) 2008, http://www.dabeaz.com 3- Conditional Execution 26 • Execute
statements on condition # Python if x > 0: statements else: statements • Almost all languages do exactly what you would expect here • Condition is checked and only one branch runs # Ruby if x > 0 statements else statements end # Perl/PHP if ($x > 0) { statements } else { statements }

Copyright (C) 2008, http://www.dabeaz.com 3- Conditional Expressions 27 • Conditions
generally rely on the result of a conditional expression if condition statements • Usually, this is built from special operators expr == expr expr != expr expr < expr expr > expr expr <= expr expr >= expr condition and condition condition or condition not condition • Produces a true/false value

Copyright (C) 2008, http://www.dabeaz.com 3- Examples 28 • Ruby irb(main):020:0>
x < 3 => true irb(main):021:0> x < 3 and x > 42 => false irb(main):022:0> • Python >>> x < 3 True >>> x < 3 and x > 42 False >>>

Copyright (C) 2008, http://www.dabeaz.com 3- What is Truth? 29 •
A tricky issue : What happens here? if x statements • x is just some value (we don't know what) • For this to make sense, you have to know what it means for a value to be "True" • Believe it or not, there are some differing ideas about that.

Option 1: A value is true if it is non-zero, non- empty, or generally looks like it has an interesting value • This is probably the most common treatment # True Values x = 1 x = "Hello" x = [1,2,3] # False Values x = 0 x = "" x = [] x = None if x statements
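
Python follows Option 1, and you can ask it directly what it thinks with `bool()`. A quick sketch in Python 3:

```python
values = [1, "Hello", [1, 2, 3], 0, "", [], None, 0.0, "0"]

# bool(v) gives the same answer that "if v:" would use.
truth = [bool(v) for v in values]

for v, t in zip(values, truth):
    print(repr(v), "->", t)
```

Note the last entry: the string "0" is non-empty, so Python treats it as true.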

Option 2: A value is true unless it is false or has no actual value (nil). • This is the approach Ruby takes if x statements # True Values x = 0 x = "Hello" x = [1,2,3] x = "" # False values x = nil x = false • Danger : 0 evaluates as True! • This is really a pointer check--does x point to a value?

Sometimes it gets messy. • In Perl, non-empty strings evaluate as True # Perl $x = "0"; if ($x) { print "Yes" } else { print "No" } "No" $x = "Hello"; # True $x = ""; # False • Well, it's true most of the time

Copyright (C) 2007, http://www.dabeaz.com 3- Evaluation of Relations 33 • Relations
only evaluate parts until the result can be determined
if condition1 and condition2: # condition2 not evaluated if condition1 is False
    statements
if condition1 or condition2: # condition2 not evaluated if condition1 is True
    statements
• Example:
if x != 0 and y/x < 0.01:
    statements
• Also known as "short-circuit" evaluation
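
You can watch short-circuit evaluation happen by recording which conditions actually get evaluated. A small Python 3 sketch; the helper `check` is made up for this example.

```python
calls = []

def check(name, result):
    """Record that this condition was evaluated, then return its result."""
    calls.append(name)
    return result

r1 = check("a", False) and check("b", True)   # "b" is never evaluated
r2 = check("c", True) or check("d", False)    # "d" is never evaluated
print(calls)    # ['a', 'c']

# The classic use: guard an operation that would otherwise fail.
x, y = 0, 5
safe = x != 0 and y / x < 0.01                # no ZeroDivisionError
print(safe)     # False
```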

Copyright (C) 2007, http://www.dabeaz.com 3- Statement Conditions • Statement with
a condition 34 statement if condition; statement unless condition; • Examples: print "Hello Dave\n" unless ($name ne "Dave"); # Perl print "Hello Dave\n" unless $name != "Dave" # Ruby • This form is somewhat less common. • Personally, I'm not a huge fan... Here is a really long statement that looks like it will erase the entire filesystem ... NOT!

Copyright (C) 2007, http://www.dabeaz.com 3- Switch Statements • Evaluation of
code based on the value of a variable: 35 switch(variable) { case value1: statements case value2: statements case value3: statements default: statements } • This is not always supported and even if it is, there are subtle issues

Copyright (C) 2007, http://www.dabeaz.com 3- Switch Statements • Do statement
blocks "fall-through?" 36 switch(variable) { case value1: statements break case value2: statements case value3: statements break default: statements } statements If no break, execution falls through to the next case • This is the behavior of C/Java/Javascript, etc. • Fall-through may be disallowed (Ruby)

Copyright (C) 2007, http://www.dabeaz.com 3- Chained if-elif-else • Python doesn't
provide any kind of switch. You just chain if-elif-else statements
    if condition:
        statements
    elif condition:
        statements
    elif condition:
        statements
    else:
        statements
• Thinking : Having a separate switch statement just seems redundant
• Example:
    if content == 'gif':
        ...
    elif content == 'png':
        ...
    elif content == 'jpg':
        ...
    else:
        print "Unknown content!"

Copyright (C) 2007, http://www.dabeaz.com 3- Switch Implementation • The switch
statement might be far more efficient than chained if-else depending on how it is implemented
• Historically, compilers would turn switch into a jump table (a goto lookup table)
    switch(variable) {
        case value1: statements
        case value2: statements
        case value3: statements
        default: statements
    }
    Jump table:
        value1 -> loc1
        value2 -> loc2
        value3 -> loc3
    Code:
        loc1: statements ...
        loc2: statements ...
        loc3: statements
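The jump-table idea can be imitated in a language without switch by using a dictionary of functions. This is a sketch of that swapped-in technique in Python 3, not something the slides prescribe. The handler names are hypothetical.

```python
# A dict of functions plays the role of a jump table: one hash lookup
# replaces a chain of comparisons. All handler names are made up.
def handle_gif():
    return "decode GIF"

def handle_png():
    return "decode PNG"

def handle_jpg():
    return "decode JPEG"

def handle_unknown():
    return "Unknown content!"

handlers = {
    "gif": handle_gif,
    "png": handle_png,
    "jpg": handle_jpg,
}

def dispatch(content):
    # dict.get supplies the equivalent of the "default:" case
    return handlers.get(content, handle_unknown)()

print(dispatch("png"))
print(dispatch("bmp"))
```

The lookup cost does not grow with the number of cases, which is the same property that made compiled jump tables attractive.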

Copyright (C) 2007, http://www.dabeaz.com 3- Switch Experiment • The implementation
of switch in many dynamic languages seems to be hit or miss
• I actually did a little experiment ("The Big Switch"), with variable == 999
    switch(variable) {
        case 1: statement
        case 2: statement
        ...
        case 999: statement
    }
  vs.
    if (variable == 1) { statement }
    else if (variable == 2) { statement }
    ...
    else if (variable == 999) { statement }

Copyright (C) 2007, http://www.dabeaz.com 3- Switch Experiment • PHP 40
    switch  : 58.9 seconds
    else if : 108.1 seconds
• Ruby
    switch  : 151.5 seconds
    else if : 106.5 seconds
• Javascript (in Firefox)
    switch  : ~7 seconds
    else if : ???? minutes (didn't have patience to wait)
• A million repetitions

Copyright (C) 2007, http://www.dabeaz.com 3- Looping on a Condition •
while loops (universally supported) 42 while condition statement statement statement end • Only the syntax differs slightly • Sometimes you will ﬁnd this variation do statement statement statement while condition

Copyright (C) 2007, http://www.dabeaz.com 3- Loop Exit • To prematurely
break out of a loop 43 while condition statement break # Terminates a loop statement end • Example: Python while True: line = f.readline() if line == 'END' : break # Various processing ...

Copyright (C) 2007, http://www.dabeaz.com 3- Loop Continuation • To skip
the rest of the statements and go back to the start of the loop 44 while condition statement continue # Go back to the top statement # Not executed end • Example: Python while True: line = f.readline() if line.startswith("#"): continue # Do more processing ...

Copyright (C) 2007, http://www.dabeaz.com 3- For-loop (classic) • Looping with
some kind of looping variable 45 for (init; condition; increment) { statements } • Example: for (i = 0; i < 10; i++) { print i } • This is really just a short-hand for this i = 0; while (i < 10) { print i; i++; }

Copyright (C) 2007, http://www.dabeaz.com 3- For-loop (Iteration) • The more
modern use of for is to loop over items of a collection (array, hash, etc.) 46 for item in collection statements end • Example: items = [1, 4, "Foo", "Bar"] # Ruby for x in items # x = 1, 4, "Foo", "Bar" ... end • Might be known as a "foreach" statement

Copyright (C) 2007, http://www.dabeaz.com 3- Iteration • Looping over a
collection is a very powerful concept • A collection could be many different things • An array, hash, set, string, ﬁle, etc. 47 f = open("foo.txt") for line in f: statements ...

Copyright (C) 2007, http://www.dabeaz.com 3- Iteration Example • Looping over
a collection of stocks 48 portfolio = [ ('GOOG',100, 490.10), ('IBM', 50, 91.10), ('AAPL', 75, 122.45), ('YHOO', 45, 28.42) ] for name, shares, cost in portfolio: # statements # ... • Notice how values get expanded into variables for you (very nice)
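As a small illustration of what the loop body might do with the unpacked values, here is a Python 3 sketch that totals the cost of the portfolio. The `total` variable is an assumption added for the example.

```python
portfolio = [
    ('GOOG', 100, 490.10),
    ('IBM',   50,  91.10),
    ('AAPL',  75, 122.45),
    ('YHOO',  45,  28.42),
]

total = 0.0
for name, shares, cost in portfolio:
    # Each tuple is unpacked into three variables on every pass
    total += shares * cost

print("Total cost: %.2f" % total)
```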

Copyright (C) 2007, http://www.dabeaz.com 3- Iteration Commentary • The whole
concept of "iterating" over data is something that has been expanded greatly in dynamic languages • For instance, a large number of recent features in Python are just related to this • We'll see a lot more of this later 49

Copyright (C) 2007, http://www.dabeaz.com 3- Looping-Else • The looping-else (Python)
50 for x in s: statements else: statements • The else clause only runs if the loop runs all the way to completion without breaking for line in open("stocks.dat"): if 'IBM' in line: break else: print "Didn't find it"
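A self-contained Python 3 sketch of the looping-else. The `find_line` function and the sample data are hypothetical. A list of strings stands in for the "stocks.dat" file.

```python
def find_line(lines, word):
    """Return the first line containing word, or None if there is none."""
    for line in lines:
        if word in line:
            found = line
            break
    else:
        # Runs only when the loop finished without hitting break
        found = None
    return found

stocks = ["GOOG 100 490.10", "IBM 50 91.10", "AAPL 75 122.45"]
print(find_line(stocks, "IBM"))
print(find_line(stocks, "MSFT"))
```

Note that the else clause also runs when the collection is empty, since the loop then completes without a break.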

Copyright (C) 2007, http://www.dabeaz.com 3- Looping-Redo/Retry • The redo statement
(Ruby) 51 for x in s statements redo statements end • Restarts the body of the loop without updating the iteration variable • Retry : Restarts from the beginning for x in s statements retry statements end

Copyright (C) 2007, http://www.dabeaz.com 3- Interlude • A signiﬁcant number
of things can be done using nothing but basic statements, conditions, and loops • For example: Writing scripts, data processing, etc. • I would suspect that a large number of programs actually use nothing more than these features to do various odd-jobs 52

Copyright (C) 2007, http://www.dabeaz.com 3- Interlude • To write larger
programs, you want to do more than this • Packaging code so that you can reuse it • Bundling code into modules and libraries 53

Copyright (C) 2008, http://www.dabeaz.com 3- Part 5 54 The Wild
World of Functions

Copyright (C) 2007, http://www.dabeaz.com 3- What is a Function? •
Mathematically, it's an operation that accepts a bunch of inputs (arguments) and produces an output (the result)
• Examples
    • sin(x)
    • f(x,y) -> 3x^2 + 2xy - 7
• However, this isn't a math class nor is it a theoretical programming languages course

Copyright (C) 2007, http://www.dabeaz.com 3- What is a function? •
A function is a named sequence of statements def funcname statement statement ... statement end 56 • If you want those statements to run, you just invoke the function name funcname

Copyright (C) 2008, http://www.dabeaz.com 3- Example (Ruby) 57 def countdown
n = 10 while n > 0 printf("T-minus %d\n", n) n -= 1 end print "Fizzle...\n" end countdown # Run the above function

Copyright (C) 2007, http://www.dabeaz.com 3- Function Deﬁnition • Deﬁning a
function is actually an assignment in the environment 58 statements variables 'countdown' def countdown n = 10 while n > 0 printf("T-minus %d\n", n) n -= 1 end print "Fizzle...\n" end statement statement statement ... • The "value" of a function is the list of statements inside the body of the function

Copyright (C) 2007, http://www.dabeaz.com 3- Function Definition • Functions are
like data in dynamic languages • In fact, they can be redefined on-the-fly just like variables 59 • You can even redefine a function in the middle of running your program (try that in C++)

Copyright (C) 2008, http://www.dabeaz.com 3- Example (Ruby) 60 def countdown
n = 10 while n > 0 printf("T-minus %d\n", n) n -= 1 end print "Fizzle...\n" end countdown # Run the above function def countdown print "Boom!\n" end countdown # Run the new function
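The same redefinition trick works in Python. This is a Python 3 sketch with return values instead of printing so the effect is easy to inspect. The variable names `first`, `second`, `third` and `saved` are made up for the example.

```python
def countdown():
    return "Fizzle..."

first = countdown()        # runs the original body

def countdown():           # rebinds the same name to a new function
    return "Boom!"

second = countdown()       # the name now refers to the new body

# Functions are ordinary values, so plain assignment works too
saved = countdown
countdown = lambda: "Replaced again"
third = countdown()

print(first, second, third, saved())
```

The old function object is not destroyed by the redefinition as long as something (here `saved`) still refers to it.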

Copyright (C) 2007, http://www.dabeaz.com 3- Function Execution • What happens when you call a function?
    def countdown
        n = 10
        while n > 0
            printf("T-minus %d\n", n)
            n -= 1
        end
        print "Fizzle...\n"
    end
    statement
    statement
    countdown
    statement
    statement
• Control passes to the first function statement
• After the function is done, you go back to the statement after the function call

Copyright (C) 2007, http://www.dabeaz.com 3- Function Execution • Each function
call creates a new environment 62 statements variables 'countdown' statement statement statement ... statement statement countdown statement statement statements variables 'n' : 10 n = 10 while n > 0 printf("T-minus %d\n", n) n -= 1 end print "Fizzle...\n" countdown

Copyright (C) 2007, http://www.dabeaz.com 3- Function Execution • Because every
function call creates a new environment, everything that happens inside a function stays localized • A function can freely create new variables and modify its own environment • These changes don't affect anything else • The environment is destroyed when the function returns 63
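One way to see the private environment in Python 3 is to return a snapshot of the local variable table with the built-in `locals()`. This is only a sketch. The global `n` is an assumption added to show that it is left alone.

```python
def countdown():
    n = 10            # 'n' lives only in this call's environment
    while n > 0:
        n -= 1
    return locals()   # snapshot of the local variable table

n = 99                # a global that happens to share the name
table = countdown()

print(table)          # the function's own 'n'
print(n)              # the global is untouched
```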

Copyright (C) 2007, http://www.dabeaz.com 3- Problem • If a function
executes in its own private environment, how do you get data in and out of the environment? • Passing parameters to a function • Returning results from a function 64

Copyright (C) 2007, http://www.dabeaz.com 3- Function Arguments • To pass
data into a function, use arguments def square(x) return x*x end 65 • However, an argument doesn't receive a value until the function is actually called a = square(3) argument • Arguments represent incoming values that will be bound to names when the function runs

Copyright (C) 2007, http://www.dabeaz.com 3- Function Arguments 66 statements variables
'x' : 3 return x*x square • Arguments get placed into the environment created for the function call def square(x) return x*x end square(3)

Copyright (C) 2007, http://www.dabeaz.com 3- Function Returns • To return
data from a function, use return def square(x) return x*x end 67 Return value • It is up to the caller to save the result (using assignment) statements variables 'r' : 9 r = square(3)

Copyright (C) 2007, http://www.dabeaz.com 3- Commentary • On the surface,
passing and returning values seems like it should be straightforward • However, there are a number of subtle issues that come up • Where do arguments get evaluated? • How do arguments get passed? 68

Copyright (C) 2007, http://www.dabeaz.com 3- Argument Evaluation 69 statements variables
'a' : 3 'b' : 4 'square' : <func> a = 3 b = 4 square(a+b) • Consider the following Must evaluate this expression • When calling a function, the arguments are usually fully evaluated ﬁrst. • Known as "Applicative Evaluation Order"

Copyright (C) 2007, http://www.dabeaz.com 3- Argument Passing • Question :
How do the values get passed into a function? 70 statements variables 'x' 'y' func(x,y) value1 value2 def func(a,b) statements variables 'a' 'b' statement statement statement ? ? ? ?

Copyright (C) 2007, http://www.dabeaz.com 3- Pass by Value • Function
arguments are copies 71 statements variables 'x' 'y' func(x,y) value1 value2 def func(a,b) statements variables 'a' 'b' statement statement statement value1 value2 copy

Copyright (C) 2007, http://www.dabeaz.com 3- Pass by Reference • Function
arguments are references 72 statements variables 'x' 'y' func(x,y) value1 value2 def func(a,b) statements variables 'a' 'b' statement statement statement

Copyright (C) 2007, http://www.dabeaz.com 3- Discussion • Pass by reference
is often preferred because it is the most efficient way to pass containers (lists and hashes) to functions
• For example, if you have a list with a million entries in it, you don't want to make a copy
• However, be aware that modifications to an argument will affect the caller.

Copyright (C) 2007, http://www.dabeaz.com 3- Reference Example • Modifications to
mutable data types (e.g., lists, dicts) will be reflected in the original object--arguments are not copies. 74 def insert_sorted(s,val): for i,x in enumerate(s): if x > val: s.insert(i,val) break else: s.append(val) a = [10, 15, 50] insert_sorted(a,27) # a = [10, 15, 27, 50] Modifies the passed object
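A related subtlety, sketched in Python 3: changing the object that an argument refers to is visible to the caller, but rebinding the argument name to a new object is not. The function names here are hypothetical.

```python
def append_item(items, value):
    items.append(value)      # mutates the object the caller passed in

def rebind(items, value):
    items = items + [value]  # builds a new list; only the local name changes
    return items

a = [10, 15, 50]
append_item(a, 27)           # a is modified in place
b = rebind(a, 99)            # a is left alone; b is a new list

print(a)
print(b)
```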

Copyright (C) 2008, http://www.dabeaz.com 3- More on the Environment 75
• Recall : All statements execute in an environment that holds variables • A thorny question : Are statements able to access variables that have been deﬁned in other environments? • For example, can a function access variables that were deﬁned outside of the function? (e.g., globals)

Copyright (C) 2008, http://www.dabeaz.com 3- An Example 76 x =
42 def foo y = 2*x x = 37 bar end def bar print x print y end foo

Question 1: Does the "x" in "y = 2*x" (inside foo) refer to the global value?

Question 2: Does the assignment "x = 37" (inside foo) modify the global value?

Question 3: Does the "y" printed in bar refer to the "y" in foo (the caller)?

Copyright (C) 2008, http://www.dabeaz.com 3- Lexical Scope 80 • Most
programming languages deal with these questions using two-level "lexical scoping" • General idea : All variables either live in a "local" space or they live in a "global" space as determined by the structure of the source code. • Globals are the variables deﬁned outside of function bodies • Locals are the variables deﬁned inside functions

Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 81 • When
a program starts, there is an empty global environment x = 42 def foo y = 2*x x = 37 bar end def bar print x print y end foo start statements variables

Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 82 • Statements
start populating the environment x = 42 def foo y = 2*x x = 37 bar end def bar print x print y end foo statements variables 'x' : 42

start populating the environment x = 42 def foo y = 2*x x = 37 bar end def bar print x print y end foo statements variables 'x' : 42 'foo' : <func>

start populating the environment x = 42 def foo y = 2*x x = 37 bar end def bar print x print y end foo statements variables 'x' : 42 'foo' : <func> 'bar' : <func>

Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 85 • Now
consider a function call: x = 42 def foo y = 2*x x = 37 bar end def bar print x print y end foo statements variables 'x' : 42 'foo' : <func> 'bar' : <func> Call a function

Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 86 • Function
calls create a new environment globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables foo : environment • Globals is the variable table from the global environment (previous slide)

Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 87 • Two-level
variable lookup globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables foo : environment x? x? • When looking up a value, look in the variable table of the local environment ﬁrst • If not found, look in globals (as a fallback)

Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 88 • Variable
assignment globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 foo : environment • Assignment puts a new value in the locals

assignment globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 foo : environment • But, what happens here? • Notice : There is nothing in the assignment statement that indicates where it goes x? x?

Copyright (C) 2008, http://www.dabeaz.com 3- Discussion 90 • Should assignment
overwrite previous values? globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 foo : environment • Is that a good idea or not? 37 ?

Copyright (C) 2008, http://www.dabeaz.com 3- Assignment (Revisited) 91 • Consider
this code fragment def foo() x = 42 y = 37 ... end • If reading this code, most programmers will interpret those variables as locals • It would be pretty damn weird if the behavior changed depending on whether or not someone deﬁned a global with those names

assignment should be local globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 'x' : 37 foo : environment • Python and Ruby operate like this • However, it's not so clear cut • Let's continue for now...

Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 93 • New
function calls create more environments globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 'x' : 37 foo : environment print x print y statements variables bar : environment

lookup (reprise) globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 'x' : 37 foo : environment print x print y statements variables bar : environment x? x? Variable lookup always looks in just two places

lookup (reprise) globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 'x' : 37 foo : environment print x print y statements variables bar : environment y? y? A name error. "y" is not deﬁned.
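The two-level lookup can be tried out in Python 3 with a cut-down version of the running example. One deliberate change: the `x = 37` line is left out, because in Python an assignment anywhere in foo makes x local for the whole function, so `y = 2*x` would fail with UnboundLocalError. The `seen` list is a hypothetical helper that records what bar manages to look up.

```python
x = 42
seen = []

def foo():
    y = 2 * x            # 'x' is not local to foo, so the global 42 is used
    bar()
    return y

def bar():
    seen.append(x)       # found in globals
    try:
        seen.append(y)   # neither in bar's locals nor in globals
    except NameError:
        seen.append("NameError")

result = foo()
print(result, seen)
```

bar never sees the y that belongs to its caller, which is the name error the slide describes.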

Copyright (C) 2008, http://www.dabeaz.com 3- Assignment Problems 96 • Two-level
scoping is an effective way of managing variables, but it has a problem • There is always this distinction between the local space and global space • Sometimes you want to assign values in either one of those spaces • Sometimes you actually do want to change a global variable.

Copyright (C) 2008, http://www.dabeaz.com 3- Assignment Problems 97 • The
management of the local/global variable space is one area where dynamic languages can get messy • Extreme peril in certain cases

Copyright (C) 2008, http://www.dabeaz.com 3- Global Peril 98 • A
Little Perl Experiment $x = 19; sub foo() { $x = 42; $y = 37; } foo(); print "$x\n"; # Produces 42 print "$y\n"; # Produces 37 • All variable assignments are global--including those inside the subroutine! • Yikes! This violates everything I covered!

Copyright (C) 2008, http://www.dabeaz.com 3- Global Peril (Part 2) 99
• The same experiment in Javascript x = 19; function foo() { x = 42; y = 37; } foo(); document.writeln(x); // Produces 42 document.writeln(y); // Produces 37 • Okay, I'm just a little disturbed (and you should be too)

Copyright (C) 2008, http://www.dabeaz.com 3- Digression 100 • The problem
here is that the syntax for assignment doesn't say anything about where a value is supposed to be stored x = 37 • No clear way to indicate that it's a local or a global variable • So, a language will do whatever its designers felt like it should do.

Copyright (C) 2008, http://www.dabeaz.com 3- Declarations 101 • In static
languages, this problem is solved through the use of "declarations" which precisely pin down the location int x; // Global void foo() { int y; // Local y = 2*x; // No issues here x = 37; } • Fine, but in dynamic languages you don't declare datatypes

Copyright (C) 2008, http://www.dabeaz.com 3- Scope "Tagging" 102 • A
common approach : Require variables with non-local scope to be tagged x = 19 def foo(): global x # The x below is global x = 42 y = 37 foo() print x # Produces 42 print y # NameError : y not defined • This approach is used by Python, PHP, Tcl

Copyright (C) 2008, http://www.dabeaz.com 3- Variable Declarators 103 • Another
approach : Allow variables to have optional scope "declarators"
• This is an approach used by Perl
• Except there is more to this (in a minute)
    $x = 19;
    sub foo() {
        local $x = 42;
        local $y = 37;
    }
    foo();
    print "$x\n";    # Produces 19
    print "$y\n";    # Produces nothing

Copyright (C) 2008, http://www.dabeaz.com 3- Variable Declarators 104 • In
Javascript, variables live where they are formally declared using "var"
• If you leave off the var when defining, the variable is just global
    var x = 19;             // A global
    function foo() {
        x = 42;
        var y = 37;         // A local
    }
    foo();
    document.writeln(x);    // 42
    document.writeln(y);    // Nothing (ReferenceError: y is not defined)

Copyright (C) 2008, http://www.dabeaz.com 3- Scope Symbols 105 • Ruby
prepends variables with a special symbol to tell you where it's located
• Here, you just look at the variable and you know where it lives
    Name      # A constant
    name      # A local variable
    $name     # A global variable
    @name     # An instance variable (objects)
    @@name    # A class variable (objects)
• Example:
    $x = 19
    def foo
        $x = 42    # A global
        y = 37     # A local
    end

Copyright (C) 2008, http://www.dabeaz.com 3- Perl Quiz Show 106 •
Just when you thought you were safe, consider this bit of Perl code
    $x = 42;
    sub foo() {
        local $x = 37;
        bar();
    }
    sub bar() {
        print "$x\n";
    }
    bar();    # Prints 42
    foo();    # Prints 37 (?!?!?!?!?!?!)
• This is an example of "Dynamic Scope"

Copyright (C) 2008, http://www.dabeaz.com 3- Perl Dynamic Scope 107 •
Variables bind to the nearest deﬁnition on the function call stack $x = 42; sub foo() { local $x = 37; bar(); } sub bar() { print "$x\n"; } foo(); # Prints 37 globals $x = 42; foo() local $x = 37; bar() print "$x\n"; • This can get absolutely diabolical!
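Python itself is not dynamically scoped, but the lookup rule can be simulated. The sketch below (Python 3) keeps an explicit stack of binding tables and searches it from the most recent call downward. Everything in it (`stack`, `lookup`, `call`) is hypothetical scaffolding for illustration, not how Perl is implemented.

```python
# Each "call" pushes a frame of local bindings; lookup walks the call
# stack from the most recent frame down to the globals.
stack = [{"x": 42}]          # bottom frame plays the role of the globals

def lookup(name):
    for frame in reversed(stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def call(func, local_bindings=None):
    """Run func with a new frame holding the given 'local' bindings."""
    stack.append(dict(local_bindings or {}))
    try:
        return func()
    finally:
        stack.pop()          # bindings vanish when the call returns

def bar():
    return lookup("x")

def foo():
    return call(bar)         # bar sees whatever x its callers bound

direct = call(bar)                   # like calling bar() at top level
via_foo = call(foo, {"x": 37})       # like: local $x = 37; bar();

print(direct, via_foo)
```

The same bar returns a different x depending on who called it, which is exactly why this can get diabolical.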

Copyright (C) 2008, http://www.dabeaz.com 3- Perl Lexical Scope 108 •
Allows a variable to only exist in the block where it was defined
    $x = 42;
    sub foo() {
        my $x = 37;    # Local variable
        bar();
    }
    sub bar() {
        print "$x\n";
    }
    bar();    # Prints 42
    foo();    # Prints 42
• This takes us back to two-level scoping

Copyright (C) 2008, http://www.dabeaz.com 3- Interlude 109 • Clearly there
is quite a bit more to variable assignment than meets the eye • Especially related to where a value lives • The lack of formal declarations creates various sorts of chaos • Read the manual! • Ignore at your own peril

Copyright (C) 2008, http://www.dabeaz.com 3- The Dangers of Functions 110
• In dynamic languages, the usage of functions can be very free and easy • Especially compared to C++/Java

Copyright (C) 2007, http://www.dabeaz.com 3- Conformance Checking • There is
usually no checking or validation of function arguments. • A function will work on any data that is compatible with the statements in the function def add(x,y): return x + y add(3,4) # 7 add("Hello","World") # "HelloWorld" add([1,2],[3,4]) # [1,2,3,4] • Example (Python): 111

Copyright (C) 2007, http://www.dabeaz.com 3- Conformance Checking • There is
also rarely any checking of return values. • Inconsistent use does not result in an error def foo(x,y): if x: return x + y else: return • Example: 112 Inconsistent use of return (not checked)

Copyright (C) 2007, http://www.dabeaz.com 3- Conformance Checking • If there
are errors in a function, they will show up at run time (as an exception) def add(x,y): return x+y >>> add(3,"hello") Traceback (most recent call last): ... TypeError: unsupported operand type(s) for +: 'int' and 'str' >>> • Example: 113
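A short Python 3 sketch that runs the same `add` function against several kinds of arguments and records what happens. The `results` list is an assumption added so that the run-time failure can be seen next to the calls that work.

```python
def add(x, y):
    return x + y

results = []
for args in [(3, 4), ("Hello", "World"), ([1, 2], [3, 4]), (3, "hello")]:
    try:
        results.append(add(*args))
    except TypeError:
        # Nothing was checked up front; the error only shows up here
        results.append("TypeError")

print(results)
```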

Copyright (C) 2008, http://www.dabeaz.com 3- Part 6 114 Iteration, Generation,
and Code Blocks

Copyright (C) 2007, http://www.dabeaz.com 3- Iteration (Revisited) • The process
of looping over data is a very common programming operation names = [ "Dave", "Leo", "Nita" ] for name in names: print "Hello", name • Example: 115 • This is something programmers use all of the time without even thinking about it

Copyright (C) 2007, http://www.dabeaz.com 3- Controlling Iteration • Some dynamic
languages have the ability to turn iteration itself into some kind of "object" that you can manipulate 116 • Example : Generator Functions in Python • Example : Code blocks in Ruby • Disclaimer : This is an advanced topic---I'm just going to cover the basics now

Copyright (C) 2007, http://www.dabeaz.com 3- Generator Functions • A function
that, instead of returning a single value, stays alive and generates a sequence of results def countdown(n): print "Counting down!" while n > 0: yield n # Yield a value n -= 1 >>> for i in countdown(5): ... print i, ... Counting down! 5 4 3 2 1 >>> 117 • Example:

Copyright (C) 2007, http://www.dabeaz.com 3- Generator Functions • Generator functions
are pretty odd
• If you call one, it doesn't seem to do anything
    >>> c = countdown(5)
    >>>
• However, if you call .next() on the result, you'll see it start to run
    >>> c.next()
    Counting down!
    5
    >>> c.next()
    4
    >>>

Copyright (C) 2007, http://www.dabeaz.com 3- Generator Implementation • It's like
a normal function except that the "environment" is an object with a method that triggers statement execution. 119 print "Counting down!" while n > 0: yield n n -= 1 statements variables 'n' : 5 countdown : environment suspended

Copyright (C) 2007, http://www.dabeaz.com 3- Generator Implementation • Calling .next()
runs statements until you reach a yield statement. That pops a value out of the function. 120 print "Counting down!" while n > 0: yield n n -= 1 statements variables 'n' : 5 countdown : environment .next() 5

Copyright (C) 2007, http://www.dabeaz.com 3- Generator Implementation • After yielding,
the function suspends again 121 print "Counting down!" while n > 0: yield n n -= 1 statements variables 'n' : 5 countdown : environment suspended

Copyright (C) 2007, http://www.dabeaz.com 3- Generator Implementation • On the
following .next(), it wakes up and continues where it left off until the next yield statement is encountered 122 print "Counting down!" while n > 0: yield n n -= 1 statements variables 'n' : 4 countdown : environment .next() 4 • This continues until there are no more statements
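The suspend/resume cycle can be stepped through by hand. This sketch uses Python 3, where the built-in `next(c)` replaces the `c.next()` method shown on the slides. The variable names are made up for the example.

```python
def countdown(n):
    print("Counting down!")
    while n > 0:
        yield n             # suspend here and hand n to the caller
        n -= 1

c = countdown(3)            # nothing printed yet: the body has not started
first = next(c)             # runs to the first yield
second = next(c)            # resumes after the yield, loops, yields again
rest = list(c)              # drains whatever values remain

try:
    next(c)                 # no statements left to run
    finished = False
except StopIteration:
    finished = True

print(first, second, rest, finished)
```

Running off the end of the function is signalled with a StopIteration exception, which for-loops catch for you.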

Copyright (C) 2007, http://www.dabeaz.com 3- Using Generators • Generators separate
the concept of iteration from code that uses the iteration 123 for i in countdown(5): print i, for i in countdown(5): print "T-minus", i for i in countdown(5): os.system("rm img%d.png" % i) Iteration Code block that uses the iteration • We will talk more about this in a later class

Copyright (C) 2007, http://www.dabeaz.com 3- Anonymous Code Blocks • Instead
of turning iteration into an object, Ruby ﬂips the whole thing around and allows code blocks to be turned into objects. 124 def countdown(n) print "Counting down!\n" while n > 0 yield n n -= 1 end end countdown(5) { |i| puts i } Counting down! 5 4 3 2 1 Code block

Copyright (C) 2007, http://www.dabeaz.com 3- Anonymous Code Blocks • Here,
a block of code gets packaged into an object 125 countdown(5) { |i| puts i } puts i • That object is then passed into a function as part of the environment

Copyright (C) 2007, http://www.dabeaz.com 3- Anonymous Code Blocks • The
function runs normally until the yield statement is reached print "Counting down!" while n > 0 yield n n -= 1 end statements variables 'n' : 5 <block> countdown : environment |i| puts i

Copyright (C) 2007, http://www.dabeaz.com 3- Anonymous Code Blocks • Yield
then produces a value that's fed into the code block which runs print "Counting down!" while n > 0 yield n n -= 1 end statements variables 'n' : 5 <block> countdown : environment |i| puts i • The code block then executes in the environment where it was defined!

Copyright (C) 2008, http://www.dabeaz.com 3- Anonymous Code Blocks • Once
the code block runs, you go back to statements in the current function print "Counting down!" while n > 0 yield n n -= 1 end statements variables 'n' : 5 <block> countdown : environment |i| puts i • This continues until no more statements

Copyright (C) 2008, http://www.dabeaz.com 3- Implementation Details • It's important
to realize that there is an environment switch going on here! sum = 0 countdown(5) { |i| sum += i } • When the code block gets executed, it runs in the outer environment---not in the environment of the countdown function • Note: This is related to "closures" (which we will cover in a few weeks)
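Python has no code blocks, but a rough analogue of the environment switch can be sketched in Python 3 by passing a nested function as a callback. The names `block`, `add` and `sum_countdown` are hypothetical, and `nonlocal` is what lets the callback update a variable in the environment where it was defined.

```python
def countdown(n, block):
    while n > 0:
        block(n)             # plays the role of Ruby's "yield n"
        n -= 1

def sum_countdown(limit):
    total = 0
    def add(i):
        nonlocal total       # 'total' lives in sum_countdown, not in countdown
        total += i
    countdown(limit, add)
    return total

print(sum_countdown(5))
```

The callback runs while countdown is executing, yet it reads and writes `total` from the place it was written, not from countdown's environment.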

Copyright (C) 2008, http://www.dabeaz.com 3- Commentary • These exotic features
related to iteration are really just fancy tricks involving the execution environment • Generator : A function that can suspend itself and emit a value from its environment • Code block : A chunk of code that you can run, but which executes in the environment where it was deﬁned. 130

Copyright (C) 2008, http://www.dabeaz.com 3- Part 7 131 Exceptions and
Errors

Copyright (C) 2008, http://www.dabeaz.com 3- Error Handling • If you
write a program and it encounters an error, it normally aborts with some kind of traceback 132 >>> prices['SCOX'] Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 'SCOX' >>> • Errors usually have a "type" and some informative diagnostics. • Programs can raise and catch errors

Copyright (C) 2007, http://www.dabeaz.com 3- Exceptions (Python) • Catching an
exception (try-except) • Raising an exception (raise) raise RuntimeError("Name not found") try: statements except RuntimeError,e: print e 133

Copyright (C) 2007, http://www.dabeaz.com 3- Exceptions (Ruby) • Catching an
exception (begin - rescue) • Raising an exception (raise) raise RuntimeError,"Name not found" begin statements rescue RuntimeError => e puts e end 134 • Exceptions are very similar in most languages • So, I will focus on Python.

Copyright (C) 2007, http://www.dabeaz.com 3- Exceptions • Exceptions propagate to
ﬁrst matching except def foo(): try: bar() except RuntimeError,e: ... def bar(): try: spam() except RuntimeError,e: ... def spam(): grok() def grok(): ... raise RuntimeError("Whoa!") 135
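The propagation rule can be checked with a runnable version of the slide's call chain. This sketch is Python 3, so it uses `except RuntimeError as e` in place of the older `except RuntimeError,e` form. The `trace` list is an assumption added to record which handler fires.

```python
trace = []

def foo():
    try:
        bar()
    except RuntimeError as e:
        trace.append("foo caught: %s" % e)

def bar():
    try:
        spam()
    except RuntimeError as e:
        trace.append("bar caught: %s" % e)

def spam():
    grok()                   # no handler here; the exception passes through

def grok():
    raise RuntimeError("Whoa!")

foo()
print(trace)
```

The handler in bar is the first match on the way back up the call stack, so the one in foo never runs.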

Copyright (C) 2007, http://www.dabeaz.com 3- Exception Types • Exceptions usually
have some kind of name ArithmeticError AssertionError EnvironmentError EOFError ImportError IndexError KeyboardInterrupt KeyError MemoryError NameError ReferenceError RuntimeError SyntaxError SystemError TypeError ValueError 136 • Consult reference

Copyright (C) 2007, http://www.dabeaz.com 3- Exception Values • Most exceptions
have an associated value • More information about what's wrong raise RuntimeError("Invalid user name") • Passed to variable supplied in except try: ... except RuntimeError,e: ... • Commonly a string, but may be any object 137

Copyright (C) 2007, http://www.dabeaz.com 3- Catching Multiple Errors • Can
catch different kinds of exceptions try: ... except LookupError,e: ... except RuntimeError,e: ... except IOError,e: ... except KeyboardInterrupt,e: ... 138 • Just chain together except clauses

Copyright (C) 2007, http://www.dabeaz.com 3- Catching All Errors • Just
leave off the exception type try: statements except: print "An error occurred" 139 • This isn't the best programming style (can be hellish to debug)

Copyright (C) 2007, http://www.dabeaz.com 3- Reraising an Exception • It
is usually possible to catch and re-raise the last exception try: ... except: print "An error occurred" raise # Re-raise the last error 140

Copyright (C) 2007, http://www.dabeaz.com 3- finally statement • Specifies code
that must run regardless of whether or not an exception occurs
    f = open("foo","r")
    try:
        ...
    finally:
        f.close()    # Close file
• Commonly used to properly manage resources (especially locks, files, etc.)
• In Ruby, this is the "ensure" clause
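A small Python 3 sketch showing that the finally block runs on both the normal path and the error path. No real file is opened. The `events` list and the `process` function are hypothetical stand-ins for acquiring and releasing a resource.

```python
events = []

def process(fail):
    events.append("open")            # stand-in for acquiring a resource
    try:
        if fail:
            raise ValueError("bad data")
        events.append("work")
    finally:
        events.append("close")       # runs whether or not an error occurred

process(False)
try:
    process(True)
except ValueError:
    events.append("error seen by caller")

print(events)
```

In the failing call, "close" is recorded before the exception reaches the caller.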

Copyright (C) 2007, http://www.dabeaz.com 3- Restartable Blocks • It is
not always possible to retry code that generated an error • For example, in Python, execution always resumes after the try-except block • Ruby has a retry statement begin statements rescue # Determine if we can retry retry end 142
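Since Python has no retry statement, the usual workaround is a loop around the try-except. The sketch below (Python 3) is one way to do it. The `retrying` helper, the `flaky` function and the attempt limit are all assumptions made for the example.

```python
def retrying(operation, attempts=3):
    """Call operation(), trying again on RuntimeError up to 'attempts' times."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except RuntimeError as e:
            last_error = e           # remember the failure and loop again
    raise last_error                 # every attempt failed

counter = {"calls": 0}

def flaky():
    counter["calls"] += 1
    if counter["calls"] < 3:
        raise RuntimeError("try again")
    return "ok"

outcome = retrying(flaky)
print(outcome, counter["calls"])
```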

Copyright (C) 2008, http://www.dabeaz.com 3- Part 8 143 Eval and
Exec

Copyright (C) 2007, http://www.dabeaz.com 3- Executing Code Strings • Since
most dynamic languages are interpreted, they typically have the ability to run their own code given as a string 144 s = """ for i in range(10): print "i =", i """ exec(s) x = eval("3 + 20/5") • The exact syntax varies

Copyright (C) 2007, http://www.dabeaz.com 3- Executing Code Strings • Executing
code strings is inherently a frightening concept 145 • The usual semantics is for the code string to execute as if it were typed directly into the program at the point of eval/exec statement • However, it is sometimes possible to execute strings in their own environment

Copyright (C) 2007, http://www.dabeaz.com 3- Executing Code Strings • Executing
code in a custom environment 146 v = { 'x' : 3, 'y' : 4 } a = eval("x+y",v) --> 7 • There are many opportunities for serious magic here.
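The same idea in Python 3, where both `eval` and `exec` accept a dictionary to use as the environment. The `ns` dictionary and the code strings are made up for the example.

```python
v = {'x': 3, 'y': 4}
a = eval("x + y", v)          # names in the string resolve in v

ns = {}
exec("z = 6 * 7", ns)         # the assignment lands in ns

print(a, ns['z'])
```

Variables created by the code string end up in the supplied dictionary, so the string can be kept away from the program's own variables. This does not make it safe to run untrusted strings.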

Copyright (C) 2007, http://www.dabeaz.com 3- Wrap-up • This section has
really focused on the structure and control-ﬂow of programs • The really big issues: • Statement execution environment • Global/local environment distinction • What happens during function calls • Will return to more of this later. 148

Copyright (C) 2008, http://www.dabeaz.com 4- Files, Modules, and Object Oriented
Programming Section 4 2

Copyright (C) 2008, http://www.dabeaz.com 4- Introduction 3 • In this
section, we look at the problem of how to put larger programs together • Organizing programs into modules • Introduction to "objects" • Object oriented programming

Copyright (C) 2008, http://www.dabeaz.com 4- Background 4 • The primary
focus of this section is not on how to hack code (e.g. loops, variables, functions, algorithms, etc.) • It's more related to software engineering • How do you organize a million line program? • How do you make programs extensible? • How do you make programs maintainable?

Copyright (C) 2008, http://www.dabeaz.com 4- Signs and Portents 5 •
You have already been working with "objects" import csv reader = csv.reader(open("portfolio.dat")) a = [1,2,3] # A list (object) a.append(42) # Append to a list (a method) • Have probably encountered modules as well • However, you may not have thought much about why objects and modules behave the way that they do.

Copyright (C) 2008, http://www.dabeaz.com 4- Historical Perspective 6 • The
engineering aspects of developing software have been known for a long time • Currently, most of the work in this area is found under the banner of "Object Oriented Programming" • However, it is a lot like religion. There might be something redeeming about it, but it's not always clear what people are talking about.

Required Reading

• If you can own just one book on software engineering...
• F. Brooks, "The Mythical Man-Month"
• And when you're done, read T. Kidder, "The Soul of a New Machine."

Copyright (C) 2008, http://www.dabeaz.com 4- Our Focus 8 • This
class isn't about software project management (at least not directly) • Instead, we're going to focus a little bit on how object oriented programming came into existence • A lot of background on why people work with objects to begin with

Copyright (C) 2008, http://www.dabeaz.com 4- An Experiment 9 • We're
later going to go explore Smalltalk • One of the earliest OO languages • By far, the most inﬂuential language on the development of later dynamic languages such as Ruby and Python. • Disclaimer : I might crash and burn with this. (Don't say I didn't warn you).

Copyright (C) 2008, http://www.dabeaz.com 4- Part 1 10 Programming in
the Large

Copyright (C) 2008, http://www.dabeaz.com 4- What is a Program? 11
• A program is a series of statements • There are many different types of statements • Assignment, conditions, loops, exception handling, function calls, function deﬁnition, etc. • But what happens as a program grows?

Copyright (C) 2008, http://www.dabeaz.com 4- Program Growth 12 • Problem
: Editing the source code • As a program grows in size, the program will start to become quite large in the editor. • You don't want to edit a 100000 line program that's been typed into one big ﬁle • As a practical matter, programmers don't like to edit ﬁles that are much longer than a few thousand lines.

Copyright (C) 2008, http://www.dabeaz.com 4- Multiple Files 13 • Large
programs involve multiple source files • Generally, you break it up by putting related functionality into the same file • However, this introduces a variety of new problems related to file management • Example : Separate compilation in C • Example : Management of the global namespace

Copyright (C) 2008, http://www.dabeaz.com 4- Separate Compilation /* foo.c */
extern int bar(int); void foo(int n) { ... x = bar(n); ... } 14 • If you split across ﬁles, you have to have some way to reference deﬁnitions in other locations (e.g., "extern") /* bar.c */ int bar(int n) { ... statements ... } foo.c bar.c

Copyright (C) 2008, http://www.dabeaz.com 4- Global Namespace 15 • In
addition, there is the question of how global symbols are managed. • For example, do all of the variable and function names have to be distinct? • In C/C++, the answer is generally "yes."

Copyright (C) 2008, http://www.dabeaz.com 4- The Naming Problem 16 •
If everything exists in the same global space, the problem of picking names becomes increasingly difﬁcult as the program grows • An added problem arises if you start using programming libraries • Those libraries also need to pick unique names

Copyright (C) 2008, http://www.dabeaz.com 4- Name Prefixing 17 • One
solution : Name prefixing def Foo_bar() ... end def Foo_spam() ... end def Foo_grok() ... end • Group related functionality under a common name prefix (very common in C)

Copyright (C) 2008, http://www.dabeaz.com 4- Other Problems 18 • Extensibility
: How do you make it easy to extend a program with new features? • Code reuse : How do you make it easy to reuse parts of a program that you have already written? • Modularization : How do you make it easier to divide a program into pieces that many people can work on?

Copyright (C) 2008, http://www.dabeaz.com 4- Extensibility 19 • Suppose you
have written a big application that has to read data def read_data(source) ... end Mondo Application • And suppose there is a single function that reads input data (e.g., read_data)

Copyright (C) 2008, http://www.dabeaz.com 4- Extensibility Example 20 • Now,
modify the application to support reading data from the following file formats • CSV files • From a relational database • XML files • Scraped off HTML pages • Excel spreadsheets

Copyright (C) 2008, http://www.dabeaz.com 4- Extensibility Example 21 • To
implement this, you might implement many different versions of the functions: def read_data_csv(name) ... def read_data_xml(name) ... def read_data_db(name) ... def read_data_html(name) ... • But now, you have the problem of plugging these functions into the larger program.

Extensibility Example

• One solution : A dispatch function

    data_format = "xml"
    ...
    def read_data(name)
        if data_format == 'csv'
            d = read_data_csv(name)
        elif data_format == 'xml'
            d = read_data_xml(name)
        elif data_format == 'db'
            d = read_data_db(name)
        elif data_format == 'html'
            d = read_data_html(name)

• Yow! How is this going to scale up in a huge programming project? (hint : It's not!)

Copyright (C) 2008, http://www.dabeaz.com 4- Code Reuse 23 • Perhaps
you have implemented some general purpose functionality def read_data_xml(source) ... # General purpose code to parse XML statements ... # Specific statements to process data statements ... • Maybe you want to re-use the more general parts of this code in other places • Question : How?

Modularization

• In large projects, there is a benefit to breaking a program up into small self-contained modules
• Each module can be maintained separately
• Often by different groups of programmers
• To make it work, you really have to think about the boundaries between modules (interfaces, versions, etc.)

Copyright (C) 2008, http://www.dabeaz.com 4- Components 25 • Efforts to
modularize code typically lead to the use of software "components" • Components are self-contained and have a well-deﬁned programming interface (API) foo API • Applications constructed mostly by assembling and gluing components together

Copyright (C) 2008, http://www.dabeaz.com 4- Components 26 • In the
real world, software components may be written in entirely different languages • There is a whole industry surrounding the use of components • Example : COM, Active-X, etc. on Windows • This is also a major reason why people are using dynamic languages

Copyright (C) 2008, http://www.dabeaz.com 4- Part 2 27 Fun with
Source Files

Copyright (C) 2008, http://www.dabeaz.com 4- Files 28 • When you
start working on a program, it often starts out as one source file • However, at some point, it reaches a point where you want to split it into two files • Splitting across files is probably the most fundamental division of source code. • It seems simple enough...

File Includes

• Most dynamic languages provide some kind of statement to load statements from another source code file
• Examples:

    execfile("foo.py")        # Python
    require 'foo.rb'          # Ruby
    require "foo.pl";         # Perl
    require("foo.php");       # PHP

Copyright (C) 2008, http://www.dabeaz.com 4- File Includes 30 • A
ﬁle include typically executes the statements in the ﬁle as if they had been typed at the point where the include statement was placed • However, there are still some tricky issues lurking underneath the covers # Here's my big application require 'funcs.src' require 'utils.src' ... statements ...

Multiple Includes

• Can a file be included more than once?
• require() is often a one-time operation.

    require 'foo.src'
    ... statements ...
    require 'foo.src'         # Ignored

• The one-time behavior is used to make programming libraries work correctly
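A one-time include can be sketched in a few lines of Python 3. The `require` function and the `_loaded` registry below are hypothetical names invented for this illustration; they are not how any particular language implements it.

```python
import os
import tempfile

_loaded = set()   # hypothetical registry of files that have already been loaded

def require(path, env):
    """Run the statements in `path` inside `env`, but only the first time."""
    full = os.path.abspath(path)
    if full in _loaded:
        return False                      # already loaded: ignore the request
    with open(full) as f:
        exec(compile(f.read(), full, "exec"), env)
    _loaded.add(full)
    return True

# Demo: a throwaway source file that counts how many times it gets executed.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "foo.src")
    with open(src, "w") as f:
        f.write("loads = loads + 1\n")
    env = {"loads": 0}
    first = require(src, env)     # executes the file
    second = require(src, env)    # ignored
    print(first, second, env["loads"])
```

Python's own `import` statement behaves in a similar spirit: modules that have been loaded are cached in `sys.modules`, and a later import of the same name reuses the cached module.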

Multiple Includes

• Example of library dependencies

    # foo.src
    require 'spam.src'
    ... statements ...

    # bar.src
    require 'spam.src'
    ... statements ...

    # Application
    require 'foo.src'
    require 'bar.src'         # Don't want this operation to load 'spam.src' a second time (already loaded before)
    ... statements ...

Specifying Files

• Specifying a filename is trickier than it looks

    require './foo.src'
    require '/users/beazley/Projects/foo.src'
    require 'C:\Documents and Settings\Projects\foo.src'

• As a general rule, it's bad practice to hard-code path names into a program (especially if it has to be moved around)
• Solution : PATH variables
• May be platform dependent

Path Settings

• Most languages will have some kind of internal variable that contains the list of search directories for file includes
• Example : Ruby ($: variable)

    irb(main):001:0> puts $:
    /usr/lib/ruby/site_ruby/1.8
    /usr/lib/ruby/site_ruby/1.8/powerpc-darwin8.0
    /usr/lib/ruby/site_ruby/1.8/universal-darwin8.0
    /usr/lib/ruby/site_ruby
    /usr/lib/ruby/1.8
    /usr/lib/ruby/1.8/powerpc-darwin8.0
    /usr/lib/ruby/1.8/universal-darwin8.0
    .
    => nil
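For comparison, Python keeps its search path in the list `sys.path`. A small Python 3 sketch (the appended directory name is a made-up example, not a real location):

```python
import sys

# sys.path is an ordinary list of directory names searched by "import".
for directory in sys.path:
    print(directory)

# Because it is just a list, a program can extend the search path at run time.
# The directory below is a hypothetical example.
sys.path.append("/tmp/my_project_libs")
```

The output depends entirely on the local installation, which is the point of the slide: the path is configuration, not something to hard-code.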

Copyright (C) 2008, http://www.dabeaz.com 4- Files to Directories 35 •
As a program continues to grow, you will reach a point where you want to split ﬁles across multiple directories Formats/ png.src gif.src jpg.src tiff.src Parsing/ html.src xml.src csv.src

Copyright (C) 2008, http://www.dabeaz.com 4- Packages 36 • A directory
of related ﬁles is sometimes known as a "package" Formats/ png.src gif.src jpg.src tiff.src • To install, you need to put the package directory on the ﬁle search path. • But packages don't always play nice...

Copyright (C) 2008, http://www.dabeaz.com 4- Conflicting File Names 37 •
Consider two packages of source code /Blah/ foo.src bar.src spam.src grok.src /Yow/ bar.src spam.src • What happens if both packages are on the file search path and they include the same filename? # foo.src require 'spam.src' # bar.src require 'spam.src' ??

Copyright (C) 2008, http://www.dabeaz.com 4- Commentary 38 • Virtually every
dynamic language has implemented ﬁle inclusion in some sort of slightly broken way • It's a problem that seems like it should be easy, but which is hard and sneaky • Solutions usually focus on making the loading of ﬁles more abstract and high level • Example : Packages in Java, Python, Perl, etc.

Example

• High-level module/package import

    import java.io.*;         // Java
    import os.path            # Python
    use blah;                 # Perl

• Here, the request to "import" code is not directly tied to low-level details concerning the file system
• You still worry about configuration, but it's a little more controlled

Copyright (C) 2008, http://www.dabeaz.com 4- Part 3 40 From Files
to Namespaces

Copyright (C) 2008, http://www.dabeaz.com 4- The Naming Problem 41 •
Even if you have multiple ﬁles, you still have issues with naming things • Recall that all statements execute inside an environment (that holds variables) • There is usually a global/local environment • Question : Do all ﬁles execute in the same global environment?

Copyright (C) 2008, http://www.dabeaz.com 4- Namespaces 42 • A namespace
is a named environment where program statements can execute • To break up a large program, different parts of the program can execute in different namespaces • This provides isolation between components • Namespace serves as a kind of "module"

Copyright (C) 2008, http://www.dabeaz.com 4- Namespace Example 43 • Consider
two different sets of statements x = 42 def square(y) return y*y end x = 10 def countdown(n) while n > 0 print "T-minus", n n -= 1 end print "Fizzle" end • If you do nothing, these statements live in the same space

Namespace Example

• Namespaces put statements into a named env

    namespace foo {
        x = 42
        def square(y)
            return y*y
        end
    }

    namespace bar {
        x = 10
        def countdown(n)
            while n > 0
                print "T-minus", n
                n -= 1
            end
            print "Fizzle"
        end
    }

• Note : Exact syntax varies widely for this

Namespace Example

• Namespaces define separate environments

    namespace foo {                     foo
        x = 42                            'x'      : 42
        def square(y)                     'square' : <func>
            return y*y
        end
    }

    namespace bar {                     bar
        x = 10                            'x'         : 10
        def countdown(n)                  'countdown' : <func>
            while n > 0
                print "T-minus", n
                n -= 1
            end
            print "Fizzle"
        end
    }

Accessing Namespaces

• Although namespaces are isolated, you still need access to data/functionality contained in other namespaces
• You need an access mechanism to cross module boundaries

Namespace Access

• Namespace selection syntax (::)

    foo                           bar
      'x'      : 42                 'x'         : 10
      'square' : <func>             'countdown' : <func>

    foo::x                        bar::x
    foo::square                   bar::countdown

Namespace Access

• Sometimes a namespace is implemented as some kind of data or "object" in the language

    print foo.x
    print bar.x
    bar.countdown(10)
    a = foo.square(3)

• Here, the "namespace" is something you can pass around and treat like data

    b = bar
    b.countdown(5)
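In Python, modules play this role. The Python 3 sketch below builds the `foo` and `bar` namespaces from the earlier slides as module objects created from strings, purely so the example fits in one file; in practice each would normally live in its own source file.

```python
import types

# Build two separate namespaces, each with its own environment.
foo = types.ModuleType("foo")
exec("""
x = 42
def square(y):
    return y * y
""", foo.__dict__)

bar = types.ModuleType("bar")
exec("""
x = 10
def countdown(n):
    while n > 0:
        print("T-minus", n)
        n -= 1
    print("Fizzle")
""", bar.__dict__)

print(foo.x)           # 42
print(bar.x)           # 10
a = foo.square(3)      # 9

b = bar                # the namespace is data: it can be assigned and passed around
b.countdown(3)
```

Both namespaces define `x`, yet the two values never collide because each lives in its own dictionary (`foo.__dict__` and `bar.__dict__`).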

Copyright (C) 2008, http://www.dabeaz.com 4- Commentary 49 • With namespaces,
you now have a mechanism for breaking code across ﬁles and isolating the execution of code to different environments • This is critical to the development of programming libraries and components • Library builders can isolate their code and have a reasonable assurance that it won't conﬂict with your code

Copyright (C) 2008, http://www.dabeaz.com 4- Part 4 50 From Namespaces
to "Objects"

Data Structures

• Programs need to work with data structures
• For example, a graphics program might have to work with shapes like Circles and Rectangles.
• Each shape will have some basic attributes

    /* C */
    struct Circle {
        double radius;
    };
    struct Rectangle {
        double width;
        double height;
    };

    # Python
    { 'radius' : 4 }
    { 'width' : 4, 'height' : 5 }

Copyright (C) 2008, http://www.dabeaz.com 4- Methods 52 • In addition,
there are functions that perform various operations on data • These are typically called "methods" • Some examples for shapes: • Compute the area • Compute the perimeter • Draw on the screen

Copyright (C) 2008, http://www.dabeaz.com 4- Problem 53 • How do
you bundle data structures and methods together in an effective way? • One approach : Use a namespace • Rationale : Namespaces keep code isolated. So, just put the functionality for each kind of data in a separate namespace.

Example : A Circle

• Here is some code for a Circle (Python)

    # Circle.py
    import math

    def new():
        c = { }
        return c

    def init(c,radius):
        c['radius'] = radius

    def area(c):
        return math.pi*c['radius']**2

    def perimeter(c):
        return 2*math.pi*c['radius']

Example : A Circle

• Reading Circle.py from top to bottom:
• The module itself is the namespace "Circle"
• new() : Create a container where we will store data related to the circle
• init() : Initialize a circle by storing some data (the radius) inside the container
• area(), perimeter() : Perform some kind of operation on a Circle

Example : Circle

• Here's how you would use the Circle

    >>> import Circle
    >>> c = Circle.new()
    >>> Circle.init(c,4)
    >>> Circle.area(c)
    50.26548245743669
    >>> Circle.perimeter(c)
    25.132741228718345
    >>>

• Notice how the namespace (Circle) is encapsulating all of the functionality related to circles

Example : A Rectangle

• Here is similar code for a Rectangle

    # Rectangle.py
    def new():
        r = { }
        return r

    def init(r,width,height):
        r['width'] = width
        r['height'] = height

    def area(r):
        return r['width']*r['height']

    def perimeter(r):
        return 2*(r['width']+r['height'])

Example : Shapes

• Example use

    >>> import Circle
    >>> import Rectangle
    >>> c = Circle.new()
    >>> Circle.init(c,4)
    >>> r = Rectangle.new()
    >>> Rectangle.init(r,4,5)
    >>> Circle.area(c)
    50.26548245743669
    >>> Rectangle.area(r)
    20
    >>> Circle.perimeter(c)
    25.132741228718345
    >>> Rectangle.perimeter(r)
    18
    >>>

Example : Shapes

• In the session above:
• The code for each shape is isolated in a separate module (namespace)
• The new() and init() calls are creating and initializing some shapes (which are just dictionaries)
• The area() and perimeter() calls are performing various operations on the shapes

Problem

• The code "works," but you have to be very specific about the methods you call.

    >>> Circle.area(c)
    50.26548245743669
    >>> Rectangle.area(r)
    20
    >>>

• There is another issue: How do you know what kind of shape you have?

    s = ...              # A shape of some kind

    # Calculate the area
    area = ?????         # What do you do here?

Adding a "class"

• In order to distinguish different kinds of data, you can tag it with some kind of "class"

    # Circle.py
    import math

    def new():
        c = { 'class' : 'Circle' }     # An attribute that says what the data actually is
        return c

    def init(c,radius):
        c['radius'] = radius

    def area(c):
        return math.pi*c['radius']**2

    def perimeter(c):
        return 2*math.pi*c['radius']

Adding a "class"

• Example : A Rectangle

    # Rectangle.py
    def new():
        r = { 'class' : 'Rectangle'}
        return r

    def init(r,width,height):
        r['width'] = width
        r['height'] = height

    def area(r):
        return r['width']*r['height']

    def perimeter(r):
        return 2*(r['width']+r['height'])

Method Dispatch

• Once data is tagged with a "class", you can create a high-level "dispatch function"

    def area(shape):
        if shape['class'] == 'Circle':
            return Circle.area(shape)
        elif shape['class'] == 'Rectangle':
            return Rectangle.area(shape)
        ...

    # Usage
    c = Circle.new(); Circle.init(c,4)
    s = Rectangle.new(); Rectangle.init(s,4,5)
    print area(c)          # Calls Circle.area
    print area(s)          # Calls Rectangle.area

Copyright (C) 2008, http://www.dabeaz.com 4- A New Problem 69 •
What if you wanted all shapes to have position information and some functions for movement? • Example : x,y coordinates and a function for moving the shape. • One approach : Modify every source ﬁle involving shapes... ugh.

Adding Positions

• Adding new features to the Circle.

    # Circle.py
    import math

    def new():
        c = { 'class' : 'Circle' }
        return c

    def init(c,radius):
        c['radius'] = radius
        c['x'] = 0
        c['y'] = 0

    def move(c,dx,dy):
        c['x'] += dx
        c['y'] += dy

    def area(c):
        return math.pi*c['radius']**2
    ...

Adding Positions

• Adding new features to the Rectangle

    # Rectangle.py
    def new():
        r = { 'class' : 'Rectangle' }
        return r

    def init(r,width,height):
        r['width'] = width
        r['height'] = height
        r['x'] = 0
        r['y'] = 0

    def move(r,dx,dy):
        r['x'] += dx
        r['y'] += dy

    def area(r):
        return r['width']*r['height']
    ...

Copyright (C) 2008, http://www.dabeaz.com 4- Commentary 72 • Modifying code
in every shape sucks! • Can't we put that code in one place and use it for all of the shapes? • Sure, just put it in a different namespace

Shape Module

• General purpose shape functions

    # Shape.py
    def init(shape):
        shape['x'] = 0
        shape['y'] = 0

    def move(shape,dx,dy):
        shape['x'] += dx
        shape['y'] += dy

• All of this code is general (not shape specific)

A Circle

    # Circle.py
    import Shape
    import math

    def new():
        c = { 'class' : 'Circle' }
        return c

    def init(c,radius):
        Shape.init(c)            # Set the positions
        c['radius'] = radius

    move = Shape.move            # Copy move here

    def area(c):
        return math.pi*c['radius']**2
    ...

A Rectangle

    # Rectangle.py
    import Shape

    def new():
        r = { 'class' : 'Rectangle' }
        return r

    def init(r,width,height):
        Shape.init(r)            # Set the positions
        r['width'] = width
        r['height'] = height

    move = Shape.move            # Copy move here

    def area(r):
        return r['width']*r['height']
    ...

Example Usage

    >>> import Circle
    >>> c = Circle.new()
    >>> Circle.init(c,4)
    >>> c['x']
    0
    >>> Circle.move(c,3,7)
    >>> c['x']
    3
    >>> Circle.area(c)
    50.26548245743669
    >>>

• Notice how Circles picked up the functionality we defined in Shape

A Subtle Detail

• Since Circles and Rectangles are using common functionality from Shape, we should probably make sure that both objects get created in a consistent way

    # Circle.py
    def new():
        c = { 'class' : 'Circle' }
        return c

    # Rectangle.py
    def new():
        r = { 'class' : 'Rectangle' }
        return r

Creating Shapes

    # Shape.py
    def new(classname="Shape"):
        s = { 'class' : classname }
        return s

    # Circle.py
    def new(classname="Circle"):
        return Shape.new(classname)

    # Rectangle.py
    def new(classname="Rectangle"):
        return Shape.new(classname)

A New Problem

• What if you wanted to make a new kind of circle with just slight modifications?
• Example : A circle with a color added to it

A Colored Circle

    # ColoredCircle.py
    import Circle

    def new(classname="ColoredCircle"):
        c = Circle.new(classname)
        return c

    def init(c,color,radius):
        Circle.init(c,radius)
        c['color'] = color       # Add a color value

    # Just use the same functions for area/perimeter
    area = Circle.area
    perimeter = Circle.perimeter

Copyright (C) 2008, http://www.dabeaz.com 4- Terminology 81 • Specializing an
existing object with new attributes or methods is called "inheritance" • You're "inheriting" all of the features of the original object, but making modiﬁcations

Function Dispatch

• We've set up a lot of machinery, but the problem of method dispatch is still horrible
• Here's an example of what's wrong:

    import Rectangle, Circle, ColoredCircle

    # Create some shapes
    a = Rectangle.new(); Rectangle.init(a,4,5)
    b = Circle.new(); Circle.init(b,4)
    c = ColoredCircle.new(); ColoredCircle.init(c,"red",5)

    shapes = [a,b,c]
    for s in shapes:
        print area(s)        # Compute area of whatever
                             # Not quite sure what to do here (depends on the shape)

Function Dispatch

• You can implement a sort of "hack"

    import sys
    def dispatch(s,name):
        classname = s['class']
        module = sys.modules[classname]
        return getattr(module,name)

• Example:

    shapes = [a,b,c]
    for s in shapes:
        print dispatch(s,"area")(s)

• This looks up a method based on the classname
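To see the dispatch hack run end to end, here is a self-contained Python 3 sketch. The helper `make_module` is an invention for this example: it builds the Circle and Rectangle modules from strings and registers them in `sys.modules`, standing in for the separate Circle.py and Rectangle.py files on the earlier slides.

```python
import sys
import types

def make_module(name, source):
    """Create a module from a source string and register it (stand-in for a .py file)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module

make_module("Circle", """
import math
def new():
    return {'class': 'Circle'}
def init(c, radius):
    c['radius'] = radius
def area(c):
    return math.pi * c['radius'] ** 2
""")

make_module("Rectangle", """
def new():
    return {'class': 'Rectangle'}
def init(r, width, height):
    r['width'] = width
    r['height'] = height
def area(r):
    return r['width'] * r['height']
""")

import Circle, Rectangle    # found in sys.modules, so no files are needed

def dispatch(s, name):
    """Look up function `name` in the module named by the instance's 'class' tag."""
    module = sys.modules[s['class']]
    return getattr(module, name)

a = Rectangle.new(); Rectangle.init(a, 4, 5)
b = Circle.new();    Circle.init(b, 4)

# The looked-up function still has to be handed the instance explicitly.
areas = [dispatch(s, "area")(s) for s in (a, b)]
print(areas)
```

Note that the scheme only works when the 'class' tag matches the name of a module that has already been imported, which is part of why it counts as a hack.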

Copyright (C) 2008, http://www.dabeaz.com 4- Where is This Going? 84
• By now, it should be pretty clear • The code we have been writing has been building towards the concept of an "object" • Roughly speaking, an "object" is a way of packaging data and functions together • It ties most of what we just did together

Copyright (C) 2008, http://www.dabeaz.com 4- The important bits 85 •
The container used to hold object data is called an "instance." The data stored inside is called "instance data." • The namespace where all of the methods are deﬁned is called a "class" • Borrowing methods from other classes is called "inheritance." • Dispatching is called "polymorphism"

Copyright (C) 2008, http://www.dabeaz.com 4- Historical Perspective 86 • Programmers
have been writing programs that do these sorts of things for a long time • For example, you can implement all of this in C or other simple languages • However, it's usually really clunky, verbose, and really hard to maintain

The Missing Link

• An "object oriented language" makes it a lot easier by taking care of low-level details
• There is special syntax and other features

    >>> c = Circle(4.0)
    >>> r = Rectangle(4,5)
    >>> c.area()
    50.26548245743669
    >>> r.area()
    20
    >>>

• So, let's talk about that...
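For reference, the hand-built machinery from this part (instances, a class namespace, borrowed methods, dispatch) might look like this when written with Python's class syntax. This is a Python 3 sketch of one possible layout, not code taken from the slides.

```python
import math

class Shape:
    """Functionality shared by every shape (position and movement)."""
    def __init__(self):
        self.x = 0
        self.y = 0

    def move(self, dx, dy):
        self.x += dx
        self.y += dy

class Circle(Shape):                 # inheritance: Circle borrows move() from Shape
    def __init__(self, radius):
        super().__init__()           # set the positions
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

    def perimeter(self):
        return 2 * math.pi * self.radius

class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__()
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def perimeter(self):
        return 2 * (self.width + self.height)

shapes = [Circle(4.0), Rectangle(4, 5)]
for s in shapes:
    # Polymorphism: the language picks the right area() for each instance.
    print(type(s).__name__, s.area())
```

The instance dictionary, the 'class' tag, and the dispatch function have not gone away; the language now manages them behind the `self.attribute` and `s.area()` syntax.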

Copyright (C) 2008, http://www.dabeaz.com 4- Part 5 88 From Talking
About "Objects" to Smalltalk

A Bit of History : Simula

• The first "object oriented" programming language was Simula.
• Simula was largely based on adding support for "objects" to Algol-60.
• Strongly based on static compilers
• Most of the core ideas in Simula later resurfaced in C++ and by extension in Java.

History : Smalltalk

• Smalltalk was also one of the first object-oriented programming languages
• Initially developed at Xerox PARC (~1971)
• Smalltalk-80 was the first public release
• Unlike Simula, it was a dynamic language (!)

Historical Quote

One day, in a typical PARC hallway bullsession, Ted Kaehler, Dan Ingalls, and I were standing around talking about programming languages. The subject of power came up and the two of them wondered how large a language one would have to make to get great power. With as much panache as I could muster, I asserted that you could define the "most powerful language in the world" in "a page of code." They said, "Put up or shut up."

- Alan Kay, "The Early History of Smalltalk"

Copyright (C) 2008, http://www.dabeaz.com 4- The Big Picture 92 •
Almost all modern dynamic languages cite Smalltalk as an "influence" "The idea of [....] comes from Smalltalk" • I'll be honest, I've never written a single program in Smalltalk before this lecture. • But, what in the heck does it mean to be "influenced" by Smalltalk? • Let's go find out...

Copyright (C) 2008, http://www.dabeaz.com 4- The Smalltalk Environment 93 •
It was largely a graphical tool (ﬁrst IDE???)

Copyright (C) 2008, http://www.dabeaz.com 4- Smalltalk in Three Bullets 94
• Everything in Smalltalk is an "object" • Objects hold state (data) • An object sends messages to and receives messages from other objects (or itself) ("That's all folks.")

Copyright (C) 2008, http://www.dabeaz.com 4- Smalltalk as Text 95 •
It is possible (but unusual) to program Smalltalk as a text-based language • For the examples that follow, I am using GNU smalltalk 3.0 • Fine for illustrating the general idea

Copyright (C) 2008, http://www.dabeaz.com 4- Example : Integers 96 •
Creating an object (an integer) x := 5 • The data stored by that object is the value (5) • The object has an associated class that indicates what kind of object it is st> x class SmallInteger st> • The object is called an "instance"

Object Hierarchy

• Classes are organized into a hierarchy

    Object
        Magnitude
            Number
                Integer
                    SmallInteger
                    LargeInteger
                Float
        Collection

Copyright (C) 2008, http://www.dabeaz.com 4- Messages 98 • Once you
have created an object, there is only one thing you do with it • You can send it a message. • That's it. • Nothing else. • Thus ends our tutorial of Smalltalk....

Messages

• Messages have two components

    selector | parameter (opt)

• It is first delivered to the object's class

    [Diagram: x := 5 is an instance of SmallInteger; the message (selector | parm) arrives at SmallInteger, the bottom of the chain Object > Magnitude > Number > Integer > SmallInteger]

Messages

• If not handled, it propagates to the superclass
• This is an example of "inheritance"

Messages

• The message will propagate up the class hierarchy until a matching selector is found
• At this point, the message is handled.

    [Diagram: the message travels up from SmallInteger until it reaches a class holding a matching selector and its handler code]
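The lookup rule on these slides can be modeled in a few lines of Python 3. Everything here is a toy: `SClass` and `send` are invented names, and the choice of which class handles which selector is an assumption made for illustration, not a statement about where real Smalltalk defines these methods.

```python
import math

class SClass:
    """A toy class: a name, a superclass, and a table of selector -> handler."""
    def __init__(self, name, superclass=None, handlers=None):
        self.name = name
        self.superclass = superclass
        self.handlers = handlers or {}

def send(cls, receiver, selector, *args):
    """Deliver a message to cls; walk up the superclasses until a handler matches."""
    while cls is not None:
        handler = cls.handlers.get(selector)
        if handler is not None:
            return cls.name, handler(receiver, *args)   # message is handled here
        cls = cls.superclass                            # not handled: try the superclass
    raise AttributeError("doesNotUnderstand: " + selector)

# Hypothetical placement of handlers along the chain from the slides.
Object_ = SClass("Object", None, {"isNil": lambda self: False})
Magnitude = SClass("Magnitude", Object_, {"max:": lambda self, other: max(self, other)})
Number = SClass("Number", Magnitude, {"abs": lambda self: abs(self)})
Integer = SClass("Integer", Number, {"factorial": lambda self: math.factorial(self)})
SmallInteger = SClass("SmallInteger", Integer, {"+": lambda self, other: self + other})

x = 5   # treated as an instance of SmallInteger
print(send(SmallInteger, x, "+", 3))        # handled by SmallInteger itself
print(send(SmallInteger, x, "factorial"))   # propagates up to Integer
print(send(SmallInteger, x, "max:", 9))     # propagates up to Magnitude
```

Each call returns the name of the class that finally handled the message along with the result, which makes the upward propagation visible.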

Message Example

• How do you send a message?
• Here is an example:

    st> x := 5.
    5
    st> x factorial.        (x : the object, factorial : the message)
    120
    st> x abs.
    5
    st>

• In this case, we are sending a simple "unary" message (just a selector, no parameters)

Copyright (C) 2008, http://www.dabeaz.com 4- More Messages 103 • Binary
messages (+, -, /, *, etc.) • These take another object as a parameter st> x := 5. 5 st> x + 3. 8 st> x * 4. 20 st> • Here, the operator (+,*) is the selector and the value on the right is the parameter The message

Copyright (C) 2008, http://www.dabeaz.com 4- Keyword Messages 104 • Named
messages with parameters st> x := 5. 5 st> x raisedTo: 2 25 st> • Here, 'raisedTo:' is the selector and 2 is a parameter

Smalltalk is Different

• You'll now notice some pretty odd things

    st> x := 5
    5
    st> x + 3 * 4.
    32                  ????
    st>

• There are no "operators" in Smalltalk, just messages which usually bind left to right

    (x + 3) * 4.    -> Send "+ 3" to x. Produces 8
    8 * 4.          -> Send "* 4" to 8. Produces 32

Message Precedence

• The three types of messages bind in this order
• Unary messages
• Binary messages
• Keyword messages
• Example:

    st> x := 5.
    5
    st> 3 * x raisedTo: 2 + 4.
    11390625
    st>

    (3 * x) raisedTo: (2 + 4)
    15 raisedTo: 6
    11390625
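To make the contrast with conventional operator precedence concrete, the following Python 3 lines evaluate the two Smalltalk expressions from these slides both ways: once grouped the way Smalltalk sends the messages, and once using Python's own precedence rules. The variable names are made up for the comparison.

```python
x = 5

# "x + 3 * 4" : Smalltalk sends binary messages strictly left to right.
smalltalk_sum = (x + 3) * 4            # 32
python_sum = x + 3 * 4                 # 17, because * binds tighter than + in Python

# "3 * x raisedTo: 2 + 4" : binary messages first, then the keyword message.
smalltalk_power = (3 * x) ** (2 + 4)   # 15 ** 6 = 11390625
python_power = 3 * x ** 2 + 4          # 3 * 25 + 4 = 79

print(smalltalk_sum, python_sum)
print(smalltalk_power, python_power)
```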

Copyright (C) 2008, http://www.dabeaz.com 4- Datatypes 107 • Smalltalk has
a useful set of datatypes • In fact, they mirror what you have seen so far. • Numbers, strings, arrays, dictionaries, etc.

Copyright (C) 2008, http://www.dabeaz.com 4- Primitive Types 108 • Numbers
x := 3. Integer x := 3.14159. Floating point x := 3/4. Fraction • Strings x := 'Hello World'. • Characters x := $a. The letter 'a'

Copyright (C) 2008, http://www.dabeaz.com 4- Example : A "List" 109
• An ordered collection of items items := OrderedCollection new. • Adding some items items add: 3 items add: 'Hello' items add: 7 items addFirst: 'Hey' items addLast: 'Foo' • Pulling out an item x := items at: 2 • Reassignment items at: 2 put: 'Yow!'

OrderedCollection

• Getting the size

    items size

• Removing at an index

    items removeAtIndex: 2

• Remove by searching

    items remove: 'Hello'

• Concatenation (,)

    x := items , otheritems

Copyright (C) 2008, http://www.dabeaz.com 4- Dictionaries 111 • Creating a
dictionary items := Dictionary new. • Inserting items at: 'name' put: 'Dave' items at: 3 put: 4 • Getting items items at: 'name' • Removing items removeKey: 'name'

Copyright (C) 2008, http://www.dabeaz.com 4- Some Useful Things 112 •
Displaying an object object display • Printing with a newline object printNl (note ends with lower-case 'L') • Inspecting an object (debugging) object inspect

Copyright (C) 2008, http://www.dabeaz.com 4- Control Flow 113 • Remember,
I said that Smalltalk only has objects and messages • THAT'S IT! • There are no "control ﬂow" statements • No conditional statements • No looping statements • No function statements

Copyright (C) 2008, http://www.dabeaz.com 4- Code Blocks 114 • Blocks
of code are objects st> a := [ x := 3. y := 4. x + y ]. st> a a BlockClosure st> a value 7 st> An object holding the code A message telling the code block to run and produce a value

Copyright (C) 2008, http://www.dabeaz.com 4- Code Blocks 115 • A
code block can optionally take parameters st> b := [ :x :y | x + y ]. st> b value: 3 value: 4 7 st> • This gives you something that roughly looks like a function • But it's still just an object. You send it messages to get it to run.

Copyright (C) 2008, http://www.dabeaz.com 4- Conditionals 116 • There are
no conditional statements. Instead, you send a message to a boolean object st> x := 3. 3 st> y := 4. 4 st> (x < y) ifTrue: [ ... code block ... ] ifFalse: [ ... code block ... ]. st> • The message parameters are code blocks

Copyright (C) 2008, http://www.dabeaz.com 4- Loops 117 • Loops are
also messages involving code blocks st> x := 0. st> 3 timesRepeat: [x := x + 1]. 3 st> [ x < 100 ] whileTrue: [x := x + 1]. nil st> x 100 st> • It's a message with code blocks as parameters

Copyright (C) 2008, http://www.dabeaz.com 4- Example : Dave's Mortgage 118
principal := 500000. rate := 0.04. payment := 499.0. month := 0. total_paid := 0. [principal > 0 ] whileTrue: [ principal := principal*(1+(rate/12)) - payment. total_paid := total_paid + payment. month := month + 1. (month == 24) ifTrue: [ rate := 0.09. payment := 3999. ] ]. 'Total paid ' display. total_paid printNl. 'Months' display. month printNl.
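
The same loop can be sketched in Ruby for comparison. This is a hypothetical translation, not code from the slides: the method name `mortgage` and its argument order are assumptions made for illustration, and the structure simply mirrors the Smalltalk version (accrue one month of interest, subtract the payment, switch rate and payment after month 24).

```ruby
# Hypothetical Ruby translation of the Smalltalk mortgage loop.
def mortgage(principal, rate, payment)
  month = 0
  total_paid = 0.0
  while principal > 0
    # One month of interest, then the payment is applied.
    principal = principal * (1 + rate / 12) - payment
    total_paid += payment
    month += 1
    # After two years the rate and the payment both change.
    if month == 24
      rate = 0.09
      payment = 3999.0
    end
  end
  [total_paid, month]
end

total_paid, months = mortgage(500_000.0, 0.04, 499.0)
puts "Total paid #{total_paid}"
puts "Months #{months}"
```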

Copyright (C) 2008, http://www.dabeaz.com 4- Iterating over Data 119 •
Here's how you loop over a collection st> x := #(1 4 5 10 20). (1 4 5 10 20 ) st> x do: [:item | item printNl ]. 1 4 5 10 20 (1 4 5 10 20 ) st> • More code blocks

Copyright (C) 2008, http://www.dabeaz.com 4- Classes 120 • Everything in
Smalltalk is an object • You create your own objects by deﬁning a class. • However, there is no special "class" statement. • Instead, you send a message

Copyright (C) 2008, http://www.dabeaz.com 4- Deﬁning a Class 121 •
To create a new class, you send a message to the parent class (the superclass) • If you don't know what the parent is, you send a message to Object, the root of all objects. st> Object subclass: #Shape. Shape st> • Here, we are asking Object to create a subclass called "Shape."

Copyright (C) 2008, http://www.dabeaz.com 4- Instance Data 122 • Objects
have internal data • The members are set up in the class. st> Shape instanceVariableNames: 'x y'. Shape st> • This operation sets the names of instance variables of Shape objects • Enforces that Shapes will have x and y.

Copyright (C) 2008, http://www.dabeaz.com 4- Creating Shapes 123 • To
create a shape, you have to define new Shape class extend [ new [ | s | s := super new. s init. ^s ] ] • This is an example of a "class method" • It is a message that is sent to the class itself • | s | declares a local variable • super new sends 'new' to the parent class (the superclass) • s init sends the 'init' message to the newly created instance • ^s returns the instance

Copyright (C) 2008, http://www.dabeaz.com 4- Example of Creating 128 •
Here's some sample output st> s := Shape new. Object: Shape new "<0x40292bb0>" error: did not understand #init ... st> • It didn't work because we didn't deﬁne init yet

Copyright (C) 2008, http://www.dabeaz.com 4- Initializing Shapes 129 • Here's
a deﬁnition of init Shape extend [ init [ x := 0. y := 0. ] ] • This just sets up the instance variables st> s := Shape new. a Shape st>

Copyright (C) 2008, http://www.dabeaz.com 4- Instances 130 • Every object
you create is called an "instance" • All of the internal data is completely private • There is no way to inspect it from outside • To do anything, you make the object respond to messages (by implementing methods)

Copyright (C) 2008, http://www.dabeaz.com 4- Viewing Attributes 131 • Create
methods that return internal state Shape extend [ x [ ^x. ] y [ ^y. ] ] • These respond to messages st> s x. 0 st> s y. 0 st>

Copyright (C) 2008, http://www.dabeaz.com 4- Methods with Parameters 132 •
Let's make a shape move Shape extend [ movex: dx [ x := x + dx. ] movey: dy [ y := y + dy. ] ] • These also correspond to messages st> s movex: 3. a Shape st> s movey: 2. a Shape st> s x. 3 st> s y. 2 st>

Copyright (C) 2008, http://www.dabeaz.com 4- Messages to Self 133 •
Here's a method that sends some messages Shape extend [ movene: distance [ self movex: distance. self movey: distance. ] ] • Example use: st> s movene: 4. a Shape st> s x. 4 st> s y. 4 st> self is the instance

Copyright (C) 2008, http://www.dabeaz.com 4- Making a Circle 134 Shape
subclass: #Circle. Circle instanceVariableNames: 'radius'. Circle class extend [ new: radius [ |c| c := super new. c init: radius. ^c ] ] Circle extend [ init: rad [ radius := rad. ] area [ |a| a := 3.1415926*(radius raisedTo: 2). ^a ] ]

Copyright (C) 2008, http://www.dabeaz.com 4- Using a Circle 135 st>
c := Circle new: 4. a Circle st> c area 50.26548 st> c x 0 st> c movex: 3. st> c x 3 st> • Notice how it responds to Shape messages

Copyright (C) 2008, http://www.dabeaz.com 4- Code Block Example 136 Shape
extend [ movex: dx count: n action: code [ n timesRepeat: [ self movex: dx. code value: self. ] ] ] • Example use (with c starting at x = 0; the block runs after each move): st> c movex: 2 count: 5 action: [:shp |shp x printNl. ] 2 4 6 8 10 st>

Copyright (C) 2008, http://www.dabeaz.com 4- Class Variables/Methods 137 • Almost
everything we have been doing has focused on instances. • However, the class itself is an object • The class can have its own variables (called class variables) • A class can have its own methods (called class methods)

Copyright (C) 2008, http://www.dabeaz.com 4- Class Method Example 138 •
Here's a sample deﬁnition Shape class extend [ foo [ 'Hello World' printNl. ] ] • Here's a use st> s := Shape new. a Shape. st> s foo. Object: Shape new "<0x402960b8>" did not understand #foo st> Shape foo. Hello World st> A method on the class

Copyright (C) 2008, http://www.dabeaz.com 4- Class Variables 139 • Set
up when the class is created Object subclass: #Shape instanceVariableNames: 'x y' classVariableNames: 'ncreate' poolDictionaries: '' category: nil ! • Can be accessed in class methods Shape class extend [ ncreate [^ncreate] new [ |s| s := super new. s init. (ncreate = nil) ifTrue: [ncreate := 1] ifFalse: [ncreate := ncreate + 1]. ^s.] ]

Copyright (C) 2008, http://www.dabeaz.com 4- Class Variables 140 • Example
use: st> s := Shape new. st> s ncreate. Object: Shape new "<0x402988d8>" error: did not understand #ncreate st> Shape ncreate. 1 st> • Again: Notice that it's part of the class

Copyright (C) 2008, http://www.dabeaz.com 4- Interesting Stuff 141 • All
objects in Smalltalk are "open" • You can add and modify methods of both instances and classes at any time. • Essentially, you can make instances respond to new kinds of messages at will (even after creation) • On the other hand, you can't really add new instance variables after creation.

Copyright (C) 2008, http://www.dabeaz.com 4- Interesting Stuff 142 • The
Smalltalk environment itself is an object • Global variables live in it, so even a top-level assignment (:=) to a global can be written as a message. st> Smalltalk at: #x put: 42. 42 st> x 42 st> • The whole language is objects and messages.

Copyright (C) 2008, http://www.dabeaz.com 4- Smalltalk Wrap-up 143 • Smalltalk
has been hugely inﬂuential • GUIs • Graphical IDEs • Object Oriented Concepts • Object implementation in Dynamic Languages

Copyright (C) 2008, http://www.dabeaz.com 4- Wrap-up 145 • We have built up
some of the basic parts of how big programs get put together. • Next time, we'll look at specifics of current dynamic languages

Copyright (C) 2008, http://www.dabeaz.com 5- Object Models and Implementation Section
5 2

Copyright (C) 2008, http://www.dabeaz.com 5- Introduction 3 • Last time,
we looked at problems related to creating large programs • Took a detour to go look at Smalltalk, one of the ﬁrst, and most inﬂuential object oriented languages • Today, we're going to look at how all of this gets put together in modern languages

Copyright (C) 2008, http://www.dabeaz.com 5- Overview 4 • A brief
review of concepts • The Ruby object model (in depth) • The Python object model (in depth) • The Perl object model (brief survey) • The Javascript object model (brief survey)

Objects (Reprise)

Copyright (C) 2008, http://www.dabeaz.com 5- What is an Object? 6
• An object is a programming abstraction that bundles two things together • Data • Methods that operate on the data • For example : A Circle • Data : The radius • Methods : area(), perimeter(), etc.

Copyright (C) 2008, http://www.dabeaz.com 5- Instances 7 • When you
create objects, you are creating "instances" • Each instance of an object has its own internal data (instance variables) • Examples : Instances of circles .radius=6 .radius=3 .radius=4 .radius=9

Copyright (C) 2008, http://www.dabeaz.com 5- Instance Methods 8 • Functions
that operate on instances of objects are known as "instance methods" • For example: Compute the area of a circle • The result depends on the circle instance that you supply to the method

Copyright (C) 2008, http://www.dabeaz.com 5- Classes 9 • Instance methods
are not stored as part of the instances themselves. • They are found in an associated class • Instances are always linked back to a class class Circle area() perimeter() ... .radius=6 .radius=3 .radius=4 Circle instances

Copyright (C) 2008, http://www.dabeaz.com 5- Class Variables 10 • A
class may define its own variables known as "class variables" • These variables act as a kind of "global variable" for everything in the class class Circle area() perimeter() ... ncircles = 3 (class variable) .radius=6 .radius=3 .radius=4 (Circle instances)

Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance 11 • A class
can inherit from other classes • Each class has a link to its superclass (parent) class Circle area() perimeter() ... class Shape move() ...

Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance 12 • The whole
point of inheritance is to borrow or modify existing functionality • For example, a Circle picks up all of the functionality that was deﬁned for shapes • And it can modify that functionality if it wants

Copyright (C) 2008, http://www.dabeaz.com 5- Classes as Objects 13 •
In some languages, classes themselves are objects (instances of a "class") • The "data" stored in a class object consists of the instance methods and class variables class Circle area() perimeter() class Shape move() class Stock sell() cost() (all instances of "classes")

Copyright (C) 2008, http://www.dabeaz.com 5- Class Methods 14 • A
class may deﬁne methods that operate on the class itself (as an object) • These are known as "class methods" • An example : the new method • New is a class method that asks the class to create a new instance

Copyright (C) 2008, http://www.dabeaz.com 5- Static Methods 15 • A
normal function that just happens to be placed in a class for the purposes of packaging • It has no relation to instances or classes • It's just placed into the class namespace • This is more of a C++/Java oddity

Copyright (C) 2008, http://www.dabeaz.com 5- Part 2 16 Objects in
Ruby

Copyright (C) 2008, http://www.dabeaz.com 5- Objects in Ruby 17 •
Ruby is object oriented "I wanted a scripting language that was more powerful than Perl, and more object-oriented than Python. That's why I decided to design my own language (Ruby)." - Matz (creator of Ruby) • What it really means : Matz likes Smalltalk.

Copyright (C) 2008, http://www.dabeaz.com 5- Everything is an Object 18
• Objects in Ruby are organized in a hierarchy Object (Numeric (Integer (Fixnum, Bignum), Float), String, ...) • At the top, you have "Object"

• How to navigate the hierarchy x.class # The class to which x belongs cls.superclass # The superclass of a class cls • Example: irb(main):025:0> x = 37 => 37 irb(main):026:0> x.class => Fixnum irb(main):027:0> Fixnum.superclass => Integer irb(main):028:0> Integer.superclass => Numeric irb(main):029:0> Numeric.superclass => Object irb(main):030:0>
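
The navigation above can be automated with a small loop. The sketch below is illustrative only: `superclass_chain` is a hypothetical helper, not a built-in. Note that the irb session shown is from the Ruby of 2008; on Ruby 2.4 and later `37.class` is `Integer` (Fixnum and Bignum were unified), and since Ruby 1.9 the chain continues past `Object` to `BasicObject`, so the exact output depends on the interpreter version.

```ruby
# Hypothetical helper (not part of Ruby itself): collect the class of an
# object followed by each superclass, until superclass returns nil.
def superclass_chain(obj)
  chain = []
  cls = obj.class
  while cls
    chain << cls
    cls = cls.superclass
  end
  chain
end

puts superclass_chain(37).inspect
puts superclass_chain("hello").inspect
```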

Copyright (C) 2008, http://www.dabeaz.com 5- Objects Have Methods 20 •
Example : Integers irb(main):040:0> x = 37 => 37 irb(main):041:0> x.abs => 37 irb(main):042:0> x.between?(2,100) => true irb(main):043:0> x.to_s => "37" irb(main):044:0> x.remainder 20 => 17 irb(main):045:0>

Copyright (C) 2008, http://www.dabeaz.com 5- Inspecting Methods 21 • Get
a list of method names x.methods • Example: irb(main):025:0> x = 37 => 37 irb(main):026:0> x.methods => ["method", "%", "between?", "send", "<<", "prec", "modulo", "&", "object_id", ">>", "zero?", "size", "singleton_methods", "__send__", "equal?", "taint", "id2name", "*", "next", "frozen?", "instance_variable_get", "+", "kind_of?", "step", "to_a", "instance_eval", "-", "remainder", ...]

Copyright (C) 2008, http://www.dabeaz.com 5- Defining New Objects 22 •
To create an object, you first define a class class Circle def initialize(radius) @radius = radius end def area Math::PI * @radius ** 2 end def perimeter 2*Math::PI * @radius end end • A class is mainly just a collection of methods

Copyright (C) 2008, http://www.dabeaz.com 5- Instance Variables 23 • Instance
variables are denoted by @varname class Circle def initialize(radius) @radius = radius end def area Math::PI * @radius ** 2 end def perimeter 2*Math::PI * @radius end end Instance variables • These variables are storing the data that is unique to each instance that is created

Copyright (C) 2008, http://www.dabeaz.com 5- Initialization 24 • initialize is
called when an object is created class Circle def initialize(radius) @radius = radius end def area Math::PI * @radius ** 2 end def perimeter 2*Math::PI * @radius end end • The name of this method is "special". Ruby expects initialization to use this specific name.

Copyright (C) 2008, http://www.dabeaz.com 5- Creating Instances 25 • To
create instances, you use new c = Circle.new(4) d = Circle.new(9) ... • This calls initialize with the supplied argument c = Circle.new(4) class Circle def initialize(radius) @radius = radius end ...

Copyright (C) 2008, http://www.dabeaz.com 5- Calling Methods 26 • To
call methods, you just need an instance c = Circle.new(4) d = Circle.new(9) ... puts c.area # Calls the area method on c puts d.area # Calls the area method on d puts c.perimeter • Inside methods, the instance variables (@vars) bind to the values stored in the instance.

Copyright (C) 2008, http://www.dabeaz.com 5- Inspecting Objects 27 • As
a debugging aid, you can inspect objects c = Circle.new(4) ... puts c.inspect • Generates a string showing what's inside #<Circle:0x25170 @radius=4> • No way to directly access the internals however (more in a minute)

Copyright (C) 2008, http://www.dabeaz.com 5- Instance Data 28 • Important
point : The set of instance variables on an object is not restricted or declared • Whenever a method assigns to @varname, that creates a new instance variable class Circle <Shape def initialize(radius) @radius = radius end ... def set_color(color) @color=color end end This "spontaneously" creates a new instance variable when called the ﬁrst time
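
One way to watch an instance variable appear is to list the variables before and after the first call. This is a small sketch that uses Ruby's built-in `instance_variables` method; the Shape superclass is left out to keep the example self-contained, and the color value is made up for illustration.

```ruby
class Circle
  def initialize(radius)
    @radius = radius
  end

  def set_color(color)
    @color = color        # creates @color the first time it runs
  end
end

c = Circle.new(4)
before = c.instance_variables.map { |v| v.to_s }
c.set_color("red")
after = c.instance_variables.map { |v| v.to_s }

puts before.inspect   # only @radius so far
puts after.inspect    # @color has been added
```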

Copyright (C) 2008, http://www.dabeaz.com 5- Instance Data 29 • All
instance variables in Ruby are private • The only way to access them is through methods class Circle def initialize(radius) @radius = radius end def radius @radius end def radius=(value) @radius=value end end • radius returns the value • radius= sets the value

Copyright (C) 2008, http://www.dabeaz.com 5- Using Accessors 30 • Example
of using the accessor methods c = Circle.new(4) puts c.radius # Prints 4 c.radius=5 puts c.area # Prints 78.5398163397448 Both of these operations are actually method calls • Important point : There is never direct access to instance data in Ruby. It's always a method.
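
To make the "it's always a method" point concrete, the assignment form can be rewritten as an explicit message send. This is a minimal sketch using the Circle accessors from the previous slide; the value 6 is arbitrary.

```ruby
class Circle
  def initialize(radius)
    @radius = radius
  end

  def radius
    @radius
  end

  def radius=(value)
    @radius = value
  end
end

c = Circle.new(4)
c.radius = 5            # looks like assignment, but calls the method "radius="
c.send(:radius=, 6)     # the same call written as an explicit message send
puts c.radius           # 6
puts c.respond_to?(:radius=)   # true
```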

Copyright (C) 2008, http://www.dabeaz.com 5- Variables vs. Methods 31 •
Instance variables and methods are separate • Plus, there is special syntax to distinguish instance variables from methods (@varname) • So, it does not matter that there is an instance variable called @radius and a method called radius.

Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance 32 • A class
may inherit from one other class class Shape def initialize @x = 0 @y = 0 end def move(dx,dy) @x += dx @y += dy end end class Circle <Shape ... end superclass

Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance 33 • If no
superclass is listed, Object is assumed class Shape ... end is the same as class Shape <Object ... end • Ruby only supports single inheritance. • So, there is always just one superclass

Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance & Initialization 34 •
Derived classes must initialize parents • This is done using "super" class Shape def initialize @x = 0 @y = 0 end end class Circle <Shape def initialize(radius) super() # Initialize parent @radius = radius end end

Copyright (C) 2008, http://www.dabeaz.com 5- Super 35 • Within any
method, super is a special keyword that refers to the same method in the parent class (superclass) • This is used when you re-implement a method, but still want to call the original version within the new method
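
A short sketch of super in an ordinary (non-initialize) method may help. The `describe` method is a made-up name for illustration; the point is only that the subclass version calls the superclass version and builds on its result.

```ruby
class Shape
  def initialize
    @x = 0
    @y = 0
  end

  def describe
    "at (#{@x}, #{@y})"
  end
end

class Circle < Shape
  def initialize(radius)
    super()               # run Shape#initialize with no arguments
    @radius = radius
  end

  def describe
    # Re-implement describe, but reuse the original version via super.
    "Circle of radius #{@radius} " + super
  end
end

desc = Circle.new(4).describe
puts desc   # Circle of radius 4 at (0, 0)
```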

Copyright (C) 2008, http://www.dabeaz.com 5- Some Shortcuts 36 • Instance
data always has to be accessed through methods, but deﬁning those methods repeatedly gets tedious and annoying • Here is a shortcut class Shape attr_reader :x, :y def initialize @x = 0 @y = 0 end end This creates accessor methods for reading the values def x @x end def y @y end

Copyright (C) 2008, http://www.dabeaz.com 5- Some Shortcuts 37 • Creating
attribute writers class Shape attr_reader :x, :y attr_writer :x, :y def initialize @x = 0 @y = 0 end end This creates accessor methods for writing the values def x=(value) @x=value end def y=(value) @y=value end • Note - :x is the Ruby syntax for a symbol
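
When both a reader and a writer are wanted, Ruby also provides `attr_accessor`, which generates the pair in one line. A minimal sketch, assuming the same Shape class as above:

```ruby
class Shape
  attr_accessor :x, :y    # defines x, x=, y and y=

  def initialize
    @x = 0
    @y = 0
  end
end

s = Shape.new
s.x = 3       # calls the generated x= method
s.y += 2      # calls y, adds 2, then calls y=
puts s.x      # 3
puts s.y      # 2
```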

Copyright (C) 2008, http://www.dabeaz.com 5- Concept of Attributes 38 •
All access to an object occurs via methods • However, methods that take no arguments also look like data "attributes" class Circle <Shape ... def radius @radius end def area Math::PI*@radius **2 end def perimeter 2*Math::PI*@radius end end c = Circle.new(4) puts c.radius puts c.area puts c.perimeter ... Notice how the access is very "uniform"

Copyright (C) 2008, http://www.dabeaz.com 5- Attributes 39 • An attribute
is part of the "public" interface of an object that's presented to a user • It has nothing to do with the internal state stored by an object (instance data) • Example : Certain attributes are stored (the radius), but others are computed (the area) • This concept of hiding internals behind methods is very big in OO-programming

Copyright (C) 2008, http://www.dabeaz.com 5- Example : C++ 40 From
"Effective C++", by S. Meyers [Figure: book excerpt annotated with the labels "Access methods", "Instance data", and "Motivation"]

Copyright (C) 2008, http://www.dabeaz.com 5- Class Variables 41 • Classes
are also "objects" in Ruby • This is subtle, but a class can have its own internal variables (like instance data) class Shape @@ncreated = 0 # Class variable def initialize @x = 0 @y = 0 @@ncreated += 1 end end • A class variable is shared by all instances (there is just one copy of the variable)

Copyright (C) 2008, http://www.dabeaz.com 5- Class Variables 42 • Class
variables are not part of instances irb(main):002:0> s = Shape.new => #<Shape:0x625b0 @y=0, @x=0> irb(main):003:0> • You do not see a reference to @@ncreated here • Nor are they readable... (They're private too) irb(main):002:0> Shape.ncreated NoMethodError: undefined method 'ncreated' for Shape:Class irb(main):003:0>

Copyright (C) 2008, http://www.dabeaz.com 5- Class Methods 43 • Methods
can be defined for the class itself class Shape @@ncreated = 0 ... def Shape.ncreated @@ncreated end end Prefix with the class name to define a class method • To use the method, apply it to the class irb(main):006:0> s = Shape.new => #<Shape:0x58cf4 @x=0, @y=0> irb(main):007:0> Shape.ncreated => 1 irb(main):008:0>

Copyright (C) 2008, http://www.dabeaz.com 5- Class Methods 44 • Class
methods only operate on classes, not instances of objects deﬁned by a class • This is somewhat subtle irb(main):006:0> s = Shape.new => #<Shape:0x58cf4 @x=0, @y=0> irb(main):007:0> Shape.ncreated => 1 irb(main):008:0> s.ncreated NoMethodError: undefined method `ncreated' for #<Shape:0x58cf4 @x=0, @y=0> from (irb):8 irb(main):009:0>
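
The same class method is often written with `self` instead of the class name; inside a class body, `self` is the class, so `def self.ncreated` is equivalent to `def Shape.ncreated`. A small sketch that also checks that instances do not respond to the class method:

```ruby
class Shape
  @@ncreated = 0

  def initialize
    @x = 0
    @y = 0
    @@ncreated += 1
  end

  # Equivalent to "def Shape.ncreated": self is Shape at this point.
  def self.ncreated
    @@ncreated
  end
end

s = Shape.new
t = Shape.new
count = Shape.ncreated
instance_responds = s.respond_to?(:ncreated)

puts count               # 2
puts instance_responds   # false
```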

Copyright (C) 2008, http://www.dabeaz.com 5- Commentary 45 • Class methods
are a "deep concept" • When you deﬁne a class, you're actually deﬁning two different kinds of objects • Instances of the class • The class itself • Although they're related, these objects are distinct from each other and handled separately (more in a minute)

Copyright (C) 2008, http://www.dabeaz.com 5- Class Extension 46 • Classes
are "open" in Ruby • After a class has been deﬁned, you can later open it and add new methods to it • Repeated use of class merely extends the previous deﬁnition with new methods

Copyright (C) 2008, http://www.dabeaz.com 5- Class Extension 47 • Example:
Here's a class class Circle <Shape def initialize(radius) @radius = radius end ... end • And some code that extends it c = Circle.new(4) # Create a circle class Circle # Add a new method def holler puts "I'm a happy shiny circle" end end c.holler # Print 'I'm a happy ...'

Copyright (C) 2008, http://www.dabeaz.com 5- Class Extension 48 • Changes
affect instances already created! c = Circle.new(4) d = Circle.new(5) puts c.area # Prints 50.2654824574367 puts d.area # Prints 78.5398163397448 class Circle def area (4/(5/4.0))*@radius**2 end end puts c.area # Prints 51.2 puts d.area # Prints 80.0

Copyright (C) 2008, http://www.dabeaz.com 5- Anonymous Classes 49 • Deﬁnes
new methods for a single instance c = Circle.new(4) d = Circle.new(4) # Circle c is moving to Indiana. Fix it class <<c def area 4/(5/4.0)*@radius**2 end end puts c.area # Prints 51.2 puts d.area # Prints 50.2654824574367
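
The per-instance method can also be written as `def c.area`, and Ruby can report which methods belong to one object only. A sketch using the built-in `singleton_methods`; the Circle class is repeated without Shape so that the example runs on its own.

```ruby
class Circle
  def initialize(radius)
    @radius = radius
  end

  def area
    Math::PI * @radius ** 2
  end
end

c = Circle.new(4)
d = Circle.new(4)

# Same effect as "class <<c ... end": area is redefined for c only.
def c.area
  4 / (5 / 4.0) * @radius ** 2
end

puts c.singleton_methods.inspect   # lists area
puts d.singleton_methods.inspect   # empty
puts c.area
puts d.area
```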

Copyright (C) 2008, http://www.dabeaz.com 5- Modules 50 • Ruby provides
a "module" mechanism module MoveLRUD # Methods for moving def left(dx) # left,right,up,down move(-dx,0) end def right(dx) move(dx,0) end def up(dy) move(0,-dy) end def down(dy) move(0,dy) end end • A module is a namespace • It can contain instance methods/class methods

Copyright (C) 2008, http://www.dabeaz.com 5- Modules 51 • Modules are
a collection of methods, but they do not define any kind of class or instance • Which makes them rather odd creatures irb(main):010:0> MoveLRUD.left(3) NoMethodError: undefined method `left' for MoveLRUD:Module from (irb):10 irb(main):011:0> • If you define instance methods in a module, there's no obvious way to use them

Copyright (C) 2008, http://www.dabeaz.com 5- Mixins 52 • A module
can be included into a class class Shape include MoveLRUD # Include as a mixin def move(dx,dy) @x += dx @y += dy end end • This takes all of the methods in the module and makes them part of the class as if they were deﬁned there • It "mixes in" the other methods

Copyright (C) 2008, http://www.dabeaz.com 5- Mixins 53 • A module
can also be mixed into an instance a = Shape.new b = Shape.new b.extend(MoveLRUD) b.left(4) b.up(3) a.left(4) # Error. Left not defined • Here, the module methods only work on the speciﬁc instance that was extended
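
To see what a mixin does to a class, you can ask the class for its ancestors. The sketch below is a trimmed version of the MoveLRUD example (only left and right are kept) and uses the built-in `include?` and `ancestors` methods; `attr_reader` is added here only so the result can be printed.

```ruby
module MoveLRUD
  def left(dx)
    move(-dx, 0)
  end

  def right(dx)
    move(dx, 0)
  end
end

class Shape
  include MoveLRUD          # mix the module into the class
  attr_reader :x, :y

  def initialize
    @x = 0
    @y = 0
  end

  def move(dx, dy)
    @x += dx
    @y += dy
  end
end

s = Shape.new
s.left(4)
puts s.x                               # -4
puts Shape.include?(MoveLRUD)          # true
puts Shape.ancestors.first(2).inspect  # [Shape, MoveLRUD]
```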

Copyright (C) 2008, http://www.dabeaz.com 5- Mixin Commentary 54 • With
mixins, you implement common functionality in one place (a module) • You then include it in a variety of different places over and over again to reuse it • Note: This is a slightly different concept than "inheritance"

Copyright (C) 2008, http://www.dabeaz.com 5- Access Control 55 • Methods
are normally public, meaning anyone can call them • Can also have protected and private methods class Foo private def bar ... end protected def spam ... end end

Copyright (C) 2008, http://www.dabeaz.com 5- Access Control 56 • Protected
• Method can be called by any method of the defining class or subclasses • May be invoked on other instances • Private • Method can only be called from methods of the defining class or its subclasses • And only on the current object (no explicit receiver)
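
A small sketch of these rules, reusing the Foo, bar and spam names from the previous slide. The helper methods `call_bar` and `same_spam?` are made up for illustration.

```ruby
class Foo
  def initialize(v)
    @v = v
  end

  def call_bar
    bar                      # private: fine, the receiver is the current object
  end

  def same_spam?(other)
    spam == other.spam       # protected: fine on another Foo instance
  end

  private

  def bar
    "bar #{@v}"
  end

  protected

  def spam
    @v
  end
end

f = Foo.new(1)
g = Foo.new(1)
puts f.call_bar          # bar 1
puts f.same_spam?(g)     # true

# Outside the class, both calls are rejected with NoMethodError.
errors = []
begin
  f.bar
rescue NoMethodError
  errors << :bar
end
begin
  f.spam
rescue NoMethodError
  errors << :spam
end
puts errors.inspect      # [:bar, :spam]
```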

Copyright (C) 2008, http://www.dabeaz.com 5- Object Implementation 57 • All
objects have the same representation • A set of instance variables • A reference to the class • Some flags (related to internals) [Diagram: object = flags | ivars | class]

Copyright (C) 2008, http://www.dabeaz.com 5- Instances and Classes 58
class Circle <Shape def initialize(radius) @radius = radius end end c = Circle.new(4) d = Circle.new(5) [Diagram: c has ivars { 'radius' => 4, ... } and d has ivars { 'radius' => 5, ... }; the class link of each points to Circle] • All instances link back to their class

Copyright (C) 2008, http://www.dabeaz.com 5- Class Implementation 59 • A
class is an object with additional information • A reference to the superclass • A list of methods [Diagram: class = flags | ivars | class | super | methods] • Important : A class is also an object

Copyright (C) 2008, http://www.dabeaz.com 5- Class Implementation 60
class Circle <Shape ... def area ... end def perimeter ... end ... end class Shape @@ncreated = 0 def move ... end end [Diagram: Circle with ivars { } and methods { 'area' => method, 'perimeter' => method, ... }; its super link points to Shape with ivars { 'ncreated' => 0 } and methods { 'move' => method, ... }; Shape's super link points to Object]

Copyright (C) 2008, http://www.dabeaz.com 5- Method Dispatch 61 • Instances
are linked to classes • Classes are linked to the superclass • This is the key to knowing how methods get dispatched to the appropriate deﬁnition • Essentially you just follow those links

Copyright (C) 2008, http://www.dabeaz.com 5- Method Dispatch 62
c = Circle.new(4) c.area c.move [Diagram: Circle with methods { 'area' => method, 'perimeter' => method }; its super link points to Shape with methods { 'move' => method }; Shape's super link points to Object] • Every method call involves a search of the class and all base classes (just follow super)

Copyright (C) 2008, http://www.dabeaz.com 5- Commentary 63 • The entire
object system is just a big tree [Diagram: instances (flags, ivars, class) link to classes (flags, ivars, class, super, methods)]

Copyright (C) 2008, http://www.dabeaz.com 5- Commentary 64 • Everything that
happens with objects is ultimately related to that tree structure • Inserting nodes into the tree • Creating various links between nodes

Copyright (C) 2008, http://www.dabeaz.com 5- Extending an Instance 65
• Start by creating a new instance c = Circle.new(4) [Diagram: instance c (flags, ivars, class) links to class Circle (flags, ivars, class, super, methods) with methods { 'area' => meth, 'perimeter' => meth }]

• Now, let's extend that instance by redefining area c = Circle.new(4) class <<c def area 4/(5/4.0)*@radius**2 end end • The new area method needs to be inserted somewhere in this structure

• A "virtual" anonymous class gets inserted into the class chain for c [Diagram: c links to <virtual> with methods { 'area' => meth }, which links on to Circle with methods { 'area' => meth, 'perimeter' => meth }]

Copyright (C) 2008, http://www.dabeaz.com 5- Mixins 68
• First define a module module Foo def bar puts 'Foo.bar' end end • A Module is just a collection of methods [Diagram: Foo (Module) with flags, ivars, class, super, methods { 'bar' => meth }]

Copyright (C) 2008, http://www.dabeaz.com 5- Mixins 69
• Now, start defining a class class Circle <Shape ... end [Diagram: Circle's super link points to Shape; Foo (Module) with methods { 'bar' => meth } stands apart]

• Include a module as a mixin class Circle <Shape include Foo ... end • The functionality of Foo needs to be added to Circle somehow

• Again, an anonymous class gets inserted into the class chain [Diagram: Circle, Shape, Foo (Module) with methods { 'bar' => meth }, and a new Mixin Proxy object (flags, ivars, class, super, methods) in the class chain]

Copyright (C) 2008, http://www.dabeaz.com 5- Classes are Objects 72 •
Deep thought : A class is also an object • If so, it must belong to some class! • It does - check it out class Circle ... end puts Circle.class # Prints 'Class' • The output says that "Circle" is a "Class"

Copyright (C) 2008, http://www.dabeaz.com 5- Class Objects 73
• Here is a picture [Diagram: the class link of Circle points to Class; Circle is a Class] • Notice the parallel to instances c = Circle.new(r) [Diagram: the class link of instance c points to Circle; c is a Circle]

Copyright (C) 2008, http://www.dabeaz.com 5- What's a Class? 74 •
What is this "Class"? • Well, a "Class" is just another object • Which is, well, also a Class irb(main):001:0> puts Class.class Class => nil irb(main):002:0> • Huh?!?!?!

Copyright (C) 2008, http://www.dabeaz.com 5- Taking the Red Pill 75
• Just sketch it out... [Diagram: the class link of Circle points to Class (Circle is a Class); the class link of Class points back to Class (Class is a Class)] • Clearly, something is going on here • Let's take a look at the superclasses...

Copyright (C) 2008, http://www.dabeaz.com 5- Classes are Modules 76 •
Inspect the superclass irb(main):001:0> puts Class.class Class => nil irb(main):002:0> puts Class.superclass Module => nil • Whoa, a class is a kind of Module • And a Module is a namespace • And in the last lecture we saw how you could implement objects using namespaces

Copyright (C) 2008, http://www.dabeaz.com 5- Classes and Modules 77
[Diagram: the class link of Circle points to Class (Circle is a Class); the class link of Class points to itself (Class is a Class); the super link of Class points to Module] • Classes are implemented on top of Modules

Copyright (C) 2008, http://www.dabeaz.com 5- Modules are Objects 78 •
Inspect the class and superclass irb(main):001:0> puts Module.class Class => nil irb(main):002:0> puts Module.superclass Object => nil • A module is an object like everything else • There is a class (Module) • Module inherits from Object

Copyright (C) 2008, http://www.dabeaz.com 5-79
[Diagram: the class link of Circle points to Class; super links run from Class to Module to Object] • Here, you are seeing inheritance, but it's all about classes. 1. A Circle is a Class 2. A Class is a Module 3. A Module is an Object

• A look at "Object"

    irb(main):001:0> puts Object.class
    Class
    => nil
    irb(main):002:0> puts Object.superclass
    nil

• An Object is described by a Class
• There are no parents to an Object
• That's the end of the line...

[Figure: object diagram of Circle, Class, Module, and Object, with the chain ending in nil.]

Object has no parents. So, it terminates the chain of linked objects.

[Figure: the same object diagram of Circle, Class, Module, Object, and nil, with Shape added.]

Let's add in some other classes.

How to make sense of it?

• Everything ultimately leads to Object
• That's because everything is an Object
• All objects are described by a class

How to make sense of it?

• There are always two different paths
• The "instance" path
• The "class" path
• Instance path : Instance methods
• Class path : Class methods
• The choice depends on the starting object

The Instance Path

    # c is a Circle instance
    c.area

Here, you start with an instance of Circle.

[Figure: object diagram of the instance c together with Circle, Shape, Object, and nil.]

The Class Path

    c = Circle.new(4)

Here, you start with the class itself (Circle).

[Figure: object diagram of Circle together with Class, Module, Object, and nil.]

Method Resolution

• Notice how the search process is highly uniform
• In fact, there is a simple algorithm for obj.meth

1. Follow the class link of obj
2. Look in the method table
3. If not found, follow the super link
4. Repeat steps 2-3 until you find the method
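The four steps above can be sketched in a few lines. This is only a toy model written in Python 3, not how the Ruby interpreter is implemented; the dictionary layout and the names make_class, make_instance, and find_method are hypothetical and exist only for this illustration.

```python
# Toy model: every object has a "class" link; every class has a "super"
# link and a method table. All names here are illustrative only.

def make_class(name, superclass=None, methods=None):
    return {"name": name, "super": superclass, "methods": methods or {}}

def make_instance(cls, **ivars):
    return {"class": cls, "ivars": ivars}

def find_method(obj, meth):
    cls = obj["class"]                  # 1. follow the class link of obj
    while cls is not None:
        if meth in cls["methods"]:      # 2. look in the method table
            return cls["methods"][meth]
        cls = cls["super"]              # 3. not found: follow the super link
    raise AttributeError(meth)          # fell off the end of the chain (nil)

Object = make_class("Object")
Shape = make_class("Shape", Object, {"move": lambda self, dx, dy: (dx, dy)})
Circle = make_class("Circle", Shape,
                    {"area": lambda self: 3.14159 * self["ivars"]["r"] ** 2})

c = make_instance(Circle, r=2)
area = find_method(c, "area")(c)          # found directly in Circle
moved = find_method(c, "move")(c, 1, 2)   # found one step up, in Shape
```

The same loop serves the "class path" as well: only the starting object changes.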

A Final Complexity

• Class methods
• Recall that class methods are methods that operate on classes, not instances

    class Shape < Object
      @@ncreated = 0
      def Shape.ncreated      # A class method
        @@ncreated
      end
    end

• These methods live along the class chain

    class Shape < Object
      @@ncreated = 0
      def Shape.ncreated
        @@ncreated
      end
    end

[Figure: object diagram of Shape, Shape' (virtual), Class, and Object, with a method table { 'ncreated' => meth }.]

Class methods live in a separate anonymous class that's inserted into the class chain (sometimes known as a "metaclass").

Final Comments

• Here are the key points
• Everything is an object
• All objects are described by a class
• Classes are objects
• Everything is linked together in a big tree/graph

Python

Objects in Python

• Python has always had "objects", but OOP was never the overriding design philosophy
• In fact, user-defined classes were one of the last features added to the language
• Recall that one motivation for Ruby was to address a perceived problem with Python OO

• Objects in Python are organized in a hierarchy

[Figure: hierarchy diagram with object at the top and int, float, str, list, and dict beneath it.]

• object is at the top
• However, the hierarchy is relatively flat
• Example: You don't see numbers grouped together under a class "Numeric"

Inspecting Objects

• All objects have a "type"

    >>> x = 37
    >>> x.__class__
    <type 'int'>
    >>>

• The type is the class to which an object belongs
• Finding the parent class (superclass)

    >>> int.__bases__
    (<type 'object'>,)
    >>>

Inspecting Objects

• Get a list of things that are defined (dir)

    >>> x = 37
    >>> dir(x)
    ['__abs__', '__add__', '__and__', '__class__', '__cmp__',
     '__coerce__', '__delattr__', '__div__', '__divmod__', '__doc__',
     '__float__', '__floordiv__', '__getattribute__', '__getnewargs__',
     '__hash__', '__hex__', '__index__', '__init__', '__int__',
     '__invert__', '__long__', '__lshift__', '__mod__', '__mul__',
     '__neg__', '__new__', '__nonzero__', '__oct__', '__or__',
     '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', ... ]
    >>>

The class statement

• Defines a new user-defined object

    import math        # needed for math.pi below

    class Circle(object):
        def __init__(self,radius):
            self.radius = radius
        def area(self):
            return math.pi*(self.radius**2)
        def perimeter(self):
            return 2*math.pi*self.radius

• A class is a collection of functions (methods)
• Nothing conceptually new here

Creating Instances

• The class serves as a "factory"

    >>> c = Circle(4.0)
    >>> c.area()
    50.26548245743669
    >>> c.radius
    4.0
    >>>

• Note : You don't call a special method like "new", you just use the class as a function

__init__ method

• Used to initialize objects
• Called whenever a new object is created

    >>> c = Circle(4.0)

    class Circle(object):
        def __init__(self,radius):     # self is the newly created object
            self.radius = radius

• __init__ is an example of a "special method"
• Has special meaning to the Python interpreter

Instance Variables

• Data stored within the object

    class Circle(object):
        def __init__(self,radius):
            self.radius = radius

• Inside class, referenced using self.attrname

    def area(self):
        return math.pi*(self.radius**2)

• Outside class, just access through the instance name

    >>> c = Circle(4.0)
    >>> c.radius
    4.0

Methods

• Functions applied to instances of an object

    class Circle(object):
        ...
        def area(self):
            return math.pi*self.radius**2

• The object is always passed as first argument

    >>> c.area()

    def area(self):
        ...

• By convention, called "self"

The name is unimportant. The object is always passed as the first argument. It is simply Python programming style to call this argument "self." C++ programmers might prefer to call it "this."

Python Differences

• You're already seeing some huge differences from the Ruby object system
• All instance variables are fully visible

    >>> c = Circle(4)
    >>> c.radius
    4
    >>> c.radius = 5
    >>>

• Explicit use of "self" to refer to the instance

    def area(self):
        return math.pi*self.radius**2

Inheritance

• It's fully supported
• List base classes when defining the class

    class Parent(object):
        ...

    class Derived(Parent):
        ...

• Bases specified in () after class name

Inheritance Example

• Shapes and Circles

    class Shape(object):
        def __init__(self):
            self.x = 0
            self.y = 0
        def move(self,dx,dy):
            self.x += dx
            self.y += dy

    class Circle(Shape):
        def __init__(self,radius):
            Shape.__init__(self)
            self.radius = radius
        def area(self):
            return math.pi*self.radius**2

Multiple Inheritance

• Classes in Python may have multiple bases

    class Foo(object):
        ...
    class Bar(object):
        ...
    class Spam(Foo,Bar):
        ...

• We have not seen this before
• Not allowed in Ruby, Smalltalk, Java, etc.
• There are some nasty issues (later)

Class Variables

• A class can have variables. Just define them in the class definition

    class Shape(object):
        numcreated = 0          # class variable
        def __init__(self):
            Shape.numcreated += 1
            self.x = 0.0
            self.y = 0.0

    >>> Shape.numcreated
    0
    >>> s = Shape()
    >>> Shape.numcreated
    1
    >>>

Class Methods

• Require special "decoration"

    class Shape(object):
        @classmethod
        def spam(cls):
            print "Hello. Your class is", cls

    class Circle(Shape):
        ...

    >>> Circle.spam()
    Hello. Your class is <class '__main__.Circle'>
    >>> Shape.spam()
    Hello. Your class is <class '__main__.Shape'>
    >>>

• Class methods receive the class itself as the first argument (classes are also objects)

An Oddity

• Class methods/variables and instance methods/variables are co-mingled

    class Shape(object):
        @classmethod
        def spam(cls):
            print "Hello. Your class is", cls

    class Circle(Shape):
        ...

    >>> c = Circle(4)
    >>> c.spam()
    Hello. Your class is <class '__main__.Circle'>
    >>> c.numcreated
    1
    >>>

• Class methods can be invoked via instance

Other Differences

• Classes aren't quite "open" in the same way as they are in Ruby
• If the same class definition appears more than once, the new definition replaces the old definition (but it does not affect existing instances)

Other Differences

• Example : Class redefinition

    class Foo(object):
        def bar(self):
            print "Hello World"

    a = Foo()

    class Foo(object):
        def bar(self):
            print "Hello Cruel World"

    b = Foo()
    a.bar()      # Prints "Hello World"
    b.bar()      # Prints "Hello Cruel World"

Class Extension

• You can add new methods to an existing class by just defining them outside and attaching them to the class object

    class Circle(object):
        def __init__(self,radius):
            self.radius = radius

    def area(c):
        return math.pi*c.radius**2

    Circle.area = area

That's it (Mostly)

• Believe it or not, that is about the extent of defining and using classes in Python
• A class is just a bunch of functions
• The functions are normally instance methods that receive the instance as the first parameter (self)
• They can optionally be defined as class methods that receive the class as the first parameter

That's it (Mostly)

• There is no access control (public, private, protected)
• No separate notion of "modules" and mixins

Commentary

• Most of these features are available, but they just take a different form
• Example : Mixins using multiple inheritance

    class Shape(object):
        def move(self,dx,dy):
            self.x += dx
            self.y += dy

    class MoveLRUD(object):
        def left(self,dx):
            self.move(-dx,0)
        def right(self,dx):
            self.move(dx,0)
        def up(self,dy):
            self.move(0,-dy)
        def down(self,dy):
            self.move(0,dy)

    class Circle(Shape,MoveLRUD):
        ...

Interlude

• Here is the Python view on objects...
• An instance is just a collection of stuff
• A class is just a collection of stuff
• A dictionary is just a collection of stuff
• Hey, I'll just use that!

Object Implementation

• Python's implementation of objects is mainly just a wrapper layer
• Objects and classes are just wrappers around dictionaries
• And methods are just wrappers around ordinary functions
• Let's go take a look...

Recall: Classes

• A class definition

    class Circle(Shape):
        def __init__(self,radius):
            Shape.__init__(self)
            self.radius = radius
        def area(self):
            return math.pi*self.radius**2
        def perimeter(self):
            return 2*math.pi*self.radius

• A class creates a special kind of object: a class object

    >>> Circle
    <class '__main__.Circle'>
    >>>

• What is this object?

Class Objects

• It's a wrapper around a dictionary

    >>> Circle.__dict__
    <dictproxy object at 0x54ff0>
    >>> Circle.__dict__.keys()
    ['__module__','area','perimeter','__dict__','__weakref__',
     '__doc__','__init__']
    >>> Circle.__dict__['area']
    <function area at 0x4d430>
    >>> Circle.__dict__['perimeter']
    <function perimeter at 0x4d4b0>

[Figure: the class object <class Circle>, whose .__dict__ is { 'area' : <function>, 'perimeter' : <function>, '__init__' : <function> }.]

Instances

• Instance data is also stored in a dictionary

    >>> c = Circle(4)
    >>> c.__dict__
    {'x' : 0,'radius' : 4, 'y' : 0 }
    >>>

• Dictionary holds attributes

    class Circle(Shape):
        def __init__(self,radius):
            Shape.__init__(self)
            self.radius = radius

[Figure: instance c of Circle, whose .__dict__ is { 'x' : 0, 'radius' : 4, 'y' : 0 }.]

Putting it Together

• We just need to connect the dots
• Each instance has a dictionary for its data
• Each class has a dictionary for its methods

Instances to Classes

• Instances hold a reference to their class

    >>> c = Circle(4)
    >>> c.__dict__
    {'x': 0,'radius': 4,'y': 0 }
    >>> c.__class__
    <class '__main__.Circle'>
    >>>

• __class__ attribute refers to class object

Classes to Superclasses

• Classes may inherit from other classes
• Example:

    class A(B,C):
        ...

• Bases stored as a tuple in class object

    >>> A.__bases__
    (<class '__main__.B'>,<class '__main__.C'>)
    >>>

• __bases__ is tuple of base class objects

Object Representation

[Figure: three instances, each with .__dict__ {attrs} and .__class__; a class with .__dict__ {methods} and .__bases__ (base1,base2,...); base classes, each with .__dict__ {methods} and .__bases__.]

Object Representation

• Key point: Everything stored in dictionaries
• Instances have dicts (instance data)
• Classes have dicts (methods, class attributes)
• Instances, classes, bases are linked together (__class__, __bases__)
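The links described above can be inspected directly. The following is a small sketch in Python 3 syntax (the slides use Python 2, where the reprs differ slightly); the Shape and Circle classes are pared-down stand-ins for the ones used in the slides.

```python
class Shape:
    def __init__(self):
        self.x = 0
        self.y = 0

class Circle(Shape):
    def __init__(self, radius):
        Shape.__init__(self)
        self.radius = radius
    def area(self):
        return 3.14159 * self.radius ** 2

c = Circle(4)

instance_data = c.__dict__      # per-instance attributes live here
owner = c.__class__             # link from the instance to its class
bases = Circle.__bases__        # link from the class to its base classes

# Names the class statement placed in the class dictionary
# (dunder entries such as __init__ and __module__ filtered out).
method_names = sorted(k for k in Circle.__dict__ if not k.startswith("__"))
```

Nothing about radius is stored in the class, and nothing about area is stored in the instance; the two dictionaries are connected only through __class__ and __bases__.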

Attribute Lookup

• Python has special operators for getting, setting, and deleting "attributes"

    obj.name              # Get an attribute value
    obj.name = value      # Set an attribute value
    del obj.name          # Delete an attribute

• To finish off the object system, you just have to define the behavior of these operators
• Connect it up to all of those dictionaries

Setting Attributes

• Setting an attribute just updates the local dictionary of the object

    >>> c = Circle(4)
    >>> c.__dict__
    {'x': 0, 'radius': 4,'y' : 0 }
    >>> c.radius = 5
    >>> c.color = "Blue"
    >>> c.__dict__
    { 'x': 0, 'radius': 5, 'y': 0, 'color':'Blue' }
    >>>

• Deleting an attribute just removes it from the dictionary

Setting Attributes

• Setting an attribute overrides any attributes set in class or bases

    >>> c = Circle(4)
    >>> c.area()
    50.2654824574367
    >>> c.area = "pretty big"
    >>> c.area()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: 'str' object is not callable
    >>> c.area
    'pretty big'
    >>>

• One way to create a mighty kerfuffle

Reading Attributes

• A more complicated problem
• Attribute may be supplied from many places
• Local dictionary
• Class object
• Base classes (inheritance)

Reading Attributes

• First check in local __dict__
• If not found, look in __dict__ of class
• If not found in class, look in base classes

    >>> c = Circle(...)
    >>> c.radius
    4
    >>> c.area()
    50.2654824574
    >>>

[Figure: (1) c.__dict__ is {'x' : 0, 'radius' : 4, ...}; (2) c.__class__ leads to Circle, whose .__dict__ is {'area': <func>, 'perimeter':<func>, '__init__':..}; (3) Circle.__bases__ leads to the base classes.]
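The three-step read described above can be written out by hand. This is a simplified sketch in Python 3, not the interpreter's actual algorithm: it walks __bases__ depth-first (real Python uses the method resolution order covered later) and it deliberately ignores the wrapping of methods, so it returns the raw function. The helper names lookup and class_lookup are hypothetical.

```python
def class_lookup(cls, name):
    # Step 2: the class's own dictionary
    if name in cls.__dict__:
        return cls.__dict__[name]
    # Step 3: each base class in turn (simplified depth-first search)
    for base in cls.__bases__:
        try:
            return class_lookup(base, name)
        except AttributeError:
            continue
    raise AttributeError(name)

def lookup(obj, name):
    # Step 1: the instance's local dictionary
    if name in obj.__dict__:
        return obj.__dict__[name]
    return class_lookup(obj.__class__, name)

class Shape:
    def move(self, dx, dy):
        return (dx, dy)

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return 3.14159 * self.radius ** 2

c = Circle(4)
radius = lookup(c, "radius")      # step 1: instance dictionary
area_func = lookup(c, "area")     # step 2: Circle's dictionary
move_func = lookup(c, "move")     # step 3: base class Shape

c.area = "pretty big"             # instance entry now shadows the method
shadowed = lookup(c, "area")
```

Because the instance dictionary is consulted first, the last two lines reproduce the shadowing shown on the "Setting Attributes" slide.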

A Subtle Issue

• Python uses a single dictionary to store everything associated with a class
• This dictionary contains both data (class variables) and methods
• So, you can't have instance data and methods with the same names (they'll conflict)
• However, there is a tricky bit with all of this...

Method Lookup

• If you look up data, you get the data
• If you look up a method, it's different
• You don't get the method function!

    >>> c = Circle(4)
    >>> c.radius
    4
    >>> c.area
    <bound method Circle.area of <__main__.Circle object at 0x6cb50>>
    >>>

• What in the heck is that?

Bound Methods

• Methods always get "wrapped"
• The returned object is a method that's waiting for you to call it...

    >>> c = Circle(4)
    >>> a = c.area
    >>> a                  # The method itself as an object
    <bound method Circle.area of <__main__.Circle object at 0x6cb50>>
    >>> a()                # Calls the method
    50.2654824574
    >>>

Bound Methods

• Normally you don't see it, but method calls are always a two-step process like this

    c.area()
      1. c.area : the . operator (attribute lookup) gives <bound method : area>
      2. ()     : the () operator (call) gives 50.2654824574

• Essentially, looking up a method is separate from calling the method

Bound Methods

• Underneath the covers

    >>> c = Circle(4)
    >>> a = c.area
    >>> a
    <bound method Circle.area of <__main__.Circle object at 0x6cb50>>
    >>> a.im_class         # The class
    <class '__main__.Circle'>
    >>> a.im_func          # The func
    <function area at 0x69370>
    >>> a.im_self          # Instance
    <__main__.Circle object at 0x6cb50>
    >>>

• What happens on call?

    >>> a.im_func(a.im_self)
    50.2654824574
    >>>
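For readers on Python 3, the same experiment works with different attribute names: bound methods expose __func__ and __self__ in place of im_func and im_self, and im_class no longer exists. A short sketch under that assumption:

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return math.pi * self.radius ** 2

c = Circle(4)
a = c.area                 # attribute lookup only; nothing is called yet

func = a.__func__          # the underlying function (im_func in Python 2)
inst = a.__self__          # the bound instance (im_self in Python 2)

by_hand = func(inst)       # what the call a() amounts to
normal = a()
```

Calling the function with the stored instance gives the same result as calling the bound method.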

Wrapping Concerns

• The fact that certain items pop out of a class with a wrapper slapped onto them should be somewhat disturbing
• Who or what is doing this wrapping?
• How does it fit into the rest of the object system?

Descriptors

• Class attribute wrapping is performed by defining "descriptor" objects
• A descriptor is an object that hooks into the attribute access on classes in Python
• Allows customized actions to be defined

Descriptors

• A Sample Descriptor Object

    class Descriptor(object):
        def __get__(self,instance,cls):
            print "get", instance,cls
        def __set__(self,instance,value):
            print "set", instance, value
        def __delete__(self,instance):
            print "delete", instance

• It just has to have __get__, __set__, etc.
• Placing it into a class definition

    class Foo(object):
        bar = Descriptor()
        ...

Descriptors

• How it works

    >>> f = Foo()
    >>> f.bar
    get <__main__.Foo object at 0x5a810> <class '__main__.Foo'>
    >>> f.bar = 4
    set <__main__.Foo object at 0x5a810> 4
    >>> del f.bar
    delete <__main__.Foo object at 0x5a810>
    >>>

• Attribute access automatically invokes __get__(), __set__(), and __delete__()

Descriptors

• Descriptors are used for method wrapping

    class BoundMethod(object):
        def __init__(self,func,cls,instance):
            self.im_func = func
            self.im_class = cls
            self.im_self = instance
        def __call__(self,*args,**kwargs):
            return self.im_func(self.im_self, *args,**kwargs)

    class InstanceMethodDescriptor(object):
        def __init__(self,func):
            self.func = func
        def __get__(self,instance,cls):
            return BoundMethod(self.func,cls,instance)

Descriptors

• Example of how it is put together

    def bar_impl(self):
        print "I'm an instance method bar"

    class Foo(object):
        bar = InstanceMethodDescriptor(bar_impl)

• Example use:

    >>> f = Foo()
    >>> f.bar
    <__main__.BoundMethod object at 0x6cbb0>
    >>> f.bar()
    I'm an instance method bar
    >>>
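The hand-built descriptor above mirrors what ordinary functions already do: a plain function object has its own __get__ method, and that is what produces the bound method on attribute access. A minimal Python 3 sketch that invokes it by hand (the Circle class is a stand-in for the one in the slides):

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return 3.14159 * self.radius ** 2

c = Circle(2)

raw = Circle.__dict__["area"]     # plain function stored in the class dict
bound = raw.__get__(c, Circle)    # the descriptor step, performed manually

manual_result = bound()           # same as calling c.area()
```

In other words, c.area behaves as if it were Circle.__dict__['area'].__get__(c, Circle).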

Descriptor Commentary

• If this makes sense, you're at 11 with Python
• Much of this is tucked away behind the scenes
• It's critical to how Python works
• But unknown to most Python programmers

Multiple Inheritance

    class A(object): pass
    class B(object): pass
    class C(A,B): pass

[Figure: class diagram of object, A, B, and C.]

• Base tuple contains multiple entries
• For example:

    >>> C.__bases__
    (<class '__main__.A'>, <class '__main__.B'>)
    >>>

Multiple Inheritance

• Attribute lookup looks in base classes
• However, complex hierarchies make this much more tricky

    class A(object):
        def bar(self): pass
        def spam(self): pass

    class B(object):
        def spam(self): pass

    class C(A,B): pass

    >>> c = C()
    >>> c.spam()        # Which spam()???

Multiple Inheritance

• Lookup rules
• Class is always checked first
• Then bases are checked in order listed

    class A(object): ...
    class B(object): ...
    class C(A,B): ...

    >>> c = C()
    >>> c.spam()        # Search order: C, A, B

Multiple Inheritance

• Consider a more complex hierarchy

    class A(object): pass
    class B(object): pass
    class C(A,B): pass
    class D(B): pass
    class E(C,D): pass

[Figure: class diagram of object, A, B, C, D, and E.]

• What happens here?

    >>> e = E()
    >>> e.x             # Attribute access

Multiple Inheritance

• Search order is based on a sort of bases

[Figure: class diagram of object, A, B, C, D, and E.]

    >>> e = E()
    >>> e.x

• Search rules

    Check E first
    C before D : class E(C,D)
    C before A : class C(A,B)
    C before B : class C(A,B)
    D before B : class D(B)
    A before B : class C(A,B)
    object last

• Can all of these be satisfied?
• Answer: Yes. E, C, A, D, B, object

Multiple Inheritance

• Method resolution order (MRO)
• __mro__ attribute contains order in which classes are searched

    >>> E.__mro__
    (<class '__main__.E'>, <class '__main__.C'>,
     <class '__main__.A'>, <class '__main__.D'>,
     <class '__main__.B'>, <type 'object'>)

• Determination of MRO is rather complex
• Beyond scope of this talk
• "C3 Linearization Algorithm"
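Although the slides leave the algorithm out of scope, a compact sketch may help show that the ordering rules are mechanical. The following Python 3 code is one common way to write C3 linearization; it is an illustration under that assumption, not the interpreter's own source, and the helper names merge and c3 are hypothetical.

```python
def merge(seqs):
    # Repeatedly pick the first head that does not appear in the tail of
    # any remaining list; that guarantees every "X before Y" rule holds.
    result = []
    seqs = [list(s) for s in seqs if s]
    while seqs:
        for seq in seqs:
            head = seq[0]
            if not any(head in s[1:] for s in seqs):
                break
        else:
            raise TypeError("Cannot create a consistent MRO")
        result.append(head)
        seqs = [[c for c in s if c is not head] for s in seqs]
        seqs = [s for s in seqs if s]
    return result

def c3(cls):
    # Linearization = the class itself, then the merge of its bases'
    # linearizations and the list of bases (which preserves their order).
    bases = list(cls.__bases__)
    return [cls] + merge([c3(b) for b in bases] + [bases])

class A: pass
class B: pass
class C(A, B): pass
class D(B): pass
class E(C, D): pass

order = [k.__name__ for k in c3(E)]
```

For the hierarchy from the slides this reproduces the order E, C, A, D, B, object, matching E.__mro__.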

Multiple Inheritance

• Can define classes that are rejected!
• Example:

    class A(object): pass
    class B(object): pass
    class C(A,B): pass
    class D(B,C): pass

    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: Error when calling the metaclass bases
        Cannot create a consistent method resolution
    order (MRO) for bases B, C

[Figure: class diagram of object, A, B, C, and D.]

• Reason:

    class D(B,C) --> B before C
    class C(A,B) --> C before B (B is base of C)
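The rejection can be observed without crashing a script by building the class with type() and catching the error. This is a Python 3 sketch; the exact wording of the message varies between Python versions, so the code only records that a TypeError occurred.

```python
class A: pass
class B: pass
class C(A, B): pass

try:
    # Equivalent to writing: class D(B, C): pass
    D = type("D", (B, C), {})
    rejected = False
except TypeError as exc:
    rejected = True
    message = str(exc)    # mentions the inconsistent MRO for bases B, C
```

Swapping the bases to (C, B) satisfies both ordering rules, and the class is created normally.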

Commentary

• Objects in Python are much more exposed
• No notion of private data
• Implementation is completely visible
• Again, it's really just a layer that's been wrapped around dictionaries
• However, over time, various "tweaks" have shown up in the language

Private Attributes

• Any attribute with leading __ is "private"

    class Foo(object):
        def __init__(self):
            self.__x = 0

• Example

    >>> f = Foo()
    >>> f.__x
    AttributeError: 'Foo' object has no attribute '__x'
    >>>

• This is really just a name mangling trick

    >>> f = Foo()
    >>> f._Foo__x
    0
    >>>
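Looking at the instance dictionary makes the mangling visible. A short Python 3 sketch:

```python
class Foo:
    def __init__(self):
        self.__x = 0          # stored under the mangled name _Foo__x

f = Foo()

stored_names = list(f.__dict__)     # what actually went into the dictionary
visible = hasattr(f, "__x")         # the unmangled name is not there
value = f._Foo__x                   # the mangled name works from anywhere
```

The attribute is ordinary instance data; only the name written inside the class body was rewritten.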

__slots__ Attribute

• You can restrict the set of attribute names

    class Foo(object):
        __slots__ = ['x','y']
        ...

• Produces errors for other attributes

    >>> f = Foo()
    >>> f.x = 3
    >>> f.y = 20
    >>> f.z = 1
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'Foo' object has no attribute 'z'

• Prevents errors, restricts usage of objects
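One consequence worth seeing directly is that instances of a class with __slots__ (and no base class that supplies a dictionary) have no per-instance __dict__ at all. A brief Python 3 sketch:

```python
class Foo:
    __slots__ = ['x', 'y']

f = Foo()
f.x = 3
f.y = 20

try:
    f.z = 1
    blocked = False
except AttributeError:
    blocked = True               # 'z' is not one of the declared slots

has_dict = hasattr(f, "__dict__")    # no instance dictionary exists
```

So __slots__ is an exception to the "everything is stored in dictionaries" picture for instance data.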

Properties

• Consider a class with some accessor funcs

    class Foo(object):
        def __init__(self,name):
            self.__name = name
        def getName(self):
            return self.__name
        def setName(self,name):
            if not isinstance(name,str):
                raise TypeError, "Expected a string"
            self.__name = name

• Property maps accessor funcs to attribute

    class Foo(object):
        ...
        name = property(getName,setName)
        ...

Properties

• Example:

    >>> f = Foo("Elwood")
    >>> f.getName()
    'Elwood'
    >>> f.name = 'Jake'
    >>> f.getName()
    'Jake'
    >>> f.name = 45             # Notice attribute assignment is caught
    TypeError: Expected a string
    >>>

• Properties would be the closest equivalent to how Ruby deals with attributes (via methods)
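The same class can be written in Python 3 with the decorator form of property, which is equivalent to the property(getName,setName) call on the slide. This is a sketch of that spelling, not a change to the slide's design; the accessor method names are dropped because the decorators make them unnecessary.

```python
class Foo:
    def __init__(self, name):
        self.__name = name

    @property
    def name(self):                      # plays the role of getName
        return self.__name

    @name.setter
    def name(self, value):               # plays the role of setName
        if not isinstance(value, str):
            raise TypeError("Expected a string")
        self.__name = value

f = Foo("Elwood")
f.name = "Jake"                          # routed through the setter

try:
    f.name = 45
    caught = False
except TypeError:
    caught = True                        # assignment was intercepted
```

A property is itself a descriptor stored in the class dictionary, which is why assignment to f.name never reaches the instance dictionary.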

Attribute Access

• User-defined classes may redefine attribute access entirely
• In other words, you can redefine (.)
• Set of special methods for setting, deleting, and getting attributes

__getattribute__()

• __getattribute__(self,name)
• Called every time an attribute is read
• Default behavior looks at instance dict
• Then it checks the class dict
• Then it checks base classes (inheritance)
• If that fails, __getattr__(self,name) method is invoked

__getattr__() method

• __getattr__(self,name)
• A failsafe method. Called if an attribute can't be found using the standard mechanism
• Default behavior is to raise AttributeError

__setattr__() method

• __setattr__(self,name,value)
• Called every time an attribute is set
• Default behavior is to store value in local dictionary of self

Example:

    class Circle(Shape):
        def __init__(self,radius):
            self.radius = radius
        def __getattr__(self,name):
            if name == 'area':
                return math.pi*self.radius**2
            elif name == 'perimeter':
                return 2*math.pi*self.radius
            else:
                # Shape does not define __getattr__, so report
                # the failure directly
                raise AttributeError(name)

    >>> c = Circle(4)
    >>> c.radius
    4
    >>> c.area
    50.2654824574
    >>> c.perimeter
    25.132741228718345
    >>>

Classes as Objects

• Like Ruby, Python also has the concept of classes as objects
• However, there is a huge twist to it
• Python lets you redefine what a class is!
• Let's go take a look...

Overview

• When you define a class, you get an "object"

    class Circle(Shape):
        def __init__(self,radius):
            Shape.__init__(self)
            self.radius = radius
        def area(self):
            return math.pi*self.radius**2
        def perimeter(self):
            return 2*math.pi*self.radius

    >>> Circle
    <class '__main__.Circle'>

• A "class object"

Classes as Objects

• Classes are instances of "types"

    >>> class Circle(Shape): pass
    >>> type(Circle)
    <type 'type'>
    >>> isinstance(Circle,type)
    True
    >>>

Recall: type() tells you the type of an object. Here we're using it on a class itself.

• Here, Python is following the convention that you see in C++/Java
• Classes define types

Creating Types

• So, class definitions create new types.
• However, a type is just a class

    class type(object):
        def __init__(self, *args, **kwargs):
            ...

    >>> type
    <type 'type'>
    >>>

• It's a class that creates new "types"
• This is something known as a "metaclass"

What is a class?

• Consider a class:

    class Circle(Shape):
        def __init__(self,radius):
            Shape.__init__(self)
            self.radius = radius
        def area(self):
            return math.pi*self.radius**2
        def perimeter(self):
            return 2*math.pi*self.radius

• What are its components?
• Name ("Circle")
• Base classes (Shape)
• Functions (__init__, area, perimeter)

Creating a Class

• You can create a class without using the class statement (just assemble the pieces)

    def __init__(self,radius):
        Shape.__init__(self)
        self.radius = radius

    def area(self):
        return math.pi*self.radius**2

    def perimeter(self):
        return 2*math.pi*self.radius

    >>> methods = {
    ...     '__init__' : __init__,
    ...     'area' : area,
    ...     'perimeter' : perimeter }
    ...
    >>> Circle = type("Circle",(Shape,),methods)
    >>> Circle
    <class '__main__.Circle'>
    >>>

Class Definition

• What happens during class definition?

    class Circle(Shape):
        def __init__(self,radius):
            Shape.__init__(self)
            self.radius = radius
        def area(self):
            return math.pi*self.radius**2
        def perimeter(self):
            return 2*math.pi*self.radius

• Step 1: Body of class is extracted (into a string)

    body = """
    def __init__(self,radius):
        Shape.__init__(self)
        self.radius = radius
    def area(self):
        return math.pi*self.radius**2
    def perimeter(self):
        return 2*math.pi*self.radius
    """

Class Definition

• Step 2: Body is exec'd in its own dictionary

    __dict__ = { }
    exec body in globals(), __dict__

• The statements in the body execute
• Afterwards, __dict__ is populated

    >>> __dict__
    {'__init__' : <function __init__ at 0x4da10>,
     'area' : <function area at 0x4dd70>,
     'perimeter': <function perimeter at 0x4dea0>,}
    >>>

Class Definition

• Step 3: Class is constructed from its name, base classes, and the dictionary

    >>> Circle = type("Circle",(Shape,),__dict__)
    >>> Circle
    <class '__main__.Circle'>
    >>> c = Circle(4)
    >>> c.area()
    50.2654824574
    >>>

• type(name, bases, dict) constructs a class object
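The three steps can be run end to end. The sketch below uses Python 3, where exec is a function rather than a statement; the names env and namespace are hypothetical, and the body is executed against an explicit globals dictionary so the example does not depend on where it is run.

```python
import math

class Shape:
    def __init__(self):
        self.x = 0
        self.y = 0

# Step 1: the class body as a string
body = """
def __init__(self, radius):
    Shape.__init__(self)
    self.radius = radius

def area(self):
    return math.pi * self.radius ** 2
"""

# Step 2: execute the body; definitions land in `namespace`,
# while free names (math, Shape) are resolved through `env`
env = {"math": math, "Shape": Shape}
namespace = {}
exec(body, env, namespace)

# Step 3: build the class from its name, bases, and dictionary
Circle = type("Circle", (Shape,), namespace)

c = Circle(4)
result = c.area()
```

The resulting Circle behaves like one created with an ordinary class statement.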

The Metaclass Hook

• Python provides a hook that allows you to intercept the class creation step
• Using this, you can feed the "class" into something other than "type"
• In other words, you could come up with something very different than a normal class

Metaclass Selection

• __metaclass__ attribute
• Sets the metaclass that's used for construction
• May be a class attribute or a global variable

As a class attribute:

    class Foo:
        __metaclass__ = type
        def bar(self):
            print "Foo.bar"

As a global variable:

    __metaclass__ = type

    class Foo:
        ...
    class Bar:
        ...

New Metaclasses

• By changing the metaclass hook, you can create your own magic types
• For example, inherit from type and tweak it

Creating a Metaclass

• Usually, you inherit from type and redefine __new__

    class mytype(type):
        def __new__(cls,name,bases,__dict__):
            print "Creating class : ", name
            print "Base classes : ", bases
            print "Attributes : ", __dict__.keys()
            return type.__new__(cls,name,bases,__dict__)

• Then you define objects that hook to it

    class myobject:
        __metaclass__ = mytype

Metaclass Applications

• __new__ method provides class name, base classes, and dictionary prior to class creation
• Can inspect this information
• Can modify this information
• If you know what you are doing, can be used for a variety of useful/diabolical purposes
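In Python 3 the __metaclass__ attribute is no longer consulted; the metaclass is passed as a keyword in the class header instead. The sketch below shows the same idea in that spelling. The created list is a hypothetical stand-in for the print statements on the earlier slide.

```python
created = []      # records (name, bases) for each class the metaclass builds

class mytype(type):
    def __new__(meta, name, bases, namespace):
        created.append((name, bases))      # inspect before the class exists
        return type.__new__(meta, name, bases, namespace)

class myobject(metaclass=mytype):          # Python 3 spelling of the hook
    pass

class Spam(myobject):                      # subclasses inherit the metaclass
    def bar(self):
        pass
```

Both class statements pass through mytype.__new__, so a framework can inspect or rewrite the name, bases, and dictionary of every class derived from myobject.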

Commentary

• Metaclasses are probably the most advanced and misunderstood part of Python
• However, they are used widely by framework developers
• Can be used to perform very interesting things with objects
• Where do you go after reaching level 11? Metaclasses.

Interlude

• The Python implementation of objects is based on a simple idea (just use dictionaries)
• However, there are some subtle complications of that approach.
• The co-mingling of data and functions means that you have to play some games with wrappers (descriptors) to get it to work
• Otherwise, it's similar to what we saw before

Interlude

• The Python approach has influenced others
• Ruby : No way!
• Perl : Let's do that!
• Javascript : Objects, dictionaries, what's the difference?

Perl

Perl Objects

• Perl has support for OO programming
• It is generally acknowledged that the whole idea for it was taken straight out of Python.
• Guido (Python) and Larry Wall (Perl) had previously interacted at conferences
• And Perl already had a dictionary type (Hash)

Instance Variables

• An object has to store instance variables
• Let's just put them in a hash

    sub new {
        my $radius = shift;
        my $instance_data = {
            "radius" => $radius
        };
        return $instance_data;
    }

• Hey, Python used a dictionary after all...

Methods

• Let's write functions that use the hash

    sub area {
        my $self = shift;
        # $self is a hash reference, so dereference with ->
        return $PI*$self->{'radius'}**2;
    }
    sub perimeter {
        my $self = shift;
        return 2*$PI*$self->{'radius'};
    }

• These are just normal functions

    my $c = new(4);
    print(area($c),"\n");

Packages

• Perl has packages, which are namespaces

    package Circle;

    sub new {
        my $radius = shift;
        my $instance_data = {
            "radius" => $radius
        };
        return $instance_data;
    }
    sub area {
        my $self = shift;
        return $PI*$self->{'radius'}**2;
    }
    sub perimeter {
        my $self = shift;
        return 2*$PI*$self->{'radius'};
    }

Packages

• With namespaces, we're real close...

    $c = Circle::new(4);
    print(Circle::area($c),"\n");
    ...

• All of the methods are packaged together
• Similar to a class
• Recall from last lecture : This was one way that classes came about

Copyright (C) 2008, http://www.dabeaz.com 5- Blessing Things • Perl can
"bless" data into a package 181 package Circle; sub new { my $radius = shift; my $instance_data = { "radius" => $radius }; bless $instance_data,"Circle"; return $instance_data; } • This sets an attribute on the hash to point to the package name supplied • Aha! So that's a link to the class (the package)

Copyright (C) 2008, http://www.dabeaz.com 5- Blessing Things • Using the
"blessed" object 182 $c = Circle::new(4); print($c->area(),"\n"); print($c->perimeter(),"\n"); • This gives us the -> syntax for methods

Copyright (C) 2008, http://www.dabeaz.com 5- Patching up Constructors • A
more OO-syntax: 183 $c = Circle->new(4); print($c->area(),"\n"); print($c->perimeter(),"\n"); • Requires a slight change to the new function sub new { my ($pkg,$radius) = @_; # Get name and argument my $instance_data = { "radius" => $radius }; bless $instance_data, $pkg; return $instance_data; }

Copyright (C) 2008, http://www.dabeaz.com 5- Commentary • We now have
basic "objects" • Hash tables tied to a package with methods • Some convenient syntax (->, ->new) • Next : Inheritance 184

Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance • Just have a
special variable with base classes 185 package Shape; sub new { my $pkg = shift; my $instance = {"x" => 0, "y" => 0 }; bless $instance,$pkg; return $instance; } sub move { my ($self,$dx,$dy) = @_; $self->{'x'} += $dx; $self->{'y'} += $dy; } package Circle; @ISA = ("Shape"); # Inherit from Shape

Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance • Initializing the base
186 package Circle; @ISA = ("Shape"); # Inherit from Shape sub new { my ($pkg,$radius) = @_; my $instance = $pkg->SUPER::new(); $instance->{'radius'} = $radius; return $instance; } ...

Copyright (C) 2008, http://www.dabeaz.com 5- Commentary • There are a
few more details • Basically, you're just linking hash tables and packages together • Hash table is the instance data • Package is the class • Variables in the package set up inheritance 187
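The "hash table plus package" idea can be imitated in a few lines of Python. This is only a rough sketch under stated assumptions, not how Perl is implemented: `PACKAGES`, `bless`, `find_method`, and `call` are hypothetical names made up here, and the package link is stored as an ordinary `"__pkg__"` key, whereas Perl attaches it to the data behind the scenes. (Python 3 syntax; the slides use Python 2.)

```python
import math

# Hypothetical stand-ins for Perl packages: each "package" is a dict holding
# its functions plus an optional ISA list naming its base packages.
PACKAGES = {}

def bless(data, pkg):
    """Tag a plain dict with the name of its package (loosely like bless)."""
    data["__pkg__"] = pkg
    return data

def find_method(pkg, name):
    """Look in the package first, then depth-first through its ISA list."""
    namespace = PACKAGES[pkg]
    if name in namespace:
        return namespace[name]
    for base in namespace.get("ISA", []):
        found = find_method(base, name)
        if found is not None:
            return found
    return None

def call(obj, name, *args):
    """Rough equivalent of $obj->name(args)."""
    method = find_method(obj["__pkg__"], name)
    if method is None:
        raise AttributeError(name)
    return method(obj, *args)

def shape_new(pkg):
    return bless({"x": 0, "y": 0}, pkg)

def shape_move(self, dx, dy):
    self["x"] += dx
    self["y"] += dy

def circle_new(pkg, radius):
    self = shape_new(pkg)          # plays the role of $pkg->SUPER::new()
    self["radius"] = radius
    return self

def circle_area(self):
    return math.pi * self["radius"] ** 2

PACKAGES["Shape"] = {"new": shape_new, "move": shape_move}
PACKAGES["Circle"] = {"ISA": ["Shape"], "new": circle_new, "area": circle_area}

c = circle_new("Circle", 4)
call(c, "move", 2, 3)              # not in Circle, so it is found in Shape via ISA
```

The instance stays a plain dictionary throughout; everything "object-like" comes from the link to the package and the lookup through `ISA`.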

Copyright (C) 2008, http://www.dabeaz.com 5- Commentary • The Perl object
system is actually more ﬂexible than you might imagine • For example, you don't technically have to use a Hash object to store data • You can implement objects in different ways • Many other customization features 188

Javascript

Copyright (C) 2008, http://www.dabeaz.com 5- Javascript Objects • Javascript doesn't
really have an OO system based on classes per se. • Instead, it just merges arrays and objects together • Essentially : An associative array is an object. An object is an associative array. 190

Copyright (C) 2008, http://www.dabeaz.com 5- Creating an "Object" • Here's
some instance data 191 var c = { 'radius' : 4, 'x' : 0, 'y' : 0 }; • Once you've done that, you can access the data in two different ways document.writeln(c['radius']); document.writeln(c.radius); • The (.) operator is just an array lookup

Copyright (C) 2008, http://www.dabeaz.com 5- Deﬁne a Method 192 var
c = { 'radius' : 4, 'x' : 0, 'y' : 0 }; c.area = function() { return 3.1415926*this.radius*this.radius; } • Just attach a function to an array • Call the method a = c.area(); • Inside the function, 'this' refers to the array

Copyright (C) 2008, http://www.dabeaz.com 5- Constructor Functions 193 function Circle(radius)
{ this.radius = radius; } c = new Circle(4); • Any function can be a "constructor" • If you call a function like this, 'this' is already set up to point to an empty array • You just place values into it

Copyright (C) 2008, http://www.dabeaz.com 5- A Simple Object 194 function
Circle(radius) { this.radius = radius; this.area = function() { return PI*this.radius*this.radius; } this.perimeter = function() { return 2*PI*this.radius; } } c = new Circle(4); a = c.area(); p = c.perimeter(); • Just write a function and deﬁne methods

Copyright (C) 2008, http://www.dabeaz.com 5- A Problem 195 • One
problem with this approach • Methods get deﬁned and stored in every single instance function Circle(radius) { this.radius = radius; this.area = function() { return PI*this.radius*this.radius; } this.perimeter = function() { return 2*PI*this.radius; } } • Needless to say, that isn't very efﬁcient

Copyright (C) 2008, http://www.dabeaz.com 5- Prototype Objects 196 function Circle(radius)
{ this.radius = radius; } Circle.prototype.area = function() { return PI*this.radius*this.radius; } Circle.prototype.perimeter = function() { return 2*PI*this.radius; } • A function has a hidden "prototype" attached • A prototype is just another object (array)

Copyright (C) 2008, http://www.dabeaz.com 5- Prototype Objects 197 c =
new Circle(4); r = c.radius; // From array c a = c.area(); // From Circle.prototype • If a function has a prototype attached, a link to the prototype gets carried along with any object that gets created • Attribute lookup will go to the prototype if it can't be found in the array itself [Figure: the object { 'radius' : 4 } holds a link to Circle.prototype, which is { 'area' : <func>, 'perimeter' : <func> }]

Copyright (C) 2008, http://www.dabeaz.com 5- Classes and Prototypes 198 •
Prototypes look a lot like a class • Every object has its own data • But, each object is also linked to a prototype that can supply values as a fallback
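The fallback lookup described above can be modeled in Python. This is a sketch under assumptions, not a description of any real Javascript engine: the `Obj` class, its `get`/`send` methods, and `new_circle` are hypothetical names invented for illustration. (Python 3 syntax.)

```python
import math

class Obj:
    """Hypothetical model of a Javascript object: own slots plus a prototype link."""
    def __init__(self, proto=None, **slots):
        self.slots = dict(slots)
        self.proto = proto

    def get(self, name):
        # Walk the prototype chain until the name is found.
        obj = self
        while obj is not None:
            if name in obj.slots:
                return obj.slots[name]
            obj = obj.proto
        # Javascript would yield undefined here; this sketch raises instead.
        raise KeyError(name)

    def send(self, name, *args):
        # Invoke a function-valued slot with this object playing the role of 'this'.
        return self.get(name)(self, *args)

circle_prototype = Obj(
    area=lambda this: math.pi * this.get("radius") ** 2,
    perimeter=lambda this: 2 * math.pi * this.get("radius"),
)

def new_circle(radius):
    # Stand-in for `new Circle(radius)`: fresh slots, shared prototype.
    return Obj(proto=circle_prototype, radius=radius)

c = new_circle(4)
d = new_circle(1)
```

Each circle stores only its own `radius`; `area` and `perimeter` live once in the shared prototype, which is the efficiency point made a few slides back.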

Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance 199 • Since Javascript
doesn't really have classes, there is no "class-based" inheritance • However, you can play funny games with linking prototypes together • This gets rather ugly in a hurry

Copyright (C) 2008, http://www.dabeaz.com 5- Prototype Inheritance 200 function Shape()
{ this.x = 0; this.y = 0; } Shape.prototype.move = function(dx,dy) { this.x += dx; this.y += dy; } function Circle(radius) { Shape.call(this); this.radius = radius; } Circle.prototype = new Shape(); delete Circle.prototype.x; delete Circle.prototype.y; Circle.prototype.constructor = Circle; Circle.prototype.area = function() { return PI*this.radius*this.radius; } We create a Shape and use it as the prototype. However, we have to patch it up a bit.

Copyright (C) 2008, http://www.dabeaz.com 5- Borrowing Methods 201 function Shape()
{ this.x = 0; this.y = 0; } Shape.prototype.move = function(dx,dy) { this.x += dx; this.y += dy; } function Circle(radius) { Shape.call(this); this.radius = radius; } for (m in Shape.prototype) { if (typeof Shape.prototype[m] != "function") continue; Circle.prototype[m] = Shape.prototype[m]; } Circle.prototype.area = function() { return PI*this.radius*this.radius; } Here we're just copying functions from one prototype to another

Copyright (C) 2008, http://www.dabeaz.com 5- Commentary 202 • Javascript is
probably the logical extension of using hash tables/arrays to represent objects • It essentially just merges them together • Functions are set up to receive the array as "this" if they're invoked through an array

Copyright (C) 2008, http://www.dabeaz.com 5- Big Picture 204 • We
have taken a very detailed tour of how objects work in a variety of languages • There were some common themes • Covered many subtle differences between implementations

Copyright (C) 2008, http://www.dabeaz.com 5- Wrap-up 205 • Your brain
probably hurts by now • Next time, we'll look at some common design patterns related to using objects • Will shift gears into some other topics

Copyright (C) 2008, http://www.dabeaz.com 6- Files, I/O, Processes and Text
Processing Section 6 2

Copyright (C) 2008, http://www.dabeaz.com 6- Overview 3 • Files, File
systems, and I/O • Processes and subprocesses • Text parsing and pattern matching • Inside regular expressions

The I/O Problem

Copyright (C) 2008, http://www.dabeaz.com 6- Big Applications 5 • Let's
be honest, most "serious" computing applications tend to be written in C, C++, Java, or some kind of "compiled" language • It's partly for performance (a C program may be 100x faster than an equivalent script) • Also for extra safety. The compiler has strict rules and performs all kinds of program checking (to catch errors before you run)

Copyright (C) 2008, http://www.dabeaz.com 6- Reading/Writing Data 6 • Large
applications don't exist in total isolation • They always have to read/write data • The data may arrive in many ways (ﬁles, pipes, network, etc.) • And many possible formats

Copyright (C) 2008, http://www.dabeaz.com 6- Reality 7 • Real programmers
don't use one application for everything (well, let's exclude emacs) • You solve problems by using many different applications for different kinds of tasks • You use the best tool for the job • Much of day-to-day work is involved in simply moving data around between applications

Copyright (C) 2008, http://www.dabeaz.com 6- Example 8
[Figure: separate applications (Modeling Application, Imaging, Analysis, Database, WWW, Data Acquisition) exchanging files, images, parameters, and web pages]

Copyright (C) 2008, http://www.dabeaz.com 6- Example 9 • In that
picture, each component is a completely separate application • May be written in different languages • Developed completely independently • May be legacy code that can't be replaced

Copyright (C) 2008, http://www.dabeaz.com 6- Personal Experience 10 • When
I started working with dynamic languages in 1995, I was a programmer on a large scientiﬁc computing project • About 80% of our time was spent futzing around with data ﬁles (moving them around, converting them, making them work with other programs, etc.) • It was a huge pain.

Copyright (C) 2008, http://www.dabeaz.com 6- Dynamic Languages 11
[Figure: the same applications (Modeling Application, Imaging, Analysis, Database, Data Acquisition) with their files and parameters, now with a Python script added to the picture]

Copyright (C) 2008, http://www.dabeaz.com 6- Why Dynamic Languages? 12 •
Very easy to develop and reconﬁgure • Can handle a huge variety of data formats • You're taking a problem which is inherently messy and solving with a language that is adept at solving messy problems.

Copyright (C) 2008, http://www.dabeaz.com 6- Files 14 • Files are
probably the most basic form of handling data • Programs write ﬁles as output • You create ﬁles that serve as input • Let's talk about some basic concepts...

Copyright (C) 2008, http://www.dabeaz.com 6- File Implementation 15 • At
the lowest level, a file is a byte sequence • All operations concerning files are focused around manipulating that byte sequence (reading it, writing it, modifying it, etc.) • To the operating system, there is nothing particularly "special" about any given file • It's just a bunch of bytes...

Copyright (C) 2008, http://www.dabeaz.com 6- Opening a File 16 •
To use a ﬁle, it must ﬁrst be "opened" • Example: f = open("somefile.txt","r") # Open for read f = open("somefile.txt","w") # Open for write f = open("somefile.txt","a") # Open for append • This gives you an object with basic operations f.read(maxbytes) # Read N bytes f.write(text) # Write to a file f.close() # Close the file

Copyright (C) 2008, http://www.dabeaz.com 6- The File API 17 •
The programming model for most languages is taken from low-level system calls (POSIX) open(filename,mode,flags) # Open a file read(fd,buffer,maxsize) # Read into a buffer write(fd,buffer,nbytes) # Write a buffer close(fd) # Close a file seek(fd,offset,origin) # Seek to a new position tell(fd) # Get file pointer • It might be cleaned up a bit, but usually it's not much different than this
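Python exposes these low-level calls almost unchanged in its `os` module, which makes the correspondence easy to see. A minimal sketch, assuming Python 3 and a throwaway temporary file (the path is made up for the example):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "somefile.txt")   # throwaway location

fd = os.open(path, os.O_RDWR | os.O_CREAT)   # open(filename, flags)
os.write(fd, b"Hello World\n")               # write(fd, buffer)
os.lseek(fd, 6, os.SEEK_SET)                 # seek(fd, offset, origin)
chunk = os.read(fd, 5)                       # read(fd, maxsize)
position = os.lseek(fd, 0, os.SEEK_CUR)      # a zero-byte seek doubles as tell(fd)
os.close(fd)                                 # close(fd)
```

Here `fd` is just a small integer handed back by the operating system; the file object returned by the built-in `open()` is a friendlier wrapper around the same thing.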

Copyright (C) 2008, http://www.dabeaz.com 6- File Internals 18 f =
open("foo.txt","r") mode : r flags : XX fp : 0 Operating System • Opening a ﬁle creates an OS data structure • The contents are not visible • Holds the state of the "ﬁle"

Copyright (C) 2008, http://www.dabeaz.com 6- File Pointer 19 • Most
useful internal state is the ﬁle pointer • Keeps track of current ﬁle position f = open("foo.txt","r") data = f.read(10) data = f.read(15) read(10) read(15) mode : r flags : XX fp : 25 Operating System foo.txt

Copyright (C) 2008, http://www.dabeaz.com 6- Seek and Tell 20 •
Manipulation of the ﬁle pointer >>> f = open("foo.txt","r") >>> f.seek(1024) # Set fp >>> data = f.read(76) >>> f.tell() 1100 >>> • It's exactly the same in most other languages

Copyright (C) 2008, http://www.dabeaz.com 6- Multiple Open Files 21 •
The same file can be open in more than one place at a time (even in the same program) • Each time you open a file, you get a new file object with a separate file pointer • Although each file is managed separately, they all operate on the same underlying data

Copyright (C) 2008, http://www.dabeaz.com 6- Example 22 • Multiple ﬁle
pointers >>> f = open("foo.txt","r") >>> f.readline() 'Hello World\n' >>> f.readline() 'This is a test\n' >>> g = open("foo.txt","r") >>> g.readline() 'Hello World\n' >>> f.tell() 27 >>> g.tell() 12 >>>
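The same experiment as a self-contained script, as a sketch assuming Python 3. The file is created in a temporary directory and opened in binary mode so that `tell()` reports plain byte offsets: the first line is 12 bytes long and the second is 15.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "wb") as out:
    out.write(b"Hello World\nThis is a test\n")

f = open(path, "rb")
g = open(path, "rb")          # second, independent file object on the same file
f.readline()
f.readline()                  # f has now consumed 12 + 15 bytes
g.readline()                  # g has consumed only the first 12
f_pos, g_pos = f.tell(), g.tell()
f.close()
g.close()
```

Reading through `f` never moves `g`: each open file carries its own file pointer even though both see the same underlying bytes.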

Copyright (C) 2008, http://www.dabeaz.com 6- File Updates/Changes 23 • Changes
to a file are reflected everywhere • If a file is opened for reading and the file contents get modified behind the scenes, those changes will affect subsequent read operations • Basically, everything stays in sync. • Details are covered in an OS class

Copyright (C) 2007, http://www.dabeaz.com 6- Text Files 24 • By
default, ﬁles are opened in text mode f = open(filename,"r") # Read, text mode f = open(filename,"w") # Write, text mode f = open(filename,"a") # Append, text mode • Text mode assumes line orientation • However, what is a line? some characters .......\n (Unix) some characters .......\r\n (Windows) some characters .......\r (Classic Mac) • This determination is made by the system

Copyright (C) 2007, http://www.dabeaz.com 6- Newline Handling 25 • When
writing, '\n' is translated to system newline >>> f = open("test.txt","w") >>> f.write("Hello World\n") >>> f.close() • Unix test.txt: Hello World\n • Windows test.txt: Hello World\r\n

Copyright (C) 2007, http://www.dabeaz.com 6- Newline Handling 26 • When
reading, system newline is converted back to the standard '\n' character >>> f = open("test.txt","r") >>> f.read() 'Hello World\n' >>> • Mostly, you don't have to worry about it • .... except if you do cross-platform work

Copyright (C) 2007, http://www.dabeaz.com 6- Cross Platform Text Files 27
• Example: Reading a Windows text file on Unix >>> f = open("test.txt","r") >>> f.readlines() ['Hello\r\n', 'World\r\n'] >>> • Here, you get that extra '\r' in the input • Which may break code not expecting it

Copyright (C) 2007, http://www.dabeaz.com 6- Binary Files 28 • Binary
data requires a different I/O mode f = open(filename,"rb") # Read, binary mode f = open(filename,"wb") # Write, binary mode f = open(filename,"ab") # Append, binary mode • Disables all newline translation (reads/writes) • Required for binary data on Windows • Optional, but supported on Unix (gotcha)

Copyright (C) 2007, http://www.dabeaz.com 6- Binary File Example 29 •
Difference between modes >>> open("example.txt","r").read() 'Hello World\n' >>> open("example.txt","rb").read() 'Hello World\r\n' >>> • Notice untranslated newline
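For comparison, here is a sketch of the same text-versus-binary difference in Python 3. One assumption worth stating loudly: in Python 3 the newline translation is done by Python's own text layer (not the C library), so reading in text mode converts `\r\n` to `\n` on every platform by default, unlike the Python 2 behavior shown on the slides.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "wb") as out:             # binary mode: bytes are written verbatim
    out.write(b"Hello World\r\n")         # a Windows-style line ending

with open(path, "r", encoding="ascii") as f:
    as_text = f.read()                    # default text mode maps \r\n to \n

with open(path, "rb") as f:
    as_bytes = f.read()                   # binary mode: no translation at all
```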

Copyright (C) 2007, http://www.dabeaz.com 6- Commentary 30 • Sadly, this
business with text vs. binary is part of the operating system itself • All programming languages and applications on the system face the same issue • It's one reason why data is sometimes corrupted when transferred between systems (unintended newline expansion)

Copyright (C) 2008, http://www.dabeaz.com 6- Concept: Processes • A "process"
is a running program • Has its own dedicated resources • Memory, open ﬁles, net connections, etc. • Runs independently (own stack, PC, etc.) • Isolated from other processes • Closely associated with an "application" 32

Copyright (C) 2008, http://www.dabeaz.com 6- The Interpreter Process • Dynamic
programs usually execute inside an interpreter (which is the process) 33

Copyright (C) 2008, http://www.dabeaz.com 6- Subprocesses • A program can
create a new process • This is called a "subprocess" • The subprocess often runs under the control of the original process (which is known as the "parent" process) • Parent often wants to collect output or the status result of the subprocess 34

Copyright (C) 2008, http://www.dabeaz.com 6- Subprocess Control • When launching
a subprocess, the parent typically has control over the following: • Command line arguments • Environment variables • Standard I/O streams • Signal handling 35

Copyright (C) 2008, http://www.dabeaz.com 6- Command Line Arguments • A
list of strings 36 shell % foo.exe arg1 arg2 ... arg3 • In the target process, these show up in argv C/C++: int main(int argc, char *argv[]) { ... } Java: public static void main(String argv[]) { ... } Python: sys.argv Perl: @ARGV

Copyright (C) 2008, http://www.dabeaz.com 6- Environment Variables • A hash-table
of string values 37 shell % setenv NAME VALUE shell % foo.exe • In the target process C/C++: char *value = getenv("NAME"); Python: value = os.environ['NAME'] Perl: $value = $ENV{'NAME'}

Copyright (C) 2008, http://www.dabeaz.com 6- Standard I/O Streams • A
set of ﬁles (stdin, stdout, stderr) 38 shell % foo.exe >out.txt shell % foo.exe <in.txt shell % foo.exe | bar.exe subprocess stdin stdout stderr • In the shell, controlled via redirection/pipes • Parent process sets up these ﬁles for subprocess

Copyright (C) 2008, http://www.dabeaz.com 6- Signals • A parent process
can signal a subprocess 39 shell % kill -signo pid shell % subprocess • Examples : suspend, terminate, etc. • On Unix, this is the "kill" command parent signal • On Windows, support is weak/nonexistent

Copyright (C) 2008, http://www.dabeaz.com 6- Status Codes • When subprocess
terminates, it returns a status • An integer code of some kind 40 C: exit(status); Java: System.exit(status); Python: raise SystemExit(status) • Convention is for 0 to indicate "success." Anything else is an error.

Copyright (C) 2008, http://www.dabeaz.com 6- Commentary • Keep in mind
that subprocesses are almost entirely independent from the parent • The parent can set up the environment, send signals, and collect return codes, but otherwise has no control over what happens inside the subprocess. 41

Copyright (C) 2008, http://www.dabeaz.com 6- Running a Subprocess $a =
`ls -l`; # Backticks. Perl/Ruby • Support for this varies • There may be some simple options 42 • This runs a shell command and captures the output. • However, it's lacking for a lot of other things • Will often see a process management module. Will illustrate for Python.

Copyright (C) 2008, http://www.dabeaz.com 6- subprocess Module • A high-level
module for subprocesses • Cross-platform (Unix/Windows) • Tries to consolidate the functionality of a wide assortment of low-level system calls (system(), popen(), exec(), spawn(), etc.) • Will illustrate with some common use cases 43

Copyright (C) 2008, http://www.dabeaz.com 6- Executing Commands Problem: You want
to execute a simple shell command or run a separate program. You don't care about capturing its output. import subprocess p = subprocess.Popen(['mkdir','temp']) q = subprocess.Popen(['rm','-f','tempdata']) • Executes a command string • Returns a Popen object (more in a minute) 44

Copyright (C) 2008, http://www.dabeaz.com 6- Specifying the Command subprocess.Popen(['rm','-f','tempdata']) •
Popen() accepts a list of command args 45 • These are the same as the args in the shell shell % rm -f tempdata • Note: Each "argument" is a separate item subprocess.Popen(['rm','-f','tempdata']) # Good subprocess.Popen(['rm','-f tempdata']) # Bad Don't merge multiple arguments into a single string like this.
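To see why merged arguments are a problem, it helps to ask the child what it actually received. A small sketch, assuming Python 3: instead of `rm`, the child here is a made-up one-line Python program that prints its own argument list, and `run` is a hypothetical helper.

```python
import subprocess
import sys

# Hypothetical child program: prints the argument list it received.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

def run(extra_args):
    p = subprocess.Popen(child + extra_args, stdout=subprocess.PIPE, text=True)
    output, _ = p.communicate()
    return output.strip()

good_args = run(["-f", "tempdata"])    # two separate arguments
bad_args = run(["-f tempdata"])        # one argument containing a space
```

In the second case the child sees a single string `'-f tempdata'`, which a program like `rm` would not recognize as an option followed by a filename.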

Copyright (C) 2008, http://www.dabeaz.com 6- PATH Environment >>> os.environ['PATH'] '/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:...'
>>> • When launching a command, Popen() uses the setting of the PATH environment variable to search for the subprocess program 46 • Changes affect subsequent Popen() calls os.environ['PATH']="/mypath/bin:"+os.environ['PATH'] p = subprocess.Popen(["foo"])

Copyright (C) 2008, http://www.dabeaz.com 6- Environment Vars env_vars = {
'NAME1' : 'VALUE1', 'NAME2' : 'VALUE2', ... } p = subprocess.Popen(['cmd','arg1',...,'argn'], env=env_vars) • How to set up environment variables 47 • Note : If this is supplied and there is a PATH environment variable, it will be used to search for the command (Unix)
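One detail to keep in mind: the `env` argument replaces the child's whole environment rather than adding to it. A common pattern, sketched here for Python 3, is to copy `os.environ` and then add entries. The child is a made-up one-liner that prints the variable back.

```python
import os
import subprocess
import sys

env_vars = dict(os.environ)       # start from the parent's environment
env_vars["NAME1"] = "VALUE1"      # then add or override entries

p = subprocess.Popen(
    [sys.executable, "-c", "import os; print(os.environ['NAME1'])"],
    env=env_vars, stdout=subprocess.PIPE, text=True)
output, _ = p.communicate()
```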

Copyright (C) 2008, http://www.dabeaz.com 6- Current Directory p = subprocess.Popen(['cmd','arg1',...,'argn'],
cwd='/some/directory') • If you need to change the working directory 48 • Note: This changes the working directory for the subprocess, but does not affect how Popen() searches for the command

Copyright (C) 2008, http://www.dabeaz.com 6- Collecting Status Codes p =
subprocess.Popen(['cmd','arg1',...,'argn']) ... status = p.wait() • When you launch a subprocess, it runs independently from the parent • To wait and collect status, use wait() 49 • Status will be the integer return code (which is also stored) p.returncode # Exit status of subprocess
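A runnable version of this, as a sketch assuming Python 3. The child is a made-up one-liner that exits with status 3, using the `raise SystemExit(status)` idiom from the Status Codes slide.

```python
import subprocess
import sys

p = subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"])
status = p.wait()                 # blocks until the child has exited
```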

Copyright (C) 2008, http://www.dabeaz.com 6- Polling a Subprocess p =
subprocess.Popen(['cmd','arg1',...,'argn']) ... if p.poll() is None: # Process is still running else: status = p.returncode # Get the return code • poll() - Checks status of subprocess 50 • Returns None if the process is still running, otherwise the returncode is returned
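A polling loop might look like the following sketch (Python 3 assumed). The child just sleeps briefly; the `time.sleep()` in the parent stands in for whatever useful work the parent would do between checks.

```python
import subprocess
import sys
import time

p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
polls = 0
while p.poll() is None:     # None means the child has not exited yet
    polls += 1
    time.sleep(0.05)        # do other work here instead of sleeping
status = p.returncode
```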

Copyright (C) 2008, http://www.dabeaz.com 6- Killing a Subprocess p =
subprocess.Popen(['cmd','arg1',...,'argn']) import os os.kill(p.pid,9) # • A notable omission (subprocess module provides no such functionality). • On Unix, can use os.kill() 51 • On Windows, a mess (many options) subprocess.Popen(['TASKKILL','/PID',str(p.pid),'/F']) import win32api win32api.TerminateProcess(int(p._handle),-1)
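Python releases after these slides were written (2.6 onward) added `terminate()` and `kill()` methods to Popen objects, which cover this case on both Unix and Windows. A minimal sketch, assuming Python 3 and a made-up child that would otherwise sleep for a minute:

```python
import subprocess
import sys

p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
p.terminate()               # polite request to stop (SIGTERM on Unix)
status = p.wait()           # collect the child so it does not linger
```

The exact status value is platform dependent (on Unix it is the negated signal number), so code should generally only test whether it is nonzero.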

Copyright (C) 2008, http://www.dabeaz.com 6- Capturing Output Problem: You want
to execute another program and capture its output • Use additional options to Popen() import subprocess p = subprocess.Popen(['cmd'], stdout=subprocess.PIPE) data = p.stdout.read() • This works with both Unix and Windows • Captures any output printed to stdout 52

Copyright (C) 2008, http://www.dabeaz.com 6- Sending/Receiving Data Problem: You want
to execute a program, send it some input data, and capture its output • Set up pipes using Popen() p = subprocess.Popen(['cmd'], stdin = subprocess.PIPE, stdout = subprocess.PIPE) p.stdin.write(data) # Send data p.stdin.close() # No more input result = p.stdout.read() # Read output python cmd p.stdout p.stdin stdin stdout 53

Copyright (C) 2008, http://www.dabeaz.com 6- Sending/Receiving Data Problem: You want
to execute a program, send it some input data, and capture its output • Set up pipes using Popen() p = subprocess.Popen(['cmd'], stdin = subprocess.PIPE, stdout = subprocess.PIPE) p.stdin.write(data) # Send data p.stdin.close() # No more input result = p.stdout.read() # Read output 54 [Figure: python's p.stdin is connected to cmd's stdin, and cmd's stdout to python's p.stdout] • A pair of files that are hooked up to the subprocess

Copyright (C) 2008, http://www.dabeaz.com 6- Sending/Receiving Data • How to
capture stderr p = subprocess.Popen(['cmd'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) python cmd p.stdout p.stdin stdin stdout 55 p.stderr stderr • Note: stdout/stderr can also be merged p = subprocess.Popen(['cmd'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
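Here is a sketch of the merged case that can actually be run (Python 3 assumed). The child is a made-up script that writes one line to stdout and one to stderr; it flushes stdout explicitly so the order of the two lines is predictable.

```python
import subprocess
import sys

child_code = (
    "import sys\n"
    "sys.stdout.write('out\\n')\n"
    "sys.stdout.flush()\n"
    "sys.stderr.write('err\\n')\n"
)

p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)   # stderr goes into the stdout pipe
merged, _ = p.communicate()
```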

Copyright (C) 2008, http://www.dabeaz.com 6- I/O Redirection • Connecting input
to a file f_in = open("somefile","r") p = subprocess.Popen(['cmd'], stdin=f_in) 56 • Connecting the output to a file f_out = open("somefile","w") p = subprocess.Popen(['cmd'], stdout=f_out) • Basically, stdin and stdout can be connected to any open file object • Note : Must be a real file in the OS

Copyright (C) 2008, http://www.dabeaz.com 6- Subprocess I/O • Subprocess module
can be used to set up fairly complex I/O patterns 57 import subprocess p1 = subprocess.Popen("ls -l", shell=True, stdout=subprocess.PIPE) p2 = subprocess.Popen("wc",shell=True, stdin=p1.stdout, stdout=subprocess.PIPE) out = p2.stdout.read() • Note: this is the same as this popen2.popen2("ls -l | wc")
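The same two-stage pipeline can be written without a shell. In this sketch (Python 3 assumed) both stages are made-up Python one-liners standing in for `ls -l` and `wc`, so it runs the same way on any platform.

```python
import subprocess
import sys

# Hypothetical stand-ins for "ls -l" and "wc -l".
producer = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
counter = [sys.executable, "-c", "import sys; print(len(sys.stdin.readlines()))"]

p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
p2 = subprocess.Popen(counter, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()            # parent drops its copy so only p2 holds the read end
out = p2.communicate()[0]
p1.wait()
```

Closing `p1.stdout` in the parent is a small but useful habit: it lets the first process notice if the second one exits early.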

Copyright (C) 2008, http://www.dabeaz.com 6- I/O Issues • Some care
required when communicating with subprocesses 58 • To signal end of input, don't forget to close the input stream p = subprocess.Popen(['cmd'], stdout=subprocess.PIPE, stdin=subprocess.PIPE) p.stdin.write(data) # Send data p.stdin.close() # No more input result = p.stdout.read() # Read output • If you forget, subprocess may hang
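One way to avoid forgetting the close is the `communicate()` method of Popen objects, which writes the input, closes stdin, and reads output until end-of-file in one call. A sketch assuming Python 3, with a made-up child that upper-cases its input:

```python
import subprocess
import sys

# Hypothetical child: upper-cases whatever arrives on stdin.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]

p = subprocess.Popen(child, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
result, _ = p.communicate(b"hello world\n")   # write, close stdin, read to EOF
```

Note that `communicate()` collects all output in memory, so it suits modest amounts of data rather than unbounded streams.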

Copyright (C) 2008, http://www.dabeaz.com 6- I/O Issues • The subprocess module
does not work well for controlling interactive processes • Buffering behavior is often wrong (may hang) • Pipes don't properly emulate terminals • Subprocess may not operate correctly 59

Copyright (C) 2008, http://www.dabeaz.com 6- Unix Process Fork Problem: You
want to clone the original process and have two identical processes • fork(), wait(), _exit() import os pid = os.fork() if pid == 0: # Child process ... os._exit(0) else: # Parent process ... # Wait for child os.waitpid(pid, 0) 60 [Figure: after fork(), two python processes execute concurrently until the child calls _exit() and the parent collects it with wait()]

Copyright (C) 2008, http://www.dabeaz.com 6- Unix Process Fork • fork()
creates an identical process • Newly created process is a "child process" • fork() returns different values in parent/child import os pid = os.fork() if pid == 0: # Child process else: # Parent process 61 pid is 0 in child, non-zero in parent • Parent and child run independently afterwards
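A complete fork example, as a sketch for Unix only (Python 3 assumed; `os.fork()` does not exist on Windows). The exit code 7 is arbitrary, chosen just so the parent has something recognizable to collect.

```python
import os

pid = os.fork()
if pid == 0:
    # Child process: do some work, then leave immediately without
    # running any of the parent's cleanup code.
    os._exit(7)
else:
    # Parent process: block until that specific child finishes.
    _, raw_status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(raw_status)   # extract the child's exit code
```

The value returned by `waitpid()` packs several pieces of information together, which is why `os.WEXITSTATUS()` is used to pull out the exit code.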

Copyright (C) 2008, http://www.dabeaz.com 6- Unix Process Fork • Most
common use-case: multiple clients • Server forks a new process to handle each client 62 [Figure: a listening server calls fork() to create a separate server process for each new client]

Copyright (C) 2008, http://www.dabeaz.com 6- Unix Process Fork • Note:
There are MANY tricky details • Typically covered in an operating systems course • Consult: "Advanced Programming in the UNIX Environment" by W. Richard Stevens 63

Copyright (C) 2008, http://www.dabeaz.com 6- I/O Layers • One problem
with I/O is that data is often encoded in a variety of different formats • Compression (gz, bz2, zip, etc.) • Unicode (UTF-8, UTF-16, etc.) • Text (Base64, Hex, Quopri, etc.) • Data might be a mix of formats • Example: A compressed UTF-8 ﬁle 65

Copyright (C) 2008, http://www.dabeaz.com 6- Solution • Create ﬁle-like layers
that get stacked on top of the existing file object interface 66 [Figure: a File object wrapped by a bzip2 layer, which is wrapped by a utf-8 layer; each layer offers read and write]

Copyright (C) 2008, http://www.dabeaz.com 6- Solution • Each layer presents
itself as a ﬁle but is really just a wrapper around a low-level ﬁle • This kind of approach is used by Java • Also starting to show up in dynamic languages • Let's look at codecs in Python 67

Copyright (C) 2007, http://www.dabeaz.com 6- codecs.open • codecs.open(filename,mode,encoding) • Opens
a normal file using a specific encoding • Example: 68 >>> f = codecs.open("file.bz2","rb","bz2") >>> f.read() 'Hello World\n' >>>

Copyright (C) 2007, http://www.dabeaz.com 6- codecs.open 69 • Example encodings: "base64" (Base64 encoding), "hex" (hex encoding), "bz2" (bz2 compression), "quopri" (MIME quoted-printable), "string_escape" (string with escape codes), "zlib" (zlib compression)
• Examples: f = codecs.open("file.txt","r","base64") g = codecs.open("file.txt","w","quopri") ...

Copyright (C) 2007, http://www.dabeaz.com 6- codecs Issues • Codecs behave
like ﬁles, but certain operations may break the encoding process • Example: Random access/seeks • Example: Invalid data (encoding error) • Your mileage might vary 70
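The stacked-layer picture (file, then bzip2, then utf-8) can be sketched in Python 3 with the standard `bz2` and `io` modules. This is an alternative to the `codecs.open` calls on the slides, which are Python 2 idioms; the temporary path is made up for the example.

```python
import bz2
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt.bz2")

# Writing: text layer -> bzip2 layer -> raw file
with io.TextIOWrapper(bz2.BZ2File(path, "wb"), encoding="utf-8") as f:
    f.write("Jalape\u00f1o\n")

# Reading: raw file -> bzip2 layer -> text layer
with io.TextIOWrapper(bz2.BZ2File(path, "rb"), encoding="utf-8") as f:
    text = f.read()

# Peek at the raw bytes to confirm the file on disk really is compressed.
with open(path, "rb") as raw:
    magic = raw.read(3)
```

Each layer looks like a file to the layer above it, so the code that calls `write()` and `read()` does not need to know how many layers sit underneath.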

Copyright (C) 2007, http://www.dabeaz.com 6- codecs and Strings • Strings
and codecs are friends 71 s.encode(encoding) # Encode a string s.decode(encoding) # Decode a string • Example: >>> s = "Hello World" >>> t = s.encode("base64") >>> t 'SGVsbG8gV29ybGQ=\n' >>> t.decode("base64") 'Hello World' >>>
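In Python 3 these byte-oriented encodings are no longer string methods, but the module-level `codecs.encode()` and `codecs.decode()` functions give the same results on bytes objects. A small sketch under that assumption:

```python
import codecs

s = b"Hello World"
t = codecs.encode(s, "base64")        # bytes in, bytes out
back = codecs.decode(t, "base64")
hexed = codecs.encode(s, "hex")
```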

Copyright (C) 2007, http://www.dabeaz.com 6- codecs and Strings • Can
manually handle encodings • Just call encode/decode yourself as needed 72 >>> f = open("foo.dat","wb") >>> f.write(data.encode("zlib")) >>> f.close() >>> g = open("foo.dat","rb") >>> data = g.read().decode("zlib") >>> g.close() • Note: using codecs module may be more memory efﬁcient

Copyright (C) 2007, http://www.dabeaz.com 6- Unicode • Unicode: Multibyte characters
73 s = u"Hello World" t = u"Jalape\u00f1o" • Encodes characters from all used languages • Widely used on Internet (internationalization) • A huge topic (won't cover all details)

Copyright (C) 2007, http://www.dabeaz.com 6- Unicode Representation • Internally, Python
stores Unicode as 16-bit integers (UCS-2) 74 t = u"Jalape\u00f1o" 004a 0061 006c 0061 0070 0065 00f1 006f • Normally, you don't worry about this • Except if you write a unicode string to a ﬁle u"J" --> 00 4a (Big Endian) u"J" --> 4a 00 (Little Endian)

Copyright (C) 2007, http://www.dabeaz.com 6- Unicode and Codecs • Unicode
I/O always involves some encoding • Handled through codecs module 75 >>> f = codecs.open("data.txt","w","utf-8") >>> f.write(u"Hello World\n") >>> f.close() >>> f = codecs.open("data.txt","w","utf-16") >>> f.write(data) >>> • Several hundred character codecs are provided • Consult documentation for details

Copyright (C) 2007, http://www.dabeaz.com 6- Unicode Encodings • Explicit encoding
via strings 76 >>> a = u"Jalape\u00f1o" >>> enc_a = a.encode("utf-8") >>> • Example: Writing Unicode strings to a ﬁle >>> f = open(filename,"wb") >>> f.write(data.encode("utf-8")) • Note: Since encoding may contain binary data, should probably use binary ﬁle modes.

Copyright (C) 2007, http://www.dabeaz.com 6- Unicode Decoding • Strings can
also be decoded into Unicode 77 >>> enc_a = 'Jalape\xc3\xb1o' >>> a = enc_a.decode("utf-8") >>> a u'Jalape\xf1o' >>> • Example: Reading Unicode strings from a file >>> f = open(filename,"rb") >>> enc_data = f.read() >>> data = enc_data.decode("utf-8") • Again, be aware that Unicode data may contain binary data
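The encode/decode round trip looks like this in Python 3, where every `str` is already Unicode and the encoded form is a separate `bytes` type. A sketch assuming Python 3; the last two lines revisit the byte-order point from the Unicode Representation slide.

```python
a = "Jalape\u00f1o"                    # 8 characters
enc_a = a.encode("utf-8")              # 9 bytes: the n-tilde takes two
round_trip = enc_a.decode("utf-8")

big = "J".encode("utf-16-be")          # big endian: 00 4a
little = "J".encode("utf-16-le")       # little endian: 4a 00
```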

Copyright (C) 2007, http://www.dabeaz.com 6- Finding the Encoding • How
do you determine the encoding of a ﬁle? • Might be known in advance (strongly typed) • Often indicated in the ﬁle itself 78 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> • Depends on the data source, application, etc.

Copyright (C) 2008, http://www.dabeaz.com 6- Part 5 79 The Secret
Life of Regular Expressions

Copyright (C) 2008, http://www.dabeaz.com 6- Regular Expressions • Virtually all
dynamic languages have extensive support for text pattern processing with regular expressions • And in some languages, regular expressions are part of the language itself (e.g., Perl and Ruby). • Let's dig a little deeper... 80

Copyright (C) 2008, http://www.dabeaz.com 6- The Problem • Many problems
involve searching and matching speciﬁc text patterns. • Example: email addresses 81 Please send email to [email protected] and maybe you will get a response (maybe). • Example: URLs Go look on http://www.google.com for details. • Example: A U.S. phone number 773-555-1212

Copyright (C) 2008, http://www.dabeaz.com 6- The Problem • Specifying and
matching a text pattern is a much more complicated problem than looking for an exact substring • Must have a concise and easy way to specify the legal characters that make up a pattern along with the order in which they are supposed to appear 82

Copyright (C) 2008, http://www.dabeaz.com 6- Solution : Regexs • A
regular expression is a concise specification of a text pattern • Built from a few basic rules: 83 abc Matches the chars 'abc' exactly [chars] Match characters in a set [^chars] Match characters not in a set pat1|pat2 Matches either pat1 or pat2 pat* Zero or more repetitions of pat pat+ One or more repetitions of pat pat? Zero or one occurrence of pat (pat) A group that matches pat • These are then combined
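Each rule in the table can be tried out directly with Python's `re` module. The sample strings below are made up for illustration; every entry in `checks` evaluates to True if the rule behaves as described. (Python 3 assumed.)

```python
import re

checks = {
    "abc":      re.search(r"abc", "xxabcxx") is not None,
    "[chars]":  re.findall(r"[aeiou]", "regex") == ["e", "e"],
    "[^chars]": re.findall(r"[^aeiou]", "regex") == ["r", "g", "x"],
    "a|b":      re.findall(r"cat|dog", "cat dog cow") == ["cat", "dog"],
    "pat*":     re.fullmatch(r"ab*", "a") is not None,       # zero b's is fine
    "pat+":     re.fullmatch(r"ab+", "a") is None,           # needs at least one b
    "pat?":     re.fullmatch(r"colou?r", "color") is not None,
    "(pat)":    re.search(r"(\d+)-(\d+)", "773-555").group(2) == "555",
}
```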

Copyright (C) 2008, http://www.dabeaz.com 6- Regex Example 84 • A
pattern to match the title of an HTML doc <title>(.*?)</title> • Problem 1 : Print all matching lines <html> <head> <title>This is an example</title> </head> <body> ... </body> </html>

Regex in the Language
• Perl
    open(INFILE,"foo.html");
    while ($line = <INFILE>) {
        if ($line =~ /<title>(.*?)<\/title>/) {
            print $line;
        }
    }
• Ruby
    f = open("foo.html")
    for line in f
        if line =~ /<title>(.*?)<\/title>/
            print line
        end
    end

Regex as an object
• Example: Python
    import re
    pat = re.compile('<title>(.*?)</title>')
    for line in open("foo.html"):
        if pat.search(line):
            print line
• Here, the regex features are just in a library module. There is no special syntax or operators devoted to matching (it's a method)

pattern to match the title of an HTML doc <title>(.*?)</title> • Problem 2 : Extract just the title text itself <title>This is an example</title> This is an example

Groups
• Regular expressions may define groups
    <title>(.*?)</title>
    ([\w-]+):(.*)
• Groups are assigned numbers
    <title>(.*?)</title>     group 1 is (.*?)
    ([\w-]+):(.*)            group 1 is ([\w-]+), group 2 is (.*)
• Number determined left-to-right
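To make the numbering concrete, here is a small sketch that applies the second pattern to a made-up header line (the sample text is an assumption, not from the slides). Python 3 syntax.

```python
import re

header = re.compile(r"([\w-]+):(.*)")
m = header.match("Content-Type: text/html")

name  = m.group(1)           # text matched by the first (leftmost) group
value = m.group(2).strip()   # text matched by the second group
both  = m.groups()           # all groups at once, as a tuple
whole = m.group(0)           # group 0 is always the entire match
```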

Group Extraction
• Perl
    open(INFILE,"foo.html");
    while ($line = <INFILE>) {
        if ($line =~ /<title>(.*?)<\/title>/) {
            print $1,"\n";
        }
    }
• Ruby
    f = open("foo.html")
    for line in f
        if line =~ /<title>(.*?)<\/title>/
            print Regexp.last_match(1),"\n"
        end
    end

Group Extraction
• Example: Python
    import re
    pat = re.compile('<title>(.*?)</title>')
    for line in open("foo.html"):
        m = pat.search(line)
        if m:
            print m.group(1)

pattern to match the title of an HTML doc <title>(.*?)</title> • Problem 3 : Change the title to subject <title>This is an example</title> <subject>This is an example</subject>

Text Substitution
• Perl
    open(INFILE,"foo.html");
    while ($line = <INFILE>) {
        $line =~ s/<title>(.*?)<\/title>/<subject>$1<\/subject>/;
        print $line;
    }
• Ruby
    f = open("foo.html")
    for line in f
        line.gsub!(/<title>(.*?)<\/title>/, '<subject>\1</subject>')
        print line
    end

Text Substitution
• Example: Python
    import re
    pat = re.compile('<title>(.*?)</title>')
    for line in open("foo.html"):
        line = pat.sub('<subject>\\1</subject>',line)
        print line,

Copyright (C) 2008, http://www.dabeaz.com 6- Commentary • Knowing how to
use regular expressions is mostly a matter of reading the manual • All of the books on dynamic languages cover them • Most programmers have used them at some point • So, I'm not going to continue with a manual 94

Copyright (C) 2008, http://www.dabeaz.com 6- Behind the Scenes • I
think most programmers (including myself), think regular expressions involve some fairly serious magic • This is partly true....especially for some of the more hard-core features • However, how do they really work? • Is there anything interesting to be learned by looking into this? Well, maybe.... 95

Some History
• Regular expressions originate in theoretical computer science (automata theory)
• They first appeared sometime in the 1950s
• They were popularized greatly by Ken Thompson, who incorporated regex capabilities into the Unix ed editor (~1970)
• They then propagated to other Unix tools (grep, awk, vi, lex, emacs, etc.)

Copyright (C) 2008, http://www.dabeaz.com 6- The regex Library • A
free software library written by Henry Spencer (~1985) • Used to build regex support in early versions of Perl and Tcl which then expanded upon/rewrote the library • Almost every modern language with regex support derives directly/indirectly from the Spencer library (or at least its approach) 97

Patterns -> NFA
• Regular expression patterns are typically used to build an NFA
• Non-deterministic Finite Automaton
• Covered in great detail in a theory course, but let's look at the general idea

Patterns to NFA
• Example pattern: ba*b
• The resulting NFA
    [Diagram: an NFA with a start state and a final state, two arrows labeled b, one arrow labeled a, and one unlabeled arrow]
• To match, you "run" the NFA against input

Running an NFA
• Example: Search for the pattern ba*b in the input "aabaabbbab"
• You start in the initial state, at the beginning of the input text
• Start testing characters (the first two attempts fail)
• Now start moving through states
• Keep going as long as there is a legal move
• Fail! In the current state, there is no b arrow.
• Backtrack and try the other path out. The unlabeled arrow means that we can just move to the next state without reading any input
• Final state! A match
• If you can make it to the final state, it matches
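The walk-through above can be sketched in a few lines of Python 3. The state numbering and the table layout below are my own guess at the diagram (an assumption, since only the arrow labels are given): an arrow labeled None is the unlabeled arrow that moves to the next state without reading input.

```python
# Hypothetical encoding of an NFA for ba*b: state -> list of (label, next state)
NFA = {
    0: [("b", 1)],               # start: must read a b
    1: [("a", 1), (None, 2)],    # loop on a, or take the unlabeled arrow
    2: [("b", 3)],               # must read the closing b
    3: [],                       # final state
}
START, FINAL = 0, 3

def run(state, text, pos):
    """Try each arrow out of state in order, backtracking when one fails.
    Returns the position where the match ends, or None."""
    if state == FINAL:
        return pos
    for label, nxt in NFA[state]:
        if label is None:
            end = run(nxt, text, pos)            # move without reading input
        elif pos < len(text) and text[pos] == label:
            end = run(nxt, text, pos + 1)        # read one character
        else:
            continue                             # no legal move on this arrow
        if end is not None:
            return end
    return None                                  # every arrow failed: backtrack

def search(text):
    """Run the NFA from each starting position until one matches."""
    for start in range(len(text)):
        end = run(START, text, start)
        if end is not None:
            return text[start:end]
    return None

result = search("aabaabbbab")
```

This toy version would recurse forever on an NFA with a loop of unlabeled arrows; the table above has none, so it is safe here.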

Copyright (C) 2008, http://www.dabeaz.com 6- Building NFAs • All regular
expression patterns can be turned into an NFA using a few primitive building blocks • It's tricky, but relatively straightforward • You can read details 109

Pathological Cases
• Almost all dynamic languages use an approach to NFA matching known as "recursive backtracking"
• This involves trying all possibilities until a suitable match is found
• And it can lead to some pathological cases
• Certain patterns match very poorly

Example:
• A pathological pattern: a?a?a?aaa
• And the NFA
    [Diagram: a chain of states joined by six arrows labeled a; extra empty arrows mean the "a" is optional]

Recursive Backtracking
• Match text: aaa
• Steps 1-3: Consume the input one character at a time. At this point, there is no more text, and we're not in the final state (a failure)
• Step 4 (Backtrack): Try the empty arrow here
• Step 5: This fails again. Not final, no more text.
• Step 6 (Backtrack): Go back and try this empty arrow
• Steps 7-8: Fails. Backtrack
• Steps 9-11: Fails. Backtrack
• You're starting to get the idea...
• Eventually will arrive here... A match!
• Took many guesses
• Re-reading of the input string
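To see how fast the guessing blows up, here is a rough sketch of a recursive backtracking matcher that counts its own guesses. It is a toy, not how any real regex engine is written: a pattern is a hypothetical list of (char, optional) pairs, and the whole text has to match. Python 3 syntax.

```python
def match(pattern, text, counter):
    """Backtracking matcher; counter[0] counts every call (every guess)."""
    counter[0] += 1
    if not pattern:
        return not text                      # success only if text is used up
    (ch, optional), rest = pattern[0], pattern[1:]
    # First guess: read one character for this item
    if text and text[0] == ch and match(rest, text[1:], counter):
        return True
    # Second guess (optional items only): skip the item, read nothing
    return optional and match(rest, text, counter)

def steps(n):
    """Guesses needed to match a?...a?a...a (n of each) against n a's."""
    pattern = [("a", True)] * n + [("a", False)] * n
    counter = [0]
    assert match(pattern, "a" * n, counter)  # it does match, eventually
    return counter[0]

counts = [steps(n) for n in (3, 6, 9, 12)]   # n = 3 is the example above
```

Every optional item doubles the number of paths to try, so the count for n items is at least 2**n.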

Copyright (C) 2008, http://www.dabeaz.com 6- Recursive Backtracking • Reading: 124
Russ Cox, "Regular Expression Matching Can be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)"

Copyright (C) 2008, http://www.dabeaz.com 6- Wrap-up 126 • Next time,
will look at some functional programming issues

Copyright (C) 2008, http://www.dabeaz.com 7- A Taste of Functional Programming
Section 7 2

Copyright (C) 2008, http://www.dabeaz.com 7- Introduction 3 • Over the
last few classes, we have spent a lot of time looking at "objects" • An object encapsulates data and has a collection of methods that operate on that data • However, this is not the only way to do it • Let's return to functions...

Copyright (C) 2008, http://www.dabeaz.com 7- Programming with Functions 4 •
It turns out that you can do a lot of very useful programming just using functions • "Functional programming" • Mathematicians study functions a lot • However, there are some essential features that you need to move beyond the basics • Today, we'll look at it in a little more detail

Copyright (C) 2008, http://www.dabeaz.com 7- Confession 5 • The more
I program, the more I ﬁnd myself drawn towards functional programming • It feels more logically coherent than OO • And a lot less byzantine • Plus, I have a secret past as a math major • Oh yeah, and get off my lawn!

Copyright (C) 2008, http://www.dabeaz.com 7- Disclaimer 6 • Functional programming
is a HUGE topic • Which can be highly mathematical • I'm not going to take that approach... • Especially with my brain pounding cold • However, I will try to cover a few absolute basics and show some interesting examples

Copyright (C) 2008, http://www.dabeaz.com 7- Other Disclaimer 7 • Almost
all examples are going to be Python • Python is by no means considered to be a purely "functional" language • However, it has enough of the core features for me to illustrate some interesting things • People often remark on its freakish similarity to certain parts of Lisp

Copyright (C) 2008, http://www.dabeaz.com 7- Review : Functions 9 •
A function is a series of statements def foo(x): statements ... some calculation ... statements return result • A function receives input arguments • Performs some kind of calculation • Returns a result

Copyright (C) 2008, http://www.dabeaz.com 7- Functions as Objects 10 •
A function is also an object that you can treat like it was ordinary data def square(x): return x*x • You can assign it to a variable s = square • Put it in a list items = [1,"Hello",square] • Pass it as an argument to another function y = foo(3,square)

Copyright (C) 2008, http://www.dabeaz.com 7- Functions as Objects 11 •
In fact, there isn't anything that's allowed on the other objects, but which is forbidden on a function object • The only difference is that the contents of a function don't look like anything you're used to (number, string, array, etc.) • In reality, it's just a sequence of statements

Copyright (C) 2008, http://www.dabeaz.com 7- First-Class Functions 12 • If
functions have equal footing with numbers, strings, and other core datatypes, then they're said to be "ﬁrst-class" • Basically, it means that functions are nothing special---they're just like anything else in the language

Copyright (C) 2008, http://www.dabeaz.com 7- Callbacks 13 • Having ﬁrst-class
functions lets you pass functions into other functions as an argument • This allows a program to make use of so- called "callback" functions • Functions that get executed under certain circumstances by another function

Callback Example
• Classic use case: Supplying the comparison function for a list sort
    def wordcmp(s,t):
        s_l = s.lower()
        t_l = t.lower()
        if s_l < t_l : return -1
        elif s_l > t_l : return 1
        else: return 0

    words = ['MONDO','diabolical','Thrash']
    words.sort(wordcmp)
    # Produces ['diabolical','MONDO','Thrash']
• sort() "calls back" into the compare function to help it figure out the ordering
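A note for anyone trying this on Python 3: sort() no longer accepts a comparison function directly. The sketch below shows two ways to get the same callback behavior, either by wrapping the comparison with functools.cmp_to_key or by supplying a one-argument key function instead.

```python
from functools import cmp_to_key

def wordcmp(s, t):
    """Case-insensitive comparison: negative, zero, or positive."""
    s_l, t_l = s.lower(), t.lower()
    if s_l < t_l:
        return -1
    elif s_l > t_l:
        return 1
    return 0

words = ['MONDO', 'diabolical', 'Thrash']

# Option 1: wrap the old-style comparison callback
by_cmp = sorted(words, key=cmp_to_key(wordcmp))

# Option 2: pass a key callback (called once per item)
by_key = sorted(words, key=str.lower)
```

Either way, the sort is still "calling back" into a function that you supplied.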

Copyright (C) 2008, http://www.dabeaz.com 7- Functions as Data 15 •
The fact that functions are data opens up a variety of interesting possibilities • Can have stored tables and collections of functions (already saw that with classes) • Also, functions can be passed around to different parts of a program • At ﬁrst glance, all of this might sound a little exotic (but we'll see examples soon)

Copyright (C) 2008, http://www.dabeaz.com 7- Part 2 16 Inner Functions
and Closures

Copyright (C) 2008, http://www.dabeaz.com 7- Inner Functions 17 • Inside
a function, you can deﬁne new functions and use them elsewhere (like data) • Example: def make_greeting(name): def greet(): print "Hey %s, get off my lawn!" % name return greet • Check it out - a function was returned >>> p = make_greeting("Punk") >>> p <function greet at 0x69330> >>>

Copyright (C) 2008, http://www.dabeaz.com 7- Calling Inner Functions 18 •
Using an "inner function" is interesting • It secretly carries information about all of the variables that were alive when it was deﬁned >>> p = make_greeting("Punk") >>> p <function greet at 0x69330> >>> p() Hey Punk, get off my lawn! >>> Notice how it somehow picked up the name variable

Copyright (C) 2008, http://www.dabeaz.com 7- Closures 19 • A function
together with its surrounding environment is known as a "closure" • Basically, the closure has all of the information needed to make the function execute correctly • Normally all of this is tucked away behind the scenes (it just works)

Copyright (C) 2008, http://www.dabeaz.com 7- Closures 20 • You can
inspect the closure if sneaky >>> p = make_greeting("Punk") >>> p <function greet at 0x69330> >>> p.func_closure (<cell at 0x6c950: str object at 0x6c9e0>,) >>> p.func_closure[0].cell_contents 'Punk' >>> • A closure is almost like a weird kind of "object" >>> k = make_greeting("Kid") >>> g = make_greeting("Governor") >>> g() Hey Governor, get off my lawn! >>> k() Hey Kid, get off my lawn! >>>
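The same sneaky inspection, sketched for Python 3, where the attribute is spelled __closure__ instead of func_closure. Here greet() returns the string rather than printing it, which is a small change from the slides to make the result easy to check.

```python
def make_greeting(name):
    def greet():
        return "Hey %s, get off my lawn!" % name
    return greet

p = make_greeting("Punk")
k = make_greeting("Kid")

captured   = p.__closure__[0].cell_contents   # value carried by the closure
free_names = p.__code__.co_freevars           # names picked up from outside
messages   = (p(), k())                       # each closure has its own name
```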

Copyright (C) 2008, http://www.dabeaz.com 7- Interlude 21 • So far,
just scratched the surface of what it means for functions to be "ﬁrst-class" • You can pass existing functions around as data • You can create new functions on-the-ﬂy • Newly created functions retain parts of the environment where they were created • This is where it starts to get interesting...

Copyright (C) 2008, http://www.dabeaz.com 7- Part 3 22 Applying Functions
to List Data

Copyright (C) 2008, http://www.dabeaz.com 7- Observation 23 • In a
lot of programs, it seems like you collect a bunch of data into a list • You then apply different operations to the list data to get some new data

Copyright (C) 2008, http://www.dabeaz.com 7- Example: Stock Portfolio 24 •
In the assignment, you wrote some programs that worked with a portfolio of stocks • There was some data in a ﬁle (a list of lines) MSFT,100,54.25 IBM,50,91.10 AA,25,23.10 CAT,75,70.13 MSFT,50,64.23 GM,200,45.11 HPQ,80,37.42 IBM,40,88.20 PG,125,56.22 BA,75,92.72 MSFT,50,71.21 AIG,40,41.81

Copyright (C) 2008, http://www.dabeaz.com 7- Lists as a Data Structure
25 • And in the assignment, it was natural to turn that ﬁle into a list (of lists perhaps) portfolio = [ ['MSFT', 100, 54.25], ['IBM' , 50 , 91.10], ['AA' , 25 , 23.10], ['CAT' , 75 , 70.13], ['MSFT', 50 , 64.23], ['GM' , 200, 45.11], ['HPQ' , 80 , 37.42], ['IBM' , 40 , 88.20], ['PG' , 125, 56.22], ['BA' , 75 , 92.72], ['MSFT', 50 , 71.21], ['AIG' , 40 , 41.81] ]

Copyright (C) 2008, http://www.dabeaz.com 7- Example Calculation 26 • Example:
Calculate the cost of the portfolio total = 0.0 for stock in portfolio: total += stock[1]*stock[2] print "Total", total • Involves a list of "stocks" • Iterating over this list • Performing an "operation" on each item

Copyright (C) 2008, http://www.dabeaz.com 7- List Operations 27 • Most
common list operations can be distilled down to three basic ops • mapping • ﬁltering • reduction

map()
• An operation that maps a function to each item of a list, producing a new list
    def map(func, items):
        result = []
        for it in items:
            result.append(func(it))
        return result
• Example:
    def square(x): return x*x
    nums = [1,2,3,4,5]
    sqs = map(square,nums)        # [1,4,9,16,25]

filter()
• An operation that checks each item and discards those that don't match a condition
    def filter(condf,items):
        result = []
        for it in items:
            if condf(it):
                result.append(it)
        return result
• Example:
    def positive(x): return x > 0
    nums = [1,-2,3,-4,5]
    p = filter(positive,nums)     # [1,3,5]

reduce()
• An operation that combines successive list elements and produces a single result
    def reduce(combinef,items,initial=0):
        result = initial
        for it in items:
            result = combinef(result,it)
        return result
• Example:
    def add(x,y): return x+y
    nums = [1,2,3,4,5]
    total = reduce(add,nums)      # total = 15

Copyright (C) 2008, http://www.dabeaz.com 7- Example 31 • Calculate the
total cost of all stocks in the portfolio with 100 or more shares # Three functions def cost_f(s): return s[1]*s[2] def hundred_f(s): return s[1] >= 100 def add_f(x,y): return x+y # Now, using these operations stocks = filter(hundred_f, portfolio) costs = map(cost_f,stocks) total = reduce(add_f,costs) • It's a little clunky, but essentially applying operations to entire lists at each step

Copyright (C) 2008, http://www.dabeaz.com 7- Commentary 32 • Every dynamic
language already has some variation of map, ﬁlter, and reduce • And some common reductions (sum, min, max) • These are basic list/array operations • Have been around for almost forever. • Look in the manual for details.

Copyright (C) 2008, http://www.dabeaz.com 7- Problem 34 • Since functions
can be easily passed around, you often end up writing code that relies on a lot of small functions or "formulas" # Three functions def cost_f(s): return s[1]*s[2] def hundred_f(s): return s[1] >= 100 def add_f(x,y): return x+y # Now, using these operations stocks = filter(hundred_f, portfolio) costs = map(cost_f,stocks) total = reduce(add_f,costs) • This style gets old real fast...

Copyright (C) 2008, http://www.dabeaz.com 7- Solution : Lambda 35 •
Lambda expressions. Creates a function right on the spot for you # Now, using these operations stocks = filter(lambda s: s[1] >= 100, portfolio) costs = map(lambda s: s[1]*s[2], stocks) total = reduce(lambda x,y: x+y,costs) • Lambda creates a function that is a single expression lambda x,y : x+y # Is the same as typing this out long-form def anon(x,y): return x+y
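For reference, a sketch of the same calculation on Python 3, where reduce has moved into the functools module and map/filter hand back lazy iterators instead of lists. The three-stock portfolio is a cut-down sample taken from the data shown earlier.

```python
from functools import reduce

portfolio = [
    ['MSFT', 100, 54.25],
    ['IBM',   50, 91.10],
    ['GM',   200, 45.11],
]

stocks = filter(lambda s: s[1] >= 100, portfolio)   # keep 100+ shares
costs  = map(lambda s: s[1] * s[2], stocks)         # shares * price
total  = reduce(lambda x, y: x + y, costs, 0.0)     # add them all up
```

Only MSFT and GM pass the filter, so total is 100*54.25 + 200*45.11 (up to floating point rounding).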

Lambda
• Don't read too much into this lambda stuff
• It's just a special syntax that lets us take a simple expression and quickly turn it into an unnamed function
• Often more convenient than defining a separate function elsewhere
• Name "lambda" comes from Lisp which comes from the "Lambda Calculus"

Copyright (C) 2008, http://www.dabeaz.com 7- Using Lambda 37 • Lambda
is interesting because it's actually an expression • You can use it anywhere an expression goes ops = { '+' : lambda x,y: x+y, '-' : lambda x,y: x-y, '*' : lambda x,y: x*y, '/' : lambda x,y: x/y } >>> ops['+'](3,4) 7 >>> ops['*'](3,4) 12 >>>

Copyright (C) 2008, http://www.dabeaz.com 7- Lambda Use 38 • In
reality, you probably want to use it sparingly • Overuse makes code impossible to decipher • Not as powerful as deﬁning a normal function • The body of a lambda can only be a single expression (not a bunch of statements)

Copyright (C) 2007, http://www.dabeaz.com 7- Discussion • map/ﬁlter operations are
very common • Is there an even more convenient way to perform this operation? 40

Copyright (C) 2007, http://www.dabeaz.com 7- List Comprehensions • Creates a
new list by applying an operation to each element of a sequence. >>> a = [1,2,3,4,5] >>> b = [2*x for x in a] >>> b [2,4,6,8,10] >>> • Another example: 41 >>> names = ['Elwood','Jake'] >>> a = [name.lower() for name in names] >>> a ['elwood','jake'] >>>

List Comprehensions
• A list comprehension can also filter
    >>> a = [1, -5, 4, 2, -2, 10]
    >>> b = [2*x for x in a if x > 0]
    >>> b
    [2,8,4,20]
    >>>
• Another example
    >>> f = open("stockreport","r")
    >>> goog = [line for line in f if 'GOOG' in line]
    >>>

Copyright (C) 2007, http://www.dabeaz.com 7- List Comprehensions • General syntax
[expression for x in s if condition] • What it means result = [] for x in s: if condition: result.append(expression) 43 • Basically, this is map/ﬁlter rolled into one op

Copyright (C) 2007, http://www.dabeaz.com 7- List Comprehensions • The general
syntax (in full) [expression for x in s if cond1 for y in t if cond2 ... if condfinal] • What it means result = [] for x in s: if cond1: for y in t: if cond2: if condfinal: result.append(expression) 44
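A small sketch of the full syntax with two for clauses, next to the nested loops it stands for. The lists s and t and the conditions are made up for illustration.

```python
s = [1, 2, 3]
t = [10, 20]

# Comprehension form: clauses read left to right
pairs = [(x, y) for x in s if x != 2 for y in t if y > 10]

# Long form: the same clauses, nested in the same order
result = []
for x in s:
    if x != 2:
        for y in t:
            if y > 10:
                result.append((x, y))
```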

Copyright (C) 2007, http://www.dabeaz.com 7- List Comp: Examples • List
comprehensions are hugely useful • Collecting the values of a speciﬁc ﬁeld stocknames = [s['name'] for s in stocks] • Performing database-like queries a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50 ] • Quick mathematics over sequences cost = sum([s['shares']*s['price'] for s in stocks]) 45

Historical Digression
• List comprehensions come from Haskell
    a = [x*x for x in s if x > 0]      # Python
    a = [x*x | x <- s, x > 0]          # Haskell
• And this is motivated by sets (from math)
    a = { x^2 | x ∈ s, x > 0 }

Copyright (C) 2007, http://www.dabeaz.com 7- Big Idea: Being Declarative •
List comprehensions encourage a more "declarative" style of programming when processing sequences of data. • Data can be manipulated by simply "declaring" a series of statements that perform various operations on it. • Although, it may require some care... 47

Copyright (C) 2007, http://www.dabeaz.com 7- Example • Reading a portfolio
of stocks lines = open("dowportfolio.csv") fields = [line.split(",") for line in lines] portfolio = [[f[0],int(f[1]),float(f[2])] for f in fields] 48 • Performing a calculation total = sum([s[1]*s[2] for s in portfolio if s[1] >= 100]) • We're just applying list operation after list operation to get the result we want

Copyright (C) 2008, http://www.dabeaz.com 7- Closures Revisited • Consider this
example: 50 def add(x,y): def do_add(): return x+y return do_add • This function creates a new function that performs a calculation when it runs (later) >>> r = add(3,4) >>> r <function do_add at 0x693b0> >>> r() 7 >>>

Copyright (C) 2008, http://www.dabeaz.com 7- Lazy Evaluation • The last
example illustrates something known as "lazy" evaluation • A function was created to perform some work • But the execution of the function didn't occur until later on (it was delayed) • This style of programming can be used for all sorts of good and evil 51

Copyright (C) 2008, http://www.dabeaz.com 7- Example 52 • Packaging up
expensive calculations in a way where they will only be carried out if actually requested later • Example : Fetch a URL, but, not right now import urllib def prepare_download(url): def do_download(): return urllib.urlopen(url).read() return do_download >>> d = prepare_download("http://www.blah.com") ... >>> text = d() # Okay, do it

Example
• Supply only some of the arguments to a function now (get the rest later)
    def partial(func,*args):
        def call(*moreargs):
            return func(*(args+moreargs))
        return call
• Example:
    def add(x,y,z):
        return x+y+z

    a = partial(add,2,3)
    ...
    print a(4)        # prints 9  : 2 + 3 + 4
    print a(10)       # prints 15 : 2 + 3 + 10
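Python's standard library already ships this idea as functools.partial. A short sketch of the same example using the library version (Python 3 syntax):

```python
from functools import partial

def add(x, y, z):
    return x + y + z

a = partial(add, 2, 3)     # supply x and y now
first  = a(4)              # supply z later: 2 + 3 + 4
second = a(10)             # reuse it: 2 + 3 + 10
stored = a.args            # the arguments saved for later
```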

Copyright (C) 2008, http://www.dabeaz.com 7- Example 54 • tail -f
a logﬁle import time def tail(thefile): thefile.seek(0,2) # Go to EOF def do_next(): while True: line = thefile.readline() if line: return line time.sleep(0.1) return do_next • Example: >>> next = tail(open("logfile","r")) >>> while True: ... print next(),

Discussion
• The tail -f example was interesting
• That function created a function which emitted a new line from a file every time you called it
• You might be able to expand on that idea by writing functions that generate sequences

Copyright (C) 2008, http://www.dabeaz.com 7- Generator Example 57 • Example
: A Countdown def countdown(n): while n > 0: yield n n = n - 1 • This spits out new values for use in a for-loop • Example: >>> c = countdown(5) >>> for n in c: ... print n, ... 5 4 3 2 1 >>>
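What the for-loop does behind the scenes can be sketched by stepping the generator by hand with next(). Python 3 syntax; the variable names are just for illustration.

```python
def countdown(n):
    while n > 0:
        yield n          # hand back a value, then pause right here
        n = n - 1

c = countdown(3)         # nothing runs yet; we just get a generator
first = next(c)          # runs until the first yield
rest  = list(c)          # drains whatever is left
again = list(c)          # the generator is used up at this point
```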

Generator Expressions
• A generator version of a list comprehension
    >>> a = [1,2,3,4]
    >>> b = (2*x for x in a)
    >>> b
    <generator object at 0x58760>
    >>> for i in b: print i,
    ...
    2 4 6 8
    >>>
• Important differences
• Does not construct a list.
• Only useful purpose is iteration
• Once consumed, can't be reused

Copyright (C) 2008, http://www.dabeaz.com 9- Generator Expressions • General syntax
(expression for i in s for j in t ... if conditional) • Can also serve as a function argument sum(x*x for x in a) • Can be applied to any iterator >>> a = [1,2,3,4] >>> b = (x*x for x in a) >>> c = (-x for x in b) >>> for i in c: print i, ... -1 -4 -9 -16 >>> 59

Generator Expressions
• Example: Sum a field in a large input file
    823.1838823 233.128883 14.2883881
    44.1787723 377.1772737 123.177277
    143.288388 3884.78772 ...
• Solution
    f = open("datfile.txt")
    # Strip all lines that start with a comment
    lines = (line for line in f if not line.startswith('#'))
    # Split the lines into fields
    fields = (s.split() for s in lines)
    # Sum up one of the fields
    print sum(float(f[2]) for f in fields)

Copyright (C) 2008, http://www.dabeaz.com 9- Generator Expressions • Solution 61
• Each generator expression only evaluates data as needed (lazy evaluation) • Example: Running above on a 6GB input ﬁle only consumes about 60K of RAM f = open("datfile.txt") # Strip all lines that start with a comment lines = (line for line in f if not line.startswith('#')) # Split the lines into fields fields = (s.split() for s in lines) # Sum up one of the fields print sum(float(f[2]) for f in fields)
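One way to watch the lazy evaluation happen is to record when each value actually gets computed. In this sketch, noisy() and the seen list are made-up helpers for illustration. Python 3 syntax.

```python
seen = []

def noisy(x):
    seen.append(x)              # note when each item is really processed
    return 2 * x

gen = (noisy(x) for x in [1, 2, 3, 4])

before    = list(seen)          # nothing has been computed yet
first     = next(gen)           # ask for one value
after_one = list(seen)          # only one item has been touched
remaining = sum(gen)            # now the rest gets pulled through
```

This is why the 6GB file never has to fit in memory: only the item currently being asked for exists at any moment.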

Copyright (C) 2008, http://www.dabeaz.com 9- Commentary • With generators, you
start to think of setting up functions as a processing pipeline • Almost like using pipes in Unix • A larger example follows shortly 62

Copyright (C) 2008, http://www.dabeaz.com 9- Wrap-up • This has only
been a small taste of functional programming idioms • If you go further, focus on organization of functions, closures, routing of data, etc. • Personally, I think it's a fun way to program • Very different than OO however... 64

Copyright (C) 2008, http://www.dabeaz.com 8- Generators and Networking Section 8
2

Copyright (C) 2008, http://www.dabeaz.com 8- Overview 3 • Going to
look at more examples • More with using generators/iterators • Introduction to network programming

Copyright (C) 2008, http://www.dabeaz.com 8- Generators 5 • Last time,
we ended with some discussion of generator functions • However, didn't get a chance to look at more interesting examples • Let's spend a little more time on this

Copyright (C) 2008, http://www.dabeaz.com 8- Generator Example 6 • Example
: A Countdown def countdown(n): while n > 0: yield n n = n - 1 • This spits out new values for use in a for-loop • Example: >>> c = countdown(5) >>> for n in c: ... print n, ... 5 4 3 2 1 >>>

Copyright (C) 2008, http://www.dabeaz.com 8- Ruby is Similar 7 •
Example : A Countdown class Countdown def initialize(n) @start = n end def each n = @start while n > 0 yield n n -= 1 end end end • Use for i in Countdown.new(5) puts i end

Generator Expressions
• A generator version of a list comprehension
    >>> a = [1,2,3,4]
    >>> b = (2*x for x in a)
    >>> b
    <generator object at 0x58760>
    >>> for i in b: print i,
    ...
    2 4 6 8
    >>>
• Generates a sequence of values where some operation has been applied

Copyright (C) 2008, http://www.dabeaz.com 8- Question • Aside from being
an exotic language feature, how do you go about using generators? • Is there any practical use of this? 9

Copyright (C) 2008, http://www.dabeaz.com 8- Generators as a Pipeline •
Generators are most effectively used to set up data processing pipelines • Similar to pipes in Unix 10 % ls -l | wc • Can structure programs as stages of processing chained together

Copyright (C) 2008, http://www.dabeaz.com 8- Programming Problem 11 Find out
how many bytes of data were transferred by summing up the last column of data in this Apache web server log 81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587 81.107.39.38 - ... "GET /favicon.ico HTTP/1.1" 404 133 81.107.39.38 - ... "GET /ply/bookplug.gif HTTP/1.1" 200 23903 81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238 81.107.39.38 - ... "GET /ply/example.html HTTP/1.1" 200 2359 66.249.72.134 - ... "GET /index.html HTTP/1.1" 200 4447 Oh yeah, and the log ﬁle might be huge (Gbytes)

The Log File
• Each line of the log looks like this:
    81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238
• The number of bytes is the last column
    bytestr = line.rsplit(None,1)[1]
• It's either a number or a missing value (-)
    81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -
• Converting the value
    if bytestr != '-':
        bytes = int(bytestr)
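The two steps (grab the last column, convert it) can be tried on the sample lines directly. In this sketch, bytes_sent() is a hypothetical helper, and counting a missing value as 0 is my assumption; the slides simply skip such lines. Python 3 syntax.

```python
def bytes_sent(line):
    """Return the byte count in the last column (0 if it is '-')."""
    bytestr = line.rsplit(None, 1)[1]    # split once, from the right
    if bytestr != '-':
        return int(bytestr)
    return 0                             # assumption: missing counts as 0

a = bytes_sent('81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238')
b = bytes_sent('81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -')
```

Splitting from the right means the quoted request in the middle (which contains spaces) never gets in the way.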

Copyright (C) 2008, http://www.dabeaz.com 8- A Non-Generator Soln • Just
do a simple for-loop 13 wwwlog = open("access-log") total = 0 for line in wwwlog: bytestr = line.rsplit(None,1)[1] if bytestr != '-': total += int(bytestr) print "Total", total • We read line-by-line and just update a sum

Copyright (C) 2008, http://www.dabeaz.com 8- A Generator Solution • Let's
use some generator expressions 14 wwwlog = open("access-log") bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog) bytes = (int(x) for x in bytecolumn if x != '-') print "Total", sum(bytes) • Well, that's certainly different • Less code • A completely different programming style

Copyright (C) 2008, http://www.dabeaz.com 8- Generators as a Pipeline •
The solution is setting up a pipeline 15 wwwlog bytecolumn bytes sum() access-log total • Each step is deﬁned by iteration/generation wwwlog = open("access-log") bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog) bytes = (int(x) for x in bytecolumn if x != '-') print "Total", sum(bytes)

Copyright (C) 2008, http://www.dabeaz.com 8- Being Declarative • At each
step of the pipeline, we declare an operation that will be applied to the entire input stream 16 wwwlog bytecolumn bytes sum() access-log total bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog) This operation gets applied to every line of the log ﬁle

Copyright (C) 2008, http://www.dabeaz.com 8- Being Declarative • Instead of
focusing on the problem at a line-by-line level, you just break it down into big operations that operate on the whole ﬁle • This is very much a "declarative" style • The key : Think big... 17

Copyright (C) 2008, http://www.dabeaz.com 8- Iteration is the Glue 18
• The glue that holds the pipeline together is the iteration that occurs in each step wwwlog = open("access-log") bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog) bytes = (int(x) for x in bytecolumn if x != '-') print "Total", sum(bytes) • The calculation is being driven by the last step • The sum() function is consuming values being pushed through the pipeline (via .next() calls)
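To see the last step driving everything, here is a sketch that runs the same pipeline over a tiny in-memory log and pulls values by hand. The trace() stage and the pulled list are made up for illustration; they record when a line is actually read. In Python 3 the .next() method is spelled as the built-in next().

```python
log = [
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587',
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -',
    '81.107.39.38 - ... "GET /favicon.ico HTTP/1.1" 404 133',
]

pulled = []

def trace(lines):
    """Pass lines through unchanged, noting when each one is read."""
    for line in lines:
        pulled.append(line)
        yield line

wwwlog     = trace(log)
bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
nbytes     = (int(x) for x in bytecolumn if x != '-')

read_at_setup = len(pulled)      # setting up the pipeline reads nothing
first  = next(nbytes)            # pulls exactly one line through
read_after_first = len(pulled)
second = next(nbytes)            # pulls two lines (the '-' one is dropped)
read_after_second = len(pulled)
total = first + second + sum(nbytes)
```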

Copyright (C) 2008, http://www.dabeaz.com 8- Performance • Surely, this generator
approach has all sorts of fancy-dancy magic that is slow. • Let's check it out on a 1Gb log ﬁle... 19

Performance Contest
• For-loop version (Time: 21.19s)
    wwwlog = open("access-log")
    total = 0
    for line in wwwlog:
        bytestr = line.rsplit(None,1)[1]
        if bytestr != '-':
            total += int(bytestr)
    print "Total", total
• Generator version (Time: 20.14s)
    wwwlog = open("access-log")
    bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
    bytes = (int(x) for x in bytecolumn if x != '-')
    print "Total", sum(bytes)

Programming Problem
Write a program that can easily extract metadata from Firefox browser cache files
• You were supposed to do this in the assignment
• Probably encountered a variety of nasty bits of code with that

Copyright (C) 2008, http://www.dabeaz.com 8- The Firefox Cache • There
are four critical ﬁles 22 _CACHE_MAP_ # Cache index _CACHE_001_ # Cache data _CACHE_002_ # Cache data _CACHE_003_ # Cache data • All ﬁles are binary-encoded • _CACHE_MAP_ is the index, but it is encoded in a tricky way. • Don't need it to extract URL requests anyways

Firefox _CACHE_ Files
• _CACHE_00n_ file organization
    Free/used block bitmap    4096 bytes
    Blocks                    Up to 32768 blocks
• The block size varies according to the file:
    _CACHE_001_    256 byte blocks
    _CACHE_002_    1024 byte blocks
    _CACHE_003_    4096 byte blocks

Cache Metadata
• Metadata is encoded as a binary structure
    Header            36 bytes
    Request String    Variable length (in header)
    Request Info      Variable length (in header)
• Header encoding (binary, big-endian)
    Bytes    Field          Type
    0-3      magic          unsigned int (0x00010008)
    4-7      location       unsigned int
    8-11     fetchcount     unsigned int
    12-15    fetchtime      unsigned int (system time)
    16-19    modifytime     unsigned int (system time)
    20-23    expiretime     unsigned int (system time)
    24-27    datasize       unsigned int (byte count)
    28-31    requestsize    unsigned int (byte count)
    32-35    infosize       unsigned int (byte count)
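The header layout maps directly onto a struct format string: nine big-endian unsigned ints is ">9I". The sketch below packs a made-up header and unpacks it again; the CacheHeader namedtuple and every sample value except the magic number are assumptions for illustration. Python 3 syntax.

```python
import struct
from collections import namedtuple

# Hypothetical record type; the field names follow the header table
CacheHeader = namedtuple(
    "CacheHeader",
    "magic location fetchcount fetchtime modifytime "
    "expiretime datasize requestsize infosize")

HEADER_FORMAT = ">9I"                       # big-endian, 9 unsigned ints
size = struct.calcsize(HEADER_FORMAT)       # should come out to 36 bytes

# Made-up sample header (only the magic number comes from the table)
raw = struct.pack(HEADER_FORMAT, 0x00010008, 1, 3,
                  1200000000, 1200000100, 1200003600,
                  2048, 27, 300)

header = CacheHeader(*struct.unpack(HEADER_FORMAT, raw))
is_metadata = (header.magic == 0x00010008)  # the "magic bytes" check
```

The requestsize and infosize fields are what tell you how many bytes to read after the header for the two variable-length parts.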

Generate Headers

    import os, struct

    cachefiles = [('_CACHE_001_',256),('_CACHE_002_',1024),
                  ('_CACHE_003_',4096)]

    def generate_headers(cachedir):
        for name, blocksize in cachefiles:
            pathname = os.path.join(cachedir,name)
            f = open(pathname,"rb")
            f.seek(4096)
            while True:
                header = f.read(36)
                if not header: break
                fields = struct.unpack(">9I",header)
                if fields[0] == 0x00010008:
                    yield f, fields
                fp = f.tell()
                offset = fp % blocksize
                if offset:
                    f.seek(blocksize - offset,1)
            f.close()

• We loop over each _CACHE_00N_ file one by one. Open each file and skip the 4096 byte block bitmap at the beginning
• We read the file and look for metadata headers (look for the magic bytes)
• Yield the file handle and header fields that we read.
• Skip to the start of the next block (we look at the current file pointer to compute a skip value)
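The skip computation in the last step can be checked in isolation. This is a small sketch in Python 3; the helper name bytes_to_next_block is hypothetical and does not appear in the slides.

```python
def bytes_to_next_block(position, blocksize):
    """Return how many bytes to skip so the file pointer lands on a block boundary."""
    offset = position % blocksize          # how far we are into the current block
    return blocksize - offset if offset else 0

# After the 4096-byte bitmap and one 36-byte header in a 256-byte-block file:
skip = bytes_to_next_block(4096 + 36, 256)
print(skip)
```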

Generate Headers

• Example of using the previous function

    for f, header in generate_headers("FFCache"):
        print f, header

• Example output...

    <open file 'FFCache/_CACHE_001_'> (65544, 0, 1, 1192968652, 119
    <open file 'FFCache/_CACHE_001_'> (65544, 0, 1, 1192970452, 119
    <open file 'FFCache/_CACHE_001_'> (65544, 0, 1, 1192972252, 119
    ...
    <open file 'FFCache/_CACHE_002_'> (65544, 2701132042L, 2, 11928
    <open file 'FFCache/_CACHE_002_'> (65544, 0, 1, 1192892697, 119
    <open file 'FFCache/_CACHE_002_'> (65544, 0, 1, 1192892704, 119

Generate Metadata

• Let's add to the processing pipeline

    def generate_meta(headers):
        for f, header in headers:
            urlstr = f.read(header[7])
            infostr = f.read(header[8])
            yield header, urlstr, infostr

• Example:

    headers = generate_headers("FFCache")
    meta = generate_meta(headers)
    for header, url, info in meta:
        print header
        print url
        print info
        print

Generate Metadata

• Example output

    (65544, 0, 1, 1192892514, 1192892514, 4294967295L, 0, 92, 292)
    HTTP:http://en-us.start.mozilla.com/firefox?client=firefox-a&rl
    request-method^@GET^@response-head^@HTTP/1.1 301 Moved Permanen
    Location: http://www.google.com/firefox?client=firefox-a&rls=or
    Content-Type: text/html
    Server: gws
    Content-Encoding: gzip
    Date: Sat, 20 Oct 2007 15:01:56 GMT
    Cache-Control: private, x-gzip-ok=""
    ^@

    (65544, 0, 1, 1192892514, 1192892514, 0, 1524, 83, 225)
    HTTP:http://www.google.com/firefox?client=firefox-a&rls=org.moz
    request-method^@GET^@response-head^@HTTP/1.1 200 OK
    Cache-Control: private
    Content-Type: text/html; charset=UTF-8
    Content-Encoding: gzip
    Server: gws
    Content-Length: 1524

Generate Requests

• Extract URLs and fetchtime

    def generate_requests(meta):
        for header, rawurl, info in meta:
            fetchtime = header[3]
            url = rawurl.split(":",1)[1].strip('\x00')
            yield url, fetchtime

• Example:

    import time

    headers = generate_headers("FFCache")
    meta = generate_meta(headers)
    requests = generate_requests(meta)
    for url, ftime in requests:
        print url
        print time.ctime(ftime)
        print

Generate Requests

• Sample output

    http://www.google.com/images/firefox/grgrad.gif
    Sat Oct 20 10:01:54 2007

    http://www.google.com/images/firefox/clear.gif
    Sat Oct 20 10:01:54 2007

    http://www.google.com/images/firefox/title.gif
    Sat Oct 20 10:01:54 2007

    http://www.google.com/images/firefox/fox1.gif
    Sat Oct 20 10:01:54 2007

    http://www.google.com/images/firefox/fox2.gif
    Sat Oct 20 10:01:54 2007

    http://www.google.com/images/firefox/google.gif
    Sat Oct 20 10:01:54 2007

Generate Domains

• Generate domain names from requests

    def generate_domains(requests):
        for url, fetchtime in requests:
            proto,request = url.split("://",1)
            domain = request.split("/",1)[0]
            yield domain

• Example:

    headers = generate_headers("FFCache")
    meta = generate_meta(headers)
    requests = generate_requests(meta)
    domains = generate_domains(requests)
    for d in sorted(set(domains)):
        print d

Generate Domains

• Example output:

    ad.doubleclick.net
    ads.cnn.com
    ads.pointroll.com
    ads.yimg.com
    ak.bluestreak.com
    altfarm.mediaplex.com
    ar.atwola.com
    buttons.blogger.com
    d.yimg.com
    edge.jobthread.com
    edge.quantserve.com
    en-us.start.mozilla.com
    en-us.www.mozilla.com
    en.wikipedia.org
    games.slashdot.org
    gdyn.cnn.com
    ...

Commentary

• This whole solution is just one big processing pipeline

    generate_headers -> generate_meta -> generate_requests -> generate_domains

Another Example

• Concatenate generated sequences together

    def concatenate(seq):
        for s in seq:
            for item in s:
                yield item

• Example: Find all domains in all caches

    all_caches = (path for path,dirlist,filelist in os.walk("/")
                  if '_CACHE_MAP_' in filelist)
    headers = concatenate(generate_headers(path) for path in all_caches)
    meta = generate_meta(headers)
    requests = generate_requests(meta)
    domains = generate_domains(requests)
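A quick way to see what concatenation does is to run it on small in-memory sequences. This sketch is written for Python 3 and uses made-up sample data (make_parts is a hypothetical helper). It also compares the hand-written generator with itertools.chain.from_iterable from the standard library, which provides the same flattening behavior.

```python
from itertools import chain

def concatenate(seq):
    """Yield every item of every sequence in seq, in order."""
    for s in seq:
        for item in s:
            yield item

def make_parts():
    # A list, a generator expression, and a string: any iterables will do.
    return [[1, 2], (x * x for x in range(3, 5)), "ab"]

flat = list(concatenate(make_parts()))
flat_chain = list(chain.from_iterable(make_parts()))
print(flat)
```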

Programming Problem

• Dave's hedge fund

After watching 87 straight hours of "Guido's Insane Money" on his Tivo, Dave has decided to quit his day job as a jazz musician and start a hedge fund.

[Figure: "INSANE MONEY w/ GUIDO" ticker: PY 142.34 (+8.12), JV 34.23 (-4.23), CPP 4.10 (-1.34), NET 14.12 (-0.50)]

• Problem: Write a program that can process infinite streams of real-time stock market data

A Log File

• Suppose a sequence of real-time data is being written to a log (stocklog)

    unix % tail -f stocklog.dat
    "MCD",50.80,"6/11/2007","09:30.00",-0.61,51.47,50.80,50.80,92400
    "KO",51.63,"6/11/2007","09:30.00",-0.04,51.67,51.63,51.63,395215
    "MMM",85.75,"6/11/2007","09:30.00",-0.19,85.94,85.75,85.75,15610
    "JNJ",62.08,"6/11/2007","09:30.00",-0.05,62.89,62.08,62.08,25340
    "AXP",62.39,"6/11/2007","09:30.01",-0.65,62.79,62.39,62.38,83462
    ...

• Again let's use generators...

Tailing a File

• A Python version of 'tail -f'

    import time
    def follow(thefile):
        thefile.seek(0,2)        # Go to the end of the file
        while True:
            line = thefile.readline()
            if not line:
                time.sleep(0.1)  # Sleep briefly
                continue
            yield line

• Idea: Seek to the end of the file and repeatedly try to read new lines. If new data is written to the file, we'll pick it up.
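The follow() idea can be tried without a live data feed. The sketch below, written for Python 3, adds two hypothetical parameters (poll and max_idle) so that the generator stops after a quiet period; the version in the slides runs forever. The temporary file, its contents, and the append_later helper are invented for the demonstration.

```python
import os
import tempfile
import threading
import time

def follow(thefile, poll=0.05, max_idle=None):
    """Yield lines appended to thefile; stop after max_idle quiet seconds."""
    thefile.seek(0, os.SEEK_END)          # start at the current end of the file
    idle = 0.0
    while max_idle is None or idle < max_idle:
        line = thefile.readline()
        if not line:
            time.sleep(poll)              # nothing new yet, wait a little
            idle += poll
            continue
        idle = 0.0
        yield line

# Demonstration on a temporary file.
path = os.path.join(tempfile.mkdtemp(), "stocklog.dat")
with open(path, "w") as out:
    out.write('"OLD",1.00\n')             # written before following starts

def append_later():
    with open(path, "a") as out:
        out.write('"IBM",103.23\n')       # written while we are following

threading.Timer(0.3, append_later).start()
with open(path) as log:
    seen = list(follow(log, max_idle=1.0))
print(seen)
```

Only the line appended after the seek is reported; the line already in the file is skipped, just as with 'tail -f' started on an existing log.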

Example

• Using our follow function

    stocklog = open("stocklog.dat","r")
    lines = follow(stocklog)
    ...
    for line in lines:
        print line,

• This produces the same output as 'tail -f'

Splitting into Fields

• The lines of stock data are CSV formatted
• Let's route them through the CSV module

    import csv
    stocklog = open("stocklog.dat","r")
    lines = follow(stocklog)
    fields = csv.reader(lines)
    for f in fields:
        print f

Splitting into Fields

• Example output:

    ['MCD', '51.28', '6/11/2007', '09:37.40', '-0.13', '51.47', '51.28
    ['C', '53.11', '6/11/2007', '09:37.41', '-0.22', '53.20', '53.11',
    ['VZ', '43.00', '6/11/2007', '09:37.41', '-0.07', '42.95', '43.00'
    ['WMT', '49.75', '6/11/2007', '09:37.41', '-0.33', '49.90', '49.87
    ['MRK', '50.37', '6/11/2007', '09:37.41', '0.23', '50.30', '50.37'
    ['PFE', '26.43', '6/11/2007', '09:37.41', '-0.09', '26.50', '26.43
    ['AXP', '62.64', '6/11/2007', '09:37.42', '-0.40', '62.79', '62.64

• An infinite sequence of lists
• Each list is a list of strings

Field Conversion

• Let's write a function that converts a list of strings into a list of converted values

    def convert_fields(types,fields):
        return [ty(val) for ty,val in zip(types,fields)]

• Example:

    fields = ['IBM', '103.23', '6/11/2007', '09:43.46',
              '0.16', '102.87', '103.23', '102.77', '345196']
    types = [ str, float, str, str,
              float, float, float, float, int]
    cfields = convert_fields(types,fields)
    # cfields = [ 'IBM', 103.23, '6/11/2007', '09:43.46',
    #             0.16, 102.87, 103.23, 102.77, 345196 ]

Field Conversion

• Let's add that to our processing pipeline

    import csv
    stocklog = open("stocklog.dat","r")
    lines = follow(stocklog)
    fields = csv.reader(lines)
    fieldtypes = [str,float,str,str,
                  float,float,float,float,int]
    converted = (convert_fields(fieldtypes,f) for f in fields)
    for s in converted:
        print s

• This now produces lists of converted values

Field Conversion

• Example output:

    ['MCD', 51.28, '6/11/2007', '09:37.40', -0.13, 51.47, 51.28, 50.80
    ['C', 53.11, '6/11/2007', '09:37.41', -0.22, 53.20, 53.11, 52.99,
    ['VZ', 43.00, '6/11/2007', '09:37.41', -0.07, 42.95, 43.00, 42.78
    ['WMT', 49.75, '6/11/2007', '09:37.41', -0.33, 49.90, 49.87, 49.75
    ['MRK', 50.37, '6/11/2007', '09:37.41', 0.23, 50.30, 50.37, 49.66
    ['PFE', 26.43, '6/11/2007', '09:37.41', -0.09, 26.50, 26.43, 26.31
    ['AXP', 62.64, '6/11/2007', '09:37.42', -0.40, 62.79, 62.64, 62.38

Making Dictionaries

• Let's put all of the fields into a dictionary

    import csv
    stocklog = open("stocklog.dat","r")
    lines = follow(stocklog)
    fields = csv.reader(lines)
    fieldtypes = [str,float,str,str,
                  float,float,float,float,int]
    converted = (convert_fields(fieldtypes,f) for f in fields)
    fieldnames = ['name','price','date','time',
                  'change','open','high','low','volume']
    stocks = (dict(zip(fieldnames,c)) for c in converted)

Making Dictionaries

• Example output:

    {'volume': 584485, 'name': 'IBM', 'price': 103.56,
     'high': 103.59999999999999, 'low': 102.77, 'time': '09:57.57',
     'date': '6/11/2007', 'open': 102.87, 'change': 0.48999999999999999}
    {'volume': 441703, 'name': 'CAT', 'price': 78.739999999999995,
     'high': 78.879999999999995, 'low': 77.989999999999995,
     'time': '09:57.59', 'date': '6/11/2007',
     'open': 78.319999999999993, 'change': 0.22}
    {'volume': 372369, 'name': 'DD', 'price': 51.130000000000003,
     'high': 51.18, 'low': 50.600000000000001, 'time': '09:58.01',
     'date': '6/11/2007', 'open': 51.130000000000003, 'change': 0.0}

Interlude

• We started with an infinite input source
• We then routed lines from that source through a processing pipeline that produces an infinite sequence of dictionaries

    stocklog.dat -(lines)-> follow -(lines)-> csv.reader -(lists)-> convert -(lists)-> makedict -(dicts)-> { }

Using the Results

• Dave has a portfolio of stocks

    portfolio = set(['IBM','MSFT','HPQ','CAT','AA'])

• Write a program that prints out a real-time ticker showing the name, price, change, and volume for just these stocks

    in_portfolio = (s for s in stocks if s['name'] in portfolio)
    ticker = ((s['name'],s['price'],s['change'],s['volume'])
              for s in in_portfolio)
    for t in ticker:
        print "%10s %10.2f %10.2f %10d" % t

Example:

• Only print ticker data for negative change

    in_portfolio = (s for s in stocks if s['name'] in portfolio)
    ticker = ((s['name'],s['price'],s['change'],s['volume'])
              for s in in_portfolio)
    negticker = (t for t in ticker if t[2] < 0)
    for t in negticker:
        print "%10s %10.2f %10.2f %10d" % t
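The whole pipeline can be exercised on a handful of in-memory lines instead of a live log. The sketch below is written for Python 3; the IBM and KO rows reuse values shown earlier in the slides, while the CAT row is made up so that one portfolio stock has a negative change.

```python
import csv

sample_lines = [
    '"IBM",103.23,"6/11/2007","09:43.46",0.16,102.87,103.23,102.77,345196',
    '"KO",51.63,"6/11/2007","09:30.00",-0.04,51.67,51.63,51.63,395215',
    '"CAT",78.29,"6/11/2007","09:44.01",-0.23,78.32,78.88,77.99,441703',  # invented row
]
fieldtypes = [str, float, str, str, float, float, float, float, int]
fieldnames = ['name', 'price', 'date', 'time',
              'change', 'open', 'high', 'low', 'volume']
portfolio = {'IBM', 'MSFT', 'HPQ', 'CAT', 'AA'}

def convert_fields(types, fields):
    return [ty(val) for ty, val in zip(types, fields)]

# Each stage is lazy: nothing runs until the final list comprehension pulls data.
rows = csv.reader(sample_lines)                      # csv.reader accepts any iterable of lines
converted = (convert_fields(fieldtypes, r) for r in rows)
stocks = (dict(zip(fieldnames, c)) for c in converted)
in_portfolio = (s for s in stocks if s['name'] in portfolio)
negative = (s for s in in_portfolio if s['change'] < 0)

report = ["%10s %10.2f %10.2f %10d" %
          (s['name'], s['price'], s['change'], s['volume'])
          for s in negative]
for line in report:
    print(line)
```

Here the negative-change filter is applied to the dictionaries rather than to the ticker tuples, which avoids the positional index t[2]; either ordering of the stages gives the same rows.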

Commentary

• That's probably enough with generators
• Concept of a data processing pipeline is pretty powerful if you know how to apply it
• Head explosion?

Overview

• Dynamic languages are used heavily in network programming applications
• Processing different file formats
• Interacting with web servers
• Implementing network servers, etc.

Example: Web Server

• This is a complete Python web-server with support for CGI scripting

    from BaseHTTPServer import HTTPServer
    from CGIHTTPServer import CGIHTTPRequestHandler
    import os
    os.chdir("/home/docs/html")
    serv = HTTPServer(("",8080),CGIHTTPRequestHandler)
    serv.serve_forever()

• Serves HTML files and executes scripts in "/cgi-bin" and "/htbin" directories

Basic Principles

• Using a lot of these network features is mostly a matter of reading the manual
• There are various libraries and frameworks
• Instead of talking about that, we will cover the absolute basics of network programming
• Material most good programmers should just know about

The Problem

• Communication between computers

    [Diagram: two computers connected through a network]

• Send/receive bits

The Problem

• Two main issues
• Addressing - locating computers and services
• Data transport - moving bits around

Network Addressing

• Computers on network have a hostname
• Hostname mapped to numerical address (e.g., IP address, DNS)

    foo.bar.com       205.172.13.4
    www.python.org    82.94.237.218

Ports

• Connections are made between "ports"
• Ports are bound to running processes/services

    [Diagram: host foo.bar.com (205.172.13.4) runs the services web (Port 80), email (Port 25), and IM (Port 31337); the client programs browser (Port 7823) and sendmail (Port 3342) connect to them]

Connections

• A network connection involves connecting to a host address and port
• Expressed as a pair (address,port)
• Examples:

    ("www.python.org",80)
    ("205.172.13.4",443)

Client/Server Concept

• Servers wait for incoming connections and provide some kind of service (e.g., web)
• Clients make connections to servers

    [Diagram: a browser (client) connects to the web service on Port 80 of www.bar.com, 205.172.13.4 (server)]

• To make it work, servers use standardized port numbers (e.g., web server always on port 80)

Standard Ports

• Some commonly used port assignments

    21     FTP
    22     SSH
    23     Telnet
    25     SMTP (Mail)
    80     HTTP (Web)
    110    POP3 (Mail)
    119    NNTP (News)
    443    HTTPS (web)

• Ports 1-1023 reserved by system (privileged)
• Ports 1024-65535 available to all

Request/Response Cycle

• Most network applications use a request/response programming model
• Client sends a request (e.g., HTTP)

    GET /index.html HTTP/1.0

• Server sends a response (e.g., HTTP)

    HTTP/1.0 200 OK
    Content-type: text/html
    Content-length: 48823

    <HTML>
    ...

• Actual protocol depends on the application

Sockets

• Programming abstraction for network code
• Socket: A communication endpoint

    [Diagram: two sockets, one at each end of a network connection]

• Supported by socket library module
• Allows data to be written/read (e.g., like a file)

Socket Basics

• To create a socket

    import socket
    s = socket.socket(addr_family, type)

• Address families

    socket.AF_INET     Internet protocol (IPv4)
    socket.AF_INET6    Internet protocol (IPv6)

• Socket types

    socket.SOCK_STREAM    Connection based stream (TCP)
    socket.SOCK_DGRAM     Datagrams (UDP)

• Example:

    >>> from socket import *
    >>> s = socket(AF_INET,SOCK_STREAM)

Socket Types

• Internet Protocol
• Almost all code will use one of the following

    s = socket(AF_INET, SOCK_STREAM)
    s = socket(AF_INET, SOCK_DGRAM)

• Most common case: TCP connection

    s = socket(AF_INET, SOCK_STREAM)

Using a Socket

• Creating a socket is only the first step

    s = socket(AF_INET, SOCK_STREAM)

• Further use depends on application
• Server
    • Listen for incoming connections
• Client
    • Make an outgoing connection

TCP Connections

• Computers establish a dedicated connection
• Bi-directional data transfer
• Continuous I/O stream (like a file, pipe, etc.)
• Reliable
• Connection stays open until explicitly closed

    socket(AF_INET,SOCK_STREAM)

TCP Clients

• Using a socket to make a connection

    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    s.connect(("www.python.org",80))
    s.send("GET /index.html HTTP/1.0\n\n")
    data = s.recv(10000)
    s.close()

• s.connect(addr) makes a connection

    s.connect(("www.python.org",80))

• Once connected, use send(),recv() to transmit and receive data
• close() shuts down the connection

TCP Server

• A simple server

    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    s.bind(("",9000))
    s.listen(5)
    while True:
        c,a = s.accept()
        print "Received connection from", a
        c.send("Hello %s\n" % a[0])
        c.close()

• Send a message back to a client

    % telnet localhost 9000
    Connected to localhost.
    Escape character is '^]'.
    Hello 127.0.0.1                       <- Server message
    Connection closed by foreign host.
    %

TCP Server

• Address binding

    s.bind(("",9000))

  Binds the socket to a specific address

• Addressing

    s.bind(("",9000))               # binds to all local interfaces
    s.bind(("localhost",9000))      # binds to localhost
    s.bind(("192.168.2.1",9000))    # If system has multiple IP addresses,
    s.bind(("104.21.4.2",9000))     # can bind to a specific address

TCP Server

• Start listening for connections

    s.listen(5)

  Tells system to start listening for connections on the socket

• s.listen(backlog)
• backlog is # of pending connections to allow
• Note: not related to number of clients

TCP Server

• Accepting a new connection

    c,a = s.accept()

  Accept a new client connection

• s.accept() blocks until connection received
• Server sleeps if nothing is happening

TCP Server

• Client socket and address

    c,a = s.accept()

  Accept returns a pair (client_socket,addr)

    c    <socket._socketobject object at 0x3be30>
         This is a new socket that's used for data
    a    ("104.23.11.4",27743)
         This is the network/port address of the client that connected

TCP Server

• Sending data

    c.send("Hello %s\n" % a[0])

  Send data to client

• Note: Using the client socket, not the server socket

TCP Server

• Closing the connection

    c.close()

  Close client connection

• Note: Server can keep client connection alive as long as it wants
• Can repeatedly receive/send data

TCP Server

• Waiting for the next connection

    while True:
        c,a = s.accept()

  Wait for next connection

• Original server socket is reused for further connections
• Server runs forever
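The accept/send/close cycle can be watched end to end in a single process by running the server side in a thread and connecting to it over the loopback interface. This is a sketch for Python 3 (the slides use Python 2), so strings are encoded to bytes before sending. The helper serve_once, the use of port 0 to get a free port, and the single-connection limit are choices made for the demonstration only.

```python
import socket
import threading

def serve_once(server_sock):
    """Accept a single client, greet it, and close the connection."""
    client, addr = server_sock.accept()          # blocks until a client connects
    with client:
        client.sendall(("Hello %s\n" % addr[0]).encode("ascii"))

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))                    # port 0: let the OS pick a free port
server.listen(5)
port = server.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
chunks = []
while True:
    data = client.recv(1024)
    if not data:                                 # empty result: server closed the connection
        break
    chunks.append(data)
client.close()
worker.join()
server.close()

reply = b"".join(chunks)
print(reply.decode("ascii"), end="")
```

The client reads in a loop because a TCP stream gives no guarantee that one recv() returns everything the server sent.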

An Example

• A Stock Price Server
• Suppose there is a dictionary with prices

    prices = { }
    for line in open("prices.dat"):
        fields = line.split(",")
        prices[fields[0]] = float(fields[1])

    >>> prices['IBM']
    102.86
    >>> prices['AA']
    39.48
    >>>

• Turn this into a server where clients can connect and get the prices

Solution

    import socket
    s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1)
    s.bind(("",9000))
    s.listen(5)
    while True:
        c,a = s.accept()
        for name in prices:
            c.sendall("%s,%0.2f\n" % (name, prices[name]))
        c.close()

• To test:

    % telnet localhost 9000
    AXP,62.58
    BA,98.31
    DD,50.75
    CAT,78.29
    AIG,71.38
    ...

An Example

• Modify the last example so specific prices can be requested and returned
• Allow a list of stock names to be sent

    % telnet localhost 9000
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    IBM AA CAT <newline>
    CAT,78.29
    IBM,102.86
    AA,39.48
    Connection closed by foreign host.
    %

Solution

    import socket
    s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1)
    s.bind(("",9000))
    s.listen(5)
    while True:
        c,a = s.accept()
        f = c.makefile()
        nameline = f.readline()
        nameset = nameline.split()
        for name in prices:
            if not nameset or name in nameset:
                c.sendall("%s,%0.2f\n" % (name, prices[name]))
        f.close()
        c.close()
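The selection logic in this server (an empty request line means "send everything") is easy to test separately from the networking. The sketch below is for Python 3; price_lines is a hypothetical helper, the prices are values quoted elsewhere in the slides, and sorting the names is an addition made here so that the output order is predictable.

```python
def price_lines(nameline, prices):
    """Return the response lines for one request line of stock names."""
    nameset = set(nameline.split())              # empty set when the line is blank
    return ["%s,%0.2f\n" % (name, price)
            for name, price in sorted(prices.items())
            if not nameset or name in nameset]

prices = {"IBM": 102.86, "AA": 39.48, "CAT": 78.29}
selected = price_lines("IBM AA\n", prices)
everything = price_lines("\n", prices)
print("".join(selected), end="")
```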

UDP Networking

• Data sent in discrete packets (Datagrams)
• No concept of a "connection"
• No reliability, no ordering of data
• Datagrams may be lost, arrive in any order
• Higher performance (used in games, etc.)

    socket(AF_INET,SOCK_DGRAM)

UDP Client

• Sending a datagram to a server

    from socket import *
    s = socket(AF_INET,SOCK_DGRAM)          # Create datagram socket
    s.sendto(msg,("server.com",10000))      # Send a message
    data, addr = s.recvfrom(maxsize)        # Wait for a response
                                            # (returned data, remote address)

• Key concept: No "connection"
• You just send a data packet

UDP Server

• A simple datagram server

    from socket import *
    s = socket(AF_INET,SOCK_DGRAM)          # Create datagram socket
    s.bind(("",10000))                      # Bind to a specific port
    while True:
        data, addr = s.recvfrom(maxsize)    # Wait for a message
        # Do something
        ...
        s.sendto(resp,addr)                 # Send response

• Much simpler than a TCP server
• Again: No "connection" is established

An Example

• Create a UDP stock price server
• Server receives "names"
• Responds with the price

Solution (UDP Server)

    import socket
    s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1)
    s.bind(("",10000))
    while True:
        name,addr = s.recvfrom(16)
        price = prices.get(name,0.0)
        s.sendto("%0.2f" % price,addr)

Solution (UDP Client)

    import socket
    price_socket = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
    price_server = ("localhost",10000)
    def get_price(name):
        price_socket.sendto(name,price_server)
        price,addr = price_socket.recvfrom(32)
        return float(price)

• Example use:

    >>> get_price('IBM')
    102.86
    >>> get_price('AA')
    39.479999999999997
    >>>
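The UDP server and client can be run together in one process over the loopback interface. This is a sketch for Python 3, so names and prices are encoded to bytes explicitly. The serve helper, the fixed request count, the client timeout, and the sample prices are assumptions made for the demonstration.

```python
import socket
import threading

prices = {"IBM": 102.86, "AA": 39.48}            # sample data for illustration

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))                    # port 0: let the OS pick a free port
server_addr = server.getsockname()

def serve(count):
    """Answer a fixed number of price requests, then return."""
    for _ in range(count):
        name, addr = server.recvfrom(16)         # one datagram per request
        price = prices.get(name.decode("ascii"), 0.0)
        server.sendto(("%0.2f" % price).encode("ascii"), addr)

worker = threading.Thread(target=serve, args=(2,))
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)                           # UDP gives no delivery guarantee

def get_price(name):
    client.sendto(name.encode("ascii"), server_addr)
    data, _ = client.recvfrom(32)
    return float(data.decode("ascii"))

ibm = get_price("IBM")
unknown = get_price("XYZ")                       # unknown names come back as 0.0
worker.join()
client.close()
server.close()
print(ibm, unknown)
```

The timeout matters in real use: if a request or reply datagram is lost, recvfrom() would otherwise block forever.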

Commentary

• Sockets are the lowest level of network programming
• If you know what you are doing, you can use sockets to write programs that interact with any other program on the network
• Of course, the low-level details might be really hairy

Network Protocols

• There is a fairly standard set of common network protocols
• HTTP (Web)
• FTP
• SMTP (email)
• etc...

Network Libraries

• Most dynamic languages already have built-in library modules for common protocols
• Will give a few examples...

urllib Module

• Open a web page: urlopen()

    >>> import urllib
    >>> u = urllib.urlopen("http://www.python.org/index.html")
    >>> data = u.read()
    >>> print data
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML ...
    ...
    >>>

• urlopen() returns a file-like object
• Can use standard file operations on it

Web Servers

• Suppose you wanted to implement a completely customized HTTP server
• A lot of web frameworks have an option to run "stand-alone"
• Let's see how that works...

HTTP Protocol

• Clients send a request

    GET /index.html HTTP/1.1
    Host: www.python.org
    User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US;
    Accept: text/xml,application/xml,application/xhtml+xml,text/h
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive
    <blank line>

• Request line followed by headers
• Terminated by a blank line
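The structure described above (a request line, then headers, then a blank line) can be pulled apart with a few string operations. This is a simplified sketch for Python 3, not a complete HTTP parser: parse_request is a hypothetical helper, it assumes CRLF line endings, and it ignores details such as repeated or folded headers.

```python
def parse_request(raw):
    """Split raw request bytes into (method, path, version, headers)."""
    head, _, _ = raw.partition(b"\r\n\r\n")          # headers end at the blank line
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, version = lines[0].split(" ", 2)   # the request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers

raw = (b"GET /index.html HTTP/1.1\r\n"
       b"Host: www.python.org\r\n"
       b"Accept-Encoding: gzip,deflate\r\n"
       b"Connection: keep-alive\r\n"
       b"\r\n")
method, path, version, headers = parse_request(raw)
print(method, path, version, headers["host"])
```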

HTTP Protocol

• Server sends a response

    HTTP/1.1 200 OK
    Date: Thu, 26 Apr 2007 19:54:01 GMT
    Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_py
    Last-Modified: Thu, 26 Apr 2007 18:40:24 GMT
    ETag: "61b82-37eb-5a0eb600"
    Accept-Ranges: bytes
    Content-Length: 14315
    Connection: close
    Content-Type: text/html

    <HTML>
    ...

• Response line followed by headers
• Blank line followed by data

HTTP Protocol

• There are a small number of request types

    GET
    POST
    HEAD
    PUT

• There are standardized response codes

    200 OK
    403 Forbidden
    404 Not Found
    501 Not implemented
    ...

• This isn't an exhaustive tutorial

Customized HTTP

• Can define a custom class...

    from BaseHTTPServer import BaseHTTPRequestHandler,HTTPServer

    class MyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            ...
        def do_POST(self):
            ...
        def do_HEAD(self):
            ...
        def do_PUT(self):
            ...

    serv = HTTPServer(("",8080),MyHandler)
    serv.serve_forever()

• Redefine the behavior of the server by defining code for all of the standard HTTP request types

Customized HTTP

• Example: Hello World

    from BaseHTTPServer import BaseHTTPRequestHandler,HTTPServer

    class EchoHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == '/hello.html':
                self.send_response(200,"OK")
                self.send_header('Content-type','text/plain')
                self.end_headers()
                self.wfile.write("Hello World!\n")
            else:
                self.send_response(404,"Not found")
                self.send_header('Content-type','text/plain')
                self.end_headers()
                self.wfile.write("I don't know.\n")

    serv = HTTPServer(("",8080),EchoHandler)
    serv.serve_forever()
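For readers on Python 3, the BaseHTTPServer module became http.server. The sketch below ports the Hello World handler and exercises it over the loopback interface from the same process. The handler name HelloHandler, the fetch helper, the background thread, and the use of port 0 are choices made for this demonstration, not part of the slides.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/hello.html":
            status, body = 200, b"Hello World!\n"
        else:
            status, body = 404, b"I don't know.\n"
        self.send_response(status)
        self.send_header("Content-type", "text/plain")
        self.send_header("Content-length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                   # Python 3: the body must be bytes

    def log_message(self, format, *args):
        pass                                     # keep the demonstration quiet

serv = HTTPServer(("127.0.0.1", 0), HelloHandler)   # port 0: pick a free port
port = serv.server_address[1]
threading.Thread(target=serv.serve_forever, daemon=True).start()

def fetch(path):
    """Issue one GET request and return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()

hello = fetch("/hello.html")
missing = fetch("/nope.html")
serv.shutdown()
serv.server_close()
print(hello, missing)
```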

XML-RPC

• Remote Procedure Call
• Uses HTTP as a transport protocol
• Parameters/Results encoded in XML
• May be used in conjunction with AJAX (Asynchronous Javascript and XML)

Simple XML-RPC

• How to create a stand-alone server

    from SimpleXMLRPCServer import SimpleXMLRPCServer

    def add(x,y):
        return x+y

    s = SimpleXMLRPCServer(("",8080))
    s.register_function(add)
    s.serve_forever()

• How to test it (xmlrpclib)

    >>> import xmlrpclib
    >>> s = xmlrpclib.ServerProxy("http://localhost:8080")
    >>> s.add(3,5)
    8
    >>> s.add("Hello","World")
    "HelloWorld"
    >>>
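In Python 3 the same modules are named xmlrpc.server and xmlrpc.client. The sketch below runs the server in a background thread and calls it from the same process. The loopback address, port 0, and the thread are choices made for the demonstration.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The proxy turns attribute access into remote calls encoded as XML over HTTP.
proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d" % port)
total = proxy.add(3, 5)
joined = proxy.add("Hello", "World")

server.shutdown()
server.server_close()
print(total, joined)
```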

Simple XML-RPC

• Adding multiple functions

    from SimpleXMLRPCServer import SimpleXMLRPCServer

    s = SimpleXMLRPCServer(("",8080))
    s.register_function(add)
    s.register_function(foo)
    s.register_function(bar)
    s.serve_forever()

• Registering an instance (exposes all methods)

    from SimpleXMLRPCServer import SimpleXMLRPCServer

    s = SimpleXMLRPCServer(("",8080))
    obj = SomeObject()
    s.register_instance(obj)
    s.serve_forever()

XML-RPC Commentary

• XML-RPC is very easy to use
• Can be used generally for interprocess communication

Next time

• Concurrent programming
• Message passing/IPC
• Programming with threads
• More networking

Background

• You often have to write programs that perform multiple tasks in parallel
• Example: Network servers that handle multiple client connections
• Modern systems have multiple CPU cores
• There is interest in writing programs that can take advantage of multiple cores to get better performance (parallel processing)

Overview

• Will talk about some issues that come up with concurrent programming
• Processes and IPC
• Message Passing
• Threads
• Event-driven programming
• Co-routines

Disclaimer

• This is a delicate topic surrounded by tremendous peril
• Could run an entire course just on this...
• This is going to be more of a survey/intro

Processes

• In a previous class, we looked at how it is possible to create subprocesses
• Example: Setting up a pipe

    import subprocess
    p = subprocess.Popen(['cmd'],
                         stdin = subprocess.PIPE,
                         stdout = subprocess.PIPE)
    p.stdin.write(data)          # Send data
    p.stdin.close()              # No more input
    result = p.stdout.read()     # Read output

    [Diagram: python writes to p.stdin, which feeds the stdin of cmd; the stdout of cmd is read back through p.stdout]
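A runnable variant of the pipe setup needs a concrete command. The sketch below, for Python 3, uses the Python interpreter itself as the child so that it does not depend on any external program; the one-line child script is made up for the demonstration. It uses communicate(), which writes the input, closes stdin, and reads the output in one step, avoiding the deadlock that separate write and read calls can cause with large data.

```python
import subprocess
import sys

# Child program: copy stdin to stdout in upper case.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)
result, _ = p.communicate(b"hello pipe\n")    # send data, close stdin, read output
print(result)
```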

Unix Process Fork

• fork(), wait(), _exit()

    import os
    pid = os.fork()
    if pid == 0:
        # Child process
        ...
        os._exit(0)
    else:
        # Parent process
        ...
        # Wait for child
        os.waitpid(pid, 0)

    [Diagram: fork() splits one python process into two that execute concurrently; the child ends with _exit() and the parent collects it with wait()]

Unix Process Fork

• fork() creates an identical process
• Newly created process is a "child process"
• fork() returns different values in parent/child

    import os
    pid = os.fork()      # pid is 0 in child, non-zero in parent
    if pid == 0:
        # Child process
        ...
    else:
        # Parent process
        ...

• Parent and child run independently afterwards
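The different return values of fork() and the hand-off of an exit status to the parent can be seen in a few lines. This sketch is for Python 3 on Unix only (os.fork does not exist on Windows); the exit code 7 is an arbitrary value chosen for the demonstration.

```python
import os

pid = os.fork()
if pid == 0:
    # Child process: fork() returned 0. Exit immediately with a known status.
    os._exit(7)
else:
    # Parent process: fork() returned the child's process id.
    waited_pid, status = os.waitpid(pid, 0)      # block until that child exits
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    print("child", waited_pid, "exited with", exit_code)
```

os.waitpid(pid, 0) waits for one specific child, while os.wait() takes no arguments and returns whichever child finishes first.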

Concurrency and Networks

• Many programmers find their way into concurrent programming by way of network programming
• In order to handle multiple clients, servers must manage simultaneous network connections

Sockets and Concurrency

• Handling multiple clients

    [Diagram: several browser clients connected to one web server on Port 80]

Sockets and Concurrency

• Each client has its own socket connection

    # server code
    s = socket(AF_INET, SOCK_STREAM)    # a connection point for clients
    ...
    while True:
        c,a = s.accept()                # client data transmitted on a different socket
        ...

Copyright (C) 2008, http://www.dabeaz.com 9- Sockets and Concurrency • Connection
process 13 web browser web web browser server clients Port 80 web browser connect accept() send()/recv()

Copyright (C) 2008, http://www.dabeaz.com 9- Sockets and Concurrency • To
manage multiple clients, • Server must accept multiple connections and keep all connections alive • Must actively manage all client connections • Each client may be performing different tasks 14

Forking Server (Unix)

    import os
    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    s.bind(("",9000))
    s.listen(5)
    while True:
        c,a = s.accept()
        if os.fork() == 0:
            # Child process. Manage client
            ...
            c.close()
            os._exit(0)
        else:
            # Parent process. Clean up and go
            # back to wait for more connections
            c.close()

• Each client is handled by a subprocess

Forking Server
• Server forks a new process to handle each client

[Diagram: the listening server calls fork() for each new client, so every client is served by its own server process]

It Gets Messy
• There are many ways to set up concurrent execution of clients
• Each has various tradeoffs
• There is often no one "right" way to do it

Pre-forked Server (Unix)

    import os
    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    s.bind(("",9000))
    s.listen(5)
    # maxservers and maxrequests are configuration values defined elsewhere
    nservers = 0
    while True:
        # Spawn servers
        if nservers < maxservers:
            if os.fork() == 0:
                # Each server runs in this simple loop
                for i in xrange(maxrequests):
                    c,a = s.accept()
                    # Manage client
                    ...
                    c.close()
                os._exit(0)
            else:
                nservers += 1
        else:
            os.wait()
            nservers -= 1

• Server creates copies of itself in advance
• A popular approach used by Apache, etc.

Pre-forked Server
• There is a process pool waiting for connections

[Diagram: a listening server with a pool of server processes; new clients are picked up by processes from the pool]

Commentary
• Subprocesses are covered in great detail in an Operating Systems course
• There are a few important details
    • Every process is independent
    • If there are multiple CPUs, more than one process can run simultaneously
    • Processes can exchange data with each other (Interprocess Communication)

IPC and Concurrency
• Can structure applications as a collection of co-operating processes that work together
• Each process runs independently, but sends/receives data from other processes
• Question: What are the communication options?

IPC Options
• IPC: Inter-Process Communication
• Common Choices
    • Pipes
    • FIFOs
    • Sockets
    • Memory mapped files
• Let's take a look...

Pipes
• An I/O channel between two processes

    p = subprocess.Popen(['process2'],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE)
    # Send data to subprocess
    p.stdin.write(data)
    # Receive data from subprocess
    result = p.stdout.read()

• A pair of "files" hooked up to a subprocess

[Diagram: Process 1 connected to Process 2 by a pipe]

Pipes
• Pipes are more commonly used to collect the output of commands executed as subprocesses
• However, a pipe can be left "open" indefinitely
• With proper programming, can be used as a bi-directional communication channel for exchanging data
• Terminology: This is known as a "co-process"

FIFOs
• Unix FIFO queue (named pipe)
• Example:

    # Creating a FIFO
    import os
    os.mkfifo("fifo_A",0666)

    # Reading from a FIFO
    f = open("fifo_A","r")
    data = f.read(nbytes)

    # Writing to a FIFO
    f = open("fifo_A","w",0)    # Unbuffered I/O
    f.write(data)

[Diagram: Process 1 and Process 2 connected through FIFOs]

FIFOs
• With care, can set up elaborate communications
• Each process has own FIFO for messages
• Any process can send to any other process

[Diagram: Process 1, Process 2, and Process 3 with FIFO1, FIFO2, and FIFO3]

FIFOs
• Extreme peril: With FIFOs, multiple processes can send data to the same target
• Will cause chaos on the receiver unless you figure out some way to coordinate it

[Diagram: Process 1, Process 2, and Process 3 around a single FIFO3]

File Locking
• May control access to the channel via file system locking or some other approach
• Example: Unix

    # Each process opens a lock file
    import fcntl
    f = open("/tmp/fifo","w",0)         # The FIFO
    g = open("/tmp/fifo.lock","w")      # A lock

    # Critical section
    fcntl.flock(g.fileno(),fcntl.LOCK_EX)
    ...
    f.write("Some data\n")              # Write on the FIFO
    ...
    fcntl.flock(g.fileno(),fcntl.LOCK_UN)
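The locking pattern above can be wrapped so the lock is released even if the write fails. The following is a minimal sketch in Python 3 syntax (the slides use Python 2), assuming a Unix system. The helper names `locked_append` and `demo` are hypothetical, and the demo writes to an ordinary temporary file that stands in for the FIFO.

```python
import fcntl
import os
import tempfile

def locked_append(path, lock_path, text):
    """Append text to path while holding an exclusive lock on lock_path."""
    with open(lock_path, "w") as lock:
        fcntl.flock(lock.fileno(), fcntl.LOCK_EX)      # Blocks until the lock is free
        try:
            with open(path, "a") as f:                 # Critical section
                f.write(text)
        finally:
            fcntl.flock(lock.fileno(), fcntl.LOCK_UN)  # Released on every exit path

def demo():
    """Write two records under the lock and return the file contents."""
    d = tempfile.mkdtemp()
    path = os.path.join(d, "channel")                  # Stand-in for the FIFO
    lock_path = os.path.join(d, "channel.lock")
    locked_append(path, lock_path, "one\n")
    locked_append(path, lock_path, "two\n")
    with open(path) as f:
        data = f.read()
    os.unlink(path)
    os.unlink(lock_path)
    os.rmdir(d)
    return data
```

Note that `flock()` locks are advisory: they only coordinate processes that all agree to take the same lock before writing.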

Interlude
• You're already starting to see problems with concurrency
• Once multiple processes access shared resources, you often need to coordinate control and access
• Locking and synchronization
• Also: Almost none of this is "portable"

Sockets
• Interprocess communication via network layer
• Basic idea: communication via TCP, UDP, etc.
• We talked a bit about this last time

[Diagram: Process 1, Process 2, and Process 3 connected by sockets]

TCP Clients
• Using a socket to make a connection

    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    s.connect(("some.host.com",10000))
    ...
    # Send/receive data
    s.send(request)
    response = s.recv(10000)
    ...
    # Done. Close the connection
    s.close()

TCP Server
• A simple server

    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    s.bind(("",9000))
    s.listen(5)
    c,a = s.accept()
    print "Received connection from", a
    # Send and receive messages
    request = c.recv(10000)
    c.send(response)
    ...
    # Done
    c.close()

Commentary
• Sockets are most commonly thought of for "network programming"
• Can be used as an IPC mechanism between processes running on the same machine
• Networking via the loopback interface (127.0.0.1)

Unix Domain Sockets
• Using the socket API to create a "pipe"

    s = socket(AF_UNIX,SOCK_STREAM)
    s.bind("/tmp/foo")
    s.listen(5)
    c,a = s.accept()
    # Send/receive data
    request = c.recv(10000)
    c.send(response)
    ...
    c.close()

• Clients

    s = socket(AF_UNIX,SOCK_STREAM)
    s.connect("/tmp/foo")
    s.send(request)
    resp = s.recv(10000)
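The server and client halves above can be exercised together in one script. This is a minimal sketch in Python 3 syntax (the slides use Python 2), assuming a Unix system. To stay self-contained it runs the server side in a thread rather than in a separate process, and the names `serve_once` and `demo`, the temporary socket path, and the upper-casing of the reply are all assumptions made for illustration.

```python
import os
import socket
import tempfile
import threading

def serve_once(server):
    """Accept one client and send its request back upper-cased."""
    c, _ = server.accept()
    with c:
        request = c.recv(10000)
        c.sendall(request.upper())

def demo():
    d = tempfile.mkdtemp()
    path = os.path.join(d, "sock")             # Hypothetical socket path
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)                           # Listening before the client connects
    t = threading.Thread(target=serve_once, args=(server,))
    t.start()

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    client.sendall(b"hello")
    response = client.recv(10000)
    client.close()

    t.join()
    server.close()
    os.unlink(path)                            # bind() created a filesystem entry
    os.rmdir(d)
    return response
```

The only difference from a TCP socket is the address family and the address itself, which is a filesystem path instead of a (host, port) pair.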

Pipes versus Sockets
• Depending on the system, pipes and FIFOs are often highly optimized in the operating system
• Network layer often involves more processing steps and buffering
• However, programming with pipes may be more difficult (especially FIFOs)

Memory Mapped Files
• Processes can share memory via mmap
• Idea here: processes share a mutable byte array
• Changes immediately reflected in both processes
• Highly optimized in the OS (no copying)

[Diagram: Process 1 and Process 2 each holding an array backed by the same memory mapped file]

Memory Mapped Files
• Creating a memory mapped file

    # Common code
    import mmap
    SIZE = 100000               # Number of bytes
    f = open("shared","w+b")
    f.seek(SIZE,0)              # Expand file to desired size
    f.write("\n")
    f.flush()                   # File must be at full size on disk before mapping

    # Now, memory map the file into an array.
    # access must be passed by keyword: on Unix the third
    # positional argument is flags, not access.
    m = mmap.mmap(f.fileno(),SIZE,
                  access=mmap.ACCESS_WRITE)

• This creates a shared byte array

Memory Mapped Files
• Using a memory mapped file

    m = mmap.mmap(f.fileno(),SIZE,
                  access=mmap.ACCESS_WRITE)

• Extract data from the memory array

    data = m[start:stop]

• Store data in the memory array

    m[start:stop] = data        # Data must exactly fit

• Key point: Modifications to the array instantly appear in all shared copies of the file. Memory is shared, there is no copying/buffering.
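The sharing behavior can be observed without a second process by mapping the same file twice. The sketch below uses Python 3 syntax (the slides use Python 2) and assumes a Unix-like system. The two mappings stand in for two processes, and the name `demo`, the temporary file, and the 4096-byte size are assumptions chosen for the example.

```python
import mmap
import os
import tempfile

SIZE = 4096                                 # Number of bytes in the shared region

def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, SIZE)              # Expand file to desired size
        # Two independent mappings of the same file, standing in for
        # two processes that each map it.
        m1 = mmap.mmap(fd, SIZE, access=mmap.ACCESS_WRITE)
        m2 = mmap.mmap(fd, SIZE, access=mmap.ACCESS_WRITE)
        m1[0:5] = b"hello"                  # Slice assignment must fit exactly
        seen = m2[0:5]                      # Read through the other mapping
        m1.close()
        m2.close()
        return seen
    finally:
        os.close(fd)
        os.unlink(path)
```

`ACCESS_WRITE` gives a write-through shared mapping, which is why the bytes stored through `m1` are visible through `m2` with no explicit copy or flush.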

Coordinating Access
• Programming with memory mapped regions requires very careful coordination
• Again, you may have to use file-locks

    # Each process opens the shared file and locks it
    import fcntl
    f = open("shared","r+b")    # The shared file ("r+b" opens it without truncating)

    # Critical section
    fcntl.flock(f.fileno(),fcntl.LOCK_EX)
    ...
    # Some critical operation
    ...
    fcntl.flock(f.fileno(),fcntl.LOCK_UN)

Portable IPC
• How to program with IPC
• In practice, programs can be written in a manner where the actual IPC mechanism being used is hidden from application code.
• Programming abstractions:
    • IPC via "files"
    • IPC via "messages"

IPC via "Files"
• For pipes: You already get a pair of files

    p = subprocess.Popen(['cmd'],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE)
    in_f = p.stdin
    out_f = p.stdout

• For sockets: Can wrap with a file-layer

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ...
    in_f = s.makefile("w")
    out_f = s.makefile("r")

IPC via "Files"
• With a file-API, you just read/write streams of characters
• Processes communicate by interpreting the contents of the I/O stream.
• The tricky part: There is no concept of "records" or "messages"
• If more than one process writes onto a single stream, then you have to figure out how to coordinate it and sort out the results

IPC via "Messages"
• General idea: Encapsulate IPC into some sort of message-passing API

    ch = IPC_Channel(args)
    ch.send(msg)            # Send a message
    msg = ch.receive()      # Receive a message

• Message passing is a long established concept
• However, there are dozens (if not hundreds) of libraries related to doing it.
• Each with their own slightly different API

IPC via "Messages"
• With messages, processes send well-defined chunks of data to each other.
• The absolutely critical operations are
    • send() - Send a message somewhere
    • receive() - Wait for a message
• Let's build a message passing library...

Message Passing
• Basic idea: Define an I/O "Channel"

    class Channel(object):
        ...

• Implement methods such as the following

    c.send(msg)         # Send a message
    c.receive()         # Receive a message

• A message is just a string of bytes

MP Example
• Suppose we want to implement message passing over a pair of file objects

    inf         # File open for reading
    outf        # File open for writing

• Ex: Files might be from a pipe or FIFO

Channels
• Start by defining a channel class

    class FileChannel(object):
        def __init__(self, inf, outf):
            self.inf = inf
            self.outf = outf
        def send(self,msg):
            pass
        def receive(self):
            pass

Sending Data
• Implement code to send a message

    class FileChannel(object):
        ...
        def send(self,msg):
            self.outf.write("%d\n" % len(msg))
            self.outf.write(msg)
            self.outf.flush()

• In this case, each message is written as its length (on a line of its own) followed by the data
• This approach gives us a means for "framing" the data into records that can be easily understood by the receiver

[Diagram: a record laid out as size, then msg]

Receiving Data
• The opposite of sending

    class FileChannel(object):
        ...
        def receive(self):
            size_str = self.inf.readline()
            size = int(size_str)
            msg = self.inf.read(size)
            return msg

• Note: Would probably want to add some more robust error handling (will skip for now)
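Putting the two halves together, the framing scheme can be tried over an ordinary pipe. This is a minimal sketch in Python 3 syntax (the slides use Python 2), so messages are `bytes` and the files are opened in binary mode. The end-of-file checks and the `demo` function are additions for illustration, not part of the slides.

```python
import os

class FileChannel(object):
    """Length-prefixed messages over a pair of binary file objects."""
    def __init__(self, inf, outf):
        self.inf = inf
        self.outf = outf

    def send(self, msg):
        self.outf.write(b"%d\n" % len(msg))   # Header: size in ASCII, then newline
        self.outf.write(msg)                  # Payload: exactly that many bytes
        self.outf.flush()

    def receive(self):
        size_str = self.inf.readline()
        if not size_str:                      # Assumed policy: EOF raises
            raise EOFError("channel closed")
        size = int(size_str)
        msg = self.inf.read(size)
        if len(msg) != size:
            raise EOFError("truncated message")
        return msg

def demo():
    """Send two messages through a pipe and read them back."""
    r, w = os.pipe()
    inf, outf = os.fdopen(r, "rb"), os.fdopen(w, "wb")
    ch = FileChannel(inf, outf)
    ch.send(b"Hello")
    ch.send(b"two\nlines")                    # Embedded newline is safe: length is explicit
    result = [ch.receive(), ch.receive()]
    outf.close()
    inf.close()
    return result
```

Because the receiver reads exactly `size` bytes after the header line, the payload may contain newlines or any other bytes without confusing the framing.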

Example : Pipes
• An Echo Client (using pipes)

    # client.py
    import sys, channel
    ch = channel.FileChannel(sys.stdin,sys.stdout)
    while True:
        msg = ch.receive()
        ch.send("Client received: %s" % msg)

• This just wraps a channel around stdin/stdout

Example : Pipes
• Use

    >>> import popen2,channel
    >>> out,inp = popen2.popen2("python client.py")
    >>> ch = channel.FileChannel(out,inp)
    >>> ch.send("Hello")
    >>> ch.receive()
    'Client received: Hello'
    >>> ch.send("World")
    >>> ch.receive()
    'Client received: World'
    >>>

Example : FIFOs
• An Echo Client

    # echofifo.py
    import os, channel
    os.mkfifo("/tmp/echo_in")
    os.mkfifo("/tmp/echo_out")
    echo_in = open("/tmp/echo_in","rb")
    echo_out = open("/tmp/echo_out","wb",0)
    ch = channel.FileChannel(echo_in,echo_out)
    while True:
        msg = ch.receive()
        ch.send("Client received: %s" % msg)

Example : FIFOs
• Use

    >>> import channel
    >>> echo_in = open("/tmp/echo_in","wb",0)
    >>> echo_out = open("/tmp/echo_out","rb")
    >>> ch = channel.FileChannel(echo_out,echo_in)
    >>> ch.send("Hello")
    >>> ch.receive()
    'Client received: Hello'
    >>>

• Note: Order in which FIFOs are opened is critical here
• Client must already be started

Example : Sockets
• An Echo Client

    # echosock.py
    import socket, channel
    address = ("",10000)
    s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1)
    s.bind(address)
    s.listen(1)
    c,a = s.accept()
    client_f = c.makefile()
    ch = channel.FileChannel(client_f,client_f)
    while True:
        msg = ch.receive()
        ch.send("Client received: %s" % msg)

Example : Sockets
• Use

    >>> import channel
    >>> import socket
    >>> s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    >>> s.connect(("",10000))
    >>> client_f = s.makefile()
    >>> ch = channel.FileChannel(client_f,client_f)
    >>> ch.send("Hello")
    >>> ch.receive()
    'Client received: Hello'
    >>>

• Note: Echo client must already be running

Why Message Passing?
• Message Passing is simple
• Just a few basic primitives
    • send, receive
• Can be used to build more advanced IPC programming abstractions
    • Remote procedure call
    • Distributed objects (e.g., CORBA, etc.)

• Easily reconfigured for different systems

[Diagram: three processes connected by three pipes, and the same three processes connected by one pipe and two sockets]

• Long history of message passing
    • It's an established programming technique
    • Algorithms, properties, pitfalls are known
• Scalable performance
    • Thousands of processors
    • Supercomputers

Advanced MP
• Dynamic languages can be extremely powerful when mixed with message passing
• Can build systems based on remote procedure call/distributed objects, etc.
• Let's look at a simple example...

Object Serialization
• Many languages allow objects to be "serialized" into strings
• For example: pickle module in Python

    import pickle
    bytes = pickle.dumps(obj)     # Turn obj into bytes
    obj = pickle.loads(bytes)     # Turn bytes back to obj

Object Serialization
• Can add an object serialization/unserialization step on each end of a communication channel
• Let's look at that a little further...

[Diagram: Process 1 serializes, Process 2 unserializes]

Example
• A Pickle Channel

    import pickle
    class PickleChannel(object):
        def __init__(self,ch):
            self.ch = ch
        def send(self,obj):
            msg = pickle.dumps(obj)
            self.ch.send(msg)
        def receive(self):
            msg = self.ch.receive()
            return pickle.loads(msg)

• This adds load/dump operations on to an existing channel
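The layering can be checked in isolation by putting the pickle step on top of a trivial in-memory channel. This is a minimal sketch in Python 3 syntax (the slides use Python 2). `LoopbackChannel` and `demo` are hypothetical names invented for the example; any object with `send()` and `receive()` methods would do in their place.

```python
import pickle
from collections import deque

class LoopbackChannel(object):
    """Hypothetical in-memory channel: receive() returns what send() queued."""
    def __init__(self):
        self._msgs = deque()

    def send(self, msg):
        self._msgs.append(msg)

    def receive(self):
        return self._msgs.popleft()

class PickleChannel(object):
    """Adds dump/load steps on top of any byte-oriented channel."""
    def __init__(self, ch):
        self.ch = ch

    def send(self, obj):
        self.ch.send(pickle.dumps(obj))       # Object -> bytes

    def receive(self):
        return pickle.loads(self.ch.receive())  # Bytes -> object

def demo():
    pch = PickleChannel(LoopbackChannel())
    pch.send({"op": "add", "args": (3, 4)})
    return pch.receive()
```

One caution that applies to any real deployment: `pickle.loads()` can execute arbitrary code, so it should only be used on channels where the sender is trusted.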

Example
• A sample subprocess: An Adder

    # adder.py
    import sys, channel
    fch = channel.FileChannel(sys.stdin,sys.stdout)
    pch = channel.PickleChannel(fch)
    while True:
        x,y = pch.receive()
        pch.send(x+y)

• Receive two objects as input
• Adds and sends the result back

Example
• Using the adder

    >>> import popen2, channel
    >>> p_out,p_in = popen2.popen2("python adder.py")
    >>> fch = channel.FileChannel(p_out,p_in)
    >>> pch = channel.PickleChannel(fch)
    >>> pch.send((3,4))
    >>> pch.receive()
    7
    >>> pch.send(("hello","world"))
    >>> pch.receive()
    'helloworld'
    >>>

• Notice how any object that supports (+) can be sent

Remote Procedure Call
• The adder is an example of remote procedure call
• Have a subprocess/coprocess that implements some functionality
• Another process sends data (parameters) and receives results
• Can package this up in more exotic ways

Example: RPC
• Remote Procedure Call

    # rpcserver.py
    class RPCServer(object):
        def __init__(self,ch):
            self.ch = ch
            self.funcs = { }
        def register(self,name,func):
            self.funcs[name] = func
        def serve(self):
            while True:
                name,args,kwargs = self.ch.receive()
                result = self.funcs[name](*args,**kwargs)
                self.ch.send(result)

• Note: Needs more error checking
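One possible shape for the missing error checking is to catch failures inside the dispatch step and report them to the caller as data. The sketch below is an assumption about how that could look, not the course's solution: the function name `handle_request` and the `("ok", result)` / `("error", message)` reply convention are invented for the example. It uses Python 3 syntax (the slides use Python 2).

```python
def handle_request(funcs, request):
    """Dispatch one (name, args, kwargs) request. Never raises."""
    try:
        name, args, kwargs = request
        result = funcs[name](*args, **kwargs)
    except Exception as e:
        # Report the failure to the caller instead of killing the server loop
        return ("error", "%s: %s" % (type(e).__name__, e))
    return ("ok", result)

def add(x, y):
    return x + y

funcs = {"add": add}      # Plays the role of RPCServer.funcs
```

With this convention, `handle_request(funcs, ("add", (3, 4), {}))` returns `("ok", 7)`, while an unknown name or a bad argument list comes back as an `"error"` tuple. Inside `serve()`, the reply tuple would be what gets passed to `self.ch.send()`.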

Example: RPC
• RPC Sample Server

    # rpcex.py
    import sys, channel
    fch = channel.FileChannel(sys.stdin,sys.stdout)
    pch = channel.PickleChannel(fch)

    def add(x,y):
        return x+y

    def sub(x,y):
        return x-y

    import rpcserver
    serv = rpcserver.RPCServer(pch)
    serv.register("add",add)
    serv.register("sub",sub)
    serv.serve()

Example: RPC
• RPC Use

    >>> import popen2, channel
    >>> p_out,p_in = popen2.popen2("python rpcex.py")
    >>> fch = channel.FileChannel(p_out,p_in)
    >>> pch = channel.PickleChannel(fch)
    >>> pch.send(("add",(3,4),{}))
    >>> pch.receive()
    7
    >>> pch.send(("sub",(3,4),{}))
    >>> pch.receive()
    -1
    >>>

Example: RPC
• A slightly better client interface

    def do_rpc(ch,name,*args,**kwargs):
        ch.send((name,args,kwargs))
        return ch.receive()

• Example:

    >>> do_rpc(pch,"add",3,4)
    7
    >>> do_rpc(pch,"sub",6,2)
    4
    >>>

Example: RPC
• An even better client interface

    def rpc_func(ch,name):
        def do_rpc(*args,**kwargs):
            ch.send((name,args,kwargs))
            return ch.receive()
        return do_rpc

• Example:

    >>> add = rpc_func(pch,"add")
    >>> sub = rpc_func(pch,"sub")
    >>> add(3,4)
    7
    >>> sub(3,4)
    -1
    >>>

Commentary
• You could go even further down this route
• We'll leave it at that for now
• Also need to think about error checking (a really wicked problem in itself)

Message Summary
• Message passing is a very powerful technique for setting up concurrent programs
• Easily adapted to different I/O schemes
• Can be extended across the network
• Quite portable if done right

Concept: Threads
• An independent task that runs inside of a process
• Shares resources with the process (memory, files, etc.)
• Has own flow of execution (stack, PC)

Thread Basics

    % python program.py

• Program launch. Python loads a program and starts executing statements ("main thread")
• Creation of a thread. Launches a function.
• Parallel execution of statements
• Thread terminates on return or exit
• Key idea: Thread is like a little subprocess that runs inside your program

[Diagram: program.py executes statements in the main thread, then creates a thread running def foo(); both streams of statements proceed until foo() returns or exits]

Creating a Thread
• To create a thread, you define a class

    import time
    import threading
    class CountdownThread(threading.Thread):
        def __init__(self,count):
            threading.Thread.__init__(self)
            self.count = count
        def run(self):
            while self.count > 0:
                print "Counting down", self.count
                self.count -= 1
                time.sleep(5)
            return

• Inherit from Thread and redefine run()

threading module
• To launch, create objects and use start()

    t1 = CountdownThread(10)    # Create the thread object
    t1.start()                  # Launch the thread
    t2 = CountdownThread(20)    # Create another thread
    t2.start()                  # Launch

• Threads execute until run() method returns or exits

Joining a Thread
• It may be necessary to wait for a thread

    t.start()       # Launch a thread
    ...
    # Do other work
    ...
    # Wait for thread to finish
    t.join()        # Waits for thread t to exit

• t.join([timeout])
• Can only be used by other threads (a thread can't join itself)

Daemonic Threads
• Creating a daemon thread (detached thread)

    t.setDaemon(True)

• Daemon threads run forever
• Like a background thread
• Destroyed when the process exits
• Can't be joined
• Often used when creating worker/client threads

Thread Synchronization
• Threads may share common data
• Extreme care is needed when accessing shared data
• One thread must not modify data while another thread is reading it
• Otherwise, will get a "race condition"

Race Condition
• Consider a shared object

    x = 0

• And two threads

    Thread-1              Thread-2
    --------              --------
    ...                   ...
    x = x + 1             x = x - 1
    ...                   ...

• Possible that the value will be corrupted if one thread modifies it just after the other has read it.

Race Condition
• The two threads

    Thread-1              Thread-2
    --------              --------
    ...                   ...
    x = x + 1             x = x - 1
    ...                   ...

• Low level interpreter code

    Thread-1              Thread-2
    --------              --------
    push(x)               push(x)
    push(1)               push(1)
    add                   sub
    x = pop()             x = pop()

[Diagram annotations: context switches occur between these instructions; Thread-1 reads a stale value of x and then overwrites the update by Thread-2]

Race Condition
• Is this a real concern or simply theoretical?

    >>> import threading
    >>> x = 0
    >>> def foo():
    ...     global x
    ...     for i in xrange(100000000): x += 1
    ...
    >>> def bar():
    ...     global x
    ...     for i in xrange(100000000): x -= 1
    ...
    >>> t1 = threading.Thread(target=foo)
    >>> t2 = threading.Thread(target=bar)
    >>> t1.start(); t2.start()
    >>> t1.join(); t2.join()
    >>> x
    -834018
    >>>

• ??? (without the race, the result would be 0)

Mutex Locks
• Mutual exclusion locks

    m = threading.Lock()    # Create a lock
    m.acquire()             # Acquire the lock
    m.release()             # Release the lock

• If another thread tries to acquire the lock, it blocks until the lock is released
• Only one thread may hold the lock

Use of Mutex Locks
• Commonly placed around critical sections

    x = 0
    x_lck = threading.Lock()

    def foo():
        global x
        x_lck.acquire()
        x += 1              # Critical section
        x_lck.release()

    def bar():
        global x
        x_lck.acquire()
        x -= 1              # Critical section
        x_lck.release()
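To see the lock remove the race, the earlier increment/decrement experiment can be rerun with every update inside a critical section. This is a minimal sketch in Python 3 syntax (the slides use Python 2). The `Counter` class and `demo` function are illustrative names, and the `with` statement is used as shorthand for the acquire()/release() pair.

```python
import threading

class Counter(object):
    """A shared counter whose updates are protected by a mutex."""
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def add(self, delta):
        with self.lock:               # acquire() on entry, release() on exit
            self.value += delta       # Critical section

def demo(n=50000):
    counter = Counter()

    def worker(delta):
        for _ in range(n):
            counter.add(delta)

    t1 = threading.Thread(target=worker, args=(+1,))
    t2 = threading.Thread(target=worker, args=(-1,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return counter.value
```

Because only one thread at a time can be inside `add()`, the increments and decrements cancel exactly and `demo()` returns 0, at the cost of one lock operation per update.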

Other Locking Primitives
• Reentrant Mutex Lock

    m = threading.RLock()       # Create a lock
    m.acquire()                 # Acquire the lock
    m.release()                 # Release the lock

• Can be acquired multiple times by same thread
• Semaphores

    m = threading.Semaphore(n)  # Create a semaphore
    m.acquire()                 # Acquire the lock
    m.release()                 # Release the lock

• Lock based on a counter
• Won't cover in detail here

Events
• Use to communicate between threads

    e = threading.Event()
    e.isSet()       # Return True if event set
    e.set()         # Set event
    e.clear()       # Clear event
    e.wait()        # Wait for event

• Common use

    Thread 1                    Thread 2
    --------                    --------
    ...                         ...
    # Wait for an event         # Trigger an event
    e.wait()                    e.set()
    ...
    # Respond to event

[Diagram: e.set() in Thread 2 notifies the e.wait() in Thread 1]

Events/Multiple Threads
• Events can work with multiple threads

    e = threading.Event()

• Thread 1, Thread 2, and Thread 3 are each blocked in e.wait()
• Thread X calls e.set()
• Setting the event unblocks all waiting threads
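The multiple-waiter case can be sketched as follows, in Python 3 syntax (the slides use Python 2). The names `demo`, `waiter`, and `woken` are invented for the example, and the small lock around the result list is only there to keep the bookkeeping safe.

```python
import threading

def demo(nthreads=3):
    e = threading.Event()
    woken = []
    woken_lock = threading.Lock()

    def waiter(name):
        e.wait()                      # Block until the event is set
        with woken_lock:
            woken.append(name)        # Respond to the event

    threads = [threading.Thread(target=waiter, args=(i,))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    e.set()                           # Unblocks every waiting thread
    for t in threads:
        t.join()
    return sorted(woken)
```

A single `e.set()` releases all three waiters. A thread that reaches `e.wait()` after the event is already set simply continues without blocking, so the example does not depend on thread start-up timing.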

Programming with Threads
• Must define parts of program that can run concurrently (may depend on algorithm)
• Must identify all shared data structures
• Must protect critical sections with locks
• Synchronize threads with events as needed
• Must cross fingers and hope that it works

Thread Pitfalls
• Obscure race conditions (corner cases)
• Deadlock (mismanagement of locks)

    def foo():
        lck.acquire()
        ...
        if expr:
            return          # Oops. Forgot to release lock
        ...
        lck.release()

• More complicated development/debugging
• Poor performance (excessive locking)
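One common way to avoid the forgotten-release pitfall is to hold the lock with a `with` statement, which releases it on every exit path. This is a sketch in Python 3 syntax (the slides use Python 2); the helper `lock_is_free` is a hypothetical name added only so the effect can be observed.

```python
import threading

lck = threading.Lock()

def foo(expr):
    with lck:                 # Released on return, fall-through, or exception
        if expr:
            return "early"    # Early return no longer leaks the lock
        return "normal"

def lock_is_free():
    """True if the lock can be acquired right now (it is released again)."""
    if lck.acquire(False):    # Non-blocking attempt
        lck.release()
        return True
    return False
```

After either `foo(True)` or `foo(False)`, `lock_is_free()` returns True. With the explicit acquire()/release() version shown on the slide, the early-return path would leave the lock held and the next caller would block forever.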

Using Threads
• If you must use threads, consider using the approach which causes the least amount of peril and pain
• Independent threads that communicate via message queues

[Diagram: Thread 1 and Thread 2 connected by a Queue]

Queue Example
• Queue module in Python
• Creating a Queue with a maximum number of elements

    import Queue
    q = Queue.Queue(maxsize)

• To create an infinite Queue

    import Queue
    q = Queue.Queue()

Queue Operations
• To insert an item

    q.put(item)

• Blocks until space is available
• Removing items from a queue

    item = q.get()

• If Queue empty, blocks for data to arrive

Producer-Consumer

    in_q = Queue.Queue()
    ...

• Consumer thread

    def consume_items():
        while True:
            item = in_q.get()
            # Consume the item
            ...

• Producer threads

    while True:
        # Produce an item
        ...
        # Send to the consumer
        in_q.put(item)

Producer-Consumer
• An alternative formulation is to structure consumers as objects you "send" items to

    class Consumer(threading.Thread):
        def __init__(self):
            threading.Thread.__init__(self)
            self.in_q = Queue.Queue()
        def send(self,item):
            self.in_q.put(item)
        def run(self):
            while True:
                item = self.in_q.get()
                # Process item
                ...

• This ties threads to "message passing"

Producer-Consumer
• Commentary on solution
• No locks. Queue is thread-safe
• No shared data. Producer/consumer only communicate via queue.
• Strikingly similar to message passing
• Code is simple
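A complete, terminating version of the pattern can be sketched as below, in Python 3 syntax, where the module is named `queue` rather than `Queue`. The `DONE` sentinel used to stop the consumer, the doubling that stands in for "consuming" an item, and the name `demo` are assumptions added for the example.

```python
import queue                 # Named Queue in Python 2
import threading

def demo(items=(1, 2, 3)):
    in_q = queue.Queue()
    results = []
    DONE = object()                      # Sentinel telling the consumer to stop

    def consume_items():
        while True:
            item = in_q.get()            # Blocks until data arrives
            if item is DONE:
                break
            results.append(item * 2)     # "Consume" the item

    consumer = threading.Thread(target=consume_items)
    consumer.start()
    for item in items:                   # Producer: send to the consumer
        in_q.put(item)
    in_q.put(DONE)
    consumer.join()
    return results
```

There are no explicit locks here. The only object the two threads share for communication is the queue, and `results` is read by the main thread only after `join()` has confirmed the consumer is finished.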

Cost of Threads
• Threads are sometimes considered for applications where there is massive concurrency (e.g., server with thousands of clients)
• However, threads are fairly expensive
• Don't improve performance (context-switching)
• Incur considerable memory overhead (each thread has its own C stack, etc.)

Problems with Threads
• Dynamic languages often make very poor use of threads
• The interpreters themselves are often not thread-safe (or are locked down in some way)
• Example: Global interpreter lock in Python
• As a result, even if you use threads, programs won't run on more than one CPU

Alternatives to Threads
• Co-operating processes (better performance on multiple CPUs)
• Event driven programming
• Co-routines

Event Driven Systems
• Programs structured as an event loop

    while True:
        event = get_event()
        if event.type == BUTTON_PRESS:
            do_button(event)
        elif event.type == MOUSE_MOVE:
            do_mousemove(event)
        elif event.type == KEYPRESS:
            do_keypress(event)
        elif event.type == FILE_INPUT:
            do_fileinput(event)
        elif event.type == NETWORK:
            do_network(event)
        ...

Event Driven Systems
• In event driven systems, programs get built as a collection of functions/objects that react to different events
• Classic example: GUIs
• However, the same approach can be applied to networks, file I/O, etc.
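The "collection of functions that react to events" idea can be sketched with a table of handlers in place of the if/elif chain. This is an illustrative sketch in Python 3 syntax, not a real GUI or network loop: the function `run_event_loop`, the `(type, payload)` event tuples, and the handler names are all assumptions, and a plain list of events stands in for `get_event()`.

```python
from collections import deque

def run_event_loop(events, handlers):
    """Dispatch each (type, payload) event to its registered handler."""
    pending = deque(events)
    results = []
    while pending:
        etype, payload = pending.popleft()   # Stand-in for get_event()
        handler = handlers.get(etype)
        if handler is not None:              # Events nobody handles are ignored
            results.append(handler(payload))
    return results

# Functions that react to different events, keyed by event type
handlers = {
    "KEYPRESS": lambda key: "key %s" % key,
    "BUTTON_PRESS": lambda name: "button %s" % name,
}
```

Registering a new kind of event then means adding one entry to the table instead of another `elif` branch. Real event loops differ mainly in where events come from (the windowing system, sockets, timers) rather than in this dispatch step.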

Example : A GUI Button
• Make a button (using Tk)

    >>> def response():
    ...     print "You did it!"
    ...
    >>> from Tkinter import Button
    >>> x = Button(None,text="Do it!",command=response)
    >>> x.pack()
    >>> x.mainloop()

• Clicking on the button....

    You did it!
    You did it!
    ...

Co-routines
• A technique sometimes used for implementing co-operative multitasking

    def do_foo():
        while True:
            # Various statements
            ...
            (yield)         # Yield control to someone else
            ...

• Basic idea: Functions run until they explicitly yield to some other function
• Only one thing runs at once, but it gives the illusion of concurrency

Coroutine Example
• Python co-routine example

    def countdown(n):
        while n > 0:        # Stop at zero so the co-routine eventually finishes
            print "T-minus", n
            (yield)
            n = n - 1

• This is like a generator, but we're not actually generating any values

    >>> c = countdown(10)
    >>> c.next()
    T-minus 10
    >>> c.next()
    T-minus 9
    >>>

Coroutine Example
• Running multiple co-routines

    c1 = countdown(20)
    c2 = countdown(10)
    procs = [c1,c2]
    while procs:
        for p in procs[:]:      # Loop over a copy; procs is modified below
            try:
                p.next()
            except StopIteration:
                procs.remove(p)

• This is an outer loop that is "scheduling" the different co-routines (round-robin)

Coroutine Example
• Example output

    T-minus 20
    T-minus 10
    T-minus 19
    T-minus 9
    T-minus 18
    T-minus 8
    T-minus 17
    T-minus 7
    T-minus 16
    T-minus 6
    T-minus 15
    T-minus 5
    T-minus 14
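The round-robin scheduling can be run as a small self-contained script. This is a sketch in Python 3 syntax (the slides use Python 2), so `next(p)` replaces `p.next()`. To make the interleaving easy to inspect, each co-routine appends to a shared `log` list instead of printing; `log`, `run_all`, and `demo` are names introduced for the example.

```python
def countdown(n, log):
    while n > 0:
        log.append("T-minus %d" % n)
        yield                       # Yield control back to the scheduler
        n -= 1

def run_all(procs):
    """Round-robin scheduler: resume each co-routine until all have finished."""
    procs = list(procs)
    while procs:
        for p in list(procs):       # Iterate over a copy; procs shrinks below
            try:
                next(p)
            except StopIteration:
                procs.remove(p)

def demo():
    log = []
    run_all([countdown(3, log), countdown(2, log)])
    return log
```

`demo()` returns the entries in alternating order (3, 2, 2, 1, 1), which shows the two countdowns taking turns even though only one of them is ever running at a time.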

Next class
• March 20!
• Project presentations and wrap-up
• No class, March 13.
